Thursday, September 29, 2011

Blog #13


Combining multiple depth cameras and projectors for interactions on, above and between surfaces



Authors:
Andrew D. Wilson - Microsoft Research, Redmond, WA, USA
Hrvoje Benko - Microsoft Research, Redmond, WA, USA

Proceeding
UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
Depth cameras and projectors are used to make an interactive room named "LightSpace." Interactions take place on plain, non-electronic surfaces in a natural fashion that requires only the cameras and projectors. The point of this approach is to limit external devices: the surfaces themselves need no instrumentation, and nothing needs to be worn by the user.

Hypothesis
To the authors' knowledge, no prior research has built an interactive space using only depth cameras and projectors.

Methods
The system is composed of 3 centrally placed depth cameras that view the entire room, and 3 projectors. Two projectors are for the table and walls, and one is for projections onto the person, such as a virtual object held in the hand. After calibrating the depth cameras using 3 points on the two interaction surfaces, a mesh model of the person is created. For the table, all interaction is analyzed only within a 10 cm volume above the surface. The resolution is high enough to detect touch on the table, essentially creating a multi-touch interface on any surface (a rough sketch of this idea appears after the list below). There are three types of interactions possible with this system:

  • Multi-touch interaction on the "dumb" surfaces
  • Holding and transferring virtual icons of objects by grabbing them off the side of an interface
  • Activating a menu by holding a hand in a column of air above an icon of a surface.
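To make the touch detection concrete for myself, here is a minimal sketch, under my own assumptions (a point cloud already transformed into room coordinates with z pointing up, and made-up thresholds), of how contacts could be picked out of the 10 cm volume above the table. This is not the authors' actual pipeline.

```python
import numpy as np

def detect_table_touches(points_room, table_z, touch_m=0.015, volume_m=0.10):
    """Pick out 'touch' points from a depth-camera point cloud.

    points_room : (N, 3) points already transformed into room coordinates
                  (assumed here to have z pointing up)
    table_z     : height of the table surface in room coordinates (metres)
    touch_m     : assumed contact threshold; points this close to the
                  surface count as a touch
    volume_m    : only the 10 cm slab above the table is considered at all
    """
    height = points_room[:, 2] - table_z
    in_volume = (height > 0) & (height < volume_m)   # the interaction slab
    touching = in_volume & (height < touch_m)        # close enough to be contact
    return points_room[touching]
```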
Instead of performing computation on the mesh generated from the cameras, "virtual cameras" were used from orthographic projections of the mesh. There were 3 virtual cameras generated: two for each surface, and one for the entire room.
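As a rough illustration of the virtual-camera idea (my own sketch with assumed room bounds and pixel size, not the paper's implementation), an orthographic view can be produced by flattening the point cloud along one axis into a height map:

```python
import numpy as np

def orthographic_view(points, axis=2, cell=0.01, bounds=((0.0, 3.0), (0.0, 3.0))):
    """Render a point cloud as a 2D 'virtual camera' image by orthographic
    projection along `axis` (0=x, 1=y, 2=z). Each pixel keeps the largest
    coordinate seen along the projection axis, i.e. a simple height map.

    cell   : assumed pixel size in metres (1 cm here)
    bounds : assumed extent of the two remaining axes, in metres
    """
    keep = [i for i in range(3) if i != axis]
    (u0, u1), (v0, v1) = bounds
    w = int((u1 - u0) / cell)
    h = int((v1 - v0) / cell)
    image = np.full((h, w), -np.inf)                  # empty pixels stay at -inf

    u = ((points[:, keep[0]] - u0) / cell).astype(int)
    v = ((points[:, keep[1]] - v0) / cell).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    np.maximum.at(image, (v[ok], u[ok]), points[ok, axis])
    return image
```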

    

Discussion
When I was watching the video presentation of the LightSpace concept, I couldn't help noticing how rough the interactions on the surfaces were. This is probably due either to the low resolution of the prototype cameras or to the fact that the underside of the hands cannot be seen. One solution would be to place depth cameras in more diverse locations, but that would add complexity to the system.

This paper was published approximately one month before the release of the Kinect. Since then, an SDK has been released for it and many people have used it for creative hacks. In my opinion, to stay true to the paper's main idea of reducing external complexity, future 3D interaction would have to use actual 3D projection, since currently it can only be emulated by actively measuring the person's position in space. That, however, requires external hardware to be worn on the person.

Tuesday, September 27, 2011

Blog #12

Enabling beyond-surface interactions for interactive surface with an invisible projection


Authors:
Li-Wei Chan - National Taiwan University, Taipei, Taiwan ROC
Hsiang-Tao Wu - National Taiwan University, Taipei, Taiwan ROC
Hui-Shan Kao - National Taiwan University, Taipei, Taiwan ROC
Ju-Chun Ko - National Taiwan University, Taipei, Taiwan ROC
Home-Ru Lin - National Taiwan University, Taipei, Taiwan ROC
Mike Y. Chen - National Taiwan University, Taipei, Taiwan ROC
Jane Hsu - National Taiwan University, Taipei, Taiwan ROC
Yi-Ping Hung - National Taiwan University, Taipei, Taiwan ROC



Proceeding:
UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary:
Using infrared cameras, an infrared projector, and a color projector to provide multi-touch input along with invisible markers on a table, three mechanisms for further interaction beyond the surface are proposed and explained.

Hypothesis:
There was no specific hypothesis, but they did mention that the system allowed for more intuitive navigation and more enjoyable use.

Methods:
Under the table there are 2 cameras and 2 projectors. A color projector displays the screen content, an IR projector projects the invisible infrared markers, and 2 IR cameras at different corners of the table pick up multi-touch interaction. When an interaction is detected, by subtracting what the camera above sees from what is expected from the invisible marker pattern, the markers within the zone of interaction are removed so that they do not interfere with the input itself. The markers also allow the mobile camera above the table to calculate its own 3D position and orientation.
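As a sketch of how the mobile camera could recover its pose from the markers it sees (my own illustration assuming OpenCV, known marker positions on the table, and known camera intrinsics; the paper does not necessarily do it this way):

```python
import numpy as np
import cv2

def camera_pose_from_markers(marker_table_xyz, marker_image_uv, K, dist):
    """Estimate the camera's 3D position and orientation from detected markers.

    marker_table_xyz : (N, 3) known 3D positions of markers on the table
    marker_image_uv  : (N, 2) where those markers appear in the camera image
    K, dist          : camera intrinsic matrix and distortion coefficients
    """
    ok, rvec, tvec = cv2.solvePnP(
        marker_table_xyz.astype(np.float32),
        marker_image_uv.astype(np.float32),
        K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)       # rotation from table frame to camera frame
    camera_position = -R.T @ tvec    # camera centre expressed in table coordinates
    return R, camera_position
```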

With this system in place, a projector attached to the upper camera can give more information about a subset of the graphics on the table. This in essence allows for greater resolution and interaction.

When the markers are analyzed by a tablet instead, a virtual representation of the table can be shown.

The whole system is powered by a normal consumer desktop.

Results:
The original revisions of the software suffered from jitter in the projected displays, which was fixed with Kalman filtering.
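For intuition, a one-dimensional constant-position Kalman filter of the kind that could smooth a jittery pose estimate looks roughly like this (the noise parameters are made-up values, not the authors'):

```python
import numpy as np

def kalman_smooth(measurements, process_var=1e-4, measurement_var=1e-2):
    """Smooth a noisy 1D signal (e.g. one pose coordinate per frame)."""
    x = measurements[0]   # current state estimate
    p = 1.0               # current estimate variance
    smoothed = []
    for z in measurements:
        p += process_var                  # predict: uncertainty grows between frames
        k = p / (p + measurement_var)     # Kalman gain
        x += k * (z - x)                  # correct with the new measurement
        p *= (1.0 - k)
        smoothed.append(x)
    return np.array(smoothed)
```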

Discussion:
At first I didn't understand the reason for using markers on such a display, but now that I see what they were doing, the uses for such interactions are nearly limitless. With a multi-touch display and a multi-touch tablet for further 3D manipulation, I could see this being very useful in CAD niches. The one thing that bothered me about the upper cameras was the low resolution of 320x240 pixels. I myself have done the hack of turning a normal webcam into an infrared one, and the limits of the CCD were depressing.


Blog #11


Multitoe: high-precision interaction with back-projected floors based on high-resolution multi-touch input

Authors:
Thomas Augsten - Hasso Plattner Institute, Potsdam, Germany
Konstantin Kaefer - Hasso Plattner Institute, Potsdam, Germany
René Meusel - Hasso Plattner Institute, Potsdam, Germany
Caroline Fetzer - Hasso Plattner Institute, Potsdam, Germany
Dorian Kanitz - Hasso Plattner Institute, Potsdam, Germany
Thomas Stoff - Hasso Plattner Institute, Potsdam, Germany
Torsten Becker - Hasso Plattner Institute, Potsdam, Germany
Christian Holz - Hasso Plattner Institute, Potsdam, Germany
Patrick Baudisch - Hasso Plattner Institute, Potsdam, Germany

Proceeding
UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
User input to a touch interface using the feet is studied. An apparatus using FTIR touch sensing and images back-projected onto a floor is tested. Participants are asked to try the system and give input on its general ergonomics.

Hypothesis
Since a touch interface is limited to what is within "arm's reach," an interface using the feet instead would have a much larger possible input area.

Methods
A back-projected surface using frustrated total internal reflection (FTIR) as its input technology achieves an input resolution of about 1 mm. First, 8 participants were asked to show how they would walk over hypothetical buttons without activating them, and then activate one with a different gesture based entirely on their own ideas. Some ideas were less plausible than others, since walking on heels can be difficult or even dangerous. It was decided that a tap gesture is best for activation, while simply walking is interpreted as no action.
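For reference, FTIR contact points are usually extracted by thresholding the infrared camera image and labelling connected bright blobs. Here is a minimal sketch of that idea (my own, with assumed threshold and minimum-area values, not the Multitoe code):

```python
import numpy as np
from scipy import ndimage

def find_foot_contacts(ir_frame, threshold=60, min_area=30):
    """Return centroids (row, col) of bright FTIR blobs in an 8-bit IR frame.
    threshold and min_area are assumed values for illustration."""
    bright = ir_frame > threshold                 # pixels lit by frustrated reflection
    labels, n = ndimage.label(bright)             # connected components
    contacts = []
    for i in range(1, n + 1):
        blob = labels == i
        if blob.sum() >= min_area:                # ignore tiny noise blobs
            contacts.append(ndimage.center_of_mass(blob))
    return contacts
```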

The next test was to determine which part of the foot should be preferred for tap gestures, since a user may want to use the big toe, the ball of the foot, or the tip. It was found that a default contact position could easily be machine-learned for each person, which helps the next step.

Users were asked to type on 3 differently sized keyboards using tap gestures calibrated to each participant. The smallest keyboard's keys measured 1.1 cm wide, while the largest measured 5.3 cm wide. The users were timed from the start key to the end key, and each error was logged.

Another test of the system's usefulness involved navigating a game using differing pressure on the feet, with natural mappings to left, right, forward, and backward movement as well as turning.
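I can't reproduce the paper's exact control mapping, but as a toy sketch of how front/back foot pressure could drive a game (the thresholds and the turning convention are my own assumptions):

```python
def pressure_to_command(left_front, left_back, right_front, right_back, dead_zone=0.1):
    """Map per-foot front/back pressure values (0..1) to a movement command.
    Leaning onto the balls of both feet moves forward, onto the heels moves
    backward, and putting more weight on one foot turns toward that side."""
    lean = (left_front + right_front) - (left_back + right_back)
    bias = (left_front + left_back) - (right_front + right_back)
    cmd = []
    if lean > dead_zone:
        cmd.append("forward")
    elif lean < -dead_zone:
        cmd.append("backward")
    if bias > dead_zone:
        cmd.append("turn-left")
    elif bias < -dead_zone:
        cmd.append("turn-right")
    return cmd or ["stop"]
```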

Results
Since the prototype was rather small, it did not show the full capability of the system. However, user input worked well enough to achieve a small 3% error rate on the large-keyboard test. Most of the users preferred the largest keyboard, while a few preferred the medium-sized one.

Discussion
In my opinion, this prototype does not merit much attention, but the full-scale system in development could have much more functionality. My biggest issue with the paper is that most people simply don't have good eye-foot coordination. Other gestures could plausibly be made while barefoot, but physiological limitations will still be a hindrance.

Thursday, September 22, 2011

Ethnography Proposal

redacted

Gang Leader for a Day

I found it ironic that I had to keep reminding myself that this is non-fiction. The point of this research in the first place was that, in Sudhir's opinion, researchers were detached from the society they studied. However, when reading this, I did not feel the same way about a person dying as I would if I read about it in the news; it felt as if a fictional character had been written out of a story.

One thing I noticed about the book itself was how recently it was published: nearly a decade after he received his Ph.D. in sociology, and after all the papers he had published about the projects. Since in the book he frequently mentions that he never actually planned to write a biography of JT, this book almost functions as one.

Reflecting on the demise of the drug trade in Chicago, I am reminded of the current "War on Drugs" in South America. In essence, both instances are simply fights for power. In both cases the government takes a "burn everything" approach, starving traffickers of their business. Thinking about how to solve modern drug issues, though, just makes my head hurt from the politics.

Interestingly, the events in the book represent the point of inflection on this graph.

One question I keep imagining people asking is "What can be done about the gang problem?" I don't think that is a very good question, since gangs are obviously the result of deeper issues regarding poverty.





Blog #10

Sensing foot gestures from the pocket


Authors:
Jeremy Scott - University of Toronto, Toronto, ON, Canada
David Dearman - University of Toronto, Toronto, ON, Canada
Koji Yatani - University of Toronto, Toronto, ON, Canada
Khai N. Truong - University of Toronto, Toronto, ON, Canada


Proceeding
UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary #1
The first part of the paper is a pilot study on how well people can make foot-based gestures without visual feedback. It involved directly measuring the position of the foot during 4 different gesture types, with no visual feedback given.

Hypothesis
Since it was unknown whether people would be able to make foot gestures accurately, the pilot study simply tested whether the research direction was reasonable.

Methods
Participants were asked to make 4 different types of foot gestures:

  • Dorsiflexion - moving the front of the foot up with the heel stationary
  • Plantar flexion - moving the heel up with the ball of the foot stationary
  • Heel rotation - rotation about the heel
  • Toe rotation - rotation about the toes
The results were gathered by direct position measurement, using 6 motion-capture cameras and a marker model attached to the foot.
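As an illustration of the kind of measurement this implies (my own sketch, not the authors' pipeline), the angle of a heel-rotation gesture could be computed from the toe marker's position before and after the gesture, pivoting about the heel marker:

```python
import numpy as np

def heel_rotation_angle(heel, toe_start, toe_end):
    """Signed rotation angle (degrees) of the toe marker about the heel marker,
    measured in the floor plane (x, y). Marker positions are 3D points."""
    v0 = np.asarray(toe_start[:2], float) - np.asarray(heel[:2], float)
    v1 = np.asarray(toe_end[:2], float) - np.asarray(heel[:2], float)
    a0 = np.arctan2(v0[1], v0[0])
    a1 = np.arctan2(v1[1], v1[0])
    return np.degrees((a1 - a0 + np.pi) % (2 * np.pi) - np.pi)  # wrap to [-180, 180)
```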



Results
Dorsiflexion was the least efficient gesture: participants did not prefer the motion, it took the most time, and it was the least accurate. The other gestures averaged less than 9 degrees of error.

Summary #2
Following the pilot study, another study was done to determine whether the motion-capture cameras could be replaced with a simple accelerometer located in or around the participants' pants. Since dorsiflexion was not a good gesture, it was left out of this study.

Hypothesis
The researchers believed that algorithms analyzing acceleration data from an iPhone alone could be enough to recognize foot gestures without feedback.

Methods
The phone was placed around the participants' hips in three locations: in the pocket, on the side (as if in a holster), and in the back pocket. The participants were then asked to make different gestures with their feet (10 in total) while the phone recorded the acceleration data. 64 FFT coefficients were generated from the data, and a naive Bayes classifier was used to classify the motion. All participants were right-footed.
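To make the recognition pipeline concrete, here is a minimal sketch of classifying accelerometer windows with FFT features and naive Bayes. The window length, feature layout, and the use of scikit-learn are my assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fft_features(window, n_coeffs=64):
    """Turn one accelerometer window (samples x 3 axes) into a feature vector of
    FFT magnitude coefficients per axis. Assumes windows of at least ~128 samples."""
    feats = []
    for axis in range(window.shape[1]):
        spectrum = np.abs(np.fft.rfft(window[:, axis]))
        feats.append(spectrum[:n_coeffs])
    return np.concatenate(feats)

def train_gesture_classifier(windows, labels):
    """windows: list of (samples x 3) arrays, labels: gesture names."""
    X = np.array([fft_features(w) for w in windows])
    clf = GaussianNB()
    clf.fit(X, labels)
    return clf

# usage: predicted = clf.predict([fft_features(new_window)])
```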

Results
The hypothesis was largely confirmed, with an accuracy of 82-92%. The least accurate position for the phone was the back pocket.

Discussion
I was surprised they were able to get so much accuracy out of just an accelerometer. While it would have been nearly foolproof to simply attach strain sensors to the participants' feet, that would not be as elegant or user-friendly. One fault I can imagine is that errors would not easily be undone. Since one out of every ten inputs will be incorrect, there would need to be some mechanism to signal an error and to easily undo the previous action. I don't know where one would start with this issue, and I feel the researchers acknowledged it as well.
Since this amount of precision can be achieved with acceleration alone, I wonder how many other problems could be solved by using all of a smartphone's sensors with the same efficiency. This points to a large space of user-input problems that have not yet been explored.

Tuesday, September 20, 2011