Thursday, September 29, 2011

Blog #13


Combining multiple depth cameras and projectors for interactions on, above and between surfaces



Authors:
Andrew D. Wilson - Microsoft Research, Redmond, WA, USA
Hrvoje Benko - Microsoft Research, Redmond, WA, USA

Proceeding
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
Depth cameras and projectors are used to make an interactive room named "LightSpace." Interactions are done on plain, non-electronic surfaces in a natural fashion that requires only the cameras and projectors. The point of this approach is to limit external devices: the surfaces themselves need no instrumentation, and nothing needs to be worn by the user.

Hypothesis
To the authors' knowledge, no prior research had been done on making an interactive space that uses solely depth cameras and projectors.

Methods
The system is composed of 3 centrally mounted depth cameras that view the entire room, and 3 projectors. Two projectors are for the table and wall, and one is for projections onto the person, such as a virtual object held in the hand. After calibrating the depth cameras using 3 points on each of the two interaction surfaces, a mesh model of the person is created. For the table, all interaction is analyzed only within a 10cm volume above its surface. The resolution is great enough to determine touch onto the table, essentially creating a multitouch interface on any surface. There are three types of interactions possible with this system:

  • Multi-touch interaction on the "dumb" surfaces
  • Holding and transferring virtual icons of objects by grabbing them off the side of an interface
  • Activating a menu by holding a hand in a column of air above an icon of a surface.
Instead of performing computation directly on the mesh generated from the cameras, "virtual cameras" were used: orthographic projections of the mesh. Three virtual cameras were generated: one for each of the two surfaces, and one for the entire room.
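To make the "virtual camera" idea concrete, here is a minimal sketch (my own illustration, not the authors' code) of rendering an orthographic projection from a merged point cloud. The point cloud variable, cell size, and room bounds are assumptions:

    import numpy as np

    def orthographic_view(points, drop_axis=2, cell=0.01, bounds=((0.0, 3.0), (0.0, 3.0))):
        """Render a point cloud (N x 3, room coordinates in meters) into a 2D
        "virtual camera" image by dropping one axis and binning the other two.
        Each pixel keeps the maximum value along the dropped axis."""
        keep = [i for i in range(3) if i != drop_axis]
        (x0, x1), (y0, y1) = bounds
        w = int((x1 - x0) / cell)
        h = int((y1 - y0) / cell)
        img = np.zeros((h, w), dtype=np.float32)
        cols = np.clip(((points[:, keep[0]] - x0) / cell).astype(int), 0, w - 1)
        rows = np.clip(((points[:, keep[1]] - y0) / cell).astype(int), 0, h - 1)
        np.maximum.at(img, (rows, cols), points[:, drop_axis])
        return img

    # cloud = merged point cloud from the three depth cameras (N x 3 array)
    # room_view  = orthographic_view(cloud)                                   # plan view of the room
    # table_view = orthographic_view(cloud, bounds=((1.0, 2.0), (0.5, 1.5)))  # just the table area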

    

Discussion
When I was watching the video presentation of the LightSpace concept, I couldn't help noticing how rough the interactions were on the surfaces. This is probably due to either the low resolution of the prototype cameras or the fact that the underside of the hands cannot be seen. One solution would be to place depth cameras in more diverse locations, but that would add complexity to the system.

This paper was published approximately one month before the release of the Kinect system. Since then, an SDK has been released for it and many people have used it for creative hacks. In my opinion, to stay within the paper's main idea of reducing external complexity, future 3D interaction would have to use actual 3D projection, since currently this can only be emulated by actively measuring the person's position in space. That, however, requires external hardware to be worn on the person.

Tuesday, September 27, 2011

Blog #12

Enabling beyond-surface interactions for interactive surface with an invisible projection


Authors:
Li-Wei Chan - National Taiwan University, Taipei, Taiwan, ROC
Hsiang-Tao Wu - National Taiwan University, Taipei, Taiwan, ROC
Hui-Shan Kao - National Taiwan University, Taipei, Taiwan, ROC
Ju-Chun Ko - National Taiwan University, Taipei, Taiwan, ROC
Home-Ru Lin - National Taiwan University, Taipei, Taiwan, ROC
Mike Y. Chen - National Taiwan University, Taipei, Taiwan, ROC
Jane Hsu - National Taiwan University, Taipei, Taiwan, ROC
Yi-Ping Hung - National Taiwan University, Taipei, Taiwan, ROC



Proceeding:
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary:
By using infrared cameras, an infrared projector, and a color projector to facilitate multi-touch, along with invisible markers on a table, three mechanisms for further interactions are proposed and explained.

Hypothesis:
There was no specific hypothesis, but they did mention that the system allowed for more intuitive navigation and more enjoyable use.

Methods:
Under the table there are 2 cameras and 2 projectors. A color projector projects the screen, an IR projector projects invisible infrared markers, and 2 IR cameras at different corners of the table pick up multi-touch interaction. When an interaction is detected by comparing what a camera above sees against what is expected from the invisible markers, the markers within the zone of interaction are removed to prevent them from modifying the input itself. The markers themselves allow the mobile camera above the table to calculate its own 3D position and orientation.
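That pose calculation is essentially a perspective-n-point problem. A minimal sketch with OpenCV (my own illustration, not the paper's implementation), assuming the marker layout on the table and the camera intrinsics are already known:

    import numpy as np
    import cv2

    def estimate_camera_pose(table_points, image_points, camera_matrix, dist_coeffs=None):
        """Estimate the mobile camera's 3D position and orientation from detected
        IR markers. table_points: known marker positions on the tabletop (N x 3,
        table coordinates); image_points: the same markers found in the camera
        image (N x 2)."""
        if dist_coeffs is None:
            dist_coeffs = np.zeros(5)
        ok, rvec, tvec = cv2.solvePnP(table_points.astype(np.float32),
                                      image_points.astype(np.float32),
                                      camera_matrix, dist_coeffs)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)            # rotation from table to camera frame
        position = (-R.T @ tvec).ravel()      # camera center in table coordinates
        return position, R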

With this system in place, a projector attached to the upper camera can give more information about a subset of the graphics on the table. This in essence allows for greater resolution and interaction.

When the markers are analyzed by a tablet instead, a virtual representation of the table can be shown.

The whole system is powered by a normal consumer desktop.

Results:
The original revisions of the software suffered from jitter in the projected displays, which was fixed with Kalman filtering.
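For reference, Kalman filtering here just means smoothing the noisy per-frame estimates. A tiny sketch of the idea (my own illustration, with made-up noise constants), applied independently to each pose parameter:

    class ScalarKalman:
        """A tiny constant-value Kalman filter, applied per parameter
        (x, y, z, angles) to damp frame-to-frame jitter."""
        def __init__(self, q=1e-4, r=1e-2):
            self.q, self.r = q, r       # process noise, measurement noise
            self.x, self.p = None, 1.0  # state estimate and its variance

        def update(self, z):
            if self.x is None:          # initialize with the first measurement
                self.x = z
                return self.x
            self.p += self.q                    # predict
            k = self.p / (self.p + self.r)      # Kalman gain
            self.x += k * (z - self.x)          # correct
            self.p *= (1 - k)
            return self.x

    # filters = [ScalarKalman() for _ in range(6)]   # one filter per pose parameter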

Discussion:
At first I didn't understand the reason to use markers on such a display, but now that I see what they were doing, the uses for such interactions are nearly limitless. With a multi-touch display and a multi-touch tablet for further 3D manipulation, I could see this having great use in CAD niches. The one thing that bothered me about the upper cameras was the low resolution of 320x240 pixels. I have myself done the hack of turning a normal webcam into an infrared one, but the limits of the CCD were depressing.


Blog #11


Multitoe: high-precision interaction with back-projected floors based on high-resolution multi-touch input

Authors:
Thomas Augsten - Hasso Plattner Institute, Potsdam, Germany
Konstantin Kaefer - Hasso Plattner Institute, Potsdam, Germany
René Meusel - Hasso Plattner Institute, Potsdam, Germany
Caroline Fetzer - Hasso Plattner Institute, Potsdam, Germany
Dorian Kanitz - Hasso Plattner Institute, Potsdam, Germany
Thomas Stoff - Hasso Plattner Institute, Potsdam, Germany
Torsten Becker - Hasso Plattner Institute, Potsdam, Germany
Christian Holz - Hasso Plattner Institute, Potsdam, Germany
Patrick Baudisch - Hasso Plattner Institute, Potsdam, Germany

Proceeding
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
User input onto a touch interface using the feet is studied. An apparatus using FTIR touch sensing and images projected onto a floor is tested. Participants are asked to test the system and give input on its general ergonomics.

Hypothesis
Since a touch interface is limited to what is within "arm's reach," an interface using feet instead would have a much larger possible input area.

Methods
A back-projected floor surface using frustrated total internal reflection (FTIR) is used for input at a resolution of about 1mm. First, 8 participants were asked to show how they would walk over theoretical buttons without activating them, and then activate one with a different gesture based entirely on their own ideas. Some ideas were less plausible than others, since walking on heels can be difficult or even dangerous. It was decided that a tap gesture is best for activation, and simply walking would not be interpreted as any action.
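As a rough illustration of how "walking is ignored, tapping activates" could be implemented, here is a toy classifier (entirely my own sketch, with invented thresholds) that separates a deliberate tap from an ordinary step based on contact duration and movement:

    def classify_contact(duration_s, travel_cm):
        """Toy rule: a short, nearly stationary foot contact counts as a tap;
        anything longer or moving is treated as walking and ignored.
        The thresholds are invented for illustration only."""
        if duration_s < 0.4 and travel_cm < 2.0:
            return "tap"      # deliberate activation
        return "walk"         # ordinary step, no action

    # classify_contact(0.25, 0.8)  -> "tap"
    # classify_contact(0.70, 15.0) -> "walk"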

The next test was to determine how tap gestures should be registered, since a user may want to tap with the big toe, the ball of the foot, or the tip of the foot. It was found that a default tap position could easily be machine-learned per person, which helps with the next step.

Users were asked to type on 3 different-sized keyboards using tap gestures calibrated to each participant. The smallest keyboard measured 1.1cm wide, while the largest measured 5.3cm wide. The users were timed between the start and end keys, and each error was logged.

Another test of the system's usefulness included navigating a game using differing pressures on the feet with natural mappings to left, right, forwards, and backwards, as well as turning.

Results
Since the prototype was rather small, it did not show the full capability of the system. However, user input worked well enough to allow a 3% error rate on the large-keyboard test. Most of the users preferred the largest keyboard, while a few preferred the medium-sized one.

Discussion
In my opinion, this prototype does not merit much attention, but the full-scale system in development could have much more functionality. My biggest issue with the paper is that most people simply don't have good eye-foot coordination. Other gestures could plausibly be made while barefoot, but physiological limitations will still be a hindrance.

Thursday, September 22, 2011

Ethnography Proposal

redacted

Gang Leader for a Day

I found it ironic that I had to keep reminding myself that this is non-fiction. The point of this research in the first place, in Sudhir's opinion, was that researchers were detached from society. However, when reading this, I did not feel the same way about a person dying as I would if I read about it in the news; it felt as if a fictional character were being written off in a story.

One thing I noticed about the book itself was how recently it was published. It came out nearly a decade after he received his Ph.D. in Sociology, and after all the papers he published about the projects. Since he frequently mentions in the book that he never actually planned on writing a biography of JT, this book almost functions as one.

Reflecting on the demise of the drug trade in Chicago, I am reminded of the current "War on Drugs" in South America. In essence, both instances are simply fights for power. The government in both cases takes a "burn everything" approach, starving traffickers of their business. Thinking about how to solve modern drug issues makes my head hurt from the politics instead, though.

Interestingly, the events in the book represent the point of inflection on this graph.

One question I keep imagining people asking is "What can be done about the gang problem?" I don't think that is a very good question, since gangs are obviously the result of deeper issues regarding poverty.





Blog #10

Sensing foot gestures from the pocket


Authors:
Jeremy Scott - University of Toronto, Toronto, ON, Canada
David Dearman - University of Toronto, Toronto, ON, Canada
Koji Yatani - University of Toronto, Toronto, ON, Canada
Khai N. Truong - University of Toronto, Toronto, ON, Canada


Proceeding
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary #1
The first part of the paper is a pilot study on how well people are able to make foot-based gestures without visual feedback. The pilot study involved directly measuring the position of the foot from 4 different gesture types without giving visual feedback.

Hypothesis
Since it was unknown whether or not people would be able to accurately make foot gestures, the pilot study was simply for testing whether the research is reasonable.

Methods
Participants were asked to make 4 different types of foot gestures:

  • Dorsiflexion - moving the foot up with the heel stationary
  • Plantar flexion - moving the heel up with the toes stationary
  • Heel rotation - rotation about the heel
  • Toe rotation - rotation about the toes.
The results were gathered from direct position measurement using 6 motion capture cameras and a model attached to the foot.



Results
Dorsiflexion was the most inefficient gesture. Not only did the participants not prefer the motion, it also took the most time and was the most inaccurate. The other gestures averaged less than 9 degrees of error.

Summary #2
Following the pilot study, another study was done to determine whether the motion cameras could be replaced with a simple accelerometer located in or around the participants' pants. Since dorsiflexion was not a good gesture, it was left out of this study.

Hypothesis
The researchers believed that just using algorithms analyzing acceleration data from an iPhone could be enough to determine foot gestures without feedback.

Methods
The phone was placed around the participants' hips in three locations: in the pocket, on the side (as if in a holster), and in the back pocket. The participants were then asked to make different gestures with their feet (10 in total) and the phone recorded the acceleration data. 64 FFT coefficients were generated from the data, and Naive Bayes was used to classify the motion. All participants were right-footed.
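A minimal sketch of that classification pipeline (my own illustration, not the authors' code; I use scikit-learn's Gaussian Naive Bayes, and the window variables are assumptions):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def fft_features(window, n_coeffs=64):
        """Turn a window of 3-axis accelerometer samples (N x 3) into a feature
        vector of FFT magnitude coefficients. Assumes the window is long enough
        to yield n_coeffs coefficients per axis."""
        feats = [np.abs(np.fft.rfft(window[:, axis]))[:n_coeffs]
                 for axis in range(window.shape[1])]
        return np.concatenate(feats)

    # train_windows / test_windows: lists of (N x 3) arrays, one per gesture sample
    # train_labels: the gesture label for each training window
    # X_train = np.array([fft_features(w) for w in train_windows])
    # clf = GaussianNB().fit(X_train, train_labels)
    # predictions = clf.predict(np.array([fft_features(w) for w in test_windows]))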

Results
The hypothesis was proven correct for the most part, with an accuracy of 82-92%. The least accurate position for the phone was the back pocket.

Discussion
I was surprised they were able to get so much accuracy out of just an accelerometer. While it would have been nearly foolproof to simply attach strain sensors to the participants' feet, that is not as elegant or user friendly. One fault I can imagine is that errors would not be easily undone. Since one out of every 10 inputs will be incorrect, there would need to be some mechanism to give feedback about an error and to easily undo the previous action. I don't know how to start with this issue, and I feel that the researchers acknowledge this.
Since this amount of precision can be achieved with just acceleration data, I wonder how many other problems could be solved by using all of a smartphone's sensors with the same efficiency. This represents a large space of user input problems that have not yet been thought of.

Tuesday, September 20, 2011

Alternate Ethnography results


redacted




Blog #9

Jogging over a distance between Europe and Australia


Authors: 
Florian Mueller - The University of Melbourne, Melbourne, Australia; Microsoft, Beijing, China; Distance Lab, Morray, and London Knowledge Lab, London, United Kingdom
Frank Vetere - The University of Melbourne, Melbourne, Australia
Martin R. Gibbs - The University of Melbourne, Melbourne, Australia
Darren Edge - Microsoft, Beijing, China
Stefan Agamanolis - Distance Lab, Morray, United Kingdom
Jennifer G. Sheridan - London Knowledge Lab, London, United Kingdom

Proceeding
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary:
A method of interaction between joggers in separate locations is proposed. The mechanism includes a phone, a heart rate sensor, and a headset. For normal co-located jogging, a pair needs to stay near each other to be able to communicate. This means that if two people of different physical fitness were jogging together, one would have to work much harder than the other. The headset allows the joggers to communicate over any distance, so keeping physical pace no longer matters.

Hypothesis
The researchers proposed that using heart rate as a heuristic for who is "ahead" or "behind" the other is more useful than physical location.

Methods
Communication Integration
A headset allowed communication between the jogging partners at all times. If a runner is judged to be "ahead" based on the analysis of their effort, the headset alters the audio to make the other jogger seem spatially behind them. The opposite happens for the other runner: the audio becomes quieter and is shifted so that the runner who is "ahead" also sounds spatially ahead.

Effort Comprehension
Heart rate is used to determine who is exerting more effort. Each participant was asked to jog at a rate at which they were still comfortable talking.


Virtual Mapping
The joggers' heart rates are compared to each other to generate a handicap. People who are less fit reach their maximum heart rate more quickly and run more slowly than others. Since physical location is not measured in this paper, it is not counted.
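Put together, the mapping could look something like this toy sketch (entirely my own illustration; the scale factor and gain curve are invented), where each runner's heart rate relative to their own comfortable target decides who sounds "ahead":

    def virtual_offset(hr_a, target_a, hr_b, target_b, scale=10.0):
        """Map each runner's heart rate, relative to their own comfortable target,
        to a virtual position. Returns runner A's offset relative to runner B;
        positive means A is rendered as being ahead."""
        effort_a = hr_a / float(target_a)   # > 1.0 means working harder than target
        effort_b = hr_b / float(target_b)
        return (effort_a - effort_b) * scale

    def render_partner_audio(offset):
        """Crude stand-in for the spatialized headset audio: the partner's voice
        gets quieter and is placed ahead of or behind the listener."""
        gain = max(0.2, 1.0 - abs(offset) / 20.0)
        placement = "ahead" if offset > 0 else "behind"
        return gain, placement

    # Runner A at 155 bpm (target 150) vs runner B at 150 bpm (target 160):
    # offset = virtual_offset(155, 150, 150, 160)   # positive, so A sounds ahead of B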


17 participants were asked to use this system and give feedback.

Results
The participants enjoyed the fact that physical pace did not need to be kept in order to sustain communication, supporting the hypothesis.


Discussion
This approach to jogging elegantly solves the issue of differing levels of physical fitness. I personally think this was ready for general use directly after the paper, if not for a few design issues. Many people have smartphones that can interface over Bluetooth with a headset and a heart rate monitor (either a purpose-built device or an Arduino-based one). One issue that may be encountered is that some phone carriers do not allow voice and data to be used concurrently, such as those that use CDMA technology.

This paper is a very interesting piece of virtual interaction, but something tells me that this is a very good example in a sea of bad ideas. A lot of virtual interactions like this seem to simply be games, which calls their merit into question.

Thursday, September 15, 2011

Blog #8

Gesture search: a tool for fast mobile data access

Author
Yang Li - Google Research, Mountain View, CA, USA
He has published 25 documents and been cited 195 times

Proceeding 
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary:
Standard GUI input on mobile devices is rigid and sometimes difficult. Often a user may need to traverse several pages to get to an item. Gesture search is a gesture based search application for Android devices that allows for a considerable amount of ambiguity. It is a part of Google Labs and should be considered a concept.

Hypotheses
Yang Li hypothesized that Gesture Search provides a quick and less stressful way for users to access mobile data.

Methods
Gesture Search has two components: Latin character gesture recognition and a search algorithm. The algorithm is not explained in detail; the article primarily describes how ambiguous gesture input can be translated into useful results for the user. To perform a search, a letter is drawn using the whole screen. Since some letters can look similar, all plausible candidates are considered in the search. After each letter is read, the list of results updates. The results can be any item of data on the phone, including contacts, music, and applications. Hidden Markov Models were used to determine the next possible results during the search. Data is collected from users who opt into longitudinal data collection.
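To illustrate how ambiguous letters can still drive an incremental search, here is a toy sketch (my own simplification, not the actual Gesture Search algorithm, which uses a probabilistic recognizer and a real search index): each drawn gesture yields a set of plausible letters, and any item matching any candidate prefix stays in the result list.

    from itertools import product

    def candidate_prefixes(gesture_letters):
        """gesture_letters: one set of plausible letters per drawn gesture,
        e.g. [{'a'}, {'n', 'm'}] after two strokes. Returns every prefix the
        user might have meant."""
        return {''.join(p) for p in product(*gesture_letters)}

    def search(item_names, gesture_letters):
        """Keep data items whose name starts with any candidate prefix."""
        prefixes = candidate_prefixes(gesture_letters)
        return sorted(name for name in item_names
                      if any(name.lower().startswith(p) for p in prefixes))

    # search(["Anna", "Amber", "Maps", "Music", "Andrew"], [{'a'}, {'n', 'm'}])
    # -> ['Amber', 'Andrew', 'Anna']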

Results
The data collected indicates that 84% of the search results were found within one gesture, and 98% within two gestures. Also, since the application resides in the Android Market, people found it agreeable, rating it 4.5 stars out of 5.


Discussion
I tried out this app and was generally satisfied with it. The first task I tried was searching for songs, but it didn't seem to find them, or my input was too erratic. Searching for contacts, however, was much more successful. I still prefer SenseUI's input scheme for finding contacts, which involves entering numbers and having the interface guess who to bring up based on an algorithm similar to T9. In its current state, I don't see it as especially useful, since I can search in the file manager Astro or the command line interface ConnectBot just as fast, as all 3 apps require a shortcut to open.

*Edit: I found out why music wasn't being searched; it wasn't selected for indexing by default.

Blog #7

Performance optimizations of virtual keyboards for stroke-based text entry on a touch-based tabletop

Author:
Jochen Rick - The Open University, Milton Keynes, United Kingdom
He has published 19 documents and been cited 65 times

Proceeding 
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
Text entry on touch interfaces can be difficult due to the physical positioning of the body. A model is created for stroke-based text input. Stroke-based input is done in one of two ways: matching the entire shape against a collection of precomputed words, or calculating the angles of the strokes to estimate the intended endpoints.

Hypothesis
Jochen Rick hypothesizes that stroke-based text entry is faster than normal tapping on a standard keyboard layout for a touch interface.

Methods
Tapping input follows Fitts's law, which describes the amount of time needed to perform a task as a function of the size of the target and the distance to it. The problem of determining points was solved using a model that generates straight lines and angles from naturally curved input. Since the hardware used for testing had a refresh rate of only 20Hz, whether a line passed through a target was determined by interpolating between consecutive points. Eight adults were chosen to spend about an hour repeating strokes to generate data on time and accuracy. They were told to stroke through 4 points at varying angles as well as make tapping gestures and single-stroke tests.
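For reference, here is the Shannon formulation of Fitts's law that tapping models typically use, as a small sketch (the constants a and b are placeholders; in practice they are fit to measured data):

    import math

    def fitts_time(distance, width, a=0.1, b=0.15):
        """Predicted movement time (seconds) to hit a target of the given width
        at the given distance, using the Shannon formulation of Fitts's law:
        MT = a + b * log2(D / W + 1)."""
        return a + b * math.log2(distance / width + 1.0)

    # fitts_time(distance=80, width=10)  ->  about 0.58 s with these placeholder constants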

Results
Stroke based entry was superior to tapping speed for every keyboard layout selected. The fastest speed was 52.7 wpm generated from an OPTI II layout.


Discussion
I've had a friend preach to me the benefits of Swype (a similar program for Android and jail-broken iPhones). I have no doubt that it could well be faster in raw typing speed, but in my mind the main issue is making errors. For a wrong tap in a touch interface, you just fix that letter (with better word-guessing algorithms in Android 2.3, this is now unnecessary most of the time). For fixing a swipe-based error, however, the entire word will likely have to be replaced. With USB host mode coming in Android 3.0, typing quickly could simply be done by plugging in a real keyboard instead. Also, I should note that all the stroking speeds are slower than my speed for casual typing on a physical QWERTY keyboard.

Ethnography Initial Report

redacted

Tuesday, September 13, 2011

Paper Reading #6


TurKit: human computation algorithms on mechanical turk




Authors:
Greg Little - Massachusetts Institute of Technology, Cambridge, MA, USA
Lydia B. Chilton - University of Washington, Seattle, WA, USA
Max Goldman - Massachusetts Institute of Technology, Cambridge, MA, USA
Robert C. Miller - Massachusetts Institute of Technology, Cambridge, MA, USA

Released:
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology




Summary:
A set of tools is described for using humans to perform computational tasks. Humans tend to take anywhere from a few seconds to a few minutes to accomplish a task, so repeated work should be minimized or eliminated. Mechanical Turk is a system in which programmers can create tasks that are directly assignable to humans from a simple programming standpoint. Also, since machine computation is so much faster than human computation, it is simpler to crash the program and rerun it up to the current state while waiting, instead of blocking the program.

Contributions: (in lieu of hypothesis)


  • TurKit Script: An API for algorithmic MTurk tasks.
  • Crash-and-Rerun Programming: A programming model suited to algorithmic use of human Computation, addressing issues related to high-cost and high-latency steps involving humans.
  • TurKit Online: A public web GUI for running and managing TurKit scripts.

Methods:
Since saving every state of a program in a database would be prohibitively expensive, TurKit only stores the responses from humans. This allows a new way of thinking about program flow, since a deterministic program can be rerun nearly instantly if the human responses are stored. This is done by using the once primitive to store the human question and answer in a database (a toy sketch of this idea appears after the list below). Also, instead of waiting for humans from a blocking programming standpoint, TurKit offers two alternatives:

  • Forking to implement parallelism.
  • Crashing the program and recomputing up to the last state.
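Here is that toy sketch of the crash-and-rerun idea (a file-backed illustration of my own, not TurKit's actual JavaScript API): once records the result of an expensive, nondeterministic step so that a rerun of the same deterministic script skips straight past it.

    import json, os

    MEMO_PATH = "turkit_memo.json"      # illustrative file-backed memo store
    _memo = json.load(open(MEMO_PATH)) if os.path.exists(MEMO_PATH) else {}

    def once(key, expensive_step):
        """If this step already ran in a previous execution, reuse its recorded
        result; otherwise run it, record it, and continue. A real TurKit step
        that is still waiting on humans would instead raise ('crash') and the
        whole script would simply be rerun later."""
        if key in _memo:
            return _memo[key]
        result = expensive_step()       # e.g. post a HIT and collect the answer
        _memo[key] = result
        with open(MEMO_PATH, "w") as f:
            json.dump(_memo, f)
        return result

    # caption = once("caption-round-1", lambda: ask_worker("Describe this image"))
    # ask_worker is a hypothetical stand-in for posting a Mechanical Turk task.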
Three examples are shown using the TurKit framework. The first example is a simplified quicksort. By recursively asking people to compare the items in a partition, sorting becomes possible, and the comparisons within a partition can run in parallel.

The next example is an iterative captioning algorithm. An image is shown and people are asked to describe it. After each iteration, people are allowed to vote on and improve the caption. The voting system allows bad improvements to be filtered out.

The last example is iterative OCR. Humans are, for the most part, good at reading handwriting, so human computation is a decent way to perform the task. For this example, people are asked to transcribe the words in an excerpt and put a '*' character on words they cannot make out. This way, each improve/vote iteration gets the transcription closer to the goal.

Results:
The most visible results came from the OCR test: there was only one error after 12 iterations. Other tests were run to determine the efficiency of the TurKit system. The time needed to complete individual tasks was nearly proportional to the amount of money paid per task, up to 10 cents. Over the whole study, just $364 was spent to generate about 30,000 tasks.

Discussion:
I find the idea of using humans to do computational tasks fascinating. Since AI technology sometimes cannot cope with today's problems, or general programmers do not know AI concepts, this offers a simple solution. In the future, it would be interesting to see actual incorporation of AI to accomplish some tasks while humans accomplish others using the same framework.




Thursday, September 8, 2011

Blog Paper Reading #5


A framework for robust and flexible handling of inputs with uncertainty

Authors: 

Julia Schwarz - Carnegie Mellon University, Pittsburgh, PA, USA
Scott Hudson - Carnegie Mellon University, Pittsburgh, PA, USA
Jennifer Mankoff - Carnegie Mellon University, Pittsburgh, PA, USA
Andrew D. Wilson - Microsoft Research, Redmond, WA, USA





Released: UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary
There can exist ambiguity in handling gestures. Two gestures may start in the same place, but end differently, and choosing one over the other at the beginning of an input would be a mistake. This is where the concept of interactors comes in. A gesture can trigger two separate invisible analyses (interactors) and not act until the underlying framework is sure that only one or no action has taken place.

Hypothesis

The researchers hypothesized that by treating their input in a way that handles its true uncertainty well, they could increase the accuracy of user interactions.


Methods
Probability was assigned to each interactor using a probability mass function. All probabilities add up to 100%, and 80% represents a successful match. Six examples are described below; a toy sketch of this dispatch idea follows the list.

  • Smart window resizing - A touch input starts at the intersection of an icon in the background and the edge of a window. The gesture could mean the user wants to either move the icon or resize the window. The difference is determined by having the probability of resizing be larger if the input moves perpendicular to the window. However, if another direction is taken, it is more likely the user wanted to move the icon instead.
  • Remote sliders - Two sliders exist parallel to each other, and a touch input is generated directly between the two. Either slider has the same probability, but when the touch is moved up or down closer to a single slider, the probability function decides the closer slider was intended.
  • Tiny buttons - Three buttons which are all smaller than the size of a single touch input exist. If a touch is generated slightly to the left, and if the left button is disabled, it is improbable that the left button was desired and the middle one would receive the input.
  • Smart text delivery - If a user does not select a text box before typing, the probability estimates at finalization can help determine the correct text box
  • Using the metrics from a speech recognition API, the same probability based decision process can be applied to speech
  • The largest test was for users with impaired mobility. The amount of uncertainty in this case is much greater than for a typical user.
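A toy sketch of that dispatch idea (my own illustration, not the paper's framework; the interactor interface, normalization, and use of the 80% threshold are simplifications):

    def dispatch(event, interactors, threshold=0.8):
        """Each interactor scores the input event, the scores are normalized into
        a probability mass function, and an action fires only once a single
        interactor clearly dominates; otherwise the event stays ambiguous."""
        scores = {i: i.likelihood(event) for i in interactors}
        total = sum(scores.values())
        if total == 0:
            return None
        pmf = {i: s / total for i, s in scores.items()}
        best = max(pmf, key=pmf.get)
        if pmf[best] >= threshold:
            return best.fire(event)     # confident enough to commit
        return None                     # keep waiting for more input

    # Each interactor is assumed to expose likelihood(event) -> float and fire(event).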



Results
The examples shown are the end result of the framework in question. They work.

Discussion
Many of these examples can already be seen in touch-based interfaces for end users. However, having them specified and quantified in a framework is useful for future development theory. This paper is interesting to me because I have used touch interfaces that do not account for uncertainty and ones that do. It is clear that the ones that do are superior from a user preference standpoint. These topics will almost always have to be considered in touch interfaces.

Blog Paper Reading #4: Gestalt


Gestalt: Integrated Support for Implementation and Analysis in Machine Learning  



Authors:
Kayur Patel - University of Washington, Seattle, WA, USA
Naomi Bancroft - University of Washington, Seattle, WA, USA
Steven M. Drucker - Microsoft Research, Seattle, WA, USA
James Fogarty - University of Washington, Seattle, WA, USA
Andrew J. Ko - University of Washington, Seattle, WA, USA
James Landay - University of Washington, Seattle, WA, USA

Released: 
UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology


Summary
Debugging for machine learning is a difficult problem. There is too much complex information to successfully debug a program using print statements, but showing the data visually can be difficult. Gestalt is a tool that allows machine learning problems to be cataloged in a database during development and shown graphically. This allows for easy switching between coding and analysis.

Hypothesis
The researchers want to test whether their tool for analyzing machine learning problems is more efficient than debugging without it.

Methods
The way they approached the problem was to allow data to be put into a database alongside the coding process. An algorithm pipeline could be traced using data tagged with a "Generate Attribute" menu. Here, correctness and evaluation of the data can be quantified.
Two sample machine learning problems were attacked:

  • Determining positive or negative tones from text
  • Determining shapes from gestures
Gestalt assists the first problem by allowing literal correct/incorrect responses from the user. This allows misclassified texts to be easily shown in a visually dense manner. The second problem can be shown even more visibly, since it is a visual problem. From a general view of all the classifications, it is simpler to determine where errors exist, such as triangles being confused with squares. Eight participants with knowledge of machine learning and Python were chosen to test Gestalt. The study involved injecting bugs into working code and timing how long it took the participants to debug using generic tools compared to Gestalt.
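The kind of aggregate view being described is essentially a confusion matrix. A small sketch of my own (not Gestalt's API) of how such a tally could be built from logged classifications:

    from collections import Counter

    def confusion_matrix(true_labels, predicted_labels):
        """Tally (true, predicted) pairs so systematic misclassifications, such
        as triangles confused with squares, stand out."""
        counts = Counter(zip(true_labels, predicted_labels))
        labels = sorted(set(true_labels) | set(predicted_labels))
        return {(t, p): counts.get((t, p), 0) for t in labels for p in labels}

    # cm = confusion_matrix(["triangle", "square", "triangle"],
    #                       ["triangle", "triangle", "triangle"])
    # cm[("square", "triangle")] == 1   # one square mistaken for a triangle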



Results
All subjects found or solved more issues in the same amount of time compared to not using Gestalt.

Discussion
I am convinced this is a helpful tool; however, I am not convinced of how often I would use it. This puts its significance on the line. I don't think there are any faults in this paper, since they set out to make a tool better than what exists, and they succeeded. Machine learning is a large subject in AI and will probably be much more prevalent in the future.

Tuesday, September 6, 2011

Paper Reading #3: Pen + touch = new tools

Pen + touch = new tools

Ken Hinckley, Koji Yatani, Michel Pahud, Nicole Coddington,
Jenny Rodenhouse, Andy Wilson, Hrvoje Benko, and Bill Buxton
Microsoft Research, One Microsoft Way, Redmond, WA 98052

Released: UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology


Summary:
Using both pen and touch interfaces simultaneously allows more types of gestures to be created. An application called Manual Deskterity is discussed and reviewed. It is a prototype scrapbook application on Microsoft Surface. Normally, pen and touch interfaces are not used together, or are used in a way where differentiation is not possible, such as using the pen for the same tasks as a finger. Some design considerations were based on observed behaviors:

  • Touch and pen had specific roles due to the capabilities of both. Hands are good for manipulation while pens are good for precise input. 
  • When using both hands for touch interface, people tuck the pen in their hands to give back the role of the hand.
  • People hold clippings of pictures with their non-preferred hand.
  • People hold the paper while writing.
  • People frame paper with their index finger and thumb while reading.
  • People cut clippings from inspirational material while holding it with their non-preferred hand.
  • People arrange their workspace so that the material is close.
  • People pile "interesting" things and hold them with their non-preferred hand.
  • Some people draw along the edges of clippings.
  • People hold their place in a stack of papers with their non-preferred thumb
Using these behaviors, a system was designed to use pen and touch together in a synergistic way, but not to require both for all tasks, since the pen may not always be available. The core idea is that the "pen writes, touch manipulates."


Hypothesis:

The purpose of the study was to gain "insight as to how people work with physical tools and pieces of paper."


Methods:
Microsoft Surface is the platform the study is done on. The pen is differentiated by using an infrared pen, which is read differently than touch. Several gestures are made using both pen and touch. One gesture is cutting a picture by holding it with a finger and fully crossing it with the pen. Copying an image is done by holding the image and "peeling off" a copy with the pen. Any image can be used as a straightedge for the other gestures, such as the cut gesture. When a finger is pressed, a transparent dot appears at the original location, and a color menu opens when the finger is moved away from the dot. Lines can be drawn using the tape tool, which is activated as an extension of the built-in ink tool. To enable tape mode, the user holds a finger on the ink line. Eight participants were chosen, all preferring their right hand.




Results
The paper's goal was to discuss and observe behaviors for this new interface. It was found to be helpful in supplementing input, as well as allowing users to switch modes of input from pen to touch. Since not all inputs can be implemented exactly the way the tasks are done in real life, the gestures were "designed" to work in a way that uses the possible movements of pen and touch logically. Since they won't be obvious, some instruction had to be given to the test users. The authors are not convinced themselves that their implementation will scale to a full-blown application, but the idea could.

Discussion
The new tools made by using a pen to enhance gestures could, in my opinion, just as easily have been made by adding menus or buttons. However, the reason for this research is to find ways of abandoning those lines of thought. Since this is an incomplete theory and application, I don't find it as interesting as the Hands-on Math article.

Paper Reading #2: Hands-on math


Hands-on math: a page-based multi-touch and pen desktop for technical work and problem solving

Robert Zeleznik, Andrew Bragdon, Ferdi Adeputra, Hsu-Sheng Ko
Brown University

Released: UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology

Summary:
In the past, mathematics was done with two seemingly mutually exclusive techniques: using paper or using computer algebra systems (CAS). This is because CAS tends to limit creativity, while paper lacks computational assistance. This paper attempts to solve the problem by giving the user direct access to the underlying tools found in CAS, but in a way that resembles natural manipulation of algebra.

Hypothesis:
Hands-on Math is designed to test whether or not a hybrid system is better than CAS individually if "CAS functionality were available in a paper-like environment."


Methods:
The program was built on Microsoft Surface and uses the StarPad SDK for handwriting recognition. The surface contains the workspace, and virtual paper can be pulled out of the right bezel. After equations are entered, they can be manipulated in reasonable ways, such as factoring or simplification, using gestures where the selected parts depend on where the finger touches the equation. For example, if a plus sign is selected, both the left and right parts are selected. If there is an addition and the right section is pressed, it can be moved around without factoring. The same commutative property is implemented for multiplication, but there is no symbol shown when variables are selected. A gesture for "folding" previous equations using a vertical pinch is also implemented. The main issue in designing gestures is to have them be reasonable and self-evident. This is inherently difficult if the purpose is to expose hidden features of a CAS. It is partially solved by using transparent menus that appear near half-finished gestures; these can separate variables into summations or factors. Nine participants were selected from Brown University to test the system and give opinions.

Results:
Since some of the gestures were ambiguous, the test was not to see who could figure out the program without instruction. Some features that were not obvious seemed ergonomic after instruction. Also, since this is a prototype, the students were asked to ignore bugs when possible. The paper interface was found to be natural, even though the pen was not always accurate. For deleting, the participants did not use the original single gesture to remove documents, but used a two-step process that made more sense physically: dragging the document close to the edge (of a perceived desk) and then selecting the bin that appeared. In general, the students enjoyed the program and therefore supported the original hypothesis.

Discussion
The goal for touch-based interfaces is to have a natural response from the computer that doesn't simply emulate normal actions in real life, but expands upon them. The researchers here could have made the best paper simulator ever, but without the elements of CAS added, people would probably prefer paper. Also, the gestures used to carry out tasks should be unambiguous, but that is not always possible. Instruction on how to use the program is sometimes adequate, but it would be best if no explanation were necessary. This is why I think the iPhone has succeeded: a 5-year-old can operate it well without instruction. I am not convinced this is a commercializable product, but it does prove its hypothesis decently, since the information given by WolframAlpha is somewhat interactive already. For it to have widespread use, it would have to support more functions, such as advanced calculus. If that were added, though, even more ambiguous gestures or convoluted menus would have to be made. At this point I think it would be best to use specific programs to solve specific problems. It could be useful as a learning tool for young children to understand algebra, though, which makes it interesting. In the future, many problems will have to be solved by assigning gestures in logical ways, and that is the main issue that will have to be solved.

Thursday, September 1, 2011

Blog #1: Imaginary Interfaces

Title: Imaginary Interfaces: Spatial Interaction with Empty Hands and without Visual Feedback


Origin: Hasso Plattner Institute Potsdam, Germany


Authors: Sean Gustafson, Daniel Bierwirth and Patrick Baudisch


User Study #1
The problem investigated was whether a person can use an imaginary interface to perform character input. The imaginary device consists of a person's hands: one makes a reference point, and the other performs interactions in a virtual space. Users would repeat characters, and the error was determined by how well the recognition system could read the characters drawn on the imaginary screen. Previous experiments have shown that users will not have enough precision to differentiate the letters "D" and "P."


Blog #0: On Computers

Aristotle did his best to explain plants using the science and observations he had. It makes sense that some of his conclusions were a little off due to lack of modern scientific tools such as microscopes or genome sequencers, and prevalence of alchemy to explain plants. This could be compared to a future where technology becomes so complicated that no human could know enough about a system to explain it in a detail to reverse engineer it. Certainly Aristotle could not have practiced genetic engineering - Can people still engineer computers?

We take information for granted today. Any 7 year old these days can more accurately describe plants than Aristotle after some research on Wikipedia. The one thing that this 7 year old would most likely not be able to offer is an intelligent world view perspective on this information. "On Plants" shows a perspective on science very different from today, and the conclusions generated are most likely different from those today. Most professors would probably agree that plants do not have souls, but that could simply be a different definition of soul, rather than a change in a perspective on life itself.

Can computers have a soul? This question has been asked several times in media, and depends on how life is defined. Aristotle proposed that if plants need water, then they must want water. Computers need electricity to operate, so were they engineered to want electricity? Computers can be categorized in the same way as plants, with differing levels of complexity, but what would complexity actually imply? Until systems gain enough power and intelligence to act as if they had a soul, these questions are just conjecture, but if a "soul" is literally defined and implemented by a programmer in the future, many philosophical and ethical questions will need to be resolved.

To more realistically compare Aristotle's experiences with the subject of plants to modern computing, a fitting scenario is if all semiconductor and software industries disappeared along with their employees. Could an average Joe learn enough about systems to be able to reverse engineer them? It would be reasonable to assume a level of technology could be obtained in 60-70 years since the technology started around WWII. The question I ask is: "How does the complexity in plants compare to the complexity in computers?" It is difficult to answer this without completely knowing both subjects.

Introduction Blog Assignment # -1




Why are you taking this class?
To be honest, I did not put very much thought originally into taking this class. However, it did look interesting since all computers interact with humans in some sort of way, and quantifying that is a useful tool.


What experience do you bring to this class?
I have some experience poking at Arduino projects, but mostly just to control servos. Nothing directly related to human interaction with it though.


What do you expect to be doing in 10 years?
In 10 years, I would like to be engineering a solution to some sort of real world problem. It's hard to say specifically what that would be since 10 years is a very long time.


What do you think will be the next biggest technological advancement in computer science?
I think the next best thing will probably be some tool to compile higher level code directly to support multi-threading in the same way higher level languages gave programmers easy access. Since the only real boost in performance in computers in the future will come from parallelism, it makes sense that it will need to be more easily accessible for general programming. 


If you could travel back in time, who would you like to meet and why?
I would go back and find out whether or not Fermat's Last Theorem was a hoax. It caused 400 years of searching and a 100 page paper for a real proof, all of which could have been prevented if he shared his original. 


What is your favorite style of shoes and why?
I love my Olukai sandals. Every pair of shoes I've owned has had a general lifetime of about a year. These sandals have lasted 3 years with extremely heavy use and show no signs of failure.


If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?
Japanese culture has fascinated me. Getting a look into it can't always be done by clicking a translate button, or reading Wikipedia articles written in English.


Give some interesting fact about yourself. 
I can lick my elbow.