Monday, March 3, 2008

Using Ultrasonic Hand Tracking to Augment Motion Analysis Based Recognition of Manipulative Gestures

This paper introduces ultrasonic sensors, combined with accelerometers and gyroscopes, for capturing gestures to recognize activities. The claimed contributions are the use of ultrasonics for motion analysis and the combination of information from the different sensors to refine the results. The information obtained from the sensors is processed by a classifier so that the motion can be recognized. The classifiers used are HMMs, C4.5, and KNN. For capturing the sensor data, three ultrasonic beacons are mounted on the ceiling and the listeners are placed on the arms of the users. It is reported that the ultrasonics suffer from reflection, occlusion, and low temporal resolution (low sampling rate); hence the information provided by the ultrasonic sensors alone is not reliable and produces many false responses. In addition, there is noise in the sensor input over the time frames that cannot be smoothed using Kalman filters, because the sampling rate is much lower than the frequency of hand movement.
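To make the tracking side concrete, here is a minimal trilateration sketch of my own (not code from the paper): the beacon coordinates, the speed of sound, and the use of scipy are all assumptions. It converts time-of-flight readings from the three ceiling beacons into a 3-D listener position by least squares.

    import numpy as np
    from scipy.optimize import least_squares

    SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed)

    # Assumed ceiling-mounted beacon positions in meters (x, y, z)
    BEACONS = np.array([[0.0, 0.0, 2.5],
                        [2.0, 0.0, 2.5],
                        [1.0, 2.0, 2.5]])

    def trilaterate(tof_seconds, x0=(0.5, 0.5, 1.0)):
        """Estimate the listener position that best explains the measured
        times of flight, via least squares over the range residuals."""
        dists = SPEED_OF_SOUND * np.asarray(tof_seconds)
        def residuals(p):
            return np.linalg.norm(BEACONS - p, axis=1) - dists
        # Starting below the ceiling steers the solver to the physical solution
        return least_squares(residuals, x0).x

    print(trilaterate([0.0060, 0.0060, 0.0053]))  # -> roughly (1.0, 1.0, 1.0)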

In their experiment they have taken the example of bicycle repair and have chosen the following gestures: pumping, screwing and unscrewing screws, different pedal turnings, assembling parts, spinning the wheel, and removing/placing the carrier object.

They have tried approaches that can be classified into two categories:

  1. Model-based approaches
  2. Frame-based approaches

In the model-based approach they use HMMs on the sensory information obtained from the two gyroscopes and three accelerometers on the user's right hand, along with the same set of sensors on the upper right arm.
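A minimal sketch of this model-based route, assuming the hmmlearn library (the paper's actual HMM implementation and parameters are not given here): train one Gaussian HMM per gesture class and label a new sequence with the model that scores it highest.

    import numpy as np
    from hmmlearn import hmm

    def train_gesture_models(sequences_by_gesture, n_states=5):
        """sequences_by_gesture: {gesture: [arrays of shape (T, n_sensors)]}"""
        models = {}
        for gesture, seqs in sequences_by_gesture.items():
            X = np.vstack(seqs)               # concatenate training sequences
            lengths = [len(s) for s in seqs]  # remember sequence boundaries
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            m.fit(X, lengths)
            models[gesture] = m
        return models

    def classify(models, seq):
        # Pick the gesture whose HMM gives the sequence the highest log-likelihood
        return max(models, key=lambda g: models[g].score(seq))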

In the frame-based approach, a feature vector is computed for each time frame and used for either training or testing the classifier. The features extracted are the mean, standard deviation, and median of the raw sensor data. This approach captures local characteristics of the signal, which can then be exploited for training or testing the classifier. In their setup they use no overlap between adjacent windows, with part of the feature vectors from a frame used for training and part for testing. For comparison, they test the frame-based approach with the KNN and C4.5 classifiers.
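The windowing step is simple enough to sketch directly (a minimal version of my own; the frame length here is an assumption, not the paper's): compute mean, standard deviation, and median per non-overlapping frame.

    import numpy as np

    def frame_features(signal, frame_len=32):
        """signal: (n_samples, n_channels) raw sensor data.
        Returns one feature vector per non-overlapping frame."""
        n_frames = len(signal) // frame_len
        feats = []
        for i in range(n_frames):
            frame = signal[i * frame_len:(i + 1) * frame_len]
            feats.append(np.concatenate([frame.mean(axis=0),
                                         frame.std(axis=0),
                                         np.median(frame, axis=0)]))
        return np.array(feats)  # shape (n_frames, 3 * n_channels)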

They also propose the use of plausibility analysis for classification, which in effect means restricting the search space for the vectors and using information from both the frame-based and model-based approaches. For example, the gesture recognition result obtained by the HMMs is checked against the imposed constraints; if they are satisfied, the gesture is selected, otherwise the next best gesture satisfying both constraints is selected.
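In code terms the plausibility step might look like the following sketch (the function and constraint names are my own, not the paper's):

    def plausible_gesture(ranked_gestures, hand_position, constraints):
        """ranked_gestures: gesture labels ordered by HMM score, best first.
        constraints: dict mapping gesture -> predicate over hand position."""
        for gesture in ranked_gestures:
            if constraints[gesture](hand_position):
                return gesture  # first gesture that passes both layers
        return None             # nothing plausible: reject as unknown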

In the results they claim that the ultrasonics with the C4.5 classifier produced close to 58.7% classification accuracy, and with KNN about 60.3%. They argue that since most of their gestures are not distinguishable using hand locations alone, the results suffered. They then used a kind of ensemble to merge the inputs from the accelerometers and gyroscopes and obtained high classification accuracy (in the 90% range). They note that for certain gestures they achieved almost 100%, while for gestures that are ambiguous and can be confused with others, there was a drop in classification.
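One plausible reading of that ensemble step (an assumption on my part, since the exact fusion rule is not restated here) is score-level fusion: normalize each classifier's per-gesture scores and take the gesture with the highest weighted sum.

    import numpy as np

    def fuse_scores(score_dicts, weights=None):
        """score_dicts: list of {gesture: score} maps from different sensor
        classifiers (e.g. ultrasonic, accelerometer, gyroscope)."""
        weights = weights or [1.0] * len(score_dicts)
        fused = {}
        for g in score_dicts[0]:
            total = 0.0
            for w, scores in zip(weights, score_dicts):
                vals = np.array(list(scores.values()))
                # min-max normalize so differently scaled sensors mix fairly
                total += w * (scores[g] - vals.min()) / (np.ptp(vals) or 1.0)
            fused[g] = total
        return max(fused, key=fused.get)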

Discussion:

They have essentially merged the ultrasonic data with the accelerometer data, pairing the local movement of a body part with the global position of that part. This method is bound to yield better results, since we get two layers of classification (global, then local) and vote for the gesture that meets the requirements of both layers, so the improvement is not surprising. The method is intuitive but needs a specialized room, as ultrasonic waves are reflected by metallic objects, and occlusion also affects the response. They have not addressed the occlusion issues relevant to our requirements: we are dealing with finger and hand movements, which cannot be kept free of occlusion using only ceiling-mounted beacons. Maybe we could use arrays of beacons on the ground, top, and sides to capture the responses, but that would restrict the domain of application for the user.

Also, I feel that instead of wired accelerometers and gyroscopes, it would be nice to use the wireless sensors described in last week's paper on “ASL for game development”. They would be really lightweight and more practical to use.

1 comment:

Paul Taele said...

The paper (and Josh P.) already convinced me that ultrasonics would be a nice addition to the set of sensors we're working with in the class. I also liked your comment on the benefits of having global and local data available when ultrasonics are augmented to existing input devices. I guess it does bring up the question as to whether too many sensors becomes a problem of being too complex.