Wednesday, January 23, 2008

Flexible Gesture Recognition for Immersive Virtual Environments.

This paper presents a gesture-recognition method for interacting with a virtual environment in 3D. Highlighting the drawbacks of current means of gesture recognition, the authors emphasize that an interaction system should be natural and should not require special clothing, backgrounds, or other location-based constraints for recognizing gestures. As a solution, they propose the use of an inexpensive hand-glove device to interact with the virtual environment, stressing the central role hands play even when we interact with the natural environment. The authors note that although gloves are well suited to interaction, the difficulty lies in measuring their orientation and position in space. With the cheaper P5 glove (which the authors used), the position in space cannot be found accurately, since the position sensor is an infrared sensor based on reflection of IR radiation; professional gloves, on the other hand, have no location sensor at all and require additional electromagnetic trackers (such as the Flock of Birds) to locate the glove in space. Apart from this, professional gloves need an extra wire connecting the glove to the device to send back the sensor information. Although this advanced setup gives the best position estimates, it is cumbersome to use and is disturbed by metallic surfaces in the vicinity of the usage area. Keeping these problems in mind, the authors base their gestures on flexion information, incorporate additional orientation information into the gesture definitions, and eliminate gestures that require hand motion through space over time. Since there are always some unintended movements between gestures that are not themselves gestures, they add a time constraint to deal with them: only gestures that are held static for some predetermined time are recorded, and the remaining non-gesture movements are discarded.
This method also makes it possible to construct complex gestures out of multiple small ones. Based on the discussion above, the authors define a gesture as a sequence of successive postures.
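The static-hold time constraint described above can be sketched as follows. This is a minimal illustration, not the paper's code: the hold time, tolerance value, and class name are my assumptions.

```python
import time

class PostureFilter:
    """Pass a flexion reading through only after it has been held static
    long enough; everything else is treated as a transition and discarded."""

    def __init__(self, hold_time=0.5, tolerance=10):
        self.hold_time = hold_time    # seconds a posture must stay static (assumed value)
        self.tolerance = tolerance    # max per-sensor change still counted as "static" (assumed)
        self.last = None
        self.since = None

    def update(self, flexion, now=None):
        now = time.monotonic() if now is None else now
        if self.last is None or any(abs(a - b) > self.tolerance
                                    for a, b in zip(flexion, self.last)):
            # Hand moved: remember the new reading and restart the timer.
            self.last = list(flexion)
            self.since = now
            return None
        if now - self.since >= self.hold_time:
            return list(self.last)    # held static long enough -> a posture
        return None                   # still waiting out the hold time
```

Chaining the postures this filter emits would then yield the paper's "sequence of successive postures" notion of a gesture.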

Gestures are recorded as a 5D vector of finger flexions, where each dimension corresponds to a sensor value received from the P5 glove, together with orientation information and a further value indicating whether orientation is relevant. To recognize gestures, the authors built a gesture manager that holds a template for each gesture they defined. A template takes the form of a response from the five P5 finger sensors, framed as a 5D vector of the flexion values corresponding to that particular gesture. To deal with the variability in gestures (even from the same person), the template for a gesture is the average over several similar gestures performed by that person. Each gesture corresponds to an identity that can trigger an event. For recognition, the input vector obtained from the glove is compared against the templates using a distance metric. If some gesture in the library yields the minimum distance while staying within the defined thresholds, the orientation is then compared, and if that too is within its threshold, the input gesture is recognized and the identity associated with it triggers the corresponding event. If there is no match, no gesture is returned.
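The gesture-manager loop just described can be sketched roughly as follows. The class name and threshold values are mine, and Euclidean distance is only a placeholder, since the paper does not name its metric.

```python
import math

class GestureManager:
    """Minimal sketch of the paper's template-matching recognizer.
    Names, thresholds, and the Euclidean metric are assumptions."""

    def __init__(self, flex_threshold=30.0, orient_threshold=0.3):
        self.templates = {}          # gesture id -> (averaged 5D flexion, orientation or None)
        self.flex_threshold = flex_threshold
        self.orient_threshold = orient_threshold

    def add_template(self, name, samples, orientation=None):
        # Average several recordings of the same gesture to absorb
        # per-person variability, as the paper describes.
        avg = [sum(col) / len(col) for col in zip(*samples)]
        self.templates[name] = (avg, orientation)

    def recognize(self, flexion, orientation=None):
        best_name, best_dist, best_orient = None, float("inf"), None
        for name, (tmpl, tmpl_orient) in self.templates.items():
            d = math.dist(flexion, tmpl)
            if d < best_dist:
                best_name, best_dist, best_orient = name, d, tmpl_orient
        if best_name is None or best_dist > self.flex_threshold:
            return None              # nothing close enough: no gesture
        if best_orient is not None:
            if orientation is None or abs(orientation - best_orient) > self.orient_threshold:
                return None          # flexion matched but orientation check failed
        return best_name             # the identity that triggers the event
```

An application would then map each returned identity to the event handler it should trigger.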

Discussion:

This is a simple paper presenting an elegant way of interacting with the virtual environment. I liked how they explained the previous work on image-based gesture recognition and the problems associated with it, as well as the problems with glove-based gesture recognition. By relying only on flexion values and orientation they have reduced the complexity of the problem, yet the solution is still effective for simple interactions. But the method has several shortcomings. First, since we are dealing with just orientations and flexion values, the infrared receiver tower and the hand must be positioned consistently for both training and testing: a change in either position would introduce errors into the input, because different positions (usually at an angle to the receiver) add different phases to the reflected infrared (IR) radiation, so a different response is recorded. The user also has to stay fairly close to the tower to get orientation feedback, since the IR responses are weak at larger distances. Second, I believe there are few gestures that look very different from each other when only the fingers are used, so there is a good chance of misclassifying similar-looking gestures. Another drawback is that the paper does not state which distance metric was used. I believe simple Euclidean distance is a poor measure of similarity here, since a large response from even a single sensor may produce a large overall distance even when all the other responses are close; perhaps a normalized measure of distance would provide a better solution. Apart from this, the biggest drawback is that there is no information about the performance results, the gestures that were tried, or whether some gestures caused ambiguity.
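To make the distance-metric concern concrete, here is a small sketch with assumed sensor values, comparing plain Euclidean distance against a simple per-dimension check that rejects a match if any single sensor deviates too much:

```python
import math

# A stored template and an input where one finger sensor is way off
# while the rest are close (all values are assumed for illustration).
template = [100, 100, 100, 100, 100]
reading  = [200, 101, 101, 104, 106]   # per-dimension differences: [100, 1, 1, 4, 6]

# Euclidean distance: the single bad sensor dominates the total.
euclid = math.dist(template, reading)

def per_dim_ok(a, b, max_per_sensor=20):
    """Reject if ANY single sensor deviates more than max_per_sensor,
    however close the other sensors are."""
    return all(abs(x - y) <= max_per_sensor for x, y in zip(a, b))

print(round(euclid, 1))                # about 100.3, driven almost entirely by one sensor
print(per_dim_ok(template, reading))   # False: the outlier sensor rejects the match
```

A normalized variant (dividing each difference by that sensor's observed range) would address the same concern while still producing a single scalar distance.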
I would also be interested to know whether a completely user-independent way of interacting with the virtual environment is possible, perhaps by using some normalized sensor responses as input.

6 comments:

D said...

I think in five dimensions, with each of the sensors having the same range of values (0-255?), Euclidean distance would work fine. I only think you'd start to encounter problems using distance when one dimension had a range 0...1, and another had a range of 0...1000, something like that.

That being said, I don't like their use of distance either. But it works in a lot of situations and is a staple method for computing similarity.

Pankaj said...

I agree with you, but the problem arises when we measure the distance and observe that the per-dimension distance vector is something like [100 1 1 4 6]. Even assuming it is the shortest overall distance, if we look at each dimension individually, that '100' is certainly going to dominate the measure.

I might be wrong but thats what came to my mind!!!

Brandon said...
This comment has been removed by the author.
Brandon said...

commented on the wrong post originally...

i think having a user independent system would be GREAT but that's a very hard problem.

Pankaj said...

I agree with you, Brandon!! User independence is a very difficult problem!!!

Paul Taele said...

Most of the blog posts (including mine) have been harping on this paper for not publishing its results. Ignoring that aspect, I think the paper was able to accomplish a lot using an inexpensive commercial product. Sensor values will be strange, but I think it was partly justified in the paper by the authors with how the glove's use would be assumed to be in a normal computer usage position.