Monday, February 25, 2008

Computer Vision based gesture recognition for an augmented reality interface

This paper presents a computer vision based gesture recognition method for an augmented reality interface. The system recognizes a 3D pointing gesture, a click gesture and five static gestures, which are used to interact with the objects in the augmented reality setup.

The authors state that to interact in a virtual environment, a system should let the user select an object by pointing towards it and then clicking. For any such system it is therefore important to have these two features as the primary features. To make the pointing gesture intuitive, they use the index finger as the pointer. They also use some basic gestures that are very different from each other, shown in the figure below:

By constraining users to perform the gestures in one plane, they reduce the problem to 2D even though it is 3D in nature. They state that after some trials users were able to adapt to the constraint without much difficulty.

As a first step, the system segments the fingers so that they can be distinguished from the place holder objects. For this they use a color cue to separate the skin from the other objects. To deal with intensity and illumination changes, they work in a color space that is invariant to illumination changes, i.e. the normalized RGB space (chromaticity). In this space different objects form different clusters. Each cluster is described by a confidence ellipse built from its mean and covariance matrix, and the distance of a pixel's chromaticity from a cluster is measured as the Mahalanobis distance. This gives a label for each pixel according to its chromaticity. A blob of a predetermined size is then labeled as the hand, and small blobs, which are actually misclassified regions, are discarded. Missing pixels inside the hand blob are filled in using morphological operators. To take care of dynamic range issues, only pixels above a certain minimum intensity are considered. At the high end, pixels with at least one channel at 255 are discarded.
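To make the segmentation step concrete, here is a minimal sketch of chromaticity-based skin labeling in plain Python. It is not the authors' implementation: the function names, the squared-distance threshold `max_dist_sq` and the `min_intensity` cutoff are my own assumptions, and the blob-size filtering and morphological filling are left out.

```python
def chromaticity(r, g, b):
    """Map an RGB pixel to normalized (r, g); overall intensity is divided out."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)
    return (r / total, g / total)


def fit_cluster(samples):
    """Mean and 2x2 covariance (as crr, crg, cgg) of (r, g) chromaticity samples."""
    n = len(samples)
    mr = sum(s[0] for s in samples) / n
    mg = sum(s[1] for s in samples) / n
    crr = sum((s[0] - mr) ** 2 for s in samples) / (n - 1)
    cgg = sum((s[1] - mg) ** 2 for s in samples) / (n - 1)
    crg = sum((s[0] - mr) * (s[1] - mg) for s in samples) / (n - 1)
    return (mr, mg), (crr, crg, cgg)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2D point from a cluster."""
    dr = point[0] - mean[0]
    dg = point[1] - mean[1]
    crr, crg, cgg = cov
    det = crr * cgg - crg * crg  # determinant of the 2x2 covariance
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dr, dg).
    return (cgg * dr * dr - 2.0 * crg * dr * dg + crr * dg * dg) / det


def is_skin(pixel, mean, cov, max_dist_sq=9.0, min_intensity=30):
    """Label one pixel; both thresholds are assumed values, not from the paper."""
    r, g, b = pixel
    if max(r, g, b) >= 255:              # saturated channel: discard
        return False
    if (r + g + b) / 3 < min_intensity:  # too dark for reliable chromaticity
        return False
    return mahalanobis_sq(chromaticity(r, g, b), mean, cov) <= max_dist_sq


# Hypothetical hand-picked skin pixels used to fit the cluster.
SKIN_SAMPLES = [(200, 120, 100), (180, 110, 90), (220, 140, 110),
                (190, 105, 95), (210, 130, 115)]
SKIN_MEAN, SKIN_COV = fit_cluster([chromaticity(*p) for p in SKIN_SAMPLES])
```

Calling `is_skin((200, 120, 100), SKIN_MEAN, SKIN_COV)` accepts the pixel, while a blue pixel such as `(50, 80, 200)` lies far outside the ellipse and is rejected. Doubling every channel of a pixel leaves its chromaticity unchanged, which is the illumination invariance the paper relies on.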

Since each gesture can be recognized by the number of fingers, they use a polar transformation and a number of concentric circles to count the fingers lying at each radius. The click gesture is a movement of the thumb. To determine whether the thumb has moved, they compare the bounding boxes of the hand over a series of frames.
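The finger counting idea can be sketched roughly as follows. Walking around a circle centered on the hand is equivalent to reading one row of a polar-transformed image, and every run of hand pixels met on that walk is a candidate finger. This is only an illustration under my own assumptions: the names, the use of the maximum over radii as the vote, the `min_change` threshold and the synthetic `demo_hand` mask are all hypothetical. A real hand mask would also cross the circles at the wrist, which this sketch ignores.

```python
import math


def crossings_on_circle(mask, center, radius, samples=360):
    """Count runs of foreground pixels met while walking once around a circle."""
    cy, cx = center
    h, w = len(mask), len(mask[0])
    on = []
    for k in range(samples):
        a = 2.0 * math.pi * k / samples
        y = int(round(cy + radius * math.sin(a)))
        x = int(round(cx + radius * math.cos(a)))
        on.append(0 <= y < h and 0 <= x < w and bool(mask[y][x]))
    # A run starts wherever background is followed by foreground (with wraparound).
    return sum(1 for k in range(samples) if on[k] and not on[k - 1])


def count_fingers(mask, center, radii):
    """Assumed voting rule: take the largest count seen over all circles."""
    return max(crossings_on_circle(mask, center, r) for r in radii)


def bounding_box(mask):
    """Return (min_x, min_y, max_x, max_y) of the foreground pixels."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return (min(xs), min(ys), max(xs), max(ys))


def width_changed(box_a, box_b, min_change=5):
    """Flag a possible thumb movement when the box width changes enough."""
    width_a = box_a[2] - box_a[0]
    width_b = box_b[2] - box_b[0]
    return abs(width_a - width_b) >= min_change


def demo_hand(fingers):
    """Synthetic 81x81 mask: palm disc of radius 10 plus straight 3-pixel fingers."""
    size = 81
    mask = [[(x - 40) ** 2 + (y - 40) ** 2 <= 100 for x in range(size)]
            for y in range(size)]
    for direction in fingers:
        for t in range(31):
            for o in (-1, 0, 1):
                if direction == "up":
                    mask[40 - t][40 + o] = True
                elif direction == "left":
                    mask[40 + o][40 - t] = True
                elif direction == "right":
                    mask[40 + o][40 + t] = True
    return mask
```

For example, `count_fingers(demo_hand(["up", "left", "right"]), (40, 40), [15, 20, 25])` counts three fingers, and comparing `bounding_box(demo_hand([]))` with `bounding_box(demo_hand(["right"]))` through `width_changed` flags the sideways extension, which stands in for the thumb here.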

Discussion:

I chose this paper because I thought it would be nice to talk about the role of gestures in augmented reality. It is a very simple paper, and the gestures it recognizes are very simple too. The good parts are the hand segmentation approach and some new ideas for an augmented reality office, which have come up in some of our discussions. I did not like that they claim to recognize 3D gestures but, by constraining users to move in a plane, reduce the problem to a simpler 2D one. I believe that their recognition approach, based on counting fingers, cannot work in 3D because occlusion between the fingers will give ambiguous recognition results. Still, I liked the approach they presented, since with such a system interacting in a design meeting would be much more interactive and less confusing.

1 comment:

Paul Taele said...

I'm not particularly fond of their recognition system. Based on the work this paper did, it might make a great argument against using a vision-based approach. As Brandon and Aaron brought up in their blogs, I feel that a glove-based approach for the limited gesture set and constraints in their paper would fare better.