Sunday, March 30, 2008

Hand Gesture Modelling and Recognition Involving Changing Shapes and Trajectories, Using a Predictive EigenTracker

Summary:

This paper presents a vision-based approach to gesture recognition. Unlike HMM-based methods, the technique involves no training, so it adapts faster to new gestures. The only requirement (which is kind of not nice) is that the gestures be well distinguishable. The algorithm estimates the affine transform of each image frame and projects the image into the eigenspace. Since only the hand moves while the background stays stationary, the first few PCA components capture most of the variance, which is in effect the motion of the hand from frame to frame.
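To make the PCA argument concrete, here is a minimal sketch of my own (not the paper's code). It treats each frame as a short list of pixel values with a fixed background and one bright "hand" pixel that moves, then finds a direction of maximum variance by power iteration. The function name `leading_component`, the toy frames, and the use of power iteration are all assumptions for illustration.

```python
def leading_component(frames, iterations=50):
    """Return a unit vector along a direction of maximum variance of `frames`."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in frames]
    # Power iteration on the covariance, applied implicitly as X^T (X v).
    v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary starting vector
    for _ in range(iterations):
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance at all: nothing moved
            break
        v = [x / norm for x in w]
    return v


# Toy data (hypothetical): 8-pixel frames, static background, hand visits pixels 2..5.
background = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
hand_positions = [2, 3, 4, 5]
frames = []
for p in hand_positions:
    frame = list(background)
    frame[p] += 100.0  # the "hand" brightens one pixel
    frames.append(frame)

component = leading_component(frames)
# Share of the component's weight on pixels that never change vs. pixels the hand crossed.
static_energy = sum(component[j] ** 2 for j in range(8) if j not in hand_positions)
hand_energy = sum(component[j] ** 2 for j in hand_positions)
```

In this toy setup all of the component's weight lands on the pixels the hand passes over, because the static pixels have zero variance once the mean frame is subtracted. Real frames are noisy, so the split would be far less clean.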

The method is inspired by an earlier method called EigenTracker, but differs from it in its added predictive component. The prediction makes the proposed method versatile enough to track hand motion on the fly, without the offline description of the object's orientation and physical dimensions that the earlier EigenTracker required. The prediction comes from segmenting out the hand by skin color and then tracking it with a particle filter. The position of the hand is obtained from the significant eigenvectors in the eigenspace, since only the hand is in motion while the rest of the background is stationary. Changes in the direction of motion are detected when the error between the predicted and the actual track exceeds a certain threshold. The hand track, together with these detected changes in direction, gives the structure of the gesture (assuming linear motion between changes), which can be matched against the available gesture set (decided offline).
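The segmentation idea can be sketched in a few lines. This is only an illustration under my own assumptions: a simple constant-velocity predictor stands in for the paper's particle filter, and the function name `segment_boundaries`, the toy track, and the threshold value are all hypothetical.

```python
import math


def segment_boundaries(track, threshold):
    """Return the indices where a constant-velocity prediction misses by more than `threshold`."""
    boundaries = []
    for t in range(2, len(track)):
        (x1, y1), (x2, y2) = track[t - 2], track[t - 1]
        # Predict the next position by repeating the last displacement.
        predicted = (2 * x2 - x1, 2 * y2 - y1)
        error = math.hypot(track[t][0] - predicted[0], track[t][1] - predicted[1])
        if error > threshold:
            boundaries.append(t)  # the motion changed direction here
    return boundaries


# Toy track (hypothetical): the hand moves right, then turns and moves up.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
boundaries = segment_boundaries(track, threshold=0.5)
```

For this track the only large prediction error is at the turn, so the gesture splits into two roughly linear segments ("right", then "up"), which is the kind of structure that would be matched against the gesture set.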


Discussion:


The nice part of the paper is the use of a particle filter to find the segments of a gesture by measuring the error between the predicted and the actual motion, and the use of an affine eigenspace to capture the hand motion. The proposed method is quite different from most of the papers we have read, though the results are not as impressive. As many in the class suggested, the 100% accuracy doesn't mean much, as the test data is 80% similar to the training data (64/80). Also, they use the PCA space to capture the maximum variance, which is not robust to noise. I agree that it would work with their simple, steady background and black arms, but in many real-life situations this setup may not be feasible. I would have been happier with some complex gestures at slightly lower accuracy than with 100% accuracy (which I am not impressed with) on very distinct, simple gestures.

However, I like vision-based techniques, as they allow much more freedom and space for gestures than gloves do, so I would add this paper to my favorites of the semester.

With the advancement in digital imaging techniques and capture devices, it is possible to replace the background with some other stationary background. So if the motion of the hand is faster than the change in the background, we should be able to capture the hand motion and blend it with an artificial background. With such an approach, we can tackle the noise that a changing background introduces in PCA-based techniques.
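A rough sketch of what I have in mind, with everything in it being my own assumption rather than anything from the paper: keep a slowly updated background model, treat pixels that differ sharply from it as the hand, and paste only those onto a fixed artificial background. The function name, `alpha`, and `threshold` are hypothetical, and frames are plain lists of pixel values.

```python
def composite_on_static_background(frames, artificial_bg, alpha=0.05, threshold=30.0):
    """Keep fast-changing (hand) pixels and replace everything else with `artificial_bg`."""
    background = list(frames[0])  # assumption: the first frame shows background only
    output = []
    for frame in frames:
        composed = []
        for j, pixel in enumerate(frame):
            if abs(pixel - background[j]) > threshold:
                composed.append(pixel)  # large, fast change: treat as hand
            else:
                composed.append(artificial_bg[j])
                # Small change: absorb the slow drift into the background model.
                background[j] += alpha * (pixel - background[j])
        output.append(composed)
    return output


# Toy data (hypothetical): the background brightens slowly while the hand moves across.
frames = [
    [10.0, 10.0, 10.0, 10.0],
    [11.0, 200.0, 11.0, 11.0],
    [12.0, 12.0, 200.0, 12.0],
    [13.0, 13.0, 13.0, 200.0],
]
stabilized = composite_on_static_background(frames, artificial_bg=[0.0, 0.0, 0.0, 0.0])
```

The stabilized frames contain only the hand on a constant background, so a PCA step applied afterwards would no longer see the background drift as variance. This only holds while the hand changes much faster than the background, which is exactly the condition stated above.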

2 comments:

Paul Taele said...

As a vision-based paper, it was pretty good for what I got out of it. It's true that devices for vision-based approaches have improved enough to address the deficiencies of what seemed like a controlled environment in the paper, so vision-based approaches can hold their own in executing quite accurate gesture recognition. I'm still not totally sold on the idea of vision-based methods being superior to glove-based input though, since I would rather rely on sensor readings from a glove than from a camera. Maybe I'll change my mind when cameras have uber-fine quality.

Kevin Wei said...

Yes, vision-based recognition typically has more applications, but it does make the recognition more difficult. Anyway, I think this is a good paper combining particle-filter tracking with PCA.