Sunday, April 13, 2008

Feature selection for grasp recognition from optical markers

This paper presents a feature selection methodology for picking out the relevant features from a large feature set to improve recognition results. The aim is to assign a grasp class to an input using as few features as possible. In their approach, the authors place several optical markers on the back of the hand and track them with calibrated cameras in a controlled environment. They argue for this setup because it does not interfere with the subject's natural grasping motion, nor with natural contact between the hand and the object.
In their experiment, the marker positions on the back of the hand are expressed in a local coordinate system, which makes the representation invariant to the global hand pose.
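A minimal sketch of what such a pose-invariant representation might look like; the choice of which markers define the local frame is my assumption, not the paper's exact construction:

```python
import numpy as np

def to_local_frame(markers):
    """Express the 30 marker positions in a hand-local frame so the resulting
    feature vector is invariant to global hand position and orientation.
    markers: (30, 3) array of marker positions in world coordinates.
    Using the first three markers to define the frame is an illustrative choice.
    """
    origin = markers[0]
    x_axis = markers[1] - origin
    x_axis = x_axis / np.linalg.norm(x_axis)
    v = markers[2] - origin
    z_axis = np.cross(x_axis, v)
    z_axis = z_axis / np.linalg.norm(z_axis)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis])   # rows are the local frame axes
    local = (markers - origin) @ R.T         # coordinates in the local frame
    return local.reshape(-1)                 # 30 x 3 -> 90-dimensional feature
```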

To classify the grasps they use a linear logistic regression classifier, which also drives the subset selection in a supervised manner. They treat three marker coordinates at a time as a single feature group and combine cross-validation with subset selection over the number of features to arrive at the best feature set. They try both the backward and the forward approach to subset selection and observe that the difference in error between the two is only about 0.5%.
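A rough sketch of this kind of greedy forward selection, using scikit-learn's logistic regression as a stand-in for the paper's classifier; grouping each marker's three coordinates as one selectable unit and the 2-fold scoring are my assumptions (a backward variant would start from all markers and drop one at a time):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_marker_selection(X, y, n_markers=30, n_select=5):
    """Greedy forward selection over whole markers (3 columns per marker).
    X: (n_samples, 3 * n_markers) marker features, y: grasp class labels.
    """
    selected, remaining = [], list(range(n_markers))
    while len(selected) < n_select:
        best_m, best_score = None, -np.inf
        for m in remaining:
            cols = [c for k in selected + [m] for c in (3 * k, 3 * k + 1, 3 * k + 2)]
            clf = LogisticRegression(max_iter=1000)
            score = cross_val_score(clf, X[:, cols], y, cv=2).mean()
            if score > best_score:
                best_m, best_score = m, score
        selected.append(best_m)
        remaining.remove(best_m)
    return selected
```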

In their experiment, each hand pose is represented by a 90-dimensional vector built from the 30 markers on the hand. Their domain is the set of daily functional grasps illustrated in a figure in the paper.

They collected their data using 46 objects, each grasped in multiple ways. The objects were divided into two sets, A and B: A contained 38 objects with 88 object-grasp pairs, and B contained 8 objects with 19 object-grasp pairs. They collected data from 3 subjects for set B and 2 subjects for set A. Using 2-fold cross-validation, the full 30-marker data gave an accuracy of 91.5%, while the 5 markers chosen by subset selection achieved 86%.
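For concreteness, a toy version of this full-versus-reduced comparison; the data, the specific 5 markers, and the classifier settings here are placeholders, not the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 90))        # stand-in for grasp samples (30 markers x 3 coords)
y = rng.integers(0, 6, size=200)      # stand-in grasp class labels

selected = [0, 4, 11, 17, 25]         # hypothetical 5 markers chosen by subset selection
cols = [c for m in selected for c in (3 * m, 3 * m + 1, 3 * m + 2)]

clf = LogisticRegression(max_iter=1000)
acc_full    = cross_val_score(clf, X, y, cv=2).mean()           # paper reports ~91.5% on its real data
acc_reduced = cross_val_score(clf, X[:, cols], y, cv=2).mean()  # paper reports ~86% on its real data
print(acc_full, acc_reduced)
```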

They also evaluated their feature subset on classifiers trained on different data, using both the 5-marker and 30-marker sets. In total they trained 4 classifiers: two on the data from subject 1 and subject 2 respectively (object set A), a third on the combined data of subjects 1 and 2 from set A, and a fourth on the combined set A+B from subjects 1 and 2.

They observed that accuracy was sensitive to whether data from the test subject was included in training. With that data included, per-user accuracy was 80-93% in the reduced space (5 markers) and 92-97% with all 30 markers. For a completely new user tested on a classifier trained on a single other user's data, accuracy was an abysmal 21-65% in the reduced marker space. However, the retention of accuracy was over 100% in all cases.
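A small sketch of this cross-user test and of how I read "retention" (reduced-set accuracy relative to full-set accuracy); the data, marker choice, and classifier are stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Stand-in data: subject 1 used for training, subject 2 held out entirely.
X_train, y_train = rng.normal(size=(150, 90)), rng.integers(0, 6, size=150)
X_test,  y_test  = rng.normal(size=(150, 90)), rng.integers(0, 6, size=150)

cols = [c for m in (0, 4, 11, 17, 25) for c in (3 * m, 3 * m + 1, 3 * m + 2)]  # hypothetical 5 markers

acc_full    = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
acc_reduced = (LogisticRegression(max_iter=1000)
               .fit(X_train[:, cols], y_train).score(X_test[:, cols], y_test))

# "Retention" read as the reduced-set accuracy as a percentage of the full-set
# accuracy; values above 100% mean the 5-marker subset did better than all 30.
retention = 100.0 * acc_reduced / acc_full
print(retention)
```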

In their analysis they also observed that all three subjects did well on the cylindrical and pinch grasps, whereas the spherical and lateral tripod grasps performed poorly because of the similarity among the three-finger precision grasps.


Discussion:

This paper has nothing new except the rather complex linear logistic regression classifier. Their analysis is also based on a small user set and hence cannot be generalized to most cases; with more users with different hand sizes, it would have been a better paper. Also, I don't understand why many papers claim that accuracy increases when samples from the user are included in the training set; it is a very simple, easy-to-digest fact that needs no explanation. It would also have been nice if they had mapped the relationship between the grasping patterns of different users, which might have been used to build a more generalized feature set for a group of users sharing similar patterns. The work is very similar to our paper on sketch.
