Sunday, February 24, 2008

Georgia Tech Gesture Toolkit: Supporting Experiments in

This paper presents the gesture recognition tool kit developed by Georgia Tech, based on the Hidden Markov analogy. The tool kit is inspired by the HTK of Cambridge University, which is used for speech recognition. The main motivation behind the development of the kit is to provide the researchers adequate tool to concentrate on their research on gesture recognition, rather than dwelling into the intricacies of speech recognition to understand the HMM which has been widely researched by the speech community.

This tool kit provides the users with the tools for preparation, training, validation and recognition using the HMM. The preparation involves the user to design the appropriate models, determine the appropriate grammar and providing the labeled examples of the gestures to be performed. All these steps require some analysis of the available data and the gestures involved. The validation step involves the evaluation of the potential performance of the overall system. Validations approaches like cross validation and one left out have been used in the paper. In the cross validation, portion of data is used for training and the other part is used for testing where as in the left one out, one data sample is always kept out for testing and the model is iteratively trained on the remaining. Training utilizes the information from the preparation stage to train the models for the each gesture and recognition, based on the HMM’s, is used to classify the new data using the trained models.

Since it is necessary for the system to understand the relevant continuous gestures for any practical application, authors have proposed the use of rule based grammar for the same. With such a grammar the complex gestures can be explained with the set of simple rules.

This toolkit has been used in various projects being undertaken at Georgia tech and authors have brief details of the same. The first application is development of the gesture based system to change the radio stations by performing certain gestures. The data is obtained by the LED sensors. As a gesture is made, some of the LED’s are occluded which provide the information about the gesture. This information is used to train the model which can then be used for the recognition purposes. Authors have claimed a classification of 249 gestures out of 251 gestures by this approach. Another project introduced is the patterned blinked eye based secure entry. In this project, face recognition is coupled with blinking of eye to generate person recognition model. Optical flow from the images was used to capture the blinking pattern. In this model, it was observed that 9 states in left to right HMM were able to model the blinking pattern. This model achieved an accuracy of 89.6%.Another project deals with the integration of computer vision with the sensing devices like accelerometer and other mobile sensors for sensing the motions. For the correct recognition, different color gloves are used. The features are obtained by both the techniques and integrated into combined feature vector representing a given gesture for recognition process. For this project, they have used 5 states left to right HMM, with self transition and two skip states. This project has not been implemented till the paper was published so no information about the results is available. Authors have also mentioned the use of HMM based approach to recognize the working of the workman in the workshop. They are using the vision and sensors to receive the features which are used to recognize the gestures. Since workman are suppose to perform a series of gestures in order, their model keeps track of their moves and reports an error if they miss a gesture. This system according to them has received accuracy of 93.33%.

Discussion:

This paper just provided an overview of a new toolkit for gesture recognition was being developed at Gtech on the top of HTK developed by Cambridge University. Though, after reading the paper, I was happy that they have something ready for gestures, I was disappointed to find no code on the project webpage, which as per the paper should have been made available as early as 2003. The projects where they are applying the technique also don’t look very attractive as with Bluetooth and wireless remotes, changing channels is much easy compared to making gestures. It is quite possible that a slight unintended occlusion can trigger a channel change. I also believe that voice technology is much more superior now for the purpose. Another project of blinking eye based entry was also something that I did not like. It is not very difficult to copy the blinking pattern and also making and remembering the complex blinking patterns is not easy task. (It is torture to eyes if you have some eye infection [J]). With finger biometrics and retinal signatures establishments are more secure. I have an experience of dealing with the people in the workshop, and I know their motions are very much mechanized and measured to meet the fast manufacturing requirements, but still there are still many unintended motions (after all they are humans), which the presented system can interpret as gestures and provide the alarm signal. Also, it will be really troublesome to work in a workshop with accelerometers on your body which can even affect the efficiency.

Well There is nothing much to say about the paper, if this toolkit is available for download some where it will be helpful and good to have a look at it. May be it can save us from some hard core programming.

2 comments:

Brandon said...

yeah i guess this paper isn't as useful as i first thought if the toolkit isn't really publicly available for download. that's the main reason i chose this paper - i thought it could save us some time having to write our own HMM code. i also agree that some of the applications weren't very interesting. the gesture panel for the car probably isn't as good an option as say, buttons on the steering wheel. for one the user has to remove one hand from the steering wheel and then they also would probably have to look down to make sure their hand is properly aligned with the camera (which defeats their whole motivation). i also don't think the workshop idea is very good. why would a computer need to know what you are doing in a workshop anyways?

Paul Taele said...

One of my gripes with the paper, as you have mentioned, was with the applications which used the toolkit. These applications felt more like novelties that were tailored to make the toolkit look favorable. I would have preferred to see this toolkit build more conventional applications to measure its performance instead. Dr. Hammond already posted a link to the actual kit, so perhaps we can have a look of their system with our own eyes to gauge its merits.