Wednesday, January 23, 2008

American Sign Language Finger Spelling Recognition System

This paper presents a simple way of recognizing sign language gestures so that, once recognized, they can be used as input to a speech engine or text-editing software that speaks or displays the corresponding letter. The authors used an 18-sensor CyberGlove to capture the sensor response for the sign language gesture of each letter and trained a neural network (a perceptron) to recognize the letter corresponding to the gesture. The input to the neural network is an 18x24 matrix representing the sensor responses for 24 letters (J and Z are omitted because their signs involve motion), along with a 24x24 identity matrix representing the target letters. This input is used to train the network with MATLAB's neural network toolbox, which is then integrated with LabVIEW (a National Instruments product for real-time applications) for real-time use.
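A minimal sketch of this training setup in Python/NumPy (standing in for the MATLAB and LabVIEW tooling; the variable names are hypothetical and the readings below are random placeholders for real CyberGlove data):

```python
import numpy as np

# 24 letters of the manual alphabet, with J and Z omitted as in the paper.
letters = list("ABCDEFGHIKLMNOPQRSTUVWXY")

rng = np.random.default_rng(0)
readings = rng.random((24, 18))              # placeholder for the 18x24 sensor matrix
targets = np.eye(24)                         # the 24x24 identity matrix of target outputs

X = np.hstack([readings, np.ones((24, 1))])  # append a constant bias input
W = np.zeros((24, 19))                       # one perceptron output per letter

for epoch in range(100):                     # classic perceptron learning rule
    for x, t in zip(X, targets):
        y = (W @ x > 0).astype(float)        # hard-threshold activations, one per letter
        W += np.outer(t - y, x)              # adjust weights only where output disagrees
```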

To recognize a gesture, the user makes the sign language gesture corresponding to a letter, which is then fed to the neural network framework running in LabVIEW. The framework responds with a 1x24 vector containing a '1' at the position of the letter the network has recognized.
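Continuing the sketch above (same hypothetical names), decoding the 1x24 output vector amounts to finding the position of the '1':

```python
def recognize(sensor_values):
    """Map one 18-sensor glove reading to a letter via the trained perceptron."""
    x = np.append(sensor_values, 1.0)        # same bias input used during training
    output = (W @ x > 0).astype(float)       # 1x24 vector, ideally one-hot
    return letters[int(np.argmax(output))]   # the position of the '1' names the letter

print(recognize(readings[0]))                # prints 'A' once training has converged
```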

Discussion:

This is a pretty straightforward and simple approach to recognizing sign language. However, it is very user-dependent: the system works only if the network is trained on data obtained from the user who is intended to use it. Another drawback is the omission of 'J' and 'Z', which makes the system incomplete; it would have been better to define substitute gestures for J and Z that could be taught to the intended users. Also, neural networks are quite prone to noise, and it would have been better if the authors had added artificial noise during training, which might have made the network robust enough to increase accuracy. I also feel that, besides the letters, there should be gestures for deletion and break: the system could then recognize letters that form words, a break gesture could mark the completion of a word before it is fed to the speech-generation system, and before the break a delete gesture could be used to remove a letter and edit the word.
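A minimal sketch of the noise-injection idea, reusing the hypothetical names from the training sketch above; the paper itself does not do this, and the noise scale is an arbitrary assumption:

```python
noise_std = 0.05                             # assumed sensor-noise scale, not from the paper

for epoch in range(100):
    noisy = X.copy()
    noisy[:, :18] += rng.normal(0.0, noise_std, size=(24, 18))  # jitter the 18 sensor columns
    for x, t in zip(noisy, targets):         # same perceptron rule, corrupted inputs
        y = (W @ x > 0).astype(float)
        W += np.outer(t - y, x)
```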

4 comments:

D said...

I think that using a perceptron as your classifier is a bad idea, as perceptrons require the data to be linearly separable. Apparently their data was, since a linearly inseparable data set would cause the perceptron to iterate indefinitely (unless you stopped it after some fixed number of iterations). Additionally, I think adding artificial noise is a bad idea: how do you know what kind of noise to add? Use a better neural network (one that can be robust to noise) and more users.
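D's separability point in miniature: on a data set that is not linearly separable (the classic XOR case), the perceptron rule never settles, so a hard iteration cap is essential. A tiny sketch, not from the paper:

```python
import numpy as np

xor_X = np.array([[0., 0., 1.],              # inputs with a constant bias column
                  [0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 1.]])
xor_t = np.array([0., 1., 1., 0.])           # XOR labels: not linearly separable

w = np.zeros(3)
for epoch in range(1000):                    # the cap; without it this loop never ends
    errors = 0
    for x, t in zip(xor_X, xor_t):
        y = float(w @ x > 0)
        if y != t:
            w += (t - y) * x
            errors += 1
    if errors == 0:                          # a separating w was found; never true for XOR
        break
```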

You mention adding break and delete gestures. This would be good if you wanted to make a finger-spelling system for typing into MS Word, but bad for people who want to communicate by finger spelling, as it would be completely unnatural to a native signer.

Pankaj said...

I believe this system is meant for people to communicate through text and speech when the other person cannot understand sign language.

As far as noise addition is concerned, yes, we can model the noise by analyzing the data, but it is cumbersome. Usually in neural networks we add Gaussian or Poisson noise; the idea is to train the network to deal with corrupted input.

Brandon said...

If this system is meant to be used by deaf people (as mentioned by the author), then I'm not sure how well they would adapt to alternative ways of representing 'J' and 'Z'. If they have been signing their entire lives, it would probably be hard to break them of the habit. If the target user were a novice to sign language, it might be easier to teach them an alternative way to sign 'J' and 'Z'.

Pankaj said...

I agree it requires some training, but I don't feel that learning different signs for two letters would make communication difficult.