Sunday, January 27, 2008

An architecture for gesture based control for mobile robots

This paper presents an interesting method for controlling a mobile robot using hand gestures. The authors stress that it is important for the robot to interpret the meaning of an action rather than simply imitate it. Hand gestures are described in the paper as a natural and rich input modality for interacting with the robot.

The complete system consists of a mobile robot, a CyberGlove, a Polhemus 6-DOF position sensor, and a geo-location sensor that tracks the position and orientation of the mobile robot. In addition, there are two servers, the Geo-Location server and the Gesture Recognition server, which communicate with each other. The task of the Geo-Location server is to keep track of the position and heading of the mobile robot in a 3 by 6 universal coordinate system. The role of the Gesture Recognition server is to interpret the user's gestures, which are captured using the CyberGlove and the Polhemus sensor, and to provide the interpretation to the robot so that it can act on the input. All components of the system are integrated within CyberRAVE, a multi-architecture robot positioning system for distributed robots, and the servers communicate using the CyberRAVE interface.

To recognize gestures, the authors use an HMM approach that takes advantage of the temporal nature of gestures. Instead of using the sensory information from all 18 glove sensors, they condense the feature vector from 18 dimensions down to 10 features by linearly combining certain sensor responses. This feature vector is then augmented with the first derivative of the 10-dimensional features to obtain a 20-dimensional column vector, which is reduced to a single codeword using vector quantization. After examining the level of detail required for correct interpretation of the actions, the authors chose 32 codewords. The codebook is trained offline from about 5000 measurements that, according to the authors, cover the entire span of the hand space with samples of both gestures and non-gestures. This set is then partitioned into 32 clusters, and the centroid of each cluster forms the final codeword for the gestures in that cluster. The 32 codewords are then used to define the 6 final gestures as sequences of codewords (a sketch of this pipeline appears after the gesture list below). The selected gestures are:

Opening: Moving from close fist to open hand.

Opened: Flat opened hand.

Closing: Moving from a flat open hand to a closed fist.

Pointing: Moving from a flat open hand to index-finger pointing, or from a closed fist to index-finger pointing.

Waving Left: Fingers extended and waving left.

Waving Right: Fingers extended and waving right.
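To make the feature pipeline concrete, here is a minimal Python sketch of how the 18 glove readings might be condensed to 10 features, augmented with their first derivatives, and quantized against a 32-entry codebook. The combination matrix W, the sampling interval dt, and the plain k-means training loop are my own assumptions for illustration; the paper only states the dimensions (18 to 10 to 20), the 32 codewords, and the roughly 5000 training measurements.

```python
import numpy as np

# Hypothetical 10x18 matrix that linearly combines the 18 glove sensor
# readings into 10 condensed features (the paper's actual combination
# weights are not given here, so random weights stand in for them).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 18))

def feature_vector(current_readings, previous_features, dt=1.0 / 30):
    """Build the 20-D vector: 10 condensed features plus their first
    derivatives (finite difference against the previous frame)."""
    f = W @ current_readings            # 18 -> 10
    df = (f - previous_features) / dt   # first derivative of the features
    return np.concatenate([f, df]), f   # 20-D vector, state for next frame

def train_codebook(samples, k=32, iters=50):
    """Offline k-means over the ~5000 training vectors; the 32 cluster
    centroids become the codebook."""
    centroids = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(samples[:, None] - centroids[None], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = samples[labels == j].mean(axis=0)
    return centroids

def quantize(vec, codebook):
    """Map a 20-D feature vector to the index of its nearest codeword."""
    return int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))
```

Each incoming glove frame is thus reduced to a single codeword index, and a gesture becomes a short sequence of such indices fed to the HMMs.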

Because HMMs are learning-based models, they are bound to converge to some gesture interpretation even when the motion does not correspond to any gesture. To prevent this, an additional state called the wait state is introduced; it acts as a hub node with equal transition probabilities to every gesture model and to itself. As each observation arrives, the probability of being in each state is updated and the probabilities are normalized to sum to 1. This model ensures that for unidentified motions the probability of being in the wait state is the maximum, so unintended movements are not recognized as one of the selected gestures. For a valid gesture, the model representing that gesture yields the highest probability and is therefore selected as the interpretation.
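As a rough illustration of the rejection idea (not the authors' exact formulation), the sketch below keeps a normalized probability over the six gesture models plus a wait state and only reports a gesture when it beats the wait state. The gesture_likelihood callback and the constant WAIT_LIKELIHOOD are placeholders for the trained HMM forward probabilities, which the paper does not spell out.

```python
import numpy as np

# Hypothetical per-model scoring: in the real system each gesture has a
# trained HMM, and gesture_likelihood() would return its forward-algorithm
# likelihood for the observed codeword sequence. A fixed baseline stands
# in for the wait state's likelihood of "no gesture".
GESTURES = ["opening", "opened", "closing", "pointing",
            "waving_left", "waving_right"]
WAIT_LIKELIHOOD = 0.05  # assumed constant score for the wait state

def recognize(codeword_sequence, gesture_likelihood):
    """Return the most probable interpretation, or 'wait' if no gesture
    model beats the wait state after normalization."""
    scores = {"wait": WAIT_LIKELIHOOD}
    for g in GESTURES:
        scores[g] = gesture_likelihood(g, codeword_sequence)
    total = sum(scores.values())
    probs = {name: s / total for name, s in scores.items()}  # sum to 1
    best = max(probs, key=probs.get)
    return best, probs[best]
```

In the paper the decision is updated continuously as codewords stream in; here a whole sequence is scored at once purely for simplicity.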

Discussion:

This paper presents an elegant approach to using hand gestures as the mode of communicating with a mobile robot. Humans use their hands for communication when words fail to convey meaning, and building on that idea, the authors have developed an appealing way of controlling the robot's actions. Though the communication happens in a controlled environment, I consider it a good step toward more complex systems. I believe that instead of using a CCD camera for geo-location, GPS could be used along with an onboard magnetometer and gyroscope to convey position and orientation information to the server, which could then communicate with the gesture interpretation server. This would give the robot more mobility as well as independence from the controlled environment. Also, onboard stereo cameras, along with IR and ultrasonic sensors, could be used to control the robot's local motion. An additional joystick could be held in the other hand to switch between the two modes when required and to control the orientation of the proposed onboard stereo cameras.

4 comments:

Paul Taele said...

Meh, I'm a bit iffy on the idea of an additional joystick for switching between the local and global control modes. One of the desired challenges I believed the author was going for was the seamless integration of hand gesture recognition for mobile robot control. That seamlessness goes away when explicitly switching between the two modes with a traditional remote control device. It also doesn't solve the problem of improving the accuracy of pure hand gesture recognition for this particular domain. We instead get a hybrid solution, which would be nice as well, but probably not the author's original goal.

Also, the GPS idea is an interesting alternative for geo-location. It's been a while, but is the commercial range of GPS still five feet? For a wide outdoor environment, GPS would be viable. For closed areas like a section of the factory floor or a classroom environment, I don't know if GPS would be any better.

Brandon said...

I appreciate seeing all the comments about the hardware aspects of the setup. I was mainly concerned about the HMM part of the paper. If I knew more about hardware issues I probably would have more to say about your comment. :/

Grandmaster Mash said...

If you're going to give the robot a billion sensors you might as well let it think for itself and not worry about small commands like "go" and "stop".

I think the larger issue is communicating intention through gestures. For instance, if I was trying to control an army of roombas, I would want to point them in a direction to indicate "clean there", and then the robots themselves would learn where to go based on my vague gesture.

Test said...

How would the joystick be used? If you are going to use a joystick to control the camera orientation, why not just use it to control the movement?