Stanford Wii Project

Exploring Gesture-Based Interfaces using Wii Remotes and IR Lights

Jessica Areias Forbes Yevgeniy Goldenberg

Jeffrey Maqsoudi Ashson Mirza Ritvik Mudur

ECSE 475 – Design Project 2

Presented to: Kenneth Fraser – Coordinator Frank P. Ferrie – Supervisor

April 14, 2009

McGill University - Electrical & Computer Engineering Department

ABSTRACT

The purpose of this project is to explore and develop an API that supports simple hand

gestures. The hardware used includes two Wii remotes, a Bluetooth adapter, and infrared

lights. The Wii remote acts as an infrared camera. With a glove having infrared light emitters on

the fingertips, the setup will allow the API to distinguish gestures such as pinching fingers,

turning hands, and so forth. The goal is to use positional information and temporal trajectories

to track gestures.

ACKNOWLEDGEMENTS

We would like to thank our supervisor Professor Frank Ferrie for his continued guidance,

support and encouragement. We would also like to thank Professor Joelle Pineau for her advice

and guidance in implementing our neural network. We would also like to take this opportunity to

thank Professor Kenneth Fraser, our course coordinator, for his moral support and attendance

at our presentation.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 Why Wii?
  1.2 Design Goals
2 Building the System
  2.1 Setup
  2.2 The Glove
  2.3 The API
    2.3.1 Code Structure and Dataflow
    2.3.2 Supported Functionality
    2.3.3 Gesture Detection
    2.3.4 Neural Networks
3 Sample Applications
  3.1 Pong
  3.2 Space Invaders
  3.3 Paint
  3.4 3D Modeling
  3.5 Image Viewer
4 Quantifying the System
  4.1 IR light tracking
  4.2 Gesture recognition performance
5 Limitations
6 Future Improvements
7 Conclusion
REFERENCES

1 INTRODUCTION

The main goal of this design project is to replace mouse functions with hand gestures, which

can be more natural and intuitive to use. To accomplish this, an API was created to recognize

basic gestures.

1.1 Why Wii?

The design uses two Wii remotes to track the location of two to four points of light. This technology was initially chosen because of the resources available online: an open-source C# library and several demonstrations built on it helped jump-start the project. The required equipment is also easy to obtain and inexpensive. It consists mainly of two Wii remotes, a Bluetooth adapter, and IR lights.

1.2 Design Goals

The project consists of building an API that:

• Has a set of predetermined functions capable of recognizing basic gestures
• Is built in C#
• Can be used to implement interfaces for several applications, such as:
    o Games such as Pong and Space Invaders
    o Image Viewer
    o 3D Modeling Tool

2 Building the System

Our project contains both software and hardware aspects. This section describes in detail the

design of each component.

2.1 Setup

The setup consists of two Wii remotes placed side by side and parallel to each other. The user wears a glove with infrared LEDs on the index finger and thumb. The glove must be held approximately one foot away from the Wii remotes.

Figure 1. Setup

2.2 The Glove

The first step towards building the glove was to build and test a circuit that would meet the specifications of the LEDs. Namely, the LEDs required a voltage of approximately 1.5 volts across them and a current in the range of 60-120 mA. Resistors were placed in series with the LEDs (as shown in Figure 2) to drop the excess supply voltage and limit the current, which keeps the voltage across the diodes at the appropriate level and prevents the LEDs from burning out. After building this circuit on a breadboard and applying 3.0 V as the source, we found that 22 ohm resistors achieved the specifications outlined above. Therefore, to build the device, we needed a glove, two LEDs, some wires, two AA batteries, a battery holder, some electrical tape and two 22 ohm resistors. The device can be seen in Figure 3. Using a single 1.5 V battery instead would require removing the resistors entirely. That setup works, but with nothing limiting the current there is a risk of the LEDs burning out, and hence the design in Figure 2 was chosen.
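
The resistor choice can be checked with a short calculation. The sketch below is an idealized estimate, not part of the project code: it assumes the LED drop stays fixed at the 1.5 V quoted above, and the function names are hypothetical.

```python
def led_current_ma(supply_v, led_drop_v, resistor_ohm):
    """Current in mA through one LED and its series resistor (Ohm's law)."""
    if resistor_ohm <= 0:
        raise ValueError("a series resistor is required for this estimate")
    return (supply_v - led_drop_v) / resistor_ohm * 1000.0


def resistor_range_ohm(supply_v, led_drop_v, min_ma, max_ma):
    """Smallest and largest resistor that keep the current within [min_ma, max_ma]."""
    excess_v = supply_v - led_drop_v
    return excess_v / (max_ma / 1000.0), excess_v / (min_ma / 1000.0)


current = led_current_ma(3.0, 1.5, 22.0)
low, high = resistor_range_ohm(3.0, 1.5, 60.0, 120.0)
print(f"22 ohm -> {current:.1f} mA; acceptable range {low:.1f} to {high:.1f} ohm")
```

Under these assumptions a 22 ohm resistor gives about 68 mA, inside the 60-120 mA window, and any value between roughly 12.5 and 25 ohm would also meet the specification.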

Figure 2. Glove circuit (labels: two 1.5 V cells, two resistors of R = 22 Ω)

Figure 3. IR Glove

2.3 The API

The main tools used for the software component were Visual Studio 2008 and the C# programming language, which allowed prototypes to be developed rapidly and tested easily. Using an object-oriented design, the system was divided into separate modules. In addition, the design was made as efficient as possible, since the system must be capable of detecting gestures in real time.

2.3.1 Code Structure and Dataflow

The central component of the system is WiimoteLib, an open-source library written in C#. The library handles communication with each Wii remote and provides the system with the current IR coordinates. The system stores these in a data structure called PosData, a circular buffer that holds the last 100 IR points for each Wii remote and each sensor. PosData contains an array and an index pointing to the oldest point. Every time a new point is added, it overwrites the oldest value and the index is advanced. The MainLib object then updates the Windows cursor, attempts to detect gestures using various techniques, and notifies the application if a gesture match is found. Alternatively, the application can poll MainLib for changes.
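
The circular buffer described above can be sketched as follows. This is a minimal illustration in Python rather than the project's C# code; the class name PosBuffer and its methods are hypothetical stand-ins for PosData.

```python
class PosBuffer:
    """Fixed-size circular buffer of IR points (hypothetical stand-in for PosData)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.points = [None] * capacity
        # Slot to overwrite next; once the buffer is full this is the oldest point.
        self.oldest = 0
        self.count = 0

    def add(self, point):
        """Store a new point, overwriting the oldest one when the buffer is full."""
        self.points[self.oldest] = point
        self.oldest = (self.oldest + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def in_order(self):
        """Return the stored points from oldest to newest."""
        if self.count < self.capacity:
            return self.points[:self.count]
        return self.points[self.oldest:] + self.points[:self.oldest]


buf = PosBuffer(capacity=3)
for p in [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4)]:
    buf.add(p)
print(buf.in_order())  # the first point has been overwritten
```

Because adding a point only writes one slot and moves one index, the cost per frame is constant, which suits the real-time requirement mentioned earlier.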

Figure 4. Class Diagram

Figure 5. Data Flow Diagram

2.3.2 Supported Functionality

The first step of the design consisted of finding the most efficient way of tracking points of light.

Both IR lights and reflective tape were experimented with. IR lights seemed to be the best

option since reflective tape only works well under certain conditions. In fact, reflective tape

usually works best in a dimmed environment since other sources of light reflect off of it, which

interferes with the Wii’s ability to track a specific point of light.

Note that the most basic gestures were implemented before attempting the more

complex ones. Basic gestures include pinching and scrolling, while the more advanced gestures

include zooming in and out, as well as other gestures that differ in their execution from one

person to another.

2.3.2.1 Pinch

The pinch is the equivalent of a mouse click and can be used to select an object or to initiate a more complex gesture.

To detect a pinch, a delta was chosen to represent the distance between the two IR lights when the fingers are pinched. Once the measured distance falls to this delta, a pinch is detected.

Through experimentation, it was found that, depending on the orientation of the user’s hand and of the IR lights, the Wii remote sometimes lost one of the IR lights during a pinch attempt. As a result, the pinch went undetected. To resolve this problem, an Almost Pinch state was created. The Almost Pinch state is entered at a distance slightly above the Pinch delta. If one of the IR lights is lost right after entering the Almost Pinch state, a pinch is detected.
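
The pinch logic with its Almost Pinch state can be sketched as a small state machine. The sketch is illustrative only: the two threshold values, the state names and the class are assumptions, and the report does not give the deltas actually used.

```python
import math

PINCH_DELTA = 0.05         # assumed threshold, in normalized IR units
ALMOST_PINCH_DELTA = 0.08  # assumed, slightly above PINCH_DELTA


class PinchDetector:
    """Three-state machine: OPEN -> ALMOST_PINCH -> PINCHED."""

    def __init__(self):
        self.state = "OPEN"

    def update(self, thumb, index):
        """Process one frame. Each argument is an (x, y) tuple, or None if that
        light is not visible. Returns True only on the frame a pinch begins."""
        if thumb is None or index is None:
            if self.state == "ALMOST_PINCH":
                self.state = "PINCHED"   # light lost while nearly pinched
                return True
            return False                 # otherwise keep the current state
        distance = math.dist(thumb, index)
        if distance <= PINCH_DELTA:
            started = self.state != "PINCHED"
            self.state = "PINCHED"
            return started
        if distance <= ALMOST_PINCH_DELTA:
            if self.state == "OPEN":
                self.state = "ALMOST_PINCH"
            return False
        self.state = "OPEN"
        return False
```

In this sketch a held pinch stays in the PINCHED state until the fingers separate beyond the larger threshold, so a single pinch is reported once rather than on every frame.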

2.3.2.2 Scroll

Scrolling is the equivalent of using a mouse wheel and can be used to scroll an image up, down, left, and right.

To detect a scroll, the cursor, which follows the location of the IR light, must be in a specific region of the screen or image. To scroll left or right, for example, the cursor must be at the far left or far right of the image respectively. Similarly, to scroll up or down, the cursor must be in the topmost or bottommost region of the image. Once one of these conditions is detected, the image is scrolled in the corresponding direction.
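
A region test of this kind might look like the sketch below. The margin of 10% of the image size, the priority given to horizontal scrolling in the corners, and the function name are all assumptions made for illustration; screen coordinates are taken to have y = 0 at the top.

```python
def scroll_direction(x, y, width, height, margin=0.1):
    """Return 'left', 'right', 'up', 'down', or None for a cursor at (x, y).

    margin is the assumed fraction of the image size that forms each edge region.
    """
    if x < width * margin:
        return "left"
    if x > width * (1.0 - margin):
        return "right"
    if y < height * margin:
        return "up"
    if y > height * (1.0 - margin):
        return "down"
    return None


print(scroll_direction(500, 10, 1024, 768))   # cursor near the top edge
print(scroll_direction(500, 300, 1024, 768))  # cursor in the middle, no scroll
```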

2.3.2.3 Zoom

The zoom gesture can be used to zoom in and out of an image.

To detect a zoom gesture, two Wii remotes are used to track changes in the depth (z-coordinate) of the point of light. The IR light must be moved towards the Wii remotes to zoom in and away from them to zoom out.

To measure the depth, the two Wii remotes must be placed in parallel as shown in Figure 1, and the distance between them must be preset. This allows the API to triangulate the depth of each IR light that is detected by both Wii remotes.
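
The report does not give the triangulation formula. One common way to do it for two parallel cameras is the standard stereo relation depth = focal length x baseline / disparity, sketched below. The focal length in pixels and the example numbers are assumptions, not measured properties of the Wii remote.

```python
def depth_from_disparity(x_left, x_right, baseline_cm, focal_px):
    """Depth of a point seen by two parallel cameras.

    x_left, x_right: horizontal pixel coordinate of the same light in each camera.
    baseline_cm:     preset distance between the two cameras.
    focal_px:        focal length expressed in pixels (assumed known).
    Returns the depth in cm, or None if the disparity is not positive.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        return None  # point too far away, or the two readings do not match
    return focal_px * baseline_cm / disparity


near = depth_from_disparity(700, 300, baseline_cm=10.0, focal_px=1000.0)
far = depth_from_disparity(600, 400, baseline_cm=10.0, focal_px=1000.0)
print(near, far)  # a larger disparity means the light is closer
```

A zoom gesture can then be read from the change in this depth over time: decreasing depth for zooming in, increasing depth for zooming out.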

2.3.2.4 Gestures that differ

There exist more complex gestures such as drawing a circle, drawing an X to close a window, or

a combination of simple movements to execute other functions.

To detect such gestures, both Finite State Machines (FSM) and Neural Networks were

explored.

2.3.3 Gesture Detection

One way to detect gestures is to look directly at the coordinates of three equally spaced points in the PosData array. For example, to detect the Up+Left gesture shown in Figure 6, the application checks whether the difference between the y coordinates of the Middle and Oldest points is greater than a threshold, and whether the difference between the x coordinates of the Newest and Middle points is greater than the threshold. If both are true, the gesture is marked as detected and the application is notified.

With this approach, a single gesture can be detected many times, because many subsequent points can match the above criteria. To remedy this, once a gesture is detected, a timer is started and gesture detection is halted until the timer expires.
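
The three-point check and the timer can be sketched together. The sketch assumes that y increases upward and x increases to the right, takes the current time as an argument instead of using a real timer, and uses a hypothetical threshold and cooldown; none of these values come from the report.

```python
class UpLeftDetector:
    """Detects an upward move followed by a leftward move, with a cooldown."""

    def __init__(self, threshold=0.2, cooldown_s=1.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.blocked_until = float("-inf")

    def check(self, points, now):
        """points: (x, y) tuples ordered oldest to newest; now: time in seconds."""
        if now < self.blocked_until or len(points) < 3:
            return False
        oldest, middle, newest = points[0], points[len(points) // 2], points[-1]
        moved_up = middle[1] - oldest[1] > self.threshold
        moved_left = middle[0] - newest[0] > self.threshold
        if moved_up and moved_left:
            # Halt detection until the cooldown expires to avoid repeated matches.
            self.blocked_until = now + self.cooldown_s
            return True
        return False


detector = UpLeftDetector()
path = [(0.5, 0.2), (0.5, 0.6), (0.1, 0.6)]  # up, then left
print(detector.check(path, now=0.0), detector.check(path, now=0.5))
```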

Figure 6. Up+Left Gesture Detection

Another method of detecting gestures is using a Finite State Machine. This allows the

detection of more complicated gestures such as the X and the pinch.

The last method used was Neural Networks. This approach is explained in detail in the

following section.

2.3.4 Neural Networks

Neural networks present an interesting approach to gesture recognition. They are a paradigm heavily inspired by the way the human brain processes information. As with a human learning a new gesture, a network learns from examples and from its errors. Each time a person tries to execute a gesture during the learning process, they attempt to minimize the error relative to the ideal gesture. In practice, no human can draw a perfect circle, but it is still possible to draw a shape that most people would consider a circle.

In addition to being an intuitive way of recognizing new gestures, neural networks offer other advantages. First, since a neural network is trained on many examples, somewhere in the range of 10000 data sets, it is very good at recognizing patterns. As mentioned above, no human can draw a perfect circle, but most people draw circles with similar characteristics. Also, neural networks are relatively easy to implement because there is abundant literature on the subject, and many have used this tool to recognize shapes and patterns. Finally, a trained neural network is very fast and has a constant run-time. Training only needs to be done once at the beginning, and the resulting weights are used for all subsequent gesture recognitions.

Figure 7. Neural Network

As shown in Figure 7, a neural network has an input layer, may have some middle (hidden) layers, and has an output layer. For the project’s design, middle layers were not used, since they were not considered useful for this gesture recognition task. The neural network used has many inputs and one target output, which represents a perfect normalized circle. The inputs are sent across the network with weights that are initially random, and the network's output is compared to the target output. At that point, the error is calculated and its gradient is propagated back into the network to recalculate the weights of the inputs (gradient descent). The most useful inputs will ultimately have the highest weights.
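
The training loop described above can be illustrated with a single-layer network (one sigmoid output unit, no middle layers) trained by gradient descent on a squared error. This is a toy sketch, not the project's network: the learning rate, the number of epochs, the function names and the four-sample data set are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of the single unit: sigmoid of the weighted sum of the inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, targets, epochs=500, rate=0.5, seed=0):
    """Start from small random weights and repeatedly step against the error gradient."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            out = predict(weights, bias, inputs)
            # Gradient of 0.5 * (out - target)**2 with respect to the weighted sum.
            delta = (out - target) * out * (1.0 - out)
            weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
            bias -= rate * delta
    return weights, bias


# Toy data: target 1 for wanted inputs, 0 for the unwanted one.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 1]
weights, bias = train(samples, targets)
print([round(predict(weights, bias, s), 2) for s in samples])
```

In the project's setting the input vector would instead hold the coordinates of the tracked points, with the same kind of update applied to each weight.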

To recognize a circular motion, ten thousand circles consisting of 100 points each were generated. Each circle has a random origin and radius. After a circle is generated, Gaussian noise is added to it to mimic the imperfections of a human gesture, as seen in Figure 8. These inputs are given a target value of 1, since they are the gestures to be recognized. With this input set, the program was able to recognize circles, but there were several false positives: when a long oval was drawn, for example, it was detected as a circle. To resolve this issue, unwanted inputs such as lines (as seen in Figure 9) were introduced. These inputs are given the target value 0, since they must not be recognized as circles.
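
Generating such a training set might look like the sketch below. The ranges for the origin and radius, the noise level sigma, and the function names are assumptions; the report only states that origin and radius are random and that Gaussian noise is added.

```python
import math
import random


def noisy_circle(rng, n_points=100, sigma=0.02):
    """Circle with random origin and radius, plus Gaussian noise on each point."""
    cx, cy = rng.uniform(0.3, 0.7), rng.uniform(0.3, 0.7)
    radius = rng.uniform(0.1, 0.3)
    points = []
    for i in range(n_points):
        angle = 2.0 * math.pi * i / n_points
        points.append((cx + radius * math.cos(angle) + rng.gauss(0.0, sigma),
                       cy + radius * math.sin(angle) + rng.gauss(0.0, sigma)))
    return points


def noisy_line(rng, n_points=100, sigma=0.02):
    """Straight segment between two random endpoints, plus Gaussian noise."""
    x0, y0, x1, y1 = (rng.uniform(0.0, 1.0) for _ in range(4))
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        points.append((x0 + t * (x1 - x0) + rng.gauss(0.0, sigma),
                       y0 + t * (y1 - y0) + rng.gauss(0.0, sigma)))
    return points


def build_dataset(n_circles, n_lines, seed=0):
    """Return shuffled (points, target) pairs: target 1 for circles, 0 for lines."""
    rng = random.Random(seed)
    data = [(noisy_circle(rng), 1) for _ in range(n_circles)]
    data += [(noisy_line(rng), 0) for _ in range(n_lines)]
    rng.shuffle(data)
    return data


dataset = build_dataset(n_circles=10, n_lines=4)
print(len(dataset), sum(target for _, target in dataset))
```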

Figure 8. Gaussian Circle

Figure 9. Gaussian Line

The neural network was first trained with 10000 desired inputs and 4000 unwanted inputs. After training, to ensure that good results were obtained, the trained neural network was run on a separate validation set that was not used during training. When a user uses the glove to trace a motion, the software tracks the last 100 points at any given moment. These 100 points are sent to the network and a value is output. If this value is greater than a certain threshold, the motion is considered to be a circle. As previously mentioned, it is impossible to draw a perfect circle, so any motion whose output is greater than 0.985 is accepted as a circle. The network must also reject non-circles, which it does when their output does not exceed the threshold.

The final neural network successfully recognizes a circle about 3.5 times out of 5 and is also capable of ignoring lines. It is unable to consistently recognize circles and reject false positives because the training set is quite limited, covering only a few kinds of gesture. This explains why motions such as a half-circle are identified as a full circle.

3 Sample Applications

This section provides a brief sample of the applications of our product: playing games such as Pong and Space Invaders, drawing in Paint, manipulating 3D objects in a 3D Modeling tool, and viewing and scrolling images in an Image Viewer application.

3.1 Pong

This application binds the location of the paddle to the vertical position of the user’s hand. The

user can then move the IR light up and down and the paddle moves along with the user. The

point of this application was to verify the sensitivity of the sensor as well as the responsiveness

of the control.

Figure 10. Pong Screenshot

3.2 Space Invaders

This game consists of shooting the invaders that are on top of the screen by pinching. To avoid

the enemy’s attacks, the ship can move horizontally by moving the index finger.

Figure 11. Space Invaders Screenshot

3.3 Paint

Paint is a common application found in the Windows operating system. With a simple pinch, it

is possible to select a tool and then draw a picture.

Figure 12. Paint Screenshot

3.4 3D Modeling

By using the pinch movement, it is possible to select a vertex or edge and stretch the model.

Also, it is possible to change the angle of view by pinching on an open area and moving one’s

fingers.

Figure 13. 3D Modeling Tool Screenshot

3.5 Image Viewer

The Image Viewer application was coded from scratch as a prototype for new ideas and new types of gestures. The application allows switching images by performing the Up+Right or Up+Left gestures. The user can also zoom in or out by pinching and moving the fingers closer to or further away from the Wii remotes. The application then zooms in or out proportionally to the depth. Finally, the user can scroll the image by positioning the cursor on the appropriate edge of the screen.

Figure 14. Image Viewer Screenshot showing the region where the cursor must be to scroll up the image

4 Quantifying the System

4.1 IR light tracking

An important metric of the system's performance is its repeatability, which is the main concern of this section. The repeatability of infrared light tracking was tested along the horizontal and vertical axes of the Wii remote’s coordinate system, as was the repeatability of the depth measure provided by our API.

To test the horizontal axis, the Wii remote was placed in a fixed position and the glove

was used to draw a horizontal line (along a fixed trajectory) at various distances from the Wii

remote. The glove was held in place for approximately one second at preset positions. This

approach allowed us to estimate the variation in our readings. For the vertical axis, the Wii

remote was simply placed on its side and the above test was repeated.

The repeatability of the depth measure provided by our API was measured in a similar fashion. Two Wii remotes were placed in parallel at a fixed distance from each other (in a stereo setup). Without moving the Wii remotes, the glove’s IR light was held in place at three distinct distances from the remotes over five trials.

The accuracy of the depth measure and the field of view provided by the Wii remotes were also measured. All trials were performed by the same user. The results of all tests are summarized below.

Table 1a. Variation (maximum) measured when testing the repeatability of our system.

Attribute Tested                 Order of variation at a distance of
                                 15 cm          20 cm          30 cm
IR Tracking - Horizontal Axis    10^-2 units    10^-2 units    10^-3 units
IR Tracking - Vertical Axis      10^-2 units    10^-2 units    10^-3 units
Depth measure                    10^-1 cm       10^-1 cm       10^-1 cm

Table 1b. Quantified measure of some other features

Feature                          Property
Field of View (Horizontal)       22 degrees
Field of View (Vertical)         23 degrees
Depth measure accuracy           0.5 cm

The variation along the horizontal and vertical axes was measured in units of the raw data provided by the Wii remote. The orders of variation in Table 1a (10^-3 to 10^-2 units) correspond to roughly 1 to 10 pixels on a 1024 x 768 resolution screen. Bearing in mind that a certain amount of error is caused by the user’s hand when moving the glove, the repeatability results for both depth and infrared light tracking are encouraging.
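
The conversion from raw units to pixels is a simple scaling, sketched below. It assumes the raw coordinates are fractions of the sensor range (0 to 1) mapped linearly onto the screen, which is consistent with the pixel figures quoted above but is not stated explicitly in the report; the function name is hypothetical.

```python
def variation_in_pixels(variation_units, screen_size_px):
    """Scale a variation given as a fraction of the sensor range to screen pixels."""
    return variation_units * screen_size_px


for order in (1e-3, 1e-2):
    px_x = variation_in_pixels(order, 1024)
    px_y = variation_in_pixels(order, 768)
    print(f"{order:g} units -> {px_x:.1f} px horizontally, {px_y:.1f} px vertically")
```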

4.2 Gesture recognition performance

To test gesture recognition performance, the success rates of two experienced and two inexperienced users were measured. Each user attempted each gesture 10 times and the number of successes was counted. The inexperienced users were given a brief tutorial on using the system before their trials. A view of the IR lights being tracked was provided to all users. The results of these tests are summarized in the table below.

Table 2. Gesture recognition results

            Pinch    UpRight    UpLeft    X        Circle
Novice 1    9/10     9/10       8/10      8/10     7/10
Novice 2    10/10    9/10       10/10     8/10     5/10
Expert 1    10/10    10/10      10/10     9/10     8/10
Expert 2    10/10    10/10      10/10     10/10    7/10

From Table 2, it is apparent that the success rate drops for inexperienced users. One of the

main causes for missed gestures (for both classes of users) was the Wii remote losing track of

the IR lights on fingertips. The inexperienced users are not aware of how to orient their hands

to ensure that the IR lights are seen by the Wii remote’s camera, which is something learned

from experience.
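
The comparison between the two user classes can be made concrete by totalling the counts in Table 2. The sketch below only re-adds the published numbers; the helper names are hypothetical.

```python
ATTEMPTS_PER_GESTURE = 10
GESTURES = ["Pinch", "UpRight", "UpLeft", "X", "Circle"]

# Successes out of 10 per gesture, copied from Table 2.
RESULTS = {
    "Novice 1": [9, 9, 8, 8, 7],
    "Novice 2": [10, 9, 10, 8, 5],
    "Expert 1": [10, 10, 10, 9, 8],
    "Expert 2": [10, 10, 10, 10, 7],
}


def group_successes(prefix):
    """Return (successes, attempts) over all users whose name starts with prefix."""
    rows = [row for user, row in RESULTS.items() if user.startswith(prefix)]
    attempts = len(rows) * len(GESTURES) * ATTEMPTS_PER_GESTURE
    return sum(sum(row) for row in rows), attempts


def gesture_successes(gesture):
    """Total successes for one gesture across all four users."""
    column = GESTURES.index(gesture)
    return sum(row[column] for row in RESULTS.values())


print(group_successes("Novice"), group_successes("Expert"))
print({g: gesture_successes(g) for g in GESTURES})
```

Totalled this way, the novices succeed on 83 of 100 attempts and the experts on 94 of 100. The Circle gesture is the weakest for both classes, at 27 of 40 attempts overall.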

5 Limitations

There are some limitations to our project, which are described in this section. Firstly, only a limited number of gestures are properly detected. Users can only pinch and move up/down or left/right, or perform some combination of these gestures. This limitation is mainly due to the fact that we did not have enough time to implement more complex gestures. As previously mentioned, neural networks have been used to recognize new gestures, but this requires proper training on a wide range of example gestures.

Secondly, it is difficult to continuously track the LEDs. This is mainly due to the brightness

of the LEDs and the surrounding environment. If the LEDs are not bright enough, it is difficult

for the Wii remote to detect them. A similar issue is distance. The Wii remote can only detect

the LEDs up to a certain distance, after which point they are too far away to be detected.

Another problem is the orientation of the LEDs with respect to the Wii remote. The

LEDs have to be directly pointed towards the Wii remote; otherwise the Wii remote will not be

able to detect the LEDs. Similarly, if the IR lights are out of the field of view of the Wii remote,

it obviously can no longer track the LEDs.

6 Future Improvements

One of the main hardware limitations is the unidirectional aspect of the IR light LEDs. A possible

improvement can be to use LEDs that are more omnidirectional and spread the light evenly in

all directions. Another solution can be to use multiple LEDs pointed in different directions. We

can also try surrounding each LED with a reflective material that will reflect the IR light when

the finger is pointed away from the Wii remote.

In order to improve the gesture recognition reliability, we can try training the neural

network with other types of inputs. For example, we can try using the cosine of each point

instead of the x and y coordinates to establish more unique features. We can also try sampling

the inputs. For example, instead of using every single point as an input, we can try using every

second point.

Another improvement would be to use a Kalman filter in order to smooth the

measurements of the IR position. The filter will help to get rid of the measurement noise as well

as the trembling in the user’s hand.
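
As an indication of what such smoothing could look like, the sketch below applies a scalar Kalman filter with a constant-position model to one coordinate. It is not part of the project: the process and measurement variances are assumed values that would need tuning, and a real implementation would likely filter x, y and depth and might model velocity as well.

```python
def kalman_smooth(measurements, process_var=1e-4, measurement_var=1e-2):
    """Smooth a sequence of noisy scalar positions with a 1D Kalman filter."""
    if not measurements:
        return []
    estimate = measurements[0]
    error_var = 1.0  # large initial uncertainty, so early measurements dominate
    smoothed = [estimate]
    for z in measurements[1:]:
        error_var += process_var                       # predict: uncertainty grows
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)              # update towards the measurement
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


# A steady hand position at 0.5 with an alternating jitter of +/- 0.05.
raw = [0.5 + (0.05 if i % 2 else -0.05) for i in range(50)]
filtered = kalman_smooth(raw)
print(round(filtered[-1], 3))
```

A smaller process variance gives a smoother but slower-responding cursor, so the values would have to be balanced against the real-time responsiveness that the applications need.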

Finally, we can use other algorithms for gesture detection such as Support Vector Machines

(SVM) or the Hidden Markov Model (HMM). SVM is a supervised learning algorithm that is

often used for classification. In the Hidden Markov Model, the system is assumed to be a

Markov process where the hidden state is defined by the gesture that the user is performing.

7 Conclusion

Gesture recognition is gaining popularity in today’s world. Products such as the iPhone or the

Nintendo Wii are testaments to this growing trend. The purpose of this project was to explore

such an interface by designing a low-cost system that provides a range of functionality. This was

achieved by developing an API that uses the Wii remote with an IR-light glove. The goal of

providing a variety of gestures for different types of applications was accomplished. Moreover,

the API also incorporates Neural Networks that can be trained to recognize various gestures.

Although the performance of the network trained for this project was not optimal, the

framework to support such functionality was implemented. The current version of the product

does have its share of limitations; however these can be addressed with additional

improvements. In the end, the system developed can be extended for use in various

applications, and with some further adjustments, it could be a solid product that might be

worth marketing.

REFERENCES

Research papers:

[1] K. Boehm, W. Broll, M. Sokolewicz, “Dynamic gesture recognition using neural networks: a

fundament for advanced interaction construction”, in Stereoscopic Displays and Virtual Reality Systems ,

Proc. SPIE, Vol. 2177, 336 (1994); DOI:10.1117/12.173889, San Jose, CA, USA, November, 2004.

[2] M. Black, A. Jepson, “Recognizing temporal trajectories using the Condensation algorithm,” In

Proceedings of the International Conference on Automatic Face and Gesture Recognition (Nara, Japan,

1998), pp. 16-21.

[3] Y. Yuan, K. Barner “Hybrid Feature Selection For Gesture Recognition Using Support Vector

Machines, ” IEEEXplore, Accessed: March 30, 2009

[4] Lee, Johnny Chung. Hacking the Nintendo Wii Remote, Pervasive Computing, IEEE, Volume: 7, Issue: 3, pp 39-45, July 15 2008.

[5] T. Schlomer et al., “Gesture Recognition with a Wii Controller”, in Proceedings of the 2nd international conference on Tangible and embedded Interaction, Bonn, Germany, 2008.
