Translation of sign language to speech
Michael Abd-El-Malak
Student ID: 12045132
Supervisor: Iain Murray
Presentation outline
- The requirements of translating level 2 ASL
- Problem outline and project objectives
- The proposed system
- Achievements and results
- Conclusion
Level 2 ASL: Hand position and roll
In many cases, hand position and roll are necessary to disambiguate between words.
Example: roll is required to differentiate between 'weigh' and 'balance'.
Level 2 ASL: Facial feature extraction
ASL-to-English translation is a very difficult task.
Facial expressions and head movement convey a great deal of additional information: raising the eyebrows at the end of a phrase indicates a question, while shaking the head implies the opposite meaning.
By using facial feature tracking, shorter phrases can be used.
The problem
- Track the hand with 4 degrees of freedom: x, y & z coordinates, plus hand roll
- Extract the positions of the facial features to facilitate feature tracking: pupils and eyebrows
Project objectives
Develop a system that can simultaneously track the hand (4 DOF) and facial features, and which is:
- As unobtrusive as possible
- Portable
- Real-time
- Robust
- Cost-efficient
Proposed system
- Based on the input from a 720 x 540 CCD video camera
- Uses a glove with infrared LEDs to track the hand
- Uses neural networks to find the facial features
Tracking system
[Block diagram: tracking system; IR LED controller, head locator, position list]
Tracking the hand: Overview
The hand position is tracked through the use of infrared LEDs which are used as optical trackers.
Six IR LEDs are required for each hand.
The IR LEDs are sequentially turned on and off in synchronisation with the camera.
A single detection cycle requires 8 frames.
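The detection step above compares a frame captured with an LED on against a frame with it off. A minimal sketch of that differencing, assuming 8-bit greyscale frames as numpy arrays (the function name and threshold are hypothetical, not from the original system):

```python
import numpy as np

def locate_led(frame_on, frame_off, threshold=30):
    """Locate one IR LED by differencing a frame with the LED on
    against a frame with it off (illustrative sketch only)."""
    diff = frame_on.astype(np.int16) - frame_off.astype(np.int16)
    diff = np.clip(diff, 0, 255)          # only brightness increases matter
    mask = diff > threshold               # crude noise suppression
    if not mask.any():
        return None                       # LED not visible this cycle
    ys, xs = np.nonzero(mask)
    w = diff[ys, xs].astype(np.float64)   # intensity-weighted centroid
    return float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum())
```

Repeating this for each of the six LEDs in turn, in synchronisation with the camera, yields one full position list per 8-frame cycle.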
Tracking the hand: Example
[Image: example frame with an LED turned on]
Tracking the hand: Problems faced
- Infrared LEDs have small viewing angles, and LED intensity can be very weak → requires good noise suppression.
- People don't remain still, even if they're trying to → a ghostly outline of the person appears in the difference image; it can be removed using differences from other frames and low-pass filtering.
- Reflective items such as glasses can look very similar to an LED turned on → use a proximity filter to remove outlier points.
- Not enough information to obtain the z location → requires stereo vision.
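The proximity filter mentioned above exploits the fact that genuine LED detections cluster together on the glove, while reflections tend to be isolated. A minimal sketch, assuming candidate points as (x, y) pixel pairs (the distance cutoff is an assumption, not a figure from the original system):

```python
import numpy as np

def proximity_filter(points, max_dist=40.0):
    """Reject candidate LED points that lie far from the cluster of
    other detections (illustrative sketch): points farther than
    max_dist pixels from the median of all candidates are dropped."""
    pts = np.asarray(points, dtype=np.float64)
    centre = np.median(pts, axis=0)                 # robust cluster centre
    dist = np.linalg.norm(pts - centre, axis=1)
    return pts[dist <= max_dist]
```

The median is used rather than the mean so that a single bright reflection cannot drag the cluster centre towards itself.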
Facial feature extraction: Overview
- Detection is done via neural networks: a non-linear two-layer perceptron structure, the most commonly used structure for pattern recognition.
- 256 inputs are used, obtained using a log-polar mapping around the sampling position.
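The log-polar mapping above can be sketched as follows: sample the image on rings whose radii grow geometrically around the sampling position, so resolution is densest at the centre. A 16 x 16 grid gives the 256 inputs; the ring/angle counts and growth base here are illustrative assumptions, not the original parameters:

```python
import numpy as np

def log_polar_samples(image, cx, cy, n_rings=16, n_angles=16, base=1.3):
    """Sample a greyscale image on a log-polar grid centred on (cx, cy),
    yielding n_rings * n_angles values (16 x 16 = 256 here).
    Illustrative sketch of the sampling structure only."""
    h, w = image.shape
    samples = np.zeros(n_rings * n_angles)
    for r in range(n_rings):
        radius = base ** r                       # geometric radius growth
        for a in range(n_angles):
            theta = 2.0 * np.pi * a / n_angles
            x = int(round(cx + radius * np.cos(theta)))
            y = int(round(cy + radius * np.sin(theta)))
            if 0 <= x < w and 0 <= y < h:        # off-image samples stay 0
                samples[r * n_angles + a] = image[y, x]
    return samples
```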
Facial feature extraction: Problems faced
- Number of inputs: 256 (128 proved too unreliable)
- Size of the sampling structure:
  - ≈ face size → predicts the feature position
  - ≈ eye size → simply looks for dark spots
  - ≈ eye separation → best results
Facial feature extraction: sampling example
Results: Performance summary
Hand tracking system
- Position and pitch determined in 3 seconds
- X, Y coordinates accurate to within 20 pixels
- Pitch accurate to ±10 degrees
Facial feature extraction system
- Eyes, eyebrows and left eye edge located in 20 seconds
- Generally all features are located with an accuracy of ±20 pixels (720 x 540)
Results: Hand tracking system
Results: Facial feature localisation
[Image: pupil, left corner of the eye and eyebrow located under ideal conditions]
Results: Facial feature localisation
[Image: out-of-focus frame with a reflection on the pupil; pupils and right brow slightly off]
Results: Facial feature localisation
[Image: detection with an angled face]
Results: Facial feature localisation
[Image: the system matches each side of the face correctly]
Summary and future work
A system was developed that is capable of tracking the hand position and roll, and extracting the positions of the eyes & eyebrows.
Further work is required:
- Add more features, such as lips
- Implement stereo vision
- Further optimise the algorithms
- Implement the whole system in hardware
Thank you for your attention. Any questions?
"The best way to predict the future is to invent it" – Alan Kay
Artificial neural networks in 60 seconds
- Simple feed-forward network, no recurrence
- Choice of transfer function is critical
- A statistical template is embedded in the weights
[Diagram: perceptron structure; inputs x1…xn and a bias, weighted by W1…Wn, are summed (∑) and passed through a transfer function (TF) in the hidden layer to produce the output]
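The structure in the diagram can be sketched as a forward pass: inputs are weighted, summed with a bias, and passed through a transfer function at each layer. The hidden-layer size and the sigmoid transfer function here are illustrative assumptions, not the original network's parameters:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer perceptron (illustrative sketch):
    256 inputs -> hidden layer -> scalar output score. The trained
    weights embed the statistical template of the feature sought."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # transfer function
    h = sigmoid(W1 @ x + b1)                      # hidden-layer activations
    return sigmoid(W2 @ h + b2)                   # "feature present" score
```

Evaluating this score over a grid of sampling positions and taking the maximum gives the detected feature location.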