Towards a Smart Wearable Tool to Enable People with SSPI to … · •A system to enable people...

Towards a Smart Wearable Tool to Enable

People with SSPI to

Communicate by Sentence Fragments https://www.youtube.com/watch?v=qzPAY4f8Wc8

András Németh (presenter, constructor, ELTE) – on movie

Anita Verő (natural language processing, ELTE) – on movie

András Sárkány (natural language processing, ELTE) – on movie

Gyula Vörös (natural language processing, ELTE) – on movie

Brigitta Miksztay-Réthey (special need expert, ELTE)

Takumi Toyama (eye-tracking solution, DFKI)

Daniel Sonntag (head of the German group, DFKI)

András Lőrincz (project leader, ELTE)

Challenge Handicap et Technologie

2014, May 26-27, Lille, France

https://www.youtube.com/watch?v=qzPAY4f8Wc8

Motivation

• The ability to communicate with others is of paramount importance for mental

well-being.

• SSPI: Severe speech and physical impairments / communication disorders

– Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spacticity

etc.

• People with SSPI face enormous challenges in daily life activities.

• A person who cannot speak may be forced to only communicate directly to

his closest relatives, thereby completely relying on them for interacting with

the external world.

• Understanding people using traditional alternative and augmentative

communication (AAC) methods (such as gestures and communication

boards) requires training.

• These methods restrict possible communication partners to those who are

already familiar with AAC.

– e.g., shopping, in the bakery

• Understanding people with SSPI requires

practice

– They use special communication methods

Motivation

• The ability to communicate with others is of paramount importance for mental

well-being.

• In AAC, utterances consisting of multiple symbols are often telegraphic: they

are unlike natural sentences, often missing words to allow for successful and

detailed communication. (“Water now!” instead of “I would appreciate a glass of water,

immediately.”)

– Some systems allow users to produce whole utterances or sentences

that consist of multiple words. The main task of the AAC system is to

store and retrieve such utterances.

– However, using a predefined set of sentences severely restricts the

applicability.

• Other approaches allow for a generation of utterances from an unordered,

incomplete set of words, but they use predefined rules that constrain

communication.

– e.g., shopping, in the bakery

• Understanding people with SSPI requires

practice

– They use special communication methods

4

Augmented Reality in Medicine

Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4

https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4

System Architecture: Mobile Web and App Design

XML-RPC ERmed Proxy

Ermed Bridge

TCP

Speech-based interaction

HMD & Gaze-based interaction

Pen-based interaction

Goals of presented work

• To enable broad accessibility and

communication possibilities for people with

SSPI

• Technical Challenges:

– Overcome physical impairments (for user

input)

– Help non-trained people to understand people

with SSPI (towards generation)

7

Approach

• The most effective way for people to communicate would be spontaneous

language generation.

• Novel utterance generation: the ability to say “almost everything,” without a

strictly predefined set of possible utterances/generations/productions.

• We attempt to give people with SSPI the ability to say almost anything. For

this reason, we would like to build a general system that produces novel

utterances without predefined rules. We chose a data-driven approach in the

form of statistical language modeling.

• In some conditions (e.g., cerebral palsy), people suffer from communication

and very severe movement disorders at the same time. For them, special

peripherals are necessary. Eye tracking provides a promising alternative to

people who cannot use their hands.

8

Fourfold solution

(1) Smart glasses with gaze trackers (thereby

extending ) (gaze tracking and

interpretation/graphical symbol selection)

(2) Symbol / Utterance selection

(3) Language generation (sentence generation,

thereby using language models to propose best

hypotheses)

(4) Text-To-Speech functionality (TTS)

• Eye tracking glasses (ETG)

– Forward-looking camera

– Eye-tracking cameras

• Head mounted display (HMD)

– See-through

• Motion Processing Unit (MPU)

– Capture head gestures that indicate

recalibration need

Hardware components

10

Setup: Moverio Glass with Display

Inertial Motion Unit

WheelPhone

WheelPhone + Smart Phone + Constructor

Setup with eye-tracking

Eye Tracking Glasses by

SensoMotoric Instruments GmbH

AiRScouter Head Mounted Display

Brother Industries

MPU-9150 Motion Processing Unit

InvenSense Inc.

In the experiments

12

Communication Board

Main functions

• Symbol selection with gaze tracking

– Calibration is crucial and poses a problem

– Adapt or recalibrate?

• Utterance generation

– Based on selected symbols

– Uses natural language models

Usage scenario and steps

15

Experiment 1

Retina HMD Setup

Symbol selection on HMD

• Idea: the user selects communication

symbols on the HMD, using eye tracking

• Crucial problems with calibration

– Eye tracking has to be calibrated

– Calibration may degrade over time

– Adapt or recalibrate?

• User might tolerate calibration errors (when their

are small and negligible)

• User should be able to initiate recalibration (head

gesture)

Symbol selection on HMD

Symbol selection game on HMD

• Goal: select the numbers in correct order (1234123…)

• Each selection adds a small amount of error

• Participants indicate when error is too significant to be

tolerated

• 4 participants, 4 experiments each

• On average, errors up to 2.7° deviation from actual eye

focus are tolerable.

Real communication experiments

• The Brother Retina HMD was too small to show a full-

sized communication board (CB) (although resolution is

quite good)

• We use a projector / beamer to display the CB

– Simulate a large HMD

• Recognition Feedback is provided (to the user)

– Estimated gaze position + selection

• Speech generation

– word for the selected symbol synthesized

20

Experiment 2

Projector as HMD Surrogate

Real patient(s)

21

https://www.youtube.com/watch?v=C4DOmMtgqz0

Participant

• The participant of this example test is a 30 year old man

with cerebral palsy.

• He usually communicates with a headstick and an

alphabetical communication board, or with a PC-based

virtual keyboard controlled by head tracking.

• Mousense is already an improvement.

https://www.youtube.com/watch?v=C4DOmMtgqz0

Bliss symbols and words (in Hungarian) on

board

23

tea, two, sugar

(tea, kettő, cukor)

tea, lemon, sugar

(tea, citrom, cukor)

one, glass, soda

(egy, pohár, kóla)

Communication Setting

• The participant could move his head

• The head position must be accounted for (MPU)

• Fiducial markers around the board are recognized by

vision-based pattern recognition techniques

25

Usage and HCI details

• The estimated gaze position was indicated as a small red

circle on the projected board (similar to the previous test).

• A symbol was selected by keeping the red circle on it for

two seconds.

• The eye tracker sometimes needs recalibration;

– the user could initiate recalibration by raising his

head up (detected by the MPU.)

– Once the recalibration process was triggered, a

distinct sound was played, and an arrow indicated

where the participant had to look for doing RC.

Communication Setup

Communication experiments

• Goal: communicate with a partner

• Two situations (with different boards)

– Buying food

– Discussing an appointment

Experimental results

• Verification

– To verify that communication was successful, the participant indicated misunderstandings using well-known yes-no gestures, which were quick and reliable. Moreover, a certified expert in AAC was present to indicate apparent communication problems.

– 205 symbol selections happened

– 23 of them were incorrect

• 89% accuracy

– The error rate was acceptable

• Real communication took place!

29

Experiment 3

External Symbols (towards mixed

reality setup)

External symbols

• Idea

– Not all symbols are present in the system

– Optical character recognition can help

– (Object recognition can help)

• Example

– The user wants to buy a certain type of

sandwich

– In the store, there are labels with the names of

the sandwich types on it

• The OCR was simulated

External symbols - Communication Setting

33

Results

• Wizard-of-Oz experiment for OCR

• Similar “good” results as in experiment 2

34

Technical input processing

and sentence generation

methods

35

Sentence fragment generation

• A word guessing game

– Good afternoon, how are

– I apologize for being late, I am very

– My favorite OS is

• A symbol guessing game

• LM needs to help

– To select words with the right sense (disambiguate homonyms: e.g., river

bank versus money bank)

– To select hypothesis where graphical symbols (words) are ‘in the

right place’

– To increase cohesion between words (agreement)

you?

sorry!

[Linux, Mac OS, Windows XP, Win CE].

Sentence fragment generation

• Understanding symbol communication requires practice

• Symbol communication is non-syntactical – Function words are rarely used

– Order of symbols may vary

• e.g., {lemon, sugar, tea} -> tea with lemon and sugar

• Idea – Generate fragments by inserting function words

– Rank them based on language models

– User should select from top 4 variants

37

Language Models

• Estimate the probabilities of n-grams (sequences of

words)

• P(tea with sugar) > P(tea the sugar)

– Use a corpus (collection of texts)

• Sparsity problem

– Long n-grams tend to be rare

– Smoothing and backoff methods are used to deal with

this problem

Stupid backoff

• Let denote a string of L tokens of a fixed vocabulary

• approximation reflects the Markov assumption that only the most recent n-1 tokens are relevant when predicting the next word.

• For any substring wij of w

1L let denote the frequency of occurrence of that

substring in the longer training data. The maximum likelihood probability estimates for the n-grams are given by their relative frequencies

•

• Problematic because (de-)nominator can be zero.

• Appropriate conditions are needed on the r.h.s.

• Noisy: estimates need smoothing

Language corpora in use

• Google Books n-gram corpus

– Collection of digitized books from Google

– Very large, freely available

– Represents written language

• OpenSubtitles corpus

– Collection of film and TV subtitles

– Moderate size, freely available

– Represents spoken language

Language modeling tools used

• Google Books n-gram corpus – Software: BerkeleyLM

• Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258--267. ACL, Stroudsburg, USA (2011)

– Method: Stupid Backoff • Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine translation. In:

EMNLP 2007, pp. 858--867.

• OpenSubtitles corpus – Software: KenLM

• Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011 Sixth Workshop on Statistical Machine Translation, pp. 187--197. ACL, Stroudsburg, USA (2011)

– Method: Modified Kneser-Ney smoothing • Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney Language

Model Estimation. In: 51st Annual Meeting of the Association for Computational Linguistics, pp. 690--696. Curran Associates, Inc., New York, USA (2013)

Prefix tree building algorithm (1)

• Representation – Work with a prefix tree, where each path represents an n-gram

• Input – Set of named entities (e.g., tea, sugars)

– Set of function words (e.g., you, with, for)

– Desired length of sentence fragment (e.g., 3 words)

• Parameters – Score threshold (minimal score, e.g., 10-30)

– Leaf limit (maximal number of open leafs, e.g., 200)

• Output – Tree (or a list) of potential sentence fragments

• Contain all the important words

• Ordered by estimated probability (score)

Prefix tree building algorithm (2)

• Essentially a breadth-first search, with some constraints

• Start with an empty tree, root node is open

• While there is an open node:

– Extend all open leafs with all available words, if the resulting n-gram’s “score” is above a certain threshold

– Discard every open leaf node that cannot be extended (falls below threshold)

– Close every open leaf except the highest scoring ones (leaf limit)

Initial state

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Generated text fragments examples

Conclusion on NLP

• A system to enable people with SSPI to

communicate with natural language

sentences

• Demonstrated the feasibility of the

approach

Our contribution

• An interaction system to reduce communication barriers for people with

severe speech and physical impairments (SSPI) such as cerebral palsy.

• The system consists of two main components: (i) the head-mounted human-

computer interaction (HCI) part consisting of smart glasses with gaze trackers

and text-to-speech functionality (which implement a communication board

and the selection tool), and (ii) a natural language processing pipeline in the

backend in order to generate complete sentences from the symbols on the

board.

• We developed the components to provide a smooth interaction between the

user and the system thereby including gaze tracking, symbol selection,

symbol recognition, and sentence generation.

• Our results suggest that such systems can dramatically increase

communication efficiency of people with SSPI.

53

Eye Tracking Requirements

• Eye gaze is a compelling interaction

modality but requires user calibration

before interaction can commence.

• State of the art procedures require the user

to fixate on a succession of calibration

markers, a task that is often experienced

as difficult and tedious.

54

Possible improvements and outlook • 3D gaze tracker that estimates the part of 3D space observed by the user

• OCR, signs and object recognition methods to

convert “symbols” to the communication board

• new HMDs and eye tracker setups (ergonomic)

• Advanced NLP to transform series of symbols to whole domain sentences

within the context

• integrated calibration: e.g., https://www.andreas-

bulling.de/fileadmin/docs/pfeuffer13_uist.pdf

• personalisation according to patient record

• CPS integration

https://www.andreas-bulling.de/fileadmin/docs/pfeuffer13_uist.pdf



55

http://getnarrative.com/ The Narrative Clip is a tiny, automatic camera and app

that gives you a searchable and shareable photographic

memory.

57

https://www.spaceglasses.com

https://www.spaceglasses.com

Thank you for your attention!

Towards a Smart Wearable Tool to Enable People with SSPI to … · •A system to enable people...

Documents

Transcript of Towards a Smart Wearable Tool to Enable People with SSPI to … · •A system to enable people...