Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments

https://www.youtube.com/watch?v=qzPAY4f8Wc8

András Németh (presenter, constructor, ELTE) – in the video

Anita Verő (natural language processing, ELTE) – in the video

András Sárkány (natural language processing, ELTE) – in the video

Gyula Vörös (natural language processing, ELTE) – in the video

Brigitta Miksztay-Réthey (special needs expert, ELTE)

Takumi Toyama (eye-tracking solution, DFKI)

Daniel Sonntag (head of the German group, DFKI)

András Lőrincz (project leader, ELTE)

Challenge Handicap et Technologie 2014, May 26–27, Lille, France

Motivation

• The ability to communicate with others is of paramount importance for mental well-being.

• SSPI: Severe speech and physical impairments / communication disorders

– Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spasticity, etc.

• People with SSPI face enormous challenges in daily life activities.

• A person who cannot speak may be forced to communicate only with his or her closest relatives, thereby relying completely on them for interacting with the external world.

• Understanding people who use traditional augmentative and alternative communication (AAC) methods (such as gestures and communication boards) requires training.

• These methods restrict possible communication partners to those who are already familiar with AAC.

– e.g., shopping, at the bakery

• Understanding people with SSPI requires practice

– They use special communication methods

Motivation


• In AAC, utterances consisting of multiple symbols are often telegraphic: unlike natural sentences, they often lack the words needed for successful and detailed communication. (“Water now!” instead of “I would appreciate a glass of water, immediately.”)

– Some systems allow users to produce whole utterances or sentences that consist of multiple words. The main task of the AAC system is to store and retrieve such utterances.

– However, using a predefined set of sentences severely restricts applicability.

• Other approaches allow for the generation of utterances from an unordered, incomplete set of words, but they use predefined rules that constrain communication.



Augmented Reality in Medicine

Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4

System Architecture: Mobile Web and App Design

[Architecture diagram: ERmed Proxy and ERmed Bridge connected via XML-RPC and TCP; interaction modalities: speech-based, HMD & gaze-based, and pen-based interaction]

Goals of presented work

• To enable broad accessibility and communication possibilities for people with SSPI

• Technical challenges:

– Overcome physical impairments (for user input)

– Help non-trained people to understand people with SSPI (towards generation)


Approach

• The most effective way for people to communicate would be spontaneous language generation.

• Novel utterance generation: the ability to say “almost everything,” without a strictly predefined set of possible utterances/generations/productions.

• We attempt to give people with SSPI the ability to say almost anything. For this reason, we would like to build a general system that produces novel utterances without predefined rules. We chose a data-driven approach in the form of statistical language modeling.

• In some conditions (e.g., cerebral palsy), people suffer from communication disorders and very severe movement disorders at the same time. For them, special peripherals are necessary. Eye tracking provides a promising alternative for people who cannot use their hands.


Fourfold solution

(1) Smart glasses with gaze trackers (thereby extending …) (gaze tracking and interpretation / graphical symbol selection)

(2) Symbol / Utterance selection

(3) Language generation (sentence generation, using language models to propose the best hypotheses)

(4) Text-To-Speech functionality (TTS)

Hardware components

• Eye tracking glasses (ETG)

– Forward-looking camera

– Eye-tracking cameras

• Head-mounted display (HMD)

– See-through

• Motion Processing Unit (MPU)

– Captures head gestures that indicate the need for recalibration


Setup: Moverio Glass with Display

Inertial Motion Unit

WheelPhone

WheelPhone + Smart Phone + Constructor

Setup with eye-tracking

Eye Tracking Glasses by SensoMotoric Instruments GmbH

AiRScouter Head Mounted Display (Brother Industries)

MPU-9150 Motion Processing Unit (InvenSense Inc.)

In the experiments


Communication Board

Main functions

• Symbol selection with gaze tracking

– Calibration is crucial and poses a problem

– Adapt or recalibrate?

• Utterance generation

– Based on selected symbols

– Uses natural language models

Usage scenario and steps


Experiment 1

Retina HMD Setup

Symbol selection on HMD

• Idea: the user selects communication symbols on the HMD, using eye tracking

• Crucial problems with calibration

– Eye tracking has to be calibrated

– Calibration may degrade over time

– Adapt or recalibrate?

• The user might tolerate calibration errors (when they are small and negligible)

• The user should be able to initiate recalibration (head gesture)


Symbol selection game on HMD

• Goal: select the numbers in correct order (1234123…)

• Each selection adds a small amount of error

• Participants indicate when the error is too significant to be tolerated

• 4 participants, 4 experiments each

• On average, errors of up to 2.7° deviation from the actual eye focus are tolerable.

Real communication experiments

• The Brother Retina HMD was too small to show a full-sized communication board (CB), although its resolution is quite good

• We use a projector/beamer to display the CB

– Simulates a large HMD

• Recognition feedback is provided to the user

– Estimated gaze position + selection

• Speech generation

– The word for the selected symbol is synthesized


Experiment 2

Projector as HMD Surrogate

Real patient(s)


https://www.youtube.com/watch?v=C4DOmMtgqz0

Participant

• The participant in this example test is a 30-year-old man with cerebral palsy.

• He usually communicates with a headstick and an alphabetical communication board, or with a PC-based virtual keyboard controlled by head tracking.

• Mousense is already an improvement.

Bliss symbols and words (in Hungarian) on the board


tea, two, sugar (tea, kettő, cukor)

tea, lemon, sugar (tea, citrom, cukor)

one, glass, cola (egy, pohár, kóla)

Communication Setting

• The participant could move his head

• The head position must be accounted for (MPU)

• Fiducial markers around the board are recognized by vision-based pattern recognition techniques (a minimal detection sketch follows below)
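The slide does not name the marker type or library used. As a hedged illustration only, here is a minimal sketch of how such fiducial markers could be detected in a frame from the forward-looking camera, assuming OpenCV's ArUco module (opencv-contrib, pre-4.7 API); the dictionary choice and the function name are illustrative, not the authors' implementation.

```python
# Hypothetical sketch (not the project's code): detect fiducial markers around the
# communication board in a forward-looking camera frame, using OpenCV's ArUco module.
import cv2

ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def find_board_markers(frame_bgr):
    """Return the detected marker corners and ids (empty when none are visible)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _rejected = cv2.aruco.detectMarkers(gray, ARUCO_DICT)
    return corners, ids

# With the corner markers located, a homography (cv2.findHomography) can map the
# estimated gaze point from camera coordinates onto board/symbol coordinates,
# which also compensates for the head movement mentioned above.
```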


Usage and HCI details

• The estimated gaze position was indicated as a small red circle on the projected board (similar to the previous test).

• A symbol was selected by keeping the red circle on it for two seconds.

• The eye tracker sometimes needs recalibration;

– the user could initiate recalibration by raising his head (detected by the MPU).

– Once the recalibration process was triggered, a distinct sound was played, and an arrow indicated where the participant had to look in order to recalibrate.

(A sketch of this interaction logic follows below.)
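As a rough illustration of the rules above (two-second dwell selection, head-raise recalibration trigger), the sketch below shows one way the logic could be structured; the class name, thresholds, and sampling interface are assumptions, not the project's actual code.

```python
# Minimal sketch of the dwell-selection and head-gesture recalibration logic
# described above. Thresholds, names, and the sampling interface are illustrative.
import time

DWELL_SECONDS = 2.0           # keep the red circle on a symbol for 2 s to select it
HEAD_RAISE_PITCH_DEG = 25.0   # raising the head beyond this pitch requests recalibration

class DwellSelector:
    def __init__(self):
        self.current_symbol = None
        self.dwell_start = None

    def update(self, symbol_under_gaze, head_pitch_deg, now=None):
        """Feed one gaze/IMU sample; return ('recalibrate', None), ('select', symbol) or None."""
        now = time.monotonic() if now is None else now
        if head_pitch_deg > HEAD_RAISE_PITCH_DEG:
            self.current_symbol, self.dwell_start = None, None
            return ("recalibrate", None)             # play sound, show arrow, recalibrate
        if symbol_under_gaze != self.current_symbol:
            self.current_symbol, self.dwell_start = symbol_under_gaze, now
            return None                              # gaze moved: restart the dwell timer
        if symbol_under_gaze is not None and now - self.dwell_start >= DWELL_SECONDS:
            self.dwell_start = now                   # avoid re-selecting immediately
            return ("select", symbol_under_gaze)     # speak/record the selected symbol
        return None
```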

Communication Setup

Communication experiments

• Goal: communicate with a partner

• Two situations (with different boards)

– Buying food

– Discussing an appointment

Experimental results

• Verification

– To verify that communication was successful, the participant indicated misunderstandings using well-known yes-no gestures, which were quick and reliable. Moreover, a certified expert in AAC was present to indicate apparent communication problems.

– 205 symbol selections happened

– 23 of them were incorrect

• 89% accuracy

– The error rate was acceptable

• Real communication took place!


Experiment 3

External Symbols (towards mixed reality setup)

External symbols

• Idea

– Not all symbols are present in the system

– Optical character recognition can help

– (Object recognition can help)

• Example

– The user wants to buy a certain type of

sandwich

– In the store, there are labels with the names of

the sandwich types on it

• The OCR was simulated

External symbols - Communication Setting


Results

• Wizard-of-Oz experiment for OCR

• Similar “good” results as in experiment 2


Technical input processing and sentence generation methods


Sentence fragment generation

• A word guessing game

– Good afternoon, how are … you?

– I apologize for being late, I am very … sorry!

– My favorite OS is … [Linux, Mac OS, Windows XP, Win CE].

• A symbol guessing game

• The LM needs to help

– To select words with the right sense (disambiguate homonyms: e.g., river bank versus money bank)

– To select hypotheses where graphical symbols (words) are ‘in the right place’

– To increase cohesion between words (agreement)

Sentence fragment generation

• Understanding symbol communication requires practice

• Symbol communication is non-syntactical

– Function words are rarely used

– Order of symbols may vary

• e.g., {lemon, sugar, tea} -> tea with lemon and sugar

• Idea

– Generate fragments by inserting function words

– Rank them based on language models

– The user selects from the top 4 variants (a minimal sketch of this idea follows below)
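A naive sketch of this idea follows; the function word list and the `lm_score` callback are placeholders (not from the slides), and brute-force enumeration is only feasible for the handful of symbols selected on a board. The prefix tree algorithm later in the deck searches the same space far more efficiently.

```python
# Naive sketch of the fragment-generation idea: try every ordering of the selected
# symbols' words, optionally insert one function word in each gap, and rank the
# results with a language model score. FUNCTION_WORDS and lm_score are placeholders.
from itertools import permutations, product

FUNCTION_WORDS = ["", "with", "and", "a"]   # "" means: insert nothing in that gap

def candidate_fragments(content_words):
    """Yield every ordering of the content words with optional function words between them."""
    for order in permutations(content_words):
        for fillers in product(FUNCTION_WORDS, repeat=len(order) - 1):
            words = [order[0]]
            for filler, word in zip(fillers, order[1:]):
                if filler:
                    words.append(filler)
                words.append(word)
            yield " ".join(words)

def top_fragments(content_words, lm_score, k=4):
    """Rank candidates by language model score and keep the k best for the user."""
    return sorted(set(candidate_fragments(content_words)), key=lm_score, reverse=True)[:k]

# Usage (lm_score could be e.g. a KenLM sentence score, see the tools slide):
# top_fragments(["lemon", "sugar", "tea"], lm_score=lambda s: model.score(s))
```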


Language Models

• Estimate the probabilities of n-grams (sequences of words)

• P(tea with sugar) > P(tea the sugar)

– Use a corpus (collection of texts)

• Sparsity problem

– Long n-grams tend to be rare

– Smoothing and backoff methods are used to deal with this problem (illustrated in the sketch below)
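A toy illustration of the relative-frequency estimate and of the sparsity problem; the miniature corpus below is invented for the example (the project used the much larger corpora listed on the following slides).

```python
# Toy illustration of maximum-likelihood n-gram estimates and of sparsity.
# The miniature corpus is invented for this example only.
from collections import Counter

corpus = ("i drink tea with sugar . i drink tea with lemon . "
          "i drink coffee with sugar .").split()

def ngram_counts(tokens, n):
    """Count all n-grams of order n in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

bigrams, unigrams = ngram_counts(corpus, 2), ngram_counts(corpus, 1)

def p_ml(word, prev):
    """Relative frequency P(word | prev) = f(prev word) / f(prev)."""
    return bigrams[(prev, word)] / unigrams[(prev,)]

print(p_ml("with", "tea"))  # 1.0 -> "tea with" is well attested in this toy corpus
print(p_ml("the", "tea"))   # 0.0 -> unseen bigram: the sparsity problem
```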

Stupid backoff

• Let $w_1^L$ denote a string of $L$ tokens over a fixed vocabulary.

• The approximation $P(w_1^L) \approx \prod_{i=1}^{L} P(w_i \mid w_{i-n+1}^{i-1})$ reflects the Markov assumption that only the most recent $n-1$ tokens are relevant when predicting the next word.

• For any substring $w_i^j$ of $w_1^L$, let $f(w_i^j)$ denote the frequency of occurrence of that substring in the training data. The maximum likelihood probability estimates for the n-grams are given by their relative frequencies: $P(w_i \mid w_{i-n+1}^{i-1}) = \frac{f(w_{i-n+1}^{i})}{f(w_{i-n+1}^{i-1})}$

• Problematic because the denominator (and numerator) can be zero.

• Appropriate conditions are needed on the r.h.s.

• Noisy: estimates need smoothing (see the backoff sketch below)
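A minimal sketch of the stupid backoff score of Brants et al. (2007): use the relative frequency when the n-gram was observed, otherwise back off to the shorter context and multiply by a constant (0.4 in the paper). The toy corpus and counting helper repeat the previous sketch so the block runs on its own; the result is a score, not a normalized probability.

```python
# Minimal stupid backoff scorer (Brants et al., 2007); alpha = 0.4 as in the paper.
from collections import Counter

corpus = ("i drink tea with sugar . i drink tea with lemon . "
          "i drink coffee with sugar .").split()

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = Counter()
for order in range(1, 4):                  # collect unigram, bigram and trigram counts
    counts.update(ngram_counts(corpus, order))

def stupid_backoff(word, context, counts, total_tokens, alpha=0.4):
    """S(word | context): relative frequency if the n-gram was seen, otherwise
    back off to a shorter context and multiply by alpha."""
    if not context:
        return counts[(word,)] / total_tokens          # unigram relative frequency
    ngram = tuple(context) + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[tuple(context)]  # seen n-gram: relative frequency
    return alpha * stupid_backoff(word, context[1:], counts, total_tokens, alpha)

print(stupid_backoff("sugar", ("tea", "with"), counts, len(corpus)))     # 0.5 (seen trigram)
print(stupid_backoff("lemon", ("coffee", "with"), counts, len(corpus)))  # 0.4 * 1/3 (backed off)
```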

Language corpora in use

• Google Books n-gram corpus

– Collection of digitized books from Google

– Very large, freely available

– Represents written language

• OpenSubtitles corpus

– Collection of film and TV subtitles

– Moderate size, freely available

– Represents spoken language

Language modeling tools used

• Google Books n-gram corpus

– Software: BerkeleyLM

• Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258–267. ACL, Stroudsburg, USA (2011)

– Method: Stupid Backoff

• Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large Language Models in Machine Translation. In: EMNLP 2007, pp. 858–867 (2007)

• OpenSubtitles corpus

– Software: KenLM

• Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011 Sixth Workshop on Statistical Machine Translation, pp. 187–197. ACL, Stroudsburg, USA (2011)

– Method: Modified Kneser-Ney smoothing

• Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney Language Model Estimation. In: 51st Annual Meeting of the Association for Computational Linguistics, pp. 690–696. Curran Associates, Inc., New York, USA (2013)
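Querying a KenLM model from Python could look like the sketch below; this assumes the kenlm Python bindings and a model file built beforehand with KenLM's lmplz tool ("opensubtitles.arpa" is a placeholder name), and it is not code from the project.

```python
# Hypothetical usage of the kenlm Python bindings to rank candidate fragments.
# 'opensubtitles.arpa' is a placeholder for a model trained separately with lmplz.
import kenlm

model = kenlm.Model("opensubtitles.arpa")

candidates = ["tea with lemon and sugar", "tea lemon and with sugar"]
# model.score returns the log10 probability of the whole sentence
best = max(candidates, key=lambda s: model.score(s, bos=True, eos=True))
print(best)
```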

Prefix tree building algorithm (1)

• Representation

– Work with a prefix tree, where each path represents an n-gram

• Input

– Set of named entities (e.g., tea, sugar)

– Set of function words (e.g., you, with, for)

– Desired length of sentence fragment (e.g., 3 words)

• Parameters

– Score threshold (minimal score, e.g., 10⁻³⁰)

– Leaf limit (maximal number of open leaves, e.g., 200)

• Output

– Tree (or a list) of potential sentence fragments

– Containing all the important words

– Ordered by estimated probability (score)

Prefix tree building algorithm (2)

• Essentially a breadth-first search, with some constraints

• Start with an empty tree; the root node is open

• While there is an open node:

– Extend all open leaves with all available words, if the resulting n-gram's “score” is above a certain threshold

– Discard every open leaf node that cannot be extended (falls below the threshold)

– Close every open leaf except the highest-scoring ones (leaf limit)

(A Python sketch of this procedure follows below.)
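A sketch of the procedure under stated assumptions: `score(words)` stands in for an n-gram language model score of a word sequence (e.g., stupid backoff over the Google Books counts), and the defaults simply echo the example parameter values from the previous slide; this is not the authors' implementation.

```python
# Sketch of the prefix-tree / breadth-first fragment search described above.
# 'score(words)' is a stand-in for a language model score of a word sequence;
# defaults echo the example parameter values from the previous slide.
def build_fragments(content_words, function_words, length, score,
                    score_threshold=1e-30, leaf_limit=200):
    """Return candidate fragments of the requested length, best first; every
    fragment must contain all of the selected content words."""
    vocabulary = list(content_words) + list(function_words)
    open_leaves = [()]                               # the root: an empty prefix
    for _ in range(length):
        extensions = []
        for prefix in open_leaves:
            for word in vocabulary:
                candidate = prefix + (word,)
                s = score(candidate)
                if s >= score_threshold:             # discard extensions below the threshold
                    extensions.append((s, candidate))
        extensions.sort(key=lambda item: item[0], reverse=True)
        open_leaves = [cand for _, cand in extensions[:leaf_limit]]   # close the rest
    ranked = sorted(((score(c), c) for c in open_leaves
                     if all(w in c for w in content_words)), reverse=True)
    return [" ".join(words) for _, words in ranked]

# e.g. build_fragments(["tea", "sugar"], ["with", "you", "for"], length=3, score=my_lm_score)
```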

[Figures: step-by-step illustration of the prefix tree construction, from the initial state through Step 6]

Generated text fragments examples

Conclusion on NLP

• A system to enable people with SSPI to communicate with natural language sentences

• Demonstrated the feasibility of the approach

Our contribution

• An interaction system to reduce communication barriers for people with severe speech and physical impairments (SSPI), such as cerebral palsy.

• The system consists of two main components: (i) the head-mounted human-computer interaction (HCI) part, consisting of smart glasses with gaze trackers and text-to-speech functionality (which implement a communication board and the selection tool), and (ii) a natural language processing pipeline in the backend that generates complete sentences from the symbols on the board.

• We developed the components to provide a smooth interaction between the user and the system, including gaze tracking, symbol selection, symbol recognition, and sentence generation.

• Our results suggest that such systems can dramatically increase the communication efficiency of people with SSPI.


Eye Tracking Requirements

• Eye gaze is a compelling interaction modality but requires user calibration before interaction can commence.

• State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious.


Possible improvements and outlook

• 3D gaze tracker that estimates the part of 3D space observed by the user

• OCR, sign, and object recognition methods to convert external “symbols” to the communication board

• New HMD and eye tracker setups (ergonomic)

• Advanced NLP to transform a series of symbols into whole domain sentences within the context

• Integrated calibration: e.g., https://www.andreas-bulling.de/fileadmin/docs/pfeuffer13_uist.pdf

• Personalisation according to the patient record

• CPS integration


http://getnarrative.com/

The Narrative Clip is a tiny, automatic camera and app that gives you a searchable and shareable photographic memory.



https://www.spaceglasses.com

Thank you for your attention!