Advances in Phonetics-based Sub-Unit Modeling for
Transcription, Alignment and Sign Language Recognition.
Vassilis Pitsikalis (1), Stavros Theodorakis (1), Christian Vogler (2) and Petros Maragos (1)
(1) School of Electrical and Computer Engineering, National Technical University of Athens
(2) Institute for Language and Speech Processing / Gallaudet University
Workshop on Gesture Recognition, June 20, 2011
Overview
1. Gestures, signs, and goals
2. Sign language data and visual processing
3. Data-Driven Sub-Units without Phonetic Evidence for Recognition
4. Phonetic modeling
   - What is it?
   - Annotations vs phonetics
   - Conversion of annotations to structured phonetic description
   - Training and alignment
5. Recognition experiments
6. Conclusions
1. Gestures versus Signs
Gestures
- Isolated hand, body, and facial movements
- Can be broken down into primitives (but rarely are in gesture recognition work)
- Few constraints, other than convention
Signs
- Hand, body, and facial movements, both in isolation and as part of sentences
- Can be broken down into primitives (cheremes/phonemes/phones)
- Numerous phonetic, morphological, and syntactic constraints
SL Recognition vs Gesture Recognition
Continuous SL recognition is invariably more complex than gesture recognition, but:
- Isolated sign recognition (i.e. the forms found in a dictionary) is essentially the same as gesture recognition
- Methods that work well on isolated sign recognition should work well on gesture recognition
- Exploit 30+ years of research into the structure of signs
Subunit Modeling
Two fundamentally different ways to break down signs into parts:
- Data-driven
- Phonetics-based (i.e. linguistics)
Similar benefits:
- Scalability
- Robustness
- Reduce required training data
Goals of this Presentation
- Work with large vocabulary (1000 signs)
- Compare data-driven and phonetic breakdown of signs into subunits
- Advance state of the field in phonetic breakdown of signs
2. Sign Language Data
Corpus of 1000 Greek Sign Language Lemmata
- 5 repetitions per sign
- Signer-dependent, 2 signers (only 1 used for this paper)
- HD video, 25 fps interlaced
Tracking and feature extraction
- Pre-processing, Configuration, Statistics, Skin color training
Interlaced data and pre-processing
[Figure panels: Interlaced; De-interlaced; Refined skin color masks; 2nd version (full resolution, frame rate)]
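The slides state only that the 25 fps interlaced video was de-interlaced; the method is not given. As a minimal sketch of one common option (field separation with line doubling, an assumption and not necessarily what the authors used), each interlaced frame can be split into its two fields, giving two progressive frames per input frame. The names `split_fields`, `line_double`, and `deinterlace` are hypothetical.

```python
def split_fields(frame):
    """Split one interlaced frame (a list of pixel rows) into its two fields."""
    top = frame[0::2]      # even rows
    bottom = frame[1::2]   # odd rows
    return top, bottom


def line_double(field):
    """Restore the original height by repeating every row of a field."""
    out = []
    for row in field:
        out.append(list(row))
        out.append(list(row))
    return out


def deinterlace(frames):
    """Turn N interlaced frames into 2N progressive frames.

    Assumes top-field-first order; real footage may differ.
    """
    result = []
    for frame in frames:
        top, bottom = split_fields(frame)
        result.append(line_double(top))
        result.append(line_double(bottom))
    return result


# Toy 4-row "frame" whose rows are tagged with their row number.
frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
progressive = deinterlace([frame])
```

Under this scheme a 25 fps interlaced stream yields 50 progressive frames per second, at the cost of halved vertical detail in each output frame.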
Tracking Video, GSL Lemmas Corpus
3. Data-Driven Subunit Modeling
- Extremely popular in SL recognition lately
- Good results
- Different approaches exist; ours is based on distinguishing between dynamic and static subunits
Dynamic-Static SU Recognition
- Dynamic (Movement) vs. Static (Position) segmentation: intuitive, yields segments + labels
- Separate modeling, SUs, clustering with respect to:
  - Feature type (e.g. static vs. dynamic features)
  - Parameters (e.g. model order) and architecture (HMM, GMM)
  - Normalized features
- Training, data-driven lexicon
- Recognize SUs, signs
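The dynamic/static split described on this slide can be illustrated with a toy segmenter. The sketch below assumes a simple speed threshold on a 2D hand trajectory; the slides do not say how the authors segment, so the criterion, the function name `segment_dynamic_static`, and the threshold value are all assumptions.

```python
import math


def segment_dynamic_static(points, threshold):
    """Label frame-to-frame steps as dynamic ("D") or static ("S") and merge runs.

    points: list of (x, y) hand positions, one per frame.
    Returns (label, start_frame, end_frame) tuples; a segment covers the
    steps between start_frame and end_frame.
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        speed = math.hypot(x1 - x0, y1 - y0)  # displacement per frame
        labels.append("D" if speed > threshold else "S")

    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


# Hand rests, moves, then rests again.
trajectory = [(0, 0), (0, 0), (0, 0), (3, 4), (6, 8), (6, 8), (6, 8)]
segments = segment_dynamic_static(trajectory, threshold=1.0)
```

Each resulting segment would then be passed to the clustering stage for its own type (movement features for "D", position features for "S").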
Dynamic-Static SU Extraction
V. Pitsikalis, S. Theodorakis and P. Maragos, Data-Driven Sub-Units and Modeling Structure for Continuous Sign Language Recognition with Multiple Cues, LREC, 2010
Dynamic-Static SU extraction
[Figure panels: Dynamic clusters; Static clusters]
4. Phonetic Modeling
- Based on modeling signs linguistically
- Little recent work
Phonetics
Phonetics: the study of the sounds that constitute a word
Equivalently: the study of the elements that constitute a sign (i.e. its “pronunciation”)
The Role of Phonetics
Words consist of smaller parts, e.g.:
  cat → /k/ /æ/ /t/
So do signs, e.g.:
  CHAIR → (HS, orientation, location, movement)
- Parts well-known: 30+ years of research
- Less clear: a good structured model
Gestures can borrow from sign inventory
The Role of Phonetics in Recognition
The most successful speech recognition systems model words in terms of their constituent phones/phonemes, not in terms of data-driven subunits
- Adding new words to dictionary
- Linguistic knowledge & robustness
Why don’t sign language recognition systems do this?
- Phonetics are complex, and phonetic annotations/lexica are expensive to create
Annotation vs Phonetic Structure
There is a difference between the annotation (writing down) of a word and its phonetic structure, which is what recognition requires
- Annotations cannot be applied directly to recognition, although an expert can infer the full pronunciation and structure from an annotation
- Annotating signs is much less time-consuming than writing out their full phonetic structure
Annotation of a Sign
Basic HamNoSys annotation of CHAIR:
[Figure: HamNoSys symbol string for CHAIR, with its parts labeled Symmetry, Handshape, Orientation, Location, Movement, Repetition]
Phonetic Structure of Signs
- Postures, Detentions, Transitions, Steady Shifts (PDTS)
  - Improved over the 1989 Movement-Hold model
- Alternating postures and transitions
- CHAIR:
  Posture (shoulder) → Trans (straight down) → Posture (chest) → Trans (back up) → Posture (shoulder) → Trans (straight down) → Det (chest)
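As a small illustration, the CHAIR sequence on this slide (posture at the shoulder, transition straight down, posture at the chest, transition back up, posture at the shoulder, transition straight down, detention at the chest) can be written as plain data. The `Segment` type, the one-letter kind codes, and the `alternates` check are hypothetical names chosen for this sketch; they are not the authors' representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str   # "P" = posture, "T" = transition, "D" = detention
    value: str  # location for P/D, movement description for T


# CHAIR as listed on the slide.
CHAIR = [
    Segment("P", "shoulder"),
    Segment("T", "straight down"),
    Segment("P", "chest"),
    Segment("T", "back up"),
    Segment("P", "shoulder"),
    Segment("T", "straight down"),
    Segment("D", "chest"),
]


def alternates(segments):
    """True if held segments (P or D) strictly alternate with transitions (T)."""
    held = {"P", "D"}
    return all(
        (a.kind in held) != (b.kind in held)
        for a, b in zip(segments, segments[1:])
    )


kinds = "".join(s.kind for s in CHAIR)
```

The `alternates` check encodes the "alternating postures and transitions" property stated on the slide, which a lexicon loader could use to reject malformed entries.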
How Expensive is Phonetic Modeling?
Basic HamNoSys annotation of CHAIR: [HamNoSys symbol string]
Same sign with full phonetic structure: [HamNoSys symbol strings], one per part:
- (starting posture)
- (downward transition & ending posture)
- (transition back & starting posture – due to repetition)
- (downward transition & ending posture)
Over 70 characters compared to just 8!
Automatic Extraction of Phonetic Structure
First contribution: Automatically extract phonetic structure of sign from HamNoSys
Combines convenience of annotations with required detail for recognition
Recovers segmentation, postures, transitions, and relative timing of hands
Based on symbolic analysis of movements, symmetries, etc.
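To make the idea of expanding a compact annotation into a full segment sequence concrete, here is a toy sketch. It does not parse real HamNoSys: the input is a hypothetical, already-decoded dictionary (keys `start`, `movement`, `end`, `back`, `repetitions` are all assumptions), and the expansion rule covers only the simple repeated-movement case shown for CHAIR.

```python
def expand(annotation):
    """Expand a simplified annotation into a posture/transition sequence.

    annotation: dict with hypothetical keys
      start, end     - locations of the starting and ending postures
      movement       - description of the main movement
      back           - description of the return movement used on repetition
      repetitions    - how many times the movement is repeated (0 = once only)
    """
    start, end = annotation["start"], annotation["end"]
    move, back = annotation["movement"], annotation["back"]

    seq = [("Posture", start), ("Trans", move), ("Posture", end)]
    for _ in range(annotation.get("repetitions", 0)):
        # A repetition implies returning to the start before moving again.
        seq += [("Trans", back), ("Posture", start),
                ("Trans", move), ("Posture", end)]

    # Mirror the slide's CHAIR example, which ends in a detention.
    seq[-1] = ("Det", end)
    return seq


chair_annotation = {
    "start": "shoulder",
    "movement": "straight down",
    "end": "chest",
    "back": "back up",
    "repetitions": 1,
}
chair_sequence = expand(chair_annotation)
```

The point of the sketch is the size difference: five short fields expand into seven explicit segments, which is the detail a recognizer needs but an annotator would rather not write by hand.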
Training and Alignment of Phonetic SUs
Second contribution: Train classifiers based on phonetic structure, and align with data to recover frame boundaries
- Frame boundaries not needed for recognition, but can be used for further data-driven analysis
Classifiers based on HMMs – why?
- Proven track record for this type of task
- No explicit segmentation required, just concatenate SUs, use Baum-Welch training
- Trivial to scale up lexicon size to 1000s of signs
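The "just concatenate SUs" point can be sketched as building one left-to-right transition matrix per sign from per-subunit models. This is an illustrative construction only: the subunit names, state counts, and self-loop probabilities below are made up, and Baum-Welch re-estimation itself is not shown.

```python
def concatenate(su_models, lexicon_entry):
    """Chain left-to-right subunit HMMs into one sign-level HMM.

    su_models: dict mapping a subunit name to a list of self-loop
               probabilities, one per state (hypothetical format).
    lexicon_entry: ordered list of subunit names making up the sign.
    Returns (states, A): states is a list of (subunit, state_index) labels,
    and A is an n x (n + 1) matrix whose last column is the exit state.
    """
    states, stays = [], []
    for su in lexicon_entry:
        for k, stay in enumerate(su_models[su]):
            states.append((su, k))
            stays.append(stay)

    n = len(states)
    A = [[0.0] * (n + 1) for _ in range(n)]
    for i, stay in enumerate(stays):
        A[i][i] = stay            # remain in the same state
        A[i][i + 1] = 1.0 - stay  # advance to the next state (or exit)
    return states, A


# Made-up subunit inventory; values are illustrative, not trained.
su_models = {
    "P:shoulder": [0.8],
    "T:straight down": [0.6, 0.6],
    "P:chest": [0.8],
}
states, A = concatenate(
    su_models, ["P:shoulder", "T:straight down", "P:chest"]
)
```

Because each sign is only a list of subunit names, adding a sign to the lexicon adds no new model parameters when its subunits already exist, which is why lexicon size scales easily.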
Phonetic Models to HMM
Posture (shoulder) → Trans (straight down) → Posture (chest) → Trans (back up) → Posture (shoulder) → Trans (straight down) → Det (chest)
…
Phonetic Subunit Training, Alignment
[Figure: aligned segments labeled E, T, T, E and P, P, P. Transition/Epenthesis segments are shown as superimposed initial and end frames plus an arrow; Posture/Detention segments are shown as a single frame.]
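Recovering frame boundaries from a known segment sequence is a forced-alignment problem. The sketch below is a simplified Viterbi pass under stated assumptions: one state per segment, uniform transition scores, and per-frame log-likelihoods supplied by the caller. The function names and the toy scores are hypothetical, and the input must have at least as many frames as states.

```python
def forced_align(loglik):
    """Assign each frame to one state of a left-to-right chain.

    loglik[t][s] is the log-likelihood of frame t under state s.
    The path starts in state 0, ends in the last state, and may only
    stay or advance by one state per frame. Returns the state per frame.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s] = move + loglik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + loglik[t][s]
                back[t][s] = s

    # Trace back from the final state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Frames at which the aligned state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Toy scores: frames 0-1 fit state 0, frames 2-3 state 1, frames 4-5 state 2.
good, bad = 0.0, -5.0
loglik = [
    [good, bad, bad], [good, bad, bad],
    [bad, good, bad], [bad, good, bad],
    [bad, bad, good], [bad, bad, good],
]
path = forced_align(loglik)
cuts = boundaries(path)
```

The boundary frames are what the slides describe as a by-product of alignment: not needed for recognition itself, but usable for further data-driven analysis of each segment.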
Phonetic Sub-Units
Transition/Epenthesis
Phonetic Sub-units
Postures
5. Recognition based on both Data-Driven SUs + Phonetic Transcriptions
[Figure: block diagrams of the two systems]
1. Data-Driven Subunits (based on Dynamic-Static)
   - Blocks: Visual Front-End, Segmentation, Sub-Units + Training, Decoding
   - Output: Recognize Signs, Data-driven SUs
2. Data + Phonetic Subunits
   - Blocks: Visual Front-End, PDTS System (PDTS Sequence), Segmentation, Alignment, Sub-units + Training, Decoding
   - Output: Recognize Signs + PDTS Sequence + Alignment
Data-Driven vs. Phonetic Subunits Recognition
[Figure panels: (1) Varying dynamic SUs and method, static SUs = 300, #signs = 961; (2) Varying #signs and method]
6. Conclusions and The Big Picture
- Rich SL corpora annotations are rare (in contrast to speech)
- Human annotations of sign language (HamNoSys) are expensive and subjective, and contain errors and inconsistencies
- HamNoSys contains no time structure
- Data-driven approaches are efficient but construct abstract subunits
- Convert HamNoSys to PDTS; gain time structure and sequentiality
- Construct meaningful phonetics-based SUs
- Further exploit the PDTS + phonetic SUs
  - Correct human annotations automatically
  - Valuable for SU-based SL recognition, continuous SLR, adaptation, integration of multiple streams
Thank you!
The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 231135.
Theodore Goulas contributed the HamNoSys annotations for GSL.