Iowa State University Developmental Robotics Laboratory
Unsupervised Segmentation of Audio Speech using the Voting Experts Algorithm
Matthew Miller, Alexander Stoytchev
Developmental Robotics Lab
Department of Electrical and Computer Engineering Iowa State University
[email protected], [email protected]/~mamille/
Language: A Grand Challenge
• A working example
• Automatically acquires language
• Well studied
Statistical Learning Experiments
• Saffran et al. (1996): 8-month-olds can segment speech.
Artificial language: tupiro golabu bedaku padoti
Language:         tu   pi   ro   go   la   bu   be   da   ku  ...
Transition prob.:    1.0  1.0  .25  1.0  1.0  .25  1.0  1.0 ...
[Figure labels: Acclimate, Novel Word]
• Hypothesis: Infants use local minima in single syllable transition probabilities to segment speech streams.
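The hypothesis above can be illustrated with a minimal sketch that estimates syllable transition probabilities from a stream and cuts at their local minima. The word ordering `ORDER` and the helper names are assumptions made for illustration, and the boundary probabilities this toy ordering produces are not the values shown on the slide.

```python
from collections import Counter

WORDS = ["tupiro", "golabu", "bedaku", "padoti"]
# Hypothetical word order: no word repeats immediately, and every word is
# followed by each of the other three words exactly once.
ORDER = [0, 1, 2, 3, 1, 0, 2, 1, 3, 0, 3, 2, 0]


def syllables(word):
    """Split a word into two-letter syllables, e.g. 'tupiro' -> tu, pi, ro."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def transition_probs(stream):
    """Estimate Pr(next syllable | current syllable) from bigram counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment_at_minima(stream, probs):
    """Insert a word break wherever the transition probability is a strict local minimum."""
    tp = [probs[(a, b)] for a, b in zip(stream, stream[1:])]
    words, current = [], [stream[0]]
    for i in range(1, len(stream)):
        here = tp[i - 1]  # probability of the transition into stream[i]
        left = tp[i - 2] if i >= 2 else float("inf")
        right = tp[i] if i < len(tp) else float("inf")
        if here < left and here < right:
            words.append("".join(current))
            current = []
        current.append(stream[i])
    words.append("".join(current))
    return words


stream = [s for k in ORDER for s in syllables(WORDS[k])]
probs = transition_probs(stream)
segmented = segment_at_minima(stream, probs)
print(segmented)
```

Within-word transitions come out at 1.0 because every syllable belongs to exactly one word, so each word boundary is the only dip in its neighborhood.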
Voting Experts
• An algorithm for unsupervised segmentation
• Key Idea: Natural “chunks” have:
– Low Internal Information
– High Boundary Entropy
itwasabrightcolddayinaprilandtheclockswere
Internal information: $I(\text{"bright"}) = -\log \Pr(\text{"bright"})$
Low inside a chunk: $I(\text{"bright"}) < I(\text{"rightc"})$
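Boundary entropy: $E(\text{"bright"}) = \sum_{c} \Pr(c \mid \text{"bright"}) \, I(c \mid \text{"bright"})$, with $I(c \mid \cdot) = -\log \Pr(c \mid \cdot)$ and $c$ ranging over the symbols that follow "bright"
High at a chunk boundary: $E(\text{"bright"}) > E(\text{"brigh"})$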
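A minimal sketch of the two measures, estimated from raw n-gram counts. The toy text and the function names are assumptions for illustration, and probabilities are plain relative frequencies with no smoothing.

```python
import math
from collections import Counter


def ngram_counts(text, max_n):
    """Count every n-gram of length 1..max_n in the text."""
    return Counter(
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    )


def internal_information(chunk, counts, text_len):
    """-log Pr(chunk), with Pr estimated among n-grams of the same length."""
    total = text_len - len(chunk) + 1
    if counts[chunk] == 0:
        return math.inf  # unseen chunk: infinitely surprising
    return -math.log(counts[chunk] / total)


def boundary_entropy(chunk, counts):
    """Entropy of the distribution over symbols that follow the chunk."""
    successors = [
        n for gram, n in counts.items()
        if len(gram) == len(chunk) + 1 and gram.startswith(chunk)
    ]
    total = sum(successors)
    if total == 0:
        return 0.0  # chunk never followed by anything in this text
    return -sum((n / total) * math.log(n / total) for n in successors)


# Hypothetical toy corpus: "bright" recurs and is followed by varied symbols.
text = "brightcoldbrightsunbrightdaybrightmoon"
counts = ngram_counts(text, max_n=7)

i_bright = internal_information("bright", counts, len(text))
i_rightc = internal_information("rightc", counts, len(text))
e_bright = boundary_entropy("bright", counts)
e_brigh = boundary_entropy("brigh", counts)
print(i_bright < i_rightc, e_bright > e_brigh)
```

In this toy text "brigh" is always followed by "t", so its boundary entropy is zero, while "bright" is followed by four different symbols.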
VE Implementation (Cohen 2006)
1. Build an n-gram trie from text.
2. Slide a window along the text sequence
3. Two experts vote how to break the window
   1. One minimizes internal info
   2. Other maximizes boundary entropy
i t w a s a b r i g h t c o l d d a y i n a p r i l
[Figure: a window highlighted on the sequence]
Expert 1: $\min_{s,t \,:\, s.t = \text{window}} [I(s) + I(t)]$, e.g. $I(\text{"as"}) + I(\text{"abrig"})$
Expert 2: $\max_{s,t \,:\, s.t = \text{window}} [E(s)]$, e.g. $E(\text{"asa"})$
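The window-and-vote step might be sketched as follows. This is a simplified illustration rather than the cited implementation: it uses a flat n-gram counter instead of a trie, estimates probabilities as raw relative frequencies, and skips any normalization of scores across n-gram lengths. All function names are hypothetical.

```python
import math
from collections import Counter


def build_counts(text, max_n):
    """Flat stand-in for the n-gram trie: counts of all n-grams up to max_n."""
    return Counter(
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    )


def info(chunk, counts, text_len):
    """Internal information: -log of the chunk's relative frequency."""
    return -math.log(counts[chunk] / (text_len - len(chunk) + 1))


def entropy(chunk, counts):
    """Boundary entropy: entropy over the symbols that follow the chunk."""
    succ = [
        n for gram, n in counts.items()
        if len(gram) == len(chunk) + 1 and gram.startswith(chunk)
    ]
    total = sum(succ)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in succ)


def cast_votes(text, window):
    """votes[k] counts the votes for a break just before text[k]."""
    counts = build_counts(text, window)
    votes = [0] * (len(text) + 1)
    for start in range(len(text) - window + 1):
        w = text[start:start + window]
        cuts = range(1, window)  # candidate split points inside the window
        # Expert 1: split that minimizes the summed internal information.
        k_info = min(
            cuts,
            key=lambda k: info(w[:k], counts, len(text)) + info(w[k:], counts, len(text)),
        )
        # Expert 2: split whose left part has maximal boundary entropy.
        k_ent = max(cuts, key=lambda k: entropy(w[:k], counts))
        votes[start + k_info] += 1
        votes[start + k_ent] += 1
    return votes


sample = "itwasabrightcolddayinaprilandtheclockswere"
sample_votes = cast_votes(sample, window=4)
print(sample_votes)
```

Each window contributes exactly two votes, one per expert, and no vote ever lands on the very start or end of the text.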
4. Break at vote peaks
i t w a s a b r i g h t c o l d d a y i n a p r i l
Votes at each boundary:
i | t | w | a | s | a | b | r | i | g | h | t | c | o | l | d
  0   3   1   0   3   2   0   1   1   0   0   6   1   0   0
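Breaking at vote peaks can be sketched with the vote counts from this slide, read left to right as one count per boundary between adjacent letters (that alignment is an assumption about the flattened figure). The strict local-maximum rule and the `threshold` parameter are also assumptions for illustration.

```python
def break_at_peaks(text, votes, threshold=0):
    """Cut after text[i] when votes[i] is a strict local maximum above threshold.

    votes[i] is the vote count for the boundary between text[i] and text[i + 1].
    """
    pieces, start = [], 0
    for i, v in enumerate(votes):
        left = votes[i - 1] if i > 0 else -1
        right = votes[i + 1] if i + 1 < len(votes) else -1
        if v > threshold and v > left and v > right:
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces


slide_text = "itwasabrightcold"
slide_votes = [0, 3, 1, 0, 3, 2, 0, 1, 1, 0, 0, 6, 1, 0, 0]
chunks = break_at_peaks(slide_text, slide_votes)
print(chunks)  # ['it', 'was', 'abright', 'cold']
```

Under this reading the boundary after "a" receives 2 votes but sits next to a 3, so it is not a peak and "a" stays attached to "bright".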
VE Results
• Results are surprisingly good on text
– Especially given its simplicity
– Accuracy and hit rate about 75%
• Seems to capture something about the nature of “chunks”
• Can we use this algorithm to segment real audio?
[Figure: “It was a br igh t”]
Acoustic Model
• Cluster spectral features using a GGSOM
• Collapse state sequence
• Run VE to get breaks
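The "collapse state sequence" step can be sketched as run-length collapsing of per-frame cluster labels. The clustering itself is not shown; the integer labels below are hypothetical stand-ins for the states a clustering model would assign to each audio frame, and keeping the start frame of each run is an assumption that lets later breaks be mapped back to time.

```python
from itertools import groupby


def collapse_states(states):
    """Merge consecutive repeats into (state, start_frame, run_length) tuples."""
    collapsed, pos = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        collapsed.append((state, pos, length))
        pos += length
    return collapsed


# Hypothetical per-frame cluster labels.
frame_states = [3, 3, 3, 7, 7, 3, 1, 1]
collapsed = collapse_states(frame_states)
symbols = [state for state, _, _ in collapsed]
print(collapsed)  # [(3, 0, 3), (7, 3, 2), (3, 5, 1), (1, 6, 2)]
print(symbols)    # [3, 7, 3, 1]
```

The `symbols` list is the kind of discrete sequence a segmentation algorithm such as VE could then be run on.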
Experiments and Results
• Used the model to segment “1984”
– CD 1 of audio book (40 mins)
– Chosen for length, consistency
– Evaluation: Human graders
New Experiments
• Trained on infant datasets
• Tested on manually generated keys
Stream A: tupiro golabu bedaku padoti
Stream B: dapiku tilado pagotu burobi
[Diagram: Stream A trains Acoustic Model A and VE Model A, tested against Key A; Stream B trains Acoustic Model B and VE Model B, tested against Key B.]
New Experiments (test configuration)
[Diagram, test arrows only: Acoustic Model A, VE Model A; Acoustic Model B, VE Model B; the keys appear in swapped order (Key B, Key A).]
Results
• Experiment 1
– Accuracy: 50% on all induced breaks
– Hit Rate: 75% of word breaks
– Significantly better than chance
• Experiment 2
– Accuracy: 16% on all induced breaks
– Hit Rate: 1% of word breaks
– Worse than chance
– 18 breaks, 3 correct
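The two scores can be sketched as follows, reading the slide's wording as: accuracy is the fraction of induced breaks that are correct, and hit rate is the fraction of true word breaks that were found. Those definitions are inferred from the bullet text, and the break positions below are made up for illustration.

```python
def score_breaks(induced, true_breaks):
    """Return (accuracy, hit_rate) for two collections of break positions."""
    induced, true_breaks = set(induced), set(true_breaks)
    correct = induced & true_breaks
    accuracy = len(correct) / len(induced) if induced else 0.0
    hit_rate = len(correct) / len(true_breaks) if true_breaks else 0.0
    return accuracy, hit_rate


# Hypothetical break positions (indices into a symbol sequence).
acc, hit = score_breaks(induced=[2, 5, 9, 12], true_breaks=[2, 5, 8, 12, 15, 20])
print(acc, hit)  # 0.75 0.5

# Arithmetic check against the Experiment 2 bullet: 3 correct out of 18 induced.
exp2_accuracy = 3 / 18
print(round(exp2_accuracy, 3))  # 0.167
```

Three correct out of 18 induced breaks is about 16.7%, consistent with the 16% accuracy listed for Experiment 2.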
Conclusions and Future Work
• VE Model can be used to segment audio
• Can reproduce the results of infant studies
• May model part of the human chunking mechanism
• Have built more sophisticated acoustic models
– Better results (nearly perfect)
Thank You
• www.cs.iastate.edu/~mamille/