BUCLD2011
Statistical word segmentation of Zipfian frequency distributions
Chigusa Kurumada Linguistics, Stanford
Stephan C. Meylan Psychology, Stanford
Michael C. Frank Psychology, Stanford
2
Segmentation of running speech
I l o ve y o u
Saffran, Newport et al. (1996); Saffran, Aslin et al. (1996); Jusczyk (1997); Perruchet et al. (1998); Aslin (1998); Brent (1999); Swingley (2005); Thiessen et al. (2005); Monaghan & Christiansen (2010), among others
3
Example
Listen to a Japanese-speaking mother’s speech and find “words”
4
Where is your daddy?
5
Words occur at different frequencies
6
The naturalistic word frequency distribution
Zipfian distribution
Zipf (1965)
7
This talk
Effects of a Zipfian distribution of word frequencies in speech segmentation
• 2 large-scale web-based segmentation experiments
The skewed distribution supports word segmentation
• Implications for existing models
8
A potential problem for statistical word segmentation?
Pre-tty-ba-by
TP = 0.2
(Saffran, Newport, & Aslin, 1996)
(Goldwater et al., 2009)
Uniform Zipfian
TP = 1.0
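The TP values on this slide refer to forward transitional probabilities between adjacent syllables. As a minimal sketch of how such values are computed, assuming the standard definition TP(x → y) = count(xy) / count(x): the toy word list and the function name below are hypothetical and are not the experimental stimuli.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(x followed by y) / count(x as first element)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable never starts a pair, so it is excluded from the denominator.
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Hypothetical toy stream: three two-syllable words in an illustrative order.
words = [["pre", "tty"], ["ba", "by"], ["pre", "tty"],
         ["do", "ggy"], ["ba", "by"], ["pre", "tty"]]
stream = [syllable for word in words for syllable in word]
tps = transitional_probabilities(stream)

# Within-word transition: "pre" is always followed by "tty".
print(tps[("pre", "tty")])
# Across a word boundary: "tty" is followed by different syllables.
print(tps[("tty", "ba")])
```

In this toy stream the within-word transition is fully predictable while the transition across the word boundary is not, which is the cue that TP-based segmentation relies on.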
9
Question 1: Is segmentation of a Zipfian language more difficult?
6 types
12 types
24 types
36 types
uniform
zipfian
10
Experiment 1: Task (on Mechanical Turk)
Exposure: 300 word tokens
Subjects: 246 individuals in the 8 conditions
(6, 12, 24, or 36 types × uniform/Zipfian)
Test: two-alternative forced-choice (2AFC) task
go-la-bu vs. la-bu-bi
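The design above (300 word tokens spread over 6 to 36 word types, with uniform or Zipfian frequencies) can be sketched as follows. This is an illustration under stated assumptions, not the authors' stimulus-generation code: it assumes Zipfian frequencies proportional to 1/rank, at least one token per type, and a random token order. The lexicon and function names are hypothetical.

```python
import random

def type_frequencies(n_types, n_tokens, zipfian):
    """Return per-type token counts that sum to n_tokens."""
    if zipfian:
        # Assumption: frequency is proportional to 1 / rank.
        weights = [1.0 / rank for rank in range(1, n_types + 1)]
    else:
        weights = [1.0] * n_types
    total = sum(weights)
    counts = [max(1, round(n_tokens * w / total)) for w in weights]
    # Absorb rounding drift in the most frequent type so the total is exact.
    counts[0] += n_tokens - sum(counts)
    return counts

def exposure_stream(lexicon, n_tokens, zipfian, seed=0):
    """Build a shuffled list of word tokens with the requested distribution."""
    counts = type_frequencies(len(lexicon), n_tokens, zipfian)
    tokens = [word for word, count in zip(lexicon, counts) for _ in range(count)]
    random.Random(seed).shuffle(tokens)
    return tokens

# Hypothetical six-word lexicon, for illustration only.
lexicon = ["golabu", "mogola", "bidaku", "tupiro", "padoti", "kuvemi"]
uniform_counts = type_frequencies(len(lexicon), 300, zipfian=False)
zipfian_counts = type_frequencies(len(lexicon), 300, zipfian=True)
stream = exposure_stream(lexicon, 300, zipfian=True)
```

With the same total number of tokens, the Zipfian condition gives a few types many more exposures and most types fewer, which is why per-word token frequency is the natural predictor to examine.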
11
Results 1: Proportion correct in each condition
[Figure: proportion correct by number of word types (6, 12, 24, 36), in two panels: Uniform and Zipfian]
12
Results 2: Effects of (log) input token frequency
13
Experiment 1: Summary
The standard 2AFC paradigm
• Robust segmentation ability
• Strong effects of unigram (log) frequencies
No effects of uniform vs. Zipfian
14
Which one’s Daddy? Is it Daddy? That’s Daddy. Is that Daddy too?
Segmentation from the chunk-finding perspective
Chunking (Orban et al. 2008)
Bortfeld et al. (2005)
mommy’s sock familiar new
Brent & Cartwright (1996), Brent (1999), Goldwater et al. (2009), Perruchet & Vinter (1998)
Dahan & Brent (1999), Conway et al. (2010), van de Weijer (2001), Cunillera et al. (2010), Lew-Williams et al. (2011)
15
Question 2
6 9 12 24
uniform
zipfian
Is segmentation based on a Zipfian distribution more accurate when words are presented in context?
16
Experiment 2: Task
Orthographic manual segmentation (50 sentences)
Unlike the 2AFC (go-la-bu vs. mo-go-la):
• words are presented in context
• active search for words
• time-course of learning
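The results for this task are reported as recall (% correct). The slides do not spell out the scoring rule, so the following is only one plausible sketch: it assumes a word token counts as correct when the participant marks both of its boundaries and places no boundary inside it. The function names and example sentence are hypothetical.

```python
def word_spans(words):
    """Map a list of words to the set of (start, end) character spans they occupy."""
    spans, position = set(), 0
    for word in words:
        spans.add((position, position + len(word)))
        position += len(word)
    return spans

def token_recall(gold_words, response_words):
    """Share of gold word tokens whose exact span appears in the response."""
    if "".join(gold_words) != "".join(response_words):
        raise ValueError("response must segment the same character string")
    gold, response = word_spans(gold_words), word_spans(response_words)
    return len(gold & response) / len(gold)

# Hypothetical sentence and participant response.
gold = ["golabu", "mogola", "golabu"]
response = ["golabu", "mogo", "lagolabu"]
recall = token_recall(gold, response)
```

Here only the first word is recovered exactly, so recall for this sentence is one third. Computing the score per sentence is what allows the time-course of learning to be plotted across trials.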
17
Results 1: 6 word types
[Figure: recall (% correct) across trials, Uniform vs. Zipfian]
18
[Figure: recall (% correct), Uniform vs. Zipfian, in four panels: 6, 9, 12, and 24 word types]
19
A mixed logit model predicting correct segmentation
Log frequency (p<0.001)
Log frequency (p=0.9)
Log frequency (p<0.001)
target word / word before / word after
20
Contextual bootstrapping
The average log frequency of all the words that appeared on the left (p<0.001)
No main effect or interaction with the distribution types (i.e., uniform vs. Zipfian).
The average log frequency of all the words that appeared on the right (p<0.07)
target word
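The two context predictors on this slide, the average log frequency of the words to the left and to the right of a target word, can be computed along these lines. This is a sketch under assumptions: frequencies are counted over the same set of sentences, natural log is used, and a missing context (sentence-initial or sentence-final target) is recorded as `None`. The function name and toy sentences are hypothetical.

```python
import math
from collections import Counter

def context_log_frequency(sentences):
    """Return (word, mean log freq of left words, mean log freq of right words) per token."""
    freq = Counter(word for sentence in sentences for word in sentence)
    rows = []
    for sentence in sentences:
        logs = [math.log(freq[word]) for word in sentence]
        for i, word in enumerate(sentence):
            left, right = logs[:i], logs[i + 1:]
            mean_left = sum(left) / len(left) if left else None
            mean_right = sum(right) / len(right) if right else None
            rows.append((word, mean_left, mean_right))
    return rows

# Hypothetical toy corpus: "a" occurs twice, "b" and "c" once each.
sentences = [["a", "b"], ["a", "c"]]
rows = context_log_frequency(sentences)
```

Each row could then be entered as a predictor alongside the target word's own log frequency in a regression on segmentation accuracy.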
21
Experiment 2: Summary
[Figure: Zipfian vs. uniform]
• Clear advantage of a Zipfian distribution
• The advantage is mediated by (log) token frequency
22
Conclusion
I l o ve y o u
The Zipfian structure of natural language supports word recognition in context
23
Thanks to: Stanford Language Cognition Lab, Eve Clark, Tom Wasow, Dan Jurafsky, and Noah Goodman (Stanford), T. Florian Jaeger (University of Rochester), Josh Tenenbaum (MIT)
For a full text of this paper, visit the Stanford Language Cognition Lab website: http://langcog.stanford.edu/publications.html
Thank you!
24
Meghan Sumner website: http://www.stanford.edu/~sumner/