Jamie Alexandre
≠
=
wouldyoulikeacookiejason
Grammatical Complexity: The Chomsky Hierarchy
Recursion
• Something containing an instance of itself.
Recursion in Language
The dog walked down the street.
The dog the cat rode walked down the street.
The dog the cat the rat grabbed rode walked down the street.
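The nesting in these sentences can be generated by a function that calls itself: each clause places its noun, then the entire embedded clause, then its own verb. A minimal Python sketch; `center_embed` and the word lists are illustrative names taken from the example sentences, not part of the study's materials.

```python
NOUNS = ["the dog", "the cat", "the rat"]
VERBS = ["walked", "rode", "grabbed"]  # VERBS[i] is the verb whose subject is NOUNS[i]


def center_embed(depth):
    """Build a sentence with `depth` clauses, each nested inside the one before it."""
    if not 1 <= depth <= len(NOUNS):
        raise ValueError(f"depth must be between 1 and {len(NOUNS)}")

    def clause(i):
        # A clause is: its noun, then the whole embedded clause (if any), then its verb.
        if i == depth:
            return []
        return [NOUNS[i]] + clause(i + 1) + [VERBS[i]]

    sentence = " ".join(clause(0)) + " down the street."
    return sentence[0].upper() + sentence[1:]


for d in range(1, 4):
    print(center_embed(d))
```

Each extra level of depth adds one noun on the left and its verb on the right, so the first noun's verb ends up last.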
Recursion: “Stack” Memory
The dog the cat the rat grabbed rode walked down the street.
Nouns held in memory: DOG, CAT, RAT
Matching verbs: WALK, RIDE, GRAB
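The stack idea can be sketched directly: nouns are pushed as they arrive, and each verb pops the most recent unmatched noun, which yields the RAT/GRAB, CAT/RIDE, DOG/WALK pairing. `match_with_stack` and the token labels are hypothetical, for illustration only.

```python
VERB_TOKENS = {"GRAB", "RIDE", "WALK"}


def match_with_stack(tokens):
    """Pair each verb with its subject, using a last-in, first-out stack of nouns."""
    stack, pairs = [], []
    for token in tokens:
        if token in VERB_TOKENS:
            if not stack:
                raise ValueError(f"verb {token!r} has no pending subject")
            pairs.append((stack.pop(), token))  # most recent unmatched noun
        else:
            stack.append(token)  # noun: hold it until its verb arrives
    if stack:
        raise ValueError(f"nouns still waiting for a verb: {stack}")
    return pairs


print(match_with_stack(["DOG", "CAT", "RAT", "GRAB", "RIDE", "WALK"]))
# [('RAT', 'GRAB'), ('CAT', 'RIDE'), ('DOG', 'WALK')]
```

The memory needed grows with the depth of embedding, which is one way to see why deeply center-embedded sentences become hard to process.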
“Limited performance…”
“Infinite competence…”
??
SRN: Simple Recurrent Network (Elman, 1990)
• Some ability to use longer contexts
• Incremental learning: no looking back
• No “rules”: distributed representation
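A forward pass of an Elman-style SRN fits in a few lines: the hidden layer sees the current input plus a copy of its own previous state (the context units), and a softmax output gives a probability for each possible next button. This is a sketch with random, untrained weights; training (backpropagation on the prediction error) is omitted, and the class name, layer sizes, and weight ranges are assumptions.

```python
import math
import random


class SimpleRecurrentNetwork:
    """Forward pass of an Elman-style SRN with random, untrained weights (a sketch)."""

    def __init__(self, n_symbols, n_hidden, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_symbols = n_symbols
        self.w_in = matrix(n_hidden, n_symbols)   # input -> hidden
        self.w_ctx = matrix(n_hidden, n_hidden)   # context units -> hidden
        self.w_out = matrix(n_symbols, n_hidden)  # hidden -> output
        self.context = [0.0] * n_hidden           # copy of the previous hidden state

    def step(self, symbol):
        """Read one symbol (an index) and return a probability for each possible next symbol."""
        x = [1.0 if i == symbol else 0.0 for i in range(self.n_symbols)]  # one-hot input
        hidden = []
        for row_in, row_ctx in zip(self.w_in, self.w_ctx):
            net = sum(w * v for w, v in zip(row_in, x)) + sum(
                w * c for w, c in zip(row_ctx, self.context)
            )
            hidden.append(1.0 / (1.0 + math.exp(-net)))  # logistic activation
        self.context = hidden  # the only memory: earlier inputs are not stored themselves
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]
```

Because the context units carry forward a compressed summary rather than the full history, the network's reach into longer contexts is limited and learned, matching the "some ability" bullet above.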
PCFG: Probabilistic Context-Free Grammar
• Easily handles recursive structure, long-range context
• Hierarchical, “rule”-based representation
• More computationally complex, non-incremental learning
S → NP VP    0.8
N’ → AdjP N’    0.65
N’ → N    0.35
Adj → green    0.1
…
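In a PCFG, the probability of a derivation is the product of the probabilities of the rules it uses. The sketch below assumes the slide pairs each rule with the number beside it (S → NP VP 0.8, N’ → AdjP N’ 0.65, N’ → N 0.35, Adj → green 0.1); the rules AdjP → Adj and N → frog and their probabilities are hypothetical placeholders added so that a complete derivation exists.

```python
import math

# (left-hand side, right-hand side) -> probability
RULES = {
    ("S", ("NP", "VP")): 0.8,      # from the slide
    ("N'", ("AdjP", "N'")): 0.65,  # from the slide
    ("N'", ("N",)): 0.35,          # from the slide
    ("Adj", ("green",)): 0.1,      # from the slide
    ("AdjP", ("Adj",)): 1.0,       # hypothetical placeholder
    ("N", ("frog",)): 0.05,        # hypothetical placeholder
}


def derivation_probability(rules_used):
    """Probability of a derivation: the product of the probabilities of its rules."""
    return math.prod(RULES[rule] for rule in rules_used)


# Derivation of the N' "green frog": N' -> AdjP N', AdjP -> Adj, Adj -> green, N' -> N, N -> frog
green_frog = [
    ("N'", ("AdjP", "N'")),
    ("AdjP", ("Adj",)),
    ("Adj", ("green",)),
    ("N'", ("N",)),
    ("N", ("frog",)),
]
print(derivation_probability(green_frog))
```

Because N’ → AdjP N’ can reapply to its own output, each additional adjective multiplies in another factor of 0.65 (times the adjective's own rules), so the grammar handles unbounded recursion while assigning longer derivations lower probability.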
Serial Reaction Time (SRT) Study
• Buttons flash in short sequences
– “press the button as quickly as possible when it lights up”
• Dependent measure: RT
– time from light onset to correct button press
• Subjects seem to be making sequential predictions: RT decreases as P(button | context) increases
– also: RT ∝ −log P(button | context) (“surprisal”, e.g. Hale, 2001; Levy, 2008)
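Surprisal is the negative log probability of the button that actually lights up, given the preceding context; low-probability events get high surprisal and, on this account, slower RTs. A minimal sketch; measuring in bits (log base 2) is an assumption, and changing the base only rescales the values without changing a correlation.

```python
import math


def surprisal(p):
    """Surprisal in bits: -log2 P(button | context)."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return -math.log2(p)


for p in (0.9, 0.5, 0.1):
    print(f"P = {p}: surprisal = {surprisal(p):.2f} bits")
```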
Training the Humans
• Eight subjects per experimental condition
• Same sequences, different mappings
• Broken into 16 blocks, with breaks
• About an hour of button-pressing total
• Emphasized speed, while minimizing errors
Training the Models
• Trained on exactly the same sequences as the humans, but not fit to human data
• Predictions at every point based solely on sequences seen prior to that point
• Results in a sequence of probabilities
– correlated with the sequence of human RTs through surprisal (negative log probability)
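The model side of this procedure can be sketched with a bigram model, one of the simplest sequence predictors: each trial is scored using only counts from trials before it, converted to surprisal, and then correlated with RTs. This assumes add-one smoothing, treats all trials as one continuous stream, and uses a plain Pearson correlation; these are illustrative choices, not necessarily the study's.

```python
import math
from collections import Counter, defaultdict


def incremental_bigram_surprisal(sequence, alphabet):
    """Surprisal of each item given the previous one, using only earlier counts.

    Add-one (Laplace) smoothing keeps unseen transitions above zero; the first
    item has no context and is scored against a uniform distribution.
    """
    counts = defaultdict(Counter)  # counts[previous][next]
    surprisals = []
    prev = None
    for item in sequence:
        if prev is None:
            p = 1 / len(alphabet)
        else:
            row = counts[prev]
            p = (row[item] + 1) / (sum(row.values()) + len(alphabet))
        surprisals.append(-math.log2(p))
        if prev is not None:
            counts[prev][item] += 1  # learn only after predicting: no looking ahead
        prev = item
    return surprisals


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With a subject's button sequence and matching RTs, the fit would be `pearson(incremental_bigram_surprisal(buttons, alphabet), rts)`; nothing in the model is tuned to the RTs themselves.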
Analysis
A Case Study in Recursion: Palindromes
A C L Q L C A
(Sequences of length 5 through 15; total of 3728 trials per subject)
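A palindrome sequence of this kind can be generated by choosing a first half, placing the unique middle button, and mirroring the first half. The sketch assumes odd lengths (so a single middle position exists, as in A C L Q L C A), that buttons in the first half may repeat, and uses hypothetical button labels.

```python
import random


def make_palindrome(length, buttons, middle, rng):
    """Mirror-image sequence with a unique `middle` button at the center."""
    if length % 2 == 0:
        raise ValueError("length must be odd so that there is one middle position")
    if middle in buttons:
        raise ValueError("the middle button must not appear elsewhere in the sequence")
    first_half = [rng.choice(buttons) for _ in range(length // 2)]
    return first_half + [middle] + first_half[::-1]


rng = random.Random(0)
print(" ".join(make_palindrome(7, ["A", "C", "L"], "Q", rng)))
```

Once the middle button appears, the rest of the sequence is fully determined, but only for a learner that remembers the first half in order.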
[Figure: Correlation (surprisal vs RT) by blocks 1-4, 5-8, 9-12, 13-16 (average of 233 trials per block); PCFG vs SRN]
[Figure: Correlation (surprisal vs RT) by blocks 1-4, 5-8, 9-12, 13-16 (average of 233 trials per block); PCFG vs SRN]
“Did you notice any patterns?”
Subjects with no awareness of pattern:
“No”, “None”, “Not really” (n=5)
Those with explicit awareness of pattern:
“Circular pattern”, “Mirror pattern” (n=3)
SRN (implicit task performance)
PCFG (explicit task performance)
Will this replicate?
[Figure: Correlation (surprisal vs RT) by block, 1-16; Implicit, didn't notice (n=8); PCFG vs SRN]
• Differences between individuals?
– or actually between modes of processing?
• What if we explicitly train subjects on the pattern?
• First half implicit, second half explicit
“This is the middle button in every sequence (and it only occurs in the middle position, halfway through the sequence):
This means that as soon as you see this button, you know that the sequence will start to reverse.
Here are some example sequences of various lengths:
Explicit Training Worksheet and Quiz Sheet
“Now, complete these sequences using the same pattern (crossing out any unneeded boxes at the end of a sequence):”
[Figure: Correlation (surprisal vs RT) by block, 1-16; Fully explicit from middle (n=8); PCFG vs SRN; explicit instruction given midway through]
[Figure: Palindromes: The effect of explicit instruction after block 8. RT vs percentage of the way through the sequence, in two-block bins (Blocks 1-2 through Blocks 15-16), before explicit instruction and after]
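Context-free vs Context-sensitive
Paired buttons: A, B, C, D, each with a first (1) and second (2) member
CFG: nested (mirror-image) pairing of first and second members
CSG: crossed pairing of first and second members
Explicit Instruction (after block 4)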
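The difference between the two grammars can be framed by the memory each requires: nested (mirror-image) pairings are checked with a stack (last in, first out), while crossed pairings need a queue (first in, first out), which is why crossed dependencies generally fall outside what a context-free grammar can describe. The crossed order A1 B1 C1 A2 B2 C2 is an assumption about the CSG condition, based on the standard crossed-dependency pattern; the function names are hypothetical.

```python
from collections import deque


def nested(letters):
    """Context-free (mirror-image) order, e.g. A1 B1 C1 C2 B2 A2."""
    return [c + "1" for c in letters] + [c + "2" for c in reversed(letters)]


def crossed(letters):
    """Crossed order (assumed for the CSG condition), e.g. A1 B1 C1 A2 B2 C2."""
    return [c + "1" for c in letters] + [c + "2" for c in letters]


def pairs_match(sequence, memory):
    """Check that every '2' item closes the right '1' item.

    memory="stack" resolves the most recent open item first (LIFO);
    memory="queue" resolves the oldest open item first (FIFO).
    """
    pending = deque()
    for item in sequence:
        letter, member = item[0], item[1]
        if member == "1":
            pending.append(letter)
        elif not pending:
            return False
        else:
            expected = pending.pop() if memory == "stack" else pending.popleft()
            if expected != letter:
                return False
    return not pending
```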
Methods
• Four conditions, with 8 subjects in each
– Implicit context-free grammar (CFG)
– Implicit context-sensitive grammar (CSG)
– Explicit context-free grammar (CFG)
– Explicit context-sensitive grammar (CSG)
• Total of 640 sequences (4,120 trials) per subject
– Sequences of length 4, 6, 8, and 10
– Around 1.5 hours of button-pressing
– In blocks 9-16, 5% of the trials were “errors”
A1 B1 C1 C2 B2 A2
D2
[Figure: RT vs percentage of the way through the sequence, one panel per condition (Explicit CSG, Explicit CFG, Implicit CSG, Implicit CFG); Blocks 1-4, Blocks 5-8, Blocks 9-12 (errors thicker), Blocks 13-16 (errors thicker)]
[Figure: RT (ms) on non-error vs error trials by condition (Implicit CFG, Explicit CFG, Implicit CSG, Explicit CSG); annotated differences, left to right: ** (6 ms), ** (27 ms), (2 ms), ** (11 ms)]
Conclusions
• Explicit/Implicit processing
– Implicit performance correlated with the predictions made by an SRN (a connectionist model)
– Explicit performance correlated with the predictions made by a PCFG (a rule-based model)
• Grammatical complexity
– Able to process context-free, recursive structures at a very rapid timescale
– More limited ability to process context-sensitive structures
Future Directions
• Longer training
• More complex grammars
– Determinism
• Other response measures
– EEG: more sensitive than RTs to initial stages of learning
• Field studies in Switzerland or Brazil…?
Broader Goals
• L2-learning pedagogy
Thank-yous!
Mentorship: Jeff Elman, Roger Levy, Marta Kutas
Advice: Micah Bregman, Ben Cipollini, Vicente Malave, Nathaniel Smith, Angela Yu, Rachel Mayberry, Tom Urbach
Andrea, Seana and the 3rd Year Class!
Research Assistants: Frances Martin (2010), Ryan Cordova (2009), Wai Ho Chiu (2009)
[Figure: RT (ms) by condition (Implicit CFG, Explicit CFG, Implicit CSG, Explicit CSG) at positions relative to an error: error position - 2, error position - 1*, error position*, error position + 1, error position + 2]
Negative probability plotted against smoothed RTs
[Figure: grid of panels by block group (Blocks 1-4, 5-8, 9-12, 13-16) and model (bigram, trigram, hmm5, ihmm, srn (one pass), pcfg8)]
Surprisal plotted against smoothed RTs
[Figure: grid of panels by block group (Blocks 1-4, 5-8, 9-12, 13-16) and model (bigram, trigram, hmm5, ihmm, srn (one pass), pcfg8)]
AGL and Language
• Areas associated with syntax may be involved
– Bahlmann, Schubotz, and Friederici (2008). Hierarchical artificial grammar processing engages Broca's area. NeuroImage, 42(2):525-534.
• P600-like effects can be seen in AGL
– Christiansen, Conway, and Onnis (2007). Neural Responses to Structural Incongruencies in Language and Statistical Learning Point to Similar Underlying Mechanisms.
– “violations in an artificial grammar can elicit late positivities qualitatively and topographically comparable to the P600 seen with syntactic violations in natural language”
Sanity Check: Effect is Local
[Figure: Correlation (probability vs RT) as a function of lag, from -15 to 20]
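The lag analysis can be sketched as correlating the model probability at trial t with the RT at trial t + lag; if the effect is local, the correlation should be concentrated near lag 0. A minimal sketch; the sign convention for the lag and the use of Pearson correlation are assumptions, and `lagged_correlation` is a hypothetical name.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(probs, rts, lag):
    """Correlate probs[t] with rts[t + lag] over every t where both exist."""
    if len(probs) != len(rts):
        raise ValueError("probs and rts must be aligned trial by trial")
    n = len(probs)
    if lag >= 0:
        xs, ys = probs[: n - lag], rts[lag:]
    else:
        xs, ys = probs[-lag:], rts[: n + lag]
    return pearson(xs, ys)
```

Sweeping `lag` over `range(-15, 21)` gives one correlation per lag, the same range as the figure's x-axis.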
Context-free Grammar
The dog the cat the rat grabbed rode walked.
S → NP VP
NP → N
NP → N S
N → the dog | the cat | the rat
VP → grabbed | rode | walked
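The rules S → NP VP and NP → N | N S can be turned into a small recognizer that mirrors their recursion: `parse_s` and `parse_np` each return every position where their constituent could end, so NP → N S is tried alongside NP → N. This is a sketch, and the tokenizer that joins "the" with the following noun is a simplifying assumption. Like the grammar, it checks structure only, not which verb goes with which noun.

```python
NOUNS = {"the dog", "the cat", "the rat"}  # N -> the dog | the cat | the rat
VERBS = {"grabbed", "rode", "walked"}      # VP -> grabbed | rode | walked


def tokenize(sentence):
    """Split into words, joining 'the' with the following noun (simplifying assumption)."""
    words = sentence.lower().rstrip(".").split()
    tokens, i = [], 0
    while i < len(words):
        if words[i] == "the" and i + 1 < len(words):
            tokens.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def parse_s(tokens, i):
    """S -> NP VP: return every position where an S starting at i can end."""
    ends = set()
    for k in parse_np(tokens, i):
        if k < len(tokens) and tokens[k] in VERBS:
            ends.add(k + 1)
    return ends


def parse_np(tokens, i):
    """NP -> N | N S: return every position where an NP starting at i can end."""
    if i >= len(tokens) or tokens[i] not in NOUNS:
        return set()
    return {i + 1} | parse_s(tokens, i + 1)


def accepts(sentence):
    tokens = tokenize(sentence)
    return len(tokens) in parse_s(tokens, 0)


print(accepts("The dog the cat the rat grabbed rode walked."))
```

Each recursive call starts one token further along, so the recursion always terminates.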