1
David Chen
Advisor: Raymond Mooney
Research Preparation Exam
August 21, 2008
Learning to Sportscast: A Test of Grounded Language Acquisition
2
Semantics of Language
• The meaning of words, phrases, etc.
• Crucial in communication
• Example: “Spanish goalkeeper Iker Casillas blocks the ball”
– Merriam-Webster: (transitive verb) to interfere usually legitimately with (as an opponent) in various games or sports
– WordNet: (v) parry, deflect
3
Language Grounding
• Problem: We are circularly defining the meanings of words in terms of other words.
• The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc.
– Symbol Grounding: Harnad (1990)
• Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc.
– Lakoff and Johnson’s Metaphors We Live By
– “It’s difficult to put my ideas into words.”
– “Interest in competitions is up.”
6
Natural Language and Meaning Representation
Natural Language (NL): a language that has evolved naturally, such as English, German, French, Chinese, etc.
Meaning Representation Language (MRL): a formal language, such as logic or any computer-executable code
Casillas blocks the ball
Block(Casillas)
7
Semantic Parsing and Tactical Generation
Semantic Parsing (NL → MRL): maps a natural-language sentence to a complete, detailed semantic representation
Tactical Generation (MRL → NL): generates a natural-language sentence from a meaning representation
Casillas blocks the ball
Block(Casillas)
8
Learning Approach
Manually Annotated Training Corpora (NL/MRL pairs) → Semantic Parser Learner → Semantic Parser (NL → MRL)
9
Learning Approach
Manually Annotated Training Corpora (NL/MRL pairs) → Tactical Generator Learner → Tactical Generator (MRL → NL)
10
Example of Annotated Training Corpus
Natural Language (NL)               Meaning Representation Language (MRL)
Alice passes the ball to Bob        Pass(Alice, Bob)
Bob turns the ball over to John     Turnover(Bob, John)
John passes to Fred                 Pass(John, Fred)
Fred shoots for the goal            Kick(Fred)
Paul blocks the ball                Block(Paul)
Paul kicks off to Nancy             Pass(Paul, Nancy)
…                                   …
11
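The MR side of the corpus is just flat predicate-argument strings. As a minimal sketch (the function name and the tuple representation are illustrative, not part of the original system), such strings could be parsed like this:

```python
import re

def parse_mr(mr):
    """Parse a flat MR string such as 'Pass(Alice, Bob)' into a
    (predicate, args) tuple; zero-argument MRs like 'ballstopped'
    come back with an empty args tuple."""
    m = re.match(r"(\w+)\s*\(([^)]*)\)", mr.strip())
    if m is None:                        # no parentheses: nullary predicate
        return mr.strip(), ()
    pred, argstr = m.group(1), m.group(2)
    args = tuple(a.strip() for a in argstr.split(",")) if argstr.strip() else ()
    return pred, args
```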
Example of Annotated Training Corpus
Natural Language (NL)               Meaning Representation Language (MRL)
Alice passes the ball to Bob        P1(C1, C2)
Bob turns the ball over to John     P2(C2, C3)
John passes to Fred                 P1(C3, C4)
Fred shoots for the goal            P3(C4)
Paul blocks the ball                P4(C5)
Paul kicks off to Nancy             P5(C5, C6)
…                                   …
12
Learning Language from Perceptual Context
• Constructing annotated corpora for language learning is difficult
• Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment
• Ideally, a computer system can learn language in the same manner
13
Goals
• Learn to ground the semantics of language
(video frame grounding the word “Block”)
• Learn language through correlated linguistic and visual inputs
16
Challenge
A linguistic input may correspond to many possible events
Pass(GermanyPlayer1, GermanyPlayer2)
Kick(GermanyPlayer2)
Block(SpanishGoalie)
“西班牙守門員擋下了球” (“The Spanish goalkeeper blocks the ball”)
17
Overview
• Sportscasting task
• Related work
• Tactical generation
• Strategic generation
• Human evaluation
18
Learning to Sportscast
• Robocup Simulation League games
• No speech recognition
– Record commentaries in text form
• No computer vision
– Rule-based system to automatically extract game events in symbolic form
• Concentrate on linguistic issues
20
Learning to Sportscast
• Learn to sportscast by observing sample human sportscasts
• Build a function that maps between natural language (NL) and meaning representation (MR)
– NL: textual commentaries about the game
– MR: predicate logic formulas that represent events in the game
21
Robocup Sportscaster Trace

Natural Language Commentary:
Purple goalie turns the ball over to Pink8
Pink11 looks around for a teammate
Pink8 passes the ball to Pink11
Purple team is very sloppy today
Pink11 makes a long pass to Pink8
Pink8 passes back to Pink11

Meaning Representation:
badPass ( Purple1, Pink8 )
turnover ( Purple1, Pink8 )
pass ( Pink11, Pink8 )
pass ( Pink8, Pink11 )
ballstopped
pass ( Pink8, Pink11 )
kick ( Pink11 )
kick ( Pink8 )
kick ( Pink11 )
kick ( Pink11 )
kick ( Pink8 )
22
Robocup Sportscaster Trace

Natural Language Commentary:
Purple goalie turns the ball over to Pink8
Pink11 looks around for a teammate
Pink8 passes the ball to Pink11
Purple team is very sloppy today
Pink11 makes a long pass to Pink8
Pink8 passes back to Pink11

Meaning Representation:
P6 ( C1, C19 )
P5 ( C1, C19 )
P2 ( C22, C19 )
P2 ( C19, C22 )
P0
P2 ( C19, C22 )
P1 ( C22 )
P1 ( C19 )
P1 ( C22 )
P1 ( C22 )
P1 ( C19 )
23
Robocup Data
• Collected human textual commentary for the 4 Robocup championship games from 2001-2004.
– Avg # events/game = 2,613
– Avg # sentences/game = 509
• Each sentence matched to all events within previous 5 seconds.
– Avg # MRs/sentence = 2.5 (min 1, max 12)
• Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
24
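The ambiguous supervision described above pairs each sentence with every event in the preceding five seconds. A minimal sketch, assuming timestamped (time, text) tuples as input (the function name and data layout are illustrative):

```python
def ambiguous_pairs(sentences, events, window=5.0):
    """Pair each (time, sentence) with every (time, MR) event occurring in
    the preceding `window` seconds -- the ambiguous supervision built here."""
    pairs = []
    for s_time, sentence in sentences:
        candidates = [mr for e_time, mr in events
                      if s_time - window <= e_time <= s_time]
        pairs.append((sentence, candidates))
    return pairs
```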
Overview
• Sportscasting task
• Related work
• Tactical generation
• Strategic generation
• Human evaluation
25
Semantic Parser Learners
• Learn a function from NL to MR
NL: “Purple3 passes the ball to Purple5”
MR: Pass ( Purple3, Purple5 )
Semantic Parsing (NL → MR)
Tactical Generation (MR → NL)
• We experiment with two semantic parser learners
– WASP (Wong & Mooney, 2006; 2007)
– KRISP (Kate & Mooney, 2006)
26
WASP: Word Alignment-based Semantic Parsing
• Uses statistical machine translation techniques
– Synchronous context-free grammars (SCFG) [Wu, 1997; Melamed, 2004; Chiang, 2005]
– Word alignments [Brown et al., 1993; Och & Ney, 2003]
• Capable of both semantic parsing and tactical generation
27
KRISP: Kernel-based Robust Interpretation by Semantic Parsing
• Productions of MR language are treated like semantic concepts
• SVM classifier is trained for each production with string subsequence kernel
• These classifiers are used to compositionally build MRs of the sentences
• More resistant to noisy supervision but incapable of tactical generation
28
KRISPER: KRISP with EM-like Retraining
• Extension of KRISP that learns from ambiguous supervision [Kate & Mooney, 2007]
• Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence.
29
KRISPER
(same sportscaster trace as above, with every sentence linked to all of its candidate MRs)
1. Assume every possible meaning for a sentence is correct
30
KRISPER

(same trace; each sentence’s candidate links are weighted uniformly, e.g. 1/2 or 1/3)

2. Resulting NL-MR pairs are weighted and given to semantic parser learner
31
KRISPER
(same sportscaster trace as above, now with confidence scores on each candidate link, e.g. 0.87, 0.85, 0.13)

3. Estimate the confidence of each NL-MR pair using the resulting trained semantic parser
32
KRISPER

(same trace and confidence scores as above; “Purple team is very sloppy today” is left unmatched)

4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]
33
KRISPER

(same trace as above; “Purple team is very sloppy today” remains unmatched)
5. Give the best pairs to the semantic parser learner in the next iteration, and repeat until convergence
34
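Step 4 uses the Munkres (Hungarian) algorithm for maximum weighted bipartite matching. The brute-force search below is a toy stand-in that illustrates the same objective on small inputs; the function name and the score-dictionary format are assumptions for illustration:

```python
from itertools import permutations

def best_matching(scores):
    """Maximum-weight bipartite matching between sentences and MRs,
    by brute force over assignments (a toy stand-in for the
    Munkres/Hungarian algorithm used in the real system).
    scores: dict mapping (sentence_id, mr_id) -> confidence."""
    sentence_ids = sorted({s for s, _ in scores})
    mr_ids = sorted({m for _, m in scores})
    best, best_total = {}, float("-inf")
    for assignment in permutations(mr_ids, len(sentence_ids)):
        pairs = list(zip(sentence_ids, assignment))
        if all(p in scores for p in pairs):  # skip assignments with no edge
            total = sum(scores[p] for p in pairs)
            if total > best_total:
                best_total, best = total, dict(pairs)
    return best
```

On the trace above, this is what forces each MR to pair with at most one sentence, so a high-confidence link (e.g. 0.87) can push a competing sentence onto its second-best MR.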
Overview
• Sportscasting task
• Related work
• Tactical generation
• Strategic generation
• Human evaluation
35
Tactical Generation
• Learn how to generate NL from MR
• Example: Pass(Pink2, Pink3) → “Pink2 kicks the ball to Pink3”
• Two steps
1. Disambiguate the training data
2. Learn a language generator
36
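A hand-coded caricature of step 2: tactical generation realizes an MR as text. These templates are purely illustrative; WASP learns equivalent mappings (as synchronous CFG rules) from the disambiguated data rather than using fixed templates:

```python
# Illustrative, hand-written templates; WASP learns equivalent mappings
# (as synchronous CFG rules) from the disambiguated data instead.
TEMPLATES = {
    "Pass": "{0} kicks the ball to {1}",
    "Kick": "{0} kicks the ball",
    "Turnover": "{0} loses the ball to {1}",
}

def generate(pred, args):
    """Realize an MR as a natural-language sentence via its template."""
    return TEMPLATES[pred].format(*args)
```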
WASPER
• WASP with EM-like retraining to handle ambiguous training data.
• Same augmentation as added to KRISP to create KRISPER.
37
• First train KRISPER to disambiguate the data
• Then train WASP on the resulting unambiguously supervised data.
KRISPER-WASP
38
WASPER-GEN
• Determines the best matching based on generation (MR→NL).
• Score each potential NL/MR pair by using the currently trained WASP⁻¹ generator.
• Compute NIST MT score [NIST report, 2002] between the generated sentence and the potential matching sentence.
39
NIST scores
Target: Purple2 passes quickly to Purple3
Candidate: Purple2 passes to Purple3
1-grams: Purple2, passes, to, Purple3 (4/4 matched)
2-grams: Purple2 passes, passes to, to Purple3 (2/3 matched)
3-grams: Purple2 passes to, passes to Purple3 (0/2 matched)
4-gram: Purple2 passes to Purple3 (0/1 matched)
40
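The per-n counts above can be reproduced with a simple n-gram overlap check. This is only the co-occurrence core of the NIST metric (the real score also weights n-grams by information gain and applies a brevity penalty), and the function name is illustrative:

```python
def ngram_matches(candidate, target, max_n=4):
    """For each n up to max_n, count candidate n-grams that also occur in
    the target, returning {n: (matched, total_candidate_ngrams)}."""
    cand, ref = candidate.split(), target.split()
    results = {}
    for n in range(1, max_n + 1):
        cand_ngrams = [tuple(cand[i:i + n]) for i in range(len(cand) - n + 1)]
        ref_ngrams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
        matched = sum(1 for g in cand_ngrams if g in ref_ngrams)
        results[n] = (matched, len(cand_ngrams))
    return results
```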
System Overview

Sportscaster and Robocup Simulator jointly produce the Ambiguous Training Data

Ambiguous Training Data
Sentences: Purple7 loses the ball to Pink2; Pink2 kicks the ball to Pink5; Pink5 makes a long pass to Pink8; Pink8 shoots the ball
Candidate MRs: Turnover ( purple7 , pink2 ); Pass ( pink5 , pink8 ); Pass ( purple5 , purple7 ); Kick ( pink2 ); Pass ( pink2 , pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 )

Disambiguation yields the Unambiguous Training Data, which trains the Semantic Parser Learner to produce the Semantic Parser

Unambiguous Training Data
Sentences: Purple7 loses the ball to Pink2; Pink2 kicks the ball to Pink5; Pink5 makes a long pass to Pink8; Pink8 shoots the ball
Matched MRs: Turnover ( purple7 , pink2 ); Pass ( pink2 , pink5 ); Kick ( pink5 ); Kick ( pink8 )
KRISPER and WASPER
(same pipeline diagram as the System Overview, with the Semantic Parser Learner instantiated as KRISP or WASP)
WASPER-GEN
(same pipeline diagram, with a Tactical Generator Learner (WASP) producing a Tactical Generator in place of the semantic parser components)
Systems

System                                     Disambiguation             Learning language generator
WASP (lower baseline)                      Random                     WASP
KRISPER (Kate & Mooney, 2007)              KRISP                      N/A
WASPER                                     WASP                       WASP
KRISPER-WASP                               KRISP                      WASP
WASPER-GEN                                 WASP’s language generator  WASP
WASP with gold matching (upper baseline)   N/A                        WASP
44
Systems

(same systems table as above; the Disambiguation/Matching column is the one evaluated next)
45
Matching
• 4 Robocup championship games from 2001-2004.
– Avg # events/game = 2,613
– Avg # sentences/game = 509
• Leave-one-game-out cross-validation
• Metrics:
– Precision: % of system’s annotations that are correct
– Recall: % of gold-standard annotations correctly produced
– F-measure: harmonic mean of precision and recall
46
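Treating each annotation as a sentence-MR pair, the three metrics reduce to set overlap; a minimal sketch (the set-of-pairs representation and function name are assumptions for illustration):

```python
def prf(system, gold):
    """Precision, recall, and F-measure of predicted sentence-MR
    annotations against the gold-standard matching."""
    system, gold = set(system), set(gold)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```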
Matching Results
(bar chart: average F-measure on leave-one-out experiments for WASP, KRISPER, WASPER, and WASPER-GEN; y-axis 0 to 0.8)
47
Systems

(same systems table as above; the Learning language generator column is the one evaluated next, for tactical generation)
48
Tactical Generation
• 4 Robocup championship games from 2001-2004.
– Avg # events/game = 2,613
– Avg # sentences/game = 509
• Leave-one-game-out cross-validation
• NIST score [NIST report, 2002]
– Evaluates the quality of machine translations based on matching n-grams
49
Tactical Generation Results
(bar chart: average NIST score on leave-one-out experiments for WASP, WASPER, KRISPER-WASP, WASPER-GEN, and WASP with gold matching; y-axis 0 to 6)
50
Overview
• Sportscasting task
• Related work
• Tactical generation
• Strategic generation
• Human evaluation
51
Strategic Generation
• Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).
• For automated sportscasting, one must be able to effectively choose which events to describe.
52
Example of Strategic Generation
pass ( purple7 , purple6 )
ballstopped
kick ( purple6 )
pass ( purple6 , purple2 )
ballstopped
kick ( purple2 )
pass ( purple2 , purple3 )
kick ( purple3 )
badPass ( purple3 , pink9 )
turnover ( purple3 , pink9 )
53
Strategic Generation
• For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster.
• Requires correct NL/MR matching
– Use estimated matching from tactical generation
– Iterative Generation Strategy Learning
54
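Given a matching, the per-event-type estimate is just the ratio of commented occurrences to total occurrences; a minimal sketch with illustrative names:

```python
from collections import Counter

def describe_probs(all_events, commented_events):
    """For each event type, estimate the probability that the sportscaster
    comments on it: (# occurrences commented on) / (# occurrences)."""
    occurred = Counter(all_events)
    commented = Counter(commented_events)
    return {etype: commented[etype] / count
            for etype, count in occurred.items()}
```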
Iterative Generation Strategy Learning (IGSL)
• Directly estimates the likelihood of an event being commented on
• Self-training iterations to improve estimates
• Uses events not associated with any NL as negative evidence
55
Strategic Generation Performance
• Evaluate how well the system can predict which events a human comments on
• Metrics:
– Precision: % of system’s annotations that are correct
– Recall: % of gold-standard annotations correctly produced
– F-measure: harmonic mean of precision and recall
56
Strategic Generation Results
(bar chart: average F-measure on leave-one-game-out cross-validation for matchings inferred from WASP, KRISPER, WASPER, and WASPER-GEN, for IGSL, and for the gold matching; y-axis 0 to 0.8)
57
Overview
• Sportscasting task
• Related work
• Tactical generation
• Strategic generation
• Human evaluation
58
Human Evaluation (Quasi Turing Test)
• 4 fluent English speakers as judges
• 8 commented game clips
– 2-minute clips randomly selected from each of the 4 games
– Each clip commented once by a human, and once by the machine
• Presented in random counter-balanced order
• Judges were not told which ones were human- or machine-generated
59
Demo Clip
• Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation.
• FreeTTS was used to synthesize speech from textual output.
60
Human Evaluation
Commentator   English Fluency   Semantic Correctness   Sportscasting Ability
Human         3.94              4.25                   3.63
Machine       3.44              3.56                   2.94
Difference    0.50              0.69                   0.69

Score   English Fluency   Semantic Correctness   Sportscasting Ability
5       Flawless          Always                 Excellent
4       Good              Usually                Good
3       Non-native        Sometimes              Average
2       Disfluent         Rarely                 Bad
1       Gibberish         Never                  Terrible
61
Future Work
• Expand MRs beyond simple logic formulas
• Apply approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005)
• Apply approach to captioned images or video, using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)
62
Conclusion
• Current language learning work uses expensive, unrealistic training data.
• We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment.
• We have evaluated it on the task of learning to sportscast simulated Robocup games.
• The system learns to sportscast almost as well as humans.