Learning with lookahead: Can history-based models rival globally optimized models?
Yoshimasa Tsuruoka, Japan Advanced Institute of Science and Technology (JAIST)
Yusuke Miyao, National Institute of Informatics (NII)
Jun’ichi Kazama, National Institute of Information and Communications Technology (NICT)
History-based models
• Structured prediction problems in NLP
  – POS tagging, named entity recognition, parsing, …
• History-based models
  – Decompose the structured prediction problem into a series of classification problems
• Have been widely used in many NLP tasks
  – MEMMs (Ratnaparkhi, 1996; McCallum et al., 2000)
  – Transition-based parsers (Yamada & Matsumoto, 2003; Nivre et al., 2006)
• Becoming less popular
Part-of-speech (POS) tagging
• Perform multi-class classification at each word
• Features are defined on observations (i.e. words) and the POS tags on the left
I saw a dog with eyebrows
[Figure: each of the six words has the candidate tags N, V, D, P]
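A minimal sketch of history-based tagging as described on this slide: one multi-class decision per word, using the word and the tags already assigned to its left. The names `greedy_tag` and `toy_score`, the lexicon, and the feature weights are all hypothetical, chosen only for illustration; they are not taken from the talk.

```python
from typing import Callable, List, Sequence

TAGS = ["N", "V", "D", "P"]

# Hypothetical scoring function signature: (words, position, tag history, candidate tag) -> score
ScoreFn = Callable[[Sequence[str], int, List[str], str], float]


def greedy_tag(words: Sequence[str], score: ScoreFn) -> List[str]:
    """Tag left to right, one classification per word, conditioning on earlier tags."""
    history: List[str] = []
    for i in range(len(words)):
        best = max(TAGS, key=lambda t: score(words, i, history, t))
        history.append(best)
    return history


def toy_score(words: Sequence[str], i: int, history: List[str], tag: str) -> float:
    """Toy scorer (assumption): one observation feature plus one history feature."""
    lexicon = {"I": "N", "saw": "V", "a": "D", "dog": "N", "with": "P", "eyebrows": "N"}
    s = 1.0 if lexicon.get(words[i]) == tag else 0.0
    # History feature: a noun is more likely right after a determiner.
    if history and history[-1] == "D" and tag == "N":
        s += 0.5
    return s


tags = greedy_tag("I saw a dog with eyebrows".split(), toy_score)
```

Each decision is final once made, which is the weakness that lookahead is meant to address.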
Dependency parsing
I saw a dog with eyebrows
• Operations: Shift, ReduceL, ReduceR
• Parser state: STACK and QUEUE
[Animation over 11 slides: the words remaining on the stack and queue as operations are applied]
I saw a dog with eyebrows
saw a dog with eyebrows
saw dog with eyebrows
saw dog with
saw dog
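The Shift / ReduceL / ReduceR operations can be sketched as a small transition system. The exact definitions below are an assumption (ReduceL makes the top of the stack the head of the item beneath it, ReduceR does the reverse); the slides do not spell them out. The operation sequence `OPS` is likewise a hypothetical derivation for the example sentence.

```python
from typing import List, Sequence, Tuple

# State = (stack, queue, arcs); each arc is (head, dependent).
State = Tuple[List[str], List[str], List[Tuple[str, str]]]


def apply_op(state: State, op: str) -> State:
    """Apply one parser operation and return the new state (inputs are not mutated)."""
    stack, queue, arcs = state
    if op == "Shift" and queue:
        return stack + [queue[0]], queue[1:], arcs
    if op == "ReduceL" and len(stack) >= 2:
        head, dep = stack[-1], stack[-2]  # top of stack takes a left dependent
        return stack[:-2] + [head], queue, arcs + [(head, dep)]
    if op == "ReduceR" and len(stack) >= 2:
        head, dep = stack[-2], stack[-1]  # second item takes a right dependent
        return stack[:-1], queue, arcs + [(head, dep)]
    raise ValueError(f"illegal operation {op!r} in state {state}")


def parse(words: Sequence[str], ops: Sequence[str]) -> State:
    """Run a fixed operation sequence from the initial state (empty stack, full queue)."""
    state: State = ([], list(words), [])
    for op in ops:
        state = apply_op(state, op)
    return state


OPS = ["Shift", "Shift", "ReduceL",      # I <- saw
       "Shift", "Shift", "ReduceL",      # a <- dog
       "Shift", "Shift", "ReduceR",      # with -> eyebrows
       "ReduceR",                        # dog -> with
       "ReduceR"]                        # saw -> dog
stack, queue, arcs = parse("I saw a dog with eyebrows".split(), OPS)
```

In a history-based parser, each entry of `OPS` would be chosen by a classifier looking at the current stack and queue rather than supplied in advance.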
Lookahead
• Playing chess
  – "If I move this pawn, then the knight will be captured by that bishop, but then I can …"
POS tagging with lookahead
• Consider all possible sequences of future tagging actions to a certain depth
I saw a dog with eyebrows
[Figure: the first four words are tagged N V D N; the candidate tags for the remaining words "with" and "eyebrows" are explored]
Dependency parsing
I saw a dog with eyebrows
• Operations: Shift, ReduceL, ReduceR
• Parser state: STACK and QUEUE
[Animation over 2 slides: from the state "saw dog with eyebrows", each candidate operation is followed by a further choice of Shift, ReduceL, or ReduceR; one lookahead state shown is "saw with eyebrows"]
Choosing the best action by search
[Figure: search tree. From the current state S, the actions a1, a2, …, am lead to the states S1, S2, …, Sm. Searching below each of these states down to the search depth yields the best reachable states S1*, S2*, …]
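A minimal sketch of choosing an action by fixed-depth search, under stated assumptions: the callbacks `actions`, `step`, and `score` are hypothetical interfaces, and scoring only the states at the search frontier (or terminal states) is one plausible reading of the figure, not a confirmed detail of the authors' algorithm. The toy score table is invented so that greedy choice and lookahead disagree.

```python
from typing import Callable, List, Tuple

ToyState = Tuple[int, ...]


def lookahead_value(state, depth: int, actions: Callable, step: Callable, score: Callable) -> float:
    """Best score among states reachable from `state` with `depth` more actions."""
    legal = actions(state)
    if depth == 0 or not legal:
        return score(state)
    return max(lookahead_value(step(state, a), depth - 1, actions, step, score) for a in legal)


def choose_action(state, depth: int, actions: Callable, step: Callable, score: Callable):
    """Pick the action whose subtree contains the best-scoring state."""
    return max(actions(state),
               key=lambda a: lookahead_value(step(state, a), depth, actions, step, score))


# Toy problem (assumption): a state is the tuple of binary actions taken so far.
SCORES = {(0,): 1.0, (1,): 0.5, (0, 0): 1.0, (0, 1): 1.2, (1, 0): 3.0, (1, 1): 0.0}


def toy_actions(state: ToyState) -> List[int]:
    return [0, 1] if len(state) < 2 else []


def toy_step(state: ToyState, action: int) -> ToyState:
    return state + (action,)


def toy_score(state: ToyState) -> float:
    return SCORES.get(state, 0.0)


greedy = choose_action((), 0, toy_actions, toy_step, toy_score)
with_lookahead = choose_action((), 1, toy_actions, toy_step, toy_score)
```

With depth 0 the locally better action 0 wins; with depth 1 the search sees that action 1 leads to the highest-scoring state and picks it instead.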
Search
Decoding cost
• Time complexity: O(nm^(D+1))
  – n: number of actions to complete the structure
  – m: average number of possible actions at each state
  – D: search depth
• Time complexity of k-th order CRFs: O(nm^(k+1))
• History-based models with k-depth lookahead are comparable to k-th order CRFs in terms of training/testing time
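As a worked check of the O(nm^(D+1)) figure, the sketch below counts the frontier states an exhaustive lookahead would score. It assumes exactly m legal actions at every state (the slide says "average"), and the function names are hypothetical.

```python
def count_leaf_evaluations(m: int, depth: int) -> int:
    """Frontier states scored for ONE decision: m actions, each expanded `depth` levels."""
    def leaves(d: int) -> int:
        if d == 0:
            return 1
        return sum(leaves(d - 1) for _ in range(m))
    return sum(leaves(depth) for _ in range(m))


def decoding_cost(n: int, m: int, depth: int) -> int:
    """Total frontier states scored over the n decisions needed to build the structure."""
    return n * count_leaf_evaluations(m, depth)


# Example: 6 words, 4 candidate tags, search depth 2.
cost = decoding_cost(n=6, m=4, depth=2)
```

For these values each decision scores 4^3 = 64 states, so the whole sentence costs 6 × 64 = 384 evaluations, matching n·m^(D+1).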
Perceptron learning with lookahead
[Figure: from the current state, the actions a1, a2, …, am lead to the states S1, S2, …, Sm; lookahead search finds the best reachable states S1*, S2*, …, Sm*; one action is marked as the correct action]
• Linear scoring model: $w \cdot \phi(S_1^*)$
• Update without lookahead: $w \leftarrow w + \phi(S_1) - \phi(S_k)$
• Update with lookahead: $w \leftarrow w + \phi(S_1^*) - \phi(S_k^*)$
• Guaranteed to converge
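A minimal sketch of one perceptron training step with lookahead, under stated assumptions: features are sparse dicts, the callbacks `actions`, `step`, and `phi` are hypothetical interfaces, and the update adds the features of the best lookahead state under the correct action and subtracts those under the predicted action. This is a plain perceptron illustration, not necessarily the exact learner used in the experiments.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


def dot(w: Dict[str, float], feats: Dict[str, float]) -> float:
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def best_state(state, depth: int, actions: Callable, step: Callable, phi: Callable, w) -> Tuple[float, object]:
    """Return (score, state) of the best state reachable with `depth` more actions."""
    legal = actions(state)
    if depth == 0 or not legal:
        return dot(w, phi(state)), state
    return max((best_state(step(state, a), depth - 1, actions, step, phi, w) for a in legal),
               key=lambda pair: pair[0])


def train_step(state, correct, depth: int, actions: Callable, step: Callable, phi: Callable, w):
    """Predict with lookahead; on a mistake, update w in place. Returns the predicted action."""
    results = {a: best_state(step(state, a), depth, actions, step, phi, w) for a in actions(state)}
    predicted = max(results, key=lambda a: results[a][0])
    if predicted != correct:
        for f, v in phi(results[correct][1]).items():
            w[f] += v
        for f, v in phi(results[predicted][1]).items():
            w[f] -= v
    return predicted


# Toy setup (assumption): a state is a tuple of tags, one feature per tag present.
def toy_actions(state):
    return ["N", "V"] if len(state) < 1 else []


def toy_step(state, action):
    return state + (action,)


def toy_phi(state):
    return {f"tag={t}": 1.0 for t in state}


weights = defaultdict(float)
first = train_step((), "V", 0, toy_actions, toy_step, toy_phi, weights)
second = train_step((), "V", 0, toy_actions, toy_step, toy_phi, weights)
```

In the toy run, the first step breaks the tie in favor of "N", triggers an update, and the second step then predicts the correct action "V".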
Experiments
• Sequence prediction tasks
  – POS tagging
  – Text chunking (a.k.a. shallow parsing)
  – Named entity recognition
• Syntactic parsing
  – Dependency parsing
• Compared to first-order CRFs in terms of speed and accuracy
POS tagging
• WSJ corpus
[Bar chart: accuracy (axis from 96.9 to 97.3) for the CRF and for lookahead depth = 0, 1, 2; bar values not recoverable]
Training time
• WSJ corpus
[Bar chart: training time in seconds (axis ticks at 10, 100, 1000, 10000) for the CRF and for lookahead depth = 0, 1, 2; bar values not recoverable]
POS tagging (+ tag trigram features)
• WSJ corpus
[Bar chart: accuracy (axis from 96.9 to 97.3) for the CRF and for lookahead depth = 0, 1, 2; bar values not recoverable]
Chunking (shallow parsing)
• CoNLL 2000 data set
[Bar chart: F-score (axis from 93.35 to 93.85) for the CRF and for lookahead depth = 0, 1, 2; bar values not recoverable]
Named entity recognition
• BioNLP/NLPBA 2004 data set
[Bar chart: F-score (axis from 69 to 72.5) for the CRF and for lookahead depth = 0, 1, 2, 3; bar values not recoverable]
Dependency parsing
• WSJ corpus
[Bar chart: F-score (axis from 88.5 to 91.5) for the structured perceptron (Zhang and Clark, 2008) and for lookahead depth = 0, 1, 2, 3; bar values not recoverable]
Related work
• MEMMs + Viterbi
  – Label bias problem (Lafferty et al., 2001)
• Learning as search optimization (LaSO) (Daumé III and Marcu, 2005)
  – No lookahead
• Structured perceptron with beam search (Zhang and Clark, 2008)
Conclusion
• Can history-based models rival globally optimized models?
  – Yes, they can be more accurate than CRFs
• The same computational cost as CRFs
Future work
• Feature engineering
• Flexible search extension/reduction
• Easy-first tagging/parsing
  – (Goldberg & Elhadad, 2010)
• Max-margin learning
THANK YOU