Neural Models for NLP · Our Focused Problems · Our Recent Work · References
Recent Work on Deep Learning at Fudan NLP Group
Xipeng Qiu
[email protected]
http://nlp.fudan.edu.cn/~xpqiu
Fudan University
Nov 20, 2015 · ACML 2015 · Hong Kong
Xipeng Qiu Recent Work on Deep Learning at Fudan NLP Group
Outline
1 Neural Models for NLP
2 Our Focused Problems
    Dense Feature Composition
    Long Distance Dependence for LSTM
3 Our Recent Work
    Convolutional Neural Tensor Network
    Recursive Convolutional Neural Network
    Gated Recursive Neural Network
    Multi-Timescale LSTM
General Neural Architectures for NLP
from [Collobert et al., 2011]
1 represent the words/features with dense vectors (embeddings) via a lookup table;
2 concatenate the vectors;
3 classify/match/rank with multi-layer neural networks.
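The three steps above can be sketched in a few lines of numpy. This is an illustrative toy, not the implementation of Collobert et al. (2011); all dimensions and the random parameters are assumptions.

```python
import numpy as np

# Window-based architecture sketch: lookup table -> concatenation -> MLP.
rng = np.random.default_rng(0)

vocab_size, emb_dim, window, hidden, n_tags = 100, 50, 3, 64, 5
lookup = rng.normal(size=(vocab_size, emb_dim))   # 1) embedding lookup table
W1 = rng.normal(size=(hidden, window * emb_dim))  # hidden layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(size=(n_tags, hidden))            # output layer weights
b2 = np.zeros(n_tags)

def tag_scores(word_ids):
    """Score the centre word of a window of word ids."""
    x = np.concatenate([lookup[i] for i in word_ids])  # 2) concatenate
    h = np.tanh(W1 @ x + b1)                           # 3) multi-layer NN
    return W2 @ h + b2

scores = tag_scores([3, 17, 42])   # a window of three word ids
print(scores.shape)                # one score per tag
```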
Distributed Representation for Three Levels
Word Level
    NNLM, C&W, CBOW & Skip-Gram
Sentence Level
    NBOW
    Sequence Models: Recurrent NN (LSTM and GRU), Paragraph Vector
    Topological Models: Recursive NN
    Convolutional Models: DCNN
Document Level
    NBOW
    Hierarchical Models: two-level CNN
    Sequence Models: LSTM, Paragraph Vector
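The simplest entry in this list, NBOW, works identically at the sentence and document level: average the word embeddings. A minimal sketch with random embeddings (the vocabulary size and dimension are illustrative assumptions):

```python
import numpy as np

# NBOW (neural bag-of-words): a sentence or document is represented by
# the average of its word embeddings; word order is ignored.
rng = np.random.default_rng(1)
emb = rng.normal(size=(1000, 64))     # toy vocabulary of 1000 words, dim 64

def nbow(word_ids):
    return emb[word_ids].mean(axis=0)  # order-insensitive average

sent = nbow([5, 9, 12, 40])
doc = nbow([5, 9, 12, 40, 7, 7, 99])   # same operator at document level
print(sent.shape, doc.shape)
```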
Quite Simple Feature Composition
Given two embeddings a and b,
1 how to calculate their similarity/relevance/relation?
    1 Concatenation: a ⊕ b → ANN → output
    2 Bilinear: aᵀMb → output
2 how to use them in a classification task?
    1 Concatenation: a ⊕ b → ANN → output
    2 Sum/Average: a + b → ANN → output
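The three composition operators above can be written out directly. The dimensions and the tiny one-layer ANN below are illustrative assumptions, not choices from the talk:

```python
import numpy as np

# Composing two embeddings a and b in the three ways listed on the slide.
rng = np.random.default_rng(2)
d = 8
a, b = rng.normal(size=d), rng.normal(size=d)

# 1) Concatenation: a ⊕ b fed to a one-layer ANN.
W = rng.normal(size=(4, 2 * d))
out_concat = np.tanh(W @ np.concatenate([a, b]))

# 2) Bilinear: aᵀMb gives a scalar relation/similarity score.
M = rng.normal(size=(d, d))
out_bilinear = a @ M @ b

# 3) Sum/Average: element-wise a + b, then the ANN.
Ws = rng.normal(size=(4, d))
out_sum = np.tanh(Ws @ (a + b))

print(out_concat.shape, out_sum.shape, float(out_bilinear))
```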
Focused Problem 1: Dense Feature Composition
Not “Really” Deep Learning in NLP
Most neural models in NLP are very shallow.
Their major benefit is introducing dense representations.
The feature composition is also quite simple:
    Concatenation
    Sum/Average
    Bilinear model
Focused Problem 1: Dense Feature Composition
Many previous works (as of half a year ago) have shown that increasing the network depth does not improve performance on many NLP tasks.
Our Motivation: how can we enhance neural models by modeling compositions of the dense features?
Focused Problem 2: Long Distance Dependence for LSTM
[Figure: Unfolded LSTM for Text Classification — inputs x1…xT feed hidden states h1…hT; the last hidden state goes to a softmax output y.]
Drawback: long-term dependencies must be transmitted step by step along the whole sequence.
Convolutional Neural Tensor Network for Text Matching [Qiu and Huang, 2015]
[Figure: Architecture of the Convolutional Neural Tensor Network — the question and answer sentences are encoded and combined by a tensor layer into a matching score.]
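The tensor matching layer can be sketched as follows. This is only an illustration of the neural-tensor family of scoring functions, s(q,a) = uᵀ tanh(qᵀM[1:r]a + V[q;a] + b); the convolutional sentence encoders are omitted (q and a stand in for their outputs), and the exact parameterisation in Qiu and Huang (2015) may differ:

```python
import numpy as np

# Tensor matching score between a question vector q and an answer vector a.
rng = np.random.default_rng(3)
d, r = 16, 4                          # sentence dim, number of tensor slices
q, a = rng.normal(size=d), rng.normal(size=d)

M = rng.normal(size=(r, d, d))        # tensor: r bilinear slices
V = rng.normal(size=(r, 2 * d))       # standard linear part on [q; a]
b = np.zeros(r)
u = rng.normal(size=r)                # projects to a single score

bilinear = np.array([q @ M[i] @ a for i in range(r)])   # qᵀ M[i] a per slice
score = u @ np.tanh(bilinear + V @ np.concatenate([q, a]) + b)
print(float(score))                   # a single matching score
```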
Recursive Neural Network (RecNN) [Socher et al., 2013]
[Figure: binary parse tree for "a red bike": (a, Det), (red, JJ), (bike, NN) → (red bike, NP) → (a red bike, NP).]
Topological models compose the sentence representation following a given topological structure over the words.
Given a labeled binary parse tree, ((p2 → a p1), (p1 → b c)), the node representations are computed by
p1 = f(W [b; c]),
p2 = f(W [a; p1]).
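The two composition steps above, written out with f = tanh and a shared matrix W (random parameters, purely for illustration):

```python
import numpy as np

# RecNN composition over the parse of "a red bike":
# p1 = f(W [b; c]) for "red bike", p2 = f(W [a; p1]) for "a red bike".
rng = np.random.default_rng(4)
d = 10
W = rng.normal(size=(d, 2 * d))            # shared composition matrix
a, b_vec, c = rng.normal(size=(3, d))      # leaf word vectors a, b, c

def compose(left, right):
    return np.tanh(W @ np.concatenate([left, right]))

p1 = compose(b_vec, c)     # "red bike"  (NP)
p2 = compose(a, p1)        # "a red bike" (NP)
print(p2.shape)            # every tree node has the same dimension
```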
A variant of RecNN: Recursive Convolutional Neural Network [Zhu et al., 2015]
A recursive neural network can only process binary combinations, which makes it unsuitable for dependency parsing.
[Figure: Recursive Convolutional Neural Network — a convolution layer over the head word and its children (a, Det), (red, JJ), (bike, NN), followed by pooling, yields the node representation (a red bike, NN).]
introducing convolution and pooling layers;
modeling the complicated interactions of the head word and its children.
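One way to read these two bullets as code: convolve over (head, child) pairs, then max-pool, so the composition handles any number of children instead of only binary pairs. This is a simplified sketch of the idea, not the exact model of Zhu et al. (2015) (distance features and the re-ranking setup are omitted):

```python
import numpy as np

# Compose a head word with an arbitrary number of dependency children.
rng = np.random.default_rng(5)
d = 12
W = rng.normal(size=(d, 2 * d))                 # convolution filter

def compose(head, children):
    # one convolution output per (head, child) pair ...
    convs = [np.tanh(W @ np.concatenate([head, ch])) for ch in children]
    # ... then element-wise max pooling over however many children exist
    return np.max(convs, axis=0)

head = rng.normal(size=d)
kids = [rng.normal(size=d) for _ in range(3)]   # works for any arity
print(compose(head, kids).shape)
```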
Gated Recursive Neural Network [Chen et al., 2015a]
[Figure: DAG-based GRNN over the Chinese characters 下 雨 天 地 面 积 水 ("rainy" 下雨, "day" 天, "ground" 地面, "accumulated water" 积水), predicting the segmentation tags B, M, E, S.]
DAG-based Recursive Neural Network
Gating mechanism
A relatively complicated solution
GRNN models the complicated combinations of the features, selecting and preserving the useful combinations via reset and update gates.
A similar model: AdaSent [Zhao et al., 2015]
GRNN Unit
[Figure: the GRNN unit — reset gates rL and rR filter the two child states h_{j-1}^{(l-1)} and h_j^{(l-1)}, and an update gate z combines the candidate ĥ_j^{(l)} into the new state h_j^{(l)}.]
Two gates:
    reset gate
    update gate
Applications:
    Chinese Word Segmentation [Chen et al., 2015a]
    Dependency Parsing [Chen et al., 2015c]
    Sentence Modeling [Chen et al., 2015b]
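A simplified GRNN unit in code: reset gates filter the two children before composition, and an update gate interpolates between the new candidate and the two children, "selecting and preserving" useful combinations. The gate parameterisation below is an illustrative assumption; see Chen et al. (2015a) for the exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(6)
d = 8
W = rng.normal(size=(d, 2 * d)) * 0.1       # composition weights
Wr = rng.normal(size=(2 * d, 2 * d)) * 0.1  # reset-gate weights
Wz = rng.normal(size=(3 * d, 2 * d)) * 0.1  # update-gate weights (3 paths)

def grnn_unit(hl, hr):
    x = np.concatenate([hl, hr])
    rl, rr = np.split(sigmoid(Wr @ x), 2)                # reset gates
    h_hat = np.tanh(W @ np.concatenate([rl * hl, rr * hr]))  # candidate
    z = np.exp((Wz @ x).reshape(3, d))
    z /= z.sum(axis=0)                                   # softmax over paths
    return z[0] * h_hat + z[1] * hl + z[2] * hr          # select & preserve

h = grnn_unit(rng.normal(size=d), rng.normal(size=d))
print(h.shape)
```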
Multi-Timescale LSTM
Figure: Two feedback strategies of our model: (a) Fast-to-Slow and (b) Slow-to-Fast. The dashed lines show the feedback connections; the solid lines show the connections at the current time step.
from [Liu et al., 2015]
Unfolded Multi-Timescale LSTM with Fast-to-Slow Feedback Strategy
[Figure: three groups of hidden states g^1, g^2, g^3 unfolded over inputs x1…xT; the top group feeds a softmax output y.]
from [Liu et al., 2015]
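The multi-timescale idea can be illustrated with a simple update schedule: hidden units are split into groups, and slower groups are updated less often, so long-range information crosses the sequence in fewer transitions. The period-doubling schedule below is an assumption for illustration only; the actual grouping and the feedback connections of Liu et al. (2015) are not modelled here:

```python
# Which hidden-state groups update at each time step, for 3 groups where
# group g^k is (hypothetically) updated every 2^(k-1) steps.
T, groups = 8, 3
schedule = [[k + 1 for k in range(groups) if t % (2 ** k) == 0]
            for t in range(1, T + 1)]
for t, active in zip(range(1, T + 1), schedule):
    print(f"t={t}: update groups {active}")
# g^1 updates every step, g^2 every 2 steps, g^3 every 4 steps.
```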
LSTM for Sentiment Analysis
[Figure: sentiment-score trajectories of LSTM vs. MT-LSTM over three example sentences: "Is this progress?", "He'd create a movie better than this.", and "It's not exactly a gourmet meal but the fare is fair, even coming from the drive."]
References I
Xinchi Chen, Xipeng Qiu, Chenxi Zhu, and Xuanjing Huang. Gated recursive neural network for Chinese word segmentation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015a.
Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. Sentence modeling with gated recursive neural network. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015b.
Xinchi Chen, Yaqian Zhou, Chenxi Zhu, Xipeng Qiu, and Xuanjing Huang. Transition-based dependency parsing using two heterogeneous gated recursive neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015c.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.
References II
PengFei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang. Multi-timescale long short-term memory neural network for modelling sentences and documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015.
Xipeng Qiu and Xuanjing Huang. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015.
Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Proceedings of the ACL Conference, 2013.
Han Zhao, Zhengdong Lu, and Pascal Poupart. Self-adaptive hierarchical sentence model. arXiv preprint arXiv:1504.05070, 2015.
Chenxi Zhu, Xipeng Qiu, Xinchi Chen, and Xuanjing Huang. A re-ranking model for dependency parser with recursive convolutional neural network. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015.