Recent Work on Deep Learning at Fudan NLP Group (ACML 2015)


Transcript of: Recent Work on Deep Learning at Fudan NLP Group (ACML 2015)

Page 1

Neural Models for NLP · Our Focused Problems · Our Recent Work · References

Recent Work on Deep Learning at Fudan NLP Group

Xipeng Qiu ([email protected])

http://nlp.fudan.edu.cn/~xpqiu

Fudan University

Nov 20, 2015 · ACML 2015 · Hong Kong

Xipeng Qiu Recent Work on Deep Learning at Fudan NLP Group

Page 2

Outline

1 Neural Models for NLP

2 Our Focused Problems
  Dense Feature Composition
  Long Distance Dependence for LSTM

3 Our Recent Work
  Convolutional Neural Tensor Network
  Recursive Convolutional Neural Network
  Gated Recursive Neural Network
  Multi-Timescale LSTM

Page 3

General Neural Architectures for NLP

from [Collobert et al., 2011]

1 represent the words/features with dense vectors (embeddings) by a lookup table;

2 concatenate the vectors;

3 classify/match/rank with multi-layer neural networks.
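The three-step recipe above can be sketched in a few lines of numpy. All sizes (vocabulary, embedding dimension, window width, hidden units, classes) are hypothetical placeholders, not values from [Collobert et al., 2011]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary of 10 words, 4-dim embeddings,
# a 3-word context window, 8 hidden units, 5 output classes.
V, d, window, hidden, classes = 10, 4, 3, 8, 5

E = rng.normal(size=(V, d))                 # lookup table (word embeddings)
W1 = rng.normal(size=(window * d, hidden))  # hidden layer
W2 = rng.normal(size=(hidden, classes))     # output layer

def forward(word_ids):
    x = E[word_ids].reshape(-1)   # 1) look up and 2) concatenate
    h = np.tanh(x @ W1)           # 3) multi-layer network ...
    scores = h @ W2               #    ... ending in class scores
    return scores

scores = forward([2, 5, 7])       # a 3-word window of word ids
print(scores.shape)               # (5,)
```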

Page 4

Distributed Representation for Three Levels

Word Level
  NNLM
  C&W
  CBOW & Skip-Gram

Sentence Level
  NBOW
  Sequence Models: Recurrent NN (LSTM and GRU), Paragraph Vector
  Topological Models: Recursive NN
  Convolutional Models: DCNN

Document Level
  NBOW
  Hierarchical Models: two-level CNN
  Sequence Models: LSTM, Paragraph Vector

Page 5

Quite Simple Feature Composition

Given two embeddings a and b:
1 how to calculate their similarity/relevance/relation?

1 Concatenation

a ⊕ b → ANN → output

2 Bilinear

aᵀ M b → output

2 how to use them in a classification task?
1 Concatenation

a ⊕ b → ANN → output

2 Sum/Average

a + b → ANN → output
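The three composition options above can be written out in a minimal numpy sketch, with hypothetical dimensions and a single tanh layer standing in for "ANN":

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 4, 8                      # hypothetical embedding / hidden sizes
a, b = rng.normal(size=d), rng.normal(size=d)

# 1) Concatenation: a ⊕ b fed into a small ANN.
W = rng.normal(size=(2 * d, hidden))
concat_out = np.tanh(np.concatenate([a, b]) @ W)

# 2) Bilinear: aᵀ M b gives a single interaction score.
M = rng.normal(size=(d, d))
bilinear_out = a @ M @ b

# 3) Sum/Average: a + b fed into the same kind of ANN.
W2 = rng.normal(size=(d, hidden))
sum_out = np.tanh((a + b) @ W2)
```

Note that only the bilinear form models multiplicative interactions between the two embeddings; concatenation and sum leave all interaction modeling to the layers above.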

Page 6

Focused Problem 1: Dense Feature Composition

Not “Really” Deep Learning in NLP

Most neural models in NLP are very shallow.

The major benefit is the introduction of dense representations.

The feature composition is also quite simple.

Concatenation
Sum/Average
Bilinear model

Page 7

Focused Problem 1: Dense Feature Composition

Many previous works (as of half a year ago) have shown that increasing the network depth cannot improve the performance of many NLP tasks.

Our Motivation: How to enhance the neural model by modeling compositions of the dense features?

Page 8

Focused Problem 2: Long Distance Dependence for LSTM

[Figure: Unfolded LSTM for text classification. Inputs x1 … xT feed hidden states h1 … hT; the final state hT feeds a softmax that produces the label y.]

Drawback: long-term dependencies need to be transmitted one-by-one along the sequence.
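The drawback is visible directly in code: in a minimal numpy LSTM classifier (hypothetical sizes, random weights), anything the classifier needs from the first token must survive every intermediate gated update before the softmax sees it:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h_dim, classes = 3, 5, 2     # hypothetical input / hidden / class sizes

# One stacked weight block per gate: [input, forget, output, candidate].
Wx = rng.normal(size=(4 * h_dim, d)) * 0.1
Wh = rng.normal(size=(4 * h_dim, h_dim)) * 0.1
b = np.zeros(4 * h_dim)
Wy = rng.normal(size=(classes, h_dim))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(xs):
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for x in xs:                      # strictly left-to-right transmission
        z = Wx @ x + Wh @ h + b
        i, f, o = (sigmoid(z[k * h_dim:(k + 1) * h_dim]) for k in range(3))
        g = np.tanh(z[3 * h_dim:])
        c = f * c + i * g             # early information must pass through
        h = o * np.tanh(c)            # every forget gate along the way
    scores = Wy @ h                   # softmax over the final state only
    return np.exp(scores) / np.exp(scores).sum()

probs = classify(rng.normal(size=(6, d)))   # a 6-token "sentence"
```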

Page 9

Convolutional Neural Tensor Network for Text Matching [Qiu and Huang, 2015]

[Figure: Architecture of the Convolutional Neural Tensor Network. Question and answer representations are combined by a tensor matching function f to produce a matching score.]
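The slide gives no equations, but a tensor-based matching layer of this kind is commonly parameterized as score = uᵀ f(qᵀ T^[1:r] a + V[q; a] + b). The numpy sketch below assumes that form, with hypothetical sizes, and takes the question/answer vectors as given (in the model they come from convolutional sentence encoders):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 4, 3                       # vector size / tensor slices (hypothetical)
q, a = rng.normal(size=d), rng.normal(size=d)  # question / answer vectors

T = rng.normal(size=(r, d, d))    # tensor: one bilinear form per slice
V = rng.normal(size=(r, 2 * d))   # standard-layer weights over [q; a]
b = rng.normal(size=r)
u = rng.normal(size=r)            # combines the r slice activations

bilinear = np.einsum('i,rij,j->r', q, T, a)   # qᵀ T[k] a for each slice k
score = u @ np.tanh(bilinear + V @ np.concatenate([q, a]) + b)
```

The tensor slices let the matching function model multiplicative question-answer interactions that a plain concatenation layer cannot express.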

Page 10

Recursive Neural Network (RecNN) [Socher et al., 2013]

[Figure: Binary parse tree for "a red bike": (red/JJ, bike/NN) → red bike/NP; (a/Det, red bike/NP) → a red bike/NP.]

Topological models compose the sentence representation following a given topological structure over the words.

Given a labeled binary parse tree, ((p2 → a p1), (p1 → b c)), the node representations are computed by

p1 = f(W [b; c]),
p2 = f(W [a; p1]).
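A minimal numpy sketch of this bottom-up composition, with f = tanh, a single shared matrix W, and hypothetical toy embeddings for the leaves:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                              # hypothetical embedding size
W = rng.normal(size=(d, 2 * d))    # shared composition matrix

# Toy leaf embeddings (random placeholders, not trained vectors).
emb = {w: rng.normal(size=d) for w in ['a', 'red', 'bike']}

def compose(left, right):
    # p = f(W [left; right]) with f = tanh, as on the slide
    return np.tanh(W @ np.concatenate([left, right]))

p1 = compose(emb['red'], emb['bike'])   # "red bike", NP
p2 = compose(emb['a'], p1)              # "a red bike", NP
```

Because W is shared at every node, the same function is applied recursively over the whole tree, whatever its shape.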

Page 11

A variant of RecNN: Recursive Convolutional Neural Network [Zhu et al., 2015]

The recursive neural network can only process binary combinations and is not suitable for dependency parsing.

[Figure: Recursive Convolutional Neural Network for "a red bike": convolution over the head-child pairs (a bike/NN, red bike/NN) followed by pooling yields a red bike/NN.]

introducing the convolution and pooling layers;

modeling the complicated interactions of the head word and its children.

Page 12

Gated Recursive Neural Network [Chen et al., 2015a]

[Figure: DAG-based GRNN over the Chinese sentence 下雨天地面积水 ("rainy day, water accumulates on the ground"): 下雨 (rainy), 天 (day), 地面 (ground), 积水 (accumulated water), with segmentation tags B, M, E, S.]

DAG-based Recursive Neural Network

Gating mechanism

A relatively complicated solution

GRNN models the complicated combinations of the features, selecting and preserving the useful combinations via reset and update gates.

A similar model: AdaSent [Zhao et al., 2015]

Page 13

GRNN Unit

[Figure: GRNN unit. The child nodes h_{j-1}^{(l-1)} and h_j^{(l-1)} pass through reset gates r_L and r_R to form the candidate ĥ_j^{(l)}, and an update gate z produces h_j^{(l)}.]

Two Gates

reset gate

update gate

Chinese Word Segmentation [Chen et al., 2015a]

Dependency Parsing [Chen et al., 2015c]

Sentence Modeling [Chen et al., 2015b]
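A possible numpy sketch of the unit, assuming GRU-style reset gates on the two children and a per-dimension softmax update gate over {new activation, left child, right child}; the exact parameterization in [Chen et al., 2015a] may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # hypothetical hidden size
W = rng.normal(size=(d, 2 * d))        # candidate composition
Wr = rng.normal(size=(2 * d, 2 * d))   # reset gates r_L, r_R
Wz = rng.normal(size=(3 * d, 2 * d))   # update gate over 3 candidates

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grnn_unit(hl, hr):
    hc = np.concatenate([hl, hr])
    r = sigmoid(Wr @ hc)               # reset gates for the two children
    rl, rr = r[:d], r[d:]
    h_hat = np.tanh(W @ np.concatenate([rl * hl, rr * hr]))  # new activation
    z = np.exp(Wz @ hc).reshape(3, d)  # per-dimension softmax over
    z /= z.sum(axis=0)                 # {h_hat, h_left, h_right}
    return z[0] * h_hat + z[1] * hl + z[2] * hr

h = grnn_unit(rng.normal(size=d), rng.normal(size=d))
```

The reset gates decide how much of each child enters the new activation, while the update gate decides whether the parent keeps the new activation or simply copies one of the children.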

Page 14

Multi-Timescale LSTM

Figure: Two feedback strategies of our model: (a) Fast-to-Slow Strategy; (b) Slow-to-Fast Strategy. The dashed line shows the feedback connection, and the solid line shows the connection at the current time.

from [Liu et al., 2015]

Page 15

Unfolded Multi-Timescale LSTM with Fast-to-Slow Feedback Strategy

[Figure: Unfolded multi-timescale LSTM. Three groups of hidden states g^1_t, g^2_t, g^3_t run at different timescales over the inputs x_1 … x_T; the final states feed a softmax that produces the label y.]

from [Liu et al., 2015]
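One way to realize multiple timescales, assuming a dyadic update schedule (an illustrative choice, not necessarily the one used in [Liu et al., 2015]): group g^k is updated only every 2^(k-1) steps, so slower groups see far fewer transitions between distant positions:

```python
def active_groups(t, g=3):
    # Group k (1-based) is updated every 2**(k-1) steps, so higher
    # groups change slowly and can carry long-range information
    # across fewer effective transitions.
    return [k for k in range(1, g + 1) if t % (2 ** (k - 1)) == 0]

schedule = {t: active_groups(t) for t in range(8)}
# At t=0 all three groups update; at t=1 only the fast group g^1;
# at t=2 groups g^1 and g^2; the slow group g^3 only every 4 steps.
```

A long-range dependency thus travels through the slow group in T/4 transitions instead of T, which is the point of the multi-timescale design.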

Page 16

LSTM for Sentiment Analysis

[Figure: Per-word predictions of LSTM vs. MT-LSTM on three example sentences: "Is this progress?"; "He'd create a movie better than this."; "It's not exactly a gourmet meal but the fare is fair, even coming from the drive."]

Page 17

References I

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, and Xuanjing Huang. Gated recursive neural network for Chinese word segmentation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015a.

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. Sentence modeling with gated recursive neural network. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015b.

Xinchi Chen, Yaqian Zhou, Chenxi Zhu, Xipeng Qiu, and Xuanjing Huang. Transition-based dependency parsing using two heterogeneous gated recursive neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015c.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.

Page 18

References II

PengFei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang. Multi-timescale long short-term memory neural network for modelling sentences and documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015.

Xipeng Qiu and Xuanjing Huang. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Proceedings of the ACL Conference, 2013.

Han Zhao, Zhengdong Lu, and Pascal Poupart. Self-adaptive hierarchical sentence model. arXiv preprint arXiv:1504.05070, 2015.

Chenxi Zhu, Xipeng Qiu, Xinchi Chen, and Xuanjing Huang. A re-ranking model for dependency parser with recursive convolutional neural network. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015.
