Knowledge-Guided Natural Language Processing Zhiyuan Liu NExT++ 2018 Workshop @ Nanjing


Page 1: Knowledge-Guided Natural Language Processing · lead to further substantial progress in NLP. However, the truly difficult problems of semantics, context, and knowledge will probably

Knowledge-Guided Natural Language Processing

Zhiyuan Liu

NExT++ 2018 Workshop @ Nanjing

Page 2

Natural Language Processing

Advances in Natural Language Processing. Science 2015.

• NLP aims to understand human language

• At its core, NLP is structured prediction

Page 3

Deep Learning for NLP

Advances in Natural Language Processing. Science 2015.

Page 4

Characteristics of DL

• Distributed representation

– Embeddings

– Dense, real-valued, low-dimensional vectors

• Hierarchical structure

– Corresponding to world hierarchy

– Generalization

• Data-driven approach

– Learn from large-scale training data

Page 5

Challenges of DL for NLP


… we feel confident that more data and computation, in addition to recent advances in ML and deep learning, will lead to further substantial progress in NLP. However, the truly difficult problems of semantics, context, and knowledge will probably require new discoveries in linguistics and inference.

Advances in Natural Language Processing. Science 2015.

Page 6

Characteristics of Natural Language

• Language has units at multiple granularities

• Words and Chinese characters are the minimal units of usage, but not the minimal units of semantics

[Figure: units of language at different granularities. Labels: char, sense, word, phrase, sentence, document, web]

Page 7

Characteristics of Natural Language

• Text contains rich knowledge

[Figure: knowledge carried by text across units of language. Labels: char, word, phrase, sentence, document, web; world knowledge, linguistic knowledge, domain knowledge]

Page 8

Use Sememes to Break Word Boundary

• Lexical sememes: minimal units of semantics

[Figure: units of language extended with sememes. Labels: char, sememe, sense, word, phrase, sentence, document, web]

Page 9

Linguistic Knowledge with Lexical Sememes

• Lexical sememes: minimal units of semantics

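[Figure: sememe tree for the word 顶点 (apex) with two senses, Sense1 (acme) and Sense2 (vertex). Sememes shown: 实体 (entity), 角 (angular), 界限 (boundary), 最 (most), 高于正常 (GreaterThanNormal), 位置 (location), 点 (dot). Edge label shown: degree]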

Page 10

HowNet

• Linguistic knowledge base of lexical sememes, released in 1999

• About 2,000 sememes, defined manually

• About 100,000 words, manually annotated with sememes
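The word, sense, and sememe layers described above can be pictured as a small nested data structure. The following is only a sketch of that idea, not HowNet's actual format or API: the names `Sense`, `lexicon`, and `sememes_of` are hypothetical, and the way the sememes of 顶点 (apex) are divided between its two senses is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Sense:
    """One sense of a word, described by a set of sememes."""
    gloss: str
    sememes: Set[str] = field(default_factory=set)


# Hypothetical in-memory lexicon: word -> senses -> sememes.
# The split of sememes between the two senses is assumed, not
# copied from HowNet.
lexicon: Dict[str, List[Sense]] = {
    "顶点": [
        Sense("acme", {"GreaterThanNormal", "most"}),
        Sense("vertex", {"boundary", "angular", "dot", "location"}),
    ],
}


def sememes_of(word: str) -> Set[str]:
    """Return the union of sememes over all senses of a word."""
    result: Set[str] = set()
    for sense in lexicon.get(word, []):
        result |= sense.sememes
    return result
```

Looking up a word that is not in the lexicon returns an empty set, which keeps downstream code free of special cases for unannotated words.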

Page 11

[Diagram: symbol-based sememe knowledge guides data-driven DL]

Page 12

WORD EMBEDDING WITH SEMEMES

Page 13

Word Embedding

• Learn low-dimensional semantic representations for words


word2vec: Tomas Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.
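To make "low-dimensional semantic representations" concrete, here is a minimal sketch that compares words by the cosine similarity of their vectors. The three-dimensional vectors are invented for illustration; real word2vec embeddings are learned from a corpus and typically have hundreds of dimensions. The names `embeddings`, `cosine`, and `nearest` are hypothetical.

```python
import math

# Toy vectors, invented for illustration only.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary word most similar to `word`."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
    )
```

With these toy vectors, "king" lies closer to "queen" than to "apple", which is the kind of semantic neighborhood that learned embeddings are meant to capture.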

Page 14

Word Embedding with Sememes

• Use HowNet's sense-sememe annotations to improve word sense representations


Yilin Niu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. Improved Word Representation Learning with Sememes. ACL 2017.

[Figure: Sememe-Sense-Word Joint Model, illustrated with the sememe tree for 顶点 (apex): Sense1 (acme) and Sense2 (vertex), with sememes 实体 (entity), 角 (angular), 界限 (boundary), 最 (most), 高于正常 (GreaterThanNormal), 位置 (location), 点 (dot); edge label: degree]
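One way to read a joint model over sememes, senses, and words is sketched below. It is an assumption for illustration, not necessarily the formulation in the cited paper: each sense vector is the mean of its sememe vectors, and a context vector attends over the senses to produce a context-dependent word vector. All vectors and the names `sememe_vecs`, `senses`, `sense_attention`, and `word_vector` are hypothetical.

```python
import math

# Toy 2-dimensional sememe vectors, invented for illustration.
sememe_vecs = {
    "GreaterThanNormal": [1.0, 0.0],
    "most": [0.8, 0.2],
    "location": [0.0, 1.0],
    "dot": [0.2, 0.8],
}

# Hypothetical sense inventory for one ambiguous word.
senses = {
    "acme": ["GreaterThanNormal", "most"],
    "vertex": ["location", "dot"],
}


def sense_vector(sense):
    """Sense vector = mean of the vectors of its sememes."""
    vectors = [sememe_vecs[s] for s in senses[sense]]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sense_attention(context_vec):
    """Softmax over senses of the dot product with the context vector."""
    scores = {
        name: sum(a * b for a, b in zip(sense_vector(name), context_vec))
        for name in senses
    }
    total = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / total for name, s in scores.items()}


def word_vector(context_vec):
    """Context-dependent word vector = attention-weighted sum of senses."""
    weights = sense_attention(context_vec)
    return [
        sum(weights[name] * sense_vector(name)[d] for name in senses)
        for d in range(len(context_vec))
    ]
```

In this sketch the attention weights double as a soft sense disambiguation: a context vector near the "acme" sememes puts more weight on that sense.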

Page 15

Experiment Results


• The enhanced word embeddings perform better on the tasks of analogy reasoning and word similarity

Page 16

Experiment Examples

• The model can conduct sense disambiguation based on sememes and contexts

Page 17

LANGUAGE MODELING WITH SEMEMES

Page 18

Language Modeling

• Model word sequences under the Markov property

• Sememe-Driven Language Modeling
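The Markov property mentioned above means that each word is predicted from a bounded history. As a minimal sketch, the code below implements the simplest case, a bigram model with add-one smoothing. It is a standard textbook baseline, not the sememe-driven model from the talk; the toy corpus and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that serves as a history
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sentence_logprob(sent, unigrams, bigrams, vocab_size):
    """Log-probability of a sentence under the first-order Markov assumption."""
    tokens = ["<s>"] + sent + ["</s>"]
    return sum(
        math.log(bigram_prob(p, w, unigrams, bigrams, vocab_size))
        for p, w in zip(tokens, tokens[1:])
    )


# Toy corpus, invented for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
# Predictable tokens: every corpus word plus the end-of-sentence marker.
vocab = {w for sent in corpus for w in sent} | {"</s>"}
```

Because the smoothing adds one count per vocabulary item, the conditional probabilities for any history still sum to one over `vocab`.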

Page 19

Sememe-Driven Neural Language Modeling

Page 20

Experiment Results

• Sememe knowledge can significantly reduce the perplexity of language models
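For reference, perplexity is usually defined as the exponentiated average negative log-likelihood of a held-out sequence; lower is better. The formula below is the standard definition, not taken from the slides, with $N$ the number of tokens and $w_{<i}$ the history before position $i$.

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{<i}) \right)
```

A model that assigns uniform probability over a vocabulary of size $V$ has perplexity $V$, so a reduction in perplexity can be read as a smaller effective number of choices per word.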

Page 21

Experiment Examples

Page 22

[Diagram: data-driven DL is used to predict symbol-based sememe knowledge]

Page 23

Sememe Prediction


• Use both external information (the contexts a word appears in) and internal information (the characters that make up the word) to predict sememes

Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin. Incorporating Chinese Characters of Words for Lexical Sememe Prediction. ACL 2018.
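A rough sketch of how the two kinds of evidence could be combined is given below. It is an illustrative assumption, not the models from the cited paper: external evidence is approximated by embedding similarity to annotated words, internal evidence by the fraction of shared characters, and the two scores are mixed with a weight `alpha`. The annotations, vectors, and function names are all invented for this example.

```python
import math
from collections import defaultdict

# Toy annotated lexicon (word -> sememes), invented for illustration:
# 铁匠 (blacksmith), 花匠 (gardener), 铁路 (railway).
annotated = {
    "铁匠": {"human", "occupation", "metal"},
    "花匠": {"human", "occupation", "flower"},
    "铁路": {"facilities", "route"},
}

# Toy embeddings, also invented; 木匠 (carpenter) has no annotation yet.
embeddings = {
    "木匠": [0.9, 0.1],
    "铁匠": [0.8, 0.2],
    "花匠": [0.7, 0.3],
    "铁路": [0.1, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def external_scores(word):
    """Score sememes by embedding similarity to annotated words."""
    scores = defaultdict(float)
    for other, sememes in annotated.items():
        sim = cosine(embeddings[word], embeddings[other])
        for s in sememes:
            scores[s] += sim
    return scores


def internal_scores(word):
    """Score sememes by the fraction of characters shared with annotated words."""
    scores = defaultdict(float)
    for other, sememes in annotated.items():
        shared = len(set(word) & set(other)) / len(set(word))
        for s in sememes:
            scores[s] += shared
    return scores


def predict(word, alpha=0.5, top_k=2):
    """Mix external and internal scores and return the top_k sememes."""
    ext, inte = external_scores(word), internal_scores(word)
    combined = {s: alpha * ext[s] + (1 - alpha) * inte[s] for s in set(ext) | set(inte)}
    return sorted(combined, key=lambda s: (-combined[s], s))[:top_k]
```

For the unannotated word 木匠, both sources point to the annotated words that share the character 匠, so their common sememes rank highest in this toy setup.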

Page 24

Experiment Results

• We propose several models for sememe prediction that use internal information, external information, or both

Page 25

Experiment Examples

• Both internal and external information can help sememe prediction

Page 26

Related Papers

• Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin. Language Modeling with Sparse Product of Sememe Experts. EMNLP 2018.

• Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie, Zhiyuan Liu. Cross-lingual Lexical Sememe Prediction. EMNLP 2018.

• Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin. Incorporating Chinese Characters of Words for Lexical Sememe Prediction. ACL 2018.

• Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu, Maosong Sun. Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. AAAI 2018.

• Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun. Lexical Sememe Prediction via Word Embeddings and Matrix Factorization. IJCAI 2017.

• Yilin Niu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. Improved Word Representation Learning with Sememes. ACL 2017.

Page 27

Open Source

• Packages for representation and acquisition of linguistic and world knowledge

• The projects have received 9,400+ stars on GitHub

https://github.com/thunlp

Page 28

Summary

• Linguistic knowledge of lexical sememes can break the word boundary for language modeling and improve the interpretability of neural language models

• DL methods for NLP can also be used for knowledge acquisition

NLP/AI = Data-Driven + Knowledge-Guided

[Diagram: symbol-based knowledge guides data-driven DL, and data-driven DL supports knowledge acquisition]

Page 29

THANKS!


[email protected]