Knowledge-Guided Natural Language Processing · lead to further substantial progress in NLP....

Post on 24-May-2020

1 views 0 download

Transcript of Knowledge-Guided Natural Language Processing · lead to further substantial progress in NLP....

Knowledge-GuidedNatural Language Processing

Zhiyuan Liu

NExT++ 2018 Workshop @ Nanjing

Natural Language Processing

2Advances in Natural Language Processing. Science 2015.

• NLP aims to understand human language

• Nature of NLP is structure prediction

Deep Learning for NLP

3Advances in Natural Language Processing. Science 2015.

Characteristics of DL

• Distributed representation

– Embeddings

– Dense, real-valued, low-dimensional vectors

• Hierarchical structure

– Corresponding to world hierarchy

– Generalization

• Data-driven approach

– Learn from large-scale training data

4

Challenges of DL for NLP

5

… we feel confident that more data and

computation, in addition to recent

advances in ML and deep learning, will

lead to further substantial progress in NLP.

However, the truly difficult problems of

semantics, context, and knowledge will

probably require new discoveries in

linguistics and inference.

Advances in Natural Language Processing. Science 2015.

Characteristics of Natural Language

• There are multiple-grained units in languages

• Words/Chinese characters are minimal units ofusages, but not minimal units of semantics

6

sense

word

phrase

sentence

document

web

char

Characteristics of Natural Language

• There are rich knowledge in text

7

World knowledge

word

phrase

sentence

document

web

char

LinguisticKnowledge

DomainKnowledge

Use Sememes to Break Word Boundary

• Lexical sememes: minimal units of semantics

8

sense

sememe

word

phrase

sentence

document

web

char

Linguistic Knowledge with Lexical Sememes

• Lexical sememes: minimal units of semantics

9

顶点(apex)

实体(entity)

角(angular)

界限(Boundary)

最(most)

高于正常(GreaterThanNormal)

degree

位置(location)

点(dot)

Sense1(acme) Sense2(vertex)

HowNet

• Linguistic knowledge base of lexical sememes, released in 1999

• Manually create ~2,000 sememes

• Manually annotat3 ~100,000 words with sememes

10

11

Data-DrivenDL

Symbol-basedSememe Knowledge

Guide

WORD EMBEDDING WITH SEMEMES

12

Word Embedding

• Learn low-dimensional semantic representations for words

13

word2vecTomas Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.

Word Embedding with Sememes

• 考虑HowNet的词义-义原标注信息,提升词义表示性能

14

Yilin Niu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. Improved Word Representation Learning with Sememes. ACL 2017.

顶点(apex)

实体(entity)

角(angular)

界限(Boundary)

最(most)

高于正常(GreaterThanNormal)

degree

位置(location)

点(dot)

Sense1(acme) Sense2(vertex)

Sememe-Sense-Word Joint Model

Experiment Results

15

• The enhanced word embeddings perform betteron the tasks of analogy reasoning and wordsimilarity

Experiment Examples

• The model can conduct sense disambiguation based on sememes and contexts

16

LANGUAGE MODELING WITH SEMEMES

17

Language Modeling

• Modeling word sequence with Markov property

• Sememe-Driven Language Modeling

18

Sememe-Driven Neural Language Modeling

19

Experiment Results

• Sememe knowledge can significantly reduce the perplexity of language models

20

Experiment Examples

21

22

Data-DrivenDL

Symbol-basedSememe Knowledge

Prediction

Sememe Prediction

23

• Use both external and internal information topredict sememes

Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin. Incorporating Chinese Characters of Words for Lexical Sememe Prediction. ACL 2018.

Experiment Results

• We propose several models for sememe predictionwith either internal and external information

24

Experiment Examples

• Both internal and external information can helpsememe prediction

25

Related Papers• Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin

and Leyu Lin. Language Modeling with Sparse Product of SememeExperts. EMNLP 2018.

• Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie, Zhiyuan Liu.Cross-lingual Lexical Sememe Prediction. EMNLP 2018.

• Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin.Incorporating Chinese Characters of Words for Lexical Sememe Prediction. ACL2018.

• Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu, Maosong Sun. Chinese LIWCLexicon Expansion via Hierarchical Classification of Word Embeddings withSememe Attention. AAAI 2018.

• Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun. Lexical Sememe Predictionvia Word Embeddings and Matrix Factorization. IJCAI 2017.

• Yilin Niu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. Improved WordRepresentation Learning with Sememes. ACL 2017.

26

Open Source

• Packages for representation and acquisition of linguistic and world knowledge

• The projects have obtained 9400+ stars on github

https://github.com/thunlp

27

Summary

• Linguistic knowledge of lexical sememes can breakword boundary for language modeling, andimprove interpretability of neural language models

• DL methods for NLP can also be used forknowledge acquisition

28

NLP/AI = Data-Driven + Knowledge-Guide

Data DrivenDL

Symbol-basedKnowledge

Acquisition

Guide

THANKS!

29

liuzy@tsinghua.edu.cn