What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM...

72
What Will Search Engines be Changed by NLP Advancements Dr. Ming Zhou Microsoft Research Asia ICTIR-2018, Tianjin

Transcript of What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM...

Page 1: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

What Will Search Engines be Changed

by NLP Advancements

Dr. Ming Zhou

Microsoft Research Asia

ICTIR-2018, Tianjin

Page 2: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 3: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Offline (Crawler)

Multi-Media Data

MM Pages

Unstructured Data

Web Pages

Superfresh Discovery

Structured Data

Satori/Freebase

Semi-Structured Data

LinkedIn

Yelp

TripAdvisor

HEADER & SEARCH BOX

ANSWER

WEB RESULT

TASK & SOCIAL

PANE

AD

Online

Post-Web Phase

Whole Page Ranking & Suppression

Web Phase

Ten Blue Links

Superfresh

Instant Answer/Chat

Task & Social Pane

Advertisement

Pre-Web Phase

Context & Query Understanding

UX

Web Index

Page 4: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Pre-Web Phase

Word Breaking Spelling Synonym Dependency Parsing Entity Linking

Intent Classification Semantic Parsing Word/Sentence Embedding User Profiling Query Rewriting

Web Phase

Ten Bleu Links

• Bag-of-word model• Translation model• Click-through• Summarization/snippet• …

Instant Answer/Chat

• Knowledge-based QA• Visual/Video QA• MRC• Task-oriented bot• …

Task & Social Pane

• Knowledge graph• Entity linking• Social networks• User profiling• …

Super Fresh

• News recommendation• Personalized news feed • Event detection• News classification• …

Advertisement

• User intent detection• Slot tagging• Query-Ads matching• Ads keyword generation• …

Post-Web Phase

Whole Page Ranking Suppression Coherence Ranking Model Query Suggestion

Search Result Page

Page 5: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• What are new advancements of NLP?

• What new changes will be brought to search?

Page 6: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

NLP fundamental

Word embedding

Word breaker, LM

Syntactic-semantic

analysis

Discourse analysis

IR

NLP core tech

MT

QA, QU and QG

Dialogue

Knowledge

engineering

Language

generation

NLP+

Search engine

Customer service

Business intelligence

Spoken assistant

MLBig dataUser modelling

Recommendation

systemIE

KB/common

sense

Computing

power

Page 7: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 8: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 9: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

61.34%

76.88%

62.42%

Grammar check82.65%

Machine Reading

23 CPS

Chatbot

69%Machine Translation

CoNLL-2014

CoNLL-10

JFLEGXiaoice SQuAD WMT-2017

Page 10: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 11: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 12: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

AutomaticResponse

DocTableKnowledge

Query

Edited Response

AutomaticResponse

AutomaticResponse

AutomaticResponse

Human-in-the-Loop (HI)

KBQA TableQA PassageQACommunityQA

Question Generatio

n

Image

VQA

Page 13: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 14: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Information Retrieval-based MethodSemantic Parsing-based Method

Barack Obama Place_of_Birth Honolulu

Semantic Parsing-based KBQA Answer Ranking-based KBQA

Barack Obama Place_of_Birth Honolulu

𝑥 𝑃𝑙𝑎𝑐𝑒_𝑜𝑓_𝐵𝑖𝑟𝑡ℎ 𝐵𝑎𝑟𝑎𝑐𝑘 𝑂𝑏𝑎𝑚𝑎

CCG SCFG DCS NN …

Feature NLG Sub-graph Embedding …

Page 15: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Natural Language Question Answer Entity CandidateKB

Question Feature Extraction Answer Feature Extraction

Question Features Answer Features

Learning-to-Rank Model

Page 16: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Ranking with NL description of the candidate answers (Berant and Liang, 2014)

Ranking Model(question-generated question)

which city was Obama born ?

date of birth Barack Obama city place of birth Barack Obama

DataOfBirth.BarackObama Type.City ∩ PlaceOfBirth.BarackObama

Predicate POS Question Generation Pattern

NP What TYPE is the NP of ENTITY ?

NP VP What NP is VP by ENTITY ?

… …

Page 17: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

(Bordes et al., 2014)

Who did Clooney marry in 1987?K. Preston

G. Clooney

Honolulu

1987

J. Travolta

Model

embedding matrix 𝑊 embedding matrix 𝑊dot product

embedding of 𝑞 embedding of 𝑎

answer candidate

binary encoding of 𝑞 binary encoding of 𝑎

Page 18: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

border ≔ (S\NP)/NP ∶

natural language syntax semantics

A CCG Rule Example

syntactic semantic

• Syntactic symbols: S, N, NP, ADJ and PP• Syntactic combinator: / and \ which specify

combination orders and directions

• Match natural language input

• λ-Calculus expression

Page 19: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

(Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2011)

State borders New Mexico

NP (S\NP)/NP NP

S\NP

>

<S

Page 20: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Semantic parsing with encoder-decoder (Dong and Lapata, 2018)

Page 21: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Semantic parsing with multi-gates and syntax constraintsMSRA-NLC @ ACL 2018

TableQuestion SQL

𝑆𝐸𝐿𝐸𝐶𝑇,𝑊𝐻𝐸𝑅𝐸, 𝐶𝑂𝑈𝑁𝑇,𝑀𝐼𝑁,𝑀𝐴𝑋, 𝐴𝑁𝐷,>,<,=.

Pick # CFL Team Player Position College

27 Hamilton Tiger-Cats Connor Healy DB Wilfrid Laurier

28 Calgary Stampeders Anthony Forgone OL York

29 Toronto Argonauts Frank Hoffman DL York

Attention

Decoder

Encoder

<𝑆>column valueSQL columnSQL SQL SQL

SQL

value

column

SQL

value

column column

value

𝑆𝐸𝐿𝐸𝐶𝑇𝑊𝐻𝐸𝑅𝐸

𝑀𝐼𝑁

𝐶𝑂𝑈𝑁𝑇

𝑀𝐴𝑋

𝐴𝑁𝐷>

<

=

York

Wilfrid Laurier

York

𝒕 = 𝟎 𝒕 = 𝟐 𝒕 = 𝟔

𝑆𝐸𝐿𝐸𝐶𝑇 𝐶𝑂𝑈𝑁𝑇 𝐶𝐹𝐿 𝑇𝑒𝑎𝑚 𝑊𝐻𝐸𝑅𝐸 𝐶𝑜𝑙𝑙𝑒𝑔𝑒 = "𝑌𝑜𝑟𝑘"

SQL

Output

Page 22: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

?

▪ what is its population?

?

▪ New York City

▪ How about China

Relation Ellipsis

Entity Coreference

▪ Where did president of the United States born?

▪ New York City

▪ Where did he graduate from?

Subsequence Coreference

Page 23: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Neural Network-based Semantic Parsing for Conversational KBQA

(MSRA-NLC @ NIPS 2018)

Page 24: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Bing Table QA Bing Knowledge QA

Page 25: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 26: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

1. Big progress was made in question understanding, knowledge

graph and answer extraction. QA for simple question has been

successfully used in search engine.

2. Traditionally, grammar-based semantic parser was applied, but now

encoder-decoder approach becomes mainstreams. But I think that

no conclusion can be made on which method is the best.

3. Context-aware semantic parser, as a vital technologies for

conversational QA, has achieved promising result, but it needs more

efforts in data annotation, modeling and testing.

Page 27: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 28: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 29: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Large Bilingual Corpus

Rich Resource MT(Zh-En, Fr-En, De-En,…)

Low Resource MT(En-He, Fr-Ro, He-Ro,…)

Small or No Bilingual Corpus

Page 30: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

𝒆=(Economic, growth, has, slowed, down, in, recent, years,.)

𝒇=( )经济, 发展, 变, 慢, 了, .近, 几年,

Encoder

Decoder

-0.2 0.9 -0.1 0.50.7 0.0 0.2

Page 31: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

𝒛𝒊

𝒖𝒊

𝒄𝒊

𝒉𝒋

Wo

rdSa

mp

leR

ecu

rren

tSt

ate

Inte

rnal

Se

man

tic

Sou

rce

Vec

tors

Attention Weight

En

cod

er

Atte

ntio

n

Deco

der

近, 几年,

Left-to-Right

Right-to-Left

𝒆=(Economic, growth, has, slowed, down, in, recent, years,.)

Bahdanau et al., ICLR, 2015

𝒇=( )

Page 32: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Bahdanau et al., ICLR, 2015

𝒛𝒊

𝒖𝒊

𝒄𝒊

𝒉𝒋

Wo

rdSa

mp

leR

ecu

rren

tSt

ate

Inte

rnal

Se

man

tic

Sou

rce

Vec

tors

En

cod

er

Atte

ntio

nD

eco

der

发展, 变, 慢, 了, .近, 几年,

⨀ Attention Weight ⊕

经济,

𝒆=(Economic, growth, has, slowed, down, in, recent, years,.)

𝒇=( )

Page 33: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Human-Parity results on WMT 2017

24.0

24.5

25.0

25.5

26.0

26.5

27.0

27.5

28.0

28.5

26.38(Sogou, Ensemble)

BLEU (%)

25.57(Back Translation)

24.2(Transformer Baseline)

26.51(Dual Learning)

27.71(Joint Training)

26.91(Agreement

Regularization)

28.46(System Combination)

27.40(Deliberation Nets)

Page 34: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

News translation results

Source input 他 的 职业 生涯 如 过 山 车 一般 。

NMT output It has been a rollercoaster ride .

Human reference His career is like a roller coaster.

Source input 有线索人士 请 拨打 旧金山 警察局 举报 电话 4 15- 575 - 44 44 。

NMT output For clues, call the San Francisco Police Department at 415-575 - 4444.

Human reference Anyone with information is asked to call the SFPD Tip Line at 415-575-4444 .

• Sampled from WMT2017 Chinese-English task

Source input 霍夫 施泰特尔 表示 : " 这将由检察官来确定 " 。

NMT output " That 's what the prosecutor must determine , " said Hofstetter .

Human reference Mr Hoff Steitel said: "It will be up to the prosecutors to determine.

Page 35: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

News stream data

History Current

Chs

Enu

特金会

the

Trump-Kim

meeting

Burst distribution

Page 36: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

𝑋: Chinese

𝑌: English

𝑍: Japanese

𝑌𝑋

𝑍

Language 𝑍 is the hidden space between language 𝑋 and language 𝑌.

EM training is leveraged for fine tuning.

𝑝 𝑦 𝑥 =

𝑧∈𝑍

𝑝 𝑧 𝑥 𝑝(𝑦|𝑧)

𝑝 𝑥 𝑦 =

𝑧∈𝑍

𝑝 𝑧 𝑦 𝑝(𝑥|𝑧)

𝑌

𝑋

𝑍

Chinese

English

Japanese

𝑝 𝑦 𝑥

𝑝 𝑧 𝑦

𝑥𝑧,

𝑥𝑦 𝑦 𝑧

Joint Training NMT

Word-based MT

Cross-Lingual Embedding

Page 37: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

1. Amazing progress in rich language translation at single-sentence

level, such as news translation.

2. We have witnessed some progress in low-resource language

translation, but it is still at very early stage. I think that it will be the

most important topic in NMT.

3. We need to develop better models for term translation and query

translation.

Page 38: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 39: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.

Image

Ground Truth

Predication

Page 40: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 41: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Video captioning

A car is running A man is cutting

a piece of meatA man is performing

on a stage

A man is riding

a bike

A man is singing A panda is walking A woman is riding

a horse

A man is flying in a field

Page 42: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

What sport is the boy playing?

Visual QA System

question

Image

baseball

answer

Given an image and a related question, predict the most possible answer

VQA dataset (Agrawal et al., 2016), which contains ∼0.25M images, ∼0.76M questions, and ∼10M answers.

Page 43: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Answer Label

“yes” 0

“baseball” 1

“brown” 0

“horse” 0

“grass” 0

“What sport is the boy playing?” RNN

CNN

SoftMaxFusion

Encode Fusion Decode

Attention

Attention

The size of answer candidates is ~3,000.

Page 44: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• Image mis-understanding due to relation missing

Question: what is the girl holding?Ground Truth Answer: racketPredicted Answer: [hand: 0.630] [racket: 0.21] [cone: 0.072] [mouse: 0.004]

If we know (girl, hold, racket) is valid, and (girl, hold, hand) is not valid, we probably can infer “racket” as a better answer.

Page 45: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

girl by cone

girl holding racketgirl is standing

racket is yellow

cone is orange

“what is the girl holding?” RNN

CNN

Fusion SoftMax

Object Classifier

Relation Classifier

Subject Classifier

Lu et al. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering. KDD, 2018.

Training data comes from Visual

Genome dataset (Krishna et al., 2017)

Page 46: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

(A) Context-aware Attention

(B)Relation Mining (C)Fact-aware Attention

(D)Joint Learning“what is the

girl holding?”RNN

CNN

Attention

Relation Mining

Facts Attention

<girl, holding, racket>

Fusion Fusion SoftMax “racket”

Lu et al. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering. KDD, 2018.

By introducing knowledge, we obtain +1.3 improvement (acc@1 ) on VQA dataset, and +2.5 improvement (acc@1) on COCO QA dataset.

Page 47: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 48: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 49: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Image query: Search Engine

Page 50: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 51: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

1. Recent progress of ImageNet and image captioning inspires

researchers to conduct research on multi-modal understanding and

search.

2. VQA is still at very early stage. There are many challenging topics

such as use of knowledge and common sense.

3. There are a few applications in search engine but all are very simple.

We expect that with the technologies continually advance, new

scenarios will appear.

Page 52: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 53: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Machine reading comprehension

Passage (P) Question (Q) Answer (A)+

Tesla later approached Morgan to ask for more funds to build a more powerful transmitter. When asked where all the money had gone, Tesla responded by saying that he was affected by thePanic of 1901, which he (Morgan) had caused. Morgan was shocked by the reminder of his part in the stock market crash and by Tesla’s breach of contract by asking for more funds.

P

Q A Panic of 1901On what did Tesla blame for the loss ofthe initial money?

Read a document (passage) and then answer questions about it

Page 54: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

queryanswer

passage

Dataset # of <question, passage>

pairs

Training 87,599

Dev 10,570

Test (not available

to participants)

> 10K

ImageNet style competition for machine readingcomprehension

Best Resource Paper in EMNLP 2016

Page 55: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Progress of MRC experiments

74.520

75.860

76.920

77.68877.845

78.70678.842 78.926

79.083

81.003

81.685

82.136

82.65082.440 82.5

MSRA

2016.12.6

MSRA

2017.1.20

MSRA

2017.3.7

MSRA

2017.7.2

iFLYTEK

2017.7.25

Salesforce

2017.8.16

Microsoft

Business AI

2017.9.20

MSRA

2017.10.13

iFLYTEK/HIT

2017.10.17

AI2

2017.11.17

MSRA

2017.11.21

MSRA

2017.12.18

MSRA

2018.1.3

Alibaba iDST

2018.1.5

iFLYTEK/HIT

2018.1.22

Human EM Performance: 82.304

Best System EM Scores on SQuAD Machine Reading Comprehension Dataset (Dec. 6, 2016-Jan. 26, 2018)

Surpass Human EM [2018.1.3]

Page 56: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

e.g., ELMo – Embeddings from Language Models

Page 57: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

Passage selection

Answer span

Page 58: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 59: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 60: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 61: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

1. Recently we’ve sees a big progress in MRC, such as SQuAD

evaluation. End-end training, pre-trained models, contextualized

vectors from other tasks contribute to the growth of MRC.

2. MRC will significantly change search engine with better relevance

and accuracy. New scenarios such as manual search are promising.

3. Further developing MRC needs contextual inference supported by

knowledge and common sense. This would be very challenging and

exciting.

Page 62: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

Page 63: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 64: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 65: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

gifts for classmates

cool math games

mickey mouse cartoon

shower chair for elderly

presbyopic glasses

costco hearing aids

groom to bride gifts

tie clips

philips shaver

lipstick color chart

womans ana blouse

Dior Makeup

Age Inference Gender Inference

Page 66: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 67: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

user

Item

(news, blogs, videos)

Feed

User Modeling

Personalization

Text Understanding in User and Item Modeling Knowledge Aware Recommendation Text Generation

Page 68: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase
Page 69: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

1. Personalized recommendation has become the strategically

important for search companies.

2. There are many interesting research topics in user modelling with

various user behavior data, timely recommendation of user-like

contents, providing personalized recommendation comments.

Page 70: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• QA

• Multi-lingual

• Multi-modal

• MRC

• Personalized recommendation

With new advancements of NLP, what new changes will be brought

to search?

Page 71: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

• Knowledge acquisition and representation

• word/sentence embedding, commonsense knowledge, specific domain

knowledge, knowledge graph

• New learning methods

• Multi-task and transfer learning, reinforcement learning, semi and

unsupervised learning for low-resource tasks, reasoning for MRC

• Context modeling

• Multi-turn modeling, context-aware semantic parser, dialogue system

• New search modality

• Conversational search, multi-modal search

• Search results generation and summarization

• Auto-generation of a comprehensive report for certain type of queries

• Feeds

• User modelling, content generation, recommendation, comments

Page 72: What Will Search Engines be Changed by NLP Advancements · Offline (Crawler) Multi-Media Data MM Pages Unstructured Data Web Pages Superfresh Discovery Structured Data Satori/Freebase

A big thanks go to my colleagues and students in MSRA who are

working on various NLP tasks mentioned in this talk, especially Nan

Duan, Shujie Liu, Dongdong Zhang, Furu Wei and Xing Xie.