Deep Learning for Question Answering - Meetupfiles.meetup.com/7616132/DC-NLP-2015-10 Mohit...

50
Deep Learning for Question Answering Mohit Iyyer 1

Transcript of Deep Learning for Question Answering - Meetupfiles.meetup.com/7616132/DC-NLP-2015-10 Mohit...

Deep Learning for Question AnsweringMohit Iyyer

1

2

Who wrote the song “Kiss from a Rose”?

Question Analysis: POS/Parsing/NER

Query Formulation/ Template Extraction

Knowledge Base Search/ Candidate Answer Generation

Answer Type Selection

Evidence Retrieval/ Candidate Scoring

Final Ranking

Seal

3

Neural Network

External Knowledge

Classifier

Who wrote the song “Kiss from a Rose”?

Seal

Can we replace all of these modules with a single neural network?

Outline• Briefly: deep learning + NLP basics

• Factoid QA

• Reasoning-based QA

• Visual QA

• Future directions!

4

Neural Networks for NLP

5

Let’s start with words• Represent words by low-dimensional vectors called

embeddings

• e.g., president [0.23, 1.3, -0.3, 0.43]

6

Computing Vectors for Questions• How do we compose word embeddings into vectors

that capture the meanings of questions?

Who wrote Macbeth ? g( ) =

7

Neural Net!

Recurrent Neural Networks

Who directed Predator ?c1 c2 c3 c4

8

softmax: predict answer

h1 = f(W

. . .c1

�) h2 = f(W

h1

c2

�) h3 = f(W

h2

c3

�) h4 = f(W

h3

c4

�)

More complex variants: LSTMs, GRUs

Hidden layer

Recursive Neural Networks

• g can also depend on a parse tree of the input text sequence

Who directed ?c1 c2 c3 c4

softmax: predict answer

9

Predator

h1 = f(W

c2c3

�)

h2 = f(W

h1

c4

�)

h3 = f(W

c1h2

�)

Deep Averaging Networks

Who directed Predator ?c1 c2 c3 c4

av =4X

i=1

ci4

softmax: predict answer

z1 = f(W1 · av)

z2 = f(W2 · z1)

10

Softmax Answer Classification• Multinomial logistic regression

• Output is a distribution over a finite set of answers

• Later on: a max-margin answer ranking approach can yield better results

11

yp = softmax(Wans · hq)

softmax(q) =exp q

Pkj=1 exp qj

How do we train these models?

• Model parameters learned through variants of backpropagation (Rumelhart et al., 1986; Goller and Kuchler, 1996) given QA pairs as input

• In theory, use the chain rule to compute partial derivatives of the error function with respect to every parameter

• In practice, use Theano (or Torch) and never have to compute any derivatives by hand!

12

Application 1: Quiz Bowl

13

Factoid QA• Given a description of an entity, identify the person,

place, or thing discussed.

• Neural nets never previously applied to this task

• Traditionally approached using information retrieval, querying huge knowledge bases for the answer

14

Quiz BowlThis creature has female counterparts named Penny and Gown.

15

This creature appears dressed in Viking armor and carrying an ax when he is used as the mascot of PaX, a least privilege protection patch.This creature’s counterparts include Daemon on the Berkeley Software Distribution, or BSD.For ten points, name this mascot of the Linux operating system, a penguin whose name refers to formal male attire.

Answer: Tux

16

Neural Network Classifier

Identify this mascot of Linux… Tux

Simple Approach!

Two Neural Models

• Dependency-tree recursive neural network (DT-RNN)

• Deep averaging network (DAN)

• Both models are initialized with pretrained word2vec embeddings and have the same hidden layer dimensionality for fair comparison

17

Experimental Datasets

• History: 4,415 QA pairs with 16,895 sentences and 451 unique answers

• Literature: 5,685 QA pairs with 21,549 sentences and 595 unique answers

18

Choosing an Error Function

• Answers can appear as part of question text (e.g., a question on World War II might mention the Battle of the Bulge and vice versa)

• Instead of using a softmax output layer, can we take advantage of these co-occurrences by modeling answers and questions in the same vector space?

19

Max-Margin Objective• Replace softmax output layer with a contrastive max-

margin function

• Given a question q with correct answer a and an incorrect answer b, the loss is

20

max(0, 1� xa · hq + xb · hq)

Geometric Intuition

21

Q: name this fascist leader of Italy

Benito Mussolini

Rihanna

Walt Disney

Maximize inner productMinimize inner product

Experiments

22

History Literature

Model Pos 1 Pos 2 Full Pos 1 Pos 2 Full

BOW-DT 35.4 57.7 60.2 24.4 51.8 55.7IR 37.5 65.9 71.4 27.4 54.0 61.9DAN 46.4 70.8 71.8 35.3 67.9 69.0DT-RNN 47.1 72.1 73.7 36.4 68.2 69.1

DAN is 20 times faster to train than DT-RNN

Learning a Vector Space

23

TSNE-1

TSNE-2

Wars, rebellions, and battlesU.S. presidentsPrime ministersExplorers & emperorsPoliciesOther

tammany_hall

calvin_coolidge

lollardy

fourth_crusade

songhai_empire

peace_of_westphalia

inca_empire

atahualpa

charles_sumner

john_paul_jones

wounded_knee_massacre

huldrych_zwingli

darius_i

battle_of_ayacucho

john_cabotghana

ulysses_s._grant

hartford_conventioncivilian_conservation_corps

roger_williams_(theologian)

george_h._pendleton

william_mckinley

victoria_woodhull

credit_mobilier_of_america_scandal henry_cabot_lodge,_jr.

mughal_empire

john_marshall

cultural_revolution

guadalcanal

louisiana_purchase

night_of_the_long_knives

chandragupta_maurya

samuel_de_champlain

thirty_years'_war

compromise_of_1850

battle_of_hastings

battle_of_salamis

akbar

lewis_cass

dawes_plan

hernando_de_soto

carthage

joseph_mccarthy

maine

salvador_allende

battle_of_gettysburg

mikhail_gorbachev

aaron_burr

equal_rights_amendment

war_of_the_spanish_succession

coxey's_army

george_meade

fourteen_points

mapp_v._ohiosam_houston

ming_dynasty

boxer_rebellion

anti-masonic_party

porfirio_diaz

treaty_of_portsmouth

thebes,_greece

golden_horde

francisco_i._madero

hittites

james_g._blaineschenck_v._united_states

caligula

william_walker_(filibuster)

henry_vii_of_england

konrad_adenauer

kellogg-briand_pact

battle_of_culloden

treaty_of_brest-litovsk

william_penn

a._philip_randolph

henry_l._stimson

whig_party_(united_states)

caroline_affairclarence_darrow

whiskey_rebellion

battle_of_midway

battle_of_lepanto

adolf_eichmann

georges_clemenceau

battle_of_the_little_bighornpontiac_(person)

black_hawk_war

battle_of_tannenberg

clayton_antitrust_act

provisions_of_oxford

battle_of_actium

suez_crisis

spartacus

dorr_rebellion

jay_treaty

triangle_shirtwaist_factory_fire

kamakura_shogunate

julius_nyerere

frederick_douglass

pierre_trudeau

nagasaki

suleiman_the_magnificent

falklands_war

war_of_devolution

charlemagne

daniel_boone

edict_of_nantes

harry_s._truman

shaka

pedro_alvares_cabral

thomas_hart_benton_(politician)

battle_of_the_coral_sea

peterloo_massacre

battle_of_bosworth_field

roger_b._taney

bernardo_o'higgins

neville_chamberlain

henry_hudson

cyrus_the_great

jane_addams

rough_riders

james_a._garfield

napoleon_iii

missouri_compromise

battle_of_leyte_gulf

ambrose_burnside

trent_affair

maria_theresa

william_ewart_gladstone

walter_mondale

barry_goldwaterlouis_riel

hideki_tojo

marco_polo

brian_mulroney

truman_doctrine

roald_amundsen

tokugawa_shogunate

eleanor_of_aquitaine

louis_brandeis

battle_of_trenton

khmer_empire

benito_juarez

battle_of_antietam

whiskey_ring

otto_von_bismarck

booker_t._washington

battle_of_bannockburneugene_v._debs

erie_canal

jameson_raid

green_mountain_boys

haymarket_affair

finland

fashoda_incident

battle_of_shiloh

hannibal

john_jay

easter_rising

jamaica

brook_farm

umayyad_caliphate

muhammad

francis_drake

clara_barton

shays'_rebellionverdun

hadrianvyacheslav_molotovoda_nobunaga

canossa

samuel_gompers

battle_of_bunker_hillbattle_of_plassey

david_livingstone

solonpericles

tang_dynasty

teutonic_knights

second_vatican_council

alfred_dreyfus

henry_the_navigator

nelson_mandela

peasants'_revolt

gaius_marius

getulio_vargas

horatio_gates

john_t._scopes

league_of_nations

first_battle_of_bull_run

alfred_the_great

leonid_brezhnev

cherokee

long_march

emiliano_zapata

james_monroe

woodrow_wilson

vandals

william_henry_harrison

battle_of_puebla

battle_of_zama

justinian_i

thaddeus_stevens

cecil_rhodes

kwame_nkrumah

diet_of_worms

george_armstrong_custer

battle_of_agincourt

seminole_wars

shah_jahan

amerigo_vespucci

john_foster_dulles

lester_b._pearson

oregon_trail

claudius

lateran_treaty

chester_a._arthur

opium_wars

treaty_of_utrechtknights_of_labor

alexander_hamilton

plessy_v._ferguson

horace_greeley

mary_baker_eddy

alexander_kerensky

jacquerie

treaty_of_ghentbay_of_pigs_invasion

antonio_lopez_de_santa_anna

great_northern_war

henry_i_of_england

council_of_trent

chiang_kai-shek

samuel_j._tilden

fidel_castro

wilmot_proviso

yuan_dynasty

bastille

benjamin_harrison

war_of_the_austrian_successioncrimean_war

john_brown_(abolitionist)

teapot_dome_scandal

albert_b._fall

marcus_licinius_crassus

earl_warren

warren_g._harding

gunpowder_plot

homestead_strike

samuel_adams

john_peter_zenger

thomas_paine

free_soil_party

st._bartholomew's_day_massacre

arthur_wellesley,_1st_duke_of_wellington

charles_de_gaulle

leon_trotsky

hugh_capet

alexander_h._stephens

haile_selassie

william_h._seward

rutherford_b._hayes

safavid_dynasty

muhammad_ali_jinnah

kulturkampf

maximilien_de_robespierre

hubert_humphrey

luddite

hull_house

philip_ii_of_macedon

guelphs_and_ghibellines

byzantine_empire

albigensian_crusade

diocletian

fort_ticonderoga

parthian_empire

charles_martel

william_jennings_bryan

alexander_ii_of_russia

ferdinand_magellan

state_of_franklin

ivan_the_terrible

martin_luther_(1953_film)

millard_fillmore

francisco_franco

aethelred_the_unready

ronald_reagan

benito_mussolini

henry_clay

kitchen_cabinet

black_hole_of_calcutta

ancient_corinth

john_wilkes_booth

john_tyler

robert_walpole

huey_long

tokugawa_ieyasu

thomas_nast

nikita_khrushchev

andrew_jackson

portugal

labour_party_(uk)

monroe_doctrine

john_quincy_adams

congress_of_berlin

tecumseh

jacques_cartier

battle_of_the_thames

spanish_civil_war

ethiopia

fugitive_slave_laws

john_a._macdonald

council_of_chalcedon

pancho_villa

war_of_the_pacific

george_wallace

susan_b._anthony

marcus_garvey

grover_clevelandjohn_hay

george_b._mcclellan

october_manifesto

vitus_bering

john_hancock

william_lloyd_garrison

platt_amendment

mary,_queen_of_scots

first_triumvirate

francisco_vasquez_de_coronado

margaret_thatcher

sherman_antitrust_act

hanseatic_league

henry_morton_stanley

july_revolution

stephen_a._douglas

xyz_affair

jimmy_carter

francisco_pizarro

kublai_khan

vasco_da_gama

sparta

battle_of_caporetto

ostend_manifesto

mustafa_kemal_ataturk

peter_the_great

gang_of_four

battle_of_chancellorsville

david_lloyd_george

cardinal_mazarin

embargo_act_of_1807

brigham_young

charles_lindbergh

hudson's_bay_company

attila

paris_commune

jefferson_davis

amelia_earhart

mali_empire

adolf_hitler

benedict_arnold

camillo_benso,_count_of_cavour

meiji_restoration

black_panther_party

mark_antony

franklin_pierce

molly_maguires

zachary_taylor

han_dynasty

adlai_stevenson_ii

james_k._polk

douglas_macarthur

boston_massacre

toyotomi_hideyoshi

greenback_party

second_boer_warthird_crusade

james_buchanan

john_sherman

george_washington

wars_of_the_roses

atlantic_charter

eleanor_roosevelt

congress_of_vienna

john_wycliffe

winston_churchill

emilio_aguinaldo

miguel_hidalgo_y_costilla

second_bank_of_the_united_states

council_of_constance

seneca_falls_convention

first_crusade

spiro_agnew

taiping_rebellion

mao_zedong

paul_von_hindenburg

albany_congress

jawaharlal_nehru

battle_of_blenheim

ethan_allen

antonio_de_oliveira_salazar

herbert_hoover

pepin_the_short

indira_gandhi

william_howard_taftthomas_jefferson

gamal_abdel_nasser

oliver_cromwell

salmon_p._chase

battle_of_austerlitz

benjamin_disraeli

gadsden_purchase

girolamo_savonarola

treaty_of_tordesillas

battle_of_marathon

elizabeth_cady_stanton

battle_of_kings_mountainchristopher_columbus

william_the_conqueror

battle_of_trafalgar

charles_evans_hughes

cleisthenes

william_tecumseh_sherman

mobutu_sese_seko

prague_spring

babur

peloponnesian_war

jacques_marquette

nero

paraguay

hyksos

martin_van_buren

bonus_army

charles_stewart_parnell

edward_the_confessor

bartolomeu_dias

salem_witch_trials

battle_of_the_bulge

john_adams

maginot_line

henry_cabot_lodge

giuseppe_garibaldi

daniel_webster

john_c._calhoun

treaty_of_waitangi

zebulon_pike

genghis_khan

calvin_coolidgewilliam_mckinley

james_monroe

woodrow_wilson

william_henry_harrison

benjamin_harrison

millard_fillmore

ronald_reagan

john_tyler andrew_jacksonjohn_quincy_adams

grover_cleveland

jimmy_carter

franklin_pierce

zachary_taylor

james_buchanan

george_washington

herbert_hooverwilliam_howard_taft

thomas_jefferson

martin_van_buren

john_adams

Exact String Matches• One current weakness of neural models

• Practical solution: train language model on original source material / Wikipedia and combine with output of neural network

24

In this poem, the narrator meets a “traveller from an antique land” who tells of a statue with a “wrinkled lip,

and sneer of cold command”.

QA: Man vs. Machine• Scaled up a DAN to handle ~100k Q/A pairs with

~5k unique answers! Also added thousands of Wikipedia sentence/page-title pairs

• To play against humans, we need to decide not only what answer to give but also when we are confident enough to buzz in. • Another classifier re-ranks the top 200 guesses of

the DAN using language model features to decide whether to buzz on any of them or wait for more clues

25

V1: tied team of ex-Jeopardy champions 200-200

26

V2: defeated Ken Jennings 300-160

27

Code available!

DT-RNN code: cs.umd.edu/~miyyer/qblearn DAN code: github.com/miyyer/dan Full quiz bowl system code: github.com/miyyer/qb Video of Ken Jennings match: youtu.be/kTXJCEvCDYk

1. Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daumé III. A Neural Network for Factoid Question Answering over Paragraphs. EMNLP 2014.

2. Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. ACL 2015.

28

Application 2: Reasoning-based QA

29

30

John moved to the bedroom. Mary grabbed the football there. Sandra journeyed to the bedroom. Sandra went back to the hallway. Mary moved to the garden. Mary journeyed to the office.

Where is the football?

Naïve Neural Approach

31

softmax: predict answer

image: http://smerity.com/articles/2015/keras_qa.html

Problems• Doesn’t scale to long / complex question types

• RNNs/LSTMs are very bad at remembering facts from the distant past!

• Solution: add an external memory component that learns to store important facts and reason about them

32

Dynamic Memory Networks• Collaboration with Richard Socher and colleagues

from MetaMind

• Extends simple RNNs with an iterative attention mechanism that focuses on one fact at a time and enables transitive reasoning

33

Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, and Richard Socher. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. NIPS Deep Learning Symposium, 2015.

34

1. Compute vector si for every sentence in input and vector q for the question using recurrent network A

2. Compute an attention score ai for every sentence

3. Compute an episodic memory mt by weighting each si with its corresponding ai and passing them through another recurrent network B

4. Repeat until network B outputs a “finished reading” signal

5. Feed final episodic memory m to a softmax layer to predict the answer

ai = G(si,mt�1, q)

35

John moved to the bedroom. Mary grabbed the football there.

Sandra journeyed to the bedroom. Sandra went back to the hallway.

Mary moved to the garden. Mary journeyed to the office.

Where is the football?

m1

0.10.10.00.00.80.0

0.20.60.00.00.20.0

m2

0.70.20.00.00.10.0

m3m3

softmax: predict answer

Evaluation: FB bAbi• 20 very simple tasks (e.g., counting, basic deduction,

induction, coreference)

• DMNs solve 18 out of 20 tasks with over 95% accuracy, comparable to other baselines that use hand-engineered features (e.g., n-grams, positional features)

• Can also be applied to many other NLP tasks (what is the sentiment of this sentence? what is this sentence’s translation in French?)

36

Application 3: Visual QA

37

38

• Is this truck considered “vintage”?

• Does the road look new? • What kind of tree is

behind the truck?

VisualQA Dataset• collaboration between Virginia Tech and Microsoft

Research

• questions were created and answered by Amazon Turkers (800k questions on 250k images)

39

“We have built a smart robot. It understands a lot about images. It can recognize and name all

the objects, it knows where the objects are, it can recognize the scene (e.g., kitchen, beach), people’s expressions and

poses, and properties of objects (e.g., color of objects, their texture). Your task is to stump this smart robot!”

Brief Aside: ConvNets

40

Convolutional Layers: slide a set of small

filters over the image

Pooling Layers: reduce dimensionality

of representation

image: https://cs231n.github.io/convolutional-networks/

41

( )ConvNet =

softmax: predict ‘truck’

Naïve VisualQA• i = ConvNet(image) > use an existing network trained

for image classification and freeze weights

• q = RNN(question) > learn weights

• answer = softmax([i;q])

42

Visual Attention• Use the question representation q to determine

where in the image to look

43

How many benches are shown?

44

How many benches are shown?

0.2

0.20.3

0.1

0.05

0.05

0.05 0.05 0.0

0.00.0

0.0 0.0 0.0

0.0

0.0 0.0 0.0

0.00.00.00.00.00.0

softmax: predict answer

attention over final convolutional layer in network: 196 boxes, captures

color and positional information

Issues• Visual attention is more complicated than textual

attention; requires many more QA pairs than are currently available

• focusing on more than one “box” at a time is difficult for the current model; perhaps an iterative attention mechanism like the DMN’s can solve this problem

• Work in progress, full evaluation coming soon!

45

Closing Thoughts

46

Future Trends• Neural networks with attention mechanisms are

cutting-edge models with broad applications!

• With more data and bigger networks, we can begin to answer more complex questions

• Multi-task learning, such as a single model that learns to reason over both text and images

47

A Major Limitation• All of these networks generalize very poorly to new

facts or information at test-time, would fail at:

48

xxwf moved to the rfecs. dawas grabbed the gndsa there. gfdg journeyed to the klnmkb. gfdg went back to the aqqs. dawas moved to the mnsh. dawas journeyed to the taaaed.

Where is the gndsa?

Constants vs. Variables• Currently, every word in a question is represented

with an embedding.

• This doesn’t make much sense for numbers, proper names, or other entities

49

An amusement park sells 2 kinds of tickets. Tickets for children cost $1.50. Adult tickets cost $4. On a certain day, 278 people entered the park. On that same day the admission fees collected totaled $792.

How many children were admitted on that day?

Thanks! Questions?

And thanks to my advisors, Jordan Boyd-Graber at U. Colorado and Hal Daumé III at UMD, and to Richard

Socher and colleagues from MetaMind.

50