
New Resources and Ideas for Semantic Parsing

Kyle Richardson

Institute for Natural Language Processing (IMS)

School of Computer Science, Electrical Engineering and Information Technology, University of Stuttgart

May 28, 2018

Collaborators: Jonas Kuhn (advisor, Stuttgart) and Jonathan Berant (work on "polyglot semantic parsing", Tel Aviv)

Main Topic: Semantic Parsing

▶ Task: mapping text to formal meaning representations (e.g., from Herzig and Berant (2017)).

Text: Find an article with no more than two authors. →

LF: Type.Article ⊓ R[λx.count(AuthorOf.x)] ≤ 2

"Machines and programs which attempt to answer English questions have existed for only about five years.... Attempts to build machines to test logical consistency date back to at least Ramon Lull in the thirteenth century... Only in recent years have attempts been made to translate mechanically from English into logical formalisms..."

R.F. Simmons. 1965. Answering English Questions by Computer: A Survey. Communications of the ACM.

2


Classical Natural Language Understanding

▶ Conventional pipeline model: focus on capturing deep inference and entailment.

[Figure: the Lunar QA system of Woods (1973). 1. Semantic Parsing maps the input "List samples that contain every major element" to sem; 2. Knowledge Representation: sem = (FOR EVERY X / MAJORELT : T ; (FOR EVERY Y / SAMPLE : (CONTAINS Y X) ; (PRINTOUT Y))); 3. Reasoning: ⟦sem⟧_database = {S10019, S10059, ...}.]


Sub-problem                    Problem Description
1. Semantic Parsing            Translating input to sem, input → sem
2. Knowledge Representation    Defining a sufficiently expressive sem language.
3. Reasoning/Execution         Going from sem to denotations in the real world.

3

Why and How? Analogy with Compiler Design

[Figure: analogy between the NLU pipeline (NL text → Syntax → Logic/Semantics → Model → Programs, via translation and interpretation) and a compiler: the statement pos := i + rate * 60 is lexed to id1 := id2 + id3 * 60, parsed into a syntax tree, mapped to a semantic tree (with an int2real coercion of 60), and finally generated as machine code (MOVF/MULF/ADDF instructions).]

▶ An NLU model is a kind of compiler: it involves a transduction from NL to a formal (usually logical) language.

4

Data-driven Semantic Parsing and NLU

[Figure: the same pipeline as above: input "List samples that contain every major element" → 1. Semantic Parsing → 2. Knowledge Representation (the FOR EVERY ... logical form, here produced by machine learning) → 3. Reasoning → ⟦sem⟧_database = {S10019, S10059, ...}.]

▶ Data-driven NLU asks an empirical question: can we learn NLU models from examples? Building an NL compiler by hand is hard....


▶ Semantic Parser Induction: Learn a semantic parser (a weighted transduction) from parallel text/meaning data; a constrained SMT task.

▶ Resource Problem: Where does the parallel data come from, and what do we learn from? It does not occur 'in the wild'.

5

Talk Overview

Sub-problem                    Problem Description
1. Semantic Parsing            Translating input to sem, input → sem
2. Knowledge Representation    Defining a sufficiently expressive sem language.
3. Reasoning/Execution         Going from sem to denotations in the real world.
4. Resource Problem            Finding/building parallel data for SP induction.

▶ Resource Problem: Using technical documentation as a parallel corpus (problems 1, 4), from Richardson and Kuhn (2017b,a)

▶ Polyglot Modeling: Building semantic parsers from multiple datasets, "polyglot decoding" (problems 1, 4), from Richardson et al. (2018)

▶ Learning from Entailment: Integrating entailment into a semantic parsing pipeline (problems 1, 2, 3), from Richardson and Kuhn (2016).

6

<Resource Problem>

7

Semantic Parsing and Parallel Data

input x: What state has the largest population?
sem z: (argmax (λx.(state x) λx.(population x)))

▶ Learning from LFs: Pairs of text x and logical forms z, D = {(x, z)_i}^n_{i=1}; learn sem : x → z
▶ Modularity: Study the translation independently of other semantic issues.
▶ Resource issue: Finding parallel data; there is a current lack of resources.

8


Source Code and API Documentation

/**
 * Returns the greater of two long values
 * @param a an argument
 * @param b another argument
 * @return the larger of a and b
 * @see java.lang.Long#MAX_VALUE
 */
public static Long max(long a, long b)

▶ Source Code Documentation: High-level descriptions of internal software functionality paired with code.

▶ Idea: Treat it as a parallel corpus (Allamanis et al., 2015; Gu et al., 2016; Iyer et al., 2016), or as a synthetic semantic parsing dataset (see the extraction sketch below).
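To make the parallel-corpus idea concrete, here is a minimal sketch (not the extraction pipeline used in the papers) that pulls (description, signature) pairs out of Python source with the standard ast module; the docstring's first line plays the role of the text x and a normalized header the role of the target z:

    import ast

    def extract_pairs(source, module="mod"):
        """Yield (description, signature) pairs from Python source.

        A rough sketch: the first line of each docstring is the natural-language
        side x, and a normalized header string is the target z.
        """
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                doc = ast.get_docstring(node)
                if not doc:
                    continue                       # skip undocumented functions
                description = doc.strip().splitlines()[0]
                args = ", ".join(a.arg for a in node.args.args)
                yield description, f"{module}.{node.name}({args})"

    src = '''
    def random_sample(prob, coll):
        """Returns items from coll with random probability of prob (0.0 - 1.0)."""
        return [x for x in coll if prob > 0.5]
    '''
    for x, z in extract_pairs(src, module="core"):
        print(x, "->", z)
    # Returns items from coll with random probability of prob (0.0 - 1.0). -> core.random_sample(prob, coll)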

9


Source Code as a Parallel Corpus

▶ Observation 1: Tight coupling between high-level text and code.

/**
 * Returns the greater of two long values
 * @param a an argument
 * @param b another argument
 * @return the larger of a and b
 * @see java.lang.Long#MAX_VALUE
 */
public static Long max(long a, long b)

(ns ... clojure.core)

(defn random-sample
  "Returns items from coll with random probability of prob (0.0 - 1.0)"
  ([prob] ...)
  ([prob coll] ...))

▶ Function signatures: Header-like representations, containing the function name, (optionally typed) arguments, (optional) return value, and namespace. Example pairs:

Returns the greater of two long values → math.util Long max(long a, long b)
Returns items from coll with random... → (core.random-sample prob coll)
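As a rough illustration only (the class and field names here are hypothetical, not the datasets' actual format), such a header-like target representation might be modelled as:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class FunctionSignature:
        """Header-like target representation: name, arguments, return value, namespace."""
        name: str
        namespace: str                        # e.g. "java.lang.Math" or "clojure.core"
        args: List[Tuple[str, Optional[str]]] = field(default_factory=list)  # (name, optional type)
        returns: Optional[str] = None         # optional return value/type

        def as_tokens(self) -> List[str]:
            """Flatten to the token sequence used as the target side z."""
            toks = self.namespace.split(".") + [self.name]
            for arg, typ in self.args:
                toks += [typ, arg] if typ else [arg]
            if self.returns:
                toks.append(self.returns)
            return toks

    sig = FunctionSignature("max", "java.lang.Math",
                            args=[("a", "long"), ("b", "long")], returns="long")
    print(" ".join(sig.as_tokens()))   # java lang Math max long a long b long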

10


Source Code: Many Formal Languages

▶ Observation 2: There are many languages, hence many datasets.

/**
 * Returns the greater of two long values
 * @param a an argument
 * @param b another argument
 * @return the larger of a and b
 * @see java.lang.Long#MAX_VALUE
 */
public static Long max(long a, long b)

(ns ... clojure.core)
(defn random-sample
  "Returns items from coll with random probability of prob (0.0 - 1.0)"
  ([prob] ...)
  ([prob coll] ...))

# zipfile.py
"""Read and write ZIP files"""
class ZipFile(object):
    """Class to open ... zip files."""
    def write(filename, arcname, ...):
        """Put the bytes from filename into the archive under the name..."""

-- | Mostly functions for reading and showing RealFloat-like values
module Numeric
-- | Show non-negative Integral numbers in base 10.
showInt :: Integral a => a -> ShowS

11

Source Code: Multilingual

▶ Observation 3: Many NLs, hence many multilingual datasets.

namespace ArrayIterator;
/**
 * Appends values as the last element
 * @param value The value to append
 * @see ArrayIterator::next()
 */
public void append(mixed $value)

The same entry appears in French ("Ajoute une valeur comme dernier element": appends a value as the last element), Russian (transliterated: "Dobavlyaet znachenie value, kak posledni element massiva": appends the value value as the last element of the array), and Spanish ("Anade el valor como el ultimo elemento": appends the value as the last element), each paired with the same signature public void append(mixed $value).

12

Beyond raw pairs: Background Information

▶ Observation 4: Code collections contain a rich amount of background info.

NAME
  dappprof - profile user and lib function usage.
SYNOPSIS
  dappprof [-ac..] .. -p PID | command
DESCRIPTION
  -a        print all data
  -p PID    examine the PID
EXAMPLES
  Run and examine the ``df -h'' command:   dappprof command=``df -h''
  Print elapsed time for PID 1871:         dappprof -p PID=1871
SEE ALSO
  dapptrace(1M), dtrace(1M), ...

namespace ArrayIterator;
/**
 * Appends values as the last element
 * @param value The value to append
 * @see ArrayIterator::next()
 */
public void append(mixed $value)

▶ Descriptions: textual descriptions of parameters, return values, ...
▶ Cluster information: pointers to related functions/utilities, ...
▶ Syntactic information: function/code syntax

13

Resource 1: Standard Library Documentation

Dataset    #Pairs  #Descr.  #Symbols  #Words  Vocab.  Example pair (x, z); goal: learn a function x → z
Java        7,183    4,804     4,072  82,696   3,721  x: Compares this Calendar to the specified Object.
                                                      z: boolean util.Calendar.equals(Object obj)
Ruby        6,885    1,849     3,803  67,274   5,131  x: Computes the arc tangent given y and x.
                                                      z: Math.atan2(y,x) → Float
PHP (en)    6,611   13,943     8,308  68,921   4,874  x: Delete an entry in the archive using its name.
                                                      z: bool ZipArchive::deleteName(string $name)
Python      3,085      429     3,991  27,012   2,768  x: Remove the specific filter from this handler.
                                                      z: logging.Filterer.removeFilter(filter)
Elisp       2,089    1,365     1,883  30,248   2,644  x: Returns the total height of the window.
                                                      z: (window-total-height window round)
Haskell     1,633      255     1,604  19,242   2,192  x: Extract the second component of a pair.
                                                      z: Data.Tuple.snd :: (a, b) -> b
Clojure     1,739        –     2,569  17,568   2,233  x: Returns a lazy seq of every nth item in coll.
                                                      z: (core.take-nth n coll)
C           1,436    1,478     1,452  12,811   1,835  x: Returns current file position of the stream.
                                                      z: long int ftell(FILE *stream)
Scheme      1,301      376     1,343  15,574   1,756  x: Returns a new port and the given state.
                                                      z: (make-port port-type state)
Geoquery      880        –       167   6,663     279  x: What is the tallest mountain in America?
                                                      z: (highest(mountain(loc_2(countryid usa))))

▶ Standard library documentation for 9+ programming languages and 7 natural languages, from Richardson and Kuhn (2017b).

14

Resource 2: Open source Python projects

Project       # Pairs  # Symbols  # Words  Vocab.
scapy             757      1,029    7,839   1,576
zipline           753      1,122    8,184   1,517
biopython       2,496      2,224   20,532   2,586
renpy             912        889   10,183   1,540
pyglet          1,400      1,354   12,218   2,181
kivy              820        861    7,621   1,456
pip             1,292      1,359   13,011   2,201
twisted         5,137      3,129   49,457   4,830
vispy           1,094      1,026    9,744   1,740
orange          1,392      1,125   11,596   1,761
tensorflow      5,724      4,321   45,006   4,672
pandas          1,969      1,517   17,816   2,371
sqlalchemy      1,737      1,374   15,606   2,039
pyspark         1,851      1,276   18,775   2,200
nupic           1,663      1,533   16,750   2,135
astropy         2,325      2,054   24,567   3,007
sympy           5,523      3,201   52,236   4,777
ipython         1,034      1,115    9,114   1,771
orator            817        499    6,511     670
obspy           1,577      1,861   14,847   2,169
rdkit           1,006      1,380    9,758   1,739
django          2,790      2,026   31,531   3,484
ansible         2,124      1,884   20,677   2,593
statsmodels     2,357      2,352   21,716   2,733
theano          1,223      1,364   12,018   2,152
nltk            2,383      2,324   25,823   3,151
sklearn         1,532      1,519   13,897   2,115

▶ 27 Python projects from Github, from Richardson and Kuhn (2017a), similar to Barone and Sennrich (2017).

15

Summary of Current Resources

▶ API Datasets: the Stdlib collection and Py27, consisting of 45 APIs across 11 programming languages and 8 natural languages.

▶ Other Resources: Function Assistant, a tool for extracting parallel datasets from Python projects.

▶ Forthcoming: around 460 Python/Java API datasets for data-to-text generation (Richardson et al., 2017).

https://github.com/yakazimir/Code-Datasets

16

Text to Signature Translation: How hard is it?

▶ Task: For each API dataset of text/signature pairs (each within a finite signature space), learn a sp: text → signature.
▶ Question: Can background info. from the API help?

▶ SMT Baseline: (Deng and Chrupała, 2014), a sequence prediction model.

[Figure: the translation pipeline. The input "Gets the total cache size" is translated by the SMT model into a k-best signature translation list (string APCIterator::key(void), int APCIterator::getTotalHits(void), int APCIterator::getSize(void), int APCIterator::getTotalSize(void) ✓, int Memcached::append(string $key), ...); a task-specific decoder / discriminative model reranks this list, placing int APCIterator::getTotalSize(void) first; both lists are evaluated with Accuracy@i.]

17


Background API Information

▶ Reranker Model: See-also annotations, abstract syntax info., parameter descriptions, ...

x: Returns the hyperbolic cosine of arg
z: function float cosh(float $arg)        see also: {cos, acosh, sinh}

φ(x, z) = [ Model score: is it in the top 5..10?;
            Alignments: (hyperbolic, cosh), (cosine, cosh), ...;
            Phrases: (hyperbolic cosine, cosh), (of arg, float $arg), ...;
            See-also classes: (hyperbolic, {cos, acosh, sinh, ...}), ...;
            In descriptions: (arg, $arg);
            Matches/Tree position: ... ]
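A hedged sketch of what such a feature function might look like; the feature names mirror the slide, but the concrete implementation (tokenization, the see-also map, the crude lexical match) is an assumption, not the paper's code:

    def rerank_features(x_tokens, z_tokens, see_also, alignments, base_rank):
        """Toy feature map phi(x, z) for discriminative reranking.

        x_tokens:   description words, e.g. ["returns", "the", "hyperbolic", "cosine", "of", "arg"]
        z_tokens:   signature tokens,  e.g. ["float", "cosh", "float", "$arg"]
        see_also:   symbol -> see-also cluster, e.g. {"cosh": {"cos", "acosh", "sinh"}}
        alignments: (word, symbol) pairs proposed by the base SMT model
        base_rank:  rank of z in the base model's k-best list
        """
        feats = {"in_top_5": float(base_rank < 5), "in_top_10": float(base_rank < 10)}
        for w, s in alignments:                      # word/symbol alignment indicators
            feats[f"align={w}|{s}"] = 1.0
        for s in z_tokens:                           # see-also cluster indicators
            for related in see_also.get(s, ()):
                if any(w.startswith(related[:3]) for w in x_tokens):
                    feats[f"seealso={related}|{s}"] = 1.0
        feats["word_overlap"] = float(len(set(x_tokens) & set(z_tokens)))
        return feats

    phi = rerank_features(["returns", "the", "hyperbolic", "cosine", "of", "arg"],
                          ["float", "cosh", "float", "$arg"],
                          {"cosh": {"cos", "acosh", "sinh"}},
                          [("hyperbolic", "cosh"), ("cosine", "cosh")], base_rank=3)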

18

Text to Signature Translation: How hard is it?

[Figure: the same translation and reranking pipeline as above, shown again for reference.]

Dataset (Avg.)            Term Matching         SMT                   SMT + Reranker
                          @1    @10   MRR       @1    @10   MRR       @1    @10   MRR
Standard Library Docs.   12.7  32.0  19.2      28.9  67.7  41.9      31.1  71.1  44.5
Py27                     22.9  50.6  32.4      29.3  67.4  42.5      32.4  73.5  46.5

(Columns within each method: accuracy @1, accuracy @10, MRR.)
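For reference, the three numbers reported per method (accuracy @1, accuracy @10, MRR) can be computed from the k-best lists roughly as follows (a small sketch, not the actual evaluation script):

    def evaluate(kbest_lists, golds, k=10):
        """Accuracy@1, Accuracy@k and MRR over ranked candidate signature lists."""
        acc1 = acck = rr = 0.0
        for candidates, gold in zip(kbest_lists, golds):
            if candidates and candidates[0] == gold:
                acc1 += 1
            if gold in candidates[:k]:
                acck += 1
            if gold in candidates:
                rr += 1.0 / (candidates.index(gold) + 1)   # reciprocal rank of the gold signature
        n = float(len(golds))
        return acc1 / n, acck / n, rr / n

    ranked = [["string APCIterator::key(void)",
               "int APCIterator::getTotalSize(void)"]]     # one example's k-best list
    gold = ["int APCIterator::getTotalSize(void)"]
    print(evaluate(ranked, gold))   # (0.0, 1.0, 0.5)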

19

What do these results mean?

Dataset (Avg.)            Term Matching         SMT                   SMT + Reranker
                          @1    @10   MRR       @1    @10   MRR       @1    @10   MRR
Standard Library Docs.   12.7  32.0  19.2      28.9  67.7  41.9      31.1  71.1  44.5
Py27                     22.9  50.6  32.4      29.3  67.4  42.5      32.4  73.5  46.5

(Columns within each method: accuracy @1, accuracy @10, MRR.)

20

Semantic Parsing in Tech Docs: General Findings

▶ How hard is it? Certainly not trivial; simple SMT models do alright, but there is lots of room for improvement.

▶ New Challenges: Highly sparse vocabulary; it is very hard to apply existing semantic parsing and MT methods (more about this next).

21

The Semantics of Function Signatures

Returns the greater of two long values
  ↓
Signature (informal):  lang Math long max(long a, long b)
Normalized:            java lang Math::max(long:a,long:b) -> long

Expansion to Logic

⟦java lang Math::max(long:a,long:b) -> long⟧_m

λx1 λx2 ∃v ∃f ∃n ∃c. eq(v, max(x1, x2)) ∧ fun(f, max) ∧ type(v, long)
  ∧ lang(f, java)
  ∧ var(x1, a) ∧ param(x1, f, 1) ∧ type(x1, long)
  ∧ var(x2, b) ∧ param(x2, f, 2) ∧ type(x2, long)
  ∧ namespace(n, lang) ∧ in_namespace(f, n)
  ∧ class(c, Math) ∧ in_class(f, c)

▶ Disclaimer: these are not real logical forms, but we can formalize the function signature languages and define a translation to logic (Richardson, 2018).

▶ Reasoning: A lot of declarative knowledge can be extracted from libraries directly, and via natural language.
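A minimal sketch of how such an expansion could be generated mechanically from a normalized signature; the predicate names follow the slide, but the generator itself is an illustrative assumption (and it omits the eq(v, max(x1, x2)) conjunct):

    def signature_to_logic(lang, namespace, clazz, fname, params, ret):
        """Emit conjuncts such as fun(f,max), param(x1,f,1), type(x1,long), ... for a signature."""
        conjuncts = [f"fun(f,{fname})", f"lang(f,{lang})", f"type(v,{ret})"]
        for i, (ptype, pname) in enumerate(params, start=1):
            conjuncts += [f"var(x{i},{pname})", f"param(x{i},f,{i})", f"type(x{i},{ptype})"]
        conjuncts += [f"namespace(n,{namespace})", "in_namespace(f,n)",
                      f"class(c,{clazz})", "in_class(f,c)"]
        return " ∧ ".join(conjuncts)

    print(signature_to_logic("java", "lang", "Math", "max",
                             [("long", "a"), ("long", "b")], "long"))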

22

</Resource Problem>

23

Other Resource Issues

[Figure: one model per parallel dataset: (en, PHP) → θ_en→PHP, (en, Lisp) → θ_en→Lisp, (de, PHP) → θ_de→PHP, (ja, Python) → θ_ja→Python, (en, Haskell) → θ_en→Haskell, ...]

▶ Traditional approaches to semantic parsing train individual models for each available parallel dataset.

▶ Resource Problem: Datasets tend to be small, and it is hard and unlikely to get certain types of parallel data, e.g., (de, Haskell).

24

Code Domain: Projects often Lack Documentation

▶ Ideally, we want each dataset to have tens of thousands of documented functions.

▶ Most projects have 500 or fewer documented functions.

25

Polyglot Models: Training on Multiple Datasets

[Figure: all datasets ((en, PHP), (en, Lisp), (de, PHP), (ja, Python), (en, Haskell), ...) feed a single shared model θ_polyglot.]

▶ Idea: concatenate all datasets into one and build a single model with shared parameters, capturing redundancy (Herzig and Berant, 2017).

▶ Polyglot Translator: translates from any input language to any output (programming) language.

1. Multiple Datasets: Does this help learn better translators?

2. Zero-Shot Translation (Johnson et al., 2016): Can we translate between different APIs and unobserved language pairs?

26


Polyglot Models: Training on Multiple Datasets

▶ Challenge: Building a polyglot decoder, a translation mechanism that facilitates crossing between (potentially unobserved) language pairs.

▶ Constraint 1: Ensure well-formed code output (not guaranteed in ordinary MT, cf. Cheng et al. (2017); Krishnamurthy et al. (2017)).

▶ Constraint 2: Must be able to translate to target APIs/programming languages on demand.

27


Graph Based Approach

[Figure: the target signature space represented as a DAG. The source node s0 has distance 0.0 and all other nodes s1–s11 are initialized to ∞; edges are labeled with signature tokens such as the artificial language identifiers 2C and 2Clojure, followed by namespace and function tokens (numeric, algo, math, ceil, atan2) and argument tokens (x, y, arg), so that each path from the source to a final node spells out one complete signature.]

▶ Idea: Exploit the finiteness of the target translation space; represent the full search space as a directed acyclic graph (DAG).

▶ Trick: Prepend to each signature an artificial token that identifies the API project or programming language (Johnson et al., 2016), as in the construction sketch below.

▶ Decoding: Reduces to finding a path given an input x:

x: The ceiling of a number

Can be solved using a variant of the single-source shortest path (SSSP) problem (Cormen et al., 2009), extendible to k-best SSSP paths.
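A rough sketch of the construction under the assumption that signatures are plain token sequences (the "2C"/"2Clojure" prefix tokens mimic the language-identifier trick of Johnson et al. (2016)):

    def build_dag(signatures):
        """Build a prefix trie (a DAG) over tokenized signatures.

        Each signature is prepended with an artificial token naming its target
        language or API (e.g. "2Clojure"), so decoding into a particular language
        simply means starting the path search along that token's edge.
        """
        dag = {0: {}}            # node id -> {edge token: successor node id}
        final = set()
        for lang, tokens in signatures:
            node = 0
            for tok in ["2" + lang] + tokens:
                nxt = dag[node].setdefault(tok, len(dag))
                dag.setdefault(nxt, {})
                node = nxt
            final.add(node)
        return dag, final

    sigs = [("Clojure", ["algo", "ceil", "x"]),
            ("Clojure", ["math", "atan2", "x", "y"]),
            ("C",       ["math", "ceil", "arg"])]
    dag, final = build_dag(sigs)
    print(len(dag), "nodes; edges from the source:", sorted(dag[0]))
    # 13 nodes; edges from the source: ['2C', '2Clojure']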

28


Graph Decoder: Shortest Path Decoding

[Figure: the same signature DAG, with the source s0 initialized to 0.0 and all other node distances to ∞.]

▶ Standard SSSP: assumes a DAG G = (V, E), a weight function w : E → ℝ, an (initialized) distance vector d ∈ ∞^|V|, and a unique source node b

0: d[b] ← 0.0
1: for each vertex u ∈ V in topologically sorted order
2:   do d(v) = min_{(u,v,z)∈E} { d(u) + w(u, v, z) }
3: return min_{v∈V} { d(v) }

▶ Variant: replace w(...) with a translation model that dynamically generates weights corresponding to translation scores for x and the edge labels during the SSSP search.
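A compact sketch of this decoder (illustrative code, not the released implementation): the weight callback is where a lexical SMT model or the neural scores of the next slide would plug in, and the DAG below is a tiny hand-built stand-in for the signature graph.

    import math

    def sssp_decode(dag, final, weight):
        """Single-source shortest path over a token-labeled DAG (node 0 is the source).

        dag:    {node: {token: successor}}, with node ids already in topological order
        weight: callable (u, v, token) -> non-negative cost, standing in for
                -log of a translation score for emitting `token` given the input x
        Returns the cheapest complete token path (one signature).
        """
        d = {u: math.inf for u in dag}
        back = {}
        d[0] = 0.0
        for u in sorted(dag):                      # topological order by construction
            for tok, v in dag[u].items():
                if d[u] + weight(u, v, tok) < d[v]:
                    d[v] = d[u] + weight(u, v, tok)
                    back[v] = (u, tok)
        best = min(final, key=lambda v: d[v])      # best-scoring complete-signature node
        path = []
        while best in back:
            best, tok = back[best]
            path.append(tok)
        return list(reversed(path))

    # Tiny hand-built DAG: "2C math ceil arg" vs. "2Clojure math atan2 x y"
    dag = {0: {"2C": 1, "2Clojure": 4}, 1: {"math": 2}, 2: {"ceil": 3}, 3: {"arg": 7},
           4: {"math": 5}, 5: {"atan2": 6}, 6: {"x": 8}, 7: {}, 8: {"y": 9}, 9: {}}
    final = {7, 9}
    costs = {"2C": 0.2, "math": 0.1, "ceil": 0.1, "arg": 0.3}   # toy scores for x = "The ceiling of a number"
    print(sssp_decode(dag, final, lambda u, v, t: costs.get(t, 1.0)))
    # ['2C', 'math', 'ceil', 'arg']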

29


Neural Sequence to Sequence Models

[Figure: encoder-decoder architecture: the encoder reads x1 x2 x3 ... x|x|, and the decoder emits z1 z2 z3 ... z|z| conditioned on the encoded context c.]

▶ Encoder Model: a neural sequence model that builds a distributed representation of the source sentence and its words, x → (h1, h2, ..., h|x|).

▶ Decoder Model: an RNN language model additionally conditioned on the input x / encoder states:

p(z | x) = ∏_{i=1}^{|z|} p_Θ(z_i | z_{<i}, x)

▶ Modification (at decode/test time): Constrain the search (each new z_i) to allowable transitions and paths in the graph, as in the masking sketch below.
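A hedged sketch of what this test-time constraint amounts to (illustrative NumPy code, not the system's implementation): the decoder's next-token distribution is masked so that only tokens on outgoing edges of the current graph node can be generated.

    import numpy as np

    def constrained_step(logits, vocab, node, dag):
        """One decoding step, masked to the transitions the signature DAG allows.

        logits: decoder scores over the vocabulary for the next token z_i
        vocab:  token -> vocabulary index
        node:   current DAG node (the path position reached by z_<i)
        dag:    {node: {token: successor node}}
        Returns the chosen token and the successor node.
        """
        allowed = dag[node]                          # tokens labelling outgoing edges
        mask = np.full(len(logits), -np.inf)
        for tok in allowed:
            mask[vocab[tok]] = 0.0                   # leave allowed tokens untouched
        probs = np.exp(np.asarray(logits, dtype=float) + mask)
        probs /= probs.sum()                         # renormalize over allowed tokens only
        best = int(np.argmax(probs))
        tok = next(t for t, i in vocab.items() if i == best)
        return tok, allowed[tok]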

30


Graph Decoder: Shortest Path Decoding

▶ Standard SSSP: assumes a DAG G = (V, E), a weight function w : E → ℝ, an (initialized) distance vector d ∈ ∞^|V|, and a unique source node b

0: d[b] ← 0.0
1: for each vertex u ∈ V in topologically sorted order
2:   do d(v) = min_{(u,v,z)∈E} { d(u) + w(u, v, z) }
3: return min_{v∈V} { d(v) }

▶ Translation models: Any model can be used; we experiment with lexical translation models (see paper) and attentive encoder-decoder models.

▶ Neural Variant: assumes input x, G, neural decoder parameters Θ (trained normally), d, and s (a state map):

0: d[b] ← 0.0
1: for each vertex u ∈ V in topologically sorted order
2:   do d(v) = min_{(u,v,z)∈E} { −log p_Θ(z | z_{<i}, x) + d(u) }
3:   s[v] ← RNN state for the min edge
4: return min_{v∈V} { d(v) }

31


Polyglot vs. Monolingual Decoding

▶ The difference is the type of input data and the starting point (i.e., source node) in the graph search.

▶ Any Language Decoding: Letting the decoder decide.

1. Source API (stdlib): (es, PHP)   Input: Devuelve el mensaje asociado al objeto lanzado. (= Returns the message associated with the thrown object.)
   Output:  Language: PHP       Translation: public string Throwable::getMessage ( void )
            Language: Java      Translation: public String lang.getMessage( void )
            Language: Clojure   Translation: (tools.logging.fatal throwable message & more)

2. Source API (stdlib): (ru, PHP)   Input: konvertiruet stroku iz formata UTF-32 v format UTF-16. (= Converts a string from UTF-32 format to UTF-16 format.)
   Output:  Language: PHP       Translation: string PDF_utf32_to_utf16 ( ... )
            Language: Ruby      Translation: String#toutf16 => string
            Language: Haskell   Translation: Encoding.encodeUtf16LE :: Text -> ByteString

3. Source API (py): (en, stats)     Input: Compute the Moore-Penrose pseudo-inverse of a matrix.
   Output:  Project: sympy      Translation: matrices.matrix.base.pinv_solve( B, ... )
            Project: sklearn    Translation: utils.pinvh( a, cond=None, rcond=None, ... )
            Project: stats      Translation: tools.pinv2( a, cond=None, rcond=None )

▶ Can be used for extracting declarative knowledge about function equivalences (e.g., for the logical approach introduced above).

32


Polyglot vs. Monolingual Decoding: Tech Doc Task

▶ Our Focus: Does training on multiple datasets (i.e., polyglot models) improve monolingual decoding?

▶ Monolingual models: the current best models, primarily SMT based.

                 Method                    Acc@1  Acc@10  MRR
stdlib  mono.    Best Monolingual Model     29.9   69.2   43.1
        poly.    Lexical SMT SSSP           33.2   70.7   45.9
                 Best Seq2Seq SSSP          13.9   36.5   21.5
py27    mono.    Best Monolingual Model     32.4   73.5   46.5
        poly.    Lexical SMT SSSP           41.3   77.7   54.1
                 Best Seq2Seq SSSP           9.0   26.9   15.1

▶ Findings: Polyglot models can improve performance using SMT models, but do not work for Seq2Seq models.

▶ Standard set of tricks: copying, lexical biasing (Arthur et al., 2016).

33


Polyglot Modeling on Benchmark SP Tasks

▶ Our Focus: Does this help on benchmark semantic parsing tasks?

Multilingual Geoquery    Method                            Acc@1 (averaged)
  mono.                  UBL, Kwiatkowski et al. (2010)          74.2
                         TreeTrans, Jones et al. (2012)          76.8
                         Lexical SMT SSSP                        68.6
                         Best Seq2Seq SSSP                       78.0
  poly.                  Lexical SMT SSSP                        67.3
                         Best Seq2Seq SSSP                       79.6

▶ Multilingual Geoquery: monolingual/polyglot models on Geoquery in en, de, gr, th; the polyglot setting improves accuracy, and neural Seq2Seq models perform best (consistent with recent findings, Dong and Lapata (2016)).

▶ Recall that these same Seq2Seq models do not work in the technical documentation tasks.

34


Benchmark SP Tasks: Mixed Language Decoding

▶ Introduced a new mixed-language GeoQuery test set; each sentence contains NPs from two or more languages.

Mixed Lang. Input: Wie hoch liegt der höchstgelegene Punkt in Αλαμπάμα? (= How high is the highest point in Alabama?)
LF: answer(elevation_1(highest(place(loc_2(stateid('alabama'))))))

Method                               Acc@1 (averaged)   Acc@10 (averaged)
Mixed   Best Monolingual Seq2Seq            4.2                18.2
        Polyglot Seq2Seq                   75.2                90.0

35

Learning from multiple datasets: Summary

▶ Take Away: polyglot modeling can be a useful technique for improving semantic parsing and transfer learning.

▶ New Ideas: Translating between datasets and languages, mixed language parsing, hardness of classical SMT decoding...?

▶ Technical Docs: have the features of a low-resource translation task, especially difficult for neural modeling.

36

<Learning from Entailment> (high-level overview)

37

Semantic Parsing and Entailment

▶ Entailment: One of the basic aims of semantics (Montague (1970)).
▶ Representations should be grounded in judgements about entailment.

[Figure: the same pipeline, grounded in entailment: input "All samples that contain a major element" → "Some sample that contains a major element"; Semantic Parsing produces sem = (FOR EVERY X / MAJORELT : T ; (FOR EVERY Y / SAMPLE : (CONTAINS Y X) ; (PRINTOUT Y))); Reasoning against the database gives ⟦sem⟧ = {S10019, S10059, ...} ⊇ {S10019}.]

38


Unit Testing our Semantic Parsers

▶ Minimal requirement: A semantic parser should be able to recognize certain types of entailments.

▶ RTE: Would a human reading t infer h? Dagan et al. (2005)

   sentence                           Logical Form (from corpus)
t  Pink3 passes to Pink7              pass(pink3,pink7)
h  Pink3 quickly passes to Pink7      pass(pink3,pink7)

Inference  t → h  Uncertain (human)
           t → h  Entail (LF only)

Sportscaster corpus (Chen and Mooney (2008))

39

Unit Testing our Semantic Parsers

▶ Minimal requirement: A semantic parser should be able to recognize certain types of entailments.

▶ RTE: Would a human reading t infer h? Dagan et al. (2005)

   sentence                           Logical Form (from corpus)
t  Pink3 passes to Pink7              pass(pink3,pink7)
h  Pink3 kicks the ball               kick(Pink3)

Inference  t → h  Entail (human)
           t → h  Contradict (LF only)

Sportscaster corpus (Chen and Mooney (2008))

39

Unit Testing our Semantic Parsers

▶ Minimal requirement: A semantic parser should be able to recognize certain types of entailments.
▶ RTE: Would a human reading t infer h? Dagan et al. (2005)

   sentence                           Logical Form (from corpus)
t  Pink3 passes to Pink7              pass(pink3,pink7)
h  Pink3 kicks the ball               kick(Pink3)

Inference  t → h  Entail (human)
           t → h  Contradict (LF only)

Inference Prediction       Accuracy
Majority Baseline            0.33
Logical Form Matching        0.60

Sportscaster corpus (Chen and Mooney (2008))

39

How to improve this? Test Driven Learning...

▶ General Problem: Semantic representations are underspecified, fail to capture entailments, and miss background knowledge.

▶ Goal: Capture the missing knowledge and inferential properties of text; incorporate entailment information into learning.

▶ Solution: Use entailment information (EI) and logical inference as a weak signal to train the parser, in Richardson and Kuhn (2016).

Paradigm and Supervision     Dataset D =                            Learning Goal
Learning from LFs            {(input_i, LF_i)}^N_i                  input --Trans.--> LF
Learning from Entailment     {((input_t, input'_h)_i, EI_i)}^N_i    (input_t, input'_h) --Proof--> EI

40


Learning from Entailment: Illustration

▶ Entailments are used to reason about target symbols and find holes in the analyses.

[Figure: example proof. Input (t, h): t = "pink3 λ passes to pink1", h = "pink3 quickly kicks λ". Aligned (world) edits: pink3/pink3 (≡), λ/quickly (⊒), pass/kick (⊑), pink1/λ (⊑). Joining the edit relations bottom-up: passes to pink1 ⊑ kicks; passes to pink1 # quickly kicks; pink3 passes to pink1 # pink3 quickly kicks; hence EI z = Uncertain.]

Data: D = {((t, h)_i, z_i)}^N_{i=1}, a generic logical calculus. Task: learn the (latent) proof y.

41


Grammar Approach: Sentences to Logical Form

▶ Use a semantic CFG, with rules constructed from the target representations using a small set of templates (Borschinger et al. (2011)); a toy version of the extraction is sketched after the figure.

(x: purple 10 quickly kicks,  z: {kick(purple10), block(purple7), ...})

↓ (rule extraction)

[Figure: the extracted rules yield competing derivation trees over "purple 10 quickly kicks", combining word- and concept-level rules such as purple10_w/purple10_c (arg1), kick_w/kick_c (intransitive), and a λ_c rule absorbing "quickly"; the four candidates shown correspond to kick(purple10), kick(purple10), block(purple7), and block(purple9), of which only the analyses consistent with kick(purple10) are correct.]
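To give a feel for the template idea, here is a toy sketch of rule instantiation (the rule shapes loosely echo the figure's node labels; they are not the actual template set of Borschinger et al. (2011)):

    def rules_from_lf(lf):
        """Instantiate toy CFG rules from a logical form such as "kick(purple10)".

        A simplified stand-in for template-based rule extraction: the predicate and
        its argument each get concept- and word-level rules, plus one top-level rule
        combining them.
        """
        pred, arg = lf.rstrip(")").split("(")
        return [
            f"Rep -> {arg}_arg1 {pred}_intransitive",
            f"{pred}_intransitive -> {pred}_c",
            f"{pred}_c -> {pred}_w",
            f"{pred}_w -> Words",        # any word span may realize the predicate
            f"{arg}_arg1 -> {arg}_c",
            f"{arg}_c -> {arg}_w",
            f"{arg}_w -> Words",
        ]

    for lf in ["kick(purple10)", "block(purple7)"]:
        for rule in rules_from_lf(lf):
            print(rule)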

42


Semantic Parsing as Grammatical Inference

▶ Rules are used to define a PCFG G_θ; learn the correct derivations.
▶ Learning: EM bootstrapping approach (Angeli et al. (2012)); a schematic of the loop is sketched after the figure.

[Figure: the bootstrapping loop. The input d = "Purple 7 kicks to Purple 4", with world z = {pass(purple7,purple4)}, is parsed by a beam parser under θ_t into a k-best list of derivations d1, ..., dk (e.g., analyses built from pass_r, turnover_r, or kick_r with different player arguments such as purple4_c or purple8_c); the interpretations of these derivations are checked against the world, and the consistent ones are used to re-estimate θ_{t+1}.]
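At pseudo-code level, the loop behind this figure might look as follows (parse, interpretation, and reestimate are assumed interfaces standing in for the beam parser, the interpretation step, and PCFG re-estimation):

    def bootstrap(examples, grammar, iterations=10, k=50):
        """Schematic EM-style bootstrapping for grammar-based semantic parsing.

        examples: (sentence, world) pairs, e.g.
                  ("Purple 7 kicks to Purple 4", {"pass(purple7,purple4)"})
        grammar:  a PCFG object with parse(sentence, k) and reestimate(derivations)
                  methods (assumed interfaces for the sketch, not a concrete library)
        """
        for _ in range(iterations):
            consistent = []
            for sentence, world in examples:
                kbest = grammar.parse(sentence, k)                   # beam parse under theta_t
                consistent += [d for d in kbest
                               if d.interpretation() in world]       # keep world-consistent derivations
            grammar.reestimate(consistent)                           # update to theta_{t+1}
        return grammar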

43


Learning Entailment Rules

▶ Rules define an inference PCFG G′_θ; learn the correct proofs, using the natural logic calculus from MacCartney and Manning (2009).
▶ Learning: A grammatical inference problem as before, with EM bootstrapping.

[Figure: the same bootstrapping loop over proofs. For input d with t = "pink 1 kicks", h = "pink 1 quickly passes to pink 2" and world z = Uncertain, the beam parser under θ_t proposes a k-best list of candidate proofs d1, ..., dk built from aligned edits such as pink 1/pink 1, λ/quickly, kicks/passes to (kick/pass), λ/pink 2, joined with natural logic relations (≡, ⊒, |); proofs whose predicted entailment matches z are used to re-estimate θ_{t+1}.]

44

Reasoning about Entailment

▶ Improving the internal representations (before, a; after, b).

[Figure: (a) before: in "purple 9 passes to purple 6 under pressure", the phrase "6 under pressure" is absorbed into the player constant purple6_c; (b) after: "under pressure" is split off as a ⊒ (droppable) modifier of purple6_c, with "passes to" analyzed as pass_r/pass_c in both trees.]

45

Reasoning about Entailment

▶ Learned modifiers from example proof trees.

[Figure: example analyses. (t, h) = ("a beautiful pass to", "passes to"): the edit "a beautiful"/λ joined with pass/pass (≡) yields ⊑ for the play-transitive; generalization: beautiful(X) ⊑ X. (t, h) = ("gets a free kick", "freekick from the"): "gets a"/λ joined with freekick/freekick (≡) yields ≡; generalization: get(X) ≡ X. (t, h) = ("yet again passes to", "kicks to"): generalization yet-again(X) ⊑ X. (t, h) = ("purple 10", "purple 10 who is out front"): the insertion λ/"who is out front" yields generalization X ⊒ out front(X).]

45

Reasoning about Entailment

▶ Learned lexical relations from example proof trees.

[Figure: example substitutions. (t, h) = ("pink team is offsides", "purple 9 passes") and ("bad pass .. picked off by", "loses the ball to"): learned relations pink team | purple9 and bad pass ⊑ turnover. (t, h) = ("free kick for", "steals the ball from") and ("purple 6 kicks to", "purple 6 kicks"): learned relations free kick | steal and pass ⊑ kick.]

45

Reasoning about Entailment: Summary

▶ New Ideas: Evaluating semantic parsers on RTE.

Inference Prediction        Accuracy
Majority Baseline             0.33
Logical Form Matching         0.60
Learning from Entailment      0.73

▶ Take Away: Using entailment as a weak signal can help improve the representations being learned and help theory construction.

▶ Conceptually: Training our semantic parsers to be more like a semanticist, working backwards from entailments to representations.

46


</Learning from Entailment>

47

Conclusions

▶ Natural language understanding: pursue a data-driven approach, centered around semantic parser induction.

▶ Explored many areas of the problem and developed new resources and techniques, looking at the problem holistically.

▶ It is easy to get stuck on the translation problem; it must be integrated within a broader theory of KR and reasoning.

▶ Question: To what extent can we use source code in the wild as a KR for deep NLU?

48


Thank You

49

References I

Allamanis, M., Tarlow, D., Gordon, A., and Wei, Y. (2015). Bimodal modelling of source code and natural language. In International Conference on Machine Learning, pages 2123–2132.

Angeli, G., Manning, C. D., and Jurafsky, D. (2012). Parsing time: Learning to interpret time expressions. In Proceedings of NAACL-2012, pages 446–455.

Arthur, P., Neubig, G., and Nakamura, S. (2016). Incorporating Discrete Translation Lexicons into Neural Machine Translation. arXiv preprint arXiv:1606.02006.

Barone, A. V. M. and Sennrich, R. (2017). A parallel corpus of python functions and documentation strings for automated code documentation and code generation. arXiv preprint arXiv:1707.02275.

Borschinger, B., Jones, B. K., and Johnson, M. (2011). Reducing grounded learning tasks to grammatical inference. In Proceedings of EMNLP-2011, pages 1416–1425.

Chen, D. L. and Mooney, R. J. (2008). Learning to sportscast: A test of grounded language acquisition. In Proceedings of ICML-2008, pages 128–135.

Cheng, J., Reddy, S., Saraswat, V., and Lapata, M. (2017). Learning an executable neural semantic parser. arXiv preprint arXiv:1711.05066.

Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2009). Introduction to Algorithms. MIT Press.

Dagan, I., Glickman, O., and Magnini, B. (2005). The PASCAL recognizing textual entailment challenge. In Proceedings of the PASCAL Challenges Workshop on Recognizing Textual Entailment.

50

References II

Deng, H. and Chrupała, G. (2014). Semantic approaches to software component retrieval with English queries. In Proceedings of LREC-14, pages 441–450.

Dong, L. and Lapata, M. (2016). Language to logical form with neural attention. arXiv preprint arXiv:1601.01280.

Gu, X., Zhang, H., Zhang, D., and Kim, S. (2016). Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 631–642. ACM.

Herzig, J. and Berant, J. (2017). Neural semantic parsing over multiple knowledge-bases. In Proceedings of ACL.

Iyer, S., Konstas, I., Cheung, A., and Zettlemoyer, L. (2016). Summarizing Source Code using a Neural Attention Model. In Proceedings of ACL, volume 1.

Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viegas, F., Wattenberg, M., Corrado, G., et al. (2016). Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. arXiv preprint arXiv:1611.04558.

Jones, B. K., Johnson, M., and Goldwater, S. (2012). Semantic parsing with Bayesian Tree Transducers. In Proceedings of ACL-2012, pages 488–496.

Krishnamurthy, J., Dasigi, P., and Gardner, M. (2017). Neural Semantic Parsing with Type Constraints for Semi-Structured Tables. In Proceedings of EMNLP.

51

References III

Kwiatkowski, T., Zettlemoyer, L., Goldwater, S., and Steedman, M. (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of EMNLP-2010, pages 1223–1233.

MacCartney, B. and Manning, C. D. (2009). An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics, pages 140–156.

Montague, R. (1970). Universal grammar. Theoria, 36(3):373–398.

Richardson, K. (2018). A Language for Function Signature Representations. arXiv preprint arXiv:1804.00987.

Richardson, K., Berant, J., and Kuhn, J. (2018). Polyglot Semantic Parsing in APIs. In Proceedings of NAACL.

Richardson, K. and Kuhn, J. (2016). Learning to Make Inferences in a Semantic Parsing Task. Transactions of the Association for Computational Linguistics, 4:155–168.

Richardson, K. and Kuhn, J. (2017a). Function Assistant: A Tool for NL Querying of APIs. In Proceedings of EMNLP-17.

Richardson, K. and Kuhn, J. (2017b). Learning Semantic Correspondences in Technical Documentation. In Proceedings of ACL-17.

Richardson, K., Zarrieß, S., and Kuhn, J. (2017). The Code2Text Challenge: Text Generation in Source Code Libraries. In Proceedings of INLG.

Woods, W. A. (1973). Progress in natural language understanding: an application to lunar geology. In Proceedings of the June 4-8, 1973, National Computer Conference and Exposition, pages 441–450.

52