Provenance for Natural Language Queries
Daniel Deutch Nave Frost Amir Gilad
Tel Aviv University
August 2017
Presented by Amir Gilad
Daniel Deutch, Nave Frost, Amir Gilad Provenance for Natural Language Queries August 2017 1 / 23
Outline
1 Introduction
2 Mappings and Answer Tree - Single Assignment
3 Factorization
4 Summarization
5 Experiments
6 Related Work and Conclusions
Motivation
NL Query
Return the organization of authors who published papers in database conferences after 2005
Formal Query
query(oname) :- org(oid, oname), conf(cid, cname),
pub(wid, cid, ptitle, pyear), author(aid, aname, oid),
domainConf(cid, did), domain(did, dname),
writes(aid, wid), dname = 'Databases', pyear > 2005
Result
Tel Aviv University (TAU) (why?)
What We Have - Provenance
(oname,TAU)·(aname,Tova M.)·(ptitle,OASSIS...)·(cname,SIGMOD)·(pyear,2014)
+ (oname,TAU)·(aname,Tova M.)·(ptitle,Querying...)·(cname,VLDB)·(pyear,2006)
+ (oname,TAU)·(aname,Tova M.)·(ptitle,Monitoring...)·(cname,VLDB)·(pyear,2007)
+ (oname,TAU)·(aname,Slava N.)·(ptitle,OASSIS...)·(cname,SIGMOD)·(pyear,2014)
+ (oname,TAU)·(aname,Tova M.)·(ptitle,A sample...)·(cname,SIGMOD)·(pyear,2014)
+ ...
What We Want - Explanations
TAU is the organization of 43 authors who published 170 papers in 31 conferences in 2006 - 2015
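To make the provenance on this slide concrete, here is a minimal sketch that evaluates the formal query over a tiny in-memory database and prints one product per assignment. The ids (oid, aid, wid, cid, did) and the relation contents are made up for illustration; only the names and values shown on the slide come from the talk. The nested-loop join is an assumption and is not meant to reflect how SelP evaluates queries.

```python
# Toy relations. All ids are hypothetical, chosen only to reproduce
# the five assignments listed on the slide.
org = [(1, "TAU")]                                  # org(oid, oname)
author = [(1, "Tova M.", 1), (2, "Slava N.", 1)]    # author(aid, aname, oid)
conf = [(1, "SIGMOD"), (2, "VLDB")]                 # conf(cid, cname)
domain = [(1, "Databases")]                         # domain(did, dname)
domain_conf = [(1, 1), (2, 1)]                      # domainConf(cid, did)
pub = [                                             # pub(wid, cid, ptitle, pyear)
    (1, 1, "OASSIS...", 2014),
    (2, 2, "Querying...", 2006),
    (3, 2, "Monitoring...", 2007),
    (4, 1, "A sample...", 2014),
]
writes = [(1, 1), (1, 2), (1, 3), (2, 1), (1, 4)]   # writes(aid, wid)

# Each satisfying assignment of the conjunctive query becomes one dict.
assignments = [
    {"oname": oname, "aname": aname, "ptitle": ptitle,
     "cname": cname, "pyear": pyear}
    for (oid, oname) in org
    for (aid, aname, a_oid) in author if a_oid == oid
    for (w_aid, wid) in writes if w_aid == aid
    for (p_wid, cid, ptitle, pyear) in pub if p_wid == wid and pyear > 2005
    for (c_cid, cname) in conf if c_cid == cid
    for (dc_cid, did) in domain_conf if dc_cid == cid
    for (d_did, dname) in domain if d_did == did and dname == "Databases"
]

# The query answers, and their provenance as a sum of products.
answers = {a["oname"] for a in assignments}
provenance = " + ".join(
    "·".join(f"({var},{val})" for var, val in a.items()) for a in assignments
)

print(answers)
print(provenance)
```

Even on this toy input the single answer TAU carries five products, which is why the raw expression is hard to read as an explanation.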
Solution Overview
Solution
Use the input question to formulate a detailed NL answer by replacing words with values
  This is a general idea: showing provenance in a way that corresponds to the standard user interaction
When a long answer is needed, compact it using algebraic factorization and summarization
  Here, again, we leverage the structure of the user question
Current Limitations
Only conjunctive queries are supported
Some aspects of the solution are limited to a specific NLIDB
  But the general idea is not
Framework
[Figure: system architecture. Components: Parser ((Augmented) NaLIR), Query Builder, SelP over the DB, Factorization, Sentence Generation, Summarization. Data passed between them: NL Query, Dep. Tree, Query + Mapping, Results + Provenance + Mapping, Fact. + Mapping, Fact. + Sentence, Sentence, Summarized Sentence]
Augment NaLIR [Fei Li, Jagadish, 15’]
Use a provenance-aware engine - SelP [Deutch et al., 15’]
Store the provenance and mappings
Translate results and provenance to NL using factorization and summarization
Mappings
(oname, TAU)
(aname, Tova M.)
(ptitle, 'OASSIS...')
(cname, SIGMOD)
(pyear, 2014)
[Figure: dependency tree of the NL query. Nodes: Return; organization (POS=NN, REL=dobj); the; of (POS=IN, REL=prep); authors (POS=NNS, REL=pobj); who; published (POS=VBD, REL=rcmod); papers; in; conferences (POS=NNS, REL=pobj); database (POS=NN, REL=nn); after (POS=IN, REL=prep); 2005 (POS=CD, REL=pobj)]
Return the organization of authors who published papers in database conferences after 2005
query(oname) :- org(oid, oname), conf(cid, cname), pub(wid, cid, ptitle, pyear), author(aid, aname, oid), domainConf(cid, did), domain(did, dname), writes(aid, wid), dname = 'Databases', pyear > 2005
From Mappings to an Answer
(oname, TAU)
(aname, Tova M.)
(ptitle, 'OASSIS...')
(cname, SIGMOD)
(pyear, 2014)
[Figure: the dependency tree of the NL query next to the answer tree. Answer tree nodes: TAU (is the); organization; of; Tova M.; who; published; 'OASSIS...'; in; SIGMOD; in; 2014]
Answer
TAU is the organization of Tova M. who published 'OASSIS...' in SIGMOD in 2014
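The step from mappings to an answer can be sketched as a word-for-value substitution. The sketch below is a simplification: it works on the flat word sequence of the question rather than on the dependency tree used in the talk. The function name `answer_sentence` and the `dropped` and `rewritten` tables are hypothetical helpers introduced only for this example.

```python
def answer_sentence(question, head_var, word_to_var, assignment,
                    dropped=(), rewritten=None):
    """Turn a 'Return ...' question into an answer for one assignment.

    head_var    -- variable whose value is the query answer (here: oname)
    word_to_var -- question words that are replaced by assignment values
    dropped     -- modifiers that disappear once their head word is replaced
    rewritten   -- words swapped for others (e.g. 'after' becomes 'in')
    """
    rewritten = rewritten or {}
    words = question.split()
    if words[0] != "Return":
        raise ValueError("this sketch only handles 'Return ...' questions")
    out = [str(assignment[head_var]), "is"]
    for word in words[1:]:
        if word in dropped:
            continue
        if word in word_to_var:
            out.append(str(assignment[word_to_var[word]]))
        else:
            out.append(rewritten.get(word, word))
    return " ".join(out)


question = ("Return the organization of authors who published papers "
            "in database conferences after 2005")
assignment = {"oname": "TAU", "aname": "Tova M.", "ptitle": "'OASSIS...'",
              "cname": "SIGMOD", "pyear": 2014}

sentence = answer_sentence(
    question,
    head_var="oname",
    word_to_var={"authors": "aname", "papers": "ptitle",
                 "conferences": "cname", "2005": "pyear"},
    assignment=assignment,
    dropped={"database"},
    rewritten={"after": "in"},
)
print(sentence)
# TAU is the organization of Tova M. who published 'OASSIS...' in SIGMOD in 2014
```

Working on the tree instead of the word list is what lets the real system decide structurally which modifiers to drop and where the answer value attaches; here those decisions are passed in by hand.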
Provenance Factorization
Idea
Use algebraic factorization of the provenance to take out common values that appear in multiple assignments
Provenance
[TAU]·[Tova M.]·[OASSIS...]·[SIGMOD]·[2014]
+ [TAU]·[Tova M.]·[Querying...]·[VLDB]·[2006]
+ [TAU]·[Tova M.]·[Monitoring...]·[VLDB]·[2007]
+ [TAU]·[Slava N.]·[OASSIS...]·[SIGMOD]·[2014]
+ [TAU]·[Tova M.]·[A Sample...]·[SIGMOD]·[2014]
Two Different Factorizations
[TAU] · ([SIGMOD] · [2014] · ([OASSIS...] · ([Tova M.] + [Slava N.])
+ [Tova M.] · [A Sample...])
+ [VLDB] · [Tova M.] · ([2006] · [Querying...] + [2007] · [Monitoring...]))

[TAU] · ([Tova M.] · ([VLDB] · ([2006] · [Querying...] + [2007] · [Monitoring...])
+ [SIGMOD] · [2014] · ([OASSIS...] + [A Sample...]))
+ [Slava N.] · [OASSIS...] · [SIGMOD] · [2014])
Shorter means better?
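One way to check that the two factorizations on this slide denote the same provenance, and to compare their lengths, is to expand each back into monomials. The sketch below encodes expressions as nested tuples. The helper names (`var`, `mul`, `add`, `expand`, `size`) are illustrative, and treating a monomial as a set of values is a simplification that ignores multiplicities.

```python
def var(name):
    return ("var", name)

def mul(*parts):
    return ("mul", parts)

def add(*parts):
    return ("add", parts)

def expand(expr):
    """Return the set of monomials (frozensets of values) denoted by expr."""
    kind, body = expr
    if kind == "var":
        return {frozenset([body])}
    if kind == "add":
        return set().union(*(expand(p) for p in body))
    # Product: distribute over the sums of the factors.
    monomials = {frozenset()}
    for p in body:
        monomials = {m | n for m in monomials for n in expand(p)}
    return monomials

def size(expr):
    """Number of value occurrences in the expression."""
    kind, body = expr
    return 1 if kind == "var" else sum(size(p) for p in body)

TAU, TOVA, SLAVA = var("TAU"), var("Tova M."), var("Slava N.")
SIGMOD, VLDB = var("SIGMOD"), var("VLDB")
Y2006, Y2007, Y2014 = var("2006"), var("2007"), var("2014")
OASSIS, QUERYING = var("OASSIS..."), var("Querying...")
MONITORING, SAMPLE = var("Monitoring..."), var("A Sample...")

# First factorization on the slide (conference taken out before author).
shortest = mul(TAU, add(
    mul(SIGMOD, Y2014, add(mul(OASSIS, add(TOVA, SLAVA)), mul(TOVA, SAMPLE))),
    mul(VLDB, TOVA, add(mul(Y2006, QUERYING), mul(Y2007, MONITORING))),
))

# Second factorization on the slide (author taken out before conference).
by_author = mul(TAU, add(
    mul(TOVA, add(
        mul(VLDB, add(mul(Y2006, QUERYING), mul(Y2007, MONITORING))),
        mul(SIGMOD, Y2014, add(OASSIS, SAMPLE)),
    )),
    mul(SLAVA, OASSIS, SIGMOD, Y2014),
))

# The flat provenance, one monomial per assignment.
flat = {
    frozenset(["TAU", "Tova M.", "OASSIS...", "SIGMOD", "2014"]),
    frozenset(["TAU", "Tova M.", "Querying...", "VLDB", "2006"]),
    frozenset(["TAU", "Tova M.", "Monitoring...", "VLDB", "2007"]),
    frozenset(["TAU", "Slava N.", "OASSIS...", "SIGMOD", "2014"]),
    frozenset(["TAU", "Tova M.", "A Sample...", "SIGMOD", "2014"]),
}

print(expand(shortest) == flat, expand(by_author) == flat)
print(size(shortest), size(by_author))
```

Both expressions expand to the same five monomials; the first uses 14 value occurrences and the second 15, so length alone prefers the first.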
T-Compatibility
NL Query
Return the organization of authors who published papers in database conferences after 2005
Shortest Factorization
[TAU] · ([SIGMOD] · [2014] · ([OASSIS...] · ([Tova M.] + [Slava N.])
+ [Tova M.] · [A Sample...])
+ [VLDB] · [Tova M.] · ([2006] · [Querying...] + [2007] · [Monitoring...]))
As a Sentence
TAU is the organization of authors who published in SIGMOD 2014
'OASSIS...' which was published by Tova M. and Slava N.
and Tova M. published 'A sample...'
and Tova M. published in VLDB
'Querying...' in 2006
and 'Monitoring...' in 2007.
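T-Compatibility
Shortest Factorization
[TAU] · ([SIGMOD] · [2014] · ([OASSIS...] · ([Tova M.] + [Slava N.])
+ [Tova M.] · [A Sample...])
+ [VLDB] · [Tova M.] · ([2006] · [Querying...] + [2007] · [Monitoring...]))
[Figure: dependency tree T of the NL query. Nodes: Return; organization (POS=NN, REL=dobj); the; of (POS=IN, REL=prep); authors (POS=NNS, REL=pobj); who; published (POS=VBD, REL=rcmod); papers; in; conferences (POS=NNS, REL=pobj); database (POS=NN, REL=nn); after (POS=IN, REL=prep); 2005 (POS=CD, REL=pobj)]
conferences ≤_T authors but conferences ≰_{f_bad} authors
That is, conferences lies below authors in the dependency tree T, but the shortest factorization (f_bad) does not nest the conference values under the author values.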
T-Compatibility
NL Query
Return the organization of authors who published papers in database conferences after 2005
Longer, T-Compatible Factorization
[TAU] · ([Tova M.] · ([VLDB] · ([2006] · [Querying...] + [2007] · [Monitoring...])
+ [SIGMOD] · [2014] · ([OASSIS...] + [A Sample...]))
+ [Slava N.] · [OASSIS...] · [SIGMOD] · [2014])
As a Sentence
TAU is the organization of
Tova M. who published
in VLDB
’Querying...’ in 2006 and
’Monitoring...’ in 2007
and in SIGMOD in 2014
’OASSIS...’ and ’A sample...’
and Slava N. who published
’OASSIS...’ in SIGMOD in 2014.
Factorization Algorithm
Proposition
Obtaining a minimal T-compatible factorization is coNP-hard
Algorithm
Factorize greedily: traverse the dependency tree level by level
For every level with mapped words, factorize their corresponding values in the provenance
Prioritize which values to take out in each level by frequency
Complexity
O(n^2 · log n): recursively traverse the dependency tree and sort the variables at each layer by their frequency in O(n · log n)
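The greedy idea above can be sketched as follows. This is a simplified variant written for illustration, not the authors' implementation. It assumes the tree levels are given as lists of variables, top-down, and that every assignment binds the same variables. It takes out the most frequent value of the current level first, breaking ties by first occurrence. Grouping the conference, year and paper variables into one level is also an assumption of this example.

```python
from collections import Counter

def factorize(monomials, levels):
    """Greedy factorization. Returns a sum as a list of (value, sub_sum)
    pairs; an empty list stands for the empty product (nothing left)."""
    vars_left = monomials[0].keys()
    level = next((lv for lv in levels if vars_left & set(lv)), None)
    if level is None:
        return []
    terms, remaining = [], monomials
    while remaining:
        # Most frequent (variable, value) pair of this level goes first.
        counts = Counter((v, m[v]) for m in remaining for v in level if v in m)
        (var, value), _ = counts.most_common(1)[0]
        inside = [{k: x for k, x in m.items() if k != var}
                  for m in remaining if m[var] == value]
        remaining = [m for m in remaining if m[var] != value]
        terms.append((value, factorize(inside, levels)))
    return terms

def render(terms):
    parts = []
    for value, sub in terms:
        if not sub:
            parts.append(f"[{value}]")
        elif len(sub) == 1:
            parts.append(f"[{value}]·{render(sub)}")
        else:
            parts.append(f"[{value}]·({render(sub)})")
    return " + ".join(parts)

def size(terms):
    """Number of value occurrences in the factorized expression."""
    return sum(1 + size(sub) for _, sub in terms)

def expand(terms):
    """Multiply the factorization back out into a set of monomials."""
    if not terms:
        return {frozenset()}
    return {m | {value} for value, sub in terms for m in expand(sub)}

def row(oname, aname, ptitle, cname, pyear):
    return {"oname": oname, "aname": aname, "ptitle": ptitle,
            "cname": cname, "pyear": pyear}

PROVENANCE = [
    row("TAU", "Tova M.", "OASSIS...", "SIGMOD", "2014"),
    row("TAU", "Tova M.", "Querying...", "VLDB", "2006"),
    row("TAU", "Tova M.", "Monitoring...", "VLDB", "2007"),
    row("TAU", "Slava N.", "OASSIS...", "SIGMOD", "2014"),
    row("TAU", "Tova M.", "A sample...", "SIGMOD", "2014"),
]
# Hypothetical level structure: organization, then authors, then the rest.
LEVELS = [["oname"], ["aname"], ["cname", "pyear", "ptitle"]]

factorized = factorize(PROVENANCE, LEVELS)
print(render(factorized))
print(size(factorized))
```

On this input the sketch takes out TAU, then the authors, then conferences and years, yielding an expression with 15 value occurrences that expands back to the original five monomials.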
Factorization Example
[Figure: dependency tree of the NL query. Nodes: organization (POS=NN, REL=dobj); the; of (POS=IN, REL=prep); authors (POS=NNS, REL=pobj); who; published (POS=VBD, REL=rcmod); papers; in; conferences (POS=NNS, REL=pobj); database (POS=NN, REL=nn); after (POS=IN, REL=prep); 2005 (POS=CD, REL=pobj)]
Initial provenance
[TAU]·[Tova M.]·[OASSIS...]·[SIGMOD]·[2014]
+ [TAU]·[Tova M.]·[Querying...]·[VLDB]·[2006]
+ [TAU]·[Tova M.]·[Monitoring...]·[VLDB]·[2007]
+ [TAU]·[Slava N.]·[OASSIS...]·[SIGMOD]·[2014]
+ [TAU]·[Tova M.]·[A sample...]·[SIGMOD]·[2014]
After taking out [TAU]
[TAU] · ([Tova M.]·[OASSIS...]·[SIGMOD]·[2014]
+ [Tova M.]·[Querying...]·[VLDB]·[2006]
+ [Tova M.]·[Monitoring...]·[VLDB]·[2007]
+ [Slava N.]·[OASSIS...]·[SIGMOD]·[2014]
+ [Tova M.]·[A sample...]·[SIGMOD]·[2014])
After taking out [Tova M.]
[TAU] · ([Tova M.] · ([OASSIS...]·[SIGMOD]·[2014]
+ [Querying...]·[VLDB]·[2006]
+ [Monitoring...]·[VLDB]·[2007]
+ [A sample...]·[SIGMOD]·[2014])
+ [Slava N.] · [OASSIS...] · [SIGMOD] · [2014])
Final factorization
[TAU] · ([Tova M.] · ([VLDB] · ([2006] · [Querying...] + [2007] · [Monitoring...])
+ [SIGMOD] · [2014] · ([OASSIS...] + [A Sample...]))
+ [Slava N.] · [OASSIS...] · [SIGMOD] · [2014])
As a sentence
TAU is the organization of
Tova M. who published
in VLDB
'Querying...' in 2006 and
'Monitoring...' in 2007
and in SIGMOD in 2014
'OASSIS...' and 'A sample...'
and Slava N. who published
'OASSIS...' in SIGMOD in 2014.
Summarization
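Two Levels of Summarization
[TAU] · ([Tova M.] · ([VLDB] · ([2006] · [Querying...] + [2007] · [Monitoring...])
+ [SIGMOD] · [2014] · ([OASSIS...] + [A Sample...]))
+ [Slava N.] · [OASSIS...] · [SIGMOD] · [2014])
A: the sub-expression in the parentheses following [TAU]
B: the sub-expression in the parentheses following [Tova M.]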
Shorter Summarized Answer Based on A
TAU is the organization of 2 authors who published
4 papers in 2 conferences in 2006 - 2014
More Detailed Summarized Answer Based on B
TAU is the organization of Tova M. who published
4 papers in 2 conferences in 2006 - 2014 and Slava N.
who published ’OASSIS...’ in SIGMOD in 2014.
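The two summarized answers can be reproduced with a small sketch that counts distinct values and takes the range of years. For brevity it groups the raw assignments directly instead of walking the factorized expression, which is an assumption of this example. The function names `describe`, `summary_a` and `summary_b` are hypothetical.

```python
def describe(rows):
    """Summarize a group of assignments, or spell out a single one."""
    if len(rows) == 1:
        r = rows[0]
        return f"{r['ptitle']} in {r['cname']} in {r['pyear']}"
    papers = {r["ptitle"] for r in rows}
    confs = {r["cname"] for r in rows}
    years = [r["pyear"] for r in rows]
    return (f"{len(papers)} papers in {len(confs)} conferences "
            f"in {min(years)} - {max(years)}")

def summary_a(rows):
    """Coarse summary: everything below the organization is counted."""
    authors = {r["aname"] for r in rows}
    return (f"{rows[0]['oname']} is the organization of "
            f"{len(authors)} authors who published {describe(rows)}")

def summary_b(rows):
    """Finer summary: authors are named, their publications are counted."""
    by_author = {}
    for r in rows:
        by_author.setdefault(r["aname"], []).append(r)
    parts = [f"{name} who published {describe(rs)}"
             for name, rs in by_author.items()]
    return f"{rows[0]['oname']} is the organization of " + " and ".join(parts)

def row(aname, ptitle, cname, pyear):
    return {"oname": "TAU", "aname": aname, "ptitle": ptitle,
            "cname": cname, "pyear": pyear}

ROWS = [
    row("Tova M.", "'OASSIS...'", "SIGMOD", 2014),
    row("Tova M.", "'Querying...'", "VLDB", 2006),
    row("Tova M.", "'Monitoring...'", "VLDB", 2007),
    row("Slava N.", "'OASSIS...'", "SIGMOD", 2014),
    row("Tova M.", "'A sample...'", "SIGMOD", 2014),
]

print(summary_a(ROWS))
# TAU is the organization of 2 authors who published 4 papers in 2 conferences in 2006 - 2014
print(summary_b(ROWS))
# TAU is the organization of Tova M. who published 4 papers in 2 conferences in 2006 - 2014 and Slava N. who published 'OASSIS...' in SIGMOD in 2014
```

The choice between the two levels is a trade-off between brevity and detail: the coarser the level at which values are replaced by counts, the shorter the sentence.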
Sample Use-Cases
Q: Return the authors who published papers in VLDB before 2016 and after 2007
A: Tova M. published 16 papers in VLDB in 2008 - 2015
Q: Return the authors who published papers in database conferences
A: Tova M. published 134 papers in 18 conferences
Q: Return the organization of authors who published papers in database conferences after 2005
A: TAU is the organization of 43 authors who published 170 papers in 31 conferences in 2006 - 2015
Sample Scalability Results
Computation time as a function of the number of assignments. Overhead of only 16% w.r.t. evaluation time.
[Figure: time (sec, 0 to 4.5) vs. number of assignments (0 to 5000), one series each for Query 4 through Query 12]
Breakdown of Computation Time
[Figure (a): Factorization time. Time (sec, 0 to 0.6) vs. domain of unique values per answer (0 to 5000), one series each for Query 4 through Query 12]
[Figure (b): Sentence generation time. Time (sec, 0 to 1.4) vs. domain of unique values per answer (0 to 5000), one series each for Query 4 through Query 12]
Summarization time was negligible.
Related Work
NL Interfaces:
Formulate the NL query and present the answers, e.g., [Fei Li et al., 15’], [Song et al., 15’]
Present the answers in NL based on the schema [Franconi et al., 14’]
Explain the query in NL [Koutrika et al., 10’]
Provenance:
Showing the provenance in graph form, e.g., [Ailamaki et al., 98’], [Davidson et al., 08’]
Allowing user control over granularity [Cohen-Boulakia et al., 08’]
Provenance factorization and summarization, e.g., [Chapman et al., 08’], [Olteanu et al., 12’]
Summary
Main Contributions:
First to formulate the provenance of output tuples in NL
Employing both factorization and summarization to make provenance more understandable
Devising a criterion for provenance factorization that accounts for its presentation in NL
Future Work:
Extend the solution to UCQs, aggregation, nested queries, ...
Support more provenance models
Generalize the requirements from NL interfaces
Thank You
Questions?