1 Schema-based Natural Language Semantic Parsing Niculae Stratica and Bipin C. Desai Department of...
Schema-based Natural Language Semantic Parsing
Niculae Stratica and Bipin C. Desai
Department of Computer Science, Concordia University
1455 de Maisonneuve Blvd. West, Montreal, H3G 1M8, Canada
Introduction 1/2
This paper addresses the mapping of Natural Language to SQL queries.
It details a methodology to build the SQL query based on:
- The input sentence
- A dictionary
- A set of production rules
Introduction 2/2
[Figure: components relating the Input Language (English) to the Target Language (SQL): Context-Free Grammar, Dictionary, Production Rules, Semantic Sets, Index Files]
Summary
- Early work
- Template-based parsing
- Token-matching parsing
- Implementation
- Capabilities and Limitations
- Future work
Early work
Natural Language Processing through learning algorithms and statistical methods
[Figure: pipeline from the Input Language (English) to the Target Language (SQL). Components: Syntax Analysis, Semantic analysis, Internal representation, Learning Algorithms, Statistical methods]
Ref. [3] Minker, J., Information storage and retrieval - a survey and functional description, SIGIR, 12, pp.1-108, 1997
Template-based parsing (NLIDB)
[Figure: template-based pipeline. Stages: User input, Token parsing and tagging, Syntactic analysis, Semantic analysis, Build set of queries, Extract answers. Resources: Link Parser, Data, Schema, Pre-processor, Domain-specific interpretation rules, User-defined vocabulary rules, WordNet, Semantic templates]
Ref. [2] N. Stratica, L. Kosseim and B.C. Desai, NLIDB Templates for Semantic Parsing. Proceedings of Applications of Natural Language to Data Bases, NLDB’2003, pp. 235-241, June 2003, Burg, Germany.
Token-matching parsing
[Figure: token-matching pipeline. Stages: User input, Token parsing, Token Matching, Build SQL query, Extract answers. Resources: Data, Schema, Pre-processor, Semantic Sets, Index Files, Production Rules, WordNet]
Ref. N. Stratica and B.C. Desai, Schema-based Natural Language Semantic Mapping. Proceedings of Applications of Natural Language to Data Bases, NLDB’2004, June 2004, Manchester, England.
Template versus Token-matching parsing
[Figure: the two pipelines shown side by side.
Token-matching: User input, Token parsing, Token Matching, Build SQL query, Extract answers. Resources: Data, Schema, Pre-processor, Semantic Sets, Index Files, Production Rules, WordNet.
Template-based: User input, Token parsing and tagging, Syntactic analysis, Semantic analysis, Build set of queries, Extract answers. Resources: Link Parser, Data, Schema, Pre-processor, Domain-specific interpretation rules, User-defined vocabulary rules, WordNet, Semantic templates]
From the NL input to the SQL query
[Figure: Input Language (English), Tokenizer, Semantic analysis, Internal representation, Target Language (SQL). Resources: Semantic Sets, Index Files, Production Rules, Database records, Database schema, WordNet derivationally related forms]
The Database Schema
Table AUTHORS (LONG id PRIMARY_KEY, CHAR name, DATE DOB)
Table BOOKS (LONG id PRIMARY_KEY, CHAR title, LONG isbn, LONG pages, DATE date_published, ENUM type)
Table BOOKAUTHORS (LONG aid references AUTHORS, LONG bid references BOOKS)
Table STUDENTS (LONG id PRIMARY_KEY, CHAR name, DATE DOB)
Table BORROWEDBOOKS (LONG bid references BOOKS, LONG sid references STUDENTS)
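The schema above is written in an informal notation. As a rough sketch, it could be created in an in-memory SQLite database as follows. The type mapping (LONG to INTEGER, CHAR and DATE to TEXT, ENUM to TEXT with a CHECK constraint) is an assumption made for this illustration, and the ENUM values NOVEL and POEM are the ones the slides list for BOOKS.TYPE. The helper names `create_library_db` and `list_tables` are hypothetical.

```python
import sqlite3

# Assumed type mapping (not from the slides): LONG -> INTEGER, CHAR -> TEXT,
# DATE -> TEXT (ISO 8601 strings), ENUM -> TEXT restricted by a CHECK constraint.
DDL = """
CREATE TABLE AUTHORS (id INTEGER PRIMARY KEY, name TEXT, DOB TEXT);
CREATE TABLE BOOKS (
    id INTEGER PRIMARY KEY,
    title TEXT,
    isbn INTEGER,
    pages INTEGER,
    date_published TEXT,
    type TEXT CHECK (type IN ('NOVEL', 'POEM'))
);
CREATE TABLE BOOKAUTHORS (
    aid INTEGER REFERENCES AUTHORS(id),
    bid INTEGER REFERENCES BOOKS(id)
);
CREATE TABLE STUDENTS (id INTEGER PRIMARY KEY, name TEXT, DOB TEXT);
CREATE TABLE BORROWEDBOOKS (
    bid INTEGER REFERENCES BOOKS(id),
    sid INTEGER REFERENCES STUDENTS(id)
);
"""


def create_library_db() -> sqlite3.Connection:
    """Create an in-memory database holding the five example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the table names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

BOOKAUTHORS and BORROWEDBOOKS hold only foreign keys, which is what lets a pre-processor infer how the other tables relate to one another.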
WordNet and the Semantic Sets 1/2
For table AUTHORS WordNet returns the following list:
WordNet 2.0 Search
Overview for ‘author’
The noun ‘author’ has 2 senses in WordNet.
1. writer, author -- (writes (books or stories or articles or the like) professionally (for pay))
2. generator, source, author -- (someone who originates or causes or initiates something; ‘he was the generator of several complaints’)
Ref. [1] Miller, G., WordNet: A Lexical Database for English, Communications of the ACM, 38 (1), pp. 39-41, November 1995
WordNet and the Semantic Sets 2/2
Results for ‘Synonyms, hypernyms and hyponyms ordered by estimated frequency’ search of noun ‘authors’
2 senses of author
Sense 1: writer, author => communicator
Sense 2: generator, source, author => maker, shaper
The semantic set for AUTHORS becomes:
{writer, author, generator, source, maker, communicator, shaper}
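The construction of this set can be sketched as a union of the synonyms and hypernyms of every sense. The sketch below does not call WordNet; it hard-codes the WordNet 2.0 output quoted in the slides, and the names `AUTHOR_SENSES` and `build_semantic_set` are hypothetical.

```python
# Hand-copied from the WordNet 2.0 output quoted in the slides for 'author'.
# Each sense lists its synonyms and its direct hypernyms.
AUTHOR_SENSES = [
    {"synonyms": ["writer", "author"], "hypernyms": ["communicator"]},
    {"synonyms": ["generator", "source", "author"], "hypernyms": ["maker", "shaper"]},
]


def build_semantic_set(senses):
    """Union the synonyms and hypernyms of all senses into one set of words."""
    semantic_set = set()
    for sense in senses:
        semantic_set.update(sense["synonyms"])
        semantic_set.update(sense["hypernyms"])
    return semantic_set


# One semantic set per table name; only AUTHORS is filled in here.
SEMANTIC_SETS = {"AUTHORS": build_semantic_set(AUTHOR_SENSES)}
```

Because the result is a set, the word 'author', which appears in both senses, is stored once.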
Indexing the ENUM Types
TABLE=BOOKS
ATTRIBUTE TYPE ENUM={NOVEL, POEM}
If any of the TYPE values occurs in the input sentence, a production rule relating BOOKS.TYPE to that value is added to the SQL query, as in the example below:
Sentence: “List all novels”
NOVEL is a valid value for BOOKS.TYPE
The production rule is:
WHERE BOOKS.TYPE='NOVEL'
The resulting SQL query is:
SELECT BOOKS.* FROM BOOKS WHERE BOOKS.TYPE='NOVEL'
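A minimal sketch of the ENUM lookup follows. The index layout (`ENUM_INDEX` mapping a value to its table and attribute) and the function names are assumptions for illustration. The slides rely on WordNet's derivationally related forms to connect 'novels' to NOVEL; the sketch swaps in naive plural stripping, which is far cruder.

```python
# Hypothetical ENUM index: value -> (table, attribute), built by the pre-processor.
ENUM_INDEX = {"NOVEL": ("BOOKS", "TYPE"), "POEM": ("BOOKS", "TYPE")}


def normalize(token):
    """Upper-case a token and, as a crude stand-in for WordNet, strip a plural 's'."""
    word = token.upper()
    if word not in ENUM_INDEX and word.endswith("S"):
        word = word[:-1]
    return word


def enum_query(sentence):
    """Return the SQL query for the first token that matches an ENUM value, else None."""
    for token in sentence.split():
        key = normalize(token.strip(".,?!"))
        if key in ENUM_INDEX:
            table, attribute = ENUM_INDEX[key]
            # String formatting is used only to show the generated text;
            # real code should pass values as query parameters.
            return f"SELECT {table}.* FROM {table} WHERE {table}.{attribute}='{key}'"
    return None
```

Calling `enum_query("List all novels")` produces the query shown on the slide; a sentence with no ENUM value returns `None`.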
The Production Rules
Based on the Database schema, the pre-processor builds the following production rule:
IF AUTHORS in Table List AND BOOKS in Table List Then BOOKAUTHORS is in Table List
and the following SQL template:
SELECT Attribute List FROM Table List
WHERE BOOKAUTHORS.AID=AUTHORS.ID AND BOOKAUTHORS.BID=BOOKS.ID
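One way such a rule could be derived from the foreign keys is sketched below. The `FOREIGN_KEYS` dictionary is a hand-written rendering of the schema's `references` clauses, and `apply_production_rules` is a hypothetical name; the slides do not describe how the pre-processor stores its rules.

```python
# Link tables and the tables their columns reference, taken from the schema slide.
FOREIGN_KEYS = {
    "BOOKAUTHORS": {"AID": ("AUTHORS", "ID"), "BID": ("BOOKS", "ID")},
    "BORROWEDBOOKS": {"BID": ("BOOKS", "ID"), "SID": ("STUDENTS", "ID")},
}


def apply_production_rules(tables):
    """Add every link table whose referenced tables are all present.

    Returns the extended table list and the join conditions for the WHERE clause.
    """
    table_list = list(tables)
    joins = []
    for link, refs in FOREIGN_KEYS.items():
        referenced = {target for target, _ in refs.values()}
        if referenced <= set(tables):
            if link not in table_list:
                table_list.append(link)
            for column, (target, key) in refs.items():
                joins.append(f"{link}.{column}={target}.{key}")
    return table_list, joins
```

With `["AUTHORS", "BOOKS"]` as input, BOOKAUTHORS is added and the two join conditions match the template above; BORROWEDBOOKS is left out because STUDENTS is not in the table list.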
The Workflow
[Figure: workflow for the NL input “Show all books written by Mark Twain”.
Labels: Run time engine, Dictionary, Preprocessor, Database records, Database Schema (tables BOOKS, AUTHORS, BOOKAUTHORS, STUDENTS), BOOKS Semantic Set, AUTHORS Semantic Set, WordNet, ENUM Index file, TEXT Index file, LONG Index file, DATE Index file, SQL Production Rules.
The input is tokenized into: Show, all, books, written, by, Mark, Twain. Each lookup is marked Match or No match; table STUDENTS has no match.
SQL fragments shown:
FROM BOOKS, AUTHORS, BOOKAUTHORS
SELECT .. FROM .. WHERE BOOKAUTHORS.AID=AUTHORS.ID AND BOOKAUTHORS.BID=BOOKS.ID
SELECT AUTHORS.*, BOOKS.* FROM .. WHERE AUTHORS.NAME="Mark Twain"]
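The token-matching step of this workflow can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: the BOOKS semantic set and the TEXT index contents are invented for the example, plural handling is naive suffix stripping, and multi-word values such as 'Mark Twain' are matched by trying the longest phrase first.

```python
# AUTHORS set is the one given in the slides; the BOOKS set is assumed.
SEMANTIC_SETS = {
    "BOOKS": {"book", "volume"},
    "AUTHORS": {"writer", "author", "generator", "source",
                "maker", "communicator", "shaper"},
}
# Hypothetical TEXT index: lower-cased value -> (table, attribute, stored value).
TEXT_INDEX = {"mark twain": ("AUTHORS", "NAME", "Mark Twain")}


def match_tokens(sentence, max_ngram=3):
    """Return (matched tables, value constraints); unmatched tokens are dropped."""
    tokens = sentence.lower().split()
    tables, constraints = [], []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest phrase first so that 'mark twain' wins over 'mark'.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TEXT_INDEX:
                table, attribute, value = TEXT_INDEX[phrase]
                if table not in tables:
                    tables.append(table)
                constraints.append((table, attribute, value))
                i += n
                matched = True
                break
        if matched:
            continue
        # Fall back to the semantic sets, with naive plural stripping.
        singular = tokens[i][:-1] if tokens[i].endswith("s") else tokens[i]
        for table, words in SEMANTIC_SETS.items():
            if (tokens[i] in words or singular in words) and table not in tables:
                tables.append(table)
        i += 1
    return tables, constraints
```

For the sentence in the figure, only 'books' and 'Mark Twain' produce matches, giving the tables BOOKS and AUTHORS and one constraint on AUTHORS.NAME; the production rules then add BOOKAUTHORS.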
Capabilities and Limitations 1/5
‘Show all novels written by Mark Twain and William Shakespeare that have been borrowed by John Markus’
The method retains the following tokens: ‘... ... novels ... ... Mark Twain ... William Shakespeare ... ... ... borrowed ... John Markus’
The token ‘novels’ points to table BOOKS: ‘novels’ is found in the index files for ENUM values of the attribute BOOKS.TYPE.
Capabilities and Limitations 2/5
‘novels’ matches the ENUM value BOOKS.TYPE='NOVEL'
‘Mark Twain’ matches AUTHORS.NAME='Mark Twain'
‘William Shakespeare’ matches AUTHORS.NAME='William Shakespeare'
‘borrowed’ is disambiguated at run time through BORROWEDBOOKS
‘John Markus’ matches STUDENTS.NAME='John Markus'
The schema correlates AUTHORS, BOOKS and BOOKAUTHORS
The schema correlates STUDENTS, BOOKS and BORROWEDBOOKS
The table list is: AUTHORS, BOOKS, BOOKAUTHORS, STUDENTS, BORROWEDBOOKS
Capabilities and Limitations 3/5
The SQL constraints are:
BOOKAUTHORS.AID=AUTHORS.ID AND BOOKAUTHORS.BID=BOOKS.ID
AND BOOKS.TYPE='NOVEL'
AND (AUTHORS.NAME='Mark Twain' OR AUTHORS.NAME='William Shakespeare')
AND STUDENTS.NAME='John Markus'
AND BOOKS.ID=BORROWEDBOOKS.BID AND STUDENTS.ID=BORROWEDBOOKS.SID
The two constraints on AUTHORS.NAME have been OR-ed because they point to the same attribute.
The method makes it possible to construct the correct SQL query.
The method can address context and value ambiguities.
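The rule that constraints on the same attribute are OR-ed, while everything else is AND-ed, can be sketched as below. The function name `build_where` and the input format (a list of join conditions plus a list of attribute/value pairs) are assumptions for illustration.

```python
def build_where(joins, constraints):
    """Combine join conditions and value constraints into one WHERE body.

    Values that constrain the same attribute are OR-ed inside parentheses;
    all other parts are AND-ed.
    """
    grouped = {}
    for attribute, value in constraints:
        grouped.setdefault(attribute, []).append(value)

    parts = list(joins)
    for attribute, values in grouped.items():
        # Quoting by string formatting is for display only; values containing
        # quotes would need query parameters in real code.
        terms = [f"{attribute}='{value}'" for value in values]
        parts.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


# The example sentence from the slides, expressed as matched constraints.
JOINS = [
    "BOOKAUTHORS.AID=AUTHORS.ID",
    "BOOKAUTHORS.BID=BOOKS.ID",
    "BOOKS.ID=BORROWEDBOOKS.BID",
    "STUDENTS.ID=BORROWEDBOOKS.SID",
]
CONSTRAINTS = [
    ("BOOKS.TYPE", "NOVEL"),
    ("AUTHORS.NAME", "Mark Twain"),
    ("AUTHORS.NAME", "William Shakespeare"),
    ("STUDENTS.NAME", "John Markus"),
]
WHERE = build_where(JOINS, CONSTRAINTS)
```

The resulting string contains the same conditions as the slide, in a different order, which does not change the meaning of an AND chain.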
Capabilities and Limitations 4/5
1. The current architecture does not support operators such as greater than, less than, count, average and sum.
2. It does not resolve date expressions such as before, after and between.
3. The generated SQL does not support nested queries.
4. The proposed method eliminates all tokens that cannot be matched with either the semantic sets or the index files, and it works for semantically stable databases.
5. The preprocessor must be run after each semantic update of the database in order to update the index files.
Capabilities and Limitations 5/5
6. Context disambiguation is limited to the semantic sets related to a given schema.
7. Errors related to tokenizing, WordNet and human intervention propagate into the SQL query.
8. The method completely disregards the unmatched tokens, and thus it cannot correct the input query if it has errors.
9. However, the method correctly interprets the tokens that are found in the semantic sets or among the derivationally related terms at run time.
Future Work
Future work will focus on operator resolution. We believe that the approach presented in this paper can give good results with a minimum of implementation effort and avoids specific problems of the various existing semantic analysis approaches. This is partly made possible by the highly organized data in the RDBMS.
The method will be implemented and the results will be measured against complex sentences involving more than 4 tables from the database. A study will be done to show how performance depends on the size of the database records and on the database schema.
References
[1] Miller, G., WordNet: A Lexical Database for English, Communications of the ACM, 38 (1), pp. 39-41, November 1995
[2] N. Stratica, L. Kosseim and B.C. Desai, NLIDB Templates for Semantic Parsing. Proceedings of Applications of Natural Language to Data Bases, NLDB’2003, pp. 235-241, June 2003, Burg, Germany.
[3] Minker, J., Information storage and retrieval - a survey and functional description, SIGIR, 12, pp. 1-108, 1997
[4] Stuart H. Rubin, Shu-Ching Chen, and Mei-Ling Shyu, Field-Effect Natural Language Semantic Mapping, Proceedings of the 2003 IEEE International Conference on Systems, Man & Cybernetics, pp. 2483-2487, October 5-8, 2003, Washington, D.C., USA.
[5] Lawrence J. Mazlack, Richard A. Feinauer, Establishing a Basis for Mapping Natural-Language Statements Onto a Database Query Language, SIGIR 1980, pp. 192-202
[6] Allen, James, Natural Language Understanding, University of Rochester, 1995, The Benjamin Cummings Publishing Company, Inc. ISBN: 0-8053-0334-0
[7] Kathryn Baker, Alexander Franz, and Pamela Jordan, Coping with Ambiguity in Knowledge-based Natural Language Analysis, Florida AI Research Symposium, pp. 155-159, 1994, Pensacola, Florida
[8] Hirst, G., Semantic Interpretation and the Resolution of Ambiguity, Cambridge University Press, 1986, Cambridge
[9] Latent Semantic Analysis Laboratory at the Colorado University, http://lsa.colorado.edu/ site visited in March 2004
[10] Sleator, D. and Temperley, D., Parsing English with a Link Grammar, Proceedings of the Third Annual Workshop on Parsing Technologies, 1993