A Natural Language Interface for Crime-related Spatial Queries

Chengyang Zhang, Yan Huang, Rada Mihalcea, Hector Cuellar

Department of Computer Science and Engineering

University of North Texas

ISI 2009 Presentation

Outline

• Motivation

• Related Work

• Proposed Method

• System Evaluation

Motivation

• The databases and query interfaces hosted by federal and state justice departments are heterogeneous and complicated.

Motivation

• We need tools for crime-related spatial queries.

Find a police office near the school [1]

Find a house in a neighborhood with a low crime rate [2]

Motivation

• Neither web forms nor keyword search has the expressive power and flexibility desired in crime-related spatial queries.

• But natural language does!

  • No need for training

  • No need for a proprietary user interface or an esoteric formal language like SQL or XQuery

  • Ideal for ad hoc, real-time queries in emergency conditions

Our Contributions

We propose a method to translate crime-related natural language spatial queries into spatial data queries.

We implement a prototype query system.

Experiments show that the system achieves results significantly better than those obtained by using Google Maps.

Related Work

• Syntax-based methods [3-4] use templates or grammar rules to map natural language sentences onto database schemas.

  • Simple but not scalable

  • May sometimes lead to serious errors

• Semantic parsing algorithms [5-9] preserve syntactic dependencies but also seek to enforce semantic constraints over the possible mappings.

  • The quality of the mapping is significantly improved

  • The Precise system in [9] focused on high precision only

Related Work

• Lambda-calculus encoding can be used as the intermediate representation between natural language and database queries [10].

  • A training corpus is used to derive lexicons and grammars for the specific domain

  • The approach was found to lead to good results

• The structure of XML documents can be used to match natural language parse trees [11].

  • Identify a meaningful lowest common ancestor structure (MLCAS) from the tree structure

  • Includes an interactive component to receive help from the user when formulating the query
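As a rough illustration of the tree operation underlying the XML-based approach, the sketch below finds the plain lowest common ancestor of two nodes in an XML tree. It is not the MLCAS algorithm of [11], which adds a notion of meaningfulness on top of this idea. The XML document, its tag names, and the function `lowest_common_ancestor` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the tags and values are illustrative only.
DOC = """
<city>
  <district name="north">
    <station>Central Station</station>
    <court>Municipal Court</court>
  </district>
  <district name="south">
    <school>Elm Elementary</school>
  </district>
</city>
"""


def lowest_common_ancestor(root, a, b):
    """Return the deepest element that is an ancestor of (or equal to) both a and b."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    ancestors_of_a = set(path_to_root(a))
    # Walk upward from b; the first node also above a is the lowest common ancestor.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


root = ET.fromstring(DOC)
station = root.find(".//station")
court = root.find(".//court")
school = root.find(".//school")

near = lowest_common_ancestor(root, station, court)
far = lowest_common_ancestor(root, station, school)
print(near.tag, near.get("name"))
print(far.tag)
```

In this toy tree, two words that map to nodes in the same district share a small subtree, while words that map to nodes in different districts only meet at the document root; a structure-based matcher can use that difference to prefer the tighter grouping.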

System Framework

1. Part of Speech Tagging

• For POS tagging, we employ the classic Viterbi algorithm.

  • A dynamic programming framework coupled with a Markov assumption

  • Efficient and widely used

  • Trained on the manually labeled Penn Treebank dataset

Running Example:

I wish to find a police department within 2 miles of a law court

POS Tagging:

I/NP wish/VB to/IN find/VB a/DT police/NN department/NN

within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN
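The dynamic programming idea behind Viterbi tagging can be sketched as follows. This is a minimal illustration, not the presented system: it assumes a first-order (bigram) hidden Markov model with add-one smoothing, and it is trained on a three-sentence toy corpus instead of the Penn Treebank. The names `CORPUS`, `train`, `log_prob`, and `viterbi` are hypothetical.

```python
import math
from collections import defaultdict

# Toy training data standing in for a real treebank (illustrative only).
CORPUS = [
    [("find", "VB"), ("a", "DT"), ("police", "NN"), ("department", "NN")],
    [("find", "VB"), ("a", "DT"), ("law", "NN"), ("court", "NN")],
    [("within", "IN"), ("2", "CD"), ("miles", "NNS"), ("of", "IN"),
     ("a", "DT"), ("court", "NN")],
]
START = "<s>"


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counts`."""
    return math.log((counts.get(key, 0) + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(words, trans, emit):
    tags = list(emit)
    n_words = len({w for t in tags for w in emit[t]}) + 1  # +1 for unseen words
    # best[i][t] = (log prob of the best tag path ending in t at word i, previous tag)
    best = [{t: (log_prob(trans[START], t, len(tags))
                 + log_prob(emit[t], words[0].lower(), n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), n_words)
            # Markov assumption: only the previous tag matters.
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit = train(CORPUS)
tagged = viterbi(["find", "a", "court"], trans, emit)
print(tagged)
```

Each column of `best` is filled from the previous column only, so the cost grows linearly with sentence length and quadratically with the number of tags, which is what makes the approach efficient.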

2. Semantic Parsing

• In semantic parsing, we identify three types of “key words” using the parse tree.

  • Target object

  • Spatial predicate

  • Reference object

Example parse tree:

2. Semantic Parsing

Running Example:

I wish to find a police department within 2 miles of a law court

Semantic Parsing:

Target object: police department

Spatial predicate: within 2 miles

Reference object: law court
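To make the three roles concrete, the sketch below pulls them out of the tagged running example. It is a simplification under stated assumptions: it scans the flat tag sequence for an `IN CD NNS` pattern instead of walking a parse tree as the presented system does, and the function name `extract_key_words` is hypothetical.

```python
NOUN_TAGS = {"NN", "NNS"}

# The tagged running example, copied from the POS tagging slide.
TAGGED = [
    ("I", "NP"), ("wish", "VB"), ("to", "IN"), ("find", "VB"), ("a", "DT"),
    ("police", "NN"), ("department", "NN"), ("within", "IN"), ("2", "CD"),
    ("miles", "NNS"), ("of", "IN"), ("a", "DT"), ("law", "NN"), ("court", "NN"),
]


def extract_key_words(tagged):
    """Return target object, spatial predicate, and reference object, or None."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    for i in range(len(tagged) - 2):
        # Spatial predicate: preposition + number + unit, e.g. "within 2 miles".
        if tags[i] == "IN" and tags[i + 1] == "CD" and tags[i + 2] in NOUN_TAGS:
            # Target object: the run of nouns immediately before the predicate.
            start = i
            while start > 0 and tags[start - 1] in NOUN_TAGS:
                start -= 1
            # Reference object: skip "of"/"a", then take the next run of nouns.
            ref_start = i + 3
            while ref_start < len(tagged) and tags[ref_start] in {"IN", "DT"}:
                ref_start += 1
            ref_end = ref_start
            while ref_end < len(tagged) and tags[ref_end] in NOUN_TAGS:
                ref_end += 1
            return {
                "target": " ".join(words[start:i]),
                "predicate": " ".join(words[i:i + 3]),
                "reference": " ".join(words[ref_start:ref_end]),
            }
    return None


key_words = extract_key_words(TAGGED)
print(key_words)
```

A flat pattern like this only covers distance predicates of the form shown in the running example; other predicates (for example "near" or "inside") would need their own patterns or a real parse tree.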

3. Schema Matching

• In schema matching, we try to match the target and reference spatial objects against the backend spatial database using

  • Table names

  • Attribute names

  • Content of the database

• We then perform a spatial join for each retrieved candidate pair based on the spatial predicate
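The two steps above can be sketched on an in-memory stand-in for the spatial database. Everything here is an assumption made for illustration: the tables, rows, and coordinates in `DB` are invented, matching is reduced to token overlap, and the join uses straight-line distance on planar coordinates in miles. A real deployment would rely on the spatial database's own join operators.

```python
import math

# Hypothetical toy database: table name -> rows; x and y are planar coordinates in miles.
DB = {
    "police_department": [
        {"name": "Central Station", "x": 0.0, "y": 0.0},
        {"name": "North Substation", "x": 5.0, "y": 5.0},
    ],
    "law_court": [{"name": "Municipal Court", "x": 1.0, "y": 1.0}],
    "school": [{"name": "Elm Elementary", "x": 4.0, "y": 4.5}],
}


def tokens(text):
    return set(text.lower().replace("_", " ").split())


def match_tables(key_word, db):
    """Rank tables by token overlap with their name, attribute names, and contents."""
    wanted = tokens(key_word)
    scored = []
    for table, rows in db.items():
        seen = tokens(table)
        for row in rows:
            for attr, value in row.items():
                seen |= tokens(attr)
                if isinstance(value, str):
                    seen |= tokens(value)
        overlap = len(wanted & seen)
        if overlap:
            scored.append((overlap, table))
    return [table for _, table in sorted(scored, reverse=True)]


def spatial_join_within(targets, references, max_miles):
    """Pair every target with every reference no farther than max_miles away."""
    return [
        (t["name"], r["name"])
        for t in targets
        for r in references
        if math.hypot(t["x"] - r["x"], t["y"] - r["y"]) <= max_miles
    ]


def answer(target_kw, reference_kw, max_miles, db):
    """Join each candidate (target table, reference table) pair on the distance predicate."""
    results = []
    for t_table in match_tables(target_kw, db):
        for r_table in match_tables(reference_kw, db):
            results += spatial_join_within(db[t_table], db[r_table], max_miles)
    return results


pairs = answer("police department", "law court", 2.0, DB)
print(pairs)
```

Including cell contents in the match lets a query term hit a table even when the table and attribute names say nothing about it, at the price of more candidate pairs to join.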

Query Interface

Experimental Evaluation

• The database contains real spatial data obtained from the City of Denton

  • 32 tables

  • Includes crime-related objects such as police offices and law courts

• Gold standard: human-prepared answers for 30 different crime-related queries

• Baseline: top 10 answers from Google Maps

• Result:

Summary

• We proposed a method to build a natural language interface to spatial database queries. The prototype system demonstrated the effectiveness of our approach on crime-related spatial queries.

• In future work, we plan to extend the system by increasing the dataset size and improving the accuracy of the tagging and parsing algorithms. We will collect more user queries and improve system performance based on a larger evaluation dataset.

References

1. http://maps.google.com/

2. http://maps.met.police.uk/

3. I. Androutsopoulos, G. Ritchie, and P. Thanisch, “Natural language interfaces to databases – an introduction,” Journal of Natural Language Engineering, vol. 1, no. 1, 1995.

4. W. Woods, R. Kaplan, and B. Webber, “The Lunar sciences natural language information system,” Bolt Beranek and Newman, Tech. Rep., 1972.

5. R. Ge and R. J. Mooney, “A statistical semantic parser that integrates syntax and semantics,” in Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor, MI, Jul. 2005, pp. 9–16.

6. R. J. Kate and R. J. Mooney, “Using string-kernels for learning semantic parsers,” in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, July 2006, pp. 913–920.

7. R. J. Mooney, “Learning for semantic parsing,” in Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference, CICLing 2007, Mexico City, A. Gelbukh, Ed. Berlin: Springer Verlag, 2007, pp. 311–324.

8. Y. Wong and R. J. Mooney, “Learning for semantic parsing with statistical machine translation,” in Proceedings of Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL-06), New York City, NY, 2006, pp. 439–446.

9. A. Popescu, A. Armanasu, and O. Etzioni, “Modern natural language interfaces to databases: Composing statistical parsing with semantic tractability,” in Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, 2004.

10. L. Zettlemoyer and M. Collins, “Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars,” in Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05), 2005.

11. Y. Li, H. Yang, and H. Jagadish, “NaLIX: an interactive natural language interface for querying XML,” in Proceedings of SIGMOD 2005, Baltimore, MD, 2005.