Intelligent Natural Language System
Manish Joshi, Rajendra Akerkar
8 July 2007
Open Domain Question Answering
What is Question Answering?
How is QA related to IR and IE?
Some issues related to QA
Question taxonomies
General approach to QA
Question Answering Systems
These systems try to provide exact information as an answer in response to a natural language query raised by the user.
Motivation: given a question, the system should provide an answer instead of requiring the user to search for the answer in a set of documents.
Example:
Q: What year was Mozart born?
A: Mozart was born in 1756.
Information Retrieval
The document is the unit of information.
Answers questions indirectly: one has to search within the document.
Results: a (ranked) list based on estimated relevance.
Effective approaches are predominantly statistical ("bag of words"); see the sketch below.
QA = (very short) passage retrieval with natural language questions (not queries).
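To make the "bag of words" idea concrete, here is a minimal sketch of statistical passage ranking using TF-IDF weights and cosine similarity; the tiny corpus and the tokenizer are illustrative assumptions, not part of any system described in these slides.

```python
# Minimal "bag of words" passage ranking: TF-IDF vectors + cosine similarity.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def rank(question, passages):
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    df = Counter(w for d in docs for w in d)     # document frequency per term
    def tfidf(counts):
        return {w: tf * math.log(n / df[w]) for w, tf in counts.items() if w in df}
    def cosine(a, b):
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    qvec = tfidf(Counter(tokenize(question)))
    return sorted(((cosine(qvec, tfidf(d)), p) for d, p in zip(docs, passages)),
                  reverse=True)

passages = ["Mozart was born in 1756 in Salzburg.",
            "Beethoven was born in Bonn.",
            "Mozart wrote more than 600 works."]
print(rank("What year was Mozart born?", passages)[0][1])  # top-ranked passage
```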
Information Extraction
Task
Identify messages that fall under a number of specific topics.
Extract information according to pre-defined templates.
Place the information into frame-like database records.
Limitations
Templates are hand-crafted by human experts.
Templates are domain dependent and not easily portable.
Issues
Applications
Source of the answers:
Structured data: natural language queries on databases
A fixed collection or book: an encyclopedia
Web data
Domain-independent vs. domain-specific
Users
Casual users vs. regular users: a profile, history, etc. may be maintained for regular users
Question Taxonomy
Factual questions: the answer is often found in a text snippet from one or more documents.
Questions that may have yes/no answers
wh-questions (who, where, when, etc.); what and which questions are hard
Questions may be phrased as requests or commands.
Questions requiring simple reasoning: some world knowledge and elementary reasoning may be required to relate the question to the answer (why and how questions).
e.g. How did Socrates die? (By) drinking poisoned wine.
Question Taxonomy
Context questions: questions have to be answered in the context of previous interactions with the user.
Who assassinated Indira Gandhi?
When did this happen?
List questions: fusion of partial answers scattered over several documents is necessary.
Ex.: List 3 major rice-producing nations.
How do I assemble a bicycle?
QA System Architecture
(Diagram: QA system architecture.)
General Approach
Question analysis: find the type of object that answers the question: "when" (time, date); "who" (person, organization); etc.
Document collection preprocessing: prepare documents for real-time query processing.
Document retrieval (IR): using the (augmented) question, retrieve a set of possibly relevant documents/passages using IR.
General Approach
Document processing (IE): search documents for entities of the desired type and in appropriate relations using NLP.
Answer extraction and ranking: extract and rank candidate answers from the documents.
Answer construction: provide (links to) context, evidence, etc.
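The following toy end-to-end sketch chains the stages just described (question analysis, retrieval, extraction). Every heuristic, word list, and regular expression here is an illustrative placeholder, not the method of any particular system.

```python
# Toy QA pipeline: question analysis -> retrieval (IR) -> extraction (IE).
import re

WH_TYPES = {"when": "DATE", "who": "PERSON", "where": "LOCATION"}

def analyse_question(question):
    tokens = re.findall(r"\w+", question.lower())
    answer_type = WH_TYPES.get(tokens[0], "UNKNOWN")
    keywords = [t for t in tokens[1:] if t not in {"is", "was", "the", "did"}]
    return answer_type, keywords

def retrieve(corpus, keywords):
    # IR step: rank sentences by keyword overlap ("bag of words").
    scored = [(sum(k in s.lower() for k in keywords), s) for s in corpus]
    return [s for score, s in sorted(scored, reverse=True) if score > 0]

def extract(passages, answer_type):
    # IE step: keep strings whose semantic type matches the expected answer type.
    if answer_type == "DATE":
        return [m for p in passages for m in re.findall(r"\b1\d{3}\b|\b20\d{2}\b", p)]
    return passages  # fall back to whole passages for unknown types

corpus = ["Mozart was born in 1756 in Salzburg.",
          "Mozart died in Vienna in 1791."]
answer_type, keywords = analyse_question("When was Mozart born?")
print(extract(retrieve(corpus, keywords), answer_type)[0])  # -> 1756
```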
Question Analysis
Identify the semantic type of the entity sought by the question: when, where, who are easy to handle; which, what are ambiguous.
e.g. What was the Beatles' first hit single?
Determine additional constraints on the answer entity:
keywords that will be used to locate candidate answer-bearing sentences
relations (syntactic/semantic) that should hold between a candidate answer entity and other entities mentioned in the question
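One common heuristic for the ambiguous what/which case is to look at the noun immediately following the wh-word. A minimal sketch follows; the type tables are illustrative assumptions, not exhaustive.

```python
# Map a question to its expected answer type. "when/where/who" map directly;
# "what/which" fall back to the noun right after the wh-word.
WH_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}
HEAD_NOUN_TYPES = {"year": "DATE", "date": "DATE", "city": "LOCATION",
                   "country": "LOCATION", "person": "PERSON"}

def expected_type(question):
    tokens = question.lower().rstrip("?").split()
    if tokens[0] in WH_TYPES:
        return WH_TYPES[tokens[0]]
    if tokens[0] in ("what", "which") and len(tokens) > 1:
        # the noun following the wh-word often names the answer type
        return HEAD_NOUN_TYPES.get(tokens[1], "UNKNOWN")
    return "UNKNOWN"

print(expected_type("When was Mozart born?"))                    # DATE
print(expected_type("What year was Mozart born?"))               # DATE
print(expected_type("What was the Beatles' first hit single?"))  # UNKNOWN: head noun not adjacent
```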
Document Processing
Preprocessing: detailed analysis of all texts in the corpus may be done a priori.
One group annotates terms with one of 50 semantic tags, which are indexed along with the terms.
Retrieval: an initial set of candidate answer-bearing documents is selected from a large collection.
Boolean retrieval methods may be used profitably.
Passage retrieval may be more appropriate.
Document Processing
Analysis:
Part-of-speech tagging
Named entity identification: recognizes multi-word strings as names of companies/persons, locations/addresses, quantities, etc.
Shallow/deep syntactic analysis: obtains information about syntactic relations and semantic roles
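As an illustration of the first two analysis steps, here is a short sketch using NLTK; the deck itself does not name a toolkit, and the snippet assumes the punkt, averaged_perceptron_tagger, maxent_ne_chunker, and words models have been downloaded.

```python
# POS tagging and named entity identification with NLTK (illustrative only).
import nltk

sentence = "George Bush visited the President of India in New Delhi."
tokens = nltk.word_tokenize(sentence)   # tokenization
tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
tree = nltk.ne_chunk(tagged)            # named entity identification

# Collect multi-word strings recognised as PERSON, GPE (location), etc.
entities = [(" ".join(w for w, t in chunk), chunk.label())
            for chunk in tree if hasattr(chunk, "label")]
print(tagged[:3])  # e.g. [('George', 'NNP'), ('Bush', 'NNP'), ('visited', 'VBD')]
print(entities)    # e.g. [('George Bush', 'PERSON'), ('India', 'GPE'), ('New Delhi', 'GPE')]
```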
History
MURAX (Kupiec, 1993) was designed to answer questions from the Trivial Pursuit general-knowledge board game, drawing answers from Grolier's on-line encyclopaedia (1990).
Text Retrieval Conference (TREC): TREC was started in 1992 with the aim of supporting information retrieval research by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies.
The QA track was first included as part of TREC in 1999, with seventeen research groups entering one or more systems.
Techniques for performing open-domain question answering
Manual and automatically constructed question analysers
Document retrieval specifically for question answering
Semantic type answer extraction
Answer extraction via automatically acquired surface matching text patterns
Principled target processing combined with document retrieval for definition questions
Various approaches to sentence simplification which aid in the generation of concise definitions
Answer Extraction
Look for strings whose semantic type matches that of the expected answer; matching may include subsumption (incorporating something under a more general category).
Check additional constraints:
Select a window around the matching candidate and calculate the word overlap between the window and the query (see the sketch below); or
Check how many distinct question keywords are found in a matching sentence, their order of occurrence, etc.
Check the syntactic/semantic role of the matching candidate.
Two problematic linguistic phenomena:
Semantic Symmetry
Ambiguous Modification
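A minimal sketch of the window-overlap constraint mentioned above; the window size and tokenizer are illustrative choices.

```python
# Score a candidate answer by how many distinct question keywords fall in a
# token window around it.
import re

def window_overlap(sentence, candidate, question_keywords, window=6):
    tokens = re.findall(r"\w+", sentence.lower())
    if candidate.lower() not in tokens:
        return 0
    i = tokens.index(candidate.lower())
    nearby = set(tokens[max(0, i - window): i + window + 1])
    return len(nearby & {k.lower() for k in question_keywords})

s = "Wolfgang Amadeus Mozart was born in Salzburg in 1756."
print(window_overlap(s, "1756", ["Mozart", "born", "year"]))  # 2 ('mozart', 'born')
```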
Semantic Symmetry
Question: Who killed militants?
Militants killed five innocents in Doda district.
After a 6-hour-long encounter, army soldiers killed 3 militants.
We are looking for sentences containing the word 'militants' as the subject, but we got a sentence where the word 'militants' acts as the object (the second sentence).
This is a linguistic phenomenon which occurs when an entity acts as the subject in some sentences and as the object in others.
Example
The following example illustrates the phenomenon of semantic symmetry and demonstrates the problems it causes.
Question: Who visited the President of India?
Candidate Answer 1: George Bush visited the President of India.
Candidate Answer 2: The President of India visited flood-affected areas of Mumbai.
The two sentences are similar at the word level, but they have very different meanings.
Some more examples showing semantic symmetry
(1) The birds ate the snake. / The snake ate the bird. (What does the snake eat?)
(2) Communists in India are supporting the UPA government. / Small parties are supporting Communists in Kerala. (Whom are the Communists supporting?)
Ambiguous Modification
This is a linguistic phenomenon which occurs when an adjective in a sentence may modify more than one noun.
Question: What is the largest volcano in the Solar System?
Candidate Answer 1: In the Solar System, the largest planet Jupiter has several volcanoes. (wrong)
Candidate Answer 2: Olympus Mons, the largest volcano in the Solar System. (correct)
In the first sentence 'largest' modifies the word 'planet', whereas in the second sentence 'largest' modifies the word 'volcano'.
Approaches to tackle the problem
Boris Katz and Jimmy Lin of MIT developed a system, Sapere, that handles problems occurring due to semantic symmetry and ambiguous modification.
These problems occur at the semantic level.
To deal with problems occurring at the semantic level, detailed information at the syntactic level is gathered in all these approaches.
The system developed by Katz and Lin gives results after utilizing syntactic relations. These typical S-V-O ternary relations are obtained after processing the information gathered by the Minipar functional dependency parser.
Our Approach
To deal with problems at the semantic level, most of the available approaches need to obtain and work on information gathered at the syntactic level.
We have proposed a new approach to deal with the problems caused by the linguistic phenomena of semantic symmetry and ambiguous modification.
The algorithms based on our approach remove wrong sentences from the answer with the help of information obtained at the lexical level (lexical analysis).
Algorithm for Handling Semantic Symmetry
Rule 1: If the sequence of keywords in the question and the candidate answer matches, then if the POS of the verb keyword is the same, the candidate answer is correct.
Rule 2: If the sequence of keywords in the question and the candidate answer does not match, then if the POS of the verb keyword is not the same, the candidate answer is correct.
Otherwise, the candidate answer is wrong.
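A literal, simplified encoding of these two rules as a sketch: the keyword sequences and verb POS tags are assumed to come from the earlier lexical-analysis stage, and the helper below is hypothetical rather than the authors' code.

```python
# Rules 1 and 2 for semantic symmetry, over pre-extracted keywords and POS tags.
def semantic_symmetry_check(q_keywords, a_keywords, q_verb_pos, a_verb_pos):
    """Return True if the candidate answer survives the symmetry rules."""
    # keep only keywords shared by question and answer, in order of appearance
    shared_in_answer = [k for k in a_keywords if k in q_keywords]
    shared_in_question = [k for k in q_keywords if k in a_keywords]
    same_sequence = shared_in_answer == shared_in_question
    same_pos = q_verb_pos == a_verb_pos
    if same_sequence and same_pos:          # Rule 1
        return True
    if not same_sequence and not same_pos:  # Rule 2
        return True
    return False                            # otherwise: wrong answer

# Question: "Who killed militants?" -> keywords ['killed', 'militants'], verb POS 'VBD'
print(semantic_symmetry_check(["killed", "militants"],
                              ["militants", "killed"], "VBD", "VBD"))            # False: order flipped
print(semantic_symmetry_check(["killed", "militants"],
                              ["soldiers", "killed", "militants"], "VBD", "VBD"))  # True
```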
Algorithm for Handling Ambiguous Modification
We have identified the adjective as Adj, the scope-defining noun as SN, and the identifier noun as IN.
Rules: If the sentence contains keywords in the following order:
Adj α SN (where α indicates a string of zero or more keywords), then
Rule 1-a: If α is IN, the answer is correct; or
Rule 1-b: If α is blank, the answer is correct.
Rule 2: Otherwise (any other α), the answer is wrong.
Algorithm for Handling Ambiguous Modification (Cont.)
If the sentence contains keywords in the following order:
SN α Adj β IN (where α and β indicate strings of zero or more keywords), then
Rule 3: If β is blank, the answer is correct (the value of α does not matter).
Rule 4: Otherwise (any other β), the answer is wrong.
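A sketch encoding Rules 1 through 4 over a candidate's keyword sequence (stopwords assumed already removed); the function and its inputs are illustrative, not the authors' implementation.

```python
# Rules 1-4 for ambiguous modification. Adj = adjective, SN = scope-defining
# noun, IN = identifier noun; 'keywords' is the candidate's keyword sequence.
def ambiguous_modification_check(keywords, adj, sn, iden):
    """Return True if the candidate answer survives Rules 1-4."""
    if adj in keywords and sn in keywords:
        a, s = keywords.index(adj), keywords.index(sn)
        if a < s:                                 # pattern: Adj [alpha] SN
            alpha = keywords[a + 1:s]
            return alpha == [] or alpha == [iden]  # Rules 1-a / 1-b, else Rule 2
        if s < a and iden in keywords[a:]:        # pattern: SN [alpha] Adj [beta] IN
            beta = keywords[a + 1:keywords.index(iden, a)]
            return beta == []                     # Rule 3, else Rule 4
    return False

# Question keywords: Adj 'largest', SN 'solar_system', IN 'volcano'
# ('solar_system' treated as one keyword token for simplicity).
cand1 = ["solar_system", "largest", "planet", "volcano"]  # 'largest' modifies 'planet'
cand2 = ["largest", "volcano", "solar_system"]
print(ambiguous_modification_check(cand1, "largest", "solar_system", "volcano"))  # False (Rule 4)
print(ambiguous_modification_check(cand2, "largest", "solar_system", "volcano"))  # True  (Rule 1-a)
```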
Working System - ENLIGHT
We have developed a system that answers questions using the 'keyword based matching' paradigm.
We have incorporated the newly formulated algorithms in the system and obtained good results.
ENLIGHT System Architecture
(Diagram: ENLIGHT system architecture.)
Preprocessing
This module prepares the platform for the intelligent and effective interface.
This module transforms raw-format data into a well-organized corpus with the help of the following activities (a short sketch of a few of them follows the list):
Keyword Extraction
Sentence Segmentation
Handling of Abbreviations and Punctuation Marks
Tokenization
Stemming
Identifying Groups of Words with Specific Meaning
Shallow Parsing
Reference Resolution
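As an illustration, here are three of these activities (sentence segmentation, tokenization, stemming) sketched with NLTK. This is not the ENLIGHT implementation; it assumes the punkt tokenizer models are available.

```python
# Sentence segmentation, tokenization and stemming with NLTK (illustrative).
import nltk
from nltk.stem import PorterStemmer

raw = "Militants attacked the village. Army soldiers killed 3 militants."
sentences = nltk.sent_tokenize(raw)  # sentence segmentation
stemmer = PorterStemmer()
# Tokenize each sentence and reduce tokens to stems (truncated root forms).
corpus = [[stemmer.stem(t) for t in nltk.word_tokenize(s)] for s in sentences]
print(sentences)    # two segmented sentences
print(corpus[0])    # e.g. ['milit', 'attack', 'the', 'villag', '.']
```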
Question Analysis
Question Tokenization
Question Classification
Corpus Management
Various database tables are created to manage the vast data:
InfoKeywords, QuestionKeyword, QuestionAnswer, CorpusSentences, Abbreviations, Apostrophes, StopWords
Answer Retrieval
Answer Searching
Answer Generation
Answer Rescoring
Handling problems caused by linguistic phenomena using shallow-parsing-based algorithms:
Semantic Symmetry
Ambiguous Modification
Intelligence Incorporation
Learning:
Rote Learning
Feedback: Can Improve / Satisfactory / Wrong Answer / Loose Criterion
Automated Classification
Results
Preciseness
Response Time
Adaptability
Preciseness

| | ENLIGHT | Basic Keyword Matching |
| --- | --- | --- |
| Average number of sentences returned as answer | 3 | 4.6 |
| Average number of correct sentences | 2.63 | |
| Average precision | 84% | 32% |
Response Time (ENLIGHT vs. Sapere)

| Type of data and no. of words | Time required by QTAG (used in ENLIGHT) | Time required by Minipar (used in Sapere) |
| --- | --- | --- |
| News extract, Times of India, 202 words | 1.71 s | 2.88 s |
| Reply, START QA System, 251 words | 1.89 s | 3.11 s |
| Google search engine result | 1.55 s | 2.86 s |
| Yahoo search engine results | 1.67 s | 3.13 s |
| Average | 1.705 s | 2.995 s |
Adaptability
Handling additional keywords: a question like 'Who killed the Prime Minister?' can also be handled by the ENLIGHT system.
Use of synonyms: if the question and the answer contain synonyms, the ENLIGHT system can associate the two words using the learning phase.
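One simple way to realize such an association is rote learning from user feedback, as listed on the rescoring slide. The store and function names below are hypothetical, sketching the idea rather than ENLIGHT's actual learning phase.

```python
# Hypothetical rote-learning store: after feedback marks an answer as
# satisfactory, associate the question word with the answer word it matched.
learned_synonyms = {}

def record_feedback(question_word, answer_word, satisfactory):
    if satisfactory:
        learned_synonyms.setdefault(question_word, set()).add(answer_word)

def expand(keywords):
    """Expand question keywords with any learned associations."""
    expanded = set(keywords)
    for k in keywords:
        expanded |= learned_synonyms.get(k, set())
    return expanded

record_feedback("killed", "assassinated", satisfactory=True)
print(expand(["killed", "militants"]))  # now also contains 'assassinated'
```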
References
L. Hirschman and R. Gaizauskas, "Natural language question answering: the view from here," Natural Language Engineering, 7(4), December 2001.
Manish Joshi and Rajendra Akerkar, "The ENLIGHT System: Intelligent Natural Language System," Journal of Digital Information Management, June 2007.