Post on 11-Jun-2015
Imam University, College of Computer and Information Systems
Computer Sciences Department
Arabic Question Answering
By: Asma Ahmad, Asma Alharbi, Nadia AL-Mutiri
Supervised by: Dr. Amal Al seef
Second semester: 1434-1435 (2013)
Arabic Question Answering
Overview:
O The implementation of Arabic Question-Answering system components.
O QASAL & QARAB system components.
O Yes/No Arabic Question Answering.
ARABIQA GENERIC ARCHITECTURE
Named Entity Recognizer
O A NER system identifies proper names and temporal and numeric expressions.
O For proper name recognition, this Arabic NER system is based on the ME (maximum entropy) approach.
O For temporal and numeric expressions, it relies entirely on patterns and a small dictionary containing the names of days and months in Arabic, and numbers written in letters.
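The pattern-and-dictionary idea for temporal and numeric expressions can be sketched as follows. This is a minimal illustration, not the system's actual rules: the dictionary entries, the digit pattern, and the names `tag_token` and `tag_sentence` are all assumptions made for the example.

```python
import re

# Hypothetical mini-dictionary; the slides do not list the real entries.
DAYS = {"السبت", "الأحد", "الاثنين", "الثلاثاء", "الأربعاء", "الخميس", "الجمعة"}
MONTHS = {"يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو",
          "يوليو", "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"}
NUMBER_WORDS = {"واحد", "اثنان", "ثلاثة", "أربعة", "خمسة",
                "ستة", "سبعة", "ثمانية", "تسعة", "عشرة"}

# Assumed pattern: a run of Western or Arabic-Indic digits.
DIGITS = re.compile(r"^[0-9\u0660-\u0669]+$")


def tag_token(token):
    """Label one token as TEMPORAL, NUMERIC, or O (other)."""
    if token in DAYS or token in MONTHS:
        return "TEMPORAL"
    if token in NUMBER_WORDS or DIGITS.match(token):
        return "NUMERIC"
    return "O"


def tag_sentence(sentence):
    """Whitespace-tokenize a sentence and tag every token."""
    return [(tok, tag_token(tok)) for tok in sentence.split()]
```

A call such as `tag_sentence("وصل يوم الخميس 15 مارس")` tags the day and month names as TEMPORAL and the digits as NUMERIC; a real system would also need multi-word patterns, which this sketch leaves out.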
The implementation of Arabic Question-Answering system
O NooJ is a linguistic environment that includes large-coverage dictionaries and grammars.
O A spell-checker that corrects the most frequent errors.
O A named entity recognition tool, which is a set of rules described in local grammars.
QASAL System components
Question analysis: this step applies the set of linguistic resources to the input question. For example, NooJ’s text annotation structure gives the linguistic analysis of each word form in our sample question.
Passage retrieval: the first task of this step could be the selection of one or more passages.
Answer extraction: this last step uses the displayed concordance table to automatically extract the answer to the input question.
Example 1: Answer extraction for the factoid question: متى استقلت تونس؟ (When did Tunisia become independent?)
Example 2:
QARAB System
[Figure: QARAB architecture. Labels: Question, Question analyzer, NLP tool, full IR system, Al-Raya newspaper documents, ranked documents, passage selection, hypothesized answer, answer generation.]
Information Retrieval system.
O It searches the document collection to select documents containing information relevant to the user’s query.
O Lundquist et al. [1999] describe an IR system that can be constructed using a relational database management system (RDBMS).
O In this paper, it contains the following database relations:
1. ROOT_TABLE.
2. STEM_TABLE.
3. POSTING_TABLE.
4. DOCUMENT_TABLE.
5. PARAGRAPH_TABLE.
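One way the five relations could be laid out in an RDBMS is sketched below with SQLite. Only the table names come from the slides; every column, the foreign keys, and the frequency-sum ranking in `rank_paragraphs` are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create the five relations; all columns are hypothetical."""
    conn.executescript("""
        CREATE TABLE ROOT_TABLE (root_id INTEGER PRIMARY KEY, root TEXT UNIQUE);
        CREATE TABLE STEM_TABLE (
            stem_id INTEGER PRIMARY KEY,
            stem TEXT UNIQUE,
            root_id INTEGER REFERENCES ROOT_TABLE(root_id));
        CREATE TABLE DOCUMENT_TABLE (doc_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE PARAGRAPH_TABLE (
            para_id INTEGER PRIMARY KEY,
            doc_id INTEGER REFERENCES DOCUMENT_TABLE(doc_id),
            body TEXT);
        CREATE TABLE POSTING_TABLE (
            root_id INTEGER REFERENCES ROOT_TABLE(root_id),
            para_id INTEGER REFERENCES PARAGRAPH_TABLE(para_id),
            term_freq INTEGER);
    """)


def rank_paragraphs(conn, root_ids):
    """Rank paragraphs by summed frequency of the query roots (assumed scoring)."""
    if not root_ids:
        return []
    marks = ",".join("?" * len(root_ids))
    sql = (
        "SELECT para_id, SUM(term_freq) AS score FROM POSTING_TABLE "
        f"WHERE root_id IN ({marks}) "
        "GROUP BY para_id ORDER BY score DESC, para_id"
    )
    return conn.execute(sql, list(root_ids)).fetchall()
```

With an in-memory database (`sqlite3.connect(":memory:")`), the posting table acts as the inverted index: each query root selects its postings and the `GROUP BY` aggregates them per paragraph.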
The NLP system
The NLP model is:
1. Tokenizer.
2. Type finder.
3. Feature finder.
4. Proper noun phrase parser.
How to extract the Answer
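Assume the user posed the following question to QARAB:
من هو محافظ البنك المركزي الكويتي والذي قال بأن بلاده ليس لديها نيه لخفض قيمه الدينار للحد من عجز الميزانية؟
(Who is the governor of the Central Bank of Kuwait who said that his country has no intention of devaluing the dinar to limit the budget deficit?)
The IR system returns this passage. How?
قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح أمس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الأسواق المالية الدولية.
(The governor of the Central Bank of Kuwait, Sheikh Salem Abdul Aziz Al-Sabah, said yesterday that his country has no intention of devaluing the Kuwaiti dinar to limit the growing budget deficit. He said that devaluing the dinar would harm Kuwait's economy and its credibility in the international financial markets.)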
Step 1:
O Tokenize the question and remove its stop words, then tag each word with its part of speech (POS).
Step 2:
O QARAB constructs the query as a “bag of words” and passes it to the IR system.
Example:
قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.
Step 3: Determine the expected type of the answer: من (Who?) >>> personal name.
Step 4: Generating the answer.
قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.
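Steps 1 to 3 can be sketched roughly as below. POS tagging is left out, and the stop-word list, the question-word table, and the name `analyze_question` are assumptions for the example rather than QARAB's actual resources.

```python
# Hypothetical resources; the slides give neither list.
STOP_WORDS = {"هو", "والذي", "بأن", "ليس", "لديها", "من"}
ANSWER_TYPES = {"من": "PERSON", "متى": "DATE", "أين": "LOCATION", "كم": "NUMBER"}


def analyze_question(question):
    """Return (expected answer type, bag-of-words query) for a question."""
    # Step 1: tokenize on whitespace after dropping the question mark.
    tokens = question.replace("؟", " ").replace("?", " ").split()
    # Step 3: the leading question word decides the expected answer type.
    expected = ANSWER_TYPES.get(tokens[0], "UNKNOWN") if tokens else "UNKNOWN"
    # Steps 1-2: remove stop words and keep the rest as an unordered bag.
    bag = {t for t in tokens if t not in STOP_WORDS}
    return expected, bag
```

For the shortened question `"من هو محافظ البنك المركزي الكويتي؟"` this yields the type PERSON and a four-word bag, which is what would be handed to the IR system.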
Yes/No Arabic Question Answering
SYSTEM ARCHITECTURE:
Question Analysis module
Text retrieval module
Answer Selection module
Question Analysis
O Removing the question mark.
O Removing the interrogative particle.
O Tokenizing: the tokenizer divides the user question into its separate words and normalizes the (Alef) letter.
O Removing the stop words.
O Removing the negation particles (if they exist) and setting the negation property of the question representation.
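The preprocessing steps listed above can be sketched as one small function. The particle and stop-word lists are assumed for the example (the slides name only هل), and `analyze` and `normalize_alef` are hypothetical names.

```python
import re

INTERROGATIVES = {"هل"}                                # only هل is named in the slides
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}   # assumed list
STOP_WORDS = {"في", "من", "على", "الى"}                # assumed list, Alef-normalized


def normalize_alef(word):
    """Map Alef with madda/hamza (U+0622, U+0623, U+0625) to bare Alef (U+0627)."""
    return re.sub("[\u0622\u0623\u0625]", "\u0627", word)


def analyze(question):
    """Return the cleaned tokens and the negation property of a yes/no question."""
    text = question.replace("؟", "").replace("?", "")      # remove the question mark
    tokens = text.split()                                   # tokenize
    if tokens and tokens[0] in INTERROGATIVES:              # remove the particle
        tokens = tokens[1:]
    tokens = [normalize_alef(t) for t in tokens]            # normalize Alef
    tokens = [t for t in tokens if t not in STOP_WORDS]     # remove stop words
    negated = any(t in NEGATION_PARTICLES for t in tokens)  # set negation property
    tokens = [t for t in tokens if t not in NEGATION_PARTICLES]
    return {"tokens": tokens, "negated": negated}
```

For `"هل فتح محمد الباب؟"` the result keeps the three content words with `negated` set to `False`; inserting لم before the verb flips the flag and drops the particle.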
Question Analysis
O Tagging: determines the type of each word (verb or noun) and obtains its root.
O Parsing: recall that the Arabic sentence after the interrogative particle is either nominal or verbal.
Question Analysis
In a nominal sentence, we are interested in the beginning noun, the “topic” (مبتدأ), which is the first noun after the interrogative particle (هل), and in the comment noun (خبر), which we can mark as the last noun without the article (ال).
In a verbal sentence, we are interested in the verb, which occurs immediately after the interrogative particle (هل), and in the subject that follows the verb.
Question Analysis
Logical Representation (With Nominal Sentences)
Affirmative questions:
O N (Topic, root (Comment), root ({remaining words}))
O N (Topic, root (Comment Synonyms), root ({remaining words}))
O ~N (Topic, root (Comment Antonyms), root ({remaining words}))
Question Analysis
Logical Representation (With Nominal Sentences)
Negated questions:
O ~N (Topic, root (Comment), root ({remaining words}))
O ~N (Topic, root (Comment Synonyms), root ({remaining words}))
O N (Topic, root (Comment Antonyms), root ({remaining words}))
Question Analysis
O Example
هل سميره كسرت النافذه؟ (Did Samira break the window?)
مبتدأ (topic): سميره
خبر (comment): كسرت -----> (synonym) حطمت
O N(سميره, root(كسرت), root(النافذه))
O N(سميره, root(حطمت), root(النافذه))
Question Analysis
Logical Representation (With Verbal Sentences)
Affirmative questions:
O V (Subject noun, root (verb), root ({remaining words}))
O V (Subject noun, root (verb Synonyms), root ({remaining words}))
O ~V (Subject noun, root (verb Antonyms), root ({remaining words}))
Question Analysis
Logical Representation (With Verbal Sentences)
Negated questions:
O ~V (Subject noun, root (verb), root ({remaining words}))
O ~V (Subject noun, root (verb Synonyms), root ({remaining words}))
O V (Subject noun, root (verb Antonyms), root ({remaining words}))
Question Analysis
Example
هل فتح محمد الباب؟ (Did Mohammed open the door?)
فعل (verb): فتح -----> (antonym) اغلق
فاعل (subject): محمد
O V(محمد, root(فتح), root(الباب))
O ~V(محمد, root(اغلق), root(الباب))
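The logical representations for nominal and verbal questions follow one pattern, which can be sketched as a single helper. The function name and argument layout are assumptions, and `root(...)` is kept symbolic, as in the slide examples, rather than calling a real stemmer.

```python
def logical_forms(kind, head, word, synonyms, antonyms, remaining, negated=False):
    """Build the representations for one question.

    kind: "N" (nominal) or "V" (verbal)
    head: the topic (nominal) or the subject noun (verbal)
    word: the comment (nominal) or the verb (verbal)
    """
    # A negated question flips the polarity of every form.
    same, opposite = ("~", "") if negated else ("", "~")

    def form(sign, w):
        parts = [head, f"root({w})"] + [f"root({r})" for r in remaining]
        return f"{sign}{kind}(" + ", ".join(parts) + ")"

    forms = [form(same, word)]
    forms += [form(same, s) for s in synonyms]      # synonyms keep the polarity
    forms += [form(opposite, a) for a in antonyms]  # antonyms invert it
    return forms
```

Calling `logical_forms("V", "محمد", "فتح", [], ["اغلق"], ["الباب"])` reproduces the two forms of the verbal example, and passing `negated=True` swaps the `~` between them.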
Text Processing & Retrieval
There are 20 documents in the corpus. This module uses two techniques to retrieve the top 5 candidate paragraphs (of variable length) that are most relevant to the user question:
O Paragraphs technique: split the documents into their built-in paragraphs and retrieve the top 5 paragraphs, regardless of which document they come from, according to some indexing scheme.
O Documents technique: retrieve the top 5 documents after they are ranked, then use the first indexing scheme to retrieve the top 5 paragraphs.
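The difference between the two techniques can be sketched as follows. The slides do not specify the indexing scheme, so a plain count of query-term hits stands in for it here; the blank-line paragraph delimiter and all function names are likewise assumptions.

```python
def score(text, query_terms):
    """Stand-in for the unspecified indexing scheme: count query-term hits."""
    words = text.split()
    return sum(words.count(t) for t in query_terms)


def top_k(items, query_terms, k=5):
    """Return the k best-scoring texts; ties keep their original order."""
    return sorted(items, key=lambda x: score(x, query_terms), reverse=True)[:k]


def paragraphs_technique(documents, query_terms, k=5):
    """Pool every paragraph of every document, then rank the pool."""
    paragraphs = [p for doc in documents for p in doc.split("\n\n") if p.strip()]
    return top_k(paragraphs, query_terms, k)


def documents_technique(documents, query_terms, k=5):
    """Rank whole documents first, then rank the paragraphs of the best ones."""
    best_docs = top_k(documents, query_terms, k)
    return paragraphs_technique(best_docs, query_terms, k)
```

The second technique narrows the pool before paragraph ranking, so a strong paragraph inside a weakly ranked document can be missed; the first technique never discards it.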
Answer Selection & generation
After the 5 paragraphs are selected using the documents technique or the paragraphs technique, we need to select the best sentence to represent the answer and generate yes or no accordingly.
Answer Selection & generation
O Split the paragraphs into their sentences.
O In nominal sentences we are interested in the exact topic (مبتدأ), not its root, so we omit each sentence that does not contain it (in its original form).
O In verbal sentences we are interested in the exact subject (فاعل), not its root, so we omit each sentence that does not contain it (in its original form).
Answer Selection & generation
O In the resulting sentences, we look for the remaining terms (in root form) derived from the question's logical representation (except the subject or the topic). If they exist, we assign them indexes according to their position in the sentence. So each sentence will have its own rank as follows:
Rank = last occurrence - first occurrence
O Look for negation particles (ادوات النفي) in the selected answer (if they exist).
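The rank formula can be sketched as below, reading "occurrence" as the position index of a matched term in the sentence. That reading and the name `sentence_rank` are assumptions; the slides do not say how ranks are compared across sentences, so the sketch stops at computing the rank.

```python
def sentence_rank(sentence_roots, query_roots):
    """Rank = index of the last matched term minus index of the first one.

    sentence_roots: the sentence as a list of word roots, in order
    query_roots: the remaining question terms (in root form) to look for
    Returns None when no query term occurs in the sentence.
    """
    positions = [i for i, r in enumerate(sentence_roots) if r in query_roots]
    if not positions:
        return None
    return positions[-1] - positions[0]
```

For a sentence whose matched terms sit at positions 1, 3 and 4, the rank is 3; a sentence with a single match gets rank 0.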
Answer Selection & generation
O Using the selected answer and the logical representation of the question, generate yes or no as follows:
1. Yes, if:
The question and the answer are both affirmative.
The question and the answer are both negated.
2. No, if:
The question is affirmative and the answer is negated.
The question is negated and the answer is affirmative.
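The decision rule reduces to comparing two polarity flags, as this sketch shows. The list of negation particles and the function names are assumed for the example.

```python
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}  # assumed list


def is_negated(sentence_tokens):
    """True when the selected answer contains a negation particle."""
    return any(t in NEGATION_PARTICLES for t in sentence_tokens)


def yes_no(question_negated, answer_negated):
    """Yes when both polarities agree, No when they differ."""
    return "Yes" if question_negated == answer_negated else "No"
```

An affirmative question answered by a sentence containing لم gives `yes_no(False, True)`, which returns "No".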
EXPERIMENTS AND RESULTS
69% Arabic QA system
97.3% Arabic Q-A uses QARAB
83.3% PR system
Conclusion
O We have described the generic architecture for Arabic question answering.
O We compared different systems.
O We showed how they process the question and give the answers.