Use of Patterns for Detection of Answer Strings

Page 1: Use of Patterns for Detection of Answer Strings

Use of Patterns for Detection of Answer Strings

Soubbotin and Soubbotin

Page 2: Use of Patterns for Detection of Answer Strings

Essentials of Approach

A certain shift from deep text analysis and NLP methods to surface techniques

Use of formulas describing the structure of strings likely bearing certain semantic information

Page 3: Use of Patterns for Detection of Answer Strings

Example

FBI Director Louis Freeh

A person represented by his/her first/last names

A person occupies a post in an organization

Page 4: Use of Patterns for Detection of Answer Strings

The formula

A word composed of capital letters

An item from a list of posts in an organization

An item from a list of first names

A capitalized word
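
The four-part formula above can be approximated with a regular expression whose middle elements are drawn from word lists. This is only a sketch: the lists `POSTS` and `FIRST_NAMES`, the helper `alternation`, and the name `PERSON_POST_ORG` are hypothetical, since the slides do not give the actual lists or notation.

```python
import re

# Hypothetical word lists; the slides do not enumerate the real ones.
POSTS = ["Director", "President", "Chairman", "Secretary"]
FIRST_NAMES = ["Louis", "John", "Mary", "George"]


def alternation(items):
    """Build a non-capturing regex alternation from a list of literal words."""
    return "(?:" + "|".join(re.escape(item) for item in items) + ")"


PERSON_POST_ORG = re.compile(
    r"\b(?P<org>[A-Z]{2,})\s+"                            # a word composed of capital letters
    + r"(?P<post>" + alternation(POSTS) + r")\s+"         # an item from the list of posts
    + r"(?P<first>" + alternation(FIRST_NAMES) + r")\s+"  # an item from the list of first names
    + r"(?P<last>[A-Z][a-z]+)\b"                          # a capitalized word
)

match = PERSON_POST_ORG.search("Yesterday FBI Director Louis Freeh testified.")
print(match.groupdict())
```

Named groups keep the semantic roles (organization, post, person) attached to the matched substrings, which is what lets one pattern answer both "Who is X?" and "Who occupies post Y in organization Z?".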

Page 5: Use of Patterns for Detection of Answer Strings

Patterns

Formulas of such kind were called “patterns”

First used at TREC-10 QA track

Each pattern is characterized by a certain generalized semantics

Page 6: Use of Patterns for Detection of Answer Strings

Steps (Overview)

Identify strings corresponding to a formula

Identify the question terms (types)

Check for expressions negating the semantics of the found strings

Apply the set of formulas (for a particular question type) to match the strings in question-relevant passages
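
The negation check in the steps above might look roughly like the following. The cue list, the character window, and the function names `is_negated` and `span_of` are assumptions made for illustration; the slides do not say which expressions are checked or how far from the string the system looks.

```python
import re

# Hypothetical cue list; the slides do not say which expressions are checked.
NEGATION_CUES = ["not", "never", "no longer", "former", "denied"]
CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def is_negated(passage, start, end, window=30):
    """True if a negation cue appears within `window` characters of the span."""
    before = passage[max(0, start - window):start]
    after = passage[end:end + window]
    return bool(CUE_RE.search(before) or CUE_RE.search(after))


def span_of(passage, text):
    """Demo helper: character span of `text` inside `passage`."""
    start = passage.index(text)
    return start, start + len(text)


negated = "Louis Freeh is no longer FBI Director."
plain = "FBI Director Louis Freeh spoke today."
print(is_negated(negated, *span_of(negated, "FBI Director")))          # True
print(is_negated(plain, *span_of(plain, "FBI Director Louis Freeh")))  # False
```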

Page 7: Use of Patterns for Detection of Answer Strings

A Surface Approach

No need to distinguish linguistic entities

Formulas for strings look like regular expressions

But patterns include elements referring to lists of predefined words/phrases

Page 8: Use of Patterns for Detection of Answer Strings

Patterns and Question Types

Who is person X? Who occupies post Y in organization Z?

A relationship is established between 2 or more entities: person, post, organization etc

Where-question: suggest geographical items as answers

Construct formulas like: item from list of cities/towns/counties, countries/states.
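
A where-question formula of the "item from a list" kind can be sketched as a lookup against small gazetteers. The lists `CITIES` and `COUNTRIES` and the function `where_candidates` are hypothetical stand-ins; a real system would presumably use much larger lists.

```python
import re

# Hypothetical gazetteers standing in for the lists of cities and countries.
CITIES = ["Caracas", "New York", "Paris"]
COUNTRIES = ["Venezuela", "France", "United States"]


def list_pattern(items):
    """Compile a pattern matching any list item; longer names are tried first."""
    ordered = sorted(items, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(i) for i in ordered) + r")\b")


WHERE_PATTERNS = {
    "city": list_pattern(CITIES),
    "country": list_pattern(COUNTRIES),
}


def where_candidates(passage):
    """Return (string, list label) pairs for every geographical item found."""
    found = []
    for label, pattern in WHERE_PATTERNS.items():
        for m in pattern.finditer(passage):
            found.append((m.group(0), label))
    return found


print(where_candidates("The summit was held in Caracas, Venezuela."))
# [('Caracas', 'city'), ('Venezuela', 'country')]
```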

Page 9: Use of Patterns for Detection of Answer Strings

Examples

“In what year” questions: find strings with a sequence of 4 digits

Questions regarding length, area, weight, speed, etc.: digits plus units of measurement

“What is the area of Venezuela?” 340,569 square miles (a simple pattern match)
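
Both simple patterns on this slide translate almost directly into regular expressions. In the sketch below the unit list `UNITS` is an assumed sample, and `YEAR` and `MEASURE` are illustrative names, not the authors' own.

```python
import re

# "In what year" questions: a sequence of exactly four digits.
YEAR = re.compile(r"\b\d{4}\b")

# Quantity questions: digits (with optional thousands separators or decimals)
# followed by a unit of measurement. The unit list is an assumed sample.
UNITS = ["square miles", "square kilometers", "miles", "kilometers", "pounds", "mph"]
MEASURE = re.compile(
    r"\b\d[\d,]*(?:\.\d+)?\s+(?:" + "|".join(re.escape(u) for u in UNITS) + r")\b"
)

print(YEAR.findall("The treaty was signed in 1993."))
# ['1993']
print(MEASURE.search("Venezuela covers 340,569 square miles.").group(0))
# 340,569 square miles
```

Note that `YEAR` does not fire on "340,569", because neither three-digit group forms a standalone four-digit token.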

Page 10: Use of Patterns for Detection of Answer Strings

Complex Patterns

Strings expressing relationship between several semantic entities

The more complex a pattern is, the higher its reliability

Page 11: Use of Patterns for Detection of Answer Strings

Names and Dates

People names: items from a first-name list, capitalized words, specific name elements (bin, van, etc.), abbreviations like Sr. and Jr.

Dates: prepositions, articles, digits, month names, commas, dashes, brackets, phrases like “early,” “in the period of,” “years ago,” “B.C.”
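
The building blocks listed for names and dates can be combined along the following lines. This is a rough sketch that covers only a few of the listed elements; `FIRST_NAMES` is a hypothetical sample, the particles "de" and "von" are additions beyond the slide's "bin, van", and the three date shapes are chosen for illustration.

```python
import re

FIRST_NAMES = ["Louis", "Vincent", "Martin", "Osama"]   # hypothetical sample list
PARTICLES = ["bin", "van", "de", "von"]                 # "de" and "von" are assumed extras
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def alt(items):
    """Non-capturing alternation over literal words."""
    return "(?:" + "|".join(re.escape(i) for i in items) + ")"


PERSON = re.compile(
    r"\b" + alt(FIRST_NAMES)                  # item from the first-name list
    + r"(?:\s+" + alt(PARTICLES) + r")?"      # optional name element such as "van"
    + r"(?:\s+(?!(?:Jr|Sr)\.)[A-Z][a-z]+)+"   # one or more capitalized words
    + r"(?:,?\s+(?:Jr|Sr)\.)?"                # optional abbreviation
)

DATE = re.compile(
    r"\b(?:" + alt(MONTHS) + r"\s+\d{1,2},\s+\d{4}"  # month name, digits, comma, year
    + r"|\d+\s+years\s+ago\b"                        # "... years ago"
    + r"|\d{1,4}\s+B\.C\.)"                          # "... B.C."
)

print(PERSON.search("The painter Vincent van Gogh died in 1890.").group(0))
print(PERSON.search("Martin Luther King Jr. spoke.").group(0))
print(DATE.search("The meeting took place on March 3, 1999.").group(0))
```

The negative lookahead keeps "Jr." and "Sr." out of the run of capitalized words so that the trailing period is captured with the abbreviation.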

Page 12: Use of Patterns for Detection of Answer Strings

Pattern-Matching Strings and Question Semantics

How question words are located in the pattern-matching string (distance, left/right, position to other matching strings etc.)

Simplicity of a pattern’s structure is compensated by complexity of rules

Without applying heuristic rules, sufficiently reliable results cannot be ensured

Rank assigned to question words/phrases and score assigned to candidate answers
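
One way to picture such a heuristic is a score that grows when highly ranked question words sit close to the candidate string. The formula below (rank divided by one plus the character gap) and the names `proximity_score` and `ranked_terms` are purely illustrative assumptions; the slides do not give the actual rules or weights.

```python
import re


def proximity_score(passage, start, end, ranked_terms):
    """Score a candidate span by how close each ranked question term lies to it.

    `ranked_terms` maps a question term to its rank weight (higher means more
    important). Each term found in the passage contributes weight / (1 + gap),
    where gap is the character distance from its nearest occurrence to the span.
    """
    score = 0.0
    for term, weight in ranked_terms.items():
        best_gap = None
        for m in re.finditer(re.escape(term), passage, re.IGNORECASE):
            if m.end() <= start:
                gap = start - m.end()      # term lies to the left of the span
            elif m.start() >= end:
                gap = m.start() - end      # term lies to the right of the span
            else:
                gap = 0                    # term overlaps the span
            best_gap = gap if best_gap is None else min(best_gap, gap)
        if best_gap is not None:
            score += weight / (1 + best_gap)
    return score


passage = "Venezuela has an area of 340,569 square miles."
candidate = "340,569 square miles"
start = passage.index(candidate)
score = proximity_score(passage, start, start + len(candidate),
                        {"area": 2.0, "Venezuela": 1.0})
print(round(score, 4))
```

A fuller version would also use the left/right position and the relation to other matching strings, which the slide mentions but this sketch ignores.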

Page 13: Use of Patterns for Detection of Answer Strings

QA Process

Define question types for all questions

Order the questions, placing those with more reliable patterns first

Form and rank queries from question terms

Modify queries (if score is below threshold)

Identify pattern-matching strings (apply complex patterns first, then simple ones)

Check correlation between patterns and question semantics

Identify exact answers and calculate their scores
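
The "complex first, then simple" step can be sketched as an ordered list of patterns with a fallback. The two patterns, their labels, and the function `find_answers` are hypothetical examples built around the area question from an earlier slide.

```python
import re

# Patterns ordered from most complex (assumed most reliable) to simplest.
# Both are hypothetical examples for an "area of X" question.
PATTERNS = [
    ("complex", re.compile(
        r"\barea of (?P<place>[A-Z][a-z]+) is (?P<answer>\d[\d,]* square miles)")),
    ("simple", re.compile(r"(?P<answer>\d[\d,]* square miles)")),
]


def find_answers(passages):
    """Return (pattern label, answers) from the first pattern that matches anything."""
    for label, pattern in PATTERNS:
        hits = [m.group("answer") for p in passages for m in pattern.finditer(p)]
        if hits:
            return label, hits
    return None, []


print(find_answers(["The area of Venezuela is 340,569 square miles."]))
print(find_answers(["It spans 340,569 square miles."]))
```

The simple pattern is consulted only when no complex pattern fires, so an answer found by the complex pattern can be reported with higher confidence.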

Page 14: Use of Patterns for Detection of Answer Strings

Analysis of Results

TREC 2002: confidence-weighted score = 0.691

271 right answers, 209 wrong answers, 148 “no answer”

First 29 correct answers belonged to question types with highly reliable patterns

Incorrectly identified answer strings = 13.6% (excluding NIL answers)
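
For context, the confidence-weighted score of the TREC 2002 QA track is computed over answers sorted from most to least confident: at each rank i, the number of correct answers so far is divided by i, and these ratios are averaged over all questions. A minimal sketch, with the function name `confidence_weighted_score` and the toy judgments chosen for illustration:

```python
def confidence_weighted_score(judgments):
    """Confidence-weighted score for answers ordered from most to least confident.

    `judgments` is a list of booleans (True = correct answer). The score is the
    mean over ranks i of (number correct among the first i answers) / i.
    """
    if not judgments:
        raise ValueError("at least one judged answer is required")
    correct_so_far = 0
    total = 0.0
    for i, is_correct in enumerate(judgments, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i
    return total / len(judgments)


# Same two correct answers, ranked first versus ranked last.
print(confidence_weighted_score([True, True, False, False]))
print(confidence_weighted_score([False, False, True, True]))
```

Because early ranks contribute to every later ratio, placing the answers from highly reliable patterns first raises the score even when the number of correct answers is unchanged.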