IBM’s DeepQA, or Watson

Little history

• Carnegie Mellon (CMU) collab.
• OpenEphyra (2002)
• Piquant (2004)
• Initially 15% accuracy
• 15% is not very good, is it?

OpenEphyra, Piquant, & Jeopardy!

Source: [1] http://www.aaai.org/Magazine/Watson/watson.php

Principles

• Massive parallelism: Exploit massive parallelism in the consideration of multiple interpretations and hypotheses.

• Many experts: Facilitate the integration, application, and contextual evaluation of a wide range of loosely coupled probabilistic question and content analytics.

• Pervasive confidence estimation: No component commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations. An underlying confidence-processing substrate learns how to stack and combine the scores.

• Integrate shallow and deep knowledge: Balance the use of strict semantics and shallow semantics, leveraging many loosely formed ontologies.

Source: [4] http://xkcd.com/720/ Randall Munroe (CC BY-NC 2.5)

20 researchers, 3 years later (2008)

Source: [1] http://www.aaai.org/Magazine/Watson/watson.php

What’s Watson’s source of information?

• Structured content: databases, taxonomies, ontologies
• Domain data: encyclopedias, dictionaries, thesauri, newswire articles, literary works
• Machine learning: test-question training sets

Learning framework

• Trained with a set of approximately 25,000 Jeopardy! questions comprising 5.7 million question-answer pairs (instances), where each instance had 550 features.
• Implemented machine learning techniques such as transfer learning, stacking, and successive refinement.

Learning framework is based on phases

• Configurable
• Uses 7 phases for Jeopardy!
• Trained with a set of approximately 25,000 Jeopardy! questions comprising 5.7 million question-answer pairs (instances), where each instance had 550 features.
• Implemented machine learning techniques such as transfer learning, stacking, and successive refinement.

Phases

• 1. Hitlist normalization
• 2. Base
• 3. Transfer learning
• 4. Merge evidence
• 5. Elite
• 6. Evidence diffusion
• 7. Multi-answers

– Within each phase there are 3 main steps:

• 1. Evidence merging
• 2. Postprocessing
• 3. Classifier training/application
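The phase structure above can be sketched as a tiny pipeline: each phase merges evidence, postprocesses the candidate list, and applies a classifier, and phases are chained so each narrows the hitlist. This is a hypothetical illustration, not Watson's actual code; all function names and the toy scoring are invented.

```python
# Hypothetical sketch of a phase-based ranking pipeline: each phase
# merges evidence, postprocesses features, then applies a classifier.

def merge_evidence(candidates):
    # Step 1: combine duplicate (answer, score) rows by summing scores.
    merged = {}
    for answer, score in candidates:
        merged[answer] = merged.get(answer, 0.0) + score
    return merged

def postprocess(merged, top_k):
    # Step 2: keep only the top_k answers by current score.
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

def classify(ranked):
    # Step 3 (application mode): turn scores into pseudo-confidences.
    total = sum(score for _, score in ranked) or 1.0
    return [(answer, score / total) for answer, score in ranked]

def run_phase(candidates, top_k):
    return classify(postprocess(merge_evidence(candidates), top_k))

# Seven such phases could then be chained, each narrowing the hitlist
# (e.g. Base keeps 100 candidates, Elite keeps only 5).
candidates = [("sword", 0.4), ("Excalibur", 0.3), ("sword", 0.2), ("lance", 0.1)]
print(run_phase(candidates, top_k=2))
```

In the real system each step is far richer (550 features, trained models), but the merge / postprocess / classify skeleton per phase is the same.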

1. Hitlist Normalization:

• Merge identical strings from different sources, then partition into question classes. Different classes of questions, such as multiple choice, questions with a useless LAT (lexical answer type, e.g. “it” or “this”), date questions, and so forth, may require different weighting of evidence. The DeepQA confidence-estimation framework supports this through the concept of routes; in the Jeopardy! system, question classes that profited from specialized routing were identified manually. Routes are, in effect, question archetypes.

2. Base:

• Weed out extremely bad candidates: only the top 100 candidates after hitlist normalization are passed to later phases. With at most 100 answers per question, the standardized features are then recomputed. This recomputation at the start of the Base phase is the primary reason the Hitlist Normalization phase exists: by eliminating a large volume of junk answers (i.e., ones that were not remotely close to being considered), the remaining answers provide a more useful distribution of feature values against which to compare each answer.
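Feature standardization here typically means something like z-scoring each feature over the surviving candidates. A minimal sketch of that idea (the function name and sample values are illustrative, not from the DeepQA implementation):

```python
import statistics

def standardize(feature_values):
    # Recompute z-scores over the surviving candidates only, so each
    # answer's feature is measured against a distribution free of the
    # junk candidates removed by hitlist normalization.
    mean = statistics.mean(feature_values)
    stdev = statistics.pstdev(feature_values) or 1.0  # guard against zero spread
    return [(v - mean) / stdev for v in feature_values]

# E.g. a hypothetical "passage support" feature for the top candidates:
raw = [4.0, 2.0, 2.0, 0.0]
print(standardize(raw))
```

Dropping the junk answers before standardizing shifts the mean and shrinks the spread, so the z-scores of the real contenders become far more discriminative.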

3. Transfer learning:

• For uncommon question classes, i.e., adding more routed models.

• The phase-based framework supports a straightforward parameter-transfer approach to transfer learning: one phase’s output from a general model is passed into the next phase as a feature of a specialized model.

• Because logistic regression uses a linear combination of weighted features, the weights learned in the transfer phase can be roughly interpreted as an update to the parameters learned from the general task.
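A small sketch of that parameter-transfer idea: the general model's score enters the specialized model as just another feature. All weights and function names below are invented for illustration; they are not Watson's actual parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def general_model(features):
    # General route: a logistic model trained over all question classes
    # (weights here are made up for illustration).
    w = [0.8, 0.5]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)))

def specialized_model(features):
    # Specialized route for an uncommon question class: the general
    # model's score is passed in as an extra feature. Its learned weight
    # (1.2 here) then acts roughly as an update to the general
    # parameters, rather than a model trained entirely from scratch.
    general_score = general_model(features)
    w = [0.1, -0.2, 1.2]  # small class-specific corrections + transfer weight
    x = features + [general_score]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

print(specialized_model([1.0, 0.5]))
```

The appeal is data efficiency: the rare question class only needs enough training examples to learn small corrections, not a full model.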

Logistic regression

• The research group experimented with:

• Logistic regression
• Support vector machines (SVMs) with linear and nonlinear kernels
• Single- and multilayer neural nets
• Boosting
• Decision trees
• Locally weighted learning
• Etc.

• Logistic regression was found to be the best method for classifying / gauging weights, and is used in all phases / steps.
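One reason logistic regression fits this task: its output is a probability that can serve directly as an answer confidence, and its weights are interpretable per feature. A toy sketch of training one by gradient descent on made-up question-answer features (not Watson's 550 features or its actual trainer):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training set: (feature vector, is_correct_answer) pairs.
data = [([3.0, 1.0], 1), ([0.5, 0.2], 0), ([2.5, 0.8], 1), ([0.2, 1.5], 0)]
w = [0.0, 0.0]
lr = 0.1

for _ in range(1000):  # batch gradient descent on the log-loss
    grad = [0.0, 0.0]
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(2):
            grad[i] += (p - y) * x[i]
    w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]

# The learned probability doubles as a confidence score for ranking.
confidence = sigmoid(sum(wi * xi for wi, xi in zip(w, [2.8, 0.9])))
print(confidence)
```

Ranking candidate answers by this probability gives both the final ordering and the confidence Watson needs to decide whether to buzz in.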

4. Merge Evidence (answer merging):

• Merges evidence between equivalent answers.
• Selects a canonical form, e.g.:

• John F. Kennedy
• J.F.K.
• Kennedy

• Robust methods are needed: natural language processing (NLP).
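A minimal sketch of equivalence merging: variants are grouped, their evidence scores are summed, and the best-scored variant becomes the canonical form. The hard-coded equivalence table is purely illustrative; detecting that "J.F.K." and "John F. Kennedy" corefer is exactly where the robust NLP is needed.

```python
# Hypothetical sketch: merge equivalent answer strings and keep the
# highest-scoring variant as the canonical form. Real equivalence
# detection needs robust natural-language processing, not a lookup table.

EQUIVALENTS = {
    "john f. kennedy": "kennedy-group",
    "j.f.k.": "kennedy-group",
    "kennedy": "kennedy-group",
}

def merge_answers(scored_answers):
    groups = {}
    for answer, score in scored_answers:
        key = EQUIVALENTS.get(answer.lower(), answer.lower())
        groups.setdefault(key, []).append((answer, score))
    merged = []
    for variants in groups.values():
        canonical = max(variants, key=lambda av: av[1])[0]  # best-scored variant
        merged.append((canonical, sum(s for _, s in variants)))
    return merged

print(merge_answers([("John F. Kennedy", 0.5), ("J.F.K.", 0.3), ("Nixon", 0.2)]))
```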

• The framework can also merge answers connected by a relation other than equivalence; it merges answers when it detects a more_specific relation between them.

• “MYTHING IN ACTION: One legend says this was given by the Lady of the Lake & thrown back in the lake on King Arthur’s death.”

• Watson merged the two answers “sword” and “Excalibur”, and selected “sword” as the canonical form because it had the higher initial score.

5. Elite:

• Runs near the end of the learning pipeline; trains on, and applies to, only the top five answers as ranked by the previous phase.

• Otherwise similar to phase 2, Base.

6. Evidence Diffusion:

• Diffuses evidence between related answers, subject to diffusion criteria.
• Similar to the Answer Merging phase, but combines evidence from related answers rather than equivalent ones.

• “WORLD TRAVEL: If you want to visit this country, you can fly into Sunan International Airport or ... or not visit this country.” (Correct answer: North Korea)

• Most sources would cite Pyongyang as the location of the airport, overwhelming the answer North Korea.

• In this phase, evidence may be diffused from the source (Pyongyang) to the target (North Korea), provided that:

• 1. The target meets the expected answer type (is a country)
• 2. There is a semantic relation (located_in)
• 3. The transitivity of the relation allows for meaningful diffusion given the question.
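The three criteria can be sketched as a gate in front of the score transfer. The relation store, type table, and function names below are invented for illustration; DeepQA's actual relation detection is far richer.

```python
# Hypothetical sketch of evidence diffusion: move evidence from a
# source answer to a related target answer only when all three
# criteria hold (type match, semantic relation, transitivity).

RELATIONS = {("Pyongyang", "North Korea"): "located_in"}
TYPES = {"North Korea": "country", "Pyongyang": "city"}
TRANSITIVE_FOR_QUESTION = {"located_in"}  # relations meaningful for this question

def diffuse(scores, expected_type):
    for source in list(scores):
        for target in list(scores):
            relation = RELATIONS.get((source, target))
            if relation is None:
                continue  # criterion 2: a semantic relation must exist
            if TYPES.get(target) != expected_type:
                continue  # criterion 1: target must meet the expected answer type
            if relation not in TRANSITIVE_FOR_QUESTION:
                continue  # criterion 3: relation must support meaningful diffusion
            scores[target] += scores[source]  # diffuse evidence to the target
    return scores

scores = {"Pyongyang": 0.7, "North Korea": 0.2}
print(diffuse(scores, expected_type="country"))
```

After diffusion, North Korea inherits the airport evidence that had accumulated on Pyongyang, letting the type-correct answer win.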

7. Multi-answers:

• Join answer candidates for questions requiring multiple answers.

3 Steps…

• 1. Evidence Merging: combines evidence for a given question-answer pair across different occurrences (e.g., different passages containing a given answer).

• 2. Postprocessing: transforms the matrix of question-answer pairs and their feature values (e.g., removing answers and/or features, or deriving new features from existing ones), for instance to adjust feature sensitivity and dynamic range, or to make feature values relative to the other candidates.

• 3. Classifier Training/Application: runs in either training mode, in which a model is produced over training data, or application mode, in which the previously trained models are used to rank and estimate confidence in answers for a given question.

3. Application classifier

• After merging, answers are ranked and confidences estimated based on the merged scores.

• Watson uses machine learning to assign a confidence level to each merged answer, indicating how likely it is to be correct.

• Ensemble methods:
• Mixture of experts
• Stacked generalisation metalearner
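Stacked generalisation means the outputs of several base scorers become the feature vector of a metalearner. A minimal sketch with two invented base scorers and a logistic metalearner (all names and weights are hypothetical):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical base scorers, each judging one kind of evidence.
def passage_scorer(answer_evidence):
    return answer_evidence["passage_hits"] / 10.0

def type_scorer(answer_evidence):
    return 1.0 if answer_evidence["type_matches"] else 0.0

def stacked_confidence(answer_evidence):
    # The metalearner (a logistic model with illustrative weights)
    # takes the base scorers' outputs as its feature vector.
    base_outputs = [passage_scorer(answer_evidence), type_scorer(answer_evidence)]
    meta_weights = [2.0, 1.5]
    bias = -1.0
    z = bias + sum(w * s for w, s in zip(meta_weights, base_outputs))
    return sigmoid(z)

evidence = {"passage_hits": 7, "type_matches": True}
print(stacked_confidence(evidence))
```

In the real system the metalearner's weights are themselves trained, which is how the framework "learns how to stack and combine the scores" of its many loosely coupled experts.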

This is just the learning framework.

Sources

• [1] http://www.aaai.org/Magazine/Watson/watson.php
“Building Watson: An Overview of the DeepQA Project”. David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. AI Magazine, Fall 2010. Copyright ©2010 AAAI (Association for the Advancement of Artificial Intelligence). All rights reserved.

• [2] http://brenocon.com/watson_special_issue/14%20a%20framework%20for%20merging%20and%20ranking%20answers.pdf
“A framework for merging and ranking of answers in DeepQA”. D. C. Gondek, A. Lally, A. Kalyanpur, J. W. Murdock, P. A. Duboue, L. Zhang, Y. Pan, Z. M. Qiu, and C. Welty.

• [3] https://laplacian.wordpress.com/2011/02/27/how-ibms-watson-computer-thinks-on-jeopardy/
Blog post, “Free Won’t”.

• [4] http://imgs.xkcd.com/comics/recipes.png
Randall Munroe, xkcd #720 (CC BY-NC 2.5).

Questions?

• Further reading:

• http://en.wikipedia.org/wiki/Learning_to_rank
• http://en.wikipedia.org/wiki/Supervised_learning
• http://en.wikipedia.org/wiki/Neuro-linguistic_programming