INFORMATION EXTRACTION FROM QUERIES
Ed Snelson, Joaquin Quiñonero Candela, Ralf Herbrich, Thore Graepel

Information extraction from queries

Templates

Probabilistic query modelling

Key details

- Expectation Propagation (EP) message passing for inference within a single query model
- Assumed Density Filtering (ADF): a single pass through the queries
- Sparse messages within each query
- Bootstrapping from initial seed sets of instances/attributes
- Directed processing of queries based on the current top beliefs
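
The slides name the inference machinery (EP within a query, ADF across queries) but not how candidates are scored. As a rough intuition for the bootstrapping and directed-processing points only, here is a minimal sketch that swaps the probabilistic model for plain count-weighted scores: starting from seed instances, it scores attribute fillers only in queries that fit a template around a current instance, then does the reverse to propose new instances. It is not EP or ADF. The helper names (`fill_slot`, `bootstrap`), the `rounds`/`top_k` parameters and the toy query counts are hypothetical.

```python
import re
from collections import Counter


def fill_slot(query, template, known_slot, known_value, free_slot):
    """Return the filler of free_slot if query fits template with known_slot fixed."""
    pattern = re.escape(template)
    pattern = pattern.replace(re.escape(known_slot), re.escape(known_value))
    pattern = pattern.replace(re.escape(free_slot), r"(?P<free>.+)")
    match = re.fullmatch(pattern, query)
    return match.group("free") if match else None


def bootstrap(query_counts, seed_instances, templates, rounds=1, top_k=3):
    """Grow instance and attribute sets from seeds using count-weighted scores.

    Hypothetical stand-in for the slides' EP/ADF model, not a reproduction of it.
    """
    instances, attributes = set(seed_instances), set()
    for _ in range(rounds):
        # Step 1: only queries that fit a template around a current instance
        # propose attribute candidates, weighted by the query's count.
        attr_scores = Counter()
        for query, count in query_counts.items():
            for template in templates:
                for inst in instances:
                    attr = fill_slot(query, template, "[Inst]", inst, "[Attr]")
                    if attr is not None:
                        attr_scores[attr] += count
        attributes |= {a for a, _ in attr_scores.most_common(top_k)}
        # Step 2: the same in reverse, proposing new instances from attributes.
        inst_scores = Counter()
        for query, count in query_counts.items():
            for template in templates:
                for attr in attributes:
                    inst = fill_slot(query, template, "[Attr]", attr, "[Inst]")
                    if inst is not None:
                        inst_scores[inst] += count
        instances |= {i for i, _ in inst_scores.most_common(top_k)}
    return instances, attributes


# Toy query counts, invented for illustration.
toy_counts = {
    "tom cruise movies": 50,
    "brad pitt movies": 40,
    "tom cruise biography": 20,
    "biography of brad pitt": 10,
    "matt damon biography": 15,
    "honda civic parts": 30,
}
found_instances, found_attributes = bootstrap(
    toy_counts, {"tom cruise"}, ["[Inst] [Attr]", "[Attr] of [Inst]"])
print(sorted(found_instances), sorted(found_attributes))
```

Starting from the single seed "tom cruise", the attributes "movies" and "biography" pull in "brad pitt" and "matt damon", while "honda civic parts" never matches and is ignored.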

Data

- 10 months of Live Search query logs
- 100 million unique queries, with associated counts
- Preliminary experiments on small, specific subsets, e.g. 50,000 unique queries related to actors, cars and national parks
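
As an illustration of the data format only, the sketch below collapses a raw query log into unique queries with counts and then keeps a topical subset. The normalisation (lowercasing, collapsing whitespace) and the rule of keeping queries that contain a seed term as a whole-word phrase are assumptions; the slides do not say how the subsets were selected. The function names are hypothetical.

```python
from collections import Counter


def unique_query_counts(raw_queries):
    """Collapse a raw query log into unique queries with occurrence counts."""
    # Assumed normalisation: lowercase and collapse runs of whitespace.
    return Counter(" ".join(q.lower().split()) for q in raw_queries)


def topical_subset(counts, seed_terms):
    """Keep unique queries containing any seed term as a whole-word phrase."""
    # Padding both sides with spaces makes " tom cruise " match whole words only.
    return Counter({
        q: c for q, c in counts.items()
        if any(f" {t} " in f" {q} " for t in seed_terms)
    })


raw_log = ["Tom Cruise movies", "tom  cruise movies", "honda civic parts",
           "Yosemite tours", "tom cruise movies"]
counts = unique_query_counts(raw_log)
subset = topical_subset(counts, ["tom cruise", "yosemite"])
print(subset)
```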

Seed lists

Actors

Instances            Attributes
tom cruise           movies
brad pitt            pictures
johnny depp          dealer.com
matt damon           photos
george clooney       angelina jolie
cameron diaz         nude
scarlett johansson   biography
mel gibson           news
grand canyon         height
sharon stone         wedding

Cars

Instances            Attributes
dealer               {Year}
honda civic          parts
honda accord         hybrid
ford mustang         dealer
dodge charger        used
toyota camry         world
ford explorer        accessories
toyota corolla       ford
ford focus           cleveland plain
dodge durango        wachovia

National Parks

Instances            Attributes
grand canyon         national park
yellowstone          park
yosemite             tours
redwood              lodging
denali               hotels
everglades           lodge
algonquin            west
joshua tree          skywalk
west yellowstone     gmc
shenandoah           college

Templates

[Inst] [Attr]
[Attr] [Inst]
{Year} [Inst] [Attr]
[Attr] of [Inst]
[Inst] and [Attr]
[Attr] and [Inst]
[Attr] in [Inst]
the [Attr] [Inst]
how [Attr] is [Inst]
[Attr] [Inst] coupe
[Attr] [Inst] parts
the [Inst] [Attr]
[Inst] 's [Attr]
[Inst] in [Attr]
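
To make the template notation concrete, here is a minimal sketch that enumerates every (instance, attribute) reading a template allows for a query. It assumes whitespace tokenisation, that each slot covers one or more whole words, and that `{Year}` matches any four-digit token; `parses` is a hypothetical helper, not code from the authors.

```python
import re


def parses(template, query):
    """Return every (instance, attribute) split of query that fits template.

    Assumptions: whitespace tokenisation, each slot covers one or more whole
    words, and {Year} matches any four-digit token.
    """
    t_toks, q_toks = template.split(), query.split()
    results = []

    def walk(ti, qi, bound):
        if ti == len(t_toks):
            # A complete parse must consume the whole query.
            if qi == len(q_toks):
                results.append((bound.get("[Inst]"), bound.get("[Attr]")))
            return
        tok = t_toks[ti]
        if tok in ("[Inst]", "[Attr]"):
            # Try every possible length for this slot.
            for end in range(qi + 1, len(q_toks) + 1):
                walk(ti + 1, end, {**bound, tok: " ".join(q_toks[qi:end])})
        elif qi < len(q_toks):
            if tok == "{Year}":
                ok = re.fullmatch(r"\d{4}", q_toks[qi]) is not None
            else:
                ok = q_toks[qi] == tok  # literal word such as "of" or "parts"
            if ok:
                walk(ti + 1, qi + 1, bound)

    walk(0, 0, {})
    return results


print(parses("[Inst] [Attr]", "tom cruise movies"))
print(parses("[Attr] of [Inst]", "biography of brad pitt"))
print(parses("{Year} [Inst] [Attr]", "2007 honda civic parts"))
```

The first call returns two readings of the same query ("tom" + "cruise movies" and "tom cruise" + "movies"), the kind of ambiguity a probabilistic query model can weigh rather than resolve by hand.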

Future improvements

- Class/attribute-dependent templates
- A garbage class to deal with “noise”
- Reduced sensitivity to the order in which the initial queries are processed
- Disambiguation, synonyms, etc.
- Use of a part-of-speech tagger
- Combination with standard hand-crafted entity extraction techniques