Corroborate and Learn Facts from the Web
Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
Corroborate and Learn Facts from the Web
Presenter: Lin, Shu-Han
Authors: Shubin Zhao, Jonathan Betz
SIGKDD (2008)
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
Motivation
Many “facts” about the movie “Independence Day” are spread across sites such as:
Wikipedia
Infoplease.com
moviefone.com
Motivation
Combine them: a fact mentioned by several sources, e.g. the director of the movie, “Roland Emmerich”
Objectives
Cache the new “facts” (attribute + value) that share the same HTML patterns.
Then corroborate these new “facts”: check whether other websites also mention them.
Learn the fact. Good facts are commonly referenced; incorrect facts are mentioned by very few sources.
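The corroboration objective can be sketched as counting how many distinct sites mention the same attribute/value pair. This is a minimal illustration under stated assumptions, not the authors' method: the function name `corroborate`, the `min_sources` cutoff, and the sample tuples (including the site name `example-blog` and its deliberately wrong value) are hypothetical.

```python
from collections import defaultdict


def corroborate(candidates, min_sources=2):
    """Keep candidate facts mentioned by at least `min_sources` distinct sites.

    `candidates` is an iterable of (site, entity, attribute, value) tuples.
    Returns a dict mapping (entity, attribute, value) to the number of sites.
    The cutoff `min_sources` is an assumed parameter, not taken from the slides.
    """
    sources = defaultdict(set)
    for site, entity, attribute, value in candidates:
        # Normalize case and surrounding whitespace so trivial variants agree.
        key = (entity.strip().lower(), attribute.strip().lower(), value.strip().lower())
        sources[key].add(site)
    return {fact: len(sites) for fact, sites in sources.items() if len(sites) >= min_sources}


# Hypothetical candidate facts; the last one is an intentionally incorrect value.
sample = [
    ("Wikipedia", "Independence Day", "director", "Roland Emmerich"),
    ("Infoplease.com", "Independence Day", "Director", "Roland Emmerich"),
    ("moviefone.com", "Independence Day", "director", "roland emmerich"),
    ("example-blog", "Independence Day", "director", "Steven Spielberg"),
]
print(corroborate(sample))
```

Counting sites rather than raw mentions keeps one page that repeats a value many times from outvoting the others.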
Methodology – Overview
1. Relevant page (Wiki, seed set; search engine)
2. Match
3. Extract new facts
Methodology – Corroborate fact – Common fact
A common fact: “Susan”, gender: female
Threshold:
Match
Methodology – Extract New facts
Cache “Repeated HTML patterns”
3. Extract New facts
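One way to picture “repeated HTML patterns” is sketched below: locate a known fact on a page, record the tag paths of its attribute and value, and then collect every other adjacent pair of text nodes with the same two paths. This is only an illustration built on Python's standard `html.parser`; the class `TextPathParser`, the function `extract_by_pattern`, and the sample page are assumptions, and the slides do not say how the authors represent a pattern.

```python
from html.parser import HTMLParser


class TextPathParser(HTMLParser):
    """Record every non-empty text node together with its tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.nodes = []   # list of (tag_path, text)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def extract_by_pattern(html, known_attribute, known_value):
    """Return (attribute, value) pairs laid out like the known fact."""
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    nodes = parser.nodes

    # Step 1: find the known fact as two adjacent text nodes.
    pattern = None
    for (p1, t1), (p2, t2) in zip(nodes, nodes[1:]):
        if (t1.rstrip(":").lower() == known_attribute.lower()
                and t2.lower() == known_value.lower()):
            pattern = (p1, p2)
            break
    if pattern is None:
        return []

    # Step 2: collect every adjacent pair that repeats the same tag paths.
    facts = []
    i = 0
    while i < len(nodes) - 1:
        (p1, t1), (p2, t2) = nodes[i], nodes[i + 1]
        if (p1, p2) == pattern:
            facts.append((t1.rstrip(":"), t2))
            i += 2
        else:
            i += 1
    return facts


# Hypothetical page fragment used only for illustration.
page = """
<table>
  <tr><th>Director</th><td>Roland Emmerich</td></tr>
  <tr><th>Release year</th><td>1996</td></tr>
</table>
<p>Unrelated text</p>
"""
print(extract_by_pattern(page, "director", "Roland Emmerich"))
```

Comparing full tag paths is a deliberately strict choice; a real page would likely need looser matching (for example, ignoring wrapper tags), which this sketch does not attempt.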
Experiments
Conclusions
Find relevant pages about entities
Extract new facts by corroborating existing facts
Based on string match and HTML pattern discovery
Comments
Advantage
Idea is intuitive
Language independent
Searches and integrates information/data on the web
Drawback
Can only adapt to the old entities
Lots of information is hidden in the articles, not only in tables
Application
We can’t use it to extract comments or new information, such as the comments on food in a blog
Edit distance
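The slide gives only the term, so the following is a generic sketch of the standard Levenshtein edit distance (insertions, deletions, and substitutions each cost 1). Whether the authors use this exact variant, and how they apply it, is not stated in the transcript; the function name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between `a` and `b`, computed row by row."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(edit_distance("Roland Emmerich", "Roland Emerich"))
```

A small distance between two strings suggests they are spelling variants of the same value, which is one plausible use when matching known facts against page text.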