
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign

Light-weight Domain-based Form Assistant:

Querying Web Databases On the Fly

Zhen Zhang, Bin He, and Kevin C. Chang


MetaQuerier 2

The Context: MetaQuerier @ UIUC

Exploring and integrating the deep Web

Explorer (FIND sources)
• source discovery
• source modeling
• source indexing

Integrator (QUERY sources)
• source selection
• schema integration
• query mediation

[Figure: the Explorer builds a “db of dbs”; the Integrator offers a unified query interface over sources such as Amazon.com, Cars.com, 411localte.com, and Apartments.com.]


MetaQuerier 3

The Need: Querying alternative sources in the same domain

• Sources are proliferating in the same domain
  • A 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web
• Each query can often find many useful DBs
• Different queries need different sources

How to query across dynamic sources?


MetaQuerier 4

The Problem: Query translation on-the-fly

Challenge: No pre-configured source-specific translation knowledge

Requirements:
• Within domain: source generality
• Across domains: domain portability


MetaQuerier 5

Dynamic query translation: Essential tasks

Reconcile three levels of query heterogeneity:
• Attribute level: schema matching
• Predicate level: predicate mapping
• Query level: query rewriting


MetaQuerier 6

Demo: Form Assistant to help navigate the deep Web.


MetaQuerier 7

Translation objective: Closest among the valid

Input:
• Source query Qs on source form S
• Target query form T

Output:
• Union query Qt*
• Filter: σ title contain “red storm” and price < 35 and age > 12

Two goals: syntactically valid, semantically close

[Figure: form screenshots with “Tom Clancy” entered.]
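The output pair above (a union query plus a filter) can be sketched as follows. This is a minimal illustration, assuming each sub-query of the union has already been run against the target source and returned a list of records; the record layout, the `id` field, and the sample data are hypothetical and not taken from the slides.

```python
from typing import Any, Callable, Dict, Iterable, List

Book = Dict[str, Any]  # hypothetical record layout

def union_then_filter(
    subquery_results: Iterable[List[Book]],
    residual_filter: Callable[[Book], bool],
) -> List[Book]:
    """Union the results of the target sub-queries, then apply the filter
    that the target form could not express."""
    seen = set()
    answers: List[Book] = []
    for results in subquery_results:
        for book in results:
            if book["id"] not in seen and residual_filter(book):
                seen.add(book["id"])
                answers.append(book)
    return answers

def residual(book: Book) -> bool:
    # The filter from the slide: title contain "red storm", price < 35, age > 12.
    return (
        "red storm" in book["title"].lower()
        and book["price"] < 35
        and book["age"] > 12
    )

# Hypothetical results of two sub-queries in the union query Qt*.
r1 = [
    {"id": 1, "title": "Red Storm Rising", "price": 20, "age": 14},
    {"id": 2, "title": "Patriot Games", "price": 15, "age": 14},
]
r2 = [
    {"id": 1, "title": "Red Storm Rising", "price": 20, "age": 14},
    {"id": 3, "title": "Red Storm Rising (Hardcover)", "price": 40, "age": 14},
]
answers = union_then_filter([r1, r2], residual)
```

The union may return extra records (it only has to subsume the source query); the filter removes them on the client side.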


MetaQuerier 8

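What is valid? Each source has a query model

• Vocabulary: predicate templates { P1, P2, P3, P4, P5 }
• Syntax: valid combinations of predicate templates { F1, F2, F3, F4, F5, F6, F7, F8 }

[Table: predicate templates P1 to P5 (rows) against forms F1 to F8 (columns), with a check mark where a template appears in a form. P1 to P4 carry two check marks each and P5 carries four; the column positions are not recoverable from the transcript.]

[Figure: example form with “Tom Clancy” entered, annotated with templates P1 to P5 and forms F5 and F6.]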
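A minimal sketch of a validity check against such a query model, assuming the vocabulary is a set of template names and the syntax is a list of allowed template combinations. The combinations below are hypothetical and are not taken from the slide's table; the rule that a query may leave some templates of a form unused is also an assumption.

```python
from typing import FrozenSet, List

# Hypothetical query model: template names and combinations are illustrative.
VOCABULARY: FrozenSet[str] = frozenset({"P1", "P2", "P3", "P4", "P5"})
SYNTAX: List[FrozenSet[str]] = [
    frozenset({"P1", "P5"}),
    frozenset({"P2", "P5"}),
    frozenset({"P3", "P4"}),
]

def is_valid(query_templates: FrozenSet[str]) -> bool:
    """A query is valid if every template is in the vocabulary and the set of
    templates it uses fits within one allowed combination."""
    if not query_templates <= VOCABULARY:
        return False
    return any(query_templates <= form for form in SYNTAX)
```

For example, `is_valid(frozenset({"P1", "P5"}))` holds under this model, while `frozenset({"P1", "P2"})` is rejected because no single combination contains both templates.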


MetaQuerier 9

What is close? Define semantic closeness.

Minimal subsuming Cmin
• No false negatives: miss no answer
• Minimize false positives: fewest extra answers
• Clear semantics: DB content independent
• Modular translation: reduce translation complexity

Example (numeric ranges):
• s: 0 to 35
• t1: 0 to 25
• t2: 25 to 45
• t3: 45 to 65
• t1 ∨ t2: 0 to 45
• t1 ∨ t2 ∨ t3: 0 to 65

Which is Cmin?
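The range example can be checked with a small brute-force sketch. It assumes predicates are half-open numeric intervals `[lo, hi)` and measures "extra answers" by the length covered outside the source range; both are simplifying assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # half-open [lo, hi), an assumed convention

def merge(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def subsumes(targets: Sequence[Interval], source: Interval) -> bool:
    """True if the union of the targets covers the whole source interval."""
    return any(lo <= source[0] and source[1] <= hi for lo, hi in merge(targets))

def extra(targets: Sequence[Interval], source: Interval) -> float:
    """Length covered beyond the source; only meaningful when subsumes() holds."""
    return sum(hi - lo for lo, hi in merge(targets)) - (source[1] - source[0])

def minimal_subsuming(
    targets: Sequence[Interval], source: Interval
) -> Optional[Tuple[Interval, ...]]:
    """Try every union of targets and keep the subsuming one with least extra."""
    best: Optional[Tuple[Interval, ...]] = None
    for k in range(1, len(targets) + 1):
        for combo in combinations(targets, k):
            if subsumes(combo, source) and (
                best is None or extra(combo, source) < extra(best, source)
            ):
                best = combo
    return best

s = (0, 35)
t1, t2, t3 = (0, 25), (25, 45), (45, 65)
print(minimal_subsuming([t1, t2, t3], s))  # ((0, 25), (25, 45))
```

Under these assumptions t1 ∨ t2 is the minimal subsuming mapping: it covers all of s with 10 units of extra range, whereas t1 ∨ t2 ∨ t3 adds 30.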


MetaQuerier 10

What mechanism?

[Figure: two views of translating a source query into a target query. One enumerates the valid target queries and searches for the closest (Cmin?). The other decomposes query translation into Attribute Match, Predicate Mapping, and Query Rewriter.]


MetaQuerier 11

System architecture: Modular & lightweight

Input: source query Qs and target query form QI, each passed through a Form Extractor

Modularized mechanism:
• Attribute Matcher: syntax-based schema matching
• Predicate Mapper: type-based search-driven mapping
• Query Rewriter: constraint-based query rewriting

Lightweight domain knowledge:
• Domain-specific thesaurus
• Domain-specific type handlers

Output: target query Qt*

Citations shown on the figure: [RahmBernstein-VLDBJ01], [Halevy-VLDBJ01], ?, [ZhangHC-SIGMOD04], [HeChang-SIGMOD03], [WuYDM-SIGMOD04]


MetaQuerier 12

The core challenge: Predicate mapping

Tasks:
• Choose operator
• Fill in values

Objective: minimal subsuming

Input: [form figure, not preserved in the transcript]

Output: union of target predicates t*


MetaQuerier 13

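Is source-specific translation applicable?

[Figure: matrix residue; only scattered “1” and “…” entries survive in the transcript.]

Example source-specific rules:

  adult = $t → passenger = $t
  … …

  price < $t →
    if $t < 25: [price:between:0,25]
    elseif $t < 45: …… …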


MetaQuerier 14

Enable source-generic predicate mapping?

What is the scope of translation?

What is the mechanism of translation?


MetaQuerier 15

The right scope? Survey 150 sources for the Correspondence Matrix.

Correspondences occur within localities!


MetaQuerier 16

The right scope? Correspondence locality → type-based translation

• Correspondences occur within localities
• Translation by type handler

Predicate Mapper:
• Input: source predicate s, target template P
• Type Recognizer selects a handler: Domain Specific Handler, Text Handler, Numeric Handler, or Datetime Handler
• Output: target predicate t*
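The dispatch from type recognizer to handler might look like the sketch below. It is a simplification under stated assumptions: the type is guessed from the predicate's value with rough heuristics, dates use a fixed `YYYY-MM-DD` format, and the handlers only normalize the value rather than perform a real mapping. All function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Any, Callable, Dict, Optional, Tuple

DATE_FORMAT = "%Y-%m-%d"  # assumed date format
Handler = Callable[[str, str], Tuple[str, str, Any]]

def recognize_type(value: str) -> str:
    """Rough type recognizer: numeric, datetime, or text (illustrative only)."""
    text = value.strip()
    if re.fullmatch(r"-?\d+(\.\d+)?", text):
        return "numeric"
    try:
        datetime.strptime(text, DATE_FORMAT)
        return "datetime"
    except ValueError:
        return "text"

def numeric_handler(attr: str, value: str) -> Tuple[str, str, Any]:
    return ("numeric", attr, float(value))

def datetime_handler(attr: str, value: str) -> Tuple[str, str, Any]:
    return ("datetime", attr, datetime.strptime(value.strip(), DATE_FORMAT))

def text_handler(attr: str, value: str) -> Tuple[str, str, Any]:
    return ("text", attr, value.strip().lower())

GENERIC_HANDLERS: Dict[str, Handler] = {
    "numeric": numeric_handler,
    "datetime": datetime_handler,
    "text": text_handler,
}

def map_predicate(
    attr: str, value: str, domain_handlers: Optional[Dict[str, Handler]] = None
) -> Tuple[str, str, Any]:
    """Use a domain-specific handler if one is registered for the attribute,
    otherwise fall back to the generic handler for the recognized type."""
    if domain_handlers and attr in domain_handlers:
        return domain_handlers[attr](attr, value)
    return GENERIC_HANDLERS[recognize_type(value)](attr, value)
```

Because handlers are keyed by type rather than by source, a new source needs no new rules as long as its predicates fall into a type that already has a handler.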


MetaQuerier 17

The right mechanism: Is a pairwise-rule-based mechanism suitable?

[Figure: templates 1 … n plus a new template n+1, with rules between pairs of templates.]

Adding one template requires adding 2n rules, and requires knowledge of the old templates.

Rule:

  attr < $t →
    if $t < 25: [attr:between:0,25]
    elseif $t < 45: …… …
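The 2n figure follows from counting one rule per ordered pair of templates, which is the assumption made in this short check (function names are hypothetical):

```python
def pairwise_rules(n: int) -> int:
    """Directed translation rules among n templates: one per ordered pair."""
    return n * (n - 1)

def rules_added(n: int) -> int:
    """Extra rules needed when template n+1 joins n existing templates."""
    return pairwise_rules(n + 1) - pairwise_rules(n)

print(rules_added(5))  # 10
```

Algebraically, (n+1)n - n(n-1) = 2n: the new template needs n rules toward the old templates and n rules from them.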


MetaQuerier 18

More extendable mechanism? Search-driven.

• Values of the type form a virtual database
• Templates of the same type are evaluated over this “database” by an evaluator
• The evaluation results are searched for the closest mapping

Example over a number line from -infinity to +infinity:
• s: 0 to 35
• t1: 0 to 25
• t2: 25 to 45
• t1 ∨ t2: 0 to 45
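A minimal sketch of the evaluation step, assuming the virtual database is a finite sample of integer values and each predicate is a Python callable. The sample range and the half-open boundaries of t1 to t3 are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Assumed "virtual database": a finite sample of values of the numeric type.
VIRTUAL_DB: List[int] = list(range(0, 100))

def evaluate(pred: Callable[[int], bool], db: List[int] = VIRTUAL_DB) -> Set[int]:
    """Materialize a predicate as the set of values that satisfy it."""
    return {v for v in db if pred(v)}

def source(v: int) -> bool:
    return v < 35  # s: price < 35

targets: Dict[str, Callable[[int], bool]] = {
    "t1": lambda v: 0 <= v < 25,
    "t2": lambda v: 25 <= v < 45,
    "t3": lambda v: 45 <= v < 65,
}

source_rows = evaluate(source)
target_rows = {name: evaluate(pred) for name, pred in targets.items()}

# How much of the source result each instantiated target predicate covers.
overlap = {name: len(rows & source_rows) for name, rows in target_rows.items()}
print(overlap)  # {'t1': 25, 't2': 10, 't3': 0}
```

Adding a new template here means adding one more callable to `targets`; no rule relating it to the existing templates is required.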


MetaQuerier 19

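Greedy search to construct Cmin mapping

• Find the mapping iteratively
• In each iteration, greedily choose the target predicate covering the most of the still-uncovered part of the source predicate

Example:
• s: 0 to 35
• t1: 0 to 25
• t2: 25 to 45
• t3: 45 to 65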
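The greedy iteration can be sketched as a set-cover loop over materialized results. This assumes each predicate has already been evaluated into a set of matching values (here, integer samples with half-open ranges); it is an illustration of the idea, not the authors' implementation, and the function name is hypothetical.

```python
from typing import Dict, List, Set

def greedy_cover(source_rows: Set[int], target_rows: Dict[str, Set[int]]) -> List[str]:
    """Repeatedly pick the target predicate that covers the most of the
    still-uncovered source values. Returns the chosen predicate names; the
    cover is partial if the targets cannot cover the source."""
    uncovered = set(source_rows)
    chosen: List[str] = []
    while uncovered:
        best = max(target_rows, key=lambda name: len(target_rows[name] & uncovered))
        gain = target_rows[best] & uncovered
        if not gain:
            break  # nothing covers the remainder
        chosen.append(best)
        uncovered -= gain
    return chosen

rows = {
    "t1": set(range(0, 25)),
    "t2": set(range(25, 45)),
    "t3": set(range(45, 65)),
}
print(greedy_cover(set(range(0, 35)), rows))  # ['t1', 't2']
```

On the example, the first iteration picks t1 (25 uncovered values), the second picks t2 (the remaining 10), and t3 is never chosen because it covers nothing in s.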


MetaQuerier 20

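Experiments

• Translating 120 queries in total
• Between randomly paired sources from 8 domains
• With domain thesaurus but no type handler
• Accuracy as ratio of correct conditions per query

Datasets: Basic (3 domains), New (5 domains)

[Figure: average accuracy.]

[Figure: error distribution across Extraction, Matching, and Mapping, with shares of 18%, 40%, and 42%; the transcript does not preserve which share belongs to which component.]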


MetaQuerier 21

Conclusion

System: Form assistant for querying Web databases

Problem: Dynamic query translation

Contributions:
• Framework: Light-weight domain-based architecture
• Techniques: Type-based search-driven predicate mapping

Insight: Holistic integration holds promise!


MetaQuerier 22

Thank You!

For more information:

http://metaquerier.cs.uiuc.edu


MetaQuerier 23

What is close? Define semantic closeness.

Minimal subsuming Cmin
• No false negatives: miss no correct answer
• Minimize false positives: contain fewest extra answers
• Clear semantics: database content independent
• Modular translation: reduce translation complexity

Example (numeric ranges):
• s: 0 to 35
• t1: 0 to 25
• t2: 25 to 45
• t3: 45 to 65
• t1 ∨ t2: 0 to 45
• t2 ∨ t3: 25 to 65

Which is Cmin?


MetaQuerier 24

Experiment: Accuracy distribution

[Figure: accuracy distribution for the Basic dataset.]

[Figure: accuracy distribution for the New dataset.]


MetaQuerier 25

Text handler: Search space

• Conceptually: union of all target predicates
• Practically: closed-world assumption


MetaQuerier 26

Text handler: Closeness estimation

• Ideally: logical reasoning
• Practically: evaluation-by-materialization
  • Materialize the query against a “complete” database