Keyword Proximity Search on Graphs: M.Sc. Systems Course, The Hebrew University of Jerusalem, Winter 2006

Page 1: Keyword Proximity Search on Graphs M.Sc. Systems Course The Hebrew University of Jerusalem, Winter 2006.

Keyword Proximity Search on Graphs

M.Sc. Systems Course, The Hebrew University of Jerusalem, Winter 2006

Page 2

Keyword Proximity Search on Graphs MSSYS 2006

Keyword Proximity Search

• A rapidly evolving paradigm for data extraction
• Data have varying degrees of structure
• Queries are sets of keywords
  – No structural constraints

[Figure: data sources: relational databases, Web sites, XML documents]

The Goal: extract meaningful parts of data w.r.t. the keywords

Page 3

Recent Work on KPS (Keyword Proximity Search)

• DataSpot (SIGMOD 1998)
• Information Units (WWW 2001)
• BANKS (ICDE 2002, VLDB 2005)
• DISCOVER (VLDB 2002)
• DBXplorer (ICDE 2002)
• XKeyword (ICDE 2003)
• …

Page 4

Systems for KPS on Relational Data

• BANKS, DISCOVER and DBXplorer implemented KPS (Keyword Proximity Search) on relational databases
  – Different algorithms are used
  – Slight differences in semantics

G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, pages 431–440, 2002.

V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB, pages 670–681, 2002.

S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: enabling keyword search over relational databases. In SIGMOD Conference, page 627, 2002.

Page 5

Example: KPS on RDB

search: Belgium, Brussels

Cities
ID   Name       Population
22   Amsterdam  1101407
73   Brussels   951580

Organizations
ID   Name  Head Q.
135  EU    73
175  ESA   81

Countries
Code  Name         Area   Capital
NL    Netherlands  37330  22
B     Belgium      30510  73

Memberships
Country  Org.
B        135
NL       135

Pages 6-7

Brussels is the capital city of Belgium

Tuples in the result: Countries (B, Belgium, 30510, 73) and Cities (73, Brussels, 951580)

Pages 8-9

Brussels hosts EU and Belgium is a member

Tuples in the result: Countries (B, Belgium, 30510, 73), Memberships (B, 135), Organizations (135, EU, 73) and Cities (73, Brussels, 951580)

Page 10

XKeyword: KPS on XML

• XKeyword implemented KPS on XML
  – Architecture is based on that of DISCOVER
• A demo over DBLP is available
  – http://kebab.ucsd.edu:81/xkeyword

V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, pages 367–378, 2003.

Page 11

Example: KPS on XML

search: Yannakakis, Approximation

[Figure: a DBLP XML tree. The root dblp has two article elements. One has the title "On the Approximation of Maximum Satisfiability" and the author "Mihalis Yannakakis". The other has the title "Improved Approximation Algorithms for MAX SAT", the authors "Takao Asano" and "David P. Williamson", and a references element with a cite element.]

Page 12

Yannakakis wrote a paper about Approximation

Page 13

Yannakakis is cited by a paper about Approximation

Page 14

KPS on Web Sites (Information Units)

• KPS can also be used for retrieving information from Web sites

• For a given query, results are collections of Web pages from the site

– Pages are relevant w.r.t. the keywords

– Pages are connected by hyperlinks

Wen-Syan Li, K. Selçuk Candan, Quoc Vu, and Divyakant Agrawal. Retrieving and organizing web pages by “information unit”. In WWW, pages 230-244, 2001.

Page 15

Example: KPS in Web Sites

search: Hilton, Beach

[Figure: the Web site http://www.goisrael.com/]

Page 16

Example: KPS in Web Sites

search: Hilton, Beach

[Figure: a result made of the pages "Eilat", "Eilat Beaches" and "Hilton Eilat Queen of Sheba"]

Page 17

A Formal Framework for KPS

Page 18

Data Graphs

[Figure: a data graph with the node labels company, supplies, supply, product, supplier, papers, A4, coffee, president, Cohen, department, Summers, manager, Paris and hq]

Data graphs have two types of nodes:
• Structural nodes
• Keywords

Page 19

Queries

K = { Summers, Cohen, coffee }

Queries are sets of keywords from the data graph

Pages 20-21

Query Results

Query results are subtrees of the data graph that
• Contain all keywords in the query
• Have no redundant edges

That is, a query result is a subtree that is reduced w.r.t. the keywords
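The "no redundant edges" condition can be checked mechanically. The following is a minimal sketch for the undirected variant only, assuming a candidate result is given as a list of undirected edges and reading "reduced" as "a tree that contains every keyword and whose leaves are all keywords". The function name and the small tree are hypothetical and are not taken from the slides' figure.

```python
from collections import defaultdict

def is_reduced_undirected(edges, keywords):
    """True if `edges` form a tree that contains all keywords
    and every leaf (degree-1 node) of that tree is a keyword."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    if not nodes or not set(keywords) <= nodes:
        return False
    # A connected graph with |V| - 1 distinct edges is a tree.
    if len({frozenset(e) for e in edges}) != len(nodes) - 1:
        return False
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    if seen != nodes:
        return False
    # A non-keyword leaf could be dropped, so its edge would be redundant.
    return all(n in keywords for n in nodes if len(adj[n]) == 1)

# Hypothetical tree over labels that appear in the example data graph.
tree = [("company", "president"), ("president", "Cohen"),
        ("company", "department"), ("department", "manager"),
        ("manager", "Summers")]
print(is_reduced_undirected(tree, {"Cohen", "Summers"}))
print(is_reduced_undirected(tree + [("department", "Paris")], {"Cohen", "Summers"}))
```

The second call adds an edge to a leaf that is not in the query, so that tree is not reduced.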

Page 22

Three Variants

Three variants of keyword proximity search are considered:
• Rooted proximity
• Undirected proximity
• Strong proximity

Page 23

Rooted Variant

• Used in BANKS
• Results are rooted trees

[Figure: the example data graph]

Page 24

Undirected Variant

• Used in Interconnection Semantics for XML
• Results are undirected trees

[Figure: the example data graph]

Page 25

Strong Variant

• Used in XKeyword, Information Units, DBXplorer and DISCOVER
• Results are undirected trees and keywords are leaves

[Figure: the example data graph]

Page 26

Problem Definition

Input:
• Data: a data graph G
• Query: a set K of keywords in G

Output:
• Query results: subtrees of G that are reduced w.r.t. K (rooted/undirected/strong)

Page 27

Creating Data Graphs from Relational Databases

• Nodes are tuples
• Edges are foreign-key references

Page 28

Creating Data Graphs from Relational Databases

• Edges from each tuple node to all the keywords in that tuple

[Figure: the tuple (B, Belgium, 30510, 73) with edges to its keywords B, Belgium, 30510 and 73]
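As an illustration of this construction, here is a minimal sketch that builds a data graph from the example tables shown earlier. The representation is an assumption made for the example: tuple nodes are (table, row index) pairs, keyword nodes are ("kw", value) pairs, every attribute value is treated as a keyword, and foreign-key edges point from the referencing tuple to the referenced one. The function and variable names are hypothetical.

```python
from collections import defaultdict

TABLES = {
    "Cities": [{"ID": 22, "Name": "Amsterdam", "Population": 1101407},
               {"ID": 73, "Name": "Brussels", "Population": 951580}],
    "Organizations": [{"ID": 135, "Name": "EU", "Head Q.": 73},
                      {"ID": 175, "Name": "ESA", "Head Q.": 81}],
    "Countries": [{"Code": "NL", "Name": "Netherlands", "Area": 37330, "Capital": 22},
                  {"Code": "B", "Name": "Belgium", "Area": 30510, "Capital": 73}],
    "Memberships": [{"Country": "B", "Org.": 135},
                    {"Country": "NL", "Org.": 135}],
}
PRIMARY_KEY = {"Cities": "ID", "Organizations": "ID", "Countries": "Code"}
# (referencing table, referencing column, referenced table)
FOREIGN_KEYS = [("Countries", "Capital", "Cities"),
                ("Organizations", "Head Q.", "Cities"),
                ("Memberships", "Country", "Countries"),
                ("Memberships", "Org.", "Organizations")]

def build_data_graph(tables, primary_key, foreign_keys):
    """Return a dict mapping each tuple node to the set of nodes it points to."""
    edges = defaultdict(set)
    by_key = {}
    for table, rows in tables.items():
        for i, row in enumerate(rows):
            if table in primary_key:
                by_key[(table, row[primary_key[table]])] = (table, i)
            # One shared keyword node per distinct value.
            for value in row.values():
                edges[(table, i)].add(("kw", str(value)))
    for src, column, dst in foreign_keys:
        for i, row in enumerate(tables[src]):
            target = by_key.get((dst, row[column]))
            if target is not None:  # dangling references get no edge
                edges[(src, i)].add(target)
    return edges

graph = build_data_graph(TABLES, PRIMARY_KEY, FOREIGN_KEYS)
print(sorted(graph[("Countries", 1)], key=str))
```

In this sketch the Belgium tuple points to the Brussels tuple through its Capital column, which is the connection behind the result "Brussels is the capital city of Belgium".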

Pages 29-32

Creating Data Graphs from XML

• Nodes are XML elements
• Edges represent nesting of elements …
• … and ID references
• Keywords appear in PCDATA

[Figure: the DBLP example as a data graph, with the nodes dblp, article, title, author and cite, and the text of the titles and author names]

Page 33

All Occurrences of a Keyword are Represented by One Node

• A keyword is represented by a single node

[Figure: the two occurrences of "Approximation" represented by one keyword node]
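The XML construction can be sketched in the same spirit with Python's standard `xml.etree.ElementTree`. The sample document and the attribute names `id` and `ref` are assumptions for illustration (the slides do not show the actual markup), and the node naming scheme `tag#index` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<dblp>
  <article id="a1">
    <title>On the Approximation of Maximum Satisfiability</title>
    <author>Mihalis Yannakakis</author>
  </article>
  <article id="a2">
    <title>Improved Approximation Algorithms for MAX SAT</title>
    <author>Takao Asano</author>
    <author>David P. Williamson</author>
    <references><cite ref="a1"/></references>
  </article>
</dblp>"""

def build_xml_data_graph(xml_text, id_attr="id", ref_attr="ref"):
    """Nodes are elements; edges are nesting, ID references and element-to-keyword links."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())  # document order
    name = {id(e): f"{e.tag}#{i}" for i, e in enumerate(elements)}
    by_xml_id = {e.attrib[id_attr]: name[id(e)]
                 for e in elements if id_attr in e.attrib}
    edges = defaultdict(set)
    for elem in elements:
        node = name[id(elem)]
        for child in elem:                      # nesting of elements
            edges[node].add(name[id(child)])
        ref = elem.attrib.get(ref_attr)
        if ref in by_xml_id:                    # ID reference
            edges[node].add(by_xml_id[ref])
        for word in (elem.text or "").split():  # keywords in PCDATA
            edges[node].add(("kw", word))       # one node per distinct keyword
    return edges

graph = build_xml_data_graph(SAMPLE)
print(sorted(n for n, out in graph.items() if ("kw", "Approximation") in out))
```

Both title elements point to the same `("kw", "Approximation")` node, which mirrors the rule that all occurrences of a keyword are represented by one node.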

Page 34

Creating Data Graphs from Web Sites

• Nodes are Web pages …
• Keywords appear in these pages …
• Edges are hyperlinks/XLinks
• A keyword is represented by a single node

[Figure: the Web site http://www.goisrael.com/]

Page 35

Ranking and Enumeration Order

Page 36

Ranking Results

• Ranking of results is determined by size

[Figure: three results for the keywords Yannakakis and Approximation over the DBLP data graph, labeled 2, 1 and 3]

Page 37

Edges Have Weights

• Edges incident to dblp have a large weight
• Edges from cite to article have a medium weight

[Figure: the same three results with edge weights 1, 1.5 and 2, labeled 2, 1 and 3]
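Ranking by weight can be made concrete with a small sketch. The weight of a result is taken here to be the sum of its edge weights, with smaller weights ranked first. The concrete values (2 for edges incident to dblp, 1.5 for edges from cite to article, 1 otherwise) and the three edge lists are assumptions made for the example; they are not a transcription of the slide's figure.

```python
def edge_weight(u, v):
    """Assumed weighting scheme: large for dblp edges, medium for cite -> article."""
    if "dblp" in (u, v):
        return 2.0
    if u == "cite" and v.startswith("article"):
        return 1.5
    return 1.0

def tree_weight(edges):
    return sum(edge_weight(u, v) for u, v in edges)

# Three hypothetical results for the query {Yannakakis, Approximation}.
results = {
    "same article": [("article1", "title1"), ("title1", "Approximation"),
                     ("article1", "author1"), ("author1", "Yannakakis")],
    "via citation": [("article2", "title2"), ("title2", "Approximation"),
                     ("article2", "references"), ("references", "cite"),
                     ("cite", "article1"), ("article1", "author1"),
                     ("author1", "Yannakakis")],
    "via dblp root": [("dblp", "article1"), ("dblp", "article2"),
                      ("article2", "title2"), ("title2", "Approximation"),
                      ("article1", "author1"), ("author1", "Yannakakis")],
}

ranked = sorted(results, key=lambda name: tree_weight(results[name]))
for name in ranked:
    print(name, tree_weight(results[name]))
```

With these assumed weights the result that connects the keywords only through the dblp root is ranked last, even though it has fewer edges than the result that goes through the citation.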

Page 38

Order of Results

• Arbitrary Order
• Exact Order: $\forall\, i \le j,\; R_i \le R_j$

Page 39

Order of Results (cont’d)

• Heuristic Order
• C-Approximate Order: $\forall\, i \le j,\; R_i \le C \cdot R_j$

Here $R_i$ stands for the rank (e.g., the weight) of the i-th result in the output.
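A small sketch can make the difference between the two orders concrete. It assumes that each result is summarized by a numeric weight, that lower weights are better, and that an enumeration is C-approximate when no result is more than C times heavier than any result emitted after it. The function name is hypothetical.

```python
def is_c_approximate_order(weights, c):
    """weights[i] is the weight of the i-th emitted result.
    Checks weights[i] <= c * weights[j] for all i < j."""
    smallest_later = float("inf")
    for w in reversed(weights):
        if w > c * smallest_later:
            return False
        smallest_later = min(smallest_later, w)
    return True

print(is_c_approximate_order([1, 2, 3], 1))  # exact order is the case C = 1
print(is_c_approximate_order([2, 1, 3], 1))
print(is_c_approximate_order([2, 1, 3], 2))
```

The scan from the end keeps the smallest weight seen so far, so the check is linear in the number of results.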

Page 40

Measuring the Efficiency of Enumerations

Page 41

Polynomial Runtime is not Appropriate for KPS

• In the theory of CS, the usual notion of efficiency is polynomial running time
  – That is, the algorithm terminates in time that is polynomial in the size of the input
• However, in KPS the number of results can be exponential in the size of the input
  – Algorithms cannot be expected to terminate in polynomial time
  – Even for two keywords
• Therefore, other notions are required
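To see how two keywords can already yield exponentially many results, consider a hypothetical family of graphs (not from the slides): a chain of n "diamonds" from a node x0 to a node xn, with keyword A attached to x0 and keyword B attached to xn. Every rooted result must be rooted at x0 and consists of the edge x0 → A plus one directed path from x0 to B, so the number of results equals the number of such paths. The sketch below counts them.

```python
def diamond_chain(n):
    """Graph with 3n + 3 nodes: x0..xn, two middle nodes per diamond, keywords A and B."""
    graph = {"A": [], "B": []}
    for i in range(n):
        graph[f"x{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"x{i + 1}"]
        graph[f"b{i}"] = [f"x{i + 1}"]
    graph[f"x{n}"] = ["B"]
    graph["x0"] = graph["x0"] + ["A"]
    return graph

def count_paths(graph, node, target):
    """Number of directed paths from node to target (the graph is acyclic)."""
    if node == target:
        return 1
    return sum(count_paths(graph, child, target) for child in graph[node])

for n in (1, 5, 10):
    g = diamond_chain(n)
    print(n, len(g), count_paths(g, "x0", "B"))
```

The graph grows linearly in n while the number of results doubles with every added diamond.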

Page 42

Time Efficiency

• Polynomial Total Time: polynomial runtime in the combined size of the input and the output
• Polynomial Delay: the runtime between two successive results is polynomial in the size of the input

Page 43

About Polynomial Delay

• With polynomial delay you can:
  – Generate the first few results quickly
  – Efficiently return results in pages
• In most cases of keyword search, this is the suitable notion of efficiency
• Goal: develop algorithms that enumerate KPS results with polynomial delay
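Returning results in pages follows naturally when the enumeration is exposed as a lazy stream. The sketch below assumes the enumeration algorithm is available as a Python iterator (here replaced by a stand-in counter) and uses `itertools.islice` to cut it into pages; the helper name `pages` is hypothetical.

```python
from itertools import count, islice

def pages(results, page_size):
    """Lazily group an iterator of results into lists of at most page_size items."""
    it = iter(results)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

# Stand-in for an enumeration with very many (here: infinitely many) results.
first_page = next(pages(count(), 10))
print(first_page)
```

Only the results of the requested page are ever generated, so the cost of showing the first page depends on the delay, not on the total number of results.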

Page 44

Space Efficiency

• Polynomial Space
• Linearly-Incremental Space: generating i results requires i times polynomial space in the input

Page 45

Data and Query-and-Data Complexity

• Under query-and-data complexity, we assume that both the query and the data are of unbounded size
  – Many problems in database theory, e.g., computing joins of relational tables, are intractable under this measure
• In practice, however, queries are very small compared to the data
• Under data complexity, the size of the query is assumed to be fixed

Page 46

Enumerating Results of KS with Polynomial Delay

Page 47

Keyword Search with Polynomial Delay

• The following algorithm enumerates reduced subtrees (i.e., results of keyword search) with polynomial delay
  – Results are not ranked
• A different version of the algorithm for each of the three variants:
  – rooted
  – undirected
  – strong

Page 48

Importance of the Algorithm

• An upper bound for ranked keyword search: results can be enumerated in ranked order in polynomial total time
  – Generate all the results and then sort them
• In some cases, ranking is not required
• A basis for developing efficient heuristics that enumerate in an “almost” ranked order (discussed later)

Page 49

The Algorithm for Enumerating Rooted Reduced Subtrees

Page 50

Overview

• The algorithm uses two reductions

• Each reduction alone either does not solve the problem or runs in exponential total time

• However, the two reductions can be combined together to enumerate reduced subtrees with polynomial delay

Page 51

Data Reduction

1. Choose an arbitrary node v in K

2. For each parent p of v do:

I. In K: replace v with p

II. In G: remove v

III. Generate all results for the new input

IV. Add p→v to each result of the new input

[Figure: the steps illustrated on a data graph G and a query K that contains A, B and v, where p is a parent of v]
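The steps of data reduction can be sketched as a recursive generator. This is an illustration under several assumptions that the slide does not state: the graph is given as a map from each node to the set of its parents, a result is represented as a set of directed edges, the recursion stops when K contains a single node (whose result is the one-node tree with no edges), and the "arbitrary" choice of v is made deterministic with `min`. As the slides go on to show, this reduction on its own can miss results, so the sketch should not be read as a complete algorithm.

```python
def data_reduction(parents, K):
    """Yield results (frozensets of directed edges) found by data reduction alone.

    parents: dict mapping each node to the set of its parent nodes
    K: frozenset of nodes that the result must contain
    """
    if len(K) == 1:
        yield frozenset()        # assumed base case: a single-node tree
        return
    v = min(K)                   # step 1: a deterministic stand-in for "arbitrary"
    # Step II: remove v from the graph.
    smaller = {n: ps - {v} for n, ps in parents.items() if n != v}
    for p in parents.get(v, ()):             # step 2: each parent p of v
        new_K = (K - {v}) | {p}              # step I: replace v with p
        for result in data_reduction(smaller, new_K):   # step III
            yield result | {(p, v)}          # step IV: add the edge p -> v

# Hypothetical graph: r -> A, r -> m, m -> A, m -> B.
parents = {"A": {"r", "m"}, "B": {"m"}, "m": {"r"}, "r": set()}
for edges in data_reduction(parents, frozenset({"A", "B"})):
    print(sorted(edges))
```

On this small graph the sketch happens to find both rooted results; whether a result is found in general depends on which node is chosen as v in each recursive step.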

Page 52

Example Showing Failure

[Figure: a data graph with the nodes A, B and C and the query K. There are four results: two with one root and two with another root.]

Pages 53-56

Failure Example

[Figure: successive steps of data reduction on the example; nodes are removed from the graph and K changes at each step]

Page 57

Failure Example

• Only one result!
• Three others are missing!

Page 58

Why Data Reduction Fails

• We assumed that v is a leaf in every result
• This does not hold for structural nodes in recursive steps!
• Therefore, some results are not found!
• Solution(?): Repeat data reduction for every v in K
  – Exponential total time in the worst case!

Page 59

Query Reduction

1. Remove one keyword from the query

2. Find all results for the smaller query

3. Extend each result to include the missing keyword, in every possible way

[Figure: an example with K = {A, B, C}, in which results of the smaller query are extended to include the missing keyword]

Page 60

Extending Partial Results

• In query reduction, we need to extend a result T of the query K\{k} to all results of the query K
• This is done as follows: for all nodes v of T:
  – Remove from G all nodes of T, except for v
  – Find all simple directed paths P from v to k and print the concatenation of T and P
• If v is the root of T, we also need to concatenate T with all subtrees that are reduced w.r.t. v and k
• More details can be found in the paper
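The path-finding step can be sketched as a generator of simple directed paths. This is a plain depth-first search written for illustration: it assumes the graph is a dict from each node to a list of its children, and it models "remove from G all nodes of T, except for v" with a `removed` set. A naive search like this can spend time in dead ends, so it does not by itself guarantee polynomial delay; the function name is hypothetical.

```python
def simple_paths(graph, source, target, removed=frozenset()):
    """Yield every simple directed path from source to target that avoids `removed`."""
    path = [source]
    on_path = {source}

    def extend():
        node = path[-1]
        if node == target:
            yield list(path)
            return
        for nxt in graph.get(node, ()):
            if nxt in on_path or nxt in removed:
                continue
            path.append(nxt)
            on_path.add(nxt)
            yield from extend()
            path.pop()
            on_path.remove(nxt)

    yield from extend()

# Hypothetical graph: v has children a and b; both reach the keyword k.
g = {"v": ["a", "b"], "a": ["k"], "b": ["k", "a"], "k": []}
for p in simple_paths(g, "v", "k"):
    print(" -> ".join(p))
```

Passing the other nodes of T as `removed` keeps each extension from running back into the partial result it is extending.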

Page 61

Extensions by Directed Paths

Page 62

Extensions by Directed Subtrees

Page 63

Query Reduction is not Efficient!

• Query reduction completely solves the problem, but it is inefficient
• Problem: a subset of the query may have many more results than the query itself
  – Exponential total time!

[Figure: an example graph with the keywords A, B and C and a parameter n: $2^n$ results for {A,B}, but only 1 result for {A,B,C}]

Page 64

Combining the Reductions

• In order to enumerate in polynomial total time, combine query and data reductions:
  – If some node v of K is reachable, in the data graph, from another node u of K, use query reduction: remove v from K
  – Otherwise, use data reduction
• By combining the two reductions, results can be enumerated in polynomial total time

[Figure: a node v of K that is reachable from another node u of K]
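The test that decides between the two reductions is a reachability check among the nodes of K. The sketch below assumes the data graph is a dict from each node to a list of its children; it returns which reduction applies and, for query reduction, the node v to remove. The names `reachable_from` and `choose_reduction` are hypothetical, and the reductions themselves are not implemented here.

```python
def reachable_from(graph, start):
    """Nodes reachable from start by a directed path of at least one edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def choose_reduction(graph, K):
    """Return ("query", v) if some v in K is reachable from another u in K,
    otherwise ("data", None)."""
    for u in sorted(K):
        reach = reachable_from(graph, u)
        for v in sorted(K):
            if v != u and v in reach:
                return ("query", v)
    return ("data", None)

# Hypothetical graph: r -> A, r -> m, m -> B.
g = {"r": ["A", "m"], "m": ["B"], "A": [], "B": []}
print(choose_reduction(g, {"A", "B"}))
print(choose_reduction(g, {"m", "B"}))
```

In the first call neither keyword reaches the other, so data reduction applies; in the second, B is reachable from m, so B is the node that query reduction removes.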

Page 65

Achieving Polynomial Delay

• To achieve polynomial delay, we cannot wait until a recursive subroutine terminates
• Use coroutines instead of subroutines!
• That is, each recursive execution of the algorithm
  – stops after generating each result
  – resumes when the next result is required
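Python generators behave like the coroutines described here, which makes the contrast easy to demonstrate on a toy enumeration problem. The example below enumerates all bit strings of length n, which stands in for the keyword-search recursion and is not the algorithm from the slides. The subroutine version must finish the whole recursive call before it can return anything; the generator version stops after each result and resumes on demand.

```python
def all_strings_subroutine(n):
    """Returns the full list: the caller waits for every recursive call to finish."""
    if n == 0:
        return [""]
    shorter = all_strings_subroutine(n - 1)
    return [bit + s for s in shorter for bit in "01"]

def all_strings_coroutine(n):
    """Yields one result at a time: each recursive level is suspended between results."""
    if n == 0:
        yield ""
        return
    for s in all_strings_coroutine(n - 1):
        for bit in "01":
            yield bit + s

print(all_strings_subroutine(2))
# The first result is available immediately, although there are 2**100 results in total.
print(next(all_strings_coroutine(100)))
```

Calling the subroutine version with n = 100 would never finish in practice, while the generator produces its first result after about 100 nested resumptions.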

Page 66

Subroutines

Polynomial Total Time

[Figure: Base, routine 1, routine 2 and routine 3 run as subroutines]

Page 67

Coroutines

Polynomial Delay

[Figure: Base, routine 1, routine 2 and routine 3 run as coroutines]

Page 68

For papers and projects related to this topic, see the home page of Benny Kimelfeld