Answering Table Queries on the Web using Column Keywords
Rakesh Pimplikar, IBM Research
Sunita Sarawagi, IIT Bombay
Table Query Example
User Query: Name of Explorers | Nationality | Areas Explored
Answer Table:

| Name of Explorers | Nationality | Areas Explored |
| --- | --- | --- |
| Vasco da Gama | Portuguese | Sea route to India |
| Abel Tasman | Dutch | Oceania |
| Christopher Columbus | | Caribbean |
| ... | ... | ... |

Types of Structured Queries
• Entities
User Query: Mountains in North America

| Mountains in North America |
| --- |
| Mount McKinley |
| Mount Saint Elias |
| Mount Lucania |
| ... |

• Relationship between two entities
User Query: Pain Killers | Side Effects

| Pain Killers | Side Effects |
| --- | --- |
| aspirin | asthma |
| ibuprofen | asthma, upset stomach |
| naproxen sodium | upset stomach |
| ... | ... |

• Entities with values of attributes
User Query: Name of Explorers | Nationality | Areas Explored

| Name of Explorers | Nationality | Areas Explored |
| --- | --- | --- |
| Vasco da Gama | Portuguese | Sea route to India |
| Abel Tasman | Dutch | Oceania |
| Christopher Columbus | | Caribbean |
| ... | ... | ... |

Our Data Source: Tables on the Web
• Richer sources of structured knowledge than free-format text
• Example tables shown from elizabethan-era.org.uk, wikipedia.org, and vaughns-1-pagers.com
Challenges
• Limited column-specific information
The query has a set of keywords. Web tables have headers.
• Designated HTML table header tag is not always used (80%).
• Many tables have no headers (18%).
• Header text is often uninformative.
• Context of a table can be helpful, but it does not give column-specific information and it is often noisy.
Header markup varies: `<table><tr><th>…</th></tr>…</table>` versus `<table><tr><td>…</td></tr>…</table>`.
Example web tables:

| | |
| --- | --- |
| Afghanistan | Kabul |
| Albania | Tirana |
| Algeria | Algiers |
| Andorra | Andorra la Vella |
| ... | ... |

| Name | Nationality | Main areas explored |
| --- | --- | --- |
| Abel Tasman | Dutch | Oceania |
| Vasco da Gama | Portuguese | Sea route to India |
| Alexander Mackenzie | British | Canada |
| ... | ... | ... |

Title: Forest reserves

| ID | Name | Area |
| --- | --- | --- |
| 7 | Shakespeare Hills | 2236 |
| 9 | Plains Creek | 880 |
| 13 | Welcome Swamp | 168 |
| ... | ... | ... |

User Query: Nobel Prize Winners
Context: The present list contains winners under the country/countries that are stated by the Nobel Prize committee on its website.

| Year | Name | Subject |
| --- | --- | --- |
| 1902 | Ronald Ross | Medicine |
| 1907 | Rudyard Kipling | Literature |
| ... | ... | ... |

System Architecture of WWT
The Column Mapping Task
User Query: Name of Explorers | Nationality | Areas Explored
Index Probe → Relevant Tables (Web Tables 1 to 3):

| Name | Nationality | Main areas explored |
| --- | --- | --- |
| Abel Tasman | Dutch | Oceania |
| Vasco da Gama | Portuguese | Sea route to India |
| Alexander Mackenzie | British | Canada |
| ... | ... | ... |

Title: List of explorers - Wikipedia, the free encyclopedia
Context: This article lists the explorations in history. For the documentary 'Explorations, powered by Duracell', see Explorations (TV)

| Exploration | Who (explorer) |
| --- | --- |
| (Chronological order) | |
| Sea route to India | Vasco da Gama |
| Caribbean | Christopher Columbus |
| Oceania | Abel Tasman |
| ... | ... |

Title: Forest reserves
Context: Other Formal Reserves. 1.3 Forest Reserves under the Forestry Act 1920. All areas will be available for mineral exploration and mining

| ID | Name | Area |
| --- | --- | --- |
| 7 | Shakespeare Hills | 2236 |
| 9 | Plains Creek | 880 |
| 13 | Welcome Swamp | 168 |
| ... | ... | ... |

The Column Mapping Task
User Query: Name of Explorers | Nationality | Areas Explored
Web Tables 1 to 3 → Map Columns → Consolidation → Answer Table
Answer Table:

| Name of Explorers | Nationality | Areas Explored |
| --- | --- | --- |
| Vasco da Gama | Portuguese | Sea route to India |
| Abel Tasman | Dutch | Oceania |
| Christopher Columbus | | Caribbean |
| Alexander Mackenzie | British | Canada |
| ... | ... | ... |

A Baseline Approach
User Query Q with columns Q1, Q2, Q3; Table t with headers h1, h2, h3, h4 and Context C
• For each table t
• Step 1: Is t relevant?
IR_Sim(Q, C) + λ · IR_Sim(Q, h) > Threshold?
• Step 2: If yes, map columns of t to columns in Q
Edges connect query columns Qi to headers hj, with Edge Weight = IR_Sim(Qi, hj)
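The two baseline steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ir_sim` is a stand-in cosine similarity over token counts, the greedy one-to-one assignment is a simplification of the column mapping step, and the names `baseline_map`, `lam`, and `threshold` (and their default values) are hypothetical.

```python
import math
from collections import Counter


def ir_sim(a: str, b: str) -> float:
    """Cosine similarity over lowercase token counts (assumed stand-in for IR_Sim)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(
        sum(v * v for v in tb.values())
    )
    return dot / norm if norm else 0.0


def baseline_map(query_cols, headers, context, lam=1.0, threshold=0.3):
    """Return {query column index: table column index}, or None if t is irrelevant."""
    query_text = " ".join(query_cols)
    header_text = " ".join(headers)
    # Step 1: relevance test IR_Sim(Q, C) + lambda * IR_Sim(Q, h) > Threshold
    relevance = ir_sim(query_text, context) + lam * ir_sim(query_text, header_text)
    if relevance <= threshold:
        return None
    # Step 2: edge weights IR_Sim(Qi, hj), then a greedy one-to-one assignment
    # (a simplification chosen for this sketch).
    edges = sorted(
        ((ir_sim(q, h), i, j)
         for i, q in enumerate(query_cols)
         for j, h in enumerate(headers)),
        reverse=True,
    )
    mapping, used = {}, set()
    for weight, i, j in edges:
        if weight > 0 and i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


query = ["Name of Explorers", "Nationality", "Areas Explored"]
explorers = baseline_map(
    query, ["Name", "Nationality", "Main areas explored"], "List of explorers"
)
forests = baseline_map(query, ["ID", "Name", "Area"], "Forest reserves")
print(explorers, forests)
```

With these assumed settings the explorers table is mapped column by column, while the forest reserves table fails the relevance test because only the word "Name" overlaps with the query.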
Limitations of Baseline
• Cannot match tables with poor/missing headers.
• Exploit content overlap with related tables. How?
Example table:

| Name | Nationality | Main areas explored |
| --- | --- | --- |
| Abel Tasman | Dutch | Oceania |
| ... | ... | ... |

Graphical Model Approach
User Query: Name of Explorers | Nationality | Areas Explored
• Create a node for every column
Web Table 1 (nodes N1, N2, N3):

| Name | Nationality | Main areas explored |
| --- | --- | --- |
| Abel Tasman | Dutch | Oceania |
| Vasco da Gama | Portuguese | Sea route to India |
| Alexander Mackenzie | British | Canada |
| ... | ... | ... |

Web Table 2 (nodes N4, N5, N6):

| Exploration | Who (explorer) | Century |
| --- | --- | --- |
| (Chronological order) | | |
| Sea route to India | Vasco da Gama | 15th/16th |
| Caribbean | Christopher Columbus | 15th/16th |
| Oceania | Abel Tasman | 17th |
| ... | ... | ... |

Web Table 3, Forest reserves (nodes N7, N8, N9):

| ID | Name | Area |
| --- | --- | --- |
| 7 | Shakespeare Hills | 2236 |
| 9 | Plains Creek | 880 |
| 13 | Welcome Swamp | 168 |
| ... | ... | ... |

Graphical Model Approach
User Query: Name of Explorers | Nationality | Areas Explored
• Possible labels for every node
1. Name of explorers
2. Nationality
3. Areas Explored
Example labeling from the figure: Name → 1, Nationality → 2, Main areas explored → 3; Exploration → 3, Who (explorer) → 1.
Graphical Model Approach
User Query: Name of Explorers | Nationality | Areas Explored
• Possible labels for every node
1. Name of explorers
2. Nationality
3. Areas Explored
4. NA (Not Assigned)
5. NR (Not Relevant)
Example labeling from the figure: Name → 1, Nationality → 2, Main areas explored → 3; Exploration → 3, Who (explorer) → 1, Century → NA; ID, Name, Area (Forest reserves) → NR, NR, NR.
Graphical Model Approach
User Query: Name of Explorers | Nationality | Areas Explored
• Edges
Complete bipartite graph between nodes of two tables
Content overlap between column contents and headers
Maximum bipartite matching
Edge weights shown in the figure between the columns of the first table (N1, N2, N3) and the second table (N4, N5, N6): 0.6, 0.2, 0.1, 0.7, 0, 0, 0, 0, 0.1
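One way to picture the edge construction is the sketch below. It is an illustration only: the slide does not define the overlap measure, so Jaccard overlap of cell values is an assumption, header overlap is left out for brevity, and the brute-force matching over permutations is only suitable for tables with a handful of columns. The function names are hypothetical.

```python
from itertools import permutations


def jaccard(col_a, col_b):
    """Jaccard overlap of two columns' cell values (assumed overlap measure)."""
    sa = {x.lower() for x in col_a}
    sb = {x.lower() for x in col_b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def best_matching(cols_a, cols_b):
    """Maximum-weight bipartite matching between the columns of two tables.

    Returns (index in cols_a, index in cols_b, weight) triples with weight > 0.
    """
    if len(cols_a) > len(cols_b):
        # Match from the smaller side, then swap the indices back.
        return [(i, j, w) for j, i, w in best_matching(cols_b, cols_a)]
    weights = [[jaccard(a, b) for b in cols_b] for a in cols_a]
    best, best_score = (), -1.0
    for perm in permutations(range(len(cols_b)), len(cols_a)):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return [(i, j, weights[i][j]) for i, j in enumerate(best) if weights[i][j] > 0]


table1 = [
    ["Abel Tasman", "Vasco da Gama", "Alexander Mackenzie"],
    ["Dutch", "Portuguese", "British"],
    ["Oceania", "Sea route to India", "Canada"],
]
table2 = [
    ["Sea route to India", "Caribbean", "Oceania"],
    ["Vasco da Gama", "Christopher Columbus", "Abel Tasman"],
]
print(sorted(best_matching(table1, table2)))
```

On these rows the explorer-name columns pair up and the area columns pair up, even though the second table's headers ("Exploration", "Who (explorer)") share no words with the query.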
Graphical Model Approach
User Query: Name of Explorers | Nationality | Areas Explored
• Edge Potentials
Large weights → same label (a soft constraint)
Edge weights retained in the figure across the three tables: 0.6, 0.7, 0.3, 0.4, 0.1
Node Potentials
• Score expressing the affinity of a table column ci to a query column Qj
• Baseline approach: IR similarity between Qj and header of ci
Limitations of Baseline Similarity
• Generic IR similarity is not a good fit for the typical roles of context + headers
Context: "topic" of a table
Header: label of a column
• New model based on a two-part segmentation of query words over context and header.
User Query: Nobel Prize Winners
Context: The present list contains laureates under the country/countries that are stated by the Nobel Prize committee on its website.

| Year | Winners | Subject |
| --- | --- | --- |
| 1902 | Ronald Ross | Medicine |
| 1907 | Rudyard Kipling | Literature |
| ... | ... | ... |

User Query: Mountains in North America
Context: This article presents a comprehensive list of peaks in North America, highlighting some of the important features.

| Mountain Peak | Region | Elevation |
| --- | --- | --- |
| Mount McKinley | Alaska | 6194 m |
| Mount Logan | Yukon | 5956 m |
| ... | ... | ... |

Limitations of Baseline Similarity
• Matches with other parts of table ignored
Frequent words in a column
Multi-row headers
Split headers match to union of words vs. sub-headers match only to one header. How to detect which of the two?
• Take soft-max over matches over context, body, other headers of table on one part of the segmented query.
User Query: Black metal bands

| Band name | Country | Genre |
| --- | --- | --- |
| Aarcon | Germany | Black Metal |
| Act of God | Russia | Melodic Black |
| Adragard | Italy | Black Metal |
| ... | ... | ... |

User Query: Name of Explorers | Nationality | Areas Explored

| Name | Nationality | Main areas |
| --- | --- | --- |
| | | explored |
| Abel Tasman | Dutch | Oceania |
| ... | ... | ... |

| Exploration | Who (explorer) |
| --- | --- |
| (Chronological order) | |
| Sea route to India | Vasco da Gama |
| ... | ... |

Segmented Similarity
• Similarity score between a table column ci and a query column Qj
User Query: Nobel Prize Winners | Year
Segmented Similarity
• Similarity score between a table column ci and a query column Qj
• Maximum soft-max score over matches of different segments of query with different parts of a table
User Query: Nobel Prize Winners | Year
Figure: the query column Qj is split into the segments "Nobel Prize" and "Winners", which are matched against the parts of the table around ci: Title, Context, and Header Rows.
Segmented Similarity
• Step 1: Segment query column keywords into two parts
• Step 2: Similarity scores between each part and different sections of table
• Step 3: Soft-max over sections of table where each part matches
User Query: Nobel Prize Winners | Year
Figure: the query column is split into a Prefix and a Suffix and matched against column ci. Table sections shown: Title, Context, Header Rows, Header Text in the Current Header Row, Other Headers in the Same Row, Other Header Rows in Same Column, and Frequent Body Contents. Soft-max over matches over all sections.
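The three steps can be sketched as below. The slides do not give the scoring formula, so everything specific here is an assumption made for illustration: `coverage` is a crude stand-in for an IR similarity, `soft_max` is a softmax-weighted average used as a smooth maximum, the suffix is matched against the column's own header while the prefix is matched against the other sections, the two parts are combined by a length-weighted average, and splits whose suffix does not match the header are skipped. All function names are hypothetical.

```python
import math


def coverage(words, text):
    """Fraction of `words` that occur in `text` (assumed stand-in for IR similarity)."""
    if not words:
        return 0.0
    tokens = set(text.lower().split())
    return sum(w in tokens for w in words) / len(words)


def soft_max(scores, temperature=10.0):
    """Smooth maximum: softmax-weighted average of the scores (assumed form)."""
    weights = [math.exp(temperature * s) for s in scores]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def segmented_similarity(query_col, header, sections):
    """Best score over all prefix/suffix splits of the query column keywords."""
    words = query_col.lower().split()
    best_score, best_split = 0.0, None
    for k in range(len(words)):  # Step 1: the suffix words[k:] is never empty
        prefix, suffix = words[:k], words[k:]
        # Step 2: suffix against this column's header, prefix against each section
        suffix_score = coverage(suffix, header)
        if suffix_score == 0:
            continue  # assumed: the column-specific part must match the header
        # Step 3: soft-max over the sections where the prefix matches
        prefix_score = (
            soft_max([coverage(prefix, s) for s in sections])
            if prefix and sections
            else 0.0
        )
        score = (len(prefix) * prefix_score + len(suffix) * suffix_score) / len(words)
        if score > best_score:
            best_score, best_split = score, (" ".join(prefix), " ".join(suffix))
    return best_score, best_split


sections = [
    "Nobel Prize Winners",  # title
    "The present list contains laureates under the country/countries that are "
    "stated by the Nobel Prize committee on its website.",  # context
]
print(segmented_similarity("Nobel Prize Winners", "Winners", sections))
print(segmented_similarity("Nobel Prize Winners", "Year", sections))
```

Under these assumptions the split "Nobel Prize" + "Winners" scores highest for the Winners column, whereas the Year column gets no score because no suffix of the query matches its header.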
Hard Constraints
• MUTEX Constraint
At most one column in a table can be mapped to a query column.
• ALL-IRR Constraint
If one column in a table is assigned a label NR, then all columns of table must be assigned NR.
• MUST-MATCH Constraint
Every relevant table must contain the first query column.
• MIN-MATCH Constraint
Every relevant table must contain at least m out of q query columns. m = 2 if q >= 2.
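The four constraints can be checked for one table's labeling with a short function. This is a sketch with an assumed encoding: query columns are the integers 1 to q, and the strings "NA" and "NR" stand for Not Assigned and Not Relevant. The default `m = 1` for single-column queries is also an assumption, since the slide only states `m = 2 if q >= 2`.

```python
def satisfies_hard_constraints(labels, q, m=None):
    """Check one table's column labels against the four hard constraints.

    labels: one label per table column, each an int in 1..q, "NA", or "NR".
    """
    if m is None:
        m = 2 if q >= 2 else 1  # m = 1 for q = 1 is an assumption of this sketch
    # ALL-IRR: if any column is NR, every column must be NR.
    if "NR" in labels:
        return all(label == "NR" for label in labels)
    mapped = [label for label in labels if label != "NA"]
    # MUTEX: at most one table column per query column.
    if len(mapped) != len(set(mapped)):
        return False
    # MUST-MATCH: a relevant table must contain the first query column.
    if 1 not in mapped:
        return False
    # MIN-MATCH: a relevant table must contain at least m query columns.
    return len(mapped) >= m


print(satisfies_hard_constraints([3, 1, "NA"], q=3))
print(satisfies_hard_constraints(["NR", 1, 2], q=3))
```

The first call uses the labeling of the Exploration / Who (explorer) / Century table from the earlier slides; the second mixes NR with query labels and so violates ALL-IRR.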
Labeling the Graphical Model
• Final goal: Jointly assign one of |Q|+2 labels to each column to
• maximize sum of node and edge potentials
• satisfy the hard constraints
• NP-Hard
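The labeling goal can be written as a constrained maximization. The notation below is assumed for illustration and does not appear on the slides: V is the set of column nodes, E the set of edges between columns of different tables, y_c the label of column c, and θ the node and edge potentials.

```latex
\[
\max_{y} \; \sum_{c \in V} \theta_c(y_c) \; + \sum_{(c, c') \in E} \theta_{c,c'}(y_c, y_{c'})
\quad \text{s.t. } y \text{ satisfies the hard constraints}, \quad
y_c \in \{1, \dots, |Q|, \mathrm{NA}, \mathrm{NR}\}
\]
```

Each column takes one of |Q| + 2 labels, so the joint label space grows exponentially with the number of columns, which is why approximate inference is needed.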
Inference Algorithms
• Collective Inference: Table-Centric
Edge potentials are used to modify node potentials
Optimal table-level inference
• Collective Inference: Edge-Centric
Edge potentials are given central importance.
Existing inference algorithms: Belief Propagation, TRWS, MPLP, α-Expansion, etc.
Modified α-Expansion works best in our case.
Experimental Setup
• Workload
59 multi-column queries, mostly collected from Amazon Mechanical Turk (AMT) service [Cafarella et al, 2009]
• Data source
25 million tables from a web crawl of 500 million pages
• Ground Truth
Manual labeling for 1906 web tables
Column Mapping Methods
• Baseline
• NbrText
Baseline augmented with similarity scores from neighboring columns
• PMI [Cafarella et al, 2009]
Baseline augmented with corpus-wide co-occurrence score of column contents and a label
• WWT
Our graphical-model-based approach with table-centric collective inference
Column Mapping Methods Comparison
• Overall error: WWT 30.3%, Baseline and PMI 34.7%, NbrText 34.2%.
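As a quick check on these figures, the relative error reduction of WWT over the Baseline can be worked out directly from the two overall error rates:

```latex
\[
\frac{34.7 - 30.3}{34.7} = \frac{4.4}{34.7} \approx 0.127
\]
```

That is a relative reduction of roughly 12 to 13 percent, in line with the 12% figure quoted in the summary.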
Running Time
• Actual column mapping takes less than half a second.
• Time for table & index read can be improved with better machine configuration.
Summary
• Presented a graphical model approach for answering table queries
• A novel method to find similarity using a two-part query segmentation model
• Robust mechanism for exploiting content and header overlap across table columns
• Different algorithms for inference in the graphical model
• 12% reduction in error relative to a baseline method
• Future Work
Exploiting newer corpus-wide co-occurrence statistics
Alternative structured sources such as ontologies
Enhance the search experience via faceted search and user feedback.
Thank you.
Related Work
• Query-By-Example Paradigm [2]
Extracting tables from lists on the web
Web tables are not considered
• Halevy et al [3, 4] highlight the potential of web tables as a source of structured information
Collecting offline information like attribute columns, attribute associations, etc.
• OCTOPUS [1]
Multiple user interactions are necessary
PMI score for relevance ranking is not effective in our case.
• Schema Matching [5, 6]
Managing complex alignment between the large number of schema elements in two databases
Web tables are noisy, unlike database tables.
References
1. M. J. Cafarella, A. Y. Halevy, and N. Khoussainova. Data integration for the relational web. PVLDB, 2(1):1090–1101, 2009.
2. R. Gupta and S. Sarawagi. Answering table augmentation queries from unstructured lists on the web. PVLDB, 2(1):289–300, 2009.
3. M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu, and Y. Zhang. Webtables: exploring the power of tables on the web. PVLDB, 1(1):538–549, 2008.
4. M. J. Cafarella, A. Y. Halevy, Y. Zhang, D. Z. Wang, and E. Wu. Uncovering the relational web. In WebDB, 2008.
5. A. Doan and A. Y. Halevy. Semantic integration research in the database community: A brief survey. The AI Magazine, 26(1):83–94, 2005.
6. E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350, 2001.
Segmented Similarity
• Overall error reduction from 33.3% to 30.3%
• Reduction is more than 10% in 8 cases.