Annotating Search Results from Web Databases
Uploaded by swami06
Category: Engineering
![Page 1: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/1.jpg)
CONTENT
Introduction
Existing System
Proposed System
Phases of system
System Architecture
System workflow
Modules
Advantages of Proposed System
Algorithm used in system
User classes
Activity diagram
Applications
Software & Hardware requirement
References
![Page 2: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/2.jpg)
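Introduction
A large number of databases are now accessible on the web through HTML forms, and the data they return may be encoded in result pages with different HTML tag formatting.
Annotation is performed at the data unit level.
Labels are assigned automatically to the data units of the search result records (SRRs) returned from web databases (WDBs).
Applications include deep web data collection and Internet comparison shopping.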
![Page 3: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/3.jpg)
EXISTING SYSTEM
In the existing system, a data unit is a piece of text that semantically represents one concept of an entity.
It describes the relationship between text nodes and data units.
Early applications require tremendous human effort to annotate data units manually, which severely limits their scalability.
There is a high demand for collecting data of interest from multiple WDBs.
In the proposed system, we consider how to automatically assign labels to the data units within the SRRs returned from WDBs.
![Page 4: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/4.jpg)
PROPOSED SYSTEM
OUR APPROACH
Align the data units on a result page into different groups such that the data units in the same group have the same semantics.
Annotate each group using different aspects of annotation.
We consider how to automatically assign labels to the data units within the SRRs returned from WDBs.
![Page 5: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/5.jpg)
PHASES OF SYSTEM
Our solution consists of three phases:
a) Alignment phase.
b) Annotation phase.
c) Annotation wrapper generation phase.
![Page 6: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/6.jpg)
A) ALIGNMENT PHASE
• Identify all data units in the SRRs.
• Organize them into different groups, with each group corresponding to a different concept.
![Page 7: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/7.jpg)
B) ANNOTATION PHASE
• Introduce multiple basic annotators.
• Each annotator exploits one type of feature.
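The idea that each basic annotator looks at only one kind of evidence can be sketched as follows. This is a minimal illustration, not the system's code: the function names, the "Label:" prefix convention, and the currency pattern are all assumptions made for the example.

```python
import os
import re
from typing import Callable, List, Optional

# An annotator takes one aligned group of data units and proposes a label,
# or returns None when its feature gives no evidence. (Assumed interface.)
Annotator = Callable[[List[str]], Optional[str]]


def prefix_annotator(group: List[str]) -> Optional[str]:
    """In-text prefix idea: use text shared by every unit, e.g. 'Price:'."""
    prefix = os.path.commonprefix(group)
    match = re.match(r"\s*([A-Za-z][A-Za-z ]*?)\s*:", prefix)
    return match.group(1) if match else None


def currency_annotator(group: List[str]) -> Optional[str]:
    """Common-knowledge style rule (hypothetical): all units look like money."""
    if group and all(re.fullmatch(r"\$\d+(\.\d{2})?", u.strip()) for u in group):
        return "Price"
    return None


def run_annotators(group: List[str], annotators: List[Annotator]) -> List[str]:
    """Collect every label proposed for a group, in annotator order."""
    labels = []
    for annotate in annotators:
        label = annotate(group)
        if label is not None:
            labels.append(label)
    return labels
```

Calling `run_annotators(["$10.00", "$7"], [prefix_annotator, currency_annotator])` yields only the label proposed by the currency rule, because these units share no "Label:" prefix.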
![Page 8: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/8.jpg)
C) ANNOTATION WRAPPER GENERATION PHASE
• Generate the annotation rules.
• Each rule describes how to extract, from the result page, the data units of a concept identified in the annotation phase.
• It also describes what the appropriate semantic label should be.
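A rule of this kind can be pictured as a small record that says where a data unit sits and which label it receives. The sketch below is a simplified assumption: the `AnnotationRule` class, its `prefix`/`suffix` fields, and the sample record are hypothetical and only illustrate the extract-then-label idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AnnotationRule:
    label: str   # semantic label chosen in the annotation phase
    prefix: str  # text expected immediately before the data unit
    suffix: str  # text expected immediately after it ('' means end of record)

    def extract(self, record: str) -> Optional[str]:
        """Return the data unit located by this rule, or None if absent."""
        start = record.find(self.prefix)
        if start < 0:
            return None
        start += len(self.prefix)
        end = record.find(self.suffix, start) if self.suffix else len(record)
        if end < 0:
            return None
        return record[start:end].strip()


def annotate(record: str, wrapper: List[AnnotationRule]) -> Dict[str, str]:
    """Apply every rule of the wrapper to one search result record."""
    result = {}
    for rule in wrapper:
        value = rule.extract(record)
        if value is not None:
            result[rule.label] = value
    return result
```

Once built, the same list of rules can be reused on records returned for new queries without running the alignment and annotation phases again.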
![Page 9: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/9.jpg)
SYSTEM ARCHITECTURE
Data alignment
Data Unit & Text Nodes’ Features (content, presentation style, data type, path, adjacency)
Data Unit Similarity
Alignment Algorithm
Assigning labels
Local Schema & Integrated Interface Schema
Table Annotator, Query-Based Annotator, Schema Value Annotator, Frequency-Based Annotator, In-Text Prefix/Suffix Annotator, Common Knowledge Annotator
Combining Annotators -> Build Wrapper
![Page 10: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/10.jpg)
SYSTEM WORKFLOW
![Page 11: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/11.jpg)
MODULES
Data Unit and Tag Node Extraction:
Identify relationship between text nodes & tag nodes
Data Unit and Text Node Features
Data Alignment Algorithm
Label Assignment
![Page 12: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/12.jpg)
Data Unit and Text Node
One-to-One Relationship.
One-to-Many Relationship.
Many-to-One Relationship.
One-to-Nothing Relationship.
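The four relationship types can be read as counts of text nodes against the data units they carry. The sketch below assumes that reading (for example, one text node holding several data units is one-to-many, and a text node holding none, such as template text, is one-to-nothing); the `classify` helper is hypothetical.

```python
from enum import Enum


class Relationship(Enum):
    ONE_TO_ONE = "one-to-one"          # one text node, exactly one data unit
    ONE_TO_MANY = "one-to-many"        # one text node, several data units
    MANY_TO_ONE = "many-to-one"        # several text nodes form one data unit
    ONE_TO_NOTHING = "one-to-nothing"  # text node carries no data unit


def classify(text_nodes: int, data_units: int) -> Relationship:
    """Map (number of text nodes, number of data units) to a relationship."""
    if text_nodes < 1 or data_units < 0:
        raise ValueError("need at least one text node and a non-negative unit count")
    if data_units == 0:
        return Relationship.ONE_TO_NOTHING
    if text_nodes == 1 and data_units == 1:
        return Relationship.ONE_TO_ONE
    if text_nodes == 1:
        return Relationship.ONE_TO_MANY
    if data_units == 1:
        return Relationship.MANY_TO_ONE
    raise ValueError("many-to-many is not one of the four listed cases")
```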
![Page 13: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/13.jpg)
Data Unit and Text Node Features
Data Content (DC)
Presentation Style (PS)
Data Type (DT)
Tag Path (TP)
Adjacency (AD)
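One way to hold the five features together is a small record per data unit. The following is a sketch under stated assumptions: the field layout, the `make_data_unit` helper, and the set of data types recognised by `infer_data_type` are illustrative choices, not taken from the slides.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumed, deliberately small set of data types; checked in this order.
_TYPE_PATTERNS = [
    ("currency", re.compile(r"[$€£]\s?\d[\d,]*(\.\d+)?")),
    ("percentage", re.compile(r"\d+(\.\d+)?\s?%")),
    ("integer", re.compile(r"\d+")),
    ("decimal", re.compile(r"\d+\.\d+")),
]


def infer_data_type(text: str) -> str:
    """Return the first type whose pattern matches the whole text."""
    text = text.strip()
    for name, pattern in _TYPE_PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "string"


@dataclass(frozen=True)
class DataUnit:
    content: str                         # DC: the text itself
    presentation_style: Tuple[str, ...]  # PS: e.g. ("bold", "12px")
    data_type: str                       # DT: inferred from the content
    tag_path: Tuple[str, ...]            # TP: tags from the record root
    adjacency: Tuple[str, str]           # AD: (preceding text, following text)


def make_data_unit(content: str, style: Sequence[str], tag_path: Sequence[str],
                   prev_text: str = "", next_text: str = "") -> DataUnit:
    """Build a DataUnit, deriving the data type from the content."""
    return DataUnit(content, tuple(style), infer_data_type(content),
                    tuple(tag_path), (prev_text, next_text))
```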
![Page 14: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/14.jpg)
DATA ALIGNMENT
Data Unit Similarity.
Data content similarity.
Presentation style similarity.
Data type similarity.
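A common way to turn several per-feature similarities into one score is a weighted sum. The sketch below assumes that form and covers the content, presentation style, and data type similarities; the individual measures (word-set overlap, position-wise style match, type equality) and the weights are illustrative assumptions, not the values used by the system.

```python
from typing import Dict, Sequence

# Assumed weights; they sum to 1 so that identical units score 1.0.
WEIGHTS: Dict[str, float] = {"content": 0.5, "style": 0.25, "type": 0.25}


def content_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased word sets (a stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def style_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of style features (font, colour, ...) that are identical."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def type_similarity(a: str, b: str) -> float:
    """1.0 when the data types are equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def data_unit_similarity(u1: dict, u2: dict,
                         weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the per-feature similarities of two data units."""
    return (weights["content"] * content_similarity(u1["content"], u2["content"])
            + weights["style"] * style_similarity(u1["style"], u2["style"])
            + weights["type"] * type_similarity(u1["type"], u2["type"]))
```

Two prices with different values and one differing style feature still score above zero here, because their data types agree and half of their style features match.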
![Page 15: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/15.jpg)
Alignment Algorithm
Our data alignment method consists of the following four steps:
1) Merge text nodes.
2) Align text nodes.
3) Split (composite) text nodes.
4) Align data units.
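To make the final step concrete, here is a simplified greedy grouping of data units across records. It is a stand-in for illustration only, not the alignment algorithm of the presented system: the `align_data_units` function, the threshold, and the `same_shape` similarity are assumptions. Each group receives at most one unit per record.

```python
import re
from typing import Callable, List, Optional


def same_shape(a: str, b: str) -> float:
    """Toy similarity: 1.0 if both strings have the same digit/letter shape."""
    def shape(s: str) -> str:
        return re.sub(r"[A-Za-z]+", "a", re.sub(r"\d+", "9", s))
    return 1.0 if shape(a) == shape(b) else 0.0


def align_data_units(records: List[List[str]],
                     similarity: Callable[[str, str], float],
                     threshold: float = 0.5) -> List[List[str]]:
    """Place each unit in the most similar group not yet used by its record."""
    groups: List[List[str]] = []
    for record in records:
        used = set()  # indexes of groups that already hold a unit of this record
        for unit in record:
            best_index: Optional[int] = None
            best_score = 0.0
            for i, group in enumerate(groups):
                if i in used:
                    continue
                score = sum(similarity(unit, other) for other in group) / len(group)
                if score > best_score:
                    best_index, best_score = i, score
            if best_index is None or best_score < threshold:
                groups.append([unit])  # no suitable group: start a new one
                used.add(len(groups) - 1)
            else:
                groups[best_index].append(unit)
                used.add(best_index)
    return groups
```

With the records `["Data Mining", "2003", "$45.00"]` and `["Web Databases", "$12.50"]`, the titles and the prices end up in shared groups while the year stays in a group of its own, which shows how a missing attribute in one record is tolerated.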
![Page 16: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/16.jpg)
ASSIGNING LABELS
Apply semantic labels to each data unit obtained from the SRRs.
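When several annotators propose labels for the same group, their votes have to be merged into one label. The sketch below assumes a simple probabilistic rule: each annotator has a known success rate, annotators are treated as independent, and the confidence in a label is one minus the chance that every annotator proposing it is wrong. The function names and the sample rates are hypothetical, and whether the system combines votes in exactly this way is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Vote = Tuple[str, float]  # (proposed label, success rate of the annotator)


def combined_confidence(votes: List[Vote]) -> Dict[str, float]:
    """Confidence per label: 1 - product of (1 - rate) over its supporters."""
    miss: Dict[str, float] = defaultdict(lambda: 1.0)
    for label, rate in votes:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("success rate must lie in [0, 1]")
        miss[label] *= 1.0 - rate
    return {label: 1.0 - m for label, m in miss.items()}


def choose_label(votes: List[Vote]) -> Optional[str]:
    """Pick the label with the highest combined confidence, if any."""
    confidence = combined_confidence(votes)
    if not confidence:
        return None
    return max(confidence, key=confidence.get)
```

Under this rule two moderately reliable annotators that agree can outweigh a single more reliable annotator that disagrees.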
![Page 17: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/17.jpg)
ADVANTAGES OF PROPOSED SYSTEM
We use data unit level annotation.
We propose a clustering-based shifting technique to align data units into groups, so that the data units inside the same group have the same semantics.
We construct an annotation wrapper for any given WDB. The wrapper can be applied to efficiently annotate the SRRs retrieved from the same WDB with new queries.
![Page 18: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/18.jpg)
USER CLASSES
The various classes used in annotating search results from web databases are:
1) Wrapper - An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database.
2) Search engine - Reads the data from the web database and provides it for comparison shopping.
3) Wrapper builder - Combines the annotators to produce a result.
![Page 19: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/19.jpg)
ACTIVITY DIAGRAM
Sample Web Pages
Record Extraction
Records
Data Alignments
Alignment Groups
Annotator 1 Annotator 2 Annotator K
Combining Annotation
Annotated Groups
Generating Annotation Groups
Annotation Wrapper
Integrated Search Interface
Web Pages
![Page 20: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/20.jpg)
APPLICATIONS
Web data collection.
Internet comparison shopping.
![Page 21: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/21.jpg)
SOFTWARE REQUIREMENTS
Operating system - Windows XP, 7
Coding language - Java
Development kit - JDK 1.6 and above
Front end - Java Swing
![Page 22: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/22.jpg)
HARDWARE REQUIREMENTS
Processor - Pentium IV
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard disk - 20 GB
Motherboard - Intel 945 GLX
![Page 23: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/23.jpg)
![Page 24: Annotating Search Results from Web Databases](https://reader030.fdocuments.in/reader030/viewer/2022013011/55d0db34bb61eb866c8b456a/html5/thumbnails/24.jpg)
THANK YOU !!!!