Automatic Extraction of Information Behind Web Forms Based on Application Ontologies
Date posted: 20-Dec-2015
by
Sai Ho Yau
Brigham Young University
Introduction
An enormous amount of information is available on the Web, but it is difficult to extract automatically for several reasons:
Dynamically generated Web pages
Form interfaces
Relevant information can be obtained only after a Web form is filled out and submitted
Problems Dealing with Forms
No general Web form design
Required text fields
One form may lead to another
Resulting information embedded within forms
Returned error messages versus valid data
Elimination of possible duplicate data
The Framework
Tools
Languages and Internet browser used: JavaScript, Java, PHP3, MySQL; Microsoft Internet Explorer
Platform: Solaris Intel (Unix), with Sun Java 1.1.6.
Method: Construct the Query String
Query String:
http://www.automobilesearch.com/search.html?cat2=0&manufacturer=&searcharea=0&mincost=&maxcost=&currency=USD&minyear=&maxyear=&go=Search
Domain_Path: http://www.automobilesearch.com/
win2form_action: search.html
win2form_length: 1
win2Elem_length_0: 9
win2Elem_name_0: cat2
win2Elem_type_0: select-one
win2Elem_value_0: 0
win2Elem_option_length: 16
win2Elem_option_Text_0: All Types
win2Elem_option_0: 0
win2Elem_option_Text_1: Accessories
win2Elem_option_1: 4940
win2Elem_option_Text_2: Classic Cars
win2Elem_option_2: 4981
::
win2Elem_name_1: manufacturer
win2Elem_type_1: select-one
win2Elem_value_1:
win2Elem_option_length: 43
win2Elem_option_Text_0: Any Manufacturer
win2Elem_option_0:
win2Elem_option_Text_1: AM General
win2Elem_option_1: AM General
::
win2Elem_name_6: minyear
win2Elem_type_6: text
win2Elem_value_6:
win2Elem_name_7: maxyear
win2Elem_type_7: text
win2Elem_value_7:
win2Elem_name_8: go
win2Elem_type_8: submit
win2Elem_value_8: Search
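The following is a minimal sketch, in Python, of how a query string like the one on this slide could be assembled from a form description. The dictionary layout, the function name build_query_string, and the overrides parameter are assumptions made for illustration; they are not the tool's actual data structures. Only the five form elements listed on the slide are included, so the resulting URL is shorter than the full query string shown above.

```python
from urllib.parse import urlencode

# Hypothetical in-memory form description. The field values mirror the
# slide; the dictionary layout itself is an assumption for illustration.
form = {
    "domain_path": "http://www.automobilesearch.com/",
    "action": "search.html",
    "elements": [
        {"name": "cat2", "type": "select-one", "value": "0"},
        {"name": "manufacturer", "type": "select-one", "value": ""},
        {"name": "minyear", "type": "text", "value": ""},
        {"name": "maxyear", "type": "text", "value": ""},
        {"name": "go", "type": "submit", "value": "Search"},
    ],
}


def build_query_string(form, overrides=None):
    """Join domain path, form action, and name=value pairs into a GET URL.

    Each element contributes its default value unless `overrides`
    supplies a different value for that element name.
    """
    overrides = overrides or {}
    pairs = [
        (elem["name"], overrides.get(elem["name"], elem["value"]))
        for elem in form["elements"]
        if elem["name"]  # unnamed elements are not submitted
    ]
    return form["domain_path"] + form["action"] + "?" + urlencode(pairs)


default_url = build_query_string(form)
filtered_url = build_query_string(form, {"minyear": "1990"})
```

Submitting the form with its default values corresponds to default_url; filling in a text field, as in filtered_url, changes only that field's pair in the query string.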
The Goal
Deal with as many Web forms as possible.
Retrieve all relevant information.
Automate the extraction process.
Returned Web Page
Suggested Solution
Conclusions
We can automatically:
Fill in Web forms.
Extract information behind forms.
Screen out error messages and inapplicable Web pages.
Eliminate duplicate data.
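The slides do not say how duplicate data is detected. One simple possibility, sketched here in Python purely as an assumption, is to normalize each returned record and keep only the first record for each normalized form. The function names normalize and remove_duplicates and the sample records are hypothetical.

```python
import hashlib


def normalize(record):
    """Lowercase the record and collapse runs of whitespace."""
    return " ".join(record.lower().split())


def remove_duplicates(records):
    """Return records in original order, dropping later duplicates.

    Two records count as duplicates when their normalized text is equal.
    This rule is an assumption, not the method used in the slides.
    """
    seen = set()
    unique = []
    for record in records:
        key = hashlib.sha1(normalize(record).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, as might come back from overlapping form queries.
results = remove_duplicates(
    ["1995 Ford  Mustang", "1995 ford mustang", "1998 Honda Civic"]
)
```

Hashing the normalized text keeps the set of seen keys small even when individual records are long, at the cost of treating only exact matches after normalization as duplicates.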
Thank You