Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result...

18
Deep Web Navigation by Example Deep Web Navigation by Example Yang Wang, Thomas Hornung

Transcript of Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result...

Page 1: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Deep Web Navigation by ExampleDeep Web Navigation by Example

Yang Wang, Thomas Hornung

Page 2: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Motivation

• Transformation of Web forms into machine‐accessible Query Interfaces

• Problems solved by this work:F A l i (P 1)• Form Analysis (Part 1)

• Dynamic dependencies of input elements

• Deep Web Navigation to result page (Part 2)

Page 3: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

1 Form Analysis1 Form Analysis

Page 4: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Form Field

label1

label2

value1

value2Web‐Form

Form Analysis

label3 value3

Search!

Web Form

Client‐side Query Interface

Web‐Page

Page 5: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

d idynamicdependencies

dynamicd d idependencies

Page 6: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Dynamic or Static?

Idea: Simulate HTML‐events

1. Save initial status of marked dropdown menus (temporally)a. Length of options

b. Option‐text

2 Change selected option of one marked dropdown menu2. Change selected option of one marked dropdown menu

3. Check the options of other marked dropdown menua. changed dynamic

b. unchanged static

4. Text‐Input‐Field static

Page 7: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

static

AttributeRadius Price‐From Price‐To .... Attribute 

name

5 km 500 € 1000 € 500 €10 km Options........ .... 1000 €

Car‐Brand

dynamic

Car Brand

Audi BMW VW ........ .... Car‐Brand

A1 320 325 PoloA2 Car‐Model.... .... Golf ....

Page 8: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

2 Deep Web Navigation2 Deep Web Navigation

Page 9: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Movie‐NameRequest‐Form

Movie‐Name = "Shrek"Movie‐Name = " The Game Plan"

Intermediate‐Page

Result‐Page

Page 10: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Navigation Process

Form Field

FoundKeyword? Goal Sitelabel1

label2

value1

value2

HTML page no

ExecuteActions

yes

Return URL

label3 value3

Search!

Page 11: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

similarsimilar

Page 12: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Web Page Templates

Movie name:   Shrek 2

DBDirectors:         Andrew Adamson

Vicky JensonRelease Date:  5 July 2001

Fix part Dynamic part

......

• Fix part (scaffolding) + dynamic part (data)

D t i dd d d i ll f b k d DB• Data is added dynamically from backend DB

We can identify constant HTML‐Element (keyword), e.g. text (in DIV, H2, ...)

Page 13: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

characteristic

Page 14: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

To Result Page

execute actions

click link

Page 15: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Actions

• Keyword element• HTML‐element that contains keyword‐textHTML element that contains keyword text• identify by click on the keyword‐text

• Action elementHTML l t th t l t t f d ti• HTML‐element that relevant to performed action

• monitored by system

• How can we address action elementKApath• automatically generated• path from keyword element to action element• path from keyword element to action element• similar to Xpath• addtional: absolute path, tree structure

Page 16: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Addressing Action Elements

TBODY

KApath

TR TR

KApath

/P::P/TBODY[@a1=v1][@a2=v2]/Child::Child/INPUT[@a3=v3]

TD TD TD TD TD

H2

optional

/ParentNode/ParentNode/ParentNode/Child[1]/Child[1]/Child[0]

Absolute Path

H2

#Text INPUT

TBODY(TR,TR)/TR(TD,TD)/TD(INPUT)

Tree StructureKeyword Element

Keyword Action Element

( , )/ ( , )/ ( )

Page 17: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Evaluation

• 100 tested websites• Check of dynamic dependencies 99% (from 0.5 to 30 seconds) 

• Deep Web Navigation  96% (from 2.26 to 11.22 seconds)

• Open Issues• Open Issues• Delayed AJAX interactions

• Session IDs• ...

Page 18: Deep Web Navigation by Example - pdfs.semanticscholar.org€¦ · •Deep Web Navigation to result page (Part 2) 1 Form Analysis. Form Field label 1 label 2 value 1 value Web‐Form

Summary

• New Navigation Paradigm• Page‐oriented• Short navigation expression

• Future Work• Future Work• More resilient determination of intermediate pages