ALFRED demo

Post on 21-May-2015

ALFRED: Crowd Assisted Data Extraction

Valter Crescenzi, Paolo Merialdo, Disheng Qiu

Dipartimento di Ingegneria, Università degli Studi Roma Tre, Via della Vasca Navale, 79, Rome

disheng@dia.uniroma3.it

Extracting data

We have 2M pages from IMDB, and we want to extract titles, directors, etc.

[Diagram labels: Inference algorithm, Wrapper, DB]

1/7

Scaling Wrapper Inference

Scaling the number of workers with crowdsourcing platforms opens new challenges.

Issues, each with its contributions:

Non-expert workers

• Simple interactions to reduce the worker error rate
• Membership Query (yes/no answer)

Costs

• Active Learning to carefully select queries

Quality

• Bayesian Model to evaluate the expected wrapper quality
• Sampling algorithms
• Tolerant to inaccurate workers

2/7
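To make the combination of yes/no membership queries and a Bayesian model tolerant to inaccurate workers more concrete, here is a minimal sketch. It is not ALFRED's actual model: the function name `update_posterior`, the single fixed `error_rate`, and the dictionary layout are all assumptions made for illustration. Each candidate rule holds a probability of being the correct one, and a worker's answer shifts probability toward the rules that are consistent with it.

```python
def update_posterior(prior, extracts_value, answer, error_rate):
    """Bayes update of rule probabilities after one yes/no answer.

    prior          -- dict: rule name -> probability that the rule is correct
    extracts_value -- dict: rule name -> True if the rule extracts the queried value
    answer         -- the worker's reply (True for "yes", False for "no")
    error_rate     -- assumed probability that a worker answers wrongly
    """
    unnormalized = {}
    for rule, probability in prior.items():
        # If this rule were correct, the truthful answer would be
        # "yes" exactly when the rule extracts the queried value.
        truthful_answer = extracts_value[rule]
        if answer == truthful_answer:
            likelihood = 1.0 - error_rate
        else:
            likelihood = error_rate
        unnormalized[rule] = probability * likelihood
    total = sum(unnormalized.values())
    return {rule: weight / total for rule, weight in unnormalized.items()}


# Hypothetical example: three candidate rules, uniform prior.
prior = {"r1": 1 / 3, "r2": 1 / 3, "r3": 1 / 3}
# Two of the rules extract the value shown to the worker, one does not.
extracts_value = {"r1": True, "r2": False, "r3": True}
posterior = update_posterior(prior, extracts_value, answer=True, error_rate=0.1)
```

Because the error rate is strictly between 0 and 1, a single wrong answer lowers a rule's probability without eliminating it, which is one simple way to stay tolerant to inaccurate workers.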

Architecture

ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.

*Research Track: A Framework for Learning Web Wrappers from the Crowd, WWW 2013

3/7

Input and Rules Generation

4/7

Sample Set and Extracted Values

5/7

Sample Set and Extracted Values

      page0       page1         page2
r1    Inception   City of God   Oblivion
r2    Inception   City of God   null
r3    Inception   null          Oblivion

6/7
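The extracted values above suggest why some queries are more useful than others: all three rules agree on page0, so asking about it cannot separate them, while page1 and page2 split the rules. The sketch below picks the next yes/no query under that idea. The selection heuristic (prefer the value whose supporting rules hold belief closest to 0.5) and the names `candidate_queries` and `pick_query` are assumptions for illustration, not the selection strategy used by ALFRED.

```python
from collections import defaultdict

# Values extracted by each rule on page0, page1, page2 (None stands for null).
EXTRACTED = {
    "r1": ["Inception", "City of God", "Oblivion"],
    "r2": ["Inception", "City of God", None],
    "r3": ["Inception", None, "Oblivion"],
}


def candidate_queries(extracted, belief):
    """Yield (page index, value, mass) for every non-null extracted value.

    mass is the total belief of the rules that extract that value on that page.
    """
    page_count = len(next(iter(extracted.values())))
    for page in range(page_count):
        mass = defaultdict(float)
        for rule, values in extracted.items():
            if values[page] is not None:
                mass[values[page]] += belief[rule]
        for value, total in mass.items():
            yield page, value, total


def pick_query(extracted, belief):
    """Return the candidate whose mass is closest to 0.5 (assumed heuristic)."""
    return min(candidate_queries(extracted, belief), key=lambda q: abs(q[2] - 0.5))


# Uniform belief over the three rules.
belief = {rule: 1 / len(EXTRACTED) for rule in EXTRACTED}
page, value, mass = pick_query(EXTRACTED, belief)
```

With a uniform belief the page0 query is never chosen, because every rule extracts the same value there and the answer would leave the ranking of the rules unchanged.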

Probability and Noisy

7/7