Crowdsourcing Citizen Science Data Quality with a Human-Computer Learning Network

Wiggins, Gerbracht, Lagoze, Yu, Wong, & Kelling. 7 December 2012, Lake Tahoe, NV. Workshop on Human Computation for Science and Computational Sustainability.

description

Presentation at the Human Computation for Science and Computational Sustainability workshop at NIPS 2012 in Lake Tahoe, NV.

Transcript of Crowdsourcing Citizen Science Data Quality with a Human-Computer Learning Network

Page 1: Crowdsourcing Citizen Science Data Quality with a Human-Computer Learning Network

Crowdsourcing Citizen Science Data Quality with a Human-Computer Learning Network

Wiggins, Gerbracht, Lagoze, Yu, Wong, & Kelling

7 December 2012 ~ Lake Tahoe, NV

Workshop on Human Computation for Science and Computational Sustainability

Page 2

Crowdsourcing Scientific Work

Page 3

eBird

• Online checklist program for bird abundance & distribution

• Data (mostly) from recreational birders; used widely

• Over 100 million records & growing

[Figure: eBird observations per month]

Page 4

Data Quality

Dogbird Catbird

Page 5

Data Quality

Dogbird Catbird

X

Page 6

The eBird HCLN

Kelling, S., C. Lagoze, W.-K. Wong, J. Yu, T. Damoulas, J. Gerbracht, D. Fink, and C. Gomes. 2012. eBird: A Human/Computer Learning Network to Improve Biodiversity Conservation and Research. Artificial Intelligence.

Page 7

Emergent Filters

Kelling, S., J. Yu, J. Gerbracht, and W. K. Wong. 2011. Emergent Filters: Automated Data Verification in a Large-scale Citizen Science Project. Proceedings of the IEEE eScience Conference.
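
The slide gives only the title and the citation, so the following is a minimal sketch of one plausible reading of a data-driven filter, not the method from the cited paper. It assumes that a filter is derived from how often a species is reported on checklists for a given region and week, and that a record is sent to review when that frequency falls below a threshold. The function names, the `(region, week, species)` keys, the region label `county_A`, and the 5% threshold are all hypothetical.

```python
from collections import defaultdict


def build_frequency_table(checklists):
    """Return the fraction of checklists reporting each species per (region, week).

    checklists: iterable of (region, week, set_of_species) tuples (assumed layout).
    """
    totals = defaultdict(int)    # number of checklists per (region, week)
    reports = defaultdict(int)   # number of checklists reporting a species
    for region, week, species_seen in checklists:
        totals[(region, week)] += 1
        for species in species_seen:
            reports[(region, week, species)] += 1
    return {key: count / totals[key[:2]] for key, count in reports.items()}


def needs_review(freq, region, week, species, threshold=0.05):
    """Flag a record when the species is rarely (or never) reported there and then."""
    return freq.get((region, week, species), 0.0) < threshold


# Hypothetical data: four checklists in week 20, two in week 2.
checklists = [
    ("county_A", 20, {"Spizella passerina", "Cyanocitta cristata"}),
    ("county_A", 20, {"Spizella passerina"}),
    ("county_A", 20, {"Spizella passerina"}),
    ("county_A", 20, {"Cyanocitta cristata"}),
    ("county_A", 2, {"Cyanocitta cristata"}),
    ("county_A", 2, {"Cyanocitta cristata"}),
]
freq = build_frequency_table(checklists)
print(needs_review(freq, "county_A", 20, "Spizella passerina"))  # reported on 3 of 4 checklists
print(needs_review(freq, "county_A", 2, "Spizella passerina"))   # never reported in week 2
```

A table of this kind is only as good as the number of checklists behind each cell, which is why a region with few submissions may not support a filter at all.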

Page 8

Modeling Expertise

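[Figure: plate diagram of the occupancy-detection model]

• Environmental covariates: $X_i$
• Occupancy (latent): $Z_i$, with parameter $o_i$
• Detection covariates: $W_{it}$
• Detection: $Y_{it}$, with parameter $d_{it}$
• Plates: $i = 1, \dots, N$ and $t = 1, \dots, T_i$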
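
The quantities named on this slide (environmental covariates X_i, latent occupancy Z_i, detection covariates W_it, detections Y_it, and the parameters o_i and d_it) can be read as a generative process. The sketch below is an illustration under stated assumptions, not the authors' implementation: the logistic link, the weight vectors `alpha` and `beta`, and all function names are hypothetical, and a species is assumed to be detectable only where Z_i = 1.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simulate_site(x_i, w_i, alpha, beta, rng):
    """Draw latent occupancy Z_i, then one detection Y_it per row of w_i."""
    o_i = sigmoid(dot(alpha, x_i))          # occupancy probability (assumed logistic)
    z_i = 1 if rng.random() < o_i else 0    # latent occupancy
    y_i = []
    for w_it in w_i:
        d_it = sigmoid(dot(beta, w_it))     # detection probability (assumed logistic)
        # A detection can only happen when Z_i = 1.
        y_i.append(1 if (z_i == 1 and rng.random() < d_it) else 0)
    return z_i, y_i


def site_likelihood(y_i, x_i, w_i, alpha, beta):
    """P(Y_i1, ..., Y_iT | X_i, W_i) with the latent Z_i summed out."""
    o_i = sigmoid(dot(alpha, x_i))
    if_occupied = 1.0
    for y_it, w_it in zip(y_i, w_i):
        d_it = sigmoid(dot(beta, w_it))
        if_occupied *= d_it if y_it == 1 else (1.0 - d_it)
    # When Z_i = 0, only the all-zero detection history is possible.
    if_unoccupied = 0.0 if any(y_i) else 1.0
    return o_i * if_occupied + (1.0 - o_i) * if_unoccupied


rng = random.Random(0)
x_i, w_i = [1.0], [[1.0], [1.0]]   # one covariate, two observations
print(simulate_site(x_i, w_i, alpha=[0.0], beta=[0.0], rng=rng))
print(site_likelihood([0, 0], x_i, w_i, alpha=[0.0], beta=[0.0]))
```

Summing Z_i out is what makes an all-zero detection history ambiguous: it can come from an unoccupied location or from an occupied one where the species was missed every time.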

Page 9

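[Figure: plate diagram of the Occupancy-Detection-Expertise model]

• Environmental covariates: $X_i$
• Occupancy (latent): $Z_i$, with parameter $o_i$
• Detection covariates: $W_{it}$
• Detection: $Y_{it}$, with parameters $d_{it}$, $f_{it}$
• Expertise covariates: $U_j$
• Expertise: $E_j$, with parameter $v_j$
• Plates: $i = 1, \dots, N$; $t = 1, \dots, T_i$; $j = 1, \dots, M$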
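
The expertise extension adds an expertise variable E_j with covariates U_j and parameter v_j, and a second observation parameter f_it alongside d_it. The sketch below is a hedged illustration, not the authors' code. It assumes that d_it is the probability of reporting the species when it is present, that f_it is the probability of reporting it when it is absent, that both depend on whether the observer is an expert, and that all links are logistic. The `params` layout and every function name are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def observation_prob(z_i, e_j, w_it, params):
    """P(Y_it = 1 | Z_i, E_j, W_it) under the assumptions stated above."""
    level = "expert" if e_j == 1 else "novice"
    if z_i == 1:
        return sigmoid(dot(params["detect"][level], w_it))        # d_it
    return sigmoid(dot(params["false_detect"][level], w_it))      # f_it


def simulate_observation(x_i, u_j, w_it, alpha, gamma, params, rng):
    """Draw occupancy Z_i, expertise E_j, and a single report Y_it."""
    o_i = sigmoid(dot(alpha, x_i))   # occupancy probability
    v_j = sigmoid(dot(gamma, u_j))   # probability that observer j is an expert
    z_i = 1 if rng.random() < o_i else 0
    e_j = 1 if rng.random() < v_j else 0
    y_it = 1 if rng.random() < observation_prob(z_i, e_j, w_it, params) else 0
    return z_i, e_j, y_it


# Hypothetical weights: experts detect more and report fewer false positives.
params = {
    "detect": {"expert": [2.0], "novice": [0.5]},
    "false_detect": {"expert": [-4.0], "novice": [-2.0]},
}
rng = random.Random(0)
print(simulate_observation([1.0], [1.0], [1.0], [0.0], [0.0], params, rng))
```

Keeping separate weights per expertise level lets the same detection covariates affect experts and novices differently, which is one way to represent a gap in detection probability between the two groups.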

Page 10

Average Detection Probabilities

Yu, J., W. K. Wong, and R. A. Hutchinson. 2010. Modeling Experts and Novices in Citizen Science Data for Species Distribution Modeling. IEEE 10th International Conference on Data Mining (ICDM).

[Figure: average detection probabilities by species, grouped as hard-to-detect birds and common birds; y-axis from -0.05 to 0.20. Species shown: Blue Jay, White-breasted Nuthatch, Northern Cardinal, Great Blue Heron, Brown Thrasher, Blue-headed Vireo, Northern Rough-winged Swallow, Wood Thrush]

Page 11

Emergent Filters + Expertise

Spizella passerina

Page 12

Emergent Filters + Expertise

Spizella passerina

[Figure: chart with y-axis from 0 to 50 and x-axis in monthly dates from 8-Jan to 8-Dec]

Page 13

Emergent Filters + Expertise

Spizella passerina

[Figure: two charts, each with y-axis from 0 to 50 and x-axis in monthly dates from 8-Jan to 8-Dec]

Page 14

Emergent Filters + Expertise

Spizella passerina

[Figure: three charts, each with y-axis from 0 to 50; two have an x-axis in monthly dates from 8-Jan to 8-Dec, one from 1-Jan to 1-Dec]

Page 15

Improving Spatial Coverage

Locations in NY with eBird submissions in 2009

Page 16

Improving Spatial Coverage

Areas with enough data for emergent filters

Page 17

Future Work

• Preliminary studies integrated into eBird for better data quality on multiple levels

• Resulting human-computer learning network will use eBird data in new ways

• Evaluation of motivation, learning, and skills related to expertise ranking & birding routes

Page 18

Thanks!

www.ebird.org

@AndreaWiggins

www.andreawiggins.com

Acknowledgements

• Leon Levy Foundation

• Wolf Creek Foundation

• National Science Foundation Grants OCI-0830944, CCF-0832782, ITR-0427914, DBI-1049363, DBI-0542868, DUE-0734857, IIS-0748626, IIS-0844546, IIS-0612031, IIS-1050422, IIS-0905385, IIS-0746500, IIS-1209589, AGS-0835821, CNS-0751152, CNS-0855167.