Melding human and machine capabilities to document the world’s living organisms

Melding human and machine capabilities to document the world’s living organisms University of Maryland TMSP series March 7, 2011


Slide 1

Melding human and machine capabilities to document the world's living organisms

University of Maryland TMSP series
March 7, 2011

Project Team
Arijit Biswas (CS, Doctoral student)
Anne Bowser (iSchool, Masters student)
Jen Hammock (EOL)
Derek Hansen (iSchool)
David Jacobs (CS, UMIACS)
Darcy Lewis (iSchool, doctoral student)
Cyndy Parr (EOL)
Jenny Preece (iSchool)
Dana Rotman (iSchool, Doctoral student)
Erin Stewart (iSchool, Masters student)
Eric (CS, Undergrad student)

What we will talk about
Research aims
Encyclopedia of Life (EOL)
Scientists, citizen scientists, enthusiasts
Identifying leaves:
Machine vision approach
Odd Leaf Out
Field Mission Games
Questions and Discussion

BioTracker system architecture

First research question
What are the most effective strategies for motivating enthusiasts and experts to voluntarily contribute and collaborate?


The biodiversity crisis

Global collapse of commercial fisheries by 2053

The biodiversity crisis
Christmas bird count
Bioblitzes
Amphibian declines


A crisis in science

Photo credit: Cornell Univ.
Photo credit: Mary Keim

NA Butterfly Association Fourth of July Count
Audubon Christmas Bird Count
Citizen science

The United Nations has declared 2010 the International Year of Biodiversity in recognition of the importance of biological diversity and the looming biodiversity crisis. Biological diversity provides ecosystem services critical to our planet. As much as 90% of the needs of the world's poorest people depend directly on biodiversity for food, fuel, medicine, etc. [1]. Each species represents a volume in a living library, as each has evolved solutions to nature's challenges, solutions that can benefit human society. For example, the genomics revolution and half of our synthetic drugs were made possible by understanding the characteristics of particular species [2]. Yet the rate of species loss is currently 100 to 1,000 times estimates of historical extinction rates, and these rates are increasing with climate change [2]. Recent assessments indicate that, for example, nearly 25% of mammals and one-third of amphibians are endangered or threatened [3].

Scientists alone cannot end the biodiversity crisis. Progress in the conservation and sustainable use of biodiversity will depend on the interface of science with both policy and the public. This is not only because the public must appreciate and understand biodiversity in order to be motivated to conserve it. There are nearly 2 million known species, and potentially millions more are still undocumented. Without help, professional biologists will be unable to describe many of these species before they disappear from the planet, especially those in biodiversity-rich but economically poorer countries [4].

Public participation can address the biodiversity crisis in several areas. One area is assembling existing knowledge on the 1.9 million species known to science. Doing so can accelerate the pace of research and new species description by making freely available, searchable, and re-usable the information that currently sits in libraries or in local databases inaccessible to most of the world's scientists. Addressing this need is the primary mission of the Encyclopedia of Life (EOL, http://www.eol.org), an international project headquartered at the Smithsonian's National Museum of Natural History. In addition to mash-ups of existing scientific databases, we are combining a crowd-sourcing approach with expert review to achieve a high-quality central clearinghouse for species information.

Powerful citizen science data
http://ebird.org

More species, less training

Geocaching
Bioblitzes

Most citizen science . . .
Is driven by scientists
Is analyzed by scientists
Works best for charismatic species
Works best for simple observations or classification
Requires training, so sustained engagement is desirable

A Bioblitz is a 24-hour inventory of species in a particular location.


Imagine an electronic page for each species of organism on Earth.

The Encyclopedia of Life

You may know that the Encyclopedia of Life has the goal of delivering a web page for each of the 1.8 million species on Earth. This particular effort was articulated poetically by E. O. Wilson in a 2003 article in Trends in Ecology and Evolution, but of course this is not a new idea. Many people have suggested an encyclopedia of life, e.g.:
TDWG: Global Plant Species Information System (1990)
Dan Janzen: Species Pages (early 1990s)
OECD Megascience Forum: SpeciesBank (1999)
ALL Species: Encyclopedia of Life (2001)
Smithsonian/Telluride: Encyclopedia of Life (2002)
Rainer Froese: SpeciesBase (2005)
Tree of Life

Content providers
Databases
Journals
LifeDesks
Public contributions

Curating

Commenting
Tagging

http://www.eol.org
EOL is a content curation community

So, the approach of EOL is rather different from that of many other sites. EOL is a giant mashup that creates pages, which are then available for curators (mostly credentialed scientists) to assess and rate, or for anybody to comment on or tag.

EOL statistics
100+ partner databases
700 curators / 1000s of contributors / 46,000 members
2.8 million pages
500 thousand pages with Creative Commons content
Over 2 million data objects and >1 million pages with links to research literature
Traffic in past year: 1.7 million unique users, 6.2 million page views

Scientists and volunteers
"Scientists often have an aversion to what nonscientists say about science" (Salk, 1986)

Collaboration is based on several factors:
Shared vocabulary, practices, and meanings
Mutual recognition of knowledge, competency, and prestige
Motivation to collaborate

Motivations for participation
Participation in social activities stems from personal and collective reasons

Egoism
Collectivism
Altruism
Principlism
(Batson, Ahmad, Tsang, 2002)

Pilot study: scientists' motivational factors

Faculty/research position

Pilot study: volunteers' motivational factors

Years of experience

Second research question

How can a socially intelligent system be used to direct human effort and expertise to the most valuable collection and classification tasks?

Mobile devices for plant species ID
Build new digital collections
Image-based search to assist in identification
Make this available on mobile devices
Use this platform to build user communities
Collaboration with dozens of people at Columbia University, the Smithsonian NMNH, and UMD.

New images

For Botanists: digitize 90,000+ Type Specimens at Smithsonian

For EOL, people using mobile devices: highest quality images of live specimens.
And for machines: images that capture leaf diversity.

Computer Vision for species ID

Use a photo to search a data set of known species. The goal is to assist the user, not to make identification fully automatic.

1. Take a photo of a leaf on a plain background.

2. Automatic segmentation and stem removal

Segmentation relies on value and saturation of pixels, EM algorithm, domain knowledge.
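
As a rough illustration of the segmentation step, the sketch below fits a two-component Gaussian mixture to per-pixel saturation and value with EM and labels the more saturated component as leaf. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the diagonal covariances, the initialisation, and the rule "plain background = less saturated" are all assumptions made for the example, and stem removal is not covered.

```python
import numpy as np

def rgb_to_sv(rgb):
    """HSV saturation and value for an (H, W, 3) float RGB image in [0, 1]."""
    mx = rgb.max(axis=2)
    mn = rgb.min(axis=2)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    return sat, mx

def em_two_gaussians(x, iters=50):
    """EM for a two-component, diagonal-covariance Gaussian mixture on rows of x."""
    n = x.shape[0]
    # Assumed initialisation: split pixels into low- and high-saturation halves.
    order = np.argsort(x[:, 0])
    mu = np.stack([x[order[: n // 2]].mean(axis=0),
                   x[order[n // 2:]].mean(axis=0)])
    var = np.tile(x.var(axis=0) + 1e-6, (2, 1))
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel.
        diff = x[:, None, :] - mu
        log_p = -0.5 * (diff ** 2 / var + np.log(2 * np.pi * var)).sum(axis=2)
        log_p += np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0) + 1e-12
        mu = (r.T @ x) / nk[:, None]
        var = np.maximum((r.T @ x ** 2) / nk[:, None] - mu ** 2, 0.0) + 1e-6
        pi = nk / n
    return mu, r

def segment_leaf(rgb):
    """Boolean leaf mask for a photo of a leaf on a plain background."""
    sat, val = rgb_to_sv(rgb)
    x = np.stack([sat.ravel(), val.ravel()], axis=1)
    mu, r = em_two_gaussians(x)
    labels = r.argmax(axis=1).reshape(sat.shape)
    # Domain-knowledge assumption: the plain background is the less saturated cluster.
    leaf_component = int(np.argmax(mu[:, 0]))
    return labels == leaf_component
```

The mask is only as good as the assumption that the background is plain and weakly coloured, which is why step 1 asks for a photo on a plain background.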

Ipomoea lacunosa
Must handle diversity of shapes

Humulus japonicus

3. Build shape descriptors
Inner Distance Shape Context
Multiscale histograms of curvature
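
To make "multiscale histograms of curvature" concrete, here is a simplified sketch. The slide does not say how curvature is measured, so the example assumes a closed leaf contour given as ordered (x, y) points and uses the signed turning angle between chords of different lengths as a stand-in for curvature. The function name, scales, and bin count are illustrative choices, and Inner Distance Shape Context is not implemented here.

```python
import numpy as np

def multiscale_curvature_histograms(contour, scales=(2, 4, 8), bins=12):
    """Concatenate one normalised turning-angle histogram per scale.

    contour: (N, 2) array of ordered points on a closed outline.
    scales:  chord lengths in contour samples (assumed values).
    """
    pts = np.asarray(contour, dtype=float)
    feats = []
    for s in scales:
        back = pts - np.roll(pts, s, axis=0)    # p[i] - p[i - s]
        fwd = np.roll(pts, -s, axis=0) - pts    # p[i + s] - p[i]
        cross = back[:, 0] * fwd[:, 1] - back[:, 1] * fwd[:, 0]
        dot = (back * fwd).sum(axis=1)
        angle = np.arctan2(cross, dot)          # signed turning angle at p[i]
        hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)
```

Small scales respond to fine detail such as serrated edges, while large scales respond to the overall outline such as lobes, which is the reason for computing the histogram at several scales.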

4. Search data set
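
The search step can be pictured as a nearest-neighbour lookup that returns a shortlist for the user to choose from, in keeping with the goal of assisting rather than automating identification. The sketch below is an assumption-laden illustration: the chi-square histogram distance, the function names, and the shortlist size are choices made for the example, not details given in the slides.

```python
import numpy as np

def chi_square(a, b, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * float(np.sum((a - b) ** 2 / (a + b + eps)))

def rank_species(query, database, k=5):
    """Return up to k (species, distance) pairs, closest first.

    database: iterable of (species_name, descriptor) pairs; a species may
    appear several times, in which case its best-matching image counts.
    """
    best = {}
    for name, desc in database:
        d = chi_square(query, desc)
        if name not in best or d < best[name]:
            best[name] = d
    return sorted(best.items(), key=lambda item: item[1])[:k]
```

Returning the top k species instead of a single answer leaves the final decision to the person holding the leaf.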

System accuracy

Incorporating games into the Biotracker platform
Using games to direct human effort and computational resources towards species identification and classification

Data Validation Games

Field Data Collection Games

Odd Leaf Out
Using computer games for data validation and algorithm refinement

Research Questions

How can we motivate users to continue to play when we are dealing with imperfect data that will sometimes provide two correct answers?

What useful data for algorithm refinement can a game of tagging the least-similar image provide?

How can data provided by novice users be employed to enhance the work of experts?

Implications

Improving machine vision algorithms based on human performance.

Minimizing the number of data sets that must be hand-verified by scientists and experts

Providing insight on what image factors provide for the best human identification of leaves

Providing information on the extent to which other game motivation techniques work for scientific identification games


Biotracker field missions

Inspirations

Geocaching
Letterboxing
BioBlitz
SFZero
Project Noah
Smart Phone as Data Collection Tool

Biotracker Missions
Developing mobile-social games that motivate citizens to collect and validate useful scientific data

Biotracker field missions
Low fidelity prototypes
Field testing at UMD
Next steps: prototyping and user testing

www.biotrackers.net
Questions and Discussion