Vincent S. Smith
Writing the Encyclopedia of Life
Background 1: The big picture of biodiversity research
Goal…
• Inventory the Earth’s species
• Document their relationships
• Publish & apply these data
Data set…
• 1.8M described species (10M names)
• 300M pages (over last 250 years)
• 1.5-3B specimens
People…
• 4-6,000 scientists
• 30-40,000 amateurs
• Many more citizen scientists?
Background 2: The process of biodiversity research
Parochial…
• Specialised
• Experts
• Fragmented & distributed
Methodological…
• Communities of practice
• Hard to record & update
• High output but low impact
Different…
• Data
• Interpretations
• Methods
How do we integrate the BIG with the small?
250 yr progress report
• Up to 87% of life on Earth is still undescribed
• 6% of biodiversity scientists cover 80% of the world’s biodiversity
• At present rates most species will be extinct long before we describe them
[Timeline figure: 1758 to 2008 = 250 yrs; 2008 to 3008 = 1000 yrs!!! ?]
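The timescale on this slide can be checked with a rough calculation. The sketch below assumes a constant description rate equal to the 250-year average, which the slides do not state; the function name and the 80% scenario are illustrative only.

```python
def years_to_finish(described, years_elapsed, undescribed_fraction):
    """Years needed to describe the remaining species at the historical average rate."""
    rate_per_year = described / years_elapsed          # species described per year so far
    total = described / (1.0 - undescribed_fraction)   # implied total number of species
    remaining = total - described
    return remaining / rate_per_year


DESCRIBED = 1_800_000     # described species, from the slides
ELAPSED = 2008 - 1758     # years since 1758

# 0.87 is the slide's upper bound; 0.80 is an assumed lower scenario.
for fraction in (0.80, 0.87):
    years = years_to_finish(DESCRIBED, ELAPSED, fraction)
    print(f"{fraction:.0%} undescribed: about {years:,.0f} more years")
```

Under these assumptions an 80% undescribed fraction gives roughly 1,000 more years, and the 87% upper bound gives considerably longer.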
Bacteria: 9,021 spp.
Archaebacteria: 259 spp.
Plants: 260k spp.
Animals: 1.18M spp.
Other: 193k spp.
Fungi: 101k spp.
250 years and counting!
The story so far…
1.8 million species
Taxonomic effort
Crustaceans: 39k
Birds: 10k
Reptiles: 7.1k
Mammals: 5k
Amphibians: 5k
Sponges: 10k
Cnidarians: 9k
Rotifers: 1.8k
Flatworms: 13.7k
Insects: 0.82M spp.
Molluscs: 117k
Fish: 25k
Beetles: 370k spp.
Flies: 85k spp.
Butterflies & moths: 165k spp.
Bees, wasps & ants: 198k spp.
0.01 papers per species per year, i.e. 1 paper every 100 years
Birds: 1 paper per species per yr.
Mammals: 2 papers per species per yr.
Elephants: 47 papers per species per yr.
1,000s of journals addressing a common set of questions
What is a species?
How many species are there?
Where are species distributed?
How have species distributions changed?
How are species related?
How have species characters changed?
To what extent are species relationships predictive?
DATA
“Paper minds”: Traditional publication
Mol. Phyl. Evol.: 21,964 pp. since 2000
[Figure: tree of louse taxa; tip labels run from Menopon gallinae to Liposcelis bostrychophilus]
“Species Name”: The universal linker
RAW DATA > Logically interconnected but presently fragmented by the publication process
Other problems…
• Time & money
• Audience mismatch
• Findability & reusability
Looking within a paper: Data mining publications
1. Scan
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs
Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
4. Index
5. Annotate online
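The "find keywords" step for taxonomic names could be sketched roughly as below. This is a minimal heuristic, not the method used in the talk: the regular expression, the tiny genus dictionary and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary of genus names; a real index would be far larger.
KNOWN_GENERA = {"Menopon", "Ricinus", "Naubates"}

# Candidate binomial: capitalised word followed by a lowercase word of 3+ letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def find_taxon_names(text, genera=KNOWN_GENERA):
    """Return sorted, unique candidate binomials whose genus is in the dictionary."""
    found = set()
    for genus, epithet in BINOMIAL.findall(text):
        if genus in genera:  # drop false positives such as "The host"
            found.add(f"{genus} {epithet}")
    return sorted(found)


ocr_text = "Menopon gallinae and Ricinus fringillae were found on The host."
print(find_taxon_names(ocr_text))
```

The dictionary filter matters because the pattern alone also matches ordinary capitalised phrases; OCR errors in the names themselves would need fuzzy matching, which this sketch does not attempt.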
How do we bring this all together?
“Publications” Specimens
• Technical issues
• Social issues
• Needs to scale (web)
• Needs to be sustainable
People
?
Technical issues 1: Data standards
• TDWG (since 1986)
• GBIF
• Bridging computer science & biology
• It’s not science!
“Standards” can mean many things:
• Data exchange standards (e.g. Darwin Core)
• Common restricted vocabularies (Sp.2000 classification)
• Programming standards
• Data quality
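As an illustration of a data exchange standard, the sketch below writes one occurrence record as CSV using Darwin Core term names for the columns. The field names are Darwin Core terms; the record values and the helper function are invented for illustration and do not come from the talk.

```python
import csv
import io

# Column names are Darwin Core terms; the values are invented example data.
record = {
    "occurrenceID": "urn:example:specimen:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Menopon gallinae",
    "eventDate": "2008-02-27",
    "country": "United Kingdom",
    "decimalLatitude": "51.4967",
    "decimalLongitude": "-0.1764",
}


def to_csv(records):
    """Serialise records to CSV text with a header row of Darwin Core terms."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv([record])
print(csv_text)
```

Because every provider uses the same column names, records from different databases can be merged without per-source mapping code.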
Technical issues 2: Platforms
• Generic databases with custom interfaces (MySQL, Oracle) (e.g. Species 2000, IPNI)
• Bespoke (usually commercial) databases (e.g. KE EMu, Biota)
• Content Management Systems & blogging platforms such as Drupal, Plone, WordPress etc. (e.g. EOL’s LifeDesks, GBIF websites)
• Wikis such as MediaWiki, Semantic MediaWiki (e.g. Wikipedia, iTaxon)
Technology moves fast!
Technical issues 2: Platforms - common design considerations
Need scalable and flexible platforms that support:
1) large numbers of users as passive readers and active contributors
2) editorial hierarchies serving individual and community needs
3) the epistemological richness and diversity of all contributors
4) flexible data models that can be modified or added by contributors
5) automated integration of third party content
6) automated semantic enrichment of contributed and 3rd party content
7) content workflows and curation tools
8) content archival and citation
9) content licensing and a conditions of use framework
10) web services
11) ease of use
Technical issues 3: Web services (integration hacks)
Module name, description and API:
• bhl: Searches the Biodiversity Heritage Library for printed pages held within their archives that have a reference to a specific taxon name. API: http://www.biodiversitylibrary.org/Tools.aspx
• flickr: Searches the Flickr image database for pictures that have taxon name metadata associated with them. API: http://www.flickr.com/services/api/
• gbifmap: Displays maps of the world that geolocate biological occurrence records from the GBIF database. API: http://ispecies.blogspot.com/2007/08/maps-and-google-tweak.html
• morphbank: Searches the morphbank image database for pictures that have taxon name metadata associated with them. API: http://services.morphbank.net/mb
• ncbi: Searches the NCBI database for nucleotide sequences, protein sequences and related links. API: http://www.programmableweb.com/api/ncbi-entrez
• wikipedia: Displays the initial section of a Wikipedia article for the taxon name, if the page exists. API: http://en.wikipedia.org/wiki/Special:Export/
• yahooimages: Similar to Flickr, but for Yahoo! Images. API: http://developer.yahoo.com/search/image/V1/imageSearch.html
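Each module in the list above takes a taxon name and asks a third-party service about it. A minimal sketch of that pattern, for the wikipedia module only, might look like this; the base URL is the one listed above, while the function names and the dispatch table are hypothetical and no request is actually sent.

```python
from urllib.parse import quote

# Base URL as listed for the wikipedia module; helper names are illustrative only.
WIKIPEDIA_EXPORT = "http://en.wikipedia.org/wiki/Special:Export/"


def wikipedia_export_url(taxon_name):
    """Build the export URL for the article named after a taxon."""
    title = taxon_name.strip().replace(" ", "_")  # article titles use underscores
    return WIKIPEDIA_EXPORT + quote(title)        # percent-encode anything unsafe


def module_links(taxon_name, builders):
    """Apply every registered URL builder to one taxon name."""
    return {module: build(taxon_name) for module, build in builders.items()}


links = module_links("Menopon gallinae", {"wikipedia": wikipedia_export_url})
print(links["wikipedia"])
```

Keeping each service behind a small builder function means a module can be added or dropped without touching the page code that consumes the results.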
Social issues 1: The community
• Taxonomy as a team sport (Community size and the community of one)
• Networking effects (quality, multi-disciplinarity and utility of data)
• The rise and rise of the “amateur”
• Cost of professionals
• Top down and bottom up organization (how to partition the community)
• Bottom up benefits, low transaction costs (social information flows, motivation and relations self organize the group)
• Support epistemological richness
• Collaborative output, peer review, credit (incentives)
Social issues 2: Nationalism / Politics
• Convention on Biological Diversity, 1992
• Biodiversity does not respect national boundaries
• Biodiversity questions do not respect national boundaries
• Funding is (usually) national / regional
• Benefits are expected to be national
• Often don’t match the questions we want to address
• Politics amongst researchers and institutions (e.g. EDIT and Lifewatch)
• Good politicians and not always good scientists
Social issues 3: Incentives
• Article citation (most common method of peer recognition)
• Influences author’s employment, reputation and research opportunities
• Traditional metrics of scholarly activity (no. papers, impact factor, H-Index)
• Taxonomy is not usually high impact, but has a long half life
• High cost of traditional publication (unaffordable to authors & libraries)
• Lessons from Zootaxa (low cost, high volume) and Wikipedia (highly linked)
Social issues 4: Licensing
• Mickey Mouse, copyright and 1923
• Copyright transfer agreements
• As of 2009 half of all taxonomic treatments are in copyright
Publications on ants
• Who owns your work (your employer?)
• Branding and credit
• Creative Commons
• Open Access
• Open Science (making science more accountable)
Social issues 5: Human Computer Interactions
Technical solutions & social models: Current options for writing the Encyclopedia of Life
1) “New” scholarly publishing (semantic enrichment of publications)
2) One database to rule them all - the Common Data Model (CDM)
3) EOL.org, ToL.org & related initiatives
4) Wikipedia / Wikispecies
5) Scratchpads / LifeDesks
Encyclopedia of Life (EOL): “A web page for every species”
http://www.eol.org/
• A web page for all 1.8M species
• Multi-institution collaboration
• $50m funding (5 years)
- MacArthur and Sloan Foundations
• Megascience mashup
- Aggregating data from the web
• Multiple audiences
- Science & outreach
• 10 years to complete
- First draft 2008, “finished” 2017!
• Huge interest
- 11.5 million hits in first 5 hours
- 500+ press articles
- Pages unavailable for first two days!
• First draft 27 Feb. 2008
- 24 “exemplar” pages
- 30,000 detailed pages (fish & amphib.)
- 1 million “stubs” (names & links)
- Growth (needs 1,000 spp. per day)
• Much praise but growing criticism
- Quality vs. quantity of information
- Authoritative “vetting” process
- Credit for “authors”
• Eight more years to go
- Get more content online
- Better tools to engage more people
What is a Scratchpad? A website for you & your community
1. Your data
2. Uploaded & tagged
3. Published & reviewed on your site
Fast, intuitive, fit for use
What can Scratchpads do? Import, manage, search & browse:
• Taxonomy
• DNA & Phylogenies
• Specimens
• Literature
• Images
What can Scratchpads do? Integration & connectivity within & between sites
+Administration
 -Change your site information
 -Change your front page
 -Change your logo
 -Activity and access logs
+Backup
 -Backing up your data
 -Restoring your data
+Bibliography
 -Creating a record
 -Importing from a ref. manager
 -Exporting to a reference manager
+Blog
 -Creating and adding a blog
+Custom Content
 -Defining a CCK
 -Importing from a spreadsheet
 -Creating a custom view
+Fileshare
 -Creating and using a fileshare
+Forum
 -Altering the forum settings
 -Creating a container for a forum
 -Creating a new forum
 -Creating a new topic inside a forum
+Groups
 -Creating a group
 -Subscribing to a group
+Image
 -Uploading & basic annotation
 -Linking image & location records
 -Linking image & specimen records
 -Linking image & publication records
 -Overlay annotations on images
+Layout
 -Change your theme
 -Menus
 -Blocks and sidebars
+Locations
 -Creating a record
 -Importing from a spreadsheet
+Pages
 -Creating, editing, cloning & deleting
 -Configuring the panels template
+Panels
 -Adding & configuring content
 -Creating a new panel
 -Citing a Panels page
+Phylogeny
 -Adding a phylogenetic tree
+Specimens
 -Creating a record
 -Importing from a spreadsheet
 -Linking specimen & location records
 -Linking specimen & pub. records
+Tasks
 -Creating a tasklist
+Taxonomy
 -Importing from a spreadsheet
 -Importing from ClassificationBank
 -Starting from scratch
 -Taxonomy manager
 -Displaying a classification
 -Adding names
 -Deleting names
 -Taxonomy & panels
+Users
 -Your settings
 -Adding a new user
 -User roles and permissions
 -Adding and editing user profile fields
 -Logging in
+Webform
 -Creating and using webforms
What can Scratchpads do? In summary:
What can Scratchpads do? Visual taskguide
Current Scratchpads
Ants, Bees, Beetles, Big-headed flies, Birds, Blackflies, Ciliates, Cockroaches, Dragon Trees, Dung Beetles, False Buttonweed, Flat worms, Flies, Foraminifera, Fossil Insects, Fungus Gnats, Holometabola, Leaf-miner Flies, Lice, Lichens of Bermuda, Malvaceae, Megalastrum ferns, Milichiid flies, Mosquitoes, Mosses, Nannotax fossils, Nepticuloid moths, Palms, Pearl oysters, Polychaete worms, Scaleworms, Termites, Triticid grasses, Weevils, Wood Ferns, Sulawesi Ferns, Stick insects
Sites: 130+
Users: 1500+
Pages: 170k
Since March 2007
Scratchpad applications: A multipurpose, flexible technology
eBooks
4th Edition Howard & Moore, Birds of the world (fact checking, data compilation, 2010, funding)
eJournals
European Mosquito Bulletin (ISSN 1460-6127), Phasmid Studies (ISSN 0966-0011) (submission, review, & dissemination of articles)
Image galleries
Nanno fossils, Cockroaches, Stick insects, Flatworms, Grasses, Lichens & many more… (rapid upload, annotation, & display of images)
Societies & Organizations
GBIF, Zootaxa, Threatened Plants of the World (Kew), BarCoVer (DNA Barcoding) & more (space for data collection, services, discussion, & organization)
ZOOTAXA: A rapid international journal for animal taxonomists. ISSN 1175-5326 (Print Edition) & ISSN 1175-5334 (Online Edition)
How do Scratchpads work? Getting a Scratchpad
Requirements
• Biological focus
• Agree to T&Cs (click-thru)
• CC license “by-nc-sa”
Application
• Maintainer
• Scope/Mission/API Keys
• (Sub)domain name
Content
• Unrestricted (overlapping)
• No branding (focus on authors)
• Value added
http://scratchpads.eu/apply
How do Scratchpads work? Using a Scratchpad
Management
• User categories (maintainer, editor, contributor)
• Public / private content (flexible groups)
• Admin. page (site settings & behavior)
Data Input
• Content types (biblio, maps, “page” etc.)
• Forms, managers, Excel, EndNote etc.
• Custom content (add or extend data types)
Tagging (indexing)
• Taxonomy terms (2M +)
• Multiple classifications
• Auto-tagging
Autotagging: Indexing data to make it findable
1. Create content (e.g. reference): journal citation mentions taxon name
2. Find terms (Autotag): matches taxonomy term (Drag & Drop)
3. Submit (Index): page tagged (indexed) with taxon name
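The three autotagging steps can be sketched as below. This is an assumed, simplified model rather than the Scratchpads implementation: the function name, the whole-word matching rule and the in-memory index are illustrative, and the citation is the one used earlier in the talk.

```python
import re


def autotag(text, taxonomy_terms):
    """Step 2: return the taxonomy terms mentioned in the text as whole words."""
    tags = []
    for term in taxonomy_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(term)
    return sorted(tags)


# Step 1: create content (here, a bibliographic reference).
citation = (
    "Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates "
    "(Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60."
)
terms = ["Naubates", "Philopteridae", "Phthiraptera", "Menopon"]

# Step 3: submit, i.e. record the content under every matching term.
index = {}
for tag in autotag(citation, terms):
    index.setdefault(tag, []).append("ref-1")  # "ref-1" is a made-up content id

print(index)
```

Once the index exists, everything tagged with a taxon name can be pulled together with a single lookup.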
How do Scratchpads work?
• Tagged data can be presented differently
• For example as part of a traditional bibliography
• Or as small windows or “panels” of data
Integrating data & “publishing” in a Scratchpad
Types of Scratchpad Panel… Built with “tagged data”
• Taxonomic hierarchies
• Files and documents
• Phylogenetic trees
• Customized content
• Specimen records
• Photographs & illustrations
• Personalized instructions
• Common names
• Bibliographic literature
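One way to picture how panels are built from tagged data is the sketch below: items carrying a taxon tag are grouped by content type, and each group becomes a panel on that taxon's page. The data structure, function name and sample items are assumptions for illustration, not the actual Scratchpads code.

```python
from collections import defaultdict

# Hypothetical tagged items: (content type, title, taxon tags). Titles are invented.
ITEMS = [
    ("Bibliographic literature", "Palma & Pilgrim 2002", {"Naubates"}),
    ("Photographs & illustrations", "Habitus photo", {"Naubates"}),
    ("Specimen records", "Specimen 0001", {"Menopon"}),
]


def build_species_page(taxon, items, panels=None):
    """Group items tagged with `taxon` into panels keyed by content type.

    `panels` optionally restricts the page to a chosen set of panel types.
    """
    page = defaultdict(list)
    for content_type, title, tags in items:
        if taxon in tags and (panels is None or content_type in panels):
            page[content_type].append(title)
    return dict(page)


page = build_species_page("Naubates", ITEMS)
for panel, titles in page.items():
    print(panel, "->", titles)
```

The optional `panels` argument mirrors the idea of choosing which panels to display on a page.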
Dynamically built species pages
Browsed through a taxonomy
Including 3rd party content
With data curation tools
Listing all “authors”
Dated, permanent & citable
Adjusting the panels layout
Choose which panels to display
An example based on the Catalogue of Life classification
2 million taxon pages
Open curation at http://catlife.myspecies.info
Questions?
Scratchpad management: Scalable & sustainable technology
Hardware, software & user support
Virtual machine, open-source software, self-archiving, backed-up, multi-site configuration (easy to move & upgrade, secure & reliable, citable, screencasts, low admin., low marginal costs)