Post on 03-Jan-2016
Every datum counts! Capitalising on small contributions to the big dreams of mobilising biodiversity information
Vishwas Chavan, Eamonn O’ Tuama, Samy Gaiji, David Remsen and Nicholas King
2008 Annual Conference of Taxonomic Databases Working Group 19-25 October 2008, Fremantle, AUSTRALIA
• Both biodiversity and biodiversity data are unevenly distributed around the world:
[Figure labels: Developing World, Biodiversity, Biodiversity Data, Developed World]
Digital Divide, Content Divide, Lingual Divide
Knowledge Divide
Emerging catastrophe…
Uneven distribution of biodiversity
A large volume of biodiversity data and information is in languages other than English
Biodiversity informatics activities are concentrated in the North
A few more reasons…
Investment in biodiversity information management is directed towards large projects
Research in biodiversity informatics is focused on large data publishers
Small Data Publishers – A neglected mass!
Biodiversity Knowledge Divide: Emerging Catastrophe
The Open Access movement can help mobilise data (a) from mega-biodiversity regions, and (b) from small data publishers
Good News!
Small Data Publishers: Who are they? (1)
• Can’t discover, access, and use their data
• Do not know how to manage data for reuse by others
• Lack of skills, infrastructure, and support for ‘interoperable’ data management
• More interested in peer-reviewed publishing than data publishing – as the former brings recognition and funding
Small Data Publishers: Who are they? (2)
• PIs of small-scale projects, small and medium-sized R&D organisations and NGOs, Citizen Scientists
• Citizen Scientists, e.g. People’s Biodiversity Register
• P. Bryan Heidorn’s Hypothesis: “Disproportionate amount of dark data is in the tail of science”
• Small Data Publishers form the “Long tail” as well as the “droplets” of the ‘Oceans of Biodiversity Data’
Small Are BIG!
• Long tail or Dark Data is economically and ecologically very critical
• Most existing and future data will be held by Small Data Publishers
• 80% of current investment is towards Small Data Publishers
– Total Awards: 9347
– Big Awards: 1869
– SMALL Awards: 7478
Source: Curating the Dark Data in the Long tail of science by P. Bryan Heidorn
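The 80% figure can be checked against the award counts quoted above. The short sketch below only divides the counts shown on the slide; it gives the share by number of awards and says nothing about the monetary value of each award, which the slide does not report. The variable names are illustrative.

```python
# Award counts as quoted on the slide (attributed there to P. Bryan Heidorn).
total_awards = 9347
big_awards = 1869
small_awards = 7478

# The two groups should add up to the quoted total.
assert big_awards + small_awards == total_awards

# Share of each group by number of awards (not by amount of funding).
small_share = small_awards / total_awards
big_share = big_awards / total_awards

print(f"Small awards: {small_share:.1%} of all awards")
print(f"Big awards:   {big_share:.1%} of all awards")
```

By count, small awards make up roughly four out of every five awards, which is the sense in which "Small Are BIG".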
Characteristics of SDP Data
• Heterogeneous
• Distributed and isolated
• Manually generated
• Individual creation
• Not maintained for reuse by others
• Obscured or protected
• Uneven distribution as well as unequal access
• It is a highly “unorganised” data sector
Festive uses of bio resources
Census of trees
Uses of Plants
Status and knowledge about medicinal plants
Census of Birds
Bird signs for forecasting of weather change
Wild Animals
Burrowing or sub-soil fauna
Paudi village, Siwani, India
Need standards to discover and access such data!
Domestic Animals
Social belief about biodiversity
Citizen Scientists
Seed Diversity
Millions of Ramsinghs across the world are busy generating biodiversity data
What do we lack?
• Data Publishing Framework – lack of awareness about the current knowledge system
• Recognition for Data Publishing
• Data standards for a wide spectrum of biodiversity and associated data
• Suite of standards for the data life cycle (generation to dissemination)
• Standards addressing the data generation phase
What do we lack?
• Tools for Data Capture at its source
• Metadata creation as close to the source of data as possible
• Multilingual tools and standards
• Hassle-free, skill-level independent tools
Because…
Adapting to standards is a time-consuming as well as costly exercise
Digital Biodiversity Data
Data mobilisation is like moving mountains…
What Can be done!
• Data Publishing Framework
– Proposed GBIF recommendation on Discovery and Publishing of Biodiversity Data
– GUID for data set and data records
• Expedite the process of standards development
– Standards development, ratification and uptake
• Hassle-free, skill-level independent, easy to adapt standards
– Standards as integral part of recording / monitoring devices
– Metadata creation as close to source as possible
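As a rough illustration of "GUID for data set and data records" and "metadata creation as close to source as possible", the sketch below stamps identifiers and basic metadata at the moment a record is captured. It is a minimal sketch under stated assumptions: the field names, the helper functions `new_dataset` and `add_record`, and the sample values are all hypothetical and do not follow any ratified TDWG or GBIF schema.

```python
import json
import uuid
from datetime import datetime, timezone


def new_dataset(title, publisher, language):
    """Create a data set description with its own GUID (hypothetical fields)."""
    return {
        "dataset_guid": str(uuid.uuid4()),
        "title": title,
        "publisher": publisher,
        "language": language,
        "records": [],
    }


def add_record(dataset, observation, locality):
    """Append one record, assigning its GUID and metadata at capture time."""
    record = {
        "record_guid": str(uuid.uuid4()),
        # Link the record back to the data set it belongs to.
        "dataset_guid": dataset["dataset_guid"],
        "observation": observation,
        "locality": locality,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    dataset["records"].append(record)
    return record


# Illustrative values only; the observation text is a placeholder.
register = new_dataset("Census of Birds", "Village biodiversity register", "hi")
add_record(register, "placeholder observation", "Paudi village, Siwani, India")
print(json.dumps(register, indent=2, ensure_ascii=False))
```

Because the identifiers and timestamp are generated by the capture tool itself, the person recording the observation needs no knowledge of the underlying standard, which is the point of "skill-level independent" tools.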
What Can be done!
• Standards for interoperability and/or integration with non-biodiversity data
– Evaluation of authenticity, reliability, and data quality as close to source as possible
• Outreach to national/regional/thematic standards building initiatives
– Domain experts find it difficult to understand / adapt standards
– Cultural as well as lingual barriers
– Engagement of eastern, southern, mega-biodiversity communities in standards development processes
What Can be done!
• Internationalise standards
– Awareness in mega-biodiversity world about standards
– Multilingual dissemination – talk the languages that people understand best
– Think Globally – Act Locally
• Moving beyond comfort zone
– Standards for unorganised data sector
– Standards for citizen scientists
• Address concerns of data sensitivity through standards implementation
– Will standards help me in identification and protection of sensitive data?
“Krishna” can move data mountains, if standards bodies act as “Kamdhenus”
TDWG
GBIF
because……
Every datum counts!