Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
The Future of Scientific Publishing
Donat Agosti (Plazi, Bern) 21 January 2011
Paris
I don't know the future, but I have a dream…
Immersion in knowledge
I want to ask a publication a question, rather than have the author tell me what I have to read.
I want to find out:
• How many and which species are there?
• How are they related?
• Are they disappearing?
• How are they distributed?
Other people have different interests
An example from the Neurocommons text mining pilot:
• PubMed abstracts: >16,000,000
• CNS-classified abstracts: 874,727
• Recognized by text mining: 368,688
• Processed by text mining: 94,381
• Extracted graph: 30,000+ relationships among 5,500 genes and proteins
"Protein-protein interaction networks", John Wilbanks, Neurocommons
In a Semantic Web environment, where machines talk to each other and do most of our work, data need to be able to talk to each other:
[Figure: paper counts of 27,266; 4,563; 41,985; 10,365; and 128,437 papers]
It will open up scientific literature for data mining
An example from the taxonomy text mining pilot:
• New species described per year: >17,000
• Species redescribed per year: >100,000
• Journals with taxonomic content: >2,000
• Species described in total: 1,900,000
• Treatments in total: >20,000,000
• Processed by text mining: 0
• Extracted graph: 0 species, 0 relationships
Taxon mining project
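A rough comparison of the two pilots, using only the figures quoted on these slides. The ">" values are lower bounds, so the second ratio is a lower bound too:

```latex
\[
\frac{94{,}381 \text{ processed}}{874{,}727 \text{ CNS-classified abstracts}} \approx 10.8\,\%
\qquad \text{(Neurocommons pilot)}
\]
\[
\frac{20{,}000{,}000 \text{ treatments}}{1{,}900{,}000 \text{ species}} \approx 10.5 \text{ treatments per species,}
\qquad \text{of which } 0 \text{ processed (taxonomy pilot)}
\]
```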
1996: Conservation, Phylogeny, Systematics, Curiosity, Aesthetics, Fascination
2011: Experience, Frustration, Wonder, Excitement, Satisfaction, Determination
Modeling taxonomic literature: TaxonX
TaxPub (NLM DTD), Plazi
Plazi workflow: GoldenGate mark-up as an example
- Get LSIDs for names from the Hymenoptera Name Server (HNS); ZooBank?
- Add new names
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid (or EDIT?)
- Get geographic coordinates (longitude/latitude) from geonames.org
- Get GUIDs for CBOL, NCBI, specimens, images, ...
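The workflow above boils down to one repeated step: take a string marked up in the text, ask an external service for its identifier, and flag whatever nobody recognizes. A minimal sketch of that loop, assuming hypothetical offline lookup tables in place of the real services. `MarkedUpEntity`, `enrich`, and the stub tables are illustrative names, not GoldenGate's or Plazi's API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A resolver maps a marked-up string (a taxon name, a locality, a citation)
# to an external identifier, or returns None if it does not know the string.
Resolver = Callable[[str], Optional[str]]


@dataclass
class MarkedUpEntity:
    """One string tagged in the text, plus the identifiers attached to it."""
    kind: str   # e.g. "name", "locality", "citation"
    text: str
    ids: Dict[str, str] = field(default_factory=dict)


def enrich(entities: List[MarkedUpEntity],
           resolvers: Dict[str, Dict[str, Resolver]]) -> List[MarkedUpEntity]:
    """Ask every resolver registered for an entity's kind for an identifier.

    Returns the entities no resolver recognised; for names, these are the
    candidates for the "add new names" step.
    """
    unresolved = []
    for entity in entities:
        for source, resolve in resolvers.get(entity.kind, {}).items():
            identifier = resolve(entity.text)
            if identifier is not None:
                entity.ids[source] = identifier
        if not entity.ids:
            unresolved.append(entity)
    return unresolved


# Offline stand-ins (hypothetical): a real workflow would query HNS, ZooBank,
# bioguid, geonames.org, etc. over the network. The HNS ID is the one used
# for Mystrium leonie in the TaxonX example in this talk.
HNS_STUB: Dict[str, str] = {"Mystrium leonie": "193329"}
ZOOBANK_STUB: Dict[str, str] = {}

RESOLVERS: Dict[str, Dict[str, Resolver]] = {
    "name": {"HNS": HNS_STUB.get, "ZooBank": ZOOBANK_STUB.get},
}
```

Keeping resolvers in a table keyed by entity kind means a new source (CBOL, NCBI, a specimen database) is one more entry rather than a change to the loop.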
The semantically enhanced treatments are extracted, stored on Plazi.org, served in human-readable form, and linked to the underlying data (example: Fisher & Smith, 2008, PLoS ONE).
Plazi Search and Retrieval Server: Access to data
TAPIR, SPM
[Diagram: data access for you, whether human or machine]
The conversion comes at a cost, even though GoldenGate and other editors exist
[Figure: Ann. Soc. Entomol. Belg.: time in minutes to produce clean OCR using ABBYY, per publication (x-axis: HNS ID; y-axis: 0 to 7 min); publications in chronological order]
Production metrics to measure effort and compare approaches and algorithms
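One way such production metrics could be computed, as a minimal sketch: log the effort for each converted publication, then summarize by approach. Normalizing by page count is an assumption added here so that long and short publications can be compared; the record fields and approach labels are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, Iterable


@dataclass
class ConversionRecord:
    publication_id: str   # e.g. an HNS ID
    approach: str         # hypothetical label, e.g. "ABBYY OCR + manual cleanup"
    pages: int
    minutes: float        # effort to reach clean text


@dataclass
class EffortSummary:
    publications: int
    total_minutes: float
    mean_min_per_page: float
    median_min_per_page: float


def summarize_effort(records: Iterable[ConversionRecord]) -> Dict[str, EffortSummary]:
    """Group conversion records by approach and normalize effort per page."""
    groups = defaultdict(list)
    for r in records:
        if r.pages <= 0:
            raise ValueError(f"{r.publication_id}: page count must be positive")
        groups[r.approach].append(r)
    summaries = {}
    for approach, rs in groups.items():
        per_page = [r.minutes / r.pages for r in rs]
        summaries[approach] = EffortSummary(
            publications=len(rs),
            total_minutes=sum(r.minutes for r in rs),
            mean_min_per_page=mean(per_page),
            median_min_per_page=median(per_page),
        )
    return summaries
```

The median is reported alongside the mean because a few badly printed publications can dominate the average.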
How to mark up a large body of legacy publications?
- In-house?
- Build or use commercial services?
- Use the community, e.g. volunteers?
Activation energy
[Figure: from Gutenberg to the Semantic Web; y-axis: cost per knowledge]
Training and demos...
Avoid it
Prospective publications: ZooKeys / PhytoKeys
Semantic enhancements to published texts
2036?
Why do we publish?
Publicly funded research
Contribute to the welfare of the nations…
Dissemination
Access
Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.
Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).
Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications, or 85,000 pages)
The Biodiversity Heritage Library is currently digitizing and making accessible >100 million pages, most of them out of copyright, i.e. older than 1925. The work is expected to be finished in 2048...
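A back-of-the-envelope reading of the BHL timeline. It assumes, for illustration only, that all 100 million pages were still to be done as of this talk in 2011 and that throughput stays constant:

```latex
\[
\frac{100 \times 10^{6}\ \text{pages}}{(2048 - 2011)\ \text{years}}
  = \frac{10^{8}}{37}\ \frac{\text{pages}}{\text{year}}
  \approx 2.7 \times 10^{6}\ \frac{\text{pages}}{\text{year}}
\]
```

At that rate, the entire 85,000-page antbase.org ant corpus would amount to about 3 percent of a single year's output.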
What is a publication from public funded science?
Open Access
What is a scientific publication?
Print, journal, article, treatment, public funding, PDF, XML
Tool to disseminate scientific knowledge
Why do we publish the way we publish?
What kind of publications serve our needs?
IPBES
Access
Beyond the PDF
Access to what?
Scratchpad, EOL page, Wikipage, species page
Treatment
Treatments come with a lot of overhead
The structure of a systematics publication:
- Title, Author, Abstract, Introduction
- Taxon descriptions
  - Genus: Diagnosis, Notes, Biology, Distribution, Key to sp.
  - Species descriptions: Species 1, Species 2, Species 3, Species 4, ..., Species n
- Suppl. Materials, Acknowledgments, References
Species treatments (e.g. Species 1): Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art
Treatments are highly structured
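The publication structure above can be sketched as a data structure, which makes the "overhead vs. treatment" distinction concrete. This is a minimal sketch: the class and field names are hypothetical and simply mirror the slide's section labels. They are not TaxonX or TaxPub element names:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class SpeciesTreatment:
    """The highly structured core: one species treatment."""
    nomenclature: str
    diagnosis: Optional[str] = None
    distribution: Optional[str] = None
    material_examined: Optional[str] = None
    comments: Optional[str] = None
    description: Optional[str] = None
    graphic_art: List[str] = field(default_factory=list)  # figure references


@dataclass
class GenusSection:
    name: str
    diagnosis: Optional[str] = None
    notes: Optional[str] = None
    biology: Optional[str] = None
    distribution: Optional[str] = None
    key_to_species: Optional[str] = None
    species: List[SpeciesTreatment] = field(default_factory=list)


@dataclass
class SystematicsPublication:
    """Title, authors, abstract, etc. are the overhead around the treatments."""
    title: str
    authors: List[str]
    abstract: str = ""
    introduction: str = ""
    taxon_descriptions: List[GenusSection] = field(default_factory=list)
    supplementary_materials: List[str] = field(default_factory=list)
    acknowledgments: str = ""
    references: List[str] = field(default_factory=list)

    def treatments(self) -> Iterator[Tuple[str, SpeciesTreatment]]:
        """Yield (genus name, species treatment) pairs, skipping the overhead."""
        for genus in self.taxon_descriptions:
            for treatment in genus.species:
                yield genus.name, treatment
```

Iterating with `treatments()` gives direct access to the units a reader (or a machine) actually wants to query, with the genus kept as context.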
Content is defined; XML can define it
This can also be applied to entire sections of text, such as the descriptions of a species and its parts.
<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus...</tax:p>
  </tax:div>
</tax:treatment>
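Once a treatment is marked up like this, a program can ask it a question directly instead of reading it. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The namespace URIs are placeholders, since the excerpt uses the `tax:` and `dc:` prefixes without declaring them. The description is shortened to its measurement line. Reading measurements as "two capital letters followed by a number" is an assumption that fits this example; it is not a TaxonX rule:

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the excerpt does not declare them.
NS = {"tax": "urn:example:taxonx", "dc": "urn:example:dc"}

SAMPLE = """\
<tax:treatment xmlns:tax="urn:example:taxonx" xmlns:dc="urn:example:dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38.</tax:p>
  </tax:div>
</tax:treatment>"""

# Two capital letters followed by a number, e.g. "TL 3.95" or "CI 93".
MEASUREMENT = re.compile(r"\b([A-Z]{2})\s+(\d+(?:\.\d+)?)")


def summarize_treatment(xml_text: str) -> dict:
    """Read the machine-actionable parts of one marked-up treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    description = root.findtext(
        "tax:div[@type='description']/tax:p", default="", namespaces=NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "id_source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "measurements": {k: float(v) for k, v in MEASUREMENT.findall(description)},
    }


if __name__ == "__main__":
    print(summarize_treatment(SAMPLE))
```

The genus, species, and HNS identifier come straight from explicit elements and attributes. The measurements, by contrast, still have to be pattern-matched out of free text, which shows how much further mark-up could go.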
The question is, how to get them
Mark-up of legacy publications
$$$$$$$$$$$$$$$$$
Prospective semantic mark-up and linking to external sources is the future
Treatment repository + external resources
BHL-Modern
The future is writable.
Happy Birthday! January 15, 2001
What is a scientific publication?
Wikipedia entry as a publication?
Quality control
What is a scientific publication?
Centrifugal versus centripetal forces, or: are we attractive enough?
Continuity
$$$$$$$