Defrosting the Digital Library: A survey of bibliographic tools for the next generation web
![Page 1: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/1.jpg)
Defrosting the Digital Library
A survey of bibliographic tools for the next generation Web
Duncan Hull
Faculty of Life Sciences (1992-6): BSc
Computer Science (2002-2007): MSc, PhD
Chemistry (2008-date): Postdoc
![Page 2: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/2.jpg)
It’s all Casey’s fault!
Dr. Casey Bergman, Lecturer, Faculty of Life Sciences
I ♥ Citeulike.org!
http://ukpmc.ac.uk/
![Page 4: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/4.jpg)
Defrosting the Digital Library (in one slide)
• There are lots of digital libraries out there for scientists!
– ACM, IEEE, PubMed, DBLP, Scopus, ISI-WoK, Google Scholar, arXiv
• But they have some fundamental problems with their data
– Identity crisis: identifying people accurately
– Identity crisis: identifying publications accurately
– Keeping data and metadata coupled together
– Impersonal, unsociable, difficult to use: “Cold”
• Some new tools exist to make things better: “warmer”
– Citeulike, Mendeley, Zotero, Papyro, Papers etc
– BUT Fundamental problems with identity and data need to be fixed before the tools will get any better
![Page 5: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/5.jpg)
Metawhat?
getMetadata
getData
• From the Greek μετά (meta) meaning after
– metadata not just data about data
– metadata is data after data
– data first
– metadata second
– Reversible reaction (“round-tripping”)
Title: defrosting the digital library
Authors: Duncan Hull, Steve Pettifer and Douglas Kell
Published: 2008
Journal: PLoS Computational Biology
Tell me more?
What is it about?
Where did it come from?
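The "reversible reaction" on this slide can be sketched in a few lines of Python. This is only an illustration: the `TinyLibrary` class and its method names are hypothetical, chosen to mirror the getMetadata / getData labels on the slide, and the paper bytes are a stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    title: str
    authors: tuple
    year: int
    journal: str


class TinyLibrary:
    """Hypothetical in-memory library that keeps data and metadata coupled."""

    def __init__(self):
        self._by_data = {}      # data -> metadata
        self._by_metadata = {}  # metadata -> data

    def add(self, data: bytes, metadata: Metadata) -> None:
        # Store both directions so neither lookup can be lost.
        self._by_data[data] = metadata
        self._by_metadata[metadata] = data

    def get_metadata(self, data: bytes) -> Metadata:
        """Given the data, answer 'what is it about?'."""
        return self._by_data[data]

    def get_data(self, metadata: Metadata) -> bytes:
        """Given the metadata, answer 'where is the thing itself?'."""
        return self._by_metadata[metadata]


library = TinyLibrary()
paper = b"%PDF-1.4 (stand-in for the contents of the paper)"
record = Metadata(
    title="Defrosting the Digital Library",
    authors=("Duncan Hull", "Steve Pettifer", "Douglas Kell"),
    year=2008,
    journal="PLoS Computational Biology",
)
library.add(paper, record)

# Round trip: data -> metadata -> data gets back to where it started.
round_trip_ok = library.get_data(library.get_metadata(paper)) == paper
```

Round-tripping only works here because both directions are stored together at the moment the item is added.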
![Page 6: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/6.jpg)
Chemistry (Science of Matter)
Biology (Science of Life)
Informatics (Science of Information)
Cheminformatics
Biochemistry
Bioinformatics
Science!
www.mib.ac.uk
nactem.ac.uk/refine
www.citeulike.org
Metadata in:
![Page 7: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/7.jpg)
Representing Evidence For Interacting Network Elements
www.sbml.org from www.biomodels.net database at the EBI.ac.uk
![Page 8: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/8.jpg)
Example from Glycolysis in Yeast
reactant, reactant, product, product, modifier
This is just one reaction; there are at least another 1,700 in yeast
![Page 9: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/9.jpg)
| Name | Synonyms |
| --- | --- |
| D-Glucose | dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker |
| ATP | Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp |
| Hexokinase | Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C |
| ADP | 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp |
| Glucose-6-phosphate | Robison ester; D-Glucose 6-phosphate |

Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/
![Page 10: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/10.jpg)
Chemistry
Biology
Informatics
Cheminformatics
Biochemistry
Bioinformatics
![Page 11: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/11.jpg)
For more info:
www.nactem.ac.uk/refine
One of the biggest challenges is getting hold of accurate metadata from libraries and databases
![Page 12: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/12.jpg)
But first…
• Before getting into the paper…
• Some lessons I learnt while working in industrial informatics for a small startup company called CSW Informatics Ltd
– Ford and BBC
• How business and governments manage metadata
![Page 13: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/13.jpg)
• Ford Focus (launched 1998)
getMetadata
getData
6 million+ “units” sold worldwide to date: America, Europe, Middle East, Africa, Australasia
Lots of data, metadata and money!
Owner’s handbook
Tell me more?
What is it about?
![Page 14: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/14.jpg)
Final solution:
Web XSLT Print
![Page 15: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/15.jpg)
Summary: Lessons from Ford
• Data often the tip of the iceberg
– If the data doesn’t sink you, the metadata will
• Businesses like Ford spend $ £ € keeping data and metadata together
• Data is often worthless without its metadata
– Can’t sell data (cars) without metadata (manuals)
– Don’t just “make cars”
DATA
METADATA
![Page 16: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/16.jpg)
![Page 17: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/17.jpg)
BBC Spooks?
• Open Source Intelligence (OSINT)
• Overt, not covert, espionage: 370 journalists, 24/7, ~100 languages, in Caversham, Reading.
Keeping an eye on people around the world
since 1939
Winston Churchill
“Big British Castle” (BBC)
![Page 18: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/18.jpg)
I
hate
powerpoint
Radio
MS Word
TV
![Page 19: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/19.jpg)
How do they stay in business?
Broadcasting House, London
Foreign governments, e.g. U.S.A. etc
![Page 20: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/20.jpg)
Word: Not the best way to manage data and metadata
![Page 21: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/21.jpg)
Getting Rid of Word
database
XML schema
Web & Intranet
Printed documents
XSLT
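The single-source idea on this slide (one structured document, several outputs) can be sketched as follows. Note the swap: this uses Python's standard `xml.etree.ElementTree` rather than XSLT, and the `report`, `title` and `para` element names are made up for illustration; they are not the schema used in the project described.

```python
import xml.etree.ElementTree as ET
from html import escape

# One structured source document (hypothetical element names).
SOURCE = """<report>
  <title>Example report</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</report>"""


def to_html(root):
    """Render the source for the Web / intranet."""
    title = escape(root.findtext("title"))
    body = "".join(f"<p>{escape(p.text)}</p>" for p in root.findall("para"))
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1>{body}</body></html>"
    )


def to_print(root):
    """Render the same source as plain text for a printed document."""
    title = root.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_page = to_html(root)
print_page = to_print(root)
```

Both outputs are derived from the same source, so the title and text cannot drift apart the way separately edited Word files can.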
![Page 22: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/22.jpg)
A solution that worked!
getMetadata
getData
Who is Thabo Mbeki?
These documents are all about Thabo Mbeki
Thabo Mbeki
![Page 23: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/23.jpg)
Summary: Lessons from the BBC
• Important decisions are made on the basis of metadata
– Crucial that metadata is accurate, high quality and trustworthy
– Identifying people properly is crucial (100%)
– You know what data is about (getMetadata)
– You know where it came from (getData)
– Looked after properly (this can be expensive)
– Businesses built on buying/selling metadata:
![Page 24: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/24.jpg)
How have libraries managed metadata?
On paper since 300 B.C.
(Library of Alexandria)
Organised in physical space
In buildings made from bricks and mortar
Expensive and slow to distribute
Only ever read by humans
Filled with content bought from publishers, locked up with copyright
Image via http://en.wikipedia.org/wiki/Library_of_Alexandria
![Page 25: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/25.jpg)
From ~1824 until ~1989
Photos via dpicker http://www.flickr.com/photos/dpicker/3107856991/ and pit yacker http://www.flickr.com/photos/78825653@N00/131611136
JRULM (Main Library)
Joule Library
Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
![Page 26: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/26.jpg)
Metadata (after)
Data
Tightly bound (literally)
Rarely separated
First published 1687, over 300 years old
![Page 27: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/27.jpg)
Data and metadata were like this for centuries!
• Until…
![Page 28: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/28.jpg)
+
Tim Berners-Lee
1989
![Page 29: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/29.jpg)
Timeline: Unchanged for centuries but…
20 years ÷ 2309 years = <1%
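The arithmetic behind this slide, spelled out. The end date of 2009 is an assumption (twenty years after the 1989 shown on the previous slide); the start date is the 300 B.C. given earlier for the Library of Alexandria, and the missing year zero is ignored.

```latex
\[
300 + 2009 = 2309 \text{ years}, \qquad
\frac{20}{2309} \approx 0.0087 \approx 0.87\% < 1\%
\]
```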
![Page 30: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/30.jpg)
Everything’s Gone Digital!
www.scopus.com
www.pubmed.gov
http://ukpmc.ac.uk
www.isiknowledge.com
scholar.google.com
![Page 31: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/31.jpg)
Digital Utopia?
• Bits and bytes 1010100101000001101010 (not paper)
• In pervasive cyberspace (not physical space)
• Databases and/or the Web, identified by URIs (not buildings)
• Cost of distribution has fallen by orders of magnitude
• Read and indexed by machines like Googlebot et al (not just humans)
• Increasingly public, available to everyone via Open-Access publishing (less private, less restrictive copyright)
• Everything is great?
Alexander Griekspoor
www.mekentosj.com
![Page 32: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/32.jpg)
Welcome to Digital Dystopia
• Isolation
– each discipline has its own data silo
• Impersonal and unsociable
– “who the hell are you”?
– Where are “my” papers? (authored by me, or of interest to me)
– What are my friends and colleagues reading?
– What are the experts reading? What is popular this week / month / year ?
• “Cold”: Identity of publications and authors is inadequate
• Data divorced from its metadata
– GetMetadata / GetData unreliable
– Therefore can be difficult to tell what data is about, or where metadata came from
• Obsolete models of publication, not everything fits publication-sized holes
– Micro-attribution
– Mega-attribution
– Digital contributions (databases, software, wikis/blogs?)
![Page 33: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/33.jpg)
Isolated publication silos
Chemistry
Informatics
Biology
Impersonal, isolated, unsociable, generally rubbish
![Page 34: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/34.jpg)
Identity Crisis part 1: Which publication?
1. http://pubmed.gov/18974831
2. http://www.ncbi.nlm.nih.gov/pubmed/18974831
3. http://ukpmc.ac.uk/articlerender.cgi?accid=pmcA2568856
4. http://ukpmc.ac.uk/picrender.cgi?artid=1687256&blobtype=pdf
5. http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000204
6. http://www.dbkgroup.org/Papers/hull_defrost_ploscb08.pdf
7. http://dx.doi.org/10.1371/journal.pcbi.1000204
• One paper, many URIs. Disambiguation algorithms rely on getting metadata for each
– Big problem for libraries is these redundant duplicates
• Matching can be done by Digital Object Identifier (DOI) and PubMed ID (PMID);
– these are frequently absent < 5% (Kevin Emamy, citeulike)
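A minimal sketch of matching by DOI and PMID, using the seven URIs from this slide. The regular expressions are simplifying assumptions written for these examples, not a general DOI or PubMed parser, and `extract_identifier` is a hypothetical helper.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

URIS = [
    "http://pubmed.gov/18974831",
    "http://www.ncbi.nlm.nih.gov/pubmed/18974831",
    "http://ukpmc.ac.uk/articlerender.cgi?accid=pmcA2568856",
    "http://ukpmc.ac.uk/picrender.cgi?artid=1687256&blobtype=pdf",
    "http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000204",
    "http://www.dbkgroup.org/Papers/hull_defrost_ploscb08.pdf",
    "http://dx.doi.org/10.1371/journal.pcbi.1000204",
]

# Assumed patterns: good enough for the URIs above, nothing more.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?#&]+")
PMID_PATTERN = re.compile(r"(?:pubmed\.gov|/pubmed)/(\d+)")


def extract_identifier(uri):
    """Return ('doi', value), ('pmid', value), or None if neither is present."""
    decoded = unquote(uri)  # turns %3A and %2F back into ':' and '/'
    doi = DOI_PATTERN.search(decoded)
    if doi:
        return ("doi", doi.group(0).lower())
    pmid = PMID_PATTERN.search(decoded)
    if pmid:
        return ("pmid", pmid.group(1))
    return None


# Group URIs by whatever identifier could be recovered from them.
groups = defaultdict(list)
for uri in URIS:
    groups[extract_identifier(uri)].append(uri)
```

Two URIs collapse onto the PMID and two onto the DOI, but three carry neither identifier, and nothing in the URIs alone says that the PMID group and the DOI group are the same paper. That link needs metadata from somewhere else.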
![Page 35: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/35.jpg)
Identity crisis part 2: Who are you? Who, who … who, who?
1. Douglas Kell
2. Doug Kell
3. Douglas B Kell
4. Kell, D
5. Kell, D.B.
6. Douglas Bruce Kell
7. Druglas Kell (typo)
“Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science.” Neil Smalheiser and Vetle Torvik
http://tinyurl.com/authorid
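To see why attribution is harder than it looks, here is a deliberately naive sketch that reduces each variant above to a (surname, first initial) key. This is an illustration only, not the method used by any of the libraries or authors mentioned, and `name_key` is a hypothetical helper.

```python
import re

VARIANTS = [
    "Douglas Kell",
    "Doug Kell",
    "Douglas B Kell",
    "Kell, D",
    "Kell, D.B.",
    "Douglas Bruce Kell",
    "Druglas Kell",  # the typo from the slide
]


def name_key(name):
    """Reduce a name to (surname, first initial), lower-cased."""
    if "," in name:
        # "Surname, Given" form
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        # "Given ... Surname" form: assume the last word is the surname
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    letters = re.findall(r"[A-Za-z]", given)
    first_initial = letters[0].lower() if letters else ""
    return (surname.lower(), first_initial)


keys = {name_key(variant) for variant in VARIANTS}

# The same key also swallows an unrelated (made-up) person.
collides = name_key("David Kell") == name_key("Douglas Kell")
```

All seven variants collapse to one key, typo included, but so would every other "D. Kell". A key loose enough to merge the variants is too loose to tell people apart, which is the case for giving people unambiguous identifiers.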
![Page 36: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/36.jpg)
Identity crisis part 3: Mistaken Identity
Google Scholar thinks I’m Maurice Wilkins
Dr. Duncan Hull, Humble Postdoc
Article
about
Authored-by
Authored-by
Wrong!
“DNA mania”
title
http://tinyurl.com/mistakenid
![Page 37: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/37.jpg)
Can’t get metadata (decoupled from data): PDF
getMetadata
getData
Title: defrosting the digital library
Authors: Duncan Hull, Steve Pettifer and Douglas Kell
Published: 2008
Tell me more
Don’t know, try Google
Don’t know, title might be “defrosting…”
Where did this come from?
![Page 38: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/38.jpg)
Can’t get metadata (decoupled from data): PDF
“Why can't I manage academic papers like MP3s?”
James Howison, Carnegie Mellon University
http://tinyurl.com/mp3vpdf
Data is tightly coupled to its metadata
MP3 music file in iTunes
getMetadata
getData
Artist: The Who
Title: Who Are You?
Recorded: 1978
Album: Who Are You
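A sketch of what "tightly coupled" means for an MP3: the metadata travels inside the file itself. This example uses the old ID3v1 tag (a fixed 128-byte block at the end of the file) purely because its layout is simple; modern files and players mostly use the more involved ID3v2. The audio bytes are a stand-in and the function names are hypothetical, echoing the getMetadata / getData labels on the slide.

```python
import struct

# ID3v1 layout: "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1)
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")  # 128 bytes in total


def make_mp3_with_tag(audio, title, artist, album, year):
    """Append an ID3v1 tag to some audio bytes (short fields are null-padded)."""
    tag = ID3V1.pack(
        b"TAG",
        title.encode("latin-1"),
        artist.encode("latin-1"),
        album.encode("latin-1"),
        year.encode("latin-1"),
        b"",   # empty comment
        255,   # genre byte; 255 is commonly used for "unspecified"
    )
    return audio + tag


def has_tag(file_bytes):
    return len(file_bytes) >= ID3V1.size and file_bytes[-ID3V1.size:][:3] == b"TAG"


def get_metadata(file_bytes):
    """Read the tag straight out of the file; None if there is no tag."""
    if not has_tag(file_bytes):
        return None
    _, title, artist, album, year, _, _ = ID3V1.unpack(file_bytes[-ID3V1.size:])

    def text(raw):
        return raw.rstrip(b"\x00 ").decode("latin-1")

    return {"title": text(title), "artist": text(artist),
            "album": text(album), "year": text(year)}


def get_data(file_bytes):
    """Return the audio without the trailing tag."""
    return file_bytes[:-ID3V1.size] if has_tag(file_bytes) else file_bytes


audio = b"\xff\xfb stand-in for audio frames"
track = make_mp3_with_tag(audio, "Who Are You?", "The Who", "Who Are You", "1978")
meta = get_metadata(track)
untagged = get_metadata(b"%PDF-1.4 stand-in for a paper with no tag")
```

Copy or email the file and the artist, title and year go with it. The second call returns `None`, which is the situation the slides describe for a typical PDF of a paper.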
![Page 39: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/39.jpg)
Can’t get metadata (decoupled from data): PDF
“PDF is a hamburger, and we're trying to turn it back into a cow.”
Peter Murray-Rust
http://tinyurl.com/pdfhamburger
Cow (structured data)
Hamburger (unstructured data)
publishing
text-mining
![Page 40: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/40.jpg)
Can’t get metadata (decoupled from data): HTTP
• Arbitrary URI (not just pubmed, but any scientific paper) http://www.ncbi.nlm.nih.gov/pubmed/18974831
![Page 41: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/41.jpg)
Can’t get metadata (decoupled from data): HTTP
• Fundamental problem with the way the web is built using HTTP, can’t change it now…
“One of the Web's distinguishing features is that there's a big gaping hole where the metadata ought to be.”
Tim Bray, Sun Microsystems
http://tinyurl.com/nometadata
![Page 42: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/42.jpg)
I’ll stop moaning now
• Isolation
• Can’t identify people
• Can’t identify publications
• Metadata gets divorced from its data
• But what are the solutions?
![Page 43: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/43.jpg)
www.citeulike.org
Richard Cameron, Kevin Emamy
Picture from http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy and http://www.citeulike.org/faq/faq.adp
“The reason I wrote the site [citeulike.org] was, after recently coming back to academia, I was slightly shocked by the quality of some of the tools available to help academics do their job. I found it preferable to start writing proper tools for my own use than to use existing software.”
![Page 44: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/44.jpg)
Why should you care about citeulike?
1. Could save you time
2. But also like Green Fluorescent Protein…
![Page 45: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/45.jpg)
All references in one place
![Page 46: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/46.jpg)
Click Post to Citeulike
![Page 47: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/47.jpg)
Tag it (optional)
![Page 48: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/48.jpg)
Citeulike: Recoupling data and metadata
• Wouldn’t be a problem if the publishers hadn’t decoupled it in the first place!
![Page 49: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/49.jpg)
Citegeist = Citeulike + Zeitgeist
![Page 50: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/50.jpg)
How Big?
[Bar chart “Size”: publications (millions, axis 0 to 16) by library / database: Scopus, Citeulike, Pubmed, Arxiv. Annotations on the chart: “allegedly”; 2,243,177; ~2,000 /day; variable; 674,076; 2,880 /day (2 papers / min); linear growth; ~500,000]
![Page 51: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/51.jpg)
Where will citeulike break?
• The more people that use “social software”, the better it gets
– Citeulike is one of the leading ones, but there is plenty of competition
• Parsers are fragile, easily (and deliberately) broken by publishers
– ISI WOK and Scopus
– Each publisher has its own parser (euuuggh!)
• Privacy and competition
– “I don’t want to share any of my data before publication”
– “It’s nobody’s business but mine” (basic human right to privacy)
• Closer integration with Word (and LaTeX tools)
• Might go bust? Why put all my precious data in the hands of a commercial company?
![Page 52: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/52.jpg)
Why should you bother with citeulike?
• Organisation and time saving
– Searching
– Browsing
– Managing references while writing papers
• Quick and efficient sharing of data before publication
– e.g. tag “defrost” when writing this paper
– http://www.citeulike.org/tag/defrost
• Serendipity
– Casey Bergman story
![Page 53: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/53.jpg)
Casey Bergman story
“I was importing papers on Solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil: http://www.citeulike.org/profile/GustavoLacerda”
Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
![Page 54: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/54.jpg)
Many different solutions
e.g. Papyro: Steve Pettifer
http://utopia.cs.manchester.ac.uk/
![Page 55: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/55.jpg)
And the rest…
www.mendeley.com
www.zotero.org
www.connotea.org
www.mekentosj.com
www.hubmed.org
Re-couple metadata that has been de-coupled from data
www.2collab.com
www.refworks.com
“iTunes for PDF files”
![Page 56: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/56.jpg)
There is still lots more metadata
How many times has http://pubmed.gov/19060304 been cited?
Who has cited http://pubmed.gov/19060304? Give me all the references that cite this one
Give me all the references cited by http://pubmed.gov/19060304
Who the hell is Doug Kell? Steve Pettifer? Duncan Hull?
What is Doug Kell’s h-index?
Impact factor?
Notify me whenever Steve Pettifer publishes a paper
Notify me whenever someone cites http://pubmed.gov/19060304
Remember: Machines ask these questions, not just humans
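The h-index question shows how much a machine has to know before it can answer. The calculation itself is short (an author has index h if h of their papers each have at least h citations); the hard part is the input, a correct list of one person's papers and their citation counts. The counts below are made up for illustration and are not anyone's real record.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only get smaller from here on
    return h


# Made-up citation counts for five papers by one hypothetical author.
example_counts = [10, 8, 5, 4, 3]
example_h = h_index(example_counts)
```

If identity is wrong, for instance two people's papers are merged or one person's papers are split across name variants, the function still returns a number, just not a meaningful one.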
![Page 57: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/57.jpg)
Digital Identity would solve some of these problems
Give yourself a URI, you deserve it!
Tim Berners-Lee http://www.w3.org/People/Berners-Lee/card#i
see http://dig.csail.mit.edu/breadcrumbs/node/71
![Page 58: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/58.jpg)
URIs for Douglas Kell
1. http://blogs.bbsrc.ac.uk
2. http://www.chemistry.manchester.ac.uk/aboutus/staff/showprofile.php?id=194
3. http://dbkgroup.org/kell.htm
4. http://douglaskell.myopenid.com
5. http://dx.doi.org/10.1371/journal.pcbi.1000204
“Contributor identifier” from
www.myopenid.com
www.openid.net
(Also note ResearcherID from Thomson)
![Page 59: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/59.jpg)
• http://pubmed.gov/19112480 Phil Bourne
![Page 60: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/60.jpg)
John Ziman, Physicist
Science is public knowledge
http://tinyurl.com/publicknowledge
![Page 61: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/61.jpg)
Conclusions: What hasn’t changed
• The Web has revolutionised libraries in just 20 short years but…
• Still takes time for humans to read and digest: We can get more papers but there are still only 24 hours in a day, 7 days in a week, 52 weeks in a year
– We need help from machines (and the people that build them)
– Need to make metadata more machine-friendly
![Page 62: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/62.jpg)
Conclusions: Publication metadata matters
• Hopefully I have managed to convince you that metadata matters (and why)
• People make important decisions based on metadata
– Funding
– Hiring (and Firing)
– Publishing
– Who to collaborate with
Yet our current libraries can’t even accurately identify crucial metadata
Individual people - digital identity needed
Publications - disambiguation
Everything else…
![Page 63: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/63.jpg)
Conclusions: Scientists are too blasé about metadata!
• Leave it to stamp collectors, dusty-librarians, informaticians, database administrators (yawn!), “biocurators” http://biocurator.org/
– Boring, unscientific, not cutting-edge innovation?
• Everyone wants to use good metadata but few people want to spend time curating and cleaning metadata
– Like a clean toilet
• We ignore metadata at our peril “not my job”
– We leave it to publishers, who then mess it up, and charge us for their services, we should be getting better value for money
– We waste precious time organising metadata
– We waste precious time searching for metadata
– Data is more valuable with better metadata
• Have a look at citeulike (and other tools)
metadata
![Page 64: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/64.jpg)
Conclusions: Do us a favour!
![Page 65: Defrosting the Digital Library: A survey of bibliographic tools for the next generation web](https://reader031.fdocuments.in/reader031/viewer/2022013011/5550125cb4c905af648b49b1/html5/thumbnails/65.jpg)
Acknowledgements
• Refine project: Sophia Ananiadou, Jun'ichi Tsujii, Pedro Mendes, Steve Pettifer, Yoshimasa Tsuruoka, Douglas Kell www.nactem.ac.uk/refine
• BBSRC grant code BB/E004431/1
• CSW Informatics Ltd.: John Chelsom, Mavis Cournane, Niki Dinsey www.csw.co.uk BBC Monitoring, Ford Motor Company
• School of Chemistry, MIB (now) www.mib.ac.uk
• Faculty of Life Sciences (a long long time ago) and Casey Bergman, Jean-Marc Schwartz (now)
• School of Computer Science (not so long ago) Information Management Group http://img.cs.man.ac.uk/
• Any Questions?