Unlocking Taxonomic Literature II using Linked Open Data
![Page 1: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/1.jpg)
Joel Richard, Smithsonian Libraries
Unlocking Taxonomic Literature II
using Linked Open Data
![Page 2: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/2.jpg)
• What is Linked Open Data / The Semantic Web?
• Where can I see LOD in use?
• What is Taxonomic Literature II?
• How is it being converted to LOD?
• Did we encounter any challenges?
Agenda
![Page 3: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/3.jpg)
Linked data (from Wikipedia, the free encyclopedia)
A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies … [and] extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
What is Linked Open Data?
http://en.wikipedia.org/wiki/Linked_Open_Data
![Page 4: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/4.jpg)
What is the Semantic Web?
Semantic Web (from Wikipedia, the free encyclopedia)
A movement led by the World Wide Web Consortium… to promote common data formats on the Web.
By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web dominated by unstructured and semi-structured documents into a "web of data".
"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."
http://en.wikipedia.org/wiki/Semantic_Web
![Page 5: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/5.jpg)
Five Stars of Linked Open Data
★ Available on the web (in any format) but with an open license, to be Open Data.
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table).
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Microsoft Excel).
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff.
★★★★★ All the above, plus: link your data to other people’s data to provide context.
What is Linked Open Data?
http://www.w3.org/DesignIssues/LinkedData.html
![Page 6: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/6.jpg)
What is Linked Open Data?
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
![Page 7: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/7.jpg)
What is Linked Open Data?
Identifier (subject) → Predicate (verb/relationship) → Identifier / Value (object)
• Charles Darwin → Type → Person
• Charles Darwin → Born On → “Feb 12, 1809”
• Charles Darwin → Born In → Shrewsbury
• Charles Darwin → Author Of → On the Origin of Species
• Shrewsbury → Type → City
• Shrewsbury → Is In → England
• England → Type → Country
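The subject / predicate / object pattern on this slide can be sketched in a few lines of Python. This is only an illustration: the tuple representation and the `match` and `birth_country` helpers are assumptions made for the example, not part of any Linked Open Data toolkit.

```python
# Each statement is a (subject, predicate, object) tuple, mirroring the slide.
TRIPLES = [
    ("Charles Darwin", "Type", "Person"),
    ("Charles Darwin", "Born On", "Feb 12, 1809"),
    ("Charles Darwin", "Born In", "Shrewsbury"),
    ("Charles Darwin", "Author Of", "On the Origin of Species"),
    ("Shrewsbury", "Type", "City"),
    ("Shrewsbury", "Is In", "England"),
    ("England", "Type", "Country"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def birth_country(triples, person):
    """Follow 'Born In' then 'Is In' to find the country a person was born in."""
    for _, _, place in match(triples, subject=person, predicate="Born In"):
        for _, _, region in match(triples, subject=place, predicate="Is In"):
            if match(triples, subject=region, predicate="Type", obj="Country"):
                return region
    return None


print(birth_country(TRIPLES, "Charles Darwin"))
```

Because every fact has the same three-part shape, new kinds of statements can be added without changing the query code, which is much of the appeal of the model.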
![Page 8: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/8.jpg)
Tim Berners-Lee outlined four principles for linked open data:
1. Use URIs to denote things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
3. Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF, SPARQL.
4. Include links to other related things (using their URIs) when publishing data on the Web.
What is Linked Open Data?
http://www.w3.org/DesignIssues/LinkedData.html
http://5stardata.info/
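Principles 2 and 3 are usually met through HTTP content negotiation: a client asks for the same URI but requests an RDF format instead of HTML. The sketch below only prepares such a request with Python's standard library and does not send it. The `build_rdf_request` helper is a name invented for this example, and whether a given server honours `text/turtle` is an assumption to check per source.

```python
from urllib.request import Request


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request("http://dbpedia.org/resource/Charles_Darwin")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To dereference for real (needs network access):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       turtle_text = response.read().decode("utf-8")
```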
![Page 9: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/9.jpg)
What is Linked Open Data?
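Identifier → Predicate → Identifier / Value
• http://dbpedia.org/resource/Charles_Darwin → Type → Person
• http://dbpedia.org/resource/Charles_Darwin → Born On → “Feb 12, 1809”
• http://dbpedia.org/resource/Charles_Darwin → Born In → http://dbpedia.org/resource/Shrewsbury
• http://dbpedia.org/resource/Charles_Darwin → Author Of → http://dbpedia.org/resource/On_the_Origin_of_Species
• http://dbpedia.org/resource/Shrewsbury → Type → City
• http://dbpedia.org/resource/Shrewsbury → Is In → http://dbpedia.org/resource/United_Kingdom
• http://dbpedia.org/resource/United_Kingdom → Type → Country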
![Page 10: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/10.jpg)
What is Linked Open Data?
Predicate Vocabularies
• Dublin Core – General Metadata for Discovery
• SKOS – Simple Knowledge Organization System
• BIBO – Bibliographic Ontology
• BIO – Biographical
• FOAF – Friend of a Friend
• Events…
• Geographic…
• Many others!
• OWL – Web Ontology Language
![Page 11: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/11.jpg)
What is Linked Open Data?
Mondeca Labs
Linked Open Vocabularies (LOV)
Vocabulary of a Friend (VOAF)
A vocabulary for describing other vocabularies
http://labs.mondeca.com/dataset/lov
![Page 12: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/12.jpg)
What is Linked Open Data?
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

<http://dbpedia.org/resource/Charles_Darwin>
    rdf:type <http://xmlns.com/foaf/0.1/Person> ;
    rdf:type <http://dbpedia.org/ontology/Scientist> ;
    foaf:name "Charles Darwin" ;
    foaf:depiction "http://upload.wikimedia.org/…/Charles_Darwin_seated_crop.jpg" ;
    dbpedia-owl:field <http://dbpedia.org/resource/Natural_history> ;
    dbpprop:placeOfBirth "Mount House, Shrewsbury, Shropshire, England" ;
    dbpedia-owl:birthDate "1809-02-12" ;
    dbpedia-owl:birthPlace <http://dbpedia.org/resource/Shrewsbury> ;
    dbpedia-owl:deathDate "1882-04-19" ;
    dbpedia-owl:deathPlace <http://dbpedia.org/resource/Down_House> ;
    dbpprop:awards <http://dbpedia.org/resource/Royal_Medal> .
![Page 13: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/13.jpg)
What is Linked Open Data?
Benefits of Linked Open Data
• Disambiguation
• Connecting Relevant Content
• More visibility via Search
• Enrichment of your data
• Easier reuse of data
![Page 14: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/14.jpg)
Linked Open Data in Use
Google Knowledge Graph
![Page 15: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/15.jpg)
Linked Open Data in Use
Google Knowledge Graph
![Page 16: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/16.jpg)
Linked Open Data in Use
![Page 17: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/17.jpg)
Library of Congress: Linked Data Services http://id.loc.gov/
Schema.org http://www.schema.org
Data.gov / Semantic http://www.data.gov/semantic
Linked Data.org http://linkeddata.org/
Stephen Dale: Linked Data in Action http://www.slideshare.net/stephendale/linked-data-in-action-4487244
Other LOD Examples and Information
![Page 18: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/18.jpg)
Taxonomic Literature: A selective guide to botanical publications and collections with dates, commentaries and types. (Stafleu et al.)
Essential Reference Tool for Botanists
Authors and their Publications from 1753 to 1940
It is a “database in book form.”
Taxonomic Literature II
![Page 19: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/19.jpg)
Taxonomic Literature II
![Page 20: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/20.jpg)
Taxonomic Literature II
![Page 21: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/21.jpg)
Taxonomic Literature II
![Page 22: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/22.jpg)
Taxonomic Literature II
![Page 23: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/23.jpg)
Taxonomic Literature II
![Page 24: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/24.jpg)
Scanned the pages.
Uploaded to the Internet Archive.
Hired contractor for OCR and correction (99.97% accuracy.)
Received XML dataset from Contractor.
Verified and Imported to SQL Server Database.
Built a website to search the data.
Taxonomic Literature II
![Page 25: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/25.jpg)
Taxonomic Literature II
![Page 26: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/26.jpg)
First...what does 99.97% accuracy mean?
Taxonomic Literature II
~12,000 Errors
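The two figures on this slide can be related with simple arithmetic. The sketch below assumes the 99.97% is measured per unit of text (characters or words, the slides do not say which) and that the roughly 12,000 errors are what remains at that rate. Under those assumptions the numbers imply a text of about 40 million units. Exact fractions are used so no rounding noise creeps in.

```python
from fractions import Fraction

accuracy = Fraction("0.9997")   # contractor's stated accuracy (from the slide)
reported_errors = 12_000        # approximate error count (from the slide)

error_rate = 1 - accuracy                     # 3 errors per 10,000 units
implied_units = reported_errors / error_rate  # text size implied by the two figures


def expected_errors(total_units, accuracy=accuracy):
    """Errors remaining in a text of `total_units` units at the given accuracy."""
    return total_units * (1 - accuracy)


print(f"Error rate: about 1 in {float(1 / error_rate):.0f}")
print(f"Implied size: about {int(implied_units):,} units")
```

Even a very high accuracy figure leaves thousands of errors once the text is large enough, which is why corrected OCR still appears under the challenges later in the talk.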
![Page 27: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/27.jpg)
1. Select Identifiers for our data
http://library.si.edu/digital-library/tl-2/author/darwin
http://library.si.edu/digital-library/tl-2/title/origin_of_species
http://library.si.edu/digital-library/tl-2/title/1313
2. Choose vocabularies for predicates (harder than it sounds)
OWL, FOAF, DublinCore, OpenGraph, SIOC, SKOS, BIBO, etc.
3. Create Links to other data sources on the web.
Taxonomic Literature II
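Step 1 can be sketched as a small function that mints identifiers from a record type and a key. The base path is taken from the slide. The `slugify` and `mint_uri` helpers and their rules are assumptions made for illustration, not the project's actual code.

```python
import re
import unicodedata

BASE = "http://library.si.edu/digital-library/tl-2"  # base path shown on the slide


def slugify(text):
    """Turn a label into a lower-case ASCII slug, e.g. 'Origin of Species' -> 'origin_of_species'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def mint_uri(kind, key):
    """Build an identifier for an 'author' or a 'title' from a name or a TL-2 number."""
    if kind not in {"author", "title"}:
        raise ValueError(f"unknown record type: {kind!r}")
    return f"{BASE}/{kind}/{slugify(str(key))}"


print(mint_uri("author", "Darwin"))
print(mint_uri("title", 1313))
```

Name-based slugs are readable but can collide when two authors share a surname, so a numeric key such as the TL-2 title number is likely the safer choice where one exists.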
![Page 28: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/28.jpg)
Taxonomic Literature II as Linked Data
<http://library.si.edu/tl2/author/darwin>
tl2:creator <http://library.si.edu/tl2/title/1313>
owl:sameAs <http://viaf.org/viaf/27063124>
<http://library.si.edu/tl2/title/1313>
dc:creator <http://library.si.edu/tl2/author/darwin>
owl:sameAs <http://www.archive.org/details/originofspecies00darwuoft>
owl:sameAs <http://www.worldcat.org/oclc/425919213>
Select Identifiers
![Page 29: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/29.jpg)
Taxonomic Literature II as Linked Data
<http://library.si.edu/tl2/author/darwin>
rdf:type <http://xmlns.com/foaf/0.1/Person>
foaf:lastName "Darwin"
foaf:familyName "Darwin"
foaf:firstName "Charles"
foaf:givenName "Charles"
foaf:name "Darwin, Charles Robert"
skos:prefLabel "Darwin, Charles Robert"
bio:birth "1809"
bio:death "1882"
skos:definition "British evolutionary biologist"
tl2:personAbbreviation "Darwin"
Select Identifiers: Authors
![Page 30: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/30.jpg)
Taxonomic Literature II as Linked Data
<http://library.si.edu/tl2/book/1313>
rdf:type <http://purl.org/ontology/bibo/Book>
tl2:titleNumber "1313"
tl2:titleAbbreviation "Origin sp."
tl2:shortTitle "On the origin of species"
dc:title "On the origin of species by means of natural selection, or the preservation of favoured races in the..."
dc:publisher "John Murray"
event:place "London"
dc:created "1859"
Select Vocabularies: Publications
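A record like this one can be written out as Turtle with nothing more than string formatting. The sketch below is a minimal serializer under stated assumptions: the `tl2:` namespace URI is a placeholder (the slides do not give one), `dc:` is assumed to mean Dublin Core Terms, and the `URI`, `term`, and `to_turtle` helpers are invented for the example. It handles only single-line literals.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bibo": "http://purl.org/ontology/bibo/",
    # Assumption: the slides' dc: prefix refers to Dublin Core Terms.
    "dc": "http://purl.org/dc/terms/",
    # Hypothetical namespace: the slides do not say where tl2: terms live.
    "tl2": "http://example.org/tl2/terms#",
}


class URI(str):
    """Marks a value as a resource (written <...>) rather than a quoted literal."""


def term(value):
    """Format an object as a Turtle URI reference or an escaped string literal."""
    if isinstance(value, URI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(subject, properties):
    """Serialize one subject and its (predicate, object) pairs as Turtle text."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject}>")
    body = [f"    {predicate} {term(obj)}" for predicate, obj in properties]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


book = to_turtle(
    "http://library.si.edu/tl2/book/1313",
    [
        ("rdf:type", URI("http://purl.org/ontology/bibo/Book")),
        ("tl2:titleNumber", "1313"),
        ("tl2:shortTitle", "On the origin of species"),
        ("dc:publisher", "John Murray"),
        ("dc:created", "1859"),
        ("dc:creator", URI("http://library.si.edu/tl2/author/darwin")),
    ],
)
print(book)
```

For production use an RDF library would normally take over serialization, datatypes, and escaping. The point here is only that each row of the slide maps to one predicate and object pair.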
![Page 31: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/31.jpg)
Taxonomic Literature II as Linked Data
Linking: Author Names
Used a combination of OpenRefine and LODRefine as well as custom code.
Results: Mixed
• Matched 15 - 20% of the names in our sample set
• Some names weren’t high in the list and required a human touch
Conclusion: Computer code needs to be improved with the aim of minimizing the amount of staff or volunteer time spent matching names.
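One way to reduce the human effort is to rank candidate names by similarity, accept only clear winners automatically, and queue the rest for review. The sketch below uses Python's standard `difflib` for scoring. It is a stand-in technique, not the OpenRefine / LODRefine reconciliation the project used, and the 0.9 threshold and helper names are assumptions.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, drop punctuation, and sort the name parts so word order is ignored."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def rank_candidates(name, candidates):
    """Return (score, candidate) pairs, best first; scores fall between 0 and 1."""
    target = normalise(name)
    scored = [
        (SequenceMatcher(None, target, normalise(candidate)).ratio(), candidate)
        for candidate in candidates
    ]
    return sorted(scored, reverse=True)


def triage(name, candidates, accept=0.9):
    """Auto-accept a clear winner; otherwise hand the ranked list to a person."""
    ranked = rank_candidates(name, candidates)
    if ranked and ranked[0][0] >= accept:
        return "auto", ranked[0][1]
    return "review", [candidate for _, candidate in ranked]


print(triage("Darwin, Charles Robert",
             ["Charles Robert Darwin", "Erasmus Darwin", "Charles Dickens"]))
print(triage("Darwin", ["Charles Robert Darwin", "Erasmus Darwin"]))
```

String similarity alone cannot tell two people with the same name apart, so extra evidence such as birth and death years would likely be needed before raising the share of automatic matches.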
![Page 32: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/32.jpg)
Taxonomic Literature II as Linked Data
Charles Darwin (from dbpedia.org)
![Page 33: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/33.jpg)
Taxonomic Literature II as Linked Data
Linking: Herbaria
Used computer code to split the herbarium names and identify them in data provided by the Biodiversity Collections Index.
Results: Good
• Matched 95+% of the herbarium names in all of TL-2
• Careful attention to “A” which is an herbarium, but also starts some sentences in the HERBARIUM and TYPES blocks
Conclusion: These will be added to TL-2 when it launches as LOD.
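The “A” problem can be illustrated with a small heuristic: treat “A” as a herbarium code unless it opens a sentence and is followed by a lower-case word. The sketch below is an assumption about how such splitting might work, not the project's code. The code list is a tiny illustrative subset, and the sample sentence is made up.

```python
import re

# Illustrative subset of herbarium codes; a real run would load the full index.
KNOWN_CODES = {"A", "BM", "K", "MO", "NY"}

TOKEN = re.compile(r"\b[A-Z]{1,6}\b")


def looks_like_article(text, match):
    """True when 'A' opens a sentence and is followed by a lower-case word."""
    before = text[: match.start()].rstrip()
    starts_sentence = before == "" or before[-1] in ".:;!?"
    followed_by_word = re.match(r"\s+[a-z]", text[match.end():]) is not None
    return starts_sentence and followed_by_word


def find_herbarium_codes(text, known=KNOWN_CODES):
    """Return the known herbarium codes mentioned in the text, in order of first use."""
    found = []
    for match in TOKEN.finditer(text):
        code = match.group()
        if code not in known:
            continue
        if code == "A" and looks_like_article(text, match):
            continue
        if code not in found:
            found.append(code)
    return found


sample = "MO; further material at A, K and NY. A few duplicates are at BM."
print(find_herbarium_codes(sample))
```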
![Page 34: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/34.jpg)
Taxonomic Literature II
Missouri Botanical Garden Herbarium (From the Biodiversity Collections Index)
Lsid: urn:lsid:biocol.org:col:15859
Name: Missouri Botanical Garden Herbarium
Code: MO
Kind: Herbarium
Taxon Scope: Herbarium collection limited to vascular plants (5.6 million specimens) and bryophytes (500,000 specimens), Jan. 2009.
Geo Scope: Worldwide; phanerogams strong in Central America (especially Costa Rica, Nicaragua, and Panama), tropical South America. . .
Size: 6,150,000
Founded Year: 1859
Web Site: http://www.mobot.org/
Location Street: P.O. Box 299
Location City: Saint Louis
Location State: Missouri
Location Postcode: 63166-0299
Location Country Iso: US
http://www.biodiversitycollectionsindex.org/urn:lsid:biocol.org:col:15859
![Page 35: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/35.jpg)
Taxonomic Literature II as LOD
How are we going to store all this?
We’re using Drupal, which automatically embeds some Linked Open Data elements in the webpage.
Probably not a good idea for very large datasets.
TL-2 = 10,000 authors + 37,000 titles (about 400,000 triples, but growing)
![Page 36: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/36.jpg)
TL-2 and LOD Challenges
Performance of Drupal Import:
Feeds Import: 7 Hours for 35,000 “Records” or Drupal Nodes
Other options? Still searching…
Our linked data set will grow to at least 600-700k Drupal nodes.
Is Drupal the best way to do this?
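The import figures on this slide give a rough sense of scale. The sketch below assumes import time grows linearly with the number of nodes, which is a simplification and may be optimistic. Only the two input figures come from the slide.

```python
records_imported = 35_000   # from the slide
hours_taken = 7             # from the slide

rate_per_hour = records_imported / hours_taken   # nodes imported per hour


def projected_hours(nodes, rate=rate_per_hour):
    """Estimated import time in hours, assuming the observed rate holds."""
    return nodes / rate


low = projected_hours(600_000)
high = projected_hours(700_000)
print(f"{rate_per_hour:.0f} nodes/hour; full set: {low:.0f} to {high:.0f} hours")
```

At the observed rate the full data set would take on the order of five to six days of continuous importing, which helps explain why the slide asks whether Drupal is the right tool.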
![Page 37: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/37.jpg)
Challenges
• Errors in the Corrected OCR
• Challenges in Parsing Citations
• The 80/20 rule: manually making the connections that automated means cannot
• Finding suitable sources of data to link to. (DBPedia? VIAF? EOL? Others?)
![Page 38: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/38.jpg)
Summary
• This data may already exist online.
• It may also not always be as accurate as needed for science.
• We are in a position to be the authoritative source for this information.
• Linked Data allows it to be easily reused and shared.
![Page 39: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/39.jpg)
Closing: something fun
One example of reuse
Ryan Schenk http://synynyms.com/
![Page 40: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/40.jpg)
Closing: something fun
One example of reuse
Ryan Schenk http://synynyms.com/
![Page 41: Unlocking Taxonomic Literature II using Linked Open Data](https://reader036.fdocuments.in/reader036/viewer/2022062303/5565c03fd8b42a5b488b4a08/html5/thumbnails/41.jpg)
Thank You!
Unlocking Taxonomic Literature II using Linked Open Data
Joel Richard
[email protected]
…/staff/joel-richard
Special thanks to
The International Association for Plant Taxonomy, for giving us permission to scan and digitize TL-2 and place it online.
For his advice and support, Dr. Laurence Dorr, Botanist and Curator, Department of Botany, Smithsonian National Museum of Natural History.
This project was partially funded by the Atherton Seidell Endowment Fund of the Smithsonian Institution.