Linked Open Data: a new resource for eResearch. Dr Anne Cregan, eResearch Analyst, Intersect and ANDS, anne.cregan@intersect.org.au

  • Slide 1
  • Linked Open Data: a new resource for eResearch. Dr Anne Cregan, eResearch Analyst, Intersect and ANDS, anne.cregan@intersect.org.au
  • Slide 2
  • What this talk will cover: open data; the web of data; RDF triples; RDF graphs; the Linked Open Data project; publishing to the web of data; consuming the web of data.
  • Slide 3
  • Open data: the philosophy and practice of making data freely available to everyone, without restrictions from copyright, patents or other mechanisms of control.
  • Slide 4
  • Why make data open? Public money was used to fund the work, so it should be available to the public. Facts cannot legally be copyrighted. Sponsors of research do not get full value for money unless the resulting data are made freely available. In scientific research, the rate of discovery is accelerated by better access to data. Source: How to Make the Dream Come True: The Astronomers' Data Manifesto (Norris, 2007).
  • Slide 5
  • How to make open data useful. Principles: make it easy to find; make it available to everyone; separate it from the applications that use it; interlink it with related datasets in a meaningful way; make it machine processable.
  • Slide 6
  • The web of data = a naming model + a data model on the web. It is a web of interlinked data that machines can read (whereas the web is a web of interlinked documents for people to read). It is also known as the Semantic Web because of its formal semantics for reasoning and its relationship to meaning.
  • Slide 9
  • The web of data: it is an initiative of the World Wide Web Consortium (W3C) and a collaborative effort of many parties. It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. Like the web, anyone can publish to it: anyone can say anything about anything. However, they need to say it in RDF, not HTML, and anything they want to talk about has to be named by a URI.
  • Slide 10
  • URI = Uniform Resource Identifier: the naming model for the web of data. A URI is a unique name that identifies a resource. A resource is anything to which we can attach identity. It can be an information object, like a document or a webpage, but it can also be a real-world object, like a person. It can be anything at all. A URL is a kind of URI that names the resource and also indicates a means of acting upon or obtaining it via its primary access mechanism, e.g. http or ftp. For example: URL: http://www.w3.org/People/Berners-Lee/ and URL: http://www.w3.org/TR/rdf-concepts/
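As a minimal sketch of the naming model, Python's standard `urllib.parse` can split the two URLs on this slide into the parts that carry the access mechanism and the name. The helper `describe_uri` and its dictionary keys are illustrative choices, not part of any standard.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into the parts relevant to naming and access."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # access mechanism, e.g. http or ftp
        "authority": parts.netloc,   # the host that mints the name
        "path": parts.path,          # the name within that authority
        "fragment": parts.fragment,  # optional local identifier after '#'
    }


# The two example URLs from the slide.
person_page = describe_uri("http://www.w3.org/People/Berners-Lee/")
spec_page = describe_uri("http://www.w3.org/TR/rdf-concepts/")

print(person_page)
print(spec_page)
```

Both names share the same scheme and authority and differ only in the path, which is what lets one publisher mint many distinct names.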
  • Slide 11
  • RDF = Resource Description Framework: a framework for describing and linking resources on the web. It allows URIs to be connected into a directed graph and is based on the idea of triples: Subject, Predicate, Object.
  • Slide 12
  • RDF = Resource Description Framework: a framework for describing and linking resources on the web. It allows URIs to be connected into a directed graph and is based on the idea of triples, e.g. subject intersect.org.au/intersect-team/AnneCregan, predicate doac:organization, object intersect.org.au.
  • Slide 13
  • RDF = Resource Description Framework. Putting triples together creates a graph: intersect.org.au/intersect-team/AnneCregan doac:organization intersect.org.au; intersect.org.au/intersect-team/AnneCregan doac:organization ands.org.au.
  • Slide 14
  • RDF = Resource Description Framework. Putting triples together creates a graph. Nodes of the graph are URIs and literals: intersect.org.au/intersect-team/AnneCregan doac:organization intersect.org.au; intersect.org.au/intersect-team/AnneCregan doac:organization ands.org.au; intersect.org.au/intersect-team/AnneCregan foaf:firstName "Anne".
  • Slide 15
  • RDF = Resource Description Framework. RDF has a schema to describe relationships between things, called RDF Schema. (Diagram: the same three-triple graph about intersect.org.au/intersect-team/AnneCregan.)
  • Slide 16
  • RDF = Resource Description Framework. RDF is a World Wide Web Consortium (W3C) Recommendation and is part of the Semantic Web stack. (Diagram: the same three-triple graph about intersect.org.au/intersect-team/AnneCregan.)
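The three triples on these slides can be sketched as a small in-memory graph. This is a plain-Python illustration, not an RDF library: the classes `URI`, `Literal` and `Triple` are hypothetical, the `http://` scheme on the slide's names is an assumption, and the `DOAC` namespace below is a placeholder rather than the vocabulary's real address. The FOAF namespace is the standard one.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # objects may be URIs or literal values


FOAF = "http://xmlns.com/foaf/0.1/"
DOAC = "http://example.org/doac#"  # placeholder namespace (assumption)

# The http:// scheme is assumed; the slide shows the names without one.
anne = URI("http://intersect.org.au/intersect-team/AnneCregan")

graph: Set[Triple] = {
    Triple(anne, URI(DOAC + "organization"), URI("http://intersect.org.au")),
    Triple(anne, URI(DOAC + "organization"), URI("http://ands.org.au")),
    Triple(anne, URI(FOAF + "firstName"), Literal("Anne")),
}


def nodes(g: Set[Triple]) -> set:
    """Every subject and object is a node of the directed graph."""
    return {t.subject for t in g} | {t.obj for t in g}


def outgoing(g: Set[Triple], subject: URI) -> list:
    """Edges leaving one node, as (predicate, object) value pairs."""
    return sorted((t.predicate.value, t.obj.value) for t in g if t.subject == subject)


for predicate, value in outgoing(graph, anne):
    print(predicate, "->", value)
```

Predicates act as edge labels, so three triples give four nodes: one subject, two organisation URIs and one literal.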
  • Slide 17
  • Semantic Web Technology Stack: the Semantic Web standards build on each other. URI is the naming mechanism. RDF, RDF Schema and OWL are the languages for describing resources and the relationships between them. SPARQL is a query language for RDF graphs.
  • Slide 18
  • RDF Graphs Putting triples together creates a directed graph
  • Slide 19
  • RDF Graphs Putting triples together creates a directed graph
  • Slide 20
  • RDF Graphs Graphs can be interconnected by referring to URIs in other graphs
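A rough sketch of how shared URIs interconnect graphs: two publishers each hold one triple, and because both use the same URI for the organisation, the union of the two sets can be walked as a single graph. Every `example.org` name here is hypothetical.

```python
EX = "http://example.org/"

# Graph published by one party: a person and where they work.
graph_a = {
    (EX + "people/anne", EX + "vocab/worksFor", EX + "org/intersect"),
}

# Graph published by another party: facts about the same organisation URI.
graph_b = {
    (EX + "org/intersect", EX + "vocab/locatedIn", EX + "place/sydney"),
}

# Merging RDF graphs is a set union of their triples.
merged = graph_a | graph_b


def reachable(graph, start):
    """Return all nodes reachable from start by following triples forward."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow every edge whose subject is the current node.
        frontier.extend(o for s, p, o in graph if s == node)
    return seen


print(sorted(reachable(merged, EX + "people/anne")))
```

The location only becomes reachable from the person once the two graphs are merged, which is the point of referring to URIs in other graphs.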
  • Slide 21
  • RDF Graphs
  • Slide 22
  • Linking Open Data Project: a community project of the W3C Semantic Web Education and Outreach (SWEO) Interest Group. Started in 2007. It has grown rapidly as members of the community add open datasets, and has created the largest existing RDF graph: over 18 billion triples.
  • Slide 23
  • Linking Open Data Project October 2007
  • Slide 24
  • Linking Open Data Project September 2008
  • Slide 25
  • Linking Open Data Project July 2009
  • Slide 26
  • Linking Open Data Project July 2009
  • Slide 27
  • Linking Open Data Project April 2010
  • Slide 28
  • Linking Open Data Project: as at May 2009 it had created a linked open data cloud of 4.7 billion RDF triples; in April 2010 Linked Open Numbers added another 14 billion triples. Datasets include: DBpedia (linked data version of Wikipedia); US Census (2000 US Census data set); Gene Ontology (annotations from the Gene Ontology database); DrugBank (information about FDA-approved drugs); UniProt (life sciences data set); and many other bio/life sciences data sets (the BIO2RDF cloud). More info at http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
  • Slide 29
  • Publishing to the Linked Open Data Cloud. Principles: 1. Use URIs to name things. 2. Use HTTP URIs so you can look up those things on the web. 3. When someone looks up a URI, provide useful information (the URI is dereferenceable). 4. Include RDF statements that link to other URIs so that they can discover related things. These principles are from Tim Berners-Lee's 2007 note: http://www.w3.org/DesignIssues/LinkedData.html
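Two of these principles can be checked mechanically before publishing. The sketch below assumes a dataset held as plain (subject, predicate, object) string tuples; the host `data.example.org`, the `basedIn` predicate and both helper functions are hypothetical, and principle 3 is not covered because it needs a live HTTP lookup.

```python
from urllib.parse import urlsplit


def is_http_uri(name: str) -> bool:
    """Principle 2: the name is an HTTP(S) URI that can be looked up."""
    parts = urlsplit(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def external_links(triples, own_host: str) -> list:
    """Principle 4: triples whose object is an HTTP URI on another host."""
    return [
        (s, p, o)
        for s, p, o in triples
        if is_http_uri(o) and urlsplit(o).netloc != own_host
    ]


OWN_HOST = "data.example.org"  # hypothetical publisher
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

dataset = [
    # A literal label: useful information, but not a link.
    ("http://data.example.org/id/intersect", RDFS_LABEL, "Intersect"),
    # A link out to another dataset in the cloud.
    ("http://data.example.org/id/intersect",
     "http://data.example.org/vocab/basedIn",
     "http://dbpedia.org/resource/Sydney"),
]

subjects_ok = all(is_http_uri(s) and is_http_uri(p) for s, p, _ in dataset)
links_out = external_links(dataset, OWN_HOST)

print("all subjects and predicates are HTTP URIs:", subjects_ok)
print("links to other datasets:", len(links_out))
```

A dataset with no outbound links still satisfies the first three principles but stays an island in the cloud.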
  • Slide 30
  • Consuming linked open data: browsing linked data is easy. You need an RDF browser such as Tabulator, Disco, Zitgist, Marbles or OpenLink. Let's go for a ride on Disco (http://www4.wiwiss.fu-berlin.de/rdf_browser/), starting here: http://www.w3.org/People/Berners-Lee/card#i. We can travel through the linked open data cloud between URIs linked using RDF. Marbles is available at http://www5.wiwiss.fu-berlin.de/marbles
  • Slide 31
  • Consuming linked open data, eResearch example: enabling drug discovery. Data sets published to the data cloud: Linked CT (Linked Clinical Trials): 60,000 trials in 158 countries; DrugBank (FDA-approved drugs): 5,000 small molecule and biotech drugs; Diseasome (disorders and disease genes): 4,300 disorders, disease genes and associations; DailyMed (chemical structures of marketed drugs): 124,000 triples and 29,600 links; SWAN (Alzheimer's Hypothesis Browser Knowledgebase).
  • Slide 32
  • Consuming linked open data, using an RDF browser: See all drugs in trials for Alzheimer's disease in Linked CT, including a Phase III trial for Varenicline. Follow a link to data from DailyMed showing that Varenicline is already on the market for nicotine addiction. The typical dose is 1 mg twice daily, and the Linked CT trial used no higher than that, so there are no new safety issues. Link to DrugBank to find that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist. Diseasome indicates that the corresponding genes are only important in nicotine addiction, not Alzheimer's. But the SWAN Knowledgebase shows there are hypotheses relating Alzheimer's to nicotinic receptors through amyloid beta.
  • Slide 33
  • Consuming linked open data: using the linked open data cloud with an RDF browser, we are able to: browse data relating to companies, clinical trials, drugs, diseases and genetic variation; see when extra data is available; gain access to data without needing to map identifiers and synonyms, because the interlinking has already been done; gain additional insights about interesting questions to ask. Source: Jentzsch et al., Enabling Tailored Therapeutics with Linked Data, events.linkeddata.org/ldow2009/papers/ldow2009_paper9.pdf
  • Slide 34
  • Consuming linked open data: querying using SPARQL queries. A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Examples: http://wiki.dbpedia.org/OnlineAccess
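As a hedged sketch of what talking to an endpoint involves, the code below builds (but does not send) an HTTP GET request carrying a SPARQL query in the `query` parameter, as the SPARQL protocol defines. It assumes DBpedia's public endpoint at `http://dbpedia.org/sparql` and asks for JSON results via the `Accept` header; the function name `build_sparql_request` is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public DBpedia endpoint

# Ask for the English label of one DBpedia resource.
QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Tim_Berners-Lee>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Encode the query into the URL and request JSON-formatted results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# To execute it, pass `request` to urllib.request.urlopen and decode the
# JSON body; that step needs network access and is left out of this sketch.
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before contacting a shared public service.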
  • Slide 35
  • Types of queries: Selection and extraction queries retrieve parts of the data based on its content, structure, or position. Reduction queries specify which part of the data not to include in the answer. Restructuring queries restructure data into other possible formats or serialisations. Aggregation queries aggregate several data items into one new data item. Combination and inference queries combine information that is not explicitly connected.
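The five query types can be illustrated over a tiny in-memory triple list. This is a plain-Python sketch rather than SPARQL, and the trials, drugs, gene and predicates under `http://example.org/` are invented for illustration only.

```python
from collections import Counter

EX = "http://example.org/"
TESTS = EX + "tests"
TARGETS = EX + "targets"

triples = [
    (EX + "trial1", TESTS, EX + "drugA"),
    (EX + "trial2", TESTS, EX + "drugA"),
    (EX + "trial3", TESTS, EX + "drugB"),
    (EX + "drugA", TARGETS, EX + "geneX"),
]

# Selection: retrieve the triples that match a pattern.
selection = [t for t in triples if t[1] == TESTS]

# Reduction: say which part of the data to leave out.
reduction = [t for t in triples if t[1] != TESTS]

# Restructuring: re-serialise the same data, here as N-Triples-style lines.
restructured = [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]

# Aggregation: fold several items into one new item (trials per drug).
trials_per_drug = Counter(o for _, _, o in selection)

# Combination: join trial -> drug -> gene, a link no single triple states.
trial_genes = {
    (trial, gene)
    for trial, p1, drug in triples if p1 == TESTS
    for subject, p2, gene in triples if p2 == TARGETS and subject == drug
}

print(trials_per_drug)
print(sorted(trial_genes))
```

In SPARQL these correspond roughly to graph patterns, filters, `CONSTRUCT`, aggregate functions and multi-pattern joins respectively.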
  • Slide 36
  • Summary: open data; the web of data; RDF triples; RDF graphs; the Linked Open Data project; publishing to the web of data; consuming the web of data.
  • Slide 37
  • Thank you. More details are at http://linkeddata.org/, http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData and http://www.w3.org/2001/sw/. Questions and comments may be emailed to anne.cregan@intersect.org.au