An introduction to the semantic web and bibframe · The Semantic Web is a collaborative effort led...

We are leaving MARC and transitioning into RDA in order to make our bibliographic information accessible on the Semantic Web.

“…the role of the national library is increasingly moving from a focus on ‘stored knowledge’ to one where ‘smart knowledge’ is paramount.” Libraries will increasingly be linking and connecting to content, which means that local collections will need to be placed within a larger national, and even international, context.

Jisc. (2015, March 17). Report: A national monograph strategy roadmap. Downloaded March 30, 2015 from: http://www.jisc.ac.uk/reports/a-national-monograph-strategy-roadmap.

The Web, as it is familiar to us, uses the HTTP protocol to retrieve information resources. Everything that lives on the Web is an information resource:

Documents, videos, images, music files…

Information pulled from our MARC records using our OPAC interface is based on this model. This silos our information.

The Semantic Web uses the HTTP protocol to identify real-world, non-information resources and the relationships between information resources and non-information resources.

People, places, and abstract concepts can be linked to other non-information resources and to information resources, and the relationships between them can be stated explicitly.

The Semantic Web is a collaborative effort led by the World Wide Web Consortium (W3C) to provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries.

W3C. What is the Semantic Web? Downloaded February 12, 2014 from http://www.w3.org/2001/sw/

The goal of the Semantic Web is to move from a Web of Documents to an open, interconnected Web of Data by doing the following with Linked Open Data:

Provide valuable, agreed-upon information in a standard, open format.

Provide mechanisms to link individual schemas and vocabularies so that people can note when their ideas are “similar” and related, even if they are not exactly the same.

Bring all this information into an environment that can be used by most, if not all, of us. (Make data available free of proprietary software, single social networks, or web applications.)

Bauer, Florian and Kaltenböck, Martin. (2012). Linked Open Data: The Essentials. Downloaded December 30, 2014 from: http://www.reeep.org/LOD-the-Essentials.pdf, p. 25.

One way for us to do this is by connecting our records to the global Web of Data.

This is the Linking Open Data (LOD) Cloud Diagram. At the center is DBpedia, the Linked Data version of Wikipedia. The colors on the diagram represent broad topic areas. Library and cultural institutions are in the green area.

The LOD project is a community activity started in 2007 by the W3C’s Semantic Web Education and Outreach (SWEO) Interest Group (www.w3.org/wiki/SweoIG), whose goal is to make data freely available to everyone.

The collection of Linked Data published on the Web is referred to as the LOD Cloud.

Open-content projects as diverse as encyclopedias and dictionaries, government statistics, information regarding chemical and biological collections and endangered species, bibliographic data, music artists and information about their songs, and academic research papers are all available using the same data format and reachable using the same API.

The LOD cloud has doubled in size every 10 months since 2007.

More than 40% of the Linked Data in the LOD cloud is contributed by governments (mainly from the United Kingdom and the United States), followed by geographical data (22%) and data from the life sciences domain (almost 10%).

Life sciences (including some large pharmaceutical companies) contribute over 50% of the links between datasets. Publication data (from books, journals, and the like) comes in second with 19%, and the media domain (the BBC, the New York Times, and others) provides another 12%.

The original data owners themselves publish one-third of the data contained in the LOD cloud, whereas third parties publish 67%. For example, many universities republish data from their respective governments in Linked Data formats, often cleaning and enhancing data descriptions in the process.

Wood, David, Zaidman, Marsha, and Ruth, Luke. (2014). Linked Data: Structured Data on the Web. Shelter Island, NY: Manning, p. 14.

Linked Data refers to a set of best practices and techniques for publishing and connecting structured data on the Semantic Web using international standards of the W3C.

Data that conforms to these practices and techniques is also called Linked Data.

Wood, David, Zaidman, Marsha, and Ruth, Luke. (2014). Linked Data: Structured Data on the Web. Shelter Island, NY: Manning, pp. 4-5.

The four Linked Data principles proposed by Tim Berners-Lee are:

Use URIs as names for things.

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

Include links to other URIs so people can discover those things.

Berners-Lee, Tim. (2009, June 18). Linked Data. Downloaded December 26, 2014 from http://www.w3.org/DesignIssues/LinkedData.html.
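
To make "looking up" an HTTP URI more concrete, here is a minimal Python sketch of how a client might prepare such a lookup and ask for RDF rather than HTML. The URI is a made-up example.org address, and the helper name build_lookup is hypothetical; the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare an HTTP GET for a resource URI, asking for an RDF serialization.

    The Accept header tells the server which format the client would like
    back; "text/turtle" is the media type for Turtle.
    """
    return Request(uri, headers={"Accept": media_type})


# Hypothetical identifier for a person; not a real published URI.
request = build_lookup("http://example.org/id/person/1")

print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Whether a given server actually returns Turtle depends on how that server is configured, so this shows only the client side of the principle.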

RDF (Resource Description Framework) is used to implement Linked Data on the Semantic Web. The Semantic Web Cake, also known as the Semantic Web Stack or Semantic Web Layer Cake, displays the architecture of the Semantic Web.

The Semantic Web relies on the combination of the following technologies:

Explicit metadata: They allow web pages to carry their meaning on their sleeves.

Ontologies: They describe the main concepts of a domain and their relationships.

Logical reasoning: It makes it possible to draw conclusions from combining data with ontologies.

TRUST is also a major component of the Semantic Web: it relies on folks providing good, accurate information.

I emphasize this for two reasons: 1) trusting that the information accessed in the Web of Data is good information makes the Web of Data work; and 2) it is important for us to make the case for the importance of good data within our own institutions.

Media Map. State-of-the-art Situation. Downloaded February 12, 2014 from http://mediamapproject.org/project_situation.html

The basic building block is a simple statement called a triple, made up of a subject, a predicate, and an object. HTTP URIs serve as the unique identifiers for subjects, predicates, and those objects that are resources. Objects can also take the form of text, which is not assigned a unique identifier. When this happens, the object is known as a literal (a typed literal when it carries a datatype).

Subjects and objects with HTTP identifiers are displayed as circles. Literals are shown in rectangles.

Predicates are the arcs, or arrows, between them. The predicate denotes the relationship between the subject and the object. The arrow of the arc points from the subject to the object. It is possible to have a subject/object relationship between two resources in one graph and the reverse relationship between the same two resources in another graph, in which the subject of the first graph is the object of the second, and the object of the first is the subject of the second.

A set of triples is called an RDF graph.
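
As a rough illustration of the triple and graph ideas above, the following Python sketch models a graph as a plain set of (subject, predicate, object) tuples. It is not an RDF library; the class names, the example.org namespace, and the vocabulary terms (author, authorOf, title) are all invented for the example.

```python
from typing import NamedTuple, Union


class URI(str):
    """An HTTP identifier for a resource (a circle in the diagram)."""


class Literal(str):
    """A text value with no identifier of its own (a rectangle in the diagram)."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


EX = "http://example.org/"  # hypothetical namespace

# A graph is simply a set of triples.
graph = {
    # The book points to its author...
    Triple(URI(EX + "book/1"), URI(EX + "vocab/author"), URI(EX + "person/1")),
    # ...and the reverse relationship swaps subject and object.
    Triple(URI(EX + "person/1"), URI(EX + "vocab/authorOf"), URI(EX + "book/1")),
    # An object can also be a literal.
    Triple(URI(EX + "book/1"), URI(EX + "vocab/title"), Literal("An Example Title")),
}


def objects(graph, subject, predicate):
    """Follow the arrows labelled `predicate` that leave `subject`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


titles = objects(graph, URI(EX + "book/1"), URI(EX + "vocab/title"))
literal_count = sum(isinstance(t.obj, Literal) for t in graph)
```

Here titles holds the single title literal, and literal_count shows that only one of the three objects is a literal; the other two are resources with identifiers.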

“Is a” should be read as “Is an instance of”. Dotted arrows imply an inferred relationship.

An RDF parser reads text in one or more RDF serializations (formats such as RDF/XML, N-Triples, and Turtle) and interprets it as triples in the RDF model.

An RDF serializer does the reverse; it takes a set of triples and creates a file that expresses that content in one of the serialization forms (Allemang & Hendler, p. 51-53).

The parser/serializer has no direct counterpart in a relational data-backed system, at least as far as standards go. (This is a key advantage of RDF stores over traditional data stores.)

An RDF parser takes a file with a .rdf extension as input and converts it into an internal representation of the triples that are expressed in that file. The triples are stored in the triple store and are available for all the operations of that store.
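
The parser and serializer roles can be sketched in a few lines of Python. This is a deliberately simplified reader and writer for a small subset of N-Triples (URIs in angle brackets and plain quoted literals, with no escapes, language tags, datatypes, or blank nodes); a real system would rely on a full RDF library. The function names and the sample data are assumptions made for the illustration.

```python
import re

# One simplified N-Triples statement: <subject> <predicate> <object> .
# where the object is either a <URI> or a "plain literal".
_STATEMENT = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$')


def parse_ntriples(text):
    """Parser: turn serialized text into a set of (subject, predicate, object) triples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        found = _STATEMENT.match(line)
        if found is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        subject, predicate, object_uri, object_literal = found.groups()
        if object_uri is not None:
            obj = ("uri", object_uri)
        else:
            obj = ("literal", object_literal)
        triples.add((subject, predicate, obj))
    return triples


def serialize_ntriples(triples):
    """Serializer: turn a set of triples back into text, one statement per line."""
    lines = []
    for subject, predicate, (kind, value) in sorted(triples):
        obj = f"<{value}>" if kind == "uri" else f'"{value}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines) + "\n"


sample = (
    '<http://example.org/book/1> <http://example.org/vocab/title> "An Example Title" .\n'
    "<http://example.org/book/1> <http://example.org/vocab/author> <http://example.org/person/1> .\n"
)
triples = parse_ntriples(sample)
round_trip = parse_ntriples(serialize_ntriples(triples))
```

Parsing the serializer's output gives back the same set of triples, which is the sense in which the serializer "does the reverse" of the parser.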

An RDF store (aka triple store) is a database that is tuned for storing and retrieving data in the form of triples. In addition to the familiar functions of a database, an RDF store has the additional ability to merge information from multiple data sources, as defined by the RDF standard.
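
A small Python sketch may help show why merging is natural for triples: if each source is treated as a set of triples, merging is a set union, and statements that share a URI connect automatically. The two sources and all example.org identifiers below are invented, and the sketch ignores blank nodes, which need extra care in a real merge.

```python
EX = "http://example.org/"  # hypothetical namespace

# Source A: a catalog that knows which person wrote the book.
catalog = {
    (EX + "book/1", EX + "vocab/author", EX + "person/1"),
}

# Source B: an authority file that knows the person's name.
authority_file = {
    (EX + "person/1", EX + "vocab/name", "Jane Example"),
    (EX + "book/1", EX + "vocab/author", EX + "person/1"),  # duplicate statement
}

# Merging two graphs is the union of their triples; duplicates collapse.
merged = catalog | authority_file

# Because both sources use the same URI for the person, the merged graph
# can answer a question neither source could answer alone.
author_names = {
    name
    for book, p1, person in merged
    if p1 == EX + "vocab/author"
    for subject, p2, name in merged
    if subject == person and p2 == EX + "vocab/name"
}
```

The merged graph holds two distinct triples, and author_names contains the name that the authority file supplied for the catalog's author.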

The query engine provides the capability to retrieve information from an RDF store according to structured queries.

An application has some work that it performs with the data it processes: analysis, user interaction, archiving, and so forth. These capabilities are accomplished using some programming language that accesses the RDF store via queries (processed with the RDF query engine).
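
The relationship between an application and a query engine can be sketched as follows. The match function is a toy stand-in for a query engine: it answers a single triple pattern in which None acts as a wildcard. A real RDF store would typically accept SPARQL queries instead; the function name, the store contents, and the example.org identifiers are assumptions for the illustration.

```python
def match(store, s=None, p=None, o=None):
    """Return the triples in `store` that fit the pattern; None matches anything."""
    return sorted(
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


EX = "http://example.org/"  # hypothetical namespace

store = {
    (EX + "book/1", EX + "vocab/author", EX + "person/1"),
    (EX + "book/2", EX + "vocab/author", EX + "person/1"),
    (EX + "book/1", EX + "vocab/title", "An Example Title"),
}

# The "application": list every book written by person/1.
books_by_person = [
    subject for subject, _, _ in match(store, p=EX + "vocab/author", o=EX + "person/1")
]
```

The application never walks the store itself; it only states which pattern it is interested in and works with the triples that come back.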

Converters convert information from one form into RDF, and often into a standard form of RDF like Turtle. Most RDF systems include a table converter of some sort. (Tabular data can be mapped into triples in a natural way.) (Allemang & Hendler, p. 53).
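
One common way to read "tabular data can be mapped into triples in a natural way" is: each row becomes a subject, each column heading becomes a predicate, and each cell becomes an object. The Python sketch below applies that mapping to a small CSV table. The function name, the URI patterns, and the sample rows are invented for the example, and empty cells are simply skipped.

```python
import csv
import io


def table_to_triples(csv_text, base_uri, key_column):
    """Map each row to a subject, each column to a predicate, each cell to a literal."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{base_uri}item/{row[key_column]}"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key names the subject; empty cells say nothing
            triples.append((subject, f"{base_uri}vocab/{column}", value))
    return triples


table = "id,title,year\n1,An Example Title,2015\n2,Another Title,\n"
triples = table_to_triples(table, "http://example.org/", "id")

for triple in triples:
    print(triple)
```

The second row has no year, so it contributes only a title triple; a table never has to be "complete" to be converted.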

Libraries around the world are contributing datasets to the LOD Cloud. You can access these datasets on the Datahub.

About the Datahub: The Datahub provides the means to freely search data, register published datasets, create and manage groups of datasets, and get updates from datasets and groups you're interested in. Accessed April 5, 2015 from: http://datahub.io/organization?q=Library&sort=name+asc

The Library of Congress’s Authorities and Vocabularies are linked in the LOD Cloud and LC provides a search interface so you can access their unique identifiers.

OCLC administers the Virtual International Authority File (VIAF). It provides a unique identifier that links the authority records of the national libraries throughout the world that have submitted their records to VIAF.

OCLC is building tools and services based on these datasets. Scroll down a VIAF page and open the “About” bar.

Click on WorldCat Identities, and you will open an OCLC WorldCat Discovery page.

These VIAF and National Library identifiers are now available on Wikipedia.

When you scroll down to the very bottom of a Wikipedia page, you will find Authority Control, where you will see the unique identifiers assigned by VIAF, LC, Getty, ISNI, ULAN, and National Libraries. These are the identifiers used in RDF graphs to uniquely identify the resources that we manage.

So when we update an Authority Record, that information is already linked in the LOD Cloud. This increases the exposure of library data to a broader linked global community.

This adheres to one of the major principles of Linked Data, to provide useful links to other datasets.
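
One way such links between datasets are commonly expressed is with the owl:sameAs property, which states that two URIs identify the same thing. The Python sketch below shows the idea with a local authority record linked to VIAF and Library of Congress identifiers. The URI patterns follow the forms those services publish, but the numeric parts are placeholders invented for illustration, and the local example.org URI is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Placeholder identifiers: the numeric parts are invented, not real records.
local_record = "http://example.org/authority/person/1"
viaf_record = "http://viaf.org/viaf/000000000"
lc_record = "http://id.loc.gov/authorities/names/n00000000"

links = {
    (local_record, OWL_SAME_AS, viaf_record),
    (local_record, OWL_SAME_AS, lc_record),
}


def same_as(links, uri):
    """Return every URI directly linked to `uri` by owl:sameAs, in either direction."""
    found = set()
    for subject, predicate, obj in links:
        if predicate != OWL_SAME_AS:
            continue
        if subject == uri:
            found.add(obj)
        elif obj == uri:
            found.add(subject)
    return found


equivalents = same_as(links, local_record)
```

Anyone who starts from the local record can reach the other two identifiers, and anyone who starts from the VIAF identifier can find the local record.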

The British Library is one of the forerunners in experimenting with how to represent library‐related information on the web. This is an image of one of their tests from 2010.

British Library. Journal Articles in RDFXML in datahub. Downloaded April 18, 2015 from

The Library of Congress is developing BIBFRAME as a general model for expressing and connecting bibliographic data.

It provides the means to replace a MARC record with an RDF serialization (or format), thereby providing the means to connect bibliographic information to the LOD Cloud.

What you see on this slide is a small portion of the RDF graph for the MARC record on the previous slide. The record was downloaded from OCLC as a MARC RDF/XML file and converted into a BIBFRAME record using MarcEdit. It was then opened in Notepad++, copied, and parsed with an RDF validator, which produced the graph.
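
For readers without the slide image, here is a very small Python sketch of the kind of triples a BIBFRAME description contains. It is not the graph from the slide. It assumes the BIBFRAME 2.0 namespace published by the Library of Congress (earlier drafts used a different namespace) and uses only the Work and Instance classes and the instanceOf property; the example.org URIs are invented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF = "http://id.loc.gov/ontologies/bibframe/"  # assumed: BIBFRAME 2.0 namespace
EX = "http://example.org/"  # hypothetical local namespace

work = EX + "work/1"          # the conceptual essence of the resource
instance = EX + "instance/1"  # one published embodiment of that work

graph = {
    (work, RDF_TYPE, BF + "Work"),
    (instance, RDF_TYPE, BF + "Instance"),
    (instance, BF + "instanceOf", work),
}


def instances_of(graph, work_uri):
    """Find every Instance that points to the given Work."""
    return {s for s, p, o in graph if p == BF + "instanceOf" and o == work_uri}


found = instances_of(graph, work)
```

A converted MARC record produces many more triples than this, but they follow the same subject, predicate, object shape.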

Many of these slides contain hyperlinks to their sources which can be accessed by right‐clicking on the slide and opening the hyperlink.

Resources I suggest checking out:

The British Library Data Model for a monograph.

The British Library Data Model for a serial.

The two Stanford resources: Report of the Stanford Linked Data Workshop and Stanford Linked Data Workshop Technology Plan.

The Relationship between BIBFRAME and OCLC’s Linked-Data Model of Bibliographic Description: A Working Paper.

And from the Reference List: Bauer, Florian and Kaltenböck, Martin. (2012). Linked Open Data: The Essentials. Downloaded December 30, 2014 from: http://www.reeep.org/LOD-the-Essentials.pdf
