Querying Cultural Heritage

Querying Cultural Heritage Data Dr. Barry Norton, Development Manager, ResearchSpace* * Funded by the Andrew W. Mellon Foundation * Hosted by the Curatorial Directorate, British Museum

description

SPARQL queries for cultural heritage data in the CIDOC-CRM ontology with British Museum examples and exercises

Transcript of Querying Cultural Heritage

Page 1: Querying Cultural Heritage

Querying Cultural Heritage Data

Dr. Barry Norton, Development Manager, ResearchSpace*

* Funded by the Andrew W. Mellon Foundation * Hosted by the Curatorial Directorate, British Museum

Page 2: Querying Cultural Heritage

Statements and Patterns

• For one edge in a graph:

bm-obj:EOC3130

crm:P52_has_current_owner

bm-id:the-british-museum

Page 3: Querying Cultural Heritage

Statements and Patterns

• For one edge in a graph:

bm-obj:EOC3130

crm:P52_has_current_owner

bm-id:the-british-museum

• We can declare/retrieve one (N)Triple:


Page 4: Querying Cultural Heritage

Statements and Patterns

• For one edge in a graph:

bm-obj:EOC3130

crm:P52_has_current_owner

bm-id:the-british-museum

• We can declare/retrieve one (N)Triple:

• Or write this in Turtle:


@prefix crm: <http://erlangen-crm.org/current/> .

@prefix bm-obj: <http://collection.britishmuseum.org/id/object/> .

@prefix bm-id: <http://collection.britishmuseum.org/id/> .

bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum .

Page 5: Querying Cultural Heritage

Statements and Patterns

• For one edge in a graph:

bm-obj:EOC3130

crm:P52_has_current_owner

bm-id:the-british-museum

• We can write this in Turtle:

bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum .

• And check for it in SPARQL:

PREFIX crm: <http://erlangen-crm.org/current/>

PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>

PREFIX bm-id: <http://collection.britishmuseum.org/id/>

ASK {bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum}

true
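To make the ASK result concrete, here is a minimal Python sketch that treats a graph as a set of (subject, predicate, object) tuples. The `graph` set and the `ask` helper are illustrative assumptions, not part of any triplestore API; a real server matches full IRIs rather than prefixed strings.

```python
# Hypothetical in-memory stand-in for a triplestore: a set of
# (subject, predicate, object) tuples written with the slide's prefixed names.
graph = {
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
}


def ask(graph, triple):
    """Mimic SPARQL ASK for one fully specified triple: is it in the graph?"""
    return triple in graph


result = ask(
    graph,
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
)
print(result)
```

ASK only reports whether a match exists, which is why the slide's answer is a bare boolean rather than a table of bindings.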

Page 6: Querying Cultural Heritage

Statements and Patterns

• For a set of edges:

bm-obj:EOC3130

bm-id:the-british-museum

crm:P51_has_former_or_current_owner

?

• We can do the work on the client:

• Or have the server do it by turning the triple into a triple pattern:


bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner
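What the server does with a triple pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `match` helper is hypothetical, and the two `ex:former-owner-*` identifiers are placeholders, not real British Museum data.

```python
# Hypothetical data: the ex:former-owner-* names are placeholders.
graph = {
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "bm-id:the-british-museum"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-1"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-2"),
}


def match(graph, pattern):
    """Return one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    solutions = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


pattern = ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "?owner")
owners = sorted(b["?owner"] for b in match(graph, pattern))
print(owners)
```

Each solution is a set of variable bindings, which corresponds to one row in a SELECT result.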

Page 7: Querying Cultural Heritage

Exercise

Questions:

• Why is the answer different?

• Who are the two (other) one-time owners?

Page 8: Querying Cultural Heritage

Solutions & Exercises

• Why is the answer different?

– Reasoning, part of the work done by the server (a triplestore), means that if two things are related by crm:P52_has_current_owner then they’re also related by crm:P51_has_former_or_current_owner

• This is part of the work that the server (triplestore) can do for you

• Exercise: query for the (strictly) former owners
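The inference described above can be pictured as the server adding a triple for the more general property whenever it sees the more specific one. The sketch below assumes a hypothetical `SUBPROPERTY_OF` table and `infer` helper; real triplestores implement this with rule engines or query rewriting rather than a Python loop.

```python
# Sub-property relation used by the reasoning step (P52 is a sub-property of P51).
SUBPROPERTY_OF = {
    "crm:P52_has_current_owner": "crm:P51_has_former_or_current_owner",
}


def infer(explicit):
    """Return the explicit triples plus those implied by sub-property links."""
    inferred = set(explicit)
    for s, p, o in explicit:
        # Walk up the sub-property chain, adding one triple per super-property.
        while p in SUBPROPERTY_OF:
            p = SUBPROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


explicit = {
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
}
closure = infer(explicit)
print(len(explicit), len(closure))
```

A query against the inferred graph therefore sees the current owner under both properties, even though only one triple was declared.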

Page 9: Querying Cultural Heritage

Solution 1/2

• Using specific server functionality:

Page 10: Querying Cultural Heritage

Solution 2/2

• In pure SPARQL:
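One way to think about this is as a subtraction: owners matched by the general property, minus those matched by the current-owner property. In SPARQL 1.1 a subtraction like this can be written with FILTER NOT EXISTS or MINUS. The Python sketch below only mimics that logic on hypothetical data (the `ex:former-owner-*` identifiers and the `objects` helper are placeholders).

```python
# Hypothetical graph after inference: the current owner appears under both properties.
graph = {
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "bm-id:the-british-museum"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-1"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-2"),
}


def objects(graph, subject, predicate):
    """Collect every object linked to the subject by the predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


any_time = objects(graph, "bm-obj:EOC3130", "crm:P51_has_former_or_current_owner")
current = objects(graph, "bm-obj:EOC3130", "crm:P52_has_current_owner")

# Strictly former owners: matched by P51 but not by P52.
former_only = sorted(any_time - current)
print(former_only)
```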

Page 11: Querying Cultural Heritage

Solutions & Exercises

Who are the two (other) one-time owners?

• Since people and institutions (and places) are treated as concepts, the names of the former owners are attached using skos:prefLabel

• Exercise: if you didn’t already, include the names in your query results

Page 12: Querying Cultural Heritage

Solutions & Exercises

If you didn’t already, include the names in your query results:

Question: Why are we back at two answers?

Page 13: Querying Cultural Heritage

Answer

• Answer:

– Just as we can add triples together to make a graph in RDF, so we can add triple patterns together in SPARQL to make a graph pattern

– By default all triple patterns must be matched, but we can use the OPTIONAL {} pattern to allow variation

• Exercise:

– Query for the owners and their names, if they exist*

* N.B. this bug in the BM data will be fixed soon
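The difference between a required and an OPTIONAL pattern can be sketched as the difference between an inner join and a left join. The data below is invented for illustration: it assumes, as the footnote suggests, that one owner has no skos:prefLabel, and the label strings and `ex:` identifiers are placeholders.

```python
# Hypothetical owners of the object, as returned by the ownership pattern.
owners = ["bm-id:the-british-museum", "ex:former-owner-1", "ex:former-owner-2"]

# Hypothetical skos:prefLabel values; one owner is assumed to have none.
pref_label = {
    "ex:former-owner-1": "Former Owner One",
    "ex:former-owner-2": "Former Owner Two",
}

# Both patterns required: owners without a label drop out of the results.
required = [(o, pref_label[o]) for o in owners if o in pref_label]

# Label pattern OPTIONAL: every owner is kept, with None for an unbound name.
optional = [(o, pref_label.get(o)) for o in owners]

print(len(required), len(optional))
```

With the label pattern required there are two rows; with it optional there are three, one of which has an unbound name.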

Page 14: Querying Cultural Heritage

Solution

Page 15: Querying Cultural Heritage

Exercise

• Take a look here:

• Exercise: copy and run this query

Page 16: Querying Cultural Heritage

CSV Exercise

• Type:

• Observe that one can now paste the query including line breaks*

• Type:

* N.B. for now you should first replace each " with ' and change the one occurrence of ecrm: to crm: (we’ll fix this)

* N.B. currently the query needs to be simplified as the BBC data is not loaded – this will be available soon

Page 17: Querying Cultural Heritage

Data Analysis

• One can import this CSV file into many tools:

– A spreadsheet can be a good way to carry out basic visualisations

– A scripting environment like (i)python/scipy or R can allow more analysis before visualisation, but:

• both languages also have libraries to encapsulate interaction via SPARQL (rdflib/sparqlwrapper and SPARQL/RCurl respectively)

• one should decide whether more analysis should first be carried out using SPARQL…
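As a minimal sketch of the scripting route, the standard-library `csv` module is enough to load query results before reaching for heavier tools. The CSV text, column names and `ex:` identifiers below are invented for illustration; a real export will have whatever columns the SELECT clause named.

```python
import csv
import io

# Hypothetical CSV export; the column names and rows are invented for illustration.
CSV_TEXT = """object,material
ex:object-1,ex:basalt
ex:object-2,ex:bronze
ex:object-3,ex:basalt
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(CSV_TEXT)
materials = sorted({row["material"] for row in rows})
print(len(rows), materials)
```

For a file on disk, the same `csv.DictReader` can be given an open file object instead of the `io.StringIO` wrapper.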

Page 18: Querying Cultural Heritage

Exercise

• If you haven’t so far, click on one of the (HotW) 100 Objects (such as number 70, Hoa Hakananai'a Easter Island Statue) having run the main query

• Choose a material and observe the query for other objects in this material

• Adapt this query to count how many BM objects are made from basalt
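The counting step amounts to filtering on the material and counting distinct objects, which is what a COUNT(DISTINCT ...) aggregate does in SPARQL. The sketch below assumes the material link is expressed with crm:P45_consists_of and uses placeholder `ex:` identifiers; the actual property path and material IRI in the BM data should be taken from the query shown in the browser.

```python
# Hypothetical triples; the property name is an assumption and the ex: terms are placeholders.
graph = {
    ("ex:object-1", "crm:P45_consists_of", "ex:basalt"),
    ("ex:object-2", "crm:P45_consists_of", "ex:bronze"),
    ("ex:object-3", "crm:P45_consists_of", "ex:basalt"),
    ("ex:object-3", "crm:P45_consists_of", "ex:wood"),
}


def count_objects_made_of(graph, material):
    """Count distinct subjects linked to the given material."""
    return len(
        {s for s, p, o in graph if p == "crm:P45_consists_of" and o == material}
    )


basalt_count = count_objects_made_of(graph, "ex:basalt")
print(basalt_count)
```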

Page 19: Querying Cultural Heritage

Solution & Exercise

• Exercise: Now count the ‘top ten’ materials and the number of objects for each

Page 20: Querying Cultural Heritage

Solution
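A 'top ten' is a group, count, sort and limit, which in SPARQL 1.1 corresponds to GROUP BY, COUNT, ORDER BY DESC and LIMIT 10. The same shape can be sketched client-side with `collections.Counter`; the (object, material) pairs below are invented placeholders, and `top_materials` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (object, material) pairs, e.g. rows from a CSV export.
pairs = [
    ("ex:object-1", "ex:basalt"),
    ("ex:object-2", "ex:bronze"),
    ("ex:object-3", "ex:basalt"),
    ("ex:object-4", "ex:wood"),
    ("ex:object-5", "ex:bronze"),
    ("ex:object-6", "ex:basalt"),
]


def top_materials(pairs, n=10):
    """Group by material, count objects, and keep the n most frequent."""
    return Counter(material for _obj, material in pairs).most_common(n)


ranking = top_materials(pairs)
for material, count in ranking:
    print(material, count)
```

Doing the aggregation in SPARQL instead keeps the transfer small, since only the ten result rows leave the server.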

Page 21: Querying Cultural Heritage

A Last Word

• SPARQLing a ‘native RDF’ database (often called a ‘triplestore’) is not the only option before defaulting to programming

• A ‘native graph’ database indexes the graph in a different way, supporting traversal-oriented queries
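To illustrate what a traversal-oriented query looks like, here is a small breadth-first walk over an adjacency list. It is a sketch only: the `edges` data and node names are placeholders, and it does not represent how any particular graph database stores or indexes its data.

```python
from collections import deque

# Hypothetical adjacency list: each node maps to the nodes it points at.
edges = {
    "ex:object-1": ["ex:owner-a", "ex:basalt"],
    "ex:owner-a": ["ex:place-x"],
    "ex:basalt": [],
    "ex:place-x": [],
}


def reachable(edges, start):
    """Breadth-first traversal: list every node reachable from start, in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


visited = reachable(edges, "ex:object-1")
print(visited)
```

The query here is expressed as steps along edges from a starting node, rather than as a pattern matched against the whole graph.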

Page 22: Querying Cultural Heritage

Exercise


Page 23: Querying Cultural Heritage

Exercise
