Connections:

Connections: Piloting linked data to connect library and archive resources to the new world of data, and staff to new skills Laura Akerman Metadata Librarian Robert W. Woodruff Library Emory University Zheng (John) Wang AUL, Digital Access, Resources, and IT Hesburgh Library Notre Dame University


Transcript of Connections:

Page 1: Connections:

Connections: Piloting linked data to connect library and archive resources to the new world of data, and staff to new skills

Laura Akerman, Metadata Librarian

Robert W. Woodruff Library, Emory University

Zheng (John) Wang, AUL, Digital Access, Resources, and IT

Hesburgh Library, Notre Dame University

Page 2: Connections:

Who has presented most frequently at CNI?

Page 3: Connections:

Current Model: Search and Discover

Page 4: Connections:
Page 5: Connections:

Metadata Published as Documents

Page 6: Connections:

Require Human to Decipher

Page 7: Connections:

Linked Data Model: Find

Page 8: Connections:

Semantic Graph Model

Page 9: Connections:

Machine Understands Semantics

Page 10: Connections:

RDF Triple

Subject → Predicate → Object

Page 11: Connections:

RDF Triple

Laura → Lecture → Connections

Page 12: Connections:

RDF Triples

[Graph diagram. Nodes: Laura, John, Connections, CNI, 2012. Edge labels: Lecture, Know, Place, Year.]
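A minimal sketch of the triples on this slide, held as plain Python tuples. The transcript preserves only the node and edge labels, so which edge joins which pair of nodes is an assumed reading of the diagram, and the `match` helper is a hypothetical stand-in for a real triplestore query.

```python
# Each statement is a (subject, predicate, object) tuple.
# ASSUMPTION: the edge assignments below are a guess at the slide's diagram.
triples = {
    ("Laura", "lecture", "Connections"),
    ("Connections", "place", "CNI"),
    ("Connections", "year", "2012"),
    ("Laura", "knows", "John"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything the graph says about the lecture itself.
about_connections = match(triples, s="Connections")
```

Because every statement has the same three-part shape, new facts can be added without changing any schema, which is the property the following slides rely on.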

Page 13: Connections:

Reuse, Authority Control, Knowledge Linking...

Relevant to What We Do

Page 14: Connections:

Connections Pilot

To Interlink EAD, Catalog, and Other External Resources

Page 15: Connections:

Connections: Context

Little Time to Learn Additional New Things

Page 16: Connections:

Hands-on learning

Page 17: Connections:

Ingredients

• Leader/teacher/evangelist
• Learning group – open to all
  o 2 "classes" a month, 5 months
• Pilot: 3 months
  o Brainstorming a pilot project
  o Start small
  o Team: programmer, subject liaison, metadata specialists, archivist, digital curator, fellow
  o 1-3 hrs/week for all but leader
  o A sandbox running Linux

Page 18: Connections:

The Pilot: Grand Ambitions

Page 19: Connections:

Maps

Our Own Triplestore

RDF from EAD

RDF from TEI

RDF from MARCXML (and MARC)

Data from other archives CW150

Other data

Timelines

User interface Navigation

DBPedia

id.loc.gov

Integrate linked data into discovery layer (catalog)?

SPARQL

Civil War

Redesign metadata creation as RDF

Faculty project

National Park Service Data

Rosters

Crowdsourcing

Page 20: Connections:

3 months later...

Page 21: Connections:

Sampling little bites of the meal:

Visualization – Simile Welkin

EAD (starting from ArchivesHub stylesheet)

Sesame triplestore

MARCXML (starting from LC DC stylesheet)

id.loc.gov URIs for LC subjects and names (scripted)

DBPedia/subjects (by hand)

Make some RDF metadata

Page 22: Connections:

HTTP:OurResourceURL

HasSubject "Mobley, Thomas"

Page 23: Connections:

HTTP:OurResourceURL

HasSubject rdf:resource HTTP://OurPersonMobleyT1

rdfs:label "Mobley, Thomas"

Page 24: Connections:

hasSubject

HTTP:OurPersonMobleyT1

memberOf

Confederate States of America. Army. Georgia Infantry Regiment, 48th

Page 25: Connections:

hasSubject

HTTP:Our Mobley Tom1

memberOf

48th Georgia Infantry http://id.loc.gov/authorities/names/n99264720

hasSubject

sameAs

DBPedia:http://dbpedia.org/page/48th_Georgia_Volunteer_Infantry
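The progression across the last four slides, from a text label to a local URI to links into id.loc.gov and DBPedia, can be sketched roughly as follows. The two `HTTP:Our...` values are the slides' own placeholders, not real URLs, and the helper functions are hypothetical illustrations rather than the pilot's code.

```python
# Placeholders copied from the slides; they are not resolvable URLs.
RESOURCE = "HTTP:OurResourceURL"
PERSON = "HTTP:OurPersonMobleyT1"
# URIs quoted on the slide for the 48th Georgia Infantry.
REGIMENT = "http://id.loc.gov/authorities/names/n99264720"
DBPEDIA = "http://dbpedia.org/page/48th_Georgia_Volunteer_Infantry"

triples = [
    (RESOURCE, "hasSubject", PERSON),
    (PERSON, "rdfs:label", "Mobley, Thomas"),
    (PERSON, "memberOf", REGIMENT),
    (REGIMENT, "sameAs", DBPEDIA),
]


def objects(graph, subject, predicate):
    """Objects of every triple with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def regiments_for(graph, resource):
    """Follow hasSubject -> memberOf -> sameAs outward from a resource."""
    found = []
    for person in objects(graph, resource, "hasSubject"):
        for regiment in objects(graph, person, "memberOf"):
            found.append(regiment)
            found.extend(objects(graph, regiment, "sameAs"))
    return found
```

Once the subject is a URI instead of the string "Mobley, Thomas", a traversal like `regiments_for(triples, RESOURCE)` can reach external data that a literal label could never point to.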

Page 26: Connections:

Confederate miscellany collection, 1860-1865

isPartOf

heldBy

Page 27: Connections:

We learned:

Selecting material that will “link up” without SPARQL is too hard!

Even when items are in a unified “discovery layer”, the types of search are limited.

Get it into triples, then find out!
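As a rough illustration of "get it into triples, then find out", the sketch below groups resources by a shared subject to reveal which ones link up. The identifiers (`ead:collection1`, `marc:record7`, and so on) are invented sample data, and the grouping is a plain-Python stand-in for the join a SPARQL query would express.

```python
from collections import defaultdict
from itertools import combinations

# ASSUMPTION: invented sample data, not records from the pilot.
triples = [
    ("ead:collection1", "hasSubject", "person:mobley"),
    ("marc:record7", "hasSubject", "person:mobley"),
    ("marc:record9", "hasSubject", "person:geary"),
]


def linked_pairs(graph, predicate="hasSubject"):
    """Return (resource_a, resource_b, shared_object) for resources that share an object."""
    by_object = defaultdict(set)
    for s, p, o in graph:
        if p == predicate:
            by_object[o].add(s)
    pairs = []
    for shared in sorted(by_object):
        for a, b in combinations(sorted(by_object[shared]), 2):
            pairs.append((a, b, shared))
    return pairs


links = linked_pairs(triples)
```

Here the finding aid and the catalog record surface as a pair only after both are expressed as triples that point at the same person identifier.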

Page 28: Connections:

We learned:

There are many ways of modeling data

• No one model to follow has emerged. We have to think about this ourselves.

Page 29: Connections:

ArchivesHub handles subjects:

<associatedWith>
  <!--About the Concept (Person)-->
  <skos:Concept xmlns:skos="http://www.w3.org/2004/02/skos/core#" rdf:about="http://duchamp.library.emory.edu/resource/id/concept/person/lcnaf/gearyjohnwhite1819-1873">
    <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">Geary, John White, 1819-1873.</rdfs:label>
    <skos:inScheme>
      <skos:ConceptScheme rdf:about="http://duchamp.library.emory.edu/resource/id/conceptscheme/lcnaf">
        <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">lcnaf</rdfs:label>
      </skos:ConceptScheme>
    </skos:inScheme>
    <foaf:focus xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <!--About the Person-->
      <foaf:Person rdf:about="http://duchamp.library.emory.edu/resource/id/person/lcnaf/gearyjohnwhite1819-1873">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
        <rdf:type rdf:resource="http://purl.org/dc/terms/Agent"/>
        <rdf:type rdf:resource="http://erlangen-crm.org/current/E21_Person"/>
        <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">Geary, John White, 1819-1873.</rdfs:label>
      </foaf:Person>
    </foaf:focus>
  </skos:Concept>
</associatedWith>

Page 30: Connections:

LC's MARCXML to RDF/Dublin Core:

dc:subject "Geary, John White, 1819-1873."

Page 31: Connections:

Simile MARC to MODS to RDF:

<modsrdf:subject rdf:resource="http://simile.mit.edu/2006/01/Entity#Geary_John_White_18191873"/>
<rdf:Description rdf:about="http://simile.mit.edu/2006/01/Entity#Geary_John_White_18191873">
  <rdf:type rdf:resource="http://simile.mit.edu/2006/01/ontologies/mods3#Person"/>
  <modsrdf:fullName>Geary, John White </modsrdf:fullName>
  <modsrdf:dates>1819-1873</modsrdf:dates>
</rdf:Description>
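The three outputs above identify the same person in three incompatible ways. The sketch below reproduces the two URI fragments from the one heading to make the mismatch concrete. Both slug functions are reverse-engineered guesses from the examples shown on the slides, not the actual logic of the ArchivesHub or Simile stylesheets.

```python
import re


def archiveshub_style_slug(heading: str) -> str:
    """ASSUMED rule: lowercase, then keep only letters, digits, and hyphens."""
    return re.sub(r"[^a-z0-9-]", "", heading.lower())


def simile_style_slug(heading: str) -> str:
    """ASSUMED rule: drop punctuation (hyphens included), join words with underscores."""
    words = re.sub(r"[^A-Za-z0-9 ]", "", heading).split()
    return "_".join(words)


heading = "Geary, John White, 1819-1873."

# One heading, three different local identifiers.
local_ids = {
    "archiveshub_style": archiveshub_style_slug(heading),
    "simile_style": simile_style_slug(heading),
    "dc_literal": heading,  # the LC Dublin Core output keeps a plain string
}
```

None of the three values equals another, so triples produced by different stylesheets will not connect on this person until the identifiers are reconciled, for example against a shared authority URI.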

Page 32: Connections:

We learned:

Linked data is HUGE

It’s coming at us FAST

It’s not “cooked” yet

Page 33: Connections:

More learnings

• We learned more by doing than by "class".

• Making DBPedia mappings or links by hand is very time consuming! We need better tools.

• We need to spend a lot more time learning about OWL, and linked data modeling.

Page 34: Connections:

Challenges

• Easily available tools are not ideal!
• Skills we needed more of: HTML5, CSS, JavaScript
• Time!
• Visualization/killer app not there yet.
• Can't do things without the data! No timeline if no dates!

Page 35: Connections:

What we got out of it

Test triplestore for training and more development

Better ideas on what to pilot next

Convinced some doubters

"Gut knowledge" about triples, SPARQL, scale

Beginning to realize how this can be so much more than a better way to provide "search"

Page 36: Connections:

Outside our reach for now

Transform ILS system to use triple store instead of MARC

Create hub of all data our researchers might want

Make a bank of shared transformations for EAD, MARC, etc.

Shared vocabulary mappings

Social/networking aspect (e.g. Vivo, OpenSocial...) - need a culture shift?

Page 37: Connections:

Next? Maybe...

Build user navigation?

More Civil War triples including other local institutions’ stuff?

Publishing plan?

Integrate ILS with DBPedia links?

Suite of “portal tools” for scholars?

Use linked data for crowdsourcing metadata?

More classes?

Connect with others at Emory around linked data

Page 38: Connections:

Recommendation: Individual Institutions

• Focus on unique digital content
• Publish unique triples
• Reuse existing linked data

Page 39: Connections:

Recommendation: Community

• Create standards or best practices
• Grow our skills
• Test and evaluate tools
• Develop tools

Page 40: Connections:

Recommendation: Librarians’ Role?

• Interdisciplinary linking?
• Metadata librarians - Linking association and normalization

Page 41: Connections:

Acknowledgements

Connections group sponsors: Lars Meyer, John Ellinger

Connections Pilot team: Laura Akerman (leader), Tim Bryson, Kim Durante, Kyle Fenton, Bernardo Gomez, Elizabeth Roke, John Wang

Fellows who joined us: Jong Hwan Lee, Bethany Nash

Our website: https://scholarblogs.emory.edu/connections/

Laura Akerman, [email protected]

John Wang, [email protected]

Page 42: Connections:

Thanks

Q&A