Digital Content Integrated with ILS Data for User Discovery: Lessons Learned (A Real Use Case for RDF!)

Code4lib 2015, Portland, OR, February 11

Naomi Dushay / [email protected]

Laney McGlohon / [email protected]


Data Sources

[Diagram: ILS (MARC, Course Reserves) and DOR (XML Data, Harvester) feeding Indexing Code into the Solr Master Index]

Digital Collections Have Items

[Diagram: collection and item records in different formats: a MODS collection with MODS items; a MARC collection with MODS items; a MARC collection with MODS and MARC items]
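One way to see why these format combinations matter: an item's Solr document needs to point at its collection's Solr document, and that id depends on where the collection record lives. A minimal sketch, assuming (hypothetically) that a collection with a MARC record in the ILS is indexed under its catalog key and a MODS-only collection under its DOR druid; `CollectionRecord`, the field names, and the sample ids are illustrative, not the SearchWorks schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionRecord:
    # Hypothetical model: every digital collection has a DOR druid;
    # catkey is set only when a MARC record for it exists in the ILS.
    druid: str
    catkey: Optional[str] = None


def collection_solr_id(coll: CollectionRecord) -> str:
    """Return the Solr id item docs should link to (assumed rule)."""
    # Prefer the ILS record so the collection is found under its catalog key.
    return coll.catkey if coll.catkey else coll.druid


def item_doc(item_druid: str, title: str, coll: CollectionRecord) -> dict:
    """Build a Solr doc for a MODS item that points at its collection."""
    return {
        "id": item_druid,
        "title_display": title,                    # hypothetical field name
        "collection": [collection_solr_id(coll)],  # hypothetical field name
    }
```

For example, `item_doc("ee555ff6666", "Map 1", CollectionRecord(druid="cc333dd4444", catkey="9615156"))` links the item to `"9615156"`. Items that are themselves in the ILS would get MARC-derived docs instead; this sketch covers only MODS items.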

To merge or not to merge?

Our Data is Ugly (shhhhh ….)

We Have Dups in ILS Data

• Data isn’t “work” based (several records can describe the same work)

• Catalog card data conversion

We Don’t Store Every Field in SearchWorks Solr

• Search performance, storage

• So we can’t use Solr atomic updates (sending only the changed fields)

• Have to recreate the whole Solr doc from MARC, with the same indexing code

• Digital Content Workflow Predated SearchWorks

• Multiple Workflows

• Poor Metadata QA

https://flic.kr/p/bQJRaZ


Collection Record In or Outside the ILS, Items Outside the ILS

https://flic.kr/p/7Q5RuB

[Diagram: the indexing architecture with a Merge Manager added]

App to Manage Multiple Sources

App to Manage Multiple Sources

Fail.

Why?

• Performance

• Storing Solr document pieces (in a database)

• Adding a non-trivial app to the Solr write path

• Complexity
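The cost of the merge-manager approach shows up even in a toy version. This is a hypothetical, in-memory stand-in: the dict plays the role of the database of Solr document pieces, and `MergeManager`, the source names, and the field names are illustrative. Every write from any one source has to read all stored pieces for that id and rebuild the complete Solr document.

```python
from collections import defaultdict


class MergeManager:
    """Hypothetical sketch: keep per-source pieces of each Solr document and
    rebuild the whole document whenever any one source writes."""

    def __init__(self):
        # doc id -> source name -> {field: [values]}; stands in for the database
        self._pieces = defaultdict(dict)

    def update(self, doc_id, source, fields):
        """Replace one source's piece, then return the rebuilt full document."""
        self._pieces[doc_id][source] = {f: list(v) for f, v in fields.items()}
        return self.merged(doc_id)

    def merged(self, doc_id):
        """Combine all pieces; multivalued fields are unioned in source order."""
        doc = {"id": doc_id}
        for source in sorted(self._pieces[doc_id]):
            for field, values in self._pieces[doc_id][source].items():
                bucket = doc.setdefault(field, [])
                for value in values:
                    if value not in bucket:
                        bucket.append(value)
        return doc
```

Usage: after `update("123", "ils_marc", {...})` and `update("123", "dor", {...})`, each returned doc already includes both sources' fields, so every ILS change also means re-sending the DOR fields to Solr.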


Collection Record In or Outside the ILS, Items In or Outside the ILS

Fail.

Why?

• ILS updates step on (overwrite) the digital work info

• ILS feed granularity issues

• Repeated Re-indexing
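Why an ILS update steps on the digital work info: by default, adding a document to Solr whose `id` already exists replaces the old document wholesale. A minimal sketch, using a plain dict as a stand-in for the Solr index; `marc_to_doc`, `index_from_ils`, `index_with_digital`, and the field names are hypothetical.

```python
def add_doc(index, doc):
    """Mimic Solr's default add: same id means the old doc is replaced entirely."""
    index[doc["id"]] = dict(doc)


def marc_to_doc(catkey, title):
    """Stand-in for the ILS MARC indexing code."""
    return {"id": catkey, "title": title}


def index_from_ils(index, catkey, title):
    # ILS feed: rebuilds the doc from MARC alone; it knows nothing about DOR.
    add_doc(index, marc_to_doc(catkey, title))


def index_with_digital(index, catkey, title, druid):
    # Digital content indexer: rebuilds from MARC with the same code,
    # then adds the digital object info.
    doc = marc_to_doc(catkey, title)
    doc["druid"] = druid
    add_doc(index, doc)
```

Running `index_with_digital(...)` and then `index_from_ils(...)` for the same catkey leaves a doc with no `druid`. Restoring it means running the digital indexer again after every ILS change, which is the repeated re-indexing in the last bullet.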

Collection Record In or Outside the ILS, Items In or Outside the ILS

Solr Atomic Updates

We Don’t Store Every Field in SearchWorks Solr (Currently)

• Search performance, storage

• So we can’t use Solr atomic updates (sending only the changed fields)

• Have to recreate the whole Solr doc from MARC, with the same indexing code

Would Require:

• Performance Testing / Load Testing

• Changes to stable ILS MARC Solr code

• Changes to stable ILS course reserve Solr code

• Changes to digital content metadata Solr code
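For reference, a Solr atomic update sends only the document `id` and the changed fields, each wrapped in a modifier such as `set`. Solr rebuilds the rest of the document from its stored values, which is why every field must be stored (or, in newer Solr versions, have docValues), and why this option would touch all of the indexing code listed above. A minimal sketch that only builds the JSON request; the Solr URL, the core name `searchworks`, and the field name `druid` are assumptions for illustration.

```python
import json
import urllib.request


def atomic_update_payload(doc_id, changes):
    """Build a Solr atomic-update body: the id plus only the changed fields.

    Each changed field is wrapped in Solr's "set" modifier; Solr fills in
    every other field from the stored copy of the existing document.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return [doc]


def build_update_request(core_url, payload):
    """Prepare (but do not send) a JSON POST to the core's /update handler."""
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage: `build_update_request("http://localhost:8983/solr/searchworks", atomic_update_payload("123", {"druid": "ab123cd4567"}))` yields a request that `urllib.request.urlopen` could send to a running Solr. If any non-copyField field is neither stored nor docValues, its value is lost on such an update.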


Let There Be (More) Dups!

Automagically Amend ILS MARC

[Diagram: ILS (MARC, Course Reserves), DOR, Indexing Code + UI code, Solr Master Index]
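One way to picture amending the ILS MARC automatically: add a link to the digital object to the MARC record, so the existing, stable MARC indexing code picks it up unchanged. A minimal sketch, assuming a simplified MARC model (a dict holding the 001 control number and a list of `(tag, indicators, subfields)` tuples) and a hypothetical `druids_by_catkey` lookup built from DOR. The slide does not say which MARC field is added; an 856 (Electronic Location and Access) with a PURL in `$u` is an illustrative assumption.

```python
PURL_BASE = "https://purl.stanford.edu/"  # assumed URL pattern for digital objects


def amend_marc(record, druids_by_catkey):
    """Return a copy of a simplified MARC record with one 856 per linked druid.

    record = {"001": catkey, "fields": [(tag, indicators, [(code, value), ...])]}
    Records with no linked digital object come back unchanged (but copied).
    """
    catkey = record["001"]
    fields = list(record["fields"])
    for druid in druids_by_catkey.get(catkey, []):
        # First indicator 4 = HTTP; second indicator 0 (the resource itself)
        # is an illustrative choice.
        fields.append(("856", "40", [("u", PURL_BASE + druid)]))
    return {"001": catkey, "fields": fields}
```

Because the original record is copied rather than mutated, the amendment can be applied on every harvest without accumulating duplicate 856 fields in the source data.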

StanfordSync & Embeddable Digital Objects

Thank You!

February 11th, 2015

[email protected]

[email protected]