Code4lib Digital Content Integrated with ILS Data for User Discovery: Lessons Learned
Digital Content Integrated with ILS Data for User Discovery:
Lessons Learned
(A Real Use Case for RDF!)
Code4Lib 2015
Portland, OR
February 11
Naomi Dushay / [email protected]
Laney McGlohon / [email protected]
[Diagram: ILS, MARC, Course Reserves, Indexing Code; DOR, XML Data, Harvestor, Indexing Code; Solr Master Index]
Data Sources
Digital Collections Have Items
Item (MODS)
Collection (MODS)
Item (MODS)
Collection (MARC)
Item (MODS)
Collection (MARC)
Item (MARC)
Our Data is Ugly (shhhhh ….)
We Have Dups in ILS Data
• Data isn’t “work” based
• Catalog card data conversion
We Don’t Store Every Field in SearchWorks Solr
• Search performance, storage
• Can’t use atomic updates to Solr doc (only fields changed)
• Have to recreate Solr doc from MARC, with same code
• Digital Content Workflow Predated SearchWorks
• Multiple Workflows
• Poor Metadata QA
[Diagram: ILS, MARC, Course Reserves, Indexing Code; DOR, XML Data, Harvestor, Indexing Code; Solr Master Index]
Coll Rec In|Outside ILS, Items Outside the ILS
[Diagram: ILS, MARC, Course Reserves, Indexing Code; DOR, XML Data, Harvestor, Indexing Code; Merge Manager; Solr Master Index]
App to Manage Multiple Sources
Fail.
Why?
• Performance
• Storing Solr document pieces (in a database)
• Adding non-trivial app for writes to Solr
• Complexity
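To make the merge-manager idea concrete, here is a minimal Python sketch of an app that stores Solr document pieces per source and assembles them on demand. It is not the presenters' code: the class name `MergeManager`, the source labels, and the field names are all hypothetical, and an in-memory dict stands in for the database mentioned above.

```python
from collections import defaultdict


class MergeManager:
    """Hypothetical stand-in for an app that stores Solr document pieces."""

    def __init__(self):
        # doc id -> source name -> dict of Solr fields from that source
        self._pieces = defaultdict(dict)

    def put_piece(self, doc_id, source, fields):
        """Store (or replace) the fields one source contributes to a document."""
        self._pieces[doc_id][source] = dict(fields)

    def build_doc(self, doc_id):
        """Combine every stored piece into one multi-valued Solr document."""
        doc = {"id": doc_id}
        for source in sorted(self._pieces[doc_id]):
            for name, value in self._pieces[doc_id][source].items():
                if name == "id":
                    continue  # the id is fixed by the caller
                values = value if isinstance(value, list) else [value]
                merged = doc.setdefault(name, [])
                for v in values:
                    if v not in merged:
                        merged.append(v)
        return doc


mm = MergeManager()
# Illustrative pieces only; the field names are assumptions.
mm.put_piece("123", "ils_marc", {"title": "Maps of Portland", "format": "Book"})
mm.put_piece("123", "dor", {"format": "Image", "image_url": "https://example.org/img/123"})
doc = mm.build_doc("123")
```

Even in this toy form, every write to Solr first passes through a second store and a merge step, which is where the performance and complexity costs listed above come from.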
[Diagram: ILS, MARC, Course Reserves, Indexing Code; DOR, XML Data, Harvestor, Indexing Code; Solr Master Index]
Coll Rec In|Outside ILS, Items In|Outside the ILS
Fail.
Why?
• ILS updates overwrite (step on) digital work info
• ILS feed granularity issues
• Repeated Re-indexing
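The "steps on" problem follows from how Solr handles a plain add: a document sent with an existing unique key replaces the whole stored document. The Python sketch below imitates that behavior with a toy in-memory index; `FakeSolrIndex` and the field names are hypothetical and only illustrate the failure mode, not the actual SearchWorks pipeline.

```python
class FakeSolrIndex:
    """Toy index: adding a doc replaces any existing doc with the same id."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Full-document replace, keyed on the unique id.
        self.docs[doc["id"]] = dict(doc)


index = FakeSolrIndex()

# 1. The ILS indexing code writes the record built from MARC.
index.add({"id": "123", "title": "Maps of Portland"})

# 2. The digital content indexing code rewrites it with extra fields.
index.add({
    "id": "123",
    "title": "Maps of Portland",
    "image_url": "https://example.org/img/123",  # assumed field name
})

# 3. A later ILS update regenerates the doc from MARC alone.
index.add({"id": "123", "title": "Maps of Portland, revised"})

# The digital fields are gone until the digital content is re-indexed.
lost = "image_url" not in index.docs["123"]
```

Restoring the lost fields means running the digital content indexing again after each ILS update, which matches the repeated re-indexing noted above.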
Coll Rec In|Outside ILS, Items In|Outside the ILS
Solr Atomic Updates
We Don’t Store Every Field in SearchWorks Solr – Currently
• Search performance, storage
• Can’t use atomic updates to Solr doc (only fields changed)
• Have to recreate Solr doc from MARC, with same code
Would Require:
• Performance Testing / Load Testing
• Changes to stable ILS MARC Solr code
• Changes to stable ILS course reserve Solr code
• Changes to digital content metadata Solr code
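For reference, a Solr atomic update sends only the changed fields, each wrapped in a modifier such as `set` or `add`, and Solr rebuilds the rest of the document from its stored fields. That is why unstored fields rule it out here. The Python sketch below only builds the JSON payload and does not contact a server; the helper `atomic_update` and the field names are hypothetical.

```python
import json


def atomic_update(doc_id, set_fields=None, add_fields=None):
    """Build one atomic-update document for Solr's JSON update format."""
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}   # replace the field's value
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}   # append to a multi-valued field
    return doc


# Field names below are assumptions for illustration.
payload = json.dumps([
    atomic_update(
        "123",
        set_fields={"image_url": "https://example.org/img/123"},
        add_fields={"collection": "coll-1"},
    )
])
```

A payload like this would be posted to the collection's update handler as JSON. Fields that are not stored cannot be carried over when Solr rebuilds the document, so adopting this approach would mean storing more fields, with the testing and code changes listed above.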
[Diagram: ILS, MARC, Course Reserves, Indexing Code; DOR, XML Data, Harvestor, Indexing Code; Solr Master Index]
Let There Be (More) Dups!