Natural Language Processing for LODLAM
Presented at IGeLU 2014
by Corey A Harper
2014-09-16
A brief intro to machine learning & data science for Libraries
Harper – IGeLU – NLP 4 LODLAM – Sept 16, 2014
Context
Narrative
Storytelling
The Library's story,
and the Archives' story,
but also…
Users' stories
Scholars' stories
Adding context through recombinant metadata
Scholars' & Users' Stories – Tim Sherratt (@wragge)
Also: http://discontents.com.au/a-map-and-some-pins-open-data-and-unlimited-horizons/
Library Authority Data
"Include links to other URIs, so that they can discover more things."
Short of providing and linking to URIs, this *is* authority data.
This is what our authority files are for.
Linked data is about context
authorities provide context
and yet our controlled vocabs
are nearly gone
because the interfaces to them were broken
The Death of Browse
• Next-Gen Discovery Systems don't make use of Authority Control
• "Browse" was/is broken as a UI Design
• Rich data in Authorities, disconnected from narrative, context, search
• Richer "Authority" type data outside libraries...
• "Next Gen Next Gen Discovery"…
Fuzzy Wuzzy – Seat Geek
Fuzzy Wuzzy – Awesome Library from SeatGeek
https://github.com/seatgeek/fuzzywuzzy
http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python
Slide courtesy of Doug Oard, Univ. of Maryland
Tools - Natural Language Processing
• DBPedia Spotlight: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
• Zemanta: http://www.zemanta.com/?wpst=1
• Open Calais: http://www.opencalais.com/
• Open Refine: http://openrefine.org/
• DataTXT: https://dandelion.eu/products/datatxt/
• AlchemyAPI: http://www.alchemyapi.com/
• FuzzyWuzzy: https://github.com/seatgeek/fuzzywuzzy
Where does this lead?
We need new interfaces
new tools
for a new kind of cataloger
for knowledge organization experts
Linked Jazz Back End
Primo PNX and Authorities
• Indexing Cross References
• New Browse Functionality
• Authority Control from Aleph / Alma
  • What about non-MARC, or non-Aleph Data?
• Matching Strings to Authorities
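
One way to picture matching strings to authorities is fuzzy matching of free-text names against authorized headings and their cross references. The sketch below uses Python's standard-library `difflib` as a stand-in for a dedicated library such as FuzzyWuzzy. It is not code from the talk: the tiny authority table, the `match_heading` helper, and the 0.8 cutoff are hypothetical choices for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical mini authority file: authorized heading -> variant forms
# (cross references). Real data would come from a source such as LCNAF.
AUTHORITIES = {
    "Twain, Mark, 1835-1910": ["Clemens, Samuel Langhorne, 1835-1910", "Mark Twain"],
    "Davis, Miles": ["Miles Davis", "Davis, Miles Dewey"],
}


def normalize(text):
    """Lowercase and replace punctuation with spaces so trivial differences do not hurt the score."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_heading(raw, authorities=AUTHORITIES, cutoff=0.8):
    """Return (authorized_heading, score) for the best match, or (None, score) below the cutoff."""
    best_heading, best_score = None, 0.0
    for heading, variants in authorities.items():
        # Compare against the heading itself and every cross reference.
        for candidate in [heading] + variants:
            score = similarity(raw, candidate)
            if score > best_score:
                best_heading, best_score = heading, score
    if best_score < cutoff:
        return None, best_score
    return best_heading, best_score


if __name__ == "__main__":
    for name in ["mark twain", "Miles Davies", "Completely Unrelated Name"]:
        print(name, "->", match_heading(name))
```

Because variants are scored alongside the authorized form, a string that matches a cross reference still resolves to the authorized heading. In practice the cutoff would need tuning against real data, and low-scoring candidates would go to a person for review.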
Enter Open Refine
http://freeyourmetadata.org/
Match strings to vocabularies…
Like LCNAF…
Or Wikipedia
Automated Authority Control?
Open Refine RDF Skeleton
Proposed System Architecture
Hydra Modeling & Architecture
• Approaches to Provenance
  • PROV-O
  • Named Graphs
  • Named Datastreams
• "n" nyucore "records"
  • Same properties defined for each
  • Keep data sources separate
  • Merge for display in Blacklight & export to Primo
Separate Metadata Datastreams
• source_metadata, enrich_metadata
  • Reload one or both without affecting the other or the native metadata
• native_metadata
  • Edited only through Hydra UI
  • Partitioned from external sources
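
The partitioning described on this slide can be sketched in a few lines. This is not Hydra or Fedora code: plain Python dictionaries stand in for the three datastreams, and the `reload_stream` and `merge_for_display` helpers, the merge rule (union of values, tagged with the stream they came from), and the sample field values are all assumptions made for illustration.

```python
# Each datastream holds the same properties but is stored and reloaded
# independently. Field names and values below are hypothetical.
STREAM_ORDER = ("source_metadata", "enrich_metadata", "native_metadata")


def new_record():
    """Create a record with one empty dict per datastream."""
    return {stream: {} for stream in STREAM_ORDER}


def reload_stream(record, stream, fields):
    """Replace one external datastream without touching the others."""
    if stream == "native_metadata":
        raise ValueError("native_metadata is edited through the UI, not reloaded")
    record[stream] = {key: list(values) for key, values in fields.items()}


def merge_for_display(record):
    """Combine all streams into property -> [(value, stream), ...], skipping duplicate values."""
    merged = {}
    for stream in STREAM_ORDER:
        for prop, values in record[stream].items():
            seen = {value for value, _ in merged.get(prop, [])}
            for value in values:
                if value not in seen:
                    merged.setdefault(prop, []).append((value, stream))
                    seen.add(value)
    return merged


if __name__ == "__main__":
    rec = new_record()
    rec["native_metadata"] = {"title": ["Oral history interview"]}
    reload_stream(rec, "source_metadata", {"subject": ["Jazz"]})
    reload_stream(rec, "enrich_metadata", {"subject": ["Jazz", "Jazz musicians"]})
    # Reloading the source stream leaves enrichment and native data as they were.
    reload_stream(rec, "source_metadata", {"subject": ["Jazz", "Music"]})
    print(merge_for_display(rec))
```

Keeping the stream name next to each merged value is one simple way to carry provenance through to the display layer; a production system might instead record it with PROV-O statements or named graphs.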
Metadata Provenance
Fedora Datastreams
Blacklight User Interface
Where does this lead?
We need new interfaces
new tools
for a new kind of cataloger
for knowledge organization experts
A Role for Ex Libris
• Alma &/or Primo
  • Named Entity Recognition
  • Vocabulary Reconciliation
  • Provenance Management
• Primo Central
  • Named Entity Recognition on Full Text
  • Auto Classification
A bit louder...
we need new interfaces
we need enterprise tools
integrated into our metadata management systems
for a new kind of cataloger
for knowledge organization experts
Simplified Workflow Proposal
More Tools – At Programming Level
• Open NLP: https://opennlp.apache.org/
• Stanford NLP software: http://nlp.stanford.edu/software/index.shtml
• Python Tools
  • scikit-learn, pandas, NLTK, SciPy, NumPy
  • https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience
  • http://pandas.pydata.org/
  • http://www.nltk.org/
More Data Science-ey Tools
http://www.rexeranalytics.com/Data-Miner-Survey-Results-2013.html
Data Science Techniques
• Feature Extraction / Feature Engineering
• Predictive Modeling
• Probabilistic Classification – Large Multi-Class Problems
• Text Analytics
  • Vectorization
  • Bags & Sets of Words
  • TF/IDF
  • N-Grams
  • Sparse Matrices
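
To make the text-analytics terms concrete, here is a minimal standard-library sketch of vectorization: word n-grams, a bag of words stored sparsely as a dictionary, and TF/IDF weighting. The toy corpus and all function names are hypothetical. The IDF variant used, 1 + ln(total document count / documents containing the term), is one common form; libraries such as scikit-learn apply their own smoothing and normalization.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real input would be catalog records or full text.
DOCS = [
    "linked data for libraries",
    "authority data for linked data",
    "machine learning for libraries",
]


def tokens(text, n=1):
    """Split on whitespace and return word n-grams (n=1 gives single words)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def bag_of_words(text, n=1):
    """Term -> count. Only terms that occur are stored, so the vector is sparse."""
    return Counter(tokens(text, n))


def idf(term, bags):
    """1 + ln(total documents / documents containing term); term must occur in some bag."""
    containing = sum(1 for bag in bags if term in bag)
    return 1.0 + math.log(len(bags) / containing)


def tfidf(bags):
    """Weight each document's raw term counts by IDF."""
    return [{term: count * idf(term, bags) for term, count in bag.items()} for bag in bags]


if __name__ == "__main__":
    bags = [bag_of_words(doc) for doc in DOCS]
    for doc, vector in zip(DOCS, tfidf(bags)):
        print(doc, "->", {t: round(w, 2) for t, w in sorted(vector.items())})
    print(tokens(DOCS[0], n=2))  # word bigrams
```

A term that appears in every document, such as "for" in this corpus, gets the minimum IDF of 1, while rarer terms are weighted up. Using a set of terms instead of a `Counter` would give the "set of words" representation.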
Simple Example – Predict Yelp Star Ratings
Fitting a Model – Naïve Bayes
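
The code shown on this slide is not preserved in the transcript. As a rough illustration of the idea, the following sketch fits a multinomial Naive Bayes classifier with add-one smoothing using only the standard library. The `NaiveBayes` class and the four toy reviews are hypothetical and are not the Yelp data or the model from the talk; a library such as scikit-learn would be the usual choice in practice.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        """Return the class with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy reviews, not real Yelp data.
    texts = ["great food great service", "terrible food rude service",
             "great place friendly staff", "terrible wait rude staff"]
    stars = [5, 1, 5, 1]
    model = NaiveBayes().fit(texts, stars)
    print(model.predict("friendly staff and great food"))
    print(model.predict("rude and terrible"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one smoothing keeps a word that was unseen for one class from driving that class's probability to zero.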
Data Science Venn Diagram
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
$$1 + \ln\left(\frac{\text{Total Document Count}}{\text{Documents Containing Term}}\right)$$
http://www.amazon.com/Data-Science-Business-data-analytic-thinking/dp/1449361323
Where can we go from here?
• NER is just the beginning
• Feature Engineering
• Hiring Statisticians
• Clustering & Classification
• Vocabulary Pruning and Engineering
  • Manageable 10-20k Class Text Classification Problems
  • Domain Specific
• Ex Libris' Activity in this space
Thanks!
212.998.2479
@chrpr