NYPL Labs 9-10-13 HacksHackers Presentation
Transcript of NYPL Labs 9-10-13 HacksHackers Presentation
The Great Data Migration
or... hackin’ the library with nypl labs
9/10/13
What is NYPL Labs?
Ben Vershbow | Founder & Manager - NYPL [email protected] | @subsublibrary
@nypl_labs | #HacksHackers | Ben Vershbow | [email protected] | @subsublibrary
New York Public Library
existing patron base + global community of users
free for all to use + hack / build / improve
books, archives, images, documents, A/V etc. + digital material, data & APIs
Map Warper
maps.nypl.org
What’s on the Menu?
menus.nypl.org
Stereogranimator
stereo.nypl.org
NYPL | NYPL | BPL
Direct Me NYC: 1940
directme.nypl.org
Genealogy community
NYPL collections
U.S. Geological Survey
OpenStreetMap (via MapBox)
New York Times API
NYPL users & staff
Text
api.repo.nypl.org
Crowd-sourcing the transcription of historical theater programs
Ensemble
ensemble.nypl.org
Paul Beaudoin | [email protected] | @nonword
@nypl_labs | #HacksHackers | Ensemble | Paul Beaudoin | [email protected] | @nonword
FromThePage / Transcribe Bentham
Scripto
T-PEN
Freeform text transcription is not complex entity extraction
Crowd-sourcing complex entity extraction from documents with inconsistent layouts
e.g. historical theater programs
Ensemble
NYPL Labs | What’s on the Menu?
Transcribable & DocumentCloud
Zooniverse | Notes From Nature
Zooniverse | Old Weather
NYPL Labs | Ensemble
http://ensemble.nypl.org
Built from Scribe (https://github.com/zooniverse/scribe)
demo
FromThePage beta.fromthepage.com
T-PEN t-pen.org
What’s on the Menu? menus.nypl.org
Transcribable github.com/propublica/transcribable
Notes from Nature notesfromnature.org
Old Weather oldweather.org
Ensemble ensemble.nypl.org
Archives & Manuscripts
archives.nypl.org
Trevor Thornton | [email protected] | @trevorthornton
Matt Miller | [email protected] | @thisismmiller
or: where to find Timothy Leary’s Powerglove
Unique, unpublished materials: correspondence, personal papers, organizational records, literary manuscripts, AV documentation, electronic records
Typically included within discrete collections, which are often acquired in whole
Finding aids provide researchers with guidance on collection contents
EAD (Encoded Archival Description): XML schema for encoding finding aids
NYPL Archives & Manuscripts
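To make the EAD point concrete, here is a minimal sketch of reading the component hierarchy out of a finding aid with Python's standard library. The XML fragment is invented and heavily simplified (real EAD files are far larger and often namespaced), and `list_components` is a hypothetical helper name; this is not a description of how archives.nypl.org is built.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (no namespace, invented content).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1950-1955</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def list_components(ead_xml):
    """Return (depth, level, title) for each numbered component (c01, c02, ...)."""
    root = ET.fromstring(ead_xml)
    rows = []

    def walk(node, depth):
        for child in node:
            # Numbered components are tagged c01, c02, ... in the description.
            if child.tag.startswith("c") and child.tag[1:].isdigit():
                title = child.findtext("did/unittitle", default="")
                rows.append((depth, child.get("level"), title))
                walk(child, depth + 1)

    dsc = root.find("archdesc/dsc")
    if dsc is not None:
        walk(dsc, 0)
    return rows


for depth, level, title in list_components(SAMPLE_EAD):
    print("  " * depth + f"{level}: {title}")
```

Flattening the nested components into rows like this is one plausible first step toward presenting a finding aid as browsable web pages rather than a single long document.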
@nypl_labs | #HacksHackers | Archives & Manuscripts | Trevor Thornton | [email protected] | @trevorthornton | Matt Miller | [email protected] | @thisismmiller
The traditional model for presenting EAD-encoded finding aids
What we did (more or less)
System overview
Video Annotation & Synchronization
digitalcollections.nypl.org/tools/video/compose
For NYPL Digital Collections
Jerome Robbins Dance Division
Brian Foo | [email protected] | @beefoo
Scenarios
Jerome Robbins Dance Division
Enhance & Improve video data
• e.g. Sync multiple angles of the same performance
• e.g. Annotate a performance
Discovery
• e.g. Compare multiple performances
Instruction
• e.g. Enhance lecture with multimedia
Probably many more
• e.g. Mash-ups
@nypl_labs | #HacksHackers | Video Annotation & Synchronization | Brian Foo | [email protected] | @beefoo
Technology Used
Ruby on Rails (RoR) - Backend Framework
Backbone.js - JavaScript MVC Framework
Brightcove - Video delivery platform
Popcorn.js - HTML5 media framework by Mozilla
• Does not natively support multi-video
• Does not natively support Brightcove
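Since multi-video playback has to be handled on top of the media framework, one way to think about syncing several angles of the same performance is to give each clip an offset on a shared master timeline. The sketch below shows that idea in Python for brevity (the tool itself is JavaScript); the `Clip` class, `local_times` function, and the sample offsets are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    offset: float    # seconds on the master timeline at which this clip starts
    duration: float  # clip length in seconds


def local_times(clips, master_time):
    """Map a master-timeline position to each clip's own playhead.

    Returns {clip name: seconds into the clip}, or None for clips that
    have not started yet or have already ended at that moment.
    """
    result = {}
    for clip in clips:
        t = master_time - clip.offset
        result[clip.name] = t if 0 <= t <= clip.duration else None
    return result


clips = [
    Clip("front", offset=0.0, duration=120.0),
    Clip("side", offset=4.5, duration=100.0),
]
print(local_times(clips, 10.0))
```

Seeking any one player then becomes a matter of converting its position back to master time and re-deriving the others.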
demo
teh vectorizor
github.com/NYPL/map-vectorizer
mauricio giraldo arteaga
NYPL Labs
[email protected] | @mgiraldo
background
@nypl_labs | #HacksHackers | teh vectorizor | Mauricio Giraldo Arteaga | [email protected] | @mgiraldo
building =
not paper
not black
> 20 m² (~215 ft²)
< 3,000 m² (~32,300 ft²)
+ attributes (color, dots, crosses...)
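The rules above can be read as a filter over candidate polygons. The following is a minimal sketch of such a filter, not the code in map-vectorizer: it assumes polygon coordinates are already in meters, and the `PAPER` color, the `tolerance` value, and all function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula; points are (x, y) in meters."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


PAPER = (233, 221, 196)  # hypothetical RGB value standing in for the paper color
BLACK = (0, 0, 0)


def color_distance(a, b):
    """Euclidean distance between two RGB colors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def is_building(points, fill_rgb, min_area=20.0, max_area=3000.0, tolerance=40.0):
    """Apply the slide's rules: not paper, not black, and within the area bounds."""
    if color_distance(fill_rgb, PAPER) < tolerance:
        return False
    if color_distance(fill_rgb, BLACK) < tolerance:
        return False
    return min_area < polygon_area(points) < max_area


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 100 square meters
print(is_building(square, (230, 150, 150)))
```

Attributes such as color, dots, and crosses would be attached to the polygons that pass this test; that step is not shown here.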
process
https://github.com/NYPL/map-vectorizer
test it! (please)
gdal_polygonize.py
generates polygons automagically!
$ gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
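For running the same command over many map sheets, it can be wrapped in a small script. This is only a sketch: the function names are hypothetical, the arguments simply mirror the command shown above, and `run_polygonize` assumes GDAL's Python scripts are installed and on the PATH.

```python
import shlex
import subprocess


def polygonize_command(raster_path, shapefile_path, layer_name):
    """Build the same argument list as the command shown above."""
    return [
        "gdal_polygonize.py",
        raster_path,
        "-f", "ESRI Shapefile",
        shapefile_path,
        layer_name,
    ]


def run_polygonize(raster_path, shapefile_path, layer_name):
    """Run gdal_polygonize.py; raises CalledProcessError if the tool fails."""
    subprocess.run(
        polygonize_command(raster_path, shapefile_path, layer_name),
        check=True,
    )


cmd = polygonize_command("test.tif", "test.shp", "test")
print(shlex.join(cmd))  # shows the command without executing it
```

Passing the arguments as a list avoids shell quoting problems with the space in "ESRI Shapefile".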
we need to optimize the input
we need to simplify the output
(for those polygons that we care about)
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
pts = spsample(polygon, n=500, type="hexagonal")
x.as = ashape(pts@coords,alpha=2.0)
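To illustrate what sampling points inside a polygon involves, here is a rough Python analogue of the "regular" case: lay a grid over the bounding box and keep the points that fall inside. It is a simplified sketch with hypothetical function names, not a port of the R `spsample` function (which, among other differences, takes a target point count rather than a spacing), and it does not attempt the alpha-shape step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x, y), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_regular(polygon, spacing):
    """Return grid points (cell centers) that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


# An L-shaped footprint, units arbitrary.
footprint = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
pts = sample_regular(footprint, spacing=1.0)
print(len(pts), "points sampled")
```

A denser grid follows the outline more closely at the cost of more points for the hull step to process, which is the trade-off the different `n` values above explore.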
we need to validate the output
(polygonzo!)
demo
Old NYC
Dan Vanderkam
SOME OTHER [email protected] | @danvdk
~40,000 images
Mostly taken from 1920–1950
Many were taken by Percy Loomis Sperr, who was commissioned by the library to take photographs of buildings soon to be demolished
Milstein Collection
@nypl_labs | #HacksHackers | Old NYC | Dan Vanderkam | [email protected] | @danvdk
demo
Images on the NYPL site were small, pictures even smaller.
What’s MrSID?
Challenges
First find the areas that aren’t brown:
Then find the Rectangles:
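The two steps above can be sketched in plain Python: mark every pixel that is not close to the backing color, then take the bounding box of each connected region. This is an illustration under stated assumptions, not the code used for Old NYC; the brown reference color, the tolerance, and the function names are hypothetical, and a real scan would need noise handling that is omitted here.

```python
from collections import deque


def is_brown(rgb, brown=(150, 110, 70), tolerance=60):
    """Hypothetical test: is this pixel close to an assumed card-backing brown?"""
    return sum((a - b) ** 2 for a, b in zip(rgb, brown)) ** 0.5 < tolerance


def find_photo_boxes(image, min_pixels=4):
    """Return bounding boxes (left, top, right, bottom) of non-brown regions.

    `image` is a list of rows, each a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or is_brown(image[y][x]):
                continue
            # Flood-fill one connected non-brown region (4-connectivity).
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom, count = x, y, x, y, 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if not (0 <= nx < width and 0 <= ny < height):
                        continue
                    if seen[ny][nx] or is_brown(image[ny][nx]):
                        continue
                    seen[ny][nx] = True
                    queue.append((nx, ny))
            # Ignore specks smaller than min_pixels.
            if count >= min_pixels:
                boxes.append((left, top, right, bottom))
    return boxes


# Tiny synthetic "scan": a brown card with two gray photos on it.
BROWN, GRAY = (150, 110, 70), (200, 200, 200)
scan = [[BROWN] * 8 for _ in range(6)]
for y in range(1, 3):
    for x in range(1, 4):
        scan[y][x] = GRAY
for y in range(3, 5):
    for x in range(5, 7):
        scan[y][x] = GRAY
print(find_photo_boxes(scan))
```

Each box can then be cropped out of the scan as a separate photograph.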
UI work
Better geocoding for boroughs with complicated streets
Keep your eyes out for an Old NYC launch this fall!
http://www.danvk.org/wp/2013-02-09/finding-pictures-in-pictures/
What’s left?
this thing we’re doing is way too big to do alone
@nypl_labs | #HacksHackers | What’s Next | Dave Riordan | [email protected] | @riordan
this used to be a reservoir of water, now it’s a reservoir of knowledge
–an anonymous nypl docent
now it’s a reservoir of data
now it’s time to use it
datasets
Maps (GIS + GeoTIFFs) | Digital Collections API | Menus API | City Directories | Archives | Ensemble API
there will be more
Hackathons
Publishing Hackathon | Maphack
there will be more
NYPL Tech Challenges (coming soon)
like the x-prize but for way lower stakes and civic good
Questions for you:
what kind of things would you want to work on with nypl labs?
making ebooks easier to borrow?
opening up historical social networks?
we want to know what questions you’re interested in
how you want to use the library today...
...will be how everyone will use the library very soon.
help us make that happen
it’s gonna be awesome
@nypl_labs | @subsublibrary | @nonword | @beefoo | @trevorthornton | @thisismattmiller | @mgiraldo | @riordan
Special thanks to: Chrys Wu