Delivering Data For New Generations of Research
![Page 1: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/1.jpg)
HATHI TRUST A Shared Digital Repository
Delivering Data For New Generations of Research
Strategies and Challenges
Jeremy York
NISO/BISG Forum
ALA 2010
![Page 2: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/2.jpg)
Introduction
• Digital Repository
– Initial focus on digitized book and journal content
– “Light” archive
• Collections and Collaboration
– Comprehensive collection
– Shared strategies
– Local services
– Public Good
![Page 3: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/3.jpg)
Content Distribution
6,173,575 – Total
1,177,667 – Public Domain
* As of June 15, 2010
![Page 4: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/4.jpg)
Language Distribution (1)
* As of June 15, 2010
![Page 5: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/5.jpg)
Language Distribution (2)
The next 40 languages make up ~13% of total
* As of June 15, 2010
![Page 6: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/6.jpg)
Originating Institution
* As of June 15, 2010
![Page 7: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/7.jpg)
Content over time
* As of June 15, 2010
![Page 8: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/8.jpg)
Content Growth
![Page 9: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/9.jpg)
![Page 10: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/10.jpg)
Data Distribution & APIs
• OAI-PMH
• Metadata files
• Bibliographic API
• Data API
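As a rough illustration of what protocol-based metadata harvesting looks like, the sketch below builds an OAI-PMH request URL. `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH standard; the base URL and the helper name `oai_request_url` are hypothetical placeholders, not the repository's actual endpoint or client code.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; substitute the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return BASE_URL + "?" + urlencode(query)


# Ask for all records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(url)
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token the server returns to page through the remaining records.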
![Page 11: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/11.jpg)
Extended Services
• Community Development Environment
• Non-Google Ingest
• Non-Book/Non-Journal Ingest
• Computational Research
![Page 12: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/12.jpg)
Strategies for Computational Research
• Data distribution
• Protocol-based access
• Research Center
![Page 13: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/13.jpg)
![Page 14: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/14.jpg)
SEASR Architecture
[Architecture diagram. Labels recoverable from the slide: Components; Virtualization Infrastructure; Meandre Infrastructure; Visualization; Component Repository; Component Discovery; Meandre Data-Intensive Flows; Apps; Services; Plugins; Web Apps; Analytics; Data; Developer Tools; Repositories; Analysis; Flows; User Interfaces; Cloud Computing; Visualizations; Meandre Workbench]
![Page 15: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/15.jpg)
SEASR @ Work – Tag Cloud
• Count tokens
• Filter options supported
• Stem words
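The three steps above can be sketched in a few lines. This is a minimal illustration, not the SEASR flow itself: the stop-word list is made up, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer.

```python
import re
from collections import Counter

# Illustrative stop-word filter; a real flow would use a fuller list.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}


def crude_stem(word):
    """Strip a few common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tag_cloud_counts(text):
    """Tokenize, filter stop words, stem, and count the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(kept)


counts = tag_cloud_counts("The scanned books and the scanning of a book")
for term, n in counts.most_common():
    print(term, n)
```

The resulting counts would drive the font size of each term in the cloud.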
![Page 16: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/16.jpg)
SEASR @ Work – Entity Mash-up
• Entity Extraction with OpenNLP or Stanford NER
• Locations viewed on Google Map
• Dates viewed on Simile Timeline
![Page 17: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/17.jpg)
SEASR @ Work – Entities To Network
• Identify entities
• Define relationships between entities within same sentence
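A minimal sketch of the second step, assuming entity extraction has already produced one list of entity names per sentence. The function name and the sample data are made up for illustration; each pair of entities sharing a sentence becomes a weighted edge.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences_entities):
    """Count how often each pair of entities appears in the same sentence."""
    edges = Counter()
    for entities in sentences_entities:
        # Deduplicate and sort so (a, b) and (b, a) map to the same edge.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


# Made-up sample: entities found in three sentences.
sample = [
    ["Waverley", "Fergus"],
    ["Fergus", "Flora", "Waverley"],
    ["Flora"],
]
edges = cooccurrence_edges(sample)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The edge weights can then be passed to any graph layout or network analysis tool.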
![Page 18: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/18.jpg)
SEASR @ Work – Text Clustering
• Clustering of Text by token counts
• Filtering options for stop words, Part of Speech
• Dendrogram Visualization
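To make clustering by token counts concrete, here is a small sketch that is not taken from SEASR. It assumes cosine distance between token-count vectors and single-linkage agglomerative clustering; the slide names neither choice. The merge history it returns is the information a dendrogram draws.

```python
import math
import re
from collections import Counter


def token_counts(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 minus the cosine similarity of two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(docs):
    """Single-linkage clustering; returns (left, right, distance) merges."""
    vectors = {name: token_counts(text) for name, text in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    cosine_distance(vectors[x], vectors[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up documents: "a" and "b" share vocabulary, "c" does not.
docs = {
    "a": "castle castle knight",
    "b": "castle knight knight",
    "c": "ship sea sea storm",
}
merges = agglomerate(docs)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```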
![Page 19: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/19.jpg)
SEASR @ Work – Audio Analysis
• NEMA: Executes a SEASR flow for each run
– Loads audio data
– Extracts features for every 10 sec moving window of audio
– Loads and applies the models
– Sends results back to the WebUI
• NESTER: Annotation of Audio via Spectral Analysis
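The moving-window step in the NEMA flow can be sketched as follows. Only the 10-second window length comes from the slide. The 5-second hop, the RMS energy feature, and the tiny sample rate are assumptions chosen to keep the example small, and the model-loading step is omitted.

```python
import math


def moving_windows(samples, sample_rate, window_sec=10.0, hop_sec=5.0):
    """Yield overlapping windows of window_sec seconds, advancing by hop_sec."""
    size = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]


def rms(window):
    """Root-mean-square energy, used here as a stand-in feature."""
    return math.sqrt(sum(x * x for x in window) / len(window))


# Made-up signal: 20 s of constant amplitude followed by 10 s of silence,
# at an unrealistically low rate of 4 samples per second.
sample_rate = 4
samples = [1.0] * 80 + [0.0] * 40
features = [rms(w) for w in moving_windows(samples, sample_rate)]
print(features)
```

Each feature value would then be passed to the trained models, one prediction per window.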
![Page 20: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/20.jpg)
SEASR @ Work – Zotero
• Plugin to Firefox
• Zotero manages the collection
• Launch SEASR Analytics
– Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR
– Zotero Export to Fedora through SEASR
– Saves results from SEASR Analytics to a Collection
• Launch MONK Processing
– MONK DB Ingestion Workflow
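The slide does not say which JUNG importance algorithm ranks the authors. As one possibility, the sketch below runs a plain PageRank power iteration over a made-up citation graph; the function, its parameters, and the data are illustrative and are not the JUNG API.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Made-up citation network: an edge (X, Y) means author X cites author Y.
citations = [("A", "C"), ("B", "C"), ("C", "D")]
ranks = pagerank(citations)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, round(ranks[author], 3))
```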
![Page 21: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/21.jpg)
SEASR @ Work – Emotion Tracking
The goal is a visualization of this type that tracks emotions across a text document (leveraging flare.prefuse.org)
![Page 22: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/22.jpg)
Sentiment Analysis: Visualization
![Page 23: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/23.jpg)
Person Extraction: Scott's Waverley, Ivanhoe, and The Heart of Midlothian.
![Page 24: Delivering Data For New Generations of Research](https://reader035.fdocuments.in/reader035/viewer/2022081513/56815196550346895dbfcc22/html5/thumbnails/24.jpg)
Location Extraction: Top: Walter Scott's Waverley. Bottom: Maria Edgeworth's Castle Rackrent.