HTRC Workshop 101 THATCamp Gainesville April 24, 2014.

Post on 31-Dec-2015

HTRC Workshop 101

THATCamp Gainesville, April 24, 2014

Outline

• HathiTrust and HathiTrust Research Center overview

• How to Use the HTRC Portal

– Workset Builder

– Algorithm Analysis

• Opportunities to connect you with the HathiTrust Research Center

HathiTrust “Wow” Numbers

• 11,135,776 total volumes

• 5,801,121 book titles

• 290,893 serial titles

• 3,897,521,600 pages

• 499 terabytes

• 132 miles

• 9,048 tons

• Public Domain: 3,743,574 volumes (~34% of total)

http://www.hathitrust.org
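As a quick check on the figures above, the short Python sketch below recomputes the public-domain share and the average page count per volume from the slide's own numbers. The variable names are illustrative only; no figures beyond those on the slide are used.

```python
# Figures copied from the "Wow" Numbers slide.
total_volumes = 11_135_776
public_domain_volumes = 3_743_574
total_pages = 3_897_521_600

# Share of the collection that is in the public domain.
public_domain_share = public_domain_volumes / total_volumes

# Average number of pages per volume implied by the slide.
pages_per_volume = total_pages / total_volumes

print(f"Public domain share: {public_domain_share:.1%}")
print(f"Average pages per volume: {pages_per_volume:.0f}")
```

The share rounds to the ~34% quoted on the slide. The page total works out to exactly 350 pages per volume, which suggests it may be an estimate rather than a direct count.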

Content Distribution

Dates

Language Distribution

The top 10 languages make up ~86% of all content

[Diagram: HathiTrust and HTRC organization. Labels: Board of Governors; Executive Committee; Executive Director; HathiTrust Digital Library (90+ partners); University of Michigan; HathiTrust Research Center; University of Illinois; Indiana University; Data Copy #1; Data Copy #2]

HathiTrust Collection Builder

HTRC Portal

www.hathitrust.org/htrc

Log in to HTRC Portal

Create a Log In

How To Start a Workset

Log In Again to Workset Builder

Workset Builder

Why Worksets?

• The result of a first-level, rough filter

• Better scale for intensive analytics

• Provides essential scope for certain analytics

– Word frequency scope over Bacon’s essays

• Some tools (are trained to) work best on a narrow, homogeneous work-set

• Eliminate noise that would otherwise arise by asking questions across the whole of HT
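The points above can be illustrated with a minimal Python sketch: a workset is built with a rough first-level filter, and word frequencies are then computed only within that scope. This is not the HTRC Portal's API. The `collection` records, their ids and text, and the helper names `build_workset` and `word_frequencies` are all hypothetical, made up purely for illustration.

```python
import re
from collections import Counter

# Toy stand-in for a large collection. The ids, authors, and text are
# placeholders invented for this example, not real HathiTrust data.
collection = [
    {"id": "vol-1", "author": "Bacon", "text": "Truth and studies and truth"},
    {"id": "vol-2", "author": "Bacon", "text": "Studies of truth"},
    {"id": "vol-3", "author": "Other", "text": "Noise noise noise about truth"},
]


def build_workset(volumes, author):
    """Rough first-level filter: keep only volumes by the given author."""
    return [v for v in volumes if v["author"] == author]


def word_frequencies(volumes):
    """Count lowercase word tokens across all volumes in the given scope."""
    counts = Counter()
    for volume in volumes:
        counts.update(re.findall(r"[a-z']+", volume["text"].lower()))
    return counts


workset = build_workset(collection, "Bacon")
workset_counts = word_frequencies(workset)
whole_counts = word_frequencies(collection)

print("Workset scope:", workset_counts.most_common())
print("Whole collection:", whole_counts.most_common())
```

In the whole-collection counts, the unrelated volume contributes the word "noise", which never appears in the workset counts. That is the sense in which a narrow, homogeneous workset gives word-frequency analysis a meaningful scope.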

Workset Search

Select Items

Create Worksets

Analysis in the HTRC Portal

Choose Algorithm

Choose Collection(s) for Analysis

Run the Analysis…

Results!

View Results

Looking into the future

• Non-consumptive research on copyrighted texts

• Bookworm tool development: http://sandbox.htrc.illinois.edu/bookworm/

• Improvement of metadata through Workset Creation for Scholarly Analysis (WCSA) study

• Documentation and user guides forthcoming

Acknowledgements: HTRC Team

• HTRC @ Illinois (GSLIS and the University Library):

Stephen Downie, Tim Cole, Loretta Auvil, Sayan Bhattacharyya, Boris Capitanu, Colleen Fallaw, Katrina Fenlon, Harriett Green, Peter Organisciak, Megan Senseney, Craig Willis

• Indiana University: led by Beth Plale

Get Involved!

HTRC Announcements: htrc-announce-l @ list.indiana.edu

HTRC User Group: htrc-usergroup-l @ list.indiana.edu

Questions?

Harriett Green

English and Digital Humanities Librarian

University of Illinois at Urbana-Champaign

green19@illinois.edu