Data for Research (DfR) service
JSTOR Advanced Technology Research
Denver, 25th January 2009
John Burns, Clare Llewellyn
Today we will introduce a public beta of our Data for Research service and show you some of the other services that JSTOR’s advanced technology group is working on.
Mission: Working with other researchers on large-scale text and data mining initiatives with an eye toward beneficial applications for scholars and students.
What is Data Mining?
“Data mining is the process of extracting hidden patterns from data” Lyman and Varian 2003
“As data sets and the information extracted from them have grown in size and complexity, direct hands-on data analysis has increasingly been supplemented and augmented with indirect, automatic data processing using more complex and sophisticated tools, methods and models”
Kantardzic 2002
Example: Using consumer purchasing patterns to predict which products are bought together (gas and flights)
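The purchasing example can be made concrete with two standard association-rule measures: support (how often items appear together across all baskets) and confidence (how often the right-hand item appears given the left-hand one). This is a minimal sketch; the transactions are invented toy data, and `support` and `confidence` are hypothetical helper names, not part of any JSTOR tool.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], items: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    if not baskets:
        return 0.0
    hits = sum(1 for b in baskets if items <= b)
    return hits / len(baskets)


def confidence(baskets: List[Basket], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    s_lhs = support(baskets, lhs)
    return support(baskets, lhs | rhs) / s_lhs if s_lhs else 0.0


# Hypothetical toy transactions, for illustration only.
baskets = [
    frozenset({"gas", "flights", "hotel"}),
    frozenset({"gas", "flights"}),
    frozenset({"gas", "groceries"}),
    frozenset({"groceries", "books"}),
]

if __name__ == "__main__":
    gas, flights = frozenset({"gas"}), frozenset({"flights"})
    print(support(baskets, gas | flights))    # 0.5
    print(confidence(baskets, gas, flights))  # about 0.667
```

A rule such as "gas implies flights" is usually only reported when both measures clear chosen thresholds, since high confidence on very rare baskets is not informative.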
What is Text Mining?
“In text mining the patterns are extracted from natural language text rather than from structured databases of facts”
Marti Hearst 2003
“Text mining attempts to discover new, previously unknown information by applying techniques from information retrieval, natural language processing and data mining”
National Centre for Text Mining (NaCTeM), UK
Example: Looking at which words co-occur in articles in order to predict interactions (magnesium and migraines)
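The magnesium and migraine example comes from Don Swanson's literature-based discovery work, where two terms that rarely or never appear together are linked through intermediate "B" terms that co-occur with each of them. The sketch below illustrates that idea under stated assumptions: each article is reduced to a set of terms, the documents are invented toy data, and `bridging_terms` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def bridging_terms(
    docs: Iterable[Set[str]], a: str, c: str
) -> Tuple[int, List[Tuple[str, int, int]]]:
    """Return (#docs where a and c co-occur directly, candidate B terms).

    Each candidate is (term, #docs with a and term, #docs with c and term),
    ranked by the weaker of the two links, then alphabetically.
    """
    with_a: Counter = Counter()
    with_c: Counter = Counter()
    direct = 0
    for terms in docs:
        has_a, has_c = a in terms, c in terms
        if has_a and has_c:
            direct += 1
        if has_a:
            with_a.update(terms - {a, c})  # count each co-occurring term once per doc
        if has_c:
            with_c.update(terms - {a, c})
    shared = set(with_a) & set(with_c)
    ranked = sorted(shared, key=lambda b: (-min(with_a[b], with_c[b]), b))
    return direct, [(b, with_a[b], with_c[b]) for b in ranked]


# Hypothetical toy "articles", each reduced to a set of terms.
docs = [
    {"migraine", "vasospasm", "headache"},
    {"migraine", "spreading depression"},
    {"magnesium", "vasospasm"},
    {"magnesium", "spreading depression", "calcium"},
    {"calcium", "bone"},
]

if __name__ == "__main__":
    print(bridging_terms(docs, "magnesium", "migraine"))
```

Here the two terms never co-occur directly, yet "spreading depression" and "vasospasm" each link them, which is the kind of pattern that would suggest an interaction worth investigating.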
Advanced Technology at JSTOR
• Why are we here
• Who we are
• What we are doing
Why are we releasing our system here?
Librarians are the point from which innovation spreads throughout the academy
“New roles and functions for librarians include:
• information consultants and producers
• information gatekeepers and intermediators
• end-user educators
• managers and leaders
• data analysts in data administration centers
• preservers of knowledge
• information equalizers”
Park 1987
A Data Support Role: “Helping students get their hands dirty with the data”
Robin Rice 2008 2nd DCC / RIN Research Data Management Forum
Who we are - Advanced Technology Research
• A formal commitment by JSTOR to a proactive role in technology innovation, to face new challenges and opportunities
• Our MO is to collaborate with and aid the scholarly community
• We are a team of world-class scientists and technologists with a proven track record of innovation
Mission Statement
“The Advanced Technology Research Group is dedicated to creating, discovering and using relevant technologies in support of JSTOR and the broader scholarly community.”
ATR - Collaborations with the academic community.
For other researchers we provide:
• Access to large, well-curated data sets
• An exposure channel on JSTOR for research results
• Facilities on JSTOR to expose tools and techniques to users
• Collaboration opportunities
For JSTOR we:
• Evaluate novel techniques
• Present rapid prototypes to users
• Develop peer relationships with research institutions
• Bring new forms of traffic to the JSTOR data
• Reuse JSTOR data in new and exciting ways
What we are doing - Projects and Partners
• University of Washington – Citation Network Analysis
• Princeton University – Topic Analysis
• UIUC – Software Environment for the Advancement of Scholarly Research (SEASR)
• University of Michigan – Linguistic tools
• Tufts – Classics Studies
• University of Liverpool – OAI-ORE, Text Mining, Data Analysis
• University of Queensland – Annotations
• Los Alamos National Laboratory – Annotation Management
• DFKI (German Artificial Intelligence Centre) – Document capture and reconstruction / remastering
• XRCE (EuroPARC, France) – Scanned Document Analysis
• …
Advanced Technology Research - Showcase
Showcase provides a preview of interesting and useful technologies. It allows our research partners to demonstrate their tools and gain feedback and it allows JSTOR to assess candidate technologies before committing them to the product roadmap.
Advanced Technology Research - Showcase
A place to expose JSTOR data and tools and to encourage new research:
• Provides access to JSTOR datasets
• Provides a facility to expose and use tools created by researchers from JSTOR and elsewhere
• Explains ongoing research
• Serves as a forum to facilitate connections between groups working with JSTOR data
URL: http://showcase.jstor.org
Data for Research
• DFR is a set of web tools designed to allow for the visual exploration of large-scale data sets and the download of word frequencies in JSTOR articles
• Beta Version launched 01/23/09
• URL: http://dfr.jstor.org
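A common first step with downloaded per-article word frequencies is to sum them into corpus-wide totals. The file layout used here is an assumption made only for this sketch (a header row, then `word,count` rows); the files DfR actually delivers should be checked before adapting it. `read_word_counts` and `corpus_counts` are hypothetical names.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def read_word_counts(csv_text: str) -> Counter:
    """Parse one article's counts from CSV text.

    ASSUMED layout (hypothetical): a header row followed by word,count rows.
    Words are lower-cased so "Chymistry" and "chymistry" are merged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if present
    counts: Counter = Counter()
    for row in reader:
        if len(row) >= 2 and row[1].strip().isdigit():
            counts[row[0].strip().lower()] += int(row[1])
    return counts


def corpus_counts(articles: Iterable[str]) -> Counter:
    """Sum per-article counts into corpus-wide totals."""
    total: Counter = Counter()
    for text in articles:
        total.update(read_word_counts(text))  # Counter.update adds counts
    return total


if __name__ == "__main__":
    a1 = "word,count\nchymistry,3\nfire,2\n"
    a2 = "word,count\nChymistry,1\nsalt,4\n"
    print(corpus_counts([a1, a2]))
```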
Why Word Frequencies?
[Chart: Data Requested from JSTOR users in 2008, by type: OCR data, citation data, usage data, word frequency]
What can you do with word counts?
Real life requests:
“I would like to request time and word distribution frequencies in linguistics (specific movement removed). These sorts of frequencies could potentially allow me to better understand and delimit the formation of groups, and the underlying impetus behind these groups as expressed in linguistic form.”
“I would like to create subject headings for material, using word frequency as a guide to selecting the appropriate terms for the headings.”
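Raw frequency alone tends to rank function words such as "the" highest, so one common way to use word counts as a guide to heading terms is tf-idf weighting, which down-weights words that appear in every document. This is a swapped-in technique, not something the request or DfR specifies. The sketch assumes per-article counts held as `Counter` objects and uses the smoothed idf log((1 + N) / (1 + df)); `tfidf_terms` and the toy counts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_terms(doc: Counter, corpus: List[Counter], top_n: int = 3) -> List[str]:
    """Rank a document's words by tf-idf against a corpus of word-count Counters.

    tf  = count / total words in `doc`
    idf = log((1 + N) / (1 + df)), where df = #documents containing the word.
    A word present in every document gets idf = 0.
    """
    n_docs = len(corpus)
    doc_len = sum(doc.values()) or 1
    scores: Dict[str, float] = {}
    for word, count in doc.items():
        df = sum(1 for d in corpus if d[word] > 0)  # Counter returns 0 for missing keys
        idf = math.log((1 + n_docs) / (1 + df))
        scores[word] = (count / doc_len) * idf
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical per-article word counts, for illustration only.
corpus = [
    Counter({"the": 10, "soil": 4, "harvest": 3}),
    Counter({"the": 12, "patient": 5, "care": 2}),
    Counter({"the": 9, "theorem": 6, "proof": 4}),
]

if __name__ == "__main__":
    for doc in corpus:
        print(tfidf_terms(doc, corpus, top_n=2))
```

In this toy corpus "the" scores zero everywhere, leaving subject-bearing words such as "soil" or "theorem" as the candidate heading terms.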
DFR – DEMO!
http://dfr.jstor.org
DFR – Front Page
Thefe
Hath – pre-1900
Hath – post 1900
Chymistry
Download Page
Files Downloaded
Chart to show the use of the word Chymistry
[Chart: use of "chymistry" by year; x-axis: year, 1666 to 2005; y-axis ticks 0 to 8]
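A per-year usage series such as the Chymistry chart can be built from per-article word counts tagged with a publication year. Because the amount of text published varies from year to year, this sketch reports occurrences per 10,000 words rather than raw counts. The input shape (year, word-count mapping), the toy data, and `yearly_rate` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def yearly_rate(
    articles: Iterable[Tuple[int, Dict[str, int]]], word: str
) -> Dict[int, float]:
    """Occurrences of `word` per 10,000 words, keyed by publication year.

    ASSUMED input: (year, {word: count}) pairs, one per article.
    Years whose articles contain no words at all are omitted.
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for year, counts in articles:
        hits[year] += counts.get(word, 0)
        totals[year] += sum(counts.values())
    return {y: 10_000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical per-article counts, for illustration only.
articles = [
    (1669, {"chymistry": 2, "fire": 98}),
    (1669, {"salt": 100}),
    (1953, {"chemistry": 150, "chymistry": 1, "acid": 49}),
]

if __name__ == "__main__":
    print(yearly_rate(articles, "chymistry"))  # {1669: 100.0, 1953: 50.0}
```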
3 Journals from 1957
Agricultural History, American Journal of Nursing, The Annals of Mathematics
Any questions / feedback?
Please take a look at the site and tell us what you think. Email: [email protected]
Contact details Email: [email protected] Phone: 609-986-2282