Taming the Wilde
Uploaded by charleston-conference
Collaborating with Expertise for Faster, Better, Smarter Collection Analysis
Jackie Bronicki, Collections and Online Resources Coordinator
Cherie Turner, Chemical Sciences Librarian
Shawn Vaillancourt, Education Librarian
Frederick Young, Systems Analyst
Outline of Presentation
Research Question 1: What are the best measurements for evaluating the current scope of the collection?
Research Question 2: What subject areas are not adequately covered in the current collection?
Research Questions
• Influenced by the Cornell University Library Print Collection usage report
• No language analysis
• No patron analysis
• Limited formats
Methodology
• 889,825 total monograph items in final dataset
• 425,865 titles that have not circulated (48%)
• 787,590 titles circulated 5 or fewer times (88%)
• 861,910 titles that have not circulated in the last year (97%)
Results
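The percentages on this slide can be rechecked from the raw counts. The snippet below is a minimal sketch of that arithmetic; the function name `share` and the dictionary labels are illustrative, not taken from the presentation.

```python
# Counts reported on the Results slide.
TOTAL_ITEMS = 889_825

counts = {
    "never circulated": 425_865,
    "circulated 5 or fewer times": 787_590,
    "not circulated in the last year": 861_910,
}


def share(count, total=TOTAL_ITEMS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in counts.items():
    print(f"{label}: {share(count):.1f}%")
```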
[Figure: Distribution by LC Class. Bar chart of counts by LC class (A, B, C, D, E, F, G, H, J, K, L, M, N, P, Q, R, S, T, U, V, Z); vertical axis from 0 to 250,000.]
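A distribution like this depends on mapping each call number to its LC class. The sketch below shows one way to do that with a regular expression, assuming call numbers begin with one to three letters. The helper names and the sample call numbers are hypothetical and are not drawn from the project's data.

```python
import re
from collections import Counter

# Assumption: an LC call number starts with 1-3 letters (for example "BF637 .C6").
_LC_PREFIX = re.compile(r"^\s*([A-Za-z]{1,3})")


def lc_subclass(call_number):
    """Return the leading letters of a call number (e.g. 'BF'), or None."""
    match = _LC_PREFIX.match(call_number or "")
    return match.group(1).upper() if match else None


def lc_class(call_number):
    """Return the single-letter LC class (e.g. 'B'), or None."""
    subclass = lc_subclass(call_number)
    return subclass[0] if subclass else None


# Hypothetical call numbers, including two that cannot be classified.
sample = ["BF637 .C6 2010", "QD31.3 .B76", "bx 1751 .T5", "", "1234"]
counts = Counter(c for c in map(lc_class, sample) if c)
print(counts)
```

Call numbers that do not start with letters return None here, so they can be counted and reviewed separately rather than silently assigned to a class.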
$$\mathrm{PEU} = \frac{\text{Percent Usage}}{\text{Percent of Holdings}}$$

$$\mathrm{PEU}_{B} = \frac{1.43\%}{1.32\%} = 1.08$$

If PEU > 1: Overused
If PEU < 1: Underused
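The PEU rule can be expressed in a few lines of code. This is a minimal sketch using the subclass B figures (usage 1.43%, holdings 1.32%); the function names are illustrative, and the "Balanced" label for a PEU of exactly 1 is an assumption, since the slide does not define that case.

```python
def peu(percent_usage, percent_holdings):
    """PEU = percent usage divided by percent of holdings."""
    if percent_holdings <= 0:
        raise ValueError("percent_holdings must be positive")
    return percent_usage / percent_holdings


def classify_peu(value):
    """Apply the slide's rule: PEU > 1 is Overused, PEU < 1 is Underused."""
    if value > 1:
        return "Overused"
    if value < 1:
        return "Underused"
    return "Balanced"  # assumption: the slide does not cover PEU == 1


peu_b = peu(1.43, 1.32)
print(f"PEU for B: {peu_b:.2f} ({classify_peu(peu_b)})")
```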
$$\mathrm{RBH} = \frac{\text{Percent of ILL Borrowing}}{\text{Percent of Holdings}}$$

$$\mathrm{RBH}_{B} = \frac{0.79\%}{1.32\%} = 0.60$$

Mean RBH = 1.54 ± 5.18
If RBH > Mean RBH: Overused
If RBH < Mean RBH: Underused
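The RBH rule differs from the PEU rule in that the threshold is the mean RBH (1.54) rather than 1. The sketch below uses the subclass B and BF figures from the comparison table (B: holdings 1.32%, ILL borrowing 0.79%; BF: holdings 1.22%, ILL borrowing 2.00%). The function names and the "At mean" label are illustrative assumptions.

```python
MEAN_RBH = 1.54  # mean RBH reported in the presentation


def rbh(percent_ill_borrowing, percent_holdings):
    """RBH = percent of ILL borrowing divided by percent of holdings."""
    if percent_holdings <= 0:
        raise ValueError("percent_holdings must be positive")
    return percent_ill_borrowing / percent_holdings


def classify_rbh(value, mean_rbh=MEAN_RBH):
    """Compare RBH against the mean RBH rather than against 1."""
    if value > mean_rbh:
        return "Overused"
    if value < mean_rbh:
        return "Underused"
    return "At mean"  # assumption: the slide does not cover equality


for subclass, ill, holdings in [("B", 0.79, 1.32), ("BF", 2.00, 1.22)]:
    value = rbh(ill, holdings)
    print(f"{subclass}: RBH {value:.2f} ({classify_rbh(value)})")
```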
Comparing Circulation to ILL Usage
LC Subclass   Percent of Holdings   Percent Usage   PEU    Holdings Usage   Percent of ILL Borrowing   RBH    ILL Usage
B             1.32%                 1.43%           1.08   Overused         0.79%                      0.60   Underused
BC            0.09%                 0.08%           0.82   Underused        0.05%                      0.51   Underused
BD            0.24%                 0.20%           0.84   Underused        0.24%                      1.01   Underused
BF            1.22%                 1.78%           1.46   Overused         2.00%                      1.64   Overused
BH            0.07%                 0.09%           1.29   Overused         0.05%                      0.68   Underused
BJ            0.22%                 0.27%           1.21   Overused         0.18%                      0.79   Underused
BL            0.42%                 0.65%           1.56   Overused         0.69%                      1.65   Overused
BM            0.10%                 0.07%           0.67   Underused        0.09%                      0.95   Underused
BP            0.13%                 0.26%           1.95   Overused         0.34%                      2.57   Overused
BQ            0.04%                 0.10%           2.63   Overused         0.32%                      8.05   Overused
BR            0.36%                 0.33%           0.91   Underused        0.70%                      1.96   Overused
BS            0.22%                 0.16%           0.73   Underused        0.36%                      1.62   Overused
BT            0.16%                 0.13%           0.85   Underused        0.40%                      2.53   Overused
BV            0.18%                 0.15%           0.86   Underused        0.44%                      2.49   Overused
BX            0.52%                 0.29%           0.56   Underused        1.69%                      3.23   Overused
If PEU > 1: Overused; if PEU < 1: Underused
If RBH > Mean RBH: Overused; if RBH < Mean RBH: Underused
Mean RBH = 1.54 ± 5.18
Comparing Circulation to ILL Usage
LC Subclass   Holdings Usage   ILL Usage   Action
B             Overused         Underused   No Changes
BC            Underused        Underused   Ease Off
BD            Underused        Underused   Ease Off
BF            Overused         Overused    Growth Opportunity
BH            Overused         Underused   No Changes
BJ            Overused         Underused   No Changes
BL            Overused         Overused    Growth Opportunity
BM            Underused        Underused   Ease Off
BP            Overused         Overused    Growth Opportunity
BQ            Overused         Overused    Growth Opportunity
BR            Underused        Overused    Change Purchasing
BS            Underused        Overused    Change Purchasing
BT            Underused        Overused    Change Purchasing
BV            Underused        Overused    Change Purchasing
BX            Underused        Overused    Change Purchasing
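The Action column follows a two-by-two rule on the circulation and ILL labels. The sketch below encodes the four combinations that appear in the table as a lookup; the names `ACTIONS` and `recommend` are illustrative.

```python
# (holdings usage, ILL usage) -> action, as listed in the table.
ACTIONS = {
    ("Overused", "Overused"): "Growth Opportunity",
    ("Overused", "Underused"): "No Changes",
    ("Underused", "Overused"): "Change Purchasing",
    ("Underused", "Underused"): "Ease Off",
}


def recommend(holdings_usage, ill_usage):
    """Look up the action for a pair of usage labels."""
    return ACTIONS[(holdings_usage, ill_usage)]


print(recommend("Overused", "Overused"))
print(recommend("Underused", "Overused"))
```

A dictionary keeps the rule visible as data, so a change to the policy is a one-line edit rather than a change to branching logic.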
Comparing Circulation to ILL Usage
The More Important Question...
• Sierra Infrastructure
  – What data existed where?
  – Title vs. Item
  – Call Number
• Defining Input/Output Variables
  – What we could output (circulation)
• MARC
• Scope of Project
• Building a proper sample
Initial Challenges – Research Team
Challenges to Possibilities
• Understanding the question
• Does the System Provide an Answer?
• What can we do?
• High Expectations
• Inconsistency of Data
  – Bad input
  – Batch overlay
  – Doesn’t exist
Data Mining Challenges – Research Team
• Scaled Expectations
• Learning curve
• Piecing the Data Together
Data Mining Challenges – Systems Team
Initial Output Criteria
Bibliographic Record
• Call Number
• Subject Headings
• Publication/Copyright Date
• ISBN
• Record Number
• Title
Item Record
• Copy Number
• Total Number of Checkouts
• Status
Order Record
• Order Date
Final Output Criteria
Item Record
• Call Number
• Total Checkouts
• Last Year Checkouts
• Year to Date Checkouts
• Location
Bibliographic Record
• Call Number
• Publication/Copyright Date
• Record Number
• Title
• Publisher
• Catalog Date
• ISBN
• Fields for our analysis
  – Call Number
  – Request Date
  – Filled Date
  – Format
• Fields for later analysis
  – Lending Library
  – Title
  – Author
  – Publication Date
  – Publisher
  – Language
  – Library Type
  – ISBN
  – OCLC Number
ILL Output Criteria
Except...
• What was MARC telling us?
• How were fields used?
Got Data?
• ISBN?
• Location: 143,823 records deleted
• Call numbers: 14,894 records deleted
Data Cleaning
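The slides report how many records were deleted for location and call number problems but do not state the exact criteria. The sketch below assumes the simplest case, where a record is dropped if either field is blank, and tallies the drops by reason. The field names, the function `clean_records`, and the sample records are all hypothetical.

```python
def clean_records(records):
    """Return (kept records, drop counts by reason).

    Assumption: a record is unusable when its location or call number is blank.
    """
    kept = []
    dropped = {"location": 0, "call_number": 0}
    for rec in records:
        location = (rec.get("location") or "").strip()
        call_number = (rec.get("call_number") or "").strip()
        if not location:
            dropped["location"] += 1
        elif not call_number:
            dropped["call_number"] += 1
        else:
            kept.append(rec)
    return kept, dropped


# Hypothetical records for illustration only.
sample = [
    {"title": "Example A", "location": "main", "call_number": "BF637 .C6"},
    {"title": "Example B", "location": "", "call_number": "QD31.3 .B76"},
    {"title": "Example C", "location": "main", "call_number": None},
]
kept, dropped = clean_records(sample)
print(len(kept), dropped)
```

Counting drops by reason makes it possible to report totals per field, as the Data Cleaning slide does.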
• Understanding the infrastructure
  – Order records
  – Bib records
    • MARC
  – Item records
• Understanding local practice
• Experts provide guidance and practical solutions!
Lessons Learned
Print and Electronic Serials
• Challenges
  – Different systems store records
  – Different kinds of usage information available
  – Holdings based analysis
  – Subscription or Subscription + Aggregated
  – Vendor supplied records
Aguilar, W. (1986). The application of relative use and interlibrary demand in collection development. Collection Management, 8(1), 15-24.
Knievel, J. E., Wicht, H., & Connaway, L. S. (2006). Use of circulation statistics and interlibrary loan data in collection management. College & Research Libraries, 67(1), 35-49.
Ochola, J. N. (2003). Use of circulation statistics and interlibrary loan data in collection management. Collection Management, 27(1), 1-13. DOI: 10.1300/J105v27n01_01
Mills, T. R. (1982). The University of Illinois Film Center Collection Use Study. http://files.eric.ed.gov/fulltext/ED227821.pdf
Cornell University Library. (2012). Report of the Collection Development Executive Committee Task Force on Print Collection Usage. http://staffweb.edu/system/files/CollectionUsageTF_ReportFinal11-22-10.pdf