Reference Collections: Collection Characteristics.
Uploaded by anabel-montgomery
CACM Collection
3204 Communications of the ACM articles
Focus of collection: computer science
Structured subfields:
– Author names
– Date information
– Word stems from title and abstract
– Categories from hierarchical classification
– Direct references between articles
– Bibliographic coupling connections
– Number of co-citations for each pair of articles
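Two of the subfields above are derived from citations rather than from the text. The sketch below shows one common way to compute them from a plain reference list: bibliographic coupling (how many references two articles share) and co-citation count (how many articles cite both members of a pair). The article IDs and the `references` mapping are hypothetical toy data, not drawn from CACM, and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy data: article ID -> set of article IDs it references.
references = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": {"E"},
    "E": set(),
}

def bibliographic_coupling(refs):
    """For each pair of citing articles, count the references they have in common."""
    coupling = {}
    for x, y in combinations(sorted(refs), 2):
        shared = len(refs[x] & refs[y])
        if shared:  # keep only pairs that share at least one reference
            coupling[(x, y)] = shared
    return coupling

def co_citation_counts(refs):
    """For each pair of cited articles, count how many articles cite both."""
    counts = {}
    for cited in refs.values():
        for x, y in combinations(sorted(cited), 2):
            counts[(x, y)] = counts.get((x, y), 0) + 1
    return counts

print(bibliographic_coupling(references))
# {('A', 'B'): 2, ('B', 'C'): 1, ('B', 'D'): 1, ('C', 'D'): 1}
print(co_citation_counts(references))
# {('C', 'D'): 2, ('C', 'E'): 1, ('D', 'E'): 1}
```

Coupling looks backward (shared bibliographies of the citing articles), while co-citation looks forward (later articles that cite both), so the two relations can link quite different pairs, as the toy output shows.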
Test information requests:
– 52 information requests in natural language, with two Boolean query expressions
– Average of 11.4 terms per query
– Requests are rather specific, with an average of about 15 relevant documents
– These specific requests yield relatively low precision and recall
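The slides use precision and recall without defining them. With the standard set-based definitions, a minimal sketch for a single request could look like this; the document IDs and the 20-document result list are made up, and the 15 relevant documents are chosen only to echo the CACM average.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one information request."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Convention assumed here: report 0.0 when a denominator would be empty.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical request: documents 0-14 are relevant (15, like the CACM average).
relevant_docs = range(15)
# Hypothetical result list of 20 documents: 6 relevant (0-5) and 14 not (100-113).
retrieved_docs = list(range(6)) + list(range(100, 114))

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.30 recall=0.40
```

Here 6 of the 20 retrieved documents are relevant, so precision is 6/20 = 0.30 and recall is 6/15 = 0.40.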
ISI Collection
1460 documents from the Institute of Scientific Information
Focus of collection: information science
Structured subfields:
– Author names
– Word stems from title and abstract
– Number of co-citations for each pair of articles
Test information requests:
– 35 information requests in natural language with Boolean query expressions
– Average of 8.1 terms per query
– 41 information requests in natural language without Boolean query expressions
– Requests are fairly general, with an average of about 50 relevant documents
– These general requests yield higher precision and recall than the CACM requests
Observation
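Collection   Documents   Distinct terms   Avg. terms per doc
CACM         3204        10446            40.1
ISI          1460        7392             104.9
The number of distinct terms grows much more slowly than the number of documents: CACM has about 2.2 times as many documents as ISI but only about 1.4 times as many terms.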
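The comparison can be checked with simple arithmetic on the table. The slope computed at the end is only a rough two-point estimate across two different collections, loosely in the spirit of Heaps' law (which is usually stated in terms of total tokens within one growing collection), so treat it as an illustration rather than a measurement. The dictionary layout is an assumption made for this sketch.

```python
import math

# Figures copied from the table above.
collections = {
    "CACM": {"docs": 3204, "terms": 10446, "terms_per_doc": 40.1},
    "ISI": {"docs": 1460, "terms": 7392, "terms_per_doc": 104.9},
}

cacm, isi = collections["CACM"], collections["ISI"]
doc_ratio = cacm["docs"] / isi["docs"]     # about 2.19
term_ratio = cacm["terms"] / isi["terms"]  # about 1.41

# Two-point log-log slope: how fast vocabulary grows relative to document
# count across these two collections. A rough indicator, not a fitted law.
slope = math.log(term_ratio) / math.log(doc_ratio)

print(f"docs x{doc_ratio:.2f}, terms x{term_ratio:.2f}, slope {slope:.2f}")
```

A slope well below 1 matches the observation that vocabulary grows sublinearly. Note that ISI documents are much longer on average (104.9 vs 40.1 terms), so document count alone does not explain the difference.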
Cystic Fibrosis Collection
1239 articles indexed with “Cystic Fibrosis” in MEDLINE
Structured subfields:
– MEDLINE accession number
– Author
– Title
– Source
– Major subjects
– Minor subjects
– Abstract (or extract)
– References in the document
– Citations to the document
Test information requests:
– 100 information requests
– Relevance assessed by four experts on a scale of 0 (not relevant), 1 (marginal relevance), and 2 (high relevance)
– Overall relevance is the sum of the four scores (0-8)
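A minimal sketch of the aggregation, assuming the judgments for each document arrive as a list of four per-expert scores. The document IDs, the scores, and the cutoff `THRESHOLD` used to turn graded scores into a binary relevant set are hypothetical; the slide does not say how, or whether, the 0-8 score is binarized.

```python
# Allowed per-expert scores: 0 = not relevant, 1 = marginal, 2 = high relevance.
VALID_SCORES = {0, 1, 2}
NUM_EXPERTS = 4

def overall_relevance(scores):
    """Sum the four expert scores into an overall relevance value from 0 to 8."""
    if len(scores) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} scores, got {len(scores)}")
    if any(s not in VALID_SCORES for s in scores):
        raise ValueError("each score must be 0, 1, or 2")
    return sum(scores)

# Hypothetical judgments for one request: document ID -> four expert scores.
judgments = {
    "doc1": [2, 2, 1, 2],
    "doc2": [0, 1, 0, 0],
    "doc3": [1, 1, 1, 0],
}

overall = {doc: overall_relevance(s) for doc, s in judgments.items()}

# HYPOTHETICAL cutoff: count a document as relevant if its overall score is >= 3.
THRESHOLD = 3
relevant = {doc for doc, score in overall.items() if score >= THRESHOLD}

print(overall)           # {'doc1': 7, 'doc2': 1, 'doc3': 3}
print(sorted(relevant))  # ['doc1', 'doc3']
```

Keeping the graded 0-8 score, rather than only a binary label, leaves room for evaluation measures that reward ranking highly relevant documents first.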
Discussion Questions
In developing a search engine:
– How would you use metadata (e.g., author, title, abstract)?
– How would you use document structure?
– How would you use references, citations, and co-citations?
– How would you use hyperlinks?