Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?
Toine Bogers (Aalborg University Copenhagen, Denmark) & Vivien Petras (Humboldt-Universität zu Berlin, Germany)
iConference 2015, Newport Beach, March 25, 2015
Tagging vs. Controlled Vocabulary Indexing
Controlled Vocabulary (CV)
+ Semantic relationships
- Large development costs
Tagging
+ Use of the users’ vocabulary
- No term normalization
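As a rough illustration of the "no term normalization" drawback, the sketch below counts hypothetical tags that different users might give the same book, before and after a naive normalization (lowercase, drop non-alphanumerics). The tags and the `normalize_tag` helper are made up for this example; real tag-cleaning pipelines differ.

```python
import re
from collections import Counter


def normalize_tag(tag: str) -> str:
    """Hypothetical, deliberately naive normalization: lowercase the tag and
    drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


# Hypothetical tags assigned to the same book by different users.
tags = ["Sci-Fi", "scifi", "SciFi", "science fiction", "dystopia", "Dystopia"]

raw_counts = Counter(tags)
normalized_counts = Counter(normalize_tag(t) for t in tags)

print(len(raw_counts))         # 6 distinct raw tags
print(len(normalized_counts))  # 3 distinct tags after normalization
print(normalized_counts.most_common())
```

String-level cleanup merges spelling variants but still keeps "science fiction" apart from "scifi"; mapping synonyms onto one concept is what the semantic relationships of a controlled vocabulary provide, at the development cost listed above.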
Previous studies:
• Mostly analyze the nature of the terms: overlapping vs. complementary vocabulary
• Few and conflicting results for retrieval
• Small samples
Study Objectives
What do tags and controlled vocabularies really bring to the table in a realistic search environment?
1. Which (combination of) metadata elements can best contribute to retrieval success?
2. How does the retrieval performance of tags and CVs compare using a large-scale and realistic test collection under carefully controlled circumstances?
Methodology
How do we evaluate retrieval performance?
• Large collection of documents
• Realistic information needs (= topics)
• Relevance judgments (= relevant documents for topics)
Methodology
• Book collection
  - Controlled vocabularies in library catalogs
  - Tags in social cataloging sites
INEX Test Collection of Book Records
• Bibliographic metadata (Core): author, title, publication year, publisher
• User-generated content (UGC): reviews, tags
• Controlled vocabulary content (CV)
  - Amazon: DDC class labels, Amazon subjects, Amazon geographic names, Amazon category labels
  - British Library and Library of Congress: DDC class labels, LCSH topical terms, geographic names, personal names, chronological terms, genre/form terms
Methodology
• Book collection
  - Controlled vocabularies in library catalogs
  - Tags in social cataloging sites
• Book search information needs
  - LibraryThing fora
• Book search relevance judgements
  - LibraryThing fora
Experimental setup
• INEX test collection of book records
  - Any-CV = 2 million records
  - Each-CV = 350,000 records
• LibraryThing forum topics
  - Query and Narrative representations
  - 640 different topics, split in half: one half for training the IR system, the other for testing
• Relevance judgements: book recommendations from LibraryThing (LT) members
  - Graded relevance scoring (highest relevance if the book is added by the searcher)
• Evaluation metric: Normalized Discounted Cumulated Gain (NDCG@10)
  - Evaluated on the first 10 results of the search output
  - Scores range between 0.0 and 1.0
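NDCG@10 can be computed per topic roughly as follows. This is a minimal sketch assuming the widely used formulation (each gain divided by log2(rank + 1), normalized by the DCG of the ideal ranking); the slides do not state which NDCG variant or which numeric relevance grades were used, so the grades in the example are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulated gain of the first k gains: sum of g_i / log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one topic.

    ranked_doc_ids: the system's result list, best first, without duplicates.
    qrels: dict mapping document id -> graded relevance (missing ids count as 0).
    Returns a value between 0.0 and 1.0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical grades: 4 for the book the searcher added, 1 for other recommendations.
qrels = {"book_a": 4, "book_b": 1, "book_c": 1}
run = ["book_b", "unjudged_book", "book_a"]

print(round(ndcg_at_k(run, qrels), 3))  # 0.585
```

Because the discount grows with rank, placing the highest-graded book at rank 3 instead of rank 1 costs far more than misplacing a grade-1 recommendation.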
Comparing controlled vocabulary sources
• Question
  - Which of the three sources of controlled vocabulary provides the best performance?
• Answer
  - No significant differences in performance between the different providers or their combination
  - Amazon performs neither better nor worse than the British Library or the Library of Congress
  - Subsequent experiments combine all CV sources
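The slides report significance without naming the test. One common choice for comparing two runs on the same topics is a paired randomization (sign-flip) test on per-topic scores; the sketch below is that swapped-in technique, not necessarily the authors' procedure, and the score lists in the example are hypothetical.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10_000, seed=42):
    """Two-sided paired randomization (sign-flip) test.

    scores_a, scores_b: per-topic scores (e.g. NDCG@10) of two runs, aligned by topic.
    Under the null hypothesis the runs are interchangeable on every topic, so each
    per-topic difference is equally likely to have either sign. Returns an
    estimated p-value for the observed absolute mean difference.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        flipped_mean = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped_mean) >= observed - 1e-12:  # tolerance for float round-off
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical per-topic NDCG@10 scores for two CV sources on the same six topics.
amazon = [0.10, 0.00, 0.25, 0.05, 0.00, 0.30]
library = [0.12, 0.00, 0.20, 0.05, 0.02, 0.28]
print(paired_randomization_test(amazon, library))
```

A large p-value here would correspond to the "no significant differences" finding; with only a handful of topics, even sizeable differences rarely reach significance, which is one reason large topic sets matter.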
Comparing metadata elements
• Questions
  - Which of the metadata elements provides the best stand-alone performance?
  - Which metadata element performs better: tags or CV?
  - Which combination of metadata elements provides the best performance?
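The element sets compared on these slides (Core, CV, Reviews, Tags, UGC, each combined with Core, and all fields) can be thought of as field selections over each book record. A minimal sketch, assuming hypothetical field names (the slides do not show the actual record schema) and assuming a run simply indexes the concatenated text of its selected fields; the retrieval model itself is not covered here.

```python
# Hypothetical field names; the real INEX record schema is not shown on the slides.
ELEMENT_SETS = {
    "Core": ["author", "title", "publication_year", "publisher"],
    "CV": ["cv_terms"],
    "Reviews": ["reviews"],
    "Tags": ["tags"],
}
ELEMENT_SETS["UGC"] = ELEMENT_SETS["Reviews"] + ELEMENT_SETS["Tags"]

# Ten runs: single element sets, Core + X, and (assumed) Core + CV + UGC as "All fields".
RUNS = {name: ELEMENT_SETS[name] for name in ["Core", "CV", "Reviews", "Tags", "UGC"]}
for extra in ["CV", "Reviews", "Tags", "UGC"]:
    RUNS[f"Core + {extra}"] = ELEMENT_SETS["Core"] + ELEMENT_SETS[extra]
RUNS["All fields"] = ELEMENT_SETS["Core"] + ELEMENT_SETS["CV"] + ELEMENT_SETS["UGC"]


def document_text(record, fields):
    """Concatenate the selected fields of one book record into the text that a
    run would index. Missing fields are skipped; list values are flattened."""
    parts = []
    for field in fields:
        value = record.get(field)
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return " ".join(parts)


# Hypothetical record without a publisher field.
record = {
    "author": "A. Author",
    "title": "An Example Novel",
    "publication_year": 1999,
    "tags": ["sci-fi", "dystopia"],
    "cv_terms": ["Science fiction"],
}
print(document_text(record, RUNS["Core + Tags"]))
# A. Author An Example Novel 1999 sci-fi dystopia
```

Keeping the element sets as plain field lists makes each comparison differ only in which text is indexed, so score differences can be attributed to the metadata rather than to the retrieval setup.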
[Figure: Results of (combinations of) element sets per topic representation (Query, Narrative). Element sets: Core; Controlled vocabulary; Reviews; Tags; User-generated content; Core + Controlled vocabulary; Core + Reviews; Core + Tags; Core + User-generated content; All fields]
Comparing metadata elements
• Answers
  - Reviews are by far the best-performing metadata element
    ‣ Significantly better than every other individual metadata element
  - Combining metadata elements nearly always outperforms individual elements
    ‣ All metadata elements combined provide the best overall performance
  - Slight advantage of tags over CV (not significant)
Follow-up analysis (1)
• Question
  - What is the nature of the difference between tags and CV: do they complement each other or overlap?
[Figure: Per-topic differences (Tags vs. controlled vocabulary). Δ NDCG@10 per topic (from -1.0 to 1.0) for the Query and Narrative representations, with regions marked "Tags > CV" and "CV > tags"]
• Answer
  - Tags and CV each outperform the other on different sets of topics, so their performance is complementary
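The per-topic comparison can be reproduced from two runs' per-topic NDCG@10 scores. In this sketch (topic ids and scores are hypothetical), a positive delta means tags beat CV on that topic; topics falling on both sides of zero are what make the two sources complementary rather than one simply dominating the other.

```python
def per_topic_deltas(tag_scores, cv_scores):
    """Return (topic, tags - CV) pairs for topics scored by both runs, sorted
    from the largest tag advantage to the largest CV advantage (ties by topic id)."""
    shared = tag_scores.keys() & cv_scores.keys()
    deltas = [(topic, tag_scores[topic] - cv_scores[topic]) for topic in shared]
    return sorted(deltas, key=lambda pair: (-pair[1], pair[0]))


def summarize(deltas):
    """Count the topics on which tags win, CV wins, or both score the same."""
    tags_better = sum(1 for _, d in deltas if d > 0)
    cv_better = sum(1 for _, d in deltas if d < 0)
    return {"Tags > CV": tags_better, "CV > tags": cv_better,
            "tie": len(deltas) - tags_better - cv_better}


# Hypothetical per-topic NDCG@10 scores.
tag_scores = {"t1": 0.40, "t2": 0.00, "t3": 0.10, "t4": 0.00}
cv_scores = {"t1": 0.10, "t2": 0.30, "t3": 0.10, "t4": 0.00}
deltas = per_topic_deltas(tag_scores, cv_scores)
print([topic for topic, _ in deltas])  # ['t1', 't3', 't4', 't2']
print(summarize(deltas))  # {'Tags > CV': 1, 'CV > tags': 1, 'tie': 2}
```

Sorting the deltas in this way yields the kind of descending curve a per-topic difference plot shows, with one side for each source's wins and ties in the middle.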
Follow-up analysis (2)
• Question
  - Does the book type influence the relative performance of tags vs. CV?
    ‣ Fiction
    ‣ Non-fiction
Fiction vs. non-fiction
[Figure: Controlled vocabulary vs. tags for fiction and non-fiction requests, per topic representation (Query, Narrative)]
• Answer
  - The advantage of tags over CV terms is most pronounced for fiction book requests (but never significantly so)
  - Retrieving relevant non-fiction books is easier than retrieving relevant fiction books
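Splitting the comparison by book type amounts to averaging per-topic scores within each group of topics. A minimal sketch with hypothetical topic labels and scores; the same helper would apply unchanged to the information-need types (search, recommendation, and so on) in the following analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(scores, topic_labels):
    """Average per-topic scores within each topic group (e.g. fiction vs.
    non-fiction). Topics without a group label are ignored."""
    groups = defaultdict(list)
    for topic, score in scores.items():
        label = topic_labels.get(topic)
        if label is not None:
            groups[label].append(score)
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical per-topic NDCG@10 scores for one run and hypothetical book-type labels.
tag_scores = {"t1": 0.20, "t2": 0.40, "t3": 0.10, "t4": 0.50}
book_type = {"t1": "fiction", "t2": "non-fiction", "t3": "fiction"}
print(mean_by_group(tag_scores, book_type))
```

Running it once for the tag run and once for the CV run gives the four bars per topic representation; a group with few topics will produce noisy means, which fits the finding that the differences are never significant.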
Follow-up analysis (3)
• Question
  - Does the type of information need influence the relative performance of tags vs. CV?
    ‣ Search
    ‣ Recommendation
    ‣ Search + Recommendation
    ‣ Known-item
Follow-up analysis (3)
• Answers
  - Tags are better at satisfying known-item needs and needs that mix search and recommendation aspects
  - CV is better for pure recommendation needs
  - Differences are indicative, but not significant
[Figure: Controlled vocabulary vs. tags per information need type (Search, Search + Recommendation, Recommendation, Known-item), per topic representation (Query, Narrative)]
Conclusions
• Tags have a slight (but not significant) advantage over CV
• Tags and CV provide largely complementary performance
• Future work
  - Detailed analysis of the precision/recall effect of tags vs. CV
    ‣ CV contains more unique terms, tags more repetition of terms
    ‣ Possible consequence: CVs boost recall, tags boost precision
  - Detailed analysis of which types of tags/CV terms match relevant documents
  - More detailed analysis of request types and their relation to tag/CV performance
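The planned precision/recall analysis could start from the standard set-based measures at a cutoff. A minimal sketch, assuming binary relevance (for example, treating any positive relevance grade as relevant); comparing these values for tag-only and CV-only runs would test whether CVs mainly raise recall and tags mainly raise precision. Document ids are hypothetical.

```python
def precision_recall_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Precision@k and recall@k for one topic with binary relevance.

    Precision divides by k even if fewer than k results were returned.
    """
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical run and relevance set for a single topic.
run = ["b1", "b2", "b3", "b4"]
relevant = {"b1", "b3", "b9"}
print(precision_recall_at_k(run, relevant, k=4))  # (0.5, 0.6666666666666666)
```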