Transforming Tags to (Faceted) Tagsonomies
Marti Hearst, UC Berkeley School of Information
This Research Supported by NSF IIS-9984741.
Faceted Metadata in Search Interfaces. Marti Hearst: UC Berkeley SIMS
Focus: Search and Navigation of Large Collections
• Image Collections
• E-Government Sites
• Shopping Sites
• Digital Libraries
Example: the University of California Library Catalog
What do we want done differently?
• Organization of results
• Hints of where to go next
• Flexible ways to move around
• … How to structure the information?
Outline
• Motivation: support for browsing big collections
– Focus on usability for a wide range of lay users
• Approach: flexible application of hierarchical faceted metadata
– Advantages of the approach
– Results of usability studies
• Automated Facet Creation
– We have a nearly-automated algorithm that works well
– I think it could greatly improve folksonomy organization
How to Structure Information for Search and Browsing?
• Hierarchy is too rigid
• KL-One is too complex
• Hierarchical faceted metadata:
– A useful middle ground
What are facets?
• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)
Example facets: Time/Date, Topic, Role, GeoRegion
Facet example: Recipes
• Course: Main Course
• Cooking Method: Stir-fry
• Cuisine: Thai
• Ingredient: Red Bell Pepper, Curry, Chicken
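As a rough illustration of the recipe example, the sketch below stores one item with several independent facets and tests whether it falls under a chosen category. The `Item` class, the `matches` helper, and the recipe name are assumptions made for this example; only the facet names and values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One object in the collection, described along several facets."""
    name: str
    # facet name -> set of category paths (general to specific) in that facet
    facets: dict = field(default_factory=dict)


def matches(item, facet, prefix):
    """True if the item has a category at or below `prefix` in `facet`."""
    n = len(prefix)
    return any(path[:n] == prefix for path in item.facets.get(facet, ()))


# The recipe name is made up; the facet values are the ones on the slide.
recipe = Item(
    name="example recipe",
    facets={
        "Course": {("Main Course",)},
        "Cooking Method": {("Stir-fry",)},
        "Cuisine": {("Thai",)},
        "Ingredient": {("Red Bell Pepper",), ("Curry",), ("Chicken",)},
    },
)

print(matches(recipe, "Cuisine", ("Thai",)))     # True
print(matches(recipe, "Cuisine", ("Italian",)))  # False
```

Paths are tuples so that a hierarchical facet can hold longer paths later; the one-element paths here simply mirror the flat values shown on the slide.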
Example of Faceted Metadata: Categories for Biomedical Journal Articles
1. Anatomy [A]: Lung
2. Organisms [B]: Mouse
3. Diseases [C]: Cancer
4. Chemicals and Drugs [D]: Tamoxifen
Motivation
Description: 19th c. paint horse; saddle and hackamore; spurs; bandana on rider; old time cowboy hat; underchin thong; flying off.
• Nature > Animal > Mammal > Horse
• Occupations > Cowboy
• Clothing > Hats > Cowboy Hat
• Media > Engraving > Wood Eng.
• Location > North America > America
Motivation
Description: 19th c. paint horse; saddle and hackamore; spurs; bandana on rider; old time cowboy hat; underchin thong; flying off.
By using facets, what are we not capturing?
• The hat flew off; the bandana stayed on.
• The thong is part of the hat.
• The bandana is on the cowboy (not the horse).
• The saddle is on the horse (not the cowboy).
Hierarchical Faceted Metadata
• A simplification of knowledge representation
• Does not represent relationships directly
• BUT can be understood well by many people when browsing rich collections of information.
How to Put This in an Interface?
Some Challenges:
• Users don’t like new search interfaces.
• How to show lots of information without overwhelming or confusing?
A Solution (The Flamenco Project)
• Use proper HCI methods.
• Organize search results according to the faceted metadata so navigation looks similar throughout
– Easy to see where to go next and where you’ve been
– Avoids empty result sets
– Integrates seamlessly with keyword search
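One way to see why this style of navigation avoids empty result sets: if the interface only offers refinements that occur in the current results, every link leads somewhere. The sketch below is an illustration under that assumption, not the Flamenco implementation; the `preview` function and the toy items are hypothetical.

```python
from collections import Counter


def preview(items, selected):
    """Return (matching items, counts of available refinements).

    items:    list of dicts mapping facet name -> set of category labels
    selected: dict mapping facet name -> the label chosen so far
    """
    current = [
        it for it in items
        if all(label in it.get(facet, set()) for facet, label in selected.items())
    ]
    counts = {}
    for it in current:
        for facet, labels in it.items():
            if facet not in selected:
                # Only labels present in the current results are ever counted,
                # so any refinement shown to the user has at least one hit.
                counts.setdefault(facet, Counter()).update(labels)
    return current, counts


# Toy collection (hypothetical data).
items = [
    {"Cuisine": {"Thai"}, "Course": {"Main Course"}},
    {"Cuisine": {"Thai"}, "Course": {"Dessert"}},
    {"Cuisine": {"Italian"}, "Course": {"Main Course"}},
]
current, counts = preview(items, {"Cuisine": "Thai"})
print(len(current), dict(counts["Course"]))
```

The counts double as the preview numbers shown next to each category link.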
Information previews
• Use the metadata to show where to go next
– More flexible than canned hyperlinks
– Less complex than full search
• Help users see and return to previous steps
• Reduces mental work
– Recognition over recall
– Suggests alternatives
• More clicks are OK if (J. Spool):
– The “scent” of the target does not weaken
– Users feel they are going towards, rather than away from, their target
What is Tricky About This?
• It is easy to do it poorly
• It is hard not to be overwhelming
– Most users prefer simplicity unless complexity really makes a difference
– Small details matter
• It is hard to “make it flow”
Search Usability Design Goals
1. Strive for Consistency
2. Provide Shortcuts
3. Offer Informative Feedback
4. Design for Closure
5. Provide Simple Error Handling
6. Permit Easy Reversal of Actions
7. Support User Control
8. Reduce Short-term Memory Load
From Shneiderman, Byrd, & Croft, Clarifying Search, D-Lib Magazine, Jan 1997. www.dlib.org
Usability Studies
• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items
• Conclusions:
– Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
– Very positive results, in contrast with studies on earlier iterations.
Post-Test Comparison
                                             Baseline   Faceted
Which Interface Preferable For:
Find images of roses                            15         16
Find all works from a given period               2         30
Find pictures by 2 artists in same media         1         29
Overall Assessment:
More useful for your tasks                       4         28
Easiest to use                                   8         23
Most flexible                                    6         24
More likely to result in dead ends              28          3
Helped you learn more                            1         31
Overall preference                               2         29
Advantages of the Approach
• Honors many of the most important usability design goals
– User control
– Provides context for results
– Reduces short-term memory load
– Allows easy reversal of actions
– Provides consistent view
• Allows different people to add content without breaking things
• Can make use of standard technology
Advantages of the Approach
• Systematically integrates search results:
– reflect the structure of the info architecture
– retain the context of previous interactions
• Gives users control and flexibility
– Over order of metadata use
– Over when to navigate vs. when to search
• Allows integration with advanced methods
– Collaborative filtering, predicting users’ preferences
Disadvantages
• Does not model relations explicitly
• Does it scale to millions of items?
– Adaptively determine which facets to show for different combinations of items
• Requires faceted metadata!
Opportunities
• Creating hierarchical faceted categories
– Assigning items to those categories
– Adaptively adding new facets as data changes
• A new approach to personalization:
– User-tailored facet combinations
• Create task-based search interfaces
– Equate a task with a sequence of facet types
Creating Classifications from Data
• Most approaches are associational
– AKA clustering, LSA, LDA, etc.
– This leads to poor results when applied to text
• To derive facets, need a different angle
– We have a simple approach based on WordNet
Blei, Ng, & Jordan ’03 (Latent Dirichlet Allocation)
Sanderson & Croft ’99 (Term Subsumption)
Stoica & Hearst ’04 (WordNet-based)
Application to Photo Labeling
• ANLP class project Fall ’04
– Earlier version of code
– Masters students: Jeff Towle and Simon King
• Dataset: 1650 very short photo labels
• Procedure
– Students simply ran the code
– Had to remove proper names
– Re-ran the code; done!
Example Photos
• very scary x-mas tree
• Hp presentation
• chasing a cat in the dark
• My cat
• instrumentality (112)
  – vehicle (26)
    • car (9)
    • bike (8)
    • vessel, watercraft (4)
      – mayflower (2)
      – ferry (1)
      – gig (1)
    • truck (3)
    • airplane (2)
  – device (20)
    • machine (7)
      – computer (4)
      – laptop (1)
      – sander (1)
  – container (16)
    • vessel (7)
      – bottle (5)
        » water_bottle (2)
        » jug (1)
        » pill_bottle (1)
      – bath (2)
      – bowl (1)
    • can (2)
    • backpack (1)
    • bumper (1)
    • empty (1)
    • salt_shaker (1)
  – furniture, piece of furniture, article of furniture (12)
    • seat (8)
      – bench (2)
      – chair (2)
      – couch (2)
      – lounge (1)
    • bed (4)
    • desk (1)
Associational techniques
• Pros:
– Sometimes terms are grouped to get a general concept (e.g., airline, airplane, pilots, flight)
• Cons:
– Highly unpredictable
– Not comprehensive (e.g., dollar and yen but no deutschmarks; Eastern but no other directions)
– Not uniform in subject matter (e.g., mixing currencies with countries with timing; mixing compass directions with airlines)
Lexical Hierarchy-based
• Pros
– Faceted and hierarchical
– Consistent is-a hierarchies
– Comprehensiveness more likely
• Cons
– Doesn’t provide overall themes (e.g., airlines, pilots, airplanes)
– Sometimes uses the wrong word sense
– Sometimes the right term/hierarchy is not present (e.g., doesn’t have “dish type” nor “cuisine” for recipes; specialized domains won’t work)
Our Approach
• Leverage the structure of WordNet
Pipeline (inputs: documents and WordNet): 1. Select terms; 2. Get hypernym paths; 3. Build tree; 4. Compress tree
1. Select Terms
• Select well-distributed terms from the collection
Example terms: red, blue
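The slide does not say how "well distributed" is measured. As a hedged illustration, the sketch below keeps terms whose document frequency falls inside a band; the `select_terms` name, the `min_df` and `max_df_ratio` thresholds, and the toy labels are all assumptions for this example.

```python
from collections import Counter


def select_terms(docs, min_df=2, max_df_ratio=0.5):
    """Keep terms that appear in at least `min_df` documents but in no more
    than `max_df_ratio` of the collection (both thresholds are assumed)."""
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(doc.lower().split()))
    limit = max_df_ratio * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= limit)


# Toy labels (hypothetical data).
docs = ["a red car", "a blue car", "a red bike", "a blue boat"]
print(select_terms(docs))  # ['blue', 'car', 'red']
```

Very rare terms ("bike", "boat") and terms that appear everywhere ("a") are both dropped, which is one plausible reading of "well distributed".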
2. Get Hypernym Path
red: abstraction => property => visual property => color => chromatic color => red, redness => red
blue: abstraction => property => visual property => color => chromatic color => blue, blueness => blue
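A minimal sketch of this step, assuming a hand-copied child-to-parent table in place of WordNet. The `HYPERNYM` dictionary holds only the labels shown on the slide, and `hypernym_path` is a hypothetical helper, not part of any WordNet API.

```python
# child -> parent, copied from the slide; a stand-in for a WordNet lookup
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}


def hypernym_path(term):
    """Walk up the parent links and return the path from root to term."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


print(" => ".join(hypernym_path("red")))
```

A real system would query WordNet for each selected term, and would have to handle terms with several senses or several paths, which the later disambiguation slides address.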
3. Build Tree
The hypernym paths for red and blue are merged into one tree:
abstraction => property => visual property => color => chromatic color
chromatic color => red, redness => red
chromatic color => blue, blueness => blue
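Merging paths that share a prefix can be sketched with nested dictionaries, where each key is a node label and each value holds that node's children. This is an illustrative sketch; `build_tree` is a hypothetical name and the two paths are copied from the slide.

```python
def build_tree(paths):
    """Merge root-to-leaf paths into one nested-dict tree."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse the existing child if this prefix was already inserted.
            node = node.setdefault(label, {})
    return tree


shared = ["abstraction", "property", "visual property", "color", "chromatic color"]
paths = [
    shared + ["red, redness", "red"],
    shared + ["blue, blueness", "blue"],
]
tree = build_tree(paths)

# Follow the shared prefix down to the node where the two paths diverge.
node = tree
for label in shared:
    node = node[label]
print(sorted(node))  # ['blue, blueness', 'red, redness']
```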
4. Compress Tree
Before: color => chromatic color => {red, redness => red; blue, blueness => blue; green, greenness => green}
After: color => chromatic color => {red; blue; green}
4. Compress Tree (cont.)
Before: color => chromatic color => {red; blue; green}
After: color => {red; blue; green}
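The two compression slides suggest two rules: a node with a single child is replaced by that child, and a child whose label repeats its parent's label is dropped with its children promoted. The sketch below encodes those two rules as inferred from the pictures; the exact conditions used in the actual algorithm are not given on the slides, so treat the `compress` function and its tests as assumptions.

```python
def compress(label, children):
    """Return the compressed children of the node named `label`.

    `children` is a nested dict: child label -> that child's children.
    """
    out = {}
    for child, grandkids in children.items():
        sub = compress(child, grandkids)
        if len(sub) == 1:
            # Rule 1 (inferred): a parent with a single child is replaced
            # by that child, e.g. "red, redness" => "red" becomes "red".
            (only, only_sub), = sub.items()
            out[only] = only_sub
        elif sub and label in child.split():
            # Rule 2 (inferred): a child whose label repeats the parent's
            # label ("chromatic color" under "color") is removed and its
            # children are promoted.
            out.update(sub)
        else:
            out[child] = sub
    return out


color_subtree = {
    "chromatic color": {
        "red, redness": {"red": {}},
        "blue, blueness": {"blue": {}},
        "green, greenness": {"green": {}},
    }
}
print(compress("color", color_subtree))
# {'red': {}, 'blue': {}, 'green': {}}
```

Applying both rules in a single bottom-up pass reproduces the end state of the two slides: color with red, blue, and green directly beneath it.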
Disambiguation
• Ambiguity in:
– Word senses
– Paths up the hypernym tree
Sense 1 for word “tuna”:
organism, being => plant, flora => vascular plant => succulent => cactus => tuna
Sense 2 for word “tuna”:
organism, being => fish => food fish => tuna
organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna
(2 paths for the same word across senses; 2 paths for the same sense within sense 2)
How to Select the Right Senses and Paths?
• First: build core tree
– (1) Create paths for words with only one sense
– (2) Use Domains
  • WordNet has 212 Domains (medicine, mathematics, biology, chemistry, linguistics, soccer, etc.)
  • Automatically scan the collection to see which domains apply
  • The user selects which of the suggested domains to use or may add their own
  • Paths for terms that match the selected domains are added to the core tree
• Then: add remaining terms to the core tree.
Using Domains
dip glosses:
Sense 1: A depression in an otherwise level surface
Sense 2: The angle that a magnetic needle makes with the horizon
Sense 3: Tasty mixture into which bite-size foods are dipped
dip hypernyms:
Sense 1: solid => concave shape => depression
Sense 2: shape, form => space => angle
Sense 3: food => ingredient, fixings => flavorer
Given domain “food”, choose sense 3
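The "dip" example can be sketched as a lookup over candidate senses. The sketch simplifies things by treating a sense as matching when the domain name appears on its hypernym path; how WordNet Domains labels are actually attached to senses is not shown on the slide, so this matching rule, the `DIP_SENSES` table, and `choose_sense` are assumptions.

```python
# sense number -> hypernym path, copied from the slide
DIP_SENSES = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}


def choose_sense(senses, domain):
    """Return the first sense whose path mentions `domain`, else None."""
    for sense_id, path in senses.items():
        # A node label may list several synonyms separated by ", ".
        if any(domain in label.split(", ") for label in path):
            return sense_id
    return None


print(choose_sense(DIP_SENSES, "food"))  # 3
```

Terms for which no selected domain applies would return `None` here and be left for the later step that adds remaining terms to the core tree.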
Opportunities for Tagging
• New opportunity: Tagging, folksonomies
– (flickr, del.icio.us)
– People are creating facets in a decentralized manner
– They are assigning multiple facets to items
– This is done on a massive scale
– This leads naturally to meaningful associations
http://www.airtightinteractive.com/projects/related_tag_browser/app/
This Doesn’t Solve Everything
• Harder to determine what’s related to more complex terms
• Still not good for finding a recipe using potatoes
Linking Metadata Into Tasks
• Old Yahoo restaurant guide combined:
– Region
– Topic (restaurants)
– Related Information
  • Other attributes (cuisines)
  • Other topics related in place and time (movies)
Green: restaurants & attributes
Red: related in place & time
Yellow: geographic region
Other Possible Combinations
• Region + A&E
• City + Restaurant + Movies
• City + Weather
• City + Education: Schools
• Restaurants + Schools
• …
Creating Tasks from HFM
• Recipes Example:
– Click Ingredient > Avocado
– Click Dish > Salad
– Implies task of “I want to make a Dish type d with an Ingredient i that I have lying around”
– Maybe users will prefer to select tasks like these over navigating through the metadata.
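One hedged way to read "equate a task with a sequence of facet types": strip the specific categories from each click sequence and count how often each facet-type sequence recurs. The `task_template` helper is a hypothetical name, and every session except the Avocado/Salad one is made-up data.

```python
from collections import Counter


def task_template(clicks):
    """Reduce a click sequence to its sequence of facet types."""
    return tuple(facet for facet, _category in clicks)


sessions = [
    [("Ingredient", "Avocado"), ("Dish", "Salad")],  # from the slide
    [("Ingredient", "Potato"), ("Dish", "Soup")],    # hypothetical
    [("Dish", "Salad")],                             # hypothetical
]
templates = Counter(task_template(s) for s in sessions)
print(templates.most_common(1))  # [(('Ingredient', 'Dish'), 2)]
```

Frequent templates such as (Ingredient, Dish) would be candidates for a task like the one quoted in the slide.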
Summary
• Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
– Midway in complexity between simple hierarchies and deep knowledge representation.
• Perhaps HFM is a good stepping stone to deeper semantic relations
– Currently in use on e-commerce sites; spreading to other domains
Opportunities
• Creating hierarchical faceted categories
– Assigning items to those categories
– Adaptively adding new facets as data changes
• A new approach to personalization:
– User-tailored facet combinations
• Create task-based search interfaces
– Equate a task with a sequence of facet types
• Make use of folksonomy data!