Hybrid Approaches to Taxonomy & Folksonomy
Semantic Technology, 2009
Stephanie Lemieux, Earley & Associates
Agenda
• The taxonomy/folksonomy debate
• Tagging pitfalls
• Social tagging & the enterprise
• Hybrid approaches to taxonomy/folksonomy
• Corporate tagging tools
About me
• Stephanie Lemieux
  – Senior Consultant at Earley & Associates, Inc.
  – Master's in Library and Information Studies (MLIS), specializing in taxonomy development, content management, search, IA
  – Developed enterprise taxonomies and helped a variety of clients through CMS deployments
  – Projects include: Motorola, Ford Foundation, Best Buy, American Greetings, Urban Land Institute
  – Blog: http://sethearley.wordpress.com/
The tired debate
Taxonomy               | Folksonomy
Control                | Democracy
Top-down               | Bottom-up
Arduous process        | Just do it
Accurate               | Good enough
Restrictive            | Flexible
Static                 | Evolving
Expensive to maintain  | Low cost – “crowdsourced”
The relevance problem
• Search results should be relevant to what a searcher wants, but technology can only determine if it is relevant to a search term*
• Taxonomies and folksonomies = 2 approaches to the problem of relevance with common goal of describing content, each with particular gaps
*Billy Cripe: Folksonomy, Keywords & Tags: Social & Democratic User Interaction in Enterprise Content Management
http://www.oracle.com/technology/products/content-management/pdf/OracleSocialTaggingWhitePaper.pdf
Taxonomy
• Added by a small number of individuals: authors/originators or “authorized” persons (e.g. librarian)
• Describes meaning or purpose of content based on a set viewpoint for a specific audience using a controlled vocabulary
• Relationships between terms defined
  – Hierarchical (e.g. Computer hardware > Keyboard)
  – Associative (e.g. Computer hardware – Software)
  – Equivalent (e.g. Laptop = Notebook Computer)
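The three relationship types can be pictured as a small data structure. The following is only a minimal sketch: the `Term` and `Taxonomy` classes and their method names are hypothetical, chosen for illustration, and the sample terms are the ones from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One controlled-vocabulary term and its three kinds of relationships."""
    label: str
    broader: set = field(default_factory=set)    # hierarchical (parent terms)
    related: set = field(default_factory=set)    # associative
    synonyms: set = field(default_factory=set)   # equivalent (non-preferred labels)


class Taxonomy:
    def __init__(self):
        self.terms = {}

    def term(self, label):
        # Create the term on first use so relationships can be added in any order.
        return self.terms.setdefault(label, Term(label))

    def add_hierarchical(self, broader, narrower):
        self.term(broader)
        self.term(narrower).broader.add(broader)

    def add_associative(self, a, b):
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_equivalent(self, preferred, synonym):
        self.term(preferred).synonyms.add(synonym)

    def preferred_label(self, label):
        """Map a label or synonym to its preferred term; None if unknown."""
        for term in self.terms.values():
            if label == term.label or label in term.synonyms:
                return term.label
        return None


tax = Taxonomy()
tax.add_hierarchical("Computer hardware", "Keyboard")
tax.add_associative("Computer hardware", "Software")
tax.add_equivalent("Laptop", "Notebook Computer")
```

With this in place, `tax.preferred_label("Notebook Computer")` resolves to the preferred term "Laptop", which is the kind of lookup an uncontrolled tag set cannot offer on its own.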
Tags
• Added by authors and consumers (individual motivation)
• Can connote any type of meaning or purpose
• No compression around a single viewpoint, no control of vocabulary
• Self-correcting through volume
Why tagging is so interesting…
• Adding individual value to the act of classification – user control over findability
• Reducing the cognitive burden (i.e. it’s easy)
• Reduced technological investment (i.e. it’s cheap)
• Can leverage emergent structure (folksonomy)
[Figure: “Reno” | Tags]
The downside…
Neither tags nor taggers are perfect…
• No language control
  – Misspellings: Library vs. libary, Plam pilot
  – Compound words: TimBernersLee
  – Case & number: Folksonomy, Folksonomies
  – Personal tags: To read, My dog, @work
  – Single-use tags: Billybobsdog
Study: 40% of Flickr tags and 28% of del.icio.us tags were flawed in these ways
Guy & Tonkin, 2006. http://www.dlib.org/dlib/january06/guy/01guy.html
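Some of these flaws (compound words, case, number) can be reduced mechanically before tags are stored or searched. The sketch below is one naive way to do it; `normalize_tag` is a hypothetical helper, its plural folding is a rough English-only rule, and it makes no attempt to correct misspellings or to detect personal or single-use tags.

```python
import re


def normalize_tag(raw):
    """Naive clean-up: split CamelCase compounds, lowercase, fold simple plurals."""
    # Insert a space at each lower-to-upper boundary: "TimBernersLee" -> "Tim Berners Lee"
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", raw.strip())
    words = spaced.lower().split()
    if not words:
        return ""
    last = words[-1]
    # Very rough plural folding, applied to the final word only.
    if last.endswith("ies") and len(last) > 4:
        last = last[:-3] + "y"
    elif last.endswith("s") and not last.endswith("ss") and len(last) > 3:
        last = last[:-1]
    words[-1] = last
    return " ".join(words)
```

For example, "Folksonomy" and "Folksonomies" both normalize to "folksonomy", and "TimBernersLee" becomes "tim berners lee". A rule this crude will also mangle some words (for instance "news"), so a real deployment would likely pair it with human review.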
The downside…
• Varying levels of granularity
• Same tag, different meanings
• Lack of relationships between tags – which is broader? Narrower?
• Lack of consistency/approach to change – even a single user can change language and hamper their own personal retrieval
[Figure: the same subject tagged “Robin”, “Bird”, and “Turdus migratorius”]
…Known as “tag noise”
The downside…
• Most tag search does not account for stemming, plurals, etc.
E.g.
Search on Delicious:
Folksonomy: 16049
Folksonomies: 4404
Both: 2642
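The cost of ignoring plurals can be worked out from these three counts with inclusion-exclusion. This is a small illustrative calculation, assuming the counts are numbers of tagged items and that "Both" means items carrying both tags.

```python
folksonomy = 16049      # items tagged "folksonomy"
folksonomies = 4404     # items tagged "folksonomies"
both = 2642             # items carrying both tags (assumed meaning of "Both")

# Inclusion-exclusion: items carrying at least one of the two forms.
either = folksonomy + folksonomies - both

missed_by_singular = either - folksonomy    # items tagged only with the plural
missed_by_plural = either - folksonomies    # items tagged only with the singular

# Share of the combined set that each single-form search retrieves.
recall_singular = folksonomy / either
recall_plural = folksonomies / either

print(either, missed_by_singular, missed_by_plural)
print(round(recall_singular, 3), round(recall_plural, 3))
```

Under those assumptions, 17811 items carry one form or the other; a search on "folksonomy" alone misses 1762 of them (recall of about 0.90), and a search on "folksonomies" alone misses 13407 (recall of about 0.25).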
The tagging hype cycle
http://www.pui.ch/phred/archives/2007/05/tag-history-and-gartners-hype-cycles.html
The web vs. the enterprise
• Shirky: “there is no shelf”
  – Traditional organization schemes are built to deal with physical collections and constraints.
  – They don’t work well on the web
    • large corpus
    • no clear edges
    • no formal categories
    • no authority
  – E.g. Flickr, Delicious: social tagging works well in this context
• The enterprise is much more defined
    • smaller corpuses
    • formal entities
    • coordinated users, clear tasks
    • need for reliable retrieval
  – Social tagging is more of a challenge, needs a clear arena
Role of folksonomy in the enterprise?
• Tagging external links
  – Seeing what colleagues are interested in
  – Sharing links with a specific team
  – Subscribing to link feeds
  – Monitoring news/blog coverage of the company
  – Consumer/competitor research
  – Tracking industry trends
• Tagging internal links
  – Finding/facilitating access to most popular pages on the intranet
  – Seeing what intranet pages mean to staff
Role of folksonomy in the enterprise?
• Social aspects
  – Identifying subject matter experts
  – Connecting people who share interests
  – Encouraging collaboration & resource sharing
• Improve your taxonomy, information retrieval
  – User tagging to refine the corporate taxonomy
    • New concepts
    • New terminology
  – Seeing what employees find interesting
  – Distributing tagging tasks
The downside…
• Potential issues of security, inappropriateness
  – Can implement some level of vetting
• Privacy concerns
  – Can be anonymous tagging, although this removes some social value
  – Can create role or team-based collections
• Need higher ratio of active participants due to smaller population size
The content continuum
Key concept: Not all content is created equal
Tagging/organizing processes: Lower cost → Higher cost
Content: Unfiltered → Reviewed/vetted/approved
Value: Lower value → Higher value
Content types shown on the slide:
• Message text
• External news reports
• Discussion postings
• Links
• Engineering document repositories
• Success stories
• Policies
• Approved methods
• Best practices
What if we blended the two?
Folksonomy / Taxonomy
• Low cost
• Findability
• Flexible
• Structured relationships
• User terminology
• Oversight
• Social sharing
• Consistency
Hybrid approaches
• Co-existence
• Tag-influenced taxonomy
• Taxonomy-influenced tagging
• Tag hierarchies/ontologies
Co-existence
• Taxonomy and folksonomy are used side by side
• Strengths of each approach preserved, philosophy of each kept “pure”
Web example: Flickr & Library of Congress: http://www.flickr.com/photos/library_of_congress/
Co-existence – public library
Raytheon – corporate example
• Used in Raytheon employee portal: website lists (“Suggested Sites” feature box)
• How does it work:
  – inserted “Suggested Sites” in a "feature" box to the right of the regularly ranked results
  – website suggestions (URLs) submitted along with recommended tags/keywords, which are subsequently verified and approved by librarians
http://www.slideshare.net/CJMConnors/i-kms-singapore-presentation
Variation: Tag mediation
• Vetting & editing tags
• Pros:
  – Weeds out potentially inappropriate tags
  – Eliminates misspellings, plural issues, etc.
  – Some can be done automatically (e.g. spell-checker)
  – Enhances findability
• Cons:
  – Higher effort/cost
  – Perceived lack of trust
  – Who knows better?
Tag-influenced taxonomy
• Taxonomy & tagging co-exist; tags serve as a pool of candidate terms to enrich the taxonomy and keep it current
  – Find new terminology (synonyms, popular language)
  – Find new concepts
• Performed as separate processes (taxonomy tagging = formal, tagging = informal) or combined in a single interface
Tag-influenced taxonomy
• Requires formal vetting process
• Can be supported by automation (e.g. candidate tags pulled & filtered with a script to remove taxonomy terms, stop words)
• Evaluate candidates based on
  – Frequency (“literary warrant”)
  – Salience within context
• Look at tags used in conjunction with taxonomy
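The automation step mentioned above could look roughly like the following. It is a sketch only: `candidate_terms`, its parameters, and the `min_count` threshold are hypothetical, and it covers just the frequency criterion, leaving salience to the human vetting process.

```python
from collections import Counter


def candidate_terms(tag_events, taxonomy_terms, stop_words, min_count=2):
    """Return (tag, count) pairs that are not already in the taxonomy,
    are not stop words, and were used at least min_count times."""
    known = {t.lower() for t in taxonomy_terms}
    stops = {w.lower() for w in stop_words}
    # One entry per tagging event, compared case-insensitively.
    counts = Counter(tag.strip().lower() for tag in tag_events)
    return [
        (tag, n)
        for tag, n in counts.most_common()   # most frequent first
        if n >= min_count and tag and tag not in known and tag not in stops
    ]
```

Usage: `candidate_terms(["netbook", "Netbook", "laptop", "the", "the", "cloud", "netbook", "cloud"], ["Laptop"], ["the"])` yields "netbook" (3 uses) and "cloud" (2 uses) as candidates for a taxonomist to review.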
Taxonomy-influenced tagging
• Presenting choices/suggestions to user from controlled set of terms/tags
  – Sometimes users prefer easy choice
    • Drop-down menus
    • Check boxes
    • Type ahead
    • Tree view
  – “influenced” – option to enter own tag? Good source of new terms
  – Enforces consistency
  – Offers structure
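As an illustration of the type-ahead option, here is a minimal sketch of prefix suggestions against a controlled term list, with free-text tags kept aside as candidate new terms. The `TypeAhead` class and its method names are hypothetical and do not describe any particular product.

```python
import bisect


class TypeAhead:
    def __init__(self, controlled_terms):
        # Lowercase keys for case-insensitive matching, mapped to display labels.
        self._labels = {t.lower(): t for t in controlled_terms}
        self._keys = sorted(self._labels)
        self.user_tags = []   # free-text tags, a pool of candidate new terms

    def suggest(self, prefix, limit=5):
        """Return up to `limit` controlled terms starting with `prefix`."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(p) or len(out) >= limit:
                break
            out.append(self._labels[key])
        return out

    def choose(self, text):
        """Use the controlled form if one exists, otherwise record the user's own tag."""
        cleaned = text.strip()
        key = cleaned.lower()
        if key in self._labels:
            return self._labels[key]
        self.user_tags.append(cleaned)
        return cleaned
```

Keeping `user_tags` separate is a design choice that preserves the "influenced" part: users are nudged toward controlled terms but can still contribute new ones for later review.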
WWW example: ZigTag (2008)
Definitions from Wikipedia & Wordnet
Tagging with type-ahead against database of 3M unique concepts & 8M synonyms
Zigtag
• Type ahead & synonyms encourage consistency
• Users can enter new tags
• Synonyms based on Wikipedia, so can be “dirty data”
• No hierarchy, only equivalent relationships so far
Zigtag search
Still get problems with uncontrolled tags & recall
Interesting relationships from Wikipedia
Browsable tag cloud
Example: myedna (Education.au)
http://www.educationau.edu.au/jahia/webdav/site/myjahiasite/shared/papers/tagging_hayman.ppt
Fully taxonomy-directed tagging
Buzzillions.com
• Review site: tags are “controlled” not against a taxonomy, but against other tags – reduces redundancy
• Only popular tags exposed as faceted navigation
SharePoint?
• Plugins make taxonomy easy, present like tags
E.g. KWizCom: plugin manages taxonomy and tags in easy interface… can opt-out of letting users create own tags
Taxonomy-directed tagging
• Pros:
  – More consistency
  – Better support for findability
  – Relationships, definitions leveraged – adding meaning to the tags
  – Realistic for the enterprise
• Cons:
  – Not really folksonomy anymore...
  – Can be forcing terminology on user
  – Need to develop reference list of concepts – manually through taxonomy, or need large corpus to derive automatically
Tag hierarchies
• 2 flavors: user-powered, automatic derivation
• User-powered
  – Social approach
  – Bogus hierarchies possible
  – Small population will contribute
• RawSugar tried it (no longer around): taggers could specify hierarchy in own account, tags clustered based on common groups
Raw Sugar example
More user-powered tag relationships
• E.g. LibraryThing
LibraryThing allows any user to combine (or uncombine) 2 tags that are semantically equivalent.
www.librarything.com
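The combine/uncombine idea can be modelled as a reversible alias table. The sketch below is an assumption-laden illustration, not LibraryThing's actual implementation; `TagCombiner` and its methods are hypothetical names.

```python
class TagCombiner:
    def __init__(self):
        self._canonical = {}   # alias -> canonical tag

    def resolve(self, tag):
        """Return the canonical form of a tag (the tag itself if never combined)."""
        return self._canonical.get(tag, tag)

    def combine(self, alias, canonical):
        """Declare `alias` equivalent to `canonical`."""
        target = self.resolve(canonical)
        if alias != target:
            self._canonical[alias] = target
        # Re-point tags that previously resolved to `alias`, so lookups stay one step.
        for key, value in list(self._canonical.items()):
            if value == alias:
                self._canonical[key] = target

    def uncombine(self, alias):
        """Undo a combination for one alias."""
        self._canonical.pop(alias, None)
```

For example, after `combine("sci-fi", "science fiction")`, a search can call `resolve("sci-fi")` and retrieve items under "science fiction"; `uncombine("sci-fi")` restores the tag if the community decides the two are not equivalent after all.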
Automatic derivation
• Tag hierarchies, facets, ontologies, or “folksontology”
• Done through statistical/clustering algorithms
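One simple statistical heuristic of this kind is co-occurrence subsumption: tag B is treated as broader than tag A if most items tagged A also carry B, but not the reverse. The sketch below illustrates that heuristic only; it is not necessarily the algorithm used by the projects referenced on these slides, and `derive_broader` and its `threshold` parameter are hypothetical.

```python
from collections import defaultdict


def derive_broader(tagged_items, threshold=0.8):
    """Return a set of (narrower, broader) tag pairs inferred from co-occurrence.

    tagged_items maps an item id to the list of tags applied to that item.
    """
    items_with = defaultdict(set)
    for item_id, tags in tagged_items.items():
        for tag in set(tags):
            items_with[tag].add(item_id)

    pairs = set()
    for a in items_with:
        for b in items_with:
            if a == b:
                continue
            overlap = len(items_with[a] & items_with[b])
            p_b_given_a = overlap / len(items_with[a])
            p_a_given_b = overlap / len(items_with[b])
            # b subsumes a: b appears on most items tagged a, but not vice versa.
            if p_b_given_a >= threshold and p_a_given_b < threshold:
                pairs.add((a, b))
    return pairs
```

With items tagged ["bird", "robin"], ["bird", "robin"], ["bird", "sparrow"], and ["bird"], the function infers that "bird" is broader than both "robin" and "sparrow". Results depend heavily on tag volume, which is why such derivations tend to improve as more people tag.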
http://www.pui.ch/phred/automated_tag_clustering/
Delicious & CiteULike hierarchy
http://heymann.stanford.edu/taghierarchy.html
Clustering at Flickr
Auto clustering/facets
• Still not very mature
• Time-sensitive
• Community-sensitive
• Ambiguous tags
• Improve with volume (self-correcting)
http://www.pui.ch/phred/automated_tag_clustering/
Intelligent tags
• Moving toward more semantic tagging with machine-readable tags
  – Flickr: can tag images with machine tags of the form namespace:predicate=value
    e.g. geo:quartier="SoHo"
    e.g. lastfm:event=34640 – makes your photo appear on a lastfm event page
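A machine tag in the namespace:predicate=value form can be split into its parts with a small parser. This is a sketch under a simplified, assumed grammar (letters, digits, and underscores for namespace and predicate); it is not Flickr's official syntax definition, and `parse_machine_tag` is a hypothetical helper.

```python
import re

# Assumed, simplified grammar: namespace:predicate=value
MACHINE_TAG = re.compile(
    r'^(?P<namespace>[A-Za-z][A-Za-z0-9_]*):'
    r'(?P<predicate>[A-Za-z][A-Za-z0-9_]*)='
    r'(?P<value>.+)$'
)


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    value = match.group("value")
    # Drop one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return match.group("namespace"), match.group("predicate"), value
```

Here `parse_machine_tag('geo:quartier="SoHo"')` gives the namespace "geo", predicate "quartier", and value "SoHo", while a plain tag such as "folksonomy" returns None and can be handled as an ordinary tag.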
Intelligent tags
• MOAT: Meaning of a tag – part of linked data movement, mapping tags to semantic web
  – http://moat-project.org/
• Adding to the triplet
  – User – resource – tag – meaning
  – Meaning = URI to a resource containing meaning (e.g. DBpedia)
<tag:RestrictedTagging>
  <tag:taggedResource rdf:resource="http://example.org/post/1"/>
  <foaf:maker rdf:resource="http://apassant.net/alex"/>
  <tag:associatedTag rdf:resource="http://tags.moat-project.org/tag/apple"/>
  <moat:tagMeaning rdf:resource="http://dbpedia.org/resource/Apple_Records"/>
</tag:RestrictedTagging>
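To see why the fourth element matters, the tagging above can be modelled as a (user, resource, tag, meaning) record. This is an illustrative sketch, not the MOAT API: the `Tagging` tuple and `resources_about` function are hypothetical, and the second record (user, post, and the fruit meaning) is invented for the example.

```python
from collections import namedtuple

# user - resource - tag - meaning, as on the previous slide
Tagging = namedtuple("Tagging", "user resource tag meaning")

taggings = [
    # Taken from the RDF example above.
    Tagging("http://apassant.net/alex", "http://example.org/post/1",
            "apple", "http://dbpedia.org/resource/Apple_Records"),
    # Hypothetical second tagging: same tag string, different meaning.
    Tagging("http://example.org/user/2", "http://example.org/post/2",
            "apple", "http://dbpedia.org/resource/Apple"),
]


def resources_about(meaning_uri, records):
    """Select resources by the meaning URI rather than by the tag string."""
    return [r.resource for r in records if r.meaning == meaning_uri]


def resources_tagged(tag, records):
    """Plain tag search, which cannot tell the two meanings apart."""
    return [r.resource for r in records if r.tag == tag]
```

A plain search on "apple" returns both posts, whereas `resources_about("http://dbpedia.org/resource/Apple_Records", taggings)` returns only the post about the record label.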
Conclusion
• Not all content is created equal – tags and taxonomies have their sweet spots
• Hybrid approaches are emerging
  – taxonomy-influenced tagging leading the pack in popularity on the web
  – co-existence in the enterprise
• Look for more developments on the semantic web/linked data front for making tags more intelligent
Corporate social tagging tools
Corporate social tagging software
• Connectbeam: http://www.connectbeam.com/
• Cogenz: http://www.cogenz.com/
• IBM Lotus Connections (Dogear): http://www-306.ibm.com/software/lotus/products/connections/dogear.html
• BEA AquaLogic Pathways: http://www.bea.com/framework.jsp?CNT=index.jsp&FP=/content/products/aqualogic/pathways/
• NewsGator Social Sites: http://www.newsgator.com/business/socialsites/default.aspx
Stephanie [email protected]
Blog: sethearley.wordpress.com
Twitter: stephlemieux
Send an email to [email protected] for a free pass to one of our conference calls.
Questions?