Taxonomy Strategies LLC
6-15 June 2007 Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Taxonomy & metadata strategies for effective content management
Melbourne, Sydney, Canberra
Masterclass
Taxonomy Strategies LLC The business of organized information
Today’s agenda
9:00-9:10 10 min Introduction
9:10-9:15 5 min Warm-up exercise
9:15-9:45 30 min Taxonomy fundamentals: Building taxonomies
9:45-10:00 15 min Taxonomy exercise
10:00-10:30 30 min Taxonomy fundamentals: Taxonomy business case
10:30-11:00 30 min Tea Break
11:00-12:00 60 min Taxonomy governance
12:00-12:30 30 min Capabilities self-assessment
12:30-13:30 60 min Lunch
13:30-14:30 60 min Taxonomy benchmarking
14:30-14:45 15 min Benchmarking exercise
14:45-15:15 30 min Tea Break
15:15-16:15 60 min Content tagging
16:15-16:30 15 min Tagging exercise
16:30-17:00 30 min Q&A
Who I am: Joseph Busch
Over 25 years in the business of organized information.
– Founder, Taxonomy Strategies LLC
– Director, Solutions Architecture, Interwoven
– VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
– Program Manager, Getty Foundation
– Manager, Pricewaterhouse
Metadata and taxonomies community leadership.
– President, American Society for Information Science & Technology
– Director, Dublin Core Metadata Initiative
– Adviser, National Research Council Computer Science and Telecommunications Board
– Reviewer, National Science Foundation Division of Information and Intelligent Systems
– Founder, Networked Knowledge Organization Systems/Services
What we do
Organize Stuff
For us, taxonomy work includes:
Metadata specification defines the properties needed to describe content so that it can be found & used.
Vocabularies are collections of terms that are used to specify some of the metadata properties.
Some vocabularies are big and hierarchical, some are small and flat.
An application profile specifies what metadata & vocabularies are required, and then represents them formally.
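As a rough illustration of what "represents them formally" can mean, here is a minimal sketch of an application profile as data, with a checker that reports records that do not conform. The property names, vocabulary terms and the `validate` helper are hypothetical, invented for this example; they are not taken from any client profile.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Property:
    """One metadata property in a (hypothetical) application profile."""
    name: str
    required: bool
    vocabulary: frozenset = frozenset()  # empty set means free text

# Illustrative profile; names and terms are assumptions, not a real standard.
PROFILE = [
    Property("Title", required=True),
    Property("Content Type", required=True,
             vocabulary=frozenset({"Report", "Policy", "Form"})),
    Property("Audience", required=False,
             vocabulary=frozenset({"General Public", "Industry", "Students"})),
]

def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for prop in profile:
        value = record.get(prop.name)
        if value is None:
            if prop.required:
                problems.append(f"missing required property: {prop.name}")
        elif prop.vocabulary and value not in prop.vocabulary:
            problems.append(
                f"{prop.name}: {value!r} is not in the controlled vocabulary")
    return problems

ok = validate({"Title": "Annual summary", "Content Type": "Report"})
bad = validate({"Content Type": "Memo"})
print(ok)
print(bad)
```

The same structure could just as well be expressed in XML or RDF; the point is only that required properties and their vocabularies become machine-checkable.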
Recent & current projects: http://www.taxonomystrategies.com/html/clients.htm
Government Commercial
Not-for-Profit
Who are you? What sectors do you work in?
Your Role
– Administrator
– Records Manager
– Content Manager
– Communications Editor
– Information Architect
– Usability Expert
– Librarian
– Knowledge Engineer
– Ontologist
– Chief Information Officer
Industrial Sector
– Agriculture & Processing: Food, Lumber, Pulp & Paper
– Financial Services: Banking & Insurance
– Government: Public administration, Public safety
– High Tech: Computers, Software & Telecommunications
– Heavy Manufacturing: Steel, Automobiles & Aircraft
– Manufacturing: Consumer Products
– Medical & Health Care
– Mining & Refining: Petrochemicals, Oil & Gas
– Pharmaceuticals
Why are you here?
What are the key questions that you want answered in today’s workshop?
Please rank the questions from the most important (5) to the least important (1)
Please provide your job title, organization and department; your name is optional.
Priority (1-5) Questions
Your title or role:
Your org or industry:
Your dept:
Your name: (optional)
The Taxonomy problem: How to pick from > 5,000 faucets?
By: Category, Price, Brand, Color/Finish, # Handles, Series Name, Water Filter?, Faucet Spray, Handle Shape, Soap Dispenser?
The main issue: What goes here?
When do the things in the list change?
How do we maintain the list?
What rules do we follow?
Seven phases of taxonomy development
Week: 1 2 3 4 5 6 7 8 9 10 11 12
1 Identify Objectives: Conduct interviews
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Test & Train: Manually tag small sample
Taxonomy design phases need to be iterated
Phases: 1 Identify Objectives; 2 Inventory Resources; 3 Specify Metadata; 4 Model Content; 5 Specify Vocabularies; 6 Specify Procedures; 7 Test & Train
Plan & Prototype
1 Interview core team and stakeholders
2 Identify, gather & review resources
3 Define fields & purpose
4 Define content chunks & XML DTDs
5 Compile controlled vocabularies
6 Develop workflow rules & procedures
7 Manually tag small sample
Alpha Dev & Test
1 Review tagged samples, default procedures
2 Gather additional resources, if any
3 Revise if needed, bake into alpha CMS
4 Revise if needed, bake into alpha CMS
5 Revise, use in alpha CMS
6 Alpha workflows in CMS
7 Use alpha CMS to tag larger sample
Beta D&T
1 Interview alpha users
2 Gather additional sources, if any
3 Modify CMS for beta
4 Modify CMS for beta
5 Revise, use in beta CMS
6 Modify & extend workflows
7 Use beta CMS to tag larger sample
Final D&T
1 Interview beta users
3 Modify for 1.0
4 Modify for 1.0
5 Revise using team procedure
6 Finalize procedure materials
7 Finalize training materials & train staff
Licensing an existing taxonomy
See Factiva’s taxonomy
www.taxonomywarehouse.com
There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
But pre-existing taxonomies rarely fit an organization’s needs and may require extensive customization.
Recommendation
– Adopt a faceted approach.
– Reuse existing (especially internal) vocabularies for as many of the facets as possible.
– Plan on doing full-custom “Content Type” and “Topic” taxonomies.
Free sources for 8 common taxonomies
Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or is intended to be used by. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.
Typical product catalog: A-Z, then idiosyncratic categories
How to analyze existing product catalog categories: Principles and priorities
Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.
Principles
1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
2. Repeated lists of subcategories signal a possible need for an attribute.
3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used.
• Avoid selecting strongly correlated attributes, such as “Weight” and “Shipping Weight”.
Priorities
1. Choose Categories that apply to many products, over those with few products.
2. Choose Attributes that apply to many Categories over those that apply only to very few categories.
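The second principle (repeated lists of subcategories) lends itself to a quick mechanical check. The sketch below assumes the catalog is available as a mapping from category to its list of subcategories; the catalog fragment and the function name are made up for illustration.

```python
from collections import defaultdict

def repeated_subcategory_lists(tree, min_parents=2):
    """Find subcategory sets that recur under several parent categories.

    Each such set is a candidate for conversion to an attribute.
    """
    parents_by_children = defaultdict(list)
    for parent, children in tree.items():
        if children:
            # Order of subcategories is ignored when comparing lists.
            parents_by_children[frozenset(children)].append(parent)
    return {
        tuple(sorted(children)): sorted(parents)
        for children, parents in parents_by_children.items()
        if len(parents) >= min_parents
    }

# Hypothetical catalog fragment, invented for this example.
catalog = {
    "Kitchen Faucets": ["Chrome", "Brass", "Nickel"],
    "Bathroom Faucets": ["Brass", "Chrome", "Nickel"],
    "Shower Heads": ["Handheld", "Fixed"],
}

candidates = repeated_subcategory_lists(catalog)
print(candidates)
```

Here the finish list repeats under two categories, which suggests a "Finish" attribute. A real catalog would likely need fuzzier matching (near-identical lists), which this sketch does not attempt.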
Product categories example: Wireless carrier
Products
– Accessories: Batteries, Cases, Chargers, Data, Hands-Free, Headsets, Miscellaneous
– Content: Purchased, Subscription
– Phones: Versatile Phones, Smart Devices, Basic Phones, Prepaid Phones, International Only Phones, Mobile Broadband Cards
– Services: Conferencing, Internet / Data, Landline Phone, Network & Roaming, Relay Services, Solutions, Wireless Data
Product attributes example: Digital cameras in an electronics catalog
Types of attributes
Generic attributes
– Brand/Product Family/Model
– Price Range
– Usually Ships
Merchandising attributes
– Usage (E-mail, Internet Browsing, Programming, …)
– Segment (Home, Business, Education, Government …)
– Region & Country
– Most Popular
– New
– Related Products
Specialized attributes
– Capacity (Battery; Memory; MB; GB; BPS, …)
– Resolution (DPI; Megapixels; XGA, XGA, UXGA, …)
– Size (Display; Screen; ...)
– Standard (a, b, g, n, …; scsi, ata, sata, eide, …; dimm, simm, …)
– Type (Camera; Battery; Display; Printer; Server; Storage; Switch; …)
Resolution: 3 Megapixels (4), 4 Megapixels (5), 5 Megapixels (27), 6-8 Megapixels (21)
Brand: Canon (15), Fuji (10), Kodak (17), Nikon (8), Olympus (9)
Type: Point & Shoot (25), Digital SLR (10), Packages (5)
Price Range: $100-250 (5), $250-500 (16), $500-1000 (19), More than $1000 (3)
Faceted taxonomy theory & practice
How many terms are needed to provide sufficient granularity? Not as many as you think!
Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
The power of faceted taxonomy
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
– Easier to maintain
– Easier to tag by content authors
– Can be easier to navigate
It’s more effective to increase the number of facets, than to increase the number of terms per facet.
Audience: Advocacy, Contractors & Grantees, Environmental Professionals, Federal Facilities, General Public, Industry, Kids, Researchers & Scientists, Small Business, Students
Health: Advisory, Exposure, Food Safety, Health Assessment, Health Effect, Health Risk, Occupational Health, Pesticide Effects, Sun Protection, Toxicity
Substance: Allergen, Biological Contaminant, Carcinogen, Chemical, Explosive, Liquid Waste, Microorganism, Ozone, Pesticide, Radioactive Waste
Industry: Agriculture & Cattle, Automobile Repair, Chemical, Dry Cleaning, Electronics & Computer, Energy, Extractive Industries, Food Processing, Leather Tanning & Finishing, Metal Finishing
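The counting argument on this slide can be checked directly. The sketch below treats each facet as an independent choice of one term, so the number of distinct combinations is the product of the facet sizes while the number of terms to maintain is only their sum. The 60-term comparison at the end is an illustrative assumption, not a figure from the slide.

```python
from math import prod

# Four facets of ten terms each, as on the slide.
facet_sizes = {"Audience": 10, "Health": 10, "Substance": 10, "Industry": 10}

total_terms = sum(facet_sizes.values())     # terms someone has to maintain
combinations = prod(facet_sizes.values())   # distinct post-coordinated combinations
print(total_terms, combinations)            # 40 10000

# Same budget of 60 terms, spent two ways (hypothetical comparison):
six_facets_of_ten = prod([10] * 6)          # more facets
four_facets_of_fifteen = prod([15] * 4)     # more terms per facet
print(six_facets_of_ten, four_facets_of_fifteen)  # 1000000 50625
```

With the same number of terms, adding facets yields far more combinations than lengthening each facet, which is the point the slide makes in words.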
Automatically created taxonomies
Documents can be ‘clustered’ based on similarities and differences.
Problems:
– Typically only a single hierarchy
– No overall plan
– Results hard for people to navigate
What does “North” mean on this map?
Automatic taxonomy construction software
Software can scan large quantities of content and extract statistically significant words and phrases.
Example: Archive of 10 publications analyzed for topics related to “copyright.”
Software does a poor job of
– De-duplication.
– Turning significant words and phrases into a larger structure.
– Discriminating between “gold” and “garbage.”
Software is good for
– Getting an understanding of the key noun phrases in a large collection.
– Providing test cases for evaluating a taxonomy.
Source: Sample data courtesy of nStein.
Most popular flickr tags on 20 Feb 2007 http://www.flickr.com/photos/tags/
Sort flickr categories into 5 or fewer groups. Then label each group.
Taxonomy exercise: Facet grouping
Universal taxonomy facets
– By location (spatially)
– By time (chronologically)
– By type (genre)
– By physical properties (size, color, shape, etc.)
– By subject (topic)
Richard Saul Wurman. Information Architects (1996)
Taxonomy exercise: Facet grouping
Sort flickr categories into 5 or fewer groups. Then label each group.
Business case and motivations for taxonomies
How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?
What technology analysts have said: Add metadata to search on!
“Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
“Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
“Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a benefit.
There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
You need to determine those changes, and their costs, as part of the ROI.
Product utilization: Taxonomy compared to search
Conversion rate increases.
– HomeDepot.com – Double digit increase.
– 1-800-Flowers.com – More than a 10% increase.
– Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs) – 130% increase.
Lift in average order size.
Product catalog: Taxonomy compared to search
Benefit: Increased conversion rate & revenue lift
Web sales net income $ 80,000,000
Increased conversion rate 30% $ 24,000,000
Order size lift 10% $ 8,000,000
Potential revenue increase per year $ 32,000,000
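The figures above follow from simple arithmetic, sketched here so the inputs can be swapped for your own. As on the slide, each percentage is applied to the same base figure and the two gains are added; treating them as compounding instead would be a different assumption. The helper name is invented for the example.

```python
def percent_of(amount, pct):
    """Whole-number percentage of an integer amount (no float rounding)."""
    return amount * pct // 100

web_sales = 80_000_000          # base figure from the slide
conversion_increase_pct = 30    # increased conversion rate
order_size_lift_pct = 10        # lift in average order size

conversion_gain = percent_of(web_sales, conversion_increase_pct)
order_size_gain = percent_of(web_sales, order_size_lift_pct)
potential_increase = conversion_gain + order_size_gain

print(conversion_gain, order_size_gain, potential_increase)
# 24000000 8000000 32000000
```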
Usability research: Taxonomy compared to search
“We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.”
Marti Hearst (and others)
“The category interface is superior to the list interface in both subjective and objective measures.”
Hao Chen & Susan Dumais
Usability research: Taxonomy compared to search
[Bar chart: Median Search Time in Seconds (axis 0 to 140) for the Category and List interfaces, with separate bars for targets "In top 20 results" and "Not in top 20 results". Callouts: "Category is 36% faster" and "Category is 48% faster".]
Source: Chen & Dumais
Time saved: Taxonomy compared to search
1 hour per day searching x 36% faster = 22 minutes each day
22 minutes x 250 working days per year = 5500 minutes or 92 hours per year
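A short sketch of the same calculation, with the rounding made explicit. It follows the slide in reading "36% faster" as 36% less time spent searching, which is a simplifying assumption; the variable names are illustrative.

```python
minutes_searching_per_day = 60   # 1 hour per day
faster_pct = 36                  # category interface speed-up
working_days_per_year = 250

# 60 x 36% = 21.6, which the slide rounds to 22 minutes.
saved_per_day = round(minutes_searching_per_day * faster_pct / 100)
saved_per_year_minutes = saved_per_day * working_days_per_year
saved_per_year_hours = round(saved_per_year_minutes / 60)

print(saved_per_day, saved_per_year_minutes, saved_per_year_hours)
# 22 5500 92
```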
Time saved: Taxonomy compared to search
Benefit: Increase service efficiency
Number of call center calls per month 50,000
Average cost per call $ 20
Call response costs per month $ 1,000,000
Total call response costs per year $12,000,000
Percentage of self-serviced calls due to improved information browsing 30%
Service costs savings per year $ 3,600,000
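Because the 30% self-service rate is the least certain input, it can help to see how the savings move with it. This is a minimal sketch using the slide's call volume and cost per call; the function name and the alternative rates of 10% and 20% are assumptions added for illustration.

```python
def call_center_savings(calls_per_month, cost_per_call, self_service_pct):
    """Return (monthly cost, yearly cost, yearly savings) as whole dollars."""
    monthly_cost = calls_per_month * cost_per_call
    yearly_cost = monthly_cost * 12
    yearly_savings = yearly_cost * self_service_pct // 100
    return monthly_cost, yearly_cost, yearly_savings

monthly, yearly, savings = call_center_savings(50_000, 20, 30)
print(monthly, yearly, savings)   # 1000000 12000000 3600000

# Sensitivity to the self-service rate (10% and 20% are hypothetical).
by_rate = {pct: call_center_savings(50_000, 20, pct)[2] for pct in (10, 20, 30)}
print(by_rate)
```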
Trusted advisers: Taxonomy avoids costs
“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
Sue Feldman,
Sun’s usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year—about $500 per employee per year.
Jakob Nielsen, useit.com
[Pie chart segments: Searching, Creating, Communicating]
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
[Pie chart segments: Creating new content, Recreating existing content, Searching, Communicating. Labeled values: 25%, 8%]
Knowledge workers spend more time re-creating existing content than creating new content
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees 100
Average employee salary $ 80,000
Employee costs per year $8,000,000
Increase in productivity from not re-creating content 25%
Employee cost savings per year $2,000,000
Business case summary
1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace, and on the web.
2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
3. Hierarchical or facet navigation can be guided by the User Interface.
4. Facet navigation is best combined with keyword searching. E.g., keyword search followed by faceted navigation of results.
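Point 4 can be sketched in a few lines: run a keyword search first, then show facet counts over the results and let the user narrow by one facet value. The items, field names and helper functions below are hypothetical and kept deliberately naive (exact word matching, in-memory data); a production system would use a search engine for both steps.

```python
from collections import Counter

# Hypothetical tagged items, invented for illustration.
ITEMS = [
    {"title": "Canon point and shoot camera", "Brand": "Canon", "Type": "Point & Shoot"},
    {"title": "Canon digital SLR camera", "Brand": "Canon", "Type": "Digital SLR"},
    {"title": "Nikon digital SLR camera", "Brand": "Nikon", "Type": "Digital SLR"},
    {"title": "Kodak camera battery", "Brand": "Kodak", "Type": "Battery"},
]

def keyword_search(items, query):
    """Keep items whose title contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [it for it in items
            if all(w in it["title"].lower().split() for w in words)]

def facet_counts(items, facets):
    """Count how many results carry each value of each facet."""
    return {f: Counter(it[f] for it in items if f in it) for f in facets}

def refine(items, facet, value):
    """Narrow a result set to a single facet value."""
    return [it for it in items if it.get(facet) == value]

results = keyword_search(ITEMS, "camera")           # step 1: keyword search
counts = facet_counts(results, ["Brand", "Type"])   # step 2: counts shown in the UI
slr_only = refine(results, "Type", "Digital SLR")   # step 3: user picks a facet value

print(counts["Brand"])
print([it["title"] for it in slr_only])
```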
Taxonomy requires a business process
Taxonomies must change, gradually, over time if they are to remain relevant.
Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions.
Taxonomy governance can be viewed as a standards process
Taxonomy must evolve, but in a predictable way.
Team structure, with an appeals process.
Taxonomy stewardship is a part-time role at most organizations.
Team needs to make decisions based on costs and benefits.
Documentation and educational materials.
Comment-handling responsibilities (part of error-correction process).
Issue Logs.
Release Schedule.
Taxonomy governance: Change process overview
Taxonomy Governance Environment (CV = Controlled Vocabulary)
1: External controlled vocabularies (CVs) change on their own schedule
2: Taxonomy Team decides when to update CV snapshots
3: Team adds value via definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of CVs published to consumers
CV Sources: Internally Created, Subject Codes, Expertise, Other Internal, External Standard
Taxonomy Facets: working copies of CVs, maintained in the Taxonomy Tool
CV Consumers: Site Search Tool, Portal, Working Papers, Web CMS, DAM, Tagging Tool, Search UI
Who should build the taxonomy?
The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
The team should plan on maintaining the taxonomy as well as building it.
– Maintenance will not (usually) be anyone’s full-time job.
– Exact mix of people on team will change.
It should be built in an iterative fashion, with more content and broader review for each iteration.
Taxonomy governance: Generic team charter
Taxonomy Team is responsible for maintaining:
– The Taxonomy, a multi-faceted classification scheme.
– Associated taxonomy materials, such as: Editorial Style Guides, Taxonomy Training Materials, Metadata Standard.
– Team rules and procedures for change management.
Taxonomy Team will consider costs and benefits of suggested changes.
Taxonomy Team will:
– Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
– Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
– Promote awareness and use of the Taxonomy.
Taxonomy governance team: Generic roles
Business Lead
– Keeps committee on track with larger business objectives.
– Balances cost/benefit issues to decide appropriate levels of effort.
– Obtains needed resources if those on committee can’t accomplish a particular task.
Technical Specialist
– Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
– Helps obtain data from various systems.
Taxonomy Specialist
– Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
– Makes edits to taxonomy, installs into system with aid of IT specialist.
Content Specialist
– Committee’s liaison to content creators.
– Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Content Owners
– Reality check on process change suggestions.
Where taxonomy changes come from
[Diagram elements: End User (experience), Firewall, Application UI, Application Logic, Content, Taxonomy, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy Editor, Taxonomy Team]
Inputs to the Taxonomy Editor
– Staff notes ‘missing’ concepts
– Query log analysis
– Requests from other parts of the organization
Team Considerations
1. Business goals.
2. Changes in user experience.
3. Retagging cost.
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content.
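Query log analysis is one of the change sources named on this slide. A minimal sketch, assuming the log is just a list of query strings and the taxonomy is a flat list of labels and synonyms: count the queries and report the frequent ones that match no known term. The sample log, the term list and the `min_count` threshold are invented for illustration.

```python
from collections import Counter

def missing_concepts(queries, taxonomy_terms, min_count=2):
    """Return (query, count) pairs for frequent queries absent from the taxonomy."""
    known = {t.lower() for t in taxonomy_terms}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(q, n) for q, n in counts.most_common()
            if n >= min_count and q not in known]

# Hypothetical query log and vocabulary.
log = ["Pesticide", "ozone", "mold", "Mold", "radon", "mold ", "radon", "ozone"]
terms = ["Ozone", "Pesticide", "Carcinogen"]

suggestions = missing_concepts(log, terms)
print(suggestions)   # [('mold', 3), ('radon', 2)]
```

The output is only a list of candidates for the taxonomy editor to review; whether a term is added still depends on the team considerations listed above.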
Taxonomy maintenance processes
Different organizations will need to consider their own change processes.
Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
Change process MUST also consider cost of implementing the change
– Retagging data.
– Reconfiguring auto-classifier.
– Retraining staff.
– Changes in user expectations.
Taxonomy maintenance workflow
[Flowchart. Lanes: Analyst, Editor, Copywriter, Sys Admin. Steps: Suggest new name/category; Review new name (Problem? Yes/No); Copy edit new name (Problem? Yes/No); Add to enterprise Taxonomy. Systems shown: Taxonomy, Taxonomy Tool.]
Sample taxonomy editor: Data Harmony
Hierarchy Browser
Standard Term Info
Taxonomy editing tools vendors
[Quadrant chart. Vertical axis: Ability to Execute (low to high). Horizontal axis: Completeness of Vision. Quadrant labels shown: Niche Players, Visionaries.]
Most popular taxonomy editor is MS Excel
An immature area – No vendors are in upper-right quadrant!
MultiTes is widely used, cheap with functionality
High functionality/high cost products ($100K+)
Taxonomy maturity model
Taxonomy governance processes must fit the organization.
As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
Honestly assess your organization’s metadata maturity in order to design appropriate governance processes.
We are starting to define a maturity model, similar to the Software Capability Maturity Model (CMM)
– Initial: Ad hoc, each project begins from scratch.
– Repeatable: Procedures defined and used, but not standardized across organization or are misapplied to projects.
– Defined: Standard processes are tailored for project needs. Strategic training for long-range goals is in place.
– Managed: Projects managed using quantitative quality measures. Process itself is measured and controlled.
– Optimizing: Continual process improvement. Extremely accurate project estimation.
Purpose of maturity model
Estimating the maturity of an organization’s information management processes tells us:
How involved the taxonomy development and maintenance process should be
– Overly sophisticated processes will fail.
What to recommend as first steps.
Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.
Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
IT Maturity may not be core to your business.
Taxonomy maturity scorecard
Levels: Initial, Repeatable, Defined, Managed, Optimizing
Organizational Structure
Executive Sponsorship *
Budgeting *
Hiring & Training *
Quality Assurance
Manual Processes * 1
Automated Processes *
Project Management
Estimating & Scheduling *
Cost Control *
Project Methodology * 2
Design and Execution
Planning *
Design Excellence *
Development Maturity *
1 – X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.
2 – IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.
Taxonomy governance self-assessment
Background
1. Rate your organization’s overall taxonomy maturity from 1 to 10.
Immature 1 2 3 4 5 6 7 8 9 10 Mature
2. What type of change was most recently made to your organization’s taxonomy management environment?
Functionality Standards Tools People Data Quality
3. What is the area for improvement in your organization’s taxonomy management environment?
Functionality Standards Tools People Data Quality
Basic
1. Is there a process in place to examine search query logs? Yes No
2. Is there an organization-wide metadata standard, such as the “Dublin Core”, for use by search tools? Yes No
Intermediate
1. Is there an ongoing data cleansing procedure to look for any redundant, obsolete or trivial content (ROT)? Yes No
If there is a process, describe it briefly.
2. Does the search engine index more than 4 repositories around the organization?
3. Are system features and metadata fields added based on cost/benefit analysis, or because they are easy to do with the current applications and tools? Cost/Benefit Easy
4. Are applications and tools acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money? Requirements Year-End
5. Are there hiring and training practices for metadata and taxonomy positions? Yes No
If there is training, describe it briefly.
Advanced
1. Are there established qualitative and quantitative measures of metadata quality? Yes No
If there are measures, describe them briefly.
2. Can the CEO explain the return on investment (ROI) for content management, search and metadata? Yes No
2005 Maturity survey: Search practices
n=87. Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. 20% (12) 11% (7) 62% (38) 2% (1) 5% (3)
Search engine indexes multiple repositories in addition to web sites. 25% (15) 21% (13) 44% (27) 2% (1) 8% (5)
Spell Checking. 31% (19) 18% (11) 38% (23) 0% (0) 13% (8)
Synonym Searching. 41% (25) 23% (14) 30% (18) 0% (0) 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. 37% (22) 20% (12) 37% (22) 0% (0) 7% (4)
Queries are logged and the logs are regularly examined 31% (19) 25% (15) 31% (19) 5% (3) 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets) 46% (28) 25% (15) 21% (13) 0% (0) 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. 43% (26) 16% (10) 25% (15) 0% (0) 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. 68% (41) 7% (4) 10% (6) 0% (0) 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. 57% (34) 15% (9) 17% (10) 0% (0) 12% (7)
2005 Maturity survey: Metadata practices
n=87. Response columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. 22% (13) 12% (7) 37% (22) 20% (12) 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. 37% (22) 37% (22) 20% (12) 0% (0) 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. 52% (30) 16% (9) 21% (12) 0% (0) 12% (7)
Multiple repositories comply with metadata standard. 52% (31) 20% (12) 17% (10) 0% (0) 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. 48% (29) 20% (12) 20% (12) 0% (0) 12% (7)
The Cataloging Policy document is revised periodically. 48% (29) 15% (9) 17% (10) 0% (0) 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. 57% (34) 17% (10) 17% (10) 0% (0) 10% (6)
Metadata is manually entered into web forms. 15% (9) 12% (7) 61% (36) 3% (2) 8% (5)
Metadata is generated automatically by software. 38% (23) 18% (11) 27% (16) 2% (1) 15% (9)
Metadata is generated automatically, then reviewed manually for correction. 48% (29) 18% (11) 17% (10) 2% (1) 15% (9)
60Taxonomy Strategies LLC The business of organized information
2005 Maturity survey: Taxonomy practices
n=87. Response columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Org Chart Taxonomy - One based primarily on the structure of the organization. 36% (21) 10% (6) 34% (20) 5% (3) 15% (9)
Products Taxonomy - One based primarily on the products and/or services offered by the organization. 37% (22) 10% (6) 32% (19) 5% (3) 15% (9)
Content Types Taxonomy - One based primarily on the different types of documents. 28% (16) 21% (12) 40% (23) 5% (3) 7% (4)
Topical Taxonomy - One based primarily on topics of interest to the site users. 20% (12) 36% (21) 34% (20) 3% (2) 7% (4)
Faceted Taxonomy - One which uses several of the approaches above. 32% (19) 29% (17) 34% (20) 0% (0) 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. 75% (44) 3% (2) 14% (8) 0% (0) 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. 47% (28) 22% (13) 20% (12) 0% (0) 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. 35% (21) 17% (10) 40% (24) 2% (1) 7% (4)
The Taxonomy was validated on a representative sample of content during its development. 28% (17) 22% (13) 33% (20) 3% (2) 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. 38% (23) 40% (24) 13% (8) 0% (0) 8% (5)
62Taxonomy Strategies LLC The business of organized information
Taxonomy testing methods
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist, SME, Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough Taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist, Team, Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms
63Taxonomy Strategies LLC The business of organized information
Walk-through method—Show & explain
ABC Computers.com facets and values:
Audience: All, Business, Employee, Education, Gaming Enthusiast, Home, Investor, Job Seeker, Media, Partner, Shopper, First Time, Experienced, Advanced, Supplier
Line of Business: All, Home & Home Office, Gaming, Government, Education & Healthcare, Medium & Large Business, Small Business
Region-Country: All, Asia-Pacific, Canada, EMEA, Japan, Latin America & Caribbean, United States
Product Family: Desktops, MP3 Players, Monitors, Networking, Notebooks, Printers, Projectors, Servers, Services, Storage, Televisions, Other Brands
Content Type: Award, Case Study, Contract & Warranty, Demo, Magazine, News & Event, Product Information, Services, Solution, Specification, Technical Note, Tool, Training, White Paper, Other Content Types
Competency: Business & Finance, Interpersonal Development, IT Professionals Technical Training, IT Professionals Training & Certification, PC Productivity, Personal Computing Proficiency
Industry: Banking & Finance, Communications, E-Business, Education, Government, Healthcare, Hospitality, Manufacturing, Petrochemicals, Retail / Wholesale, Technology, Transportation, Other Industries
Service: Assessment, Design & Implementation, Deployment, Enterprise Support, Client Support, Managed Lifecycle, Asset Recovery & Recycling, Training
64Taxonomy Strategies LLC The business of organized information
Walk-through method— Editorial rules consistency check
Rules covered: Abbreviations; Ampersands; Capitalization; General…, More…, Other…; Languages & character sets; Length limits; Multiple parents; Plural vs. singular form; Scope notes; Serial comma; Sources of terms; Spaces; Synonyms & acronyms; Term order (Alphabetic or …); Term label order (Direct vs. inverted); …

Rule Name | Editorial Rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
Ampersands | The ampersand [&] character shall be used instead of the word ‘and’. Example: Licensing & Compliance. NOT: Licensing and Compliance
Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service
General…, More…, Other… | The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable. Examples: “Other Property”, “Other Services”, “General Information”, “General Audience”
… | …
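Part of the editorial rules consistency check can be automated. The following is a minimal sketch, assuming simple heuristics for three of the rules above (Abbreviations, Ampersands, Capitalization); the function name `check_label` and the heuristics themselves are illustrative assumptions, not part of the original rule set.

```python
import re

def check_label(label):
    """Return a list of rule violations for one term label (heuristic checks only)."""
    problems = []
    words = label.split()

    # Ampersands rule: '&' shall be used instead of the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands: use '&' instead of 'and'")

    # Capitalization rule: title case. Assumption: a word starting with a
    # lowercase letter, or a multi-word label in all capitals, breaks the rule.
    lowercase_word = any(
        w[0].isalpha() and w[0].islower() and w.lower() != "and" for w in words
    )
    if lowercase_word or (len(words) > 1 and label.isupper()):
        problems.append("Capitalization: use title case")

    # Abbreviations rule: assumption that a trailing period marks an abbreviation.
    if any(w.endswith(".") for w in words):
        problems.append("Abbreviations: spell the term out")

    return problems

for label in ["Public Info.", "Licensing and Compliance", "CUSTOMER SERVICE", "Customer Service"]:
    print(label, "->", check_label(label) or "OK")
```

A script like this only catches mechanical violations; rules such as scope notes or sources of terms still need a taxonomist's review.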
65Taxonomy Strategies LLC The business of organized information
Task-based testing*
15 representative questions were selected
– Perspective of various organizational units
– Most frequent website searches
– Most frequently accessed website content
– Correct answers to the questions were agreed in advance by team.
15 users were tested
– Did not work for the organization
– Represented target audiences
Testers were asked “where would you look for …”
– “under which facet… Topic, Commodity, or Geography?”
– Then, “… under which category?”
– Then, “…under which sub-category?”
– Tester choices were recorded
Testers were asked to “think aloud”
– Notes were taken on what they said
Pre- and post questions were asked
– Tester answers were recorded
* Based on Donna Maurer’s usability work with the Australian government
66Taxonomy Strategies LLC The business of organized information
Task-based testing—Representative questions
1. How much cotton is imported from China?
2. What are the impacts of “mad cow” disease on U.S. meat production, sales?
3. What is the average farm income level in your state?
4. How much of our diet comes from fast food?
5. How many people receive WIC benefits (Special Supplemental Nutrition Program for Women, Infants, and Children)?
6. How much acreage is planted to genetically engineered corn?
7. What is the cost of foodborne illness in the United States?
8. What part of food costs go to farmers, retailers?
9. Which States produce the most tobacco?
10. What percentage of farms in the United States are small farms?
11. What are the costs and benefits associated with providing more traceability in the U.S. food supply?
12. How many people in America don’t get enough to eat?
13. What is behind the trade balance (surplus or deficit) in agricultural goods?
14. What is the extent of conservation compliance? How does that impact farmers’ decisions?
15. What are the impacts of foreign trade restrictions on U.S. farmers, U.S. food prices?
67Taxonomy Strategies LLC The business of organized information
Task-based testing—Closed card sorting
Question 3. What is the average farm income level in your state?

1. Topics
2. Commodities
3. Geographic Coverage

1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets

1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes
68Taxonomy Strategies LLC The business of organized information
Task based testing— Card sort analysis
Find-it Tasks | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton | Cotton | Cotton | Asia | Cotton | Cotton
2. Mad cow | Cattle | Food Safety | Cattle | Cattle | Cattle
3. Farm income | Farm Income | Farm Income | US States | Farm Income | Farm Income
4. Fast food | Food Consumption | Diet Quality & Nutrition | Food Expenditures | Diet Quality & Nutrition | Diet Quality & Nutrition
5. WIC | WIC Program | WIC Program | WIC Program | WIC Program | WIC Program
6. GE Corn | Corn | Corn | Corn | Corn | Corn
7. Foodborne illness | Foodborne Disease | Foodborne Disease | Consumer Food Safety | Foodborne Disease | Foodborne Disease
8. Food costs | Food Prices | Market Structure | Market Analysis | Food Expenditures | Retailing & Wholesaling
9. Tobacco | Tobacco | Tobacco | Tobacco | Tobacco | Tobacco
10. Small Farms | Farm Structure | Farm Structure | Farm Structure | Farm Structure | Farm Structure
11. Traceability | Food System | Labeling Policy | Food Safety Innovations | Food Safety Policy | Food Prices
12. Hunger | Food Security | Food Security | Food Security | Food Security | Food Security
13. Trade balance | Commodity Trade | Trade & Intl Markets | Commodity Trade | Market Analysis | Commodity Trade
14. Conservation | Cropping Practices | Conservation Policy | Conservation Policy | Conservation Policy | Conservation Policy
15. Trade restrictions | Trade Policy | Food Safety & Trade | WTO | Market Analysis | Commodity Trade
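Tester agreement on each task can be tallied directly from a table like the one above. This is a minimal sketch using four of the rows as shown for Users 1-5; the function name `agreement` is an illustrative assumption, and the five users shown here are only a subset of the testers, so these shares will not match the full study's percentages.

```python
from collections import Counter

# Tester choices for four of the find-it tasks, copied from the table above.
picks = {
    "Cotton": ["Cotton", "Cotton", "Asia", "Cotton", "Cotton"],
    "Mad cow": ["Cattle", "Food Safety", "Cattle", "Cattle", "Cattle"],
    "Food costs": ["Food Prices", "Market Structure", "Market Analysis",
                   "Food Expenditures", "Retailing & Wholesaling"],
    "Tobacco": ["Tobacco"] * 5,
}

def agreement(choices):
    """Return the most common category and the share of testers who chose it."""
    category, count = Counter(choices).most_common(1)[0]
    return category, count / len(choices)

for task, choices in picks.items():
    category, share = agreement(choices)
    print(f"{task}: {category} ({share:.0%} of testers)")
```

Tasks with a low share, such as "Food costs" here, are the ones where category labels or scope probably need another look.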
69Taxonomy Strategies LLC The business of organized information
Task based testing—Card sort results
In 80% of the trials users looked for information under the categories that we expected them to look for it.
Breaking-up topics into facets makes it easier to find information, especially information related to commodities.
70Taxonomy Strategies LLC The business of organized information
Task based testing—Card sort results
Columns: Test Question, % Correct, % Agree
1. Cotton 91% 82%
2. Mad cow 73% 64%
3. Farm income 100% 55%
4. Fast food 91% 73%
5. WIC 100% 100%
6. GE corn 100% 100%
7. Foodborne illness 82% 82%
8. Food costs 55% 27%
9. Tobacco 100% 100%
10. Small farms 91% 91%
11. Traceability 36% 18%
12. Hunger 100% 73%
13. Trade balance 36% 64%
14. Conservation 91% 91%
15. Trade restrictions 55% 36%
Possible change required.
Change required.
Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”
On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
71Taxonomy Strategies LLC The business of organized information
Task-based testing—User satisfaction survey
Was it easy, medium or difficult to choose the appropriate Topic?
– Easy
– Medium
– Difficult
Was it easy, medium or difficult to choose the appropriate Commodity?
– Easy
– Medium
– Difficult
Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
– Easy
– Medium
– Difficult
72Taxonomy Strategies LLC The business of organized information
User satisfaction survey—Results
[Chart: survey rating by Facet (Topic, Commodity, Geography). Vertical axis labelled Easy -> Difficult, scale 0 to 2.00. Annotations: Easier, More Difficult.]
73Taxonomy Strategies LLC The business of organized information
User interface survey— Which search UI is ‘better’?
Criteria
– User satisfaction
– Success completing tasks
– Confidence in results
– Fewer dead ends
Methodology
– Design tasks from specific to general
– Time performance
– Calculate success rates
– Survey subjective criteria
– Pay attention to survey hygiene: participant selection, counterbalancing, T-scores
Source: Yee, Swearingen, Li, & Hearst
74Taxonomy Strategies LLC The business of organized information
User interface survey— Results (1)
Which interface would you rather use for these tasks?
Task | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…

Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst
75Taxonomy Strategies LLC The business of organized information
User interface survey— Results (2)
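[Bar chart: Faceted Category vs. Google-like Baseline, ratings on a 0-9 scale. Bar values in extraction order: 6.0, 6.7, 4.7, 4.6, 5.8, 5.5, 6.0, 4.0, 7.2, 6.3, 3.5, 7.7, 7.4, 7.8, 4.8, 7.6. The criterion labels were not captured.]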
Source: Yee, Swearingen, Li, & Hearst
76Taxonomy Strategies LLC The business of organized information
Tagging samples—How many items?
Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies * | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Pre-existing metadata is the most meaningful.
* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different than how “real” users would categorize content.
77Taxonomy Strategies LLC The business of organized information
Tagging samples—Manually tagged metadata sample
Attribute Values
Title Jupiter’s Ring System
URL http://ringmaster.arc.nasa.gov/jupiter/
Description Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types Web Sites; Animations; Images; Reference Sources
Audiences Educators; Students
Organizations Ames Research Center
Missions & Projects Voyager; Galileo; Cassini; Hubble Space Telescope
Locations Jupiter
Business Functions Scientific and Technical Information
Disciplines Planetary and Lunar Science
Time Period 1979-1999
78Taxonomy Strategies LLC The business of organized information
Tagging samples—Spreadsheet for tagging 10’s-100’s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions
79Taxonomy Strategies LLC The business of organized information
Rough bulk tagging—Facet demo (1)
Collections: 4 content sources
– NTRS, SIRTF, Webb, Lessons Learned
Taxonomy
– Converted MultiTes format into RDF for Seamark
Metadata
– Converted from existing metadata on web pages, or
– Created using simple automatic classifier (string matching with terms & synonyms)
– 250k items, ~12 metadata fields, 1.5 weeks effort
OOTB Seamark user interface, plus logo
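A simple automatic classifier based on string matching with terms and synonyms might look like the sketch below. It is not the code used for the demo; the taxonomy fragment, the synonyms, and the function name `tag_text` are hypothetical and only illustrate the technique.

```python
import re

# Hypothetical taxonomy fragment: preferred term -> synonyms (illustrative only).
TAXONOMY = {
    "Hubble Space Telescope": ["HST", "Hubble"],
    "Jupiter": ["Jovian"],
}

def tag_text(text, taxonomy=TAXONOMY):
    """Return preferred terms whose label or any synonym occurs in the text."""
    tags = []
    for term, synonyms in taxonomy.items():
        for label in [term, *synonyms]:
            # Whole-word, case-insensitive match on the label.
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.append(term)
                break  # one hit is enough to assign the preferred term
    return tags

print(tag_text("New HST images of the Jovian ring system"))
```

String matching is rough: it misses content that discusses a topic without naming it and can over-tag on passing mentions, which is why this kind of output suits a demo better than production metadata.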
80Taxonomy Strategies LLC The business of organized information
Rough bulk tagging— Facet demo (2)
81Taxonomy Strategies LLC The business of organized information
Document distribution—How evenly does it divide the content?
Documents do not distribute uniformly across categories
Zipf (1/x) distribution is expected behavior
80/20 rule in action (actually 70/20 rule)
[Chart: Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database. X-axis: Top 10 Content Types (labels recovered: Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, Statistics). Y-axis: Number of Records, 0 to 350,000. Callouts: Leading candidate for splitting; Leading candidates for merging.]
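The expected Zipf (1/x) curve that measured counts are compared against can be computed as follows. This is a minimal sketch: the function name `zipf_expected` and the measured counts in the usage example are hypothetical, not the Library of Congress figures.

```python
def zipf_expected(total, n):
    """Expected counts for ranks 1..n under a 1/x distribution summing to total."""
    harmonic = sum(1 / rank for rank in range(1, n + 1))
    return [total / (rank * harmonic) for rank in range(1, n + 1)]

# Hypothetical measured counts, sorted from largest to smallest category.
measured = [600, 150, 120, 80, 50]
expected = zipf_expected(sum(measured), len(measured))

for rank, (m, e) in enumerate(zip(measured, expected), start=1):
    note = "above expected" if m > e else "at or below expected"
    print(f"rank {rank}: measured {m}, expected {e:.0f} ({note})")
```

A top category far above its expected count is a candidate for splitting; adjacent small categories well below the curve are candidates for merging.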
82Taxonomy Strategies LLC The business of organized information
Document distribution— How evenly does it divide the content?
Methodology: 115 randomly selected URLs from corporate intranet search index were manually categorized. Inaccessible files and ‘junk’ were removed.
Results: Slightly more uniform than Zipf distribution. Above the curve is better than expected.
[Chart: Measured v Expected Intranet Content Type Distribution. X-axis: Content Type (People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & Templates; Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules). Y-axis: # Documents, 0 to 25.]
83Taxonomy Strategies LLC The business of organized information
Document distribution— How does taxonomy “shape” match that of content?
Background:
– Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
– 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
– Counts of terms and documents summed within taxonomy hierarchy
Results:
– Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
– Mismatches between term% and document% flagged

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
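Flagging mismatches between term% and document% can be sketched as below, using the figures from the table above. The factor-of-two threshold and the function name `flag_mismatches` are assumptions for illustration; the slide does not state which rule was used to flag mismatches.

```python
# (% Terms, % Docs) per term group, copied from the table above.
rows = {
    "Administrators": (7.8, 15.8),
    "Community Groups": (2.8, 1.8),
    "Counselors": (3.4, 1.4),
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Librarians": (2.8, 1.1),
    "News Media": (0.6, 3.1),
    "Other": (7.3, 2.0),
    "Parents and Families": (2.8, 6.0),
    "Policymakers": (4.5, 11.5),
    "Researchers": (2.2, 3.6),
    "School Support Staff": (2.2, 0.2),
    "Student Financial Aid Providers": (1.7, 0.7),
    "Students": (27.4, 7.0),
    "Teachers": (25.1, 11.4),
}

def flag_mismatches(rows, factor=2.0):
    """Flag groups whose share of documents and share of terms differ by `factor` or more."""
    flagged = {}
    for group, (term_pct, doc_pct) in rows.items():
        ratio = doc_pct / term_pct
        if ratio >= factor:
            flagged[group] = "more content than terms"
        elif ratio <= 1 / factor:
            flagged[group] = "more terms than content"
    return flagged

for group, note in flag_mismatches(rows).items():
    print(f"{group}: {note}")
```

Groups with more content than terms may need finer-grained terms, while groups with many terms and little content may be over-built.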
84Taxonomy Strategies LLC The business of organized information
Usability testing—How intuitive (repeatable) are the categorizations (1)?
Methodology: Closed Card Sort
– For alpha test of a grocery site
– 15 Testers put each of 71 best-selling product types into one of 10 pre-defined categories
– Categories where fewer than 14 of 15 testers put product into same category were flagged
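The flagging rule above, and a cumulative agreement summary, can be sketched as follows. The product types, category names, and function names are hypothetical; only the 15-tester setup and the 14-of-15 threshold come from the methodology described.

```python
from collections import Counter

TESTERS = 15

# Hypothetical sort results: product type -> category chosen by each tester.
sorts = {
    "Milk": ["Dairy"] * 15,
    "Butter": ["Dairy"] * 14 + ["Baking"],
    "Salsa": ["Snacks"] * 9 + ["Condiments"] * 6,
    "Eggs": ["Dairy"] * 15,
}

def top_counts(sorts):
    """Number of testers who chose the most popular category for each product."""
    return {p: Counter(c).most_common(1)[0][1] for p, c in sorts.items()}

def flagged(sorts, threshold=14):
    """Products where fewer than `threshold` testers agreed on one category."""
    return [p for p, n in top_counts(sorts).items() if n < threshold]

def cumulative_share(sorts, levels=(15, 14, 13, 12, 11)):
    """Share of products with at least k of the testers in agreement, for each k."""
    counts = top_counts(sorts).values()
    return {f"{k}/{TESTERS}": sum(n >= k for n in counts) / len(sorts) for k in levels}

print(flagged(sorts))
print(cumulative_share(sorts))
```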
85Taxonomy Strategies LLC The business of organized information
Usability testing—How intuitive (repeatable) are the categorizations (2)?
86Taxonomy Strategies LLC The business of organized information
% of Testers | Cumulative % of Products | With Poly-Hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%
Usability testing—How intuitive (repeatable) are the categorizations?
87Taxonomy Strategies LLC The business of organized information
The #1 underused source of quantitative information on how to improve your
taxonomy?
Query Logs & Click Trails
88Taxonomy Strategies LLC The business of organized information
Query log & click trail examination—Who are the users & what are they looking for?
Only 30-40% of organizations regularly examine their logs*.
Sophisticated software available, but don’t wait. 80% of value comes from basic reports
89Taxonomy Strategies LLC The business of organized information
Query log & click trail examination— Query log
UltraSeek Reporting
– Top queries
– Queries with no results
– Queries with no click-through
– Most requested documents
– Query trend analysis
– Complete server usage summary
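The first three basic reports can be produced from almost any query log with a few lines of scripting. This is a minimal sketch assuming a hypothetical log of (query, result count, clicked) records; it is not the UltraSeek log format, and the function name `basic_reports` is illustrative.

```python
from collections import Counter

# Hypothetical log records: (query text, number of results, whether a result was clicked).
log = [
    ("farm income", 42, True),
    ("Farm Income", 42, False),
    ("wic", 0, False),
    ("traceability", 7, False),
    ("farm income", 42, True),
]

def basic_reports(log, top_n=10):
    """Top queries, queries with no results, and queries with no click-through."""
    def norm(q):
        return q.strip().lower()

    queries = Counter(norm(q) for q, _, _ in log)
    no_results = Counter(norm(q) for q, hits, _ in log if hits == 0)
    no_clicks = Counter(norm(q) for q, hits, clicked in log if hits > 0 and not clicked)
    return {
        "top_queries": queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_clicks.most_common(top_n),
    }

for name, rows in basic_reports(log).items():
    print(name, rows)
```

Queries with no results often point to missing synonyms in the taxonomy, and queries with no click-through to poor labels or ranking.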
90Taxonomy Strategies LLC The business of organized information
Query log & click trail examination—Click trail packages
iWebTrack, NetTracker, OptimalIQ, SiteCatalyst, Visitorville, WebTrends
91Taxonomy Strategies LLC The business of organized information
Summary— Start a “Measure & Improve” mindset
Taxonomy changes do not stand alone
– Search system improvements
– Navigation improvements
– Content improvements
– Process improvements
92Taxonomy Strategies LLC The business of organized information
Benchmarking exercise
What are 5 representative questions that your users ask or tasks that your users do when using your application?
Is it currently easy, medium or difficult to answer these questions or accomplish these tasks?
Rating (Easy/ Medium/Difficult) Questions or Tasks
93Taxonomy Strategies LLC The business of organized information
Conclusion—What is a good taxonomy?
Incremental, extensible process that identifies and enables owners, and engages stakeholders.
Quick implementation that provides measurable results as quickly as possible.
A means to an end, and not the end in itself.
Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
Improved over time, and maintained.
95Taxonomy Strategies LLC The business of organized information
Tagging Overview
Tagging is better than the words that happen to occur in a piece of content.
All tagging is useful
– End user tagging
– Tagging by librarians
– Automated tagging by OS and algorithms
Content should be tagged throughout its lifecycle, each time it is handled and used, so that the record shows whether it accrues value or its significance diminishes.
96Taxonomy Strategies LLC The business of organized information
MS Office: File Properties
How many people fill this in?
97Taxonomy Strategies LLC The business of organized information
Organize
How many people click on this?
98Taxonomy Strategies LLC The business of organized information
What is social tagging?
End user tagging
– Easy, intuitive tagging interfaces
Almost instantaneous feedback
– Enables people to tag & re-tag content … in response to seeing their tags in context with other tags.
Emergent categories
– Resembles open card sort process in which patterns emerge … rather than validating categories using closed card sorts.
99Taxonomy Strategies LLC The business of organized information
Social tagging innovators
flickr founders
– Caterina Fake
– Stewart Butterfield
del.icio.us founder
– Joshua Schachter
del.icio.us & flickr are now both part of Yahoo!
As of April 2006 flickr had 130 million photos posted by 3 million registered users.
100Taxonomy Strategies LLC The business of organized information
Four tagging rules for end users
Rule | Description
Use specific terms | Apply the most specific terms when tagging content. But do not tag every possible topic, just the ones that are most important or best characterize the content as a whole.
Use multiple terms | Use as many terms as necessary to describe overall what the content is about & why it is important. Do not over-tag.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find. Remember that search engines can only operate on explicit information.
101Taxonomy Strategies LLC The business of organized information
Agenda
Content Tagging Tagging Interface
102Taxonomy Strategies LLC The business of organized information
Requirements for a tagging interface
– Automated form fill-in (automatically fills in known data)
– Tagging precedents (see tags already assigned by others)
– Controlled vocabularies, e.g., with pull-down list
– Multi-valued tags
– Geo-tagging
– Group tagging
– Clean-up tag tools, e.g., alpha list
– Batch editing
– Share/Don’t share (Public/Private)
– Identified owner (who can be emailed)
– Almost immediate feedback, e.g., tag cloud
103Taxonomy Strategies LLC The business of organized information
Form fill-in: Automatically filled-in known data
104Taxonomy Strategies LLC The business of organized information
Form fill-in: Automatically filled-in known data
Manual form fill-in w/ check boxes, pull-down lists, etc.
Auto keyword & summarization
105Taxonomy Strategies LLC The business of organized information
Form fill-in: Automatically filled-in known data
Auto-categorization
Parse & lookup (recognize names)
Rules & pattern matching
106Taxonomy Strategies LLC The business of organized information
Tagging precedents: See tags assigned by others
107Taxonomy Strategies LLC The business of organized information
Multi-valued group tagging
108Taxonomy Strategies LLC The business of organized information
Group geo-tagging
109Taxonomy Strategies LLC The business of organized information
Group geo-tagging
110Taxonomy Strategies LLC The business of organized information
Clean up tag tools: Alpha list
111Taxonomy Strategies LLC The business of organized information
Batch edit
112Taxonomy Strategies LLC The business of organized information
Share or don’t share tagging
113Taxonomy Strategies LLC The business of organized information
Bulk tagging
– Identify a collection of related content items by pattern or context
– Then, apply the same attributes to all content items
114Taxonomy Strategies LLC The business of organized information
Tag a folder
– Drag & drop content items into folder
– Then, content items inherit properties of folder
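Folder inheritance can be sketched as a merge of the folder's tags into each item's own tags. The merge behaviour shown here (folder values first, item values appended, no duplicates) and the function name `effective_tags` are assumptions for illustration; actual content management systems differ in how inheritance is resolved.

```python
def effective_tags(folder_tags, item_tags):
    """Combine folder-level tags with an item's own tags, facet by facet."""
    merged = {facet: list(values) for facet, values in folder_tags.items()}
    for facet, values in item_tags.items():
        merged.setdefault(facet, [])
        for value in values:
            if value not in merged[facet]:
                merged[facet].append(value)
    return merged

# Hypothetical folder and item metadata.
folder = {"Audience": ["Employee"], "Business activity": ["HR management"]}
item = {"Content Type": ["Memo"], "Audience": ["Manager"]}
print(effective_tags(folder, item))
```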
115Taxonomy Strategies LLC The business of organized information
Workflow
Approve & improve mindset
[Workflow diagram: Create Content, Add Metadata, Publish, with two Review & Improve steps.]
116Taxonomy Strategies LLC The business of organized information
Interactive rewards
Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
For example,
– Most popular
– Tag clouds
– Alerts
117Taxonomy Strategies LLC The business of organized information
Most popular
Another example is most emailed from, e.g., the NY Times.
118Taxonomy Strategies LLC The business of organized information
Tag cloud
119Taxonomy Strategies LLC The business of organized information
Alerts
– New (content selected by date)
– Subscriptions (content selected by tags)
– Interest (content selected by other people)
– Individual (content selected for you by other people)
Is faceted indexing the future of social tagging?
121Taxonomy Strategies LLC The business of organized information
Tagging exercise: Blog tagging (a)
ALA Tech Source. http://www.techsource.ala.org/blog/2007/04/google-buys-oclc-announces-new-products.html
122Taxonomy Strategies LLC The business of organized information
Tagging exercise: Blog tagging (b)
HBSP. http://discussionleader.hbsp.com/davenport/2007/04/cause_and_effect_reporting_raw.html#comments
123Taxonomy Strategies LLC The business of organized information
Tagging exercise: Taxonomy facets—definitions
Taxonomy Facet | Description
Business activity | Use for common business function or activity such as finance, marketing and sales.
Industry / Product | Use for content that is about or related to an industrial sector or product such as construction equipment.
Geography | Use for content that is about a region, country or city.
Organization | Use for named organizations, brands and business entities.
Person / Role | Use for named people and the roles people have in organizations.
Content Type | Use for content genres such as letters, memos and reports.
Audience | Use to indicate the intended audience.
Topic | Use for other business and associated topics that the content is about or related to.
124Taxonomy Strategies LLC The business of organized information
Tagging exercise: Taxonomy facets—values
Geography: Africa, Americas, Antarctica, Asia, Europe, Oceania, Global, Historical geography, Oceans & seas, Regions
Industry / Product: Agriculture …, Mining, Utilities, Construction, Manufacturing, Wholesale trade, Retail trade, Transportation & warehousing, Information, Finance & insurance, Real estate, Professional, Management, Administrative support, Education, Health care, Arts, entertainment & recreation, Accommodation & food, Other services, Public administration
People / Role: Business Leaders, Thought Leaders, Political Leaders, Roles
Organization / Entity: Business entities, Companies & brands, Government agencies, International, NGOs, Organization types
Content Type: Basic facts & information, Blog, Brochure, Database, E-mail, Letter, Memo, Multimedia, Report, Newsletter, Podcast, Press Release, Research & Analysis, RSS Feed
Business activity: Accounting, Auditing, Finance, HR management, IT, Marketing, Operations management, Sales
Audience: Consumer, Employee, Manager, Executive

Worksheet:
Taxonomy Facet | Tags
Business activity |
Industry / Product |
Geography |
Organization |
Person / Role |
Content Type |
Audience |
Topic |
125Taxonomy Strategies LLC The business of organized information
Summary
There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
Document and content management system tagging must be simple, and it must make relevant work products easier to find almost instantaneously.
Questions?
Joseph A. Busch
+ 415-377-7912
http://www.taxonomystrategies.com