Taxonomies and Search for Chicago SharePoint User Group
Earley & Associates, Inc. | Classification: CONFIDENTIAL USE, NO REPRINTS Copyright © 2011 Earley & Associates, Inc. All Rights Reserved.
Taxonomies, Metadata and Search
Seth Earley
Starting March 3rd, 2011 (Recordings will be available)
Register at:
www.earley.com/webinars/jumpstarts/sharepoint-2010-architecting-business-value
SharePoint Call Series Architecting for Business Value
• Session 1: SharePoint 2010 – Best Practices for Creating Business Value, March 3rd, 12:00–1:00 pm
• Session 2: Methods and Tools for Better SharePoint Search, March 10th, 12:00–1:00 pm
• Session 3: Practical Approaches to Developing Rich Information Architectures, March 17th, 12:00–1:00 pm
• Session 4: The Role of Governance in Ensuring Success, March 24th, 12:00–1:00 pm
Jumpstart Series – Architecting SharePoint for Business Value
Earley & Associates Highlights
Founded: 1994
Focus Areas: Holistic approach to specific business contexts and goals for:
• Retail
• Manufacturing
• Pharmaceuticals & Life Sciences
• Public Sector
• Media & Entertainment
Personnel: Core team of 30 consultants
Locations: Stow, MA headquarters; consultants in US, UK & Canada; global projects
Services:
• Taxonomy & Information Architecture
• Search Strategy for Enterprise & Web
• ECM, DAM & Information Lifecycle
• Program Management & Governance
• Co-author of Practical Knowledge Management from IBM Press
• 17 years' experience building content and knowledge management systems, 20+ years' experience in technology
• Former Co-Chair, Academy of Motion Picture Arts and Sciences, Science and Technology Council Metadata Project Committee
• Founder of the Boston Knowledge Management Forum
• Former adjunct professor at Northeastern University
• Guest speaker for US Strategic Command briefing on knowledge networks
• Currently working with enterprises to develop knowledge and digital asset management systems, taxonomy and metadata governance strategies
• Founder of Taxonomy Community of Practice – host monthly conference calls of case studies on taxonomy derivation and application. http://finance.groups.yahoo.com/group/TaxoCoP 100+ calls since 2005
• Co-founder Search Community of Practice:
http://tech.groups.yahoo.com/group/SearchCoP
Seth Earley, Founder & President, Earley & Associates
Session Objective
From Session Abstract
• High level review of basic concepts related to taxonomy, metadata and search
• How are taxonomies integrated with metadata management and standards and
• The relationship between taxonomy and information architecture
• How taxonomy, metadata and IA relate to SharePoint
• Options for creating good information architectures within SharePoint 2010
• How to leverage taxonomy and metadata to improve navigation and search in your SharePoint portal.
• Techniques for implementation using native SharePoint functionality.
Agenda
• Change is constant
• Taxonomy definition
• Information and semantic architecture
• The challenge of search
• Five basic truths about search
• The role of metadata
• Taxonomy and navigation
• Case Study
• Conclusion
Change is constant
• Snapshot versus movie
• Business changes faster than IT can
• Systems grow up to solve specific problems without a view toward integration
• Integrated environments
• Solution to application proliferation…?
Same Term, Different Expressions…
Sources shown: Library, Web site, Healthday
Expressions shown: Cardiology, Cardiac Care, Heart Health
Problems:
• Difficulty finding relevant information
• Federated search configuration is cumbersome
• Inability to view consolidated results
• Limited ability to control shared vocabularies
• Weak governance or demonstrated control
• Costly/cumbersome administrative overhead
Taxonomy is an enabler…
• Every organization is struggling with findability
• Content management applications, search tools, workflow applications, customer relationship management systems, etc. all strive to create views of information that are in the context of work processes
What is the key component to any of these initiatives?
Having a common language in which to:
• Describe
• Communicate
• Translate
information between applications and between user audiences
Information architecture versus Semantic architecture
• Information architecture describes the ways in which systems capture, manage, organize and present information
  - Metadata fields describe information about a document or piece of content.
  - Identifiers of various kinds: name, account number, part ID, price, etc.
  - Conditions or status of the content: workflow approval state, date created, review date, etc.
• Semantic architecture is about meaning and nuance
  - Terms can have multiple contexts and meanings.
  - People use different terms to describe the same thing.
A single concept can have different Expressions
• Person we do business with: Cust_Name, Cust_ID, Customer ID, Customer, Client
• Person who writes a document: Contributor, Author, Creator
• What we buy or sell a product for: Price, Cost
A single expression can represent different Concepts
• Pitch:
  - the property of sound
  - the throwing of a baseball
  - a vendor's position (especially on the sidewalk)
  - sales talk
  - degree of deviation from a horizontal plane
  - dark heavy viscid substance
  - a high approach shot in golf
  - a card game
  - abrupt up-and-down motion
  - the action of throwing something
  - …
Info Architecture Semantic Architecture
Source: Fred Leise
Taxonomy definition
• Taxonomy is a system for organizing concepts and categorizing content
  - Expresses hierarchical relationships (parent/child)
  - Arranged in a tree-like structure, with top level categories that branch out to reveal sub-categories and terms in varying levels of depth
  - Dictionary of preferred terminology
Products
Games
Card games
Action figures
Board games
Brands
Milton Bradley
Scrabble
Disney
Battleship
Taxonomy definition
• Taxonomy: system for organizing concepts and categorizing content
• Expresses hierarchical relationships (parent/child)
• Expresses other relationships
Sample taxonomy record
Preferred term: Car
Synonyms (SYN): Automobile, Vehicle
Translations and regional variants: fr-CA: Voiture; en-UK: Auto; es-CO: Carro
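The sample record above could be modeled roughly as follows. This is only a sketch: the class name `TaxonomyRecord`, its fields, and the `resolve` helper are illustrative assumptions, not part of any SharePoint or taxonomy-tool API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonomyRecord:
    """One taxonomy entry: a preferred term plus its alternative labels (hypothetical model)."""
    preferred_term: str
    synonyms: List[str] = field(default_factory=list)
    variants: Dict[str, str] = field(default_factory=dict)  # locale code -> regional term

    def all_labels(self) -> set:
        """Every label that should resolve to the preferred term, lowercased."""
        labels = {self.preferred_term, *self.synonyms, *self.variants.values()}
        return {label.lower() for label in labels}


def resolve(term: str, records: List[TaxonomyRecord]) -> Optional[str]:
    """Map any known label to its preferred term; return None if the term is unknown."""
    needle = term.strip().lower()
    for record in records:
        if needle in record.all_labels():
            return record.preferred_term
    return None


# The record from the slide.
car = TaxonomyRecord(
    preferred_term="Car",
    synonyms=["Automobile", "Vehicle"],
    variants={"fr-CA": "Voiture", "en-UK": "Auto", "es-CO": "Carro"},
)
```

With this in place, `resolve("voiture", [car])` and `resolve("Automobile", [car])` both lead back to the preferred term, which is the behavior a tagging or search layer would rely on.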
Taxonomy is a foundation…
• It is a system for classification
• It allows for a means to organize documents and web content
• Helps us fine tune search tools and mechanisms
• Creates a common language for sharing concepts
• Allows for a coherent approach to integrate information sources
• It is a common language for business processes
Taxonomy as a common business language
Case Example: Motorola’s Global Taxonomy Framework Served Multiple Processes
Browsing & filtering
Compare product
Related documents
Financial reporting
Business intelligence
Program Management
Product Lifecycle Management
Enterprise taxonomy drivers
Application | Primary driver | “Clock speed” | Constituencies | Technology challenges
Web Content | Consistency in branding, internal efficiencies | Medium to fast | Web developers, content managers, content creators | Exposing taxonomy to CMS, integration with search
Enterprise data standards | Cross platform integration, business intelligence, metadata modeling, data warehousing | Very slow to slow | Data architects, standards boards, data modelers, business intelligence | “Source of truth”, difficulty integrating metadata standards
E Commerce | Web site sales. Need to support customer experience | Very fast | Merchandisers, e commerce development team, marketing | Commerce platforms do not necessarily leverage capabilities. Updates to classification are not a priority
Product development | Product development efficiencies, speed to market | Fast | Engineering, product development, product marketers | Product life cycle management systems usually self contained
Intranet development | Internal efficiencies | Slow to medium | Intranet managers, functional managers | Difficulty unifying access to multiple repositories, sheer volume of sources
The Challenge of Search
Five basic truths about search
Search as Utility
• “search as a utility has become deeply ingrained into people's everyday lives.” – Study by Nielsen/Net Ratings
• “search software, hardware, and support bundle or search appliance has become very popular since being introduced in early 2002” – Goebel Group
These are misleading concepts. Search is used as a utility, but contexts vary so widely that “plugging search in” does not always produce satisfactory results.
Truth #1.
We have to change our definition of search.
• Search is no longer just a white box.
• Search is an experience.
• Search is about information access & capabilities.
Truth #2.
Search algorithms are getting better, but they cannot infer human context & intent.
• A search engine doesn’t know if I’m an engineer, an attorney, or a high school student.
• Perspective has an impact on whether a set of search results is useful & appropriate.
Truth #3.
Taxonomy, metadata and information architecture are key aspects of search.
• Search is fundamentally about metadata
• Some content is structured, some isn’t and needs help
• Advanced search functionalities require taxonomy
Truth #4.
Search is increasingly looking like navigation.
• What happens when you click on a link?
• Guided navigation & faceted search are really the same thing
Truth #5.
Search is messy.
• Knowledge is messy, information is messy.
• People find answers through haphazard and chaotic processes.
“…search terms are short, ambiguous and an approximation of the searcher’s real information need…”
Source: http://research.microsoft.com/~ryenw/papers/WhiteCONTEXT2002.pdf Ryen W. White, Joemon M. Jose and Ian Ruthven
Rising Expectations Plus Increased Complexity
• Search seems to be a ‘given’ – we expect it to be there
• Most enterprise search is less than optimal – too many results, irrelevant results, missing results
• It was not so long ago that organizations were starved for information
• A puzzling fact: as information environments have grown more complex, users’ expectations that search should be simpler have grown too
Search is complex
Enterprise search is diverse – need to access multiple applications and contexts – both structured and unstructured
Business Intelligence/Analytics
Customer Relationship Mgt
Document repositories
Custom databases and applications
Intranets/web pages
Search is Heterogeneous
Search/Tagging/Taxonomy Integration Framework
Data Sources:
• Business Intelligence
• Customer Relationship Mgt
• Document repositories
• Custom databases and applications
• Intranets/web pages
Search Mechanisms:
• Appliances
• Federated Search
• Auto categorization/Clustering
• Entity Extraction
• Faceted Search
• Semantic Search
What is the right mechanism for accessing information?
• Content can be created in structured or unstructured contexts
• Its value can vary depending on audience, context or process
• Some content is extremely nuanced and requires more precise access (according to audience or task, solution, etc…)
• Search can be based on inherent structure and content of a document (implicit metadata) or on information applied to that content (explicit metadata)
Figure: tools placed on a spectrum between Less Structured and More Structured
• Axis labels: Knowledge Creation, Knowledge Access/Reuse; Chaotic Processes, Controlled Processes; Emergent Value
• Tools shown: Instant Messages, Wikis, Blogs, Discussions, Collaborative Workspaces, Online Learning, Instructor Led Courses, Content Mgt, Workflow systems, Doc Mgt Systems, Records Mgt Systems
Different tools are appropriate depending upon degree of collaboration and creation versus structured access
Figure: relative value of content
• Axis labels: Lower Cost, Higher Cost; Unfiltered, Reviewed/Vetted/Approved; Lower Value, Higher Value; (More difficult to access), (Easier to access); Social tagging (“folksonomy”), Structured tagging (taxonomy); Formal Tagging/Organizing Processes
• Content shown: Message text, External News, Example deliverables, Discussion postings, Interim deliverables, Content Repositories, Success Stories, Benchmarks, Approved Methods, Best Practices
The Role of Metadata
Metadata drives content processes
Taxonomies provide the organizing principles behind metadata
What is metadata?
• It is the “is-ness” of a piece of content
• And the “about-ness” of a piece of content
• This is a Product Description
• It is about the Motorola Android
Taxonomies are the organizing principle behind metadata and the values that populate metadata fields
What is a content model?
• Content is structured with body information and a wrapper that formats and tags that information
• Also called a “content object model”*
Simple content object model: Title, Description
*Content model refers to the overall framework. Content object model refers to a specific model for a set of document types.
I.e., an overall “Content Model” includes multiple “Content Object Models”
Metadata for a product page in a content management system
• Fields shown: Title, Date, Author, Features, Product_Name, Category, Doc_ID, Doc_Type
• Field groupings shown: “is-ness” and “about-ness”
• Document types shown: FAQ, Product, Press release, Specification, Promotion
Metadata allows for various views of content
• Web pages are made up of assembled items of content
• These are comprised of metadata elements that are assembled together into “content types”
• Content types shown: Standard Header, Product content type, Promotion content type, Related Products content type
• Metadata elements shown: Title, Comp_Features, Date, Author, Features, Product_Name, Category, Promotion_ID, Promo_Type, Related_Products, Doc_ID, Content_ID
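One way to picture content types assembled from metadata elements is the sketch below. Which field belongs to which content type is an assumption made for illustration (the slide does not spell out the grouping), and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardHeader:
    """Housekeeping metadata assumed to be shared by every content type."""
    content_id: str
    date: str


@dataclass
class ProductContent:
    """Assumed shape of a 'Product' content type."""
    header: StandardHeader
    product_name: str
    category: str
    features: List[str] = field(default_factory=list)


@dataclass
class PromotionContent:
    """Assumed shape of a 'Promotion' content type."""
    header: StandardHeader
    promotion_id: str
    promo_type: str


def assemble_page(title, components):
    """Describe a page as an ordered assembly of independently managed content objects."""
    return {
        "title": title,
        "components": [type(c).__name__ for c in components],
        "content_ids": [c.header.content_id for c in components],
    }


product = ProductContent(StandardHeader("C-001", "2011-03-03"),
                         "Example Phone", "Mobile phones", ["Camera"])
promo = PromotionContent(StandardHeader("C-002", "2011-03-10"), "P-100", "Discount")
page = assemble_page("Example Phone", [product, promo])
```

The point of the design is that the same product or promotion object can be reused on several pages, because the page only references content objects rather than copying their text.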
The User Experience (UX) is at the intersection of taxonomies, metadata and content objects
• Taxonomy: system for organizing and classifying content
• Metadata: information about our content, housekeeping, as well as semantic and structural information
• Content Objects: groups of metadata that are assembled into components that are then assembled into pages or documents
How will taxonomy surface on the front-facing application?
What do the wireframes suggest?
How do people interact with it?
How does the content architecture deliver the front-end design?
Taxonomy and the User Experience
• Define what the user interface will eventually look like
• Identify how content is laid out on the page
• Faceted Search (elements shown): Taxonomy Facets, Document Preview, Best Bets, Synonyms, Misspellings, Results
Taxonomy in a content management application
1. Filtering products / search results
2. Dynamic relationships
3. Tagging & categorization of content
4. Dynamic navigation
5. Feature consistency / compare product
When is it metadata and when is it taxonomy?
• Taxonomy can be applied as metadata
• Typically this is expressed as a drop down “controlled vocabulary” list (also called “reference data”)
• Some controlled vocabularies are very simple, with a few unambiguous choices
• Some are specific to a particular system or tool and will not change frequently
• There is a tendency to lump all metadata into a technology bucket and assume this is owned and managed by IT
• Not a good approach (since we need business ownership and participation)
Who owns the taxonomy? A question of governance
• Metadata Management (IT or application owner)
  - Unambiguous
  - Limited number of values
  - Not frequently changing
  - Housekeeping or administration role
  - Specific to an application
• Taxonomy Management (business or functional owner)
  - Ambiguous meaning
  - Subject to frequent changes or updates
  - Common across multiple applications or contexts
  - Requires specific knowledge of field (subject matter expertise)
Metadata and Search
All search leverages metadata
Explicit versus implicit metadata
All search leverages metadata…
…but not all metadata is explicit
• Full text search derives metadata about documents
• Creates an index of terms that occur in a document collection
• Associates documents with those index entries
Explicit metadata versus implicit metadata
Explicit metadata:
• Content Type = License
• Organization = ABC Company, DEF Company
• Topic = Support
Text used to derive implicit metadata:
ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release to Source Code to ABC pursuant to Section 7 and the Escrow Agreement.
Terms derived from the text: ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit c, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support
Search index points to document
Forward Index – Words per document
Inverted Index – Documents per word
A search index becomes derived metadata about a collection of documents
In which documents do these words occur?
Term | Document
Acme | 1, 2, 3, 4
customers | 2, 3
escrow | 3, 4
exhibit c | 2
license | 1, 4
…etc | …etc
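To make the forward-versus-inverted distinction concrete, here is a minimal sketch. The four document texts are invented so that the result resembles the term table on this slide; real engines also handle phrases such as “exhibit c”, stemming and stop words, all of which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical document collection (doc id -> text), chosen to echo the slide's table.
DOCS = {
    1: "Acme license",
    2: "Acme customers exhibit",
    3: "Acme customers escrow",
    4: "Acme escrow license",
}


def build_forward_index(docs):
    """Forward index: for each document, the distinct words it contains."""
    return {doc_id: sorted(set(text.lower().split())) for doc_id, text in docs.items()}


def build_inverted_index(forward):
    """Inverted index: for each word, the documents in which it occurs."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for word in forward[doc_id]:
            inverted[word].append(doc_id)
    return dict(inverted)


forward_index = build_forward_index(DOCS)
inverted_index = build_inverted_index(forward_index)
```

Answering “in which documents does this word occur?” is then a single dictionary lookup, for example `inverted_index["escrow"]`.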
…but not all metadata is explicit
• Full text search derives metadata about documents
• Creates an index of terms that occur in a document collection
• Associates documents with those index entries
• Occurrence of certain words in a document and the relative value of those occurrences, including:
  - Weighting
  - Relative positioning
  - Semantic relationships…
…becomes information about the document that is cached in the index and served by the search engine
• Search algorithms vary in how metadata is derived and exposed to users.
All Search Leverages Metadata
Relevance ranking, for example, is additional metadata for a result that is ‘implied’ or derived based on incoming connections to a piece of content.
Examples of implicit metadata:
• ‘Structure’ and format of content – a piece of content may be ‘unstructured’ and not contain metadata, but it is well organized.
  - Example: a newspaper story contains a headline, sub head, and first paragraph with who, what, where, when, etc.
  - Clear editorial standards
• Context of content – Where did the content come from? If from a particular web site, file share, data source or intranet location, the domain of knowledge provides context. How can we disambiguate the term “diamond”?
  - Sports site – baseball diamond
  - Commerce site – diamond ring
  - Sales context for ‘feature’ versus engineering context for ‘feature’
  - “Adapter” – power cord
  - “Adapter” – Bluetooth headset
Context as metadata
• If we maintain context of a piece of information in our search results, this is equivalent to having additional metadata on that content
Search results organized by repository
This is a form of “federated” search – a single search term fed to multiple repositories
Example courtesy of Morrison and Foerster
“We should get Google”…
Why you will not “just get Google”
• Google leverages linkages on the web that are not typically duplicated internally in the organization
• Search engines cannot infer intent or know what is important to you in the context of your work task
• Information relevance is dependent on who you are and your level of expertise as well as what you are trying to accomplish
• Not all content is equal – Google is fine for broad search results or less precise information, but may not work as well when large numbers of documents differ only in fine-grained ways
Why doesn’t Google just use Google?
Why you will not “just get Google”
More Definitions: Taxonomy, Ontology, Thesaurus…
“Sound bite” definitions
• A Taxonomy is a list of terms that enable classification of information
  - Method used to organize Subject/Topic metadata
  - Typically expresses hierarchical relationships (parent/child)
  - Emphasizes context
• A Thesaurus is a specialized taxonomy
  - Equivalence relationships (synonyms)
  - Associative relationships (related terms – “see also”)
  - Preferred terms, variant terms
• An Ontology is a collection of taxonomies and thesauri
  - A body of knowledge is represented by multiple lists of categories
  - Categories of various types are conceptually related
Definitions
• Classification Scheme - A preordained structure of words or symbols used to organize information content
• Index - A list organized in a standardized sequential fashion
Types of indexes may include: back-of-the-book, telephone directory, computerized look-up tables (e.g. b-tree, file system), card catalog, meeting roster of attendees, customer list, to name a few.
An index is a classification scheme
A taxonomy is a classification scheme
But… a classification scheme is not necessarily a taxonomy…
Classification versus Taxonomy
TAX
  Assets
    Individuals
    Corporations
  Liabilities
    Individuals
    Corporations
TAX ITEMS
  Assets
    Real Estate
    Vehicles
  Liabilities
    Loans
    Debts
TAX PAYERS
  Individuals
    Single
    Married
  Organizations
    Corporations
    Associations
Types of Term Relationships
Used in thesauri. Relationship types, in order of increasing complexity:
• Equivalence: Also called “entry types” of terms. Synonyms.
• Hierarchical: Purist definition of a taxonomy – terms have parent/child relationship.
• Associative: Things that are related conceptually. Associative relation types are context and audience specific. This is how we might relate multiple taxonomies.
Relationship Types
Relationship Examples
Legend: E = Equivalence, H = Hierarchical, A = Associative
Terms shown (connections between them are marked E, H, A or “?” in the figure): Computer Manufacturers, International Business Machines, IBM, Software Group, Big Blue, Hardware, Software
Equivalence Terms
• Common misspellings
• Other terms used
• Abbreviations
• Internal names
Associative Terms
• See also
• Related products
• Language spoken
• Products for market
• Available in region
• Risks in region
The Role of Taxonomy
Goals of a taxonomy
• Allow for knowledge discovery
• Improve usability of applications as well as learnability of applications
• Reduce the cost of delivering services, developing products and conducting operations
• Improve operational efficiencies by allowing for reuse of information rather than recreation
• Improve search results and applicability (both precision and recall)
Taxonomy Challenges
• Taxonomy means many things in SharePoint
  - Site organization
  - Content types
  - Controlled vocabularies for tagging documents
• Challenges
  - Typically integration of legacy content requires significant tagging effort
  - Users wanted to leverage hierarchy in search in the form of faceted navigation
Taxonomy Solutions
• Taxonomy Technology
  - Leveraging Hierarchy and Taxonomy in both tagging and faceted search
  - True taxonomy management is beyond the scope of SharePoint 2010
• Taxonomy in Context
  - Auto-populate metadata fields with taxonomy values based on the overall architecture of the site and users’ roles
  - Reduce the burden on users: allow Locations, Departments, Roles to be filled in automatically
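The idea of filling in metadata from context can be sketched in a few lines. This is not SharePoint code: the site paths, the `SITE_DEFAULTS` table and the `tag_on_upload` function are hypothetical stand-ins for whatever mechanism the platform provides for default column values.

```python
# Hypothetical mapping from a site location to taxonomy values implied by that location.
SITE_DEFAULTS = {
    "/sites/hr/boston": {"Location": "Boston", "Department": "Human Resources"},
    "/sites/eng/chicago": {"Location": "Chicago", "Department": "Engineering"},
}


def tag_on_upload(site_path, user_tags, user_role=None):
    """Combine context-derived defaults with whatever the user entered by hand."""
    tags = dict(SITE_DEFAULTS.get(site_path, {}))  # values implied by where the file lives
    if user_role:
        tags["Role"] = user_role                   # value implied by who is uploading
    tags.update(user_tags)                         # explicit user input wins over defaults
    return tags


tags = tag_on_upload("/sites/hr/boston", {"Topic": "Benefits"}, user_role="Manager")
```

Here the user supplies only the topic; location, department and role come from context, which is the reduction in tagging burden the slide describes.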
Recall versus Precision
• The goal of effective search is to pull back lots of relevant results
• This is measured by “recall” and “precision”
• Recall: I am getting the documents that contain my term
• Precision: These results are relevant to me
When trying to improve recall, precision can suffer and vice versa
Precision can also be subjective – based on who we are and what we are doing, in other words, context and task
Figure: overlap between relevant items in a database and items retrieved
• A = Relevant items retrieved
• B = Relevant items not retrieved
• C = Irrelevant items retrieved
Recall: ratio of number of relevant items retrieved to total number of relevant items in database
$\text{Recall} = \frac{A}{A + B} \times 100\%$
Precision: ratio of number of relevant items retrieved to total number of irrelevant and relevant items retrieved
$\text{Precision} = \frac{A}{A + C} \times 100\%$
Goal is to improve recall without sacrificing precision
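The two ratios can be computed directly from sets of document ids. The sketch below uses A, B and C as defined on this slide; the example ids are made up.

```python
def recall_and_precision(relevant, retrieved):
    """Return (recall %, precision %) for a set of relevant ids and a set of retrieved ids."""
    a = len(relevant & retrieved)   # A: relevant items retrieved
    b = len(relevant - retrieved)   # B: relevant items not retrieved
    c = len(retrieved - relevant)   # C: irrelevant items retrieved
    recall = 100.0 * a / (a + b) if (a + b) else 0.0
    precision = 100.0 * a / (a + c) if (a + c) else 0.0
    return recall, precision


# Hypothetical example: 4 relevant documents exist; the search returns 6, of which 2 are relevant.
relevant_docs = {1, 2, 3, 4}
retrieved_docs = {3, 4, 5, 6, 7, 8}
recall, precision = recall_and_precision(relevant_docs, retrieved_docs)
```

In this example recall is 50% (2 of 4 relevant documents found) while precision is about 33% (2 of 6 results relevant), which shows how a broad query can find more of the relevant material at the price of more noise.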
Taxonomy & search strategies
Six strategies you should know about
• Tuned search
• Relevance ranking
• Faceted search
• Related terms
• Clustering
• Disambiguation
Taxonomy and Search Strategies
• Pre Search Processing: search engine applies taxonomy or thesaurus to narrow or expand search before retrieving results
  - Tuned search “Best Bets”
  - Relevance ranking
  - Faceted search
• Post Search Processing: search results are narrowed or organized after they are retrieved
  - Related terms
  - Clustering
  - Disambiguation
Applying a taxonomy to search
We need a mechanism to improve search
• A Taxonomy can be used to
  - Define search terms and map those terms to specific locations of information (need to integrate with a search engine)
  - Apply terms to a document so that relevant and consistent search results are returned (need to integrate with a content management system)
• A Thesaurus can be used to define term synonyms and related terms in order to improve the recall of information.
  - We may define “proposal” and “statement of work” and “SOW” as meaning the same thing. If I enter SOW, I can pull back documents that are labeled with (or contain) the other terms. This is referred to as “term expansion”
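Term expansion with the “proposal” / “statement of work” / “SOW” example might look like the sketch below. The synonym ring comes from the slide; the document texts and the naive substring matching are assumptions made to keep the example short (a real engine would match on tokens, not substrings).

```python
# Each set is a group of terms treated as equivalent (a "synonym ring").
SYNONYM_RINGS = [
    {"proposal", "statement of work", "sow"},
]


def expand(term):
    """Return the query term plus every term the thesaurus treats as equivalent."""
    needle = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if needle in ring:
            return sorted(ring)
    return [needle]


def search(term, docs):
    """Return ids of documents containing the term or any of its expansions."""
    terms = expand(term)
    return [doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in terms)]


# Hypothetical documents.
DOCS = {
    "a": "Proposal for intranet redesign",
    "b": "Statement of Work: search tuning",
    "c": "Meeting notes",
}
hits = search("SOW", DOCS)
```

Neither matching document contains the literal string “SOW”; both are found only because the query was expanded, which is the recall improvement the slide refers to.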
Tuned Search, or “Best Bets”
Tuned Search
What is Tuned Search?
• Search terms are defined in a taxonomy and mapped back to specific locations of information (i.e., specific web pages).
• E.g., a user searching on a broad term like cell phones would be first pointed to a landing page (a “best bet”), or presented a box of hand-picked links above regular search results.
Best Bets Example – Best Buy
Tuned Search “Best Bets”
• The same search using just keyword matching could have retrieved a list of pages with the words “phone” or “cell”, e.g.
  - Home phones
  - Cordless phones
  - 12 cell batteries
  - Etc.
• Reading through pages of possible matches is time consuming and frustrating
Tuned Search “Best Bets”
How Does a Taxonomy Help?
• Using the taxonomy categories as landing pages assures that users are strategically directed to the content that is most important.
• Best bets are done in conjunction with a taxonomy/thesaurus, not just a list of search terms… E.g., Circuit City
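A best-bets lookup that goes through the thesaurus first, rather than matching raw query strings, could be sketched as follows. The landing-page path, the synonym table and the function name are all hypothetical.

```python
# Hypothetical thesaurus: variant query -> preferred taxonomy term.
PREFERRED_TERM = {
    "cell phone": "mobile phones",
    "cell phones": "mobile phones",
    "mobile phone": "mobile phones",
}

# Hypothetical best bets: preferred taxonomy term -> hand-picked landing page.
BEST_BETS = {
    "mobile phones": "/category/mobile-phones",
}


def best_bet(query):
    """Return the landing page for a query, or None if no best bet is defined."""
    key = query.strip().lower()
    preferred = PREFERRED_TERM.get(key, key)  # normalize variants to the preferred term
    return BEST_BETS.get(preferred)
```

Because variants are normalized before the lookup, “Cell phone” and “Mobile phone” lead to the same landing page; a plain list of search terms without the thesaurus step would have to repeat the mapping for every variant.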
Circuit City Example
• Search on “Cell phone”:
Circuit City Example
• Search on “Mobile phone”:
Circuit City Example
• What do these things have to do with mobile phones?
Relevance Ranking
Relevance ranking boost
• Can assign more weight to specific metadata fields in the engine’s ranking algorithms
• If the search term matches a metadata field, that hit gets a higher relative weight than a full-text hit, and the result’s rank is boosted
• E.g. Best Buy boosts taxonomy category
• E.g. Motorola could boost the product category
[Diagram: content index in which the metadata field has a relative weighting of 45]
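The effect of a field boost can be sketched with a toy scoring function. The weight of 45 for the metadata field is the figure from the slide; the full-text weight of 1, the field names `category` and `body`, and the sample documents are assumptions for illustration. Real engines combine such weights with many other ranking signals.

```python
# Relative weights per field. 45 comes from the slide; the full-text weight of 1
# is an assumption made for this sketch.
FIELD_WEIGHTS = {"category": 45, "body": 1}


def score(query, doc):
    """Sum the weights of every field in which the query phrase appears."""
    phrase = query.lower()
    return sum(weight for field, weight in FIELD_WEIGHTS.items()
               if phrase in doc.get(field, "").lower())


def rank(query, docs):
    """Return titles of matching documents, highest score first."""
    scored = [(score(query, doc), doc["title"]) for doc in docs]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [title for points, title in ordered if points > 0]


# Illustrative documents (invented for this sketch).
DOCS = [
    {"title": "Car charger", "category": "Accessories",
     "body": "Works with any cell phone."},
    {"title": "Cell phone buying guide", "category": "Cell phones",
     "body": "How to choose a cell phone plan."},
    {"title": "Toaster", "category": "Kitchen", "body": "Four slices."},
]

print(rank("cell phone", DOCS))
```

The buying guide matches in the boosted category field and therefore outranks the charger, which only matches in the full text.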
Leveraging taxonomy terms as metadata
Faceted search
Leverage the taxonomy terms as metadata - faceted search
What is Faceted Search?
• Attribute-based search (guided navigation) approach to create precise, targeted search results. Each parameter narrows the search result to the most appropriate content. Also commonly referred to as “advanced searching” or “parametric searching”
• Users think they are browsing, but they are actually searching
• Allows for multiple navigation schemes based on taxonomy
Navigational taxonomy
Taxonomy can be a hierarchical grouping of navigational nodes on a web site
Motorola.com
  Mobile phones
    Unlocked GSM
    With service
    Accessories
      Batteries
      Headsets
        Bluetooth headsets
  Modems & gateways
  2-way radios
The challenge is that there is no single “correct” way to navigate.
Is this the “correct” way?
Navigational taxonomy
Or is this one “correct”? Or is this one?
Motorola.com
  Mobile phones
    Camera phones
    Bluetooth phones
    Bluetooth accessories
      Sunglasses
      Headsets
  Modems & gateways
  2-way radios
Motorola.com
  Mobile phones
    Unlocked GSM
    With service
    Bluetooth accessories
      Sunglasses
      Headsets
  Modems & gateways
  2-way radios
Motorola.com => United States => Government => Portable Radios
Motorola.com => Portable Radios => United States => Government
Motorola.com => Government => Portable Radios => United States
Motorola.com
  Canada
  United Kingdom
  United States
    Enterprise
    Government
      Portable radios
      Mobile computers
    Consumers
Motorola.com
  Mobile computers
  Mobile radios
  Portable radios
    United States
      Government
      Enterprise
      Consumer
    Canada
    United Kingdom
Motorola.com
  Government
    Mobile computers
    Portable radios
      United Kingdom
      Canada
      United States
  Enterprise
  Consumers
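The three paths listed on this slide are three of the six possible orderings of the same three dimensions (country, vertical market, product). A short sketch makes the count explicit; the `DIMENSIONS` tuple and `paths` list are names chosen for this example.

```python
from itertools import permutations

# One value from each dimension used in the paths on the slide.
DIMENSIONS = ("United States", "Government", "Portable Radios")

# Every ordering of the three dimensions is a plausible navigation path.
paths = [" => ".join(("Motorola.com",) + order)
         for order in permutations(DIMENSIONS)]

for path in paths:
    print(path)
```

With three dimensions there are 3! = 6 orderings, and a fixed navigation hierarchy has to commit to just one of them.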
Navigating with “facets”
• Two way radios: Portable, Fixed, Mobile, Motorcycle
• Vertical market: Government, Manufacturing, Wholesale retail
• Country: Canada, United Kingdom, United States
Target document: P (product type) = Portable radio, G (geographic region) = United States, V (vertical market) = Government
“Facet” is a top level category in the taxonomy
Just three facets with 5 terms each give 5 to the 3rd power (125) possible combinations when one term is chosen per facet
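Navigating with facets amounts to filtering on independent metadata fields in whatever order the user chooses. The following is a minimal sketch under assumptions: the facet values are taken from the slide, while the document records, the field names and the `refine` and `facet_counts` helpers are invented for illustration.

```python
from collections import Counter

# Illustrative documents, each tagged with one term per facet.
DOCS = [
    {"id": "d1", "product_type": "Portable", "vertical_market": "Government",
     "country": "United States"},
    {"id": "d2", "product_type": "Portable", "vertical_market": "Manufacturing",
     "country": "Canada"},
    {"id": "d3", "product_type": "Mobile", "vertical_market": "Government",
     "country": "United States"},
    {"id": "d4", "product_type": "Fixed", "vertical_market": "Government",
     "country": "United Kingdom"},
]


def refine(docs, **selected):
    """Keep only the documents that match every selected facet value."""
    return [doc for doc in docs
            if all(doc.get(facet) == value for facet, value in selected.items())]


def facet_counts(docs, facet):
    """Count the remaining documents per term of a facet (for refinement links)."""
    return Counter(doc[facet] for doc in docs)


# The order in which facets are applied does not change the result.
by_country_first = refine(refine(DOCS, country="United States"),
                          vertical_market="Government")
by_market_first = refine(refine(DOCS, vertical_market="Government"),
                         country="United States")

target = refine(DOCS, product_type="Portable", vertical_market="Government",
                country="United States")
print([doc["id"] for doc in target])
print(dict(facet_counts(by_country_first, "product_type")))
```

Because each facet is an independent field, the user reaches the same target document whichever facet is picked first, which is what a single fixed hierarchy cannot offer.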
Is it search? Or navigation?
Good example of faceted search using hierarchy
Faceted search – PC Connection
Each parameter narrows the search result to the most appropriate content.
Taxonomy and Search
• Post-search processing: search results are narrowed after they are retrieved
  - Related terms
  - Clustering
  - Disambiguation
Related Terms
• Leverages associative relationships in a taxonomy
Clustering
• Adds context to large result sets
• Clusters are similar to facets but based on derived attributes
• Derived attributes are based on concepts contained in the result set and mapped to the taxonomy
Clustering Example
Clustering
How do I implement Clustering?
• Build out your taxonomy, then extract entities from content and categorize based on derived metadata (facets)
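The steps above can be sketched as follows. This is a deliberately naive illustration, not a production entity extractor: the `TERM_TO_CATEGORY` table, the `extract_categories` and `cluster` helpers and the sample results are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical taxonomy fragment: term found in content -> taxonomy category.
TERM_TO_CATEGORY = {
    "headset": "Accessories",
    "battery": "Accessories",
    "two-way radio": "Two way radios",
    "gsm": "Mobile phones",
}


def extract_categories(text):
    """Naive entity extraction: look for known taxonomy terms in the text."""
    lowered = text.lower()
    return sorted({category for term, category in TERM_TO_CATEGORY.items()
                   if term in lowered})


def cluster(results):
    """Group result titles under each derived category; the rest go to 'Other'."""
    clusters = defaultdict(list)
    for title, text in results:
        for category in extract_categories(text) or ["Other"]:
            clusters[category].append(title)
    return dict(clusters)


# Illustrative result set (invented for this sketch).
RESULTS = [
    ("Bluetooth headset manual", "Pair the headset and charge the battery."),
    ("Unlocked phone FAQ", "Unlocked GSM phones work with any carrier."),
    ("Dispatch guide", "Using a two-way radio with a headset."),
    ("Press release", "Quarterly earnings announced."),
]

for category, titles in cluster(RESULTS).items():
    print(category, titles)
```

Unlike facets, the categories here are not stored on the documents in advance; they are derived from the content of whatever result set comes back, so one result can appear in more than one cluster.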
Categorizing content
• Statistical/linguistic: “These documents look similar based on an analysis of word patterns, so let’s put them into the same group.”
• Rules-based: “These documents look similar based on a rule that we have created (they contain marketing plans and are about the newest widget), so let’s put them into the same group.”
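The rules-based approach can be sketched with simple phrase rules; the statistical/linguistic approach is not shown here. The rule names, the phrases in `RULES` and the `categorize` helper are hypothetical and only mirror the marketing-plan example from the slide.

```python
# Hypothetical rules: category name -> phrases that must ALL appear in a document.
RULES = {
    "Widget marketing plans": ["marketing plan", "widget"],
    "Price lists": ["price list"],
}


def categorize(text):
    """Return every category whose rule is fully satisfied by the text."""
    lowered = text.lower()
    return [name for name, phrases in RULES.items()
            if all(phrase in lowered for phrase in phrases)]


print(categorize("Q3 Marketing Plan for the new Widget X"))
print(categorize("Lunch menu"))
```

Rules like these are transparent and easy to audit, but someone has to write and maintain them as the taxonomy and the content change.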
Clustering based on a taxonomy slice from a set of search results:
[Diagram: a set of tagged documents, a Taxonomy Path Tree Builder, and a taxonomy “slice”]
Disambiguation
Disambiguation of search results
What is Disambiguation?
• If a user enters a broad term (like “mobile”), the taxonomy can return terms that help the user select a more precise term
• Includes multiple approaches:
  - Term expansion
  - Complex lookups
Disambiguation methods
• Show related search terms with check boxes in the search results page.
• Show additional search terms as links, perhaps with a prompt - "You might also be interested in:"
• Expand the query and show the expanded words in the search box
• Expand the query invisibly
Disambiguation of search results
Search term: mobile
  - Mobile data terminals | Handheld computers
  - Network Infrastructure | Mobile switches
  - Phones | Fixed mobile car phones | Mobile phones
  - Software applications | Mobile applications
  - Two way radios | Mobile radios
  - Intelligent video solutions | Mobile video enforcer | Mobile video sharing
  - MESH Solutions | Multi-radio mobile broadband
  - Mobile Computing | Mobile application
Presenting term in multiple contexts
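Presenting a term in multiple contexts can be sketched as a lookup that returns, for each taxonomy category, the narrower terms containing the query word. The labels below are taken from the slide, but treating the first label in each group as the category is an assumption, as are the `TAXONOMY` structure and the `disambiguate` helper.

```python
# Category -> terms. The labels come from the slide; the category/term pairing
# is an assumption made for this sketch.
TAXONOMY = {
    "Network Infrastructure": ["Mobile switches"],
    "Phones": ["Fixed mobile car phones", "Mobile phones"],
    "Software applications": ["Mobile applications"],
    "Two way radios": ["Mobile radios"],
    "Intelligent video solutions": ["Mobile video enforcer", "Mobile video sharing"],
}


def disambiguate(query):
    """Return {category: [terms]} for every term that contains the query word."""
    word = query.lower()
    suggestions = {}
    for category, terms in TAXONOMY.items():
        hits = [term for term in terms if word in term.lower().split()]
        if hits:
            suggestions[category] = hits
    return suggestions


for category, terms in disambiguate("mobile").items():
    print(category, "->", ", ".join(terms))
```

The suggestions could then be shown as check boxes or links on the results page so the user can pick the more precise term.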
From Associative Relationships
Disambiguation of search results
How Do I Implement Disambiguation Methods?
• Need to integrate thesaurus with search engine
• Can be accomplished through custom frameworks, web services, API calls
• Thesaurus values can live inside of search engine, in taxonomy management tool, in spreadsheets or databases or in public sources
Disambiguation
• Query: Did Enron executives illegally sell Enron stock?
Source: CognitionSearch.com
Taxonomy and Navigation*
*Taxonomy is not the same as navigation
Applying a taxonomy to navigation
We need to improve navigation for our site
• A taxonomy can be used to:
  - Inform navigation (though it is not the same as navigation)
  - Define metadata and the information architecture of the site
Navigation – Sales Node
Sales Tools
  Analyst Reports
  ………
  Case Studies…
  Competition…
  Customer References
  FAQ’s
  Pricing & Licensing
  White Papers
  Presentations
Navigation – Sales Node
Sales Tools
  Analyst Reports
  ………
  Case Studies…
  Competition…
  Customer References
  FAQ’s
  Pricing & Licensing
  White Papers
  Presentations
Doc Types
• Analyst Reports
• Assessment
• Benchmarks
• Best Practice
• Brochures
• Campaign
• Case studies
• Competition
• Configuration Guide
• Contracts
• Customer References
• Data sheet
• Event
• FAQ
• Guides
• License Agreements
• Migration
• Presentations
• Press Releases
• Price Lists
• Quick Reference Guide
• White papers
Why you will not just “use a folksonomy”
• All content is not equal
• Higher value content requires more rigor
• Social tagging is still immature
• May be appropriate for some kinds of content
• On systems open to large user groups, esoteric tags that are understood by only a minority of users tend to proliferate, which burdens users and decreases system efficiency
• Core to folksonomies are the flaws that formal classification systems are designed to eliminate, such as redundancy, misspelling, etc.
• Taxonomists/ontologists argue that an agreed-to set of tags enables more efficient indexing and searching of content
earley
earley & associates
earley & associates inc
earley & associates needham, massachusets
earley & associates taxonomy
earley & associates, inc
earley & associates, inc.
earley & earley associates
earley and associates
earley and associates inc
earley and associates seth
earley and associates taxonomy
earley assoc
earley associates
earley associates address
earley associates boston
earley associates wordmap
earley financial
earley jumpstart
earley taxonomy
earley taxonomy & metadata jumpstart call: managing structured metadata and taxonomies
earley.com
early & associates
early and associates
taxanomic classification of the freycinetia
taxonimic classification of humans
taxonomic and dichotomus
taxonomic classification
taxonomic classification human
taxonomic genus of king cobra
taxonomic implementation
taxonomies of knowledge
taxonomies project roadmap
taxonomist job description
taxonomy metadata
taxonomy & metadata jumpstart - 2007
taxonomy and false drops
taxonomy and classifiation examples of animals
taxonomy and metadata
taxonomy and metadata jumpstart
taxonomy c
taxonomy classification
taxonomy classification charts
taxonomy community of practice
taxonomy consulting
taxonomy creation
taxonomy creation management
taxonomy defined
taxonomy deployment
taxonomy development process
taxonomy implementation
taxonomy iqpc
taxonomy job description
taxonomy maintenance
taxonomy management
taxonomy management job title
taxonomy management tools
taxonomy metadata
taxonomy models for project management
taxonomy of global executives
taxonomy of man
taxonomy search
taxonomy seth early
taxonomy structure business organisation
taxonomy training
taxonomy validation
taxonomy(2007)
taxonomy, mlis
taxonomy/classification.online
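Variants like those listed above (spelled-out “and” versus “&”, trailing “inc”, misspellings) are the kind of redundancy an agreed-to vocabulary is meant to absorb. A minimal sketch of mapping variants to one preferred tag follows; the `normalize` rules, the `VARIANT_TO_PREFERRED` table and the choice of preferred label are assumptions for illustration.

```python
import re


def normalize(tag):
    """Lowercase, spell out '&', drop punctuation and a trailing 'inc'."""
    text = tag.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    words = text.split()
    if words and words[-1] == "inc":
        words = words[:-1]
    return " ".join(words)


# Hypothetical mapping from normalized variants to the preferred (agreed-to) tag.
VARIANT_TO_PREFERRED = {
    "earley and associates": "Earley & Associates",
    "early and associates": "Earley & Associates",  # common misspelling
    "earley associates": "Earley & Associates",
    "earley assoc": "Earley & Associates",
}


def preferred_tag(tag):
    """Return the agreed-to tag for a variant, or None if it is not recognized."""
    return VARIANT_TO_PREFERRED.get(normalize(tag))


for variant in ["earley & associates, inc.", "early & associates", "earley financial"]:
    print(variant, "->", preferred_tag(variant))
```

Simple normalization collapses the punctuation variants automatically, while misspellings still need an explicit entry, which is where curated synonym lists earn their keep.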
Conclusions
• Search engines, no matter how sophisticated, do not obviate the need for taxonomies
• Content value in the context of a work process will determine the level of required structure
• There is no “one size fits all”
• Taxonomy, content strategy and search all work together to improve the findability of content.
• Google doesn’t always get it right…
Earley & Associates: #1 on Google for Silver Mining Tools
Questions?
Jeff Carr, Senior Information Architect & Search [email protected]
Seth [email protected]
Follow me on Twitter: sethearley
Connect with me on LinkedIn: www.linkedin.com/in/sethearley