Taxonomies and Search for Chicago SharePoint User Group

Posted on 18-May-2015


Earley & Associates, Inc. | Classification: CONFIDENTIAL USE, NO REPRINTS Copyright © 2011 Earley & Associates, Inc. All Rights Reserved.

Taxonomies, Metadata and Search

Seth Earley, 781-4820-8080, Seth@earley.com


SharePoint Call Series: Architecting for Business Value

Starting March 3rd, 2011 (recordings will be available)

Register at:

www.earley.com/webinars/jumpstarts/sharepoint-2010-architecting-business-value


Jumpstart Series – Architecting SharePoint for Business Value

• Session 1: SharePoint 2010 – Best Practices for Creating Business Value, March 3rd, 12:00-1:00 pm

• Session 2: Methods and Tools for Better SharePoint Search, March 10th, 12:00-1:00 pm

• Session 3: Practical Approaches to Developing Rich Information Architectures, March 17th, 12:00-1:00 pm

• Session 4: The Role of Governance in Ensuring Success, March 24th, 12:00-1:00 pm


Earley & Associates Highlights

Founded: 1994

Focus Areas: Holistic approach to specific business contexts and goals for:

• Retail

• Manufacturing

• Pharmaceuticals & Life Sciences

• Public Sector

• Media & Entertainment

Personnel: Core team of 30 consultants

Locations: Stow, MA headquarters; consultants in US, UK & Canada; global projects

Services:

• Taxonomy & Information Architecture

• Search Strategy for Enterprise & Web

• ECM, DAM & Information Lifecycle

• Program Management & Governance


Seth Earley, Founder & President, Earley & Associates

• Co-author of Practical Knowledge Management from IBM Press

• 17 years of experience building content and knowledge management systems, 20+ years of experience in technology

• Former Co-Chair, Academy of Motion Picture Arts and Sciences, Science and Technology Council Metadata Project Committee

• Founder of the Boston Knowledge Management Forum

• Former adjunct professor at Northeastern University

• Guest speaker for US Strategic Command briefing on knowledge networks

• Currently working with enterprises to develop knowledge and digital asset management systems, taxonomy and metadata governance strategies

• Founder of Taxonomy Community of Practice: hosts monthly conference calls of case studies on taxonomy derivation and application (100+ calls since 2005). http://finance.groups.yahoo.com/group/TaxoCoP

• Co-founder of Search Community of Practice: http://tech.groups.yahoo.com/group/SearchCoP


Session Objective

From Session Abstract

• High-level review of basic concepts related to taxonomy, metadata and search

• How taxonomies are integrated with metadata management and standards

• The relationship between taxonomy and information architecture

• How taxonomy, metadata and IA relate to SharePoint

• Options for creating good information architectures within SharePoint 2010

• How to leverage taxonomy and metadata to improve navigation and search in your SharePoint portal

• Techniques for implementation using native SharePoint functionality


Agenda

• Change is constant

• Taxonomy definition

• Information and semantic architecture

• The challenge of search

• Five basic truths about search

• The role of metadata

• Taxonomy and navigation

• Case Study

• Conclusion


Change is constant

• Snapshot versus movie

• Business changes faster than IT can

• Systems grow up to solve specific problems without a view toward integration

• Integrated environments

• Solution to application proliferation…?


Same Term, Different Expressions…

Sources: Library, Web site, Healthday

Terms used: Cardiology, Cardiac Care, Heart Health

Problems:

• Difficulty finding relevant information

• Federated search configuration is cumbersome

• Inability to view consolidated results

• Limited ability to control shared vocabularies

• Weak governance or demonstrated control

• Costly/cumbersome administrative overhead


Taxonomy is an enabler…

• Every organization is struggling with findability

• Content management applications, search tools, workflow applications, customer relationship management systems, etc. all strive to create views of information that are in the context of work processes

What is the key component to any of these initiatives?

Having a common language in which to describe, communicate and translate information between applications and between user audiences


Information architecture versus Semantic architecture

• Information architecture describes the ways in which systems capture, manage, organize and present information. Metadata fields describe information about a document or piece of content: identifiers of various kinds (name, account number, part ID, price, etc.) and conditions or status of the content (workflow approval state, date created, review date, etc.)

• Semantic architecture is about meaning and nuance. Terms can have multiple contexts and meanings, and people use different terms to describe the same thing


A single concept can have different expressions (information architecture):

• Person we do business with: Cust_Name, Cust_ID, Customer ID, Customer, Client

• Person who writes a document: Contributor, Author, Creator

• What we buy or sell a product for: Price, Cost

A single expression can represent different concepts (semantic architecture):

• Pitch: the property of sound; the throwing of a baseball; a vendor's position (especially on the sidewalk); sales talk; degree of deviation from a horizontal plane; dark heavy viscid substance; a high approach shot in golf; a card game; abrupt up-and-down motion; the action of throwing something; …

Source: Fred Leise
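The information-architecture half of this slide can be made concrete with a small lookup table. The sketch below is a minimal illustration, assuming a hand-maintained mapping: the canonical keys (`customer`, `author`, `price`) and the `normalize_field` helper are hypothetical names, and only the variant field names come from the slide.

```python
from typing import Optional

# Hypothetical mapping: each canonical concept lists the variant field
# names that different systems use for it (variants taken from the slide).
CONCEPT_VARIANTS = {
    "customer": ["Cust_Name", "Cust_ID", "Customer ID", "Customer", "Client"],
    "author": ["Contributor", "Author", "Creator"],
    "price": ["Price", "Cost"],
}

# Invert the mapping into a case-insensitive lookup: variant -> concept.
VARIANT_TO_CONCEPT = {
    variant.lower(): concept
    for concept, variants in CONCEPT_VARIANTS.items()
    for variant in variants
}


def normalize_field(field_name: str) -> Optional[str]:
    """Return the canonical concept for a field name, or None if unmapped."""
    return VARIANT_TO_CONCEPT.get(field_name.strip().lower())
```

For example, `normalize_field("Client")` resolves to `"customer"`. Note that a lookup like this only handles the "one concept, many expressions" case; an ambiguous word such as "Pitch" still needs context to pick a meaning.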


Taxonomy definition

• Taxonomy is a system for organizing concepts and categorizing content. It expresses hierarchical relationships (parent/child), arranged in a tree-like structure, with top-level categories that branch out to reveal sub-categories and terms in varying levels of depth

• Dictionary of preferred terminology

Example taxonomy (diagram) with the terms Products, Games, Card games, Action figures, Board games, Brands, Milton Bradley, Scrabble, Disney and Battleship
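A parent/child hierarchy like this can be stored as a simple child-to-parent map. The sketch below is illustrative only: the exact tree on the slide is not recoverable from the transcript, so the arrangement of terms here is a hypothetical one, and `path_to_root` and `children` are names chosen for the example.

```python
from typing import Dict, List, Optional

# Hypothetical parent/child arrangement of some terms from the slide.
# None marks a top-level category.
PARENT: Dict[str, Optional[str]] = {
    "Products": None,
    "Games": "Products",
    "Card games": "Games",
    "Board games": "Games",
    "Action figures": "Products",
    "Brands": None,
    "Milton Bradley": "Brands",
    "Disney": "Brands",
}


def path_to_root(term: str) -> List[str]:
    """Return the chain of terms from the top-level category down to `term`."""
    path: List[str] = []
    current: Optional[str] = term
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


def children(term: str) -> List[str]:
    """Return the direct children (narrower terms) of `term`."""
    return [child for child, parent in PARENT.items() if parent == term]
```

Here `path_to_root("Board games")` gives the breadcrumb `Products > Games > Board games`, which is the kind of path a navigation menu or a faceted filter would display.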


Taxonomy definition

• Taxonomy: system for organizing concepts and categorizing content

• Expresses hierarchical relationships (parent/child)

• Expresses other relationships

Sample taxonomy record:

Preferred term: Car

Synonyms: Automobile, Vehicle

Translations and regional variants: fr-CA: Voiture; en-UK: Auto; es-CO: Carro
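A record like the sample above maps naturally onto a small data structure. This is a minimal sketch, assuming every synonym and translation should resolve back to the preferred term; `TermRecord`, `preferred_term` and `display_label` are hypothetical names, and only a subset of the slide's locale variants is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    """A hypothetical taxonomy record: one preferred term plus its variants."""
    preferred: str
    synonyms: List[str] = field(default_factory=list)
    # Locale tag -> label used in that language or region.
    translations: Dict[str, str] = field(default_factory=dict)

    def labels(self) -> List[str]:
        """Every label that should resolve to this record."""
        return [self.preferred, *self.synonyms, *self.translations.values()]


car = TermRecord(
    preferred="Car",
    synonyms=["Automobile", "Vehicle"],
    translations={"fr-CA": "Voiture", "es-CO": "Carro"},
)

# Index every label (case-insensitively) back to its record.
LABEL_INDEX: Dict[str, TermRecord] = {
    label.lower(): rec for rec in [car] for label in rec.labels()
}


def preferred_term(label: str) -> Optional[str]:
    """Resolve any synonym or translation to the preferred term."""
    rec = LABEL_INDEX.get(label.lower())
    return rec.preferred if rec else None


def display_label(label: str, locale: str) -> Optional[str]:
    """Show the term in the user's locale, falling back to the preferred term."""
    rec = LABEL_INDEX.get(label.lower())
    if rec is None:
        return None
    return rec.translations.get(locale, rec.preferred)
```

Tagging and search can then store only the preferred term, while the interface shows whichever label suits the user's locale.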


Taxonomy is a foundation…

• It is a system for classification

• It provides a means to organize documents and web content

• It helps us fine-tune search tools and mechanisms

• Creates a common language for sharing concepts

• Allows for a coherent approach to integrate information sources

• It is a common language for business processes


Taxonomy as a common business language

Case example: Motorola’s global taxonomy framework served multiple processes:

• Browsing & filtering

• Compare product

• Related documents

• Financial reporting

• Business intelligence

• Program management

• Product lifecycle management


Enterprise taxonomy drivers

For each application: primary driver, “clock speed”, constituencies and technology challenges.

• Web content. Primary driver: consistency in branding, internal efficiencies. “Clock speed”: medium to fast. Constituencies: web developers, content managers, content creators. Technology challenges: exposing taxonomy to CMS, integration with search

• Enterprise data standards. Primary driver: cross-platform integration, business intelligence, metadata modeling, data warehousing. “Clock speed”: very slow to slow. Constituencies: data architects, standards boards, data modelers, business intelligence. Technology challenges: “source of truth”, difficulty integrating metadata standards

• E-commerce. Primary driver: web site sales; need to support customer experience. “Clock speed”: very fast. Constituencies: merchandisers, e-commerce development team, marketing. Technology challenges: commerce platforms do not necessarily leverage capabilities; updates to classification are not a priority

• Product development. Primary driver: product development efficiencies, speed to market. “Clock speed”: fast. Constituencies: engineering, product development, product marketers. Technology challenges: product lifecycle management systems are usually self-contained

• Intranet development. Primary driver: internal efficiencies. “Clock speed”: slow to medium. Constituencies: intranet managers, functional managers. Technology challenges: difficulty unifying access to multiple repositories, sheer volume of sources


The Challenge of Search

Five basic truths about search


Search as Utility

• “search as a utility has become deeply ingrained into people's everyday lives.” – Study by Nielsen/Net Ratings

• “search software, hardware, and support bundle or search appliance has become very popular since being introduced in early 2002” – Goebel Group

These are misleading concepts. Search is used as a utility, but contexts vary so widely that “plugging search in” does not always produce satisfactory results.


Truth #1.

We have to change our definition of search.

• Search is no longer just a white box.

• Search is an experience.

• Search is about information access & capabilities.


Truth #2.

Search algorithms are getting better, but they cannot infer human context & intent.

• A search engine doesn’t know if I’m an engineer, an attorney, or a high school student.

• Perspective has an impact on whether a set of search results is useful & appropriate.


Truth #3.

Taxonomy, metadata and information architecture are key aspects of search.

• Search is fundamentally about metadata

• Some content is structured, some isn’t and needs help

• Advanced search functionalities require taxonomy


Truth #4.

Search is increasingly looking like navigation.

• What happens when you click on a link?

• Guided navigation & faceted search are really the same thing


Truth #5.

Search is messy.

• Knowledge is messy, information is messy.

• People find answers through haphazard and chaotic processes.


“…search terms are short, ambiguous and an approximation of the searcher’s real information need…”

Source: http://research.microsoft.com/~ryenw/papers/WhiteCONTEXT2002.pdf Ryen W. White, Joemon M. Jose and Ian Ruthven


Rising Expectations Plus Increased Complexity

• Search seems to be a ‘given’ – we expect it to be there

• Most enterprise search is less than optimal – too many results, irrelevant results, missing results

• It was not so long ago that organizations were starved for information

• A puzzling fact: as information environments have grown more complex, users have come to expect search to be simpler


Search is complex

Enterprise search is diverse – need to access multiple applications and contexts – both structured and unstructured

Business Intelligence/Analytics

Customer Relationship Mgt

Document repositories

Custom databases and applications

Intranets/web pages


Search is Heterogeneous

Search/Tagging/Taxonomy Integration Framework

Data sources: Business Intelligence, Customer Relationship Mgt, Document repositories, Custom databases and applications, Intranets/web pages

Search mechanisms: Appliances, Federated Search, Auto categorization/Clustering, Entity Extraction, Faceted Search, Semantic Search


What is the right mechanism for accessing information?

• Content can be created in structured or unstructured contexts

• Its value can vary depending on audience, context or process

• Some content is extremely nuanced and requires more precise access (according to audience or task, solution, etc…)

• Search can be based on inherent structure and content of a document (implicit metadata) or on information applied to that content (explicit metadata)


Diagram: tools arranged from less structured (knowledge creation, chaotic processes, emergent value) to more structured (knowledge access/reuse, controlled processes): Email, Instant Messages, Wikis, Blogs, Discussions, Collaborative Workspaces, Online Learning, Instructor Led Courses, Content Mgt, Workflow systems, Doc Mgt Systems, Records Mgt Systems

Different tools are appropriate depending upon degree of collaboration and creation versus structured access


Diagram: relative value of content, plotted from lower cost and lower value to higher cost and higher value. Content shown: message text, external news, example deliverables, discussion postings, interim deliverables, content repositories, success stories, benchmarks, approved methods, best practices. Unfiltered content is more difficult to access; reviewed/vetted/approved content is easier to access. Formal tagging/organizing processes. Tagging approaches: social tagging (“folksonomy”) and structured tagging (taxonomy)


The Role of Metadata

Metadata drives content processes

Taxonomies provide the organizing principles behind metadata


What is metadata?

• It is the “is-ness” of a piece of content

• And the “about-ness” of a piece of content

• This is a Product Description

• It is about the Motorola Android

Taxonomies are the organizing principle behind metadata and the values that populate metadata fields


What is a content model?

• Content is structured with body information and a wrapper that formats and tags that information

• Also called a “content object model”*

Diagram: simple content object model (Title, Description)

*“Content model” refers to the overall framework; “content object model” refers to a specific model for a set of document types. I.e., an overall content model includes multiple content object models.


Metadata for a product page in a content management system

Fields: Title, Date, Author, Features, Product_Name, Category, Doc_ID, Doc_Type

The diagram groups these fields into “is-ness” and “about-ness” metadata.

Document types shown: FAQ, Product, Press release, Specification, Promotion
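A content object model like this one can be expressed as a typed record whose document type is checked against a controlled list. This is a minimal sketch under stated assumptions: the `ProductPage` class, the snake_case field names, and the split of fields into "is-ness" and "about-ness" are illustrative choices, not the slide's exact grouping; the field set and document types are taken from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Document types shown on the slide, treated here as a controlled vocabulary.
DOC_TYPES = {"FAQ", "Product", "Press release", "Specification", "Promotion"}


@dataclass
class ProductPage:
    """Hypothetical content object for a product page."""
    # "Is-ness": what the item is (assumed grouping).
    doc_id: str
    doc_type: str
    title: str
    author: str
    created: date
    # "About-ness": what the item is about (assumed grouping).
    product_name: str
    category: str
    features: List[str]

    def __post_init__(self) -> None:
        # Reject document types that are not in the controlled list.
        if self.doc_type not in DOC_TYPES:
            raise ValueError(f"unknown Doc_Type: {self.doc_type!r}")
```

Validating `doc_type` at creation time is what turns a free-text field into a controlled vocabulary: a value such as "Blog post" is rejected instead of silently fragmenting search and navigation.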


Metadata allows for various views of content

• Web pages are made up of assembled items of content

• These are composed of metadata elements that are assembled together into “content types”

Diagram: Product, Promotion and Related Products content types, each with a Standard Header (Content_ID, Date). Metadata elements shown include Title, Comp_Features, Date, Author, Features, Product_Name, Category, Promotion_ID, Promo_Type, Related_Products and Doc_ID.


The User Experience (UX) is at the intersection of taxonomies, metadata and content objects

• Taxonomy: system for organizing and classifying content

• Metadata: information about our content, housekeeping, as well as semantic and structural information

• Content objects: groups of metadata that are assembled into components that are then assembled into pages or documents

How will taxonomy surface on the front-facing application?

What do the wireframes suggest?

How do people interact with it?

How does the content architecture deliver the front-end design?


Taxonomy and the User Experience

• Define what the user interface will eventually look like

• Identify how content is laid out on the page

• Faceted search (diagram): taxonomy facets, document preview, best bets, synonyms, misspellings, results


Taxonomy in a content management application

1. Filtering products / search results

2. Dynamic relationships

3. Tagging & categorization of content

4. Dynamic navigation

5. Feature consistency / compare product


When is it metadata and when is it taxonomy?

• Taxonomy can be applied as metadata

• Typically this is expressed as a drop-down “controlled vocabulary” list (also called “reference data”)

• Some controlled vocabularies are very simple, with a few unambiguous choices

• Some are specific to a particular system or tool and will not change frequently

• There is a tendency to lump all metadata into a technology bucket and assume it is owned and managed by IT

• Not a good approach (since we need business ownership and participation)


Who owns the taxonomy? A question of governance

• Metadata management (IT or application owner): unambiguous; limited number of values; not frequently changing; housekeeping or administration role; specific to an application

• Taxonomy management (business or functional owner): ambiguous meaning; subject to frequent changes or updates; common across multiple applications or contexts; requires specific knowledge of field (subject matter expertise)


Metadata and Search

All search leverages metadata

Explicit versus implicit metadata


All search leverages metadata…

…but not all metadata is explicit

• Full text search derives metadata about documents

• Creates an index of terms that occur in a document collection

• Associates documents with those index entries


Explicit metadata versus implicit metadata

Explicit metadata: Content Type = License; Organization = ABC Company, DEF Company; Topic = Support

ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release the Source Code to ABC pursuant to Section 7 and the Escrow Agreement.

Terms extracted from the text (used to derive implicit metadata): ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit c, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support

Forward index: words per document. Inverted index: documents per word.


Search index points to document

A search index becomes derived metadata about a collection of documents (diagram: documents 1 to 4; forward index: words per document; inverted index: documents per word)

Term: Documents

Acme: 1, 2, 3, 4

customers: 2, 3

escrow: 3, 4

exhibit c: 2

license: 1, 4

…etc.

In which documents do these words occur?
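The forward and inverted indexes described above can be sketched in a few lines. The four documents below are hypothetical, written so that the resulting postings match the slide's example table; the tokenizer is deliberately naive (lowercase alphanumeric runs), so a phrase such as "exhibit c" is indexed as two words and matched with a boolean AND.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Four hypothetical documents, keyed by document number.
DOCS: Dict[int, str] = {
    1: "Acme license terms",
    2: "Acme customers and Exhibit C",
    3: "Acme customers and escrow",
    4: "Acme escrow license",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def forward_index(docs: Dict[int, str]) -> Dict[int, Set[str]]:
    """Forward index: the set of words that occur in each document."""
    return {doc_id: set(tokenize(text)) for doc_id, text in docs.items()}


def inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Inverted index: the sorted list of documents containing each word."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, words in forward_index(docs).items():
        for word in words:
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """Return the documents that contain every query word (boolean AND)."""
    postings = [set(index.get(word, [])) for word in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

With `inv = inverted_index(DOCS)`, `inv["escrow"]` is `[3, 4]` and `search(inv, "Acme escrow")` returns `[3, 4]`. The index itself is the derived, implicit metadata: nobody tagged these documents, yet the engine now "knows" which ones are about escrow.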


• Occurrence of certain words in a document and the relative value of those occurrences, including weighting, relative positioning and semantic relationships…

…becomes information about the document that is cached in the index and served by the search engine

• Search algorithms vary in how metadata is derived and exposed to users.

All Search Leverages Metadata

Relevance ranking, for example, is additional metadata for a result that is ‘implied’ or derived based on incoming connections to a piece of content.


Examples of implicit metadata:

• ‘Structure’ and format of content: a piece of content may be ‘unstructured’ and not contain metadata, but it is well organized. Example: a newspaper story contains a headline, a subhead, and a first paragraph with who, what, where, when, etc., following clear editorial standards

• Context of content: Where did the content come from? If it comes from a particular web site, file share, data source or intranet location, the domain of knowledge provides context. How can we disambiguate the term “diamond”?

Sports site: baseball diamond. Commerce site: diamond ring.

Sales context for ‘feature’ versus engineering context for ‘feature’; “Adapter”: power cord versus “Adapter”: Bluetooth headset
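Using the source of a document as context can be sketched as a two-step lookup: first map the source to a domain, then look up the term's meaning within that domain. The source names (`sports-site`, `store-site`) and the `interpret` helper are hypothetical; only the "diamond" senses come from the slide.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from where content came from to its knowledge domain.
SOURCE_CONTEXT: Dict[str, str] = {
    "sports-site": "sports",
    "store-site": "commerce",
}

# Sense inventory keyed by (domain, term); senses taken from the slide.
SENSES: Dict[Tuple[str, str], str] = {
    ("sports", "diamond"): "baseball diamond",
    ("commerce", "diamond"): "diamond ring",
}


def interpret(term: str, source: str) -> Optional[str]:
    """Pick a term's meaning from the domain implied by the content's source."""
    context = SOURCE_CONTEXT.get(source)
    if context is None:
        return None  # Unknown source: no context, so the term stays ambiguous.
    return SENSES.get((context, term.lower()))
```

The same term resolves differently depending only on where the content lives, which is why the source location behaves like an extra metadata field.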


Context as metadata

• If we maintain context of a piece of information in our search results, this is equivalent to having additional metadata on that content

Search results organized by repository

This is a form of “federated” search – a single search term fed to multiple repositories

Example courtesy of Morrison and Foerster
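Federated search as described here (one query, many repositories, results kept grouped by source) can be sketched as follows. This is an illustration only: each "repository" is a hypothetical in-memory callable doing substring matching, whereas a real deployment would call each system's own search interface.

```python
from typing import Callable, Dict, List

# A "repository" here is any callable that takes a query and returns titles.
Repository = Callable[[str], List[str]]


def federated_search(query: str, repos: Dict[str, Repository]) -> Dict[str, List[str]]:
    """Send one query to every repository and keep the results grouped by source.

    Keeping the repository name with each group preserves context, which
    acts as additional metadata on every result.
    """
    results: Dict[str, List[str]] = {}
    for name, search_fn in repos.items():
        hits = search_fn(query)
        if hits:
            results[name] = hits
    return results


def make_repo(titles: List[str]) -> Repository:
    """Build a toy repository that does case-insensitive substring matching."""
    def search_fn(query: str) -> List[str]:
        return [t for t in titles if query.lower() in t.lower()]
    return search_fn
```

Grouping by repository rather than merging into one ranked list is the design choice that keeps the context visible to the user.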


“We should get Google”…


Why you will not “just get Google”

• Google leverages linkages on the web that are not typically duplicated internally in the organization

• Search engines cannot infer intent or know what is important to you in the context of your work task

• Information relevance is dependent on who you are and your level of expertise, as well as what you are trying to accomplish

• Not all content is equal: Google is fine for broad search results or less precise information, but may not work as well with large numbers of documents that differ at a finer granularity


Why doesn’t Google just use Google?


Why you will not “just get Google”


More Definitions: Taxonomy, Ontology, Thesaurus…


“Sound bite” definitions

• A Taxonomy is a list of terms that enable classification of information: a method used to organize Subject/Topic metadata; typically expresses hierarchical relationships (parent/child); emphasizes context

• A Thesaurus is a specialized taxonomy: equivalence relationships (synonyms); associative relationships (related terms, “see also”); preferred terms, variant terms

• An Ontology is a collection of taxonomies and thesauri: a body of knowledge is represented by multiple lists of categories; categories of various types are conceptually related


Definitions

• Classification Scheme - A preordained structure of words or symbols used to organize information content

• Index - A list organized in a standardized sequential fashion

Types of indexes may include: back-of-the-book, telephone directory, computerized look-up tables (e.g. b-tree, file system), card catalog, meeting roster of attendees, customer list, to name a few.

An index is a classification scheme

A taxonomy is a classification scheme

But… a classification scheme is not necessarily a taxonomy…


Classification versus Taxonomy

TAX

• Assets: Individuals, Corporations

• Liabilities: Individuals, Corporations

TAX ITEMS

• Assets: Real Estate, Vehicles

• Liabilities: Loans, Debts

TAX PAYERS

• Individuals: Single, Married

• Organizations: Corporations, Associations


Types of Term Relationships

Used in thesauri; also called “entry types” of terms. In order of increasing complexity:

• Equivalence: synonyms

• Hierarchical: the purist definition of a taxonomy; terms have parent/child relationships

• Associative: things that are related conceptually. Associative relation types are context and audience specific. This is how we might relate multiple taxonomies.


Relationship Types

Relationship examples (diagram): links among Computer Manufacturers, International Business Machines, IBM, Big Blue, Software Group, Hardware and Software, labeled E (equivalence), H (hierarchical) and A (associative); some links are marked “?”


Equivalence terms:

• Common misspellings

• Other terms used

• Abbreviations

• Internal names

Associative terms:

• See also

• Related products

• Language spoken

• Products for market

• Available in region

• Risks in region
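The three relationship types can live side by side in one small structure. The sketch below is a hypothetical illustration, not any product's thesaurus API: the `Thesaurus` class is invented for the example, IBM / International Business Machines / Big Blue are used as equivalents, and the hierarchical and associative links shown are one plausible reading of the IBM diagram, not a reconstruction of it.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Thesaurus:
    """Minimal thesaurus holding equivalence, hierarchical and associative links."""

    def __init__(self) -> None:
        self.preferred: Dict[str, str] = {}                   # variant -> preferred (E)
        self.broader: Dict[str, str] = {}                     # narrower -> broader (H)
        self.related: Dict[str, Set[str]] = defaultdict(set)  # "see also" (A)

    def add_equivalent(self, preferred: str, *variants: str) -> None:
        self.preferred[preferred] = preferred
        for variant in variants:
            self.preferred[variant] = preferred

    def add_broader(self, narrower: str, broader: str) -> None:
        self.broader[narrower] = broader

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def resolve(self, term: str) -> str:
        """Map any variant to its preferred term (unknown terms map to themselves)."""
        return self.preferred.get(term, term)

    def broader_chain(self, term: str) -> List[str]:
        """Walk up the hierarchy from a term's preferred form."""
        chain: List[str] = []
        current = self.resolve(term)
        while current in self.broader:
            current = self.broader[current]
            chain.append(current)
        return chain
```

Resolving variants first means a hierarchy walk starting from "Big Blue" or "International Business Machines" reaches the same broader terms as one starting from "IBM".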


The Role of Taxonomy


Goals of a taxonomy

• Allow for knowledge discovery

• Improve the usability and learnability of applications

• Reduce the cost of delivering services, developing products and conducting operations

• Improve operational efficiencies by allowing for reuse of information rather than recreation

• Improve search results and applicability (both precision and recall)


Taxonomy Challenges

• Taxonomy means many things in SharePoint: site organization, content types, controlled vocabularies for tagging documents

• Challenges: integration of legacy content typically requires significant tagging effort; users wanted to leverage hierarchy in search in the form of faceted navigation


Taxonomy Solutions

• Taxonomy technology: leverage hierarchy and taxonomy in both tagging and faceted search. True taxonomy management is beyond the scope of SharePoint 2010

• Taxonomy in context: auto-populate metadata fields with taxonomy values based on the overall architecture of the site and users’ roles. Reduce the burden on users: allow Locations, Departments and Roles to be filled in automatically
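Auto-populating fields from context amounts to merging defaults in a fixed order of precedence. The following is a generic sketch, not SharePoint's own mechanism: the site paths, user profiles, field values and the `default_metadata` helper are all hypothetical, and the precedence (site defaults, then the author's profile, then explicit input) is an assumed design choice.

```python
from typing import Dict

# Hypothetical site defaults and user profiles; in practice these would come
# from the site's configuration and a directory service.
SITE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "/sites/finance": {"Department": "Finance"},
}
USER_PROFILES: Dict[str, Dict[str, str]] = {
    "jdoe": {"Location": "Chicago", "Role": "Analyst"},
}


def default_metadata(user: str, site: str, supplied: Dict[str, str]) -> Dict[str, str]:
    """Fill metadata fields from site and user context; explicit input wins."""
    merged: Dict[str, str] = {}
    merged.update(SITE_DEFAULTS.get(site, {}))   # where the document is saved
    merged.update(USER_PROFILES.get(user, {}))   # who is saving it
    merged.update(supplied)                      # what the user typed overrides both
    return merged
```

The user only has to supply fields the context cannot infer, which is exactly the reduced tagging burden the slide describes.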


Recall versus Precision

• The goal of effective search is to pull back lots of relevant results

• This is measured by “recall” and “precision”

• Recall: I am getting the documents that contain my term

• Precision: These results are relevant to me

When trying to improve recall, precision can suffer and vice versa

Precision can also be subjective – based on who we are and what we are doing, in other words, context and task


Diagram: relevant items in a database versus items retrieved, where

A = relevant items retrieved

B = relevant items not retrieved

C = irrelevant items retrieved

Recall: ratio of the number of relevant items retrieved to the total number of relevant items in the database

$\text{Recall} = \dfrac{A}{A + B} \times 100\%$

Precision: ratio of the number of relevant items retrieved to the total number of irrelevant and relevant items retrieved

$\text{Precision} = \dfrac{A}{A + C} \times 100\%$

Goal is to improve recall without sacrificing precision
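The recall and precision ratios can be computed directly from the sets of retrieved and relevant items. This short sketch follows the A, B, C definitions from the slide; the function names, the zero-denominator guard (returning 0.0), and the example numbers are assumptions for illustration.

```python
from typing import Set, Tuple


def recall(a: int, b: int) -> float:
    """Recall (%) = A / (A + B) * 100; A = relevant retrieved, B = relevant missed."""
    total = a + b
    return 100.0 * a / total if total else 0.0


def precision(a: int, c: int) -> float:
    """Precision (%) = A / (A + C) * 100; C = irrelevant items retrieved."""
    total = a + c
    return 100.0 * a / total if total else 0.0


def evaluate(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for one query's result set."""
    a = len(retrieved & relevant)   # relevant items retrieved
    b = len(relevant - retrieved)   # relevant items not retrieved
    c = len(retrieved - relevant)   # irrelevant items retrieved
    return precision(a, c), recall(a, b)
```

As a hypothetical example, if a collection holds 100 relevant documents and a query returns 200 results of which 80 are relevant, then A = 80, B = 20 and C = 120, so recall is 80% while precision is only 40%. Broadening the query would typically push recall up and precision further down, which is the trade-off described above.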


Taxonomy & search strategies

Six strategies you should know about

• Tuned search

• Relevance ranking

• Faceted search

• Related terms

• Clustering

• Disambiguation


Taxonomy and Search Strategies

• Pre-search processing: the search engine applies a taxonomy or thesaurus to narrow or expand the search before retrieving results. Examples: tuned search (“Best Bets”), relevance ranking, faceted search

• Post-search processing: search results are narrowed or organized after they are retrieved. Examples: related terms, clustering, disambiguation


Applying a taxonomy to search

We need a mechanism to improve search

• A taxonomy can be used to define search terms and map those terms to specific locations of information (need to integrate with a search engine), and to apply terms to a document so that relevant and consistent search results are returned (need to integrate with a content management system)

• A thesaurus can be used to define term synonyms and related terms in order to improve the recall of information. We may define “proposal”, “statement of work” and “SOW” as meaning the same thing. If I enter SOW, I can pull back documents that are labeled with (or contain) the other terms. This is referred to as “term expansion”
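Term expansion can be sketched with a "synonym ring": a set of terms treated as equivalent, so a query for any one of them also searches for the others. The ring below uses the slide's SOW example; the documents, `expand_query` and `search_expanded` are hypothetical, and matching is naive substring containment (so "sow" would also match inside "sown").

```python
from typing import Dict, List, Set

# Hypothetical synonym ring: every term in a group is treated as equivalent.
SYNONYM_RINGS: List[Set[str]] = [
    {"proposal", "statement of work", "sow"},
]


def expand_query(query: str) -> Set[str]:
    """Return the query plus every term in any synonym ring that contains it."""
    q = query.strip().lower()
    expanded = {q}
    for ring in SYNONYM_RINGS:
        if q in ring:
            expanded |= ring
    return expanded


def search_expanded(docs: Dict[str, str], query: str) -> List[str]:
    """Return IDs of documents whose text contains any of the expanded terms."""
    terms = expand_query(query)
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(term in text.lower() for term in terms)
    )
```

Expansion raises recall: a query for "SOW" now also finds a document that only says "Proposal", at some risk to precision if the ring is defined too loosely.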


Tuned Search, or “Best Bets”


Tuned Search

What is Tuned Search?

• Search terms are defined in a taxonomy and mapped back to specific locations of information (i.e., specific web pages).

• E.g., a user searching on a broad term like “cell phones” would first be pointed to a landing page (a “best bet”), or presented with a box of hand-picked links above the regular search results.


Best Bets Example – Best Buy


Tuned Search “Best Bets”

• The same search using just keyword matching could have retrieved a list of pages with the words “phone” or “cell”, e.g., home phones, cordless phones, 12 cell batteries, etc.

• Reading through pages of possible matches is time consuming and frustrating


Tuned Search “Best Bets”

How Does a Taxonomy Help?

• Using the taxonomy categories as landing pages ensures that users are strategically directed to the content that is most important.

• Best bets should be built in conjunction with a taxonomy/thesaurus, not just a list of search terms. E.g., Circuit City:

Circuit City Example

• Search on “Cell phone”:

Circuit City Example

• Search on “Mobile phone”:

Circuit City Example

• What do these things have to do with mobile phones?

Relevance Ranking

Relevance ranking boost

• You can assign more weight to specific metadata fields in the engine’s ranking algorithms

• If a search term matches a metadata field, that match carries a higher relative weight than a full-text hit, and the document’s rank is boosted

• E.g., Best Buy boosts the taxonomy category

• E.g., Motorola could boost the product category

[Figure: content index, with a metadata field given a relative weighting of 45]
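A minimal sketch of a metadata boost, assuming a simple weighted count of term occurrences per field. The weight of 45 echoes the figure; the body weight of 1, the field names and the documents are assumptions, and a real engine would combine field weights with its own term statistics rather than raw substring counts:

```python
# Hypothetical field weights: 45 mirrors the figure, body = 1 is assumed for contrast.
FIELD_WEIGHTS = {"category": 45, "body": 1}


def score(doc: dict, term: str) -> int:
    """Weighted sum of term occurrences per field (plain substring counts)."""
    term = term.lower()
    return sum(weight * doc.get(field, "").lower().count(term)
               for field, weight in FIELD_WEIGHTS.items())


def rank(docs: list[dict], term: str) -> list[dict]:
    """Order documents by descending weighted score."""
    return sorted(docs, key=lambda d: score(d, term), reverse=True)


docs = [
    {"id": "a", "category": "", "body": "radio radio radio"},
    {"id": "b", "category": "Two-way radios", "body": "a radio"},
]
print([d["id"] for d in rank(docs, "radio")])  # ['b', 'a']
```

Document "a" mentions the term three times in its body (score 3), but "b" wins with a single category hit (45 + 1 = 46).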

Leveraging taxonomy terms as metadata

Faceted search

Leverage the taxonomy terms as metadata - faceted search

What is Faceted Search?

• An attribute-based search (guided navigation) approach that creates precise, targeted search results. Each parameter narrows the search results to the most appropriate content. Also commonly referred to as “advanced searching” or “parametric searching”

• Users think they are browsing, but they are actually searching

• Allows for multiple navigation schemes based on the taxonomy
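A minimal sketch of faceted narrowing, assuming each document carries one taxonomy term per facet. The `Doc` class, facet names and sample documents are hypothetical. Each selection filters the set (AND across facets), and the counts for the remaining values drive the navigation panel:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    facets: dict[str, str] = field(default_factory=dict)  # facet name -> taxonomy term


def narrow(docs: list[Doc], selections: dict[str, str]) -> list[Doc]:
    """Keep documents that match every selected facet value (AND across facets)."""
    return [d for d in docs
            if all(d.facets.get(name) == value for name, value in selections.items())]


def facet_counts(docs: list[Doc], name: str) -> Counter:
    """Count the remaining documents per value of one facet, for the navigation panel."""
    return Counter(d.facets[name] for d in docs if name in d.facets)


docs = [
    Doc("Radio spec sheet", {"product": "Portable", "country": "United States", "market": "Government"}),
    Doc("Fleet radio guide", {"product": "Mobile", "country": "United States", "market": "Manufacturing"}),
    Doc("UK tender notice", {"product": "Portable", "country": "United Kingdom", "market": "Government"}),
]

portable = narrow(docs, {"product": "Portable"})
print(facet_counts(portable, "country"))
print([d.title for d in narrow(portable, {"country": "United States"})])  # ['Radio spec sheet']
```

The user experiences this as clicking through categories, but each click is really an additional search constraint.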

Navigational taxonomy

A taxonomy can be a hierarchical grouping of navigational nodes on a website

[Diagram: Motorola.com navigation tree with nodes Mobile phones, Modems & gateways, 2-way radios, Unlocked GSM, With service, Accessories, Batteries, Headsets, Bluetooth headsets]

The challenge is that there is no single “correct” way to navigate.

Is this the “correct” way?

Navigational taxonomy

Or is this one “correct”? Or is this one?

[Diagram: alternative Motorola.com tree with nodes Mobile phones, Modems & gateways, 2-way radios, Camera phones, Bluetooth phones, Bluetooth accessories, Sunglasses, Headsets]

[Diagram: a second alternative Motorola.com tree with nodes Mobile phones, Modems & gateways, 2-way radios, Unlocked GSM, With service, Bluetooth accessories, Sunglasses, Headsets]

Motorola.com => United States => Government => Portable Radios

Motorola.com => Portable Radios => United States => Government

Motorola.com => Government => Portable Radios => United States

[Diagram: three alternative Motorola.com trees built from the same nodes (Canada, United Kingdom, United States; Enterprise, Government, Consumers; Portable radios, Mobile radios, Mobile computers), each placing country, market and product at different levels]

Navigating with “facets”

• Two-way radios: Portable, Fixed, Mobile, Motorcycle

• Vertical market: Government, Manufacturing, Wholesale retail

• Country: Canada, United Kingdom, United States

[Diagram: the target document sits at the intersection of three facets: Product type (P = Portable radio), Geographic region (G = United States) and Vertical market (V = Government)]

A “facet” is a top-level category in the taxonomy

Just three nodes with 5 terms each yield 5 to the 3rd power (125) possible combinations when one term is chosen from each
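Choosing one term from each of three five-term facets gives 5 × 5 × 5 = 5^3 = 125 combinations. If a facet may also be left unselected, each facet has six options and the count rises to 6^3 = 216. A quick check with placeholder terms (the facet and term names are hypothetical):

```python
from itertools import product
from math import prod

# Three hypothetical facets with five placeholder terms each.
facets = {name: [f"{name} {i}" for i in range(1, 6)]
          for name in ("Product type", "Geographic region", "Vertical market")}

# One term chosen from every facet.
combos = list(product(*facets.values()))
print(len(combos))  # 125

# Allowing "no selection" adds one option per facet.
print(prod(len(terms) + 1 for terms in facets.values()))  # 216
```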

Is it search? Or navigation?

Good example of faceted search using hierarchy

Faceted search – PC Connection

Each parameter narrows the search result to the most appropriate content.

Taxonomy and Search

• Post Search Processing: search results are narrowed after they are retrieved

Related terms

Clustering

Disambiguation

Related Terms

• Leverages associative relationships in a taxonomy
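A minimal sketch of related-term suggestions, assuming the taxonomy's associative (related term) pairs are available as a list. The pairs are hypothetical, drawn from the Motorola node names. Associative relationships are symmetric, so each pair is indexed in both directions:

```python
from collections import defaultdict

# Hypothetical associative pairs from a taxonomy.
RT_PAIRS = [
    ("mobile phones", "bluetooth headsets"),
    ("mobile phones", "batteries"),
    ("two-way radios", "headsets"),
]

related: defaultdict[str, set[str]] = defaultdict(set)
for a, b in RT_PAIRS:
    related[a].add(b)  # associative links run both ways
    related[b].add(a)


def related_terms(term: str) -> list[str]:
    """Terms to offer alongside results for the given search term."""
    return sorted(related.get(term.lower(), set()))


print(related_terms("Mobile phones"))  # ['batteries', 'bluetooth headsets']
```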

Clustering

• Adds context to large result sets

• Clusters are similar to facets but based on derived attributes

• Derived attributes are based on concepts contained in the result set and mapped to the taxonomy

Clustering Example

Clustering

How do I implement Clustering?

• Build out your taxonomy, then extract entities from content and categorize based on derived metadata (facets)
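The steps above can be sketched as follows, assuming entity extraction is as crude as spotting taxonomy terms in the text. A real extractor would be far more robust, and the `TAXONOMY` table and results are hypothetical. Each result is filed under every category derived from its text, so one result can land in several clusters:

```python
from collections import defaultdict

# Hypothetical taxonomy: term as it appears in text -> taxonomy category.
TAXONOMY = {
    "bluetooth": "Accessories",
    "headset": "Accessories",
    "gsm": "Mobile phones",
    "two-way radio": "2-way radios",
}


def derive_categories(text: str) -> set[str]:
    """Naive entity extraction: a category applies if one of its terms occurs in the text."""
    lowered = text.lower()
    return {category for term, category in TAXONOMY.items() if term in lowered}


def cluster(results: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group result IDs by derived category (the derived facets)."""
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    for doc_id, text in results:
        for category in sorted(derive_categories(text)):
            clusters[category].append(doc_id)
    return dict(clusters)


results = [
    ("r1", "Bluetooth headset pairing guide"),
    ("r2", "Unlocked GSM handset"),
    ("r3", "Two-way radio headset kit"),
]
print(cluster(results))
```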

Categorizing content

• Statistical/linguistic: these documents look similar due to an analysis of word patterns, so let’s put them into the same group

• Rules-based: these documents look similar based on a rule that we have created (they contain marketing plans and are about the newest widget), so let’s put them into the same group
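The two approaches can be contrasted in a few lines. This is a sketch under stated assumptions: cosine similarity of word counts stands in for "analysis of word patterns", the 0.5 threshold is hypothetical, and the rule only checks for "widget" because the text gives no way to tell which widget is newest.

```python
import math
from collections import Counter


def rule_based(text: str) -> bool:
    """Rules-based: the example rule, marketing plans about the (newest) widget."""
    t = text.lower()
    return "marketing plan" in t and "widget" in t


def cosine(a: str, b: str) -> float:
    """Statistical: cosine similarity of word-count vectors (a simple word-pattern measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def same_group(a: str, b: str, threshold: float = 0.5) -> bool:
    """Group two documents when their similarity reaches the (hypothetical) threshold."""
    return cosine(a, b) >= threshold


print(rule_based("Q3 marketing plan for the new widget"))  # True
print(same_group("widget launch marketing plan", "marketing plan for widget launch"))  # True
```

The rule is transparent and precise but only catches what someone anticipated. The statistical measure needs no rule-writing but cannot explain why two documents belong together.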

Clustering based on a taxonomy “slice” from a set of search results:

[Diagram: tagged documents from the result set, a taxonomy path tree builder, and the resulting taxonomy “slice”]
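A minimal sketch of building a taxonomy slice, assuming each result is tagged with one or more full taxonomy paths. The function is our own stand-in for a path tree builder, not a description of any specific tool, and the tags are hypothetical. Only the branches actually used by the results appear, and each node counts distinct documents at or below it:

```python
def build_slice(tagged_docs: dict[str, list[tuple[str, ...]]]) -> dict:
    """Build a tree containing only the taxonomy paths used by the tagged results.

    Each node records the set of document IDs filed at or below it."""
    tree: dict = {}
    for doc_id, paths in tagged_docs.items():
        for path in paths:
            level = tree
            for label in path:
                node = level.setdefault(label, {"docs": set(), "children": {}})
                node["docs"].add(doc_id)
                level = node["children"]
    return tree


def show(tree: dict, depth: int = 0) -> None:
    """Print the slice as an indented outline with document counts."""
    for label, node in sorted(tree.items()):
        print("  " * depth + f"{label} ({len(node['docs'])})")
        show(node["children"], depth + 1)


tagged = {
    "d1": [("Products", "Two-way radios", "Portable")],
    "d2": [("Products", "Two-way radios", "Mobile")],
    "d3": [("Products", "Two-way radios", "Portable"), ("Markets", "Government")],
}
show(build_slice(tagged))
```

This prints `Markets (1)`, `Government (1)`, `Products (3)`, `Two-way radios (3)`, `Mobile (1)`, `Portable (2)` as an indented outline. Counting document IDs in sets, rather than incrementing a counter, keeps a document tagged with two paths from being counted twice under a shared parent.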

Disambiguation

Disambiguation of search results

What is Disambiguation?

• If a user enters a broad term (like “mobile”), the taxonomy can return terms that help the user select a more precise term

• Includes multiple approaches, such as term expansion and complex lookups

Disambiguation methods

• Show related search terms with check boxes in the search results page.

• Show additional search terms as links, perhaps with a prompt - "You might also be interested in:"

• Expand the query and show the expanded words in the search box

• Expand the query invisibly
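The four methods above differ mainly in what the user sees and what actually runs. A minimal sketch of that distinction follows; the `Mode` names are ours, and the quoted-OR query syntax is hypothetical rather than any specific engine's:

```python
from enum import Enum


class Mode(Enum):
    CHECKBOXES = "checkboxes"          # related terms shown with check boxes
    SUGGESTIONS = "suggestions"        # "You might also be interested in:" links
    VISIBLE_EXPANSION = "visible"      # expanded query shown in the search box
    INVISIBLE_EXPANSION = "invisible"  # expanded query run without showing it


def present(query: str, related: list[str], mode: Mode) -> tuple[str, str, list[str]]:
    """Return (text shown in the search box, query actually executed, terms offered in the UI)."""
    expanded = " OR ".join(f'"{t}"' for t in [query, *related])
    if mode is Mode.VISIBLE_EXPANSION:
        return expanded, expanded, []
    if mode is Mode.INVISIBLE_EXPANSION:
        return query, expanded, []
    # Check boxes and suggestion links leave the query alone and offer the terms.
    return query, query, list(related)


print(present("mobile", ["mobile phones", "mobile radios"], Mode.INVISIBLE_EXPANSION))
```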

Disambiguation of search results

Query: “mobile”

• Mobile data terminals; Handheld computers

• Network Infrastructure: Mobile switches

• Phones: Fixed mobile car phones, Mobile phones

• Software applications: Mobile applications

• Two way radios: Mobile radios

• Intelligent video solutions: Mobile video enforcer, Mobile video sharing

• MESH Solutions: Multi-radio mobile broadband

• Mobile Computing: Mobile application

Presenting a term in multiple contexts
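A minimal sketch of presenting a broad term in multiple contexts. It assumes each taxonomy term is stored with its parent context; the pairs are transcribed from the slide, reading the first label on each flattened line as the context, which is an interpretation. Every term containing the query word is grouped under its parent so the user can pick the intended sense:

```python
from collections import defaultdict

# (context, term) pairs transcribed from the slide.
TERMS = [
    ("Network Infrastructure", "Mobile switches"),
    ("Phones", "Fixed mobile car phones"),
    ("Phones", "Mobile phones"),
    ("Software applications", "Mobile applications"),
    ("Two way radios", "Mobile radios"),
]


def contexts_for(query: str) -> dict[str, list[str]]:
    """Group every taxonomy term containing the query word under its parent context."""
    q = query.lower()
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for context, term in TERMS:
        if q in term.lower():
            grouped[context].append(term)
    return dict(grouped)


for context, terms in contexts_for("mobile").items():
    print(f"{context}: {', '.join(terms)}")
```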

From Associative Relationships

Disambiguation of search results

How Do I Implement Disambiguation Methods?

• Need to integrate the thesaurus with the search engine

• Can be accomplished through custom frameworks, web services or API calls

• Thesaurus values can live inside the search engine, in a taxonomy management tool, in spreadsheets or databases, or in public sources
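As one example of thesaurus values living in a spreadsheet, here is a minimal sketch that loads a CSV export and normalizes queries before they reach the engine. The two-column layout (`preferred,synonym`) and the rows are assumptions, and an in-memory string stands in for the exported file:

```python
import csv
import io

# Hypothetical spreadsheet export: one row per (preferred term, synonym) pair.
SHEET = """preferred,synonym
mobile phones,cell phones
mobile phones,mobile phone
two-way radios,walkie-talkies
"""


def load_thesaurus(f) -> dict[str, str]:
    """Map each synonym, and each preferred term itself, to the preferred term."""
    lookup: dict[str, str] = {}
    for row in csv.DictReader(f):
        preferred = row["preferred"].strip().lower()
        lookup[preferred] = preferred
        lookup[row["synonym"].strip().lower()] = preferred
    return lookup


def normalize_query(query: str, thesaurus: dict[str, str]) -> str:
    """Swap a user's term for the preferred term before it reaches the engine."""
    q = query.strip().lower()
    return thesaurus.get(q, q)


thesaurus = load_thesaurus(io.StringIO(SHEET))
print(normalize_query("Walkie-Talkies", thesaurus))  # two-way radios
```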

Disambiguation

• Query: Did Enron executives illegally sell Enron stock?

Source: CognitionSearch.com

Taxonomy and Navigation*

*Taxonomy is not the same as navigation

Applying a taxonomy to navigation

We need to improve navigation for our site

• A taxonomy can be used to inform navigation (though it is not the same as navigation) and to define metadata and the information architecture of the site.

Navigation – Sales Node

[Diagram: Sales Tools navigation node with children Analyst Reports, Case Studies, Competition, Customer References, FAQs, Pricing & Licensing, White Papers, Presentations (some sub-nodes elided)]

Doc Types:
• Analyst Reports
• Assessment
• Benchmarks
• Best Practice
• Brochures
• Campaign
• Case studies
• Competition
• Configuration Guide
• Contracts
• Customer References
• Data sheet
• Event
• FAQ
• Guides
• License Agreements
• Migration
• Presentations
• Press Releases
• Price Lists
• Quick Reference Guide
• White papers
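One way to read the two lists together: the navigation node is a view over Doc Type metadata rather than a folder. A minimal sketch, assuming a hypothetical mapping from Sales Tools children to Doc Type values. Mapping "Pricing & Licensing" to Price Lists and License Agreements is our guess, and the documents are invented:

```python
# Hypothetical mapping: which Doc Type values each Sales Tools child node draws on.
SALES_TOOLS = {
    "Analyst Reports": {"Analyst Reports"},
    "Case Studies": {"Case studies"},
    "Pricing & Licensing": {"Price Lists", "License Agreements"},
    "White Papers": {"White papers"},
}


def node_contents(node: str, documents: list[dict]) -> list[str]:
    """List titles of documents whose Doc Type metadata belongs to the navigation node."""
    doc_types = SALES_TOOLS.get(node, set())
    return [d["title"] for d in documents if d["doc_type"] in doc_types]


documents = [
    {"title": "2011 price list", "doc_type": "Price Lists"},
    {"title": "Reseller license", "doc_type": "License Agreements"},
    {"title": "Install guide", "doc_type": "Configuration Guide"},
]
print(node_contents("Pricing & Licensing", documents))  # ['2011 price list', 'Reseller license']
```

Because the node is defined by metadata, the same tagged document can surface under several navigation schemes. Doc types that do not belong in sales navigation, such as Configuration Guide, are simply not mapped.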

Why you will not just “use a folksonomy”

• All content is not equal

• Higher value content requires more rigor

• Social tagging is still immature

• May be appropriate for some kinds of content

• On systems open to large user groups, esoteric tags that are understood by only a minority of users tend to proliferate, which burdens users and decreases system efficiency

• Inherent in folksonomies are the flaws that formal classification systems are designed to eliminate, such as redundancy and misspelling

• Taxonomists/ontologists argue that an agreed-to set of tags enables more efficient indexing and searching of content

earley

earley & associates

earley & associates inc

earley & associates needham, massachusets

earley & associates taxonomy

earley & associates, inc

earley & associates, inc.

earley & earley associates

earley and associates

earley and associates inc

earley and associates seth

earley and associates taxonomy

earley assoc

earley associates

earley associates address

earley associates boston

earley associates wordmap

earley financial

earley jumpstart

earley taxonomy

earley taxonomy & metadata jumpstart call: managing structured metadata and taxonomies

earley.com

early & associates

early and associates

taxanomic classification of the freycinetia

taxonimic classification of humans

taxonomic and dichotomus

taxonomic classification

taxonomic classification human

taxonomic genus of king cobra

taxonomic implementation

taxonomies of knowledge

taxonomies project roadmap

taxonomist job description

taxonomy metadata

taxonomy & metadata jumpstart - 2007

taxonomy and false drops

taxonomy and classifiation examples of animals

taxonomy and metadata

taxonomy and metadata jumpstart

taxonomy c

taxonomy classification

taxonomy classification charts

taxonomy community of practice

taxonomy consulting

taxonomy creation

taxonomy creation management

taxonomy defined

taxonomy deployment

taxonomy development process

taxonomy implementation

taxonomy iqpc

taxonomy job description

taxonomy maintenance

taxonomy management

taxonomy management job title

taxonomy management tools

taxonomy metadata

taxonomy models for project management

taxonomy of global executives

taxonomy of man

taxonomy search

taxonomy seth early

taxonomy structure business organisation

taxonomy training

taxonomy validation

taxonomy(2007)

taxonomy, mlis

taxonomy/classification.online
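The variant spellings above show the redundancy problem directly. A minimal normalization sketch (the rules are our own assumptions) collapses case, "&" versus "and", punctuation and an "inc" suffix:

```python
import re


def normalize(tag: str) -> str:
    """Collapse common variants: case, '&' vs 'and', punctuation, 'inc' suffix."""
    t = tag.lower().replace("&", " and ")
    t = re.sub(r"[^\w\s]", " ", t)  # drop punctuation such as ',' and '.'
    return " ".join(w for w in t.split() if w != "inc")


variants = [
    "earley & associates",
    "earley & associates inc",
    "earley & associates, inc",
    "earley & associates, inc.",
    "earley and associates",
    "earley and associates inc",
]
print({normalize(v) for v in variants})  # {'earley and associates'}
print(normalize("early and associates"))  # early and associates
```

Mechanical rules merge six variants into one, but the misspelled "early and associates" survives. Catching it takes a curated variant list, which is the argument for an agreed-to vocabulary.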

Conclusions

• Search engines, no matter how sophisticated, do not obviate the need for taxonomies

• Content value in the context of a work process will determine the level of required structure

• There is no “one size fits all”

• Taxonomy, content strategy and search all work together to improve the findability of content.

• Google doesn’t always get it right…

Earley & Associates: #1 on Google for Silver Mining Tools

Questions?

Jeff Carr, Senior Information Architect & Search Consultant, 780-819-7275, jeff@earley.com

Seth Earley, CEO, 781-820-8080, seth@earley.com

Follow me on Twitter: sethearley

Connect with me on LinkedIn: www.linkedin.com/in/sethearley