Taxonomy 101

+ Taxonomy 101 Controlled Vocabularies and Beyond Barbara McGlamery, Marthastewart.com

Transcript of Taxonomy 101

Page 1: Taxonomy 101

+

Taxonomy 101: Controlled Vocabularies and Beyond

Barbara McGlamery, Marthastewart.com

Page 2: Taxonomy 101

+About Me

9+ years Time Inc.

Entertainment Weekly

This Old House

Time

People

InStyle

Recipe Finder

1+ years Martha Stewart

Martha Stewart Living

Martha Stewart Weddings

Whole Living

Page 3: Taxonomy 101

+Agenda

Basics of taxonomy and controlled

vocabularies

Developing a taxonomy

Taxonomy software and tagging tools

Records management and taxonomy

Page 4: Taxonomy 101

+What is a controlled vocabulary?

Predefined, authorized terms that can be consistently applied to content

Types: Lists

Synonym rings

Authority Files

Facets

Page 5: Taxonomy 101

+What is a taxonomy?

Classification of a controlled vocabulary in a hierarchical list

Types:

Taxonomy

Thesaurus

Ontology

Page 6: Taxonomy 101

+Controlled Vocabulary

Predefined, authorized terms

that can be consistently applied

to content

Relationship is between the list

value and class

Page 7: Taxonomy 101

+Controlled Vocabulary

Units of Measure

Cup

Tablespoon

Teaspoon
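
A minimal sketch of how a controlled vocabulary like the one above might be enforced in code. The names `UNITS_OF_MEASURE` and `is_authorized` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical controlled vocabulary for the "Units of Measure" class.
# Only these exact, predefined terms may be applied to content.
UNITS_OF_MEASURE = frozenset({"Cup", "Tablespoon", "Teaspoon"})


def is_authorized(term: str) -> bool:
    """Return True only if the term is an exact member of the vocabulary."""
    return term in UNITS_OF_MEASURE


print(is_authorized("Cup"))      # True
print(is_authorized("Handful"))  # False
```

A plain list like this knows nothing about variants, so "cup" or "C" would be rejected; that gap is what the synonym ring addresses.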

Page 8: Taxonomy 101

+Synonym Ring

Extends a CV by adding synonyms as

equivalent terms

Relationship is between list value and its

synonyms

Page 9: Taxonomy 101

+Synonym Ring

Units of Measure

Cup = C = c

Tablespoon = Tbl = T

Teaspoon = tsp = t
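
One way to picture a synonym ring is as a set of equivalent terms with no preferred member. The sketch below uses the slide's units of measure; `SYNONYM_RINGS` and `expand` are hypothetical names chosen for illustration.

```python
# Each ring is a set of equivalent terms; no term is preferred over another.
# Matching is case-sensitive because "T" (tablespoon) and "t" (teaspoon) differ.
SYNONYM_RINGS = [
    {"Cup", "C", "c"},
    {"Tablespoon", "Tbl", "T"},
    {"Teaspoon", "tsp", "t"},
]


def expand(term: str) -> set[str]:
    """Return every term equivalent to the input, including the input itself."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}  # unknown terms are equivalent only to themselves


print(sorted(expand("T")))
```

A search for any member of a ring can then be widened to all of its members.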

Page 10: Taxonomy 101

+Authority File

Extends CVs and synonym rings further by

assigning one term as the preferred term

which all other synonyms will point to

Relationship assigns property (Preferred

Term) to one term and all others as

synonyms

Page 11: Taxonomy 101

+Authority File

Units of Measure

(Preferred Term) Cup

Syn: C, c

(PT) Tablespoon

Syn: Tbl, T

(PT) Teaspoon

Syn: tsp, t
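
The authority file above could be sketched as a mapping from each preferred term to its synonyms, inverted so that any variant resolves to its preferred term. `AUTHORITY_FILE`, `LOOKUP`, and `preferred_term` are assumed names for illustration only.

```python
# Preferred term (PT) -> synonyms, taken from the slide.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}

# Invert the file so every variant, and the PT itself, points to the PT.
LOOKUP = {}
for preferred, synonyms in AUTHORITY_FILE.items():
    LOOKUP[preferred] = preferred
    for synonym in synonyms:
        LOOKUP[synonym] = preferred


def preferred_term(term):
    """Return the preferred term for a variant, or None if it is unknown."""
    return LOOKUP.get(term)


print(preferred_term("Tbl"))  # Tablespoon
```

Unlike the synonym ring, the lookup is one-directional: content is stored under the preferred term only.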

Page 12: Taxonomy 101

+Facets

Terms are broken down individually by

unique properties, allowing a mix and match

approach to search and retrieval

Relationship is between one facet node and

multiple values
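
A rough sketch of the mix-and-match idea: each item carries one value per facet, and a search combines any subset of facets. The recipes, facet names, and function names below are made up for illustration and do not come from the slides.

```python
# Hypothetical content items, each described by independent facets.
RECIPES = [
    {"title": "Tomato tart", "main_ingredient": "Tomatoes", "course": "Appetizer"},
    {"title": "Tomato soup", "main_ingredient": "Tomatoes", "course": "Soup"},
    {"title": "Garlic bread", "main_ingredient": "Breads", "course": "Appetizer"},
]


def facet_values(items, facet):
    """List the distinct values one facet takes across all items."""
    return sorted({item[facet] for item in items if facet in item})


def filter_by_facets(items, **selected):
    """Keep items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


matches = filter_by_facets(RECIPES, main_ingredient="Tomatoes", course="Appetizer")
print([item["title"] for item in matches])  # ['Tomato tart']
```

Selecting fewer facets widens the result set; selecting more narrows it.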

Page 13: Taxonomy 101

+Facets

Page 14: Taxonomy 101

+Taxonomy

Classification of a controlled vocabulary in

a hierarchical list

Relationship is in assigning a hierarchy to

list values

Page 15: Taxonomy 101

+Taxonomy

Food

Main Ingredient

Vegetables (ahem…fruit)

Tomatoes

Beefsteak tomatoes

Cherry tomatoes

Sundried tomatoes
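
The hierarchy on this slide can be sketched as a child-to-parent map. The helper names `path_to_root` and `descendants` are assumptions made for this example.

```python
# Child -> parent, following the slide's hierarchy. "Food" is the root.
PARENT = {
    "Main Ingredient": "Food",
    "Vegetables": "Main Ingredient",
    "Tomatoes": "Vegetables",
    "Beefsteak tomatoes": "Tomatoes",
    "Cherry tomatoes": "Tomatoes",
    "Sundried tomatoes": "Tomatoes",
}


def path_to_root(term):
    """Return the chain of terms from the root down to the given term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def descendants(term):
    """Return every term below the given term, depth-first."""
    result = []
    for child, parent in PARENT.items():
        if parent == term:
            result.append(child)
            result.extend(descendants(child))
    return result


print(" > ".join(path_to_root("Cherry tomatoes")))
```

With this structure, content tagged "Cherry tomatoes" can also be retrieved by a search on "Tomatoes" or "Vegetables".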

Page 16: Taxonomy 101

+Thesaurus

CVs in a hierarchical structure with

predefined relationships between terms

(Broader Term, Narrower Term, Preferred

Term, etc.)

Relationship is in assigning standardized

properties to list values

Page 17: Taxonomy 101

+Thesaurus

Food

(BT) Main Ingredient

(BT) Vegetables (ahem…fruit)

(BT) Tomatoes

(NT) Beefsteak tomato

(NT) (PT) Cherry tomato

(RT) Roma tomato

(NT) Sundried tomato

(RT) Tomato sauce
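
A hedged sketch of thesaurus records with standardized relationships. It assumes one reading of the slide, in which Roma tomato is related to Cherry tomato and Tomato sauce is related to Sundried tomato; the `Term` class and `check_reciprocity` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT


THESAURUS = {
    term.name: term
    for term in [
        Term("Tomatoes", broader=["Vegetables"],
             narrower=["Beefsteak tomato", "Cherry tomato", "Sundried tomato"]),
        Term("Beefsteak tomato", broader=["Tomatoes"]),
        Term("Cherry tomato", broader=["Tomatoes"], related=["Roma tomato"]),
        Term("Sundried tomato", broader=["Tomatoes"], related=["Tomato sauce"]),
    ]
}


def check_reciprocity(thesaurus):
    """Return (parent, child) pairs where an NT link has no matching BT link."""
    problems = []
    for term in thesaurus.values():
        for child_name in term.narrower:
            child = thesaurus.get(child_name)
            if child is None or term.name not in child.broader:
                problems.append((term.name, child_name))
    return problems


print(check_reciprocity(THESAURUS))  # []
```

Because the relationship types are fixed in advance, simple consistency checks like this become possible.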

Page 18: Taxonomy 101

+Ontology

CVs in a hierarchical structure with complex

relationships defined

Relationship is in assigning predetermined

standardized and freeform properties to list

values

Page 19: Taxonomy 101

+Ontology

Beefsteak tomatoes

(isMainIngredient)

Tomato sauce

Will Smith

(isLeadActor)

Men in Black 3
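
The two examples on this slide read naturally as subject, property, object statements. The sketch below stores them as plain tuples; it is not RDF or OWL, and the `query` helper is an assumption made for illustration.

```python
# (subject, property, object) statements taken from the slide.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]


def query(subject=None, predicate=None, obj=None):
    """Return the triples matching every argument that is not None."""
    return [
        triple for triple in TRIPLES
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    ]


print(query(predicate="isLeadActor"))
```

New property names can be added freely, which is what separates this from the fixed BT/NT/RT set of a thesaurus.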

Page 20: Taxonomy 101

+Semantic (semantic) Web

Big S

Initiative from W3C to create a web of machine readable data by marking up content with consistently applied, standardized and freeform properties

RDF/OWL

Proprietary

Little s

Various standards that mark up content with agreed-upon and freeform properties

Microformats

Microdata

Proprietary

Page 21: Taxonomy 101

+Pros and Cons of CVs and

taxonomy

Benefits

Greater precision in search and retrieval

Allows for faceted browsing

Facilitates aggregation of content

Clearly defines relationships between things

Limitations

Initial costs

Upkeep

Can spiral out of control

May be too complex for some organizations

Page 22: Taxonomy 101

+What is taxonomy used for in the web

world?

Search and retrieval

Faceted browsing

Aggregation

of content

Internal organization

of assets

Page 23: Taxonomy 101

+Developing a taxonomy

Strategy and planning

Choosing style and method

Determine classes and relationships

Gather terms and organize

Add terms and relationships

Review and approval

Page 24: Taxonomy 101

+Strategy and Planning

Identify business case

ROI

Money saved

Money earned

Scope

Use cases

Front-end

Back-end

Approval

Wireframes and functional specification

Page 25: Taxonomy 101

+Choose Style and Method

Method

Top down

Bottom up

Styles

CV

Synonym ring

Authority file

Facets

Taxonomy

Thesaurus

Ontology

Page 26: Taxonomy 101

+Determine Classes and

Relationships

Classes

As few as necessary

Relationships between terms

As few as necessary

With a taxonomy, determine nature of hierarchy

Type of

With a thesaurus, use the predefined relationships, though you may not

want to use all of them

With ontology, determine complex relationships

Page 27: Taxonomy 101

+Gather Terms and Organize

Research

Competitive analysis

Identify existing outside CVs that might be utilized (SIC

codes)

Meet with stakeholders

Get as much input as possible

Stick to the business case (avoids the spiraling problem)

You are the final decision maker

Terms must conform to the structure decided upon; otherwise, mass

chaos

Always keep use cases in mind

Page 28: Taxonomy 101

+Add Terms and Relationships

Things to keep in mind:

Synonyms, misspellings, special characters

Homonyms

Different database identifiers or different names

Shower (baby and bathroom)

Duplicates

Technical considerations if the duplicates have different children

Breads as a main ingredient or as a dish

Bruschetta (dish, but not main ingredient)

Descriptions

Identifying duplicates or notes regarding the application to content
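
The homonym point above can be sketched by keying terms on a database identifier rather than on the label, so that two "Shower" terms can coexist. The identifiers and parent names below are invented for illustration.

```python
# Hypothetical term records keyed by database identifier, not by label.
TERMS = {
    101: {"label": "Shower", "parent": "Parties"},   # baby shower
    202: {"label": "Shower", "parent": "Bathroom"},  # bathroom shower
}


def ids_for_label(label):
    """Return every identifier that shares the given label."""
    return sorted(term_id for term_id, term in TERMS.items() if term["label"] == label)


def display_name(term_id):
    """Disambiguate a label by appending its parent term."""
    term = TERMS[term_id]
    return f'{term["label"]} ({term["parent"]})'


for term_id in ids_for_label("Shower"):
    print(term_id, display_name(term_id))
```

Taggers then choose between "Shower (Parties)" and "Shower (Bathroom)" instead of one ambiguous label.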

Page 29: Taxonomy 101

+Review and Approval

Thorough review by all stakeholders

This can take several sessions if

taxonomy is big

Final approval and sign-off

Critical for buy-in

Page 30: Taxonomy 101

+Taxonomy and Tagging Tools

Relational databases

FileMaker Pro

Microsoft Access

MySQL

Content management

software

Drupal

SharePoint

Proprietary applications

Thesaurus and taxonomy tools

Open source

Protégé

Commercial

SchemaLogic (Thesaurus)

TopBraid Composer,

(Ontologies), Pro

Auto categorization and text

mining

Data Harmony MAIstro,

Nstein

Page 31: Taxonomy 101

+Tagging the Content

Manual

Good for small, controlled sets of documents

Highly accurate

Time consuming

Automated

Good for large, unwieldy sets of documents

Fast and getting more accurate daily

Expensive, 3rd party apps

Hybrid

Manual – content or document creators insert valuable metadata

Automated – other data extracted and matched to taxonomy
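
A very simplified sketch of the hybrid approach: tags supplied by a person are merged with tags found by matching the text against known term variants. Real auto-categorization tools are far more sophisticated; the `VARIANTS` table and function names here are assumptions.

```python
import re

# Hypothetical preferred terms and the text variants that should trigger them.
VARIANTS = {
    "Tomatoes": ["tomato", "tomatoes"],
    "Tomato sauce": ["tomato sauce", "marinara"],
    "Bruschetta": ["bruschetta"],
}


def auto_tag(text):
    """Return preferred terms whose variants appear as whole words in the text."""
    found = set()
    for preferred, variants in VARIANTS.items():
        for variant in variants:
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(preferred)
                break
    return found


def hybrid_tags(manual_tags, text):
    """Merge manually entered tags with automatically extracted ones."""
    return sorted(set(manual_tags) | auto_tag(text))


print(hybrid_tags(["Appetizer"], "Top the bruschetta with diced tomatoes"))
```

Note that text containing "tomato sauce" would match both "Tomato sauce" and "Tomatoes" here; a production tagger would need rules for such overlaps.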

Page 32: Taxonomy 101

+Real World Application of Taxonomy

for Records Management

Classifying

Storing and retrieving

Securing

Archiving or destroying

Page 33: Taxonomy 101

+Real World Applications

CV

List of Departments (HR, IT, Marketing)

Synonym rings

Mergers and acquisitions = M and A = M&A

Authority File

(PT) Mergers and acquisitions

Syn: M and A, M&A

Facets

Authors, Departments, Security Level

Taxonomy/Thesaurus

Organizational chart

Investment Bank Director

SVP Investments

EVP Investments

Investment Analyst

Ontology

Relationships between affiliations and departments/industries

ARMA (isProfessionalAssn) for Records Managers

Page 34: Taxonomy 101

+What could it be used for in your

world?

http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/

Page 35: Taxonomy 101

+Industry standards

Taxonomy specific

Dublin Core (DC)

Thesaurus construction

ANSI/NISO Z39.19

ISO 2788; 5964

Ontology development

W3C

Resource Description Framework (RDF)

Web Ontology Language (OWL)

Records Management specific

Metadata management

ISO/TS 23081-1

ISO 23081-2

Page 36: Taxonomy 101

+

Questions?

Page 37: Taxonomy 101

+My contact info

Barbara McGlamery

Taxonomist

Martha Stewart Living Omnimedia

(212)827-8817

[email protected]