Taxonomy 101

Posted on 06-Jul-2015

+Taxonomy 101: Controlled Vocabularies and Beyond

Barbara McGlamery, Marthastewart.com

+About Me

9+ years Time Inc.

Entertainment Weekly

This Old House

Time

People

InStyle

Recipe Finder

1+ years Martha Stewart

Martha Stewart Living

Martha Stewart Weddings

Whole Living

+Agenda

Basics of taxonomy and controlled vocabularies

Developing a taxonomy

Taxonomy software and tagging tools

Records management and taxonomy

+What is a controlled vocabulary?

Predefined, authorized terms that can be consistently applied to content

Types: Lists

Synonym rings

Authority Files

Facets

+What is a taxonomy?

Classification of a controlled vocabulary in a hierarchical list

Types:

Taxonomy

Thesaurus

Ontology

+Controlled Vocabulary

Predefined, authorized terms that can be consistently applied to content

Relationship is between the list value and its class

+Controlled Vocabulary

Units of Measure

Cup

Tablespoon

Teaspoon
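A minimal sketch of how a controlled vocabulary like the one above could be enforced in code. The names `UNITS_OF_MEASURE` and `is_authorized` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical controlled vocabulary for the "Units of Measure" class.
# Only these exact, predefined terms may be applied to content.
UNITS_OF_MEASURE = frozenset({"Cup", "Tablespoon", "Teaspoon"})


def is_authorized(term, vocabulary=UNITS_OF_MEASURE):
    """Return True only if the term is one of the authorized list values."""
    return term in vocabulary


print(is_authorized("Cup"))      # True
print(is_authorized("Handful"))  # False
```

A plain list has no notion of synonyms, so a variant such as "cup" is rejected here.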

+Synonym Ring

Extends a CV by adding synonyms as equivalent terms

Relationship is between a list value and its synonyms

+Synonym Ring

Units of Measure

Cup = C = c

Tablespoon = Tbl = T

Teaspoon = tsp = t
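One way to sketch a synonym ring is as a list of sets in which every member is equivalent to every other. The structure and the `expand` helper are assumptions made for illustration.

```python
# Each ring is a set of equivalent terms; no member is "preferred".
SYNONYM_RINGS = [
    {"Cup", "C", "c"},
    {"Tablespoon", "Tbl", "T"},
    {"Teaspoon", "tsp", "t"},
]


def expand(term):
    """Return every term equivalent to `term`, or just the term itself."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


print(sorted(expand("Tbl")))  # ['T', 'Tablespoon', 'Tbl']
```

Matching is deliberately case-sensitive, because in this ring "T" means Tablespoon while "t" means Teaspoon. A search for one member can then be expanded to all of its equivalents.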

+Authority File

Extends CVs and synonym rings further by assigning one term as the preferred term, which all other synonyms point to

Relationship assigns a property (Preferred Term) to one term and treats all others as synonyms

+Authority File

Units of Measure

(Preferred Term) Cup

Syn: C, c

(PT) Tablespoon

Syn: Tbl, T

(PT) Teaspoon

Syn: tsp, t
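The authority file above can be sketched as a mapping from each preferred term to its synonyms, inverted into a lookup table so any variant resolves to the preferred term. `AUTHORITY_FILE`, `LOOKUP`, and `preferred_term` are hypothetical names.

```python
# Preferred term -> synonyms, mirroring the slide.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}

# Invert the file so that every variant points to its preferred term.
LOOKUP = {}
for preferred, synonyms in AUTHORITY_FILE.items():
    LOOKUP[preferred] = preferred
    for synonym in synonyms:
        LOOKUP[synonym] = preferred


def preferred_term(term):
    """Resolve a variant to its preferred term, or None if unauthorized."""
    return LOOKUP.get(term)


print(preferred_term("tsp"))  # Teaspoon
```

Unlike the synonym ring, content is always stored under a single label, whichever variant an editor types.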

+Facets

Terms are broken down individually by unique properties, allowing a mix-and-match approach to search and retrieval

Relationship is between one facet node and multiple values
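A rough sketch of faceted retrieval, assuming each piece of content carries one value per facet. The recipes, facet names, and `filter_by_facets` function are invented for illustration and do not come from the slides.

```python
# Hypothetical content items, each tagged with one value per facet.
RECIPES = [
    {"title": "Gazpacho", "main_ingredient": "Tomatoes",
     "course": "Soup", "season": "Summer"},
    {"title": "Caprese salad", "main_ingredient": "Tomatoes",
     "course": "Salad", "season": "Summer"},
    {"title": "Pumpkin soup", "main_ingredient": "Pumpkin",
     "course": "Soup", "season": "Fall"},
]


def filter_by_facets(items, **selected):
    """Return titles of items matching every selected facet value."""
    return [
        item["title"]
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


print(filter_by_facets(RECIPES, course="Soup"))
# ['Gazpacho', 'Pumpkin soup']
print(filter_by_facets(RECIPES, course="Soup", main_ingredient="Tomatoes"))
# ['Gazpacho']
```

Because the facets are independent, any combination of them can be selected to narrow the results.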

+Facets

+Taxonomy

Classification of a controlled vocabulary in a hierarchical list

Relationship is in assigning a hierarchy to list values

+Taxonomy

Food

Main Ingredient

Vegetables (ahem…fruit)

Tomatoes

Beefsteak tomatoes

Cherry tomatoes

Sundried tomatoes
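The hierarchy above could be held as nested dictionaries, which is one simple way to model a taxonomy. The `path_to` helper is an assumption, and the "Vegetables" node is written without the slide's aside.

```python
# Each key is a term; its value holds the term's children.
TAXONOMY = {
    "Food": {
        "Main Ingredient": {
            "Vegetables": {
                "Tomatoes": {
                    "Beefsteak tomatoes": {},
                    "Cherry tomatoes": {},
                    "Sundried tomatoes": {},
                }
            }
        }
    }
}


def path_to(term, tree=TAXONOMY, trail=()):
    """Return the list of terms from the root down to `term`, or None."""
    for label, children in tree.items():
        current = trail + (label,)
        if label == term:
            return list(current)
        found = path_to(term, children, current)
        if found:
            return found
    return None


print(" > ".join(path_to("Cherry tomatoes")))
# Food > Main Ingredient > Vegetables > Tomatoes > Cherry tomatoes
```

The returned path is what a site would typically use for breadcrumbs or for rolling content up to a broader category.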

+Thesaurus

CVs in a hierarchical structure with predefined relationships between terms (Broader Term, Narrower Term, Preferred Term, etc.)

Relationship is in assigning standardized properties to list values

+Thesaurus

Food

(BT) Main Ingredient

(BT) Vegetables (ahem…fruit)

(BT) Tomatoes

(NT) Beefsteak tomato

(NT) (PT) Cherry tomato

(RT) Roma tomato

(NT) Sundried tomato

(RT) Tomato sauce
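A thesaurus record can be sketched as a term with a broader-term (BT) link and a list of related terms (RT), with narrower terms (NT) derived from the BT links. This is one reading of the slide, not a definitive one: attaching Roma tomato to Cherry tomato and Tomato sauce to Sundried tomato is an assumption, as are the `Term` class and helper names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    label: str
    broader: Optional[str] = None                      # BT
    related: List[str] = field(default_factory=list)   # RT


THESAURUS = {
    term.label: term
    for term in [
        Term("Food"),
        Term("Main Ingredient", broader="Food"),
        Term("Vegetables", broader="Main Ingredient"),
        Term("Tomatoes", broader="Vegetables"),
        Term("Beefsteak tomato", broader="Tomatoes"),
        Term("Cherry tomato", broader="Tomatoes", related=["Roma tomato"]),
        Term("Sundried tomato", broader="Tomatoes", related=["Tomato sauce"]),
    ]
}


def narrower_terms(label):
    """NT is the inverse of BT: every term whose broader term is `label`."""
    return sorted(t.label for t in THESAURUS.values() if t.broader == label)


print(narrower_terms("Tomatoes"))
# ['Beefsteak tomato', 'Cherry tomato', 'Sundried tomato']
```

Storing only BT and deriving NT keeps the two directions from drifting out of sync.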

+Ontology

CVs in a hierarchical structure with complex relationships defined

Relationship is in assigning predetermined standardized and freeform properties to list values

+Ontology

Beefsteak tomatoes

(isMainIngredient)

Tomato sauce

Will Smith

(isLeadActor)

Men in Black 3
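The two examples above read naturally as subject, property, object statements. A minimal sketch using plain tuples follows; the helper names are hypothetical, and a production system would more likely use an RDF store.

```python
# (subject, property, object) statements taken from the slide.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]


def objects_of(subject, prop):
    """What does `subject` relate to through `prop`?"""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def subjects_of(prop, obj):
    """Which subjects relate to `obj` through `prop`?"""
    return [s for s, p, o in TRIPLES if p == prop and o == obj]


print(objects_of("Will Smith", "isLeadActor"))         # ['Men in Black 3']
print(subjects_of("isMainIngredient", "Tomato sauce"))  # ['Beefsteak tomatoes']
```

Because the property is part of the data, new relationship types can be added without changing the structure.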

+Semantic (semantic) Web

Big S

Initiative from W3C to create a web of machine readable data by marking up content with consistently applied, standardized and freeform properties

RDF/OWL

Proprietary

Little s

Various standards that mark up content with agreed-upon and freeform properties

Microformats

Microdata

Proprietary

+Pros and Cons of CVs and Taxonomy

Benefits

Greater precision in search and retrieval

Allows for faceted browsing

Facilitates aggregation of content

Clearly defines relationships between things

Limitations

Initial costs

Upkeep

Can spiral out of control

May be too complex for some organizations

+What is taxonomy used for in the web world?

Search and retrieval

Faceted browsing

Aggregation of content

Internal organization of assets

+Developing a taxonomy

Strategy and planning

Choosing style and method

Determine classes and relationships

Gather terms and organize

Add terms and relationships

Review and approval

+Strategy and Planning

Identify business case

ROI

Money saved

Money earned

Scope

Use cases

Front-end

Back-end

Approval

Wireframes and functional specification

+Choose Style and Method

Method

Top down

Bottom up

Styles

CV

Synonym ring

Authority file

Facets

Taxonomy

Thesaurus

Ontology

+Determine Classes and Relationships

Classes

As few as necessary

Relationships between terms

As few as necessary

With a taxonomy, determine nature of hierarchy

Type of

With a thesaurus, use the predefined relationships, though you may not want to use all of them

With ontology, determine complex relationships

+Gather Terms and Organize

Research

Competitive analysis

Identify existing outside CVs that might be utilized (SIC codes)

Meet with stakeholders

Get as much input as possible

Stick to the business case (avoids the spiraling problem)

You are the final decision maker

Must conform to the structure decided upon; otherwise, mass chaos

Always keep use cases in mind

+Add Terms and Relationships

Things to keep in mind:

Synonyms, misspellings, special characters

Homonyms

Different database identifiers or different names

Shower (baby and bathroom)

Duplicates

Technical considerations if different children

Breads as a main ingredient or as a dish

Bruschetta (dish, but not main ingredient)

Descriptions

Identifying duplicates or notes regarding the application to content
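One common way to handle homonyms and intentional duplicates is to give every term its own identifier and parent, so two terms can share a label without colliding. The identifiers, parent names, and descriptions below are made up for illustration.

```python
# Hypothetical term records: same label, different identifier and parent.
TERMS = {
    "term-001": {"label": "Shower", "parent": "Parties",
                 "description": "Baby shower; apply to party content."},
    "term-002": {"label": "Shower", "parent": "Bathroom",
                 "description": "Bathroom fixture; apply to home content."},
    "term-003": {"label": "Breads", "parent": "Main Ingredient",
                 "description": "Bread used as the main ingredient."},
    "term-004": {"label": "Breads", "parent": "Dish",
                 "description": "Bread served as the dish itself."},
}


def ids_for_label(label):
    """Return every identifier that carries the given label."""
    return sorted(tid for tid, term in TERMS.items() if term["label"] == label)


def disambiguate(label):
    """Qualify each homonym with its parent so editors can tell them apart."""
    return [
        "{} ({})".format(TERMS[tid]["label"], TERMS[tid]["parent"])
        for tid in ids_for_label(label)
    ]


print(disambiguate("Shower"))  # ['Shower (Parties)', 'Shower (Bathroom)']
```

The description field is where notes about duplicates or about how to apply the term to content would live.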

+Review and Approval

Thorough review by all stakeholders

This can take several sessions if the taxonomy is big

Final approval and sign-off

Critical for buy-in

+Taxonomy and Tagging Tools

Relational databases

FileMaker Pro

Microsoft Access

MySQL

Content management software

Drupal

SharePoint

Proprietary applications

Thesaurus and taxonomy tools

Open source

Protégé

Commercial

SchemaLogic (Thesaurus)

TopBraid Composer (Ontologies), Pro

Auto categorization and text mining

Data Harmony MAIstro, Nstein

+Tagging the Content

Manual

Good for small, controlled sets of documents

Highly accurate

Time consuming

Automated

Good for large unwieldy sets of documents

Fast and getting more accurate daily

Expensive; relies on third-party apps

Hybrid

Manual – content or document creators insert valuable metadata

Automated – other data extracted and matched to taxonomy
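A very small sketch of the hybrid approach: terms found in the text are matched to the taxonomy automatically and then merged with the tags an editor supplied by hand. The lookup table and function names are assumptions, and real auto-categorization tools use far more sophisticated matching than this word lookup.

```python
import re

# Hypothetical lookup from lowercase words to preferred taxonomy terms.
LOOKUP = {
    "tomato": "Tomatoes",
    "tomatoes": "Tomatoes",
    "bread": "Breads",
    "breads": "Breads",
}


def auto_tag(text):
    """Automated step: match words in the text against the taxonomy."""
    words = re.findall(r"[a-z]+", text.lower())
    return {LOOKUP[word] for word in words if word in LOOKUP}


def hybrid_tag(text, manual_tags):
    """Hybrid step: combine editor-supplied tags with automated matches."""
    return sorted(set(manual_tags) | auto_tag(text))


print(hybrid_tag("Slice the tomatoes and toast the bread.", ["Appetizers"]))
# ['Appetizers', 'Breads', 'Tomatoes']
```

The manual tags carry judgment a word match cannot supply, while the automated pass catches terms the editor did not think to add.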

+Real World Application of Taxonomy for Records Management

Classifying

Storing and retrieving

Securing

Archiving or destroying

+Real World Applications

CV

List of Departments (HR, IT, Marketing)

Synonym rings

Mergers and acquisitions = M and A = M&A

Authority File

(PT) Mergers and acquisitions

Syn: M and A, M&A

Facets

Authors, Departments, Security Level

Taxonomy/Thesaurus

Organizational chart

Investment Bank Director

SVP Investments

EVP Investments

Investment Analyst

Ontology

Relationships between affiliations and departments/industries

ARMA (isProfessionalAssn) for Records Managers

+What could it be used for in your world?

http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/

+Industry standards

Taxonomy specific

Dublin Core (DC)

Thesaurus construction

ANSI/NISO Z39.19

ISO 2788; ISO 5964

Ontology development

W3C

Resource Description Framework (RDF)

Web Ontology Language (OWL)

Records Management specific

Metadata management

ISO/TS 23081-1

ISO 23081-2

+Questions?

+My contact info

Barbara McGlamery

Taxonomist

Martha Stewart Living Omnimedia

(212)827-8817

bmcglamery@marthastewart.com