Faceted Classification Complex subjects from simpler components.


Faceted Classification

Complex subjects from simpler components

Outline

• Refresher from last week: Basic classification structures and arrangement within that structure.

• Goals of faceted classification.

• Basic design of faceted classifications.

• Facet analysis of complex subjects (factoring).

• Determination of facet structure.

• Faceted browsing on the Web.

Three basic classification structures

One hierarchy. All concepts emanate via hierarchical relationships from a single root node.

Example: The enzyme hierarchy from the first day of class.

Three basic classification structures

Multiple parallel hierarchies. Instead of a single root node, there are multiple root nodes. The parallel hierarchies might be of similar kinds but cover different themes (e.g., religion and science, which are both disciplines).

Example: The Soviet library classification from the first day of class, or any library classification with separate, independent sections for the various disciplines.

Three basic classification structures

Faceted. A variation of multiple parallel hierarchies in which fundamental types are combined to create complex concepts. Facets are typically of different orthogonal kinds (processes, products, actors, places).

Examples: See Hunter and Vickery readings.

Structural refinements

In addition to the basic hierarchical relationships from broader terms to narrower ones (is-a, is-a-part-of, is-an-instance-of), we can implement additional structural refinements to clarify relationships between concepts in a single array (level) of a hierarchy.

Arrangement within arrays

Two forms:

• When multiple principles of division are used, showing the nature of the relationship of narrower concepts (“children”) to a broader concept (“parent”).

• Using order of concepts within an array to convey relationships between “siblings.”

Showing principles of division

shoes
  high heels
  hiking boots
  mary-janes
  pumps
  running shoes
  sandals
  slingbacks
  stilettos
  wedges
  winter boots

shoes
  (by season)
    winter
    spring
  (by function)
    hiking
    running
  (by style)
    boots
    pumps
    sandals
  (by feature)
    slingbacks
    mary-janes
  (by heel type)
    stilettos
    wedges
  (by heel height)
    high heels

Another example

Furniture
  (by material)
    wooden furniture
    plastic furniture
  (by style)
    rococo furniture
    modern furniture
  (by room)
    bedroom furniture
    office furniture
  (by function)
    storage furniture
    sleeping furniture
  (by form)
    bookcases
    tables
    desks
    wardrobes
    bureaus

Uses of structural labels

The parenthetical phrases that indicate principles of division (sometimes called “node labels” or “subfacet indicators”) are typically not used for indexing, but help the user (either the indexer or the information seeker) to understand the types of relationships defined by the system and to apply terms accordingly.

A shoe might be indexed as: winter-boots-wedges-high heels.
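The indexing example above can be sketched in Python. This is only an illustration: the dictionary encoding, the function name index_entry, and the rule that an item takes at most one term per principle of division are assumptions made for the sketch, not part of the lecture.

```python
# Hypothetical encoding of the shoes array. The keys are node labels
# (principles of division) and are never used as index terms themselves.
SHOES = {
    "by season": ["winter", "spring"],
    "by function": ["hiking", "running"],
    "by style": ["boots", "pumps", "sandals"],
    "by feature": ["slingbacks", "mary-janes"],
    "by heel type": ["stilettos", "wedges"],
    "by heel height": ["high heels"],
}


def index_entry(chosen_terms, schedule=SHOES):
    """Return the chosen terms in schedule order, joined with hyphens.

    Raises ValueError if a term is not in the schedule, or if two terms
    come from the same principle of division (an assumed rule).
    """
    chosen = set(chosen_terms)
    known = {term for terms in schedule.values() for term in terms}
    unknown = chosen - known
    if unknown:
        raise ValueError(f"not in schedule: {sorted(unknown)}")
    parts = []
    for label, terms in schedule.items():
        picked = [term for term in terms if term in chosen]
        if len(picked) > 1:
            raise ValueError(f"more than one term under ({label}): {picked}")
        parts.extend(picked)
    return "-".join(parts)


print(index_entry(["wedges", "high heels", "boots", "winter"]))
# winter-boots-wedges-high heels
```

The node labels organize the choices but never appear in the resulting index string, which matches their role as guidance for the indexer rather than as index terms.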

Ordering concepts at each level

An “array” is a group of siblings (descriptors at the same level of hierarchy). Order in an array provides another means to show relationships between concepts. Possible orders:

• Chronological (art styles from Post-Impressionist to Cubist to Dada to Abstract Expressionist)

• Directional (east to west, for example, or closest to farthest)

• Increasing intensity (slowest to fastest music tempos, for example, or lightest to darkest hues)

• Increasing concreteness (from more general to more specific, such as from philosophical warrant to cultural warrant to literary warrant)

• Increasing quantity (from one to many)

• Order of a process (from plowing to planting to weeding to harvesting, for example)

Example: music tempos

(slowest to fastest)
  Largo
  Andante
  Moderato
  Allegro
  Vivace
  Presto

(alphabetical)
  Allegro
  Andante
  Largo
  Moderato
  Presto
  Vivace
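The difference between the two arrangements of tempo terms can be shown with a short Python sketch. The function name arrange and its array_order parameter are hypothetical, chosen only for this illustration.

```python
# Tempo terms in the meaningful order given above (slowest to fastest).
TEMPOS_BY_SPEED = ["Largo", "Andante", "Moderato", "Allegro", "Vivace", "Presto"]


def arrange(terms, array_order=None):
    """Order sibling terms by an explicit array order, else alphabetically."""
    if array_order is None:
        return sorted(terms)
    # Rank each term by its position in the designed order.
    rank = {term: position for position, term in enumerate(array_order)}
    return sorted(terms, key=rank.__getitem__)


subset = ["Presto", "Largo", "Allegro"]
print(arrange(subset))                   # ['Allegro', 'Largo', 'Presto']
print(arrange(subset, TEMPOS_BY_SPEED))  # ['Largo', 'Allegro', 'Presto']
```

Alphabetical order needs no knowledge of the domain, while the slowest-to-fastest order has to be stored explicitly by the designer. That stored order is what carries the extra meaning.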

And in the beginning, S. R. Ranganathan saw a Meccano set...

Motivations for faceted classification

• The sheer number of documents keeps growing.

• The subjects of the documents are both more specific and more complex.

• Knowledge itself is rapidly expanding: new subjects are constantly being created.

It’s not helpful to put huge numbers of documents in general subject categories (British History, Nuclear Physics). And yet we can’t possibly enumerate all the possible subjects that either currently exist or may soon exist. What to do?

Goals of faceted classification

If we can create a classification scheme that lists subject components, then we can build complex subjects out of the components as necessary.

We facilitate the construction of complex subjects by organizing the component concepts that make up our classification into facets, or potential aspects of the subject.

From compound to components

Example of complex subject:

The history of Japanese tea-drinking etiquette

Components (or isolates, or factors): History + Japan + Tea + Drinking + Etiquette

Potential fundamental categories (facets) for the components: Disciplines (History); Locations (Japan); Beverages (Tea); Activities (Drinking); Values (Etiquette)
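The assignment of components to facets in this example can be written out as a small Python lookup. This is a minimal sketch: the FACETS dictionary simply restates the example, and the function name facet_of is hypothetical.

```python
# Facets and isolates taken from the tea etiquette example above.
FACETS = {
    "Disciplines": ["history"],
    "Locations": ["Japan"],
    "Beverages": ["tea"],
    "Activities": ["drinking"],
    "Values": ["etiquette"],
}


def facet_of(component, facets=FACETS):
    """Return the facet that lists the component, or None if unlisted."""
    wanted = component.lower()
    for facet, isolates in facets.items():
        if wanted in (isolate.lower() for isolate in isolates):
            return facet
    return None


for component in ["History", "Japan", "Tea", "Drinking", "Etiquette"]:
    print(component, "->", facet_of(component))
```

A component that returns None (for example, "Coffee") signals a gap: either the isolate is missing from an existing facet or the scheme needs a new one.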

Building subjects from components

A traditional faceted classification for libraries includes both the facet structure of components and syntax rules for combining the components into complex subjects.

These rules are necessary to ensure that documents are filed consistently on shelves. (In an online environment, these rules become superfluous.)

To “mechanize” the subject-building process and simplify filing, components are given a notation (such as “soil acidity – sag”) that clarifies the component’s position within a facet.
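A rough Python sketch of how combination rules and notation make filing consistent follows. The notation codes and the citation order (the sequence in which facets are combined) are invented for this illustration and do not come from any real scheme.

```python
# Hypothetical notation and citation order, invented for this sketch.
NOTATION = {
    "Beverages": {"tea": "bt"},
    "Activities": {"drinking": "ad"},
    "Values": {"etiquette": "ve"},
    "Locations": {"japan": "lj"},
    "Disciplines": {"history": "dh"},
}
CITATION_ORDER = ["Beverages", "Activities", "Values", "Locations", "Disciplines"]


def class_mark(subject):
    """Combine one isolate per facet into a notation string.

    `subject` maps facet name -> isolate. Facets are always cited in
    CITATION_ORDER, so the order of the input never changes the result.
    """
    return " ".join(
        NOTATION[facet][subject[facet].lower()]
        for facet in CITATION_ORDER
        if facet in subject
    )


a = class_mark({"Disciplines": "History", "Locations": "Japan", "Beverages": "Tea",
                "Activities": "Drinking", "Values": "Etiquette"})
b = class_mark({"Beverages": "Tea", "Values": "Etiquette", "Activities": "Drinking",
                "Locations": "Japan", "Disciplines": "History"})
print(a)       # bt ad ve lj dh
print(a == b)  # True
```

Two indexers who list the components in different orders still arrive at the same string, so the document is filed in the same place.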

Structure of faceted classifications

While a facet may be a simple list, components within a facet are typically arranged hierarchically (using a stricter or looser sense of hierarchy as appropriate).

Organic farming classification

Crops
  Fruits
    (by origin)
      Vines
        Grapes
      Bushes
      Trees
  Vegetables
  Herbs

Processes
  Planting
  Controlling pests
  Fertilizing

Materials
  Natural soil amendments
    Compost
    Mulch
  Natural pesticides

Designing faceted classifications

1. Decompose complex concepts (which you have gathered via your research into the subject literature) into component parts, via syntactic or semantic factoring.

2. Group the simple components into fundamental categories.

3. Organize the components in each facet (with hierarchical relationships, subfacets that indicate multiple principles of division, order within arrays, and so on).

Understanding complex concepts

There are two kinds of compounds:

• A multi-word unit (which may be a simple concept, such as stained glass, or a complex concept, such as glass cutting).

• A multi-concept unit (which may be a single word, such as sourdough).

Syntactic and semantic factoring

Syntactic factoring: A term with multiple words is divided into smaller components. Example: rye bread into rye + bread; Irish emigration into emigration + Irish

Semantic factoring: A term is divided into multiple elementary concepts. Example: apartment into dwelling + rental + shared building.

Semantic factoring

Most standards/authorities don’t recommend semantic factoring, and there aren’t rules you can use to help with it.

But semantic factoring can sometimes help you discover missing concepts in your subject language.

It might be extreme to describe Passover as “holiday + Jewish + commemoration + Exodus,” but doing so might make us consider both religion and commemoration of events as aspects common to many holidays.

Parsing compounds

A compound term consists of a focus (the class of things or events) and a difference, which modifies the class and makes a subclass.

Examples:

• Car tires: Focus is tires, difference is cars.

• Opera singing: Focus is singing, difference is opera.

• Mushroom hunter: Focus is hunter, difference is mushroom.
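For two-word English compounds like these, the focus is usually the last word. The Python sketch below relies on that head-final pattern as an assumption; the function name parse_compound is hypothetical, and the heuristic will fail on compounds that do not follow the pattern.

```python
def parse_compound(term):
    """Split a two-word compound into (focus, difference).

    Assumes the head-final pattern of the examples above: the last word
    is the focus and the word before it is the difference.
    """
    words = term.lower().split()
    if len(words) != 2:
        raise ValueError("this sketch only handles two-word compounds")
    difference, focus = words
    return focus, difference


for term in ["Car tires", "Opera singing", "Mushroom hunter"]:
    focus, difference = parse_compound(term)
    print(f"{term}: focus is {focus}, difference is {difference}")
```

The sketch does no normalization, so it reports the difference of “car tires” as “car”, whereas an indexer would normalize it to the term “cars”.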

Action/patient factoring

If the term contains an action (focus) modified by the recipient of the action (difference), factor.

But if the term refers to a material (focus) as modified by an action (difference), don’t factor.

Examples:

Hair dyeing: hair + dyeing

Bronze engraving: bronze + engraving

But don’t factor: dyed hair, engraved bronzes

Part/whole factoring

If the focus refers to a part or property, and the difference refers to the whole or the possessor of the part or property, factor.

But if the focus is the whole, and the difference is the part or property, don’t factor.

Examples:

Soil acidity: soil + acidity

Car tires: tires + cars

Don’t factor: spare tires, rain forest.

Maybe: pine forest, redwood forest.

Action/performer factoring

If the term contains an intransitive action (focus) modified by the performer (difference), factor.

If the performer (focus) is modified by its performance of an intransitive action (difference), don’t factor.

Examples:

Student meeting: students + meetings

Lemur migration: lemurs + migrations

But don’t factor: migratory birds
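The three factoring rules (action/patient, part/whole, action/performer) can be summarized as a lookup on the roles played by the focus and the difference. In this Python sketch the role names and the function name decide are hypothetical labels. Assigning roles to a real term remains a human judgment that the code does not attempt.

```python
# (role of focus, role of difference) pairs that the rules say to factor.
FACTOR_WHEN = {
    ("action", "patient"),      # hair dyeing: dyeing is done to hair
    ("part", "whole"),          # car tires: tires are part of cars
    ("property", "possessor"),  # soil acidity: acidity belongs to soil
    ("action", "performer"),    # lemur migration: lemurs migrate
}

# Pairs that the rules say to keep as a single term.
KEEP_WHEN = {
    ("patient", "action"),      # dyed hair: a material modified by an action
    ("whole", "part"),
    ("possessor", "property"),
    ("performer", "action"),    # migratory birds
}


def decide(focus_role, difference_role):
    """Return 'factor', 'keep', or 'no rule' for a focus/difference pair."""
    pair = (focus_role, difference_role)
    if pair in FACTOR_WHEN:
        return "factor"
    if pair in KEEP_WHEN:
        return "keep"
    return "no rule"


print(decide("action", "patient"))    # factor
print(decide("performer", "action"))  # keep
```

In each rule the mirror-image pair gives the opposite answer: the term is factored when the focus is the action, part, or property, and kept whole when the focus is the thing acted on, the whole, or the performer.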

Determination of facet structure

Ranganathan started from the top down: describing fundamental categories (PMEST: Personality, Matter, Energy, Space, Time) for all subjects and organizing components into those universal facets.

The Classification Research Group (CRG), as described by Vickery, advocates beginning from the bottom up: reviewing components and assigning preliminary fundamental categories based on the concept’s definition within the classification’s domain, then looking for commonalities in these preliminary choices. Facets are specific to each classification.

Principles for creating facets

Some elements to consider when creating facets:

• Independence (are the facets mutually exclusive?)

• Semantic importance (do the facets represent the most important fundamental types in the domain?)

• Balance (are the facets at similar levels of abstraction?)

• Comprehensiveness (do the facets include all important subject components in the domain?)

• Hospitality (would it be easy to add more concepts to a facet?)

• Relevance (are the facets of interest to the identified user group and purpose?)
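Of these principles, independence is the one that can be partly checked mechanically. The Python sketch below flags terms listed in more than one facet. The draft schedule is hypothetical (it borrows terms from the organic farming example and deliberately lists “Mulch” twice), and a check like this only catches literal duplicates, not conceptual overlap.

```python
from collections import defaultdict

# Hypothetical draft in which "Mulch" is deliberately listed twice.
DRAFT = {
    "Crops": ["Fruits", "Vegetables", "Herbs"],
    "Processes": ["Planting", "Controlling pests", "Fertilizing", "Mulch"],
    "Materials": ["Compost", "Mulch", "Natural pesticides"],
}


def shared_terms(facets):
    """Return {term: [facets]} for terms listed in more than one facet."""
    seen = defaultdict(list)
    for facet, terms in facets.items():
        for term in terms:
            seen[term.lower()].append(facet)
    return {term: found for term, found in seen.items() if len(found) > 1}


print(shared_terms(DRAFT))  # {'mulch': ['Processes', 'Materials']}
```

A flagged term is a prompt for a design decision: here, whether mulch is a material, or whether the process should be renamed (for example, “Mulching”) so that each facet holds one kind of thing.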

Faceted browsing on the Web

Hearst’s Flamenco is an interface to support browsing of faceted structures on the Web.

The Hearst article that you read describes how users preferred the faceted browsing interface to a search engine when exploring the collection.

(Note that the facets that Hearst used in the Flamenco system are semi-automatically generated and not, perhaps, the best that one might create...)
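The basic mechanics of faceted browsing can be sketched in a few lines of Python. This is a generic illustration and not a description of how Flamenco is implemented; the item records and the function names browse and facet_counts are hypothetical.

```python
from collections import Counter

# Hypothetical collection described with some of the shoe facets.
ITEMS = [
    {"season": "winter", "style": "boots", "heel type": "wedges"},
    {"season": "winter", "style": "boots"},
    {"season": "spring", "style": "sandals", "heel type": "wedges"},
    {"season": "spring", "style": "pumps", "heel type": "stilettos"},
]


def browse(items, selections):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


winter = browse(ITEMS, {"season": "winter"})
print(len(winter))                    # 2
print(facet_counts(winter, "style"))  # Counter({'boots': 2})
```

Each selection narrows the collection along one facet, and the counts for the remaining facets tell the user which further refinements will still return results.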

Your own classification design project

• Continue compiling potential concepts from source documents.

• Use your audience and purpose, as well as your subject knowledge, to refine the scope of your classification: its boundaries and its central and peripheral areas.

• Break down your candidate concepts into components: generate broader/narrower/linking concepts as appropriate.

• Define each concept’s particular meaning in the context of your classification.

• Wrangle your concepts into a classified structure.

Assignment progress check

• Bring drafts to class next week for peer feedback sessions (especially a draft of your classified structure).

• Also, everyone will have a five-minute check-in with me: you will tell me your subject, audience, and purpose in a few sentences, and explain your classified structure.