Metadata: What, How and Why?
IMT595B
April 6, 2007
Mike Crandall
University of Washington Information School
April 6. 2007 Metadata: What, How and Why? 2
Web 2.0… The Machine is Us/ing Us
http://youtube.com/watch?v=6gmP4nk0EOE
Roadmap
• What is metadata?
  – The basics
  – Metadata standards
• How can you use metadata?
  – What is it for?
  – When do you use it?
  – How much does it cost?
  – What about maintenance?
• Why would you use metadata?
  – What value does it add?
  – When are alternatives a better choice?
  – Social tagging vs. metadata
• Things to think about
What is Metadata?
• Data about data
• Definitional data that provides information about or documentation of other data managed within an application or environment… metadata may include descriptive information about the context, quality and condition, or characteristics of the data (FOLDOC)
• Levels of complexity
  – Simple (embedded in object; e.g., a hyperlink)
  – Structured (Dublin Core, content management)
  – Rich (library MARC records, Encoded Archival Description)
Origins
• Library science
  – Focus is on entities as containers for information
  – Emphasis is on resource discovery
  – Tight focus resulted in widespread standards
• Data management
  – Focus is on the information itself
  – Much more complex information spaces (e.g., NASA satellite data)
  – Much more varied types of information and use
  – Emphasis is on data use (authenticity, authority)
  – Standards tend to be associated with data types
Types of Metadata
• Administrative
  – Object management
  – Rights and access management
  – Maintenance and preservation
  – Meta-metadata for managing metadata
• Structural or technical
  – Describes relationships between parts
  – Enables recognition and use of objects by systems
• Descriptive
  – Describes characteristics of object
  – Physical and aboutness (subject)
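As a rough illustration of the three types, the sketch below groups one object's metadata under administrative, structural, and descriptive headings. The class name, field names, and all values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One object's metadata, grouped by the three types listed above."""
    administrative: dict = field(default_factory=dict)  # management, rights, preservation
    structural: dict = field(default_factory=dict)      # relationships between parts
    descriptive: dict = field(default_factory=dict)     # physical traits and subject


# Hypothetical values describing a slide deck.
record = MetadataRecord(
    administrative={"rights": "internal use", "last_reviewed": "2007-04-06"},
    structural={"format": "presentation", "parts": ["slides", "readings"]},
    descriptive={"title": "Metadata: What, How and Why?", "subject": ["metadata"]},
)

print(record.descriptive["title"])
```

Keeping the groups separate makes it easier to see who maintains which fields: administrative values are usually set by a system or a records manager, while descriptive values often come from authors or catalogers.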
Metadata Schemas
• Sets of metadata elements designed to meet the needs of a community
• The elements are the fields that hold values authorized for use in the schema
• Many different needs, so many different schemas are available
• Three primary components
  – Structure: the model used to derive the schema (e.g., RDF)
  – Semantics: the meaning of the elements
    • Values are specified through rules or vocabularies (“encoding schemes” or authority control)
  – Syntax: the method for encoding the schema (e.g., XML, XHTML)
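A minimal sketch of how semantics and syntax might look in practice, assuming a tiny schema with four elements. The element names are borrowed from Dublin Core, but the SCHEMA dictionary, the allowed-value sets, and the validate and to_xml helpers are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Semantics: the elements and, where applicable, the vocabulary that
# constrains their values. The value sets are made up for this example.
SCHEMA = {
    "title": None,                        # free text
    "creator": None,                      # free text
    "language": {"en", "fr", "de"},       # hypothetical encoding scheme
    "type": {"Text", "Image", "Sound"},   # small set of type values for the example
}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, value in record.items():
        if element not in SCHEMA:
            problems.append(f"unknown element: {element}")
        elif SCHEMA[element] is not None and value not in SCHEMA[element]:
            problems.append(f"value {value!r} not allowed for {element}")
    return problems


def to_xml(record):
    """Syntax: encode a record as a simple XML fragment."""
    root = ET.Element("record")
    for element, value in record.items():
        ET.SubElement(root, element).text = value
    return ET.tostring(root, encoding="unicode")


record = {"title": "Metadata", "language": "en"}
print(validate(record))
print(to_xml(record))
```

The same record could be encoded in a different syntax without changing its semantics, which is why the two are treated as separate components.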
How Can You Use Metadata?
Information Systems
Soergel, 1985
Objectives of Metadata
• Find
  – Through search engines, catalogs, etc.
• Identify
  – Distinguishing between items for purposes of use
• Select
  – By attributes such as language, format, genre, etc.
• Obtain
  – Either directly or through location/ordering metadata
• Navigate
  – For example, categories on web sites
• Manage
  – Content management systems
  – Document repositories
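The "select" objective can be sketched as a filter over attribute values. The catalog records and the select helper below are hypothetical and exist only to show the idea.

```python
# Hypothetical catalog records with language, format, and genre metadata.
catalog = [
    {"title": "Annual Report", "language": "en", "format": "pdf", "genre": "report"},
    {"title": "Rapport annuel", "language": "fr", "format": "pdf", "genre": "report"},
    {"title": "Launch Video", "language": "en", "format": "video", "genre": "promo"},
]


def select(records, **criteria):
    """Return records whose metadata matches every given attribute value."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


english_pdfs = select(catalog, language="en", format="pdf")
print([r["title"] for r in english_pdfs])
```

Selection like this only works when the attribute values are applied consistently, which is where controlled vocabularies come in.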
Finding
[Diagram: search system architecture]
Component labels: User; Other Users; User Interface; Query Preprocessing; Searching Index(es); Result Set Manipulation; Result Refining; Indexing; Indexer; Data Analysis; Data Stores; Independent Metadata; Index Metadata; User Metadata
Annotations in the diagram:
• database schemas, thesauri
• file system, http, messaging stores, document store, databases, directory stores
• string manipulation, synonym sets & thesauri, stemming, word breaking
• adaptive crawling, word breaking, word stemming, NLP
• deduping, concatenation, ranking
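Several of the techniques named in the diagram (word breaking, stemming, synonym sets) can be sketched in a few lines. This is a deliberately naive illustration: the synonym sets are invented, the stemmer only strips a trailing "s", and the function names are assumptions rather than any product's API.

```python
import re

# Hypothetical synonym sets; a real system would draw these from a thesaurus.
SYNONYMS = {
    "car": {"automobile"},
    "film": {"movie"},
}


def break_words(query):
    """Word breaking: lowercase the query and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", query.lower())


def stem(word):
    """Very naive stemming: strip a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def preprocess(query):
    """Return the set of terms to look up in the index."""
    terms = set()
    for word in break_words(query):
        base = stem(word)
        terms.add(base)
        terms.update(SYNONYMS.get(base, set()))
    return terms


print(sorted(preprocess("Cars, films!")))
```

The synonym table is itself a piece of metadata that someone has to build and maintain, which is one reason search quality depends on more than the engine.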
MSWeb Search
News Publishing Tool
Navigating
Facets at wine.com
Facet / Metadata # of vocabulary terms
Type 46
Region 16
Winery 750
Price 6
Rating 6
Total terms 824
Total combinations 1,656,824
Morante, Marcia. Creating Useful Taxonomies: Metadata, Taxonomies and Controlled Vocabularies. SLA – PER Division, June 8, 2004. http://www.kcurve.com/Metadata_Taxonomy%20Development_SLA_060804.ppt
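The term total in the table can be checked by summing the five facets, as sketched below. The sketch also multiplies the vocabulary sizes to show how quickly combinations grow when one term is chosen from every facet. That product is an illustration only: the source does not say how its own combinations figure was derived, and the two numbers are not the same.

```python
from math import prod

# Vocabulary sizes per facet, copied from the table above.
facets = {"Type": 46, "Region": 16, "Winery": 750, "Price": 6, "Rating": 6}

total_terms = sum(facets.values())      # terms someone has to maintain
one_per_facet = prod(facets.values())   # ways to pick one term from each facet

print(f"Total terms: {total_terms:,}")
print(f"One term from each facet: {one_per_facet:,}")
```

The point of faceting survives either way: a vocabulary of a few hundred terms can describe a very large number of distinct items.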
Managing
Layers in the Darwin Information Typing Architecture
• Delivery contexts: Web site; information portal; aggregate printing; helpset
• Typed topic structures: reference, task, concept, topic
• Specialized vocabularies (domains) across information types
  – Included domains: highlighting, software, programming, user interface
  – Typed topic: reference, task, concept
• Common structures: OASIS (CALS) table, metadata
http://www-128.ibm.com/developerworks/xml/library/x-dita1/
Costs of Metadata
• The basic question is really: what are you trying to accomplish, and does metadata add value to your project?
• Startup costs can be high, but maintenance costs will be at least as high, if not higher
• Good metadata systems require resources: people, machines, and time
• Don’t start without an understanding of what those might be
Example Startup Costs
1) INITIAL COSTS
SOFTWARE AND OUTSIDE SERVICES
Taxonomy Development Software | $125,000
Search and Auto-classification Engine | $140,000
Cross Application Schema Repository | $125,000
Onsite Installation and Training | $30,000
SUB TOTAL | $420,000
PROJECT RESOURCES
MILESTONES / TASKS | LABOR / HOURS | COSTS
System evaluation and purchase | Knowledge Architecture Manager (80), Search Architect (40), Knowledge Architect (40) | $7,600
Approvals | Knowledge Architecture Manager (40), Knowledge Auditor/Customer Liaison (80) | $5,600
Audit (interviews) | Knowledge Architect (40), Knowledge Auditor/Customer Liaison (100) | $5,800
Audit (systems) | Knowledge Architect (20), Taxonomy Designer (100), Search Architect (40) | $6,700
Imported Structures | Taxonomy Designer (80), Knowledge Auditor/Customer Liaison (40) | $4,800
Modeling | Knowledge Architect (40), Taxonomy Designer (60), Search Architect (60) | $6,900
Application Development | Search Architect (100), Application Developers (300), Knowledge Architect (20), Taxonomy Designer (20), System Engineer (120) | $23,000
Refinement and Validation | Taxonomy Designer (50), Taxonomy Engineers (300) | $12,500
SUB TOTAL | $72,900
TOTAL INITIAL COSTS | $492,900
Example Maintenance Costs
2) ONGOING COSTS (annualized)
MILESTONES / TASKS | LABOR / HOURS | COSTS
Customer Relations | Knowledge Architecture Manager (1200), Knowledge Architect (240), Knowledge Auditor/Customer Liaison (800) | $102,800
Planning and Management | Knowledge Architecture Manager (400), Knowledge Architect (120), Search Architect (120) | $30,800
Architectural Design | Knowledge Architect (600), Search Architect (120), Taxonomy Designer (240) | $42,000
Change History and Reporting | Knowledge Auditor/Customer Liaison (160), Application Developers (120) | $11,200
Sea Change Events | Knowledge Architecture Manager (120), Knowledge Architect (240), Knowledge Auditor/Customer Liaison (240), Taxonomy Designer (240), Taxonomy Engineers (960), Search Architect (160), Application Developers (200) | $84,800
Reconciliation | Knowledge Auditor/Customer Liaison (80), Taxonomy Designer (320), Taxonomy Engineers (640) | $38,400
Synchronization | Knowledge Auditor/Customer Liaison (120), Taxonomy Designer (120), Taxonomy Engineers (480), Search Architect (240), Application Developers (480) | $56,400
Application Upgrades and Modifications | Search Architect (900), Knowledge Architect (120), Taxonomy Designer (240), Application Developers (1800) | $127,500
System Maintenance and Upkeep | System Engineer (1800) | $72,000
TOTAL ANNUAL ONGOING COSTS | $565,900
TOTAL PROJECT OUTLAYS | $1,058,800
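The totals on the two cost slides can be reproduced from the line items, as in the sketch below. The dollar figures are copied from the slides; the variable names and the ongoing-to-initial ratio are additions for illustration.

```python
# Figures copied from the two cost slides (US dollars).
software_and_services = {
    "Taxonomy Development Software": 125_000,
    "Search and Auto-classification Engine": 140_000,
    "Cross Application Schema Repository": 125_000,
    "Onsite Installation and Training": 30_000,
}
project_resources = {
    "System evaluation and purchase": 7_600,
    "Approvals": 5_600,
    "Audit (interviews)": 5_800,
    "Audit (systems)": 6_700,
    "Imported Structures": 4_800,
    "Modeling": 6_900,
    "Application Development": 23_000,
    "Refinement and Validation": 12_500,
}
ongoing_annual = {
    "Customer Relations": 102_800,
    "Planning and Management": 30_800,
    "Architectural Design": 42_000,
    "Change History and Reporting": 11_200,
    "Sea Change Events": 84_800,
    "Reconciliation": 38_400,
    "Synchronization": 56_400,
    "Application Upgrades and Modifications": 127_500,
    "System Maintenance and Upkeep": 72_000,
}

initial_total = sum(software_and_services.values()) + sum(project_resources.values())
ongoing_total = sum(ongoing_annual.values())
project_total = initial_total + ongoing_total
ongoing_to_initial = ongoing_total / initial_total  # one year of upkeep vs. startup

print(f"Initial: ${initial_total:,}")
print(f"Ongoing (annual): ${ongoing_total:,}")
print(f"Total: ${project_total:,}")
print(f"Ongoing / initial: {ongoing_to_initial:.2f}")
```

In this example a single year of ongoing work costs somewhat more than the entire startup effort, which matches the earlier point that maintenance is at least as expensive as getting started.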
Why Use Metadata?
It’s Not Just the Tools
"Content" has been treated like a kind of soup that "content providers" scoop out of pots and dump wholesale into information systems. But it does not work that way. Good information retrieval design requires just as much expertise about information and systems of information organization as it does about the technical aspects of systems.
Bates, Marcia J. “After the Dot-Bomb: Getting Web Information Retrieval Right This Time”. First Monday 7(7), July 2002. http://firstmonday.org/issues/issue7_7/bates/index.html
The Big Picture
[Figure from Selamat & Choudrie, 2004, annotated “We’re here” and “But don’t forget the rest”]
Alternative Approaches
• What about folksonomies and social tagging?
  – What problems can they solve?
  – What issues do they raise?
    • How many people are likely to tag?
    • What about synonym control?
    • Does it matter?
• “Civilizations in decline are consistently characterised by a tendency towards standardization and uniformity.” Arnold Toynbee, historian (1889-1975)
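The synonym-control question can be made concrete with a small sketch that counts tags with and without mapping variants to a preferred term. The tags, the PREFERRED mapping, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from variant tags to a preferred term.
PREFERRED = {"nyc": "new york", "new-york": "new york", "newyork": "new york"}


def normalize(tag):
    """Lowercase and trim a tag, then map it to its preferred term if one exists."""
    cleaned = tag.strip().lower()
    return PREFERRED.get(cleaned, cleaned)


def tag_counts(tags, controlled):
    """Count tags, optionally applying synonym control first."""
    if controlled:
        return Counter(normalize(t) for t in tags)
    return Counter(t.strip().lower() for t in tags)


tags = ["NYC", "new york", "New-York", "wine"]
print(tag_counts(tags, controlled=False))
print(tag_counts(tags, controlled=True))
```

Without the mapping, three users who meant the same place produce three separate tags; with it, they produce one. Whether that matters depends on how the tags are used, which is the question the slide leaves open.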
Where Does Metadata Fit?
We tend to think that the hard problems are the big ones. So we
believe that searching the Web is hard because it's so huge. But
I've been thinking lately that the really hard problems are actually
the ones in the middle. In the middle, many algorithms don't work
that well with moderate document sets, context becomes more
important, interaction is critical, and you can't get the user "in
the ballpark" anymore--you have to get them right to the thing
they're looking for.
Karl Fast, http://lists.ibiblio.org/mailman/private/aifia-members/2004-February/001129.html
A Continuum: When to Use Formal Metadata
Braly & Froh (2007) after Shirky (2005)
Things to Think About
• Make sure you can measure results
• Don’t assume one size fits all
• Choose user access points wisely
• Provide user tools and education for effective use of your metadata
• Make sure you’re adding value
• Balance theory with practical needs
• Consider trust and provenance
Readings
• Soergel, D. (1985). Organizing information: Principles of data base and retrieval systems. Orlando, FL: Academic Press. 450 p.
• Taylor, A. (2004). The Organization of Information. 2nd ed. Westport, Conn: Libraries Unlimited. 417 p.
• Burnett, K. (1999). “A Comparison of the Two Traditions of Metadata Development”. Journal of the American Society for Information Science, 50(13), 1209-1217.
• Rosenfeld, L. & P. Morville. (2002). Chapter 9, “Thesauri, Controlled Vocabularies, and Metadata” in Information Architecture for the World Wide Web. 2nd ed. Sebastopol, CA: O’Reilly. (p. 176-208).
• Zeng, M.L. (2005). Construction of controlled vocabularies: A primer. NISO. http://www.slis.kent.edu/~mzeng/Z3919/index.htm.
• Bates, Marcia J. (2002). “After the Dot-Bomb: Getting Web Information Retrieval Right This Time”. First Monday 7(7), July 2002. http://firstmonday.org/issues/issue7_7/bates/index.html
• Bryar, J.V. (2001). “Taxonomies: The value of organized business knowledge”. A White Paper Prepared for NewsEdge.
• Byrne, T. (2004). “Enterprise information architecture: Don’t do ECM without it”. Econtent 27.5 (May 2004): 22-29.
• Earley, S. (2005). “Developing enterprise taxonomies”. Earley & Associates. http://www.earley.com/Earley_Report/ER_Taxonomy.htm.
• Montague Institute. (2001). “Managing taxonomies strategically”. http://www.montague.com/abstracts/taxonomy3.html.
• Selamat, M.H. & J. Choudrie. (2004). “The diffusion of tacit knowledge and its implications on information systems: The role of meta-abilities”. Journal of Knowledge Management, 8(2), 128-139.
• Bulterman, D.C.A. (2004). “Is It Time for a Moratorium on Metadata?” IEEE MultiMedia, vol. 11, no. 4, pp. 10-17, 2004. http://homepages.cwi.nl/~dcab/PDF/ieeeMM2004.pdf
• Fitzgerald, M. (2006). “The Name Game: Tagging tools let users describe the world in their own terms as taxonomies become ‘folksonomies.’” CIO Magazine, April 1, 2006. http://www.cio.com/archive/040106/et_main.html?action=print
• Braly, M. & G. Froh (2007). “Tagging”. Presentation for IMT530 Organization of Information Resources. (Feb 10, 2007). EnterpriseTagging.org http://enterprisetagging.org/assets/pdf/IMT530_Tagging_Presentation.pdf.
• Shirky, C. (2005). “Ontology is Overrated: Categories, Links, and Tags”. Clay Shirky’s Writings About the Internet. http://www.shirky.com/writings/ontology_overrated.html.
Questions???