Metadata: Practical, painless, profitable



by Christine Connors

Sometimes I’m Pinky, sometimes Brain. My inner “Brain,” eager to graft librarian DNA everywhere possible, is ecstatic that thousands have been bitten by the tagging trend. Slowly but surely metadata is gaining the appreciation it deserves. Teenagers and geeks aren’t the only ones who’ve been bitten. Executives and judges are coming around to the value of metadata too.

Taking Over the World!

Jessica Vascarello, in her January 24, 2006 Wall Street Journal article, called tagging “The Next Big Thing in Searching.” IDC published a whitepaper in May 2006 calling the management of metadata “essential for success” (“Managing Metadata in the Coherent Information Environment: Essential for Success in the 21st Century Enterprise,” IDC Executive Brief, May 2006, www.idc.com). Judge David Waxse, of the US District Court for Kansas, determined in September 2005 in Williams v Sprint that electronic documents should be produced with their metadata intact, unless the parties agree to the metadata being removed or the producing party requests a protective order (www.ksd.uscourts.gov/opinions/O32200JWLDJW-3333.pdf). This decision was heavily influenced by the Sedona Principles, produced by a group of legal experts concerned with intellectual property rights and other issues (http://www.thesedonaconference.org/).

Christine Connors has an MSLIS from Simmons. Intending to go the route of the reference librarian, she found herself instead in IT, in charge of taxonomies, metadata and website management. Reach her at Christine<at>sw2sw.net

Practical applications of metadata can improve the bottom line. We are beginning to see evidence that can make the most skeptical manager supportive of quality metadata. Soft dollar savings are easier to calculate, though harder to sell. Cost avoidance is good, but not nearly as interesting to management as hard dollar savings. Allow me to put forth some practical ideas for consideration.

Good metadata, including elements for rights and security management, can help a company avoid millions of dollars in intellectual property infringement fines. While the judge in the Williams v Sprint case did not choose to assess fines, the legal fees alone would be worth saving. Other infamous cases come to mind, including Tasini vs. New York Times (http://laws.findlaw.com/us/000/00-201.html) and American Geophysical Union v. Texaco (American Geophysical Union v. Texaco, 37 F.3d 882 (2d Cir. 1994), http://fairuse.stanford.edu/primary-materials/cases/texaco/settlement.html).

Content management and search are applications where metadata are critical. Content re-use is becoming important for companies as they attempt to streamline operations. Reducing duplicates is also desirable: better server management means lower costs, and having fewer versions to review to find the right one saves employee time. Reducing near duplicates - documents that are only slightly different - is of even greater importance. Reading multiple copies to determine which has the precise data you need, or which is the authentic, legal version, is time-consuming. Requiring authors to spend a few extra minutes applying correct metadata, or employing catalogers in the content publishing workflow, increases the relevance of an information object. Employees waste less time searching, and time and material costs are saved when they don’t have to recreate a document.

August/September 2006 - Bulletin of the American Society for Information Science and Technology

Two Examples That Demonstrate Strong Positive ROI for Metadata

1. We use metadata to free wasted employee time to accomplish our goals.

Let’s look at a time-wasting scenario. Here are the employee constants:

The organization’s burdened rate is $100,000 a year for a full time employee (FTE).

FTEs are paid for 2080 hours a year (40 hours/week for 52 weeks).

Our imaginary organization has 25,000 employees.

If we save each employee just 15 minutes per week - less than 1% of their time - the savings to the organization is $15.6 million a year. Susan Feldman of IDC has estimated the wasted time to be much higher, which is no surprise to many of us, but is a hard sell to management (see “The High Cost of Not Finding Information” by Susan Feldman of IDC, in the March 2004 issue of KMWorld, www.kmworld.com/Articles/ReadArticle.aspx?ArticleID=9534).

I have worked on enterprise search applications and can vouch for an average of 30% of searches being abandoned per month. How much time does that represent? Moreover, how many frustrated attempts at finding information lead to recreating existing data? When I worked for the libraries at Raytheon, a Six Sigma study done by my colleagues in their Integrated Defense Systems (IDS) Research Library determined that the average savings per object borrowed or acquired by a librarian was $2200. Savings estimates included lost employee time, time to research and recreate the information, and costs for purchasing information.
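The arithmetic behind the 15-minutes-per-week figure in the first example can be checked with a short script. This is only a sketch using the constants stated in the scenario; the function names are illustrative and not from the article.

```python
# Constants taken from the time-wasting scenario in the text.
BURDENED_RATE_PER_YEAR = 100_000    # dollars per full time employee (FTE)
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52
PAID_HOURS_PER_YEAR = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2080 hours
EMPLOYEES = 25_000


def annual_savings(minutes_saved_per_week: float) -> float:
    """Dollar value of the time saved across the organization in one year."""
    hourly_rate = BURDENED_RATE_PER_YEAR / PAID_HOURS_PER_YEAR
    hours_saved_per_employee = minutes_saved_per_week / 60 * WEEKS_PER_YEAR
    return hours_saved_per_employee * hourly_rate * EMPLOYEES


def share_of_work_week(minutes_saved_per_week: float) -> float:
    """Fraction of the paid work week that the saved minutes represent."""
    return minutes_saved_per_week / (HOURS_PER_WEEK * 60)


if __name__ == "__main__":
    print(f"${annual_savings(15):,.0f} per year")
    print(f"{share_of_work_week(15):.3%} of each work week")
```

Fifteen minutes a week is 13 hours a year per employee, which at roughly $48 an hour across 25,000 employees gives $15,625,000, the $15.6 million quoted. Changing the argument shows how quickly the figure grows under larger estimates of wasted time.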

2. We use metadata to free up space on computer networks to install our critical data.

If information is easier to find, then its rate of duplication will be lower. If a 20% reduction can be had simply by not re-creating information, what further impact is achieved by not having to store it?

Let us again consider a scenario: An organization is storing 100 terabytes (TB) of data. Twenty percent (20%) is duplicates. A mid-range Sun server holds 1.7 TB and costs $31,000.

If fewer servers can be purchased or some returned, then the savings would be approximately $364,000 ((20 / 1.7) x $31k = $364,706).
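The server estimate can be reproduced in a few lines. This sketch uses the figures from the scenario; the second function is an added assumption (not in the article) that only whole servers can actually be avoided or returned, which gives a more conservative number.

```python
import math

# Figures from the storage scenario in the text.
TOTAL_DATA_TB = 100
DUPLICATE_FRACTION = 0.20
SERVER_CAPACITY_TB = 1.7
SERVER_COST = 31_000  # dollars per mid-range server


def duplicate_storage_cost(total_tb, duplicate_fraction, capacity_tb, server_cost):
    """Cost of the server capacity consumed by duplicates (fractional servers)."""
    duplicate_tb = total_tb * duplicate_fraction
    return duplicate_tb / capacity_tb * server_cost


def whole_server_savings(total_tb, duplicate_fraction, capacity_tb, server_cost):
    """Assumption: savings are realized only for complete servers avoided."""
    duplicate_tb = total_tb * duplicate_fraction
    return math.floor(duplicate_tb / capacity_tb) * server_cost


if __name__ == "__main__":
    args = (TOTAL_DATA_TB, DUPLICATE_FRACTION, SERVER_CAPACITY_TB, SERVER_COST)
    print(f"Fractional servers: ${duplicate_storage_cost(*args):,.0f}")
    print(f"Whole servers only: ${whole_server_savings(*args):,.0f}")
```

The fractional calculation matches the $364,706 in the text; counting only the 11 complete servers freed would put the figure at $341,000.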

Further, we could also consider a tiered storage solution. As data becomes less active and/or less valuable, it can be moved to lower cost storage systems. The moves can be done intelligently, based on metadata such as date of creation, date last accessed, security rules, records management rules or retention schedules. Let’s assume the following:

We start with 100 TB of data on Tier 1 - the top of the line storage.

Tier 1 costs $3000/TB/Month; Tier 2, $1500/TB/Month; Tier 3, $800/TB/Month.

Suppose that analysis determines that we could usefully reorganize the data such that: 20% is on Tier 1, 30% is on Tier 2, and 50% is on Tier 3.

Under our original configuration costs are $300,000 per month, while the tiered storage costs would be $145,000 per month, for a savings of $155,000. Imagine the savings going forward if the amount of data increases at a rate of 60% per year!
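The tiered storage comparison can be worked through as follows. The prices and the 20/30/50 split come from the scenario; the projection function is an added assumption that the tier mix and prices stay constant while the data volume grows 60% per year, which the article does not state.

```python
# Monthly price per terabyte for each tier, from the scenario in the text.
TIER_COST_PER_TB_MONTH = {"tier1": 3000, "tier2": 1500, "tier3": 800}


def monthly_cost(allocation_tb: dict) -> int:
    """Total monthly cost for a mapping of tier name to terabytes stored."""
    return sum(TIER_COST_PER_TB_MONTH[tier] * tb for tier, tb in allocation_tb.items())


# Original configuration: everything on Tier 1.
ORIGINAL = {"tier1": 100}
# Reorganized configuration: 20% / 30% / 50% of 100 TB.
TIERED = {"tier1": 20, "tier2": 30, "tier3": 50}


def projected_monthly_saving(years: int, growth_rate: float = 0.60) -> float:
    """Assumption: same tier mix and prices, data growing at growth_rate per year."""
    base_saving = monthly_cost(ORIGINAL) - monthly_cost(TIERED)
    return base_saving * (1 + growth_rate) ** years


if __name__ == "__main__":
    print(f"Original: ${monthly_cost(ORIGINAL):,} per month")
    print(f"Tiered:   ${monthly_cost(TIERED):,} per month")
    for year in range(4):
        print(f"Year {year}: ${projected_monthly_saving(year):,.0f} saved per month")
```

Under those assumptions the monthly saving starts at $155,000 and scales with the data volume, so it would be about $248,000 a month after one year of 60% growth.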

Are You Pondering What I’m Pondering?

Try social tagging behind the firewall. It gives employees the means to label content in personally meaningful ways. It helps secure data against security leaks or the competitive intelligence gathering possible when links are posted on public sites. It can also help to boost the usage of existing data silos inside the organization, such as directories, best practices or lessons-learned databases.

Incrementally add value with metadata-based services. Start small, aim at high-value content. Add Suggested Sites or Suggested People to search engine results pages. Get forward-thinking website developers to use RDF/A (www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer) or Microformats (http://microformats.org/) to tag sections of their content. [RDF/A is a collection of attributes for layering RDF (Resource Description Framework) on XML languages. It is a World Wide Web Consortium (W3C) internal working document.] Get hold of an LDAP (Lightweight Directory Access Protocol) feed and convert it to FOAF. FOAF stands for Friend-of-a-Friend and describes people, the links between them and the things they create and do (www.foaf-project.org). LDAP already contains great contact information, so take it to the next level. Discover the appropriate schemas for your data, add your terms and start small. Find the supporters in your organization.
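As a rough illustration of the LDAP-to-FOAF idea, the sketch below renders directory entries as FOAF descriptions in Turtle syntax. It assumes the LDAP feed has already been exported into plain Python dictionaries keyed by the common attribute names uid, cn and mail; the sample entry, the base URI and the function names are hypothetical, and a real conversion would map many more attributes.

```python
FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def ldap_entry_to_foaf(entry: dict, base_uri: str) -> str:
    """Render one LDAP-style entry as a foaf:Person description in Turtle."""
    subject = "<" + base_uri + entry["uid"] + ">"
    statements = [subject + " a foaf:Person"]
    if "cn" in entry:
        # cn (common name) maps naturally onto foaf:name.
        statements.append('    foaf:name "' + escape_literal(entry["cn"]) + '"')
    if "mail" in entry:
        # foaf:mbox expects a mailto: URI rather than a plain string.
        statements.append("    foaf:mbox <mailto:" + entry["mail"] + ">")
    return " ;\n".join(statements) + " .\n"


def ldap_feed_to_foaf(entries: list, base_uri: str) -> str:
    """Convert a list of entries into one Turtle document."""
    people = [ldap_entry_to_foaf(entry, base_uri) for entry in entries]
    return FOAF_PREFIX + "\n" + "\n".join(people)


if __name__ == "__main__":
    # Hypothetical entry and intranet URI, for illustration only.
    sample = [{"uid": "jdoe", "cn": "Jane Doe", "mail": "jane.doe@example.com"}]
    print(ldap_feed_to_foaf(sample, "http://intranet.example.com/people/"))
```

Starting with only name and mailbox keeps the first version small, in the spirit of "start small"; links between people (foaf:knows) and the things they create can be layered on once the basic feed is in use.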

Above all, keep metrics! Vital research continues to come from academic organizations. Executives, however, want to benchmark against other organizations in their industry or of their size. We need to collectively contribute to a body of knowledge on the value of metadata to support our business cases. Let’s work together to evolve - and take over the world!
