Introduction to the GAO Enterprise Taxonomy Project Version 1.0 June 17, 2008 Draft.


Introduction

• Definitions, Concepts, Context
• About GAO
• Development Process
  o Research
  o Strategy
  o Design
  o Implementation: Roadmap Sequencing Plan
  o Administration
• Elements of Information Architecture Maturity
• Search Beta Demo


Definitions, Concepts and Context

Search is supposed to work – automagically.


Definitions, Concepts, Context

• Enterprise Search
  o Google
  o Better Metadata = Better Search

• What is Taxonomy?
• What is Information Architecture (IA)?
• The Taxonomy Manager's Perspective
• Other Perspectives
  o Web Designer
  o Usability Engineer
  o Data Architect

Information Architecture for the World Wide Web, Peter Morville and Louis Rosenfeld


Definitions, Concepts, Context: Metadata Schema (excerpt)

Element | Data Type | Length | Required | Source | Purpose

Asset Metadata
Unique ID | String | Variable | Y | System supplied | System identifier to retrieve item
Creator | String | Variable | Y | System supplied | Editorial ownership
Title | String | Variable | N | System supplied | Text search & results display
Description | String | Variable | N | User supplied | (not listed)
Date | Date | Fixed | N | System supplied | Publish, feature, & review content

Subject Metadata (purpose: search and browse, i.e. faceted navigation)
Topic | String | Variable | Y | Topic CV
Program | String | Variable | Y | Program CV
Agency | String | Variable | Y | Agency CV
Type | String | Variable | Y | Content Type CV

Use Metadata
Security Level | String | Variable | N | Security CV | Use control
Audience | String | Variable | N | Audience CV | Target, personalize content

CV = controlled vocabulary

Courtesy of EPA
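The schema above can be read as a validation contract. As a minimal sketch, assuming records arrive as Python dictionaries keyed by element name and using hypothetical vocabulary values (not taken from GAO or EPA), checking the required flags and controlled vocabularies of a few elements might look like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    required: bool
    vocabulary: Optional[frozenset] = None  # None means free text


# Hypothetical controlled vocabularies (illustrative values only).
TOPIC_CV = frozenset({"Space exploration", "Health care", "Defense"})
TYPE_CV = frozenset({"Report", "Testimony", "Decision"})

# Excerpt of the schema; the required flags follow the table above.
SCHEMA = [
    Element("Unique ID", required=True),
    Element("Creator", required=True),
    Element("Title", required=False),
    Element("Topic", required=True, vocabulary=TOPIC_CV),
    Element("Type", required=True, vocabulary=TYPE_CV),
]


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element in SCHEMA:
        value = record.get(element.name)
        if value is None or value == "":
            # Absent optional elements are fine; absent required ones are not.
            if element.required:
                problems.append(f"missing required element: {element.name}")
            continue
        if element.vocabulary is not None and value not in element.vocabulary:
            problems.append(f"{element.name} value not in controlled vocabulary: {value!r}")
    return problems
```

For example, `validate({"Unique ID": "A-001", "Creator": "Team X", "Topic": "Space exploration", "Type": "Memo"})` reports one problem, because "Memo" is not in the hypothetical content type vocabulary; the optional Title element is simply skipped when absent.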


Definitions, Concepts, Context: Faceted Classification

• A faceted classification schema enables search and discovery by multiple attributes. These facets bring additional context to the search for assets.

[Figure: an example asset classified by multiple facets, with labels including Content: Space Shuttle; Space exploration; Engagement; Report; NASA; 00’-05’; Creator; Consumer]
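Faceted classification lets a user narrow results on several attributes at once. A minimal sketch, assuming each asset is a dictionary of facet values (the sample assets and facet names are hypothetical), filters on any combination of selected facets and then counts the remaining values of another facet, which is what a faceted navigation panel typically displays:

```python
from collections import Counter

# Hypothetical assets, each described by several facets (illustrative values).
ASSETS = [
    {"id": 1, "agency": "NASA", "type": "Report", "topic": "Space exploration"},
    {"id": 2, "agency": "NASA", "type": "Testimony", "topic": "Space exploration"},
    {"id": 3, "agency": "EPA", "type": "Report", "topic": "Environment"},
]


def filter_assets(assets, selections):
    """Keep assets matching every selected facet value (AND across facets)."""
    return [
        a for a in assets
        if all(a.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(assets, facet):
    """Count how many of the given assets carry each value of a facet."""
    return Counter(a[facet] for a in assets if facet in a)
```

Selecting `{"agency": "NASA"}` leaves assets 1 and 2, and `facet_counts` on that subset shows one Report and one Testimony, so the user can keep drilling down by type.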

Definitions, Concepts, Context: Faceted Navigation System

Navigation System:
• Wireframes
• Blueprints (Site Maps)
• Global/Local Templates
• Hierarchies
• A-Z Index

Search and Browse Zone: Faceted Navigation (aka Classification Scheme or Taxonomy)

Faceted Navigation: The volume and granularity of content present findability problems. Some systems integrate search and browse, allowing users to move back and forth between the two.

Search System:
• Query Builder
• Search Engine
• Relevance Ranking
• Results Presentation
• Metadata Schema
• Controlled Vocabulary
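The search system components above (query builder, search engine, relevance ranking, results presentation) can be illustrated with a toy inverted index that combines a keyword query with facet filters, so that search and browse work together. This is a sketch under simplifying assumptions, not how any particular engine works: relevance here is just summed term frequency, whereas production engines use weighting schemes such as TF-IDF or BM25. The class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.metadata = {}  # doc_id -> facet values used for browse filters

    def add(self, doc_id, text, facets):
        self.metadata[doc_id] = facets
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query, **facet_filters):
        # Score each document by the summed frequency of the query terms.
        scores = defaultdict(int)
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Apply facet filters (browse) to the keyword hits (search).
        hits = [
            d for d in scores
            if all(self.metadata[d].get(k) == v for k, v in facet_filters.items())
        ]
        # Rank by score, highest first; ties broken by doc_id.
        return sorted(hits, key=lambda d: (-scores[d], d))
```

With one report mentioning "shuttle" twice and one testimony mentioning it once, `search("shuttle")` ranks the report first, and `search("shuttle", type="Testimony")` narrows the same query to the testimony.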


Definitions, Concepts, Context: User Centered Design Focus

The user should be able to:
• Search multiple repositories efficiently and intuitively
• Find an object without having to know where it is stored
• Use keyword queries integrated with browse to discover both known and unknown data
• Save searches and apply personal tags to content, increasing its findability
• Expose relationships between items; increased context improves sense-making
• Use a common, familiar information model when searching across repositories
• Keep search simple

“Deliver the right information to the right person at the right time”


About GAO


Mission and Work

GAO’s Mission is to support the Congress in meeting its constitutional responsibilities and to help improve the performance and ensure the accountability of the federal government for the benefit of the American people. We provide Congress with timely information that is objective, fact-based, nonpartisan, non-ideological, fair, and balanced.

GAO’s Work is done at the request of congressional committees or subcommittees or is mandated by public laws or committee reports. We also undertake research under the authority of the Comptroller General. We support congressional oversight by:

o auditing agency operations to determine whether federal funds are being spent efficiently and effectively;
o investigating allegations of illegal and improper activities;
o reporting on how well government programs and policies are meeting their objectives;
o performing policy analyses and outlining options for congressional consideration; and
o issuing legal decisions and opinions, such as bid protest rulings and reports on agency rules.


GAO Engagement Overview

• The Engagement Management Process sets forth specific activities that must be completed for an engagement to proceed successfully to product issuance. The activities are not necessarily performed sequentially. This process applies to GAO's:
  o Congressionally requested work
  o Legislative mandates
  o Comptroller General Authority (CGA) work

• Engagement Process Phases or Activities:
  o Acceptance
  o Planning and Design
  o Data Gathering and Analysis
  o Product Development and Distribution
  o Results

• Many Document Types:
  o Report, Testimony, Decision, Guidance, CG Presentation, etc.

• ~3 million documents in the electronic records management system
• ~180,000 engagement publications (audit/legal)
• ~40 engagement system applications


Acceptance
- Evaluate Request, Mandate, CGA Proposal
- Make Acceptance Decision
- Communicate Decision

Planning & Design
- Staff Engagement
- Launch Engagement
- Design Engagement
- Plan Engagement
- Commit to Engagement

Data Gathering & Analysis
- Gather Data
- Analyze Data
- Reach Message Agreement

Product Development & Dist.
- Develop Product
- Obtain Concurrence
- Index and Reference
- Address Agency Comments
- Perform Final Processing

Results
- Recommendation Tracking
- Accomplishment Reporting
- Audit Documentation Archive

Engagement Systems
Lifecycle stages: Creation, Indexing, Publication
Related functions: Strategic Planning, Workforce Planning, Performance & Accountability Management
Databases and sample fields:
- db1: Requestor, Subject, Team, Accepted Date
- db2: Requestor, Subject, Team, Job #
- db4: Team, Subject, Title, Objectives, Job #
- db6: Document, Team, Job #
- db8: Team, Title, Product #
- db10: Requestor, Team, Job #, Title, Product #
- db3, db5, db7, db9, db11 (no fields shown)
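The diagram shows the same fields recurring across separate engagement databases. A short sketch, using only the fields listed above, finds which fields are held by two or more systems; these are natural candidates for common metadata elements. Databases with no fields listed are omitted, and the function name is hypothetical.

```python
from collections import defaultdict

# Fields per engagement database, as listed in the diagram above.
DATABASES = {
    "db1": {"Requestor", "Subject", "Team", "Accepted Date"},
    "db2": {"Requestor", "Subject", "Team", "Job #"},
    "db4": {"Team", "Subject", "Title", "Objectives", "Job #"},
    "db6": {"Document", "Team", "Job #"},
    "db8": {"Team", "Title", "Product #"},
    "db10": {"Requestor", "Team", "Job #", "Title", "Product #"},
}


def shared_fields(databases, min_systems=2):
    """Map each field held by at least min_systems databases to a sorted list of those databases."""
    holders = defaultdict(list)
    for db, fields in databases.items():
        for field in fields:
            holders[field].append(db)
    return {f: sorted(dbs) for f, dbs in holders.items() if len(dbs) >= min_systems}
```

Here "Team" appears in all six listed databases and "Job #" in four, while "Accepted Date", "Objectives" and "Document" each appear only once; the shared fields are the ones most likely to be described inconsistently unless they are defined once in common metadata.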


Typical Enterprise Semantic Problem

“Let us go down there and confuse their language”

Information assets are:
• Fragmented
• Decentralized
• Inconsistently described

Multiple repositories, search systems, and results presentations are confusing to the user:
• Where is the information stored?
• What keywords will retrieve the information?

To-Be GAO Information Architecture:
• Enterprise IA Vision: Unify Information Space
• Common Information Infrastructure
• Taxonomy is the common metadata and vocabulary that provides meaning and context to assets
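One way a shared vocabulary helps with "What keywords will retrieve the information?" is thesaurus-based query expansion: the user's term is resolved to its preferred term and searched together with its synonyms. A minimal sketch; the thesaurus entries below are hypothetical examples, not GAO's vocabulary.

```python
# Hypothetical thesaurus: preferred term -> non-preferred variants ("use for").
THESAURUS = {
    "health care": ["healthcare", "medical care"],
    "information technology": ["information systems", "it systems"],
}

# Invert once so that any variant (or the preferred term) resolves to the preferred term.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in THESAURUS.items()
    for variant in [preferred, *variants]
}


def expand_query(query: str) -> list:
    """Return the preferred term plus its variants, or the normalized query if unknown."""
    key = query.strip().lower()
    preferred = VARIANT_TO_PREFERRED.get(key)
    if preferred is None:
        return [key]
    return [preferred, *THESAURUS[preferred]]
```

A search for "Medical Care" then also retrieves assets tagged "health care" or "healthcare", while unknown terms pass through unchanged.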


Development Process


Development Process: Near and Longer Term View

Phases: Research, Strategy, Design, Implementation, Administration

Near Term: Iterative, a series of mini-projects
Longer Term: IA Program


Research: Information Gathering and Analysis

Users: audiences, tasks, needs, information-seeking behavior, experience, vocabularies
Context: business goals, funding, politics, culture, technology, human resources
Content: document types, content objects, metadata, volume, existing structure

Information Architecture for the World Wide Web, Peter Morville and Louis Rosenfeld

Research Methods:
• Focus Groups
• Interviews
• Questionnaires
• Benchmarking


Research: Benchmarking

Requirement | NASA | World Bank | EPA
Faceted Navigation | Yes | Yes | Yes
Core Metadata Specification | Yes | Yes | Yes
Enterprise Metadata Profile | Yes | Yes | Yes
Metadata Registry/Repository | Yes | Yes | Yes
Tools: Tool Type (Vendor); lists may be incomplete | Categorization, Extraction, Faceted Navigation, Taxonomy Manager (Inxight, Siderean, SchemaLogic) | Categorization, Clustering, Extraction, Summarization, Taxonomy Manager (Teragram) | Categorization, Taxonomy Manager, Content Intelligence Services (Synaptica)

Statement of Needs | Target Requirements
Corporate Taxonomy | Faceted Navigation
Develop an enterprise vocabulary using a manual or technological solution | Manual or Extraction Tool
Automatic assignment of content to relevant categories | Categorization Tool
GAO needs a consistent information infrastructure shared across different applications | Core Metadata Specification, Information Model, Metadata Registry/Repository


NASA: View from the Top

Priority: Improve our ability to work more efficiently
Goal: Improve ability to store, archive, and retrieve project information
Capabilities: Enterprise Content Management; Product Lifecycle Management; Information Discovery and Retrieval

Processes:
• Document Storage, Web Content Management, Records Management, Work Flow
• Product Data Management, Requirements Management, Risk Management
• Cross Repository Retrieval
• External Partners Data Exchange
• Access Verification, Export Compliance

Technologies:
• Electronic Library - DocuShare
• Document Repository - Teamcenter Community
• Web Content - Rhythmyx
• PDMS - Teamcenter Enterprise
• Requirements Repository - DOORS, Cradle, Core
• Risk Management - ARM
• Portals - Inside JPL, Teamcenter Community
• Search Engine - Google
• Problem Reporting - PFR/PRS
• Manufacturing/Inventory - iPICS

Common Information Infrastructure:
• Security: Authentication
• Metadata Standards
• Domain Taxonomies
• Schema Registries
• Unique Object Identifiers

IA work supports many different stakeholders

Courtesy of NASA


Strategy: GAO To-Be IA (Draft)

• Involve users in information gathering and analysis to identify and develop a set of enterprise information architecture needs or capabilities

• Define a common information infrastructure (common metadata and vocabulary) sufficient to describe GAO’s information assets

• Design an enterprise asset tagging workflow that improves data integrity and productivity

• Develop an approach for data validation, covering both new metadata and the conversion of existing metadata from GAO systems

• Create an enterprise information architecture (EIA) roadmap that highlights near-term “quick wins” and longer-term to-be architectures

Domains to harmonize with common metadata and vocabulary:
  o Business/Financial
  o Information and Technology Mgt
  o Engagement (Audit/Legal)
  o Human Capital


Strategy: Some Specific Capabilities (Draft)

• Simple and transparent metadata capture process for content creation
• Auto-population of metadata
  o Entity/concept extraction, categorization, clustering, summarization
  o Process for metadata creation and conversion
• Improve data integrity, quality, and governance
• Semantic interoperability

Dual-track work plans: improve as-is systems (quick wins) and develop to-be roadmaps
  o As-Is: GAO Search Beta (demo)
  o To-Be: Information Architecture Roadmap
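Auto-population through categorization can be pictured with a simple rule-based classifier that suggests categories for a document from trigger phrases. This is a hypothetical sketch: the rules and function name are invented for illustration, commercial categorization tools use far richer rule sets or statistical models, and suggested values would still go through quality assurance before being stored.

```python
import re

# Hypothetical rules: category -> trigger phrases (illustrative only).
RULES = {
    "Space exploration": ["space shuttle", "nasa", "space station"],
    "Defense": ["department of defense", "weapon system"],
}


def categorize(text: str, min_hits: int = 1) -> list:
    """Suggest categories whose trigger phrases occur at least min_hits times in total."""
    lowered = text.lower()
    suggestions = []
    for category, phrases in RULES.items():
        # Count whole-phrase matches only (word boundaries on both sides).
        hits = sum(
            len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases
        )
        if hits >= min_hits:
            suggestions.append(category)
    return suggestions
```

Raising `min_hits` trades recall for precision: a single passing mention of "weapon system" suggests Defense at the default threshold but not at `min_hits=2`.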


Strategy Vision: Search and Browse (Draft)

[Figure: Fragmented Information Infrastructure contrasted with Common Information Infrastructure]


Strategy Vision: Meta-Tagging Workflow for Content Creation and Conversion (Draft)

Workflow components:
• Content Creation
• Data Conversion of legacy metadata
• Content Object
• Document processing auto-population (Metadata Extraction, Categorization Applications)
• Validate against metadata schema
• Quality Assurance
• Metadata Repository (Harmonize)
• Common Information Infrastructure
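For the data conversion path, legacy metadata can be mapped onto core elements through a crosswalk, with anything unmapped or missing routed to quality assurance. A minimal sketch; the legacy field names (`job_no`, `rpt_title`, `author_team`), the required-element subset, and the `convert` function are all hypothetical.

```python
# Hypothetical crosswalk from one legacy system's field names to core elements.
CROSSWALK = {
    "job_no": "Job #",
    "rpt_title": "Title",
    "author_team": "Creator",
}
# Illustrative subset of required core elements.
REQUIRED = {"Unique ID", "Creator"}


def convert(legacy_record: dict, unique_id: str):
    """Map legacy fields to core elements.

    Returns (converted record, unmapped legacy fields, missing required elements);
    the last two lists are what quality assurance would review.
    """
    record = {"Unique ID": unique_id}
    unmapped = []
    for field, value in legacy_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
        else:
            record[target] = value
    missing = sorted(REQUIRED.difference(record))
    return record, unmapped, missing
```

Converting `{"job_no": "123456", "rpt_title": "Space Shuttle", "misc": "x"}` yields a record with Job # and Title, flags `misc` as unmapped, and reports Creator as missing.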


Design Approach: Four separate but related work tracks (Draft)

• Metadata Schema and Vocabularies (Taxonomy) – assess existing metadata schemas, industry standards and asset use cases to derive a “baseline” metadata framework: a core model plus selective extensions as required by content type and domain. The baseline model will provide a foundation for the project’s subsequent work tracks.

• Information Infrastructure – utilize baseline metadata schema to build metadata element registry/repository, relate metadata element definitions to resources; design and implement architectural framework for metadata auto-population tools

• Governance Framework – define both process (meta-tagging workflow) and technology (meta-tagging tools) activities which provide quality assurance to GAO’s content stakeholders.

• Information Retrieval Systems – create search and navigation roadmaps for “quick wins” and a longer-range vision; implement mechanisms for faceted navigation
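The Information Infrastructure track calls for a metadata element registry that relates element definitions to resources. A minimal in-memory sketch, assuming a registry only needs to store definitions, bump a version number when a definition changes, and record which systems use each element; all class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ElementDefinition:
    name: str
    definition: str
    data_type: str
    version: int = 1
    used_by: set = field(default_factory=set)  # systems/resources using the element


class MetadataRegistry:
    """Minimal in-memory registry of metadata element definitions."""

    def __init__(self):
        self._elements = {}

    def register(self, name, definition, data_type):
        current = self._elements.get(name)
        if current is None:
            self._elements[name] = ElementDefinition(name, definition, data_type)
        elif (current.definition, current.data_type) != (definition, data_type):
            # A changed definition is recorded as a new version.
            current.definition, current.data_type = definition, data_type
            current.version += 1
        return self._elements[name]

    def relate(self, name, resource):
        # Raises KeyError if the element has not been registered.
        self._elements[name].used_by.add(resource)

    def lookup(self, name):
        return self._elements.get(name)
```

Re-registering an identical definition leaves the version unchanged, so repeated loads are harmless; only a real change to the definition or data type creates a new version.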


Implementation: IA Roadmap


Implementation: Prototype Sequence of Events (Draft)

Information Gathering & Planning
• Conduct Interviews and Use Case
• Assess information environment
• Audit of information domains and content types - prioritize
• Develop metadata strategy

Design Metadata Schema and Vocabulary
• Facet analysis and Use Case
• Core Metadata Specification Version 1.0 Complete

Build Infrastructure Architecture
• Mapping/Crosswalk
• Element Definition/Registry
• Auto-populate repository
• Content Integration

Build Information Retrieval Mechanisms (Search and Browse)
• Faceted Navigation Enabled

Prototype Testing
• Assess results and iterate design

Governance Framework


Administration: Data Governance (Draft)

• The need to plan, define, enable, and measure information architecture changes drives the need to address governance and consensus building regarding new roles, responsibilities, and workflows as early as possible

Who will manage the metadata schema and vocabularies?
(Thesaurus, Authority Files, Classification Rules, Data Dictionary, Taxonomy)
• Core Metadata Specification
• Maintain Taxonomies
• Maintain Authority Files
• Maintain Thesaurus
• Maintain Classification Rules

Who will manage impacts to the business?
• Manage consensus across content owner groups on common metadata
• Champion changes to existing systems and business unit procedures to capture accurate and more precise asset descriptions

Who will manage impacts to asset tagging workflows?
• Meta-tagging new content and legacy content conversion
• Determine whether ‘re-indexing’ is required and how frequently it will occur

Who will manage the infrastructure impacts?
• Foster integration of content and search engines
• Auto-population tools


Increasing Levels of IA Maturity: Iterative Development

• Search engine indexes multiple repositories
• Advanced computation of relevance
• Search log and click trail analysis
• Core metadata specification
• Faceted Navigation
• Intelligent Search and Discovery
• Metadata Registry/Repository
• Semantic Interoperability
• Enterprise Search Best Practice
• Auto-population of metadata
• Improved data integrity and governance
• Enterprise 2.0: social tagging, wikis, blogs
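Search log analysis can start very simply: count the most frequent queries and list those that returned no results, since zero-result queries often point to missing synonyms or vocabulary terms. A sketch, assuming the log is available as (query, result count) pairs; click trail analysis would need additional data not modeled here, and the function name and sample log are hypothetical.

```python
from collections import Counter


def analyze_search_log(entries, top_n=3):
    """Summarize a search log given as (query, result_count) pairs.

    Returns (most frequent queries with counts, zero-result queries ordered
    by how often they occurred). Queries are normalized to lowercase.
    """
    counts = Counter()
    zero_hits = Counter()
    for query, result_count in entries:
        q = query.strip().lower()
        counts[q] += 1
        if result_count == 0:
            zero_hits[q] += 1
    return counts.most_common(top_n), [q for q, _ in zero_hits.most_common()]
```

On a sample log where "hcfa" twice returned nothing, that query surfaces as a candidate for a thesaurus entry pointing users to a preferred term.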


Search Beta Demo
