EIM Intro - Information Lifecycle

  • 8/9/2019 EIM Intro - Information Lifecycle

    1/23

    RFCorsello Research Foundation

    Enterprise Information Management: Information Lifecycle

    Introduction

    Information Management is a complex subject covering all aspects of managing information within a given domain or organization

    Sharing between domains and organizations is a large need that brings additional complexity

    Information Management requires an understanding of the three basic states of information

    Data: raw values

    Information: data under a given context

    Knowledge: the emergence of understanding from information

    Information States: From Data to Knowledge

    Data

    Data is:

    Raw values, such as 5 or 5 dollars

    Collections of values, such as a spreadsheet file

    Data does not imply format or structure

    Data itself is the value, not the storage

    Data may be structured or formatted in any way

    Any specific format MAY provide context

    Data within an appropriate context becomes information

    Once removed from its context, or placed in an irrelevant context, information is once again data

    Data may be relevant in many contexts, which is the cornerstone for sharing
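
    As a minimal sketch of the point that the same raw value becomes information only under a relevant context, the Python below places the value 5 under two different contexts. The function `interpret` and the two context dictionaries are hypothetical names introduced for illustration; they are not part of the original material.

```python
def interpret(value, context):
    """Place a raw value under a context to produce a readable statement."""
    return f"{context['subject']}: {value} {context['unit']}"


raw_value = 5  # data: a raw value with no meaning of its own

# Two assumed contexts in which the same value is relevant
price_context = {"subject": "ticket price", "unit": "dollars"}
count_context = {"subject": "open defects", "unit": "items"}

as_price = interpret(raw_value, price_context)
as_count = interpret(raw_value, count_context)

print(as_price)
print(as_count)
```

    The value itself never changes; only the context does, which is why one collection of data can be shared across several uses.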

    Information

    Information is:

    Data under a relevant context

    Context must be relevant to the data

    Context must also be relevant to the observer

    Information is a concept: it does not physically differ from data

    Information implies an ability to gain understanding

    But only if the information is available

    Information must be relevant to the context of the observer

    Data must also be relevant to the context of the observer to be more than simply data

    Knowledge arises from information

    Facts are information known to be correct or true

    Information is data with relevant context

    Data is any value or set of values

    Knowledge

    Knowledge is:

    The theoretical or practical understanding of a subject

    Facts and information

    Awareness or familiarity gained by experience of a fact or situation

    Knowledge arises from information

    Facts are information known to be correct or true

    Poor context for information may greatly impact the ability to derive knowledge

    Information implies an ability to gain understanding

    But only if the information is available

    Knowledge results in the synthesis of information

    Possibly just in the mind

    Tools for content management enable the creation of context

    Linking between contexts and content provides additional information

    Knowledge management is more appropriately information management

    It is the users that translate knowledge into information within the systems

    Data to Knowledge

    From an Information Technology perspective

    All knowledge is represented as information

    All information is represented as data

    Structure of the data as stored provides context

    Derivation of information from data comes from how the data is accessed

    The access methods and data structures form the context

    Knowledge is represented as new information generated from existing information

    One user creates information that a second user accesses

    The second user has gained knowledge from the first

    If software were able to take action based upon the information, that software would also have gained knowledge (an intelligent agent)

    Data is what is managed in a computer system

    The contexts are also represented in a computer as data
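
    The idea that structure and access method together form the context, and that the context is itself stored as data, can be sketched as follows. The schema, the row values and the function `read_record` are assumptions made up for this example.

```python
# The context (a schema) is itself just data held by the system
schema = ("station_id", "temperature_c", "observed_at")

# The stored values carry no meaning until they are read through the schema
row = ("KX-12", 21.5, "2019-08-09T10:00:00")


def read_record(schema, row):
    """Access method: pair each stored value with its field name."""
    if len(schema) != len(row):
        raise ValueError("row does not match the schema")
    return dict(zip(schema, row))


record = read_record(schema, row)
print(record["temperature_c"])
```

    Reading the same row through a different schema would yield different information, even though the stored data is unchanged.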

    Information Lifecycle: From Capture to Disposition

    The Lifecycle

    The information lifecycle is the set of processes by which data comes into existence, is managed over time and is eventually discarded.

    There are generally four basic states of the information lifecycle

    Creation, collection or capture

    Distribution, use and access

    Maintenance, update or change

    Disposition, archival or destruction
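
    One way to picture the four states is as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the allowed transitions in `ALLOWED` and the helper `advance` are assumptions, since the slides do not define which moves between states are permitted.

```python
from enum import Enum


class Phase(Enum):
    CREATION = "creation, collection or capture"
    DISTRIBUTION = "distribution, use and access"
    MAINTENANCE = "maintenance, update or change"
    DISPOSITION = "disposition, archival or destruction"


# Assumed transitions: data is created once, then used and maintained
# repeatedly until it is disposed of.
ALLOWED = {
    Phase.CREATION: {Phase.DISTRIBUTION},
    Phase.DISTRIBUTION: {Phase.MAINTENANCE, Phase.DISPOSITION},
    Phase.MAINTENANCE: {Phase.DISTRIBUTION, Phase.DISPOSITION},
    Phase.DISPOSITION: set(),  # disposed data has left the lifecycle
}


def advance(current, target):
    """Return the target phase if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


phase = advance(Phase.CREATION, Phase.DISTRIBUTION)
print(phase.name)
```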

    Creation

    The creation of data is the process of generating and storing data.

    For some data, the entire lifecycle may be outside of IT systems, such as paper records.

    The data creation phase is broken into three primary areas:

    Capture

    Assessment and Approval

    Ingestion

    Creation: Capture, Assessment, Ingestion

    Capture

    Data capture can be divided into four primary categories:

    Continual or telemetry, where data is automatically generated and fed into an information repository, such as surveillance cameras

    Bulk, or offline, where data is collected and aggregated, then fed in bulk to an information repository

    Manual, which is the traditional human process of collecting data a single item at a time

    Derived, or automated generation, where data is created by performing computations on other data. This includes activities such as:

    Models and simulations

    Statistical analysis

    Interpolation or smoothing
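
    As an example of derived capture, the sketch below creates a new value by linear interpolation between two captured samples. The function name `interpolate` and the sample values are assumptions chosen for illustration.

```python
def interpolate(t, t0, v0, t1, v1):
    """Linearly interpolate a value at time t between samples (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("the two samples must have different times")
    fraction = (t - t0) / (t1 - t0)
    return v0 + fraction * (v1 - v0)


# Two captured samples: value 10.0 at time 0 and value 20.0 at time 10
derived = interpolate(5, 0, 10.0, 10, 20.0)
print(derived)  # 15.0
```

    The derived value was never measured; it exists only because a computation was performed on other data.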

    Assessment and Approval

    The assessment process involves the evaluation of captured data to ensure it meets pre-defined criteria for acceptance

    The two primary parts to the assessment process are:

    Quality Assurance (QA)

    Quality Control (QC)

    QA is the set of practices that are performed to:

    Ensure data will meet acceptance criteria prior to being created. This involves activities such as:

    maintenance and calibration of instruments

    guidance on the proper use of instruments.

    Evaluate collected data to enhance quality assurance activities for future collections.

    Evaluate QC practices and results to ensure quality criteria are met.

    QC is the set of practices that:

    Ensure the data within a repository will meet or exceed quality criteria

    Prevent poor quality data from making it into the public repositories

    The assessment and approval stage ensures that only created data meeting quality and acceptance criteria is available to future users

    Poor assessment and approval practices result in poor quality data being available
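
    A quality control gate of the kind described above might look like the following sketch. The acceptance criteria (a value must be present and fall within a range), the record layout and the function `passes_qc` are all assumptions for illustration; real criteria depend on the domain.

```python
def passes_qc(record, criteria):
    """Return True when a captured record meets the acceptance criteria."""
    value = record.get("value")
    if value is None:  # missing measurements are rejected
        return False
    return criteria["min"] <= value <= criteria["max"]


# Assumed acceptance range for a temperature reading in degrees Celsius
criteria = {"min": -50.0, "max": 60.0}

captured = [
    {"id": 1, "value": 21.5},
    {"id": 2, "value": None},
    {"id": 3, "value": 999.0},
]

accepted = [r for r in captured if passes_qc(r, criteria)]
rejected = [r for r in captured if not passes_qc(r, criteria)]

print([r["id"] for r in accepted])
print([r["id"] for r in rejected])
```

    Only the accepted records would move on to ingestion; the rejected ones can feed back into QA to improve future collections.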

    Ingestion

    Accepted data is loaded into the appropriate business repositories

    The process of ingesting data may involve transformations to match the destination format

    Transformation is a common requirement for automated collection methods

    To maintain a full and verifiable chain of custody

    Raw data is kept in addition to the transformed data

    For space savings, raw data are often archived to an offline store

    Once data is ingested, it is available for consumption

    It is not uncommon for the entire creation process to be automated within a single system
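
    The ingestion steps above (transform to the destination format, but keep the raw data for chain of custody) can be sketched as follows. The comma-separated input format, the field names and the function `ingest` are hypothetical; storing a SHA-256 digest of the raw input is one possible way to make the raw data verifiable, not something the slides prescribe.

```python
import hashlib


def ingest(raw_text):
    """Transform a raw 'station,temperature' line and keep the raw input alongside it."""
    fields = raw_text.strip().split(",")
    transformed = {"station_id": fields[0], "temperature_c": float(fields[1])}
    return {
        "raw": raw_text,  # kept unchanged in addition to the transformed data
        "raw_sha256": hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
        "data": transformed,
    }


entry = ingest("KX-12,21.5\n")
print(entry["data"])
```

    If the raw text is later moved to an offline archive, the stored digest still allows the archived copy to be checked against what was originally ingested.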

    Distribution and Use

    Use of data within a repository is the primary purpose for the data's existence

    Data use is considered in several ways:

    Discoverability

    Accessibility

    Usability

    Distribution and Use: Discoverable, Accessible, Available

    Discoverable

    Once data is within a repository it may be used

    In order to use data it must be discovered by a potential user

    Mechanisms to facilitate the location of data are discovery mechanisms

    If data cannot be found, it cannot be used

    Discoverability is key in the storage of data and the availability of that storage to the user system

    If a user must search in multiple locations to find data, it is of marginal discoverability and use

    For data to be discovered, the discovery data (metadata or catalog) must also be accessible

    A user interface is the central location for discovery to be exposed to a user
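
    A discovery mechanism can be as simple as a single metadata catalog that is searched in one place. The sketch below is a minimal illustration; the catalog entries, repository names and the function `discover` are invented for this example.

```python
# A single catalog describing datasets held in different repositories
catalog = [
    {"dataset": "stream_gauges", "keywords": {"water", "telemetry"}, "location": "repo-a"},
    {"dataset": "parcel_records", "keywords": {"land", "ownership"}, "location": "repo-b"},
]


def discover(catalog, keyword):
    """Return the names of datasets whose metadata mentions the keyword."""
    return [entry["dataset"] for entry in catalog if keyword in entry["keywords"]]


found = discover(catalog, "water")
missing = discover(catalog, "traffic")

print(found)
print(missing)
```

    A dataset with no catalog entry would never appear in the results, which is the practical meaning of "if data cannot be found, it cannot be used".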

    Accessibility

    Data must be accessed to provide value

    The accessibility of data involves aspects such as:

    Security

    Logical location

    Format

    If data is secured so that potential users cannot access it, the value of the data is diminished to those users

    In sensitive domains, this is expected and desired

    Logical location further limits accessibility if the data is contained within a repository that cannot be accessed:

    Behind a firewall

    Simply far away, so that data transfer may take too long

    Data in formats that are proprietary or poorly supported may not be accessible to the tools required

    Overall, accessibility is a balancing act with security, need and cost

    Usability

    Data in an unusable format given the available tools is unusable

    If data must be processed prior to being used it is less usable

    If processing time is long, data may become irrelevant before it is usable

    Usability has subtle implications such as:

    Scale

    Temporal currency

    Accuracy and precision

    Low precision data cannot be used in a high precision analysis

    Cost of data creation is always a trade-off against anticipated use

    Collecting high-quality, high-precision data can always pay off if the cost is shared with users in need of lower-precision data

    Redundant data collections are purely evil

    Maintenance and Disposition: Now that we have it, what do we do with it and how do we get rid of it?

    Maintenance

    Data that changes over time must be maintained

    Data editing is subject to discovery, access and usage in addition to the need for edits

    In some scenarios, only the current values are relevant

    In other scenarios, temporal changes are of greater significance than the current values

    Editing scenarios affect and influence data management strategies.

    The maintenance phase of the lifecycle includes:

    The entire set of practices and processes governing data management and maintenance

    Issues such as archival, availability, continuity of operations (COOP), fault-tolerance, performance and total costs

    The maintenance phase is the longest lived part of the data lifecycle

    All data uses occur within the maintenance phase
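
    The two editing scenarios above (only the current value matters, or the changes over time matter more) lead to different storage choices. The sketch below keeps every edit so that both questions can be answered; the class `VersionedValue` and its method names are assumptions for illustration.

```python
class VersionedValue:
    """Keeps every edit so both the current value and its history are available."""

    def __init__(self):
        self._history = []  # (timestamp, value) pairs in the order they were recorded

    def update(self, timestamp, value):
        self._history.append((timestamp, value))

    def current(self):
        """Return the most recently recorded value, or None if never set."""
        return self._history[-1][1] if self._history else None

    def history(self):
        """Return a copy of all recorded (timestamp, value) pairs."""
        return list(self._history)


water_level = VersionedValue()
water_level.update("2019-08-01", 3.2)
water_level.update("2019-08-09", 3.9)

print(water_level.current())
print(water_level.history())
```

    A system that only ever needs the current value could overwrite in place instead, trading away the history for simpler and cheaper storage.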

    Disposition

    Disposition involves the processes and practices by which data is aged within repositories

    Disposition includes:

    Archival or removal of old data

    Segregation of history data from live data

    Mechanisms for making segregated data available

    It is common that disposition is driven by storage costs and legal mandates such as:

    Sarbanes-Oxley (Sarb-Ox / SOX)

    Clinger-Cohen

    Health Insurance Portability and Accountability Act (HIPAA)

    Once data has been disposed of, it is no longer part of the information lifecycle

    If the data is still available, such as historic data, it is not disposed
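
    Segregating history data from live data can be sketched as a simple age-based split. The record layout, the function `segregate` and the 365-day retention period are assumptions chosen for the example; actual retention periods are set by the applicable mandates and storage costs.

```python
from datetime import date, timedelta


def segregate(records, today, retention_days):
    """Split records into live data and history data based on last update date."""
    cutoff = today - timedelta(days=retention_days)
    live = [r for r in records if r["updated"] >= cutoff]
    history = [r for r in records if r["updated"] < cutoff]
    return live, history


records = [
    {"id": 1, "updated": date(2019, 8, 1)},
    {"id": 2, "updated": date(2012, 1, 15)},
]

live, history = segregate(records, today=date(2019, 8, 9), retention_days=365)

print([r["id"] for r in live])
print([r["id"] for r in history])
```

    The history records here are segregated, not disposed of: as long as they remain available they are still part of the lifecycle.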

    Questions