9b206DW Life Cycle_2

  • 8/2/2019 9b206DW Life Cycle_2

    1/25

    Maintenance

    Occurs when the system is in production

    Includes:

    Technical operational tasks that are necessary to keep the system performing optimally

        usage monitoring

        performance tuning

        index maintenance

        system backup

    Ongoing support, education, and communication with business users

  • 8/2/2019 9b206DW Life Cycle_2

    2/25

    Growth

    DW systems tend to expand (if they were successful)

    Growth is considered a sign of success

    New requests need to be prioritized

    Starting the cycle again

        Building upon the foundation that has already been established

        Focusing on the new requirements

  • 8/2/2019 9b206DW Life Cycle_2

    3/25

    Questions ?

  • 8/2/2019 9b206DW Life Cycle_2

    4/25

    2008/2/4 4

    [Figure: Architecture of Three Tier Data Warehouse. Tiers: bottom tier (data warehouse server), middle tier (OLAP server), top tier (front-end processing). Labeled elements: source databases 1..m, data extraction, data storage, star schema design (fact table with dimension tables 1..n), OLAP implementation (MOLAP or HOLAP or ROLAP), relational views with OLAP, users issuing SQL queries or OLAP commands.]

  • 8/2/2019 9b206DW Life Cycle_2

    5/25

    Data Warehouse for Decision Support

    A database is a collection of data organized by a database management system.

    A data warehouse is a read-only analytical database used for decision support system operation.

    A data warehouse for decision support often takes data from various platforms, databases, and files as source data. The use of advanced tools and specialized technologies may be necessary in the development of decision support systems, which affects tasks, deliverables, training, and project timelines.

    2008/1/29 5

  • 8/2/2019 9b206DW Life Cycle_2

    6/25

    Data Warehouse for End Users

    A data warehouse is made readily user-friendly by the analyst for end users, even those who are not familiar with database structure.

    A data warehouse is a collection of integrated, denormalized databases designed for fast response.

    In general, data warehouse storage is sized for at least 5 years of growth as part of long-term capacity planning.

  • 8/2/2019 9b206DW Life Cycle_2

    7/25

    Cycle

    1. Planning

    2. Gathering Data Requirements and Modeling

    3. Physical Database Design and Development

    4. Data Mapping and Transformation

    5. Data Extraction and Load

    6. Automating the Data Management Process

    7. Application Development - Creating the starter set of reports

    8. Data Validation and Testing

  • 8/2/2019 9b206DW Life Cycle_2

    8/25

    Phase 1: Planning

    Planning for a data warehouse is concerned with:

    Defining the project scope

    Creating the project plan

    Defining the necessary resources, both internal and external

    Defining the tasks and deliverables

    Defining timelines

    Defining the final project deliverables

  • 8/2/2019 9b206DW Life Cycle_2

    9/25

    Capacity Planning

    Calculate the record size for each table

    Estimate the number of initial records for each table

    Review the data warehouse access requirements to predict index requirements

    Determine the growth factor for each table

    Identify the largest target table expected over the selected period of time and add approximately 25-30% overhead to the table size to determine temporary storage size
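
    The capacity planning steps above can be sketched as a small calculation. This is a minimal illustration, not a sizing method from the slides: the table names, record sizes, and growth rates are made up, growth is assumed to compound yearly, and "temporary storage" is read as the largest projected table plus the overhead fraction. Index space is left out.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    """Sizing inputs for one warehouse table (all values are hypothetical)."""
    name: str
    record_bytes: int      # calculated record size
    initial_records: int   # estimated number of initial records
    yearly_growth: float   # growth factor per year, e.g. 0.20 for 20%


def projected_bytes(table: TableEstimate, years: int) -> int:
    """Projected table size after compound growth over the planning period."""
    records = table.initial_records * (1 + table.yearly_growth) ** years
    return round(records * table.record_bytes)


def temp_storage_bytes(tables: list[TableEstimate], years: int,
                       overhead: float = 0.30) -> int:
    """Largest projected table plus the overhead fraction (assumed 25-30%)."""
    largest = max(projected_bytes(t, years) for t in tables)
    return round(largest * (1 + overhead))


# Hypothetical inputs for a five-year planning period.
tables = [
    TableEstimate("sales_fact", record_bytes=120, initial_records=5_000_000, yearly_growth=0.20),
    TableEstimate("customer_dim", record_bytes=400, initial_records=50_000, yearly_growth=0.05),
]
for t in tables:
    print(t.name, projected_bytes(t, years=5))
print("temporary storage:", temp_storage_bytes(tables, years=5))
```

    Keeping the growth factor per table, rather than one global rate, matches the slide's point that fact tables and dimension tables grow at very different speeds.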

  • 8/2/2019 9b206DW Life Cycle_2

    10/25

    Phase 2: Gathering Data Requirements and Modeling

    Gathering Data Requirements:

    How does the user do business?

    How is the user's performance measured?

    What attributes does the user need?

    What are the business hierarchies?

    What data do users use now and what would they like to have?

    What levels of detail or summary do the users need?

  • 8/2/2019 9b206DW Life Cycle_2

    11/25

    Data Modeling

    A logical data model covering the scope of the development project, including relationships, cardinality, attributes, and candidate keys.

    or

    A Dimensional Business Model that diagrams the facts, dimensions, hierarchies, relationships, and candidate keys for the scope of the development project.

  • 8/2/2019 9b206DW Life Cycle_2

    12/25

    Phase 3: Physical Database Design and Development

    Designing the database, including fact tables, relationship tables, and description (lookup) tables.

    Denormalizing the data.

    Identifying keys.

    Creating indexing strategies.

    Creating appropriate database objects.

  • 8/2/2019 9b206DW Life Cycle_2

    13/25

    Phase 4: Data Mapping and Transformation

    Defining the source systems.

    Determining file layouts.

    Developing written transformation specifications for sophisticated transformations.

    Mapping source to target data.

    Reviewing capacity plans.
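
    A source-to-target mapping from this phase can be written down as data rather than prose. The sketch below is only one possible form: the source field names (`CUST_NM`, `ORD_DT`, `AMT_CENTS`), the target column names, and the transformations are all hypothetical.

```python
from datetime import datetime

# Hypothetical mapping: target column -> (source field, transformation).
MAPPING = {
    "customer_name": ("CUST_NM", str.strip),
    "order_date": ("ORD_DT", lambda s: datetime.strptime(s, "%Y%m%d").date().isoformat()),
    "amount": ("AMT_CENTS", lambda s: int(s) / 100),
}


def map_record(source_row: dict) -> dict:
    """Apply the mapping to one source record and return the target record."""
    return {target: transform(source_row[source])
            for target, (source, transform) in MAPPING.items()}


# One made-up source record in the assumed source file layout.
source_row = {"CUST_NM": "  Acme Ltd ", "ORD_DT": "20080129", "AMT_CENTS": "123450"}
target_row = map_record(source_row)
print(target_row)
```

    Holding the mapping in one table-like structure keeps the written transformation specification and the executable version side by side, which makes reviews easier.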

  • 8/2/2019 9b206DW Life Cycle_2

    14/25

    Phase 5: Populating the Data Warehouse

    Developing procedures to extract and move the data.

    Developing procedures to load the data into the warehouse.

    Developing programs or using data transformation tools to transform and integrate data.

    Testing extract, transformation, and load procedures.
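
    The extract, transform, and load procedures listed above can be illustrated with a very small pipeline. This is a sketch under stated assumptions: the CSV extract, the `sales` table, and the rule of rejecting rows with missing quantities are invented for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical extract; in practice this would come from a source system.
SOURCE_CSV = """order_id,product,qty,unit_price
1,widget,2,9.50
2,gadget,1,20.00
3,widget,,9.50
"""


def extract(text: str) -> list[dict]:
    """Read the source extract into a list of records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> tuple[list[tuple], list[dict]]:
    """Convert types and derive the sale amount; set aside rows that fail."""
    clean, rejected = [], []
    for row in rows:
        try:
            qty = int(row["qty"])
            price = float(row["unit_price"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((int(row["order_id"]), row["product"], qty, qty * price))
    return clean, rejected


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load transformed rows into the target table in one transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, product TEXT, qty INTEGER, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(SOURCE_CSV))
load(conn, clean)
print(len(clean), "rows loaded,", len(rejected), "rejected")
```

    Returning rejected rows instead of silently dropping them gives the testing step something concrete to inspect.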

  • 8/2/2019 9b206DW Life Cycle_2

    15/25

    Phase 6: Automating Data Management Procedures

    Automating and scheduling the data load process.

    Creating backup and recovery procedures.

    Conducting a full test of all of the automated procedures.

  • 8/2/2019 9b206DW Life Cycle_2

    16/25

    Phase 7: Application Development - Creating the Starter Set of Reports

    Creating the starter set of predefined reports.

    Developing core reports.

    Testing reports.

    Documenting applications.

    Developing navigation paths.

  • 8/2/2019 9b206DW Life Cycle_2

    17/25

    Phase 8: Data Validation and Testing

    Validating data using the starter set of reports.

    Validating data using standard processes.

    Iteratively changing the data.
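
    The slides do not say what the "standard processes" are. One common check, shown here purely as an assumed example, is to reconcile row counts and measure totals between the source extract and the loaded warehouse rows. The function name `reconcile` and the sample rows are hypothetical.

```python
import math


def reconcile(source_rows: list[dict], warehouse_rows: list[dict], measure: str) -> list[str]:
    """Compare row counts and the total of one measure; return any problems found."""
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count: source={len(source_rows)} warehouse={len(warehouse_rows)}")
    source_total = sum(r[measure] for r in source_rows)
    warehouse_total = sum(r[measure] for r in warehouse_rows)
    # Allow a small absolute tolerance for floating-point rounding.
    if not math.isclose(source_total, warehouse_total, abs_tol=0.005):
        problems.append(
            f"{measure} total: source={source_total} warehouse={warehouse_total}")
    return problems


# Made-up data: one clean load and one load that lost a row.
source = [{"amount": 19.0}, {"amount": 20.0}]
loaded_ok = [{"amount": 19.0}, {"amount": 20.0}]
loaded_bad = [{"amount": 19.0}]

print(reconcile(source, loaded_ok, "amount"))
print(reconcile(source, loaded_bad, "amount"))
```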

  • 8/2/2019 9b206DW Life Cycle_2

    18/25

    Phase 9: Training

    To gain real business value from your warehouse development, users of all levels will need to be trained in:

    The scope of the data in the warehouse.

    The front-end access tool and how it works.

    The DSS application or starter set of reports - the capabilities and navigation paths.

    Ongoing training/user assistance as the system evolves.

  • 8/2/2019 9b206DW Life Cycle_2

    19/25

    Phase 10: Rollout

    Installing the physical infrastructure for all users.

    Developing the DSS application.

    Creating procedures for adding new reports and expanding the DSS application.

    Setting up procedures to back up the DSS application, not just the data warehouse.

    Creating procedures for investigating and resolving data integrity related issues.

  • 8/2/2019 9b206DW Life Cycle_2

    20/25

    Star Schema Database Design

    The goals of a decision support database are often achieved by a database design called a star schema. A star schema design is a simple structure with relatively few tables and well-defined join paths. This database design, in contrast to the normalized structure used for transaction-processing databases, provides fast query response time and a simple schema that is readily understood by the analysts and end users.

  • 8/2/2019 9b206DW Life Cycle_2

    21/25

    Understanding Star Schema Design - Facts and Dimensions

    A star schema contains two types of tables: fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business - the information being queried. This information is often numerical measurements and can consist of many columns and millions of rows. Dimension tables are smaller and hold descriptive data that reflect the dimensions of a business. SQL queries then use predefined and user-defined join paths between fact and dimension tables to return selected information.
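
    A small star schema makes the fact/dimension split concrete. The sketch below uses Python's `sqlite3` module with an in-memory database; the table names (`sales_fact`, `product_dim`, `date_dim`), columns, and rows are hypothetical and chosen only to show one fact table joined to two dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: numerical measurements keyed by the dimensions.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    date_key INTEGER REFERENCES date_dim(date_key),
    units INTEGER,
    revenue REAL
);

INSERT INTO product_dim VALUES
    (1, 'widget', 'hardware'), (2, 'gadget', 'hardware'), (3, 'manual', 'books');
INSERT INTO date_dim VALUES (20080101, 2008, 1), (20080201, 2008, 2);
INSERT INTO sales_fact VALUES
    (1, 20080101, 2, 19.0), (2, 20080101, 1, 20.0),
    (3, 20080201, 4, 40.0), (1, 20080201, 1, 9.5);
""")

# Join the fact table to its dimensions and aggregate by descriptive attributes.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.units), SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN date_dim AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for row in rows:
    print(row)
```

    Every query follows the same shape: filter and group on dimension attributes, aggregate the measures in the fact table. That regularity is what keeps the join paths well defined.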

  • 8/2/2019 9b206DW Life Cycle_2

    22/25

    Dimensions

    1. Look for the elemental transactions within the business process. This identifies entities that are candidates to be fact tables.

    2. Determine the key dimensions that apply to each fact. This identifies entities that are candidates to be dimension tables.

    3. Check that a candidate fact is not actually a dimension with embedded facts.

    4. Check that a candidate dimension is not actually a fact table within the context of the decision support requirement.

  • 8/2/2019 9b206DW Life Cycle_2

    23/25

    Step 1: Look for the elemental transactions within the business process

    The first step in identifying fact tables is to examine the business and identify the transactions that may be of interest. These tend to be transactions that describe events fundamental to the business.

  • 8/2/2019 9b206DW Life Cycle_2

    24/25

    Step 2: Determine the key dimensions that apply to each fact

    The next step is to identify the main dimensions for each candidate fact table. This can be achieved by looking at the logical model and finding out which entities are associated with the entity representing the fact table. The challenge here is to focus on the key dimension entities.

  • 8/2/2019 9b206DW Life Cycle_2

    25/25

    Step 3: Check that a candidate fact is not actually a dimension table with denormalized facts

    Look for denormalized dimensions within candidate fact tables. It may be the case that the candidate fact table is a dimension containing repeating groups of factual attributes.
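
    One rough way to spot repeating groups of factual attributes is to look for columns that share a name stem and differ only by a trailing number. The heuristic below is an assumption made for illustration, not a rule from the slides, and the column names are hypothetical. Once a group is found, the wide row can be unpivoted into one fact row per repeated column.

```python
import re
from collections import defaultdict


def repeating_groups(columns: list[str]) -> dict[str, list[str]]:
    """Group columns that share a stem and differ only by a trailing number."""
    groups = defaultdict(list)
    for col in columns:
        match = re.fullmatch(r"(.+?)_?(\d+)", col)
        if match:
            groups[match.group(1)].append(col)
    # A single numbered column is not a repeating group.
    return {stem: cols for stem, cols in groups.items() if len(cols) > 1}


def unpivot(row: dict, key: str, group_cols: list[str], measure: str) -> list[dict]:
    """Turn one wide row into one fact row per repeated column."""
    return [{key: row[key], "period": col, measure: row[col]} for col in group_cols]


# Hypothetical candidate "fact" row that is really a product dimension
# carrying quarterly sales as repeated columns.
row = {"product_id": 7, "product_name": "widget", "sales_q1": 10, "sales_q2": 12}

groups = repeating_groups(list(row))
facts = unpivot(row, "product_id", groups["sales_q"], "sales")
print(groups)
print(facts)
```

    After the split, the descriptive columns (`product_id`, `product_name`) stay in the dimension table and the unpivoted rows become candidates for a proper fact table.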