9b206DW Life Cycle_2
Uploaded by sachin-kumar
8/2/2019 9b206DW Life Cycle_2
Maintenance
Occurs when the system is in production
Includes:
  Technical operational tasks that are necessary to keep the system performing optimally:
    usage monitoring
    performance tuning
    index maintenance
    system backup
  Ongoing support, education, and communication with business users
-
Growth
DW systems tend to expand if they are successful
Growth is considered a sign of success
New requests need to be prioritized
Starting the cycle again:
  Building upon the foundation that has already been established
  Focusing on the new requirements
-
Questions?
-
Figure: Architecture of Three Tier Data Warehouse
  Bottom tier (data warehouse server): source databases 1 to m, data extraction, data storage in a star schema design (fact table with dimension tables 1 to n)
  Middle tier (OLAP server): OLAP implementation as MOLAP, HOLAP, or ROLAP
  Top tier (front-end processing): users, SQL query, OLAP command, relational views with OLAP
-
Data Warehouse for Decision Support
A database is a collection of data organized by a database management system.
A data warehouse is a read-only analytical database used for decision support system operations.
A data warehouse for decision support often takes data from various platforms, databases, and files as source data. The use of advanced tools and specialized technologies may be necessary in the development of decision support systems, which affects tasks, deliverables, training, and project timelines.
-
Data Warehouse for End Users
A data warehouse is made user-friendly by the analyst for end users, even those who are not familiar with the database structure.
A data warehouse is a collection of integrated, de-normalized databases designed for fast response performance.
In general, data warehouse storage is planned for at least 5 years of growth as part of long-term capacity planning.
-
Cycle
1. Planning
2. Gathering Data Requirements and Modeling
3. Physical Database Design and Development
4. Data Mapping and Transformation
5. Data Extraction and Load
6. Automating the Data Management Process
7. Application Development - Creating the starter sets of reports
8. Data Validation and Testing
-
Phase 1: Planning
Planning for a data warehouse is concerned with:
  Defining the project scope
  Creating the project plan
  Defining the necessary resources, both internal and external
  Defining the tasks and deliverables
  Defining timelines
  Defining the final project deliverables
-
Capacity Planning
  Calculate the record size for each table
  Estimate the number of initial records for each table
  Review the data warehouse access requirements to predict index requirements
  Determine the growth factor for each table
  Identify the largest target table expected over the selected period of time and add approximately 25-30% overhead to the table size to determine temporary storage size
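The capacity planning steps can be turned into a rough calculation. The sketch below is only an illustration: the class name, the table names, and the assumption that the growth factor compounds once per year are not from the slides.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    """Sizing inputs for one warehouse table (all names are hypothetical)."""
    name: str
    record_size_bytes: int   # calculated record size
    initial_records: int     # estimated number of initial records
    annual_growth: float     # growth factor, e.g. 0.20 = 20% more rows per year (assumed compound)

    def projected_bytes(self, years: int) -> float:
        """Projected table size after the selected period."""
        rows = self.initial_records * (1 + self.annual_growth) ** years
        return rows * self.record_size_bytes


def temp_storage_bytes(tables, years: int, overhead: float = 0.25) -> float:
    """Largest projected table plus overhead (the slides suggest roughly 25-30%)."""
    largest = max(t.projected_bytes(years) for t in tables)
    return largest * (1 + overhead)
```

For example, temp_storage_bytes([TableEstimate("sales_fact", 100, 1000, 1.0)], years=2) sizes temporary storage from a table that doubles each year. Index space is not included and would need a separate estimate.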
-
Phase 2: Gathering Data Requirements and Modeling
Gathering Data Requirements:
  How does the user do business?
  How is the user's performance measured?
  What attributes does the user need?
  What are the business hierarchies?
  What data do users use now and what would they like to have?
  What levels of detail or summary do the users need?
-
Data Modeling
A logical data model covering the scope of the development project, including relationships, cardinality, attributes, and candidate keys.
or
A Dimensional Business Model that diagrams the facts, dimensions, hierarchies, relationships, and candidate keys for the scope of the development project.
-
Phase 3: Physical Database Design and Development
  Designing the database, including fact tables, relationship tables, and description (lookup) tables.
  Denormalizing the data.
  Identifying keys.
  Creating indexing strategies.
  Creating appropriate database objects.
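A minimal sketch of these physical design tasks, using Python's built-in sqlite3 module. The table and column names (sales_fact, product_dim, date_dim) are hypothetical, and SQLite only stands in for whatever database server a real warehouse would use.

```python
import sqlite3


def create_star_schema(conn: sqlite3.Connection) -> None:
    """Create one fact table, two lookup tables, their keys, and one index."""
    conn.executescript("""
        CREATE TABLE product_dim (
            product_key   INTEGER PRIMARY KEY,
            product_name  TEXT NOT NULL,
            category_name TEXT NOT NULL   -- denormalized: category kept on the product row
        );
        CREATE TABLE date_dim (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT NOT NULL,
            month_name    TEXT NOT NULL,
            year          INTEGER NOT NULL
        );
        CREATE TABLE sales_fact (
            date_key    INTEGER NOT NULL REFERENCES date_dim(date_key),
            product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
            units_sold  INTEGER NOT NULL,
            revenue     REAL NOT NULL,
            PRIMARY KEY (date_key, product_key)
        );
        -- Indexing strategy (assumed): queries often filter the fact table by product.
        CREATE INDEX sales_fact_product_idx ON sales_fact(product_key);
    """)
```

Calling create_star_schema(sqlite3.connect(":memory:")) builds the objects in a throwaway database, which is enough to try out key and index choices before committing to a production design.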
-
Phase 4: Data Mapping and Transformation
  Defining the source systems.
  Determining file layouts.
  Developing written transformation specifications for sophisticated transformations.
  Mapping source to target data.
  Reviewing capacity plans.
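One way to make a source-to-target mapping concrete is to record it as data, pairing each target column with a source field and a transformation. The field names (CUST_NM, SIGNUP_DT, CR_LIMIT) and the transformations below are invented for illustration.

```python
from datetime import datetime

# Hypothetical mapping: target column -> (source field, transformation function)
CUSTOMER_MAPPING = {
    "customer_name": ("CUST_NM", lambda v: v.strip().title()),
    "signup_date": ("SIGNUP_DT", lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat()),
    "credit_limit": ("CR_LIMIT", lambda v: round(float(v), 2)),
}


def map_record(source_row: dict, mapping: dict) -> dict:
    """Apply the mapping to one source record and return the target record."""
    return {
        target: transform(source_row[source])
        for target, (source, transform) in mapping.items()
    }
```

Keeping the mapping in one table-like structure means the written transformation specification and the code that applies it can be reviewed side by side.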
-
Phase 5: Populating the data warehouse
  Developing procedures to extract and move the data.
  Developing procedures to load the data into the warehouse.
  Developing programs or using data transformation tools to transform and integrate data.
  Testing extract, transformation, and load procedures.
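The extract, transform, and load procedures can be sketched as three small functions. This is an assumption-laden toy: the CSV layout, the orders_fact table, and the rule of skipping rows with no amount are all made up for the example.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list:
    """Extract: read source records from a CSV export."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list) -> list:
    """Transform: trim text, standardize region codes, skip rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed rule: incomplete rows are rejected
        cleaned.append((row["order_id"].strip(), row["region"].strip().upper(), float(row["amount"])))
    return cleaned


def load(conn: sqlite3.Connection, rows: list) -> int:
    """Load: insert the rows in one transaction and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact (order_id TEXT PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM orders_fact").fetchone()[0]
```

Because each step is a separate function, each can be tested on its own before the three are chained together.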
-
Phase 6: Automating Data Management Procedures
  Automating and scheduling the data load process.
  Creating backup and recovery procedures.
  Conducting a full test of all of the automated procedures.
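A rough sketch of one automated cycle: run the load, take a backup, then check that the backup matches. The function names and the sales_fact table are hypothetical, and the scheduling itself is assumed to be handled by an external job scheduler that calls run_load_cycle.

```python
import sqlite3


def sample_load(conn: sqlite3.Connection) -> None:
    """Stand-in for the real load procedure (hypothetical data)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO sales_fact (amount) VALUES (?)", [(10.0,), (20.0,)])


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]


def run_load_cycle(warehouse: sqlite3.Connection, backup: sqlite3.Connection, load_step) -> dict:
    """Load, back up, and verify; returns a small report for the operator."""
    load_step(warehouse)
    warehouse.commit()
    warehouse.backup(backup)  # sqlite3 online backup API (Python 3.7+)
    wh_rows, bk_rows = row_count(warehouse), row_count(backup)
    return {"warehouse_rows": wh_rows, "backup_rows": bk_rows, "backup_matches": wh_rows == bk_rows}
```

A full test of the automated procedures would also restore from the backup and rerun the checks, which this sketch leaves out.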
-
Phase 7: Application Development - Creating the Starter Set of Reports
  Creating the starter set of predefined reports.
  Developing core reports.
  Testing reports.
  Documenting applications.
  Developing navigation paths.
-
Phase 8: Data Validation and Testing
  Validating data using the starter set of reports.
  Validating data using standard processes.
  Iteratively changing the data.
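The slides do not say what the standard validation processes are. One common kind of check, sketched here as an assumption, reconciles the warehouse against its source on row counts, a measure total, and missing keys. The function and parameter names are hypothetical.

```python
def reconcile(source_rows, warehouse_rows, key, measure, tolerance=0.01):
    """Return a list of discrepancies between source and warehouse (empty if none)."""
    issues = []
    if len(source_rows) != len(warehouse_rows):
        issues.append(
            f"row count differs: source={len(source_rows)}, warehouse={len(warehouse_rows)}"
        )
    source_total = sum(row[measure] for row in source_rows)
    warehouse_total = sum(row[measure] for row in warehouse_rows)
    if abs(source_total - warehouse_total) > tolerance:
        issues.append(
            f"{measure} total differs: source={source_total}, warehouse={warehouse_total}"
        )
    missing = sorted({row[key] for row in source_rows} - {row[key] for row in warehouse_rows})
    if missing:
        issues.append(f"keys missing from warehouse: {missing}")
    return issues
```

Each reported issue feeds the iterative step: the load or the data is corrected and the check is run again until the list comes back empty.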
-
Phase 9: Training
To gain real business value from your warehouse development, users of all levels will need to be trained in:
  The scope of the data in the warehouse.
  The front end access tool and how it works.
  The DSS application or starter set of reports - the capabilities and navigation paths.
  Ongoing training/user assistance as the system evolves.
-
Phase 10: Rollout
  Installing the physical infrastructure for all users.
  Developing the DSS application.
  Creating procedures for adding new reports and expanding the DSS application.
  Setting up procedures to back up the DSS application, not just the data warehouse.
  Creating procedures for investigating and resolving data integrity related issues.
-
Star Schema Database Design
The goals of a decision support database are often achieved by a database design called a star schema. A star schema design is a simple structure with relatively few tables and well-defined join paths. This database design, in contrast to the normalized structure used for transaction-processing databases, provides fast query response time and a simple schema that is readily understood by the analysts and end users.
-
Understanding Star Schema Design - Facts and Dimensions
A star schema contains two types of tables, fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business - the information being queried. This information is often numerical measurements and can consist of many columns and millions of rows. Dimension tables are smaller and hold descriptive data that reflect the dimensions of a business. SQL queries then use predefined and user-defined join paths between fact and dimension tables to return selected information.
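A minimal sketch of such a query, again using sqlite3 with hypothetical tables (sales_fact, store_dim) and made-up rows. The fact table holds the numeric measurements, the dimension table holds the descriptive attribute, and the query follows the join path between them.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny star schema in memory with illustrative data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE sales_fact (
            store_key INTEGER REFERENCES store_dim(store_key),
            units     INTEGER,
            revenue   REAL
        );
        INSERT INTO store_dim VALUES (1, 'Pune'), (2, 'Delhi');
        INSERT INTO sales_fact VALUES (1, 3, 30.0), (1, 2, 20.0), (2, 5, 45.0);
    """)
    return conn


def revenue_by_city(conn: sqlite3.Connection) -> list:
    """Join the fact table to its dimension and aggregate the measures."""
    return conn.execute("""
        SELECT d.city, SUM(f.units), SUM(f.revenue)
        FROM sales_fact AS f
        JOIN store_dim AS d ON d.store_key = f.store_key
        GROUP BY d.city
        ORDER BY d.city
    """).fetchall()
```

With the sample rows, revenue_by_city(build_demo_warehouse()) returns [('Delhi', 5, 45.0), ('Pune', 5, 50.0)].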
-
Dimensions
1. Look for the elemental transactions within the business process. This identifies entities that are candidates to be fact tables.
2. Determine the key dimensions that apply to each fact. This identifies entities that are candidates to be dimension tables.
3. Check that a candidate fact is not actually a dimension with embedded facts.
4. Check that a candidate dimension is not actually a fact table within the context of the decision support requirement.
-
Step 1 Look for the elemental transactions within the business process
The first step in identifying fact tables is to examine the business and identify the transactions that may be of interest. They will tend to be transactions that describe events fundamental to the business.
-
Step 2 Determine the key dimensions that apply to each fact
The next step is to identify the main dimensions for each candidate fact table. This can be achieved by looking at the logical model and finding out which entities are associated with the entity representing the fact table. The challenge here is to focus on the key dimension entities.
-
Step 3 Check that a candidate fact is not actually a dimension table with denormalized facts
Look for denormalized dimensions within candidate fact tables. It may be the case that the candidate fact table is a dimension containing repeating groups of factual attributes.
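One rough way to spot repeating groups is to look for column names that share a prefix and differ only in a trailing number, such as sales_q1 to sales_q4. This is a naming heuristic assumed for illustration, not a method given in the slides, and it will miss repeating groups that are named differently.

```python
import re
from collections import defaultdict


def repeating_groups(columns, min_repeats=2):
    """Group columns that share a prefix followed by a number (heuristic only)."""
    groups = defaultdict(list)
    for col in columns:
        match = re.fullmatch(r"(.+?)_?(\d+)", col)
        if match:
            groups[match.group(1)].append(col)
    return {prefix: cols for prefix, cols in groups.items() if len(cols) >= min_repeats}
```

For a candidate table with columns customer_id, name, sales_q1, sales_q2, sales_q3, and sales_q4, the function flags the four sales_q columns as one group, which suggests the table may be a customer dimension carrying denormalized quarterly facts.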