Data Warehousing:

47
Data Warehousing: Jeffrey T. Edgell Assistant Executive Director ManTech Advanced Systems International I ManTech Advanced Systems International, Inc.

description

Data Warehousing:. Jeffrey T. Edgell Assistant Executive Director ManTech Advanced Systems International. Decision Support. - PowerPoint PPT Presentation

Transcript of Data Warehousing:

Data Warehousing:

Jeffrey T. Edgell

Assistant Executive Director

ManTech Advanced Systems International

IManTech Advanced Systems International, Inc.

Decision Support

In order to make correct decisions, accurate, meaningful informationinformation about business environments, external issues, and internal workings must be available in a timely fashion.

A Need For New Technology

• Government and industrial entities have been collecting data in electronic format since the 1960s.

• Today, organizations collect millions of pieces of information about every aspect of their operation on a daily basis.

• Data is obtained from multiple disparate sources.

A Need For New Technology

• Often information is replicated, leading to confusion.

• Related data is often retained in seemingly heterogeneous and incompatible platforms.

• Common data attributes are represented in nonstandard formats and naming constructs across systems.

A Need for New Technology

• Most systems are built for data collection (transaction based).

• Designed to support On-Line Transaction Processing (OLTP).

• Designed to support day-to-day business operations.

• Very specific applications built to support interaction with the data.

A Need for New Technology

• Perform best when handling small specific volumes of data.

• Does not accept information from dissimilar sources readily.

• Are not constructed to handle analysis of large amounts of data efficiently.

A Need for New Technology

• Capable of answering questions of a specific nature and time frame.– How many items do I have in stock today?– How many tickets were sold on a specific date?– What is the current price of an item?

A Need for New Technology

• Transaction based systems experience great difficulty in answering analytical and decision support questions.

• Analysis takes a long time, interfering with:– transaction performance– daily operations

• The nature of the data is dynamic and dispersed.

A Need for New Technology

Most organizations have created a “spider web” of systems and data sources.

A Need for New Technology

All of this has created “data overload” and “data confusion”.– What do I do with all of this data?– What does it mean?– Do I really need this data?– I am overwhelmed with the amount of data I am

confronted with.– I cannot make a timely decision (too much data

from too many sources).

Data Warehousing 101

Data warehousing is: A large historical databaselarge historical database designed to accept

key analytical datakey analytical data from multiple andmultiple and disparate sourcesdisparate sources that manage the day-to-day management of enterprise data. Furthermore, the role of the warehouse is to transform transactiontransaction data into corporate informationcorporate information. The warehouse is provided in a read-only fashion to a user.

Data Warehousing 101

  A data warehouse will provide:

  The ability to ask business analysis questionsbusiness analysis questions in a real-time, iterative fashion,real-time, iterative fashion, obtaining decision support information readily and quickly.

Data Warehousing 101

• A data warehouse is notis not: A repository for all corporate data.

• A data warehouse will notwill not: Single handedly solve all of the problems

associated to an enterprise.

Data Warehousing 101

Key components include the following:– data model

– data storage architecture (relational, proprietary)

– data access/replication/transport

– data transformation and scrubbing

– staging and publication

– metadata

– warehouse hardware and software

Data Warehousing 101

Legacy Data

Legacy Data

External Data Source

DataTransformation

Data Warehouse

Information Directory Repository

Data Warehouse Management Layer

Data Warehouse Architecture

Data Warehousing 101

How a warehouse deals with aging data:

1993 QtrlyData

1994 QtrlyData

1995 QtrlyData

1996 MonthData

1997 MonthData

1998 DetailData

Aggregation

Data Warehousing 101

Model concepts:– Fact table(s)

• A table containing multiple measurable descriptors relating to a specific area of business

• Each fact can be viewed, calculated, and aggregated against various defining areas of the business (time, geography, customer)

Data Warehousing 101

Model concepts:– Dimension Table(s)

• Retains information (product description, geography description, customer description) that is descriptive and remains moderately constant over time

Data Warehousing 101

Data Warehouse Modeling– Special modeling techniques must be applied to

provide rapid response of queries on large volumes of data.

– OLTP systems are built with update operations in mind, resulting in normalization and greatly reduced browse performance.

Data Warehousing 101

Common data model techniques are as follows:– star schema– snowflake– fact constellation– relational

Data Warehousing 101

Sample Star Schema ModelSample Star Schema Model

SALES

TIME

STORE CUSTOMER

GEOGRAPHY

Sales FactsSales Facts

DimensionsDimensions DimensionsDimensions

Data Warehousing 101

Sample Snowflake ModelSample Snowflake Model

SALES

TIME

STORE CUSTOMER

GEOGRAPHY

Sales FactsSales Facts

DimensionsDimensions DimensionsDimensions

North

South

East

West

Month

Qtr

Year

East Region

West Region

Data Warehousing 101

TIME

STORE

DimensionsDimensions

CUSTOMER

GEOGRAPHY

DimensionsDimensions

Regional Sales

District Sales

Store Sales

Sample Fact Constellation ModelSample Fact Constellation Model

Data Marting 101

Data marting is: A functional segmentfunctional segment of an enterprise

restricted for purposes of security, locality, performance, or business necessity using modeling and information delivery techniques identical to data warehousing.

Data Marting 101

Why build a data mart?– Allows an organization to visualize the large

but focus on the small and attainable.– Provides a platform for rapid delivery of an

operational system.– Minimizes risk.– A corporate warehouse can be constructed from

the union of the enterprise data marts.

Data Marting 101

DataWarehouse

Data From

Transaction Sources

Financial

Data Mart

Logistics

Data Mart

Contract

Data Mart

Update From theWarehouse

The data warehouse

populates

the data marts.

Data Marting 101

DataWarehouse

Data FromTransaction Sources

FinancialData Mart

LogisticsData Mart

ContractData Mart

Update From theData Marts

The data marts populatethe data warehouse.

Data Marting 101

Data FromTransaction Sources

FinancialData Mart

LogisticsData Mart

ContractData Mart

Virtual Data Warehouse

Abstract Data Warehouse Access LayerData is moved through the

abstract layer on demand.

The data warehouse layermanages the data marts

as a warehouse.

OLAP 101

• OLAP is a powerful graphics-oriented tool used to access the data warehouse

• OLAP supports– Business analysis queries– Data visualization– Trend analysis– Scenario analysis– User defined queries

OLAP 101

• Drill Down– Move from summary to detail

• Roll Up– Move from detail to summary

• Slice and Dice– Look at a specific interest of the business

OLAP 101

• Pivot and Rotate– Looking at data from varying perspectives

• Drill Through– Move to a near transaction level of detail

OLAP 101

• The flavors of OLAP– Multidimensional On-Line Analytical

Processing (MOLAP)– Relational On-Line Analytical Processing

(ROLAP)– Hybrid On-Line Analytical Processing

(HOLAP)

OLAP 101

• MOLAP– Produces a hypercube– Pre-aggregated and pre-calculated– Rapid response times– Limited in the amount of data that can be

managed

OLAP 101

• ROLAP– Data remains in a relational format– Some degree of aggregation– Slower response times– Scales to large amounts of data

OLAP 101

• HOLAP– Can manage data both as ROLAP and MOLAP– Currently evolving– MOLAP vendors are finding it easier to move

into the HOLAP market space

Data Mining 101

As defined by the Gartner Group in 1995, data mining is:

“…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in a repository, using pattern recognition technologies and statistical and mathematical techniques.”

Data Mining 101

• Data mining requires an analyst who is familiar with the domain to appropriately model scenarios.

• Data mining assists analysts in uncovering nontrivial data relationships.

• Analysis must be conducted to determine the meanings of these newly identified relationships.

Why Use a Data Warehouse ?

• Data warehousing is a must for anyone who uses multiple data sources to make decisions and understand business (trends, forecasting).

• Those who do not move to warehousing will not be capable of responding to problems and business conditions, thus falling behind the competition.

Why Use a Data Warehouse ?

• For organizations wanting to minimize costs and maximize productivity, warehousing is a must.

• Individuals who spend time gathering data instead of analyzing data require the assistance of a warehouse.

• Organizations that collect data but have difficulty determining meanings and impacts need a data warehouse.

Making the Warehouse a Reality

• Think big but work small.

• Match technology to requirements.

• Build for the future (scalability).

• Work closely with the users.– Requirements– Rapid Application Development (RAD) – Periodic releases to the user community

Real World Success Stories

• Radio Shack – Sales and stocking analysis– Marketing (regionalized mailings)

• Wal-Mart– Sales and stock analysis– Trend analysis– Vendor analysis

Real World Success Stories

• Naval Surface Warfare Center (NSWC)– Procurement– Supply– Workload

• Harris Semiconductor– Yield– Product– Personnel productivity

Real World Success Stories

• Defense Logistics Agency (DLA)/ManTech– Trend analysis– Problem identification– Procurement support– Enterprise data analysis

A Few Observations About Data Warehouses

• Industry and our experience indicate that:– Warehouses that succeed average an ROI of 400% with the top end

being as much as 600% in the first year.

– The incremental approach is most successful (build the warehouse a functional area at a time).

– The average time to gather requirements, perform a design, and deploy a warehouse increment is six months.

– New tools may be required that differ from the transaction environment.

• Software oriented toward intelligent analysis and query of the data warehouse

• Hardware oriented to support the massive storage requirements and analytical queries

Keys to Success

• Do you understand why you are building the warehouse?

• Have you identified both technical and business professionals that you will need to build the warehouse?

• Do you have a strong management sponsor?

• Are you managing the expectations of the users?

Careers in Data Warehousing

• System Administration

• DW Architect• Data Architect• DW Manager• DW Administrator• Decision Support

Analysts

• DBA• Application Developer• Data Cleansing/

Transformation Analyst

• Business Analyst• Management

A Real Architecture

DataWarehouse

FutureArchitecture

DLSC

DISCHP V2500

DSD

WSDB

Other

DSCCHP V2500

DSD

WSDB

Other

C & THP V2500

DSD

WSDB

Other

MedicalHP V2500

DSD

WSDB

Other

DISASAMMS

DSCRHP V2500

DSD

WSDB

Other

T1 T1 T1

T1

FLIS

T1MPP or SMP Cluster

Data Movement UsingMQSeries

DISC Users

DSCC Users

DSCR Users

C & T Users

Medical Users

NIPRNET

Internet Other Users

Other Data SourceAll Key DLSC Analytical

Data