1 Chapter 3 Trends In Data Warehousing Paul K Chen Data Warehouse Fundamentals.


Chapter 3

Trends In Data Warehousing

Paul K Chen

Data Warehouse Fundamentals


Data Warehousing is Becoming Mainstream

In the early stages, four significant factors drove many companies to move into data warehousing:

– Fierce competition
– Government deregulation
– Need to revamp internal processes
– Imperative for customized marketing


Walmart vs. Amazon.com

Walmart is the US company most often cited for its successful application and deployment of data warehousing technology.

Walmart filed a lawsuit against Amazon.com, alleging that Amazon had unlawfully acquired its DW technology by hiring away its DA personnel with hefty stock options.


Significant Factors

These significant factors reflect the new trends in data warehousing:

– Multiple Data Types
– Data Visualization
– Parallel Processing
– Query Tools
– Browser Tools
– Data Fusion
– Multidimensional Analysis
– Agent Technology
– E-Business: ERP, KM, CRM


Decision Making and Data Warehousing

“A data warehouse is the data, processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective decision making.”

Structured Data

– Includes traditional relational databases

– Typically internal and enterprise-owned

– Predetermined

Unstructured Data

– Includes articles, reports, images, and videos

– Utilizes external data and expert opinion

– Ad hoc


Decision Making and Data Warehousing

Management Systems
– Extend relational databases to store and support multimedia
– User-defined types (UDT) and functions (UDF) in SQL-3

Specialized Servers
– Used for data which is incompatible with relational databases (e.g., streaming video servers)
– Objects may be linked to a relational database

Search Engines
– Query by Image Content (shape, color, texture, etc.)
– Text retrieval on free-text documents
– Audio and video searching


Decision Making and Data Warehousing

~ The trend is toward unstructured data and ad hoc warehouses. ~

~ Trend toward multimedia. ~


Types of Decision Support Tools

Data Inquiry
– A request for a set of data based on some search criteria

Data Interpretation
– Manipulation and visualization of a set of data (statistical analysis)

Multidimensional Analysis (OLAP)
– n-dimensional spreadsheet analysis

Information Discovery
– Pattern recognition, trends

Browsers
– Search metadata catalogs
– Search information object lists
– Launch analysis tools
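To make "n-dimensional spreadsheet analysis" concrete, here is a minimal Python sketch of a roll-up: the same sales facts are summed along whichever dimensions the analyst chooses to keep. The fact rows, the dimension names, and the `roll_up` helper are all hypothetical and chosen only for illustration; a real OLAP tool would do far more.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
FACTS = [
    ("South", "Widgets", "Q1", 120.0),
    ("South", "Gadgets", "Q1", 80.0),
    ("North", "Widgets", "Q1", 200.0),
    ("South", "Widgets", "Q2", 90.0),
    ("North", "Gadgets", "Q2", 150.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


# One dimension, then two: the same data viewed at different grains.
by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
```

Keeping fewer dimensions gives a coarser summary; keeping more is the "drill down" direction.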


File-based Processing


Types of Decision Support Tools

~ Trend toward utilization of the Web, facilitated by Java. ~


Data Warehouse Architectures

Single Level
– Decision support tools access operational data directly
– Feasible only with “clean” data
– Valid for unstructured data

Two Level Reconciled
– Scrubbed operational data supporting ad hoc queries

Two Level Derived
– Summarized data

Three Level
– Maintains both scrubbed operational data and summarized data
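The difference between the reconciled and derived levels can be sketched in a few lines of Python. This is only an illustration under assumed data: the operational records, the scrubbing rules (trim, normalize case, drop rows with no amount), and the function names are hypothetical, not taken from any particular product.

```python
from collections import defaultdict

# Hypothetical operational records with typical dirt: inconsistent casing,
# stray spaces, and a missing amount.
OPERATIONAL = [
    {"customer": " acme ", "region": "south", "amount": "100.50"},
    {"customer": "ACME", "region": "South", "amount": "20"},
    {"customer": "Globex", "region": "NORTH", "amount": ""},
    {"customer": "Globex", "region": "North", "amount": "75"},
]


def reconcile(records):
    """Reconciled level: scrub the records but keep them at full detail."""
    clean = []
    for rec in records:
        if not rec["amount"]:
            continue  # assumed rule: drop rows with no amount
        clean.append({
            "customer": rec["customer"].strip().title(),
            "region": rec["region"].strip().title(),
            "amount": float(rec["amount"]),
        })
    return clean


def derive(reconciled):
    """Derived level: summarize the scrubbed detail by region."""
    totals = defaultdict(float)
    for rec in reconciled:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


reconciled = reconcile(OPERATIONAL)  # supports ad hoc queries on detail
derived = derive(reconciled)         # supports fast summary queries
```

A three-level architecture keeps both `reconciled` and `derived`; a two-level architecture keeps only one of them.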


Data Warehouse Architectures

~ Trend toward multidimensional data. ~


Data Stores and Access Enablers

Specialized Multidimensional Databases
– Data is preaggregated and loaded into multidimensional databases
– Long loading times but quick response

Relational-like Stores
– Indexing is used to provide pseudo-multidimensional functionality

Relational Data Stores
– An extra semantic layer generates multidimensional data on the fly

Hybrids
– Details are stored in a traditional relational format
– A subset is cached in a multidimensional data structure
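The hybrid approach can be illustrated with a small Python sketch: detail rows stay in a relational table (an in-memory SQLite database here), and aggregated cells are cached in a dictionary keyed by their dimension values. The table layout, the sample rows, and the `total_sales` helper are assumptions made for the example.

```python
import sqlite3

# Relational detail store (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("South", "Widgets", 120.0),
        ("South", "Gadgets", 80.0),
        ("North", "Widgets", 200.0),
    ],
)

# Multidimensional cache: cell coordinates -> aggregated value.
cube_cache = {}


def total_sales(region, product):
    """Answer from the cached cell if present, else aggregate the detail rows."""
    cell = (region, product)
    if cell not in cube_cache:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales "
            "WHERE region = ? AND product = ?",
            cell,
        ).fetchone()
        cube_cache[cell] = row[0]
    return cube_cache[cell]


first = total_sales("South", "Widgets")   # computed from the relational detail
second = total_sales("South", "Widgets")  # served from the cached cell
```

Only the cells that are actually requested end up in the cache, which mirrors the idea that a subset of the data is held in multidimensional form.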


Database Management System (DBMS)


Data Stores and Access Enablers

~ Trend toward multidimensional data. ~


Metadata

Integrated Components
– All components (sources, stores, etc.) use a common metadata repository to maintain their metadata

Standardized Metadata Interchange
– Components keep their own metadata
– Components use a common interchange information model and syntax to share metadata

Synchronized Metadata Interchange
– Metadata changes are updated automatically across all components

Building of Business Metadata
– Manually entered, free-text, plain language descriptions
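As a rough illustration of standardized metadata interchange, the Python sketch below has two components that each keep metadata in their own native shape and export it to one shared model and syntax. Both native formats, the common model (table name plus sorted column names), and the choice of JSON as the interchange syntax are assumptions made for this example.

```python
import json

# Each component keeps metadata in its own (hypothetical) native shape.
ETL_TOOL_METADATA = {"tbl": "SALES", "cols": [("AMT", "NUMBER"), ("REGION", "CHAR")]}
QUERY_TOOL_METADATA = {"name": "sales", "fields": {"amt": "decimal", "region": "string"}}


def export_from_etl(native):
    """Map the ETL tool's native metadata to the assumed common model."""
    return {
        "table": native["tbl"].lower(),
        "columns": sorted(name.lower() for name, _type in native["cols"]),
    }


def export_from_query_tool(native):
    """Map the query tool's native metadata to the same common model."""
    return {
        "table": native["name"].lower(),
        "columns": sorted(field.lower() for field in native["fields"]),
    }


# Both components serialize to one syntax (JSON here), so a receiver needs
# only one parser regardless of which component sent the metadata.
etl_message = json.dumps(export_from_etl(ETL_TOOL_METADATA), sort_keys=True)
query_message = json.dumps(export_from_query_tool(QUERY_TOOL_METADATA), sort_keys=True)
```

Because both exports describe the same table, the two messages come out identical even though the native formats differ.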


Metadata

~ Trend toward better metadata, exchanged between systems. ~


Middleware - Gluing the Warehouse Together

Definition: software that shields users and developers from differences in services and resources used by applications

Data warehouses often have heterogeneous databases, operating systems, networks, hardware, applications


Business Issues for Middleware

Role of middleware
– Assist developer in data extraction/transformation and populating DW
– Assist business user in accessing DW
– Therefore needed at different points in life cycle

Types
– Copy management: data extraction, transformation, replication, and propagation
– Gateways: DB and independent gateways
– Program-to-program: RPCs, TP monitors, ORBs
– Message-oriented


Data Quality

Preprocessing Ownership
– Source application owners know their data
– Warehouse owners still must integrate the entire system

Automated Preprocessing Tools
– Specialized packages
– Generalized tools using pattern processing, lexical analysis, and statistical matching to reconcile a wide range of data sources
– Custom programming

Reliability and Credibility of External Data
– Quality ratings
– Posted statistical meta-information (sample size, randomness, etc.)
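A generalized preprocessing tool might reconcile customer names from different sources along the following lines. This Python sketch is only an approximation of the idea: the legal-form token list, the 0.8 threshold, and the sample names are assumptions, and `difflib.SequenceMatcher` is used as a simple stand-in for the statistical matching a commercial tool would apply.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal-form tokens to ignore when comparing company names.
NOISE_TOKENS = {"corp", "corporation", "inc", "incorporated", "ltd"}


def normalize(name):
    """Lexical step: lowercase, strip punctuation, drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None below the threshold.

    Assumes `candidates` is not empty.
    """
    scored = [
        (SequenceMatcher(None, normalize(name), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, candidate = max(scored)
    return candidate if score >= threshold else None


# Hypothetical names already held in the warehouse.
WAREHOUSE_CUSTOMERS = ["Acme Corporation", "Globex Inc"]
```

Names that fall below the threshold return `None`, which is where a manual review step or custom programming would take over.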


Data Quality

~ Trend toward better understanding of data quality. ~


Significant Trends- Multiple Data Types

[Figure: a central Data Warehouse Repository surrounded by the data types it holds]

– Structured Numeric
– Structured Text
– Unstructured Documents
– Image
– Spatial
– Video
– Audio


Significant Trends- Data Visualization

– More Chart Types (pie chart, scatter plot)
– Interactive Visualization
– Chart Manipulation
– Drill Down


Significant Trends- Parallel Processing

Aims to solve decision-support problems using multiple nodes working on the same problem.

Performs many database operations simultaneously, splitting individual tasks into smaller parts so that tasks can be spread across multiple processors.

Parallel DBMSs must be capable of running parallel queries, parallel data loading, table scanning, data archiving, and backup.


Significant Trends- Parallel Processing

Shared memory architecture (SMP)
– All the servers share all the data

Shared nothing architecture (MPP)
– Each server has its own partition of data
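The shared nothing idea can be sketched in Python as follows: rows are assigned to exactly one node by hashing a key, each node aggregates only its own partition, and the partial results are merged at the end. The sample rows, the node count, and the function names are hypothetical, and threads stand in for separate servers, so this shows the division of work rather than any real speedup.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (store id, amount in whole dollars).
ROWS = [("s1", 10), ("s2", 20), ("s1", 5), ("s3", 7), ("s2", 1), ("s3", 3)]
NODE_COUNT = 3


def partition(rows, node_count):
    """Shared-nothing layout: each row lives on exactly one node."""
    nodes = [[] for _ in range(node_count)]
    for store, amount in rows:
        owner = zlib.crc32(store.encode()) % node_count
        nodes[owner].append((store, amount))
    return nodes


def local_totals(rows):
    """Work done by one node, using only its own partition."""
    totals = Counter()
    for store, amount in rows:
        totals[store] += amount
    return totals


def parallel_totals(rows, node_count=NODE_COUNT):
    """Run the same aggregate on every partition, then merge the partials."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = pool.map(local_totals, partition(rows, node_count))
        merged = Counter()
        for partial in partials:
            merged.update(partial)
    return dict(merged)


totals = parallel_totals(ROWS)
```

Because all rows for a given store hash to the same node, each node's partial totals are already complete for the stores it owns.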


Significant Trends- Query Tools, Browse Tools

– Flexible Presentation: online results and report generator
– Aggregate Awareness
– Crossing Subject Areas
– Multiple Heterogeneous Sources
– Integration
– Overcoming SQL Limitations
– Data Fusion


Significant Trends- Integrating ERP and Data Warehouse

Option 1: Companies implement the data warehouse solution of the ERP vendor with the currently available functionality and await enhancements.

Option 2: Companies implement a customized data warehouse and use third-party tools to extract data from the ERP datasets. Retrieving and loading data from the proprietary ERP datasets is not easy.

Option 3: A hybrid approach that combines the functionality provided by the vendor’s data warehouse with additional functionality from third-party tools.


Significant Trends- Integrating KM and Data Warehouse

What’s KM?

It is a systematic process for capturing, integrating, organizing, and communicating knowledge accumulated by employees.

It is a vehicle to share corporate knowledge so that employees may be more effective and productive in their work.

A knowledge management system must store all such knowledge in a knowledge repository.


Significant Trends- Integrating KM and Data Warehouse

A specific corporate scenario:

Sales have dropped in the South region.

Your marketing VP is able to discern this from your data warehouse by running some queries and doing some preliminary analysis. It would help if he or she also had access to a document prepared by an analyst explaining why the sales are low and suggesting remedial action.

Knowledge must be linked to the sales result to provide context to the sales numbers from the data warehouse.


Significant Trends- Integrating KM and Data Warehouse

An airplane sales scenario: the following information is essential for a successful airplane sales pitch.

– Model configuration
– Production schedule (delivery schedule)
– Part replacement
– Warranty

Knowledge obtained from the knowledge management system can provide context to the information received from the data warehouse, helping users understand the story behind the above information.
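One way to picture the link between the knowledge repository and the data warehouse is to tag documents with the same keys that appear in query results. The Python sketch below is a hypothetical illustration only: the result rows, the document titles, the tagging scheme, and the `with_context` helper are all invented for the example.

```python
# Hypothetical query result from the data warehouse.
SALES_RESULTS = [
    {"region": "South", "quarter": "Q2", "sales": 90.0, "change_pct": -25.0},
    {"region": "North", "quarter": "Q2", "sales": 150.0, "change_pct": 4.0},
]

# Hypothetical knowledge repository entries, tagged with the same keys.
KNOWLEDGE_REPOSITORY = [
    {
        "title": "Why South region sales fell in Q2",
        "tags": {"region": "South", "quarter": "Q2"},
    },
    {
        "title": "Competitor pricing survey",
        "tags": {"region": "South"},
    },
]


def with_context(result, repository):
    """Attach every document whose tags all agree with the result row."""
    documents = [
        doc["title"]
        for doc in repository
        if all(result.get(key) == value for key, value in doc["tags"].items())
    ]
    return {**result, "documents": documents}


annotated = [with_context(row, KNOWLEDGE_REPOSITORY) for row in SALES_RESULTS]
```

The South row comes back with both documents attached, while the North row has none, so the numbers arrive together with whatever explanation exists for them.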


Summary of Trends

– Ad Hoc Questions
– Multidimensional Analysis (OLAP)
– Web-Enabled Data Warehouse
– Multimedia
– Middleware
– Metadata Interchange
– Integrating ERP with Data Warehouse
– Integrating KM with Data Warehouse


Complete E-Business Suite– A Review

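[Figure: business applications sharing One Database; the diagram is also labeled ERP and EAI]

– Marketing
– Sales
– Order Mgt
– Procurement
– Supply Chain (SCM)
– Manufacturing
– Financial
– Services
– Human Resources
– Projects
– Customer Relationship (CRM)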


Information System Categories



Data Warehouse & ERP

– ERP = Enterprise Resource Planning

– A software solution that addresses enterprise needs by taking a process view of the organization and tightly integrating all of its functions to meet the organization’s goals.

– It integrates all the departments and functions across a company into a single computer system that can serve each department’s particular needs.


WHY ERP?

Business

Customer satisfaction

Business development – new areas, products and services

Ability to face competition

Efficient processes required for company’s growth

IT

Present software does not meet business needs

Legacy systems difficult to maintain

Obsolete hardware/software difficult to maintain


How ERP?

ERP combines various department systems into a single, integrated software program that runs off a single database so that the various departments can more easily share information and communicate with each other.

The best part of ERP is the way in which it improves the order fulfillment process, that is, taking a customer order and processing it into an invoice and revenue.

It doesn’t handle the front end; that is handled by CRM (Customer Relationship Management).


How ERP? (cont’d)

When a customer service representative enters a customer order into an ERP system, he or she has all the information necessary to complete the order: the customer’s credit rating and order history from the finance module, the company’s inventory levels from the warehouse module, and the shipping dock’s trucking schedule from the logistics module.

How it’s being done: it integrates the financial information and the customer order information. It does so by integrating the following:

– Database
– Application
– Interfaces
– Tools
– BPR


How ERP? (cont’d)

It standardizes and speeds up the manufacturing process. This saves time, increases productivity, and reduces head count.

It reduces inventory. Because information about all the orders is available, it helps maintain the right level of stock and smooths the manufacturing process.


Data Warehouse & EAI

What is EAI?

EAI stands for Enterprise Application Integration: the merging of applications and data from various new and legacy systems within a business. Various means, including middleware, are employed to accomplish EAI in order to unify IT resources, maximize new ERP investments, diminish errors, and get everyone on the same page.

EAI enables companies to link their existing software applications with each other and with portals, so that the applications can exchange critical data. It is usually close to the top of any CIO's list of concerns.

There are different approaches to EAI. Some rely on linking specific applications with tailored code, but most rely on generic solutions, typically called middleware. XML, combined with SOAP and UDDI, is a kind of middleware.
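The contrast between tailored point-to-point code and generic middleware can be sketched with a toy message bus in Python. Everything here is hypothetical: the `MessageBus` class, the topic name, and the three applications are illustrative stand-ins, not the API of any real middleware product.

```python
from collections import defaultdict


class MessageBus:
    """Toy stand-in for generic middleware: routes messages by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver `message` to every handler subscribed to `topic`."""
        for handler in self._subscribers[topic]:
            handler(message)


# Two hypothetical applications that know the bus but not each other.
billing_inbox = []
shipping_inbox = []

bus = MessageBus()
bus.subscribe("order.created", billing_inbox.append)
bus.subscribe("order.created", shipping_inbox.append)

# The order-entry application publishes once; the bus handles delivery.
bus.publish("order.created", {"order_id": 1001, "customer": "Acme"})
```

With tailored code, every pair of applications that must talk needs its own link, up to n(n-1)/2 links for n applications; with a shared bus, each application needs only one connection.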


E-Business

~ Trend toward better understanding as well as consolidation of internal processes and data ~

~ Trend toward web-enabled data warehouse. ~