1 Chapter 3 Trends In Data Warehousing Paul K Chen Data Warehouse Fundamentals.
-
Upload
anissa-horn -
Category
Documents
-
view
219 -
download
1
Transcript of 1 Chapter 3 Trends In Data Warehousing Paul K Chen Data Warehouse Fundamentals.
1
Chapter 3
Trends In Data Warehousing
Paul K Chen
Data Warehouse Fundamentals
Data Warehousing is Becoming Mainstream
In the early stages, four significant factors drove many
companies to move into data warehousing:
Fierce competition Government deregulation Need to revamp internal processes Imperative for customized marketing
Walmart vs. Amazon.com
Walmart is the US company most quoted for the successful application & deployment of Data Warehousing technology.
Walmart filed lawsuit against Amazon.com for its unlawful way of pirating its DW technology by hiring away its DA personnel by offering hefty stock option to these people.
Significant Factors These significant factors reflect the new trends in data warehousing:
Multiple Data Types Data Visualization Parallel Processing Query Tools Browser Tools Data Fusion Multidimensional Analysis Agent Technology E-Business- ERP, KM, CRM
3
Decision Making and Data Warehousing
“A data warehouse is the data, processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective decision making.”
Structured Data
– Includes traditional relational databases
– Typically internal and enterprise-owned
– Predetermined Unstructured Data
– Includes articles, reports, images, and videos
– Utilizes external data and expert opinion
– Ad hoc
Decision Making and Data Warehousing
Management Systems– Extend relational databases to store and support multimedia– User-defined types (UDT) and functions (UDF) in SQL-3
Specialized Servers– Used for data which is incompatible with relational databases
(e.g., Streaming video servers)– Objects may be linked to a relational database
Search Engines– Query by Image Content (shape, color, texture, etc)– Text retrieval on free-text documents– Audio and video searching
4
Decision Making and Data Warehousing
~ The trend is toward unstructured data and ad hoc warehouses. ~
~ Trend toward multimedia. ~
5
Types of Decision Support Tools
Data Inquiry– A request for a set of data based on some search criteria
Data Interpretation– Manipulation and visualization of a set of data (statistical
analysis) Multidimensional Analysis (OLAP)
– n-dimensional spreadsheet analysis Information Discovery
– Pattern recognition, trends Browsers
– Search metadata catalogs– Search information object lists– Launch analysis tools
6
File-based Processing
7
Types of Decision Support Tools
~ Trend toward utilization of the Web, facilitated by Java. ~
10
Data Warehouse Architectures
Single Level– Decision support tools access operational data directly– Feasible only with “clean” data– Valid for unstructured data
Two Level Reconciled– Scrubbed operational data supporting ad hoc queries
Two Level Derived– Summarized data
Three Level– Maintains both scrubbed operational data, and summarized
data.
11
Data Warehouse Architectures
~ Trend toward multidimensional data. ~
12
Data Stores and Access Enablers
Specialized Multidimensional Databases– Data is peregrinated and loaded into multidimensional
databases– Long loading times but quick response
Relational-like Stores– Indexing is used to proved pseudo-multidimensional
functionality Relational Data Stores
– An extra semantic layer generates multidimensional data on the fly
Hybrids– Details are stored in a traditional relational format– A subset is cached in a multidimensional data structure
13
Database Management System (DBMS)
14
Data Stores and Access Enablers
~ Trend toward multidimensional data.
Metadata
Integrated Components– All components (sources, stores, etc) use a common metadata
repository to maintain their metadata Standardized Metadata Interchange
– Components keep their own metadata– Components use a common interchange information model
and syntax to share metadata Synchronized Metadata Interchange
– Metadata changes are updated automatically across all components
Building of Business Metadata– Manually entered, free-text, plain language descriptions
Metadata
~ Trend toward better metadata, exchanged between systems. ~
Middleware - Gluing the Warehouse Together
Definition: software that shields users and developers from differences in services and resources used by applications
Data warehouses often have heterogeneous databases, operating systems, networks, hardware, applications
Business Issues for Middleware
Role of middleware– Assist developer in data extraction/transformation
and populating DW– Assist business user in accessing DW– Therefore needed at different points in life cycle
Types– Copy management: data extraction, transformation,
replication, and propagation– Gateways: DB and independent gateways– Program-to program: RPCs, TP monitors, ORBs– Message-oriented
15
Data Quality
Preprocessing Ownership– Source application owners know their data– Warehouse owners still must integrate the entire system
Automated Preprocessing Tools– Specialized packages– Generalized tools using pattern processing, lexical analysis,
and statistical matching to reconcile a wide range of data sources
– Custom programming Reliability and Credibility of External Data
– Quality ratings– Posted statistical meta-information (sample size, randomness,
etc)
16
Data Quality
Trend toward better understanding of data quality. ~
17
Significant Trends- Multiple Data Types
Structured Numeric
Structured Text
Unstructured Documents
Image
Data WarehouseRepository
Spatial
Video
Audio
Significant Trends- Data Visualization
More Chart Types-Pie chart, scatter plot Interactive Visualization Chart Manipulation Drill Down
Significant Trends- Parallel Processing
Aims to solve decision-support problems using multiple nodes working on the same problem.
Performs many database operations simultaneously, splitting individual tasks into smaller parts so that tasks can be spread across multiple processors.
Parallel DBMSs must be capable of running parallel queries, parallel data loading, table scanning, and data archiving, and back up.
Significant Trends- Parallel Processing
Shared memory architecture (SMP)– All the servers share all the data
Shared nothing architecture (MPP)– Each server has its own partition of data
Significant Trends- Query Tools, Browse Tools
Flexible Presentation –online results and report generator
Aggregate Awareness Crossing Subject Areas Multiple Heterogeneous Sources Integration Overcoming SQL Limitations Data Fusion
Significant Trends- Integrating ERP and Data Warehouse
Option 1: Companies implement the data warehouse solutions of the ERP vendor with the currently available functionality and await the enhancements.
Option 2: Companies implement customized data warehouse and use third-party tools to extract data from the ERP datasets. Retrieving and loading data from the proprietary ERP datasets is not easy.
Option 3: It is a hybrid approach that combines the functionalities provided by the vendor’s data warehouse with additional functionalities from third-party tools.
Significant Trends- Integrating KM and Data Warehouse
It is a systematic process for capturing, integrating, organizing, and communicating knowledge accumulated by employees.
It is a vehicle to share corporate knowledge so that employees may be more more effective and be productive in their work.
A knowledge management system must store all such knowledge in a knowledge repository.
What’s KM?
Significant Trends- Integrating KM and Data Warehouse
Sales have dropped in the South region.
Your marketing VP is able to discern this from your data warehouse by running some queries and doing some preliminary analysis. If he or she has access to a document prepared by an analyst explaining why the sales are low and suggesting remedial action.
Knowledge must be linked to the sales result to provide context to the sales numbers from the data warehouse.
A specific corporate scenario:
Significant Trends- Integrating KM and Data Warehouse
An airplane sales scenario: The following information is essentialFor a successful pitch for airplane sales.
Model configuration Production schedule (Delivery schedule) Part replacement Warranty
Knowledge obtained from the knowledge management system can provide context to the information received from the data warehouse to understand the story behind the above information.
Summary of Trends
Ad Hoc Questions Multidimensional Analysis (OLAP) Web-Enabled Data Warehouse Multimedia Middleware Metadata Interchange Integrating ERP with Data Warehouse Integrating KM with Data Warehouse
Complete E-Business Suite– A Review
One Database
MarketingSales
Order Mgt
Procurement
Supply Chain (SCM)Manufacturing
FinancialServices
Human Resources
Projects
CustomerRelationship(CRM)
ERP EAI
Information System Categories
Information System Categories
Data Warehouse & ERP
– ERP = Enterprise Resource Planning
– A software solution that addresses enterprise needs taking the process view of an organization to meet the
organization goals tightly integrating all the functions
of an organization.
-- It integrates all the departments and functions across
a company into a single computer system that can serve all those different departments’ particular
needs.
WHY ERP?
Business
Customer satisfaction
Business development – new areas, products and services
Ability to face competition
Efficient processes required for company’s growth
IT
Present software does not met business needs.
Legacy systems difficult to maintain
Obsolete hardware/software difficult to maintain
How ERP?
ERP Combines various department systems into a single, integrated software program that runs off a single database so that the various departments can more easily share information and communicate with each other.
The best part of ERP is the way in which it improves the order fulfillment process that is taking the customer order and process it into an invoice and revenue.
It doesn’t handle the front-end that is handled by CRM (Customer Relationship Management).
How ERP?(cont’d) When a customer service representative enters a customer order to
an ERP system, he has all the information necessary to complete the order such as customer’s credit rating and order history from the finance module, the company’s inventory levels from the warehouse module and the shipping dock’s trucking schedule from the logistics module.
How it’s being done: It integrates the financial information and customer order information . It does so by integrating the following:
Database Application Interfaces Tools BPR
How ERP? (cont’d)
It standardizes and speeds up the manufacturing process. This saves time, increases productivity and reduces head count.
It reduces the inventory. Due to the information available about all the orders it helps to maintain the right level of stock and smoothes the manufacturing process.
Data Warehouse & EAI
What is EAI? EAI refers to Enterprise Application Integration. EAI is the merging of applications and data from various new and legacy systems within a business. Various means are employed to accomplish EAI, including middleware, in order to unify IT resources, maximize new ERP investments, diminish errors and get everyone on the same page. EAI enables companies to link their existing software applications with each other and with portals. EAI provide the ability to get their applications to exchange critical data. EAI is usually close to the top of any CIO's list of concerns. There are different approaches to EAI. Some rely on linking specific applications with tailored code, but most rely on generic solutions, typically called middleware. XML, combined with SOAP and UDDI is a kind of middleware.
E-Business
~ Trend toward better understanding as well as consolidation of internal processes and data ~
~ Trend toward web-enabled data warehouse. ~