CHAPTER SIX Databases and Data Warehouses. Information Granularity Refers to the level of detail of...

47
CHAPTER SIX Databases and Data Warehouses

Transcript of CHAPTER SIX Databases and Data Warehouses. Information Granularity Refers to the level of detail of...

CHAPTER SIX

Databases and Data Warehouses

Information Granularity

Refers to the level of detail of information Detailed (POS transaction) Course (Global sales totals)

Transactional vs. Analytical Information

Transactional information comes from a business process A bank deposit A credit card charge

Analytical information uses transactional data for the purposes of decision making Account balance trends Using credit card history to detect fraud

Transactional vs. Analytical Information

Information Dimensions

Information timeliness Obsolete information is useless Today’s information needs to be

provided in real time or near real time Information quality

Wrong information is useless Redundant information can be the

cause of errors Information must be complete

Data inconsistency and data integrity

Database Management

Characteristics Complex Databases often spread across multiple

servers Databases often spread across multiple

physical disks Fault tolerance is critical

Databases may be distributed

Database Vendors

The industry has consolidated IBM

DB2 Universal Oracle Microsoft

SQL Server Access

Sun (MySQL) Is now Oracle

Database Performance

Transaction Processing Performance Council provides standard benchmarks

TPC-C – Online transaction processing

TPC-E – Online brokerage transactions

TPC-H – Ad-hoc decision support TPC-W – Web / E-commerce

Database Performance (TPC-C)

Multiple transaction types Independent of software and

hardware Scalable Basis is online transaction

processing (OLTP)

1960s Data Management

These are legacy systems Batch processing

Characterized by traditional file processing

Data processing was sequential Not possible to directly locate a particular file

record Data dependent on the programs that

used the data Program data dependence

1970s Data Management

Batch processing gives way to on line transaction processing Files stored on disk rather than tape Any record can be located in the same amount

of time Technologies

Indexed Sequential Access Method (ISAM) Virtual Sequential Access Method (VSAM)

Direct Access files Use a hashing function to derive record keys

1980s Data Management

Databases are becoming commonplace

Personal computer databases are evolving DBase R-Base

1990s Data Management

Huge data stores and transaction processing capabilities

Distributed databases Object-oriented databases 6 Million+ transactions per second

Realities of a DBMS

Data centric rather than application centric Can be a repository for all an organization’s

data Databases tend to be centralized Queries get data from a DBMS

SQL is the standard query language Report generators create printed and Web-

based reports Applications interface with DBMS

Types of Databases

Database models include: Hierarchical database model – A tree-

based structure Network database model –

Mathematically, a directed graph Relational database model – stores

information in the form of logically related two-dimensional tables

Object-oriented databases

Elements of a Database

Logical view and physical view Users see and work with the logical

view Physical view is controlled by the

database management system itself

Entities and Attributes

Relational databases store information in tables (entities) Customer / order / product

Tables contain fields (attributes) Customer name, address

Keys

Each table has a primary key that uniquely identifies each record Natural keys have some meaning (stock

symbol) Artificial keys have no intrinsic meaning

(your R number) Foreign keys are used to link tables

in one-to-many relationships

Database Interaction

Advantages of an RDMS (Scalability)

Database can scale to the terabyte or petabyte range NSA maintains 1.9 trillion telephone call

records Large databases can span several

servers and storage devices

Advantages of an RDBMS (Redundancy)

Databases can be configured to write duplicate (redundant) information Citibank

Journaling and checkpointing are supported

Advantages of an RDBMS (Integrity)

Relational integrity constraints are rules that apply to the relationships between tables

Business integrity constraints enforce business rules Not really a part of the DBMS itself

Advantages of an RDBMS (Information Security)

A DBMS supports advanced access rights By table and fields By time of day By location By row information

Data-driven Web Sites

Nearly all transactional Web sites rely on a database Amazon Your bank Any shopping cart application Ebay or Craig’s List Facebook and You Tube

Database Integration

Databases often need to be integrated Because of mergers and acquisitions Because of organizational changes

We are referring to connections to multiple databases

Data Warehouses (Introduction)

Central source for clean data May contain internal or external data

Use to spot hidden patterns in data May be integrated with operational

database Parts of a data warehouse are called data

marts Data warehouses contain an analytical

component

Cleansing Data

Data is often obtained from a myriad of sources External lists Internal databases Other databases

This data must be cleansed and sanitized to remove Redundancy / errors / etc…

Data Warehouses (Illustration)

Multidimensional Analysis

Data are often analyzed as 3-dimensional cubes

Cubes are then ‘sliced and diced’ to look at various layers

Multidimensional Analysis (Illustration)

The cost of Perfect Information

Database Design (Introduction)

In the systems process, we design before we implement Requirements specification Conceptual design Logical design Physical Design

Database Design Tools

Unified Modeling Language (UML) Visio Rational Rose

Entity relationship diagrams describes relationships between data

Normalization eliminates redundant data

Database Management HR

Database administrators Data managers Programmers and systems analysts Data security

BUSINESS INTELIGENCE / DATA MINING

Business Intelligence (Introduction)

Simply put, it’s internal and external data used to support better decision making

It’s challenging to sift through the mountains of data

It requires cross-functional collaboration between systems

More in the next chapter but we use ERP systems to improve business intelligence

Business Intelligence (Industries)

BI applies to all industries Retail and sales

Understanding procurement and distribution (SCM) / customers (CRM)

Banking Understand credit worthiness / fraud

behavior Insurance

Forecast claim risk and understand at – risk customers

Business Intelligence (Industries)

Airlines Routing planes / minimize turnaround

time (Southwest) Marketing

Demographics Sell based on known customer behavior

(Harrah’s) Amazon

Business Intelligence (Levels)

Operational Day-to-day operations (building a Dell)

Tactical Short term (Dell ordering supplies)

Strategic Long term organizational goals

The systems that provide BI typically do so at all levels

BI Levels (Illustration)

BI and Latency

From the time of acquisition, how long does it take to analyze (analysis latency)

Time to make a decision based on the analysis E-transactions significantly reduce

latency

Data Mining (Introduction)

Data gets mined (analyzed) from data contained in a data warehouse or data mart

Specialized tools are used to analyze data for ‘interesting nuggets’

Ways to mine Drill down (general to specific) Drill up (specific to general)

Data Mining (Clustering)

Cluster analysis groups data by trait or traits

Examples Don’t drink the water in Fallon Segment customers by zip codes

Data Mining (Association)

Answers the question “What traits are associated with other traits” When I stay at Harrah’s,

I gamble I eat at the Sage room

When I stay in Vegas, I gamble more

Data Mining (Statistical Analysis)

It’s basic statistics Analysis of variance Correlation coefficients Etc…

BI Benefits

We can understand what’s happening inside and outside a department Sales knows about product inventory

levels and production schedules Production knows about sales and sales

forecasts Finance knows about the sales

forecasts too This information is provided in near

real time

Quantifying BI

Some benefits can be clearly quantified Costs went down Productivity increased Inventory levels were optimized 10%

Some are indirectly quantified Some benefits are intangible Sometimes, we get unexpected

results