Organizational intelligence technologies

There are three kinds of intelligence: one kind understands things for itself, the other appreciates what others can understand, the third understands neither for itself nor through others. This first kind is excellent, the second good, and the third kind useless.

Machiavelli, The Prince, 1513.

Organizational intelligence

Organizational intelligence is the outcome of an organization’s efforts to collect, store, process, and interpret data from internal and external sources. Intelligence is meant here in the sense of gathering and distributing information.

Types of information systems

Type of information system | System’s purpose
Transaction processing system (TPS) | Collects and stores data from routine transactions
Management information system (MIS) | Converts data from a TPS into information for planning, controlling, and managing an organization
Decision support system (DSS) | Supports managerial decision making by providing models for processing and analyzing data
Business intelligence (BI) | Enables the business to develop a better understanding of its key stakeholders and organizational environment
On-line analytical processing (OLAP) | Presents a multidimensional, logical view of data to the analyst with no requirements as to how the data are stored
Data mining | Uses statistical analysis and artificial intelligence techniques to identify hidden relationships in data

The information systems cycle

Transaction processing systems

• Can generate huge volumes of data
• A telephone company may generate several hundred million records per day
• Raw material for organizational intelligence

The problem

Organizational memory is fragmented

• Different systems
• Different database technologies
• Different locations

An underused intelligence system containing undetected key facts about customers

The data warehouse

• A repository of organizational data
• Can be measured in petabytes (10^15 bytes)

Managing the data warehouse

• Extraction
• Transformation
• Cleaning
• Loading
• Scheduling
• Metadata

Extraction

Pulling data from existing systems:
• Operational systems were not designed for extracting data to load into a data warehouse
• Applications are often independent entities
• Time-consuming and complex
• An ongoing process

Transformation

Encoding: m/f, male/female to M/F
Unit of measure: inches to cm
Field: sales-date to salesdate
Date: dd/mm/yy to yyyy/mm/dd
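As an illustration, the four transformations above could be applied to a single record with a few lines of Python. This is a minimal sketch, assuming a hypothetical source record stored as a dict with the keys `gender`, `height_in`, and `sales-date`, and a hypothetical target field `height_cm`; it does not reflect any particular ETL tool.

```python
from datetime import datetime

# Hypothetical mapping from source gender codes to the warehouse standard M/F.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
CM_PER_INCH = 2.54


def transform(record: dict) -> dict:
    """Apply the four example transformations to one hypothetical source record."""
    out = {}
    # Encoding: m/f and male/female (any case, stray spaces) become M/F.
    out["gender"] = GENDER_CODES[record["gender"].strip().lower()]
    # Unit of measure: inches become centimetres, rounded to two decimals.
    out["height_cm"] = round(record["height_in"] * CM_PER_INCH, 2)
    # Field: sales-date is renamed salesdate.
    # Date: dd/mm/yy is parsed and rewritten as yyyy/mm/dd.
    sale = datetime.strptime(record["sales-date"], "%d/%m/%y")
    out["salesdate"] = sale.strftime("%Y/%m/%d")
    return out


if __name__ == "__main__":
    print(transform({"gender": "female", "height_in": 65, "sales-date": "05/06/13"}))
```

Two-digit years are inherently ambiguous: Python's `%y` maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, so a real extract may need its own century rule.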

Cleaning

• Same record stored in different departments
• Multiple records for a company
• Multiple entries for the same organization
• Misuse of data entry fields
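One simple way to tackle multiple records for the same company is to reduce each name to a normalized matching key before comparing records. The sketch below assumes hypothetical company names and a small, made-up list of legal suffixes; production cleaning usually adds fuzzy matching and manual review.

```python
import re

# Hypothetical legal-form suffixes ignored when comparing company names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "ltd", "limited", "co", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def deduplicate(names: list[str]) -> dict[str, list[str]]:
    """Group raw name variants under their normalized key."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(normalize(name), []).append(name)
    return groups


if __name__ == "__main__":
    raw = ["Acme Corp.", "ACME Corporation", "acme", "Globex Ltd", "Globex"]
    for key, variants in deduplicate(raw).items():
        print(key, variants)
```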

Scheduling

A trade-off:
• Extracting too frequently is costly
• Extracting infrequently means the warehouse holds old data

Metadata

A data dictionary containing additional facts about the data in the warehouse:
• Description of each data type
• Format
• Coding standards
• Meaning
• Operational system source
• Transformations
• Frequency of extracts

Warehouse architectures

• Centralized
• Federated
• Tiered

Centralized data warehouse

Federated data warehouse

Tiered data warehouse

The hardware/software decision

The default is rapidly becoming:
• Hadoop for file management
• MapReduce for programming
• Commodity nodes for processing
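The MapReduce model splits a job into a map step that emits key/value pairs and a reduce step that combines all values sharing a key, with the framework grouping (shuffling) the pairs by key in between. The single-process Python sketch below only imitates that flow to total revenue by location; it is not Hadoop code, and the sales records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (location, channel, revenue).
RECORDS = [
    ("London", "Store", 120),
    ("Paris", "Web", 40),
    ("London", "Web", 15),
    ("Paris", "Store", 90),
]


def map_phase(record):
    """Emit one (key, value) pair per record: location -> revenue."""
    location, _channel, revenue = record
    yield location, revenue


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run(RECORDS))
```

On a cluster, many commodity nodes would run `map_phase` on separate blocks of the file in parallel, and each key's values would be sent to one reducer.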

Exploiting data stores

• Verification and discovery
• Data mining
• OLAP

Verification and discovery

Verification | Discovery
What is the average sale for in-store and catalog customers? | What is the best predictor of sales?
What is the average high school GPA of students who graduate from college compared to those who do not? | What are the best predictors of college graduation?

OLAP

• The relational model was not designed for data synthesis, analysis, and consolidation
• This is the role of spreadsheets and other special-purpose software
• RDBMS technology needs to be complemented with a multidimensional view of data

TPS versus OLAP

TPS | OLAP
Optimize for transaction volume | Optimize for data analysis
Process a few records at a time | Process summarized data
Real-time update as transactions occur | Batch update (e.g., daily)
Based on tables | Based on hypercubes
Raw data | Aggregated data
SQL is widely used | MDX becoming a standard

ROLAP

• A relational OLAP
• A multidimensional model is imposed on a relational structure
• Relational is a mature technology with extensive data management features
• Not as efficient as multidimensional OLAP

The star structure

A central fact table is connected to multiple dimensional tables

A single join can relate the fact table with any one of the dimensional tables
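A star schema and its single fact-to-dimension join can be sketched with Python's built-in `sqlite3` module. The table names (`sales`, `store`, `product`), their columns, and the rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE store   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_group TEXT);
-- Central fact table: a foreign key to each dimension plus the measure.
CREATE TABLE sales (
    store_id   INTEGER REFERENCES store(store_id),
    product_id INTEGER REFERENCES product(product_id),
    units      INTEGER
);
INSERT INTO store   VALUES (1, 'Atlanta'), (2, 'Boston');
INSERT INTO product VALUES (10, 'Desks'), (20, 'Chairs');
INSERT INTO sales   VALUES (1, 10, 5), (1, 20, 12), (2, 10, 3), (2, 20, 7);
"""


def build_db():
    """Create an in-memory database holding the hypothetical star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def units_by_city(conn):
    # A single join relates the fact table to the store dimension.
    return conn.execute(
        """SELECT s.city, SUM(f.units)
           FROM sales AS f JOIN store AS s ON f.store_id = s.store_id
           GROUP BY s.city ORDER BY s.city"""
    ).fetchall()


if __name__ == "__main__":
    print(units_by_city(build_db()))
```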

The snowflake structure

An extension of the star schema to handle very large dimensional tables

Multiple joins might be required to fetch data.
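For contrast, the sketch below normalizes a hypothetical product dimension into `product` and `product_group` tables, so totaling units by product group needs two joins instead of one. Table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
-- The product dimension is split into two tables: the snowflake.
CREATE TABLE product_group (group_id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    group_id   INTEGER REFERENCES product_group(group_id)
);
CREATE TABLE sales (product_id INTEGER REFERENCES product(product_id), units INTEGER);
INSERT INTO product_group VALUES (1, 'Desks'), (2, 'Chairs');
INSERT INTO product VALUES (10, 'Standing desk', 1), (11, 'Writing desk', 1), (20, 'Task chair', 2);
INSERT INTO sales VALUES (10, 4), (11, 6), (20, 9), (20, 2);
"""


def build_db():
    """Create an in-memory database holding the hypothetical snowflake schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def units_by_group(conn):
    # Two joins: fact table -> product -> product_group.
    return conn.execute(
        """SELECT g.group_name, SUM(f.units)
           FROM sales AS f
           JOIN product AS p ON f.product_id = p.product_id
           JOIN product_group AS g ON p.group_id = g.group_id
           GROUP BY g.group_name ORDER BY g.group_name"""
    ).fetchall()


if __name__ == "__main__":
    print(units_by_group(build_db()))
```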

Rotation

Drill down

Region | Sales variance
Africa | 105%
Asia | 57%
Europe | 122%
North America | 97%
Pacific | 85%
South America | 163%

Drilling down on Asia:

Nation | Sales variance
China | 123%
Japan | 52%
India | 87%
Singapore | 95%

A hypercube

A three-dimensional hypercube display

Page: Region (North)
Columns: Sales for Red blob, Blue blob, and Total
Rows: Year (1996, 1997, and Total)

A six-dimensional hypercube

Dimension | Example
Brand | Mt. Airy
Store | Atlanta
Customer segment | Business
Product group | Desks
Period | January
Variable | Units sold

A six-dimensional hypercube display

Page: Month (March) and Segment (Business)
Columns: Product group (Desks, Chairs), each split by Variable (Units, Revenue)
Rows: Brand (Carolina, Mt. Airy), each split by Store (Atlanta, Boston), plus Totals
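One way to picture a hypercube in code is a mapping from a tuple of identifier-dimension values to the variable-dimension values stored in that cell. The Python sketch below assumes hypothetical cells (the `Home` segment and all figures are invented) and shows choosing a page, by fixing month and segment, and then summing over the store dimension to get brand by product group totals.

```python
from collections import defaultdict

# Identifier dimensions, in the order used for each cell's key.
DIMENSIONS = ("brand", "store", "segment", "product_group", "month")

# Hypothetical cells: identifier values -> variable dimension values.
CUBE = {
    ("Carolina", "Atlanta", "Business", "Desks", "March"): {"units": 4, "revenue": 2000},
    ("Carolina", "Boston", "Business", "Desks", "March"): {"units": 2, "revenue": 1100},
    ("Mt. Airy", "Atlanta", "Business", "Chairs", "March"): {"units": 10, "revenue": 1500},
    ("Mt. Airy", "Atlanta", "Home", "Chairs", "March"): {"units": 3, "revenue": 420},
}


def slice_cube(cube, **fixed):
    """Keep only cells whose identifiers match every fixed value (like choosing a page)."""
    positions = {DIMENSIONS.index(dim): value for dim, value in fixed.items()}
    return {
        key: cell
        for key, cell in cube.items()
        if all(key[i] == value for i, value in positions.items())
    }


def roll_up(cube, keep):
    """Sum each variable over every identifier dimension not listed in keep."""
    idx = [DIMENSIONS.index(dim) for dim in keep]
    totals = defaultdict(lambda: defaultdict(int))
    for key, cell in cube.items():
        group = tuple(key[i] for i in idx)
        for variable, value in cell.items():
            totals[group][variable] += value
    return {group: dict(values) for group, values in totals.items()}


if __name__ == "__main__":
    page = slice_cube(CUBE, month="March", segment="Business")
    print(roll_up(page, keep=("brand", "product_group")))
```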

The link between RDBMS and MDDB

MDDB design

Key concepts
• Variable dimensions: what is tracked (e.g., sales)
• Identifier dimensions: tags for what is tracked (e.g., time, product, and store of sale)

Prompts for identifying dimensions

Prompt | Example
When? | June 5, 2013, 10:27am
Where? | Paris
What? | Tent
How? | Catalog
Who? | Young adult woman
Why? | Camping trip to Bolivia
Outcome? | Revenue of €624.00

Transaction data

Transaction data

Face recognition or credit card company

Social media

Variables and identifiers

Identifier: time (hour) | Variable: sales (dollars)
10:00 | 523
11:00 | 789
12:00 | 1,256
13:00 | 4,128
14:00 | 2,634

Identifier: hit | Variable: time (hh:mm:ss)
1 | 9:34:45
2 | 9:34:57
3 | 9:36:12
4 | 9:41:56

Exercise

An international hotel chain has asked you to design a multidimensional database for its marketing department. What identifier and variable dimensions would you select?

Analysis and variable type

Variable dimension | Identifier dimension: continuous | Identifier dimension: nominal or ordinal
Continuous | Regression and curve fitting (e.g., sales by quarter) | Analysis of variance (e.g., sales by store)
Nominal or ordinal | Logistic regression (e.g., customer response (yes or no) to the level of advertising) | Contingency table analysis (e.g., number of sales by region)

Multidimensional expressions (MDX)

A language for reporting data stored in a multidimensional database, with SQL-like syntax:

    SELECT {[measures].[unit sales]}
    ON COLUMNS FROM [sales]

Result: Unit sales = 266,773

Pentaho

• Open source business intelligence project
• Builds on Mondrian, JPivot, and other open source BI products

Data mining

The search for relationships and patterns

Applications:
• Database marketing
• Predicting bad loans
• Detecting flaws in VLSI chips
• Identifying quasars

Data mining functions

• Associations: 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta
• Sequential patterns: 32 percent of female customers who order a red jacket buy a gray skirt within six months
• Classifying: identifying frequent customers as those with incomes about $50,000 and two or more children
• Clustering: market segmentation
• Predicting: predicting the revenue value of a new customer based on that person’s demographic variables
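Association rules such as the wine-and-pasta example are usually scored by support (how often the items occur together across all baskets) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both over a handful of hypothetical shopping baskets; it scores one given rule rather than searching for rules, as algorithms such as Apriori do.

```python
# Hypothetical shopping baskets.
BASKETS = [
    {"wine", "pasta", "cheese"},
    {"wine", "pasta"},
    {"wine", "bread"},
    {"pasta", "bread"},
    {"wine", "pasta", "bread"},
]


def count(baskets, items):
    """Number of baskets containing every item in items."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= basket)


def support(baskets, items):
    """Fraction of all baskets containing every item in items."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    return count(baskets, set(antecedent) | set(consequent)) / count(baskets, antecedent)


if __name__ == "__main__":
    print("support:", support(BASKETS, {"wine", "pasta"}))
    print("confidence wine -> pasta:", confidence(BASKETS, {"wine"}, {"pasta"}))
```

Here three of the four wine baskets also contain pasta, so the rule wine to pasta has a confidence of 0.75 and a support of 0.6.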

Data mining technologies

• Decision trees
• Genetic algorithms
• K-nearest-neighbor method
• Neural networks
• Data visualization
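Of the technologies listed, the k-nearest-neighbor method is the easiest to show in a few lines: a new case receives the majority label of the k most similar known cases. The Python sketch below assumes hypothetical training data (income in thousands of dollars and number of children, labeled as frequent or occasional customers) and plain Euclidean distance.

```python
import math
from collections import Counter

# Hypothetical training data: (income in $000s, number of children) -> label.
TRAINING = [
    ((55, 2), "frequent"),
    ((62, 3), "frequent"),
    ((70, 2), "frequent"),
    ((30, 0), "occasional"),
    ((42, 1), "occasional"),
    ((25, 1), "occasional"),
]


def knn_classify(point, training, k=3):
    """Label point by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda example: math.dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify((58, 2), TRAINING))
    print(knn_classify((35, 1), TRAINING))
```

Because income values are much larger than child counts, income dominates the distance in this sketch; in practice, features are usually rescaled before applying the method.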

SQL-99 and OLAP

SQL can be tedious and inefficient. For example, the following questions require four separate queries:

• Find the total revenue
• Report revenue by location
• Report revenue by channel
• Report revenue by location and channel
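To make the point concrete, the sketch below answers the four questions with four separate `GROUP BY` queries against an in-memory SQLite copy of `exped`. It assumes a simplified table with one row per location and channel, loaded with the revenue figures from the ROLLUP output in this section; a real `exped` table would hold individual sales.

```python
import sqlite3

# Simplified exped data (assumed): one row per location and channel.
ROWS = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]

# One query per question; each is a separate pass over the table.
QUERIES = {
    "total": "SELECT SUM(revenue) FROM exped",
    "by_location": "SELECT location, SUM(revenue) FROM exped GROUP BY location ORDER BY location",
    "by_channel": "SELECT channel, SUM(revenue) FROM exped GROUP BY channel ORDER BY channel",
    "by_location_channel": (
        "SELECT location, channel, SUM(revenue) FROM exped "
        "GROUP BY location, channel ORDER BY location, channel"
    ),
}


def build_db():
    """Create an in-memory database holding the simplified exped table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exped (location TEXT, channel TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO exped VALUES (?, ?, ?)", ROWS)
    return conn


def run_all(conn):
    """Run the four queries and collect their results by name."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}


if __name__ == "__main__":
    for name, rows in run_all(build_db()).items():
        print(name, rows)
```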

SQL-99 extensions

GROUP BY is extended with:
• GROUPING SETS
• ROLLUP
• CUBE

MySQL supports only ROLLUP, and with slightly different syntax (WITH ROLLUP appended to the GROUP BY clause)

GROUPING SETS

    SELECT location, channel, SUM(revenue)
    FROM exped
    GROUP BY GROUPING SETS (location, channel);

GROUPING SETS

Location | Channel | Revenue
null | Catalog | 108762
null | Store | 347537
null | Web | 27166
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716
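SQLite has no GROUPING SETS support, but the same result can be produced by stacking one `GROUP BY` per grouping set with `UNION ALL` and filling the ungrouped column with NULL. The sketch below assumes the same simplified `exped` data as elsewhere in this section (one row per location and channel, using the reported figures).

```python
import sqlite3

# Simplified exped data (assumed): one row per location and channel.
ROWS = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]

# GROUPING SETS (location, channel) as one GROUP BY per set; NULL marks the
# column that is not grouped. SQLite sorts NULL first in ascending order.
GROUPING_SETS_SQL = """
SELECT location, NULL AS channel, SUM(revenue) FROM exped GROUP BY location
UNION ALL
SELECT NULL, channel, SUM(revenue) FROM exped GROUP BY channel
ORDER BY 1, 2
"""


def build_db():
    """Create an in-memory database holding the simplified exped table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exped (location TEXT, channel TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO exped VALUES (?, ?, ?)", ROWS)
    return conn


def grouping_sets(conn):
    return conn.execute(GROUPING_SETS_SQL).fetchall()


if __name__ == "__main__":
    for row in grouping_sets(build_db()):
        print(row)
```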

ROLLUP

    SELECT location, channel, SUM(revenue)
    FROM exped
    GROUP BY ROLLUP (location, channel);

ROLLUP

Location | Channel | Revenue
null | null | 483465
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716
London | Catalog | 50310
London | Store | 151015
London | Web | 13009
New York | Catalog | 8712
New York | Store | 28060
New York | Web | 2351
Paris | Catalog | 32166
Paris | Store | 104083
Paris | Web | 7054
Sydney | Catalog | 5471
Sydney | Store | 21769
Sydney | Web | 2749
Tokyo | Catalog | 12103
Tokyo | Store | 42610
Tokyo | Web | 2003
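ROLLUP (location, channel) is shorthand for the grouping sets (location, channel), (location), and (), the last giving the grand total. The sketch below spells that out with `UNION ALL` in SQLite, which lacks ROLLUP, assuming the same simplified `exped` data (one row per location and channel, using the reported figures).

```python
import sqlite3

# Simplified exped data (assumed): one row per location and channel.
ROWS = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]

# ROLLUP (location, channel) = GROUP BY (location, channel), (location), ().
ROLLUP_SQL = """
SELECT location, channel, SUM(revenue) FROM exped GROUP BY location, channel
UNION ALL
SELECT location, NULL, SUM(revenue) FROM exped GROUP BY location
UNION ALL
SELECT NULL, NULL, SUM(revenue) FROM exped
ORDER BY 1, 2
"""


def build_db():
    """Create an in-memory database holding the simplified exped table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exped (location TEXT, channel TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO exped VALUES (?, ?, ?)", ROWS)
    return conn


def rollup(conn):
    return conn.execute(ROLLUP_SQL).fetchall()


if __name__ == "__main__":
    for row in rollup(build_db()):
        print(row)
```

The rows match the ROLLUP output above, though SQLite's NULL-first ordering places each location subtotal just before that location's detail rows.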

CUBE

    SELECT location, channel, SUM(revenue)
    FROM exped
    GROUP BY CUBE (location, channel);

Location | Channel | Revenue
null | Catalog | 108762
null | Store | 347537
null | Web | 27166
null | null | 483465
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716
London | Catalog | 50310
London | Store | 151015
London | Web | 13009
New York | Catalog | 8712
New York | Store | 28060
New York | Web | 2351
Paris | Catalog | 32166
Paris | Store | 104083
Paris | Web | 7054
Sydney | Catalog | 5471
Sydney | Store | 21769
Sydney | Web | 2749
Tokyo | Catalog | 12103
Tokyo | Store | 42610
Tokyo | Web | 2003
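CUBE generalizes ROLLUP: for n grouping columns it produces all 2^n grouping sets, so two columns give four sets and the 24 rows shown above, while ROLLUP keeps only the n + 1 leading prefixes. The pure-Python sketch below assumes the same simplified `exped` data (one row per location and channel, using the reported figures), uses `itertools.combinations` to enumerate the grouping sets, and lets `None` play the role of SQL null.

```python
from collections import defaultdict
from itertools import combinations

COLUMNS = ("location", "channel")

# Simplified exped data (assumed): one row per location and channel.
ROWS = [
    {"location": loc, "channel": ch, "revenue": rev}
    for loc, ch, rev in [
        ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
        ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
        ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
        ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
        ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
    ]
]


def cube_grouping_sets(columns):
    """Every subset of the grouping columns, largest first: 2**n sets for n columns."""
    return [s for r in range(len(columns), -1, -1) for s in combinations(columns, r)]


def rollup_grouping_sets(columns):
    """Only the leading prefixes of the column list: n + 1 sets for n columns."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def aggregate(rows, grouping_sets):
    """Total revenue per grouping set; columns outside a set are reported as None."""
    totals = defaultdict(int)
    for gset in grouping_sets:
        for row in rows:
            # The None pattern differs per set, so keys from different sets never collide
            # (assuming the data itself contains no None values).
            key = tuple(row[c] if c in gset else None for c in COLUMNS)
            totals[key] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    for key, revenue in aggregate(ROWS, cube_grouping_sets(COLUMNS)).items():
        print(key, revenue)
```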

MySQL version of ROLLUP

    SELECT location, FORMAT(SUM(revenue),0)
    FROM exped
    GROUP BY location WITH ROLLUP;

    SELECT location, channel, FORMAT(SUM(revenue),0)
    FROM exped
    GROUP BY location, channel WITH ROLLUP;

Exercises

Using ClassicModels:
• Compute total payments by country, without and with ROLLUP
• Compute total payments by country and year, without and with ROLLUP
• Compute total value of orders by country and product line, without and with ROLLUP

SQL OLAP extensions

• Useful
• Not as powerful as MDDB tools

Conclusion

• Data management is an evolving discipline
• Data managers have a dual responsibility:
  • Manage data to be in business today
  • Manage data to be in business tomorrow

Data managers now need to support organizational intelligence technologies