Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...


Transcript of Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

Page 1: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 2: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 3: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 4: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: the evolution of database technology]

Data collection and database creation
(1960’s and earlier)
- primitive file processing

Database management systems
(1970’s)
- network and relational database systems
- data modeling tools
- indexing and data organization techniques
- query languages and query processing
- user interfaces
- optimization methods
- on-line transactional processing (OLTP)

Advanced database systems
(mid-1980’s - present)
- advanced data models: extended-relational, object-oriented, object-relational
- application-oriented: spatial, temporal, multimedia, active, scientific, knowledge bases, World Wide Web

Data warehousing and data mining
(late-1980’s - present)
- data warehouse and OLAP technology
- data mining and knowledge discovery

New generation of information systems
(2000 - ...)

Page 5: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: "How can I analyze this data?"]

Page 6: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: Knowledge [gold nuggets] dug out of [a mountain of data] with [a shovel], [a pick], and [beads of sweat]]

Page 7: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: data mining as a step in knowledge discovery: databases and flat files -> Cleaning & Integration -> data warehouse -> Selection & Transformation -> Data Mining -> patterns -> Evaluation & Presentation -> knowledge]

Page 8: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

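[Figure: architecture of a data mining system, from bottom to top: Database and Data Warehouse; data cleaning, data integration, filtering; Database or Data Warehouse Server; Data Mining Engine; Pattern Evaluation; Graphic User Interface. A Knowledge Base sits beside the Data Mining Engine and Pattern Evaluation.]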

Page 9: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 10: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: a data warehouse. The data source in Chicago, the data source in New York, ..., and the data source in Vancouver are cleaned, transformed, integrated, and loaded into the data warehouse, which clients reach through query and analysis tools.]

Page 11: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: a multidimensional data cube]

a) Dimensions: address (cities): Chicago, New York, Montreal, Vancouver; time (quarters): Q1, Q2, Q3, Q4; item (types): home entertainment, computer, phone, security. Labeled cell: <Vancouver,Q1,security>. Cell values shown: 605K, 825K, 14K, 400K.

b) drill-down on time data for Q1: time (months): Jan, Feb, March; cell values shown: 150K, 100K, 150K.
   roll-up on address: address (regions): North, South, East, West.

Page 12: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 13: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 14: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 15: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 16: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...


Page 17: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 18: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: disciplines contributing to data mining: Database Systems, Statistics, Machine Learning, Information Science, Visualization, Other disciplines]

Page 19: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 20: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 21: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 22: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 23: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 24: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 25: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 26: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 27: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 28: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 29: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 30: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 31: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 32: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 33: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 34: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: 3-D data cube with dimensions time (quarters): Q1, Q2, Q3, Q4; item (types): home entertainment, computer, phone, security; location (cities): Chicago, New York, Montreal, Vancouver]

[Figure: 4-D data cube drawn as three 3-D cubes over time (quarters), item (types), and location (cities), one each for supplier = "SUP1", supplier = "SUP2", supplier = "SUP3"]

Page 35: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: lattice of cuboids over the dimensions time, item, location, supplier]

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
4-D (base) cuboid: time, item, location, supplier
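The lattice on this page contains one cuboid for every subset of the four dimensions, which gives 2^4 = 16 cuboids. As a minimal sketch, assuming each cuboid can be represented simply as a tuple of dimension names, the levels can be enumerated as follows. The function name `cuboids_by_level` is a hypothetical helper, not something from the source.

```python
from itertools import combinations

# Dimension names as they appear in the lattice figure.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboids_by_level(dimensions):
    """Map k -> list of k-D cuboids, each cuboid a tuple of dimension names."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

lattice = cuboids_by_level(DIMENSIONS)
for k, cuboids in lattice.items():
    # The empty tuple is the apex cuboid, labeled "all" in the figure.
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D cuboids ({len(cuboids)}): {'; '.join(names)}")
```

The level sizes come out as 1, 4, 6, 4, 1, matching the apex, 1-D, 2-D, 3-D, and base rows of the figure.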

[Figure: star schema with one fact table and four dimension tables]

Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
Time Dimension: time_key, day, day_of_week, month, quarter, year
Item Dimension: item_key, item_name, brand, type, supplier_type
Branch Dimension: branch_key, branch_name, branch_type
Location Dimension: location_key, street, city, province_or_state, country
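To make the star schema on this page concrete, here is a small sketch that builds the fact table and two of the joins in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`sales_fact`, `time_dim`, and so on), the column types, and every inserted row are assumptions made for illustration; only the column names come from the figure.

```python
import sqlite3

SCHEMA = """
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       day_of_week TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
                         branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city TEXT, province_or_state TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold INTEGER
);
"""

def build_demo_warehouse():
    """Create the star schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO time_dim VALUES (1, 15, 'Sat', 'Jan', 'Q1', 2000)")
    conn.execute("INSERT INTO time_dim VALUES (2, 20, 'Thu', 'Apr', 'Q2', 2000)")
    conn.execute("INSERT INTO item_dim VALUES "
                 "(1, 'laptop', 'BrandA', 'computer', 'wholesale')")
    conn.execute("INSERT INTO branch_dim VALUES (1, 'B1', 'retail')")
    conn.execute("INSERT INTO location_dim VALUES "
                 "(1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada')")
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, 1, 1200.0, 2), (1, 1, 1, 1, 600.0, 1), (2, 1, 1, 1, 1800.0, 3)],
    )
    return conn

def dollars_by_quarter_and_city(conn):
    """Join the fact table to two dimensions and total dollars_sold."""
    query = """
        SELECT t.quarter, l.city, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN time_dim AS t ON f.time_key = t.time_key
        JOIN location_dim AS l ON f.location_key = l.location_key
        GROUP BY t.quarter, l.city
        ORDER BY t.quarter, l.city
    """
    return conn.execute(query).fetchall()

conn = build_demo_warehouse()
print(dollars_by_quarter_and_city(conn))
```

Each query touches the central fact table and joins outward along one key per dimension, which is the access pattern the star layout is designed for.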

Page 36: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: snowflake schema, with the supplier and city attributes split into their own dimension tables]

Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
Time Dimension: time_key, day, day_of_week, month, quarter, year
Item Dimension: item_key, item_name, brand, type, supplier_key
Supplier Dimension: supplier_key, supplier_type
Branch Dimension: branch_key, branch_name, branch_type
Location Dimension: location_key, street, city_key
City Dimension: city_key, city, province_or_state, country

Page 37: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: fact constellation schema, with two fact tables sharing dimension tables]

Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
Shipping Fact: item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
Time Dimension: time_key, day, day_of_week, month, quarter, year
Item Dimension: item_key, item_name, brand, type
Branch Dimension: branch_key, branch_name, branch_type
Location Dimension: location_key, street, city, province_or_state, country
Shipper Dimension: shipper_key, shipper_name, location_key, shipper_type

Page 38: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 39: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 40: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 41: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: concept hierarchy for the dimension location, with levels all, country, province_or_state, city]

all
  Canada
    British Columbia: Vancouver, Victoria, ...
    Ontario: Toronto, ...
    Quebec: Montreal, ...
    ...
  USA
    New York: New York, ...
    California: Los Angeles, San Francisco, ...
    Illinois: Chicago, ...
    ...

[Figure: attribute orderings. location: street < city < province_or_state < country. time: day < month < quarter < year, and day < week < year.]
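The location hierarchy on this page maps each city to a province or state and then to a country. A minimal sketch of that mapping as a lookup table, using only the place names shown in the figure, might look like this. The names `LOCATION_HIERARCHY` and `generalize` are hypothetical and chosen for illustration.

```python
# City -> (province_or_state, country), using the names shown in the figure.
LOCATION_HIERARCHY = {
    "Vancouver": ("British Columbia", "Canada"),
    "Victoria": ("British Columbia", "Canada"),
    "Toronto": ("Ontario", "Canada"),
    "Montreal": ("Quebec", "Canada"),
    "New York": ("New York", "USA"),
    "Los Angeles": ("California", "USA"),
    "San Francisco": ("California", "USA"),
    "Chicago": ("Illinois", "USA"),
}

# Levels from most specific to most general.
LEVELS = ("city", "province_or_state", "country", "all")

def generalize(city, level):
    """Return the ancestor of `city` at the requested hierarchy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    province, country = LOCATION_HIERARCHY[city]
    ancestors = {
        "city": city,
        "province_or_state": province,
        "country": country,
        "all": "all",
    }
    return ancestors[level]

print(generalize("Vancouver", "province_or_state"))
print(generalize("Chicago", "country"))
```

Replacing every city value in a data set with `generalize(city, "country")` is the kind of generalization a roll-up along this hierarchy performs.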

Page 42: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: concept hierarchy for price]

($0 - $1,000]
  ($0 - $200]: ($0 - $100], ($100 - $200]
  ($200 - $400]: ($200 - $300], ($300 - $400]
  ($400 - $600]: ($400 - $500], ($500 - $600]
  ($600 - $800]: ($600 - $700], ($700 - $800]
  ($800 - $1,000]: ($800 - $900], ($900 - $1,000]
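The price hierarchy on this page nests right-closed intervals of width $1,000, $200, and $100. As a sketch, assuming prices are plain numbers in the range ($0 - $1,000], the interval containing a price at each level can be computed with a ceiling division. The function names `price_interval` and `price_path` are hypothetical.

```python
import math

def price_interval(price, width):
    """Return (low, high) with low < price <= high for intervals of `width` dollars."""
    if not 0 < price <= 1000:
        raise ValueError("price must lie in ($0 - $1,000]")
    # Rounding up puts a boundary price such as 200 into ($0 - $200],
    # which matches the right-closed intervals in the figure.
    high = math.ceil(price / width) * width
    return (high - width, high)

def price_path(price):
    """Intervals containing `price`, from the most general level to the most specific."""
    return [price_interval(price, width) for width in (1000, 200, 100)]

print(price_path(250))
print(price_path(200))
```

A price of 250 falls in ($0 - $1,000], then ($200 - $400], then ($200 - $300]; a price of exactly 200 belongs to ($0 - $200] and ($100 - $200] because the upper bound is inclusive.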

Page 43: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: OLAP operations on a data cube with dimensions location (cities): Chicago, New York, Montreal, Vancouver; time (quarters): Q1, Q2, Q3, Q4; item (types): home entertainment, computer, phone, security. Cell values shown: 605K, 825K, 14K, 400K.]

- roll-up on location (from cities to countries): location (countries): US, Canada
- drill-down on time (from quarters to months): time (months): January, February, March, April, May, June, July, August, September, October, November, December; cell values shown: 150K, 100K, 150K
- slice for time="Q2"
- dice for (location="Montreal" or "Vancouver") and (time="Q1" or "Q2") and (item="home entertainment" or "computer")
- pivot
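The roll-up, slice, dice, and pivot operations named on this page can be sketched on a tiny cube stored as a dictionary keyed by (location, time, item). All cell values below are made up for illustration, and the function names are hypothetical. Drill-down is left out because it needs month-level data that this toy cube does not hold.

```python
from collections import defaultdict

# Cells keyed by (city, quarter, item). All sales values are made up.
CUBE = {
    ("Vancouver", "Q1", "computer"): 8,
    ("Montreal", "Q1", "computer"): 3,
    ("Vancouver", "Q2", "computer"): 9,
    ("Vancouver", "Q1", "phone"): 1,
    ("Montreal", "Q1", "home entertainment"): 5,
    ("Montreal", "Q2", "security"): 4,
    ("Chicago", "Q2", "computer"): 7,
    ("New York", "Q3", "phone"): 2,
}

COUNTRY_OF = {"Vancouver": "Canada", "Montreal": "Canada",
              "Chicago": "US", "New York": "US"}

def roll_up_location(cube):
    """Roll up on location: aggregate city cells into country cells."""
    out = defaultdict(int)
    for (city, quarter, item), value in cube.items():
        out[(COUNTRY_OF[city], quarter, item)] += value
    return dict(out)

def slice_time(cube, quarter):
    """Slice: fix one time value, leaving a 2-D (location, item) table."""
    return {(loc, item): v for (loc, q, item), v in cube.items() if q == quarter}

def dice(cube, locations, quarters, items):
    """Dice: keep the sub-cube whose coordinates fall in the given sets."""
    return {key: v for key, v in cube.items()
            if key[0] in locations and key[1] in quarters and key[2] in items}

def pivot(table):
    """Pivot a 2-D table by swapping its two axes."""
    return {(b, a): v for (a, b), v in table.items()}

q2 = slice_time(CUBE, "Q2")
sub_cube = dice(CUBE, {"Montreal", "Vancouver"}, {"Q1", "Q2"},
                {"home entertainment", "computer"})
print(roll_up_location(CUBE)[("Canada", "Q1", "computer")])
print(q2)
print(sub_cube)
print(pivot(q2))
```

The `slice_time` and `dice` calls use the same selection conditions as the figure labels: time="Q2" for the slice, and the Montreal/Vancouver, Q1/Q2, home entertainment/computer combination for the dice.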

Page 44: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: starnet model with one radial line per dimension]

location: street, city, province_or_state, country, continent
time: day, month, quarter, year
customer: name, category, group
item: name, brand, category, type

Page 45: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 46: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 47: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: data warehousing architecture, from bottom to top]

Data Cleaning and Data Integration: Operational Databases and External sources, passed through Extract, Clean, Transform, Load, Refresh
Data Storage: Data Warehouse, Data Marts, Metadata Repository, Monitoring, Administration
OLAP Engine: OLAP Server, OLAP Server
Front-End Tools: Query/Report, Analysis, Data Mining
Output

Page 48: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...

[Figure: data warehouse development. Define a high-level corporate data model; build Data Marts and an Enterprise Data Warehouse, with model refinement along each path; then Distributed Data Marts; then a Multi-Tier Data Warehouse.]

Page 49: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...
Page 50: Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining ...