1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund, Knowledge Discovery...

56
1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund, www-ai.cs.uni-dortmund.de Knowledge Discovery in Databases (KDD) The Mining Mart approach Case studies – Item sales – Intensive care

Transcript of 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund, Knowledge Discovery...

Page 1: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

1

Intelligent Choices Preceeding Data Analysis

Katharina MorikUniv. Dortmund, www-ai.cs.uni-dortmund.de

• Knowledge Discovery in Databases (KDD)• The Mining Mart approach• Case studies

– Item sales– Intensive care

Page 2: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

2

The UCI Library Approach

• Learning task: classification• Evaluation criteria: accuracy and coverage• Data sets

– Small number of examples– Small number of features– All and only relevant features included – No noise

Page 3: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

3

KDD Task

• Learning task of the application needs to be transferred to a formal learning task (classification, regression, clustering)– „I want to predict sales 4 weeks ahead“– „I want to know more about my best (worst) customers“– „I want to detect fraud“

• Databases– Very large number of records– Very large number of features– Relevant fatures missing– Noise included

Page 4: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

4

Observation

Experienced users can apply any learning system successfully to any application, since they prepare the data well...

• The representation LE of examples and the choice of a sample determines the applicability of learning methods.

• A chain of data transformations (learning steps or manual preprocessing) leads to LE of the method that delivers the desired result.

Experienced users remember prototypical successful transformation/learning chains

Page 5: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

5

LE1

LH1

LE2

LH2

...

LEn-1

LHn-1 = LEn

LHn

LHn+m

LEn+m

...

LHn+1

= LEn+1

learning/data mining

The Real Process

application:data users performance system

Page 6: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

6

Intelligent Choices

80% of the KDD work is invested into:• Choosing the learning task• Sampling• Feature generation, extraction, and selection• Data cleaning• Model selection or tuning the hypothesis space• Defining appropriate evaluation criteria

Page 7: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

7

The Mining Mart Approach

Best practice cases of preprocessing chains exist...

• Data, LE and LH are described on the meta level.

• The meta-level description is presented in application terms.

• MiningMart users choose a case and apply the corresponding transformation and learning chain to their application.

... and more can be obtained!

Page 8: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

8

Call for Participation

• MiningMart develops an operational meta-language for describing data and operators.

• MiningMart prepares the first cases of KDD.• MiningMart will present the case-base in the WWW.• You may contribute to the endavour!

– Apply the meta-language to your application and deliver it as a positive example to the case-base; or

– apply a case of MiningMart to your data.

Page 9: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

9

The Consortium

• Katharina Morik Univ. Dortmund, D (Coordinator)• Lorenza Saitta Univ. Piemonte del Avogadro, I• Pieter Adriaans Perot Systems Netherland, NL

? Michael May GMD, D• Jörg-Uwe Kietz SwissLife, CH• Fabio Malabocchia TILab, I

Page 10: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

10

The Mining Mart System

Manual Pre-processingOperators: Time multi-relation

Meta-dataApplicability

ML-Operators: Time Parameters Features Description

Logic

Raw-data

Meta-data

Augmented data of results

Meta-data

KDD process tasks, problem models

Case-base of successful KDD process

Human Computer Interface

Page 11: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

11

The Meta Model for Meta Data

The Relational Modeldescribes the database

The Conceptual Modeldescribes the individuals and classes of the domainwith their relations

The Case Modeldescribes chains of preprocessing operators

The Execution Modelgenerates SQL statementsor calls to external tools

Page 12: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

12

Use of the Meta Model

• The meta model is stored in a database.• The database manager delivers the relational model.• The data analyst delivers the conceptual model.• The KDD expert delivers or adjusts a case model.

First cases are delivered by the Mining Mart project.• The system compiles meta data into SQL statements

and calls to external tools hence executing the case model on the data.

Page 13: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

13

Sales of Items of a Drugstore

0

20

40

60

80

100

120

140

160

01

|96

07

|96

13

|96

19

|96

25

|96

31

|96

37

|96

43

|96

49

|96

03

|9

7 09

|97

15

|97

21

|97

27

|97

33

|97

39

|97

45

|97

51

|97

05

|98

11

|98

17

|98

23

|98

29

|98

35

|98

41

|98

47

|98

53

|98

Week

Sal

es

Insect killers 1Insect killers 2Sun milkCandles 1Baby food 1BeautySweetsSelf-tanning creamCandles 2Baby food 2

Page 14: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

14

Learning Task 1:Predict Sales of an Item

Given drug store sales data of 50 items in 20 shops over 104 weeks

predict the sales of an item such that

the prediction never underestimates the sale,

the prediction overestimates less than the rule of thumb.

Observation: 90% of the items are sold less than 10 times a week.

Requirement: prediction horizon is more than 4 weeks ahead.

Page 15: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

15

Shop Application -- Data

Shop Week Item1 ... Item50Dm1 1 4 ... 12Dm1 ... ... ... ...Dm1 104 9 ... 16Dm2 1 3 ... 19... ... ... ... ...Dm20 104 12 ... 16

LE DB1: I: T1 A1 ... A 50; set of multivariate time series

Page 16: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

16

Preprocessing

• From shops to items: multivariate to univariate

LE1´: i:t1 a1 ... tk ak

For all shops for all items:Create view Univariate asSelect shop, week, itemi

Where shop=“dmj”From Source;

• Multiple learning

Dm1_Item1...

1 4 ... 104 9

Dm1_Item50 1 12... 104 16

....

Dm20_Item50 1 14... 104 16

Page 17: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

17

Method 1 for Task 1:Exponential Smoothing

• Univariate time series as input ( LE1` ),

• incremental method: current hypothesis h and new observation o yield next hypothesis by h := h + o, where is given by the user,

• predicts sales of n-next week by last h.

Page 18: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

18

Method 2 for Task 1:SVM in the Regression Mode

• Multiple learning: for each shop and each item, the support vector machine learns a function which is then used for prediction.

• Asymmetric loss:– underestimation is multiplied by 20,

i.e. 3 sales too few predicted -- 60 loss– overestimation is counted as it is,

i.e. 3 sales too much predicted -- 3 loss(Stefan Rüping 1999)

Page 19: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

19

Further Preprocessing

• Obtaining many vectors from one series by sliding windows

LH5 i:t1 a1 ... tw aw

move window of size w by m stepsDm1_Item1_1Dm1_Item1_2

12

4...4...

56

78

...Dm1_Item1_100 100 6... 104 9

...

...Dm20_Item50_100 100 12... 104 16

Page 20: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

20

window horizon value to predict

time

sales

Article 766933 (bag?)

Page 21: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

21

horizon SVM exp. smoothing

1 56.764 52.40

2 57.044 59.04

3 57.855 65.62

4 58.670 71.21

8 60.286 88.44

13 59.475 102.24

Comparison with Exponential Smoothing

Page 22: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

22

loss

horizon

Page 23: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

23

Learning Task 2:Learning Sequences

Are there typical sequences that are valid for all items?– After an action for an item its sales decrease.– Each decrease of sales is followed by an increase.

• Given a set of subsequent eventsfind frequent sequences.

Page 24: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

24

Univariate time series

?

Subsequent events

?

LHn-1 = LEn

LHn frequent event sequences

From Sales Data toEvent Sequences

Multivariate time series

?

Page 25: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

25

From Series to Sequences

• Given some time series detect events (states, intervals)

An event is a triple (state, begin,finish).

The state might be a label or a (mean) value.

Typical labels are: increase, decrease, stable...

Page 26: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

26

Unsupervised Methods

• All contiguous observations within one level (range) form one event (Bauer).

• All contiguous observations with more or less the same gradient form one event (Morik, Wessel).

• Clusters of subsequences form events (Das).

Page 27: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

27

Moving Gradient

•• •

• • •

• • •

Time1 4 5 7 10 12

Determining the time intervals with user-given tolerance threshhold.Abstracting into classes of gradients: increase,peak,decrease, stable...

Page 28: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

28

Sales of Item 182830 in Shop 55

Page 29: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

29

Summarizing Sales byTolerant Moving Gradient

(Wessel, Morik 1999)

Page 30: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

30

Univariate time series

Moving gradient

Subsequent events

?

LHn-1 = LEnLHn frequent event sequences

From Subsequent Eventsto Event Sequences

Multivariate time series

?

Page 31: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

31

Transformation into Facts

stable(182830,1,33,0).decreasing(182830, 33,34,-6).stable(182830, 34, 39,0).increasing(182830, 39, 40,7).decreasing(182830, 40, 42,-5).stable(182830, 42,108,0).

LE4:

Page 32: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

32

Summarizing Item 646152 in Shop 55 by Intolerant Moving Gradient

Page 33: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

33

Corresponding Facts

increasing(646152,1,2,3).decreasing(646152,2,3,-11).increasingPeak(646152,3,4,22)....stable(646152, 25,37,0).increasing(646152, 37, 38, 8).decreasing(646152, 38, 39, -7).stable(646152, 39,40, 0).increasing(646152, 40, 41,7).decreasing(646152, 41, 42,-8).increasing(646152, 42, 43,10).stable(646152, 43, 48,-1).

small time intervals

Page 34: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

34

Method 3 for Task 2: Inductive Logic Programming

• Rules about sequences:p1(I, Tb, Te, A r), p2(I, Te, Te2, As) p3(I, Te2, Te3, A t)

• Results for sequences of sales trends:increasing (Item, Tb, Te) decreasing(Item, Te, Te2) increasing (Item, Tb, Te), decreasing(Item, Te, Te2)

stable(Item, Te2, Te3)

Page 35: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

35

Same Data -- Several Cases

• Predict sales of a particular item in a particular shopmultivariate to univariate, multiple exponential smoothing ORmultivariate to univariate, sliding windows, multiple learning with regression SVM

• Find relations between trends that are valid for all sales in all shopsmultivariate to univariate, summarizing, transformation into facts, rule learning

Page 36: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

36

Applications in Intensive Care

• On-line monitoring of intensive care patients• high-dimensional data about patient and medication• measured every minute• stored in the Emtec database of patient records ---• learning when to intervene in which way.

Page 37: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

37

Patient G.C., male, 60 years old Hemihepatektomie right

Page 38: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

38

The Data

LE DB2 i 1: t 1 a 1 1 ... a 1 k

i1: t 2 a 2 1 ... a 2 k

...

i2: t 1 a 1 1 ... a 1 k

...

set of rows for each patient:1 row for each minute

Page 39: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

39

Preprocessing

• Chaining database rowsi 1: t 1 a 1 1

... a 1 k, t 2 a 2 1 ... a 2 k , ...

• Multivariate to univariatei 1: t 1 a 1, t 2 a 1

... t m a 1

i 1: t 1 a 2, t 2 a 2 ... t m a 2

...

• Detecting level changes and outliers

Page 40: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

40

Phase State Analysis

Phase state )1ty,t(yyt +=r

Time series Ny,...,1y

Deter-ministicProcess

yt

time t yt

yt+1

AR(1)-process with outlier (AO)

yt

timet yt

yt+1

Heart rate

HRt

time t yt

yt+1

U.Gather, M. Bauer

Page 41: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

41

Level Change Detection

level_change(pat4999, 50, 112, hr, up)

level_change(pat4999, 112, 164, hr, down)

level_change(pat4999, 10, 74, art, constant)

level_change(pat4999, 74, 110, art, down)

Computed Feature

Comparing norm values for a vital sign and its mean in a time interval (± standard deviation):

deviation(pat4999, 10, 74, art, up)

Page 42: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

42

Learning Task 3:Recommend Interventions for Patients

Are there valid rules

for all multivariate time series,

such that therapeutical interventions follow from a patient’s state?

Page 43: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

43

Method 3:Inductive Logic Programming

Given patient records in the form of facts:• deviations -- time intervals• therapeutical interventions -- time points• types of vital signs (group1: hr, swi, co; group2: art, vr)

Learn rules about interventions:

group1(V), deviation(P, T1, T2, V, Dir)

noradrenaline(P, T2, Dir)

Page 44: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

44

The Chain of Preprocessing Steps

LE DB2 : i 1: t 1 a 1 1

... a 1 k

i1: t 2 a 2 1 ... a 2 k

... i2: t 1 a 1 1

... a 1 k

chaining db rowsi 1: t 1 a 1 1

... a k1 t 2 a 1 2

... a k 2 ...i2: t 1 a 1 1

... a k 1t 2 a 12

... a k 2 ...

multi- tounivariatei 1: t 1 a 1 1 t 2 a 1 2

i 1: t 1 a 2 1 t 2 a 2 2

...

level changes(i 1,t i ,t j,A)...

computedfeature(i 1,t i ,t j,A,D)

relational learningp 1(I,T i,T j,A,D), p 2 (I,Tj,Tk,A,D)

p 3(I,Tk, Dir)

Page 45: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

45

Learning Task 4:Predict Next Minute‘s

Intervention

Given a patient’s state at time ti,

learn whether and how to intervene at t i+1

Preprocessing:• Selection of time points where an intervention was done• Multiple to binary class

for each drug, form the concepts drug_up, drug_down

• Multiple learning for each binary class resulting inclassifiers for each drug and direction of dose change (SVM_light)

Page 46: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

46

The Chain of Preprocessing Steps

LE DB2 : i 1: t 1 a 1 1

... a 1 k

i1: t 2 a 2 1 ... a 2 k

... i2: t 1 a 1 1

... a 1 k

Select time points with interventionsi 1: t i a 1 i

... a ki

i2: t j a 1j ... a kj

...

Form binary classesa1_up +: a 2 ...

a k

...

a1_up-: a 2 ...a k

....

a6_down+: a 2 ...a k..

a6_down-: a 2 ...

a k....

Learning classifiers using SVM_lighta1_up +: w2 a 2 ...

wk a k

...

a6_down +: w2 a 2 ... wk a k

Page 47: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

47

Same Data -- Several Cases

• Find time relations that express therapy protocols chaining db rows, multivariate to univariate, level changes, deviations, RDT

• Predict intervention for a particular drug select time points, multiple to binary class, SVM_light

Page 48: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

48

Behind the Boxes

Db schemaindicating time attribute(s),granularity,...

Select statement in abstract form,instantiated by db schema

Creating views in abstract form,instantiated by db schema andlearning task

Calling SVM_light and writing results

Syntactictransformation for SVM

Multiple learning control

Page 49: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

49

Functionality of MD-Compiler

Table_x Table_y Table_zBase Tables

V_01

RowSelection

V_02 V_03

RowSelection RowSelection

Views,created by theMD-Compiler

V_01 vc_1

MultiColumnFeatureConstruction

vc_2 vc_3

V_04

FeatureSelection

RowSelection

V_05

Manual preprocessing operators of M4 are very elementary.

Results of operators are mostly views.

Page 50: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

50

Several view definitions

CREATE VIEWV_01(attrib_a , attrib_b , attrib_c )ASSELECTx_id , x_a, x_bFROM Table_x;

Information from M4-Relational(BaseAttribute)

Inline-View

Physical View-Object

created by MD-Compiler

for reading data and

executing statistics

Inline-View

stored as sql-string

in M4-Relational

Inline View versus Physical View-Object

Page 51: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

51

Several view definitions

Inline View versus Physical View-Object

Table_x Table_y Table_zBase Tables

V_01

RowSelection

V_02 V_03

RowSelection RowSelection

Views,created by theMD-Compiler

V_01 vc_1

MultiColumnFeatureConstruction

vc_2 vc_3

V_04

FeatureSelection

RowSelection

V_05

Create View V_05 (...) as

select ... from (select ... from (select ... from Table_x)

instead of

Create View V_05 (...) as select ... from V_04

Page 52: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

52

Several view definitions

Materialized view:

Materialized_View_1

Table_x Table_y

V_01

V_04

Table_z

V_02 V_03

V_05

V_06

V_07

Created by the MD-Compiler

automatically in the background

+ performance gain when selecting

data from V_04 or V_07

+ all operation-outputs can be realized

as views- additional storage needed

Page 53: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

53

System-Architecture

Mining Mart Database

T1

T4

T2 T3

T5

T6

MD Compiler

Java-Code

M4-RelationEditor

Statistics

PL/Sql

TimeOperators

Java-Codefrom UniDo

M4-Relational Model

M4-Case Model

M4-Concept

EditorM4-Conceptual Model

M4-Case Editor

Page 54: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

54

Summary of Cases Involving Time

Db schemaindicating time attribute(s)

Inductive logic programming (RDT)SVM_light for classificationSVM for regressionExponential smoothing

SyntactictransformationsL E1

...LE4

Sliding windowsSummarizing windowsLevel changes

Page 55: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

55

Summary

• Preprocessing is the key issue in data analysis!• Goal: Support users in making intelligent choices• Approach: Cases of best practice• View of a computer scientist:

– Scalability to very large databases– Meta-data driven processing

• Case studies on analysing data involving time

Page 56: 1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund,  Knowledge Discovery in Databases (KDD) The Mining.

56

MiningMart Approach

• Manager -- end-userknows about the business case

• Database manager knows about the data

• Case designer -- power-userexpert in KDD

• Developer supplies (learning) operators