Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA -...
![Page 1: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/1.jpg)
Data Mining: How to make islands of knowledge emerge out of oceans of data
Hugues Bersini
IRIDIA - ULB
![Page 2: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/2.jpg)
PLAN
Rapid intro to data warehousing
Data mining: understand and predict
Two super (but non-comprehensible) techniques of data mining:
– Lazy for time series prediction
– Bagfs for classification
![Page 3: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/3.jpg)
The Data Miner Steps
Data Warehousing
Data Preparation
– Cleaning + Homogenisation
– Transformation - Composition
– Reduction
– For time series: time adjustment
Data Modelling: what researchers are mainly interested in.
![Page 4: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/4.jpg)
Data Warehouse
![Page 5: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/5.jpg)
![Page 6: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/6.jpg)
![Page 7: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/7.jpg)
Re-organization of data
Subject-oriented
Integrated
Transversal
With history
Non-volatile
From production data ---> to decision-based data
![Page 8: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/8.jpg)
![Page 9: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/9.jpg)
![Page 10: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/10.jpg)
Data Mining
Understand and predict
![Page 11: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/11.jpg)
Modelling the data: only if structure and regularities in the data
Data mining IS NOT OLAP
To understand the data
To predict new data
WHY ??
![Page 12: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/12.jpg)
The main techniques of data-mining
Clustering
Outlier detection
Association analysis
Forecasting
Classification
![Page 13: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/13.jpg)
Data Mining: to understand and/or to predict
discovering structure in data
discovering I/O relationship in data
![Page 14: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/14.jpg)
Nothing new under the sun
New methods extending old ones in the domain of non-linear (NN) and
symbolic (decision tree)
Exponential explosion of data
Extracting from huge databases
More sensitive than ever
![Page 15: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/15.jpg)
CEDITI, September 2, 1998
[Diagram: data stores are exploited to reach decisions]
Data volume doubles every 18 months world-wide
Problem
• How to extract relevant knowledge for our decisions from such amounts of data?
Solutions
• Throw it away before using it (most popular)
• Query it (Query and OLAP tools)
• Summarize it: extract essence from the bulk according to targeted decision (Data Mining)
![Page 16: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/16.jpg)
Discovering structure in data
When in a space with a metric
– Hierarchical clustering
– K-Means
– NN clustering - Kohonen’s map
In a space without any metric but with a cost function:
– Grouping Genetic Algorithms ....
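As an illustration of the metric case, here is a minimal K-Means sketch in plain Python. The slides only name the method; the toy points, the squared Euclidean distance and the hand-picked initial centroids are assumptions made for this example.

```python
def k_means(points, centroids, n_iter=10):
    """Lloyd-style K-Means: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance is the (assumed) metric.
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical toy data: two well-separated groups in the plane.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids, final_clusters = k_means(POINTS, [[0, 0], [10, 10]])
print(final_centroids)
```

A point that ends up far from every centroid is a natural outlier candidate, which is one way clustering and outlier detection connect.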
![Page 17: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/17.jpg)
Clustering and outlier
![Page 18: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/18.jpg)
Market Basket Analysis: Association analysis
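Quantity bought

| Transn. | Juice | Tea | Coffee | Milk | Sugar | Pop |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 2 | 2 | 4 | 3 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | 1 | 2 | 1 | 1 | 0 | 0 |
| 6 | 0 | 2 | 1 | 3 | 2 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 6 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 4 | 0 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 1 | 1 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 | 0 | 6 |
| 12 | 0 | 0 | 1 | 1 | 0 | 0 |
| 13 | 0 | 0 | 0 | 0 | 0 | 5 |
| 14 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 1 | 2 | 0 | 2 | 0 | 0 |
| 16 | 0 | 1 | 1 | 1 | 2 | 1 |
| 17 | 1 | 0 | 1 | 0 | 0 | 0 |
| 18 | 2 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 0 | 0 | 0 | 2 |
| 20 | 3 | 0 | 0 | 0 | 0 | 3 |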
![Page 19: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/19.jpg)
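Calculation of Improvement
IMPROVEMENT = (N * x_ij) / (n_i * n_j)
where N is the number of transactions, x_ij the number of transactions containing both items i and j, and n_i and n_j the numbers of transactions containing item i and item j respectively.

| Improvement | Juice | Tea | Coffee | Milk | Sugar | Pop |
|---|---|---|---|---|---|---|
| Juice | 0 | 0.95 | 0.82 | 0.82 | 0 | 0.17 |
| Tea | 0.95 | 0 | 1.9 | 2.38 | 3.33 | 0.56 |
| Coffee | 0.82 | 1.9 | | | | |
| Milk | 0.82 | | | | | |
| Sugar | 0 | 3.33 | | | | |
| Pop | 0.17 | | | | | |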
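The improvement formula can be checked against the basket table with a few lines of Python. This is a sketch under one assumption that the slide does not state: an item counts as present in a transaction whenever its quantity is positive, so N is the number of transactions, n_i the number containing item i, and x_ij the number containing both. Under that reading the Juice/Tea and Tea/Sugar entries, for example, come out as in the table.

```python
# Quantities bought per transaction, copied from the market basket table.
ITEMS = ["Juice", "Tea", "Coffee", "Milk", "Sugar", "Pop"]
BASKETS = [
    [0, 0, 0, 0, 0, 0], [0, 2, 2, 4, 3, 0], [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0], [1, 2, 1, 1, 0, 0], [0, 2, 1, 3, 2, 0],
    [0, 0, 0, 0, 0, 6], [0, 0, 0, 0, 0, 0], [4, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 6], [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 5], [0, 0, 0, 0, 0, 0], [1, 2, 0, 2, 0, 0],
    [0, 1, 1, 1, 2, 1], [1, 0, 1, 0, 0, 0], [2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 2], [3, 0, 0, 0, 0, 3],
]


def improvement(baskets, i, j):
    """Improvement between item columns i and j: N * x_ij / (n_i * n_j).

    Assumption: an item is 'in' a transaction when its quantity is > 0.
    """
    n_total = len(baskets)
    n_i = sum(1 for row in baskets if row[i] > 0)
    n_j = sum(1 for row in baskets if row[j] > 0)
    x_ij = sum(1 for row in baskets if row[i] > 0 and row[j] > 0)
    if n_i == 0 or n_j == 0:
        return 0.0
    return n_total * x_ij / (n_i * n_j)


tea, sugar, juice = (ITEMS.index(name) for name in ("Tea", "Sugar", "Juice"))
print(round(improvement(BASKETS, tea, sugar), 2))
print(round(improvement(BASKETS, juice, tea), 2))
```

A value above 1 means the two items are bought together more often than independence would predict; a value below 1 means less often.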
![Page 20: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/20.jpg)
Discovering I/O relationship in data
Classification: I = (x, y), O = the class
Time series prediction: I = x(t), O = x(t+1)
Understanding the I/O relationship
Predicting which O for a new I ??
[Figures: classes scattered in the (x, y) plane; a time series x(t) plotted against t]
![Page 21: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/21.jpg)
IRIDIA's track record in data mining
Recognition of glass defects at Glaverbel
Prediction of stock market fluctuations with MasterFood and D'Ieteren
Incident recognition and electrical load prediction with Tractebel
Analysis of air traffic delays with Eurocontrol
Industrial process modelling with Honeywell, FAFER and Siemens
User-friendly Internet search engine with the Walloon Region
Pixel classification for satellite images
![Page 22: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/22.jpg)
Financial prediction
Task: predict the future trends of the financial series.
[Figure: daily stock market index (MIB), plotted from 1994 to 1999]
Goal: automatic trading system to anticipate the fluctuations of the market.
![Page 23: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/23.jpg)
Economic variables
[Figure: car registrations in Belgium per year, 1984 to 1998]
Task: predict how many cars will be registered next year.
Goal: support the marketing campaign of a car dealer.
![Page 24: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/24.jpg)
Modeling of industrial plants
Task: predict the flow stress of the steel plate as a function of the chemical and physical properties of the material.
Rolling steel mill
Goal: cope with different types of metals, reduce the production time and improve final quality.
![Page 25: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/25.jpg)
Control
Task: model the dynamics of the plant on the basis of accessible information.
Waste water treatment plant
Goal: control the level of water pollutants.
![Page 26: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/26.jpg)
Environmental problems
Task: predicting the biological state
(e.g. density of algae communities)
as a function of chemicals.
Goal: automate the analysis of the state of the river by monitoring chemical concentrations.
Algae summer blooming
![Page 27: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/27.jpg)
In the medical domain
automatic diagnosis of cancer
detection of respiratory problems
electrocardiogram analysis
help for paraplegics
![Page 28: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/28.jpg)
APPLICATION OF DATA MINING IN THE CANCER DOMAIN:
Application to diagnostic and prognostic assistance in tumour pathology.
In collaboration with the Laboratory of Histopathology (R. Kiss), Faculty of Medicine, U.L.B.
![Page 29: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/29.jpg)
[Diagram labels: patient, tumour, surgery, DIAGNOSIS (pathologists), adjuvant treatment, clinical assessment]
histological criteria:
- loss of differentiation
- invasion
cytological criteria:
- size of nuclei
- mitoses
- areas of hyperchromatism
low, moderate, high
Improved diagnosis
Better-matched treatment
Increased survival
![Page 30: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/30.jpg)
Example:
Primary brain tumours (adults): GLIOMAS
EPENDymomas, OLIGOdendrogliomas, ASTROcytomas
[Diagram: malignancy grades (II, III, IV, some marked "?") set against SURVIVAL, and INVASION of healthy tissue (none, slight, strong) set against SURVIVAL]
![Page 31: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/31.jpg)
“Objectification” of diagnostic elements
quantification of criteria (cytological and histological)
computer-assisted microscopy
[Figure: scatter plot of "+" and "o" samples, with unlabelled "x?" cases still to be classified]
data processing
Extraction of reliable and reproducible diagnostic and/or prognostic information
![Page 32: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/32.jpg)
500 to 1000 nuclei per tumour.
30 tumour variables:
• mean
• standard deviation

| Nuclear DNA Content | Morphonuclear Features and Chromatin Texture Attributes |
|---|---|
| DNA histogram type (DHT) | P1 = Nuclear Area (NA) |
| DNA index (DI) | P2 = Integrated Optical Density (IOD) |
| % Diploid Cell Nuclei (%2C) | P3 = Mean Optical Density (MOD) |
| % Hyperdiploid (%H2C) | P4 = Skewness (SK) |
| % Triploid (%3C) | P5 = Variance of Optical Density (VOD) |
| % Hypertriploid (%H3C) | P6 = Kurtosis (K) |
| % Tetraploid (%4C) | P7 = Short Run Length (SRL) |
| % Hypertetraploid (%H4C) | P8 = Long Run Length (LRL) |
| % Pentaploid (%5C) | P9 = Grey Level Distribution (GLD) |
| | P10 = Relative Distribution Frequencies (RLD) |
| | P11 = Relative Distribution Percentage (RLP) |
| | P12 = Local Mean (LM) |
| | P13 = Energy (E) |
| | P14 = Coefficient Variance (CV) |
| | P15 = Co-occurrence Matrix (C) |
![Page 33: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/33.jpg)
Application to remote sensing

| ARIOS | C4.5 | Bagfs |
|---|---|---|
| arable land: cultivated soil (29.7%) | 50.0 | 35.7 |
| discontinuous urban fabric (23.1%) | 22.5 | 48.2 |
| arable land: bare soil (22%) | 36.2 | 17.3 |
| grassland (15%) | 56.4 | 31.8 |
| broad-leaved trees (6.6%) | 75.6 | 17.1 |
| industrial or commercial areas (2%) | 92.0 | 33.2 |
| road networks and associated land (0.7%) | 80.0 | 17.6 |
| water bodies (0.3%) | 38.0 | 3.4 |
| continuous urban fabric (0.2%) | 40.0 | 2.1 |
| rail networks and associated land (0.2%) | 51.3 | 5.1 |
| conifers (0.1%) | 68.7 | 0.5 |
| Error rate | 48.2 | 32.3 |
| Kappa | 0.48 | 0.60 |
![Page 34: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/34.jpg)
Bagfs
![Page 35: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/35.jpg)
On internet
The Hyperprisme project
Text Mining
Automatic profiling of users
– Key words: positive, negative,…
Automatic grouping of users on the basis of their profiles
See Web
![Page 36: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/36.jpg)
Different approaches
Model Data
Comprehensible Non comprehensible
Local Global
Non readable
SVM
Accuracy of prediction
![Page 37: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/37.jpg)
Understanding and Predicting
Building Models
A model needs data to exist but, once it exists, it can exist without the data.
Model = Structure + Parameters (to fit the data)
Linear, NN, Fuzzy, ID3, Wavelet, Fourier, Polynomials,...
![Page 38: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/38.jpg)
From data to prediction
RAW DATA
PREPROCESSING
MODEL
LEARNING
PREDICTION
TRAINING DATA
![Page 39: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/39.jpg)
Supervised learning
PHENOMENON
MODEL
input output
prediction
error
• Finite amount of noisy observations.
• No a priori knowledge of the phenomenon.
OBSERVATIONS
![Page 40: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/40.jpg)
Model learning
MODEL GENERATION
MODEL VALIDATION
PARAMETRIC IDENTIFICATION
MODEL SELECTION
STRUCTURAL IDENTIFICATION
![Page 41: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/41.jpg)
The Practice of Modelling
Data + Optimisation Methods
Physical Knowledge, Engineering Models
THE MODEL
Rules of Thumb, Linguistic Rules
Accurate
Simple
Robust
Understandable
good for decision
![Page 42: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/42.jpg)
Comprehensible models
Decision trees
Qualitative attributes
Force the attributes to be treated separately
Classification surfaces parallel to the axes
Good for comprehension because they select and separate the variables
![Page 43: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/43.jpg)
Decision trees
Widely used in practice. One of the favourite data mining methods
Work with noisy data (statistical approaches)
Can learn a logical model out of data, expressed by and/or rules
ID3, C4.5 ---> Quinlan
Favouring small trees --> simple models
![Page 44: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/44.jpg)
At every stage, the most discriminant attribute
The tree is constructed top-down, adding a new attribute at each level
The choice of the attribute is based on a statistical criterion called “the information gain”
$\mathrm{Entropy} = -p_{\mathrm{yes}} \log_2 p_{\mathrm{yes}} - p_{\mathrm{no}} \log_2 p_{\mathrm{no}}$
Entropy = 0 if $p_{\mathrm{yes}}$ or $p_{\mathrm{no}}$ = 1
Entropy = 1 if $p_{\mathrm{yes}} = p_{\mathrm{no}} = 1/2$
![Page 45: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/45.jpg)
Information gain
S = set of instances, A = an attribute, v = a value of attribute A, S_v = the subset of S for which A takes the value v
$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$
The best A is the one that maximises the Gain
The algorithm runs in a recursive way
The same mechanism is reapplied at each level
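The attribute choice described above can be sketched in a few lines of Python. The entropy and gain functions follow the formulas on these slides; the four-row dataset and its attribute names (`income`, `student`) are hypothetical and only serve to show one selection step, not a full ID3 or C4.5 implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 'income' separates the classes, 'student' does not.
ROWS = [
    {"income": "high", "student": "no"},
    {"income": "high", "student": "yes"},
    {"income": "low", "student": "no"},
    {"income": "low", "student": "yes"},
]
LABELS = ["yes", "yes", "no", "no"]

# One step of the top-down construction: pick the attribute with maximal gain.
best_attribute = max(ROWS[0], key=lambda a: information_gain(ROWS, LABELS, a))
print(best_attribute)
```

A recursive tree builder would split the data on `best_attribute` and reapply the same selection to each subset.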
![Page 46: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/46.jpg)
![Page 47: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/47.jpg)
But !!!!
Is a good client if (x - y) > 30000
[Figure: monthly salary (x) against loan repayment (y), with the boundary x - y = 30000, which is not parallel to either axis]
![Page 48: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/48.jpg)
Other comprehensible models
Fuzzy logic
Realize an I/O mapping with linguistic rules
If I eat “a lot” then I gain weight “a lot”
![Page 49: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/49.jpg)
Trivial example
Linear, optimal, automatic, simple
[Figure: linear relationship between X and Y]
![Page 50: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/50.jpg)
Fuzzy logic
![Page 51: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/51.jpg)
![Page 52: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/52.jpg)
![Page 53: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/53.jpg)
![Page 54: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/54.jpg)
[Figure: fuzzy model of Y as a function of X]
If x is very small then y is small
If x is small then y is medium
If x is medium then y is medium
readable? interfaceable? adaptive, universal, semi-automatic
![Page 55: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/55.jpg)
Non comprehensible models
From more to less
– linear discriminant
– local approaches
  – fuzzy rules
  – Support Vector Machine
  – RBF
– global approaches
  – NN
  – polynomials, wavelets,…
  – Support Vector Machine
![Page 56: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/56.jpg)
Neural networks
![Page 57: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/57.jpg)
![Page 58: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/58.jpg)
accurate, universal, black-box, semi-automatic
![Page 59: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/59.jpg)
[Figure: target function, a nonlinear relationship between input and output]
![Page 60: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/60.jpg)
[Figure: training set of observations (output values), with three query points]
![Page 61: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/61.jpg)
[Figure: global modeling (input against output)]
![Page 62: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/62.jpg)
[Figure: prediction with global models at three query points]
![Page 63: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/63.jpg)
Advantages
Exist without data
Information compression
Mainly SVM: mathematical, practical, logical and generic.
Detect a global structure in the data
Allow testing the sensitivity of the variables
Can easily incorporate prior knowledge
![Page 64: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/64.jpg)
Drawbacks
Make an assumption of uniformity
Have the bias of their structure
Are hard to adapt
Which one to choose?
![Page 65: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/65.jpg)
`Weak classifiers´ ensembles
Classifier capacity reduced in 2 ways:
– simplified internal architecture
– NOT all the available information
Better generalisation, reducing overfitting
Improving accuracy by decorrelating classifiers' errors, by increasing the variability in the learning space.
![Page 66: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/66.jpg)
Two distinct views of the information
"Vertically", weighting the samples
– active learning - not investigated
– bagging, boosting, ECOC for multiple classifier systems
"Horizontally", selecting features
– feature selection methods
– MFS and its extensions.
Also: manipulating class labels (ECOC) - not investigated yet
![Page 67: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/67.jpg)
`Bagging´ : resampling the learning set
Bootstrap aggregating (Leo Breiman)
– random and independent perturbation of the learning set.
– vital element: instability of the inducer*, e.g. C4.5, neural network but not kNN!
– increase accuracy by reducing variance
* inducer = base learning algorithm: C4.5, kNN, ...
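The resampling-and-voting mechanism can be sketched as follows. This is a minimal illustration, not the implementation used in the slides: the base inducer is a hypothetical single-threshold decision stump standing in for C4.5, the toy data are invented, and the majority vote is the combination rule assumed here.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Hypothetical base inducer (stands in for C4.5): the single
    feature/threshold rule with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def bagging(X, y, n_bootstraps=25, train=train_stump, seed=0):
    """Train one model per bootstrap replicate and combine by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bootstraps):
        # Bootstrap: draw n examples with replacement from the learning set.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        return Counter(m(row) for m in models).most_common(1)[0][0]

    return predict


# Hypothetical toy data: class "a" has small values of feature 0.
X = [[0.0, 1.0], [0.2, 0.3], [0.4, 0.9], [1.6, 0.2], [1.8, 0.8], [2.0, 0.1]]
y = ["a", "a", "a", "b", "b", "b"]
bagged = bagging(X, y)
print(bagged([-1.0, 0.5]), bagged([5.0, 0.5]))
```

Each stump sees a slightly different learning set and so places its threshold differently; the vote averages out that instability, which is why a stable inducer such as kNN gains little from this scheme.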
![Page 68: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/68.jpg)
Learning set resampling : `Arcing´
Adaptive resampling or reweighting of the learning set (Leo Breiman terminology).
Boosting (Freund & Schapire): sequential reweighting based on the description accuracy.
e.g. AdaBoost.M1 for multi-class problems.
needs instability, as bagging does
better variability than bagging.
sensitive to noisy databases.
better than bagging on non-noisy databases
![Page 69: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/69.jpg)
Multiple Feature Subsets : Stephen D. Bay (1/2)
problem?
– kNN is stable vertically, so Bagging doesn't work.
horizontally: MFS - combining random selections of features with or without replacement.
question?
– what about other inducers such as C4.5??
![Page 70: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/70.jpg)
Multiple Feature Subsets : Stephen D. Bay (2/2)
Hypothesis: kNN uses its ‘horizontal’ instability.
Two parameters:
– K=n/N, proportion of features in subsets.
– R, number of subsets to combine.
MFS is better than single kNN with FSS and BSS, feature selection techniques.
MFS is more stable than kNN on added irrelevant features.
MFS decreases variance and bias through randomness.
![Page 71: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/71.jpg)
BAGFS : a multiple classifier system
BAGFS = MFS inside each Bagging.
BAGMFS = MFS & Bagging together.
3 parameters
– B, number of bootstraps
– K=n/N, proportion of features in subsets
– R, number of feature subsets
decision rule : majority vote
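One possible reading of "MFS inside each Bagging" is sketched below: for each of the B bootstrap replicates, R random feature subsets holding a proportion K of the features are drawn, one classifier is trained per (bootstrap, subset) pair, and the B x R classifiers vote. Several choices here are assumptions rather than details from the slides: features are sampled without replacement, the base inducer is a hypothetical nearest-centroid classifier standing in for C4.5, and the toy data are invented.

```python
import random
from collections import Counter, defaultdict


def train_centroid(X, y):
    """Hypothetical stand-in inducer (the slides use C4.5): nearest class centroid."""
    groups = defaultdict(list)
    for row, lab in zip(X, y):
        groups[lab].append(row)
    centroids = {lab: [sum(col) / len(rows) for col in zip(*rows)]
                 for lab, rows in groups.items()}

    def predict(row):
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(row, centroids[lab])))

    return predict


def bagfs(X, y, B=7, R=7, K=0.5, train=train_centroid, seed=0):
    """B bootstraps x R feature subsets of proportion K, majority vote."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    subset_size = max(1, round(K * n_features))
    ensemble = []
    for _ in range(B):
        # "Vertical" perturbation: bootstrap replicate of the learning set.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(R):
            # "Horizontal" perturbation: random feature subset (no replacement).
            feats = rng.sample(range(n_features), subset_size)
            Xb = [[X[i][f] for f in feats] for i in idx]
            yb = [y[i] for i in idx]
            ensemble.append((feats, train(Xb, yb)))

    def predict(row):
        votes = Counter(model([row[f] for f in feats]) for feats, model in ensemble)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: every feature is small for "a" and large for "b".
X = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0],
     [10, 11, 10, 11], [11, 10, 11, 10], [10, 10, 11, 11], [11, 11, 10, 10]]
y = ["a"] * 4 + ["b"] * 4
model = bagfs(X, y)
print(model([0.5, 0.5, 0.5, 0.5]), model([10.5, 10.5, 10.5, 10.5]))
```

With B = R = 7 this gives the 49-classifier "Bagfs 7x7" configuration mentioned in the experiments, up to the choice of inducer.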
![Page 72: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/72.jpg)
BAGFS architecture around C4.5
not useful
![Page 73: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/73.jpg)
Material
15 UCI continuous databases
10-fold cross validations
C4.5 Rel 8 with CF=0.25, MINOBJ=2, pruning
no normalisation, no other pre-treatment on data.
majority vote
![Page 74: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/74.jpg)
Experiments
Testing parametrization
– optimizing K between 0.1 and 1 by means of a nested 10-fold cross-validation
– R = 7, B = 7 for the two-level method: Bagfs 7x7
– set of 50 classifiers otherwise: Bag 50, BagMfs 50, MFS 50, Boosting 50
![Page 75: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/75.jpg)
Experimental Results

| Database | c45 | bagmfs 50 | bagfs 7x7 | boosting 50 | bag 50 | mfs 50 |
|---|---|---|---|---|---|---|
| hepatitis | 77.6 | 82.7 | 84.1 | 82.1 | 81.0 | 83.2 |
| glass | 64.8 | 77.3 | 76.6 | 74.4 | 74.8 | 75.2 |
| iris | 92.7 | 93.4 | 93.2 | 92.4 | 92.3 | 93.5 |
| ionosphere | 90.9 | 93.7 | 93.5 | 93.2 | 92.8 | 93.6 |
| liver disorders | 64.1 | 73.5 | 70.5 | 72.3 | 72.8 | 65.6 |
| new-thyroid | 92.0 | 94.9 | 94.5 | 93.5 | 93.8 | 92.7 |
| ringnorm | 91.9 | 97.9 | 97.7 | 95.3 | 95.6 | 97.6 |
| twonorm | 85.4 | 96.9 | 96.7 | 96.4 | 96.6 | 96.6 |
| satimage | 86.8 | 91.4 | 91.3 | 90.0 | 90.8 | 92.1 |
| waveform | 76.2 | 84.6 | 83.9 | 84.0 | 83.2 | 83.9 |
| breast-cancer-w | 94.7 | 96.9 | 96.8 | 95.5 | 95.3 | 96.8 |
| wine | 85.7 | 92.3 | 90.8 | 91.3 | 91.3 | 89.6 |
| segmentation | 93.4 | 98.2 | 98.4 | 95.1 | 96.6 | 98.7 |
| Image | 96.5 | 97.3 | 97.8 | 96.7 | 97.6 | 97.6 |
| car | 92.1 | 93.2 | 92.5 | 92.1 | 93.2 | 92.2 |
| diabetes | 72.4 | 75.7 | 75.7 | 76.2 | 75.7 | 74.0 |
| Mean | 84.8 | 90.0 | 89.6 | 88.8 | 89.0 | 88.9 |

• McNemar test of significance (95%): Bagfs never performs significantly worse, and even performs significantly better on at least 4 databases (see red databases).
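For readers unfamiliar with the McNemar test, the sketch below shows how two classifiers can be compared on the same test examples. The slides do not say which variant was used; this example assumes the common continuity-corrected chi-square form with one degree of freedom, and the predictions are hypothetical.

```python
# 95% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DF = 3.841


def mcnemar_statistic(truth, pred_a, pred_b):
    """Continuity-corrected McNemar statistic for two classifiers evaluated
    on the same examples. Only the examples on which exactly one of the
    two classifiers is right contribute."""
    only_a_right = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a == t and b != t)
    only_b_right = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a != t and b == t)
    disagreements = only_a_right + only_b_right
    if disagreements == 0:
        return 0.0
    return (abs(only_a_right - only_b_right) - 1) ** 2 / disagreements


# Hypothetical predictions on 20 test examples whose true class is always 1.
truth = [1] * 20
pred_a = [1] * 10 + [0] + [1] * 9   # classifier A is wrong once
pred_b = [0] * 10 + [1] * 10        # classifier B is wrong ten times

stat = mcnemar_statistic(truth, pred_a, pred_b)
print(stat > CHI2_95_1DF)  # True means the difference is significant at 95%
```

Because the test looks only at disagreements, two classifiers with similar overall accuracy can still differ significantly, and vice versa.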
![Page 76: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/76.jpg)
BAGFS : discussions
How to adjust the parameters B, K, R ?
– internal cross validation ?
– dimensionality and variability measures hypothesis
Interest of a second level ?
– About irrelevant and (un)informative features ?
– Does bagging + feature selection work better ?
– How to prove the interest of MFS randomness ?
How to exploit the complementarity of the bootstraps ?
– Can we ?
– What to do ?
How to prove the horizontal instability of C4.5 ?
Comparison with 1-level bagging and MFS
– Same number of classifiers ?
– Advantage of tuning parameters ?
![Page 77: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/77.jpg)
BAGFS : next...
More databases
– including nominal features
– including missing values
Other decision rules : Bayesian approach, ranking, ...
Other inducers : LDA, kNN, logistic regr., MLP ??
Another level : boosting + MFS ?
![Page 78: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/78.jpg)
Which model is best ??
when they can all perfectly fit the data
They can all perfectly fit the data, but
they don't approach the data in the same way. This approach depends on their structure.
![Page 79: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/79.jpg)
This explains the importance of
Cross-validation
Model A vs Model B
[Figure: models A and B compared on training and testing data; annotation: this value makes the difference]
![Page 80: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/80.jpg)
Which one to choose ?
Capital role of cross-validation.
Hard to run
One possible response
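A minimal sketch of why cross-validation matters when several models fit the training data perfectly: the two toy models below both reach zero training error, yet their 10-fold cross-validation errors differ. The models (a lookup table with a fallback to the training mean, and a 1-nearest-neighbour predictor) and all names are illustrative choices, not taken from the slides.

```python
def fit_lookup(train):
    """Memorize the training pairs; unseen inputs get the training mean."""
    table = dict(train)
    fallback = sum(y for _, y in train) / len(train)
    return lambda x: table.get(x, fallback)

def fit_1nn(train):
    """Predict the output of the closest training input."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def train_mse(xs, ys, fit):
    """Mean squared error measured on the training data itself."""
    predict = fit(list(zip(xs, ys)))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_mse(xs, ys, fit, k=10):
    """k-fold cross-validation; sample i goes to fold i % k."""
    pairs = list(zip(xs, ys))
    total = 0.0
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        predict = fit(train)  # the model never sees the test fold
        total += sum((predict(x) - y) ** 2 for x, y in test)
    return total / len(pairs)

xs = [float(i) for i in range(20)]
ys = [2.0 * x for x in xs]
for name, fit in [("lookup", fit_lookup), ("1-NN", fit_1nn)]:
    print(name, train_mse(xs, ys, fit), k_fold_mse(xs, ys, fit))
```

Both models print a training error of zero; only the cross-validation column separates them.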
![Page 81: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/81.jpg)
Lazy methods
Coming from fuzzy
![Page 82: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/82.jpg)
Model or Examples ??
Build a Model
Prediction based on the model
Prediction based on the examples
![Page 83: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/83.jpg)
A model
?
?
??
![Page 84: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/84.jpg)
Lazy Methods
Accuracy entails keeping the data and not using any intermediate model: the best model is the data
Accuracy requires powerful local models with powerful cross-validation methods
Made possible again thanks to computing power
Lazy methods are a new trend that revives an old one
![Page 85: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/85.jpg)
Lazy methods
A lot of expressions for the same thing:
– memory-based, instance-based, examples-based, distance-based
– nearest-neighbour
lazy for regression, classification and time series prediction
lazy for quantitative and qualitative features
![Page 86: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/86.jpg)
Local modeling
[Plot: x axis from -1 to 1, y axis from -0.5 to 0.5]
![Page 87: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/87.jpg)
Prediction with local models
[Plot with three query points marked; x axis from -1 to 1, y axis from -0.5 to 0.5]
![Page 88: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/88.jpg)
Local modeling procedure
The identification of a local model can be summarized in these steps:
– Compute the distance between the query and the training samples according to a predefined metric.
– Rank the neighbors on the basis of their distance to the query.
– Select a subset of the nearest neighbors according to the bandwidth, which measures the size of the neighborhood.
– Fit a local model (e.g. constant, linear, ...).
The thesis focused on the bandwidth selection problem.
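The four steps above can be sketched for scalar inputs as follows. This is a minimal illustration, assuming an absolute-difference metric and a bandwidth expressed as a number of neighbors k; the function name and its parameters are hypothetical.

```python
def local_predict(query, xs, ys, k, model="constant",
                  metric=lambda a, b: abs(a - b)):
    """Predict the output at `query` from its k nearest training samples."""
    # Step 1: distance between the query and every training sample.
    dists = [metric(query, x) for x in xs]
    # Step 2: rank the samples by distance (ties keep their original order).
    ranked = sorted(range(len(xs)), key=lambda i: dists[i])
    # Step 3: keep the k nearest neighbors (k plays the role of the bandwidth).
    nx = [xs[i] for i in ranked[:k]]
    ny = [ys[i] for i in ranked[:k]]
    # Step 4: fit a local model on the neighbors only.
    mean_y = sum(ny) / len(ny)
    if model == "constant":
        return mean_y
    mean_x = sum(nx) / len(nx)
    sxx = sum((x - mean_x) ** 2 for x in nx)
    if sxx == 0:
        return mean_y  # all neighbors share the same input: fall back to the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(nx, ny)) / sxx
    return mean_y + slope * (query - mean_x)  # local linear model

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(local_predict(2.5, xs, ys, k=2))                  # local constant
print(local_predict(2.5, xs, ys, k=3, model="linear"))  # local linear
```

Nothing is fitted before the query arrives: every call redoes the ranking and the fit around its own query point.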
![Page 89: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/89.jpg)
Bias/variance trade-off: overfitting
[Plot: local fit and its prediction error; x axis from -0.6 to 0.6, y axis from -0.2 to 0.2]
too few neighbors --> overfitting --> large prediction error
![Page 90: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/90.jpg)
Bias/variance trade-off: underfitting
[Plot: local fit and its prediction error; x axis from -0.6 to 0.6, y axis from -0.2 to 0.2]
too many neighbors --> underfitting --> large prediction error
![Page 91: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/91.jpg)
Cross-validation: PRESS
For linear models, performs a leave-one-out without actually carrying it out
A huge computational gain
Makes one of the most powerful cross-validations possible at a negligible computational cost.
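To make the PRESS idea concrete, here is a small sketch for a straight-line fit: the leave-one-out residual of sample i equals its ordinary residual divided by (1 - h_i), where h_i is the leverage of that sample, so a single fit replaces n refits. The function names are illustrative, and the brute-force version is included only to check the shortcut.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

def press_residuals(xs, ys):
    """Leave-one-out residuals obtained from one single fit."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    out = []
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - xbar) ** 2 / sxx  # leverage of this sample
        out.append((y - (a + b * x)) / (1.0 - h))
    return out

def brute_force_loo(xs, ys):
    """Leave-one-out residuals obtained by refitting n times."""
    out = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (a + b * xs[i]))
    return out

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
press = sum(e ** 2 for e in press_residuals(xs, ys))  # PRESS statistic
print(press)
```

The same shortcut holds for any model that is linear in its parameters, which is what makes it attractive for local constant and local linear models.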
![Page 92: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/92.jpg)
Data-driven bandwidth selection
[Figure: for each candidate number of neighbors k_m, k_m+1, ..., k_M, a local model is identified and validated, giving MSE(k_m), MSE(k_m+1), ..., MSE(k_M). Model selection over these MSE values yields the PREDICTION.]
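The identification / validation / model selection loop can be sketched as below for a local constant model on scalar inputs. Each candidate number of neighbors k is scored by its leave-one-out mean squared error, and the k with the lowest score gives the prediction. Ties go to the smallest k, which is an arbitrary choice of this sketch, and the function names are hypothetical.

```python
def loo_mse_constant(values):
    """Leave-one-out MSE of the mean of `values`, without refitting.

    Removing sample j changes the mean so that its leave-one-out residual
    is (y_j - mean) * k / (k - 1), with k the number of values.
    """
    k = len(values)
    mean = sum(values) / k
    return sum(((v - mean) * k / (k - 1)) ** 2 for v in values) / k

def lazy_predict(query, xs, ys, k_min=2, k_max=None):
    """Return (prediction, selected k) for a local constant model."""
    ranked = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    k_max = len(xs) if k_max is None else min(k_max, len(xs))
    best = None
    for k in range(k_min, k_max + 1):
        neighbors = [ys[i] for i in ranked[:k]]   # identification
        mse = loo_mse_constant(neighbors)         # validation
        if best is None or mse < best[0]:         # model selection
            best = (mse, k, sum(neighbors) / k)
    return best[2], best[1]

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
print(lazy_predict(0.0, xs, ys))
print(lazy_predict(7.0, xs, ys))
```

The bandwidth is chosen separately for every query, so different regions of the input space can end up with different neighborhood sizes.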
![Page 93: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/93.jpg)
Advantages
No assumption of uniformity
Justified in real life
Adaptive
Simple
![Page 94: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/94.jpg)
From local learning to Lazy Learning (LL)
By speeding up the local learning procedure, we can delay it until the moment a prediction at a query point is required (query-by-query learning).
This method is called lazy since the whole learning procedure is deferred until a prediction is required.
Examples of non-lazy (eager) methods are neural networks, where learning is performed in advance, the fitted model is stored and the data are discarded.
![Page 95: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/95.jpg)
Static benchmarks
Datasets: 15 real and 8 artificial datasets from the ML repository.
Methods: Lazy Learning, Local modeling, Feed Forward
Neural Networks, Mixtures of Experts, Neuro Fuzzy,
Regression Trees (Cubist).
Experimental methodology: 10-fold cross-validation.
Results: Mean absolute error, relative error, paired t-test.
![Page 96: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/96.jpg)
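Artificial data

| Dataset | No. examples | No. inputs |
|---|---|---|
| Kin_8nh | 8192 | 8 |
| Kin_8fm | 8192 | 8 |
| Kin_8nm | 8192 | 8 |
| Kin_32fh | 8192 | 32 |
| Kin_32nh | 8192 | 32 |
| Kin_32fm | 8192 | 32 |
| Kin_32 | 8192 | 32 |

Observed data

| Dataset | No. examples | No. inputs |
|---|---|---|
| Housing | 330 | 8 |
| Cpu | 506 | 13 |
| Prices | 209 | 6 |
| Mpg | 159 | 16 |
| Servo | 392 | 7 |
| Ozone | 167 | 8 |
| Bodyfat | 252 | 13 |
| Pool | 253 | 3 |
| Energy | 2444 | 5 |
| Breast | 699 | 9 |
| Abalone | 4177 | 10 |
| Sonar | 208 | 60 |
| Bupa | 345 | 6 |
| Iono | 351 | 34 |
| Pima | 768 | 8 |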
![Page 97: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/97.jpg)
Experimental results: paired comparison (I)
Each method compared with all the others (9*23 = 207 comparisons)

| Method | No. times significantly worse |
|---|---|
| LL linear | 74 |
| LL constant | 96 |
| LL combination | 23 |
| Local modeling linear | 58 |
| Local modeling constant | 81 |
| Cubist | 40 |
| Feed Forward NN | 53 |
| Mixtures of experts | 80 |
| Local Model Network (fuzzy) | 132 |
| Local Model Network (k-mean) | 145 |
The lower, the better !!
![Page 98: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/98.jpg)
Experimental results: paired comparison (II)
Each method compared with all the others (9*23 = 207 comparisons)
The larger, the better !!
| Method | No. times significantly better |
|---|---|
| LL linear | 80 |
| LL constant | 59 |
| LL combination | 129 |
| Local modeling linear | 89 |
| Local modeling constant | 74 |
| Cubist | 110 |
| Feed Forward NN | 116 |
| Mixtures of experts | 72 |
| Local Model Network (fuzzy) | 32 |
| Local Model Network (k-mean) | 21 |
![Page 99: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/99.jpg)
Lazy Learning for dynamic tasks
Long horizon forecasting based on the iteration of a LL one-step-ahead predictor.
Nonlinear control
– Lazy Learning inverse/forward control.
– Lazy Learning self-tuning control.
– Lazy Learning optimal control.
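Iterating a one-step-ahead predictor means feeding each prediction back as if it were an observation. The sketch below uses a plain 1-nearest-neighbour predictor on windows of the last m values as a stand-in for the LL predictor; the window length m and the function names are assumptions made for the illustration.

```python
def one_step_ahead(history, m):
    """Predict the next value from the past window closest to the last m values."""
    last = history[-m:]
    best_target, best_dist = None, float("inf")
    # Only windows that are followed by a known value can serve as examples.
    for i in range(len(history) - m):
        window = history[i:i + m]
        d = sum((a - b) ** 2 for a, b in zip(window, last))
        if d < best_dist:
            best_dist, best_target = d, history[i + m]
    return best_target

def iterate_forecast(series, horizon, m=3):
    """Long horizon forecast: each prediction is appended and reused as input."""
    history = list(series)
    for _ in range(horizon):
        history.append(one_step_ahead(history, m))
    return history[len(series):]

series = [0, 1, 2, 3] * 10
print(iterate_forecast(series, horizon=8, m=3))
```

Because predictions are reused as inputs, errors can accumulate along the horizon, which is why long horizon forecasting is a harder test than one-step-ahead prediction.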
![Page 100: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/100.jpg)
Dynamic benchmarks
Multi-step-ahead prediction:
– Benchmarks: Mackey Glass and 2 Santa Fe time series
– Reference methods: recurrent neural networks.
Nonlinear identification and adaptive control:
– Benchmarks: Narendra nonlinear plants and bioreactor.
– Reference methods: neuro-fuzzy controller, neural controller, linear controller.
![Page 101: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/101.jpg)
Santa Fe time series
[Plot: Santa Fe time series; x axis from 0 to 1000, y axis from 0 to 300]
Task: predict the continuation of the series for the next 100 steps.
![Page 102: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/102.jpg)
Lazy Learning prediction
LL is able to predict the abrupt change around t = 1060 !
![Page 103: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/103.jpg)
Awards in international competitions
Data analysis competition: awarded as a runner-up among 21 participants at the 1999 CoIL International Competition on Protecting rivers and streams by monitoring chemical concentrations and algae communities.
Time series competition: ranked second among 17 participants in the International Competition on Time Series organized by the International Workshop on Advanced Black-box techniques for nonlinear modeling in Leuven, Belgium.
![Page 104: Data Mining: How to make islands of knowledge emerging out of oceans of data Hugues Bersini IRIDIA - ULB.](https://reader035.fdocuments.in/reader035/viewer/2022081602/55161d42550346c6758b4596/html5/thumbnails/104.jpg)
Pragmatic conclusions
Comprehension --> decision tree, fuzzy logic
Precision: global model
lazy methods
You need to be clear about why you build models.