Djamel A. Zighed and Nicolas Nicoloyannis ERIC Laboratory University of Lyon 2 (France)
Prague Sept. 04
About the Computer Science department
• In Lyon, there are 3 universities and 100,000 students
• Lumière University Lyon 2 has 22,000 students
• Lyon 2 is mainly a liberal arts university
• The Faculty of Economics has three departments, among them the computer science one
• We belong to this department
• We have Bachelor, Master and PhD programs for 300 students
ERIC Lab at the University
Faculties of the University of Lyon 2: Economics, Sociology, Linguistics, Law
Research centers of the university: ERIC (Knowledge Engineering Research Center)
- ERIC's budget does not depend on the university; it is provided by the national Ministry of Education
- We have a large degree of autonomy in decision making
ERIC Lab
• Founded in 1995
• 11 professors (N. Nicoloyannis, director)
• 15 PhD students
• Grants + contracts + WK + … = 200 K€/year
• Research topics
– Data mining (theory, tools and applications)
– Data warehouse management (T,T,A)
Data Mining (T,T,A)
• Theory
– Induction graphs
– Learning and classification
• Tools
– SIPINA: platform for data mining
• Applications
– Medical fields
– Chemical applications
– Human sciences
– …
Data mining TTA for complex data
Data mining on complex data
• An example : Breast cancer diagnosis
Motivations
Let $X$ and $Y$ be two attributes and $\Omega$ be a set of data, with $X = \{x_1, \dots, x_r\}$ and $Y = \{y_1, \dots, y_c\}$.
Contingency table $T_{XY}$:
$$
\begin{array}{c|ccc|c}
X \backslash Y & y_1 & \cdots & y_c & \\ \hline
x_1 & n_{11} & \cdots & n_{1c} & n_{1.} \\
\vdots & \vdots & & \vdots & \vdots \\
x_r & n_{r1} & \cdots & n_{rc} & n_{r.} \\ \hline
 & n_{.1} & \cdots & n_{.c} & n
\end{array}
$$
Association measure on $T_{XY}$: it measures the strength of the relationship between $X$ and $Y$.
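As a minimal sketch of how such a contingency table and its margins can be computed from raw data, the Python below cross-tabulates two lists of categorical values. The function names `contingency_table` and `margins` and the toy data are illustrative assumptions, not part of the original slides.

```python
from collections import Counter


def contingency_table(xs, ys):
    """Cross-tabulate two equally long sequences of categorical values.

    Returns the sorted row labels, the sorted column labels and the table
    of counts n_ij as a list of rows.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    counts = Counter(zip(xs, ys))  # missing pairs count as 0
    table = [[counts[(x, y)] for y in cols] for x in rows]
    return rows, cols, table


def margins(table):
    """Return the row totals n_i., the column totals n_.j and the total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


# Hypothetical toy data: six cases described by two attributes.
xs = ["a", "a", "b", "b", "b", "c"]
ys = ["u", "v", "u", "u", "v", "v"]
rows, cols, table = contingency_table(xs, ys)
row_totals, col_totals, n = margins(table)
```

Here `table` holds the counts $n_{ij}$, `row_totals` the $n_{i.}$, `col_totals` the $n_{.j}$ and `n` the grand total.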
According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
Formally: $\exists\, T'_{XY}$ such that $c' \le c$ and $r' \le r$ and $\theta(T'_{XY}) > \theta(T_{XY})$, where $\theta$ denotes the chosen association measure and $r'$, $c'$ are the numbers of rows and columns of $T'_{XY}$.
An example
For the example table $T_{XY}$: $\hat{t} = 0.140$
Goal: Find the groupings that maximize the association between attributes.
For the preceding example, maximizing Tschuprow's $t$ gives a reduced table $T'_{XY}$ with $\hat{t}' = 0.320$, so $\hat{t}' > \hat{t}$.
Yes, we can improve the association by reducing the size of the contingency table.
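To illustrate how merging can raise the association, here is a small sketch that computes Tschuprow's $t$ with its standard definition, $t = \sqrt{\chi^2 / \big(n \sqrt{(r-1)(c-1)}\big)}$, before and after merging two rows. The table used is a made-up one, not the example from the slides, and the helper names are assumptions.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic of a table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells of empty rows or columns
                chi2 += (n_ij - expected) ** 2 / expected
    return chi2


def tschuprow_t(table):
    """Tschuprow's t; returns 0.0 when the table has a single row or column."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    if r < 2 or c < 2 or n == 0:
        return 0.0
    return math.sqrt(chi_square(table) / (n * math.sqrt((r - 1) * (c - 1))))


def merge_rows(table, i, j):
    """Return a new table in which row j has been added into row i."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [merged if k == i else row for k, row in enumerate(table) if k != j]


# Hypothetical 3 x 2 table: the first two rows have the same profile.
table = [[20, 0], [20, 0], [0, 20]]
t_before = tschuprow_t(table)                   # about 0.841
t_after = tschuprow_t(merge_rows(table, 0, 1))  # 1.0 up to rounding
```

Merging the two rows with identical profiles leaves $\chi^2$ unchanged but shrinks the normalising term $\sqrt{(r-1)(c-1)}$, which is why $t$ increases in this toy case.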
Extension
Let $X$, $Y$, $\Omega$ and the contingency table $T_{XY}$ be defined as before.
According to a specific association measure, can we find the optimal reduced contingency table?
Find $T^*_{XY}$, with $r^* \le r$ rows and $c^* \le c$ columns, such that
$$\theta(T^*_{XY}) = \max_i \theta(T^i_{XY})$$
where $\theta$ denotes the chosen association measure and the $T^i_{XY}$ are the tables obtained from $T_{XY}$ by merging rows and/or columns.
Optimal solution (exhaustive search)
Goal: Find the best cross partition on $T$.
$P_{T_X}$: the set of all partitions that can be formed over $X$
$P_{T_Y}$: the set of all partitions that can be formed over $Y$
$\#P_{T_X}$: the size of the set $P_{T_X}$
$\#P_{T_Y}$: the size of the set $P_{T_Y}$
The number of cases we have to check is $\#P_{T_X} \times \#P_{T_Y}$.
The value of $\#P_{T_X}$ differs between the nominal case and the ordinal case.
According to a specific association measure, can we find the optimal reduced contingency table?
Yes, but the exhaustive solution is intractable in the real world because of its high time complexity.
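To give a feel for this growth, the sketch below counts the candidate groupings using standard combinatorics rather than formulas taken from the slides: a nominal attribute with $k$ categories has a Bell number $B_k$ of partitions, and an ordinal attribute, where only adjacent categories may be merged, has $2^{k-1}$. The function names are assumptions, and the counts include the trivial grouping into a single category.

```python
def bell_number(k):
    """Number of partitions of a set of k elements, via the Bell triangle."""
    row = [1]
    for _ in range(k - 1):
        new_row = [row[-1]]          # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def ordinal_partitions(k):
    """Number of partitions of k ordered categories into contiguous groups."""
    return 2 ** (k - 1) if k >= 1 else 1


def cross_partitions(r, c, ordinal=False):
    """Number of (row partition, column partition) pairs for an r x c table."""
    count = ordinal_partitions if ordinal else bell_number
    return count(r) * count(c)


# Number of candidate tables for a few square table sizes.
for size in (3, 6, 10, 15):
    print(size, cross_partitions(size, size), cross_partitions(size, size, ordinal=True))
```

For a 6 × 6 table this gives 203 × 203 = 41209 candidates in the nominal case and 32 × 32 = 1024 in the ordinal case; the nominal count grows far faster than exponentially as categories are added.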
Heuristic
Proceed successively to the grouping of the 2 (row or column) values that maximizes the increase in the association criterion.
- Starting with the finest categories: $T^0 = T_{r \times c}$
- The algorithm successively determines $T^k$, $k = 1, 2, \dots$
- Stop when $\theta(T^{k+1}) - \theta(T^k) \le 0$, where $\theta$ denotes the association criterion
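The greedy procedure can be sketched as follows. This is one possible reading under stated assumptions, not the authors' implementation: Tschuprow's $t$ is used as the criterion, at least two rows and two columns are always kept (so that $t$ stays defined), and in the ordinal case only adjacent categories may be merged. All function names are hypothetical.

```python
import math
from itertools import combinations


def tschuprow_t(table):
    """Tschuprow's t of a table given as a list of rows of counts."""
    r, c = len(table), len(table[0])
    n = sum(map(sum, table))
    if r < 2 or c < 2 or n == 0:
        return 0.0
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))


def transpose(table):
    return [list(col) for col in zip(*table)]


def merge_rows(table, i, j):
    """Return a new table in which row j has been added into row i."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [merged if k == i else row for k, row in enumerate(table) if k != j]


def row_merges(table, ordinal):
    """All tables obtained by merging two rows (assumption: keep >= 2 rows)."""
    k = len(table)
    if k <= 2:
        return []
    if ordinal:
        pairs = [(i, i + 1) for i in range(k - 1)]  # adjacent categories only
    else:
        pairs = list(combinations(range(k), 2))
    return [merge_rows(table, i, j) for i, j in pairs]


def candidate_merges(table, ordinal):
    """All tables obtained by merging two rows or two columns."""
    by_rows = row_merges(table, ordinal)
    by_cols = [transpose(t) for t in row_merges(transpose(table), ordinal)]
    return by_rows + by_cols


def greedy_grouping(table, criterion=tschuprow_t, ordinal=False):
    """Merge the best pair repeatedly; stop when the criterion stops increasing."""
    current = [list(row) for row in table]
    best = criterion(current)
    while True:
        candidates = candidate_merges(current, ordinal)
        if not candidates:
            break
        challenger = max(candidates, key=criterion)
        value = criterion(challenger)
        if value - best <= 0:  # no strict improvement: stop
            break
        current, best = challenger, value
    return current, best


# Hypothetical table: the first two rows share the same profile.
grouped, value = greedy_grouping([[20, 0], [20, 0], [0, 20]])
```

Each step examines only the pairwise merges of the current table, which is what makes the search cheap compared with enumerating every cross partition; the price is that the result is only quasi-optimal.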
Complexity
Simulation
Goal: How far is the quasi-optimal solution from the true optimum?
The comparison is tractable only for tables no larger than 6 × 6.
Simulation design
• Randomly generate 200 tables
• Analyze the distribution of the deviations between optima and quasi-optima
Generating the tables
10000 cases are distributed among the c × r cells of the table with a uniform distribution (worst case).
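The table-generation step might look like the sketch below. The slides do not say how the table dimensions were chosen, so drawing the numbers of rows and columns uniformly between 2 and 6 is an assumption, as are the seed and the function name `random_table`.

```python
import random
from collections import Counter


def random_table(r, c, n_cases=10000, rng=None):
    """Distribute n_cases over the r x c cells, each cell being equally likely."""
    if rng is None:
        rng = random.Random()
    cells = rng.choices(range(r * c), k=n_cases)  # one cell index per case
    counts = Counter(cells)
    return [[counts[i * c + j] for j in range(c)] for i in range(r)]


rng = random.Random(2004)  # fixed seed so that the run is reproducible
tables = [
    # Assumption: dimensions drawn uniformly in 2..6 (not stated in the slides).
    random_table(rng.randint(2, 6), rng.randint(2, 6), rng=rng)
    for _ in range(200)
]
```

Because every cell is equally likely, the rows of such a table have nearly the same profile and the association is close to zero, which is presumably why the slides call it the worst case.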
Quasi-optimal solution
Conclusion
• Implementation in a new approach to decision tree induction
– Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Abogodaï, a New approach for Decision Trees, in Lavrac, N., Gamberger, D., Todorovski, L. and Blockeel, H. (eds), Knowledge Discovery in Databases: PKDD 2003, LNAI 2838, Berlin: Springer, 495-506.
– Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Decision tree with optimal join partitioning, to appear in Journal of Intelligent Information Systems, Kluwer (2004).
• Divisive top-down approach
• Extension to the multidimensional case