Operations Research & Data Mining

Siggi Olafsson
Associate Professor
Department of Industrial Engineering, Iowa State University

20th European Conference on Operational Research, Rhodes, Greece, July 4-7, 2004
Purpose of Talk
- Give a definition and an overview of data mining as it relates to operations research
- Present some examples to give a flavor for the type of work that is possible
- Offer my views on the future of OR and data mining
- Aim for the talk to be accessible without prior knowledge of data mining

"Should I be here?"
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Background
- Rapidly growing interest in data mining among operations research academics and practitioners
- Evidenced, for example, by the increased data mining presence in professional organizations:
  - New INFORMS Section on Data Mining
  - Large number of data mining sessions at INFORMS and IIE research conferences
  - Special issues of Computers & Operations Research, IIE Transactions, Discrete Applied Mathematics, etc.
  - Numerous presentations/sessions at this conference
What is Data Mining?
What is Data Mining, Really?
- Extracting meaningful, previously unknown patterns or knowledge from large databases
- The knowledge discovery process:
  1. Define objective: business/scientific objective, data mining objective
  2. Prepare data: data cleaning, data selection, attribute selection
  3. Mine knowledge: visualization, classification, association rule discovery, clustering
  4. Interpret results: predictive models, structural insights
Interdisciplinary Field

[Figure: data mining at the intersection of statistics, databases, optimization, and machine learning.]
Input Engineering
- Preparing the data may take as much as 70% of the entire effort
- Numerous steps, including: combining data sources, transforming attributes, data cleaning, data selection, attribute selection, and data visualization
- Many of these have connections with operations research, and with optimization in particular
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Data Visualization
- Visualizing the data is important in any data mining project
- Generally difficult because the data is typically high-dimensional, i.e., hundreds or thousands of attributes (variables)
- How can we best visualize such data in 2 or 3 dimensions?
- Traditional techniques include multidimensional scaling, which uses nonlinear optimization
Optimization Formulation
- Recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004)
- Map a set M of m points from R^r to R^q, q = 2, 3
- Approximate the q-dimensional space by a lattice N, with binary variables x_ik = 1 if point i is assigned to lattice point k:

$$
\begin{aligned}
\min \quad & \sum_{i \in M} \sum_{j \in M} \sum_{k \in N} \sum_{l \in N} F\big(d_{original}(i,j),\, d_{new}(k,l)\big)\, x_{ik} x_{jl} \\
\text{s.t.} \quad & \sum_{k \in N} x_{ik} = 1, \quad i \in M, \\
& x_{ik} \in \{0, 1\},
\end{aligned}
$$

where d_original(i,j) is a distance measure in R^r, d_new(k,l) is a distance measure in R^q, and F is a function such as least squares, the Sammon map, etc.
Solution Methods
- The formulation is a Quadratic Assignment Problem (QAP)
- Not possible to solve exactly for large-scale problems; a local search procedure is proposed
- Key to the formulation is the selection of the objective function F, e.g., the Sammon map:

$$
\min \; \frac{1}{\sum_{i \in M} \sum_{j \in M,\, j \neq i} d_{original}(i,j)} \sum_{i \in M} \sum_{j \in M,\, j \neq i} \sum_{k \in N} \sum_{l \in N} \frac{\big(d_{original}(i,j) - d_{new}(k,l)\big)^2}{d_{original}(i,j)}\, x_{ik} x_{jl}
$$
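To make the Sammon objective concrete, the sketch below evaluates the stress of a candidate low-dimensional placement in pure Python; the four-point data set is hypothetical and only illustrates the formula.

```python
import math

def sammon_stress(points, coords):
    """Sammon stress between original high-dimensional points and their
    low-dimensional coordinates (e.g., assigned lattice cells).
    points: list of r-dimensional tuples; coords: list of 2-D tuples."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    m = len(points)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    d_orig = {p: dist(points[p[0]], points[p[1]]) for p in pairs}
    total = sum(d_orig.values())
    # Sammon map: weight each squared error by 1 / d_original(i, j)
    return sum((d_orig[p] - dist(coords[p[0]], coords[p[1]])) ** 2 / d_orig[p]
               for p in pairs) / total

# Toy data: four 4-dimensional points mapped to 2-D coordinates
points = [(0, 0, 0, 0), (1, 1, 1, 1), (5, 5, 5, 5), (6, 6, 6, 6)]
perfect = [(0, 0), (2, 0), (10, 0), (12, 0)]   # preserves all distances
print(sammon_stress(points, perfect))          # → 0.0
```

A local search over lattice assignments would use this value (or the least-squares variant) as the move-acceptance criterion.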
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Attribute Selection
- Usually a large number of attributes
- Some attributes are redundant or irrelevant and should be removed
- Benefits:
  - Faster subsequent induction
  - Simpler models (important in data mining)
  - Better (predictive) performance of models
  - Discovering which attributes are important (descriptive or structural knowledge)
Optimization Formulation
- Define the decision variable

$$
x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}
$$

- Combinatorial optimization problem:

$$
\max \; f(x_1, x_2, \ldots, x_n) \quad \text{s.t.} \quad x_j \in \{0, 1\}
$$

- The number of candidate solutions is 2^n - 1
- How should the objective function be defined?
Solution Methods
- Non-linear objective function (defining a good objective is a major issue)
- Mathematical programming approach (Bradley, Mangasarian, and Street, 1998)
- Metaheuristics have been applied extensively: genetic algorithms, simulated annealing
- Nested partitions method (Olafsson and Yang, 2004):
  - Intelligent partitioning: take advantage of what is known in data mining about evaluating attributes
  - Random instance sampling: in each step the algorithm uses a sample of instances, which improves scalability
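Metaheuristic attribute selection can be sketched with a simple bit-flip hill climber over subsets. The scoring function below is a hypothetical stand-in for whatever subset-quality measure is used; this is not the nested partitions method itself.

```python
import random

def local_search_attributes(n, score, iters=200, seed=0):
    """Bit-flip hill climbing over attribute subsets (a minimal sketch;
    GAs, simulated annealing, and nested partitions search the same space).
    score maps a frozenset of attribute indices to a number (higher = better)."""
    rng = random.Random(seed)
    best = frozenset(j for j in range(n) if rng.random() < 0.5)
    best_val = score(best)
    for _ in range(iters):
        j = rng.randrange(n)                      # flip one attribute in/out
        cand = best - {j} if j in best else best | {j}
        val = score(cand)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val

# Hypothetical objective: attributes 0 and 2 are relevant, extras are penalized
score = lambda s: len(s & {0, 2}) - 0.1 * len(s - {0, 2})
subset, value = local_search_attributes(8, score)
print(sorted(subset), value)
```

In practice the score would be an attribute-evaluation measure from data mining, possibly estimated on a random sample of instances for scalability.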
Learning from Data
- Each data point (instance) represents an example from which we can learn
- The instances are either:
  - Labeled (supervised learning): one attribute is of special interest (called the class or target) and each instance is labeled by its class value
  - Unlabeled (unsupervised learning)
- Instances are assumed to be independent (however, spatial and temporal data mining are active areas of research)
Learning Tasks in Data Mining
- Classification (supervised learning): learn how to classify data into one of a given number of categories or classes
- Clustering (unsupervised learning): learn natural groupings (clusters) of data
- Association rule discovery: learn correlations (associations) among the data instances; also called market basket analysis
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Classification
- Classification is the most common learning task in data mining
- Many methods have been proposed: decision trees, neural networks, support vector machines, Bayesian networks, etc.
- The algorithm is trained on part of the data and its accuracy tested on independent data (or using cross-validation)
- Optimization is relevant to many classification methods
Optimization Formulation
- Suppose we have n attributes and each instance has been labeled as belonging to one of two classes
- Represent the classes by two matrices A and B
- Need to learn what separates the points in the two sets (if they can be separated)
- In a 1965 Operations Research article, Olvi Mangasarian studied the case where the two sets can be separated with a hyperplane:

$$
Aw \ge e, \qquad Bw \le -e,
$$

where e is a vector of ones and the separating hyperplane is x^T w = 0.
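Mangasarian's formulation is solved as a linear program. Purely to illustrate the separation conditions Aw >= e, Bw <= -e, the sketch below finds such a w with the perceptron (a different, simpler method) on a small hypothetical separable data set.

```python
def separating_w(A, B, max_epochs=1000):
    """Find w with a.w >= 1 for all rows a of A and b.w <= -1 for all rows b
    of B, assuming separability by a hyperplane through the origin.
    (Perceptron illustration; Mangasarian (1965) solves a linear program.)"""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    w = [0.0] * len(A[0])
    for _ in range(max_epochs):
        updated = False
        for a in A:
            if dot(a, w) <= 0:                       # misclassified A point
                w = [wi + ai for wi, ai in zip(w, a)]; updated = True
        for b in B:
            if dot(b, w) >= 0:                       # misclassified B point
                w = [wi - bi for wi, bi in zip(w, b)]; updated = True
        if not updated:
            break
    # rescale so that Aw >= e and Bw <= -e (e = vector of ones)
    margin = min(min(dot(a, w) for a in A), min(-dot(b, w) for b in B))
    return [wi / margin for wi in w]

A = [(2.0, 1.0), (1.0, 2.0)]      # hypothetical class A points
B = [(-1.0, -2.0), (-2.0, -1.0)]  # hypothetical class B points
w = separating_w(A, B)
print(w)  # → [0.5, 0.25]
```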
Separating Hyperplane

[Figure: classes A and B in the (x1, x2) plane, the separating hyperplane, and the closest points c and d in the two convex hulls.]
Finding the Closest Points

Formulate as a QP, writing the closest points as convex combinations of the instances in each class:

$$
\begin{aligned}
\min_{c,\, d} \quad & \tfrac{1}{2} \lVert c - d \rVert^2 \\
\text{s.t.} \quad & c = \sum_{i:\, \text{Class A}} \alpha_i x_i, \qquad \sum_{i:\, \text{Class A}} \alpha_i = 1, \\
& d = \sum_{i:\, \text{Class B}} \beta_i x_i, \qquad \sum_{i:\, \text{Class B}} \beta_i = 1, \\
& \alpha, \beta \ge 0.
\end{aligned}
$$
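The QP above can be handed to any quadratic-programming solver. As a dependency-free sketch, Frank-Wolfe with exact line search also finds the closest hull points, since the feasible region is a product of simplices whose linear subproblems are solved at hull vertices. The two square point sets below are hypothetical.

```python
def closest_hull_points(A, B, iters=100):
    """Closest points between the convex hulls of point sets A and B via
    Frank-Wolfe with exact line search (a sketch; the QP above can equally
    be solved by a quadratic-programming solver)."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    sub = lambda u, v: tuple(x - y for x, y in zip(u, v))
    c, d = A[0], B[0]                       # feasible start: any hull vertices
    for _ in range(iters):
        u = sub(c, d)                       # gradient of 0.5||c - d||^2 wrt c
        # linear minimization oracle over each simplex: a hull vertex
        s = min(A, key=lambda a: dot(u, a))
        t = max(B, key=lambda b: dot(u, b))
        g = sub(sub(s, t), u)               # direction of (c - d) change
        denom = dot(g, g)
        if denom < 1e-15:
            break                           # no improving direction left
        gamma = max(0.0, min(1.0, -dot(u, g) / denom))  # exact line search
        c = tuple(ci + gamma * (si - ci) for ci, si in zip(c, s))
        d = tuple(di + gamma * (ti - di) for di, ti in zip(d, t))
    return c, d

A = [(0, 0), (0, 1), (1, 0), (1, 1)]        # unit square
B = [(3, 0), (3, 1), (4, 0), (4, 1)]        # square shifted right by 3
c, d = closest_hull_points(A, B)
gap = sum((x - y) ** 2 for x, y in zip(c, d)) ** 0.5
print(round(gap, 6))  # → 2.0
```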
Support Vector Machines

[Figure: classes A and B in the (x1, x2) plane with the separating hyperplane; the support vectors are the instances closest to the hyperplane.]
Limitations
- The points (instances) may not be separable by a hyperplane: add error terms to minimize
- A linear separation is quite limited

[Figure: two classes A and B in the (x1, x2) plane that no straight line can separate.]

- The solution is to map the data to a higher dimensional space
Wolfe Dual Problem
- First formulate the Wolfe dual:

$$
\begin{aligned}
\max_{\alpha} \quad & \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \\
\text{subject to} \quad & \sum_i \alpha_i y_i = 0, \\
& 0 \le \alpha_i \le C,
\end{aligned}
$$

with w = \sum_i \alpha_i y_i x_i.
- Now the data only appears in the dot product in the objective function
Kernel Functions
- Use kernel functions K : R^n x R^n -> H to map the data and replace the dot product with

$$
K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})
$$

- For example,

$$
K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p, \qquad
K(\mathbf{x}, \mathbf{y}) = e^{-\lVert \mathbf{x} - \mathbf{y} \rVert^2 / 2\sigma^2}, \qquad
K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)
$$
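The three example kernels translate directly into code; the parameter names (p, sigma, kappa, delta) follow common convention:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Common kernels replacing the plain dot product x . y
def poly_kernel(x, y, p=2):
    return (dot(x, y) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    return math.tanh(kappa * dot(x, y) - delta)

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly_kernel(x, y))   # → 25.0, since (4 + 1)^2
print(rbf_kernel(x, x))    # → 1.0
```

Substituting any of these for x_i . x_j in the dual objective yields a nonlinear separator in the original space.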
Other Classification Work
- Extensive publications on SVM and mathematical programming for classification
- Several other approaches are also relevant, e.g.:
  - Logical Analysis of Data (LAD) learns logical expressions to classify the target attribute (series of papers by Hammer, Boros, et al.)
  - A related approach is the logic data miner Lsquare (e.g., talk by Felici, Truemper, and Paola last Monday)
  - Bayesian networks are often used, and finding the best structure of such networks is a combinatorial optimization problem (further discussed in the next talk)
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Data Clustering
- Now we do not have labeled data to train on (unsupervised learning)
- Want to identify natural clusters or groupings of data instances
- Many possible sets of clusters exist: what makes a set of clusters good?
Optimization Formulation
- Given a set A of m points, find the centers C_j of k clusters that minimize the 1-norm:

$$
\begin{aligned}
\min_{C,\, D} \quad & \sum_{i=1}^{m} \min_{j = 1, \ldots, k} e^T D_{ij} \\
\text{s.t.} \quad & -D_{ij} \le A_i^T - C_j \le D_{ij}, \quad i = 1, \ldots, m;\; j = 1, \ldots, k
\end{aligned}
$$

- This formulation is due to Bradley, Mangasarian, and Street (1997)
- Much more work is needed in this area
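A simple alternating heuristic for this 1-norm objective assigns each point to the nearest center and resets each center to the coordinate-wise median, which minimizes the within-cluster 1-norm sum. This is a sketch of the k-median idea, not the concave-minimization algorithm of the paper; the data set is hypothetical.

```python
import statistics

def k_median(points, k, iters=20):
    """Alternating k-median heuristic for the 1-norm clustering objective:
    assign each point to its nearest center in the 1-norm, then move each
    center to the coordinate-wise median of its cluster."""
    centers = [list(p) for p in points[:k]]        # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum(abs(a - c) for a, c in zip(p, centers[j])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # the coordinate-wise median minimizes the 1-norm sum
                centers[j] = [statistics.median(col) for col in zip(*cl)]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_median(pts, 2))  # → [[0, 0], [10, 10]]
```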
Association Rule Discovery
- Find strong associations among instances (e.g., high support and confidence)
- Originally used in market basket analysis, e.g., what products are candidates for cross-sell, up-sell, etc.
- Define an item as an attribute-value pair
- Algorithmic approach (Agrawal et al., 1992, Apriori and related methods):
  - Generate frequent item sets with high support
  - Generate rules from these sets with high confidence
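The frequent-itemset phase can be sketched in a few lines: candidates are pruned whenever any subset is infrequent (the Apriori property). The confidence of a rule such as bread -> milk is then support({bread, milk}) / support({bread}). The baskets below are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent-itemset generation (the first phase of Apriori): keep every
    itemset contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    support = lambda s: sum(1 for t in transactions if s <= t)
    frequent, level = {}, [frozenset([i]) for i in items]
    k = 1
    while level:
        level = [s for s in level if support(s) >= min_support]
        for s in level:
            frequent[s] = support(s)
        k += 1
        # candidate k-itemsets whose every (k-1)-subset was frequent
        level = [frozenset(c) for c in combinations(items, k)
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))]
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
print(freq[frozenset({"bread", "milk"})])  # → 2
```

Here confidence(bread -> milk) = 2/3, since bread appears in three baskets.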
Objectives for Association Rules
- Want high support and high confidence
  - Maximizing support would lead to discovering only a few trivial rules (those that occur very frequently)
  - Maximizing confidence leads to obvious rules (those that are 100% accurate)
- Support and confidence are usually treated as constraints (user-specified minimums)
- Still need measures for good rules (i.e., rules that add insights and are hence interesting)
- Significant opportunities for optimizing the rules that are obtained (not much work, yet)
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Data Mining for OR Applications
- Data mining can be used to complement traditional OR methods in many areas
- Example application areas:
  - E-commerce
  - Supply chain management (e.g., to enable customer-value management in the chain)
  - Production scheduling
Data Mining for Scheduling
- Production scheduling is often ad hoc in practice, relying on the experience and intuition of human schedulers
- Li and Olafsson (2004) propose a method to learn directly from production data
- Benefits:
  - Make scheduling practices explicit
  - Incorporate them in an automatic scheduling system
  - Gain insights into operations
  - Improve schedules
Background
- Scheduling task: given a finite set of jobs, sequence the jobs in order of priority
- Many simple dispatching rules are available
- Machine learning in scheduling:
  - Considerable work over two decades: expert systems, inductive learning
  - Select dispatching rules from simulated data
  - Has not been applied directly to scheduling data (which would be data mining)
Simple Example: Dispatching List

Job ID   Release Time   Start Time   Processing Time   Completion Time
J5       0              0            17                17
J1       10             17           15                32
J3       18             32           20                52
J4       0              52           7                 59
J2       30             59           5                 64

How were these five jobs scheduled?
Longest processing time first (LPT)
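A small simulation of LPT among released jobs reproduces the dispatching list above:

```python
def lpt_schedule(jobs):
    """Sequence jobs by longest-processing-time-first among released jobs.
    jobs: dict id -> (release_time, processing_time)."""
    t, order, pending = 0, [], dict(jobs)
    while pending:
        released = {j: rp for j, rp in pending.items() if rp[0] <= t}
        if not released:                      # machine idles until next release
            t = min(rp[0] for rp in pending.values())
            continue
        j = max(released, key=lambda j: released[j][1])  # longest job first
        t += released[j][1]                   # start now, run to completion
        order.append(j)
        del pending[j]
    return order

# (release_time, processing_time) from the dispatching list above
jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}
print(lpt_schedule(jobs))  # → ['J5', 'J1', 'J3', 'J4', 'J2']
```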
Data Mining Formulation
- Determine the target concept: dispatching rules are a pair-wise comparison
- Learning task: given two jobs, which job should be dispatched first?
- Data preparation: construct a flat file in which each line (instance/data object) is an example of the target concept
Prepared Data File

Job1   ProcessingTime1   Release1   Job2   ProcessingTime2   Release2   Job1ScheduledFirst
J1     15                10         J2     5                 30         Yes
J1     15                10         J3     20                18         Yes
J1     15                10         J4     7                 0          Yes
J1     15                10         J5     17                0          No
J2     5                 30         J1     15                10         No
J2     5                 30         J3     20                18         No
J2     5                 30         J4     7                 0          No
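Constructing this flat file from a dispatching list is mechanical: emit one instance per ordered pair of distinct jobs, labeled by which was actually scheduled first. The sketch below regenerates such rows from the earlier dispatching list:

```python
def pairwise_instances(schedule):
    """Turn a dispatching list into pairwise training examples: for every
    ordered pair of distinct jobs, record both jobs' attributes and whether
    the first was actually scheduled first.
    schedule: list of (job_id, processing_time, release_time) in dispatch order."""
    position = {job[0]: k for k, job in enumerate(schedule)}
    rows = []
    for j1, p1, r1 in schedule:
        for j2, p2, r2 in schedule:
            if j1 != j2:
                label = "Yes" if position[j1] < position[j2] else "No"
                rows.append((j1, p1, r1, j2, p2, r2, label))
    return rows

# dispatch order and attributes from the earlier dispatching list
dispatch = [("J5", 17, 0), ("J1", 15, 10), ("J3", 20, 18),
            ("J4", 7, 0), ("J2", 5, 30)]
rows = pairwise_instances(dispatch)
print(rows[0])    # → ('J5', 17, 0, 'J1', 15, 10, 'Yes')
print(len(rows))  # → 20 (5 x 4 ordered pairs)
```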
Input Engineering
- Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
- Add attributes: ProcessingTimeDifference, ReleaseDifference, Job1Longer, Job1ReleasedFirst
- Select the best subset of attributes
- Apply the C4.5 decision tree algorithm
Decision Tree

[Figure: induced decision tree. The root tests "Job 1 Longer?"; each branch then tests "Job 1 Released First?". When Job 1 is released first, the leaves implement LPT for released jobs via a split on Processing Time Difference at -8. When Job 1 is not yet released, a split on Processing Time Difference at 5 decides whether to wait: do not wait for Job 1 if it is not much longer than Job 2; wait for Job 1 to be released if it is much longer than Job 2.]
Structural Knowledge
- The dispatching rule is LPT: mine data that use this rule together with the processing time and release time data
- The induced model takes into account the possible range of processing times and the largest delay caused by a not-yet-released job
- New structural patterns, not explicitly known by the dispatcher, were discovered
- The next step is to improve schedules: instance selection (learn from best practices) and optimizing the decision tree
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Optimizing Decision Trees
- Decision tree induction is often unstable
- Genetic algorithms have been used to select the best tree from a set of trees
  - Kennedy et al. (1997) encode decision trees and define crossover and mutation operators; the accuracy of the tree is the fitness function
  - A series of papers by Fu, Golden, et al. (2003; 2004a; 2004b) builds further on this approach
- Other optimization methods could also apply, and other outputs can be optimized
Overview
- Background
- Intersection of OR and data mining
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas
Conclusions
- Although optimization work related to data mining dates back to the 1960s, most problems are still open or need more research
- Need to be aware of the key concerns of data mining: extracting meaningful, previously unknown patterns or knowledge from large databases
  - Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
  - Results often focus on simple-to-interpret, meaningful patterns that provide structural insights
  - "Previously unknown" implies few modeling assumptions that restrict what can be discovered
Open Problems
- Many data mining problems can be formulated as optimization problems
  - We have seen numerous examples, e.g., classification and attribute selection (most existing work addresses these problems)
  - Many areas have not been addressed or need more work (in particular, clustering and association rule mining)
- Optimizing model outputs is very promising
- The use of data mining in OR applications has received very little investigation: supply chain management, logistics and transportation, planning and scheduling
Questions?

For more information after today:
- Email me at [email protected]
- Visit my homepage at http://www.public.iastate.edu/~olafsson
- Consult Dilbert
Select References

The following surveys on optimization and data mining are available:
1. Padmanabhan, B. and A. Tuzhilin (2003). "On the Use of Optimization for Data Mining: Theoretical Interactions and eCRM Opportunities," Management Science 49: 1327-1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing 11: 217-238.

Work mentioned in the presentation:
3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). "A Divide-and-Conquer Local Search Heuristic for Data Visualization," Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). "Logical Analysis of Numerical Data," Mathematical Programming 79: 163-190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). "Clustering via Concave Minimization," in M.C. Mozer, M.I. Jordan, and T. Petsche (eds.), Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). "Feature Selection via Mathematical Programming," INFORMS Journal on Computing 10: 209-217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). "A Genetic Algorithm-Based Approach for Building Accurate Decision Trees," INFORMS Journal on Computing 15: 3-22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). "The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods," in D. Corne and J.L. Shapiro (eds.), Evolutionary Computing, Lecture Notes in Computer Science, Springer-Verlag, 147-161.
9. Li, X. and S. Olafsson (2004). "Discovering Dispatching Rules using Data Mining," Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research 13: 455-461.
11. Olafsson, S. and J. Yang (2004). "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, to appear.