Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods
Hidenao Abe & Takahira Yamaguchi
Shizuoka University, JAPAN
{hidenao,yamaguti}@ks.cs.inf.shizuoka.ac.jp
H. Abe & T. Yamaguchi, Parallel and Distributed Computing for Machine Learning 2003
Contents
• Introduction
• Constructive meta-learning based on method repositories
• Case study with common data sets
• Conclusion
Overview of the meta-learning scheme

[Diagram: Learning Algorithms + Data Set → Meta-Learning Executor, driven by a meta-learning algorithm and meta-knowledge → a better result for the given data set than that obtained by each single learning algorithm.]
Selective meta-learning scheme

• Integrating base-level classifiers learned from different training data sets, generated by
  – bootstrap sampling (bagging)
  – weighting ill-classified instances (boosting)
• Integrating base-level classifiers learned with different algorithms, via
  – simple voting (voting)
  – constructing a meta-classifier from a meta data set with a meta-level learning algorithm (stacking, cascading)

These schemes do not work well when no base-level learning algorithm works well on the given data set,
because they do not de-compose the base-level learning algorithms.
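The "simple voting" integration named above can be sketched in a few lines. This is a generic illustration, not code from the paper; `majority_vote` and its input layout are hypothetical names chosen here.

```python
from collections import Counter

def majority_vote(base_predictions):
    """Combine base-level classifiers' predictions by simple voting.

    base_predictions: one prediction list per base-level classifier,
    all over the same instances.
    """
    n_instances = len(base_predictions[0])
    combined = []
    for i in range(n_instances):
        votes = Counter(preds[i] for preds in base_predictions)
        combined.append(votes.most_common(1)[0][0])  # most frequent class
    return combined

# Three base-level classifiers predicting classes for four instances
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
print(majority_vote(preds))  # [0, 1, 1, 0]
```

Note that voting, like the other selective schemes, can only be as good as its fixed pool of base-level classifiers, which is exactly the limitation the slide points out.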
Contents

• Introduction
• Constructive meta-learning based on method repositories
  – Basic idea of our constructive meta-learning
  – An implementation of the constructive meta-learning called CAMLET
  – Parallel execution and search for CAMLET
• Case study with common data sets
• Conclusion
Basic Idea of our Constructive Meta-Learning

[Diagram: Analysis of representative inductive learning algorithms (C4.5, VS, CS, AQ15) → De-composition & Organization → Search & Composition.]
Basic Idea of Constructive Meta-Learning (cont.)

[Diagram: Analysis of representative inductive learning algorithms → De-composition & Organization — organizing inductive learning methods, control structures and treated objects → Search & Composition — automatic composition of inductive applications.]
Analysis of Representative Inductive Learning Algorithms

We have analyzed the following eight learning algorithms:
• Version Space
• AQ15
• ID3
• C4.5
• Boosted C4.5
• Bagged C4.5
• Neural Network with back propagation
• Classifier Systems

From these, we have identified 22 specific inductive learning methods.
Inductive Learning Method Repository
Data Type Hierarchy

Organization of input/output/reference data types for inductive learning methods:

• Objects
  – data-set: training data set, validation data set, test data set
  – classifier-set: If-Then rules, tree (decision tree), network (neural net)
Identifying Inductive Learning Methods and Control Structures

[Control-structure diagram: generating training & validation data sets → generating a classifier set → evaluating a classifier set → looping back via modifying the training and validation data sets and/or modifying a classifier set → finally, evaluating classifier sets on the test data set.]
CAMLET: a Computer Aided Machine Learning Engineering Tool

[Diagram: the user supplies data sets and a goal accuracy. CAMLET performs Construction and Instantiation of a specification from the method repository, data type hierarchy and control structures, then Compile and Go & Test. If the goal accuracy is not reached, Refinement loops back; otherwise the inductive application etc. is returned to the user.]
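The construct–instantiate–compile–execute–refine loop in the diagram above can be sketched as follows. Every function here is a hypothetical stand-in for CAMLET's real operations on the method repository; the simulated "accuracy" exists only to make the sketch runnable.

```python
import random

def construct_spec():
    """Construction: pick a control structure and fill it with methods
    (here just three random method-quality scores, purely illustrative)."""
    return [random.random() for _ in range(3)]

def instantiate_and_compile(spec, data_set):
    """Instantiation + Compile: in this sketch the spec is its own executable."""
    return spec

def go_and_test(executable, data_set):
    """Go & Test: simulate an accuracy that depends on the chosen methods."""
    return sum(executable) / len(executable)

def refine_spec(spec):
    """Refinement: perturb the spec toward (hopefully) better methods."""
    return [min(1.0, m + random.uniform(0.0, 0.1)) for m in spec]

def camlet_search(data_set, goal_accuracy, max_executions=100):
    """Loop until the goal accuracy is reached or the budget runs out."""
    spec = construct_spec()
    best_spec, best_acc = spec, 0.0
    for _ in range(max_executions):
        acc = go_and_test(instantiate_and_compile(spec, data_set), data_set)
        if acc > best_acc:
            best_spec, best_acc = spec, acc
        if best_acc >= goal_accuracy:
            break
        spec = refine_spec(spec)
    return best_spec, best_acc

random.seed(0)
spec, acc = camlet_search(None, goal_accuracy=0.95)
print(acc)
```

The real system searches a space of composed specifications rather than numeric vectors, but the control flow — test against the goal accuracy, refine on failure — is the same.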
CAMLET with a parallel environment

[Diagram: the same loop, split into a Composition Level (Construction, Instantiation, Refinement) and an Execution Level (Compile, Go & Test). Specifications and data sets are sent down from the Composition Level to the Execution Level, and results are sent back, until the goal accuracy is reached and the inductive application etc. is returned to the user.]
Parallel Executions of Inductive Applications

The Composition Level requires exactly one process; the Execution Level requires one or more processing elements (PEs). (R = receiving, Ex = executing, S = sending)

[Timeline diagram:
1. Constructing specs → Sending specs; the PEs initialize and receive them (R R R R).
2. Waiting while the PEs execute (Ex Ex Ex Ex).
3. Receiving a result as one PE sends it back (Ex Ex S Ex).
4. Refining the spec and sending the refined one back to that PE (Ex Ex R Ex), while the other PEs keep executing.]
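The master/worker pattern in the timeline above can be sketched with a thread pool standing in for the Execution Level PEs. This is an illustrative sketch, not the paper's implementation; `execution_pe` and the spec layout are invented here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def execution_pe(spec):
    """One Execution Level PE: receive a spec, execute it, send the result."""
    time.sleep(spec["cost"])                 # simulated execution time
    return spec["id"], 1.0 - spec["cost"]    # (spec id, simulated accuracy)

# Composition Level: a single process that sends specs to the PEs and
# handles each result as soon as the PE that produced it sends it back,
# so a refined spec could be dispatched while the other PEs keep running.
specs = [{"id": i, "cost": 0.01 * (i + 1)} for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(execution_pe, s) for s in specs]
    results = dict(f.result() for f in as_completed(futures))
print(results)
```

`as_completed` yields results in finishing order rather than submission order, which mirrors the "Receiving → Refining & Sending" steps: the Composition Level never blocks on the slowest PE before acting on a finished one.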
Refinement of Inductive Applications with GA

[Diagram: generation t → Selection → Parents → Crossover and Mutation → Children → execute & add → generation t+1, drawing on the executed specs and their results.]

1. Transform executed specifications into chromosomes.
2. Select parents with the tournament method.
3. Cross over the parents and mutate one of their children.
4. Execute the children's specifications, transforming the chromosomes back into specs.

* If CAMLET can execute more than two inductive applications at the same time, some slow ones are added to generation t+2 or later.
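Steps 2–3 above can be sketched for binary chromosomes. The encoding (8 binary genes) and the fitness stand-in are illustrative assumptions; in CAMLET the chromosome encodes a specification and the fitness is the accuracy obtained by executing it.

```python
import random

def tournament_select(population, fitnesses, k=2):
    """Tournament method: the fitter of k randomly chosen individuals wins."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def crossover(p1, p2):
    """One-point crossover producing two children."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chromosome, rate=0.1):
    """Flip each binary gene with the given probability."""
    return [1 - g if random.random() < rate else g for g in chromosome]

def next_generation(population, fitnesses):
    """Select parents, cross them over, and mutate one of the two children."""
    children = []
    while len(children) < len(population):
        c1, c2 = crossover(tournament_select(population, fitnesses),
                           tournament_select(population, fitnesses))
        children.extend([c1, mutate(c2)])  # only one child is mutated
    return children[:len(population)]

random.seed(1)
population = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
fitnesses = [sum(c) for c in population]  # stand-in for executed-spec accuracy
children = next_generation(population, fitnesses)
print(children)
```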
CAMLET on a parallel GA refinement

[Diagram: the same two-level architecture as the previous slide, with the GA-based Refinement running at the Composition Level; specifications and data sets flow down to the Execution Level PEs and results flow back to the user.]
Contents

• Introduction
• Constructive meta-learning based on method repositories
• Case study with common data sets
  – Accuracy comparison with common data sets
  – Evaluation of parallel efficiencies in CAMLET
• Conclusion
Setup of the accuracy comparison with common data sets

• We have used two kinds of common data sets:
  – 10 Statlog data sets
  – 32 UCI data sets (distributed with WEKA)
• We set a goal accuracy for each data set as the search criterion.
• CAMLET outputs just one specification — the best inductive application for each data set — and its performance.
  – CAMLET searches a space of about 6,000 inductive applications for the best one, executing up to one hundred of them.
• 10 "Execution Level" PEs were used for each data set.
• Stacking methods implemented in the WEKA data mining suite were used for comparison.
Statlog common data sets

Dataset       #Class  #Att. (Nom:Num)  #Instances       Evaluation
australian    2       14 (6:8)         690              10CV
diabetes      2       8 (0:8)          768              10CV
dna           3       180 (180:0)      2,000 / 1,186    Test
german        2       20 (17:3)        1,000            10CV
heart         2       13 (6:7)         270              10CV
letter-liacc  26      16 (0:16)        16,000 / 4,000   Test
satimage      6       36 (0:36)        4,435 / 2,000    Test
segment       7       18 (0:18)        2,317            10CV
shuttle       7       9 (0:9)          43,500 / 14,500  Test
vehicle       4       18 (0:18)        846              10CV

To generate the 10-fold cross-validation data set pairs, we used the SplitDatasetFilter implemented in WEKA.
Setting up Stacking methods for the Statlog common data sets

Base-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• IBk (k=5)
• Bagged J4.8 (sampling rate=55%, 5 iterations)
• Bagged J4.8 (sampling rate=55%, 10 iterations)
• Boosted J4.8 (5 iterations)
• Boosted J4.8 (10 iterations)
• Part (with pruning, C=0.25)
• Naïve Bayes

Meta-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• Naïve Bayes
• Classification with Linear Regression (with all features)
Evaluation of accuracies on the Statlog common data sets

[Radar chart: accuracies (70–100%) of J4.8, NB, LR (goal) and CAMLET on the ten Statlog data sets: australian, diabetes, dna, german, heart, letter-liacc, satimage, segment, shuttle, vehicle.]
Inductive applications composed by CAMLET performed as well as stacking with Classification via Linear Regression.

Algorithm  Average (%)
J4.8       87.0
NB         86.6
LR         87.6
CAMLET     87.4
Setting up Stacking methods for the UCI common data sets

Base-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• IBk (k=5)
• Bagged J4.8 (sampling rate=55%, 10 iterations)
• Boosted J4.8 (10 iterations)
• Part (with pruning, C=0.25)
• Naïve Bayes

Meta-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• Naïve Bayes
• Linear Regression (with all features)
Evaluation of accuracies on the UCI common data sets

[Radar chart: accuracies (30–100%) of J4.8, NB, LR, the max of the three stacking methods (goal) and CAMLET on the 32 UCI data sets: anneal, audiology, autos, balance-scale, breast-cancer, breast-w, colic, credit-a, credit-g, diabetes, glass, heart-c, heart-h, heart-statlog, hepatitis, hypothyroid, ionosphere, iris, kr-vs-kp, labor, letter, lymph, mushroom, primary-tumor, segment, sick, sonar, soybean, splice, vehicle, vote, vowel.]
Algorithm J4.8 NB LR Max. of 3 CAMLET
Average 81.92 81.97 81.07 83.79 83.23
Analysis of the parallel efficiency of CAMLET

• We have analyzed the parallel efficiency of the executions in the case study with the Statlog common data sets.
  – We used 10 processing elements to execute the inductive applications.
• Parallel Efficiency is calculated with the following formula:

  ParallelEfficiency = ( Σ_{i=1}^{n} Σ_{j=1}^{x} e_ij ) / ActualExecutionTime

  where 1 ≤ x ≤ 100 is the number of executed inductive applications, n is the number of folds, and e_ij is the execution time of each PE.
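The formula above — summed per-PE execution time divided by the actual wall-clock time, a speedup-style measure — is straightforward to compute. The numbers below are invented for illustration:

```python
def parallel_efficiency(pe_times, actual_time):
    """Total PE execution time divided by actual wall-clock time.
    Values greater than 1 mean parallel execution paid off."""
    return sum(sum(fold) for fold in pe_times) / actual_time

# 2 folds x 3 PEs: 9.0 s of summed PE work finished in 2.0 s of wall time
times = [[1.0, 2.0, 1.5],
         [2.0, 1.0, 1.5]]
print(parallel_efficiency(times, 2.0))  # 4.5
```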
Parallel efficiency of the case study with the Statlog data sets

[Two charts of parallel efficiencies (range ≈5–10): left, the 10-fold CV data sets (australian, diabetes, german, heart, segment, vehicle); right, the training/test data sets (dna, letter, satimage, shuttle).]
Why was the parallel efficiency of the "satimage" data set so low?

[Scatter plot: execution time (sec., 0–2,500) against the number of executed inductive applications (0–100) for satimage.]
Because:
- CAMLET could not find a satisfactory inductive application for this data set within 100 executions of inductive applications.
- At the end of the search, inductive applications with large computational cost badly affected the parallel efficiency.
Contents
• Introduction
• Constructive meta-learning based on method repositories
• Case Study with common data sets
• Conclusion
Conclusion

• CAMLET has been implemented as a tool for the constructive meta-learning scheme based on method repositories.
• CAMLET shows significant performance as a meta-learning scheme.
  – We are extending the method repository to construct data mining applications covering the whole data mining process.
• The parallel efficiency of CAMLET reached more than a 5.93-times speedup.
  – It could be improved further:
    • with methods to equalize the execution times, such as load balancing and/or a three-layered architecture;
    • with a more efficient search method that finds a satisfactory inductive application faster.
Thank you!
Stacking: training and prediction

[Diagram — Training phase: the training set is split by internal CV into an internal training set and an internal test set; base-level classifiers are learned on the internal training set, their predictions on the internal test set are transformed into the meta-level training set (repeating with CV), and the meta-level classifier is learned from it via meta-level learning. Prediction phase: base-level classifiers learned on the full training set transform the test set into the meta-level test set, on which the meta-level classifier makes the final prediction, yielding the results of prediction.]
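The training-phase transformation described above — building the meta-level training set with internal CV so that each instance's meta-attributes come from classifiers that never saw it — can be sketched as follows. The two toy base-level learners and all function names are hypothetical, chosen only to make the sketch self-contained.

```python
def make_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def stacking_meta_training_set(X, y, base_learners, n_folds=3):
    """Build the meta-level training set with internal CV: each instance's
    meta-level attributes are predictions from base-level classifiers that
    were trained without that instance."""
    meta_X = [None] * len(X)
    for test_idx in make_folds(len(X), n_folds):
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        classifiers = [fit(X_tr, y_tr) for fit in base_learners]
        for i in test_idx:
            meta_X[i] = [predict(X[i]) for predict in classifiers]
    return meta_X, y

# Two toy base-level learners (for illustration only):
def majority_learner(X_tr, y_tr):
    most = max(set(y_tr), key=y_tr.count)
    return lambda x: most

def threshold_learner(X_tr, y_tr):
    mean = sum(x[0] for x in X_tr) / len(X_tr)
    return lambda x: int(x[0] > mean)

X = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7]]
y = [0, 1, 0, 1, 0, 1]
meta_X, meta_y = stacking_meta_training_set(
    X, y, [majority_learner, threshold_learner])
print(meta_X)
```

A meta-level learning algorithm (J4.8, Naïve Bayes, or linear regression in the slides) would then be trained on `meta_X` with the original labels `meta_y`.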
How to transform base-level attributes to meta-level attributes

Base-level data set:

Att. A  Att. B  Att. C  Class
a1      0.3     c3      0
a2      5       c2      1
…       …       …       …

Learning with algorithm 1 yields a classifier; its predictions become:

Prob. of predicting "0"  Prob. of predicting "1"  Class
0.99                     0.01                     0
0.4                      0.6                      1
…                        …                        …

1. Get the prediction probability for each class value with the classifier.
2. Append the base-level class value of each base-level instance.

With A algorithms and C class values, #meta-level attributes = A × C.
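The two steps above amount to flattening each algorithm's class-probability vector and appending the base-level class, giving A × C meta-level attributes per instance. A minimal sketch (`to_meta_instance` is a name invented here):

```python
def to_meta_instance(class_probabilities, base_class):
    """Flatten each algorithm's class-probability vector into one row of
    A x C meta-level attributes, then append the base-level class value."""
    row = [p for probs in class_probabilities for p in probs]
    return row + [base_class]

# Two algorithms (A=2), two class values (C=2): 2 x 2 = 4 meta attributes
print(to_meta_instance([[0.99, 0.01], [0.4, 0.6]], 0))
# [0.99, 0.01, 0.4, 0.6, 0]
```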
Meta-level classifiers (example)

[Decision-tree diagram: the meta-level classifier branches on the meta-level attributes — e.g. the prediction probability of "0" with algorithm 1's classifier (≤ 0.1 vs > 0.1), the prediction probability of "1" with algorithm 2's classifier (≤ 0.95 vs > 0.95), the prediction probability of "1" with algorithm 1's classifier and the prediction probability of "0" with algorithm 3's classifier (≤ 0.9 vs > 0.9) — and outputs a final prediction of "0" or "1" at the leaves.]
Accuracies of 8 base-level learning algorithms on the Statlog data sets

Stacking base-level algorithms — accuracy (%), avg. over 10 data sets:

Algorithm     Avg.
J4.8          84.84
IBk(5)        84.32
Part          84.10
NaiveBayes    76.53
Bagging(5)    85.29
Bagging(10)   86.17
Boosting(5)   85.56
Boosting(10)  85.94
Max           87.58

CAMLET base-level algorithms — accuracy (%), avg. over 10 data sets:

Algorithm          Avg.
C4.5 DT            81.07
ID3 DT             81.76
NeuralNetwork      63.09
ClassifierSystems  64.85
Bagging(5)         84.28
Bagging(10)        85.17
Boosting(5)        84.52
Boosting(10)       85.59
Max                87.04
C4.5 Decision Tree without pruning (reproduction by CAMLET)

• Generating training and validation data sets — with a void validation set
• Generating a classifier set (decision tree) — with entropy + information ratio
• Evaluating a classifier set — with set evaluation
• Evaluating classifier sets on the test data set — with single classifier set evaluation

1. Select just one control structure.
2. Fill each method slot with specific methods.
3. Instantiate this spec.
4. Compile the spec.
5. Execute its executable code.
6. Refine the spec (if needed).