Post on 29-Mar-2019

Scalable Bayesian Network Classifiers
Geoff Webb, Ana Martinez, Nayyar Zaidi
Introduction

• Learning from large data is not just about scaling up existing algorithms.
• Large data is best tackled by fundamentally new learning algorithms.
Learning curves

[Figure: learning curve, RMSE vs. training set size (0 to 1,000,000)]

• Error typically reduces as data quantity increases.
• Different algorithms have different curves.
Learning curves

[Figure: learning curve, RMSE vs. training set size; "Overfitting on small data" marked at the small-data end and "Best on large data" at the large-data end]

• Algorithms that closely fit complex multivariate distributions will tend to overfit small data, but can better fit large data: the bias-variance tradeoff.
Capacity to fit = model space + optimization

[Figure: learning curves, RMSE vs. training set size, for Naïve Bayes and Logistic Regression]
Much research of questionable relevance

[Figure: learning curve, RMSE vs. training set size (0 to 1,000,000)]

• Most prior research has used relatively small data.
Requirements

[Figure: learning curve, RMSE vs. training set size (0 to 1,000,000)]

• Need algorithms that can closely fit complex multivariate distributions while being very computationally efficient:
  • low bias
  • few passes through the data
  • out of core
State-of-the-art

• Most low-bias algorithms do not scale and are inherently in-core:
  • Random Forest
  • SVM
  • Deep Learning
• Selective KDB is a scalable, low-bias Bayesian Network Classifier.
Outline
1 Introduction
2 Bayesian Network Classifiers
3 Selective KDB
4 Experiments
5 Conclusions & Future Research
Bayesian Network Classifiers

• Defined by a parent relation π and Conditional Probability Tables (CPTs):
  • π encodes conditional independence / structure
  • CPTs encode conditional probabilities
• Classifies using P(y | x) ∝ P(y | πY) ∏i P(xi | πi)
• Usually makes Y a parent of all Xi

[Network diagram: Y with children X1, X2, X3, X4, X5]

• Given π, CPTs can be learned by counting joint frequencies:
  • single pass
  • incremental
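The single-pass, incremental counting step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the simplest structure (Y is the sole parent of every Xi, i.e. naive Bayes), assumes roughly two values per attribute in the smoothing denominator, and uses hypothetical data.

```python
from collections import Counter

class SimpleBNC:
    """Minimal Bayesian network classifier where Y is the sole parent of
    every attribute (naive Bayes structure). The CPTs are just joint
    frequency counts, learned in a single incremental pass."""

    def __init__(self, n_atts, alpha=1.0):
        self.n_atts = n_atts
        self.alpha = alpha                    # Laplace smoothing
        self.class_counts = Counter()
        self.joint = [Counter() for _ in range(n_atts)]   # (y, x_i) counts

    def update(self, x, y):
        # Incremental: one example at a time, one pass over the data.
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.joint[i][(y, v)] += 1

    def predict(self, x):
        n = sum(self.class_counts.values())
        best, best_p = None, -1.0
        for y, cy in self.class_counts.items():
            p = cy / n                        # P(y)
            for i, v in enumerate(x):
                # P(x_i | y), smoothed assuming 2 values per attribute
                p *= (self.joint[i][(y, v)] + self.alpha) / (cy + 2 * self.alpha)
            if p > best_p:
                best, best_p = y, p
        return best
```

With a few hypothetical (x, y) pairs fed to `update`, `predict` returns the class maximizing the smoothed product above.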
k-Dependence Bayes (KDB)

• Two-pass learning
• 1st pass, learn structure:
  • Collect counts for pairs of attributes with the class.
  • Order attributes by mutual information (MI) with the class.
  • Select parents based on conditional mutual information (CMI):
    • no more than k parents
    • parents must be earlier in the order

[Network diagram: Y and X1, X2, X3, X4, X5]

• 2nd pass, learn CPTs:
  • Collect statistics according to the structure learned.
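The first pass can be sketched as below. This is a small in-memory sketch under simplifying assumptions: discrete attributes, MI and CMI computed directly from counts, and illustrative function names (`kdb_structure` and the helpers are not from the paper); the real algorithm collects the same counts out of core.

```python
import math
from collections import Counter

def mutual_info(pair_counts, n):
    """MI(X;Y) from a Counter over (x, y) value pairs."""
    px, py = Counter(), Counter()
    for (x, y), c in pair_counts.items():
        px[x] += c; py[y] += c
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pair_counts.items())

def cond_mutual_info(triple_counts, n):
    """CMI(Xi;Xj | Y) from a Counter over (xi, xj, y) triples."""
    cy, ciy, cjy = Counter(), Counter(), Counter()
    for (xi, xj, y), c in triple_counts.items():
        cy[y] += c; ciy[(xi, y)] += c; cjy[(xj, y)] += c
    return sum((c / n) * math.log((c * cy[y]) / (ciy[(xi, y)] * cjy[(xj, y)]))
               for (xi, xj, y), c in triple_counts.items())

def kdb_structure(data, labels, k):
    """KDB first pass: order attributes by MI with the class, then give
    each attribute up to k parents (earlier in the order) with highest CMI."""
    n, a = len(data), len(data[0])
    pair = [Counter() for _ in range(a)]
    triple = {}
    for row, y in zip(data, labels):          # single pass to collect counts
        for i in range(a):
            pair[i][(row[i], y)] += 1
            for j in range(i):
                triple.setdefault((i, j), Counter())[(row[i], row[j], y)] += 1
    order = sorted(range(a), key=lambda i: -mutual_info(pair[i], n))
    parents = {order[0]: []}
    for pos, i in enumerate(order[1:], 1):
        earlier = order[:pos]
        ranked = sorted(earlier, key=lambda j: -cond_mutual_info(
            triple[(max(i, j), min(i, j))], n))
        parents[i] = ranked[:k]               # at most k parents
    return order, parents
```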
k-Dependence Bayes (KDB)

• Training Space: O(ya^2v^2 + yav^(k+1))
• Classification Space: O(yav^(k+1))
• Training Time: O(ta^2 + ya^2v^2 + tak)
• Classification Time: O(yak)

t = number of training examples; a = number of attributes; v = average number of values; y = number of classes.
k-Dependence Bayes (KDB)

• Parameter k controls the bias-variance tradeoff.

[Figure: learning curves, RMSE vs. training set size (0 to 1,000,000), for Naïve Bayes and KDB with k = 1 to 5]

• No a priori means to anticipate the best k.
• Spurious attributes may increase error.
KDB models are nested

• Mi,j = KDB with k = i using attributes 1 to j.

        Attributes (ordered by MI)
  k      1      2      3      4
  1    M1,1   M1,2   M1,3   M1,4
  2    M2,1   M2,2   M2,3   M2,4
  3    M3,1   M3,2   M3,3   M3,4
  4    M4,1   M4,2   M4,3   M4,4

• Mi,j is a minor extension of Mi,j−1 and of Mi−1,j.
Selective KDB

• In one extra pass, select an attribute subset and the best k using leave-one-out cross-validation.
• The full model subsumes all k × a submodels.
• Very efficient selection among a large class of strong models.
Leave-one-out CV

• Very low-bias estimator of out-of-sample performance.
• Incremental cross-validation makes it VERY efficient for BNCs:
  • Collect counts from all data once.
  • When classifying a hold-out object, subtract it from the counts.
• All k × a nested models can be evaluated with little more computation than the full KDB model.
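The count-subtraction trick can be sketched as follows. For brevity this sketch uses a naive Bayes structure rather than the full KDB model, and the function name is illustrative; the point is that counts are collected once and each hold-out example is subtracted, classified, and restored.

```python
from collections import Counter

def incremental_loocv(data, labels, alpha=1.0):
    """Leave-one-out CV for a count-based classifier. Counts are collected
    in one pass; each hold-out example is subtracted from the counts,
    classified, then added back, so LOOCV costs little more than training."""
    a = len(data[0])
    class_counts = Counter()
    joint = [Counter() for _ in range(a)]
    for x, y in zip(data, labels):            # one pass to collect counts
        class_counts[y] += 1
        for i, v in enumerate(x):
            joint[i][(y, v)] += 1

    def classify(x):
        n = sum(class_counts.values())
        scores = {}
        for y, cy in class_counts.items():
            p = cy / n
            for i, v in enumerate(x):
                p *= (joint[i][(y, v)] + alpha) / (cy + 2 * alpha)
            scores[y] = p
        return max(scores, key=scores.get)

    correct = 0
    for x, y in zip(data, labels):
        class_counts[y] -= 1                  # subtract the hold-out example
        for i, v in enumerate(x):
            joint[i][(y, v)] -= 1
        correct += classify(x) == y
        class_counts[y] += 1                  # restore the counts
        for i, v in enumerate(x):
            joint[i][(y, v)] += 1
    return correct / len(data)
```

Because subtraction and restoration touch only the held-out example's own counts, the same pass can score many nested models at once.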
Selective KDB

• Training Space: O(ya^2v^2 + yav^(k+1))
• Classification Space: O(ya*v^(k*+1))
• Training Time: O(ta^2 + ya^2v^2 + tayk)
• Classification Time: O(ya*k*)

t = number of training examples; a = number of attributes; v = average number of values; y = number of classes; a* = number of attributes selected (a* ≤ a); k* = best value of k found (k* ≤ kmax).
Selective KDB

[Figure: learning curves, RMSE vs. training set size (0 to 1,000,000), for Naïve Bayes, KDB k = 1 to 5, SKDB selecting only k, SKDB (k = 5) selecting only attributes, and full SKDB]
16 datasets (165K-54M examples)

  Name                Cases (million)   Atts.   Classes   Size
  localization             0.165           5       11      11MB
  census-income            0.299          41        2     136MB
  USPSExtended             0.342         676        2     603MB*
  MITFaceSetA              0.474         361        2     584MB
  MITFaceSetB              0.489         361        2     603MB
  MSDYearPrediction        0.515          90       90     601MB*
  covtype                  0.581          54        7      72MB
  MITFaceSetC              0.839         361        2     1.1GB
  poker-hand               1.025          10       10      24MB
  uscensus1990             2.458          67        4     325MB
  PAMAP2                   3.851          54       19     1.7GB
  kddcup                   5.210          41       40     754MB
  linkage                  5.749          11        2     251MB
  mnist8m                  8.100         784       10      19GB*
  satellite                8.705         138       24     3.6GB
  splice                  54.628         141        2     7.3GB*

  * sparse format.
Numeric attributes

• 5-bin equal-frequency discretisation
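Equal-frequency discretisation assigns (roughly) the same number of training values to each bin. A minimal sketch, with illustrative function names and simplified handling of tied values:

```python
def equal_frequency_bins(values, n_bins=5):
    """Cut points for equal-frequency discretisation: each bin receives
    (roughly) the same number of training values."""
    s = sorted(values)
    n = len(s)
    # one interior boundary after every n/n_bins-th sorted value
    return [s[(n * b) // n_bins] for b in range(1, n_bins)]

def discretise(value, cuts):
    """Map a numeric value to its bin index given the cut points."""
    for b, c in enumerate(cuts):
        if value < c:
            return b
    return len(cuts)
```

For example, discretising 100 distinct values into 5 bins yields 4 cut points and 5 bins of 20 values each.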
Out-of-core BNCs

[Figure: RMSE scatter plots comparing KDB k=5 vs. SKDB and the best KDB vs. SKDB]
Out-of-core BNCs

• Training Time Comparisons

[Figure: training time in seconds (log scale) for NB, TAN, AODE, KDB k=5, and SKDB across the datasets]
Out-of-core BNCs

• Classification Time Comparisons

[Figure: classification time in seconds (log scale) for NB, TAN, AODE, KDB k=5, and SKDB across the datasets, with the selected k annotated for SKDB]
Out-of-core SGD

• SGD in Vowpal Wabbit (VW):
  • Squared and logistic loss functions.
  • Quadratic features (best results).
  • Different numbers of passes (3 or 10).
  • Discrete attributes converted to binary features.
  • One-against-all for multiclass classification.
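The "discrete attributes converted to binary features" step can be sketched as a conversion to VW's text input format, where each attribute=value token is a distinct weight-1 indicator feature. The attribute naming scheme below is an illustrative assumption, not the authors' preprocessing:

```python
def to_vw_line(label, attributes):
    """Convert one instance with discrete attributes into Vowpal Wabbit's
    text format. Each att{i}={value} token becomes a distinct binary
    (weight-1) feature; `label` is a 1-based class index as used with
    one-against-all multiclass training."""
    feats = " ".join(f"att{i}={v}" for i, v in enumerate(attributes))
    return f"{label} | {feats}"
```

For instance, a two-attribute instance of class 2 becomes the line `2 | att0=red att1=tall`.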
Out-of-core SGD

[Figure: RMSE scatter plots (log scale) of SKDB (max k = 5) vs. VW with logistic loss (VWLF) and with squared loss (VWSF)]

Win/draw/loss of Selective KDB:

                   vs. VW (logistic), RMSE   vs. VW (squared), 0-1 Loss
  Selective KDB          (7+2)-0-7                   8-1-7
Out-of-core SGD

• Training Time Comparisons

[Figure: training time in seconds (log scale) for VWSF, VWLF, and SKDB across the datasets]
Out-of-core SGD

• Classification Time Comparisons

[Figure: classification time in seconds (log scale) for VWSF, VWLF, and SKDB across the datasets]
In-core BayesNet and RF

[Figure: RMSE scatter plots of SKDB vs. BayesNet and RF; x = sampled dataset]

Win/draw/loss of Selective KDB:

                   BayesNet   RF (Num)
  Selective KDB     6-3-3      5-0-6
In-core BayesNet and RF

• Training Time Comparisons

[Figure: training time in seconds (log scale) for BayesNet, RF, and SKDB across the datasets]
In-core BayesNet and RF

• Classification Time Comparisons

[Figure: classification time in seconds (log scale) for BayesNet, RF, and SKDB across the datasets]
Global comparisons

• Cumulative Ranking
• Selective KDB copes well with high-dimensional datasets and with datasets of more than 1 million points.
• RF performs better on datasets with a small number of attributes.
• VW has an advantage for sparse numeric data.
Global comparisons

• Ranking

[Figure: mean rank per learner: NB 7.63, AODE 6.81, TAN 5.88, BayesNet 3.47, VWLF 3.44, KDB (best k) 3.34, RF (Num) 2.91, SKDB 2.53; lower is better]
Global comparisons (Time)

[Figure: box plots of training time and test time in seconds (log scale) for NB, TAN, AODE, KDB k, sKDB, VWsq, and VWlog]
Stepping back

• The key trick is nested evaluation of a large class of count-based models.
• It also works with two-pass selective evaluation of AnDE models.
• It combines the efficiency of simple generative techniques with the power of discriminative learning.
• It can use any loss function that is a function of each P̂(y | x).
Future Research

• Numeric attributes:
  • It seems that there should be something better than discretisation.
  • This remains elusive ...
• Overfitting avoidance.
• Increase the number of alternative models considered.
• Explore other forms of nested BNCs.
• Two-pass Selective KDB:
  • Sample a small test set in the second pass.
• Single-pass generative/discriminative learning:
  • Initially learn a simple generative model.
  • Collect discriminative statistics and refine the model when there is sufficient evidence to make a choice.
  • Refinements might be attribute selection, structural refinement, or weighting.
  • Repeat.
Satellite learning curves

[Figure: learning curves, RMSE vs. training set size, for NB, KDB K = 1 to 8, SKDB K = 10, and Random Forest]
Conclusions

• Large data calls for fundamentally new learning algorithms.
• We are pioneering a new generation of theoretically well-founded algorithms that are both scalable to very large data and capable of exploiting the fine detail inherent therein.