Cidu2011 Banerjee Intro to Ml 01
Tutorial: Introduction to Machine Learning
Arindam Banerjee (banerjee@cs.umn.edu)
Dept. of Computer Science & Engineering
University of Minnesota, Twin Cities
2011 NASA Conference on Intelligent Data Understanding
October 19, 2011
Success Stories
Parent of a baby
Medical Doctor
Success Stories
Bioinformatics
Computer vision
Medical imaging
Social network analysis
From UCLA Center on Aging
Types of Data
• Data Types
  – Discrete, Ordinal, Continuous
  – Univariate, Multi-variate
  – Structured: Temporal, Spatial, Spatiotemporal
• Data Dependencies
  – Independent and Identically Distributed (IID)
  – Graphical Models
  – Linear vs. Non-linear dependencies
• Other Considerations
  – Observable vs. Latent variables
[Figures: average wind direction; average annual temperature. From NOAA Earth System Research Laboratory]
Types of Learning
• Supervised
  – Given labeled examples (inputs X with outputs Y), learn a predictor from X to Y
  – Examples: Classification, Regression
• Unsupervised
  – Given X, find structure in X
  – Examples: Clustering, Embedding
• Semi-supervised
  – Given labeled and unlabeled examples, learn a predictor
Overview
• Predictive Models
  – Classification
  – Regression
  – Regularization
  – Example: Land Variable Regression
  – Nonlinear Models, Ensembles
• Graphical Models
• Online Learning
• Exploratory Analysis
Classification: Linear Models
• Linear Classification Rule
  – If the linear score of x is positive, then predict +1
  – If the linear score of x is negative, then predict -1
• Family of methods
  – Support Vector Machines, Perceptrons, Decision Stumps
• Two Aspects: Optimization, Statistical guarantees
[Figure: two classes labeled +1 and -1 separated by a linear boundary]
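The linear classification rule can be illustrated with a perceptron, one of the methods named on the slide. This is a minimal sketch, not the tutorial's own code: the weight vector `w`, bias `b`, the toy points, and the function names are all assumptions made for illustration.

```python
def predict(w, b, x):
    """Linear rule: predict +1 if w.x + b > 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def train_perceptron(points, labels, epochs=20):
    """Classic perceptron updates: adjust (w, b) only on misclassified points."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


# Hypothetical, linearly separable toy data.
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
```

On separable data like this the loop stops as soon as an epoch makes no mistakes. Support vector machines use the same rule but choose `w` by a different optimization.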
Regression: Linear Models
• Linear Regression
  – Model: the response is a linear function of the covariates plus noise
• Family of methods/models
  – Least squares, Ridge regression, Sparse regression
  – Generalized linear models: Exponential family noise
• Two Aspects: Optimization, Statistical Guarantees
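As a small worked instance of least squares, the sketch below fits a line with one covariate and an intercept using the closed-form solution. The data and the function name are hypothetical; the slide's own model notation did not survive extraction.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = least_squares_line(xs, ys)
```

With noise-free data the fit recovers the generating line exactly. With noisy data it returns the line with the smallest squared error.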
Hierarchical Linear Models
• Hierarchical Models
  – Each node is a linear model
  – Node outcome (decision, value, etc.) is fed into next node
• Family of models/methods
  – Classification/Regression trees
  – Multi-layer Perceptrons, Deep Belief Networks
• Two Aspects
  – Learning/Optimization
  – Statistical Guarantees
Regularization, Generalization
• Controlling complexity of predictor is key
  – Low loss with low complexity predictor guarantees good generalization
  – Objective = Loss term + Regularization term
• Examples
  – L2 (SVM, Ridge Regression), L1 (Lasso)
  – Bayesian models: Prior penalizes complex models
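The split into a loss term and a regularization term can be made concrete with one-dimensional ridge regression (no intercept). This is a sketch under assumptions: the penalty weight `lam`, the toy data, and the function names are illustrative and not taken from the slides.

```python
def ridge_objective(w, xs, ys, lam):
    """Squared-error loss term plus L2 regularization term."""
    loss = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    penalty = lam * w ** 2
    return loss + penalty


def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of ridge_objective for a single weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                         # generated from y = 2x
w_unregularized = ridge_weight(xs, ys, 0.0)  # plain least squares
w_regularized = ridge_weight(xs, ys, 14.0)   # shrunk toward zero
```

A larger `lam` shrinks the weight toward zero, trading some training loss for a simpler predictor.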
Non-linear Methods
• The Kernel Trick
  – Map data to high-dimensional space (“RKHS”)
  – Linear model in RKHS ≡ Nonlinear model in data space
  – Avoid explicit modeling in high-dimensional spaces
• Models and Methods
  – Kernelized Classification, Regression, Dimensionality Reduction
  – Multiple Kernel Learning: Combination of Kernels
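A quick numerical check shows what avoiding explicit modeling in the high-dimensional space means. For the degree-2 polynomial kernel (chosen here only as an example; the slides do not name a kernel), the kernel value equals an inner product of explicit feature maps. The feature map `phi` and the test points are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for the kernel (x.z)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Same inner product, computed without building phi(x) or phi(z)."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))
implicit = poly_kernel(x, z)
```

Both routes give the same number. The kernel route never materializes the mapped vectors, which matters when the feature space is very large or infinite-dimensional.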
Ensembles
• Main idea
  – Combination reduces loss, does not increase complexity
  – Bayesian model averaging
• Examples
  – Bagging, Boosting, Random forests
  – Base models on samples/reweighted versions of points, features
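The effect of combining base models can be seen in a deliberately small toy case: three decision stumps, each looking at one feature, are combined by majority vote. This is a hand-constructed illustration, not bagging or boosting proper; the data and the stump definitions are assumptions.

```python
def stump(feature_index):
    """Base model: predict the sign of a single feature."""
    return lambda x: 1 if x[feature_index] > 0 else -1


def majority_vote(models, x):
    """Ensemble prediction: sign of the summed votes."""
    return 1 if sum(m(x) for m in models) > 0 else -1


def error_count(predict, points, labels):
    return sum(1 for x, y in zip(points, labels) if predict(x) != y)


# Toy data in which each stump is wrong on a different pair of points.
points = [(1, 1, -1), (1, -1, 1), (-1, 1, 1),
          (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]
labels = [1, 1, 1, -1, -1, -1]

models = [stump(0), stump(1), stump(2)]
individual_errors = [error_count(m, points, labels) for m in models]
ensemble_errors = error_count(lambda x: majority_vote(models, x), points, labels)
```

Each stump misclassifies two of the six points, while the vote misclassifies none, because the stumps' mistakes do not overlap.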
High Dimensional Modeling
• The “Small n, High p” regime
  – Underspecified, difficult to fit models with guarantees
• “Low Dimensional” models
  – Small number of features are relevant: Feature Selection
  – Low intrinsic dimensionality: Manifold embedding
  – Models/Methods: Sparse regression, Non-parametric regression
Land Variable Regression
• Land-Sea variable interactions
  – How do sea variables affect land variables?
  – Are proximal locations important?
  – Are there long range spatial dependencies (tele-connections)?
• NCEP/NCAR Reanalysis 1: Monthly means for 1948-2010
  – Covariates: Temperature, Sea Level Pressure, Precipitation, Relative Humidity, Horizontal Wind Speed, Vertical Wind Speed
  – Response: Temperature, Precipitation at 9 locations
• High-dimensional Regression
  – Dimension p = Lm: L locations, m variables/location
  – Two considerations
    • Not all locations are relevant
    • Even if a location is relevant, not all variables are relevant
Land Regions for Prediction (Temp, Precip)
Ordinary Least Squares
• Naive method: Use ordinary least squares (OLS)
• For n < mL, the OLS estimate is non-unique
• In general, y depends on all mL covariates: complex model
[Figure: coefficient vector θ partitioned into blocks for Location 1, ..., Location L, mapped to the response variable]
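Why the estimate is non-unique when there are fewer samples than covariates can be seen in the smallest possible case: one observation and two covariates. The numbers below are hypothetical and chosen only for illustration.

```python
def residual_sum_of_squares(theta, X, y):
    """Sum over samples of (y_i - x_i . theta)^2."""
    return sum(
        (yi - sum(t * x for t, x in zip(theta, row))) ** 2
        for row, yi in zip(X, y)
    )


X = [[1.0, 1.0]]   # n = 1 sample, p = 2 covariates
y = [2.0]

candidates = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
fits = [residual_sum_of_squares(theta, X, y) for theta in candidates]
```

All three coefficient vectors fit the data perfectly, so least squares alone cannot choose between them. Additional structure, such as sparsity, is needed to single one out.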
Sparse Group Lasso (SGL)
• High-dimensional Regression with “Sparse” “Group Sparsity”
[Figure: coefficient blocks for Location 1, ..., Location L, mapped to the response variable]
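The slide's formula was lost in extraction, so the sketch below uses the commonly cited form of the sparse group lasso penalty: an L1 term on all coefficients plus a sum of per-group L2 norms, with one group per location. The weights `lam_l1` and `lam_group` and the coefficient values are assumptions, and some formulations also scale each group norm by the group size.

```python
import math


def sparse_group_lasso_penalty(theta_by_location, lam_l1, lam_group):
    """lam_l1 * ||theta||_1 + lam_group * sum over groups of ||theta_g||_2."""
    l1_term = sum(abs(t) for group in theta_by_location for t in group)
    group_term = sum(
        math.sqrt(sum(t * t for t in group)) for group in theta_by_location
    )
    return lam_l1 * l1_term + lam_group * group_term


# Three locations with two variables each; the second location is unused.
theta = [[3.0, 4.0], [0.0, 0.0], [0.0, -2.0]]
penalty = sparse_group_lasso_penalty(theta, lam_l1=1.0, lam_group=1.0)
```

The group term can switch off a whole location, and the L1 term can switch off individual variables inside a location that stays active. These are the two considerations listed for the land variable regression.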
Example: Land Variable Regression
[Figure: Relevant Variables for Temperature Prediction in Brazil]
Sparse Group Lasso (SGL)
  – Regression with Feature Selection
  – Consistency Guarantees
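Error (RMSE): SGL = Sparse Group Lasso, NC = Network Clusters, OLS = Ordinary Least Squares

Variable         Region     SGL     NC      OLS
Air Temperature  Brazil     0.198   0.534   0.348
Air Temperature  Peru       0.247   0.468   0.387
Air Temperature  West USA   0.270   0.767   0.402
Air Temperature  East USA   0.304   0.815   0.348
Air Temperature  W Europe   0.379   0.936   0.493
Air Temperature  Sahel      0.320   0.685   0.413
Air Temperature  S Africa   0.136   0.726   0.267
Air Temperature  India      0.205   0.649   0.300
Air Temperature  SE Asia    0.298   0.541   0.383
Precipitation    Brazil     0.261   0.509   0.413
Precipitation    Peru       0.312   0.864   0.523
Precipitation    West USA   0.451   0.605   0.549
Precipitation    East USA   0.365   0.686   0.413
Precipitation    W Europe   0.358   0.45    0.551
Precipitation    Sahel      0.427   0.533   0.523
Precipitation    S Africa   0.235   0.697   0.378
Precipitation    India      0.146   0.672   0.264
Precipitation    SE Asia    0.159   0.665   0.312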
Overview
• Predictive Models
• Graphical Models
  – Basics
  – Inference
  – Example: Matrix Completion
  – Example: Drought Detection
  – Example: Text Analysis
• Online Learning
• Exploratory Analysis
Graphical Models: What and Why
• Graphical models
  – Dependencies between (random) variables, avoid IID assumptions
  – Closer to reality, learning/inference is much more difficult
• Basic nomenclature
  – Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  – A directed graph between random variables
  – Example: Bayesian networks, Hidden Markov Models
  – Joint distribution is a product of P(child|parents)
• Undirected Graphs
  – An undirected graph between random variables
  – Example: Markov/Conditional random fields
  – Joint distribution in terms of potential functions
[Figure: example graph over X1, X2, X3, X4, X5]
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X|Parents(X))
[Figure: example graph over X1, X2, X3, X4, X5]
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  – Semantically more accurate, fewer parameters
[Figure: network with a latent “Heart disease” variable]
Inference
• Some variables in the Bayes net are observed
  – The evidence/data, e.g., John has not called, Mary has called
• Inference
  – How to compute value/probability of other variables
  – Example: What is the probability of Burglary, i.e., P(b | ¬j, m)
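The burglary query can be answered by brute-force enumeration over the joint distribution, written as a product of P(X|Parents(X)). The conditional probability values below are assumptions taken from the widely used textbook version of the alarm network; the transcript does not include the numbers shown on the slide.

```python
from itertools import product

# Assumed conditional probability tables (probability that the variable is True).
P_B = 0.001                                   # Burglary
P_E = 0.002                                   # Earthquake
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # Alarm given (B, E)
P_J = {True: 0.90, False: 0.05}               # JohnCalls given Alarm
P_M = {True: 0.70, False: 0.01}               # MaryCalls given Alarm


def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability as a product of P(node | parents)."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


def posterior_burglary(j, m):
    """P(Burglary = True | JohnCalls = j, MaryCalls = m) by enumeration."""
    weight = {}
    for b in (True, False):
        weight[b] = sum(joint(b, e, a, j, m)
                        for e, a in product((True, False), repeat=2))
    return weight[True] / (weight[True] + weight[False])


p_b_given_not_j_m = posterior_burglary(j=False, m=True)
```

Enumeration sums over every assignment of the hidden variables (Earthquake and Alarm), so it only scales to small networks. That cost is what motivates message-passing algorithms.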
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  – Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic, correct convergence ‘not guaranteed’
    • Works well in practice
  – Approximate inference
Approximate Inference
• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods
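Gibbs sampling can be sketched on a two-variable binary model, where each step resamples one variable from its conditional given the other. The unnormalized weights, sample counts, and seed below are assumptions chosen so the exact answer is easy to check by hand.

```python
import random

# Unnormalized joint weights over binary (a, b); exact P(a = 1) = (3 + 4) / 10.
WEIGHTS = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}


def gibbs_estimate_p_a1(num_samples=20000, burn_in=1000, seed=0):
    """Estimate P(a = 1) by alternately resampling a | b and b | a."""
    rng = random.Random(seed)
    a, b = 0, 0
    count_a1 = 0
    for step in range(burn_in + num_samples):
        p_a1 = WEIGHTS[(1, b)] / (WEIGHTS[(0, b)] + WEIGHTS[(1, b)])
        a = 1 if rng.random() < p_a1 else 0
        p_b1 = WEIGHTS[(a, 1)] / (WEIGHTS[(a, 0)] + WEIGHTS[(a, 1)])
        b = 1 if rng.random() < p_b1 else 0
        if step >= burn_in:        # discard early samples
            count_a1 += a
    return count_a1 / num_samples


estimate = gibbs_estimate_p_a1()
exact = 0.7
```

The sampler only needs ratios of weights, so the normalizing constant never has to be computed. The estimate approaches the exact marginal as the number of samples grows.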
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• Distributive Law: ab + ac = a(b+c)
• Worked expansions for g1(x1) and g3(x3) (formulas on the slide)
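The role of the distributive law can be checked on a three-variable chain: the marginal of x1 is computed once by summing the full product, and once by pushing the sum over x3 inside. The factors `f_a` and `f_b` are hypothetical local functions, not the ones on the slide.

```python
from itertools import product

VALUES = (0, 1)


def f_a(x1, x2):
    return 1.0 + x1 + 2.0 * x2


def f_b(x2, x3):
    return 2.0 + x2 * x3


def g1_brute_force(x1):
    """Sum the full product over every (x2, x3) pair."""
    return sum(f_a(x1, x2) * f_b(x2, x3)
               for x2, x3 in product(VALUES, repeat=2))


def g1_factored(x1):
    """Use ab + ac = a(b + c): sum out x3 first, then reuse the result."""
    message = {x2: sum(f_b(x2, x3) for x3 in VALUES) for x2 in VALUES}
    return sum(f_a(x1, x2) * message[x2] for x2 in VALUES)


marginal = [g1_factored(x1) for x1 in VALUES]
```

The intermediate `message` does not depend on x1, so it is computed once and reused. On long chains or large trees this reuse turns an exponential sum into a linear number of small sums.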
Example: Computing g1(x1)
Message Passing
• Variable node: compute product of descendants
• Factor node: compute product of descendants with f, then do not-sum over parent
The Sum-Product Algorithm
• To compute g_i(x_i), form a tree rooted at x_i
• Starting from the leaves, apply the following two rules
  – Product Rule: at a variable node, take the product of descendants
  – Sum-product Rule: at a factor node, take the product of f with descendants, then perform not-sum over the parent node (sum over all variables except the parent)
• To compute all marginals
  – Can be done one at a time: repeated computations, not efficient
  – Simultaneous message passing following the sum-product algorithm
  – Examples: Belief Propagation, Forward-Backward algorithm, etc.
Example: Computing g3(x3)
Hidden Markov Models (HMMs)
• Latent variables: z_0, z_1, …, z_{t-1}, z_t, z_{t+1}, …, z_T
• Observed variables: x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T
• Inference Problems
  1. Compute p(x_{1:T})
  2. Compute p(z_t | x_{1:T})
  3. Find max over z_{1:T} of p(z_{1:T} | x_{1:T})
• Similar problems for chain-structured Conditional Random Fields (CRFs)
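The first inference problem, computing p(x_{1:T}), can be sketched with the forward recursion and checked against brute-force summation over all latent paths. The two-state parameters are hypothetical, and this sketch starts the chain at the first observed step with an initial distribution rather than a separate z_0.

```python
from itertools import product

STATES = (0, 1)
INITIAL = [0.6, 0.4]                     # p(z_1)
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]    # p(z_t | z_{t-1})
EMISSION = [[0.9, 0.1], [0.2, 0.8]]      # p(x_t | z_t)


def likelihood_forward(observations):
    """Forward algorithm: cost grows linearly with the sequence length."""
    alpha = [INITIAL[z] * EMISSION[z][observations[0]] for z in STATES]
    for x in observations[1:]:
        alpha = [
            sum(alpha[prev] * TRANSITION[prev][z] for prev in STATES)
            * EMISSION[z][x]
            for z in STATES
        ]
    return sum(alpha)


def likelihood_brute_force(observations):
    """Sum the joint probability over every latent path (exponential cost)."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        p = INITIAL[path[0]] * EMISSION[path[0]][observations[0]]
        for t in range(1, len(path)):
            p *= TRANSITION[path[t - 1]][path[t]] * EMISSION[path[t]][observations[t]]
        total += p
    return total


observations = [0, 1, 1, 0]
forward_value = likelihood_forward(observations)
```

The forward pass is the sum-product algorithm specialized to a chain. Adding a matching backward pass gives the posteriors p(z_t | x_{1:T}) of the second problem.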
Distributive Law on Semi-Rings
• Idea can be applied to any commutative semi-ring
• Semi-ring 101
  – Two operations (+, ×): Associative, Commutative, Identity
  – Distributive law: a×b + a×c = a×(b+c)
• Applications
  – Belief Propagation in Bayes nets
  – MAP inference in HMMs
  – Max-product algorithm
  – Alternative to Viterbi Decoding
  – Kalman Filtering
  – Error Correcting Codes
  – Turbo Codes
  – …
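One way to see the semi-ring idea is to write a chain recursion once and pass in the "plus" operation. With ordinary addition it computes a likelihood, and with `max` it computes the probability of the most likely latent path, as in max-product. The two-state parameters and the function names are assumptions for illustration.

```python
import operator
from functools import reduce
from itertools import product

INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]


def path_probability(path, observations):
    p = INITIAL[path[0]] * EMISSION[path[0]][observations[0]]
    for t in range(1, len(path)):
        p *= TRANSITION[path[t - 1]][path[t]] * EMISSION[path[t]][observations[t]]
    return p


def chain_value(plus, observations):
    """Message passing on a chain; `plus` is the semi-ring addition."""
    states = range(len(INITIAL))
    msg = [INITIAL[z] * EMISSION[z][observations[0]] for z in states]
    for x in observations[1:]:
        msg = [
            reduce(plus, (msg[p] * TRANSITION[p][z] for p in states))
            * EMISSION[z][x]
            for z in states
        ]
    return reduce(plus, msg)


def brute_force(plus, observations):
    """Apply `plus` across all latent paths directly, for comparison."""
    paths = product(range(len(INITIAL)), repeat=len(observations))
    return reduce(plus, (path_probability(p, observations) for p in paths))


observations = [0, 1, 1, 0]
likelihood = chain_value(operator.add, observations)   # (+, x) semi-ring
best_path_prob = chain_value(max, observations)        # (max, x) semi-ring
```

The swap is valid because multiplication by a non-negative number distributes over `max` just as it does over addition.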
Message Passing in General Graphs
• Tree structured graphs
  – Message passing is guaranteed to give correct solutions
  – Examples: HMMs, Kalman Filters
• General Graphs
  – Active research topic
    • Progress has been made over the past 10 years
  – Loopy Message passing
    • May not converge
    • May converge to a ‘local minimum’ of the ‘Bethe variational free energy’
  – New approaches to convergent and correct message passing
• Applications
  – TrueSkill Ranking System for Xbox Live
  – Turbo Codes: 3G, 4G phones, satellite comm, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  – State Space, State Transitions
  – Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models
    • AR, MA, ARMA models
    • ACF and Partial ACF
    • ARIMA and Seasonal ARIMA
    • Estimation, Forecasting
  – Spectral Analysis
    • Spectral Density
    • Fourier Transforms, Wavelet Transforms
    • Filtering
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: Prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
[Figure: Gaussian Process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Drought Detection
• Significant droughts
  – Persistent over space and time
  – Catastrophic consequences
• Examples
  – Late 1960s Sahel drought
  – 1930s North American Dust Bowl
• Dataset: Climate Research Unit (CRU)
  – http://data.giss.nasa.gov/precip_cru
  – Monthly precipitation over land
  – Duration: 1901 to 2006
  – Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  – Replicate grid over time
  – Each node can be 0 (normal) or 1 (dry)
• Toy example
  – m = 3, n = 4, N = 12, T = 5
  – Total States: 2^60 = 1.1529 × 10^18
• CRU data
  – m = 720, n = 360, N = 67420, T = 106
  – Total States: 2^7146520 > 10^2151316
• MAP Inference
  – Find the most likely “state” of the system
  – Integer programming with LARGE number of states
  – Postprocess MAP state to identify significant droughts
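The state counts follow from having one binary variable per grid node per time step, which gives 2^(N·T) joint labelings. A short check of the arithmetic, with illustrative function and variable names, is below. Computing the exponent directly gives about 10^2151316 for the CRU grid.

```python
import math


def log10_num_states(num_locations, num_time_steps):
    """log10 of 2 ** (N * T), the number of joint 0/1 labelings."""
    return num_locations * num_time_steps * math.log10(2)


# Toy example: N = 12 locations, T = 5 time steps.
toy_nodes = 12 * 5
toy_states = 2 ** toy_nodes          # exact integer arithmetic

# CRU data: N = 67420 land locations, T = 106 years.
cru_nodes = 67420 * 106
cru_log10_states = log10_num_states(67420, 106)
```

Even the toy grid has more than 10^18 states, so exhaustive search is out of the question. This is why MAP inference is posed as a structured optimization problem.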
Drought Regions from MRFs
Success Stories
Parent of a baby
Medical Doctor Intro to Machine Learning 2
Success Stories
Bioinformatics
Computer vision
Medical imaging
Social network analysis
From UCLA Center on Aging
Intro to Machine Learning 3
Types of Data
bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial
Spatiotemporal
bull Data Dependencies ndash Independent and Identically
Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies
bull Other Considerations ndash Observable vs Latent variables
Average wind direction
From NOAA Earth System Research Laboratory
Average annual temperature
Intro to Machine Learning 4
bull Supervised ndash Given learn ndash Examples Classification Regression
bull Unsupervised
ndash Given find structure in X ndash Examples Clustering Embedding
bull Semi-supervised ndash Given and Learn
Types of Learning
Intro to Machine Learning 5
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Classification
Regression
Regularization
Example Land Variable Regression
Nonlinear Models Ensembles
6
Classification Linear Models
bull Linear Classification Rule ndash If then ndash If then
bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps
bull Two Aspects Optimization Statistical guarantees
Intro to Machine Learning
+1
-1
7
Regression Linear Models
bull Linear Regression
ndash Model
bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise
bull Two Aspects Optimization Statistical Guarantees
Intro to Machine Learning 8
Hierarchical Linear Models
bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value
etc) is fed into next node
bull Family of modelsmethods ndash ClassificationRegression trees
Multi-layer Perceptrons Deep Belief Networks
bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees
Intro to Machine Learning 9
bull Controlling complexity of predictor is key
ndash Low loss with low complexity predictor guarantees good generalization
bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models
Regularization Generalization
Intro to Machine Learning
x
y
Loss term Regularization term
10
bull The Kernel Trick
ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces
bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels
Non-linear Methods
Intro to Machine Learning 11
bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging
bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features
Ensembles
Intro to Machine Learning 12
High Dimensional Modeling
bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees
bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression
Intro to Machine Learning 13
Land Variable Regression
bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)
bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative
Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations
bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations
bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant
Intro to Machine Learning 14
Land Regions for Prediction Temp Precip
Intro to Machine Learning 15
Ordinary Least Squares
bull Naive method Use ordinary least squares (OLS)
bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model
θ
Location 1 Location L
Response Variable
16
Sparse Group Lasso (SGL)
bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo
Location 1 Location L
Response Variable 17
Example Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees
18
Error (RMSE) SGL NC OLS
Variable Region Sparse Group Lasso Network Clusters Ordinary Least
Squares A
ir Te
mpe
ratu
re
Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383
Prec
ipita
tion
Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Basics
Inference
Example Matrix Completion
Example Drought Detection
Example Text Analysis
20
Intro to Machine Learning
Graphical Models What and Why bull Graphical models
ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult
bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency
bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)
bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions
X1
X3
X4 X5
X2
21
Intro to Machine Learning
Example Burglary Network
22
Intro to Machine Learning
Bayesian Networks
bull Joint distribution in terms of P(X|Parents(X))
X1
X3
X4 X5
X2
23
Intro to Machine Learning
Example II Rain Network
24
Intro to Machine Learning
Latent Variable Models
bull Bayesian network with hidden variables ndash Semantically more accurate less parameters
Heart disease
25
Intro to Machine Learning
Inference
bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called
bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)
26
Intro to Machine Learning
Inference Algorithms bull Graphs without loops Tree-structured Graphs
ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases
bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)
bull Graphs with loops ndash Junction tree algorithms
bull Convert into a graph without loops bull May lead to exponentially large graph
ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice
ndash Approximate inference
27
Intro to Machine Learning
Approximate Inference bull Variational Inference
ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions
bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation
bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)
bull Powerful family of methods ndash Gibbs sampling
bull Useful special case of MCMC methods
28
The Inference Problem
Intro to Machine Learning 29
Bayes Nets to Factor Graphs
Intro to Machine Learning 30
Factor Graphs Product of Local Functions
Intro to Machine Learning 31
Marginalize Product of Functions (MPF)
bull Marginalize (or maximize) product of functions
bull For g1(x1) we have bull For g3(x3) we have
Intro to Machine Learning
Distributive Law ab + ac = a(b+c)
32
Example Computing g1(x1)
Intro to Machine Learning 33
Compute product of descendants with f Then do not-sum over parent
Message Passing
Intro to Machine Learning
Compute product of descendants
34
The Sum-Product Algorithm
bull To compute gi(xi) form a tree rooted at xi
bull Starting from the leaves apply the following two rules ndash Product Rule
At a variable node take the product of descendants ndash Sum-product Rule
At a factor node take the product of f with descendants then perform not-sum over the parent node
bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc
Intro to Machine Learning 35
Example Computing g3(x3)
Intro to Machine Learning 36
Hidden Markov Models (HMMs)
Intro to Machine Learning
Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT
Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T
p(z1T|x1T)
Similar problems for chain-structured Conditional Random Fields (CRFs)
37
Distributive Law on Semi-Rings
bull Idea can be applied to any commutative semi-ring bull Semi-ring 101
ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)
Intro to Machine Learning
bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip
38
Message Passing in General Graphs
bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters
bull General Graphs ndash Active research topic
bull Progress has been made over the past 10 years ndash Loopy Message passing
bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo
ndash New approaches to convergent and correct message passing
bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter
Intro to Machine Learning 39
Temporal Models
bull State Space Models
bull Time Series Analysis
Intro to Machine Learning
State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems
Time Domain Models
ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting
Spectral Analysis
Spectral Density Fourier Transforms Wavelet Transforms Filtering
40
Spatial Models
bull Models for Continuous Data ndash Gaussian Processes Prior over functions GP(0K) ndash Kriging Co-Kriging Linear Models of Coregionalization (LMCs)
bull Models for Discrete Data ndash Markov Random Fields (MRFs)
bull Structure Learning ndash Finding statistical dependencies
Intro to Machine Learning
From H Wallach Introduction to Gaussian process regression 2005
Gau
ssia
n P
roce
ss
41
Drought Detection
bull Significant droughts ndash Persistent over space and time ndash Catastrophic consequences
bull Examples ndash Late 1960s Sahel drought ndash 1930s North American Dust Bowl
bull Dataset Climate Research Unit
ndash httpdatagissnasagovprecip_cru ndash Monthly precipitation over land ndash Duration 1901 to 2006 ndash Resolution 05 latitude times 05 longitude
Intro to Machine Learning 42
Detection with Markov Random Field (MRF)
bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)
bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018
bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200
bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43
Drought Regions from MRFs
Intro to Machine Learning 44
Success Stories
Bioinformatics
Computer vision
Medical imaging
Social network analysis
From UCLA Center on Aging
Intro to Machine Learning 3
Types of Data
bull Data Types ndash Discrete Ordinal Continuous ndash Univariate Multi-variate ndash Structured Temporal Spatial
Spatiotemporal
bull Data Dependencies ndash Independent and Identically
Distributed (IID) ndash Graphical Models ndash Linear vs Non-linear dependencies
bull Other Considerations ndash Observable vs Latent variables
Average wind direction
From NOAA Earth System Research Laboratory
Average annual temperature
Intro to Machine Learning 4
bull Supervised ndash Given learn ndash Examples Classification Regression
bull Unsupervised
ndash Given find structure in X ndash Examples Clustering Embedding
bull Semi-supervised ndash Given and Learn
Types of Learning
Intro to Machine Learning 5
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Classification
Regression
Regularization
Example Land Variable Regression
Nonlinear Models Ensembles
6
Classification Linear Models
bull Linear Classification Rule ndash If then ndash If then
bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps
bull Two Aspects Optimization Statistical guarantees
Intro to Machine Learning
+1
-1
7
Regression Linear Models
bull Linear Regression
ndash Model
bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise
bull Two Aspects Optimization Statistical Guarantees
Intro to Machine Learning 8
Hierarchical Linear Models
bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value
etc) is fed into next node
bull Family of modelsmethods ndash ClassificationRegression trees
Multi-layer Perceptrons Deep Belief Networks
bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees
Intro to Machine Learning 9
bull Controlling complexity of predictor is key
ndash Low loss with low complexity predictor guarantees good generalization
bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models
Regularization Generalization
Intro to Machine Learning
x
y
Loss term Regularization term
10
bull The Kernel Trick
ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces
bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels
Non-linear Methods
Intro to Machine Learning 11
bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging
bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features
Ensembles
Intro to Machine Learning 12
High Dimensional Modeling
bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees
bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression
Intro to Machine Learning 13
Land Variable Regression
bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)
bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative
Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations
bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations
bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant
Intro to Machine Learning 14
Land Regions for Prediction Temp Precip
Intro to Machine Learning 15
Ordinary Least Squares
bull Naive method Use ordinary least squares (OLS)
bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model
θ
Location 1 Location L
Response Variable
16
Sparse Group Lasso (SGL)
bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo
Location 1 Location L
Response Variable 17
Example Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees
18
Error (RMSE) SGL NC OLS
Variable Region Sparse Group Lasso Network Clusters Ordinary Least
Squares A
ir Te
mpe
ratu
re
Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383
Prec
ipita
tion
Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Basics
Inference
Example Matrix Completion
Example Drought Detection
Example Text Analysis
20
Intro to Machine Learning
Graphical Models What and Why bull Graphical models
ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult
bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency
bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)
bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions
X1
X3
X4 X5
X2
21
Intro to Machine Learning
Example Burglary Network
22
Intro to Machine Learning
Bayesian Networks
bull Joint distribution in terms of P(X|Parents(X))
X1
X3
X4 X5
X2
23
Intro to Machine Learning
Example II Rain Network
24
Intro to Machine Learning
Latent Variable Models
bull Bayesian network with hidden variables ndash Semantically more accurate less parameters
Heart disease
25
Intro to Machine Learning
Inference
bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called
bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)
26

Inference Algorithms
• Graphs without loops: tree-structured graphs
  - Efficient exact inference algorithms are possible
  - Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  - Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  - Sum-product/message passing algorithm, 'disregarding loops'
    • Active research topic; correct convergence 'not guaranteed'
    • Works well in practice
  - Approximate inference

Approximate Inference
• Variational Inference
  - Deterministic approximation
  - Approximate complex true distribution over latent variables
  - Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  - Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  - Simple sampling approaches
  - Markov Chain Monte Carlo (MCMC) methods
    • Powerful family of methods
  - Gibbs sampling
    • Useful special case of MCMC methods
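
A minimal sketch of Gibbs sampling, assuming a toy target that is not from the slides: a zero-mean, unit-variance bivariate normal with correlation rho. The sampler never draws from the joint directly; it alternates draws from the two exact conditionals, each of which is a univariate normal. The function names and the choice of rho = 0.8 are illustrative.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal with
    correlation rho. Each step draws one coordinate from its exact
    conditional given the other: x | y ~ N(rho * y, 1 - rho**2)."""
    rng = random.Random(seed)
    cond_std = (1.0 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_std)   # sample x given the current y
        y = rng.gauss(rho * x, cond_std)   # sample y given the new x
        if step >= burn_in:                # discard the initial transient
            samples.append((x, y))
    return samples


def correlation(samples):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / (vx * vy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print("estimated correlation:", round(correlation(draws), 3))
```

Successive draws are correlated, so the effective sample size is smaller than the number of draws; the estimate should still land close to the target correlation.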

The Inference Problem

Bayes Nets to Factor Graphs

Factor Graphs: Product of Local Functions

Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g1(x1), we have: [equation not preserved in transcript]
• For g3(x3), we have: [equation not preserved in transcript]
• Distributive law: ab + ac = a(b + c)
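
As an illustration of the distributive law at work, assume a five-variable global function with the factorization in the first line below. This factorization (the local functions f_A through f_E and their arguments) is an assumption chosen for illustration; the transcript does not preserve the function used on the slide. Pushing each sum inside the product as far as it will go gives the two marginals.

```latex
g(x_1,\dots,x_5) = f_A(x_1)\, f_B(x_2)\, f_C(x_1,x_2,x_3)\, f_D(x_3,x_4)\, f_E(x_3,x_5)

g_1(x_1) = \sum_{x_2,x_3,x_4,x_5} g(x_1,\dots,x_5)
         = f_A(x_1) \sum_{x_2} f_B(x_2) \sum_{x_3} f_C(x_1,x_2,x_3)
           \Bigl(\sum_{x_4} f_D(x_3,x_4)\Bigr) \Bigl(\sum_{x_5} f_E(x_3,x_5)\Bigr)

g_3(x_3) = \sum_{x_1,x_2,x_4,x_5} g(x_1,\dots,x_5)
         = \Bigl(\sum_{x_1}\sum_{x_2} f_A(x_1)\, f_B(x_2)\, f_C(x_1,x_2,x_3)\Bigr)
           \Bigl(\sum_{x_4} f_D(x_3,x_4)\Bigr) \Bigl(\sum_{x_5} f_E(x_3,x_5)\Bigr)
```

The factored forms replace one sum over all joint configurations of the other four variables with a few small nested sums.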

Example: Computing g1(x1)

Message Passing
• Factor node: compute product of descendants with f, then do not-sum over parent
• Variable node: compute product of descendants

The Sum-Product Algorithm
• To compute gi(xi), form a tree rooted at xi
• Starting from the leaves, apply the following two rules
  - Product rule: at a variable node, take the product of descendants
  - Sum-product rule: at a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  - Can be done one at a time; repeated computations, not efficient
  - Simultaneous message passing following the sum-product algorithm
  - Examples: Belief Propagation, Forward-Backward algorithm, etc.
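
The two rules can be sketched as a pair of mutually recursive functions on a small tree-structured factor graph. Everything concrete here is hypothetical: the five binary variables, the factor scopes, and the local function values are made up for illustration. The sketch computes one marginal at a time by re-rooting the tree, which is the simple but repetitive variant mentioned above, and compares it with a brute-force sum.

```python
from itertools import product

K = 2  # every variable takes values in {0, ..., K-1} (assumption)

# Hypothetical tree-structured factor graph: (scope, local function) pairs.
FACTORS = [
    (("x1",), lambda a: [0.6, 0.4][a]),
    (("x2",), lambda b: [0.3, 0.7][b]),
    (("x1", "x2", "x3"), lambda a, b, c: 1.0 + a + 2 * b * c),
    (("x3", "x4"), lambda c, d: [[0.9, 0.1], [0.2, 0.8]][c][d]),
    (("x3", "x5"), lambda c, e: 1.0 if c == e else 0.5),
]
VARIABLES = ["x1", "x2", "x3", "x4", "x5"]


def variable_message(var, parent_factor):
    """Product rule: multiply the messages from all child factors."""
    msg = [1.0] * K
    for idx, (scope, _) in enumerate(FACTORS):
        if var in scope and idx != parent_factor:
            child = factor_message(idx, var)
            msg = [m * c for m, c in zip(msg, child)]
    return msg


def factor_message(idx, parent_var):
    """Sum-product rule: multiply f with the child messages, then sum over
    every variable except the parent (the 'not-sum')."""
    scope, f = FACTORS[idx]
    children = [v for v in scope if v != parent_var]
    child_msgs = {v: variable_message(v, idx) for v in children}
    msg = []
    for parent_value in range(K):
        total = 0.0
        for values in product(range(K), repeat=len(children)):
            assignment = dict(zip(children, values))
            assignment[parent_var] = parent_value
            term = f(*(assignment[v] for v in scope))
            for v in children:
                term *= child_msgs[v][assignment[v]]
            total += term
        msg.append(total)
    return msg


def marginal(var):
    """Unnormalized marginal g_i(x_i): root the tree at var."""
    return variable_message(var, parent_factor=None)


def brute_force_marginal(var):
    """Reference answer: sum the full product over all other variables."""
    out = [0.0] * K
    for values in product(range(K), repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        term = 1.0
        for scope, f in FACTORS:
            term *= f(*(assignment[v] for v in scope))
        out[assignment[var]] += term
    return out


for v in VARIABLES:
    print(v, marginal(v), brute_force_marginal(v))
```

The recursion terminates only because the graph has no loops; on a graph with cycles the same two functions would call each other forever, which is why loopy graphs need the separate treatment discussed later.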

Example: Computing g3(x3)

Hidden Markov Models (HMMs)
• Latent variables: z_0, z_1, ..., z_{t-1}, z_t, z_{t+1}, ..., z_T
• Observed variables: x_1, ..., x_{t-1}, x_t, x_{t+1}, ..., x_T
• Inference problems
  1. Compute p(x_{1:T})
  2. Compute p(z_t | x_{1:T})
  3. Find max over z_{1:T} of p(z_{1:T} | x_{1:T})
• Similar problems for chain-structured Conditional Random Fields (CRFs)
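
The first two inference problems can be sketched with the forward-backward recursions. The model below is hypothetical: a 2-state, 2-symbol HMM with made-up probabilities, and the initial latent variable z_0 is folded into an initial distribution over z_1 for brevity. A brute-force sum over all state sequences is included only as a cross-check.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM (all numbers are illustrative).
INIT = [0.6, 0.4]                   # p(z_1 = i)
TRANS = [[0.7, 0.3], [0.2, 0.8]]    # TRANS[i][j] = p(z_{t+1} = j | z_t = i)
EMIT = [[0.9, 0.1], [0.3, 0.7]]     # EMIT[i][k] = p(x_t = k | z_t = i)


def forward(obs):
    """alpha[t][i] = p(x_1..x_t, z_t = i)."""
    n = len(INIT)
    alpha = [[INIT[i] * EMIT[i][obs[0]] for i in range(n)]]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([EMIT[j][x] * sum(prev[i] * TRANS[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(obs):
    """beta[t][i] = p(x_{t+1}..x_T | z_t = i)."""
    n = len(INIT)
    beta = [[1.0] * n]
    for x in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(TRANS[i][j] * EMIT[j][x] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(obs):
    """Problem 1: p(x_{1:T})."""
    return sum(forward(obs)[-1])


def posteriors(obs):
    """Problem 2: p(z_t | x_{1:T}) for every t."""
    alpha, beta = forward(obs), backward(obs)
    evidence = sum(alpha[-1])
    return [[a * b / evidence for a, b in zip(alpha_t, beta_t)]
            for alpha_t, beta_t in zip(alpha, beta)]


def brute_force_likelihood(obs):
    """Cross-check: sum the joint over every latent state sequence."""
    n, total = len(INIT), 0.0
    for zs in product(range(n), repeat=len(obs)):
        p = INIT[zs[0]] * EMIT[zs[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= TRANS[zs[t - 1]][zs[t]] * EMIT[zs[t]][obs[t]]
        total += p
    return total


obs = [0, 1, 1, 0]
print(likelihood(obs), brute_force_likelihood(obs))
print(posteriors(obs))
```

The recursions cost time linear in T, while the brute-force sum grows exponentially in T. For long sequences a practical implementation would rescale alpha and beta or work in log space to avoid underflow.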

Distributive Law on Semi-Rings
• Idea can be applied to any commutative semi-ring
• Semi-ring 101
  - Two operations (+, ×): associative, commutative, with identity
  - Distributive law: a×b + a×c = a×(b + c)
• Examples
  - Belief Propagation in Bayes nets
  - MAP inference in HMMs
  - Max-product algorithm
  - Alternative to Viterbi Decoding
  - Kalman Filtering
  - Error Correcting Codes
  - Turbo Codes
  - …
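
A small sketch of why the semi-ring view matters: the same elimination code gives a marginal when the "add" operation is ordinary addition and a max-product (MAP-style) score when it is max. The chain and its factor tables are hypothetical. The factors are kept nonnegative because multiplication distributes over max only for nonnegative values.

```python
from functools import reduce
from itertools import product
import operator

# Hypothetical chain with pairwise nonnegative factors f1(x1, x2), f2(x2, x3).
F1 = [[0.5, 1.5], [2.0, 0.25]]
F2 = [[1.0, 3.0], [0.5, 0.75]]


def eliminate(add, x1):
    """Combine F1[x1][x2] * F2[x2][x3] over x2 and x3 by pushing the 'add'
    operation inside the product, as the distributive law allows."""
    inner = [reduce(add, F2[x2]) for x2 in range(2)]        # add over x3
    return reduce(add, [F1[x1][x2] * inner[x2] for x2 in range(2)])


def brute_force(add, x1):
    """Reference: combine every (x2, x3) term without factoring."""
    return reduce(add, [F1[x1][x2] * F2[x2][x3]
                        for x2, x3 in product(range(2), repeat=2)])


for name, add in [("sum-product", operator.add), ("max-product", max)]:
    for x1 in range(2):
        print(name, x1, eliminate(add, x1), brute_force(add, x1))
```

Swapping the operation pair is the only change needed to move between the sum-product and max-product algorithms on a tree.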

Message Passing in General Graphs
• Tree-structured graphs
  - Message passing is guaranteed to give correct solutions
  - Examples: HMMs, Kalman Filters
• General graphs
  - Active research topic
    • Progress has been made over the past 10 years
  - Loopy message passing
    • May not converge
    • May converge to a 'local minimum' of the 'Bethe variational free energy'
  - New approaches to convergent and correct message passing
• Applications
  - TrueSkill ranking system for Xbox Live
  - Turbo codes: 3G/4G phones, satellite communication, WiMAX, Mars orbiter

Temporal Models
• State Space Models
  - State space, state transitions
  - Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  - Time domain models
    • AR, MA, ARMA models
    • ACF and partial ACF
    • ARIMA and seasonal ARIMA
    • Estimation, forecasting
  - Spectral analysis
    • Spectral density, Fourier transforms, wavelet transforms, filtering

Spatial Models
• Models for continuous data
  - Gaussian Processes: prior over functions, GP(0, K)
  - Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for discrete data
  - Markov Random Fields (MRFs)
• Structure learning
  - Finding statistical dependencies
[Figure: Gaussian process. From H. Wallach, Introduction to Gaussian process regression, 2005]

Drought Detection
• Significant droughts
  - Persistent over space and time
  - Catastrophic consequences
• Examples
  - Late 1960s Sahel drought
  - 1930s North American Dust Bowl
• Dataset: Climate Research Unit (CRU)
  - http://data.giss.nasa.gov/precip_cru
  - Monthly precipitation over land
  - Duration: 1901 to 2006
  - Resolution: 0.5° latitude × 0.5° longitude

Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest-neighbor grid
  - Replicate grid over time
  - Each node can be 0 (normal) or 1 (dry)
• Toy example
  - m = 3, n = 4, N = 12, T = 5
  - Total states: 2^60 ≈ 1.1529 × 10^18
• CRU data
  - m = 720, n = 360, N = 67420, T = 106
  - Total states: 2^7146520 > 10^2151316
• MAP inference
  - Find the most likely "state" of the system
  - Integer programming with a LARGE number of states
  - Postprocess MAP state to identify significant droughts
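
A quick arithmetic check of these state counts, assuming one binary node per grid cell per time step, so that the exponent is N × T. The helper name is illustrative. The toy count is small enough to print exactly; the CRU-sized count has millions of digits, so only its base-10 exponent is computed.

```python
import math


def num_states(n_cells, n_steps):
    """Number of joint configurations of a binary MRF with one node per
    grid cell per time step."""
    return 2 ** (n_cells * n_steps)


# Toy example: 3 x 4 grid replicated over 5 time steps, i.e. 60 binary nodes.
toy = num_states(n_cells=3 * 4, n_steps=5)
print(toy)

# CRU-sized problem: compute the base-10 exponent rather than the number.
cru_nodes = 67420 * 106
cru_log10 = cru_nodes * math.log10(2)
print(cru_nodes, math.floor(cru_log10))
```

Counts of this size rule out any enumeration of states, which is why MAP inference has to exploit the grid structure of the MRF.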
Drought Regions from MRFs
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Classification
Regression
Regularization
Example Land Variable Regression
Nonlinear Models Ensembles
6
Classification Linear Models
bull Linear Classification Rule ndash If then ndash If then
bull Family of methods ndash Support Vector Machines Perceptrons Decision Stumps
bull Two Aspects Optimization Statistical guarantees
Intro to Machine Learning
+1
-1
7
Regression Linear Models
bull Linear Regression
ndash Model
bull Family of methodsmodels ndash Least squares Ridge regression Sparse regression ndash Generalized linear models Exponential family noise
bull Two Aspects Optimization Statistical Guarantees
Intro to Machine Learning 8
Hierarchical Linear Models
bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value
etc) is fed into next node
bull Family of modelsmethods ndash ClassificationRegression trees
Multi-layer Perceptrons Deep Belief Networks
bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees
Intro to Machine Learning 9
bull Controlling complexity of predictor is key
ndash Low loss with low complexity predictor guarantees good generalization
bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models
Regularization Generalization
Intro to Machine Learning
x
y
Loss term Regularization term
10
bull The Kernel Trick
ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces
bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels
Non-linear Methods
Intro to Machine Learning 11
bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging
bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features
Ensembles
Intro to Machine Learning 12
High Dimensional Modeling
bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees
bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression
Intro to Machine Learning 13
Land Variable Regression
• Land-Sea variable interactions
  - How do sea variables affect land variables?
  - Are proximal locations important?
  - Are there long range spatial dependencies (tele-connections)?
• NCEP/NCAR Reanalysis 1: Monthly means for 1948-2010
  - Covariates: Temperature, Sea Level Pressure, Precipitation, Relative Humidity, Horizontal Wind Speed, Vertical Wind Speed
  - Response: Temperature, Precipitation at 9 locations
• High-dimensional Regression
  - Dimension p = Lm: L locations, m variables/location
  - Two considerations
    • Not all locations are relevant
    • Even if a location is relevant, not all variables are relevant
Land Regions for Prediction: Temp, Precip
Ordinary Least Squares
• Naive method: Use ordinary least squares (OLS)
• For n < mL, the OLS estimate is non-unique
• In general, y depends on all mL covariates: complex model
[Figure: coefficient vector θ grouped by location, Location 1 to Location L, for the response variable]
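To see why the OLS estimate is non-unique when there are fewer samples than covariates, consider the small made-up system below (n = 2, p = 3). Several different coefficient vectors fit the data exactly, so least squares alone cannot pick one. The matrix `X`, the response `y`, and the three candidate vectors are hypothetical.

```python
def predict(X, theta):
    """Linear predictions X @ theta using plain lists."""
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]

def squared_error(X, y, theta):
    """Sum of squared residuals, the quantity OLS minimizes."""
    return sum((p - t) ** 2 for p, t in zip(predict(X, theta), y))

# n = 2 samples, p = 3 covariates (n < p). Values are made up.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [3.0, 5.0]

# X maps the direction (1, 1, -1) to zero, so adding any multiple of it
# to a solution gives another solution with the same predictions.
theta_a = [3.0, 5.0, 0.0]
theta_b = [0.0, 2.0, 3.0]   # theta_a - 3 * (1, 1, -1)
theta_c = [1.0, 3.0, 2.0]   # theta_a - 2 * (1, 1, -1)

for theta in (theta_a, theta_b, theta_c):
    print(theta, squared_error(X, y, theta))
```

Regularization resolves the ambiguity by preferring, among all such fits, the one with the smallest penalty.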
Sparse Group Lasso (SGL)
• High-dimensional Regression with “Sparse” “Group Sparsity”
[Figure: coefficient vector grouped by location, Location 1 to Location L, for the response variable]
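A sketch of the penalty usually associated with the sparse group lasso, assuming the common form λ1‖θ‖₁ + λ2 Σ_g ‖θ_g‖₂ with one group per location. The text gives no formula, so this form, the weights `lam1` and `lam2`, and the toy coefficient vectors are assumptions; some variants also scale each group term by the square root of the group size.

```python
import math

def sgl_penalty(theta, groups, lam1, lam2):
    """lam1 * ||theta||_1 + lam2 * sum over groups of ||theta_g||_2."""
    l1 = sum(abs(t) for t in theta)
    group_l2 = sum(math.sqrt(sum(theta[j] ** 2 for j in g)) for g in groups)
    return lam1 * l1 + lam2 * group_l2

# L = 3 locations, m = 2 variables per location, so p = L * m = 6.
groups = [[0, 1], [2, 3], [4, 5]]

dense        = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # every location, every variable
group_sparse = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # only location 1 is active
fully_sparse = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # one variable at one location

for name, theta in (("dense", dense), ("group", group_sparse), ("sparse", fully_sparse)):
    print(name, sgl_penalty(theta, groups, lam1=1.0, lam2=1.0))
```

The L1 term encourages zeros in individual variables, while the group term encourages whole locations to drop out, matching the two considerations listed for the land variable regression.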
Example: Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL)
  - Regression with Feature Selection
  - Consistency Guarantees
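Error (RMSE): SGL = Sparse Group Lasso, NC = Network Clusters, OLS = Ordinary Least Squares
Variable         Region     SGL     NC      OLS
Air Temperature  Brazil     0.198   0.534   0.348
Air Temperature  Peru       0.247   0.468   0.387
Air Temperature  West USA   0.270   0.767   0.402
Air Temperature  East USA   0.304   0.815   0.348
Air Temperature  W Europe   0.379   0.936   0.493
Air Temperature  Sahel      0.320   0.685   0.413
Air Temperature  S Africa   0.136   0.726   0.267
Air Temperature  India      0.205   0.649   0.300
Air Temperature  SE Asia    0.298   0.541   0.383
Precipitation    Brazil     0.261   0.509   0.413
Precipitation    Peru       0.312   0.864   0.523
Precipitation    West USA   0.451   0.605   0.549
Precipitation    East USA   0.365   0.686   0.413
Precipitation    W Europe   0.358   0.45    0.551
Precipitation    Sahel      0.427   0.533   0.523
Precipitation    S Africa   0.235   0.697   0.378
Precipitation    India      0.146   0.672   0.264
Precipitation    SE Asia    0.159   0.665   0.312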
Overview
• Predictive Models
• Graphical Models
  - Basics
  - Inference
  - Example: Matrix Completion
  - Example: Drought Detection
  - Example: Text Analysis
• Online Learning
• Exploratory Analysis
Graphical Models: What and Why
• Graphical models
  - Dependencies between (random) variables, avoid IID assumptions
  - Closer to reality; learning/inference is much more difficult
• Basic nomenclature
  - Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  - A directed graph between random variables
  - Example: Bayesian networks, Hidden Markov Models
  - Joint distribution is a product of P(child | parents)
• Undirected Graphs
  - An undirected graph between random variables
  - Example: Markov/Conditional random fields
  - Joint distribution in terms of potential functions
[Figure: example graph over random variables X1, X2, X3, X4, X5]
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X | Parents(X))
[Figure: example Bayesian network over X1, X2, X3, X4, X5]
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  - Semantically more accurate, fewer parameters
[Figure: heart disease example]
Inference
• Some variables in the Bayes net are observed
  - the evidence/data, e.g., John has not called, Mary has called
• Inference
  - How to compute value/probability of other variables
  - Example: What is the probability of Burglary, i.e., P(b | ¬j, m)
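The burglary query can be answered by brute-force enumeration over the hidden variables, as sketched below. The conditional probability values are the commonly quoted textbook numbers for this network and are an assumption here, since the transcript does not preserve the tables; the function names are likewise illustrative.

```python
from itertools import product

# Conditional probability tables. These numbers are the commonly quoted
# textbook values for the burglary network; they are an assumption here.
P_B = 0.001                                           # P(burglary)
P_E = 0.002                                           # P(earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(alarm | burglary, earthquake)
P_J = {True: 0.90, False: 0.05}                       # P(john calls | alarm)
P_M = {True: 0.70, False: 0.01}                       # P(mary calls | alarm)

def bern(p, value):
    """Probability that a Boolean variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(b, e, a, j, m):
    """Product of P(child | parents) over all five nodes."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def posterior_burglary(j, m):
    """P(b | j, m) by summing the joint over the hidden variables e and a."""
    score = {}
    for b in (True, False):
        score[b] = sum(joint(b, e, a, j, m)
                       for e, a in product((True, False), repeat=2))
    return score[True] / (score[True] + score[False])

# Evidence from the example: John has not called, Mary has called.
print(posterior_burglary(j=False, m=True))
```

Enumeration costs time exponential in the number of hidden variables, which is what motivates the more efficient inference algorithms.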
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  - Efficient exact inference algorithms are possible
  - Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  - Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  - Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic, correct convergence ‘not guaranteed’
    • Works well in practice
  - Approximate inference
Approximate Inference
• Variational Inference
  - Deterministic approximation
  - Approximate complex true distribution over latent variables
  - Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  - Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  - Simple sampling approaches
  - Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  - Gibbs sampling
    • Useful special case of MCMC methods
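A minimal Gibbs sampling sketch on a two-variable binary model, where the exact marginal is easy to compute for comparison. The joint weights, the burn-in length, and the function names are assumptions chosen for illustration.

```python
import random

# Unnormalized joint over two binary variables (values made up for illustration).
weights = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def conditional_prob_one(var, other_value):
    """P(variable `var` = 1 | the other variable = other_value), from the joint table."""
    if var == 0:
        w1, w0 = weights[(1, other_value)], weights[(0, other_value)]
    else:
        w1, w0 = weights[(other_value, 1)], weights[(other_value, 0)]
    return w1 / (w1 + w0)

def gibbs(num_samples, burn_in=1000, seed=0):
    """Draw samples by resampling one variable at a time given the other."""
    rng = random.Random(seed)
    x = [0, 0]
    samples = []
    for step in range(burn_in + num_samples):
        x[0] = 1 if rng.random() < conditional_prob_one(0, x[1]) else 0
        x[1] = 1 if rng.random() < conditional_prob_one(1, x[0]) else 0
        if step >= burn_in:
            samples.append(tuple(x))
    return samples

samples = gibbs(20000)
estimate = sum(s[0] for s in samples) / len(samples)                 # sampled P(x0 = 1)
exact = (weights[(1, 0)] + weights[(1, 1)]) / sum(weights.values())  # true P(x0 = 1)
print(estimate, exact)
```

Each update needs only a conditional distribution of one variable given the rest, which is what makes Gibbs sampling convenient in graphical models where those conditionals are local.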
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g1(x1) we have: [equation]
• For g3(x3) we have: [equation]
• Distributive Law: ab + ac = a(b+c)
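The saving from the distributive law can be sketched by counting multiplications. Summing a product of two independent local functions over all value pairs takes one multiplication per pair, while the factored form needs one in total. The function values below are arbitrary.

```python
def naive_sum(f, g):
    """Sum of f[a] * g[b] over all pairs: len(f) * len(g) multiplications."""
    total, mults = 0.0, 0
    for a in f:
        for b in g:
            total += a * b
            mults += 1
    return total, mults

def factored_sum(f, g):
    """Same quantity as (sum f) * (sum g): a single multiplication."""
    return sum(f) * sum(g), 1

f = [1.0, 2.0, 3.0]   # values of a local function f(a)
g = [4.0, 5.0]        # values of a local function g(b)

print(naive_sum(f, g))
print(factored_sum(f, g))
```

With many variables the naive cost grows exponentially in the number of variables, while pushing sums inside products keeps the work local to each factor.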
Example: Computing g1(x1)
Message Passing
  - Compute product of descendants with f, then do not-sum over parent
  - Compute product of descendants
The Sum-Product Algorithm
• To compute gi(xi), form a tree rooted at xi
• Starting from the leaves, apply the following two rules
  - Product Rule: At a variable node, take the product of descendants
  - Sum-product Rule: At a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  - Can be done one at a time; repeated computations, not efficient
  - Simultaneous message passing following the sum-product algorithm
  - Examples: Belief Propagation, Forward-Backward algorithm, etc.
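The two rules can be sketched on a small chain-shaped factor graph with binary variables. The factorization g(x1, x2, x3) = fA(x1) fB(x1, x2) fC(x2, x3) and all numeric values are hypothetical, and the tree is rooted at x2; the result is checked against brute-force summation.

```python
from itertools import product

VALUES = (0, 1)

# Local functions of a chain-shaped factor graph (numbers made up):
# g(x1, x2, x3) = fA(x1) * fB(x1, x2) * fC(x2, x3)
fA = {0: 0.6, 1: 0.4}
fB = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}
fC = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8}

def marginal_x2_brute_force():
    """Sum the full product over x1 and x3 for each value of x2."""
    return {x2: sum(fA[x1] * fB[(x1, x2)] * fC[(x2, x3)]
                    for x1, x3 in product(VALUES, repeat=2))
            for x2 in VALUES}

def marginal_x2_sum_product():
    """Tree rooted at x2; messages flow from the leaves to the root."""
    # Product rule at x1: its only descendant is the leaf factor fA.
    msg_x1_to_fB = {x1: fA[x1] for x1 in VALUES}
    # Sum-product rule at fB: multiply by the incoming message, then sum
    # out everything except the parent x2 (the "not-sum").
    msg_fB_to_x2 = {x2: sum(fB[(x1, x2)] * msg_x1_to_fB[x1] for x1 in VALUES)
                    for x2 in VALUES}
    # Leaf variable x3 has no descendants, so it sends the constant 1.
    msg_x3_to_fC = {x3: 1.0 for x3 in VALUES}
    msg_fC_to_x2 = {x2: sum(fC[(x2, x3)] * msg_x3_to_fC[x3] for x3 in VALUES)
                    for x2 in VALUES}
    # Product rule at the root x2.
    return {x2: msg_fB_to_x2[x2] * msg_fC_to_x2[x2] for x2 in VALUES}

print(marginal_x2_brute_force())
print(marginal_x2_sum_product())
```

Rooting the tree at a different variable reuses most of the same messages, which is why passing messages in both directions yields all marginals at once.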
Example: Computing g3(x3)
Hidden Markov Models (HMMs)
Latent variables: z_0, z_1, …, z_{t-1}, z_t, z_{t+1}, …, z_T
Observed variables: x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T
Inference Problems
  1. Compute p(x_{1:T})
  2. Compute p(z_t | x_{1:T})
  3. Find max_{z_{1:T}} p(z_{1:T} | x_{1:T})
Similar problems for chain-structured Conditional Random Fields (CRFs)
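The first inference problem, computing p(x_{1:T}), can be sketched with the forward recursion and checked against a sum over all latent paths. The two-state parameters are hypothetical, and for simplicity the chain here starts at z_1 with an initial distribution rather than at z_0.

```python
from itertools import product

STATES = (0, 1)
# Hypothetical two-state HMM parameters, chosen only for illustration.
initial = [0.6, 0.4]                     # p(z_1)
transition = [[0.7, 0.3], [0.4, 0.6]]    # transition[i][j] = p(z_t = j | z_{t-1} = i)
emission = [[0.9, 0.1], [0.2, 0.8]]      # emission[i][k] = p(x_t = k | z_t = i)

def forward_likelihood(xs):
    """p(x_{1:T}) with O(T * K^2) work via the forward recursion."""
    alpha = [initial[z] * emission[z][xs[0]] for z in STATES]
    for x in xs[1:]:
        alpha = [emission[j][x] * sum(alpha[i] * transition[i][j] for i in STATES)
                 for j in STATES]
    return sum(alpha)

def brute_force_likelihood(xs):
    """Same quantity by summing over all K^T latent paths."""
    total = 0.0
    for zs in product(STATES, repeat=len(xs)):
        p = initial[zs[0]] * emission[zs[0]][xs[0]]
        for t in range(1, len(xs)):
            p *= transition[zs[t - 1]][zs[t]] * emission[zs[t]][xs[t]]
        total += p
    return total

observations = [0, 1, 1, 0]
print(forward_likelihood(observations))
print(brute_force_likelihood(observations))
```

The forward pass is the sum-product algorithm on a chain; adding the matching backward pass gives the posteriors p(z_t | x_{1:T}).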
Distributive Law on Semi-Rings
• Idea can be applied to any commutative semi-ring
• Semi-ring 101
  - Two operations (+, ×): Associative, Commutative, Identity
  - Distributive law: a×b + a×c = a×(b+c)
• Belief Propagation in Bayes nets
• MAP inference in HMMs
• Max-product algorithm
• Alternative to Viterbi Decoding
• Kalman Filtering
• Error Correcting Codes
• Turbo Codes
• …
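One way to see the semi-ring view is to write a chain recursion once and swap the "+" operation. In the sketch below, passing `sum` gives the sum-product result p(x_{1:T}), and passing `max` gives the max-product result, the probability of the best latent path together with the observations. The HMM parameters and the name `chain_value` are assumptions for illustration.

```python
def chain_value(initial, transition, emission, xs, add):
    """Forward recursion over a chain where `add` is the semi-ring's + operation.

    add = sum gives p(x_{1:T}) (sum-product);
    add = max gives the max over z_{1:T} of p(z_{1:T}, x_{1:T}) (max-product).
    """
    states = range(len(initial))
    msg = [initial[z] * emission[z][xs[0]] for z in states]
    for x in xs[1:]:
        msg = [emission[j][x] * add(msg[i] * transition[i][j] for i in states)
               for j in states]
    return add(msg)

# Hypothetical two-state HMM, for illustration only.
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.3, 0.7]]
emis = [[0.9, 0.1], [0.4, 0.6]]
obs = [0, 1, 0]

likelihood = chain_value(init, trans, emis, obs, sum)   # (+, x) semi-ring
best_path = chain_value(init, trans, emis, obs, max)    # (max, x) semi-ring
print(likelihood, best_path)
```

The recursion is valid in both cases because multiplication by a non-negative number distributes over addition and over max alike.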
Message Passing in General Graphs
• Tree structured graphs
  - Message passing is guaranteed to give correct solutions
  - Examples: HMMs, Kalman Filters
• General Graphs
  - Active research topic
    • Progress has been made over the past 10 years
  - Loopy Message passing
    • May not converge
    • May converge to a ‘local minimum’ of the ‘Bethe variational free energy’
  - New approaches to convergent and correct message passing
• Applications
  - TrueSkill Ranking System for Xbox Live
  - Turbo Codes: 3G/4G phones, satellite comm, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  - State Space, State Transitions, Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  - Time Domain Models: AR/MA/ARMA models, ACF and Partial ACF, ARIMA and Seasonal ARIMA, Estimation, Forecasting
  - Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
Spatial Models
• Models for Continuous Data
  - Gaussian Processes: Prior over functions, GP(0, K)
  - Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  - Markov Random Fields (MRFs)
• Structure Learning
  - Finding statistical dependencies
[Figure: Gaussian Process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Drought Detection
• Significant droughts
  - Persistent over space and time
  - Catastrophic consequences
• Examples
  - Late 1960s Sahel drought
  - 1930s North American Dust Bowl
• Dataset: Climate Research Unit (CRU)
  - http://data.giss.nasa.gov/precip_cru
  - Monthly precipitation over land
  - Duration: 1901 to 2006
  - Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  - Replicate grid over time
  - Each node can be 0 (normal) or 1 (dry)
• Toy example
  - m = 3, n = 4, N = 12, T = 5
  - Total States: 2^60 = 1.1529 × 10^18
• CRU data
  - m = 720, n = 360, N = 67420, T = 106
  - Total States: 2^7146520 > 10^2151316
• MAP Inference
  - Find the most likely “state” of the system
  - Integer programming with LARGE number of states
  - Postprocess MAP state to identify significant droughts
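The state counts follow from having one binary node per location and time step, giving 2^(N·T) joint states. A quick check of the arithmetic, working with base-10 exponents for the large case (the function names are illustrative):

```python
import math

def num_states(num_locations, num_time_steps):
    """Each of the N * T binary nodes doubles the number of joint states."""
    return 2 ** (num_locations * num_time_steps)

def log10_num_states(num_locations, num_time_steps):
    """Base-10 exponent of the state count, without building the huge integer."""
    return num_locations * num_time_steps * math.log10(2)

toy = num_states(12, 5)                        # 3 x 4 grid, 5 time steps
cru_exponent = log10_num_states(67420, 106)    # land cells x years

print(toy)             # 1152921504606846976
print(cru_exponent)    # base-10 exponent of 2^(67420 * 106)
```

Exhaustive search over states is therefore out of the question even for modest grids, which is why MAP inference is posed as a structured optimization problem.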
Drought Regions from MRFs
Hierarchical Linear Models
bull Hierarchical Models ndash Each node is a linear model ndash Node outcome (decision value
etc) is fed into next node
bull Family of modelsmethods ndash ClassificationRegression trees
Multi-layer Perceptrons Deep Belief Networks
bull Two Aspects ndash LearningOptimization ndash Statistical Guarantees
Intro to Machine Learning 9
bull Controlling complexity of predictor is key
ndash Low loss with low complexity predictor guarantees good generalization
bull Examples ndash L2 (SVM Ridge Regression) L1 (Lasso) ndash Bayesian models Prior penalizes complex models
Regularization Generalization
Intro to Machine Learning
x
y
Loss term Regularization term
10
bull The Kernel Trick
ndash Map data to high-dimensional space (ldquoRKHSrdquo) ndash Linear model in RKHS equiv Nonlinear model in data space ndash Avoid explicit modeling in high-dimensional spaces
bull Models and Methods ndash Kernelized Classification Regression Dimensionality Reduction ndash Multiple Kernel Learning Combination of Kernels
Non-linear Methods
Intro to Machine Learning 11
Ensembles
• Main idea
  - Combination reduces loss, does not increase complexity
  - Bayesian model averaging
• Examples
  - Bagging, Boosting, Random forests
  - Base models on samples/reweighted versions of points, features
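A rough sketch of bagging, assuming 1-D threshold classifiers ("decision stumps") as base models and a majority vote as the combination rule. The data, seed, and helper names are hypothetical. Each stump is trained on a bootstrap sample of the points.

```python
import random


def fit_stump(points):
    """Pick the threshold t with fewest training errors for: +1 if x > t else -1."""
    best_t, best_err = None, None
    for t, _ in points:
        err = sum(1 for x, y in points if (1 if x > t else -1) != y)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging(points, n_models, rng):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    stumps = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))
    return stumps


def predict(stumps, x):
    """Majority vote over the stumps (use an odd number of models to avoid ties)."""
    votes = sum(1 if x > t else -1 for t in stumps)
    return 1 if votes > 0 else -1


rng = random.Random(0)
# Hypothetical data on [0, 1]: label is -1 up to x = 0.5 and +1 above it.
data = [(x / 10.0, 1 if x > 5 else -1) for x in range(11)]
stumps = bagging(data, n_models=25, rng=rng)
low, high = predict(stumps, 0.0), predict(stumps, 1.0)
```

Boosting differs in that the points are reweighted between rounds rather than resampled uniformly.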
High Dimensional Modeling
• The "Small n, High p" regime
  - Underspecified, difficult to fit models with guarantees
• "Low Dimensional" models
  - Small number of features are relevant: Feature Selection
  - Low intrinsic dimensionality: Manifold embedding
  - Models/Methods: Sparse regression, Non-parametric regression
Land Variable Regression
• Land-Sea variable interactions
  - How do sea variables affect land variables?
  - Are proximal locations important?
  - Are there long-range spatial dependencies (tele-connections)?
• NCEP/NCAR Reanalysis 1: monthly means for 1948-2010
  - Covariates: Temperature, Sea Level Pressure, Precipitation, Relative Humidity, Horizontal Wind Speed, Vertical Wind Speed
  - Response: Temperature, Precipitation at 9 locations
• High-dimensional Regression
  - Dimension p = Lm: L locations, m variables/location
  - Two considerations
    • Not all locations are relevant
    • Even if a location is relevant, not all variables are relevant
Land Regions for Prediction: Temp, Precip
Ordinary Least Squares
• Naive method: use ordinary least squares (OLS)
• For n < mL, the OLS estimate is non-unique
• In general, y depends on all mL covariates: complex model
[Figure: coefficient vector θ with blocks for Location 1 through Location L, mapped to the Response Variable]
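A tiny illustration of why the OLS estimate is non-unique when there are fewer samples than covariates. The one-sample, two-covariate problem below is hypothetical; several different coefficient vectors fit it with zero residual.

```python
def residual_sum_of_squares(X, y, theta):
    """Squared-error loss of the linear model X @ theta against y."""
    return sum(
        (yi - sum(xij * tj for xij, tj in zip(row, theta))) ** 2
        for row, yi in zip(X, y)
    )


X = [[1.0, 1.0]]   # n = 1 sample, p = 2 covariates, so n < p
y = [2.0]

# Three different coefficient vectors, all on the line theta_1 + theta_2 = 2.
candidates = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
losses = [residual_sum_of_squares(X, y, theta) for theta in candidates]
```

Every candidate attains the minimum loss of zero, so the data alone cannot pick one of them. An added penalty term is one way to break the tie.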
Sparse Group Lasso (SGL)
• High-dimensional Regression with "Sparse" and "Group Sparsity"
[Figure: coefficient blocks for Location 1 through Location L, mapped to the Response Variable]
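The slide's objective is not reproduced in this transcript. As a hedged sketch, a commonly used form of the sparse group lasso penalty combines an L1 term over all coefficients with a sum of per-group L2 norms, where each group would hold the variables of one location. The weights `lam1`, `lam2` and the grouping below are assumptions for illustration, not the authors' exact formulation.

```python
import math


def sgl_penalty(theta, groups, lam1, lam2):
    """lam1 * ||theta||_1 + lam2 * sum over groups g of ||theta_g||_2."""
    l1 = sum(abs(t) for t in theta)
    group_l2 = sum(math.sqrt(sum(theta[i] ** 2 for i in g)) for g in groups)
    return lam1 * l1 + lam2 * group_l2


# Hypothetical setting: 2 locations with 2 variables each.
theta = [3.0, 4.0, 0.0, 0.0]      # second location is switched off entirely
groups = [[0, 1], [2, 3]]         # coefficient indices belonging to each location
penalty = sgl_penalty(theta, groups, lam1=1.0, lam2=1.0)
```

The group term encourages whole locations to drop out, and the L1 term encourages individual variables within a retained location to drop out, matching the two considerations listed for the land variable regression.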
Example: Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL)
  - Regression with Feature Selection
  - Consistency Guarantees
Error (RMSE): SGL, NC, OLS
Variable         Region    Sparse Group Lasso  Network Clusters  Ordinary Least Squares
Air Temperature  Brazil    0.198               0.534             0.348
                 Peru      0.247               0.468             0.387
                 West USA  0.270               0.767             0.402
                 East USA  0.304               0.815             0.348
                 W Europe  0.379               0.936             0.493
                 Sahel     0.320               0.685             0.413
                 S Africa  0.136               0.726             0.267
                 India     0.205               0.649             0.300
                 SE Asia   0.298               0.541             0.383
Precipitation    Brazil    0.261               0.509             0.413
                 Peru      0.312               0.864             0.523
                 West USA  0.451               0.605             0.549
                 East USA  0.365               0.686             0.413
                 W Europe  0.358               0.45              0.551
                 Sahel     0.427               0.533             0.523
                 S Africa  0.235               0.697             0.378
                 India     0.146               0.672             0.264
                 SE Asia   0.159               0.665             0.312
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Basics
Inference
Example: Matrix Completion
Example: Drought Detection
Example: Text Analysis
Graphical Models: What and Why
• Graphical models
  - Dependencies between (random) variables, avoid IID assumptions
  - Closer to reality; learning/inference is much more difficult
• Basic nomenclature
  - Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  - A directed graph between random variables
  - Examples: Bayesian networks, Hidden Markov Models
  - Joint distribution is a product of P(child|parents)
• Undirected Graphs
  - An undirected graph between random variables
  - Examples: Markov/Conditional random fields
  - Joint distribution in terms of potential functions
[Figure: example graph over nodes X1, X2, X3, X4, X5]
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X|Parents(X))
[Figure: Bayesian network over nodes X1, X2, X3, X4, X5]
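A minimal sketch of the factorization into P(X|Parents(X)), assuming a hypothetical three-node network X1 → X3 ← X2 with made-up probability tables. The edges of the five-node figure on the slide are not available in this transcript, so this is not that network.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
P_X1 = {True: 0.2, False: 0.8}                    # P(X1)
P_X2 = {True: 0.1, False: 0.9}                    # P(X2)
P_X3_TRUE = {                                     # P(X3 = True | X1, X2)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}


def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) * P(x2) * P(x3 | x1, x2)."""
    p_x3 = P_X3_TRUE[(x1, x2)]
    return P_X1[x1] * P_X2[x2] * (p_x3 if x3 else 1.0 - p_x3)


# A valid factorization sums to one over all joint configurations.
total = sum(joint(*config) for config in product([True, False], repeat=3))
```

The full joint over three binary variables has 7 free parameters, while this factorization needs only 1 + 1 + 4 = 6. The saving grows quickly for larger, sparser graphs.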
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  - Semantically more accurate, fewer parameters
[Figure: example network including a "Heart disease" node]
Inference
• Some variables in the Bayes net are observed
  - The evidence/data, e.g., John has not called, Mary has called
• Inference
  - How to compute the value/probability of other variables
  - Example: What is the probability of Burglary, i.e., P(b|¬j,m)
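A sketch of exact inference by enumeration for the query P(b|¬j,m). The network figure and its probability tables are not in this transcript, so the numbers below are an assumption: they are the values commonly used for this example in the Russell and Norvig textbook (Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls).

```python
from itertools import product

# Assumed textbook parameters, not taken from the slide.
P_B = 0.001                                        # P(Burglary)
P_E = 0.002                                        # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls | Alarm)


def bern(p, value):
    """Probability of a binary outcome given P(value = True) = p."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint probability as a product of P(child | parents)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """P(Burglary = True | JohnCalls = j, MaryCalls = m), summing out E and A."""
    score = {}
    for b in (True, False):
        score[b] = sum(joint(b, e, a, j, m)
                       for e, a in product((True, False), repeat=2))
    return score[True] / (score[True] + score[False])


p_query = posterior_burglary(j=False, m=True)
```

Enumeration touches every configuration of the hidden variables, so its cost grows exponentially with their number, which is what motivates the message passing algorithms.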
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  - Efficient exact inference algorithms are possible
  - Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  - Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  - Sum-product/message passing algorithm, 'disregarding loops'
    • Active research topic, correct convergence 'not guaranteed'
    • Works well in practice
  - Approximate inference
Approximate Inference
• Variational Inference
  - Deterministic approximation
  - Approximate complex true distribution over latent variables
  - Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  - Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  - Simple sampling approaches
  - Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  - Gibbs sampling
    • Useful special case of MCMC methods
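A minimal Gibbs sampling sketch, assuming a hypothetical joint distribution over two binary variables given as a table. Each step resamples one variable from its conditional given the other. The table, seed, and sample counts are illustrative choices only.

```python
import random

# Hypothetical joint distribution p(x, y); the true marginal is P(x = 1) = 0.5.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def gibbs_marginal_x(n_samples, burn_in, rng):
    """Estimate P(x = 1) by alternately resampling x | y and y | x."""
    x, y = 0, 0
    hits = 0
    for step in range(burn_in + n_samples):
        # Conditional p(x = 1 | y) read off the joint table.
        p_x1 = JOINT[(1, y)] / (JOINT[(0, y)] + JOINT[(1, y)])
        x = 1 if rng.random() < p_x1 else 0
        # Conditional p(y = 1 | x) read off the joint table.
        p_y1 = JOINT[(x, 1)] / (JOINT[(x, 0)] + JOINT[(x, 1)])
        y = 1 if rng.random() < p_y1 else 0
        if step >= burn_in:
            hits += x
    return hits / n_samples


estimate = gibbs_marginal_x(n_samples=20000, burn_in=1000, rng=random.Random(0))
```

In a real model the conditionals come from the graph structure: each variable only needs the current values of its neighbours, so the joint table is never built.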
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g1(x1) we have
• For g3(x3) we have
Distributive Law: ab + ac = a(b+c)
Example: Computing g1(x1)
Message Passing
[Figure annotations: "Compute product of descendants" at a variable node; "Compute product of descendants with f. Then do not-sum over parent" at a factor node]
The Sum-Product Algorithm
• To compute gi(xi), form a tree rooted at xi
• Starting from the leaves, apply the following two rules
  - Product Rule: at a variable node, take the product of descendants
  - Sum-product Rule: at a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  - Can be done one at a time: repeated computations, not efficient
  - Simultaneous message passing following the sum-product algorithm
  - Examples: Belief Propagation, Forward-Backward algorithm, etc.
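A small sketch of the two rules, assuming a hypothetical chain factor graph x1 - fa - x2 - fb - x3 with binary variables and made-up function tables (not the factor graph on the slides). The marginal g1(x1) is computed once by brute force and once by passing messages from the leaf x3 towards the root x1.

```python
from itertools import product

VALUES = (0, 1)
# Hypothetical local functions.
fa = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}   # fa(x1, x2)
fb = {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 2.0, (1, 1): 1.0}   # fb(x2, x3)


def g1_brute_force(x1):
    """Sum the full product fa * fb over every (x2, x3) configuration."""
    return sum(fa[(x1, x2)] * fb[(x2, x3)]
               for x2, x3 in product(VALUES, repeat=2))


def g1_sum_product(x1):
    """Root the tree at x1 and pass messages inwards from the leaf x3."""
    # Sum-product rule at factor fb: multiply fb by the trivial leaf message
    # from x3, then sum over everything except the parent x2 ("not-sum").
    msg_fb_to_x2 = {x2: sum(fb[(x2, x3)] for x3 in VALUES) for x2 in VALUES}
    # Product rule at variable x2: product of its descendants' messages
    # (there is only one descendant here).
    msg_x2_to_fa = msg_fb_to_x2
    # Sum-product rule at factor fa: multiply by the incoming message,
    # then sum over x2, keeping the parent x1 fixed.
    return sum(fa[(x1, x2)] * msg_x2_to_fa[x2] for x2 in VALUES)


marginals = [g1_sum_product(x1) for x1 in VALUES]
```

The message from fb is computed once and reused for every value of x1. That reuse is the distributive law at work, and it is what keeps the cost linear in the chain length.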
Example: Computing g3(x3)
Hidden Markov Models (HMMs)
Latent variables: z_0, z_1, …, z_{t-1}, z_t, z_{t+1}, …, z_T
Observed variables: x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T
Inference Problems
  1. Compute p(x_{1:T})
  2. Compute p(z_t | x_{1:T})
  3. Find max over z_{1:T} of p(z_{1:T} | x_{1:T})
Similar problems for chain-structured Conditional Random Fields (CRFs)
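A sketch of the first inference problem, computing p(x_{1:T}) with the forward recursion, checked against a brute-force sum over all latent paths. The two-state HMM and its parameters are hypothetical, and the chain is started at z_1 with an initial distribution rather than at a separate z_0.

```python
from itertools import product

# Hypothetical 2-state HMM with binary observations.
INIT = [0.6, 0.4]                      # p(z_1)
TRANS = [[0.7, 0.3], [0.4, 0.6]]       # TRANS[i][j] = p(z_t = j | z_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]        # EMIT[i][x] = p(x_t = x | z_t = i)


def likelihood_forward(obs):
    """p(x_{1:T}) via the forward recursion, in O(T * K^2) time."""
    alpha = [INIT[z] * EMIT[z][obs[0]] for z in range(2)]
    for x in obs[1:]:
        alpha = [EMIT[z][x] * sum(alpha[zp] * TRANS[zp][z] for zp in range(2))
                 for z in range(2)]
    return sum(alpha)


def likelihood_brute_force(obs):
    """p(x_{1:T}) by summing the joint over all K^T latent paths."""
    total = 0.0
    for path in product(range(2), repeat=len(obs)):
        p = INIT[path[0]] * EMIT[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][obs[t]]
        total += p
    return total


obs = [0, 1, 1, 0]
fast, slow = likelihood_forward(obs), likelihood_brute_force(obs)
```

Running the analogous backward recursion and combining it with the forward quantities gives the posteriors p(z_t | x_{1:T}) of the second problem.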
Distributive Law on Semi-Rings
• Idea can be applied to any commutative semi-ring
• Semi-ring 101
  - Two operations (+, ×): Associative, Commutative, Identity
  - Distributive law: a×b + a×c = a×(b+c)
• Belief Propagation in Bayes nets
• MAP inference in HMMs
• Max-product algorithm
• Alternative to Viterbi Decoding
• Kalman Filtering
• Error Correcting Codes
• Turbo Codes
• …
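A sketch of the semi-ring idea on a chain, assuming a hypothetical two-state HMM. The same elimination routine is run twice and only the "addition" changes: with `sum` it computes the likelihood p(x_{1:T}), and with `max` it computes the probability of the single best latent path, since multiplication by a non-negative number distributes over max as well.

```python
# Hypothetical 2-state HMM with binary observations.
INIT = [0.6, 0.4]                      # p(z_1)
TRANS = [[0.7, 0.3], [0.4, 0.6]]       # TRANS[i][j] = p(z_t = j | z_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]        # EMIT[i][x] = p(x_t = x | z_t = i)


def chain_eliminate(obs, add):
    """Eliminate z_1, ..., z_T left to right.

    `add` is the semi-ring addition applied to a list of values;
    ordinary multiplication plays the role of the semi-ring product.
    """
    msg = [INIT[z] * EMIT[z][obs[0]] for z in range(2)]
    for x in obs[1:]:
        msg = [EMIT[z][x] * add([msg[zp] * TRANS[zp][z] for zp in range(2)])
               for z in range(2)]
    return add(msg)


obs = [0, 1, 1]
likelihood = chain_eliminate(obs, add=sum)       # (+, x) semi-ring
best_path_prob = chain_eliminate(obs, add=max)   # (max, x) semi-ring
```

Recovering the best path itself, rather than just its probability, additionally requires storing the maximizing predecessor at each step.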
Message Passing in General Graphs
• Tree-structured graphs
  - Message passing is guaranteed to give correct solutions
  - Examples: HMMs, Kalman Filters
• General Graphs
  - Active research topic
    • Progress has been made over the past 10 years
  - Loopy message passing
    • May not converge
    • May converge to a 'local minimum' of the 'Bethe variational free energy'
  - New approaches to convergent and correct message passing
• Applications
  - TrueSkill Ranking System for Xbox Live
  - Turbo Codes: 3G, 4G phones, satellite comm, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  - State Space, State Transitions
  - Hidden Markov Models, Kalman Filters
  - Linear Dynamical Systems
• Time Series Analysis
  - Time Domain Models: AR/MA/ARMA models, ACF and Partial ACF, ARIMA and Seasonal ARIMA, Estimation, Forecasting
  - Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
Spatial Models
• Models for Continuous Data
  - Gaussian Processes: prior over functions, GP(0, K)
  - Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  - Markov Random Fields (MRFs)
• Structure Learning
  - Finding statistical dependencies
[Figure: Gaussian process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Drought Detection
• Significant droughts
  - Persistent over space and time
  - Catastrophic consequences
• Examples
  - Late 1960s Sahel drought
  - 1930s North American Dust Bowl
• Dataset: Climate Research Unit
  - http://data.giss.nasa.gov/precip_cru
  - Monthly precipitation over land
  - Duration: 1901 to 2006
  - Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  - Replicate grid over time
  - Each node can be 0 (normal) or 1 (dry)
• Toy example
  - m = 3, n = 4, N = 12, T = 5
  - Total States: 2^60 = 1.1529 × 10^18
• CRU data
  - m = 720, n = 360, N = 67420, T = 106
  - Total States: 2^7146520 > 10^2151316
• MAP Inference
  - Find the most likely "state" of the system
  - Integer programming with LARGE number of states
  - Postprocess MAP state to identify significant droughts
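A quick arithmetic check of the state counts, assuming each of the N locations at each of the T time steps carries one binary label, so that the joint state space has 2^(N·T) elements. The helper names are hypothetical. For the CRU-sized problem only the base-10 exponent is computed, since the integer itself has millions of digits.

```python
import math


def total_states(num_locations, num_steps):
    """Number of joint 0/1 labelings of a grid replicated over time."""
    return 2 ** (num_locations * num_steps)


def log10_total_states(num_locations, num_steps):
    """Base-10 exponent of the state count, without building the huge integer."""
    return num_locations * num_steps * math.log10(2)


toy_states = total_states(12, 5)                  # 2^60
cru_nodes = 67420 * 106                           # number of binary nodes
cru_exponent = log10_total_states(67420, 106)     # log10 of 2^(N*T)
```

Exhaustive search over such a space is out of the question, which is why MAP inference is posed as a structured optimization problem instead.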
Drought Regions from MRFs
bull Main idea ndash Combination reduces loss does not increase complexity ndash Bayesian model averaging
bull Examples ndash Bagging Boosting Random forests ndash Base models on samplesreweighted versions of points features
Ensembles
Intro to Machine Learning 12
High Dimensional Modeling
bull The ldquoSmall n High prdquo regime ndash Underspecified difficult to fit models with guarantees
bull ldquoLow Dimensionalrdquo models ndash Small number of features are relevant Feature Selection ndash Low intrinsic dimensionality Manifold embedding ndash ModelsMethods Sparse regression Non-parametric regression
Intro to Machine Learning 13
Land Variable Regression
bull Land-Sea variable interactions ndash How do sea variables affect land variables ndash Are proximal locations important ndash Are there long range spatial dependencies (tele-connections)
bull NCEPNCAR Reanalysis 1 Monthly means for 1948-2010 ndash Covariates Temperature Sea Level Pressure Precipitation Relative
Humidity Horizontal Wind Speed Vertical Wind Speed ndash Response Temperature Precipitation at 9 locations
bull High-dimensional Regression ndash Dimension p = Lm L locations m variableslocation ndash Two considerations
bull Not all locations are relevant bull Even if a location is relevant not all variables are relevant
Intro to Machine Learning 14
Land Regions for Prediction Temp Precip
Intro to Machine Learning 15
Ordinary Least Squares
bull Naive method Use ordinary least squares (OLS)
bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model
θ
Location 1 Location L
Response Variable
16
Sparse Group Lasso (SGL)
bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo
Location 1 Location L
Response Variable 17
Example Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees
18
Error (RMSE): Sparse Group Lasso (SGL), Network Clusters (NC), Ordinary Least Squares (OLS)

Variable         Region     SGL     NC      OLS
Air Temperature  Brazil     0.198   0.534   0.348
                 Peru       0.247   0.468   0.387
                 West USA   0.270   0.767   0.402
                 East USA   0.304   0.815   0.348
                 W Europe   0.379   0.936   0.493
                 Sahel      0.320   0.685   0.413
                 S Africa   0.136   0.726   0.267
                 India      0.205   0.649   0.300
                 SE Asia    0.298   0.541   0.383
Precipitation    Brazil     0.261   0.509   0.413
                 Peru       0.312   0.864   0.523
                 West USA   0.451   0.605   0.549
                 East USA   0.365   0.686   0.413
                 W Europe   0.358   0.45    0.551
                 Sahel      0.427   0.533   0.523
                 S Africa   0.235   0.697   0.378
                 India      0.146   0.672   0.264
                 SE Asia    0.159   0.665   0.312
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Basics
Inference
Example: Matrix Completion
Example: Drought Detection
Example: Text Analysis
Graphical Models: What and Why
• Graphical models
  – Dependencies between (random) variables; avoid IID assumptions
  – Closer to reality; learning/inference is much more difficult
• Basic nomenclature
  – Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  – A directed graph between random variables
  – Examples: Bayesian networks, Hidden Markov Models
  – Joint distribution is a product of P(child|parents)
• Undirected Graphs
  – An undirected graph between random variables
  – Examples: Markov/Conditional random fields
  – Joint distribution in terms of potential functions
[Figure: example graph over nodes X1, X2, X3, X4, X5]
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X|Parents(X))
[Figure: Bayesian network over nodes X1, X2, X3, X4, X5]
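The factorization this slide refers to is the standard chain rule for Bayesian networks. It is written here for a generic network over X_1, ..., X_n, since the edges of the slide's five-node figure are not recoverable from the transcript:

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)
```

A node with no parents contributes its marginal P(X_i), so the number of parameters grows with the size of the largest parent set rather than with the full joint table.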
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  – Semantically more accurate, fewer parameters
[Figure: example network containing a “Heart disease” node]
Inference
• Some variables in the Bayes net are observed
  – The evidence/data, e.g., John has not called, Mary has called
• Inference
  – How to compute the value/probability of other variables
  – Example: What is the probability of Burglary, i.e., P(b|¬j,m)?
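The query P(b|¬j,m) can be answered by brute-force enumeration over the unobserved variables. The sketch below assumes the textbook alarm-network structure and conditional probability tables from Russell and Norvig; the slides do not list these numbers, so treat them as illustrative, and the function names as hypothetical.

```python
from itertools import product

# Assumed CPTs: the textbook alarm-network values from Russell & Norvig,
# not taken from the slides.
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given value."""
    return p if value else 1.0 - p

def joint(b, e, a, j, m):
    """Joint probability as a product of P(child | parents)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def posterior_burglary(j, m):
    """P(Burglary = true | JohnCalls = j, MaryCalls = m) by enumeration."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):
        weights[b] += joint(b, e, a, j, m)
    return weights[True] / (weights[True] + weights[False])

p_query = posterior_burglary(j=False, m=True)   # P(b | not j, m)
print(round(p_query, 4))
```

Enumeration costs time exponential in the number of hidden variables, which is why the structured inference algorithms on the following slides matter.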
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to an exponentially large graph
  – Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic; correct convergence ‘not guaranteed’
    • Works well in practice
  – Approximate inference
Approximate Inference
• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods
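As a minimal sketch of Gibbs sampling, the code below resamples each of two binary variables from its conditional given the other and compares the resulting estimate of a marginal with the exact value. The unnormalized table `phi`, the sample counts, and the seed are made-up choices for illustration.

```python
import random

# Hypothetical unnormalized joint over two binary variables (made-up values).
phi = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def exact_marginal_x1():
    """Exact P(x1 = 1) by summing the table."""
    total = sum(phi.values())
    return (phi[(1, 0)] + phi[(1, 1)]) / total

def gibbs_marginal_x1(num_samples=20000, burn_in=1000, seed=0):
    """Estimate P(x1 = 1) by alternately resampling x1 | x2 and x2 | x1."""
    rng = random.Random(seed)
    x1, x2 = 0, 0
    count = 0
    for step in range(burn_in + num_samples):
        # Resample x1 from its conditional given the current x2.
        p1 = phi[(1, x2)] / (phi[(0, x2)] + phi[(1, x2)])
        x1 = 1 if rng.random() < p1 else 0
        # Resample x2 from its conditional given the new x1.
        p2 = phi[(x1, 1)] / (phi[(x1, 0)] + phi[(x1, 1)])
        x2 = 1 if rng.random() < p2 else 0
        if step >= burn_in:
            count += x1
    return count / num_samples

exact = exact_marginal_x1()
estimate = gibbs_marginal_x1()
print(exact, estimate)
```

Each update only needs a local conditional, never the normalizing constant of the full joint, which is what makes the method practical for large models.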
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g1(x1) we have
• For g3(x3) we have
Distributive Law: ab + ac = a(b+c)
Example: Computing g1(x1)
Message Passing
  – At a factor node: compute the product of descendants with f, then do not-sum over the parent
  – At a variable node: compute the product of descendants
The Sum-Product Algorithm
• To compute gi(xi), form a tree rooted at xi
• Starting from the leaves, apply the following two rules
  – Product Rule: At a variable node, take the product of descendants
  – Sum-product Rule: At a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  – Can be done one at a time: repeated computations, not efficient
  – Simultaneous message passing following the sum-product algorithm
  – Examples: Belief Propagation, Forward-Backward algorithm, etc.
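The two rules can be sketched on a small tree-structured factor graph. The graph and its factor values below are hypothetical (they are not the example from the slides): x1 is the root, attached to fA, whose other neighbor x2 has two child factors fB and fC with leaf variables x3 and x4. The message-passing result is checked against a brute-force sum.

```python
from itertools import product

# Hypothetical local functions over binary variables (values are made up).
fA = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}   # fA(x1, x2)
fB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}   # fB(x2, x3)
fC = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}   # fC(x2, x4)
VALUES = (0, 1)

def brute_force_g1():
    """g1(x1): sum the full product over every variable except x1."""
    g = {x1: 0.0 for x1 in VALUES}
    for x1, x2, x3, x4 in product(VALUES, repeat=4):
        g[x1] += fA[(x1, x2)] * fB[(x2, x3)] * fC[(x2, x4)]
    return g

def sum_product_g1():
    """g1(x1) by passing messages from the leaves up to the root x1."""
    # Factor nodes fB and fC: their leaf children x3 and x4 send 1,
    # so each message is the factor summed over everything but x2.
    msg_fB_to_x2 = {x2: sum(fB[(x2, x3)] for x3 in VALUES) for x2 in VALUES}
    msg_fC_to_x2 = {x2: sum(fC[(x2, x4)] for x4 in VALUES) for x2 in VALUES}
    # Variable node x2 (product rule): multiply the incoming messages.
    msg_x2_to_fA = {x2: msg_fB_to_x2[x2] * msg_fC_to_x2[x2] for x2 in VALUES}
    # Factor node fA (sum-product rule): multiply by fA, sum out x2.
    return {x1: sum(fA[(x1, x2)] * msg_x2_to_fA[x2] for x2 in VALUES)
            for x1 in VALUES}

print(brute_force_g1())
print(sum_product_g1())
```

The brute-force version touches every joint assignment, while the message-passing version only ever sums over one variable at a time; that saving is the distributive law at work.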
Example: Computing g3(x3)
Hidden Markov Models (HMMs)
Latent variables: z_0, z_1, …, z_{t-1}, z_t, z_{t+1}, …, z_T
Observed variables: x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T
Inference Problems
  1. Compute p(x_{1:T})
  2. Compute p(z_t | x_{1:T})
  3. Find max over z_{1:T} of p(z_{1:T} | x_{1:T})
Similar problems for chain-structured Conditional Random Fields (CRFs)
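The first inference problem, computing p(x_{1:T}), is typically solved with the forward recursion. The sketch below uses a hypothetical two-state HMM with made-up parameters, and it places the initial distribution directly on z_1 for brevity (a simplification of the z_0, …, z_T indexing above). A brute-force sum over all latent paths is included only as a check.

```python
from itertools import product

# Hypothetical two-state HMM with binary observations (made-up parameters).
initial = [0.6, 0.4]                     # p(z_1)
transition = [[0.7, 0.3], [0.2, 0.8]]    # transition[i][j] = p(z_t = j | z_{t-1} = i)
emission = [[0.9, 0.1], [0.3, 0.7]]      # emission[i][k] = p(x_t = k | z_t = i)

def forward_likelihood(obs):
    """p(x_{1:T}) via the forward recursion, O(T * K^2) time."""
    K = len(initial)
    alpha = [initial[i] * emission[i][obs[0]] for i in range(K)]
    for x in obs[1:]:
        alpha = [
            emission[j][x] * sum(alpha[i] * transition[i][j] for i in range(K))
            for j in range(K)
        ]
    return sum(alpha)

def brute_force_likelihood(obs):
    """p(x_{1:T}) by summing over all K^T latent paths (for checking only)."""
    K = len(initial)
    total = 0.0
    for path in product(range(K), repeat=len(obs)):
        p = initial[path[0]] * emission[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= transition[path[t - 1]][path[t]] * emission[path[t]][obs[t]]
        total += p
    return total

obs = [0, 1, 1, 0]
print(forward_likelihood(obs), brute_force_likelihood(obs))
```

The forward pass is the sum-product algorithm specialized to a chain; adding a matching backward pass would give the smoothed marginals p(z_t | x_{1:T}) of the second problem.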
Distributive Law on Semi-Rings
• Idea can be applied to any commutative semi-ring
• Semi-ring 101
  – Two operations (+, ×): Associative, Commutative, Identity
  – Distributive law: a×b + a×c = a×(b+c)
• Belief Propagation in Bayes nets
• MAP inference in HMMs
• Max-product algorithm
• Alternative to Viterbi Decoding
• Kalman Filtering
• Error Correcting Codes
• Turbo Codes
• …
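To illustrate the semi-ring idea, the sketch below runs the same elimination routine on a chain of factors twice, once with "plus" set to `sum` (sum-product) and once with "plus" set to `max` (max-product), keeping ordinary multiplication as "times". The chain and its non-negative factor values are hypothetical.

```python
from itertools import product

# Hypothetical chain of pairwise factors over binary variables (made-up values).
factors = [
    {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9},   # f1(x1, x2)
    {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.4},   # f2(x2, x3)
    {(0, 0): 0.8, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.5},   # f3(x3, x4)
]
VALUES = (0, 1)

def chain_reduce(plus):
    """Eliminate variables right to left using the given 'plus' operation.

    plus=sum gives sum-product (total mass); plus=max gives max-product
    (value of the best assignment). 'times' is ordinary multiplication.
    """
    message = {v: 1.0 for v in VALUES}          # message from the last variable
    for f in reversed(factors):
        message = {
            left: plus(f[(left, right)] * message[right] for right in VALUES)
            for left in VALUES
        }
    return plus(message[v] for v in VALUES)

def brute_force(plus):
    """Same quantity by enumerating every joint assignment."""
    scores = []
    for xs in product(VALUES, repeat=len(factors) + 1):
        score = 1.0
        for f, pair in zip(factors, zip(xs, xs[1:])):
            score *= f[pair]
        scores.append(score)
    return plus(scores)

print(chain_reduce(sum), brute_force(sum))
print(chain_reduce(max), brute_force(max))
```

Only the `plus` argument changes between the two runs, because max distributes over multiplication by non-negative numbers in the same way addition does.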
Message Passing in General Graphs
• Tree structured graphs
  – Message passing is guaranteed to give correct solutions
  – Examples: HMMs, Kalman Filters
• General Graphs
  – Active research topic
    • Progress has been made over the past 10 years
  – Loopy Message passing
    • May not converge
    • May converge to a ‘local minimum’ of the ‘Bethe variational free energy’
  – New approaches to convergent and correct message passing
• Applications
  – TrueSkill Ranking System for Xbox Live
  – Turbo Codes: 3G, 4G phones, satellite comm, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  – State Space, State Transitions
  – Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models: AR/MA/ARMA models, ACF and Partial ACF, ARIMA and Seasonal ARIMA, Estimation, Forecasting
  – Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
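Of the time domain tools listed, the autocorrelation function (ACF) is the simplest to sketch. The code below computes a sample ACF for a made-up periodic series; the function name and the data are illustrative assumptions.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]

# Illustrative series with a period of 4 (made-up data).
series = [1.0, 0.0, -1.0, 0.0] * 6
acf = sample_acf(series, max_lag=4)
print([round(r, 3) for r in acf])
```

For this series the ACF is strongly negative at lag 2 and strongly positive at lag 4, which is the kind of signature used to pick seasonal terms in ARIMA-style models.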
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: Prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
[Figure: Gaussian process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Drought Detection
• Significant droughts
  – Persistent over space and time
  – Catastrophic consequences
• Examples
  – Late 1960s Sahel drought
  – 1930s North American Dust Bowl
• Dataset: Climate Research Unit
  – http://data.giss.nasa.gov/precip_cru
  – Monthly precipitation over land
  – Duration: 1901 to 2006
  – Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  – Replicate grid over time
  – Each node can be 0 (normal) or 1 (dry)
• Toy example
  – m = 3, n = 4, N = 12, T = 5
  – Total States: 2^60 = 1.1529 × 10^18
• CRU data
  – m = 720, n = 360, N = 67420, T = 106
  – Total States: 2^7146520 > 10^2151316
• MAP Inference
  – Find the most likely “state” of the system
  – Integer programming with LARGE number of states
  – Postprocess MAP state to identify significant droughts
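The state counts follow from N × T binary nodes, giving 2^(N·T) joint configurations. A quick arithmetic check, using only the N and T values from the slide (the function names are hypothetical), computes the toy count exactly and the CRU count through its base-10 exponent N·T·log10(2):

```python
import math

def num_states(num_nodes, num_steps):
    """Number of joint binary configurations: 2 ** (N * T)."""
    return 2 ** (num_nodes * num_steps)

def log10_num_states(num_nodes, num_steps):
    """Base-10 exponent of 2 ** (N * T), without building the huge integer."""
    return num_nodes * num_steps * math.log10(2)

toy = num_states(12, 5)                        # toy grid: N = 12, T = 5
cru_exponent = log10_num_states(67420, 106)    # CRU grid: N = 67420, T = 106

print(toy)                       # 1152921504606846976, about 1.1529e18
print(math.floor(cru_exponent))  # 2151316
```

Since 2^3 = 8 is less than 10, the exponent is a little under N·T/3, so the CRU state space has roughly 2.15 million decimal digits. Either way, enumeration is out of the question and MAP inference has to exploit the grid structure.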
Drought Regions from MRFs
Land Regions for Prediction Temp Precip
Intro to Machine Learning 15
Ordinary Least Squares
bull Naive method Use ordinary least squares (OLS)
bull For n lt mL OLS estimate non-unique bull In general y depends on all mL covariates complex model
θ
Location 1 Location L
Response Variable
16
Sparse Group Lasso (SGL)
bull High-dimensional Regression with ldquoSparserdquo ldquoGroup Sparsityrdquo
Location 1 Location L
Response Variable 17
Example Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL) -Regression with Feature Selection -Consistency Guarantees
18
Error (RMSE) SGL NC OLS
Variable Region Sparse Group Lasso Network Clusters Ordinary Least
Squares A
ir Te
mpe
ratu
re
Brazil 0198 0534 0348 Peru 0247 0468 0387 West USA 0270 0767 0402 East USA 0304 0815 0348 W Europe 0379 0936 0493 Sahel 0320 0685 0413 S Africa 0136 0726 0267 India 0205 0649 0300 SE Asia 0298 0541 0383
Prec
ipita
tion
Brazil 0261 0509 0413 Peru 0312 0864 0523 West USA 0451 0605 0549 East USA 0365 0686 0413 W Europe 0358 045 0551 Sahel 0427 0533 0523 S Africa 0235 0697 0378 India 0146 0672 0264 SE Asia 0159 0665 0312 19
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Basics
Inference
Example Matrix Completion
Example Drought Detection
Example Text Analysis
20
Intro to Machine Learning
Graphical Models What and Why bull Graphical models
ndash Dependencies between (random) variables avoid IID assumptions ndash Closer to reality learninginference is much more difficult
bull Basic nomenclature ndash Node = Random Variable Edge = Statistical Dependency
bull Directed Graphs ndash A directed graph between random variables ndash Example Bayesian networks Hidden Markov Models ndash Joint distribution is a product of P(child|parents)
bull Undirected Graphs ndash An undirected graph between random variables ndash Example MarkovConditional random fields ndash Joint distribution in terms of potential functions
X1
X3
X4 X5
X2
21
Intro to Machine Learning
Example Burglary Network
22
Intro to Machine Learning
Bayesian Networks
bull Joint distribution in terms of P(X|Parents(X))
X1
X3
X4 X5
X2
23
Intro to Machine Learning
Example II Rain Network
24
Intro to Machine Learning
Latent Variable Models
bull Bayesian network with hidden variables ndash Semantically more accurate less parameters
Heart disease
25
Intro to Machine Learning
Inference
bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called
bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)
26
Intro to Machine Learning
Inference Algorithms bull Graphs without loops Tree-structured Graphs
ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases
bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)
bull Graphs with loops ndash Junction tree algorithms
bull Convert into a graph without loops bull May lead to exponentially large graph
ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice
ndash Approximate inference
27
Intro to Machine Learning
Approximate Inference bull Variational Inference
ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions
bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation
bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)
bull Powerful family of methods ndash Gibbs sampling
bull Useful special case of MCMC methods
28
The Inference Problem
Intro to Machine Learning 29
Bayes Nets to Factor Graphs
Intro to Machine Learning 30
Factor Graphs Product of Local Functions
Intro to Machine Learning 31
Marginalize Product of Functions (MPF)
bull Marginalize (or maximize) product of functions
bull For g1(x1) we have bull For g3(x3) we have
Intro to Machine Learning
Distributive Law ab + ac = a(b+c)
32
Example Computing g1(x1)
Intro to Machine Learning 33
Compute product of descendants with f Then do not-sum over parent
Message Passing
Intro to Machine Learning
Compute product of descendants
34
The Sum-Product Algorithm
bull To compute gi(xi) form a tree rooted at xi
bull Starting from the leaves apply the following two rules ndash Product Rule
At a variable node take the product of descendants ndash Sum-product Rule
At a factor node take the product of f with descendants then perform not-sum over the parent node
bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc
Intro to Machine Learning 35
Example Computing g3(x3)
Intro to Machine Learning 36
Hidden Markov Models (HMMs)
Intro to Machine Learning
Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT
Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T
p(z1T|x1T)
Similar problems for chain-structured Conditional Random Fields (CRFs)
37
Distributive Law on Semi-Rings
bull Idea can be applied to any commutative semi-ring bull Semi-ring 101
ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)
Intro to Machine Learning
bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip
38
Message Passing in General Graphs
• Tree structured graphs
  – Message passing is guaranteed to give correct solutions
  – Examples: HMMs, Kalman Filters
• General Graphs
  – Active research topic
    • Progress has been made over the past 10 years
  – Loopy message passing
    • May not converge
    • May converge to a ‘local minimum’ of the ‘Bethe variational free energy’
  – New approaches to convergent and correct message passing
• Applications
  – TrueSkill Ranking System for Xbox Live
  – Turbo Codes: 3G, 4G phones, satellite comm, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  – State Space, State Transitions, Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models: AR, MA, ARMA models; ACF and Partial ACF; ARIMA and Seasonal ARIMA; Estimation, Forecasting
  – Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: Prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
[Figure: Gaussian process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Drought Detection
• Significant droughts
  – Persistent over space and time
  – Catastrophic consequences
• Examples
  – Late 1960s Sahel drought
  – 1930s North American Dust Bowl
• Dataset: Climate Research Unit
  – http://data.giss.nasa.gov/precip_cru
  – Monthly precipitation over land
  – Duration: 1901 to 2006
  – Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  – Replicate grid over time
  – Each node can be 0 (normal) or 1 (dry)
• Toy example
  – m = 3, n = 4, N = 12, T = 5
  – Total States: 2^60 = 1.1529 × 10^18
• CRU data
  – m = 720, n = 360, N = 67420, T = 106
  – Total States: 2^7146520 > 10^2151316
• MAP Inference
  – Find the most likely “state” of the system
  – Integer programming with LARGE number of states
  – Postprocess MAP state to identify significant droughts
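The state counts and the idea of MAP inference on a space-time grid can be checked with a short sketch. Everything about the scoring is an assumption for illustration: the slides do not give the potentials, so the code uses a hypothetical per-node "dry evidence" term plus a reward of `coupling` for every pair of neighbouring nodes that agree, and it assumes each cell is also linked to itself at the next time step. Brute force is only feasible for tiny grids, which is the point of the slide.

```python
import math
from itertools import product


def log10_num_states(num_locations, num_steps):
    """log10 of 2^(N*T), the number of binary labelings of the grid."""
    return num_locations * num_steps * math.log10(2)


def grid_edges(m, n, T):
    """4-nearest-neighbour edges inside each time slice, plus an edge from
    each cell to the same cell at the next step (an assumption)."""
    edges = []
    for t, i, j in product(range(T), range(m), range(n)):
        if i + 1 < m:
            edges.append(((t, i, j), (t, i + 1, j)))
        if j + 1 < n:
            edges.append(((t, i, j), (t, i, j + 1)))
        if t + 1 < T:
            edges.append(((t, i, j), (t + 1, i, j)))
    return edges


def map_state(evidence, m, n, T, coupling):
    """Brute-force MAP: try every 0/1 labeling and keep the best score.

    evidence[node] > 0 favours label 1 (dry); `coupling` rewards each
    neighbouring pair with equal labels. Hypothetical scoring, not the
    model used in the tutorial.
    """
    nodes = list(product(range(T), range(m), range(n)))
    edges = grid_edges(m, n, T)
    best_score, best = -math.inf, None
    for labels in product((0, 1), repeat=len(nodes)):
        state = dict(zip(nodes, labels))
        score = sum(evidence[v] * state[v] for v in nodes)
        score += coupling * sum(state[u] == state[v] for u, v in edges)
        if score > best_score:
            best_score, best = score, state
    return best


# Toy example from the slide: N = 12 locations, T = 5 steps.
toy_states = 2 ** (12 * 5)
# CRU-sized problem: N = 67420 locations, T = 106 steps.
cru_exponent = log10_num_states(67420, 106)   # about 2.15 million
```

On a 2 × 2 grid over 2 time steps there are only 256 labelings, so `map_state` finishes instantly; with a positive `coupling`, a single node whose evidence weakly disagrees with its neighbours gets smoothed to match them.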
Drought Regions from MRFs
Ordinary Least Squares
• Naive method: Use ordinary least squares (OLS)
• For n < mL, OLS estimate non-unique
• In general, y depends on all mL covariates: complex model
[Figure: coefficient vector θ with blocks for Location 1, …, Location L; Response Variable]
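The non-uniqueness claim is easy to see on a tiny system. The numbers below are hypothetical: two samples and three covariates, so there are fewer equations than unknowns and more than one coefficient vector fits the data exactly.

```python
def residual_sum_of_squares(X, y, theta):
    """Sum of squared residuals of the linear model y ~ X @ theta."""
    return sum((yi - sum(a * b for a, b in zip(row, theta))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data: n = 2 samples, 3 covariates (n < number of covariates).
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
y = [3.0, 2.0]

theta_a = [3.0, 0.0, 2.0]            # fits both samples exactly
null_direction = [2.0, -1.0, 1.0]    # X maps this direction to zero
theta_b = [a + 0.5 * d for a, d in zip(theta_a, null_direction)]

rss_a = residual_sum_of_squares(X, y, theta_a)
rss_b = residual_sum_of_squares(X, y, theta_b)
```

Both `rss_a` and `rss_b` are zero, and so is the residual for `theta_a` plus any multiple of `null_direction`; least squares alone cannot choose among them, which motivates adding a regularizer.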
Sparse Group Lasso (SGL)
• High-dimensional Regression with “Sparse” and “Group Sparsity”
[Figure: coefficient blocks for Location 1, …, Location L; Response Variable]
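The slide does not spell out the penalty, so the sketch below assumes the commonly used unweighted form, lam_l1 × ‖θ‖₁ + lam_group × Σ_g ‖θ_g‖₂, where each group g collects the covariates of one location. Under that assumption the proximal step used inside proximal-gradient solvers has a closed form: soft-threshold each coordinate, then shrink each group as a whole. Function names and numbers are hypothetical.

```python
import math


def sgl_penalty(theta, groups, lam_l1, lam_group):
    """Assumed penalty: lam_l1 * L1 norm + lam_group * sum of group L2 norms."""
    l1 = sum(abs(t) for t in theta)
    grouped = sum(math.sqrt(sum(theta[i] ** 2 for i in g)) for g in groups)
    return lam_l1 * l1 + lam_group * grouped


def sgl_prox(u, groups, lam_l1, lam_group):
    """Proximal operator of the assumed penalty (non-overlapping groups)."""
    out = [0.0] * len(u)
    for g in groups:
        # Step 1: coordinate-wise soft-thresholding gives within-group sparsity.
        soft = [math.copysign(max(abs(u[i]) - lam_l1, 0.0), u[i]) for i in g]
        # Step 2: shrink the whole group; weak groups are zeroed out entirely.
        norm = math.sqrt(sum(s * s for s in soft))
        scale = max(1.0 - lam_group / norm, 0.0) if norm > 0 else 0.0
        for i, s in zip(g, soft):
            out[i] = scale * s
    return out


# Hypothetical coefficients: two locations with 3 and 2 covariates.
u = [3.0, -1.0, 0.5, 0.5, -0.5]
groups = [[0, 1, 2], [3, 4]]
shrunk = sgl_prox(u, groups, lam_l1=1.0, lam_group=1.0)
```

In this example the second location is removed as a group, while the first location keeps a single covariate, which is the "sparse" plus "group sparse" behaviour the slide names. Implementations often weight each group norm by the square root of the group size; that weighting is omitted here.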
Example: Land Variable Regression
Relevant Variables for Temperature Prediction in Brazil
Sparse Group Lasso (SGL)
  – Regression with Feature Selection
  – Consistency Guarantees
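Error (RMSE): SGL, NC, OLS
Variable         Region    Sparse Group Lasso  Network Clusters  Ordinary Least Squares
Air Temperature  Brazil    0.198               0.534             0.348
Air Temperature  Peru      0.247               0.468             0.387
Air Temperature  West USA  0.270               0.767             0.402
Air Temperature  East USA  0.304               0.815             0.348
Air Temperature  W Europe  0.379               0.936             0.493
Air Temperature  Sahel     0.320               0.685             0.413
Air Temperature  S Africa  0.136               0.726             0.267
Air Temperature  India     0.205               0.649             0.300
Air Temperature  SE Asia   0.298               0.541             0.383
Precipitation    Brazil    0.261               0.509             0.413
Precipitation    Peru      0.312               0.864             0.523
Precipitation    West USA  0.451               0.605             0.549
Precipitation    East USA  0.365               0.686             0.413
Precipitation    W Europe  0.358               0.45              0.551
Precipitation    Sahel     0.427               0.533             0.523
Precipitation    S Africa  0.235               0.697             0.378
Precipitation    India     0.146               0.672             0.264
Precipitation    SE Asia   0.159               0.665             0.312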
Overview
• Predictive Models
• Graphical Models
  – Basics
  – Inference
  – Example: Matrix Completion
  – Example: Drought Detection
  – Example: Text Analysis
• Online Learning
• Exploratory Analysis
Graphical Models: What and Why
• Graphical models
  – Dependencies between (random) variables, avoid IID assumptions
  – Closer to reality; learning/inference is much more difficult
• Basic nomenclature
  – Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  – A directed graph between random variables
  – Example: Bayesian networks, Hidden Markov Models
  – Joint distribution is a product of P(child | parents)
• Undirected Graphs
  – An undirected graph between random variables
  – Example: Markov/Conditional random fields
  – Joint distribution in terms of potential functions
[Figure: graph over nodes X1, X2, X3, X4, X5]
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X | Parents(X))
[Figure: directed graph over nodes X1, X2, X3, X4, X5]
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  – Semantically more accurate, fewer parameters
[Figure: heart disease network]
Inference
• Some variables in the Bayes net are observed
  – the evidence/data, e.g., John has not called, Mary has called
• Inference
  – How to compute value/probability of other variables
  – Example: What is the probability of Burglary, i.e., P(b | ¬j, m)
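The query P(b | ¬j, m) can be answered by enumeration: multiply the local conditionals to get the joint, sum out the unobserved variables, and normalize. The network figure is not preserved in this transcript, so the sketch assumes the standard textbook burglary network (Burglary and Earthquake as parents of Alarm, Alarm as parent of JohnCalls and MaryCalls) with the conditional probabilities from Russell and Norvig; these numbers are an assumption, not taken from the slides.

```python
from itertools import product

# Assumed conditional probability tables (Russell and Norvig textbook values).
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given truth value."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint distribution as the product of P(X | Parents(X))."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """P(Burglary = true | JohnCalls = j, MaryCalls = m) by enumeration."""
    weights = {}
    for b in (True, False):
        # Sum out the unobserved Earthquake and Alarm variables.
        weights[b] = sum(joint(b, e, a, j, m)
                         for e, a in product((True, False), repeat=2))
    return weights[True] / (weights[True] + weights[False])


p_query = posterior_burglary(j=False, m=True)   # P(b | not j, m)
```

Enumeration touches every joint assignment of the hidden variables, so its cost grows exponentially with their number; message passing exploits the graph structure to avoid that.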
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm, and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  – Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic, correct convergence ‘not guaranteed’
    • Works well in practice
  – Approximate inference
Approximate Inference
• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods
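Gibbs sampling can be illustrated on a distribution small enough to check by hand. The joint table below is hypothetical: two binary variables a and b, where the exact marginal P(a = 1) is 0.4. The sampler only ever uses the conditionals P(a | b) and P(b | a), which is what makes the method practical in large models.

```python
import random

# Hypothetical joint distribution over two binary variables (a, b).
JOINT = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}


def prob_a_given_b(b):
    """P(a = 1 | b), read off the joint table."""
    return JOINT[(1, b)] / (JOINT[(0, b)] + JOINT[(1, b)])


def prob_b_given_a(a):
    """P(b = 1 | a), read off the joint table."""
    return JOINT[(a, 1)] / (JOINT[(a, 0)] + JOINT[(a, 1)])


def gibbs_estimate(num_samples, burn_in=500, seed=0):
    """Estimate P(a = 1) by alternately resampling a | b and b | a."""
    rng = random.Random(seed)
    a, b = 0, 0
    count_a = 0
    for step in range(burn_in + num_samples):
        a = 1 if rng.random() < prob_a_given_b(b) else 0
        b = 1 if rng.random() < prob_b_given_a(a) else 0
        if step >= burn_in:            # discard early, unconverged samples
            count_a += a
    return count_a / num_samples


exact = JOINT[(1, 0)] + JOINT[(1, 1)]      # exact P(a = 1)
estimate = gibbs_estimate(20000)
```

The estimate is random but should land close to the exact value; successive samples are correlated, so the effective sample size is smaller than the number of sweeps.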
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g_1(x_1) we have
• For g_3(x_3) we have
Distributive Law: ab + ac = a(b+c)
Example: Computing g_1(x_1)
Compute product of descendants with f Then do not-sum over parent
Message Passing
Intro to Machine Learning
Compute product of descendants
34
The Sum-Product Algorithm
bull To compute gi(xi) form a tree rooted at xi
bull Starting from the leaves apply the following two rules ndash Product Rule
At a variable node take the product of descendants ndash Sum-product Rule
At a factor node take the product of f with descendants then perform not-sum over the parent node
bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc
Intro to Machine Learning 35
Example Computing g3(x3)
Intro to Machine Learning 36
Hidden Markov Models (HMMs)
Intro to Machine Learning
Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT
Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T
p(z1T|x1T)
Similar problems for chain-structured Conditional Random Fields (CRFs)
37
Distributive Law on Semi-Rings
bull Idea can be applied to any commutative semi-ring bull Semi-ring 101
ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)
Intro to Machine Learning
bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip
38
Message Passing in General Graphs
bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters
bull General Graphs ndash Active research topic
bull Progress has been made over the past 10 years ndash Loopy Message passing
bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo
ndash New approaches to convergent and correct message passing
bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter
Intro to Machine Learning 39
Temporal Models
• State Space Models
  – State Space, State Transitions, Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models
    • AR, MA, ARMA models; ACF and Partial ACF; ARIMA and Seasonal ARIMA; Estimation; Forecasting
  – Spectral Analysis
    • Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
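As one concrete instance of a state space model, the following is a minimal sketch of a scalar Kalman filter. The model form, the parameter names (`a`, `q`, `r`, `mean0`, `var0`), and the default values are assumptions made for this illustration.

```python
def kalman_filter(observations, a=1.0, q=0.01, r=0.25, mean0=0.0, var0=1.0):
    """Filter a scalar linear dynamical system (assumed form):
         z_t = a * z_{t-1} + noise with variance q   (state transition)
         x_t = z_t + noise with variance r           (observation)
    Returns a list of (posterior mean, posterior variance) for each z_t."""
    mean, var = mean0, var0
    estimates = []
    for x in observations:
        # Predict: push the previous posterior through the transition model.
        mean_pred = a * mean
        var_pred = a * a * var + q
        # Update: correct the prediction using the new observation.
        gain = var_pred / (var_pred + r)
        mean = mean_pred + gain * (x - mean_pred)
        var = (1.0 - gain) * var_pred
        estimates.append((mean, var))
    return estimates


# Usage: a constant signal observed 50 times; the estimate moves toward 1.0.
estimates = kalman_filter([1.0] * 50)
```

The predict and update steps are the continuous-state counterpart of the forward recursion in an HMM, which is why both appear under message passing on chains.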
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: Prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
Figure: Gaussian Process (from H. Wallach, Introduction to Gaussian process regression, 2005)
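A zero-mean Gaussian process prior GP(0, K) can be turned into predictions with a few lines of linear algebra. The sketch below is a minimal illustration under stated assumptions: a squared-exponential kernel, a small noise term for numerical stability, and a hand-written linear solver so that the block has no dependencies. The kernel choice, function names, and training data are all hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(train_x, train_y, test_x, noise=1e-6):
    """Posterior mean and variance at test_x under the prior GP(0, K)."""
    gram = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    k_star = [rbf_kernel(a, test_x) for a in train_x]
    alpha = solve(gram, train_y)      # K^{-1} y
    weights = solve(gram, k_star)     # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(test_x, test_x) - sum(k * w for k, w in zip(k_star, weights))
    return mean, var


# Hypothetical training data.
TRAIN_X = [-1.0, 0.0, 1.5]
TRAIN_Y = [0.5, 1.0, -0.3]
```

Near a training input the posterior mean follows the observed value and the variance shrinks; far from the data the prediction falls back to the prior (mean 0, variance 1).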
Drought Detection
• Significant droughts
  – Persistent over space and time
  – Catastrophic consequences
• Examples
  – Late 1960s Sahel drought
  – 1930s North American Dust Bowl
• Dataset: Climatic Research Unit (CRU)
  – http://data.giss.nasa.gov/precip_cru
  – Monthly precipitation over land
  – Duration: 1901 to 2006
  – Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest neighbor grid
  – Replicate grid over time
  – Each node can be 0 (normal) or 1 (dry)
• Toy example
  – m = 3, n = 4, N = 12, T = 5
  – Total states: 2^(N×T) = 2^60 = 1.1529 × 10^18
• CRU data
  – m = 720, n = 360, N = 67420, T = 106
  – Total states: 2^(N×T) = 2^7146520 > 10^2151316
• MAP Inference
  – Find the most likely “state” of the system
  – Integer programming with LARGE number of states
  – Postprocess MAP state to identify significant droughts
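The state counts quoted above follow from one binary variable per grid node per time step, i.e. 2^(N×T) joint states. A minimal sketch of the arithmetic (the function names are illustrative):

```python
import math


def num_states(n_nodes, n_steps):
    """Exact number of joint states with one binary variable per node per step."""
    return 2 ** (n_nodes * n_steps)


def log10_states(n_nodes, n_steps):
    """Base-10 logarithm of the state count, for sizes too large to print."""
    return n_nodes * n_steps * math.log10(2)


toy_states = num_states(12, 5)               # toy example: N = 12, T = 5
cru_log10 = log10_states(67420, 106)         # CRU data:    N = 67420, T = 106
```

Exhaustive search is already impractical for the toy grid, which is why MAP inference is posed as an optimization problem rather than an enumeration.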
Drought Regions from MRFs
Overview
• Predictive Models
• Graphical Models
  – Basics
  – Inference
  – Example: Matrix Completion
  – Example: Drought Detection
  – Example: Text Analysis
• Online Learning
• Exploratory Analysis
Graphical Models: What and Why
• Graphical models
  – Dependencies between (random) variables, avoid IID assumptions
  – Closer to reality; learning/inference is much more difficult
• Basic nomenclature
  – Node = Random Variable, Edge = Statistical Dependency
• Directed Graphs
  – A directed graph between random variables
  – Examples: Bayesian networks, Hidden Markov Models
  – Joint distribution is a product of P(child | parents)
• Undirected Graphs
  – An undirected graph between random variables
  – Examples: Markov/Conditional random fields
  – Joint distribution in terms of potential functions
Figure: example graph over variables X1, X2, X3, X4, X5
Example: Burglary Network
Bayesian Networks
• Joint distribution in terms of P(X | Parents(X))
Figure: example Bayesian network over variables X1, X2, X3, X4, X5
Example II: Rain Network
Latent Variable Models
• Bayesian network with hidden variables
  – Semantically more accurate, fewer parameters
Example: Heart disease
Inference
• Some variables in the Bayes net are observed
  – The evidence/data, e.g., John has not called, Mary has called
• Inference
  – How to compute value/probability of other variables
  – Example: What is the probability of Burglary, i.e., P(b | ¬j, m)
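The query above can be answered by enumeration: write the joint as a product of P(X | Parents(X)), sum out the unobserved variables, and normalize. The sketch below assumes the usual structure of the burglary network (Burglary and Earthquake are parents of Alarm, which is the parent of JohnCalls and MaryCalls). The probability tables are the textbook values commonly used with this example; they are not given in the transcript and should be read as assumptions.

```python
from itertools import product

# Assumed conditional probability tables (textbook values, not from the slides).
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls | Alarm)


def bern(p, value):
    """Probability that a binary variable with P(true) = p takes `value`."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint distribution as a product of P(X | Parents(X))."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """P(Burglary = true | JohnCalls = j, MaryCalls = m) by enumeration."""
    weight = {True: 0.0, False: 0.0}
    for b, e, a in product((True, False), repeat=3):
        weight[b] += joint(b, e, a, j, m)
    return weight[True] / (weight[True] + weight[False])


# The query from the slide: John has not called, Mary has called.
p_b_given_notj_m = posterior_burglary(j=False, m=True)
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with the number of unobserved nodes; the algorithms on the following slides avoid this on tree-structured graphs.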
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  – Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic, correct convergence ‘not guaranteed’
    • Works well in practice
  – Approximate inference
Approximate Inference
• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods
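Gibbs sampling can be sketched on a very small model: two binary variables with an unnormalized joint potential, where each variable is resampled in turn from its conditional given the other. The potential table, sample counts, and function names below are hypothetical, and the model is small enough that the exact answer is available for comparison.

```python
import random

# Hypothetical unnormalized potential phi(a, b) over two binary variables.
PHI = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}


def exact_marginal_a1():
    """Exact P(a = 1), obtained by normalizing the potential table."""
    total = sum(PHI.values())
    return (PHI[(1, 0)] + PHI[(1, 1)]) / total


def gibbs_marginal_a1(n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(a = 1) by alternately sampling a | b and b | a."""
    rng = random.Random(seed)
    a, b = 0, 0
    hits = 0
    for step in range(burn_in + n_samples):
        # Conditional of a given the current b.
        p_a1 = PHI[(1, b)] / (PHI[(0, b)] + PHI[(1, b)])
        a = 1 if rng.random() < p_a1 else 0
        # Conditional of b given the freshly sampled a.
        p_b1 = PHI[(a, 1)] / (PHI[(a, 0)] + PHI[(a, 1)])
        b = 1 if rng.random() < p_b1 else 0
        if step >= burn_in:
            hits += a
    return hits / n_samples
```

Each update needs only a local conditional, never the normalizing constant, which is what makes the method usable when the joint state space is too large to enumerate.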
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For g_1(x_1) we have
• For g_3(x_3) we have
Distributive Law: ab + ac = a(b+c)
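As a worked illustration of how the distributive law helps in the MPF setting, consider a hypothetical global function g(x_1, x_2, x_3) = f_A(x_1, x_2) f_B(x_1, x_3). This factorization is an assumption chosen for brevity and is not the example shown on the slides.

```latex
\begin{align*}
g_1(x_1) &= \sum_{x_2} \sum_{x_3} f_A(x_1, x_2)\, f_B(x_1, x_3) \\
         &= \Big( \sum_{x_2} f_A(x_1, x_2) \Big) \Big( \sum_{x_3} f_B(x_1, x_3) \Big)
\end{align*}
```

If each variable takes k values, the first line needs k^2 multiplications for every value of x_1, while the second needs 2(k-1) additions and a single multiplication. Pushing sums inside products in this way is the step that message passing repeats at every node.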
Example: Computing g_1(x_1)
Message Passing
– At a factor node: compute product of descendants with f, then do not-sum over parent
– At a variable node: compute product of descendants
The Sum-Product Algorithm
• To compute g_i(x_i), form a tree rooted at x_i
• Starting from the leaves, apply the following two rules
  – Product Rule
    At a variable node, take the product of descendants
  – Sum-product Rule
    At a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  – Can be done one at a time, but the repeated computations are not efficient
  – Simultaneous message passing following the sum-product algorithm
  – Examples: Belief Propagation, Forward-Backward algorithm, etc.
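The two rules can be sketched as a pair of mutually recursive functions on a small tree-structured factor graph. The graph, the factor functions, and all names below are hypothetical, and this version recomputes messages for each marginal (the "one at a time" approach); a brute-force sum is included only as a reference.

```python
from itertools import product

# Hypothetical tree-structured factor graph over binary variables:
#   fA(x1, x2), fB(x2, x3), fC(x2, x4)
DOMAIN = (0, 1)
VARIABLES = ("x1", "x2", "x3", "x4")
FACTORS = {
    "fA": (("x1", "x2"), lambda a, b: 1.0 + a + 2 * b),
    "fB": (("x2", "x3"), lambda b, c: 2.0 if b == c else 0.5),
    "fC": (("x2", "x4"), lambda b, d: 1.0 + 3 * b * d),
}


def msg_var_to_factor(var, factor):
    """Product rule: multiply the messages from all other neighbouring factors."""
    out = {v: 1.0 for v in DOMAIN}
    for name, (scope, _) in FACTORS.items():
        if name != factor and var in scope:
            incoming = msg_factor_to_var(name, var)
            for v in DOMAIN:
                out[v] *= incoming[v]
    return out


def msg_factor_to_var(factor, var):
    """Sum-product rule: multiply f by the incoming messages, then sum over
    every variable in the factor except `var` (the "not-sum")."""
    scope, f = FACTORS[factor]
    others = [u for u in scope if u != var]
    incoming = {u: msg_var_to_factor(u, factor) for u in others}
    out = {v: 0.0 for v in DOMAIN}
    for assignment in product(DOMAIN, repeat=len(scope)):
        values = dict(zip(scope, assignment))
        term = f(*assignment)
        for u in others:
            term *= incoming[u][values[u]]
        out[values[var]] += term
    return out


def marginal(var):
    """Unnormalized marginal g_i(x_i): product of all messages into the root."""
    g = {v: 1.0 for v in DOMAIN}
    for name, (scope, _) in FACTORS.items():
        if var in scope:
            incoming = msg_factor_to_var(name, var)
            for v in DOMAIN:
                g[v] *= incoming[v]
    return g


def brute_force_marginal(var):
    """Reference answer: sum the full product over all joint assignments."""
    g = {v: 0.0 for v in DOMAIN}
    for assignment in product(DOMAIN, repeat=len(VARIABLES)):
        values = dict(zip(VARIABLES, assignment))
        term = 1.0
        for scope, f in FACTORS.values():
            term *= f(*(values[u] for u in scope))
        g[values[var]] += term
    return g
```

The recursion terminates because the graph is a tree; on a graph with loops the same calls would never return, which is the practical reason loopy message passing is run as an iterative schedule instead.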
Intro to Machine Learning
Example II Rain Network
24
Intro to Machine Learning
Latent Variable Models
bull Bayesian network with hidden variables ndash Semantically more accurate less parameters
Heart disease
25
Intro to Machine Learning
Inference
bull Some variables in the Bayes net are observed ndash the evidencedata eg John has not called Mary has called
bull Inference ndash How to compute valueprobability of other variables ndash Example What is the probability of Burglary ie P(b|notjm)
26
Intro to Machine Learning
Inference Algorithms bull Graphs without loops Tree-structured Graphs
ndash Efficient exact inference algorithms are possible ndash Sum-product algorithm and its special cases
bull Belief propagation in Bayes nets bull Forward-Backward algorithm in Hidden Markov Models (HMMs)
bull Graphs with loops ndash Junction tree algorithms
bull Convert into a graph without loops bull May lead to exponentially large graph
ndash Sum-productmessage passing algorithm lsquodisregarding loopsrsquo bull Active research topic correct convergence lsquonot guaranteedrsquo bull Works well in practice
ndash Approximate inference
27
Intro to Machine Learning
Approximate Inference bull Variational Inference
ndash Deterministic approximation ndash Approximate complex true distribution over latent variables ndash Replace with family of simpletractable distributions
bull Use the best approximation in the family ndash Examples Mean-field Bethe Kikuchi Expectation Propagation
bull Stochastic Inference ndash Simple sampling approaches ndash Markov Chain Monte Carlo methods (MCMC)
bull Powerful family of methods ndash Gibbs sampling
bull Useful special case of MCMC methods
28
The Inference Problem
Intro to Machine Learning 29
Bayes Nets to Factor Graphs
Intro to Machine Learning 30
Factor Graphs Product of Local Functions
Intro to Machine Learning 31
Marginalize Product of Functions (MPF)
bull Marginalize (or maximize) product of functions
bull For g1(x1) we have bull For g3(x3) we have
Intro to Machine Learning
Distributive Law ab + ac = a(b+c)
32
Example Computing g1(x1)
Intro to Machine Learning 33
Compute product of descendants with f Then do not-sum over parent
Message Passing
Intro to Machine Learning
Compute product of descendants
34
The Sum-Product Algorithm
bull To compute gi(xi) form a tree rooted at xi
bull Starting from the leaves apply the following two rules ndash Product Rule
At a variable node take the product of descendants ndash Sum-product Rule
At a factor node take the product of f with descendants then perform not-sum over the parent node
bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc
Intro to Machine Learning 35
Example Computing g3(x3)
Intro to Machine Learning 36
Hidden Markov Models (HMMs)
Intro to Machine Learning
Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT
Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T
p(z1T|x1T)
Similar problems for chain-structured Conditional Random Fields (CRFs)
37
Distributive Law on Semi-Rings
bull Idea can be applied to any commutative semi-ring bull Semi-ring 101
ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)
Intro to Machine Learning
bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip
38
Message Passing in General Graphs
• Tree-structured graphs
  – Message passing is guaranteed to give correct solutions
  – Examples: HMMs, Kalman Filters
• General graphs
  – Active research topic
    • Progress has been made over the past 10 years
  – Loopy message passing
    • May not converge
    • May converge to a 'local minimum' of the 'Bethe variational free energy'
  – New approaches to convergent and correct message passing
• Applications
  – TrueSkill Ranking System for Xbox Live
  – Turbo Codes: 3G and 4G phones, satellite communication, WiMAX, Mars orbiter
Temporal Models
• State Space Models
  – State Space, State Transitions, Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models: AR/MA/ARMA models, ACF and Partial ACF, ARIMA and Seasonal ARIMA, Estimation, Forecasting
  – Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
[Figure: Gaussian Process. From H. Wallach, Introduction to Gaussian process regression, 2005]
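A zero-mean GP prior GP(0, K) turns into a regression method once it is conditioned on observations. The sketch below is one standard way to do that, offered as an illustration: the squared-exponential kernel, its length scale, the noise level, and the toy sine data are all assumptions rather than choices made in the tutorial.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and covariance of a GP(0, K) prior given noisy observations."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov


# Hypothetical toy data: a few samples of a sine curve.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 25)

mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))      # pointwise predictive uncertainty
print(mean[:3], std[:3])
```

The predictive standard deviation shrinks near the training inputs and grows away from them. Kriging is, in essence, the same predictor applied to spatial coordinates.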
Drought Detection
• Significant droughts
  – Persistent over space and time
  – Catastrophic consequences
• Examples
  – Late 1960s Sahel drought
  – 1930s North American Dust Bowl
• Dataset: Climate Research Unit (CRU)
  – http://data.giss.nasa.gov/precip_cru
  – Monthly precipitation over land
  – Duration: 1901 to 2006
  – Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
bull Model dependencies using a 4-nearest neighbor grid ndash Replicate grid over time ndash Each node can be 0 (normal) or 1 (dry)
bull Toy example ndash m = 3 n = 4 N = 12 T = 5 ndash Total States 260 = 11529 x 1018
bull CRU data ndash m = 720 n = 360 N = 67420 T = 106 ndash Total States 27146520 gt 102382200
bull MAP Inference ndash Find the most likely ldquostaterdquo of the system ndash Integer programming with LARGE number of states ndash Postprocess MAP state to identify significant droughts Intro to Machine Learning 43
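With one binary variable per grid cell per time step, the number of joint states is $2^{N \cdot T}$. A short sketch that recomputes the orders of magnitude for the two settings on this slide (the helper name is hypothetical):

```python
import math


def log10_num_states(num_cells, num_steps):
    """log10 of 2**(num_cells * num_steps): one binary node per cell per time step."""
    return num_cells * num_steps * math.log10(2)


toy = log10_num_states(12, 5)          # toy grid: N = 12, T = 5
cru = log10_num_states(67420, 106)     # CRU data: N = 67420, T = 106

print(f"toy: 2^{12 * 5} = {2 ** (12 * 5):.4e}")
print(f"CRU: 2^{67420 * 106} is about 10^{cru:.0f}")
```

Working in log space avoids materializing a number with millions of digits, and it shows why exhaustive search over states is out of the question for the CRU grid.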
Drought Regions from MRFs
Latent Variable Models
• Bayesian network with hidden variables
  – Semantically more accurate, fewer parameters
[Figure: Heart disease]
Inference
• Some variables in the Bayes net are observed
  – The evidence/data, e.g., John has not called, Mary has called
• Inference
  – How to compute the value/probability of other variables
  – Example: What is the probability of Burglary, i.e., $P(b \mid \neg j, m)$?
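The query $P(b \mid \neg j, m)$ can be answered by summing the joint distribution over the unobserved variables. The sketch below does this by brute-force enumeration. The network structure (Burglary and Earthquake cause Alarm, which causes the John and Mary calls) and every probability in the tables are assumptions taken from the common textbook version of this example, not from this transcript.

```python
import itertools

# Assumed conditional probability tables (textbook-style values, not from the slides).
P_B = 0.001                                   # P(burglary)
P_E = 0.002                                   # P(earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(alarm | burglary, earthquake)
P_J = {True: 0.90, False: 0.05}               # P(John calls | alarm)
P_M = {True: 0.70, False: 0.01}               # P(Mary calls | alarm)


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value`."""
    return p if value else 1.0 - p


def joint(b, e, a, j, m):
    """Joint probability as the product of the local conditional tables."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """P(burglary | j, m): sum out earthquake and alarm, then normalize."""
    weights = {True: 0.0, False: 0.0}
    for b, e, a in itertools.product([True, False], repeat=3):
        weights[b] += joint(b, e, a, j, m)
    return weights[True] / (weights[True] + weights[False])


p = posterior_burglary(j=False, m=True)
print(f"P(b | not j, m) = {p:.4f}")
```

Enumeration is exponential in the number of hidden variables, which is why larger networks call for algorithms that exploit graph structure.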
Inference Algorithms
• Graphs without loops: Tree-structured Graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to exponentially large graph
  – Sum-product/message passing algorithm, 'disregarding loops'
    • Active research topic: correct convergence 'not guaranteed'
    • Works well in practice
  – Approximate inference
Approximate Inference
• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation
• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods
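As a small illustration of stochastic inference, the sketch below runs Gibbs sampling on a three-node binary model whose graph contains a loop, then compares the sampled marginals with exact values obtained by enumeration. The model (its edges, `coupling`, and `bias` values) is hypothetical and chosen only so that the exact answer is cheap to compute.

```python
import itertools
import numpy as np

# Hypothetical model: 3 binary nodes on a cycle (a graph with a loop).
edges = [(0, 1), (1, 2), (0, 2)]
coupling = 0.8                          # rewards neighbouring nodes that agree
bias = np.array([0.5, -0.2, 0.1])       # per-node preference for state 1


def log_score(x):
    """Unnormalized log-probability of a joint state x."""
    s = float(bias @ x)
    for i, j in edges:
        s += coupling * (1.0 if x[i] == x[j] else 0.0)
    return s


def exact_marginals():
    """P(x_i = 1) by enumerating all 2^3 states."""
    states = [np.array(s) for s in itertools.product([0, 1], repeat=3)]
    w = np.array([np.exp(log_score(s)) for s in states])
    w /= w.sum()
    return sum(wi * s for wi, s in zip(w, states))


def gibbs_marginals(n_sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(x_i = 1) by resampling each node given the others."""
    rng = np.random.default_rng(seed)
    x = np.zeros(3, dtype=int)
    total = np.zeros(3)
    for sweep in range(n_sweeps + burn_in):
        for i in range(3):
            x[i] = 1
            l1 = log_score(x)
            x[i] = 0
            l0 = log_score(x)
            p1 = 1.0 / (1.0 + np.exp(l0 - l1))   # P(x_i = 1 | all other nodes)
            x[i] = int(rng.random() < p1)
        if sweep >= burn_in:
            total += x
    return total / n_sweeps


print(exact_marginals(), gibbs_marginals())
```

Each update needs only the conditional of one node given the rest, so the sampler never has to normalize the full joint distribution.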
The Inference Problem
Bayes Nets to Factor Graphs
Factor Graphs: Product of Local Functions
Bayes Nets to Factor Graphs
Intro to Machine Learning 30
Factor Graphs Product of Local Functions
Intro to Machine Learning 31
Marginalize Product of Functions (MPF)
bull Marginalize (or maximize) product of functions
bull For g1(x1) we have bull For g3(x3) we have
Intro to Machine Learning
Distributive Law ab + ac = a(b+c)
32
Example Computing g1(x1)
Intro to Machine Learning 33
Compute product of descendants with f Then do not-sum over parent
Message Passing
Intro to Machine Learning
Compute product of descendants
34
The Sum-Product Algorithm
bull To compute gi(xi) form a tree rooted at xi
bull Starting from the leaves apply the following two rules ndash Product Rule
At a variable node take the product of descendants ndash Sum-product Rule
At a factor node take the product of f with descendants then perform not-sum over the parent node
bull To compute all marginals ndash Can be done one at a time repeated computations not efficient ndash Simultaneous message passing following the sum-product algorithm ndash Examples Belief Propagation Forward-Backward algorithm etc
Intro to Machine Learning 35
Example Computing g3(x3)
Intro to Machine Learning 36
Hidden Markov Models (HMMs)
Intro to Machine Learning
Latent variables z0z1hellipzt-1ztzt+1hellipzT Observed variables x1hellipxt-1xtxt+1hellipxT
Inference Problems 1 Compute p(x1T) 2 Compute p(zt|x1T) 3 Find maxz1T
p(z1T|x1T)
Similar problems for chain-structured Conditional Random Fields (CRFs)
37
Distributive Law on Semi-Rings
bull Idea can be applied to any commutative semi-ring bull Semi-ring 101
ndash Two operations (+times) Associative Commutative Identity ndash Distributive law atimesb + atimesc = atimes(b+c)
Intro to Machine Learning
bullBelief Propagation in Bayes nets bullMAP inference in HMMs bullMax-product algorithm bullAlternative to Viterbi Decoding bullKalman Filtering bullError Correcting Codes bullTurbo Codes bullhellip
38
Message Passing in General Graphs
bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters
bull General Graphs ndash Active research topic
bull Progress has been made over the past 10 years ndash Loopy Message passing
bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo
ndash New approaches to convergent and correct message passing
bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter
Intro to Machine Learning 39
Temporal Models
bull State Space Models
bull Time Series Analysis
Intro to Machine Learning
State Space State Transitions Hidden Markov Models Kalman Filters Linear Dynamical Systems
Time Domain Models
ARMAARMA models ACF and Partial ACF ARIMA and Seasonal ARIMA Estimation Forecasting
Spectral Analysis
Spectral Density Fourier Transforms Wavelet Transforms Filtering
Spatial Models
• Models for continuous data
  - Gaussian Processes: prior over functions, $GP(0, K)$
  - Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for discrete data
  - Markov Random Fields (MRFs)
• Structure learning
  - Finding statistical dependencies
[Figure: Gaussian process. From H. Wallach, Introduction to Gaussian process regression, 2005]
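The notation $GP(0, K)$ denotes a zero-mean Gaussian process prior over functions with covariance (kernel) function $K$. As a minimal sketch, assuming a squared-exponential kernel (the slides do not say which kernel is used) and hypothetical function names, the following pure-Python code draws one function sample from such a prior at a handful of input points by factoring the kernel matrix as $K = L L^\top$ and setting $f = L u$ with $u \sim N(0, I)$.

```python
import math
import random


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (an assumed choice)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def kernel_matrix(xs, jitter=1e-9):
    """Gram matrix K[i][j] = k(x_i, x_j), with a small diagonal jitter for stability."""
    n = len(xs)
    return [[rbf_kernel(xs[i], xs[j]) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(K):
    """Lower-triangular L with L L^T = K, for symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_gp_prior(xs, rng):
    """Draw f ~ GP(0, K) evaluated at xs, using f = L u with u ~ N(0, I)."""
    L = cholesky(kernel_matrix(xs))
    u = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(xs))]


xs = [float(i) for i in range(8)]     # hypothetical input locations
sample = sample_gp_prior(xs, random.Random(0))
print([round(v, 3) for v in sample])
```

Nearby inputs receive correlated values, so each draw looks like a smooth curve; the length scale controls how quickly that correlation decays.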
Drought Detection
• Significant droughts
  - Persistent over space and time
  - Catastrophic consequences
• Examples
  - Late 1960s Sahel drought
  - 1930s North American Dust Bowl
• Dataset: Climatic Research Unit (CRU)
  - http://data.giss.nasa.gov/precip_cru
  - Monthly precipitation over land
  - Duration: 1901 to 2006
  - Resolution: 0.5° latitude × 0.5° longitude
Detection with Markov Random Field (MRF)
• Model dependencies using a 4-nearest-neighbor grid
  - Replicate grid over time
  - Each node can be 0 (normal) or 1 (dry)
• Toy example
  - $m = 3$, $n = 4$, $N = 12$, $T = 5$
  - Total states: $2^{60} \approx 1.1529 \times 10^{18}$
• CRU data
  - $m = 720$, $n = 360$, $N = 67420$, $T = 106$
  - Total states: $2^{7146520} > 10^{2151316}$
• MAP inference
  - Find the most likely "state" of the system
  - Integer programming with a LARGE number of states
  - Postprocess MAP state to identify significant droughts
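The state counts follow from having $N \cdot T$ binary nodes, which gives $2^{N T}$ joint configurations. A small Python sketch of that arithmetic (function names are hypothetical) computes the toy count exactly and uses $N \, T \log_{10} 2$ to get the order of magnitude for the CRU grid, where the exact integer is too large to print usefully.

```python
import math


def num_binary_nodes(n_cells, n_steps):
    """Number of 0/1 nodes when a grid of n_cells is replicated over n_steps."""
    return n_cells * n_steps


def num_states(n_cells, n_steps):
    """Exact number of joint configurations, 2 ** (n_cells * n_steps)."""
    return 2 ** num_binary_nodes(n_cells, n_steps)


def log10_states(n_cells, n_steps):
    """Base-10 logarithm of the number of joint configurations."""
    return num_binary_nodes(n_cells, n_steps) * math.log10(2)


toy_states = num_states(12, 5)            # toy example: N = 12, T = 5
cru_nodes = num_binary_nodes(67420, 106)  # CRU data: N = 67420, T = 106
cru_log10 = log10_states(67420, 106)

print(f"toy: 2^60 = {toy_states:.4e}")
print(f"CRU: 2^{cru_nodes} is about 10^{cru_log10:.2f}")
```

Enumerating states is therefore out of the question even for modest grids, which is why MAP inference is posed as an optimization problem.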
Drought Regions from MRFs
Factor Graphs: Product of Local Functions
Marginalize Product of Functions (MPF)
• Marginalize (or maximize) product of functions
• For $g_1(x_1)$ we have
• For $g_3(x_3)$ we have
Distributive law: $ab + ac = a(b + c)$
Example: Computing $g_1(x_1)$
Message Passing
Compute product of descendants with f, then do not-sum over parent
Compute product of descendants
The Sum-Product Algorithm
• To compute $g_i(x_i)$, form a tree rooted at $x_i$
• Starting from the leaves, apply the following two rules
  - Product Rule: at a variable node, take the product of descendants
  - Sum-product Rule: at a factor node, take the product of f with descendants, then perform not-sum over the parent node
• To compute all marginals
  - Can be done one at a time, but the repeated computations are not efficient
  - Simultaneous message passing following the sum-product algorithm
  - Examples: Belief Propagation, Forward-Backward algorithm, etc.
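The two rules can be sketched in a few lines of Python. This is an illustrative sketch, not the slides' own example: the factor graph shape, the factor tables, and all names (`FACTORS`, `var_to_factor`, `factor_to_var`, `marginal`) are assumptions made here. Each call roots the tree at the requested variable and recurses toward the leaves; the result is the unnormalized marginal $g_i(x_i)$, which is checked against brute-force summation.

```python
from itertools import product
from math import isclose, prod

DOMAIN = (0, 1)  # all variables are binary in this toy example

# Hypothetical tree-structured factor graph; each entry is (scope, local function).
FACTORS = {
    "fA": (("x1",), lambda a: (0.6, 0.4)[a]),
    "fB": (("x2",), lambda b: (0.3, 0.7)[b]),
    "fC": (("x1", "x2", "x3"), lambda a, b, c: 1.0 + a + 2 * b * c),
    "fD": (("x3", "x4"), lambda c, d: 2.0 if c == d else 1.0),
    "fE": (("x3", "x5"), lambda c, e: 1.0 + c * e),
}
VARIABLES = ("x1", "x2", "x3", "x4", "x5")


def neighbors(var):
    """Names of the factors whose scope contains var."""
    return [name for name, (scope, _) in FACTORS.items() if var in scope]


def var_to_factor(var, factor):
    """Product rule: multiply the messages from all other neighboring factors."""
    incoming = [factor_to_var(f, var) for f in neighbors(var) if f != factor]
    return {v: prod(m[v] for m in incoming) for v in DOMAIN}


def factor_to_var(factor, var):
    """Sum-product rule: multiply f by incoming messages, then sum out all but var."""
    scope, f = FACTORS[factor]
    others = [u for u in scope if u != var]
    incoming = {u: var_to_factor(u, factor) for u in others}
    message = {}
    for v in DOMAIN:
        total = 0.0
        for values in product(DOMAIN, repeat=len(others)):
            assignment = dict(zip(others, values), **{var: v})
            weight = prod(incoming[u][assignment[u]] for u in others)
            total += f(*(assignment[u] for u in scope)) * weight
        message[v] = total
    return message


def marginal(var):
    """Unnormalized marginal g_i(x_i): product of all messages arriving at var."""
    incoming = [factor_to_var(f, var) for f in neighbors(var)]
    return {v: prod(m[v] for m in incoming) for v in DOMAIN}


def brute_force_marginal(var):
    """Reference: sum the full product of factors over every joint assignment."""
    result = {v: 0.0 for v in DOMAIN}
    for values in product(DOMAIN, repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        g = prod(f(*(assignment[u] for u in scope)) for scope, f in FACTORS.values())
        result[assignment[var]] += g
    return result


for var in VARIABLES:
    exact = brute_force_marginal(var)
    assert all(isclose(marginal(var)[v], exact[v]) for v in DOMAIN)
print(marginal("x1"), marginal("x3"))
```

Calling `marginal` once per variable recomputes many messages, which is the inefficiency that simultaneous message passing avoids by computing each message once and reusing it.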
Message Passing in General Graphs
bull Tree structured graphs ndash Message passing is guaranteed to give correct solutions ndash Examples HMMs Kalman Filters
bull General Graphs ndash Active research topic
bull Progress has been made over the past 10 years ndash Loopy Message passing
bull May not converge bull May converge to a lsquolocal minimarsquo of lsquoBethe variational free energyrsquo
ndash New approaches to convergent and correct message passing
bull Applications ndash True Skill Ranking System for Xbox Live ndash Turbo Codes 3G 4G phones satellite comm Wimax Mars orbiter
Intro to Machine Learning 39
Temporal Models
• State Space Models
  – State Space, State Transitions, Hidden Markov Models, Kalman Filters, Linear Dynamical Systems
• Time Series Analysis
  – Time Domain Models: AR/MA/ARMA models, ACF and Partial ACF, ARIMA and Seasonal ARIMA, Estimation, Forecasting
  – Spectral Analysis: Spectral Density, Fourier Transforms, Wavelet Transforms, Filtering
Intro to Machine Learning 40
Spatial Models
• Models for Continuous Data
  – Gaussian Processes: prior over functions, GP(0, K)
  – Kriging, Co-Kriging, Linear Models of Coregionalization (LMCs)
• Models for Discrete Data
  – Markov Random Fields (MRFs)
• Structure Learning
  – Finding statistical dependencies
[Figure: Gaussian Process. From H. Wallach, Introduction to Gaussian process regression, 2005]
Intro to Machine Learning 41
Results Drought Detection
Drought in the Sahel in the 1970s
Drought in Mongolia in the 1970s
Drought in northern Russia in the 1970s
Intro to Machine Learning 45
Results Drought Detection
Drought in northwest America in the 1920s
Drought in southern Africa in the 1920s
Drought in eastern China in the 1920s
The Dust Bowl in Midwest US in the 1930s
Drought in central Canada in the 1920s
Intro to Machine Learning 46
Results Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 47
Results Drought Detection
Drought in North America in the 1910s
Drought in Turkmenistan and Afghanistan in the 1910s
Drought in former Soviet Union in the 1910s
Drought in northeast China in the 1920s
Intro to Machine Learning 48
Results Drought Detection
Drought in central Africa in the 1960s
Drought in India in the 1960s
Intro to Machine Learning 49
Results Drought Detection
Drought in eastern Africa in the 1930s
Drought in Kazakhstan in the 1930s
Intro to Machine Learning 50
Results Drought Detection
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Intro to Machine Learning 51
Results Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 52
Graphical Models (Contd)
• Basics, Inference
• Markov Random Fields
  – Example: Drought Detection
• Mixed Membership Models
  – Example: Text Analysis and Topic Modeling
• Probabilistic Matrix Factorization
  – Example: Matrix Completion and Recommendation Systems
Intro to Machine Learning 53
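Background Plate Diagrams
[Figure: a node a with children b1, b2, b3, next to the equivalent plate diagram with a single node b inside a plate labeled 3]
Compact representation of large Bayesian networks
Intro to Machine Learning 54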
Model 1 Independent Features
[Figure: plate diagram with nodes θ and x and plates d, n; example with d=3, n=1 showing one density per feature (x-axis from -10 to 10) and the sample x = (0.3, 1, -2)]
Intro to Machine Learning 55
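Model 2 Naïve Bayes (Mixture Models)
[Figure: plate diagram of Model 1 (nodes θ, x; plates d, n) next to the plate diagram of Model 2 (nodes π, z, θ, x; plates d, n, k)]
Intro to Machine Learning 56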
Naïve Bayes Model
[Figure: plate diagram with nodes π, z, θ, x and plates n, k, d; six density plots (x-axis from -10 to 10) and the sample x = (-1, 3, -0.5)]
Intro to Machine Learning 57
Naïve Bayes Model
[Figure: plate diagram with nodes π, z, θ, x and plates n, k, d; six density plots (x-axis from -10 to 10) and the sample x = (0.1, 2.1, -1.5)]
Intro to Machine Learning 58
Model 3 Mixed Membership Model
[Figure: plate diagrams of Model 1 (nodes θ, x), Model 2 (nodes π, z, θ, x), and Model 3 (nodes α, π, z, θ, x), with plates d, n, k]
Intro to Machine Learning 59
Mixed Membership Models
[Figure: plate diagram with nodes α, π, z, θ, x and plates d, n, k; six density plots (x-axis from -10 to 10) and the sample x = (0.7, 3.1, -1)]
Intro to Machine Learning 60
Mixed Membership Models
[Figure: plate diagram with nodes α, π, z, θ, x and plates d, n, k; six density plots (x-axis from -10 to 10) and the sample x = (0.9, 2.1, -2)]
Intro to Machine Learning 61
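Mixture Model vs Mixed Membership Model
Single component membership vs multi-component mixed membership
[Figure: example samples x = (0.1, 2.1, -1.5) and x = (-1, 3, -0.5) for the mixture model; x = (0.7, 3.1, -1) and x = (0.9, 2.1, -2) for the mixed membership model]
Intro to Machine Learning 62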
Latent Dirichlet Allocation (LDA)
$\pi^{(d)} \sim \text{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \text{Discrete}(\pi^{(d)})$: topic assignment for each word
$\beta_j$, $j = 1, \dots, K$: distribution over words for each topic
$x_i \sim \text{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
Dirichlet priors: $\alpha$
[Figure: plate diagram with nodes α, π(d), z_i, x_i, β_j and plates N_d, D, K]
Intro to Machine Learning 63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
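The LDA generative process can be sketched in a few lines: draw a topic mixture for the document, then for each word draw a topic and a word from that topic. The values of `alpha` and `beta` below are made-up illustrations (2 topics, 4-word vocabulary), not values from the tutorial, and the Dirichlet draw uses the standard trick of normalizing Gamma samples.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def generate_document(alpha, beta, num_words, rng):
    """pi ~ Dirichlet(alpha); for each word: z ~ Discrete(pi), x ~ Discrete(beta[z])."""
    pi = sample_dirichlet(alpha, rng)
    topics = list(range(len(alpha)))
    vocab = list(range(len(beta[0])))
    assignments, words = [], []
    for _ in range(num_words):
        z = rng.choices(topics, weights=pi)[0]       # topic assignment for this word
        x = rng.choices(vocab, weights=beta[z])[0]   # word drawn from the assigned topic
        assignments.append(z)
        words.append(x)
    return pi, assignments, words


rng = random.Random(0)
alpha = [0.5, 0.5]                      # hypothetical Dirichlet parameter (K = 2 topics)
beta = [[0.60, 0.30, 0.05, 0.05],       # hypothetical topic 0: distribution over 4 words
        [0.05, 0.05, 0.30, 0.60]]       # hypothetical topic 1
pi, z, x = generate_document(alpha, beta, num_words=10, rng=rng)
print("topic mixture:", pi)
print("topic assignments:", z)
print("words:", x)
```

Because each word gets its own topic draw, a single document can mix several topics, which is the difference from a mixture model where one component generates the whole document.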
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Smoothed Latent Dirichlet Allocation
$\pi^{(d)} \sim \text{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \text{Discrete}(\pi^{(d)})$: topic assignment for each word
$\phi^{(j)} \sim \text{Dirichlet}(\beta)$: distribution over words for each topic
$x_i \sim \text{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
Dirichlet priors: $\alpha$, $\beta$
[Figure: plate diagram with nodes α, π(d), z_i, x_i, φ(j), β and plates N_d, D, T]
Intro to Machine Learning 67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Results NASA Reports I
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Intro to Machine Learning 69
Results NASA Reports II
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Intro to Machine Learning 70
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
– The pilot flies an owner's airplane with the owner as a passenger, and loses contact with the center during the flight.
– During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
Intro to Machine Learning 71
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
– The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
– During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
Intro to Machine Learning 72
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
– The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
– The captain has a medical emergency.
Intro to Machine Learning 73
Mixed Membership of Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
Flight Crew 0.7039, Passenger 0.0009, Maintenance 0.2953
Flight Crew 0.1405, Passenger 0.0663, Maintenance 0.7932
Flight Crew 0.2563, Passenger 0.6599, Maintenance 0.0837
Flight Crew 0.0013, Passenger 0.0013, Maintenance 0.9973
Intro to Machine Learning 74
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: infer number of topics
  – Hierarchical Dirichlet processes: infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Intro to Machine Learning 75
Recommendation Systems
Users: Age: 28, Gender: Male, Job: Salesman, Interest: Travel, …
Movies: Title: Gone with the Wind, Release year: 1940, Cast: Vivien Leigh, Clark Gable, Genre: War, Romance, Awards: 8 Oscars, Keywords: Love, Civil war, …
[Figure: movie ratings matrix relating users and movies]
Intro to Machine Learning 76
Advertisements on the Web
Webpages: Category: Baby, URL: babyearth.com, Content: Webpage text, Hyperlinks, …
Products: Category: Sports shoes, Brand: Nike, Ratings 425, …
[Figure: sparse Click-Through-Rate matrix relating webpages and products]
Intro to Machine Learning 77
Forest Ecology
Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: sparse Plant Trait Matrix relating plants and traits (TRY db) (Jens Kattge, Peter Reich et al.)]
Intro to Machine Learning 78
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: a matrix approximated (≈) by a product of factor matrices]
Intro to Machine Learning 79
Probabilistic Matrix Factorization (PMF)
$u_i \sim \mathcal{N}(0, \sigma_u^2 I)$
$v_j \sim \mathcal{N}(0, \sigma_v^2 I)$
$X_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$
Inference using gradient descent
Intro to Machine Learning 80
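Under the PMF model, maximizing the posterior over the factors amounts to minimizing squared error on the observed entries plus an L2 penalty on the factors. The sketch below uses plain stochastic gradient descent on a tiny made-up matrix; the rank, step size, penalty, and data are illustrative assumptions, not the settings behind the Netflix results.

```python
import random


def pmf_sgd(ratings, num_rows, num_cols, rank=2, lam=0.01, lr=0.02, epochs=2000, seed=0):
    """Fit factors U, V from observed (i, j, x) triples only.

    Per observed entry, the objective term is
        (x_ij - u_i . v_j)^2 + lam * (||u_i||^2 + ||v_j||^2),
    and each step moves u_i and v_j along its negative gradient.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(num_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(num_cols)]
    for _ in range(epochs):
        for i, j, x in ratings:
            err = x - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (err * v_jk - lam * u_ik)
                V[j][k] += lr * (err * u_ik - lam * v_jk)
    return U, V


def predict(U, V, i, j):
    """Predicted entry: inner product of row factor i and column factor j."""
    return sum(a * b for a, b in zip(U[i], V[j]))


# Hypothetical 3 x 3 rank-1 matrix with entry (2, 2) treated as missing.
true_matrix = [[1.0, 0.5, 2.0], [2.0, 1.0, 4.0], [3.0, 1.5, 6.0]]
observed = [(i, j, true_matrix[i][j])
            for i in range(3) for j in range(3) if (i, j) != (2, 2)]
U, V = pmf_sgd(observed, num_rows=3, num_cols=3)
max_train_error = max(abs(x - predict(U, V, i, j)) for i, j, x in observed)
print("largest error on observed entries:", max_train_error)
print("prediction for the missing entry:", predict(U, V, 2, 2))
```

Only observed entries enter the loop, which is how this formulation handles missing entries where a plain SVD cannot.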
Bayesian Probabilistic Matrix Factorization
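$\mu_u \sim \mathcal{N}(\mu_0, \Lambda_u)$, $\Lambda_u \sim \mathcal{W}(\nu_0, W_0)$
$\mu_v \sim \mathcal{N}(\mu_0, \Lambda_v)$, $\Lambda_v \sim \mathcal{W}(\nu_0, W_0)$
$u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$
$v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$
$X_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$
Hyperpriors: Gaussian ($\mu_u$, $\mu_v$) and Wishart ($\Lambda_u$, $\Lambda_v$)
Inference using MCMC
Intro to Machine Learning 81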
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim \mathcal{N}(0, \sigma_u^2 I)$, $v_j \sim \mathcal{N}(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$; full covariance with “hyperprior”
• Parametric PMF (PPMF): $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$; full covariance but no “hyperprior”
Intro to Machine Learning 82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions xt1, …, xt4; weights on experts wt1, …, wt4
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update “confidence” on experts
[Figure: experts e1, …, e4 with predictions xt1, …, xt4, combined through weights wt1, …, wt4 into the output y]
Intro to Machine Learning 90
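The round structure on this slide can be sketched with an exponentially weighted forecaster. The slide does not state the loss or the update rule, so the squared loss, the multiplicative update, and the learning rate `eta` below are assumptions chosen for illustration.

```python
import math


def hedge(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster with squared loss.

    expert_predictions[t][i] is expert i's prediction at round t;
    outcomes[t] is revealed only after the learner has predicted.
    """
    num_experts = len(expert_predictions[0])
    weights = [1.0 / num_experts] * num_experts
    learner_predictions = []
    for x_t, y_t in zip(expert_predictions, outcomes):
        # Combined prediction: weighted average of the expert predictions.
        learner_predictions.append(sum(w * x for w, x in zip(weights, x_t)))
        # Nature reveals the truth; experts with larger loss lose more weight.
        losses = [(x - y_t) ** 2 for x in x_t]
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learner_predictions, weights


# Hypothetical data: expert 0 is always right, expert 1 always says 0, expert 2 always says 1.
outcomes = [float(t % 2) for t in range(20)]
expert_predictions = [[y, 0.0, 1.0] for y in outcomes]
predictions, final_weights = hedge(expert_predictions, outcomes)
print("final weights:", final_weights)
```

After a few rounds nearly all the weight sits on the expert with the smallest cumulative loss, so the combined prediction tracks that expert.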
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
TM: Choose best w in hindsight
OL: Choose wt before seeing xt
Guarantee: OL will be competitive with TM
[Figure: timeline from “Today” to “2020” over inputs x1, x2, …, xT from the base models; TM uses a single w, OL uses w1, w2, …, wT]
Intro to Machine Learning 91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
Regret Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \ge \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$
$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \varepsilon$
Intro to Machine Learning 97
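Regret against the best CRP can be computed directly on a small example. The sketch below uses a hypothetical two-asset market (cash plus an asset that alternately doubles and halves), finds the best CRP by grid search, and compares it with the exponentiated gradient (EG) update; the learning rate and the data are assumptions for illustration only.

```python
import math


def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t)."""
    return sum(
        math.log(sum(w * x for w, x in zip(w_t, x_t)))
        for w_t, x_t in zip(portfolios, price_relatives)
    )


def best_crp_two_assets(price_relatives, grid=1000):
    """Grid search for the best fixed portfolio (w, 1 - w) in hindsight."""
    best_w, best_lw = None, -math.inf
    for g in range(grid + 1):
        w = [g / grid, 1.0 - g / grid]
        lw = log_wealth([w] * len(price_relatives), price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw


def eg_portfolios(price_relatives, eta=0.05):
    """EG update: w_{t+1,i} proportional to w_{t,i} * exp(eta * x_{t,i} / (w_t . x_t))."""
    n = len(price_relatives[0])
    w = [1.0 / n] * n
    chosen = []
    for x_t in price_relatives:
        chosen.append(w)  # w_t is fixed before x_t is seen
        gain = sum(wi * xi for wi, xi in zip(w, x_t))
        unnorm = [wi * math.exp(eta * xi / gain) for wi, xi in zip(w, x_t)]
        total = sum(unnorm)
        w = [u / total for u in unnorm]
    return chosen


# Hypothetical market: asset 0 is cash, asset 1 alternately doubles and halves.
price_relatives = [[1.0, 2.0], [1.0, 0.5]] * 10
best_w, best_lw = best_crp_two_assets(price_relatives)
online_lw = log_wealth(eg_portfolios(price_relatives), price_relatives)
regret = best_lw - online_lw
print("best CRP:", best_w, "log wealth:", best_lw)
print("EG log wealth:", online_lw, "regret:", regret)
```

In this market neither asset gains on its own, yet the evenly rebalanced CRP grows steadily, which is why the best CRP is a meaningful benchmark.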
Datasets
• S&P 500
  – S&P 500 Index: 500 large cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Intro to Machine Learning 98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Motivation for Meta Algorithms
• Universal Algorithms: competitive with best CRP; can be outdone by simple heuristics
• Heuristic based methods: strong empirical performance; no performance guarantee
• Best of both worlds: competitive with best CRP; strong empirical performance
Intro to Machine Learning 103
Meta Algorithm
m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
Day t: base algorithm j proposes portfolio $p_{t,j}$; the market reveals $x_t$
Wealth gain: $r_{t,j} = p_{t,j} \cdot x_t$
$w_t$: distribution over algorithms, $(w(1), w(2), w(3), \dots, w(m))$
Weight update
• $w_t$ updated based on $r_t$
• Good algorithms get more money
Intro to Machine Learning 104
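The meta algorithm treats each base algorithm as an asset whose daily price relative is its wealth gain. A minimal sketch follows, assuming an EG-style multiplicative weight update; the slide only says that the weights are updated based on the gains, so the exact rule, `eta`, and the toy data are assumptions.

```python
import math


def meta_algorithm(base_portfolios, price_relatives, eta=0.1):
    """Distribute money over m base algorithms and reweight by their wealth gains.

    base_portfolios[t][j] is the portfolio p_{t,j} proposed by base algorithm j on day t.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(p * x for p, x in zip(p_tj, x_t)) for p_tj in p_t]
        gain = sum(wj * rj for wj, rj in zip(w, r_t))
        wealth *= gain
        # Assumed EG-style update: algorithms with higher gain receive more money.
        unnorm = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r_t)]
        total = sum(unnorm)
        w = [u / total for u in unnorm]
    return wealth, w


# Hypothetical setup: algorithm 0 holds only asset 0, algorithm 1 holds only asset 1.
days = 50
price_relatives = [[1.0, 1.1]] * days
base_portfolios = [[[1.0, 0.0], [0.0, 1.0]]] * days
final_wealth, final_w = meta_algorithm(base_portfolios, price_relatives)
print("final wealth:", final_wealth)
print("weights over base algorithms:", final_w)
```

Because the meta level has the same form as portfolio selection over assets, any universal portfolio update can in principle be plugged in at this step.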
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration Relocation
[Figure: plate diagram with nodes π, h, θ, x and plates n, k]
Intro to Machine Learning 114
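The Kmeans loop alternates between assigning points to their nearest center and relocating each center to the mean of its points. The sketch below is a plain implementation on made-up 2-D points with hand-picked initial centers; both are illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def kmeans(points, centers, iterations=20):
    """Alternate assignment and center relocation until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
print("centers:", centers)
print("labels:", labels)
```

The hard assignment step is what makes this a limit case of mixture modeling, where EM would instead assign each point fractionally to every component.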
Spectral Clustering
• Weighted graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
[Figure: from A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001; www.cs.berkeley.edu/~fowlkes/BSE]
Intro to Machine Learning 115
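The link between segments and zero eigenvalues can be checked by hand on a small graph. The sketch below builds the unnormalized Laplacian L = D - W for a made-up graph with two disconnected segments and verifies that each segment's indicator vector is mapped to zero, so each segment contributes one eigenvalue equal to 0. The choice of the unnormalized Laplacian is an assumption; the slide does not say which variant is meant.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, where D holds the weighted degrees."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical graph: nodes {0, 1, 2} form a triangle, nodes {3, 4} share one edge.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = graph_laplacian(W)

segment_a = [1, 1, 1, 0, 0]   # indicator vector of the first segment
segment_b = [0, 0, 0, 1, 1]   # indicator vector of the second segment
print("L @ segment_a =", matvec(L, segment_a))
print("L @ segment_b =", matvec(L, segment_b))
print("L @ e_0       =", matvec(L, [1, 0, 0, 0, 0]))
```

When the segments are only weakly connected rather than disconnected, these eigenvalues move slightly away from 0, which is the "close to" case on the slide.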
Linear Dimensionality Reduction
• Linear Projections
  – Projection that optimizes some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projection is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA and LDA projections]
Intro to Machine Learning 116
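For PCA, the eigenvalue problem is on the data covariance matrix, and the top eigenvector is the direction of maximum projected variance. The sketch below finds it by power iteration, which is one simple eigen-solver chosen here for illustration; the data are made up and lie exactly on the line y = 2x.

```python
import math


def top_principal_component(data, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / n for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data projected onto v, i.e. v^T cov v.
    variance = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    return v, variance


# Hypothetical data on the line y = 2x.
data = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
direction, variance = top_principal_component(data)
print("first principal direction:", direction)
print("variance along it:", variance)
```

Fisher's LDA leads to a generalized eigenvalue problem involving between-class and within-class scatter, so it needs class labels that this sketch does not use.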
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems, nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman’s 3-step ‘algorithm’ for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Intro to Machine Learning 118
Intro to Machine Learning
Acknowledgements
Amrudin Agovic Qiang Fu Soumyadeep Chatterjee
Hanhuai Shan Puja Das
119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Results Drought Detection
Drought in northwest America in the 1920s
Drought in southern Africa in the 1920s
Drought in eastern China in the 1920s
The Dust Bowl in Midwest US in the 1930s
Drought in central Canada in the 1920s
Intro to Machine Learning 46
Results Drought Detection
Drought in Iran In the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 47
Results Drought Detection
Drought in north America in the 1910s
Drought in Turkmenistan and Afghanistan in the 1910s
Drought in former Soviet Union in the 1910s
Drought in northeast China in the 1920s
Intro to Machine Learning 48
Results Drought Detection
Drought in central Africa In the 1960s
Drought in India in the 1960s
Intro to Machine Learning 49
Results Drought Detection
Drought in eastern Africa in the 1930s
Drought in Kazakhstan In the 1930s
Intro to Machine Learning 50
Results Drought Detection
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Intro to Machine Learning 51
Results Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 52
Graphical Models (Contd)
bull Basics Inference
bull Markov Random Fields ndash Example Draught Detection
bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling
bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems
Intro to Machine Learning 53
Intro to Machine Learning
Background Plate Diagrams
a
b
3
a
b1 b2 b3
Compact representation of large Bayesian networks
54
θ
x
d n
d
Intro to Machine Learning
03
1
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Model 1 Independent Features
d=3 n=1
55
θ
x
d n
d
π
z
θ
x
d n
k d
Model 2 Naiumlve Bayes (Mixture Models)
56
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
09
21
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
61
01
21
-15
x
Intro to Machine Learning
Mixture Model vs Mixed Membership Model
-1
3
-05
x 07
31
-1
x 09
21
-2
x
Single component membership Multi-component mixed membership
62
Intro to Machine Learning
Nd D
zi
xi
π (d)
α
p(d) sim Dirichlet(α)
zi sim Discrete(π (d) )
xi sim Discrete(β (zi) )
K
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Latent Dirichlet Allocation (LDA)
βj
63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: infer number of topics
  – Hierarchical Dirichlet processes: infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
Users: Age 28, Gender Male, Job Salesman, Interest Travel, …
Movies: Title Gone with the Wind, Release year 1940, Cast Vivien Leigh, Clark Gable, Genre War, Romance, Awards 8 Oscars, Keywords Love, Civil war, …
Movie ratings matrix
Advertisements on the Web
Webpages: Category Baby, URL babyearth.com, Content Webpage text, Hyperlinks, …
Products: Category Sports shoes, Brand Nike, Ratings 425, …
Click-Through-Rate matrix between webpages and products (figure: partially filled numeric matrix)
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Rows: Plants. Columns: Traits (Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density)
Figure: partially filled numeric matrix, with many entries missing
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
Probabilistic Matrix Factorization (PMF)
$u_i \sim \mathcal{N}(0, \sigma_u^2 I)$
$v_j \sim \mathcal{N}(0, \sigma_v^2 I)$
$R_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$ (the observed entry is labeled $X_{ij}$ in the plate diagram)
Inference using gradient descent
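The gradient-descent inference named on this slide can be sketched as follows. This is a minimal illustration, assuming MAP estimation: under the Gaussian likelihood and zero-mean Gaussian priors above, maximizing the posterior amounts to minimizing squared error on the observed entries plus L2 penalties on $u_i$ and $v_j$. The toy ratings, the rank, the learning rate, and the single shared penalty weight `reg` are all assumptions made for the example, not values from the tutorial.

```python
import random

def predict(U, V, i, j):
    """Inner product u_i^T v_j."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def rmse(ratings, U, V):
    """Root mean squared error over the observed entries only."""
    se = sum((r - predict(U, V, i, j)) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5

def pmf_sgd(ratings, n_rows, n_cols, rank=2, lr=0.02, reg=0.05,
            epochs=500, seed=0):
    """MAP estimate of the PMF factors by stochastic gradient descent.

    ratings: list of (i, j, value); missing entries are simply absent.
    reg: assumed penalty weight standing in for sigma^2 / sigma_u^2
         (and sigma^2 / sigma_v^2), taken equal here for simplicity.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for i, j, r in data:
            err = r - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus L2 penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

# Hypothetical 4 x 3 ratings matrix with four entries unobserved.
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

U0, V0 = pmf_sgd(observed, 4, 3, epochs=0)   # random initialization only
U, V = pmf_sgd(observed, 4, 3)
rmse_before = rmse(observed, U0, V0)
rmse_after = rmse(observed, U, V)
print(rmse_before, rmse_after)
print(predict(U, V, 0, 2))  # a prediction for an unobserved entry
```

Because only observed triples enter the loss, missing entries need no special handling, which is the property the Matrix Factorization slide says traditional SVD lacks.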
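Bayesian Probabilistic Matrix Factorization
Hyperpriors (Gaussian and Wishart):
$\mu_u \sim \mathcal{N}(\mu_0, \Lambda_u)$, $\Lambda_u \sim \mathcal{W}(\nu_0, W_0)$
$\mu_v \sim \mathcal{N}(\mu_0, \Lambda_v)$, $\Lambda_v \sim \mathcal{W}(\nu_0, W_0)$
Factors and observations:
$u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$
$R_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$ (the observed entry is labeled $X_{ij}$ in the plate diagram)
Inference using MCMC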
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
PMF: $u_i \sim \mathcal{N}(0, \sigma_u^2 I)$, $v_j \sim \mathcal{N}(0, \sigma_v^2 I)$; diagonal covariance
BPMF: $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$; full covariance, with "hyperprior"
Parametric PMF (PPMF): $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$; full covariance, but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
PPMF mostly achieves higher accuracy
Overview
Predictive Models
Graphical Models
Online Learning
  Learning with Expert Advice
  Online Convex Optimization
  Example: Portfolio Selection
Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions, weights on experts
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update "confidence" on experts
Figure: experts e1, e2, e3, e4 output predictions xt1, xt2, xt3, xt4, which are combined with weights wt1, wt2, wt3, wt4 and compared with the truth y
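The round-by-round protocol above can be sketched with one standard instance of it, the exponentially weighted average forecaster. The slide does not state a specific update rule or loss, so the squared loss, the learning rate `eta`, and the toy experts below are assumptions chosen for illustration.

```python
import math

def exponential_weights(expert_preds, truths, eta=0.5):
    """Exponentially weighted average forecaster (squared loss assumed).

    expert_preds[t][i]: prediction of expert i at round t.
    truths[t]: the value nature reveals after the combined prediction.
    Returns the combined predictions and the final weights.
    """
    n = len(expert_preds[0])
    w = [1.0 / n] * n                      # start with uniform confidence
    combined = []
    for x_t, y_t in zip(expert_preds, truths):
        # Combined prediction uses weights fixed before y_t is seen.
        combined.append(sum(wi * xi for wi, xi in zip(w, x_t)))
        # Loss of each expert once the truth is revealed.
        losses = [(xi - y_t) ** 2 for xi in x_t]
        # Update confidence: multiplicative penalty, then renormalize.
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]
    return combined, w

# Hypothetical experts: always right, always wrong, always 0.5.
truths = [1, 0, 1, 1, 0, 1]
expert_preds = [[y, 1 - y, 0.5] for y in truths]
combined, final_w = exponential_weights(expert_preds, truths)
print(final_w)
```

The weights concentrate on the expert with the smallest cumulative loss, while every combined prediction is made before the corresponding truth is revealed.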
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
TM: Choose best w in hindsight
OL: Choose wt before seeing xt
Guarantee: OL will be competitive with TM
Figure labels: Base Models; x1, x2, x3, x4, x5, …, xT; w1, w2, w3, w4, …, wT; timeline from Today to 2020
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: difference of log wealth between the best CRP and the online strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} \;-\; o(T)$$
Averaged over T rounds:
$$\frac{1}{T}\sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \log(w \cdot x_t) \;-\; \epsilon$$
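A small worked example may make the regret definition concrete. The sketch below is illustrative only: the two-asset market (one asset flat, one alternately doubling and halving), the grid search for the best CRP, and the exponentiated-gradient style online update with learning rate `eta` are assumptions, not the tutorial's experimental setup.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t)."""
    return sum(math.log(sum(w * x for w, x in zip(w_t, x_t)))
               for w_t, x_t in zip(portfolios, price_relatives))

def best_crp_two_assets(price_relatives, grid=100):
    """Best fixed portfolio in hindsight, by grid search over (a, 1 - a)."""
    T = len(price_relatives)
    best_w, best_lw = None, float("-inf")
    for k in range(grid + 1):
        w = (k / grid, 1 - k / grid)
        lw = log_wealth([w] * T, price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

def eg_portfolios(price_relatives, eta=0.05):
    """Online portfolios from an exponentiated-gradient style update.

    Each w_t is fixed before x_t is seen; eta is an assumed learning rate.
    """
    n = len(price_relatives[0])
    w = [1.0 / n] * n
    chosen = []
    for x_t in price_relatives:
        chosen.append(tuple(w))
        gain = sum(wi * xi for wi, xi in zip(w, x_t))
        w = [wi * math.exp(eta * xi / gain) for wi, xi in zip(w, x_t)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

# Hypothetical market: asset 0 is flat, asset 1 doubles then halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 10
best_w, best_lw = best_crp_two_assets(market)
online_lw = log_wealth(eg_portfolios(market), market)
regret = best_lw - online_lw
print(best_w, best_lw, online_lw, regret)
```

In this market neither asset gains anything under buy-and-hold, yet the uniform CRP multiplies wealth by 1.5 × 0.75 = 1.125 every two rounds, which is why the best CRP is a meaningful comparator.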
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: dot-com (Mar 2000), housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
Universal Algorithms: competitive with best CRP; can be outdone by simple heuristics
Heuristic-based methods: strong empirical performance; no performance guarantee
Best of both worlds: competitive with best CRP; strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs $p_{t,j}$; given $x_t$, its wealth gain is $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, w(1), w(2), w(3), …, w(m)
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money
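The weight update over base algorithms can be sketched as follows. The slide only says that the weights are updated from the wealth gains, so the exponentiated-gradient style rule, the learning rate `eta`, and the constant daily gains of the three hypothetical base algorithms are assumptions for illustration.

```python
import math

def meta_step(w, r, eta=2.0):
    """One day of the meta algorithm.

    w: current distribution over the m base algorithms.
    r: wealth gains r_{t,j} of the base algorithms on this day.
    Returns the updated distribution and the meta algorithm's own gain.
    """
    gain = sum(wj * rj for wj, rj in zip(w, r))       # money is split by w
    unnorm = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r)]
    total = sum(unnorm)
    return [u / total for u in unnorm], gain

# Hypothetical wealth gains: algorithm 0 is consistently the best.
daily_gains = [[1.02, 0.99, 1.00]] * 100

w = [1.0 / 3] * 3
meta_log_wealth = 0.0
for r_t in daily_gains:
    w, gain = meta_step(w, r_t)
    meta_log_wealth += math.log(gain)
print(w, meta_log_wealth)
```

Over time the distribution shifts toward the base algorithm with the largest gains, which is the sense in which good algorithms get more money.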
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
  Clustering
  Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
Plate diagram labels: π, h, θ, x; plates n and k
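The Kmeans iteration mentioned above alternates between assigning points to the nearest center and relocating each center to the mean of its points. A minimal sketch follows; the toy points and the initial centers are assumptions chosen so that the result is easy to check by hand.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)),
                key=lambda c: squared_distance(p, centers[c]))
            for p in points]

def kmeans(points, centers, max_iters=100):
    """Alternate assignment and relocation until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                # Relocation: move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centers=[(0, 0), (10, 10)])
print(labels, centers)
```

The hard assignment is what makes Kmeans a limiting case of mixture modeling: each point belongs to exactly one component instead of carrying a posterior over components.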
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm," NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
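The link between segments and zero eigenvalues can be checked directly on a small graph. The sketch below assumes the unnormalized Laplacian L = D − W (the cited paper works with a normalized variant) and a hypothetical five-node graph with two disconnected segments. Rather than computing eigenvalues, it verifies that the indicator vector of each segment is mapped to zero by L, so each segment contributes one eigenvalue equal to 0.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def quadratic_form(M, v):
    """v^T M v; for a Laplacian this is the sum over edges of w_ij (v_i - v_j)^2."""
    return sum(vi * yi for vi, yi in zip(v, matvec(M, v)))

# Hypothetical graph: nodes 0-2 form a triangle, nodes 3-4 share one edge.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = graph_laplacian(W)

segment_a = [1, 1, 1, 0, 0]   # indicator of the first segment
segment_b = [0, 0, 0, 1, 1]   # indicator of the second segment
mixed = [1, 0, 0, 1, 0]       # cuts through both segments

print(matvec(L, segment_a), matvec(L, segment_b))
print(quadratic_form(L, mixed))
```

When the segments are only weakly connected instead of fully disconnected, these eigenvalues become small but nonzero, which matches the "(close to) 0" wording on the slide.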
Linear Dimensionality Reduction
• Linear Projections
  – Projection that optimizes some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
Figure panels: PCA, LDA
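For PCA, the eigenvalue problem is on the covariance matrix of the centered data, and the top eigenvector is the direction of maximum projected variance. A minimal sketch follows, assuming power iteration as the eigen-solver (a simple choice for illustration, not one named in the tutorial) and a hypothetical set of points lying close to the line y = x.

```python
def top_principal_component(points, iters=100):
    """Leading eigenvector of the sample covariance, by power iteration.

    Returns the unit direction and the fraction of total variance it explains.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] + [0.0] * (d - 1)            # assumed starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b]
                   for a in range(d) for b in range(d))
    total = sum(cov[a][a] for a in range(d))
    return v, variance / total

# Hypothetical 2-D data scattered tightly around the line y = x.
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
pc, explained = top_principal_component(points)
print(pc, explained)
```

Projecting each centered point onto `pc` gives the one-dimensional representation with the largest variance among all linear projections of these data.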
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems, nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results: Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Results: Drought Detection
Drought in North America in the 1910s
Drought in Turkmenistan and Afghanistan in the 1910s
Drought in former Soviet Union in the 1910s
Drought in northeast China in the 1920s
Results: Drought Detection
Drought in central Africa in the 1960s
Drought in India in the 1960s
Results: Drought Detection
Drought in eastern Africa in the 1930s
Drought in Kazakhstan in the 1930s
Results: Drought Detection
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Results: Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Graphical Models (Contd)
• Basics, Inference
• Markov Random Fields
  – Example: Drought Detection
• Mixed Membership Models
  – Example: Text Analysis and Topic Modeling
• Probabilistic Matrix Factorization
  – Example: Matrix Completion and Recommendation Systems
Background: Plate Diagrams
Compact representation of large Bayesian networks
Figure: a node a with children b1, b2, b3, drawn compactly as a and b with b inside a plate labeled 3
Model 1: Independent Features
Plate diagram labels: θ, x; plates d, n
Figure: example with d=3, n=1 and x = (0.3, 1, -2); three density plots, one per feature (horizontal axis -10 to 10, vertical axis 0 to 0.8)
Model 2: Naïve Bayes (Mixture Models)
Plate diagrams: Model 1 (θ, x; plates d, n) next to Model 2 (π, z, θ, x; plates d, n, k)
Naïve Bayes Model
Plate diagram labels: π, z, θ, x; plates n, k, d
Figure: example x = (-1, 3, -0.5) with six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8)
Naïve Bayes Model
Plate diagram labels: π, z, θ, x; plates n, k, d
Figure: example x = (0.1, 2.1, -1.5) with six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8)
Model 3: Mixed Membership Model
Plate diagrams: Model 1 (θ, x; plates d, n), Model 2 (π, z, θ, x; plates d, n, k), and Model 3 (α, π, z, θ, x; plates d, n, k)
Mixed Membership Models
Plate diagram labels: α, π, z, θ, x; plates d, n, k
Figure: example x = (0.7, 3.1, -1) with six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8)
Mixed Membership Models
Plate diagram labels: α, π, z, θ, x; plates d, n, k
Figure: example x = (0.9, 2.1, -2) with six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8)
Mixture Model vs Mixed Membership Model
Mixture model: single component membership (examples x = (0.1, 2.1, -1.5) and x = (-1, 3, -0.5))
Mixed membership model: multi-component mixed membership (examples x = (0.7, 3.1, -1) and x = (0.9, 2.1, -2))
Latent Dirichlet Allocation (LDA)
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\beta_j$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
$\alpha$: Dirichlet priors
Plates: $N_d$ (words in document d), $D$ (documents), $K$ (topics)
LDA Generative Model
Figure: document1, document2
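The generative process can be sketched by sampling a document word by word. This is a minimal illustration under stated assumptions: the six-word vocabulary (borrowed from the aviation report keywords), the three topic-word distributions in `beta`, and the value of `alpha` are all made up for the example, and each topic is restricted to two words so that the topic of every sampled word is easy to read off.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def sample_discrete(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, alpha, beta, vocab, rng):
    """LDA generative process for a single document."""
    pi_d = sample_dirichlet(alpha, rng)        # distribution over topics
    topics, words = [], []
    for _ in range(n_words):
        z = sample_discrete(pi_d, rng)         # topic assignment for the word
        x = sample_discrete(beta[z], rng)      # word drawn from that topic
        topics.append(z)
        words.append(vocab[x])
    return pi_d, topics, words

# Hypothetical vocabulary and topics (two words per topic).
vocab = ["runway", "altitude", "passenger", "seat", "engine", "fuel"]
beta = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],   # topic 0
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],   # topic 1
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],   # topic 2
]
alpha = [0.5, 0.5, 0.5]

rng = random.Random(0)
pi_d, topics, words = generate_document(8, alpha, beta, vocab, rng)
print(pi_d)
print(list(zip(topics, words)))
```

Because the topic is redrawn for every word, a single document can mix several topics, which is the difference from a mixture model where one component generates the whole document.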
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
$\alpha$, $\beta$: Dirichlet priors
Plates: $N_d$ (words in document d), $D$ (documents), $T$ (topics)
Aviation Safety Reports (NASA)
Results: NASA Reports I
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Results: NASA Reports II
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
bull Manifold Embedding
ndash Preserve local geometric structure
bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps
bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip
bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity
bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies
bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts
Intro to Machine Learning 118
Intro to Machine Learning
Acknowledgements
Amrudin Agovic Qiang Fu Soumyadeep Chatterjee
Hanhuai Shan Puja Das
119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Results Drought Detection
Drought in north America in the 1910s
Drought in Turkmenistan and Afghanistan in the 1910s
Drought in former Soviet Union in the 1910s
Drought in northeast China in the 1920s
Intro to Machine Learning 48
Results Drought Detection
Drought in central Africa In the 1960s
Drought in India in the 1960s
Intro to Machine Learning 49
Results Drought Detection
Drought in eastern Africa in the 1930s
Drought in Kazakhstan In the 1930s
Intro to Machine Learning 50
Results Drought Detection
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Intro to Machine Learning 51
Results Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 52
Graphical Models (Contd)
bull Basics Inference
bull Markov Random Fields ndash Example Draught Detection
bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling
bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems
Intro to Machine Learning 53
Intro to Machine Learning
Background Plate Diagrams
a
b
3
a
b1 b2 b3
Compact representation of large Bayesian networks
54
θ
x
d n
d
Intro to Machine Learning
03
1
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Model 1 Independent Features
d=3 n=1
55
θ
x
d n
d
π
z
θ
x
d n
k d
Model 2 Naiumlve Bayes (Mixture Models)
56
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
Mixed Membership Models
[Plate diagram: nodes α, π, z, x, θ; plates d, n, k]
[Figure: six density plots, each on the axis -10 to 10 with densities from 0 to 0.8; example point x = (0.9, 2.1, -2)]
Mixture Model vs Mixed Membership Model
Mixture model (single component membership): x = (-1, 3, -0.5), x = (0.1, 2.1, -1.5)
Mixed membership model (multi-component mixed membership): x = (0.7, 3.1, -1), x = (0.9, 2.1, -2)
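
The contrast between single-component membership and multi-component mixed membership can be sketched in a few lines of Python. This is a minimal illustration rather than the tutorial's own implementation: the Gaussian feature model, the Dirichlet draw for the proportions, and the names `sample_mixture`, `sample_mixed_membership`, and `means` are assumptions made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def sample_mixture(pi, means, rng, sigma=1.0):
    """Mixture (naive Bayes) model: one component z generates all d features."""
    z = rng.choices(range(len(pi)), weights=pi)[0]
    x = [rng.gauss(mu, sigma) for mu in means[z]]
    return [z] * len(x), x

def sample_mixed_membership(alpha, means, rng, sigma=1.0):
    """Mixed membership: per-point proportions pi, then one component per feature."""
    d = len(means[0])
    pi = sample_dirichlet(alpha, rng)
    zs = [rng.choices(range(len(alpha)), weights=pi)[0] for _ in range(d)]
    x = [rng.gauss(means[zs[j]][j], sigma) for j in range(d)]
    return zs, x

rng = random.Random(0)
# Hypothetical component means for k=2 components and d=3 features.
means = [[-2.0, 3.0, -1.0], [1.0, 2.0, -2.0]]
z_mix, x_mix = sample_mixture([0.5, 0.5], means, rng)
z_mm, x_mm = sample_mixed_membership([1.0, 1.0], means, rng)
```

In the mixture sample every feature shares the same component label, while in the mixed membership sample each feature may come from a different component.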
Latent Dirichlet Allocation (LDA)
[Plate diagram: nodes α, π(d), zi, xi, βj; plates Nd, D, K]
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\beta_j$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
Dirichlet priors: α
LDA Generative Model
[Figure: generative process illustrated for document1 and document2]
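
The generative process of LDA (draw topic proportions for a document, then a topic and a word for each position) can be sketched as follows. This is a small illustration under stated assumptions: the four-word vocabulary, the two topic-word distributions in `beta`, and the function name `generate_document` are hypothetical and chosen only for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_document(alpha, beta, n_words, rng):
    """Generate one document; beta[j] is the word distribution of topic j."""
    pi_d = sample_dirichlet(alpha, rng)            # pi(d) ~ Dirichlet(alpha)
    topics = range(len(beta))
    vocab_ids = range(len(beta[0]))
    # z_i ~ Discrete(pi(d)): topic assignment for each word position
    z = [rng.choices(topics, weights=pi_d)[0] for _ in range(n_words)]
    # x_i ~ Discrete(beta(z_i)): word generated from the assigned topic
    x = [rng.choices(vocab_ids, weights=beta[zi])[0] for zi in z]
    return pi_d, z, x

# Hypothetical vocabulary and K=2 topics.
vocab = ["runway", "tower", "passenger", "seat"]
beta = [
    [0.45, 0.45, 0.05, 0.05],   # topic 0 favors "runway", "tower"
    [0.05, 0.05, 0.45, 0.45],   # topic 1 favors "passenger", "seat"
]
pi_d, z, x = generate_document([0.5, 0.5], beta, 8, random.Random(0))
words = [vocab[i] for i in x]
```

Because each word position draws its own topic, a single document can mix words from several topics, which is what distinguishes LDA from a single-topic mixture model.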
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
[Plate diagram: nodes α, β, π(d), φ(j), zi, xi; plates Nd, D, T]
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
Dirichlet priors: α, β
Aviation Safety Reports (NASA)
Results: NASA Reports I
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Results: NASA Reports II
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Report: The pilot flies an owner's airplane with the owner as a passenger. Loses contact with the center during the flight.
Report: During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
Report: The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
Report: During acceleration, a flap retraction issue happens. The pilot then returns to base and lands. The mechanic finds the problem.
Report: The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
Report: The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Flight Crew 0.7039, Passenger 0.0009, Maintenance 0.2953
Flight Crew 0.1405, Passenger 0.0663, Maintenance 0.7932
Flight Crew 0.2563, Passenger 0.6599, Maintenance 0.0837
Flight Crew 0.0013, Passenger 0.0013, Maintenance 0.9973
Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: Infer number of topics
  - Hierarchical Dirichlet processes: Infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
Users: Age 28; Gender Male; Job Salesman; Interest Travel; …
Movies: Title Gone with the Wind; Release year 1940; Cast Vivien Leigh, Clark Gable; Genre War, Romance; Awards 8 Oscars; Keywords Love, Civil war; …
[Figure: movie ratings matrix]
Advertisements on the Web
Products: Category Sports shoes; Brand Nike; Ratings 4.25; …
Webpages: Category Baby; URL babyearth.com; Content Webpage text, Hyperlinks; …
[Figure: Click-Through-Rate matrix over Webpages and Products, with missing entries]
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Rows: Plants. Columns: Traits (Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density)
[Figure: plant trait matrix with missing entries]
Matrix Factorization
• Singular value decomposition
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: a matrix approximated (≈) by a product of factor matrices]
Probabilistic Matrix Factorization (PMF)
[Graphical model: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$, $X_{ij} \sim N(u_i^T v_j, \sigma^2)$]
$u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$, $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent
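
Gradient-based inference for PMF can be sketched as stochastic gradient descent on the observed entries only. Under the Gaussian priors above, the MAP estimate corresponds to squared error plus L2 penalties on the factors. The sketch below is illustrative, not the tutorial's code: the toy `ratings`, the single penalty `lam` shared by both factors, the step size, and the function names are assumptions.

```python
import math
import random

def rmse(ratings, U, V):
    """Root mean squared error over the observed (i, j, r) triples."""
    se = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
             for i, j, r in ratings)
    return math.sqrt(se / len(ratings))

def pmf_sgd(ratings, n_rows, n_cols, rank=2, lam=0.05, lr=0.02,
            epochs=500, seed=0):
    """Fit R_ij ~ u_i . v_j by SGD with L2 regularization on u_i and v_j."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error plus L2 penalty
                U[i][f] += lr * (err * v - lam * u)
                V[j][f] += lr * (err * u - lam * v)
    return U, V

# Hypothetical sparse 3 x 3 ratings matrix as (row, column, value) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
before = rmse(ratings, *pmf_sgd(ratings, 3, 3, epochs=0))
after = rmse(ratings, *pmf_sgd(ratings, 3, 3, epochs=500))
```

Only observed entries enter the updates, so missing entries need no imputation; a prediction for an unobserved pair (i, j) is the inner product of `U[i]` and `V[j]`.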
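Bayesian Probabilistic Matrix Factorization
[Graphical model: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$, $X_{ij} \sim N(u_i^T v_j, \sigma^2)$]
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$, $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Hyperpriors: Gaussian on $\mu_u$, $\mu_v$; Wishart on $\Lambda_u$, $\Lambda_v$
Inference using MCMC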
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, with "hyperprior"
Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
[Figures: result plots for PMF and Bayesian PMF on Netflix]
PPMF vs PMF, BPMF
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  - At round t: Expert prediction, Weight on Experts
  - Combined prediction
  - Nature reveals truth
  - Loss of expert i
  - Update "confidence" on experts
[Figure: experts e1 to e4 with predictions xt1 to xt4 and weights wt1 to wt4, combined into the prediction y]
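
The round structure above (combine, observe the truth, score each expert, update confidence) can be sketched with an exponentially weighted average forecaster. The slide does not specify the loss or the update rule, so the squared loss, the multiplicative update with learning rate `eta`, and the toy data below are assumptions made for illustration.

```python
import math

def weighted_experts(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster with squared loss."""
    n = len(expert_predictions[0])
    w = [1.0 / n] * n                      # start with uniform confidence
    total_loss = 0.0
    for x_t, y_t in zip(expert_predictions, outcomes):
        # Combined prediction: weighted average of the expert predictions
        y_hat = sum(wi * xi for wi, xi in zip(w, x_t))
        total_loss += (y_hat - y_t) ** 2   # nature reveals the truth y_t
        # Loss of each expert, then multiplicative confidence update
        losses = [(xi - y_t) ** 2 for xi in x_t]
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return w, total_loss

outcomes = [1, 0, 1, 1, 0, 1]
# Four hypothetical experts: always right, always wrong, always 0.5, always 1.
expert_predictions = [[y, 1 - y, 0.5, 1] for y in outcomes]
weights, loss = weighted_experts(expert_predictions, outcomes)
best = max(range(len(weights)), key=lambda i: weights[i])
```

Experts with smaller cumulative loss end up with larger weight, so the combined prediction tracks the best expert over time.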
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
[Timeline: x1, x2, x3, x4, x5, …, xT, from Today to 2020]
TM: Choose best w in hindsight
OL: Choose wt before seeing xt (w1, w2, w3, w4, …, wT)
Guarantee: OL will be competitive with TM
Base Models
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$$
$$\frac{1}{T}\sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$$
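
Regret against the best CRP can be computed directly on a small example. The sketch below is a hedged illustration: the two-asset market (one asset flat, the other alternately doubling and halving), the grid search over CRP weights, and the function names are assumptions, and the grid search only works for two assets.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over days of log(w_t . x_t)."""
    return sum(math.log(sum(w * x for w, x in zip(w_t, x_t)))
               for w_t, x_t in zip(portfolios, price_relatives))

def best_crp_two_assets(price_relatives, grid=1000):
    """Grid search for the best constant rebalanced portfolio (two assets)."""
    best_w, best_lw = None, -math.inf
    days = len(price_relatives)
    for g in range(grid + 1):
        w = [g / grid, 1.0 - g / grid]
        lw = log_wealth([w] * days, price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

# Hypothetical market: asset 0 is flat, asset 1 alternately doubles and halves.
price_relatives = [[1.0, 2.0], [1.0, 0.5]] * 5
best_w, best_lw = best_crp_two_assets(price_relatives)

# An "online" strategy that always holds asset 0 keeps wealth at 1 (log wealth 0).
online = [[1.0, 0.0]] * len(price_relatives)
regret = best_lw - log_wealth(online, price_relatives)
```

On this market the best CRP splits wealth evenly and gains a factor of 1.5 × 0.75 = 1.125 every two days, while holding either single asset gains nothing, which is why a CRP is a meaningful benchmark.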
Datasets
• S&P 500
  - S&P 500 Index: 500 large cap, actively traded stocks
  - All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  - Total 385 stocks over 14 years at daily resolution
  - Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  - Widely used dataset in most earlier papers
  - 36 large cap stocks over 22 years at daily resolution
  - From July 1962 to Dec 1984
  - One major bear market: 1973-1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
Universal Algorithms: Competitive with best CRP. Can be outdone by simple heuristics.
Heuristic based methods: Strong empirical performance. No performance guarantee.
Best of both worlds: Competitive with best CRP. Strong empirical performance.
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs $p_{t,j}$; given $x_t$, its wealth gain is $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update
  - $w_t$ updated based on $r_t$
  - Good algorithms get more money
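
One way to realize "good algorithms get more money" is an exponentiated-gradient (EG) style update applied to the base algorithms' wealth gains. The slide does not give the update formula, so the rule below, the learning rate `eta`, and the two toy base algorithms are assumptions chosen for illustration.

```python
import math

def meta_update(w, gains, eta=0.05):
    """EG-style update: w_j <- w_j * exp(eta * r_j / (w . r)), then normalize."""
    mix = sum(wj * rj for wj, rj in zip(w, gains))
    new = [wj * math.exp(eta * rj / mix) for wj, rj in zip(w, gains)]
    norm = sum(new)
    return [v / norm for v in new]

def run_meta(base_portfolios, price_relatives, eta=0.05):
    """base_portfolios[t][j] is the portfolio p_{t,j} of base algorithm j on day t."""
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                      # start by splitting money evenly
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        gains = [sum(p * x for p, x in zip(p_tj, x_t)) for p_tj in p_t]
        wealth *= sum(wj * rj for wj, rj in zip(w, gains))
        w = meta_update(w, gains, eta)
    return w, wealth

days = 20
# Hypothetical market: asset 0 gains 10% daily, asset 1 loses 10% daily.
price_relatives = [[1.1, 0.9]] * days
# Base algorithm 0 holds asset 0; base algorithm 1 holds asset 1.
base_portfolios = [[[1.0, 0.0], [0.0, 1.0]]] * days
w_final, wealth = run_meta(base_portfolios, price_relatives)
```

Weights shift toward the base algorithm with the larger wealth gain, so the meta algorithm's money follows whichever base algorithm has been performing well.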
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  - Clustering
  - Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  - Special (limit) case of mixture modeling
  - Kmeans algorithm: Iteration, Relocation
[Plate diagram: nodes π, h, θ, x; plates n, k]
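
The Kmeans iteration (assign each point to its nearest center, then relocate each center to the mean of its points) can be sketched as below. This is a minimal illustration with hypothetical two-dimensional data and hand-picked initial centers, not a production implementation.

```python
def kmeans(points, centers, iters=20):
    """Alternate nearest-center assignment and center relocation."""
    clusters = []
    for _ in range(iters):
        # Assignment step: each point goes to its closest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Relocation step: move each center to the mean of its cluster
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else list(centers[j])
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
```

An empty cluster keeps its previous center in this sketch; other conventions, such as re-seeding the center, are equally possible.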
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  - Graph Laplacian
  - k-segments: k-eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
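
The link between segments and zero eigenvalues can be checked on a small graph: for the unnormalized Laplacian L = D - W, the indicator vector of each connected component is an eigenvector with eigenvalue 0, so k disconnected segments give k zero eigenvalues. The weight matrix below is a hypothetical example, and the check verifies the null-space property directly instead of computing eigenvalues.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def connected_components(W):
    """Connected components of the graph with an edge wherever W[i][j] > 0."""
    n, seen, comps = len(W), set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if W[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        comps.append(sorted(comp))
    return comps

# Hypothetical weighted graph with two segments: {0, 1, 2} and {3, 4}.
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.5],
    [0.0, 0.0, 0.0, 1.5, 0.0],
]
L = graph_laplacian(W)
components = connected_components(W)
residuals = []
for comp in components:
    indicator = [1.0 if i in comp else 0.0 for i in range(len(W))]
    # L times the indicator vector should be the zero vector
    residuals.append([sum(L[i][j] * indicator[j] for j in range(len(W)))
                      for i in range(len(W))])
```

When segments are only weakly connected instead of fully disconnected, the corresponding eigenvalues are close to 0 rather than exactly 0, which is the case spectral clustering exploits in practice.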
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen to optimize some criterion
  - Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projection is maximized, best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure: PCA vs LDA]
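
For PCA, the eigenvalue problem is solved on the data covariance matrix, and the leading eigenvector is the direction of maximum variance. The sketch below finds that direction by power iteration; the toy data, the fixed starting vector, and the function name are assumptions, and a linear algebra library routine would normally be used instead.

```python
import math

def first_principal_component(data, iters=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d   # assumed start vector, not orthogonal to the answer here
    for _ in range(iters):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

# Hypothetical data lying exactly on the line y = 2x.
data = [(t, 2.0 * t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction = first_principal_component(data)
```

Projecting the centered data onto `direction` gives the one-dimensional representation with the largest variance; for this data the direction is parallel to (1, 2).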
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems, nonlinear, nonstationary dependencies
  - Spatial and Temporal effects, Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions+Expertise+Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results: Drought Detection
Drought in central Africa in the 1960s
Drought in India in the 1960s
Drought in eastern Africa in the 1930s
Drought in Kazakhstan in the 1930s
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Drought in Iran in the 1950s
Drought in central China in the 1950s
Graphical Models (Contd)
• Basics, Inference
• Markov Random Fields
  - Example: Drought Detection
• Mixed Membership Models
  - Example: Text Analysis and Topic Modeling
• Probabilistic Matrix Factorization
  - Example: Matrix Completion and Recommendation Systems
Background: Plate Diagrams
[Figure: nodes a and b with a plate of size 3, shown next to the expanded network with a and b1, b2, b3]
Compact representation of large Bayesian networks
θ
x
d n
d
Intro to Machine Learning
03
1
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Model 1 Independent Features
d=3 n=1
55
θ
x
d n
d
π
z
θ
x
d n
k d
Model 2 Naiumlve Bayes (Mixture Models)
56
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
09
21
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
61
01
21
-15
x
Intro to Machine Learning
Mixture Model vs Mixed Membership Model
-1
3
-05
x 07
31
-1
x 09
21
-2
x
Single component membership Multi-component mixed membership
62
Intro to Machine Learning
Nd D
zi
xi
π (d)
α
p(d) sim Dirichlet(α)
zi sim Discrete(π (d) )
xi sim Discrete(β (zi) )
K
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Latent Dirichlet Allocation (LDA)
βj
63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
bull Manifold Embedding
ndash Preserve local geometric structure
bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps
bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip
bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity
bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies
bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts
Intro to Machine Learning 118
Intro to Machine Learning
Acknowledgements
Amrudin Agovic Qiang Fu Soumyadeep Chatterjee
Hanhuai Shan Puja Das
119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results: Drought Detection
Drought in eastern Africa in the 1930s
Drought in Kazakhstan in the 1930s
Results: Drought Detection
Drought in northern Mexico in the 1950s
Drought in western Europe in the 1940s
Drought in eastern Canada in the 1940s
Results: Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Graphical Models (Contd)
• Basics, Inference
• Markov Random Fields
  – Example: Drought Detection
• Mixed Membership Models
  – Example: Text Analysis and Topic Modeling
• Probabilistic Matrix Factorization
  – Example: Matrix Completion and Recommendation Systems
Background: Plate Diagrams
[Figure: node a with child b inside a plate labeled 3, shown next to the equivalent network in which a has children b1, b2, b3]
Compact representation of large Bayesian networks
Model 1: Independent Features
[Plate diagram: θ (plate d) → x (plates d, n)]
[Figure: three density plots, one per feature, for d=3, n=1; x-axis from -10 to 10, density from 0 to 0.8; values shown for x: 0.3, 1, -2]
Model 2: Naïve Bayes (Mixture Models)
[Plate diagrams: Model 1 (θ → x; plates d, n) next to Model 2 (π → z → x ← θ; plates d, n for x and k, d for θ)]
Naïve Bayes Model
[Plate diagram: π → z → x ← θ; plates n, k, d]
[Figure: six density plots; x-axis from -10 to 10, density from 0 to 0.8; values shown for x: -1, 3, -0.5]
Naïve Bayes Model
[Plate diagram: π → z → x ← θ; plates n, k, d]
[Figure: six density plots; x-axis from -10 to 10, density from 0 to 0.8; values shown for x: 0.1, 2.1, -1.5]
Model 3: Mixed Membership Model
[Plate diagrams: Model 1 (θ → x), Model 2 (π → z → x ← θ), and Model 3 (α → π → z → x ← θ; plates d, n for x and k, d for θ)]
Mixed Membership Models
[Plate diagram: α → π → z → x ← θ; plates d, n, k]
[Figure: six density plots; x-axis from -10 to 10, density from 0 to 0.8; values shown for x: 0.7, 3.1, -1]
Mixed Membership Models
[Plate diagram: α → π → z → x ← θ; plates d, n, k]
[Figure: six density plots; x-axis from -10 to 10, density from 0 to 0.8; values shown for x: 0.9, 2.1, -2]
Mixture Model vs Mixed Membership Model
• Mixture model (single component membership): x = (-1, 3, -0.5) and x = (0.1, 2.1, -1.5)
• Mixed membership model (multi-component mixed membership): x = (0.7, 3.1, -1) and x = (0.9, 2.1, -2)
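The difference between single-component membership and multi-component mixed membership can be sketched by sampling from both models. This is only an illustration: it assumes unit-variance Gaussian components, and the function names, component means, and prior values are hypothetical rather than taken from the slides.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def sample_discrete(probs, rng):
    """Draw index i with probability probs[i]."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def sample_mixture(pi, means, rng):
    """Mixture (naive Bayes) model: one component z generates all d features."""
    z = sample_discrete(pi, rng)
    d = len(means[z])
    x = [rng.gauss(means[z][j], 1.0) for j in range(d)]
    return [z] * d, x

def sample_mixed_membership(alpha, means, rng):
    """Mixed membership: pi ~ Dirichlet(alpha) per data point, then one z per feature."""
    pi = sample_dirichlet(alpha, rng)
    d = len(means[0])
    zs = [sample_discrete(pi, rng) for _ in range(d)]
    x = [rng.gauss(means[zs[j]][j], 1.0) for j in range(d)]
    return zs, x

rng = random.Random(0)
# Hypothetical means[c][j]: k = 2 components, d = 3 features.
means = [[-1.0, 3.0, -0.5], [1.0, 2.0, -2.0]]
z_mix, x_mix = sample_mixture([0.5, 0.5], means, rng)
z_mm, x_mm = sample_mixed_membership([1.0, 1.0], means, rng)
print("mixture:          z =", z_mix, "x =", x_mix)
print("mixed membership: z =", z_mm, "x =", x_mm)
```

In the mixture sample every feature shares the same z, while in the mixed membership sample each feature may come from a different component.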
Latent Dirichlet Allocation (LDA)
[Plate diagram: α → π^(d) → z_i → x_i ← β_j; plates N_d, D, K]
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\beta^{(j)}$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
• Dirichlet priors: α
LDA Generative Model
[Figure: the generative process illustrated for document 1 and document 2]
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
[Plate diagram: α → π^(d) → z_i → x_i ← φ^(j) ← β; plates N_d, D, T]
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• Dirichlet priors: α, β
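The smoothed LDA generative process can be written out as a short sampling routine. This is a minimal sketch under assumptions: symmetric Dirichlet priors, a toy vocabulary, and hypothetical names (`generate_corpus`, `dirichlet`); it only generates data and does not perform the variational inference discussed in the tutorial.

```python
import random

def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def generate_corpus(vocab, num_topics, doc_lengths, alpha, beta, seed=0):
    """Sample documents from the smoothed LDA generative model."""
    rng = random.Random(seed)
    # phi^(j) ~ Dirichlet(beta): distribution over words for each topic
    phi = [dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for n_d in doc_lengths:
        # pi^(d) ~ Dirichlet(alpha): distribution over topics for this document
        pi_d = dirichlet([alpha] * num_topics, rng)
        # z_i ~ Discrete(pi^(d)): topic assignment for each word
        z = rng.choices(range(num_topics), weights=pi_d, k=n_d)
        # x_i ~ Discrete(phi^(z_i)): word generated from the assigned topic
        words = [rng.choices(vocab, weights=phi[t], k=1)[0] for t in z]
        docs.append((pi_d, z, words))
    return phi, docs

# Toy vocabulary and prior values, chosen only for illustration.
vocab = ["runway", "altitude", "passenger", "seat", "engine", "fuel"]
phi, docs = generate_corpus(vocab, num_topics=3, doc_lengths=[8, 5], alpha=0.5, beta=0.5)
for pi_d, z, words in docs:
    print([round(p, 2) for p in pi_d], words)
```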
Aviation Safety Reports (NASA)
Results: NASA Reports I
Top words per topic:
• Arrival/Departure: runway approach departure altitude turn tower air traffic control heading taxi way flight
• Passenger: passenger attendant flight seat medical captain attendants lavatory told police
• Maintenance: maintenance engine mel zzz air craft installed check inspection fuel work
Results: NASA Reports II
Top words per topic:
• Medical Emergency: medical passenger doctor attendant oxygen emergency paramedics flight nurse aed
• Wheel Maintenance: tire wheel assembly nut spacer main axle bolt missing tires
• Weather Condition: knots turbulence aircraft degrees ice winds wind speed air speed conditions
• Departure: departure sid dme altitude climbing mean sea level heading procedure turn degree
Two-Dimensional Visualization for Reports
Red: Flight Crew; Blue: Passenger; Green: Maintenance
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
Two-Dimensional Visualization for Reports
Red: Flight Crew; Blue: Passenger; Green: Maintenance
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
Two-Dimensional Visualization for Reports
Red: Flight Crew; Blue: Passenger; Green: Maintenance
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Red: Flight Crew; Blue: Passenger; Green: Maintenance
• Flight Crew 0.7039, Passenger 0.0009, Maintenance 0.2953
• Flight Crew 0.1405, Passenger 0.0663, Maintenance 0.7932
• Flight Crew 0.2563, Passenger 0.6599, Maintenance 0.0837
• Flight Crew 0.0013, Passenger 0.0013, Maintenance 0.9973
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: Infer number of topics
  – Hierarchical Dirichlet processes: Infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
• Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
• Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
[Figure: movie ratings matrix]
Advertisements on the Web
• Webpages: Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks; …
• Products: Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
[Figure: Click-Through-Rate matrix of webpages by products, with only some entries observed]
Forest Ecology
• Plants (rows) by Traits (columns): Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: Plant Trait Matrix with missing entries (TRY db; Jens Kattge, Peter Reich, et al.)]
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: a matrix ≈ a product of two smaller factor matrices]
Probabilistic Matrix Factorization (PMF)
[Graphical model: latent vectors u_i and v_j, each with a Gaussian prior, generate the observed entry]
$u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent
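Under the Gaussian model above, the MAP estimate minimizes squared error over the observed entries plus an L2 penalty on the latent vectors. A minimal sketch of that fit follows; it assumes per-entry (stochastic) gradient steps as the gradient-descent variant, and the function names, step size `lr`, penalty `lam`, and toy data are all hypothetical.

```python
import math
import random

def pmf_sgd(ratings, n_rows, n_cols, rank=1, lr=0.02, lam=0.01, epochs=2000, seed=0):
    """Fit R_ij ~ u_i . v_j on observed entries only.

    ratings: list of (i, j, value) triples; missing entries are simply absent.
    lam plays the role of the noise-to-prior variance ratio in the MAP objective.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - lam * u)  # gradient step for u_i
                V[j][f] += lr * (err * u - lam * v)  # gradient step for v_j
    return U, V

def rmse(ratings, U, V):
    """Root mean squared error on the given (i, j, value) triples."""
    se = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2 for i, j, r in ratings)
    return math.sqrt(se / len(ratings))

# Toy rank-1 matrix a_i * b_j with three entries held out as "missing".
a, b = [1.0, 1.5, 2.0, 0.5], [1.0, 2.0, 1.5]
missing = {(0, 2), (1, 0), (3, 1)}
observed = [(i, j, a[i] * b[j]) for i in range(4) for j in range(3) if (i, j) not in missing]

U, V = pmf_sgd(observed, n_rows=4, n_cols=3, rank=1)
train_rmse = rmse(observed, U, V)
pred_02 = U[0][0] * V[2][0]  # prediction for the held-out entry (0, 2); true value is 1.5
print(train_rmse, pred_02)
```

Because the loop only visits observed triples, missing entries need no special handling, which is the property that plain SVD lacks.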
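Bayesian Probabilistic Matrix Factorization
[Graphical model: as in PMF, with hyperpriors on the parameters of the Gaussian priors]
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Hyperpriors: Gaussian on $\mu_u$, $\mu_v$; Wishart on $\Lambda_u$, $\Lambda_v$
Inference using MCMC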
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  – Learning with Expert Advice
  – Online Convex Optimization
  – Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  – Combined prediction
  – Nature reveals truth y
  – Loss of expert i
  – Update "confidence" on experts
[Figure: experts e1 to e4 output predictions x_{t,1}, …, x_{t,4}, which are combined with weights w_{t,1}, …, w_{t,4} to predict y]
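One round-by-round loop matching the steps above can be sketched as follows. The slide does not say which update rule is used; this sketch assumes an exponentially weighted (Hedge-style) update with squared loss, and the names `hedge_update`, `run_experts`, and the learning rate `eta` are hypothetical.

```python
import math

def hedge_update(weights, losses, eta=0.5):
    """Multiplicative update: w_i <- w_i * exp(-eta * loss_i), then renormalize."""
    new = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]

def run_experts(expert_predictions, outcomes, eta=0.5):
    """expert_predictions[t][i] is expert i's prediction at round t."""
    m = len(expert_predictions[0])
    w = [1.0 / m] * m                      # start with uniform confidence
    combined = []
    for preds, y in zip(expert_predictions, outcomes):
        # Combined prediction: weighted average of the expert outputs.
        combined.append(sum(wi * p for wi, p in zip(w, preds)))
        # Nature reveals y; each expert suffers a squared loss.
        losses = [(p - y) ** 2 for p in preds]
        # Update confidence on experts.
        w = hedge_update(w, losses, eta)
    return w, combined

# Toy run: three experts always predict 0.0, 0.5, 1.0 and the truth is always 1.0.
T = 10
w_final, combined = run_experts([[0.0, 0.5, 1.0]] * T, [1.0] * T)
print(w_final, combined[0], combined[-1])
```

The weight moves toward the expert with the smallest cumulative loss, so the combined prediction drifts from the initial average toward that expert's output.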
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
• TM: Choose the best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$
Guarantee: OL will be competitive with TM
[Figure: timeline of observations x1, x2, …, xT running from "Today" to "2020"; OL picks w1, w2, …, wT along the way, one before each observation, using the base models]
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: difference of log wealth between the best CRP and the online strategy
$\sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \sum_{t=1}^{T} \log(w \cdot x_t) - o(T)$
(left-hand side: on-line strategy; right-hand sum: best CRP)
$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$
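The regret definition, the difference in log wealth between the best CRP in hindsight and an online strategy, can be computed directly on a toy market. The sketch below assumes two assets and finds the best CRP by grid search; the market, the cash-only online strategy, and the function names are hypothetical illustrations.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t); portfolios[t] is the allocation used on day t."""
    return sum(math.log(sum(w * x for w, x in zip(w_t, x_t)))
               for w_t, x_t in zip(portfolios, price_relatives))

def best_crp_two_assets(price_relatives, grid=100):
    """Grid search over constant rebalanced portfolios (1 - b, b)."""
    best_b, best_lw = 0.0, -math.inf
    for k in range(grid + 1):
        b = k / grid
        lw = log_wealth([(1 - b, b)] * len(price_relatives), price_relatives)
        if lw > best_lw:
            best_b, best_lw = b, lw
    return best_b, best_lw

# Toy market: asset 0 is cash, asset 1 alternately doubles and halves.
T = 20
x = [(1.0, 2.0) if t % 2 == 0 else (1.0, 0.5) for t in range(T)]

online = [(1.0, 0.0)] * T              # a naive online strategy: hold cash only
b_star, lw_star = best_crp_two_assets(x)
regret = lw_star - log_wealth(online, x)
print(b_star, lw_star, regret)
```

Holding either asset alone ends with the starting wealth, while rebalancing to half and half each day multiplies wealth by 1.5 × 0.75 = 1.125 over every two-day cycle, which is why the best CRP is a meaningful benchmark.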
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
• Universal algorithms: competitive with the best CRP, but can be outdone by simple heuristics
• Heuristic-based methods: strong empirical performance, but no performance guarantee
• Best of both worlds: competitive with the best CRP and strong empirical performance
Meta Algorithm
• m base algorithms (Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m); on day t, base algorithm j outputs portfolio $p_{t,j}$
• Wealth gain of base algorithm j on day t: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update: $w_t$ is updated based on $r_t$
• Good algorithms get more money
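A single day of the meta algorithm can be sketched as below. The slide only says that the weights are updated based on the wealth gains; the multiplicative, EG-style rule used here is an assumed choice for illustration, and `meta_step`, `eta`, and the toy base algorithms are hypothetical.

```python
import math

def meta_step(w, base_portfolios, x, eta=0.05):
    """One day of the meta algorithm.

    w: current distribution over the m base algorithms
    base_portfolios[j]: portfolio p_{t,j} proposed by base algorithm j
    x: the day's market vector x_t
    """
    # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
    r = [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in base_portfolios]
    # Wealth gain of the meta algorithm, which splits money according to w.
    meta_gain = sum(w_j * r_j for w_j, r_j in zip(w, r))
    # Assumed EG-style update: algorithms with larger gains get more weight.
    new_w = [w_j * math.exp(eta * r_j / meta_gain) for w_j, r_j in zip(w, r)]
    total = sum(new_w)
    return [v / total for v in new_w], r, meta_gain

# Toy setting: two assets, three base algorithms with fixed portfolios.
base = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
w = [1.0 / 3] * 3
log_wealth = 0.0
for _ in range(50):
    w, r, gain = meta_step(w, base, x=(1.0, 1.1))
    log_wealth += math.log(gain)
print(w, log_wealth)
```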
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration Relocation
[Plate diagram: π → h → x ← θ; plates n, k]
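The Kmeans loop alternates between assigning points to the nearest centroid and relocating each centroid to the mean of its points. A minimal sketch, assuming squared Euclidean distance and caller-supplied initial centroids (the function name and toy data are hypothetical):

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd-style Kmeans on tuples of floats; returns (labels, centroids)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Relocation step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

The result depends on the initial centroids; in practice several restarts are common.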
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm," NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
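The link between segments and zero eigenvalues can be checked on a tiny graph: for the unnormalized Laplacian L = D − W, the indicator vector of each disconnected segment satisfies L v = 0. The sketch below only verifies this property; it is not a full spectral clustering pipeline, and the graph and function names are hypothetical.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Toy graph with two segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = graph_laplacian(W)

ind_a = [1, 1, 1, 0, 0]   # indicator vector of the first segment
ind_b = [0, 0, 0, 1, 1]   # indicator vector of the second segment
print(matvec(L, ind_a), matvec(L, ind_b))
```

Both products are zero vectors, so the two indicators are independent eigenvectors with eigenvalue 0. When the segments are only weakly connected, these eigenvalues become small rather than exactly zero, which is the "(close to) 0" case.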
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA and LDA projections side by side]
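For PCA, the eigenvalue problem is on the data covariance matrix, and the top eigenvector is the direction of maximum variance. A minimal two-dimensional sketch follows, assuming power iteration as the eigen-solver; the function names and toy data are hypothetical.

```python
import math

def covariance_2d(points):
    """Covariance matrix of 2-D points (population normalization)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]

def top_eigenvector(C, iterations=100):
    """Power iteration for the leading eigenvector of a 2x2 covariance matrix.

    Assumes C is nonzero and that the start vector is not orthogonal to the answer.
    """
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [C[0][0] * v[0] + C[0][1] * v[1], C[1][0] * v[0] + C[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v

# Toy data scattered around the line y = 2x.
points = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 1.9), (2.0, 4.0)]
C = covariance_2d(points)
v = top_eigenvector(C)

# Variance of the data projected onto v, i.e. v^T C v.
var_along_v = sum(v[i] * C[i][j] * v[j] for i in range(2) for j in range(2))
print(v, var_along_v)
```

The recovered direction is close to (1, 2) normalized, and the variance along it exceeds the variance along either coordinate axis.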
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and temporal effects; gradual vs abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results Drought Detection
Drought in Iran in the 1950s
Drought in central China in the 1950s
Intro to Machine Learning 52
Graphical Models (Contd.)
• Basics, Inference
• Markov Random Fields
  – Example: Drought Detection
• Mixed Membership Models
  – Example: Text Analysis and Topic Modeling
• Probabilistic Matrix Factorization
  – Example: Matrix Completion and Recommendation Systems
Intro to Machine Learning 53
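Background: Plate Diagrams
[Figure: plate notation (nodes a and b, with b inside a plate labeled 3) beside the unrolled network (nodes a, b1, b2, b3)]
Compact representation of large Bayesian networks
Intro to Machine Learning 54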
Model 1: Independent Features
[Figure: plate diagram with nodes θ and x over plates d and n; for d=3, n=1, the sample x = (0.3, 1, -2) is shown with three density plots (x-axis -10 to 10, y-axis 0 to 0.8)]
Intro to Machine Learning 55
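Model 2: Naïve Bayes (Mixture Models)
[Figure: two plate diagrams: Model 1 (nodes θ, x; plates d, n) and Model 2 (nodes π, z, θ, x; plates d, n, k)]
Intro to Machine Learning 56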
Naïve Bayes Model
[Figure: plate diagram with nodes π, z, θ, x over plates n, k, d; the sample x = (-1, 3, -0.5) is shown with six density plots (x-axis -10 to 10, y-axis 0 to 0.8)]
Intro to Machine Learning 57
Naïve Bayes Model
[Figure: plate diagram with nodes π, z, θ, x over plates n, k, d; the sample x = (0.1, 2.1, -1.5) is shown with six density plots (x-axis -10 to 10, y-axis 0 to 0.8)]
Intro to Machine Learning 58
Model 3: Mixed Membership Model
[Figure: three plate diagrams: Model 1 (nodes θ, x), Model 2 (nodes π, z, θ, x), and Model 3 (nodes α, π, z, x, θ); plates d, n, k]
Intro to Machine Learning 59
Mixed Membership Models
[Figure: plate diagram with nodes α, π, z, x, θ over plates d, n, k; the sample x = (0.7, 3.1, -1) is shown with six density plots (x-axis -10 to 10, y-axis 0 to 0.8)]
Intro to Machine Learning 60
Mixed Membership Models
[Figure: plate diagram with nodes α, π, z, x, θ over plates d, n, k; the sample x = (0.9, 2.1, -2) is shown with six density plots (x-axis -10 to 10, y-axis 0 to 0.8)]
Intro to Machine Learning 61
Mixture Model vs Mixed Membership Model
[Figure: sample vectors x = (0.1, 2.1, -1.5), (-1, 3, -0.5), (0.7, 3.1, -1), (0.9, 2.1, -2)]
Single component membership vs. multi-component mixed membership
Intro to Machine Learning 62
Latent Dirichlet Allocation (LDA)
[Figure: plate diagram with nodes α, π^(d), z_i, x_i, β_j and plates N_d, D, K]
$\alpha$: Dirichlet priors
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\beta_j$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\beta_{z_i})$: word generated from assigned topic
Intro to Machine Learning 63
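The generative story on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's code: the function names, the values of `alpha` and `beta`, and the five-word vocabulary are assumptions made up for the example, and only the standard library is used.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_document(alpha, beta, num_words, rng):
    """Generate one document: returns (pi_d, topic assignments, word ids)."""
    pi_d = sample_dirichlet(alpha, rng)              # pi^(d) ~ Dirichlet(alpha)
    topic_ids = range(len(alpha))
    vocab_ids = range(len(beta[0]))
    topics, words = [], []
    for _ in range(num_words):
        z = rng.choices(topic_ids, weights=pi_d)[0]      # z_i ~ Discrete(pi^(d))
        x = rng.choices(vocab_ids, weights=beta[z])[0]   # x_i ~ Discrete(beta_{z_i})
        topics.append(z)
        words.append(x)
    return pi_d, topics, words

# Hypothetical setup: K = 3 topics over a vocabulary of 5 word ids.
rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]
beta = [
    [0.7, 0.2, 0.1, 0.0, 0.0],   # topic 0 favors word ids 0-2
    [0.0, 0.1, 0.8, 0.1, 0.0],   # topic 1 favors word id 2
    [0.0, 0.0, 0.1, 0.3, 0.6],   # topic 2 favors word ids 3-4
]
pi_d, topics, words = generate_document(alpha, beta, num_words=20, rng=rng)
```

Each word can come from a different topic within the same document, which is the mixed membership property contrasted with mixture models on the earlier slides.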
LDA Generative Model
[Figure: generative process illustrated for document1 and document2]
Intro to Machine Learning 64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Smoothed Latent Dirichlet Allocation
[Figure: plate diagram with nodes α, π^(d), z_i, x_i, φ^(j), β and plates N_d, D, T]
$\alpha$, $\beta$: Dirichlet priors
$\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
$\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
Intro to Machine Learning 67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Results: NASA Reports I
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Intro to Machine Learning 69
Results: NASA Reports II
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Intro to Machine Learning 70
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
Intro to Machine Learning 71
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
Intro to Machine Learning 72
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Intro to Machine Learning 73
Mixed Membership of Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
Flight Crew: 0.7039, Passenger: 0.0009, Maintenance: 0.2953
Flight Crew: 0.1405, Passenger: 0.0663, Maintenance: 0.7932
Flight Crew: 0.2563, Passenger: 0.6599, Maintenance: 0.0837
Flight Crew: 0.0013, Passenger: 0.0013, Maintenance: 0.9973
Intro to Machine Learning 74
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: Infer number of topics
  – Hierarchical Dirichlet processes: Infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Intro to Machine Learning 75
Recommendation Systems
Users: Age: 28, Gender: Male, Job: Salesman, Interest: Travel, …
Movies: Title: Gone with the Wind, Release year: 1940, Cast: Vivien Leigh, Clark Gable, Genre: War, Romance, Awards: 8 Oscars, Keywords: Love, Civil war, …
Movie ratings matrix
Intro to Machine Learning 76
Advertisements on the Web
Webpages: Category: Baby, URL: babyearth.com, Content: Webpage text, Hyperlinks
Products: Category: Sports shoes, Brand: Nike, Ratings: 425, …
[Figure: Click-Through-Rate matrix over webpages and products, with missing entries]
Intro to Machine Learning 77
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich et al.)
Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: matrix of plants by traits, with missing entries]
Intro to Machine Learning 78
Matrix Factorization
• Singular value decomposition
[Figure: matrix ≈ product of factor matrices]
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
Intro to Machine Learning 79
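Probabilistic Matrix Factorization (PMF)
[Figure: matrix entry $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with row factor $u_i^T$ drawn from $N(0, \sigma_u^2 I)$ and column factor $v_j$ drawn from $N(0, \sigma_v^2 I)$]
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent
Intro to Machine Learning 80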
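Under the Gaussian model above, MAP estimation amounts to minimizing squared error on the observed entries plus an L2 penalty on the factors. The sketch below uses stochastic (per-entry) gradient steps rather than full-batch gradient descent, on a made-up 4×3 matrix with two entries held out. The function names, the step size `lr`, the penalty `lam` (standing in for the variance ratios), and the data are all assumptions for illustration.

```python
import math
import random

def predict(U, V, i, j):
    """Predicted entry u_i^T v_j."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def rmse(ratings, U, V):
    """Root mean squared error over the given (i, j, r) triples."""
    sq = sum((r - predict(U, V, i, j)) ** 2 for i, j, r in ratings)
    return math.sqrt(sq / len(ratings))

def init_factors(count, rank, rng):
    """Small random Gaussian initialization of `count` factor vectors."""
    return [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(count)]

def pmf_sgd(ratings, U, V, lam=0.05, lr=0.02, epochs=500):
    """Update U and V in place using only the observed entries."""
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - predict(U, V, i, j)
            for k in range(len(U[i])):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (err * v_jk - lam * u_ik)
                V[j][k] += lr * (err * u_ik - lam * v_jk)

# Hypothetical data: a rank-1 matrix with entries (0, 2) and (3, 1) unobserved.
row_scale = [1.0, 2.0, 3.0, 4.0]
col_scale = [1.0, 0.5, 1.25]
held_out = {(0, 2), (3, 1)}
ratings = [(i, j, row_scale[i] * col_scale[j])
           for i in range(4) for j in range(3) if (i, j) not in held_out]

rng = random.Random(0)
U = init_factors(4, 2, rng)
V = init_factors(3, 2, rng)
rmse_before = rmse(ratings, U, V)
pmf_sgd(ratings, U, V)
rmse_after = rmse(ratings, U, V)
```

Because the loop only visits observed triples, missing entries need no special handling, which is the point made on the Matrix Factorization slide about sparse matrices.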
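Bayesian Probabilistic Matrix Factorization
[Figure: matrix entry $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with row factor $u_i^T$ drawn from $N(\mu_u, \Lambda_u)$ and column factor $v_j$ drawn from $N(\mu_v, \Lambda_v)$]
Wishart: $\Lambda_u \sim W(\nu_0, W_0)$, $\Lambda_v \sim W(\nu_0, W_0)$
Gaussian: $\mu_u \sim N(\mu_0, \Lambda_u)$, $\mu_v \sim N(\mu_0, \Lambda_v)$
$u_i \sim N(\mu_u, \Lambda_u)$
$v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using MCMC
Intro to Machine Learning 81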
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$. Diagonal covariance.
BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance with “hyperprior”.
Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance but no “hyperprior”.
Intro to Machine Learning 82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update “confidence” on experts
[Figure: experts e1, e2, e3, e4 produce predictions x_{t,1}, ..., x_{t,4}, which are combined with weights w_{t,1}, ..., w_{t,4} into the output y]
Intro to Machine Learning 90
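One common instance of this loop is the exponentially weighted average forecaster. The slide does not say which update rule it has in mind, so the sketch below is an assumption: squared loss, a learning rate `eta` of 0.5, and four made-up experts.

```python
import math

def exponential_weights(expert_predictions, outcomes, eta=0.5):
    """Run the exponentially weighted average forecaster with squared loss."""
    num_experts = len(expert_predictions[0])
    weights = [1.0 / num_experts] * num_experts
    learner_loss = 0.0
    expert_loss = [0.0] * num_experts
    for x_t, y_t in zip(expert_predictions, outcomes):
        y_hat = sum(w * x for w, x in zip(weights, x_t))   # combined prediction
        learner_loss += (y_hat - y_t) ** 2                 # nature reveals y_t
        losses = [(x - y_t) ** 2 for x in x_t]             # loss of each expert
        expert_loss = [a + b for a, b in zip(expert_loss, losses)]
        # Update "confidence": down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, learner_loss, expert_loss

# Hypothetical experts: always 0, always 1, always 0.5, and a mostly-right one.
outcomes = [1.0 if t % 3 == 0 else 0.0 for t in range(50)]
expert_predictions = [(0.0, 1.0, 0.5, 0.8 if y == 1.0 else 0.2) for y in outcomes]
weights, learner_loss, expert_loss = exponential_weights(expert_predictions, outcomes)
```

With predictions and outcomes in [0, 1] and `eta` at most 0.5, squared loss is exp-concave, so the learner's cumulative loss exceeds the best expert's by at most ln(N)/eta, independent of the number of rounds.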
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
TM: Choose best w in hindsight
OL: Choose $w_t$ before seeing $x_t$
[Figure: timeline from Today to 2020 with observations x1, x2, x3, x4, x5, ..., xT; TM uses a single w, OL uses w1, w2, w3, w4, ..., wT; label: Base Models]
Guarantee: OL will be competitive with TM
Intro to Machine Learning 91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
Regret: Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
On-Line: $\sum_{t=1}^{T} \log(w_t \cdot x_t)$
Best CRP: $\sum_{t=1}^{T} \log(w \cdot x_t)$
$\sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \sum_{t=1}^{T} \log(w \cdot x_t) - o(T)$
$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$
Intro to Machine Learning 97
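The two log-wealth sums can be computed directly on a toy market. The sketch below is an illustration under stated assumptions: two assets (cash, and a stock that alternately doubles and halves), a grid search for the best CRP, and an exponentiated-gradient style multiplicative update as the online strategy with an illustrative `eta`. EG appears by name in the result slides, but this particular form and these parameter values are not taken from the tutorial.

```python
import math

def log_wealth_crp(weights, price_relatives):
    """Log wealth of a constant rebalanced portfolio: sum_t log(w . x_t)."""
    return sum(math.log(sum(w * x for w, x in zip(weights, x_t)))
               for x_t in price_relatives)

def eg_log_wealth(price_relatives, eta=0.05):
    """Log wealth of an online strategy that picks w_t before seeing x_t."""
    n = len(price_relatives[0])
    w = [1.0 / n] * n
    total = 0.0
    for x_t in price_relatives:
        gain = sum(wi * xi for wi, xi in zip(w, x_t))   # w_t . x_t
        total += math.log(gain)
        # Multiplicative update, then renormalize onto the simplex.
        w = [wi * math.exp(eta * xi / gain) for wi, xi in zip(w, x_t)]
        s = sum(w)
        w = [wi / s for wi in w]
    return total

# Hypothetical market: asset 0 is cash, asset 1 alternately doubles and halves.
price_relatives = [(1.0, 2.0), (1.0, 0.5)] * 10
grid = [k / 100 for k in range(101)]
best_b = max(grid, key=lambda b: log_wealth_crp((1 - b, b), price_relatives))
best_crp = log_wealth_crp((1 - best_b, best_b), price_relatives)
online = eg_log_wealth(price_relatives)
regret = best_crp - online
```

In this market neither asset grows on its own, yet the 50/50 CRP gains a factor of 1.125 every two days, which is why the best CRP is a meaningful benchmark.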
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Intro to Machine Learning 98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Motivation for Meta Algorithms
Universal Algorithms: competitive with best CRP; can be outdone by simple heuristics
Heuristic-based methods: strong empirical performance; no performance guarantee
Best of both worlds: competitive with best CRP; strong empirical performance
Intro to Machine Learning 103
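Meta Algorithm
[Figure: on day t, base algorithms 1, 2, 3, ..., m output portfolios p_{t,1}, p_{t,2}, p_{t,3}, ..., p_{t,m}; given x_t, their wealth gains are p_{t,1} · x_t, p_{t,2} · x_t, p_{t,3} · x_t, ..., p_{t,m} · x_t; the meta algorithm holds weights w(1), w(2), w(3), ..., w(m)]
Wealth gain: $r_{t,j} = p_{t,j} \cdot x_t$
m base algorithms
$w_t$: distribution over algorithms
Weight update
• $w_t$ updated based on $r_t$
• Good algorithms get more money
Intro to Machine Learning 104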
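A minimal sketch of this loop follows, assuming the simplest possible update: each algorithm's weight is multiplied by its own wealth gain and the weights are renormalized. The slide does not spell out its update rule, so this is not necessarily the MAEG or MAONS variant from the result slides. The base algorithms here are fixed portfolios, and all names and data are hypothetical.

```python
def run_meta(base_portfolios, price_relatives):
    """Split money across m base algorithms and reweight by wealth gain."""
    m = len(base_portfolios)
    w = [1.0 / m] * m                    # w_t: distribution over algorithms
    meta_wealth = 1.0
    base_wealth = [1.0] * m
    for x_t in price_relatives:
        # r_{t,j} = p_{t,j} . x_t for each base algorithm j
        r_t = [sum(p * x for p, x in zip(p_j, x_t)) for p_j in base_portfolios]
        meta_wealth *= sum(wj * rj for wj, rj in zip(w, r_t))
        base_wealth = [bw * rj for bw, rj in zip(base_wealth, r_t)]
        # Good algorithms get more money: scale each weight by its gain.
        w = [wj * rj for wj, rj in zip(w, r_t)]
        total = sum(w)
        w = [wj / total for wj in w]
    return meta_wealth, base_wealth, w

# Hypothetical base algorithms: all cash, all stock, and a 50/50 rebalancer.
base_portfolios = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
price_relatives = [(1.0, 2.0), (1.0, 0.5)] * 10
meta_wealth, base_wealth, w = run_meta(base_portfolios, price_relatives)
```

With this particular update and uniform starting weights, the meta algorithm's final wealth equals the average of the base algorithms' final wealths, so it can trail the best base algorithm by at most a factor of m.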
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
Model-based Clustering
• Mixture Modeling
  – [equation omitted in transcript]
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – [equation omitted in transcript]
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
[Figure: plate diagram with nodes π, h, θ, x and plates n, k]
Intro to Machine Learning 114
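The Kmeans iteration alternates between relocating points to their nearest center and recomputing the centers. The sketch below assumes squared Euclidean distance and caller-supplied initial centers; the function names and the six sample points are made up for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, num_iters=20):
    """Alternate assignment and center updates until the centers stop moving."""
    labels = []
    for _ in range(num_iters):
        # Relocation step: assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups and a deliberately poor start.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
```

Even though both initial centers sit in the first group, the second center is pulled toward the far group after one update and the assignments settle within three iterations.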
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
[Figure credits: A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001; www.cs.berkeley.edu/~fowlkes/BSE]
Intro to Machine Learning 115
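The link between segments and zero eigenvalues can be checked on a small graph. The sketch below assumes the unnormalized Laplacian L = D - W and a made-up five-node graph with two disconnected components; rather than computing eigenvalues, it verifies that each component's indicator vector is mapped to zero by L, so eigenvalue 0 has multiplicity at least two.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(W)
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical graph: nodes 0-2 form one segment, nodes 3-4 another.
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.5],
    [0.0, 0.0, 0.0, 1.5, 0.0],
]
L = graph_laplacian(W)
segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]
image_a = matvec(L, segment_a)           # zero vector: eigenvalue 0
image_b = matvec(L, segment_b)           # zero vector: eigenvalue 0
image_single = matvec(L, [1.0, 0.0, 0.0, 0.0, 0.0])   # not an eigenvector for 0
```

When the segments are only weakly connected instead of fully disconnected, these eigenvalues move slightly away from zero, which matches the "(close to) 0" wording on the slide.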
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projection is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA and LDA projections]
Intro to Machine Learning 116
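For PCA the eigenvalue problem is on the sample covariance matrix, and the top eigenvector is the direction of maximum projected variance. The sketch below finds it by power iteration, which is an illustrative choice rather than anything the slide prescribes; the function name, the starting vector, and the five sample points are assumptions.

```python
import math

def top_principal_component(points, num_iters=200):
    """Return the leading eigenvector of the sample covariance and its variance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(d)] for a in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(num_iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

# Hypothetical data lying close to the line y = x.
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, variance = top_principal_component(points)
```

For this data the recovered direction is close to (0.707, 0.707), and the variance along it accounts for nearly all of the total variance, so a one-dimensional projection loses very little.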
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman’s 3-step ‘algorithm’ for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Intro to Machine Learning 118
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
Intro to Machine Learning 119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Graphical Models (Contd)
bull Basics Inference
bull Markov Random Fields ndash Example Draught Detection
bull Mixed Membership Models ndash Example Text Analysis and Topic Modeling
bull Probabilistic Matrix Factorization ndash Example Matrix Completion and Recommendation Systems
Intro to Machine Learning 53
Intro to Machine Learning
Background Plate Diagrams
a
b
3
a
b1 b2 b3
Compact representation of large Bayesian networks
54
θ
x
d n
d
Intro to Machine Learning
03
1
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Model 1 Independent Features
d=3 n=1
55
θ
x
d n
d
π
z
θ
x
d n
k d
Model 2 Naiumlve Bayes (Mixture Models)
56
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
09
21
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
61
01
21
-15
x
Intro to Machine Learning
Mixture Model vs Mixed Membership Model
-1
3
-05
x 07
31
-1
x 09
21
-2
x
Single component membership Multi-component mixed membership
62
Intro to Machine Learning
Nd D
zi
xi
π (d)
α
p(d) sim Dirichlet(α)
zi sim Discrete(π (d) )
xi sim Discrete(β (zi) )
K
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Latent Dirichlet Allocation (LDA)
βj
63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
• Weighted graph with weights [symbols not recovered]
• Spectral Clustering
  - Graph Laplacian
  - k segments: k eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On Spectral Clustering: Analysis and an Algorithm", NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
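The link between segments and near-zero eigenvalues can be checked on a toy graph. The sketch below assumes the unnormalized Laplacian L = D - W and a hypothetical 5-node graph with two disconnected groups; the edge weights are made up for illustration.

```python
import numpy as np

# Hypothetical weighted graph: nodes {0, 1, 2} and {3, 4} are disconnected.
W = np.zeros((5, 5))
for i, j, weight in [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5), (3, 4, 1.5)]:
    W[i, j] = W[j, i] = weight

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
n_zero = int(np.sum(np.abs(eigvals) < 1e-9))  # one zero eigenvalue per segment

# Rows of the eigenvectors for the zero eigenvalues embed the nodes;
# nodes in the same segment map to the same point.
embedding = eigvecs[:, :n_zero]
```

With noisy, weakly connected segments the eigenvalues are only close to 0, and a clustering method such as K-means is typically run on the rows of `embedding`.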
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen to optimize some criterion
  - Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure panels: PCA, LDA]
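For PCA, the eigenvalue problem is on the sample covariance matrix. A minimal sketch, assuming NumPy and a hypothetical helper `pca`; the synthetic data lie near the line y = 2x so the leading direction is known in advance.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigen-decomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]           # directions of maximal variance
    return Xc @ components, components, eigvals[order]

# Synthetic data concentrated along the direction (1, 2).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=200)])
Z, components, variances = pca(X, n_components=1)
```

The variance of the projected data `Z` equals the leading eigenvalue, which is the quantity PCA maximizes.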
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems; nonlinear, nonstationary dependencies
  - Spatial and Temporal effects; Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Background: Plate Diagrams
[Figure: plate diagram with nodes a and b (plate of size 3), shown with its expansion into nodes a, b1, b2, b3]
Compact representation of large Bayesian networks
Model 1: Independent Features
[Plate diagram: nodes θ, x; plates d, n]
[Figure: three per-feature density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8); example x = (0.3, 1, -2)]
d = 3, n = 1
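Model 2: Naïve Bayes (Mixture Models)
[Plate diagrams: independent-features model (nodes θ, x; plates d, n) beside the naïve Bayes model (nodes π, z, θ, x; plates d, n, k)]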
Naïve Bayes Model
[Plate diagram: nodes π, z, θ, x; plates n, k, d]
[Figure: six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8); example x = (-1, 3, -0.5)]
Naïve Bayes Model
[Plate diagram: nodes π, z, θ, x; plates n, k, d]
[Figure: six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8); example x = (0.1, 2.1, -1.5)]
Model 3: Mixed Membership Model
[Plate diagrams: Model 1 (nodes θ, x; plates d, n), Model 2 (nodes π, z, θ, x; plates d, n, k), and Model 3 (nodes α, π, z, θ, x; plates d, n, k)]
Mixed Membership Models
[Plate diagram: nodes α, π, z, θ, x; plates d, n, k]
[Figure: six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8); example x = (0.7, 3.1, -1)]
Mixed Membership Models
[Plate diagram: nodes α, π, z, θ, x; plates d, n, k]
[Figure: six density plots (horizontal axis -10 to 10, vertical axis 0 to 0.8); example x = (0.9, 2.1, -2)]
Mixture Model vs Mixed Membership Model
• Mixture model: single component membership; examples x = (-1, 3, -0.5) and x = (0.1, 2.1, -1.5)
• Mixed membership model: multi-component mixed membership; examples x = (0.7, 3.1, -1) and x = (0.9, 2.1, -2)
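The contrast between the two models is easiest to see in how a single data point is generated. The sketch below is illustrative only: it assumes unit-variance Gaussian features, hypothetical component means `means`, and d = 3 features with k = 3 components.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 3, 3
# Hypothetical parameters θ: one mean per (component, feature) pair.
means = rng.normal(0.0, 3.0, size=(k, d))

def sample_mixture(pi):
    """Mixture (naive Bayes): one component z generates every feature."""
    z = rng.choice(k, p=pi)
    x = rng.normal(means[z, np.arange(d)], 1.0)
    return np.full(d, z), x

def sample_mixed_membership(alpha):
    """Mixed membership: each feature draws its own component z_j."""
    pi = rng.dirichlet(alpha)            # per-data-point mixing weights
    z = rng.choice(k, size=d, p=pi)      # one component per feature
    x = rng.normal(means[z, np.arange(d)], 1.0)
    return z, x

z_mix, x_mix = sample_mixture(np.array([0.2, 0.5, 0.3]))
z_mm, x_mm = sample_mixed_membership(np.ones(k))
```

In the mixture sample all entries of `z_mix` agree, while `z_mm` may differ from feature to feature.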
Latent Dirichlet Allocation (LDA)
[Plate diagram: nodes α, π^(d), z_i, x_i, β_j; plates N_d, D, K]
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\beta_j$, $j = 1, \dots, K$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
• α: Dirichlet priors
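The generative process above translates almost line by line into code. This is a small sketch under stated assumptions: the topic-word matrix `beta`, the two-topic vocabulary of four words, and the document lengths are invented for illustration.

```python
import numpy as np

def generate_corpus(alpha, beta, doc_lengths, seed=0):
    """Sample documents from the LDA generative model.

    alpha       : (K,) Dirichlet parameter
    beta        : (K, V) array, row j is the word distribution of topic j
    doc_lengths : number of words N_d in each document
    """
    rng = np.random.default_rng(seed)
    K, V = beta.shape
    docs, topics = [], []
    for n_d in doc_lengths:
        pi_d = rng.dirichlet(alpha)             # pi^(d) ~ Dirichlet(alpha)
        z = rng.choice(K, size=n_d, p=pi_d)     # z_i ~ Discrete(pi^(d))
        # x_i ~ Discrete(beta^(z_i)): draw each word from its topic.
        x = np.array([rng.choice(V, p=beta[j]) for j in z])
        docs.append(x)
        topics.append(z)
    return docs, topics

# Hypothetical topics: topic 0 uses words {0, 1}, topic 1 uses words {2, 3}.
beta = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]])
docs, topics = generate_corpus(alpha=np.array([0.5, 0.5]),
                               beta=beta, doc_lengths=[8, 12])
```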
LDA Generative Model
[Figure: generative process illustrated for document1 and document2]
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
[Plate diagram: nodes α, π^(d), z_i, x_i, φ^(j), β; plates N_d, D, T]
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• α, β: Dirichlet priors
Aviation Safety Reports (NASA)
Results: NASA Reports I
• Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
• Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
• Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Results: NASA Reports II
• Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
• Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
• Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
• Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
• The pilot flies an owner's airplane with the owner as a passenger. Loses contact with the center during the flight.
• While performing a sky dive, a jet approaches at the same altitude, but an accident is avoided.
• Altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue happens. The pilot then returns to base and lands. The mechanic finds the problem.
• The pilot has a landing gear problem. Maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Report   Flight Crew   Passenger   Maintenance
1        0.7039        0.0009      0.2953
2        0.1405        0.0663      0.7932
3        0.2563        0.6599      0.0837
4        0.0013        0.0013      0.9973
Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: Infer number of topics
  - Hierarchical Dirichlet processes: Infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
[Figure: movie ratings matrix relating users and movies]
• Users, e.g., Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
• Movies, e.g., Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
Advertisements on the Web
[Figure: click-through-rate matrix relating webpages and products, with missing entries]
• Webpages, e.g., Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks; …
• Products, e.g., Category: Sports shoes; Brand: Nike; Ratings: 425; …
Forest Ecology
[Figure: Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.), relating plants and traits, with missing entries]
• Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
Matrix Factorization
• Singular value decomposition
  [Figure: matrix ≈ product of factor matrices]
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices: most entries are missing
    • Traditional approaches cannot handle missing entries
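Probabilistic Matrix Factorization (PMF)
[Figure: graphical model with $X_{ij} \sim N(u_i^T v_j, \sigma^2)$ and priors $N(0, \sigma_u^2 I)$ on $u_i$, $N(0, \sigma_v^2 I)$ on $v_j$]
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent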
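Under the Gaussian model above, MAP estimation reduces to regularized squared error on the observed entries, which gradient descent can minimize. The following is a minimal sketch, not the reference implementation: the function name `pmf_fit`, the regularization weights, the step size, and the synthetic rank-1 matrix are all assumptions.

```python
import numpy as np

def pmf_fit(R, mask, rank=2, lam_u=0.1, lam_v=0.1, lr=0.01, n_iter=500, seed=0):
    """MAP estimate for PMF by batch gradient descent.

    R    : (n, m) matrix of values (entries where mask == 0 are ignored)
    mask : (n, m) array, 1.0 where R_ij is observed, 0.0 otherwise
    lam_u, lam_v : regularization weights, playing the role of
                   sigma^2 / sigma_u^2 and sigma^2 / sigma_v^2 (assumed values)
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.normal(size=(n, rank))
    V = 0.1 * rng.normal(size=(m, rank))
    history = []
    for _ in range(n_iter):
        E = mask * (R - U @ V.T)     # residuals on observed entries only
        history.append(0.5 * (E ** 2).sum()
                       + 0.5 * lam_u * (U ** 2).sum()
                       + 0.5 * lam_v * (V ** 2).sum())
        grad_U = -E @ V + lam_u * U
        grad_V = -E.T @ U + lam_v * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, history

# Synthetic rank-1 matrix with roughly 70% of entries observed.
rng = np.random.default_rng(1)
a = np.array([1.0, 2.0, 1.5, 0.5, 2.5, 1.0])
b = np.array([1.0, 0.5, 2.0, 1.5, 1.0])
R = np.outer(a, b)
mask = (rng.random(R.shape) < 0.7).astype(float)
U, V, history = pmf_fit(R, mask)
```

Missing entries never enter the objective, which is how this approach sidesteps the problem that a plain SVD cannot handle them.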
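Bayesian Probabilistic Matrix Factorization
[Figure: graphical model with $X_{ij} \sim N(u_i^T v_j, \sigma^2)$ and priors $N(\mu_u, \Lambda_u)$ on $u_i$, $N(\mu_v, \Lambda_v)$ on $v_j$]
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Hyperpriors: Gaussian, Wishart
Inference using MCMC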
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  - At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  - Combined prediction
  - Nature reveals truth y
  - Loss of expert i
  - Update "confidence" on experts
[Figure: experts e1 to e4 with predictions $x_{t,1}, \dots, x_{t,4}$, weights $w_{t,1}, \dots, w_{t,4}$, and outcome y]
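One round-by-round loop covering these steps can be sketched as follows. This is an assumption-laden illustration: it uses an exponential-weights (Hedge-style) update with squared loss, and the learning rate `eta`, the function name `hedge`, and the constant experts are hypothetical choices, since the slide's formulas were not recovered.

```python
import numpy as np

def hedge(expert_preds, outcomes, eta=2.0):
    """Exponential-weights combination of expert predictions.

    expert_preds : (T, n) array, entry [t, i] is expert i's prediction at round t
    outcomes     : (T,) array of true values revealed after each prediction
    eta          : learning rate (assumed value)
    """
    T, n = expert_preds.shape
    w = np.full(n, 1.0 / n)                  # start with equal confidence
    combined = np.empty(T)
    for t in range(T):
        combined[t] = w @ expert_preds[t]    # combined prediction
        # Nature reveals the truth; compute each expert's (squared) loss.
        losses = (expert_preds[t] - outcomes[t]) ** 2
        w = w * np.exp(-eta * losses)        # lower loss -> higher confidence
        w /= w.sum()
    return combined, w

# Four constant experts; the truth is always 1.0.
T = 50
preds = np.tile(np.array([0.0, 0.5, 1.0, 0.9]), (T, 1))
truth = np.ones(T)
combined, w_final = hedge(preds, truth)
```

The weight vector concentrates on the expert with the smallest cumulative loss, so the combined prediction drifts toward that expert's output.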
Time Machine (TM) vs Online Learning (OL)
• Goal: Maximize prediction accuracy
• TM: Choose best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$
• Guarantee: OL will be competitive with TM
[Figure: timeline of $x_1, x_2, \dots, x_T$ with labels "Today" and "2020", weights $w_1, w_2, \dots, w_T$, and base models]
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: difference of log wealth between Best CRP and Online Strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$$
$$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$$
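To make the two sums concrete, the sketch below computes the log wealth of the best CRP (found by grid search over two assets) and of one online strategy, then takes their difference. Everything here is illustrative: the market data are synthetic, the online strategy is an assumed exponentiated-gradient update with a hypothetical learning rate, and the helper names are invented.

```python
import numpy as np

def crp_log_wealth(w, X):
    """Log wealth of a constant rebalanced portfolio w: sum_t log(w . x_t)."""
    return float(np.log(X @ w).sum())

def eg_log_wealth(X, eta=0.05):
    """Log wealth of an online exponentiated-gradient strategy (assumed choice)."""
    T, n = X.shape
    w = np.full(n, 1.0 / n)
    total = 0.0
    for x in X:
        gain = float(w @ x)                  # w_t is fixed before x_t is seen
        total += np.log(gain)
        w = w * np.exp(eta * x / gain)       # shift weight toward better assets
        w /= w.sum()
    return total

# Synthetic market: 250 days, 2 assets, price relatives close to 1.
rng = np.random.default_rng(0)
X = np.exp(rng.normal(0.0, 0.02, size=(250, 2)))

# Best CRP in hindsight, approximated on a grid over the first asset's share.
grid = np.linspace(0.0, 1.0, 101)
best = max(crp_log_wealth(np.array([a, 1.0 - a]), X) for a in grid)
online = eg_log_wealth(X)
regret = best - online                       # difference of log wealth
```

Dividing `regret` by the number of days gives the per-round quantity that the second inequality bounds by ε.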
Datasets
• S&P 500
  - S&P 500 Index: 500 large-cap, actively traded stocks
  - All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  - Total 385 stocks over 14 years at daily resolution
  - Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  - Widely used dataset in most earlier papers
  - 36 large-cap stocks over 22 years at daily resolution
  - From July 1962 to Dec 1984
  - One major bear market: 1973-1974
θ
x
d n
d
Intro to Machine Learning
03
1
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Model 1 Independent Features
d=3 n=1
55
θ
x
d n
d
π
z
θ
x
d n
k d
Model 2 Naiumlve Bayes (Mixture Models)
56
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
09
21
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
61
01
21
-15
x
Intro to Machine Learning
Mixture Model vs Mixed Membership Model
-1
3
-05
x 07
31
-1
x 09
21
-2
x
Single component membership Multi-component mixed membership
62
Intro to Machine Learning
Nd D
zi
xi
π (d)
α
p(d) sim Dirichlet(α)
zi sim Discrete(π (d) )
xi sim Discrete(β (zi) )
K
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Latent Dirichlet Allocation (LDA)
βj
63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• On day t, base algorithm j proposes portfolio $p_{t,j}$; once $x_t$ is revealed, its wealth gain is $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights $w(1), w(2), w(3), \ldots, w(m)$
• Weight update: $w_t$ is updated based on $r_t$
• Good algorithms get more money
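The weight update over base algorithms can be sketched as follows. The slide does not state the exact update rule, so this assumes an exponentiated-gradient (EG) style multiplicative update applied to the gains $r_{t,j}$; the learning rate `eta` and the toy data are illustrative choices only.

```python
import math

def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """base_portfolios[t][j] is the portfolio p_{t,j} of base algorithm j on day t."""
    m = len(base_portfolios[0])
    w = [1.0 / m] * m  # start with a uniform distribution over algorithms
    log_wealth = 0.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # r_{t,j} = p_{t,j} . x_t : wealth gain of base algorithm j on day t
        r = [sum(p_i * x_i for p_i, x_i in zip(p, x_t)) for p in p_t]
        gain = sum(w_j * r_j for w_j, r_j in zip(w, r))  # gain of the meta algorithm
        log_wealth += math.log(gain)
        # EG-style update (assumed): larger r_{t,j} means a larger share tomorrow
        w = [w_j * math.exp(eta * r_j / gain) for w_j, r_j in zip(w, r)]
        total = sum(w)
        w = [w_j / total for w_j in w]
    return w, log_wealth

# Toy setup: base algorithm 0 always holds asset 1, base algorithm 1 always holds asset 2.
days = 50
x = [(1.05, 0.95)] * days
base = [[(1.0, 0.0), (0.0, 1.0)]] * days
w_final, meta_log_wealth = meta_algorithm(base, x)
print([round(v, 4) for v in w_final], round(meta_log_wealth, 3))
```

Because asset 1 gains every day in this toy market, nearly all of the weight ends up on base algorithm 0.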
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – [Equation not recoverable]
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – [Equation not recoverable]
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
[Figure: plate diagram with nodes π, h, θ, x and plates n, k]
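The Kmeans loop named above alternates between assigning points to their nearest center and relocating each center to the mean of its points. A minimal sketch, assuming squared Euclidean distance and a fixed number of iterations; the data and initial centers are made up for illustration.

```python
def kmeans(points, centers, iterations=20):
    """Plain Kmeans: alternate an assignment step and a relocation step."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Relocation step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
print(centers)
```

The result depends on the initial centers in general; this example is separated widely enough that the two groups are recovered from a poor start.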
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
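The link between segments and zero eigenvalues can be checked by hand on a small graph. The sketch below assumes the unnormalized Laplacian L = D - W (the slide does not say which variant is meant) and a made-up weight matrix with two disconnected segments; the indicator vector of each segment is then an eigenvector of L with eigenvalue 0.

```python
def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Made-up graph with two disconnected segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = laplacian(W)
segment_a = [1, 1, 1, 0, 0]
segment_b = [0, 0, 0, 1, 1]
single_node = [1, 0, 0, 0, 0]

print(matvec(L, segment_a))    # all zeros: eigenvalue 0
print(matvec(L, segment_b))    # all zeros: a second eigenvalue 0
print(matvec(L, single_node))  # not zero: not in the null space
```

When the segments are only weakly connected rather than disconnected, these eigenvalues are close to 0 instead of exactly 0, which is the case the slide's "(close to)" covers.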
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen so that it optimizes some criterion
  – Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA and LDA projections]
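For PCA, the eigenvalue problem is on the covariance matrix of the centered data, and the top eigenvector is the direction of maximum variance. A minimal sketch using power iteration (one simple way to get the leading eigenvector, chosen here only to keep the example dependency-free); the data are made up and lie close to the line y = 2x.

```python
import math

def first_principal_component(data, iterations=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Covariance matrix C = X^T X / n of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    # Variance of the data projected onto v (the corresponding eigenvalue).
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

data = [(-2, -4.1), (-1, -1.9), (0, 0.1), (1, 2.0), (2, 3.9)]
direction, variance = first_principal_component(data)
print([round(c, 3) for c in direction], round(variance, 3))
```

The recovered direction has slope close to 2, and the projected variance accounts for almost all of the total variance in the data.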
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects; Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Model 2: Naïve Bayes (Mixture Models)
[Figure: two plate diagrams. First: nodes θ, x with plates d, n. Second: nodes π, z, θ, x with plates d, n, k]
Naïve Bayes Model
[Figure: six density plots (horizontal axis from -10 to 10, vertical axis from 0 to 0.8) with example data point x = (-1, 3, -0.5); plate diagram with nodes π, z, θ, x and plates n, k, d]
Naïve Bayes Model
[Figure: six density plots (horizontal axis from -10 to 10, vertical axis from 0 to 0.8) with example data point x = (0.1, 2.1, -1.5); plate diagram with nodes π, z, θ, x and plates n, k, d]
Model 3: Mixed Membership Model
[Figure: three plate diagrams. First: nodes θ, x with plates d, n. Second: nodes π, z, θ, x with plates d, n, k. Third: nodes α, π, z, x, θ with plates d, n, k]
Mixed Membership Models
[Figure: six density plots (horizontal axis from -10 to 10, vertical axis from 0 to 0.8) with example data point x = (0.7, 3.1, -1); plate diagram with nodes α, π, z, x, θ and plates d, n, k]
Mixed Membership Models
[Figure: six density plots (horizontal axis from -10 to 10, vertical axis from 0 to 0.8) with example data point x = (0.9, 2.1, -2); plate diagram with nodes α, π, z, x, θ and plates d, n, k]
Mixture Model vs Mixed Membership Model
• Mixture model: single component membership
• Mixed membership model: multi-component mixed membership
[Figure: example data points x = (0.1, 2.1, -1.5), x = (-1, 3, -0.5), x = (0.7, 3.1, -1), x = (0.9, 2.1, -2)]
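The difference between the two models shows up clearly when sampling from them. The sketch below is an illustration under stated assumptions: each feature is drawn from a unit-variance Gaussian whose mean `theta[z][j]` depends on the component z and the feature j, and the parameter values are made up. In the mixture model one component generates the whole data point; in the mixed membership model each feature draws its own component from a per-point distribution π.

```python
import random

def sample_mixture(pi, theta, d, rng):
    """Mixture (naive Bayes) model: a single component z generates all d features."""
    z = rng.choices(range(len(pi)), weights=pi)[0]
    x = [rng.gauss(theta[z][j], 1.0) for j in range(d)]
    return [z] * d, x

def sample_mixed_membership(alpha, theta, d, rng):
    """Mixed membership: pi ~ Dirichlet(alpha), then one component per feature."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]  # normalized gammas give a Dirichlet draw
    pi = [v / sum(g) for v in g]
    zs = [rng.choices(range(len(pi)), weights=pi)[0] for _ in range(d)]
    x = [rng.gauss(theta[z][j], 1.0) for j, z in enumerate(zs)]
    return zs, x

rng = random.Random(0)
theta = [[-5.0, -5.0, -5.0], [5.0, 5.0, 5.0]]  # made-up component means, k = 2, d = 3
z_mix, x_mix = sample_mixture([0.5, 0.5], theta, d=3, rng=rng)
z_mm, x_mm = sample_mixed_membership([1.0, 1.0], theta, d=3, rng=rng)
print(z_mix, z_mm)
```

The component list returned for the mixture model is constant by construction, while the mixed membership list may switch components from one feature to the next.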
Latent Dirichlet Allocation (LDA)
• $\pi^{(d)} \sim \text{Dirichlet}(\alpha)$: distribution over topics for each document (Dirichlet prior)
• $z_i \sim \text{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\beta^{(j)}$, $j = 1, \ldots, K$: distribution over words for each topic
• $x_i \sim \text{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
[Figure: plate diagram with nodes α, $\pi^{(d)}$, $z_i$, $x_i$, $\beta_j$ and plates $N_d$, D, K]
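The generative process above can be run forward to produce a synthetic document. This is a minimal sketch: the vocabulary and the two topic-word distributions in `beta` are invented for illustration (the words are borrowed from the aviation topics later in the tutorial), and the Dirichlet draw uses normalized gamma variates.

```python
import random

def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def generate_document(alpha, beta, n_words, rng):
    """beta[j] is the word distribution of topic j (a list over the vocabulary)."""
    pi_d = dirichlet(alpha, rng)  # pi^(d) ~ Dirichlet(alpha)
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(beta)), weights=pi_d)[0]        # z_i ~ Discrete(pi^(d))
        x = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # x_i ~ Discrete(beta^(z_i))
        topics.append(z)
        words.append(x)
    return pi_d, topics, words

vocab = ["runway", "tower", "engine", "fuel"]  # made-up vocabulary
beta = [
    [0.5, 0.5, 0.0, 0.0],  # hypothetical "arrival" topic
    [0.0, 0.0, 0.5, 0.5],  # hypothetical "maintenance" topic
]
rng = random.Random(1)
pi_d, topics, words = generate_document([1.0, 1.0], beta, n_words=8, rng=rng)
print([round(p, 3) for p in pi_d], [vocab[w] for w in words])
```

Inference runs this process in reverse: given only the words, it estimates the per-document topic proportions and the per-topic word distributions.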
LDA Generative Model
[Figure: generative process illustrated for document1 and document2]
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
• $\pi^{(d)} \sim \text{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \text{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \text{Dirichlet}(\beta)$, $j = 1, \ldots, T$: distribution over words for each topic
• $x_i \sim \text{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• α, β: Dirichlet priors
[Figure: plate diagram with nodes α, β, $\pi^{(d)}$, $\phi^{(j)}$, $z_i$, $x_i$ and plates $N_d$, D, T]
Aviation Safety Reports (NASA)
Results: NASA Reports I
Top words per topic:
• Arrival/Departure: runway approach departure altitude turn tower air traffic control heading taxi way flight
• Passenger: passenger attendant flight seat medical captain attendants lavatory told police
• Maintenance: maintenance engine mel zzz air craft installed check inspection fuel work
Results: NASA Reports II
Top words per topic:
• Medical Emergency: medical passenger doctor attendant oxygen emergency paramedics flight nurse aed
• Wheel Maintenance: tire wheel assembly nut spacer main axle bolt missing tires
• Weather Condition: knots turbulence aircraft degrees ice winds wind speed air speed conditions
• Departure: departure sid dme altitude climbing mean sea level heading procedure turn degree
Two-Dimensional Visualization for Reports
[Figure: two-dimensional embedding of the reports. Red: Flight Crew, Blue: Passenger, Green: Maintenance]
Example reports highlighted in the visualization:
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• Flight Crew: 0.7039, Passenger: 0.0009, Maintenance: 0.2953
• Flight Crew: 0.1405, Passenger: 0.0663, Maintenance: 0.7932
• Flight Crew: 0.2563, Passenger: 0.6599, Maintenance: 0.0837
• Flight Crew: 0.0013, Passenger: 0.0013, Maintenance: 0.9973
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: Infer number of topics
  – Hierarchical Dirichlet processes: Infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
• Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
• Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
[Figure: movie ratings matrix over users and movies]
Advertisements on the Web
• Webpages: Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks
• Products: Category: Sports shoes; Brand: Nike; Ratings 425; …
[Figure: Click-Through-Rate matrix over webpages and products, with many missing entries]
Forest Ecology
• Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
• Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: plant trait matrix over plants and traits, with many missing entries]
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: a matrix approximated (≈) by a product of factor matrices]
Probabilistic Matrix Factorization (PMF)
$$u_i \sim N(0, \sigma_u^2 I), \qquad v_j \sim N(0, \sigma_v^2 I), \qquad X_{ij} \sim N(u_i^T v_j, \sigma^2)$$
Inference using gradient descent
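Under this model, maximizing the posterior over the factors amounts to minimizing squared error on the observed entries plus an L2 penalty on the factors. The sketch below is one simple way to do that, not necessarily the procedure used in the tutorial: it takes stochastic gradient steps over observed entries only, and the rank, step size `lr`, penalty `lam`, initialization, and toy matrix are all illustrative assumptions.

```python
import random

def pmf_fit(observed, n_rows, n_cols, rank=1, lam=0.01, lr=0.02, epochs=2000, seed=0):
    """observed: dict {(i, j): X_ij} holding only the known entries."""
    rng = random.Random(seed)
    # Small positive initialization (an illustrative choice for positive data).
    U = [[rng.uniform(0.05, 0.15) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.05, 0.15) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), x in observed.items():
            err = x - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on (x - u.v)^2 / 2 + lam * (u^2 + v^2) / 2
                U[i][k] += lr * (err * v - lam * u)
                V[j][k] += lr * (err * u - lam * v)
    return U, V

# Toy rank-1 matrix with rows proportional to (1, 0.5, 2); entries (0, 1) and (2, 2) are missing.
X = {
    (0, 0): 1.0, (0, 2): 2.0,
    (1, 0): 2.0, (1, 1): 1.0, (1, 2): 4.0,
    (2, 0): 3.0, (2, 1): 1.5,
}
U, V = pmf_fit(X, n_rows=3, n_cols=3)
predicted = sum(a * b for a, b in zip(U[2], V[2]))  # held-out entry, generated as 3 * 2 = 6
print(round(predicted, 2))
```

Missing entries simply never appear in the loop, which is how this formulation avoids the problem that a plain SVD has with incomplete matrices.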
Bayesian Probabilistic Matrix Factorization
• Hyperpriors (Gaussian, Wishart): $\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$, $\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
• Factors: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
• Observations: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using MCMC
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  – Learning with Expert Advice
  – Online Convex Optimization
  – Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update "confidence" on experts
[Figure: experts e1, e2, e3, e4 with predictions $x_{t,1}, \ldots, x_{t,4}$, combined using weights $w_{t,1}, \ldots, w_{t,4}$ to predict y]
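The round-by-round protocol above can be sketched with an exponentially weighted (Hedge-style) update. The formulas on this slide did not survive extraction, so the specific choices here are assumptions: the combined prediction is the weighted average of expert predictions, the loss is squared error, and `eta` is an illustrative learning rate.

```python
import math

def exponential_weights(expert_predictions, outcomes, eta=0.5):
    """expert_predictions[t][i] is x_{t,i}; outcomes[t] is the revealed truth y_t."""
    n = len(expert_predictions[0])
    w = [1.0 / n] * n
    learner_loss, expert_loss = 0.0, [0.0] * n
    for x_t, y_t in zip(expert_predictions, outcomes):
        y_hat = sum(w_i * x_i for w_i, x_i in zip(w, x_t))  # combined prediction
        learner_loss += (y_hat - y_t) ** 2                   # nature reveals truth
        losses = [(x_i - y_t) ** 2 for x_i in x_t]           # loss of expert i
        expert_loss = [a + b for a, b in zip(expert_loss, losses)]
        # Update "confidence": experts with larger loss lose weight.
        w = [w_i * math.exp(-eta * l) for w_i, l in zip(w, losses)]
        total = sum(w)
        w = [w_i / total for w_i in w]
    return w, learner_loss, expert_loss

# Toy run: three constant experts predicting 0, 1, and 0.5; the truth is always 1.
rounds = 30
predictions = [(0.0, 1.0, 0.5)] * rounds
truth = [1.0] * rounds
w_final, learner_loss, expert_loss = exponential_weights(predictions, truth)
print([round(v, 3) for v in w_final], round(learner_loss, 3), expert_loss)
```

In this run the weight concentrates on the expert that is always right, and the learner's total loss stays small compared with the losses of the two poor experts.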
-1
3
-05
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
57
01
21
-15
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Naiumlve Bayes Model
n k
π z θ x
d d
58
θ
x d
n
d
π
z
θ
x d
n
k d
π
z
x d
n
α
θ k d
Model 3 Mixed Membership Model
Intro to Machine Learning 59
07
31
-1
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
60
09
21
-2
x
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
-10 -8 -6 -4 -2 0 2 4 6 8 100
01
02
03
04
05
06
07
08
Intro to Machine Learning
Mixed Membership Models
π z x
d n
α θ
k d
61
01
21
-15
x
Intro to Machine Learning
Mixture Model vs Mixed Membership Model
-1
3
-05
x 07
31
-1
x 09
21
-2
x
Single component membership Multi-component mixed membership
62
Intro to Machine Learning
Nd D
zi
xi
π (d)
α
p(d) sim Dirichlet(α)
zi sim Discrete(π (d) )
xi sim Discrete(β (zi) )
K
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Latent Dirichlet Allocation (LDA)
βj
63
Intro to Machine Learning
LDA Generative Model
document1 document2
64
Learning Inference and Estimation
Intro to Machine Learning 65
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results: UP, EG
99
Results: UP, EG, ONS
100
Results: UP, EG, ONS, AC30
101
Results: UP, EG, ONS, AC30
102
Motivation for Meta Algorithms
• Universal Algorithms: competitive with best CRP; can be outdone by simple heuristics
• Heuristic-based methods: strong empirical performance; no performance guarantee
• Best of both worlds: competitive with best CRP; strong empirical performance
103
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j proposes portfolio $p_{t,j}$ ($p_{t,1}, p_{t,2}, p_{t,3}, \ldots, p_{t,m}$)
• Wealth gain on $x_t$: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money
104
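The meta-algorithm loop can be sketched as below. The slide only says that the weights are updated based on the gains, so the multiplicative (EG-style) update, the learning rate `eta`, and the function name are assumptions for illustration.

```python
import math

def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """Combine m base algorithms with a distribution w_t over them.

    base_portfolios[t][j] is p_{t,j}, the portfolio of base algorithm j on day t;
    price_relatives[t] is x_t.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m  # w_t: distribution over base algorithms
    log_wealth = 0.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # r_{t,j} = p_{t,j} . x_t: wealth gain of base algorithm j on day t.
        r_t = [sum(pi * xi for pi, xi in zip(p, x_t)) for p in p_t]
        meta_gain = sum(wj * rj for wj, rj in zip(w, r_t))
        log_wealth += math.log(meta_gain)
        # Assumed update: algorithms with larger gains get more weight.
        w = [wj * math.exp(eta * rj / meta_gain) for wj, rj in zip(w, r_t)]
        s = sum(w)
        w = [wj / s for wj in w]
    return w, log_wealth

# Toy run: base algorithm 1 holds only asset 1, base algorithm 2 only asset 2.
days = 200
portfolios = [[(1.0, 0.0), (0.0, 1.0)]] * days
market = [(1.01, 1.0)] * days  # asset 1 grows 1% per day, asset 2 is flat
final_w, meta_log_wealth = meta_algorithm(portfolios, market)
```

In this toy run the first base algorithm earns more each day, so its share of the weight grows over time.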
Results: MAEG, MAONS
105
Results: MAEG, MAONS
106
Results: UP, EG, ONS, AC30
107
Results: MAEG, MAONS
108
Results: New Base Algorithm BAH(AC30)
109
Results: MAEG, MAONS
110
Results: NYSE, Base Algorithms
111
Results: MAEG, MAONS
112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
[Plate diagram: nodes π, h, θ, x; plates n, k]
114
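The Kmeans iteration mentioned above alternates an assignment step and a relocation step. A minimal sketch, assuming squared Euclidean distance and caller-supplied initial centers (the function name and toy data are illustrative only):

```python
def kmeans(points, centers, n_iter=20):
    """Plain k-means: assign points to the nearest center, then relocate centers."""
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Relocation step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                dim = len(center)
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, [(0, 0), (10, 10)])
```

Hard assignment to the nearest center is what makes this a limiting case of mixture modeling, where EM would instead keep soft responsibilities.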
Spectral Clustering
• Weighted graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
115
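The statement that k segments give k eigenvalues (close to) 0 can be checked on a tiny graph. The sketch below assumes the unnormalized Laplacian L = D − W (normalized variants are also used for spectral clustering) and a hypothetical weight matrix with two disconnected segments.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [
        [(float(sum(W[i])) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Two disconnected segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = graph_laplacian(W)

# The indicator vector of each segment is an eigenvector with eigenvalue 0.
segment_a = [1, 1, 1, 0, 0]
segment_b = [0, 0, 0, 1, 1]
L_a = matvec(L, segment_a)
L_b = matvec(L, segment_b)
```

Both products are the zero vector, so the eigenvalue 0 has multiplicity two, one per segment. With weak links between segments the eigenvalues are only close to 0, which is the case the slide alludes to.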
Linear Dimensionality Reduction
• Linear Projections
  – Projection such that optimizes some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure panels: PCA, LDA]
116
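For PCA, the eigenvalue problem is on the data covariance matrix, and the top eigenvector is the direction of maximum variance. A minimal sketch, assuming power iteration as the eigen-solver (one of several possible choices) and a small toy dataset:

```python
import math

def top_principal_component(data, n_iter=200):
    """Direction of maximum variance, found by power iteration on the covariance."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d  # starting direction
    for _ in range(n_iter):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance of the data projected onto v (the top eigenvalue).
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Toy data lying close to the line y = 2x.
data = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
direction, variance = top_principal_component(data)
```

The recovered direction is close to the line the data was generated around, and projecting onto it keeps nearly all of the total variance.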
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
117
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems, nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman’s 3-step ‘algorithm’ for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
118
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
119
120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Naïve Bayes Model
[Figure: example data point x = (0.1, 2.1, -1.5) with six density plots (x-axis from -10 to 10, y-axis from 0 to 0.8); plate diagram with nodes π, z, θ, x and plates n, k, d]
58
Model 3: Mixed Membership Model
[Plate diagrams for three models: (1) nodes θ, x with plates d, n; (2) nodes π, z, θ, x with plates d, n, k; (3) nodes α, π, z, θ, x with plates d, n, k]
59
Mixed Membership Models
[Figure: example data point x = (0.7, 3.1, -1) with six density plots (x-axis from -10 to 10, y-axis from 0 to 0.8); plate diagram with nodes α, π, z, θ, x and plates d, n, k]
60
Mixed Membership Models
[Figure: example data point x = (0.9, 2.1, -2) with six density plots (x-axis from -10 to 10, y-axis from 0 to 0.8); plate diagram with nodes α, π, z, θ, x and plates d, n, k]
61
Mixture Model vs Mixed Membership Model
• Mixture model: single component membership
• Mixed membership model: multi-component mixed membership
[Figure: example data points x = (0.1, 2.1, -1.5), (-1, 3, -0.5), (0.7, 3.1, -1), (0.9, 2.1, -2)]
62
Latent Dirichlet Allocation (LDA)
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\beta^{(j)}$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$: word generated from assigned topic
• α: Dirichlet priors
[Plate diagram: nodes α, π(d), zi, xi, βj; plates Nd, D, K]
63
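The generative process on this slide can be simulated directly. The sketch below is illustrative only: the vocabulary size, the two toy topics in `beta`, and the helper names are assumptions, and the Dirichlet draw uses normalized gamma variates.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_discrete(probs, rng):
    """Draw an index k with probability probs[k]."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against rounding in the cumulative sum

def generate_document(alpha, beta, n_words, rng):
    """LDA generative process for one document."""
    pi_d = sample_dirichlet(alpha, rng)      # distribution over topics for the document
    topics, words = [], []
    for _ in range(n_words):
        z = sample_discrete(pi_d, rng)       # topic assignment for this word
        x = sample_discrete(beta[z], rng)    # word generated from the assigned topic
        topics.append(z)
        words.append(x)
    return pi_d, topics, words

# Two toy topics over a 4-word vocabulary with disjoint support.
beta = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
rng = random.Random(0)
pi_d, topics, words = generate_document([1.0, 1.0], beta, 50, rng)
```

Because each word gets its own topic draw, a single document can mix words from both topics, which is the mixed-membership property contrasted with mixture models on the preceding slides.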
LDA Generative Model
[Figure: document1, document2]
64
Learning: Inference and Estimation
65
Variational Inference
66
Smoothed Latent Dirichlet Allocation
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• α, β: Dirichlet priors
[Plate diagram: nodes α, β, π(d), φ(j), zi, xi; plates Nd, D, T]
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Results: NASA Reports I
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
69
Results: NASA Reports II
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind, speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
70
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
71
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
72
Two-Dimensional Visualization for Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
73
Mixed Membership of Reports
Red: Flight Crew, Blue: Passenger, Green: Maintenance
Flight Crew   Passenger   Maintenance
0.7039        0.0009      0.2953
0.1405        0.0663      0.7932
0.2563        0.6599      0.0837
0.0013        0.0013      0.9973
74
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: infer number of topics
  – Hierarchical Dirichlet processes: infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
75
Recommendation Systems
Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
Movie ratings matrix
76
Advertisements on the Web
Webpages: Category: Baby; URL: babyearth.com; Content: webpage text, hyperlinks
Products: Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
[Figure: click-through-rate matrix over webpages and products, with missing entries]
77
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Plants × Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: plant trait matrix with missing entries]
78
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: matrix ≈ product of two low-rank factors]
79
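Probabilistic Matrix Factorization (PMF)
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Diagram: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with row factor $u_i^T$ drawn from $N(0, \sigma_u^2 I)$ and column factor $v_j$ drawn from $N(0, \sigma_v^2 I)$]
Inference using gradient descent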
80 Intro to Machine Learning
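With Gaussian priors and Gaussian noise, a point estimate of the factors amounts to minimizing squared error on the observed entries plus an L2 penalty on the factors. The sketch below assumes per-entry (stochastic) gradient steps; the step size `lr`, penalty `lam`, rank, and toy matrix are illustrative choices, not values from the slides.

```python
import random

def pmf_fit(ratings, n_rows, n_cols, rank=1, lam=0.01, lr=0.02, n_epochs=2000, seed=0):
    """Fit U, V so that U[i] . V[j] approximates each observed rating (i, j, r)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(n_epochs):
        for i, j, r in ratings:  # only observed entries are visited
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # Gradient step on squared error plus the L2 (Gaussian prior) term.
                U[i][k] += lr * (err * v_jk - lam * u_ik)
                V[j][k] += lr * (err * u_ik - lam * v_jk)
    return U, V

def predict(U, V, i, j):
    return sum(a * b for a, b in zip(U[i], V[j]))

# Rank-1 toy matrix [[1, 2], [2, 4], [3, 6]] with entry (2, 1) held out.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
U, V = pmf_fit(observed, n_rows=3, n_cols=2)
held_out_prediction = predict(U, V, 2, 1)
```

Only observed entries enter the updates, which is how this family of methods handles the missing entries that a plain SVD cannot.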
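Bayesian Probabilistic Matrix Factorization
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$
$v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Diagram: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with $u_i^T$ drawn from $N(\mu_u, \Lambda_u)$ and $v_j$ from $N(\mu_v, \Lambda_v)$; hyperpriors labeled Gaussian and Wishart]
Inference using MCMC
81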
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with “hyperprior”
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no “hyperprior”
82
Results: PMF on Netflix
83
Results: Bayesian PMF on Netflix
84
Results: Bayesian PMF on Netflix
85
Results: PMF on Netflix
86
Results: Bayesian PMF on Netflix
87
PPMF vs PMF, BPMF
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs $p_{t,j}$ (that is, $p_{t,1}, p_{t,2}, p_{t,3}, \dots, p_{t,m}$), evaluated on the day's $x_t$
• Wealth gain: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, $(w(1), w(2), w(3), \dots, w(m))$
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money
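One way to realize the weight update is a multiplicative, exponentiated-gradient style rule. The slides do not spell out the update, so the rule below, the learning rate `eta`, and the function name `meta_step` are assumptions made for illustration only.

```python
import math

def meta_step(weights, base_portfolios, x, eta=0.05):
    """One day of a meta algorithm with an EG-style multiplicative update (assumed rule)."""
    # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
    r = [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in base_portfolios]
    # Wealth gain of the meta algorithm, which splits its money according to w_t
    meta_gain = sum(w_j * r_j for w_j, r_j in zip(weights, r))
    # Multiplicative update: algorithms with a larger gain receive more weight
    unnorm = [w_j * math.exp(eta * r_j / meta_gain) for w_j, r_j in zip(weights, r)]
    z = sum(unnorm)
    return [u / z for u in unnorm], meta_gain

# Two hypothetical base algorithms on two assets: all-in asset 1 vs all-in asset 2.
weights = [0.5, 0.5]
base = [(1.0, 0.0), (0.0, 1.0)]
wealth = 1.0
for _ in range(50):
    weights, gain = meta_step(weights, base, x=(1.01, 0.99))
    wealth *= gain
```

After the loop, more weight sits on the base algorithm that held the rising asset, which is the "good algorithms get more money" behavior in miniature.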
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
  - Clustering
  - Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  - Special (limit) case of mixture modeling
  - Kmeans algorithm: Iterative Relocation
[Plate diagram: variables π, h, x, θ; plates n and k]
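The Kmeans relocation loop can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and hand-picked initial centers; the data points and the function name `kmeans` are hypothetical.

```python
def kmeans(points, centers, iters=20):
    """Lloyd-style k-means: alternate assignment and center relocation."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: nearest center in squared Euclidean distance
            j = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Relocation step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated hypothetical groups; both initial centers start in the first group.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
```

Each point belongs to exactly one cluster here, which is the hard-assignment limit of the soft memberships a mixture model would compute with EM.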
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  - Graph Laplacian
  - k segments: k eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
Figures from A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE
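The link between segments and zero eigenvalues can be checked on a small graph. The sketch below assumes the unnormalized Laplacian L = D - W and a hypothetical five-node graph with two disconnected segments; it verifies that each segment's indicator vector is mapped to zero by L.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W with zero diagonal."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical graph with two disconnected segments: nodes {0, 1, 2} and {3, 4}.
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
L = graph_laplacian(W)

# The indicator vector of each segment is an eigenvector with eigenvalue 0.
ind_a = [1.0, 1.0, 1.0, 0.0, 0.0]
ind_b = [0.0, 0.0, 0.0, 1.0, 1.0]
La, Lb = matvec(L, ind_a), matvec(L, ind_b)
```

With weak edges between the segments instead of none, these eigenvalues move slightly away from 0, which is the "close to 0" case on the slide.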
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen so that the projected data optimizes some criterion
  - Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure panels: PCA, LDA]
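For PCA, the eigenvalue problem is on the sample covariance matrix. The sketch below finds only the leading direction, using power iteration as a stand-in for a full eigensolver; the toy data and the function name `top_principal_component` are hypothetical.

```python
import math

def top_principal_component(X, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix C = Xc^T Xc / n
    C = [[sum(r[a] * r[b] for r in Xc) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Multiply by C and renormalize; v converges to the top eigenvector
        v = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    # Variance of the data projected onto v, i.e. v^T C v
    variance = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

# Hypothetical points scattered tightly around the line y = x.
X = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, variance = top_principal_component(X)
```

The recovered direction is close to the diagonal, and almost all of the total variance lies along it.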
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems; nonlinear, nonstationary dependencies
  - Spatial and Temporal effects; Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Mixed Membership Models
[Figure: six density plots over x from -10 to 10; plate diagram with variables α, π, z, x, θ and plates d, n, k]
Mixture Model vs Mixed Membership Model
• Mixture model: single component membership
• Mixed membership model: multi-component mixed membership
Latent Dirichlet Allocation (LDA)
• Distribution over topics for each document (Dirichlet prior): $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$
• Topic assignment for each word: $z_i \sim \mathrm{Discrete}(\pi^{(d)})$
• Distribution over words for each topic: $\beta_j$, $j = 1, \dots, K$
• Word generated from assigned topic: $x_i \sim \mathrm{Discrete}(\beta^{(z_i)})$
[Plate diagram: α, π^(d), z_i, x_i, β_j; plates N_d, D, K]
LDA Generative Model
[Figure: generative process illustrated for document1 and document2]
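The LDA generative process for a single document can be sketched with the standard library. This is only an illustration: the two topics in `beta`, the vocabulary of four word ids, and the helper names are hypothetical, and the Dirichlet draw uses the usual normalized-Gamma construction.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def generate_document(alpha, beta, n_words, rng):
    """pi ~ Dirichlet(alpha); z_i ~ Discrete(pi); x_i ~ Discrete(beta[z_i])."""
    pi = sample_dirichlet(alpha, rng)             # distribution over topics
    topics = list(range(len(beta)))
    words = list(range(len(beta[0])))
    z = rng.choices(topics, weights=pi, k=n_words)                   # topic per word
    x = [rng.choices(words, weights=beta[zi], k=1)[0] for zi in z]   # word from its topic
    return pi, z, x

rng = random.Random(0)
beta = [[0.5, 0.5, 0.0, 0.0],    # hypothetical topic 0 uses word ids 0 and 1
        [0.0, 0.0, 0.5, 0.5]]    # hypothetical topic 1 uses word ids 2 and 3
pi, z, x = generate_document(alpha=[0.5, 0.5], beta=beta, n_words=20, rng=rng)
```

Because `pi` is drawn per document, different documents mix the same topics in different proportions, which is what separates a mixed membership model from a plain mixture model.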
Learning: Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
• Distribution over topics for each document: $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$
• Topic assignment for each word: $z_i \sim \mathrm{Discrete}(\pi^{(d)})$
• Distribution over words for each topic: $\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$
• Word generated from assigned topic: $x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$
• Dirichlet priors: α, β
[Plate diagram: α, π^(d), z_i, x_i, φ^(j), β; plates N_d, D, T]
Aviation Safety Reports (NASA)
Results: NASA Reports I
Top words per topic:
• Arrival/Departure: runway approach departure altitude turn tower air traffic control heading taxi way flight
• Passenger: passenger attendant flight seat medical captain attendants lavatory told police
• Maintenance: maintenance engine mel zzz air craft installed check inspection fuel work
Results: NASA Reports II
Top words per topic:
• Medical Emergency: medical passenger doctor attendant oxygen emergency paramedics flight nurse aed
• Wheel Maintenance: tire wheel assembly nut spacer main axle bolt missing tires
• Weather Condition: knots turbulence aircraft degrees ice winds wind speed air speed conditions
• Departure: departure sid dme altitude climbing mean sea level heading procedure turn degree
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example reports highlighted in the plots:
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot then returns to base and lands. The mechanic finds the problem.
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
• Flight Crew 0.7039, Passenger 0.0009, Maintenance 0.2953
• Flight Crew 0.1405, Passenger 0.0663, Maintenance 0.7932
• Flight Crew 0.2563, Passenger 0.6599, Maintenance 0.0837
• Flight Crew 0.0013, Passenger 0.0013, Maintenance 0.9973
Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: Infer number of topics
  - Hierarchical Dirichlet processes: Infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
[Figure: movie ratings matrix over users and movies]
Advertisements on the Web
Products: Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
Webpages: Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks; …
[Figure: sparse Click-Through-Rate matrix over webpages and products, with many missing entries]
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: sparse matrix of plants by traits, with many missing entries]
Matrix Factorization
• Singular value decomposition
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices: most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: matrix ≈ product of two factor matrices]
Probabilistic Matrix Factorization (PMF)
[Diagram: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with priors $N(0, \sigma_u^2 I)$ on $u_i^T$ and $N(0, \sigma_v^2 I)$ on $v_j$]
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent
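Fitting the factors by gradient descent on the observed entries only can be sketched as follows. This is a hedged illustration, not the tutorial's implementation: it uses stochastic (per-entry) gradient steps, the regularization strength `lam` is a hypothetical stand-in for the prior variances, and the toy rank-1 matrix and the name `pmf_fit` are made up for the example.

```python
import random

def pmf_fit(ratings, n_rows, n_cols, rank=2, lam=0.01, lr=0.05, epochs=1000, seed=0):
    """Fit U, V by stochastic gradient steps over observed entries (i, j, x) only.

    Each step follows the gradient of
    0.5 * (x - u_i . v_j)^2 + 0.5 * lam * (|u_i|^2 + |v_j|^2).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, x in ratings:
            err = x - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - lam * u)
                V[j][k] += lr * (err * u - lam * v)
    return U, V

# Hypothetical rank-1 matrix with one missing entry at position (2, 2).
true_rows, true_cols = [1.0, 2.0, 1.5], [1.0, 2.0, 0.5]
observed = [(i, j, true_rows[i] * true_cols[j])
            for i in range(3) for j in range(3) if (i, j) != (2, 2)]
U, V = pmf_fit(observed, n_rows=3, n_cols=3, rank=1)
rmse = (sum((x - U[i][0] * V[j][0]) ** 2 for i, j, x in observed) / len(observed)) ** 0.5
held_out = U[2][0] * V[2][0]   # prediction for the missing entry (true value 0.75)
```

Missing entries never appear in the loss, so they need no special handling, unlike a plain SVD of the full matrix.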
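Bayesian Probabilistic Matrix Factorization
[Diagram: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$, with Gaussian priors $N(\mu_u, \Lambda_u)$ on $u_i^T$ and $N(\mu_v, \Lambda_v)$ on $v_j$, and Gaussian and Wishart hyperpriors]
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using MCMC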
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy
Overview
Predictive Models
Graphical Models
Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  - At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  - Combined prediction
  - Nature reveals truth
  - Loss of expert i
  - Update "confidence" on experts
[Diagram: experts e1, e2, e3, e4 with predictions x_{t,1}, …, x_{t,4} and weights w_{t,1}, …, w_{t,4}, combined into y]
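The round-by-round protocol above can be sketched with an exponentially weighted average forecaster. The slides do not fix a loss or an update rule, so the squared loss, the learning rate `eta`, the four toy experts, and the name `run_experts` are assumptions for illustration.

```python
import math

def run_experts(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster with squared loss (assumed setup)."""
    n = len(expert_preds[0])
    w = [1.0 / n] * n                      # initial "confidence" on experts
    learner_loss, expert_loss = 0.0, [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        y_hat = sum(wi * p for wi, p in zip(w, preds))    # combined prediction
        learner_loss += (y_hat - y) ** 2                   # nature reveals truth y
        losses = [(p - y) ** 2 for p in preds]             # loss of expert i
        expert_loss = [a + b for a, b in zip(expert_loss, losses)]
        # Update confidence: down-weight experts in proportion to their loss
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w, learner_loss, expert_loss

# Four hypothetical experts; the third one always predicts the truth.
T = 50
outcomes = [float(t % 2) for t in range(T)]
expert_preds = [[0.5, 1.0 - y, y, 0.0] for y in outcomes]
w, learner_loss, expert_loss = run_experts(expert_preds, outcomes)
```

In this toy run the weights concentrate on the always-correct expert, and the learner's cumulative loss stays within a small constant of the best expert's loss.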
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
[Timeline: x1, x2, x3, x4, x5, …, xT, with markers "Today" and "2020"]
• TM: Choose best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$ (sequence $w_1, w_2, w_3, w_4, \dots, w_T$ for $x_1, x_2, x_3, x_4, \dots, x_T$)
Guarantee: OL will be competitive with TM
Base Models
Mixture Model vs Mixed Membership Model
[Figure: example points x shown under single-component membership (mixture model) and under multi-component mixed membership (mixed membership model)]

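Latent Dirichlet Allocation (LDA)
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\beta_j$, $j = 1, \dots, K$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\beta_{z_i})$: word generated from assigned topic
• Dirichlet prior parameter: $\alpha$
[Plate diagram: $\alpha$, $\pi^{(d)}$, $z_i$, $x_i$, $\beta_j$, with plates over the $N_d$ words of a document, the $D$ documents, and the $K$ topics]
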
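The generative process on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of topics, the vocabulary size, and the values of `alpha` and `beta` are made up for the example, and the helper names (`sample_dirichlet`, `sample_discrete`, `generate_document`) are hypothetical rather than taken from the tutorial.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_discrete(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, beta, n_words, rng):
    """Generate one document: pi ~ Dirichlet(alpha), then z_i and x_i per word."""
    pi_d = sample_dirichlet(alpha, rng)        # distribution over topics for this document
    topics, words = [], []
    for _ in range(n_words):
        z = sample_discrete(pi_d, rng)         # topic assignment for this word
        x = sample_discrete(beta[z], rng)      # word generated from the assigned topic
        topics.append(z)
        words.append(x)
    return pi_d, topics, words

# Illustrative (assumed) settings: K = 3 topics over a vocabulary of 4 word ids.
rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]
beta = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
pi_d, topics, words = generate_document(alpha, beta, n_words=20, rng=rng)
```

Each word id in `words` is drawn from the row of `beta` selected by the matching entry of `topics`, which is the mixed-membership property: one document can use several topics.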
LDA Generative Model
[Figure: generative process illustrated on two example documents, document1 and document2]

Learning: Inference and Estimation

Variational Inference

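Smoothed Latent Dirichlet Allocation
• $\pi^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \mathrm{Discrete}(\pi^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$, $j = 1, \dots, T$: distribution over words for each topic
• $x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• Dirichlet priors: $\alpha$, $\beta$
[Plate diagram: $\alpha$, $\pi^{(d)}$, $z_i$, $x_i$, $\phi^{(j)}$, $\beta$, with plates over the $N_d$ words of a document, the $D$ documents, and the $T$ topics]
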
Aviation Safety Reports (NASA)

Results: NASA Reports I
Top words per topic:
• Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
• Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
• Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work

Results: NASA Reports II
Top words per topic:
• Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
• Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
• Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind, speed, air speed, conditions
• Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree

Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
[Figures: three two-dimensional plots of the reports, each highlighting example reports]
Example reports highlighted in the plots:
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.

Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example report   Flight Crew   Passenger   Maintenance
1                0.7039        0.0009      0.2953
2                0.1405        0.0663      0.7932
3                0.2563        0.6599      0.0837
4                0.0013        0.0013      0.9973

Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: infer number of topics
  - Hierarchical Dirichlet processes: infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.

Recommendation Systems
• Users (example): Age: 28; Gender: Male; Job: Salesman; Interest: Travel; ...
• Movies (example): Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; ...
[Figure: movie ratings matrix over users and movies]

Advertisements on the Web
• Products (example): Category: Sports shoes; Brand: Nike; Ratings: 425; ...
• Webpages (example): Category: Baby; URL: babyearth.com; Content: webpage text, hyperlinks
[Figure: click-through-rate matrix over webpages and products, with many missing entries]

Forest Ecology
• Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, ..., Wood density
[Figure: plant trait matrix over plants and traits, with missing entries, from the TRY database (Jens Kattge, Peter Reich, et al.)]

Matrix Factorization
• Singular value decomposition
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: a matrix approximated (≈) by a product of factor matrices]

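Probabilistic Matrix Factorization (PMF)
• $u_i \sim N(0, \sigma_u^2 I)$
• $v_j \sim N(0, \sigma_v^2 I)$
• $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Graphical model: latent factors $u_i$ and $v_j$ generate each observed entry, labeled $X_{ij} \sim N(u_i^T v_j, \sigma^2)$ in the diagram]
• Inference using gradient descent
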
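As a rough illustration of "inference using gradient descent", the sketch below fits the factors by stochastic gradient steps on the observed entries only. It assumes the usual MAP view of PMF (squared error plus L2 penalties that stand in for the Gaussian priors). The toy matrix, the learning rate, the penalty weight `reg`, and the function names are all assumptions made for this example, not values from the tutorial.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pmf_sgd(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit U (n_rows x rank) and V (n_cols x rank) to observed (i, j, value) triples.

    Each step follows the gradient of 0.5 * (r - u_i . v_j)^2
    + 0.5 * reg * (|u_i|^2 + |v_j|^2) for one observed entry.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in observed:
            err = r - dot(U[i], V[j])
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V

def rmse(observed, U, V):
    """Root mean squared error on the given (i, j, value) triples."""
    total = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in observed)
    return (total / len(observed)) ** 0.5

# Toy 4 x 3 matrix (assumed data) with two entries treated as missing.
row_scale = [1.0, 2.0, 3.0, 1.5]
col_scale = [1.0, 0.5, 2.0]
missing = {(0, 2), (3, 0)}
observed = [
    (i, j, row_scale[i] * col_scale[j])
    for i in range(4) for j in range(3)
    if (i, j) not in missing
]
U, V = pmf_sgd(observed, n_rows=4, n_cols=3)
train_rmse = rmse(observed, U, V)
```

Missing entries are simply never visited, which is how this style of factorization sidesteps the missing-data problem noted for SVD on the previous slide.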
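Bayesian Probabilistic Matrix Factorization
• Gaussian hyperpriors: $\mu_u \sim N(\mu_0, \Lambda_u)$, $\mu_v \sim N(\mu_0, \Lambda_v)$
• Wishart hyperpriors: $\Lambda_u \sim W(\nu_0, W_0)$, $\Lambda_v \sim W(\nu_0, W_0)$
• $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
• $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Graphical model: latent factors $u_i$ and $v_j$ generate each observed entry, labeled $X_{ij} \sim N(u_i^T v_j, \sigma^2)$ in the diagram]
• Inference using MCMC
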
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance, but no "hyperprior"

Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy

Overview
• Predictive Models
• Graphical Models
• Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
• Exploratory Analysis

Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  - At round t: expert prediction, weight on experts
  - Combined prediction
  - Nature reveals truth
  - Loss of expert i
  - Update "confidence" on experts
[Figure: four experts e1, e2, e3, e4 whose predictions $x_{t,1}, \dots, x_{t,4}$ are combined with weights $w_{t,1}, \dots, w_{t,4}$ into a prediction y]

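One common way to realize the round structure above is the exponentially weighted average forecaster, sketched below. The slide itself does not say which update rule is used, so the multiplicative update, the squared loss, the learning rate `eta`, and the toy experts are assumptions chosen for illustration.

```python
import math

def exponential_weights(expert_preds, outcomes, eta=0.5):
    """Run the exponentially weighted average forecaster.

    expert_preds[t][i] is expert i's prediction at round t (in [0, 1]);
    outcomes[t] is the truth revealed by nature (in [0, 1]).
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        combined = sum(w * p for w, p in zip(weights, preds))  # combined prediction
        learner_loss += (combined - y) ** 2                    # nature reveals truth
        losses = [(p - y) ** 2 for p in preds]                 # loss of each expert
        expert_loss = [a + b for a, b in zip(expert_loss, losses)]
        # Update "confidence": shrink weights of experts that did badly.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, learner_loss, expert_loss

# Toy setup (assumed): expert 0 always says 1, expert 1 always says 0,
# expert 2 always says 0.5; the truth is 1 on most rounds.
outcomes = ([1.0] * 8 + [0.0] * 2) * 2
expert_preds = [[1.0, 0.0, 0.5] for _ in outcomes]
weights, learner_loss, expert_loss = exponential_weights(expert_preds, outcomes)
```

After the run, most of the weight should sit on the expert with the smallest cumulative loss, while the learner's own loss stays close to that expert's.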
Time Machine (TM) vs Online Learning (OL)
• Goal: Maximize prediction accuracy
• TM: Choose best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$
• Guarantee: OL will be competitive with TM
[Figure: timeline of observations $x_1, x_2, \dots, x_T$ with online choices $w_1, w_2, \dots, w_T$; labels: Today, 2020, Base Models]

Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)

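Regret: Competing with the Best CRP
• Regret: difference of log wealth between the best CRP and the online strategy
• The log wealth of the online strategy is within $o(T)$ of the best CRP:
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$$
• Per round:
$$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$$
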
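To make the regret definition concrete, the sketch below compares an online strategy with the best CRP in hindsight on a toy two-asset market. The online strategy here is an exponentiated-gradient (EG) style update, which is assumed from the literature rather than taken from these slides. The market sequence, the step size `eta`, and the grid search over two-asset CRPs are likewise choices made only for this example.

```python
import math

def log_wealth_crp(w, market):
    """Log wealth of a constant rebalanced portfolio w over the market sequence."""
    return sum(math.log(sum(wi * xi for wi, xi in zip(w, x))) for x in market)

def log_wealth_eg(market, eta=0.05):
    """Log wealth of an EG-style online strategy that picks w_t before seeing x_t."""
    n = len(market[0])
    w = [1.0 / n] * n
    total = 0.0
    for x in market:
        gain = sum(wi * xi for wi, xi in zip(w, x))
        total += math.log(gain)
        # Multiplicative update toward assets that did well this round.
        w = [wi * math.exp(eta * xi / gain) for wi, xi in zip(w, x)]
        s = sum(w)
        w = [wi / s for wi in w]
    return total

def best_crp_two_assets(market, grid=1000):
    """Best two-asset CRP in hindsight, found by a simple grid search."""
    candidates = [(k / grid, 1.0 - k / grid) for k in range(grid + 1)]
    return max(candidates, key=lambda w: log_wealth_crp(w, market))

# Toy market (assumed): asset 1 is flat, asset 2 alternately doubles and halves.
T = 100
market = [(1.0, 2.0) if t % 2 == 0 else (1.0, 0.5) for t in range(T)]
best_w = best_crp_two_assets(market)
regret = log_wealth_crp(best_w, market) - log_wealth_eg(market)
avg_regret = regret / T
```

On this sequence neither asset gains anything if held alone, yet a rebalanced mix does grow, which is why the best CRP is a meaningful benchmark; `avg_regret` is the per-round gap that the second inequality bounds by $\epsilon$.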
Datasets
• S&P 500
  - S&P 500 Index: 500 large cap, actively traded stocks
  - All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  - Total: 385 stocks over 14 years at daily resolution
  - Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  - Widely used dataset in most earlier papers
  - 36 large cap stocks over 22 years at daily resolution
  - From July 1962 to Dec 1984
  - One major bear market: 1973-1974

Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30

Motivation for Meta Algorithms
• Universal algorithms: competitive with best CRP; can be outdone by simple heuristics
• Heuristic-based methods: strong empirical performance; no performance guarantee
• Best of both worlds: competitive with best CRP and strong empirical performance

Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, ..., Base Algo m
• Day t: the base algorithms output portfolios $p_{t,1}, p_{t,2}, p_{t,3}, \dots, p_{t,m}$
• Wealth gain of base algorithm j once $x_t$ is revealed: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights $w(1), w(2), w(3), \dots, w(m)$
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money

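The loop described on this slide might look like the sketch below: each day the meta algorithm splits its money across the base algorithms according to $w_t$, observes their wealth gains $r_t$, and shifts weight toward the ones that did well. The slide does not give the exact update rule, so the multiplicative (EG-style) update, the step size `eta`, the three fixed base portfolios, and the toy market are all assumptions for illustration.

```python
import math

def meta_algorithm(base_portfolios, market, eta=0.1):
    """Combine m base algorithms with a distribution w_t over them.

    base_portfolios[t][j] is the portfolio p_{t,j} of base algorithm j on day t;
    market[t] is the vector x_t revealed at the end of day t.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m
    log_wealth = 0.0
    for p_t, x_t in zip(base_portfolios, market):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(pi * xi for pi, xi in zip(p, x_t)) for p in p_t]
        gain = sum(wj * rj for wj, rj in zip(w, r_t))
        log_wealth += math.log(gain)
        # Weight update: good algorithms get more money.
        w = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r_t)]
        s = sum(w)
        w = [wj / s for wj in w]
    return w, log_wealth

# Toy setup (assumed): asset 1 gains 5% a day, asset 2 loses 2% a day.
# Base algorithms: all in asset 1, all in asset 2, and an even split.
days = 100
market = [(1.05, 0.98)] * days
base_today = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
base_portfolios = [base_today] * days
w, log_wealth = meta_algorithm(base_portfolios, market)
```

In practice the base algorithms would be strategies such as the ones compared in the results slides; fixed portfolios are used here only to keep the example short.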
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE, Base Algorithms
Results: MAEG, MAONS

Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  - Clustering
  - Dimensionality Reduction

Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  - Special (limit) case of mixture modeling
  - Kmeans algorithm: iterative relocation
[Plate diagram: $\pi$, h, $\theta$, x, with plates n and k]

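The iterative relocation behind Kmeans can be sketched as follows: assign each point to its nearest center, then move each center to the mean of its points, and repeat. The toy points, the initial centers, and the fixed iteration count are assumptions made for this example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, n_iter=20):
    """Plain Kmeans: alternate assignment and center relocation."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            clusters[j].append(p)
        # Relocation step: each center moves to the mean of its cluster.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated groups of points (assumed data) and a poor initialization.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
```

Both initial centers start inside the first group, so the first pass assigns the far group to one of them, and the relocation steps then pull the two centers apart.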
Spectral Clustering
• Weighted graph with weights
• Spectral Clustering
  - Graph Laplacian
  - k segments: k eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
[Figures from A. Ng et al., "On spectral clustering: analysis and an algorithm," NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE]

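The link between segments and zero eigenvalues can be checked directly on a small graph. The sketch below assumes the unnormalized Laplacian $L = D - W$, which the slide does not specify, and uses a made-up graph with two disconnected segments. It verifies that the indicator vector of each segment satisfies $L v = 0$, so each segment contributes one zero eigenvalue.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Toy weighted graph (assumed): nodes {0, 1, 2} form one segment and
# nodes {3, 4} form another, with no edges between the two.
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
L = graph_laplacian(W)

segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]   # indicator vector of the first segment
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]   # indicator vector of the second segment
single_node = [1.0, 0.0, 0.0, 0.0, 0.0]  # not constant on a segment

residual_a = matvec(L, segment_a)
residual_b = matvec(L, segment_b)
residual_single = matvec(L, single_node)
```

With weak edges between the segments the two eigenvalues would no longer be exactly 0 but close to 0, which is the "(close to)" case on the slide.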
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen to optimize some criterion
  - Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure: PCA and LDA projections compared]

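For PCA, the eigenvalue problem is on the data covariance matrix, and the top eigenvector is the direction of maximum projected variance. The sketch below finds that direction by power iteration, which is one simple solver among several and is not stated on the slide. The toy data and the function name are assumptions for this example.

```python
import math

def top_principal_component(data, n_iter=200):
    """Return the leading eigenvector of the covariance matrix and its variance."""
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / n for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        # Power iteration: repeatedly apply the covariance and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

# Toy 2-D data (assumed) lying close to the line y = 2x.
data = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
direction, variance = top_principal_component(data)
slope = direction[1] / direction[0]
```

The recovered direction should have slope near 2, and the variance along it should exceed the variance along either coordinate axis.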
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models

Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, ...
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems, nonlinear, nonstationary dependencies
  - Spatial and Temporal effects, Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts

Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das

- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
LDA Generative Model
[Figure: LDA generative model, illustrated with document1 and document2]
Learning Inference and Estimation
Variational Inference
Smoothed Latent Dirichlet Allocation
[Plate diagram: $\alpha$, $p^{(d)}$, $z_i$, $x_i$, $\phi^{(j)}$, $\beta$; plates $N_d$, $D$, $T$]
Dirichlet priors: $\alpha$, $\beta$
$p^{(d)} \sim \mathrm{Dirichlet}(\alpha)$: distribution over topics for each document
$z_i \sim \mathrm{Discrete}(p^{(d)})$: topic assignment for each word
$\phi^{(j)} \sim \mathrm{Dirichlet}(\beta)$: distribution over words for each topic
$x_i \sim \mathrm{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
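The generative story of smoothed LDA can be sketched in a few lines of NumPy. This is only an illustrative sampler under assumed settings: the function name, the symmetric scalar priors `alpha` and `beta`, and the fixed document length `doc_len` are hypothetical choices, not taken from the slides.

```python
import numpy as np

def sample_smoothed_lda(n_docs, n_topics, vocab_size, doc_len, alpha, beta, seed=0):
    """Sample a toy corpus from the smoothed LDA generative process.

    Assumptions (not from the slides): symmetric Dirichlet priors given as
    scalars, and every document has the same length doc_len.
    """
    rng = np.random.default_rng(seed)
    # phi[j]: distribution over words for topic j, phi^(j) ~ Dirichlet(beta)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    docs, assignments = [], []
    for _ in range(n_docs):
        # p_d: distribution over topics for this document, p^(d) ~ Dirichlet(alpha)
        p_d = rng.dirichlet(np.full(n_topics, alpha))
        # z[i]: topic assignment for each word, z_i ~ Discrete(p^(d))
        z = rng.choice(n_topics, size=doc_len, p=p_d)
        # x[i]: word generated from the assigned topic, x_i ~ Discrete(phi^(z_i))
        x = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(x)
        assignments.append(z)
    return phi, docs, assignments
```

Inference runs this story in reverse: given only the words, it estimates the per-document topic distributions and the per-topic word distributions.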
Aviation Safety Reports (NASA)
Results NASA Reports I
Topic words:
Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxiway, flight
Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
Maintenance: maintenance, engine, mel, zzz, aircraft, installed, check, inspection, fuel, work
Results NASA Reports II
Topic words:
Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind speed, air speed, conditions
Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example reports highlighted in the visualization:
• The pilot flies an owner's airplane with the owner as a passenger. Loses contact with the center during the flight.
• While performing a skydive, a jet approaches at the same altitude, but an accident is avoided.
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue happens. The pilot then returns to base and lands. The mechanic finds the problem.
• The pilot has a landing gear problem. Maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Report   Flight Crew   Passenger   Maintenance
1        0.7039        0.0009      0.2953
2        0.1405        0.0663      0.7932
3        0.2563        0.6599      0.0837
4        0.0013        0.0013      0.9973
Generalizations
• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles
• Nonparametric Priors
  – Dirichlet Process priors: Infer number of topics
  – Hierarchical Dirichlet processes: Infer hierarchical structures
  – Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
[Figure: movie ratings matrix over Users and Movies]
Advertisements on the Web
Products: Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
Webpages: Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks; …
[Figure: Click-Through-Rate matrix over Webpages and Products]
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Rows: Plants. Columns: Traits (Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density)
[Figure: plant trait matrix over Plants and Traits]
Matrix Factorization
• Singular value decomposition
  [Figure: matrix ≈ product of factor matrices]
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
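Probabilistic Matrix Factorization (PMF)
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Figure: graphical model with priors $N(0, \sigma_u^2 I)$ on $u_i$ and $N(0, \sigma_v^2 I)$ on $v_j$; the figure writes the observed entry as $X_{ij} \sim N(u_i^T v_j, \sigma^2)$]
Inference using gradient descent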
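A minimal sketch of gradient descent for PMF, assuming the usual MAP reading of the model: squared error on the observed entries plus L2 penalties on the factors, with the penalty weights standing in for the ratios $\sigma^2/\sigma_u^2$ and $\sigma^2/\sigma_v^2$. The function names, step size, and iteration count are illustrative assumptions, not the settings behind the results in the tutorial.

```python
import numpy as np

def pmf_fit(R, mask, rank=2, lam_u=0.1, lam_v=0.1, lr=0.005, n_iters=3000, seed=0):
    """Fit R ~ U V^T on observed entries (mask == 1) by batch gradient descent.

    Assumed objective (MAP estimate under the PMF model):
      0.5 * sum_observed (R_ij - u_i^T v_j)^2 + 0.5*lam_u*||U||^2 + 0.5*lam_v*||V||^2
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(n_iters):
        E = mask * (R - U @ V.T)          # residuals, zeroed on missing entries
        grad_U = -E @ V + lam_u * U       # gradient of the objective w.r.t. U
        grad_V = -E.T @ U + lam_v * V     # gradient of the objective w.r.t. V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def observed_rmse(R, mask, U, V):
    """Root mean squared error over the observed entries only."""
    E = mask * (R - U @ V.T)
    return float(np.sqrt((E ** 2).sum() / mask.sum()))
```

Because missing entries are masked out of the residual, they never contribute to the gradient, which is how this formulation avoids the missing-entry problem that plain SVD has.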
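Bayesian Probabilistic Matrix Factorization
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Hyperpriors: Gaussian on $\mu_u$, $\mu_v$; Wishart on $\Lambda_u$, $\Lambda_v$
[Figure: graphical model; the figure writes the observed entry as $X_{ij} \sim N(u_i^T v_j, \sigma^2)$]
Inference using MCMC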
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$. Diagonal covariance
BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance, with "hyperprior"
Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance, but no "hyperprior"
Results PMF on Netflix
Results Bayesian PMF on Netflix
Results Bayesian PMF on Netflix
Results PMF on Netflix
Results Bayesian PMF on Netflix
PPMF vs PMF BPMF
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  – Learning with Expert Advice
  – Online Convex Optimization
  – Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: Expert prediction, Weight on Experts
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update "confidence" on experts
[Figure: experts e1, e2, e3, e4 with predictions x_t1, x_t2, x_t3, x_t4, weights w_t1, w_t2, w_t3, w_t4, and combined prediction y]
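The round-by-round protocol above can be sketched with an exponentially weighted forecaster. The choice of squared loss, the multiplicative update, and the learning rate `eta` are assumptions made for illustration; the slide itself does not fix a particular loss or update rule.

```python
import numpy as np

def exponential_weights(expert_preds, outcomes, eta=0.5):
    """Combine expert predictions online with multiplicative weight updates.

    expert_preds: (T, n) array, prediction of each of n experts at each round.
    outcomes:     (T,) array, the truth revealed by nature after each round.
    Squared loss and the exp(-eta * loss) update are illustrative assumptions.
    """
    T, n = expert_preds.shape
    w = np.full(n, 1.0 / n)                 # start with equal confidence
    combined = np.empty(T)
    for t in range(T):
        combined[t] = w @ expert_preds[t]   # combined prediction for round t
        losses = (expert_preds[t] - outcomes[t]) ** 2   # loss of each expert
        w = w * np.exp(-eta * losses)       # lower confidence in experts that erred
        w /= w.sum()                        # keep weights a distribution
    return combined, w
```

Experts that keep predicting well retain their weight, so the combined prediction gradually tracks the best expert.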
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
TM: Choose best w in hindsight
OL: Choose wt before seeing xt
Guarantee: OL will be competitive with TM
[Figure: timeline from Today to 2020 with Base Models, inputs x1, x2, x3, x4, x5, …, xT and weights w1, w2, w3, w4, …, wT]
Online Concave Optimization (OCO)
OCO Good News and Bad News
Portfolio Selection Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
On-Line: $\sum_{t=1}^{T} \log(w_t \cdot x_t)$
Best CRP: $\sum_{t=1}^{T} \log(w \cdot x_t)$
$\sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \sum_{t=1}^{T} \log(w \cdot x_t) - o(T)$
$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \ge \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$
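The regret definition above can be checked numerically on a toy market. The sketch below assumes two assets and finds the best CRP by a simple grid search, which is only a stand-in for a proper optimizer; the function names and the grid size are hypothetical.

```python
import numpy as np

def log_wealth(weights, X):
    """Sum over t of log(w_t . x_t).

    weights: (n,) for a constant rebalanced portfolio, or (T, n) for an
             online strategy that picks w_t each day.
    X:       (T, n) array of price relatives x_t.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim == 1:
        weights = np.broadcast_to(weights, X.shape)
    return float(np.sum(np.log(np.sum(weights * X, axis=1))))

def best_crp_two_assets(X, grid=1001):
    """Best CRP in hindsight for two assets, by grid search (an assumption)."""
    best_w, best_lw = None, -np.inf
    for a in np.linspace(0.0, 1.0, grid):
        w = np.array([a, 1.0 - a])
        lw = log_wealth(w, X)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

def regret(online_weights, X, grid=1001):
    """Log wealth of the best CRP minus log wealth of the online strategy."""
    _, best_lw = best_crp_two_assets(X, grid)
    return best_lw - log_wealth(online_weights, X)
```

For example, with one asset that stays flat and another that alternately doubles and halves, holding either asset alone earns nothing over an even number of days, while the CRP that rebalances to an equal split each day grows by a factor of 1.125 every two days.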
Datasets
• S&P 500
  – S&P 500 Index: 500 large cap, actively traded stocks
  – All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  – Total 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results UP EG
Results UP EG ONS
Results UP EG ONS AC30
Results UP EG ONS AC30
Motivation for Meta Algorithms
• Universal Algorithms: Competitive with best CRP. Can be outdone by simple heuristics
• Heuristic based methods: Strong empirical performance. No performance guarantee
• Best of both worlds: Competitive with best CRP. Strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j proposes portfolio $p_{t,j}$ ($p_{t,1}, p_{t,2}, p_{t,3}, \ldots, p_{t,m}$)
• Wealth gain with $x_t$: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms ($w(1), w(2), w(3), \ldots, w(m)$)
• Weight update
  – $w_t$ updated based on $r_t$
  – Good algorithms get more money
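A rough sketch of the meta algorithm loop, under stated assumptions: the weight update below is a multiplicative, exponentiated-gradient style rule with a hypothetical learning rate `eta`, chosen only to illustrate "good algorithms get more money". The slides do not spell out the exact update, so this should not be read as the method behind the reported MAEG or MAONS results.

```python
import numpy as np

def meta_algorithm(base_portfolios, X, eta=1.0):
    """Allocate wealth across m base algorithms, day by day.

    base_portfolios: (T, m, n) array, p_{t,j} = portfolio of base algorithm j on day t.
    X:               (T, n) array of price relatives x_t.
    The update rule and eta are illustrative assumptions.
    """
    T, m, n = base_portfolios.shape
    w = np.full(m, 1.0 / m)              # w_t: distribution over algorithms
    total_log_wealth = 0.0
    for t in range(T):
        r = base_portfolios[t] @ X[t]    # r[j] = p_{t,j} . x_t, wealth gain of algorithm j
        gain = w @ r                     # wealth gain of the meta algorithm on day t
        total_log_wealth += np.log(gain)
        w = w * np.exp(eta * r / gain)   # algorithms with larger gains get more weight
        w /= w.sum()
    return w, total_log_wealth
```

The meta algorithm treats each base algorithm like an asset, so any online portfolio rule over m assets could be swapped in for the update step.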
Results MAEG MAONS
Results MAEG MAONS
Results UP EG ONS AC30
Results MAEG MAONS
Results New Base Algorithm BAH(AC30)
Results MAEG MAONS
Results NYSE Base Algorithms
Results MAEG MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – [equation not recovered]
  – Expectation Maximization (EM), or Spectral/Random projections
• Kmeans and Related Approaches
  – [equation not recovered]
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
[Plate diagram: π, h, θ, x; plates n, k]
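The Kmeans iteration mentioned above can be sketched as alternating assignment and relocation steps. This is a plain Lloyd-style loop with random initialization from the data points; the initialization scheme, stopping rule, and function name are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups by alternating two steps.

    Assignment: each point goes to its nearest center (squared Euclidean distance).
    Relocation: each center moves to the mean of the points assigned to it.
    """
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # leave an empty cluster's center in place
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                         # centers stopped moving
        centers = new_centers
    return centers, labels
```

The hard assignment step is what makes Kmeans a limit case of mixture modeling: EM would instead assign each point fractionally to every component.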
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k-segments: k-eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
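The statement that k segments correspond to k eigenvalues at or near 0 can be illustrated on a small graph. The sketch assumes the unnormalized Laplacian L = D - W and a symmetric weight matrix; the helper names and the tolerance are hypothetical.

```python
import numpy as np

def laplacian_spectrum(W):
    """Eigenvalues (ascending) of the unnormalized graph Laplacian L = D - W.

    W is assumed to be a symmetric, non-negative weight matrix.
    """
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W
    return np.linalg.eigvalsh(L)

def count_near_zero(eigs, tol=1e-8):
    """Number of eigenvalues whose magnitude is below tol."""
    return int(np.sum(np.abs(eigs) < tol))
```

On a graph made of two disconnected triangles, two eigenvalues are exactly zero up to rounding. Joining the triangles with a weak edge leaves one exact zero and a second eigenvalue that is small but positive, which is the "close to 0" case that spectral clustering exploits.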
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA vs LDA]
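As a small illustration of PCA as an eigenvalue problem, the sketch below diagonalizes the sample covariance matrix and projects onto the leading eigenvectors. The function name and return values are assumptions; a production implementation would more likely use an SVD of the centered data.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the directions of maximum variance.

    Returns the projected data, the principal directions (as columns),
    and the variance captured along each direction.
    """
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (len(X) - 1)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric eigenvalue problem
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    components = eigvecs[:, order]
    return Xc @ components, components, eigvals[order]
```

For points lying exactly on a line, the first principal direction is that line and it captures all of the variance. Fisher's LDA solves a generalized eigenvalue problem instead, using class labels to define between-class and within-class scatter.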
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems, nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
Variational Inference
Intro to Machine Learning 66
Intro to Machine Learning
Nd D
zi
xi
p (d)
φ (j)
α
β p(d) sim Dirichlet(α)
zi sim Discrete(p (d) ) φ (j) sim Dirichlet(β)
xi sim Discrete(φ (zi) )
T
distribution over topics for each document
topic assignment for each word
distribution over words for each topic
word generated from assigned topic
Dirichlet priors
Smoothed Latent Dirichlet Allocation
67
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
• Users: Age 28, Gender Male, Job Salesman, Interest Travel, …
• Movies: Title Gone with the Wind, Release year 1940, Cast Vivien Leigh, Clark Gable, Genre War, Romance, Awards 8 Oscars, Keywords Love, Civil war, …
• Movie ratings matrix
Advertisements on the Web
• Webpages: Category Baby, URL babyearth.com, Content Webpage text, Hyperlinks, …
• Products: Category Sports shoes, Brand Nike, Ratings 4.25, …
• Click-Through-Rate matrix
[Figure: Click-Through-Rate matrix over webpages and products, with some entries missing]
Forest Ecology
• Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
• Plants × Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
[Figure: plant trait matrix, with some entries missing]
Matrix Factorization
• Singular value decomposition
• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: matrix ≈ product of factor matrices]
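
As a point of reference for the SVD bullet, here is a minimal NumPy sketch of a truncated SVD on a small, fully observed matrix. The toy matrix, the helper name `truncated_svd`, and the chosen rank are illustrative assumptions, not taken from the slides. Note that `np.linalg.svd` needs every entry to be present, which is exactly the limitation listed under "Problems".

```python
import numpy as np

def truncated_svd(X, rank):
    """Best rank-`rank` approximation (in Frobenius norm) of a fully observed matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(0)
# Hypothetical toy data: a 6 x 5 matrix that is exactly rank 2.
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

X_hat = truncated_svd(X, rank=2)
svd_error = np.linalg.norm(X - X_hat)  # essentially zero, since X has rank 2
```

With missing entries there is no complete `X` to decompose, which motivates factorization models fitted on observed entries only.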
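Probabilistic Matrix Factorization (PMF)
• $u_i \sim \mathcal{N}(0, \sigma_u^2 I)$
• $v_j \sim \mathcal{N}(0, \sigma_v^2 I)$
• $R_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$
• Inference using gradient descent
[Figure: graphical model with $u_i$, $v_j$, and observed entry $X_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$; the diagram writes the observed entry as $X_{ij}$, the model summary as $R_{ij}$]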
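
A minimal sketch of "inference using gradient descent" for PMF, assuming the usual MAP reading: with Gaussian priors and Gaussian noise, maximizing the posterior amounts to minimizing squared error on the observed entries plus L2 penalties on the factors. The function names, the single penalty `lam` (which ties the two prior variances together), the step size, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pmf_map(R, mask, rank=2, lam=0.1, lr=0.005, n_iters=3000, seed=0):
    """Gradient descent on 0.5*sum_obs (R_ij - u_i.v_j)^2 + 0.5*lam*(|U|^2 + |V|^2).

    `mask` is 1 where R_ij is observed and 0 where it is missing.
    `lam` stands in for sigma^2/sigma_u^2 = sigma^2/sigma_v^2 (an assumed simplification).
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.normal(size=(n, rank))  # row i is u_i
    V = 0.1 * rng.normal(size=(m, rank))  # row j is v_j
    for _ in range(n_iters):
        E = mask * (R - U @ V.T)          # residuals, zero where unobserved
        # Simultaneous update: both right-hand sides use the old U and V.
        U, V = U + lr * (E @ V - lam * U), V + lr * (E.T @ U - lam * V)
    return U, V

def observed_rmse(R, mask, U, V):
    E = mask * (R - U @ V.T)
    return float(np.sqrt((E ** 2).sum() / mask.sum()))

rng = np.random.default_rng(1)
R_full = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))  # toy rank-2 "ratings"
mask = (rng.random(R_full.shape) < 0.7).astype(float)         # about 70% observed

U, V = pmf_map(R_full, mask)
rmse_zero = observed_rmse(R_full, mask, np.zeros((20, 2)), np.zeros((15, 2)))
rmse_fit = observed_rmse(R_full, mask, U, V)
```

Unlike the SVD, the loss touches only observed entries, so missing ratings need no imputation.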
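Bayesian Probabilistic Matrix Factorization
• $\mu_u \sim \mathcal{N}(\mu_0, \Lambda_u)$, $\Lambda_u \sim \mathcal{W}(\nu_0, W_0)$
• $\mu_v \sim \mathcal{N}(\mu_0, \Lambda_v)$, $\Lambda_v \sim \mathcal{W}(\nu_0, W_0)$
• $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$
• $R_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$
• Hyperpriors: Gaussian on $\mu_u$, $\mu_v$; Wishart on $\Lambda_u$, $\Lambda_v$
• Inference using MCMC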
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim \mathcal{N}(0, \sigma_u^2 I)$, $v_j \sim \mathcal{N}(0, \sigma_v^2 I)$. Diagonal covariance
• BPMF: $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$. Full covariance with “hyperprior”
• Parametric PMF (PPMF): $u_i \sim \mathcal{N}(\mu_u, \Lambda_u)$, $v_j \sim \mathcal{N}(\mu_v, \Lambda_v)$. Full covariance but no “hyperprior”
[Figures: Results, PMF on Netflix; Results, Bayesian PMF on Netflix]
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  – Learning with Expert Advice
  – Online Convex Optimization
  – Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions, weights on experts
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update “confidence” on experts
[Figure: experts e1, e2, e3, e4 with predictions $x_{t1}, \dots, x_{t4}$ and weights $w_{t1}, \dots, w_{t4}$, combined into prediction y]
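
The round structure above can be sketched with an exponentially weighted forecaster. This is one standard instance of the protocol, not necessarily the update used in the tutorial: the squared loss, the learning rate `eta`, and the toy experts are all assumptions.

```python
import numpy as np

def exponential_weights(expert_preds, truths, eta=0.5):
    """expert_preds: (T, n) predictions; truths: (T,) outcomes revealed after each round."""
    T, n = expert_preds.shape
    w = np.full(n, 1.0 / n)                 # start with equal confidence in each expert
    combined = np.empty(T)
    for t in range(T):
        combined[t] = w @ expert_preds[t]   # combined prediction
        losses = (expert_preds[t] - truths[t]) ** 2  # loss of each expert (assumed squared loss)
        w = w * np.exp(-eta * losses)       # shrink weight of experts with large loss
        w /= w.sum()                        # keep w a distribution
    return combined, w

rng = np.random.default_rng(0)
T = 200
truths = rng.random(T)
expert_preds = rng.random((T, 4))           # experts 1-3 guess at random
expert_preds[:, 0] = truths + 0.05 * rng.normal(size=T)  # expert 0 is nearly right

combined, final_w = exponential_weights(expert_preds, truths)
```

In this toy run nearly all of the weight ends up on the accurate expert, which is the sense in which "confidence" is updated.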
Time Machine (TM) vs Online Learning (OL)
• Goal: Maximize prediction accuracy
• TM: Choose best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$
• Guarantee: OL will be competitive with TM
[Figure: timeline from Today to 2020 over Base Models outputs $x_1, x_2, \dots, x_T$; TM applies one fixed w, OL applies $w_1, w_2, \dots, w_T$]
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
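
The body of the CRP slide did not survive extraction, so the following is a sketch under the standard definition rather than a reconstruction of the slide: a constant rebalanced portfolio keeps a fixed weight vector w and rebalances to it every period, so its wealth after T periods is the product of $w \cdot x_t$, where $x_t$ holds the per-period price relatives. The two-asset market below is a textbook-style toy example and an assumption.

```python
import numpy as np

def crp_wealth(w, X):
    """Wealth of a constant rebalanced portfolio w over price relatives X of shape (T, n)."""
    return float(np.prod(X @ w))

T = 20
X = np.ones((T, 2))      # asset 0: cash, price relative always 1
X[0::2, 1] = 2.0         # asset 1: doubles on even days...
X[1::2, 1] = 0.5         # ...and halves on odd days

wealth_cash = crp_wealth(np.array([1.0, 0.0]), X)   # stays at 1
wealth_stock = crp_wealth(np.array([0.0, 1.0]), X)  # (2 * 0.5)^10 = 1
wealth_crp = crp_wealth(np.array([0.5, 0.5]), X)    # (1.5 * 0.75)^10 = 1.125^10
```

Neither asset grows when held alone, yet rebalancing to an even split gains 12.5% every two periods, which is why the best CRP is a meaningful benchmark.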
Regret: Competing with the Best CRP
• Regret: difference of log wealth between the Best CRP and the Online Strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} \;-\; o(T)$$
$$\frac{1}{T}\sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \log(w \cdot x_t) \;-\; \varepsilon$$
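
To make the regret definition concrete, here is a sketch that runs one online strategy and compares its log wealth with the best CRP in hindsight. The online strategy is an exponentiated-gradient style update, assumed here to be what "EG" in the later results slides refers to; the learning rate, the grid search over two-asset CRPs, and the random price relatives are illustrative assumptions.

```python
import numpy as np

def eg_log_wealth(X, eta=0.05):
    """Online strategy: w_{t+1,i} proportional to w_{t,i} * exp(eta * x_{t,i} / (w_t . x_t))."""
    T, n = X.shape
    w = np.full(n, 1.0 / n)
    log_wealth = 0.0
    for t in range(T):
        gain = w @ X[t]                     # w_t is chosen before x_t is seen
        log_wealth += np.log(gain)
        w = w * np.exp(eta * X[t] / gain)   # gradient of log(w . x_t) is x_t / (w . x_t)
        w /= w.sum()
    return float(log_wealth)

def best_crp_log_wealth(X, grid=1001):
    """Best fixed w in hindsight, found by grid search (two assets only)."""
    best = -np.inf
    for a in np.linspace(0.0, 1.0, grid):
        best = max(best, float(np.sum(np.log(X @ np.array([a, 1.0 - a])))))
    return best

rng = np.random.default_rng(0)
X = rng.uniform(0.8, 1.25, size=(500, 2))   # hypothetical price relatives

online = eg_log_wealth(X)
hindsight = best_crp_log_wealth(X)
regret_per_round = (hindsight - online) / len(X)
```

A sublinear, o(T), regret means `regret_per_round` shrinks toward zero as T grows; the regret can even be negative, since the online strategy is not itself restricted to be a CRP.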
Datasets
• S&P 500
  – S&P 500 Index: 500 large cap, actively traded stocks
  – All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
[Figures: Results, UP, EG; Results, UP, EG, ONS; Results, UP, EG, ONS, AC30]
Motivation for Meta Algorithms
• Universal Algorithms
  – Competitive with best CRP
  – Can be outdone by simple heuristics
• Heuristic based methods
  – Strong empirical performance
  – No performance guarantee
• Best of both worlds
  – Competitive with best CRP
  – Strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs portfolio $p_{t,j}$; given $x_t$, its wealth gain is $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, w(1), w(2), w(3), …, w(m)
• Weight update
  – $w_t$ updated based on $r_t$
  – Good algorithms get more money
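
A minimal sketch of the meta algorithm loop, assuming an exponentiated-gradient style weight update over the base algorithms. The slide only says that $w_t$ is updated based on $r_t$; the specific update, the learning rate, and the three constant base portfolios below are assumptions chosen for illustration.

```python
import numpy as np

def meta_algorithm(P, X, eta=0.1):
    """P: (T, m, n) portfolios from m base algorithms; X: (T, n) price relatives."""
    T, m, _ = P.shape
    w = np.full(m, 1.0 / m)              # distribution over base algorithms
    wealth = 1.0
    for t in range(T):
        r = P[t] @ X[t]                  # r[j] = p_{t,j} . x_t, wealth gain of algorithm j
        gain = w @ r                     # gain of the money spread across algorithms
        wealth *= gain
        w = w * np.exp(eta * r / gain)   # assumed update: better algorithms get more money
        w /= w.sum()
    return wealth, w

T = 20
X = np.ones((T, 2))
X[0::2, 1] = 2.0                         # toy market: asset 1 doubles, then halves
X[1::2, 1] = 0.5
base = np.array([[1.0, 0.0],             # hypothetical base algo 1: all cash
                 [0.0, 1.0],             # hypothetical base algo 2: all stock
                 [0.5, 0.5]])            # hypothetical base algo 3: even CRP
P = np.tile(base, (T, 1, 1))             # each base algo plays the same portfolio daily

meta_wealth, meta_w = meta_algorithm(P, X)
```

Because the meta level only sees the gains $r_{t,j}$, any base algorithm, universal or heuristic, can be plugged in without changing the loop.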
[Figures: Results, MAEG, MAONS; Results, UP, EG, ONS, AC30; Results, New Base Algorithm BAH(AC30); Results, NYSE Base Algorithms]
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  – Special (limit) case of mixture modeling
  – Kmeans algorithm: Iteration, Relocation
[Figure: plate diagram with x, h, θ, π and plates n, k]
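
The Kmeans bullet can be sketched as the usual alternation between assigning points to the nearest center and relocating each center to the mean of its points. The initialization, the stopping rule, and the two-blob toy data are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Plain Lloyd-style k-means: assign each point, then relocate each center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # Squared distance from every point to every center, shape (n_points, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),    # toy blob near (0, 0)
               rng.normal(10.0, 1.0, size=(50, 2))])  # toy blob near (10, 10)
centers, labels = kmeans(X, k=2)
```

The hard assignment step is what makes this a limit case of mixture modeling, where EM would instead assign soft responsibilities.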
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
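
The eigenvalue claim can be checked on a tiny graph. This sketch uses the unnormalized Laplacian L = D - W and splits by the sign of the second eigenvector; that is a simplification chosen for brevity, not the normalized procedure of the cited Ng et al. paper. The six-node graph and its weights are assumptions.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.01

eigvals, eigvecs = np.linalg.eigh(graph_laplacian(W))  # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                                # eigenvector of 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)                     # sign split gives the two segments
```

With two nearly disconnected segments, the two smallest eigenvalues are close to 0 and the third is clearly separated from them.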
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA, LDA]
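
For PCA, the "suitable eigenvalue problem" is the eigendecomposition of the sample covariance matrix. A minimal sketch follows; the function name and the synthetic data, stretched along the direction (1, 1), are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto the top eigenvectors of the sample covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest ones
    components = eigvecs[:, order]                     # columns are principal directions
    return Xc @ components, components, eigvals[order]

rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0]) / np.sqrt(2)
ortho = np.array([1.0, -1.0]) / np.sqrt(2)
# Toy data: standard deviation 3 along `direction`, 0.5 across it.
X = 3.0 * rng.normal(size=(500, 1)) * direction + 0.5 * rng.normal(size=(500, 1)) * ortho

Z, components, variances = pca(X, n_components=1)
```

The variance of the projected coordinates `Z` equals the leading eigenvalue, which is the quantity PCA maximizes.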
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman’s 3-step ‘algorithm’ for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Smoothed Latent Dirichlet Allocation
• $p^{(d)} \sim \text{Dirichlet}(\alpha)$: distribution over topics for each document
• $z_i \sim \text{Discrete}(p^{(d)})$: topic assignment for each word
• $\phi^{(j)} \sim \text{Dirichlet}(\beta)$: distribution over words for each topic
• $x_i \sim \text{Discrete}(\phi^{(z_i)})$: word generated from assigned topic
• $\alpha$, $\beta$: Dirichlet priors
[Figure: plate diagram with plates $N_d$, D, and T]
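
The generative process above can be simulated directly. In this sketch the corpus sizes, the hyperparameter values, and the fixed document length (used in place of a per-document $N_d$) are assumptions chosen for illustration.

```python
import numpy as np

def sample_lda_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Draw documents from the smoothed LDA generative model with symmetric priors."""
    rng = np.random.default_rng(seed)
    # phi[j]: distribution over words for topic j, drawn from Dirichlet(beta).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    docs, topic_dists = [], []
    for _ in range(n_docs):
        p_d = rng.dirichlet(np.full(n_topics, alpha))       # topics for this document
        z = rng.choice(n_topics, size=doc_len, p=p_d)       # topic assignment per word
        x = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])  # word from its topic
        docs.append(x)
        topic_dists.append(p_d)
    return docs, np.array(topic_dists), phi

docs, topic_dists, phi = sample_lda_corpus(
    n_docs=5, doc_len=30, n_topics=3, vocab_size=20, alpha=0.5, beta=0.1
)
```

Inference runs this process in reverse: given only the words in `docs`, it estimates `topic_dists` and `phi`.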
Aviation Safety Reports (NASA)
Results: NASA Reports I
• Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight
• Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police
• Maintenance: maintenance, engine, mel, zzz, air craft, installed, check, inspection, fuel, work
Results: NASA Reports II
• Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed
• Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires
• Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind, speed, air speed, conditions
• Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
bull Manifold Embedding
ndash Preserve local geometric structure
bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps
bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip
bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity
bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies
bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts
Intro to Machine Learning 118
Intro to Machine Learning
Acknowledgements
Amrudin Agovic Qiang Fu Soumyadeep Chatterjee
Hanhuai Shan Puja Das
119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Intro to Machine Learning
Aviation Safety Reports (NASA)
68
Intro to Machine Learning
Results NASA Reports I
Arrival Departure
Passenger Maintenance
runway approach departure altitude
turn tower
air traffic control heading taxi way
flight
passenger attendant
flight seat
medical captain
attendants lavatory
told police
maintenance engine
mel zzz
air craft installed
check inspection
fuel Work
69
Intro to Machine Learning
Results NASA Reports II
Medical Emergency
Wheel Maintenance
Weather Condition
Departure
medical passenger
doctor attendant oxygen
emergency paramedics
flight nurse aed
tire wheel
assembly nut
spacer main axle bolt
missing tires
knots turbulence
aircraft degrees
ice winds wind speed
air speed conditions
departure sid
dme altitude
climbing mean sea level
heading procedure
turn degree
70
Intro to Machine Learning
The pilot flies an owners airplane with the owner as a passenger Loses contact with the center during the flight
While performing a sky diving a jet approaches at the same altitude but an
accident is avoided
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
71
Intro to Machine Learning
Altimeter has a problem but the pilot overcomes the
difficulty during the flight
During acceleration a flap retraction issue happens The pilot then returns to base and lands The mechanic finds out
the problem
Red Flight Crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
72
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j proposes portfolio $p_{t,j}$; the market reveals $x_t$
• Wealth gain: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money
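The meta algorithm's bookkeeping can be sketched as follows. The slide only states that the weights are updated based on the wealth gains; the exponential update and the learning rate `eta` below are assumptions made for illustration, as are the two toy base algorithms.

```python
import math

def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """Combine m base algorithms with a multiplicative weight update.

    base_portfolios[t][j] is the portfolio p_{t,j} chosen by base algorithm j
    on day t; price_relatives[t] is the market vector x_t.
    The exponential update is an assumed choice, not taken from the slides.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                 # w_t: distribution over base algorithms
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(p * x for p, x in zip(p_tj, x_t)) for p_tj in p_t]
        # The meta algorithm's gain is the w_t-weighted mix of base gains.
        meta_gain = sum(w_j * r_j for w_j, r_j in zip(w, r_t))
        wealth *= meta_gain
        # Good algorithms get more money: reweight by relative gain.
        w = [w_j * math.exp(eta * r_j / meta_gain) for w_j, r_j in zip(w, r_t)]
        total = sum(w)
        w = [w_j / total for w_j in w]
    return wealth, w

# Two hypothetical base algorithms on two assets: always asset 0 vs always asset 1.
days = 30
base = [[(1.0, 0.0), (0.0, 1.0)]] * days
market = [(1.05, 0.95)] * days        # asset 0 gains 5% a day, asset 1 loses 5%
final_wealth, final_w = meta_algorithm(base, market)
```

Because the first base algorithm earns more every day, its share of the money grows steadily and the combined wealth ends above its starting value.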
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  - Clustering
  - Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
  - Special (limit) case of mixture modeling
  - K-means algorithm: Iteration, Relocation
[Figure: graphical model with nodes π, h, x, θ and plates n, k]
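The K-means iteration mentioned above alternates two steps: assign each point to its nearest center, then relocate each center to the mean of its points. A minimal sketch, assuming squared Euclidean distance and caller-supplied initial centers (the toy data is hypothetical):

```python
def kmeans(points, centers, n_iter=20):
    """Lloyd's algorithm: alternate assignment and center relocation."""
    centers = [tuple(c) for c in centers]
    assign = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        assign = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Relocation step: each center moves to the mean of its points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)      # keep an empty cluster's center in place
        if new_centers == centers:
            break                            # converged: nothing moved
        centers = new_centers
    return centers, assign

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(data, centers=[data[0], data[3]])
```

On this toy data the two groups are well separated, so the loop stops after the centers settle on the two group means.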
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  - Graph Laplacian
  - k segments: k eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
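The link between segments and zero eigenvalues can be checked directly on a small graph. The sketch below assumes the unnormalized Laplacian L = D - W and a hypothetical five-node graph with two disconnected segments; it verifies that each segment's indicator vector is mapped to zero by L, which makes it an eigenvector with eigenvalue 0.

```python
def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [
        [(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical graph with two disconnected segments: nodes {0, 1, 2} and {3, 4}.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
L = graph_laplacian(W)

# The indicator vector of each segment is an eigenvector with eigenvalue 0.
segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]
residual_a = matvec(L, segment_a)
residual_b = matvec(L, segment_b)
```

When the segments are only weakly connected rather than fully disconnected, these eigenvalues are small but no longer exactly zero, which is the "close to 0" case on the slide.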
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen to optimize some criterion
  - Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projection is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure: PCA and LDA projections]
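As an illustration of the eigenvalue view of PCA, the sketch below finds the first principal component as the top eigenvector of the sample covariance matrix. Power iteration is used here only to keep the example dependency-free; it is an assumed stand-in for a library eigensolver, and the toy points are hypothetical.

```python
import math

def first_principal_component(data, n_iter=200):
    """Top eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] + [0.5] * (d - 1)          # arbitrary non-zero starting vector
    for _ in range(n_iter):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance of the data after projecting onto v.
    variance = sum(sum(a * b for a, b in zip(r, v)) ** 2 for r in centered) / n
    return v, variance

# Hypothetical 2-D points lying close to the line y = x.
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, projected_variance = first_principal_component(points)
```

For these points the recovered direction is close to the diagonal, and the variance along it is about twice the variance along either coordinate axis.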
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems; nonlinear, nonstationary dependencies
  - Spatial and Temporal effects; Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results: NASA Reports I
• Arrival Departure: runway approach departure altitude turn tower air traffic control heading taxi way flight
• Passenger: passenger attendant flight seat medical captain attendants lavatory told police
• Maintenance: maintenance engine mel zzz air craft installed check inspection fuel Work
Results: NASA Reports II
• Medical Emergency: medical passenger doctor attendant oxygen emergency paramedics flight nurse aed
• Wheel Maintenance: tire wheel assembly nut spacer main axle bolt missing tires
• Weather Condition: knots turbulence aircraft degrees ice winds wind speed air speed conditions
• Departure: departure sid dme altitude climbing mean sea level heading procedure turn degree
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot then returns to base and lands. The mechanic finds the problem.
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
• Flight Crew 0.7039, Passenger 0.0009, Maintenance 0.2953
• Flight Crew 0.1405, Passenger 0.0663, Maintenance 0.7932
• Flight Crew 0.2563, Passenger 0.6599, Maintenance 0.0837
• Flight Crew 0.0013, Passenger 0.0013, Maintenance 0.9973
Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: Infer number of topics
  - Hierarchical Dirichlet processes: Infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
[Figure: Movie ratings matrix over users and movies]
• Users, e.g.: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
• Movies, e.g.: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; …
Advertisements on the Web
[Figure: Click-Through-Rate matrix over webpages and products]
• Products, e.g.: Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
• Webpages, e.g.: Category: Baby; URL: babyearth.com; Content: Webpage text, Hyperlinks; …
Forest Ecology
[Figure: Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.) over plants and traits; trait columns: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density]
Matrix Factorization
• Singular value decomposition
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
[Figure: data matrix ≈ product of factor matrices]
Probabilistic Matrix Factorization (PMF)
$u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$, $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Figure: graphical model with factors $u_i^T$, $v_j$ and observations $X_{ij} \sim N(u_i^T v_j, \sigma^2)$]
Inference using gradient descent
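One way to read "inference using gradient descent" is as a MAP estimate: under the Gaussian likelihood and priors, maximizing the posterior amounts to minimizing squared error on the observed entries plus an L2 penalty on the factors. The sketch below follows that reading. The rank, step size, penalty `lam` (standing in for the variance ratios), and the toy ratings are all assumptions, not values from the tutorial.

```python
import random

def pmf_map(ratings, n_rows, n_cols, rank=2, lam=0.05, lr=0.01, n_steps=2000, seed=0):
    """MAP estimate for PMF by full-batch gradient descent.

    ratings maps observed (i, j) pairs to values R_ij; missing entries are skipped.
    lam plays the role of sigma^2 / sigma_u^2 (and sigma^2 / sigma_v^2).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    def objective():
        fit = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
                  for (i, j), r in ratings.items())
        reg = lam * (sum(x * x for u in U for x in u) + sum(x * x for v in V for x in v))
        return 0.5 * (fit + reg)

    history = [objective()]
    for _ in range(n_steps):
        # Gradient of the penalty term, then add the data term per observed entry.
        grad_U = [[lam * x for x in u] for u in U]
        grad_V = [[lam * x for x in v] for v in V]
        for (i, j), r in ratings.items():
            err = sum(a * b for a, b in zip(U[i], V[j])) - r
            for k in range(rank):
                grad_U[i][k] += err * V[j][k]
                grad_V[j][k] += err * U[i][k]
        U = [[x - lr * g for x, g in zip(u, gu)] for u, gu in zip(U, grad_U)]
        V = [[x - lr * g for x, g in zip(v, gv)] for v, gv in zip(V, grad_V)]
        history.append(objective())
    return U, V, history

# Hypothetical 4 x 3 ratings matrix with four of the twelve entries missing.
ratings = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 4.0, (1, 2): 1.0,
           (2, 1): 1.0, (2, 2): 5.0, (3, 0): 1.0, (3, 2): 4.0}
U, V, history = pmf_map(ratings, n_rows=4, n_cols=3)
```

Only observed entries contribute to the gradient, which is how this formulation sidesteps the missing-entry problem that plain SVD has; a missing rating can then be predicted as the inner product of the corresponding rows of `U` and `V`.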
Bayesian Probabilistic Matrix Factorization
$\mu_u \sim N(\mu_0, \Lambda_u)$, $\Lambda_u \sim W(\nu_0, W_0)$
$\mu_v \sim N(\mu_0, \Lambda_v)$, $\Lambda_v \sim W(\nu_0, W_0)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$, $R_{ij} \sim N(u_i^T v_j, \sigma^2)$
[Figure: graphical model with observations $X_{ij} \sim N(u_i^T v_j, \sigma^2)$ and Gaussian and Wishart hyperpriors]
Inference using MCMC
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
• PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
• Exploratory Analysis
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example reports highlighted in the figure:
• The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.
• During a skydiving operation, a jet approaches at the same altitude, but an accident is avoided.
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example reports highlighted in the figure:
• The altimeter has a problem, but the pilot overcomes the difficulty during the flight.
• During acceleration, a flap retraction issue occurs. The pilot returns to base and lands, and the mechanic finds the problem.
Two-Dimensional Visualization for Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance
Example reports highlighted in the figure:
• The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.
• The captain has a medical emergency.
Mixed Membership of Reports
Legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance

Report   Flight Crew   Passenger   Maintenance
1        0.7039        0.0009      0.2953
2        0.1405        0.0663      0.7932
3        0.2563        0.6599      0.0837
4        0.0013        0.0013      0.9973
Generalizations
• Generalized Topic Models
– Correlated Topic Models
– Dynamic Topic Models, Topics over Time
– Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
– Mixed membership naïve-Bayes
– Discriminative models for classification
– Cluster Ensembles
• Nonparametric Priors
– Dirichlet Process priors: infer the number of topics
– Hierarchical Dirichlet processes: infer hierarchical structures
– Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
[Figure: movie ratings matrix over users and movies, with side information on both]
• Users, e.g. Age: 28; Gender: Male; Job: Salesman; Interest: Travel; …
• Movies, e.g. Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil War; …
Advertisements on the Web
[Figure: Click-Through-Rate matrix over webpages and products; only some entries are observed]
• Webpages, e.g. Category: Baby; URL: babyearth.com; Content: webpage text, hyperlinks
• Products, e.g. Category: Sports shoes; Brand: Nike; Ratings: 4.25; …
Forest Ecology
[Figure: Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.), plants by traits, with many entries missing]
• Traits: Leaf(N), Leaf(P), SLA, Leaf-Size, …, Wood density
Matrix Factorization
• Singular value decomposition
[Figure: a matrix approximated (≈) by a product of factors]
• Problems
– Large matrices, with millions of rows/columns: SVD can be rather slow
– Sparse matrices, where most entries are missing: traditional approaches cannot handle missing entries
Probabilistic Matrix Factorization (PMF)
Generative model (the observed entry is labelled $X_{ij}$ in the diagram and $R_{ij}$ in the equations):
$u_i \sim N(0, \sigma_u^2 I)$
$v_j \sim N(0, \sigma_v^2 I)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using gradient descent
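Under the PMF model above, maximizing the posterior over the factors amounts to minimizing a regularized squared error over the observed entries only, which is what lets the approach cope with missing entries. The following is a minimal sketch, not the implementation behind the slides: the names `pmf_fit`, `predict`, and `rmse`, the toy ratings, and the hyperparameters `rank`, `lam`, `lr`, and `epochs` are all assumptions, `lam` plays the role of the variance ratio between noise and prior, and the updates are stochastic (one observed entry at a time) rather than full-batch.

```python
import random

def pmf_fit(ratings, n_rows, n_cols, rank=2, lam=0.02, lr=0.02, epochs=2000, seed=0):
    """Fit row factors U and column factors V by stochastic gradient descent.

    ratings: dict mapping (i, j) -> observed value; missing entries are absent.
    Objective (assumed): sum over observed (i, j) of
        (r_ij - u_i . v_j)^2 + lam * (|u_i|^2 + |v_j|^2)
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), r in ratings.items():
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # gradient step on the squared error plus the Gaussian-prior penalty
                U[i][k] += lr * (err * v_jk - lam * u_ik)
                V[j][k] += lr * (err * u_ik - lam * v_jk)
    return U, V

def predict(U, V, i, j):
    """Predicted value for entry (i, j), observed or not."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def rmse(U, V, ratings):
    """Root mean squared error on the observed entries."""
    se = sum((r - predict(U, V, i, j)) ** 2 for (i, j), r in ratings.items())
    return (se / len(ratings)) ** 0.5

# Hypothetical 3x3 matrix with three of nine entries missing.
toy = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}
U, V = pmf_fit(toy, n_rows=3, n_cols=3)
print("training RMSE:", rmse(U, V, toy))
print("prediction for missing entry (0, 2):", predict(U, V, 0, 2))
```

Only observed entries enter the loop, so the cost per epoch scales with the number of observations rather than with the full matrix size.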
Bayesian Probabilistic Matrix Factorization
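Generative model (Gaussian priors on the factors, with Gaussian and Wishart hyperpriors on their parameters; the observed entry is labelled $X_{ij}$ in the diagram and $R_{ij}$ in the equations):
$\Lambda_u \sim W(\nu_0, W_0)$, $\mu_u \sim N(\mu_0, \Lambda_u)$
$\Lambda_v \sim W(\nu_0, W_0)$, $\mu_v \sim N(\mu_0, \Lambda_v)$
$u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$
$R_{ij} \sim N(u_i^T v_j, \sigma^2)$
Inference using MCMC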
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$. Diagonal covariance.
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance with "hyperprior".
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance but no "hyperprior".
Results: PMF on Netflix [figure]
Results: Bayesian PMF on Netflix [figure]
Results: Bayesian PMF on Netflix [figure]
Results: PMF on Netflix [figure]
Results: Bayesian PMF on Netflix [figure]
PPMF vs PMF, BPMF [figure]
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
– Learning with Expert Advice
– Online Convex Optimization
– Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
– At round t: expert predictions, weights on experts
– Combined prediction
– Nature reveals truth
– Loss of expert i
– Update "confidence" on experts
[Diagram: experts $e_1, \dots, e_4$ output predictions $x_{t,1}, \dots, x_{t,4}$, which are combined with weights $w_{t,1}, \dots, w_{t,4}$ to predict $y$]
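The round-by-round protocol above can be sketched in a few lines. The formulas on the slide did not survive in the transcript, so the choices here are assumptions: a weighted-average combined prediction, squared loss, and an exponential weight update with a hypothetical learning rate `eta`; the function name `exp_weights` is also illustrative.

```python
import math

def exp_weights(expert_preds, truths, eta=0.5):
    """Exponentially weighted average forecaster (assumed squared loss).

    expert_preds[t][i]: prediction x_{t,i} of expert i at round t.
    truths[t]: outcome revealed by nature after the combined prediction is made.
    Returns (list of combined predictions, final weights).
    """
    n = len(expert_preds[0])
    w = [1.0 / n] * n                                     # uniform initial confidence
    combined = []
    for x_t, y_t in zip(expert_preds, truths):
        y_hat = sum(wi * xi for wi, xi in zip(w, x_t))    # combined prediction
        combined.append(y_hat)
        losses = [(xi - y_t) ** 2 for xi in x_t]          # loss of each expert
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]                      # renormalize to a distribution
    return combined, w

# Hypothetical run: expert 0 is always right, expert 1 always predicts 0.
preds = [[1.0, 0.0]] * 20
truths = [1.0] * 20
combined, w = exp_weights(preds, truths)
print("final weights:", w)
```

Experts with small cumulative loss end up with most of the weight, so the combined prediction tracks the best expert without knowing in advance which one that is.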
Time Machine (TM) vs Online Learning (OL)
[Timeline figure: observations $x_1, x_2, \dots, x_T$ running from today to 2020; base models; online weights $w_1, w_2, \dots, w_T$]
• Goal: maximize prediction accuracy
• TM: choose the best $w$ in hindsight
• OL: choose $w_t$ before seeing $x_t$
• Guarantee: OL will be competitive with TM
Online Concave Optimization (OCO)
OCO Good News and Bad News
Portfolio Selection Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret Competing with the Best CRP
Regret: difference of log wealth between the best CRP and the online strategy.
$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$
In per-round form:
$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$
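The regret defined above can be computed directly on a small example. This is a sketch under assumptions: it handles two assets only, finds the best CRP by a grid search rather than by solving the concave maximization exactly, and the names `log_wealth`, `best_crp_two_assets`, and `regret`, as well as the toy market, are hypothetical.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t) for a sequence of portfolios w_t."""
    return sum(math.log(sum(w * x for w, x in zip(w_t, x_t)))
               for w_t, x_t in zip(portfolios, price_relatives))

def best_crp_two_assets(price_relatives, grid=1000):
    """Grid search for the best constant rebalanced portfolio over two assets."""
    best_w, best_lw = None, -math.inf
    for k in range(grid + 1):
        w = (k / grid, 1 - k / grid)
        lw = log_wealth([w] * len(price_relatives), price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

def regret(online_portfolios, price_relatives):
    """Log wealth of the best CRP in hindsight minus log wealth of the online strategy."""
    _, best_lw = best_crp_two_assets(price_relatives)
    return best_lw - log_wealth(online_portfolios, price_relatives)

# Hypothetical market: asset 0 is flat, asset 1 alternately doubles and halves.
xs = [(1.0, 2.0), (1.0, 0.5)] * 5
hold_asset_0 = [(1.0, 0.0)] * len(xs)
print("best CRP:", best_crp_two_assets(xs)[0])
print("regret of holding asset 0:", regret(hold_asset_0, xs))
```

In this market neither asset gains anything when held alone, yet rebalancing to an even split every day grows wealth, which is why the best CRP is a demanding benchmark.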
Datasets
• S&P 500
– S&P 500 Index: 500 large-cap, actively traded stocks
– All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
– Total: 385 stocks over 14 years at daily resolution
– Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
– Widely used dataset in most earlier papers
– 36 large-cap stocks over 22 years at daily resolution
– From July 1962 to Dec 1984
– One major bear market: 1973-1974
Results: UP, EG [figure]
Results: UP, EG, ONS [figure]
Results: UP, EG, ONS, AC30 [figure]
Results: UP, EG, ONS, AC30 [figure]
Motivation for Meta Algorithms
• Universal algorithms: competitive with the best CRP, but can be outdone by simple heuristics
• Heuristic-based methods: strong empirical performance, but no performance guarantee
• Best of both worlds: competitive with the best CRP and strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• On day t, base algorithm j proposes a portfolio $p_{t,j}$, and the market reveals $x_t$
• Wealth gain of base algorithm j: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update: $w_t$ is updated based on $r_t$, so good algorithms get more money
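The meta algorithm above can be sketched as a distribution over base algorithms that is reweighted by each algorithm's daily wealth gain. The transcript does not preserve the actual update rule, so the exponentiated-gradient style update below, the learning rate `eta`, and the function name `meta_algorithm` are assumptions made for illustration.

```python
import math

def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """Combine m base algorithms with a multiplicative weight update.

    base_portfolios[t][j]: portfolio p_{t,j} proposed by base algorithm j on day t.
    price_relatives[t]: market vector x_t revealed at the end of day t.
    Returns (final wealth of the meta algorithm, final weights over algorithms).
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                  # start with money split evenly
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # wealth gain r_{t,j} = p_{t,j} . x_t of each base algorithm
        r = [sum(p * x for p, x in zip(p_tj, x_t)) for p_tj in p_t]
        gain = sum(wj * rj for wj, rj in zip(w, r))       # gain of the mixture
        wealth *= gain
        # assumed EG-style update: algorithms with larger gains get more weight
        w = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r)]
        total = sum(w)
        w = [wj / total for wj in w]
    return wealth, w

# Hypothetical run: algorithm 0 holds a rising asset, algorithm 1 a falling one.
T = 50
base = [[(1.0, 0.0), (0.0, 1.0)]] * T
xs = [(1.1, 0.9)] * T
wealth, w = meta_algorithm(base, xs)
print("meta wealth:", wealth, "weights:", w)
```

Because the meta layer only sees each base algorithm's wealth gain, heuristics and universal algorithms can be mixed freely as base algorithms.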
Results: MAEG, MAONS [figure]
Results: MAEG, MAONS [figure]
Results: UP, EG, ONS, AC30 [figure]
Results: MAEG, MAONS [figure]
Results: New Base Algorithm BAH(AC30) [figure]
Results: MAEG, MAONS [figure]
Results: NYSE Base Algorithms [figure]
Results: MAEG, MAONS [figure]
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
– Clustering
– Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
– [model equation not recoverable from the transcript]
– Expectation Maximization (EM), or Spectral/Random projections
• K-means and Related Approaches
– [objective not recoverable from the transcript]
– Special (limit) case of mixture modeling
– K-means algorithm: iterative relocation
[Plate diagram with variables π, h, x, θ and plates n, k]
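The iterative relocation behind K-means alternates between assigning points to their nearest center and moving each center to the mean of its points. A minimal sketch, assuming caller-supplied initial centers and squared Euclidean distance; the function name `kmeans` and the toy data are illustrative.

```python
def kmeans(points, centers, iters=100):
    """K-means by iterative relocation, for points given as tuples of numbers.

    centers: initial centers (assumed to be supplied by the caller).
    Returns (final centers, cluster label of each point).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        labels = [min(range(len(centers)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
                  for p in points]
        # relocation step: each center moves to the mean of its assigned points
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)      # keep the center of an empty cluster
        if new_centers == centers:         # converged: no center moved
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, centers=[(0, 0), (10, 10)])
print(centers, labels)
```

Each step can only lower the total squared distance to the assigned centers, so the loop terminates, though the result depends on the initial centers.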
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
– Graph Laplacian
– k segments: k eigenvalues are (close to) 0
– Methods: Spectral Approaches, Kernel K-means
[Figures from A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE]
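The link between segments and zero eigenvalues can be checked on a small graph: for the unnormalized Laplacian L = D - W, the indicator vector of every connected component satisfies L v = 0, so a graph with k disconnected segments has k zero eigenvalues. The sketch below verifies this without an eigensolver; the helper names `laplacian`, `matvec`, and `components` and the toy weight matrix are assumptions, and the unnormalized Laplacian is one of several variants in use.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def components(W):
    """Connected components of the graph with an edge wherever W[i][j] > 0."""
    n, seen, comps = len(W), set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if W[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        comps.append(sorted(comp))
    return comps

# Hypothetical graph with two segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 2],
     [0, 0, 0, 2, 0]]
L = laplacian(W)
for comp in components(W):
    indicator = [1.0 if i in comp else 0.0 for i in range(len(W))]
    print(comp, matvec(L, indicator))      # each product is the zero vector
```

With weak links between segments the eigenvalues are only close to 0, and clustering the corresponding eigenvectors recovers the segments.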
Linear Dimensionality Reduction
• Linear Projections
– Choose a projection that optimizes some criterion
– Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
– Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
– Separation of classes is maximized
[Figure: PCA and LDA projection directions]
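For PCA, the eigenvalue problem above is the one for the sample covariance matrix, and its leading eigenvector is the direction of maximum variance. A minimal sketch using power iteration in place of a full eigensolver; the function name `first_principal_component`, the fixed start vector, and the toy data are assumptions, and the method presumes the start vector is not orthogonal to the leading direction and that the data are not all identical.

```python
import math

def first_principal_component(data, iters=200):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    data: list of equal-length rows. Returns (unit direction, variance along it).
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d           # arbitrary nonzero start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]          # renormalize after each multiplication
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, variance

# Hypothetical data lying exactly on the line y = 2x.
data = [(t, 2 * t) for t in (-2, -1, 0, 1, 2)]
direction, variance = first_principal_component(data)
print(direction, variance)
```

Fisher's LDA leads to a generalized eigenvalue problem involving between-class and within-class scatter, which this sketch does not cover.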
Nonlinear Dimensionality Reduction
• Manifold Embedding
– Preserve local geometric structure
• Geometric Approaches
– Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
– Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
– Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
– Predictive + Graphical Models, Online Learning, Exploratory Analysis
– Numerous other active and relevant themes, e.g. Sparsity
• Key challenges: Methods
– High-dimensional problems; nonlinear, nonstationary dependencies
– Spatial and Temporal effects; Gradual vs Abrupt changes
– Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
– Feynman's 3-step 'algorithm' for problem solving
– Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
Intro to Machine Learning
The pilot has a landing gear problem Maintenance crew
joins radio conversation to help
The captain has a medical emergency
Red Flight crew Blue Passenger Green Maintenance
Two-Dimensional Visualization for Reports
73
Intro to Machine Learning
Red Flight Crew Blue Passenger Green Maintenance
Mixed Membership of Reports
Flight Crew 07039 Passenger 00009 Maintenance 02953
Flight Crew 01405 Passenger 00663 Maintenance 07932
Flight Crew 02563 Passenger 06599 Maintenance 00837
Flight Crew 00013 Passenger 00013 Maintenance 09973
74
Generalizations
bull Generalized Topic Models ndash Correlated Topic Models ndash Dynamic Topic Models Topics over Time ndash Dynamic Topics with birthdeath
bull Mixed membership models over non-text data applications ndash Mixed membership naiumlve-Bayes ndash Discriminative models for classification ndash Cluster Ensembles
bull Nonparametric Priors ndash Dirichlet Process priors Infer number of topics ndash Hierarchical Dirichlet processes Infer hierarchical structures ndash Other priors Gaussian Process Beta Process Pitman-Yor Process etc
Intro to Machine Learning 75
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  - Clustering
  - Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM), or spectral / random projections
• K-means and Related Approaches
  - Special (limit) case of mixture modeling
  - K-means algorithm: iteration, relocation
(Plate diagram with variables π, h, x, θ and plates n, k)
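The K-means loop alternates an assignment step with a relocation step. The following is a minimal sketch in plain Python; the function names, the toy points, and the fixed initial centers are assumptions for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iters=100):
    """Lloyd-style K-means: assign each point to its nearest center, then relocate centers."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: index of the nearest center for every point
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Relocation step: move each center to the mean of its assigned points
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # no center moved, so the assignments are stable
        centers = new_centers
    return centers, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
print(centers, labels)
```

Even though both initial centers sit inside the first group, the relocation step pulls the second center toward the far group within a few iterations.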
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  - Graph Laplacian
  - k segments: k eigenvalues are (close to) 0
  - Methods: spectral approaches, kernel K-means
(Figures from A. Ng et al., "On Spectral Clustering: Analysis and an Algorithm", NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE)
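The link between segments and zero eigenvalues of the graph Laplacian can be checked directly on a small graph. This sketch assumes the unnormalized Laplacian L = D - W and a hypothetical five-node graph with two disconnected segments; it verifies the null-space property rather than running a full eigen-decomposition.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def quadratic_form(A, v):
    """v^T A v."""
    return sum(vi * yi for vi, yi in zip(v, matvec(A, v)))


# Hypothetical graph: nodes {0, 1, 2} and {3, 4} form two disconnected segments.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
L = graph_laplacian(W)

segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]  # indicator of the first segment
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]  # indicator of the second segment
mixed = [1.0, 0.0, 0.0, 0.0, 0.0]      # cuts through the first segment

print(matvec(L, segment_a), matvec(L, segment_b), quadratic_form(L, mixed))
```

Each segment indicator is mapped to the zero vector, so eigenvalue 0 appears once per segment, while a vector that cuts a segment has a strictly positive quadratic form. When segments are only weakly connected, the corresponding eigenvalues are small rather than exactly zero.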
Linear Dimensionality Reduction
• Linear Projections
  - Projection chosen to optimize some criterion
  - Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projection is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
(Figure panels: PCA, LDA)
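For PCA, the eigenvalue problem is on the data covariance matrix, and its top eigenvector is the direction of maximum variance. A minimal sketch using power iteration in plain Python follows; the toy data, the iteration count, and the function name are assumptions, and a real application would normally call a linear algebra library instead.

```python
import math


def top_principal_component(data, iters=200):
    """Return the leading eigenvector of the covariance matrix and the variance along it."""
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / n for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # starting vector, assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance


# Hypothetical data lying close to the line y = 2x.
data = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
component, variance = top_principal_component(data)
print(component, variance)
```

The recovered direction has slope close to 2, and almost all of the total variance lies along it, which is the "best hyperplane approximation" reading of PCA in one dimension.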
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, ...
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems; nonlinear, nonstationary dependencies
  - Spatial and temporal effects; gradual vs abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Mixed Membership of Reports
(Color legend: Red = Flight Crew, Blue = Passenger, Green = Maintenance)
Flight Crew: 0.7039, Passenger: 0.0009, Maintenance: 0.2953
Flight Crew: 0.1405, Passenger: 0.0663, Maintenance: 0.7932
Flight Crew: 0.2563, Passenger: 0.6599, Maintenance: 0.0837
Flight Crew: 0.0013, Passenger: 0.0013, Maintenance: 0.9973
Generalizations
• Generalized Topic Models
  - Correlated Topic Models
  - Dynamic Topic Models, Topics over Time
  - Dynamic Topics with birth/death
• Mixed membership models over non-text data, applications
  - Mixed membership naïve-Bayes
  - Discriminative models for classification
  - Cluster Ensembles
• Nonparametric Priors
  - Dirichlet Process priors: infer number of topics
  - Hierarchical Dirichlet processes: infer hierarchical structures
  - Other priors: Gaussian Process, Beta Process, Pitman-Yor Process, etc.
Recommendation Systems
Movie ratings matrix over Users and Movies
Users: Age: 28; Gender: Male; Job: Salesman; Interest: Travel; ...
Movies: Title: Gone with the Wind; Release year: 1940; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war; ...
Advertisements on the Web
Click-Through-Rate matrix over Webpages and Products (sparse; only some entries are observed)
Products: Category: Sports shoes; Brand: Nike; Ratings: 4.25; ...
Webpages: Category: Baby; URL: babyearth.com; Content: webpage text, hyperlinks; ...
Forest Ecology
Plant Trait Matrix (TRY db) (Jens Kattge, Peter Reich, et al.)
Rows: Plants. Columns: Traits (Leaf(N), Leaf(P), SLA, Leaf-Size, ..., Wood density). Many entries are missing.
Matrix Factorization
• Singular value decomposition
• Problems
  - Large matrices, with millions of rows/columns
    • SVD can be rather slow
  - Sparse matrices, most entries are missing
    • Traditional approaches cannot handle missing entries
(Figure: matrix ≈ product of two low-rank factors)
Probabilistic Matrix Factorization (PMF)
$$u_i \sim N(0, \sigma_u^2 I), \qquad v_j \sim N(0, \sigma_v^2 I), \qquad X_{ij} \sim N(u_i^T v_j, \sigma^2)$$
The observed entry $X_{ij}$ is also written $R_{ij}$ on the slide.
Inference using gradient descent
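Under the PMF model, maximizing the posterior over the factors amounts to minimizing squared error on the observed entries plus squared-norm penalties from the Gaussian priors. The following is a minimal sketch of that gradient descent in plain Python. The rank, step size `lr`, penalty `lam` (standing in for the ratio of noise variance to prior variance, taken equal for both factors), and the toy matrix are all assumptions for illustration, not the settings used for the Netflix results.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pmf_fit(entries, n_rows, n_cols, rank=1, lam=0.01, lr=0.02, epochs=2000, seed=0):
    """Full-batch gradient descent on the PMF MAP objective over observed entries only.

    entries is a list of (i, j, x_ij) triples for the observed cells.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        grad_U = [[lam * u for u in row] for row in U]  # from the Gaussian prior on u_i
        grad_V = [[lam * v for v in row] for row in V]  # from the Gaussian prior on v_j
        for i, j, x in entries:
            err = dot(U[i], V[j]) - x                   # residual on an observed entry
            for k in range(rank):
                grad_U[i][k] += err * V[j][k]
                grad_V[j][k] += err * U[i][k]
        U = [[u - lr * g for u, g in zip(row, grow)] for row, grow in zip(U, grad_U)]
        V = [[v - lr * g for v, g in zip(row, grow)] for row, grow in zip(V, grad_V)]
    return U, V


# Hypothetical rank-1 matrix a b^T with the (2, 2) entry held out as "missing".
a, b = [1.0, 2.0, 3.0], [1.0, 0.5, 1.5]
observed = [(i, j, a[i] * b[j]) for i in range(3) for j in range(3) if (i, j) != (2, 2)]
U, V = pmf_fit(observed, n_rows=3, n_cols=3)
prediction = dot(U[2], V[2])  # estimate of the held-out entry a[2] * b[2] = 4.5
print(prediction)
```

Missing entries simply contribute no term to the gradient, which is how this formulation sidesteps the missing-data problem raised for SVD.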
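Bayesian Probabilistic Matrix Factorization
$$\mu_u \sim N(\mu_0, \Lambda_u), \qquad \Lambda_u \sim W(\nu_0, W_0), \qquad \mu_v \sim N(\mu_0, \Lambda_v), \qquad \Lambda_v \sim W(\nu_0, W_0)$$
$$u_i \sim N(\mu_u, \Lambda_u), \qquad v_j \sim N(\mu_v, \Lambda_v), \qquad X_{ij} \sim N(u_i^T v_j, \sigma^2)$$
Hyperpriors: Wishart on $\Lambda_u, \Lambda_v$; Gaussian on $\mu_u, \mu_v$. The observed entry $X_{ij}$ is also written $R_{ij}$ on the slide.
Inference using MCMC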
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$. Diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance, with "hyperprior"
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$. Full covariance, but no "hyperprior"
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
PPMF vs PMF, BPMF
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  - Learning with Expert Advice
  - Online Convex Optimization
  - Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  - At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  - Combined prediction
  - Nature reveals truth y
  - Loss of expert i
  - Update "confidence" on experts
(Diagram: experts e1, e2, e3, e4 with predictions $x_{t,1}, \ldots, x_{t,4}$, weights $w_{t,1}, \ldots, w_{t,4}$, and outcome y)
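The round structure of learning with expert advice can be sketched with an exponentially weighted forecaster. The slides do not fix a loss or an update rule, so the squared loss, the exponential update with rate `eta`, and the three constant experts below are assumptions for illustration.

```python
import math


def weighted_experts(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted combination of expert predictions under squared loss.

    expert_preds[t][i] is expert i's prediction at round t; outcomes[t] is the truth.
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for x_t, y_t in zip(expert_preds, outcomes):
        y_hat = sum(w * x for w, x in zip(weights, x_t))  # combined prediction
        learner_loss += (y_hat - y_t) ** 2                # nature reveals the truth y_t
        losses = [(x - y_t) ** 2 for x in x_t]            # loss of each expert
        expert_loss = [a + l for a, l in zip(expert_loss, losses)]
        # Update "confidence": experts with larger loss lose weight
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, learner_loss, expert_loss


# Hypothetical experts: always predict 1, always predict 0, always predict 0.5.
outcomes = [1.0, 1.0, 1.0, 1.0, 0.0] * 20
expert_preds = [(1.0, 0.0, 0.5)] * len(outcomes)
weights, learner_loss, expert_loss = weighted_experts(expert_preds, outcomes)
print(weights, learner_loss, expert_loss)
```

For predictions and outcomes in [0, 1] and `eta` at most 1/2, squared loss is exp-concave, so the learner's cumulative loss exceeds the best expert's by at most ln(n)/eta, independent of the number of rounds.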
Time Machine (TM) vs Online Learning (OL)
• Goal: maximize prediction accuracy
• Data: x1, x2, x3, x4, x5, ..., xT
• TM: choose best w in hindsight
• OL: choose w_t before seeing x_t, giving w1, w2, w3, w4, ..., wT
• Guarantee: OL will be competitive with TM
(Diagram labels: Today, 2020, Base Models)
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Recommendation Systems
Intro to Machine Learning
Age 28 Gender Male Job Sales man Interest Travel hellip
Users
Movies Title Gone with the wind Release year 1940 Cast Vivien Leigh Clark Gable Genre War Romance Awards 8 Oscars Keywords Love Civil war hellip
Movie ratings matrix 76
Advertisements on the Web
Intro to Machine Learning
1 2 001 hellip
01 2 3 hellip
2 2 05 hellip
02 03 15 2 hellip
25 1 hellip
15 1 004 hellip
Click-Through-Rate matrix
Category Sports shoes Brand Nike Ratings 425 hellip
Category Baby URL babyearthcom Content Webpage text Hyperlinks
Webpages
Products
hellip
77
Forest Ecology
Leaf(N) Leaf(P) SLA Leaf-Size hellip Wood density
2 3 5 hellip
4 1 2 hellip
3 3 hellip
1 1 3 2 hellip
4 2 1 hellip
1 1 3 hellip
Plant Trait Matrix (TRY db) (Jens Kattage Peter Reich et al)
Plants
Traits
78 Intro to Machine Learning
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Online Concave Optimization (OCO)
OCO: Good News and Bad News
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: difference of log wealth between the best CRP and the online strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$$
$$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \epsilon$$
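The regret of an online strategy against the best CRP can be computed directly from its definition. A small sketch, assuming each $x_t$ is a vector of per-asset wealth multipliers for day $t$ (the notation slide is not preserved in this transcript, so that reading is an assumption) and restricting the hindsight search to two assets on a grid. All function names are hypothetical.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t) for a sequence of portfolios w_t."""
    return sum(
        math.log(sum(wi * xi for wi, xi in zip(w, x)))
        for w, x in zip(portfolios, price_relatives)
    )

def crp_log_wealth(w, price_relatives):
    """Log wealth of a CRP: the same portfolio w is restored every day."""
    return log_wealth([w] * len(price_relatives), price_relatives)

def best_crp_two_assets(price_relatives, grid=100):
    """Best CRP in hindsight over two assets, found by grid search."""
    candidates = [(k / grid, 1.0 - k / grid) for k in range(grid + 1)]
    return max(candidates, key=lambda w: crp_log_wealth(w, price_relatives))

def regret(portfolios, price_relatives):
    """Log wealth of the best (grid) CRP minus that of the online strategy."""
    best = best_crp_two_assets(price_relatives)
    return crp_log_wealth(best, price_relatives) - log_wealth(portfolios, price_relatives)
```

For example, with one asset that never moves and another that alternately doubles and halves, holding either asset alone gains nothing, while a CRP that rebalances to an even split grows steadily; the regret of the hold-one-asset strategy is then exactly the log wealth of that CRP.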
Datasets
• S&P 500
  - S&P 500 Index: 500 large cap, actively traded stocks
  - All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  - Total: 385 stocks over 14 years at daily resolution
  - Two financial meltdowns: Dot com (Mar 2000), Housing (Oct 2007)
• NYSE
  - Widely used dataset in most earlier papers
  - 36 large cap stocks over 22 years at daily resolution
  - From July 1962 to Dec 1984
  - One major bear market: 1973-1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
• Universal Algorithms: competitive with best CRP; can be outdone by simple heuristics
• Heuristic based methods: strong empirical performance; no performance guarantee
• Best of both worlds: competitive with best CRP; strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j holds portfolio $p_{t,j}$ and, given $x_t$, has wealth gain $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, $w_t = (w(1), w(2), w(3), \dots, w(m))$
• Weight update: $w_t$ updated based on $r_t$
• Good algorithms get more money
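A sketch of the meta algorithm loop: money is split across base algorithms according to $w_t$, each base algorithm's wealth gain $r_{t,j} = p_{t,j} \cdot x_t$ is observed, and $w_t$ is updated so that better performers get more money. The transcript does not preserve the update rule, so the exponentiated-gradient style update, the step size `eta`, and the function name are assumptions.

```python
import math

def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """Combine m base algorithms with a multiplicative weight update.

    base_portfolios[t][j]: portfolio p_{t,j} of base algorithm j on day t.
    price_relatives[t]: vector x_t of per-asset wealth multipliers (assumed).
    Returns the final distribution over algorithms and the meta log wealth.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                      # uniform money split at the start
    total_log_wealth = 0.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(pi * xi for pi, xi in zip(p, x_t)) for p in p_t]
        gain = sum(wj * rj for wj, rj in zip(w, r_t))   # gain of the mixture
        total_log_wealth += math.log(gain)
        # Assumed update: exponentiated gradient of log(w . r_t).
        w = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r_t)]
        z = sum(w)
        w = [wj / z for wj in w]
    return w, total_log_wealth
```

Because the base algorithms are only seen through their gains $r_{t,j}$, heuristics and universal algorithms can be mixed freely as base algorithms.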
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  - Clustering
  - Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  - Expectation Maximization (EM) or Spectral/Random projections
• Kmeans and Related Approaches
  - Special (limit) case of mixture modeling
  - Kmeans algorithm: Iteration, Relocation
[Figure: plate diagram of a mixture model with variables $\pi$, $h$, $x$, $\theta$ and plates $n$, $k$]
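A minimal sketch of the Kmeans iteration, alternating between assigning points to the nearest center and relocating each center to the mean of its points. The random initialization from the data, the fixed iteration count, and the function names are assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Kmeans on a list of equal-length tuples.

    Returns (centers, assign), where assign[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # initialize centers from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Relocation step: each center moves to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:                    # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assign
```

The hard assignment step is what distinguishes this from EM for a mixture model, where each point would instead receive a soft responsibility for every component.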
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  - Graph Laplacian
  - k-segments: k-eigenvalues are (close to) 0
  - Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
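The link between segments and zero eigenvalues can be checked on a toy graph. The sketch below assumes the unnormalized Laplacian $L = D - W$ and a hypothetical 5-node graph with two disconnected segments; the indicator vector of each segment then satisfies $Lv = 0$, giving two zero eigenvalues.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical graph: nodes {0, 1, 2} and {3, 4} form two disconnected segments.
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
L = graph_laplacian(W)

# Indicator vectors of the two segments.
segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]

# Both products are the zero vector, so each indicator is an
# eigenvector of L with eigenvalue 0.
L_a = matvec(L, segment_a)
L_b = matvec(L, segment_b)
```

When the segments are only weakly connected rather than fully disconnected, these eigenvalues move slightly away from 0, which is the "(close to) 0" case on the slide.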
Linear Dimensionality Reduction
• Linear Projections
  - Projection such that the projected data optimizes some criterion
  - Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  - Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  - Separation of classes is maximized
[Figure: PCA and LDA projections]
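For PCA, the eigenvalue problem is on the sample covariance matrix, and the top eigenvector is the direction of maximum variance. A sketch that finds it by power iteration (chosen here only to avoid external libraries; any symmetric eigensolver would do). The function name and iteration count are assumptions.

```python
import random

def first_principal_component(X, iters=200, seed=0):
    """Top principal direction of the rows of X and the variance along it."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]     # center the data
    # Sample covariance matrix C (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Power iteration: repeatedly apply C and renormalize.
        v = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # Variance of the data projected onto v: v^T C v.
    variance = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance
```

The returned direction is defined only up to sign, so either `v` or `-v` may come back depending on the random start.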
Nonlinear Dimensionality Reduction
• Manifold Embedding
  - Preserve local geometric structure
• Geometric Approaches
  - Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  - Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  - Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  - Predictive + Graphical Models, Online Learning, Exploratory Analysis
  - Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  - High-dimensional problems; nonlinear, nonstationary dependencies
  - Spatial and Temporal effects; Gradual vs Abrupt changes
  - Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  - Feynman's 3-step 'algorithm' for problem solving
  - Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Matrix Factorization
bull Singular value decomposition
bull Problems
ndash Large matrices with millions of rowcolumns bull SVD can be rather slow
ndash Sparse matrices most entries are missing bull Traditional approaches cannot handle missing entries
Intro to Machine Learning
asymp
79
Probabilistic Matrix Factorization (PMF)
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(0 σu2I)
N(0 σv2I)
uiT ~ N(0 σu
2I) vj ~ N(0 σv
2I) Rij ~ N(ui
Tvj σ2)
Inference using gradient descent
80 Intro to Machine Learning
Bayesian Probabilistic Matrix Factorization
Xij ~ N(ui
Tvj σ2)
uiT
vj
N(microu Λu)
N(microv Λv) microu ~ N(micro0 Λ u) Λ u ~ W(ν0 W0) microv ~ N(micro0 Λ v) Λ v ~ W(ν0 W0) ui
~ N(microu Λ u) vj ~ N(microv Λ v) Rij ~ N(ui
Tvj σ2)
Wishart
Gaussian
Inference using MCMC 81 Intro to Machine Learning
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973–1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
• Universal Algorithms
  – Competitive with best CRP
  – Can be outdone by simple heuristics
• Heuristic-based methods
  – Strong empirical performance
  – No performance guarantee
• Best of both worlds
  – Competitive with best CRP
  – Strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithms output portfolios pt1, pt2, pt3, …, ptm
• Wealth gain on xt: $r_{t,j} = p_{t,j} \cdot x_t$
• wt: distribution over algorithms, w(1), w(2), w(3), …, w(m)
• Weight update: wt updated based on rt
• Good algorithms get more money
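A meta algorithm of this shape can be sketched as follows. The slide only says that the weights are updated based on the wealth gains, so the specific multiplicative rule, the exponent `eta`, and the names `meta_algorithm` and `base_portfolios` are assumptions made for illustration; the MAEG and MAONS variants reported in the results are not reproduced here.

```python
def meta_algorithm(base_portfolios, price_relatives, eta=1.0):
    """Combine m base algorithms with a multiplicative weight update.

    base_portfolios[t][j] is the portfolio p_{t,j} proposed by base
    algorithm j on day t; price_relatives[t] is x_t.
    Returns the meta algorithm's final wealth and the final weights.
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                 # w_t: distribution over algorithms
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(pi * xi for pi, xi in zip(p, x_t)) for p in p_t]
        # Money is split across the algorithms according to w_t.
        wealth *= sum(wj * rj for wj, rj in zip(w, r_t))
        # Assumed update rule: good algorithms get more money.
        w = [wj * rj ** eta for wj, rj in zip(w, r_t)]
        s = sum(w)
        w = [wj / s for wj in w]
    return wealth, w

# Two base algorithms over ten days: one holds a flat asset,
# the other holds an asset that gains 10% per day.
days = 10
portfolios = [[(1.0, 0.0), (0.0, 1.0)]] * days
relatives = [(1.0, 1.1)] * days
wealth, weights = meta_algorithm(portfolios, relatives)
```

With `eta=1.0` this rule amounts to buying and holding the base algorithms, so the meta wealth equals the average of the base algorithms' wealths; other choices of `eta` shift money toward recent winners more or less aggressively.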
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
  – Special (limit) case of mixture modeling
  – K-means algorithm: Iteration, Relocation
Figure labels: plate diagram with variables π, h, θ, x and plates n, k
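The iterate-and-relocate structure of K-means can be sketched in plain Python. This is a minimal illustration under assumptions: squared Euclidean distance, caller-supplied initial centers, a fixed number of iterations, and the function name `kmeans` are all choices made for the example.

```python
def kmeans(points, centers, iterations=10):
    """Plain K-means on tuples of floats, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(len(centers)),
                key=lambda h: sum((a - b) ** 2 for a, b in zip(p, centers[h])))
            for p in points
        ]
        # Relocation step: each center moves to the mean of its points.
        for h in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == h]
            if members:  # keep the old center if a cluster is empty
                centers[h] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, assignment = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Hard assignment to the nearest center is what makes this a limit case of mixture modeling: EM would instead assign each point fractionally to every component.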
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
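The link between segments and zero eigenvalues can be checked on a small graph. The sketch below assumes the unnormalized Laplacian L = D - W (the cited paper works with a normalized variant) and a hypothetical five-node graph with two disconnected segments. Each segment's indicator vector is mapped to zero by L, which is why two segments give two zero eigenvalues.

```python
def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [[(degrees[i] if i == j else 0.0) - weights[i][j]
             for j in range(n)]
            for i in range(n)]

def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Two disconnected segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
L = graph_laplacian(W)
segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]   # indicator of the first segment
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]   # indicator of the second segment
mixed = [1.0, 0.0, 0.0, 1.0, 0.0]       # not a segment indicator

in_null_a = all(abs(v) < 1e-12 for v in matvec(L, segment_a))
in_null_b = all(abs(v) < 1e-12 for v in matvec(L, segment_b))
in_null_mixed = all(abs(v) < 1e-12 for v in matvec(L, mixed))
```

When the segments are only weakly connected rather than fully disconnected, the corresponding eigenvalues are close to 0 instead of exactly 0, and the leading eigenvectors are clustered to recover the segments.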
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
Figure: PCA vs. LDA
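For PCA, the eigenvalue problem is on the sample covariance matrix, and its leading eigenvector is the direction of maximum projected variance. The sketch below finds that direction by power iteration; the power method, the toy data, and the function name `top_principal_component` are assumptions made for illustration, and any eigensolver would serve equally well.

```python
import math

def top_principal_component(data, iterations=200):
    """Leading PCA direction via power iteration on the sample covariance."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix C = (1/n) X^T X on the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] + [0.5] * (d - 1)      # arbitrary non-zero starting vector
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance of the data projected onto v, i.e. v^T C v.
    variance = sum(v[i] * cov[i][j] * v[j]
                   for i in range(d) for j in range(d))
    return v, variance

# Points spread mostly along the line y = x.
data = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, variance = top_principal_component(data)
```

Here the recovered direction is close to the diagonal and carries almost all of the total variance, which is the sense in which the projection gives the best one-dimensional approximation of the data.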
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman’s 3-step ‘algorithm’ for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee
Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
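Probabilistic Matrix Factorization (PMF)
$$u_i \sim N(0, \sigma_u^2 I), \quad v_j \sim N(0, \sigma_v^2 I), \quad R_{ij} \sim N(u_i^T v_j, \sigma^2)$$
Figure labels: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$; rows $u_i^T$ with prior $N(0, \sigma_u^2 I)$; columns $v_j$ with prior $N(0, \sigma_v^2 I)$
Inference using gradient descent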
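Under these Gaussian assumptions, fitting the factors amounts to minimizing squared error on the observed entries plus a quadratic penalty on the factors. The sketch below uses stochastic gradient steps, a common variant of the gradient descent named on the slide. The rank, step size `lr`, penalty `reg` (standing in for the variance ratios, assumed equal for both factors), the toy ratings, and the function names are all assumptions made for illustration.

```python
import random

def pmf_sgd(ratings, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
            epochs=500, seed=0):
    """Fit R_ij ~ u_i . v_j by stochastic gradient steps (MAP estimate).

    ratings is a list of (i, j, value) triples for the observed entries.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in ratings:
            pred = sum(a * b for a, b in zip(U[i], V[j]))
            err = r - pred
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # Step on squared error plus the Gaussian-prior penalty.
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V

def rmse(ratings, U, V):
    """Root mean squared error of the factorization on the given entries."""
    sq = [(r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
          for i, j, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5

# Toy 2 x 2 rating matrix with rank-one structure.
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0)]
U0, V0 = pmf_sgd(ratings, 2, 2, epochs=0)   # random initialization only
U, V = pmf_sgd(ratings, 2, 2)               # after training
rmse_before, rmse_after = rmse(ratings, U0, V0), rmse(ratings, U, V)
```

Comparing `rmse_before` with `rmse_after` shows how far the fit has moved from the random starting point; on real rating data the error would be measured on held-out entries instead.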
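Bayesian Probabilistic Matrix Factorization
$$\mu_u \sim N(\mu_0, \Lambda_u), \quad \Lambda_u \sim W(\nu_0, W_0)$$
$$\mu_v \sim N(\mu_0, \Lambda_v), \quad \Lambda_v \sim W(\nu_0, W_0)$$
$$u_i \sim N(\mu_u, \Lambda_u), \quad v_j \sim N(\mu_v, \Lambda_v), \quad R_{ij} \sim N(u_i^T v_j, \sigma^2)$$
Figure labels: $X_{ij} \sim N(u_i^T v_j, \sigma^2)$; rows $u_i^T$ with prior $N(\mu_u, \Lambda_u)$; columns $v_j$ with prior $N(\mu_v, \Lambda_v)$; Gaussian and Wishart hyperpriors
Inference using MCMC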
Parametric PMF (PPMF)
• Are the priors suitable? Can we get faster algorithms?
• PMF: $u_i \sim N(0, \sigma_u^2 I)$, $v_j \sim N(0, \sigma_v^2 I)$; diagonal covariance
• BPMF: $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance with “hyperprior”
• Parametric PMF (PPMF): $u_i \sim N(\mu_u, \Lambda_u)$, $v_j \sim N(\mu_v, \Lambda_v)$; full covariance but no “hyperprior”
Results: PMF on Netflix
Results: Bayesian PMF on Netflix
Results: Bayesian PMF on Netflix
Results: PMF on Netflix
Parametric PMF (PPMF)
bull Are the priors suitable Can we get faster algorithms
Intro to Machine Learning
uiT
vj
N(0 σu2I)
N(0 σv2I) PMF
Diagonal covariance
uiT
vj
N(microu Λu)
N(microv Λv) BPMF Full covariance with ldquohyperpriorrdquo
uiT
vj
N(microu Λu)
N(microv Λv)
Parametric PMF (PPMF) Full covariance but no ldquohyperpriorrdquo
82
Results PMF on Netflix
Intro to Machine Learning 83
Results Bayesian PMF on Netflix
Intro to Machine Learning 84
Results Bayesian PMF on Netflix
Intro to Machine Learning 85
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
bull Manifold Embedding
ndash Preserve local geometric structure
bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps
bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip
bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity
bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies
bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts
Intro to Machine Learning 118
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Results PMF on Netflix
Results Bayesian PMF on Netflix
Results Bayesian PMF on Netflix
Results PMF on Netflix
Results Bayesian PMF on Netflix
PPMF vs PMF BPMF
PPMF mostly achieves higher accuracy
Overview
• Predictive Models
• Graphical Models
• Online Learning
  – Learning with Expert Advice
  – Online Convex Optimization
  – Example: Portfolio Selection
• Exploratory Analysis
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: Expert prediction, Weight on Experts
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update “confidence” on experts
[Figure: experts e1 to e4 with predictions xt1 to xt4 and weights wt1 to wt4, combined into prediction y]
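One common way to realize the round structure on this slide is an exponentially weighted forecaster. The sketch below is an illustration under stated assumptions, not necessarily the update the tutorial uses: squared loss, the learning rate `eta`, and the three constant experts are all hypothetical choices.

```python
import math


def weighted_experts(expert_preds, truths, eta=0.5):
    """Run the expert-advice protocol; returns (final weights, learner's total loss)."""
    m = len(expert_preds[0])
    weights = [1.0 / m] * m  # start with equal confidence in every expert
    total_loss = 0.0
    for preds, y in zip(expert_preds, truths):
        # Combined prediction: weighted average of the expert predictions.
        combined = sum(w * p for w, p in zip(weights, preds))
        # Nature reveals the truth y; the learner and each expert incur a loss.
        total_loss += (combined - y) ** 2
        losses = [(p - y) ** 2 for p in preds]
        # Update confidence: experts with larger loss are down-weighted.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return weights, total_loss


# Hypothetical setup: three constant experts, and the truth is always 1.0.
rounds = 20
expert_preds = [(0.0, 1.0, 0.5)] * rounds
truths = [1.0] * rounds
weights, total_loss = weighted_experts(expert_preds, truths)
```

After 20 rounds almost all of the weight sits on the second expert, the one that always predicts the truth.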
Time Machine (TM) vs Online Learning (OL)
Goal: Maximize prediction accuracy
TM: Choose best w in hindsight
OL: Choose w_t before seeing x_t
Guarantee: OL will be competitive with TM
[Figure: timeline from Today to 2020 over Base Models outputs x1, x2, …, xT, with online weights w1, w2, …, wT]
Online Concave Optimization (OCO)
OCO Good News and Bad News
Portfolio Selection Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
\[ \underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T) \]
\[ \frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) - \varepsilon \]
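The regret on this slide can be computed directly on a toy market. The sketch below is only an illustration: the two-asset market (cash plus a stock that alternately doubles and halves), the grid search for the best CRP, the exponentiated-gradient (EG) update used as the online strategy, and the value of `eta` are all assumptions chosen for the example.

```python
import math


def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t)."""
    return sum(
        math.log(sum(w_i * x_i for w_i, x_i in zip(w, x)))
        for w, x in zip(portfolios, price_relatives)
    )


def best_crp_two_assets(price_relatives, grid=100):
    """Grid search for the best fixed portfolio (a, 1 - a) in hindsight."""
    T = len(price_relatives)
    best_w, best_lw = None, float("-inf")
    for i in range(grid + 1):
        w = (i / grid, 1.0 - i / grid)
        lw = log_wealth([w] * T, price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw


def eg_portfolios(price_relatives, eta=0.05):
    """Exponentiated-gradient portfolios; each w_t is fixed before x_t is seen."""
    m = len(price_relatives[0])
    w = [1.0 / m] * m
    chosen = []
    for x in price_relatives:
        chosen.append(tuple(w))
        gain = sum(w_i * x_i for w_i, x_i in zip(w, x))
        # Multiplicative update along the gradient of log(w . x).
        w = [w_i * math.exp(eta * x_i / gain) for w_i, x_i in zip(w, x)]
        z = sum(w)
        w = [w_i / z for w_i in w]
    return chosen


# Hypothetical market: asset 1 is cash, asset 2 alternately doubles and halves.
xs = [(1.0, 2.0), (1.0, 0.5)] * 10
online = eg_portfolios(xs)
best_w, best_lw = best_crp_two_assets(xs)
regret = best_lw - log_wealth(online, xs)
```

In this market neither asset grows on its own, yet the best CRP (half in each asset, rebalanced daily) gains a factor of 1.125 every two days. The online strategy trails it by a small positive regret.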
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results UP EG
Results UP EG ONS
Results UP EG ONS AC30
Results UP EG ONS AC30
Motivation for Meta Algorithms
• Universal Algorithms
  – Competitive with best CRP
  – Can be outdone by simple heuristics
• Heuristic-based methods
  – Strong empirical performance
  – No performance guarantee
• Best of both worlds
  – Competitive with best CRP
  – Strong empirical performance
Meta Algorithm
[Figure: on Day t, m base algorithms (Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m) output portfolios p_{t,1}, p_{t,2}, p_{t,3}, …, p_{t,m}; given x_t, algorithm j earns Wealth Gain r_{t,j} = p_{t,j} · x_t; w_t is a distribution over algorithms with weights w(1), w(2), w(3), …, w(m)]
Weight update
• w_t updated based on r_t
• Good algorithms get more money
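A rough sketch of the meta algorithm loop follows. The slide does not specify the weight update (the MAEG and MAONS variants in the results presumably use EG and ONS updates), so the wealth-proportional update here is an assumption chosen because it is the simplest rule under which good algorithms get more money. The two base algorithms and the market are hypothetical.

```python
def meta_algorithm(base_portfolios, price_relatives):
    """base_portfolios[t][j] is p_{t,j}; price_relatives[t] is x_t.

    Returns (final wealth of the meta algorithm, final weights over algorithms).
    """
    m = len(base_portfolios[0])
    w = [1.0 / m] * m  # w_t: distribution over the m base algorithms
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r_t = [sum(p_i * x_i for p_i, x_i in zip(p, x_t)) for p in p_t]
        # The meta algorithm splits its money across algorithms according to w_t.
        wealth *= sum(w_j * r_j for w_j, r_j in zip(w, r_t))
        # Assumed update: reweight in proportion to each algorithm's gain.
        w = [w_j * r_j for w_j, r_j in zip(w, r_t)]
        z = sum(w)
        w = [w_j / z for w_j in w]
    return wealth, w


# Hypothetical example: base algorithm 1 always holds asset 1,
# base algorithm 2 always holds asset 2, and asset 2 gains 10% per day.
days = 10
base_portfolios = [[(1.0, 0.0), (0.0, 1.0)]] * days
price_relatives = [(1.0, 1.1)] * days
wealth, w = meta_algorithm(base_portfolios, price_relatives)
```

With this particular update the meta wealth equals the average of the base algorithms' final wealths, so it can never fall below 1/m of the best base algorithm's wealth.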
Results MAEG MAONS
Results MAEG MAONS
Results UP EG ONS AC30
Results MAEG MAONS
Results New Base Algorithm BAH(AC30)
Results MAEG MAONS
Results NYSE Base Algorithms
Results MAEG MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Results PMF on Netflix
Intro to Machine Learning 86
Results Bayesian PMF on Netflix
Intro to Machine Learning 87
PPMF vs PMF BPMF
Intro to Machine Learning
PPMF mostly achieves higher accuracy
88
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
• Universal algorithms: competitive with best CRP; can be outdone by simple heuristics
• Heuristic-based methods: strong empirical performance; no performance guarantee
• Best of both worlds: competitive with best CRP and strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs portfolio p_{t,j}; given x_t, its wealth gain is
$$r_{t,j} = p_{t,j} \cdot x_t$$
• w_t: distribution over algorithms, with weights w(1), w(2), w(3), …, w(m)
• Weight update: w_t updated based on r_t
• Good algorithms get more money
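The loop described above can be sketched as follows. The slide only says that w_t is updated based on r_t, so the multiplicative (EG-style) update used here is an assumption, and the three base algorithms, the function names and `eta` are hypothetical placeholders rather than the algorithms evaluated in the results.

```python
import math

def meta_algorithm(base_algos, price_relatives, eta=0.1):
    """Distribute wealth over m base algorithms and reweight them daily."""
    m = len(base_algos)
    w = [1.0 / m] * m  # w_t: distribution over base algorithms
    wealth = 1.0
    history = []
    for x in price_relatives:
        # Each base algorithm proposes a portfolio p_{t,j} from past data only.
        portfolios = [algo(history) for algo in base_algos]
        # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
        r = [sum(pi * xi for pi, xi in zip(p, x)) for p in portfolios]
        gain = sum(wj * rj for wj, rj in zip(w, r))
        wealth *= gain
        # Assumed update: algorithms with larger gains get more money.
        w = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r)]
        s = sum(w)
        w = [wj / s for wj in w]
        history.append(x)
    return wealth, w

# Hypothetical base algorithms over two assets.
def all_in_first(history):
    return (1.0, 0.0)

def all_in_second(history):
    return (0.0, 1.0)

def uniform(history):
    return (0.5, 0.5)

# Toy market: asset 2 gains 5% every day, asset 1 stays flat.
x = [(1.0, 1.05)] * 100
final_wealth, final_w = meta_algorithm([all_in_first, all_in_second, uniform], x)
```

On this toy market the weight shifts toward the base algorithm that holds the growing asset.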
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE, Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
 – Clustering
 – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
 – Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
 – Special (limit) case of mixture modeling
 – K-means algorithm: iterative relocation
[Figure: plate diagram of the mixture model with variables π, h, x, θ and plates of size n and k]
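As an illustration of the relocation idea, here is a minimal K-means sketch that alternates assigning points to the nearest center and moving each center to the mean of its points. The function name, the toy data and the choice of initial centers are assumptions made for the example.

```python
def kmeans(points, centers, n_iter=20):
    """Plain K-means: alternate assignment and relocation until centers stop moving."""
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Relocation step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, [points[0], points[3]])
```

The hard assignment in the first step is what makes K-means a limit case of mixture modeling, where EM would instead assign each point to every component with a probability.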
Spectral Clustering
• Weighted graph with edge weights
• Spectral clustering
 – Graph Laplacian
 – k segments: k eigenvalues are (close to) 0
 – Methods: spectral approaches, kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
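The link between segments and zero eigenvalues can be checked on a tiny graph. The sketch below assumes the unnormalized Laplacian L = D - W and a graph with two disconnected segments; it verifies that the indicator vector of each segment is mapped to zero by L, so eigenvalue 0 appears once per segment. The weight matrix and helper names are made up for the example.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(float(sum(W[i])) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Two disconnected segments: nodes {0, 1, 2} and nodes {3, 4}.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
L = laplacian(W)

# Indicator vectors of the two segments.
ind_a = [1, 1, 1, 0, 0]
ind_b = [0, 0, 0, 1, 1]
Lv_a = matvec(L, ind_a)  # all zeros: ind_a is an eigenvector with eigenvalue 0
Lv_b = matvec(L, ind_b)  # all zeros: a second, independent eigenvector
Lv_single = matvec(L, [1, 0, 0, 0, 0])  # a single node is not in the null space
```

When the segments are weakly connected rather than fully disconnected, these eigenvalues are no longer exactly 0 but stay close to it, which is what spectral methods exploit.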
Linear Dimensionality Reduction
• Linear projections
 – Projection chosen so that it optimizes some criterion
 – Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
 – Variance of the projected data is maximized; best hyperplane approximation
• Fisher’s Linear Discriminant Analysis (LDA)
 – Separation of classes is maximized
[Figure: PCA and LDA projections]
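For PCA, the eigenvalue problem is the one for the data covariance matrix, and its top eigenvector is the direction of maximum variance. The sketch below is a minimal illustration that uses power iteration as a stand-in for a general eigen-solver; the function name, the starting vector and the toy data are assumptions.

```python
import math

def first_principal_component(data, n_iter=200):
    """Top eigenvector of the covariance matrix, found by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector (assumed not orthogonal to the answer)
    for _ in range(n_iter):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    # Variance of the data projected onto v, i.e. v^T C v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Toy data lying close to the line y = x.
data = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, variance = first_principal_component(data)
```

For this data the recovered direction is close to the diagonal, and the variance along it is nearly the total variance of the data.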
Nonlinear Dimensionality Reduction
• Manifold Embedding
 – Preserve local geometric structure
• Geometric Approaches
 – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
 – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
 – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
 – Predictive + Graphical Models, Online Learning, Exploratory Analysis
 – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
 – High-dimensional problems; nonlinear, nonstationary dependencies
 – Spatial and temporal effects; gradual vs abrupt changes
 – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
 – Feynman’s 3-step ‘algorithm’ for problem solving
 – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Learning with Expert Advice
Online Convex Optimization
Example Portfolio Selection
89
Learning with Expert Advice
bull Prediction by adaptive combination of expert outputs ndash At round t Expert prediction Weight on Experts ndash Combined prediction ndash Nature reveals truth ndash Loss of expert i ndash Update ldquoconfidencerdquo on experts
Intro to Machine Learning
e1
e2
e3
e4
xt1
xt2
xt3
xt4
wt1
wt2
wt3
wt4
y
90
x1 x2 x3 x4 x5 xT
Goal Maximize prediction accuracy
Today 2020
TM Choose best w in hindsight
OL Choose wt before seeing xt
w1 w2 w3 w4
x1 x2 x3 x4 wT
xT
Time Machine (TM) vs Online Learning (OL)
Guarantee OL will be competitive with TM
Base Models
91
Intro to Machine Learning
Online Concave Optimization (OCO)
92
Intro to Machine Learning
OCO Good News and Bad News
93
Intro to Machine Learning
Portfolio Selection Notation and Definitions
94
Intro to Machine Learning
The Portfolio Selection Problem
95
Intro to Machine Learning
Constant Rebalanced Portfolio (CRP)
96
ge
o(T)
minus
1T
log(wt sdot xt )t=1
T
sum ge1T
log(w sdot xt ) minusεt=1
T
sum
Regret Difference of log wealth between Best CRP and Online Strategy
log(wt sdot xt )t=1
T
sum
On-Line
sum=
sdotT
ttxw
1
)log(
Best CRP
Intro to Machine Learning
Regret Competing with the Best CRP
97
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results MAEG MAONS
Intro to Machine Learning 105
Results MAEG MAONS
Intro to Machine Learning 106
Results UP EG ONS AC30
Intro to Machine Learning 107
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
bull Manifold Embedding
ndash Preserve local geometric structure
bull Geometric Approaches ndash Isomap Locally Linear Embedding Laplacian Eigenmaps
bull Probabilistic Approaches ndash Probabilistic PCA Gaussian Process Latent Variable Models
Intro to Machine Learning 117
Summary and Conclusions
bull Success of Machine Learning ndash Internet applications Bioinformatics Social Network Analysishellip
bull Four Themes ndash Predictive + Graphical Models Online Learning Exploratory Analysis ndash Numerous other active and relevant themes eg Sparsity
bull Key challenges Methods ndash High-dimensional problems nonlinear nonstationary dependencies ndash Spatial and Temporal effects Gradual vs Abrupt changes ndash Modeling typical behavior vs extremesanomalies
bull Key challenges ScienceDomain Questions ndash Feynmanrsquos 3-step lsquoalgorithmrsquo for problem solving ndash Questions+Expertise+Data from Domain ScientistsExperts
Intro to Machine Learning 118
Intro to Machine Learning
Acknowledgements
Amrudin Agovic Qiang Fu Soumyadeep Chatterjee
Hanhuai Shan Puja Das
119
Intro to Machine Learning 120
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Learning with Expert Advice
• Prediction by adaptive combination of expert outputs
  – At round t: expert predictions $x_{t,i}$, weights on experts $w_{t,i}$
  – Combined prediction
  – Nature reveals truth
  – Loss of expert i
  – Update "confidence" on experts
[Figure: experts e1 to e4 emit predictions $x_{t,1}, \ldots, x_{t,4}$, which are combined with weights $w_{t,1}, \ldots, w_{t,4}$ into the output y]
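The round-by-round loop on this slide can be sketched as follows. This is a minimal illustration, assuming squared loss and an exponential weight update with learning rate `eta`; the slide does not name a loss or an update rule, so both are illustrative choices, and the function name `expert_advice` is hypothetical.

```python
import math

def expert_advice(expert_preds, truths, eta=0.5):
    """Exponentially weighted combination of expert predictions.

    expert_preds[t][i] is expert i's prediction at round t;
    truths[t] is the value nature reveals after the prediction is made.
    Squared loss and the exponential update are assumptions of this sketch.
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n          # start with uniform confidence
    combined = []
    for x_t, y_t in zip(expert_preds, truths):
        # Combined prediction: weighted average of the expert outputs.
        y_hat = sum(w * x for w, x in zip(weights, x_t))
        combined.append(y_hat)
        # Nature reveals y_t; compute the loss of each expert.
        losses = [(x - y_t) ** 2 for x in x_t]
        # Update confidence: experts with high loss lose weight.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return combined, weights

# Hypothetical data: expert 0 always predicts 0, expert 1 always predicts 1.
preds = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
truth = [1.0, 1.0, 1.0, 1.0]
combined, final_w = expert_advice(preds, truth)
```

In this toy run the weight shifts toward the expert that keeps matching the truth, so the combined prediction moves toward 1 over the rounds.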
Time Machine (TM) vs Online Learning (OL)
• Goal: Maximize prediction accuracy
• TM: Choose best w in hindsight
• OL: Choose $w_t$ before seeing $x_t$
• Guarantee: OL will be competitive with TM
[Figure: timeline labeled Today and 2020; base models produce $x_1, x_2, \ldots, x_T$, and OL chooses $w_1, w_2, \ldots, w_T$]
Online Concave Optimization (OCO)
OCO Good News and Bad News
Portfolio Selection Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret Competing with the Best CRP
Regret: Difference of log wealth between Best CRP and Online Strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} - o(T)$$
$$\frac{1}{T}\sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \log(w \cdot x_t) - \varepsilon$$
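To make the regret definition concrete, here is a small sketch that computes the log wealth of an online strategy, finds the best CRP in hindsight, and takes the difference. The two-asset market, the grid search, and the helper names `log_wealth` and `best_crp_two_assets` are all assumptions made for illustration; they are not taken from the tutorial.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t) for a sequence of portfolios w_t."""
    return sum(
        math.log(sum(w * x for w, x in zip(w_t, x_t)))
        for w_t, x_t in zip(portfolios, price_relatives)
    )

def best_crp_two_assets(price_relatives, grid=1000):
    """Best constant rebalanced portfolio in hindsight (grid search, 2 assets)."""
    best_w, best_lw = None, -math.inf
    for k in range(grid + 1):
        w = (k / grid, 1.0 - k / grid)
        lw = log_wealth([w] * len(price_relatives), price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

# Hypothetical market: asset 1 is cash, asset 2 alternately doubles and halves.
x = [(1.0, 2.0), (1.0, 0.5)] * 10
T = len(x)

# A simple online strategy: hold everything in cash every day.
online = [(1.0, 0.0)] * T
online_lw = log_wealth(online, x)

# The "time machine" answer: the best fixed w chosen after seeing all of x.
w_star, crp_lw = best_crp_two_assets(x)
regret = crp_lw - online_lw
avg_regret = regret / T
```

Here the best CRP splits wealth evenly between the two assets, while the all-cash strategy earns nothing, so its regret grows linearly in T. An online algorithm with the guarantee above would keep `avg_regret` shrinking toward zero as T grows.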
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: Dot-com (Mar 2000), Housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results UP EG
Results UP EG ONS
Results UP EG ONS AC30
Results UP EG ONS AC30
Motivation for Meta Algorithms
• Universal Algorithms
  – Competitive with best CRP
  – Can be outdone by simple heuristics
• Heuristic-based methods
  – Strong empirical performance
  – No performance guarantee
• Best of both worlds
  – Competitive with best CRP
  – Strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithms output $p_{t,1}, p_{t,2}, p_{t,3}, \ldots, p_{t,m}$ and earn $p_{t,1} \cdot x_t, \ldots, p_{t,m} \cdot x_t$
• Wealth gain: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, with entries w(1), w(2), w(3), …, w(m)
• Weight update
  – $w_t$ updated based on $r_t$
  – Good algorithms get more money
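One day of the meta algorithm can be sketched as below. The slide only says that $w_t$ is updated based on $r_t$; the exponentiated-gradient-style update, the learning rate `eta`, and the function name `meta_step` are assumptions made for this illustration.

```python
import math

def meta_step(weights, base_portfolios, x_t, eta=0.05):
    """One day of a meta algorithm over m base algorithms.

    weights[j]         -- current weight w_t(j) on base algorithm j
    base_portfolios[j] -- portfolio p_{t,j} proposed by base algorithm j
    x_t                -- price relatives revealed at the end of day t
    The exponentiated-gradient-style update is an assumption of this sketch.
    """
    # Wealth gain of each base algorithm: r_{t,j} = p_{t,j} . x_t
    gains = [sum(p * x for p, x in zip(p_j, x_t)) for p_j in base_portfolios]
    # Wealth gain of the meta algorithm: weighted combination of base gains.
    meta_gain = sum(w * r for w, r in zip(weights, gains))
    # Good algorithms get more money: weights grow with relative gain.
    new_w = [w * math.exp(eta * r / meta_gain) for w, r in zip(weights, gains)]
    total = sum(new_w)
    return [w / total for w in new_w], gains, meta_gain

# Hypothetical example: two base algorithms on two assets.
w = [0.5, 0.5]
portfolios = [(1.0, 0.0), (0.0, 1.0)]   # all-in asset 1 vs all-in asset 2
x_t = (1.10, 0.95)                       # asset 1 rises, asset 2 falls
w_next, gains, meta_gain = meta_step(w, portfolios, x_t)
```

After this day the base algorithm holding the rising asset carries more weight, so it controls a larger share of the wealth on the following day.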
Results MAEG MAONS
Results MAEG MAONS
Results UP EG ONS AC30
Results MAEG MAONS
Results New Base Algorithm BAH(AC30)
Results MAEG MAONS
Results NYSE Base Algorithms
Results MAEG MAONS
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
  – Special (limit) case of mixture modeling
  – K-means algorithm: Iterative Relocation
[Figure: plate diagram with variables π, h, θ, x and plates n and k]
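The K-means relocation loop mentioned on this slide can be sketched in a few lines. This is a minimal version under stated assumptions: 2-D points, squared Euclidean distance, and initial centers supplied by the caller; the function name `kmeans` and the toy data are hypothetical.

```python
def kmeans(points, centers, iters=20):
    """Plain K-means: alternate assignment and center relocation.

    points  -- list of (x, y) tuples
    centers -- initial centers (their choice is an assumption of this sketch)
    """
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda h: (p[0] - centers[h][0]) ** 2
                              + (p[1] - centers[h][1]) ** 2)
            for p in points
        ]
        # Relocation step: each center moves to the mean of its points.
        new_centers = []
        for h in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == h]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[h])  # keep an empty cluster's center
        if new_centers == centers:
            break                               # converged: nothing moved
        centers = new_centers
    return labels, centers

# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(data, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The hard assignment step is what makes this a limit case of mixture modeling: EM would instead give each point a soft responsibility under every component.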
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., On spectral clustering: analysis and an algorithm, NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
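The link between segments and zero eigenvalues can be checked directly in the ideal case where the segments are exactly disconnected. The sketch below assumes the unnormalized Laplacian L = D - W (the Ng et al. paper works with a normalized variant), and the weight matrix and helper names are made up for illustration.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical graph with two disconnected segments: {0, 1, 2} and {3, 4}.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
L = graph_laplacian(W)

# Indicator vectors of the two segments.
seg_a = [1.0, 1.0, 1.0, 0.0, 0.0]
seg_b = [0.0, 0.0, 0.0, 1.0, 1.0]

# Each indicator is an eigenvector of L with eigenvalue 0.
La = matvec(L, seg_a)
Lb = matvec(L, seg_b)
```

With two segments there are two independent eigenvectors at eigenvalue 0. When the segments are only weakly connected, those eigenvalues move slightly away from 0, which is why the slide says "close to".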
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen to optimize some criterion
  – Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure: PCA and LDA projections]
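As a small illustration of PCA as an eigenvalue problem, the sketch below finds the first principal direction by power iteration on the sample covariance matrix. Power iteration is just one convenient solver chosen for this example, and the function name and toy data are hypothetical.

```python
import math

def first_principal_component(points, iters=200):
    """Leading eigenvector of the sample covariance via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d                      # arbitrary starting direction
    for _ in range(iters):
        # Multiply by the covariance and renormalize to unit length.
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance of the data projected onto v.
    variance = sum(sum(c[j] * v[j] for j in range(d)) ** 2
                   for c in centered) / (n - 1)
    return v, variance

# Hypothetical 2-D data lying close to the line y = x.
data = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, var_along = first_principal_component(data)
```

For this data the recovered direction is close to the diagonal, and the variance along it is larger than the variance along either coordinate axis, which is the maximization property stated on the slide.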
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs Abrupt changes
  – Modeling typical behavior vs extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
Portfolio Selection: Notation and Definitions
The Portfolio Selection Problem
Constant Rebalanced Portfolio (CRP)
Regret: Competing with the Best CRP
Regret: difference of log wealth between the best CRP and the online strategy
$$\underbrace{\sum_{t=1}^{T} \log(w_t \cdot x_t)}_{\text{On-Line}} \;\ge\; \underbrace{\sum_{t=1}^{T} \log(w \cdot x_t)}_{\text{Best CRP}} \;-\; o(T)$$
$$\frac{1}{T} \sum_{t=1}^{T} \log(w_t \cdot x_t) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} \log(w \cdot x_t) \;-\; \epsilon$$
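
The regret on this slide can be illustrated numerically. The sketch below is not from the tutorial: it assumes x_t is the vector of price relatives for day t, restricts itself to two assets so the best CRP can be found by a simple grid search, and uses a made-up market. The names `log_wealth` and `best_crp_two_assets` are hypothetical.

```python
import math

def log_wealth(portfolios, price_relatives):
    """Sum over t of log(w_t . x_t) for a sequence of portfolios w_t."""
    return sum(
        math.log(sum(w_i * x_i for w_i, x_i in zip(w, x)))
        for w, x in zip(portfolios, price_relatives)
    )

def best_crp_two_assets(price_relatives, grid=1000):
    """Grid search (an assumption, two assets only) for the best fixed portfolio in hindsight."""
    best_w, best_lw = None, -math.inf
    for k in range(grid + 1):
        w = (k / grid, 1.0 - k / grid)
        lw = log_wealth([w] * len(price_relatives), price_relatives)
        if lw > best_lw:
            best_w, best_lw = w, lw
    return best_w, best_lw

# Hypothetical market: asset 0 is cash, asset 1 alternately doubles and halves.
X = [(1.0, 2.0), (1.0, 0.5)] * 10
online = [(1.0, 0.0)] * len(X)  # a deliberately poor online strategy: hold cash only

w_star, lw_star = best_crp_two_assets(X)
regret = lw_star - log_wealth(online, X)  # log wealth of best CRP minus online log wealth
```

In this toy market neither asset gains anything when held alone, yet rebalancing to an even split every day grows wealth, which is why the best CRP is a meaningful benchmark.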
Datasets
• S&P 500
  – S&P 500 Index: 500 large-cap, actively traded stocks
  – All stocks that were in the S&P 500 from Jan 1995 to Nov 2008
  – Total: 385 stocks over 14 years at daily resolution
  – Two financial meltdowns: dot-com (Mar 2000), housing (Oct 2007)
• NYSE
  – Widely used dataset in most earlier papers
  – 36 large-cap stocks over 22 years at daily resolution
  – From July 1962 to Dec 1984
  – One major bear market: 1973-1974
Results: UP, EG
Results: UP, EG, ONS
Results: UP, EG, ONS, AC30
Results: UP, EG, ONS, AC30
Motivation for Meta Algorithms
• Heuristic-based methods
  – Strong empirical performance
  – No performance guarantee
• Universal algorithms
  – Competitive with best CRP
  – Can be outdone by simple heuristics
• Best of both worlds
  – Competitive with best CRP
  – Strong empirical performance
Meta Algorithm
• m base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day t: base algorithm j outputs portfolio $p_{t,j}$, for $j = 1, \dots, m$
• Wealth gain of base algorithm j on $x_t$: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, $w(1), w(2), w(3), \dots, w(m)$
• Weight update
  – $w_t$ updated based on $r_t$
  – Good algorithms get more money
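
One day of the meta algorithm can be sketched as follows. The slide only states that w_t is updated based on r_t, so the exponentiated-gradient style multiplicative update, the learning rate `eta`, and the function name `meta_step` are assumptions made for illustration, not the tutorial's exact rule. x_t is taken to be the vector of price relatives.

```python
import math

def meta_step(weights, base_portfolios, x, eta=0.05):
    """One day of a meta algorithm over m base algorithms.

    weights:         w_t, a distribution over the m base algorithms
    base_portfolios: p_{t,j} for j = 1..m, each a distribution over assets
    x:               x_t for the day
    Returns (meta wealth gain, gains r_t, updated weights w_{t+1}).
    """
    # r_{t,j} = p_{t,j} . x_t : wealth gain of base algorithm j
    r = [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in base_portfolios]
    # Money is split across the base algorithms according to w_t.
    meta_gain = sum(w_j * r_j for w_j, r_j in zip(weights, r))
    # Assumed EG-style update: a larger r_{t,j} earns algorithm j more weight.
    unnorm = [w_j * math.exp(eta * r_j / meta_gain) for w_j, r_j in zip(weights, r)]
    total = sum(unnorm)
    return meta_gain, r, [u / total for u in unnorm]

# Two hypothetical base algorithms, each holding a single asset.
w = [0.5, 0.5]
P = [(1.0, 0.0), (0.0, 1.0)]
gain, r, w_next = meta_step(w, P, x=(1.0, 1.2))
```

After this step the second base algorithm, which earned more on the day, carries slightly more weight, matching the slide's "good algorithms get more money".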
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
  – Clustering
  – Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
  – Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
  – Special (limit) case of mixture modeling
  – K-means algorithm: iterative relocation
[Figure: plate diagram with labels π, h, x, θ, n, k]
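
The relocation idea behind K-means can be sketched in a few lines. This is a minimal illustration with hypothetical data and a hypothetical `kmeans` function, not the tutorial's code: it alternates hard assignment of points to the nearest center with moving each center to the mean of its points, which is the hard-assignment counterpart of EM for a mixture model.

```python
def kmeans(points, centers, iters=20):
    """K-means by alternating assignment and center relocation."""
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Relocation step: each center moves to the mean of its points.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centers; here they are chosen by hand so each starts near one group.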
Spectral Clustering
• Weighted graph with weights
• Spectral Clustering
  – Graph Laplacian
  – k segments: k eigenvalues are (close to) 0
  – Methods: spectral approaches, kernel K-means
Figures from A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE
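
The link between segments and zero eigenvalues of the graph Laplacian can be checked directly on a small graph. The sketch below uses the unnormalized Laplacian L = D - W (one common choice, assumed here) and a hypothetical weight matrix with two disconnected segments. Each segment's indicator vector v satisfies Lv = 0, so it is an eigenvector with eigenvalue 0, and k disconnected segments give k zero eigenvalues.

```python
def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Hypothetical graph with two disconnected segments: nodes {0, 1, 2} and {3, 4}.
W = [
    [0.0, 1.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
L = graph_laplacian(W)

# Indicator vector of each segment; L applied to each should be the zero vector.
indicators = [[1.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0]]
residuals = [matvec(L, v) for v in indicators]
```

When the segments are joined by a few weak edges instead of none, these eigenvalues are small rather than exactly zero, which is the "(close to) 0" case on the slide.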
Linear Dimensionality Reduction
• Linear Projections
  – Projection chosen so that it optimizes some criterion
  – Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
  – Variance of the projection is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
  – Separation of classes is maximized
[Figure panels: PCA, LDA]
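
For PCA, the eigenvalue problem is on the sample covariance matrix, and the leading eigenvector is the direction of maximum projected variance. The sketch below is an illustration under stated assumptions: it uses power iteration (a simple stand-in for a library eigensolver), a hand-picked starting vector, and hypothetical data; `top_principal_component` is not a name from the tutorial.

```python
import math

def top_principal_component(data, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] + [0.5] * (d - 1)  # arbitrary nonzero starting direction (assumption)
    for _ in range(iters):
        # Multiply by the covariance and renormalize to unit length.
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    # Variance of the data projected onto v, i.e. v^T cov v.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

# Hypothetical 2-D points lying close to the line y = x.
data = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
direction, variance = top_principal_component(data)
```

The recovered direction is close to the diagonal, and the projected variance exceeds the variance along either original axis.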
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and temporal effects; gradual vs. abrupt changes
  – Modeling typical behavior vs. extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee,
Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Datasets
Intro to Machine Learning
bull SampP 500 ndash SampP 500 Index 500 large cap actively traded stocks ndash All stocks that were in SampP 500 from Jan 1995 to Nov 2008 ndash Total 385 stocks over 14 years at daily resolution ndash Two financial meltdowns Dot com (Mar 2000) Housing (Oct 2007)
bull NYSE
ndash Widely used dataset in most earlier papers ndash 36 large cap stocks over 22 years at daily resolution ndash From July 1962 to Dec 1984 ndash One major bear market 1973-1974
98
Results UP EG
Intro to Machine Learning 99
Results UP EG ONS
Intro to Machine Learning 100
Results UP EG ONS AC30
Intro to Machine Learning 101
Results UP EG ONS AC30
Intro to Machine Learning 102
Best of both worlds
Competitive with best CRP
Strong empirical performance
Heuristic based methods
Strong Empirical Performance No performance guarantee
Universal Algorithms
Competitive with best CRP Can be outdone by simple heuristics
Intro to Machine Learning
Motivation for Meta Algorithms
103
Meta Algorithm
Base Algo 2
Base Algo 3
Base Algo m
Base Algo 1
hellip
pt1 ptm pt3 pt2
pt1x ptkxt pt3xt pt2xt
Day t
xt Wealth Gain
tjtjt xpr sdot=
m base algorithms
wt distribution over algorithms
w(1) w(2) w(3) w(m)
Weight update
Intro to Machine Learning
bull wt updated based on rt bull Good algorithms get more money
104
Results: MA_EG, MA_ONS
Results: MA_EG, MA_ONS
Results: UP, EG, ONS, AC30
Results: MA_EG, MA_ONS
Results: New Base Algorithm BAH(AC30)
Results: MA_EG, MA_ONS
Results: NYSE Base Algorithms
Results: MA_EG, MA_ONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
– Clustering
– Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
– Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
– Special (limit) case of mixture modeling
– K-means algorithm: iterative relocation
[Plate diagram: variables π, h, θ, x; plates n, k]
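The iterative relocation idea behind K-means can be sketched as follows. This is a minimal illustration under stated assumptions: squared Euclidean distance, caller-supplied initial centers, and the helper name `kmeans` are all choices made here, not details taken from the slides.

```python
def kmeans(points, centers, iters=20):
    """Iterative relocation: assign points to the nearest center, then move centers."""
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Relocation step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centers; in practice several restarts are commonly used.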
Spectral Clustering
• Weighted Graph with weights
• Spectral Clustering
– Graph Laplacian
– k segments: k eigenvalues are (close to) 0
– Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
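The link between segments and zero eigenvalues of the graph Laplacian can be checked on a small example. The sketch assumes the unnormalized Laplacian L = D - W (the slide does not say which Laplacian is meant) and a hypothetical 5-node graph with two disconnected segments. Rather than calling an eigensolver, it verifies that each segment's indicator vector is mapped to zero by L, which means each one is an eigenvector with eigenvalue 0.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical symmetric weights: nodes {0, 1, 2} and {3, 4} form two segments
# with no edges between them.
W = [
    [0.0, 1.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [2.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
L = laplacian(W)

segment_a = [1.0, 1.0, 1.0, 0.0, 0.0]  # indicator vector of the first segment
segment_b = [0.0, 0.0, 0.0, 1.0, 1.0]  # indicator vector of the second segment

residual_a = matvec(L, segment_a)  # all zeros: eigenvalue 0
residual_b = matvec(L, segment_b)  # all zeros: a second eigenvalue 0
```

With weak edges between segments the eigenvalues are only close to 0, which is the case spectral clustering methods are designed to handle.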
Linear Dimensionality Reduction
• Linear Projections
– Projection chosen such that the projected data optimizes some criterion
– Solve a suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
– Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
– Separation of classes is maximized
[Figure panels: PCA, LDA]
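As a small illustration of PCA as an eigenvalue problem, the sketch below finds the direction of maximum variance by power iteration on the sample covariance matrix. Power iteration, the function name, the iteration count, and the toy data are assumptions chosen to keep the example dependency-free; a library eigensolver would normally be used instead.

```python
def top_principal_direction(X, iters=200):
    """Leading eigenvector of the sample covariance of X, via power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # centered data
    # Sample covariance matrix (d x d).
    C = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        u = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(a * a for a in u) ** 0.5
        v = [a / norm for a in u]
    return v

# Hypothetical data lying exactly on the line y = 2x.
X = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
direction = top_principal_direction(X)  # unit vector proportional to (1, 2)
```

Projecting the data onto `direction` keeps all of its variance here, since the points lie on a single line.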
Nonlinear Dimensionality Reduction
• Manifold Embedding
– Preserve local geometric structure
• Geometric Approaches
– Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
– Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
– Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
– Predictive + Graphical Models, Online Learning, Exploratory Analysis
– Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
– High-dimensional problems; nonlinear, nonstationary dependencies
– Spatial and Temporal effects; Gradual vs. Abrupt changes
– Modeling typical behavior vs. extremes/anomalies
• Key challenges: Science/Domain Questions
– Feynman's 3-step 'algorithm' for problem solving
– Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
Motivation for Meta Algorithms
• Universal Algorithms
– Competitive with best CRP
– Can be outdone by simple heuristics
• Heuristic based methods
– Strong empirical performance
– No performance guarantee
• Best of both worlds
– Competitive with best CRP
– Strong empirical performance
Meta Algorithm
• $m$ base algorithms: Base Algo 1, Base Algo 2, Base Algo 3, …, Base Algo m
• Day $t$: base algorithm $j$ proposes portfolio $p_{t,j}$; price relatives $x_t$
• Wealth gain of base algorithm $j$: $r_{t,j} = p_{t,j} \cdot x_t$
• $w_t$: distribution over algorithms, $w(1), w(2), w(3), \ldots, w(m)$
• Weight update
– $w_t$ updated based on $r_t$
– Good algorithms get more money
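One round of the meta algorithm can be sketched in Python as follows. The slide only states that the weights are updated based on the wealth gains; the exponentiated-gradient (EG) style multiplicative rule, the learning rate `eta`, the function names, and the two-asset example data are all assumptions made for illustration, not the tutorial's exact update.

```python
import math

def base_gains(portfolios, x):
    """Wealth gain r_{t,j} = p_{t,j} . x_t of each base algorithm on day t."""
    return [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in portfolios]

def eg_weight_update(w, r, eta=0.05):
    """Assumed EG-style rule: w_j <- w_j * exp(eta * r_j / (w . r)), then renormalize."""
    meta_gain = sum(w_j * r_j for w_j, r_j in zip(w, r))  # wealth gain of the mixture
    unnorm = [w_j * math.exp(eta * r_j / meta_gain) for w_j, r_j in zip(w, r)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical day: two assets, two base algorithms that each hold a single asset.
portfolios = [[1.0, 0.0], [0.0, 1.0]]
x_t = [1.1, 0.9]          # price relatives on day t
w_t = [0.5, 0.5]          # current distribution over base algorithms
r_t = base_gains(portfolios, x_t)
w_next = eg_weight_update(w_t, r_t)
```

Here the first base algorithm gained more on day t, so its share of the weight grows while the weights still sum to one.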
Results: MAEG, MAONS
Results: MAEG, MAONS
Results: UP, EG, ONS, AC30
Results: MAEG, MAONS
Results: New Base Algorithm BAH(AC30)
Results: MAEG, MAONS
Results: NYSE Base Algorithms
Results: MAEG, MAONS
Overview
• Predictive Models
• Graphical Models
• Online Learning
• Exploratory Analysis
– Clustering
– Dimensionality Reduction
Model-based Clustering
• Mixture Modeling
– Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
– Special (limit) case of mixture modeling
– K-means algorithm: Iteration, Relocation
Plate diagram: nodes π, θ_h, x; plates of size k and n
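The K-means loop named on this slide can be sketched as a small Python function: each iteration assigns every point to its nearest center and then relocates each center to the mean of its points. The function name, the explicit initial centers, and the toy data are assumptions for illustration; this is a plain Lloyd-style sketch, not the tutorial's own code.

```python
def kmeans(points, centers, max_iters=100):
    """Iterative relocation: assign points to the nearest center, then move the centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iters):
        # Assignment step: index of the closest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Relocation step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster in place
        if new_centers == centers:
            break  # assignments are stable, so the loop has converged
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The hard assignment in the first step is what makes this a limit case of mixture modeling, where EM would instead use soft responsibilities.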
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
– Graph Laplacian
– k segments: k eigenvalues are (close to) 0
– Methods: Spectral Approaches, Kernel K-means
Figures from A. Ng et al., "On Spectral Clustering: Analysis and an Algorithm", NIPS 2001, and www.cs.berkeley.edu/~fowlkes/BSE
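The link between segments and zero eigenvalues can be checked on a toy graph. The sketch below builds the unnormalized Laplacian L = D - W and verifies that the indicator vector of each disconnected segment is mapped to zero, so a graph with k disconnected segments has eigenvalue 0 with multiplicity k; with weak links between segments these eigenvalues become small rather than exactly 0. The helper names and the four-node weight matrix are assumptions for illustration.

```python
def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W with zero diagonal."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical graph with two disconnected segments: nodes {0, 1} and nodes {2, 3}.
W = [[0.0, 2.0, 0.0, 0.0],
     [2.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 3.0],
     [0.0, 0.0, 3.0, 0.0]]
L = graph_laplacian(W)

# Each segment's indicator vector is an eigenvector of L with eigenvalue 0.
null_1 = matvec(L, [1.0, 1.0, 0.0, 0.0])
null_2 = matvec(L, [0.0, 0.0, 1.0, 1.0])
```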
Linear Dimensionality Reduction
• Linear Projections
– Projection chosen to optimize some criterion
– Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
– Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
– Separation of classes is maximized
Figure panels: PCA, LDA
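For PCA, the eigenvalue problem is on the sample covariance matrix, and its top eigenvector is the direction of maximum projected variance. A minimal Python sketch using power iteration is given below; the function name, the starting vector, the iteration count, and the toy data are assumptions for illustration, and a library eigen-solver would normally be used instead.

```python
import math

def first_principal_component(X, iters=200):
    """Top eigenvector of the sample covariance via power iteration, plus its variance."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # centered data
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    # Assumed start vector (1, 2, ..., d), normalized; it must not be orthogonal
    # to the top eigenvector for power iteration to find it.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        u = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            break  # degenerate case: the data has no variance along v
        v = [c / norm for c in u]
    variance = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

# Hypothetical data lying exactly on the line y = x.
X = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
direction, variance = first_principal_component(X)
```

For this data the recovered direction is the diagonal of the plane, and all of the variance lies along it.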
Nonlinear Dimensionality Reduction
• Manifold Embedding
– Preserve local geometric structure
• Geometric Approaches
– Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
– Probabilistic PCA, Gaussian Process Latent Variable Models
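To illustrate what preserving local geometric structure means, the sketch below follows the first two stages of an Isomap-style pipeline: connect each point to its nearest neighbors, then approximate distances along the manifold by shortest paths in that graph. The final embedding step (classical multidimensional scaling on these distances) is omitted. The function name, the choice k = 2, and the semicircle data are assumptions for illustration.

```python
import math

def geodesic_distances(points, k=2):
    """Shortest-path distances over a symmetric k-nearest-neighbor graph."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Skip position 0 of the sorted order, which is the point itself.
        neighbors = sorted(range(n), key=lambda j: d[i][j])[1:k + 1]
        for j in neighbors:
            g[i][j] = g[j][i] = d[i][j]
    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# Hypothetical manifold: seven points on a unit semicircle.
points = [(math.cos(i * math.pi / 6), math.sin(i * math.pi / 6)) for i in range(7)]
g = geodesic_distances(points, k=2)
chord = math.dist(points[0], points[-1])   # straight-line distance between the ends
along_curve = g[0][-1]                     # graph approximation of the arc length
```

The graph distance between the two ends is close to the arc length of the semicircle, while the straight-line distance cuts across it; a geometric embedding method aims to keep the former.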
Summary and Conclusions
• Success of Machine Learning
– Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
– Predictive + Graphical Models, Online Learning, Exploratory Analysis
– Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
– High-dimensional problems; nonlinear, nonstationary dependencies
– Spatial and Temporal effects, Gradual vs. Abrupt changes
– Modeling typical behavior vs. extremes/anomalies
• Key challenges: Science/Domain Questions
– Feynman's 3-step 'algorithm' for problem solving
– Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee, Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Results MAEG MAONS
Intro to Machine Learning 108
Results New Base Algorithm BAH(AC30)
Intro to Machine Learning 109
Results MAEG MAONS
Intro to Machine Learning 110
Results NYSE Base Algorithms
Intro to Machine Learning 111
Results MAEG MAONS
Intro to Machine Learning 112
Overview
Intro to Machine Learning
Predictive Models
Graphical Models
Online Learning
Exploratory Analysis
Clustering
Dimensionality Reduction
113
bull Mixture Modeling
ndash l ndash Expectation Maximization (EM) or SpectralRandom projections
bull Kmeans and Related Approaches ndash l ndash Special (limit) case of mixture modeling ndash Kmeans algorithm Iteration Relocation
Model-based Clustering
Intro to Machine Learning
n
x
θ h
π
k
114
Spectral Clustering
bull Weighted Graph with weights bull Spectral Clustering
ndash Graph Laplacian ndash k-segments k-eigenvalues are (close to) 0 ndash Methods Spectral Approaches Kernel K-means
Intro to Machine Learning
From A Ng et al On spectral clustering analysis and an algorithm NIPS 2001
wwwcsberkeleyedu~fowlkesBSE
115
Linear Dimensionality Reduction
bull Linear Projections ndash Projection such that optimizes some criterion ndash Solve suitable (generalized) eigenvalue problem
bull Principal Component Analysis (PCA) ndash Variance of is maximized best hyperplane approximation
bull Fisherrsquos Linear Discriminant Analysis (LDA) ndash Separation of classes is maximized
Intro to Machine Learning
PCA LDA
116
Nonlinear Dimensionality Reduction
• Manifold Embedding
  – Preserve local geometric structure
• Geometric Approaches
  – Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
  – Probabilistic PCA, Gaussian Process Latent Variable Models
Summary and Conclusions
• Success of Machine Learning
  – Internet applications, Bioinformatics, Social Network Analysis, …
• Four Themes
  – Predictive + Graphical Models, Online Learning, Exploratory Analysis
  – Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
  – High-dimensional problems; nonlinear, nonstationary dependencies
  – Spatial and Temporal effects, Gradual vs. Abrupt changes
  – Modeling typical behavior vs. extremes/anomalies
• Key challenges: Science/Domain Questions
  – Feynman's 3-step 'algorithm' for problem solving
  – Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee,
Hanhuai Shan, Puja Das
- Tutorial Introduction to Machine Learning
- Success Stories
- Success Stories
- Types of Data
- Types of Learning
- Overview
- Classification Linear Models
- Regression Linear Models
- Hierarchical Linear Models
- Regularization Generalization
- Non-linear Methods
- Ensembles
- High Dimensional Modeling
- Land Variable Regression
- Land Regions for Prediction Temp Precip
- Ordinary Least Squares
- Sparse Group Lasso (SGL)
- Example Land Variable Regression
- Error (RMSE) SGL NC OLS
- Overview
- Graphical Models What and Why
- Example Burglary Network
- Bayesian Networks
- Example II Rain Network
- Latent Variable Models
- Inference
- Inference Algorithms
- Approximate Inference
- The Inference Problem
- Bayes Nets to Factor Graphs
- Factor Graphs Product of Local Functions
- Marginalize Product of Functions (MPF)
- Example Computing g1(x1)
- Message Passing
- The Sum-Product Algorithm
- Example Computing g3(x3)
- Hidden Markov Models (HMMs)
- Distributive Law on Semi-Rings
- Message Passing in General Graphs
- Temporal Models
- Spatial Models
- Drought Detection
- Detection with Markov Random Field (MRF)
- Drought Regions from MRFs
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Results Drought Detection
- Graphical Models (Contd)
- Background Plate Diagrams
- Slide Number 55
- Slide Number 56
- Slide Number 57
- Slide Number 58
- Slide Number 59
- Slide Number 60
- Slide Number 61
- Slide Number 62
- Latent Dirichlet Allocation (LDA)
- LDA Generative Model
- Learning Inference and Estimation
- Variational Inference
- Smoothed Latent Dirichlet Allocation
- Aviation Safety Reports (NASA)
- Results NASA Reports I
- Results NASA Reports II
- Slide Number 71
- Slide Number 72
- Slide Number 73
- Slide Number 74
- Generalizations
- Recommendation Systems
- Advertisements on the Web
- Forest Ecology
- Matrix Factorization
- Probabilistic Matrix Factorization (PMF)
- Bayesian Probabilistic Matrix Factorization
- Parametric PMF (PPMF)
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- Results Bayesian PMF on Netflix
- Results PMF on Netflix
- Results Bayesian PMF on Netflix
- PPMF vs PMF BPMF
- Overview
- Learning with Expert Advice
- Time Machine (TM) vs Online Learning (OL)
- Online Concave Optimization (OCO)
- OCO Good News and Bad News
- Portfolio Selection Notation and Definitions
- The Portfolio Selection Problem
- Constant Rebalanced Portfolio (CRP)
- Regret Competing with the Best CRP
- Datasets
- Results UP EG
- Results UP EG ONS
- Results UP EG ONS AC30
- Results UP EG ONS AC30
- Motivation for Meta Algorithms
- Meta Algorithm
- Results MAEG MAONS
- Results MAEG MAONS
- Results UP EG ONS AC30
- Results MAEG MAONS
- Results New Base Algorithm BAH(AC30)
- Results MAEG MAONS
- Results NYSE Base Algorithms
- Results MAEG MAONS
- Overview
- Model-based Clustering
- Spectral Clustering
- Linear Dimensionality Reduction
- Nonlinear Dimensionality Reduction
- Summary and Conclusions
- Acknowledgements
- Slide Number 120
-
Model-based Clustering
• Mixture Modeling
– Expectation Maximization (EM) or Spectral/Random projections
• K-means and Related Approaches
– Special (limit) case of mixture modeling
– K-means algorithm: Iteration, Relocation
[Figure: plate diagram of the mixture model (variables π, h, θ, x; plate sizes n, k)]
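The K-means bullet above can be illustrated with a minimal NumPy sketch of the usual assign-then-relocate loop (Lloyd's algorithm). This is not taken from the slides: the farthest-first initialisation, the function names `farthest_first` and `kmeans`, and the two-blob toy data are all assumptions made for illustration.

```python
import numpy as np

def farthest_first(X, k):
    """Pick k initial centres: the first point, then repeatedly the point
    farthest from the centres chosen so far (illustrative choice only)."""
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx]

def kmeans(X, k, n_iter=100):
    """Alternate between assigning points and relocating centres."""
    centers = farthest_first(X, k)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Relocation step: each centre moves to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            X[labels == h].mean(axis=0) if np.any(labels == h) else centers[h]
            for h in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
labels, centers = kmeans(X, k=2)
```

The hard nearest-centre assignment is what makes K-means a limiting case of mixture modeling: EM for a mixture would replace it with soft posterior responsibilities for each component.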
Spectral Clustering
• Weighted graph with edge weights
• Spectral Clustering
– Graph Laplacian
– k segments: k eigenvalues are (close to) 0
– Methods: Spectral Approaches, Kernel K-means
From A. Ng et al., "On spectral clustering: analysis and an algorithm", NIPS 2001
www.cs.berkeley.edu/~fowlkes/BSE
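The link between segments and near-zero Laplacian eigenvalues can be checked on a tiny example. The sketch below uses the unnormalised Laplacian L = D - W, which is only one of several variants and is not necessarily the one used in the slides or in Ng et al. The 6-node graph and its weights are made up for illustration.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalised graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

# Hypothetical toy graph: two groups of 3 nodes, weight 1 inside a group,
# weight 0.001 across groups, no self-loops.
W = np.full((6, 6), 0.001)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

L = graph_laplacian(W)
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order

# Two groups give two eigenvalues near 0; the rest are well separated.
n_small = int(np.sum(eigvals < 0.1))

# The sign pattern of the second eigenvector separates the two groups.
labels = (eigvecs[:, 1] > 0).astype(int)
```

With more than two segments, a common approach is to run K-means on the rows of the first k eigenvectors instead of thresholding a single one.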
Linear Dimensionality Reduction
• Linear Projections
– Projection chosen so that the projected data optimizes some criterion
– Solve suitable (generalized) eigenvalue problem
• Principal Component Analysis (PCA)
– Variance of the projected data is maximized; best hyperplane approximation
• Fisher's Linear Discriminant Analysis (LDA)
– Separation of classes is maximized
[Figure panels: PCA, LDA]
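As a concrete instance of "solve a suitable eigenvalue problem", PCA can be sketched as an eigendecomposition of the sample covariance matrix. The function name `pca` and the synthetic data (points scattered around the line y = 2x) are assumptions for illustration, not material from the slides.

```python
import numpy as np

def pca(X, n_components):
    """Project centred data onto the top eigenvectors of its covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    W = eigvecs[:, order]                             # projection directions
    return Xc @ W, W, eigvals[order]

# Hypothetical data: points close to the line y = 2x, plus small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t]) + rng.normal(scale=0.05, size=(200, 2))

Z, W, var = pca(X, n_components=1)
```

The variance of the projected data `Z` equals the leading eigenvalue `var[0]`, which is the sense in which PCA maximizes projected variance. Fisher's LDA follows the same pattern with a generalized eigenproblem built from between-class and within-class scatter matrices.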
Nonlinear Dimensionality Reduction
• Manifold Embedding
– Preserve local geometric structure
• Geometric Approaches
– Isomap, Locally Linear Embedding, Laplacian Eigenmaps
• Probabilistic Approaches
– Probabilistic PCA, Gaussian Process Latent Variable Models
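Of the geometric approaches listed, Laplacian Eigenmaps is perhaps the shortest to sketch: build a neighbourhood graph, then embed with the low eigenvectors of its Laplacian. The version below is a simplified illustration under stated assumptions (0/1 weights on a symmetrised k-nearest-neighbour graph, a connected graph, and a made-up spiral dataset). It is not the exact formulation from the slides.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=4, n_components=1):
    """Embed points using eigenvectors of the graph Laplacian of a kNN graph."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # k nearest neighbours of each point (column 0 is the point itself).
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                 # symmetrise the graph
    deg = W.sum(axis=1)
    # Solve L y = lambda D y through the symmetric normalised Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    # Skip the trivial eigenvector (eigenvalue 0) and keep the next ones.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]

# Hypothetical data: points along a 2-D spiral, a curved 1-D manifold.
t = np.linspace(0.0, 3.0 * np.pi, 150)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
y = laplacian_eigenmaps(X)[:, 0]
```

Because only local neighbourhoods enter the graph, the one-dimensional coordinate `y` follows position along the spiral rather than straight-line distance in the plane.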
Summary and Conclusions
• Success of Machine Learning
– Internet applications, Bioinformatics, Social Network Analysis…
• Four Themes
– Predictive + Graphical Models, Online Learning, Exploratory Analysis
– Numerous other active and relevant themes, e.g., Sparsity
• Key challenges: Methods
– High-dimensional problems; nonlinear, nonstationary dependencies
– Spatial and Temporal effects; Gradual vs. Abrupt changes
– Modeling typical behavior vs. extremes/anomalies
• Key challenges: Science/Domain Questions
– Feynman's 3-step 'algorithm' for problem solving
– Questions + Expertise + Data from Domain Scientists/Experts
Acknowledgements
Amrudin Agovic, Qiang Fu, Soumyadeep Chatterjee,
Hanhuai Shan, Puja Das