Machine Learning for Finance
![Page 1: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/1.jpg)
Machine Learning in Finance
Stefan Duprey / September 2013
![Page 2: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/2.jpg)
Statistical learning scope
Data Mining
• Exploration
  o Univariate: pie chart, histogram, etc.
  o Multivariate: feature selection and transformation
• Modelling
  o Clustering
    - Partitive: K-means, Gaussian mixture model, SOM
    - Hierarchical
  o Classification: discriminant, decision tree, neural network, support vector machine
  o Regression
![Page 3: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/3.jpg)
Classifier for Credit Scoring
![Page 4: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/4.jpg)
Decision rule for Support Vector Machines
![Page 5: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/5.jpg)
A quadratic optimization problem!
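As a hedged reminder, the standard textbook soft-margin SVM can be written as the quadratic program below. The notation is an assumption and may differ from the slides: training pairs $(x_i, y_i)$ with $y_i \in \{-1,+1\}$, weight vector $w$, offset $b$, slack variables $\xi_i$ and penalty $C$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n \\[4pt]
\text{decision rule:}\quad & \hat{y}(x) = \operatorname{sign}\!\left(w^\top x + b\right)
\end{aligned}
```

The objective is quadratic and the constraints are linear, which is why the problem has no spurious local minima.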
![Page 6: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/6.jpg)
SVM non-linear case
![Page 7: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/7.jpg)
SVM summary
• Avoids the plague of local minima
• The engineer's expertise lies in choosing the appropriate kernel (beware of overfitting: cross-validate and experiment with your own kernels)
• Only classifies between 2 classes (use a one-vs-all or one-vs-one methodology for more)
• A reference method in computer vision and bioinformatics use cases
![Page 8: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/8.jpg)
Neural Network : what are they ?
![Page 9: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/9.jpg)
Neural Network summary
• Gradient descent algorithms: stochastic, mini-batch, conjugate gradient
• Plague of local minima: difficult to calibrate
• The engineer's expertise lies in choosing the appropriate architecture (beware of overfitting: cross-validate and experiment with your own architectures, 'deeper learning')
![Page 10: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/10.jpg)
Regression Trees
>> t = classregtree(X,Y);
>> Y_pred = t(X_new);
![Page 11: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/11.jpg)
Forests of Trees
[Figure: table of up/down predictor values with the response Y]
>> t = TreeBagger(nb_trees,X,Y);
>> [Y_pred,allpred] = predict(t,X_new);
![Page 12: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/12.jpg)
Splitting criteria : information gain
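A minimal Python sketch of information gain as a splitting criterion, assuming Shannon entropy and a single numeric predictor split at a threshold. The function names are hypothetical and this is not the code used by the MATLAB tree functions in these slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values[i] < threshold."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_threshold(values, labels):
    """Pick the midpoint between consecutive distinct values with the highest gain."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

A tree builder would call `best_threshold` for every predictor at each node and keep the predictor whose split gives the largest gain.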
![Page 13: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/13.jpg)
Why a regression and what is a regression ?
A regression is a model to explain and predict a process :
supervised machine learning
![Page 14: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/14.jpg)
Why regularize?
• Terms are correlated
• The regression matrix becomes close to singular
• Badly conditioned matrices yield poor numerical results
• Bayesian interpretation: the posterior is proportional to the likelihood times the prior, and the prior acts as the regularisation term
We minimize a regularised objective instead
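One standard way to make the Bayesian reading concrete is ridge regression. The specific choice below is an assumption, not necessarily the formula on the slide: Gaussian noise with variance $\sigma^2$ and an independent Gaussian prior with variance $\tau^2$ on the coefficients $b$.

```latex
\begin{aligned}
\underbrace{p(b \mid y, X)}_{\text{posterior}} &\;\propto\; \underbrace{p(y \mid X, b)}_{\text{likelihood}}\;\underbrace{p(b)}_{\text{prior}} \\
-\log p(b \mid y, X) &= \frac{1}{2\sigma^2}\lVert y - Xb\rVert_2^2 + \frac{1}{2\tau^2}\lVert b\rVert_2^2 + \text{const} \\
\hat{b} &= \arg\min_b\; \lVert y - Xb\rVert_2^2 + \lambda\lVert b\rVert_2^2,
\qquad \lambda = \sigma^2/\tau^2
\end{aligned}
```

Maximizing the posterior is therefore the same as minimizing least squares plus a penalty, and the penalty also improves the conditioning of the matrix to invert.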
![Page 15: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/15.jpg)
Why Lasso and Elastic Net?
• No method owns the truth
• Reduce the number of predictors in a regression model
• Identify important predictors
• Select among redundant predictors
• Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares (cross validation)
Lasso :
Elastic Net :
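For reference, the two penalized problems in the form used by the documentation of MATLAB's `lasso` function, with $N$ observations, $p$ predictors, intercept $\beta_0$, coefficients $\beta$, penalty weight $\lambda \ge 0$ and mixing parameter $0 < \alpha \le 1$. That the slide uses this exact notation is an assumption.

```latex
\begin{aligned}
\text{Lasso:}\quad & \min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\
\text{Elastic Net:}\quad & \min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda\Bigl(\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Bigr)
\end{aligned}
```

With $\alpha = 1$ the elastic net reduces to the lasso, and as $\alpha$ approaches 0 it approaches ridge regression.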
![Page 16: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/16.jpg)
Ensemble learning
Why ensemble learning ?
‘melding results from many weak learners into one high-quality ensemble predictor’
![Page 17: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/17.jpg)
Main differences between Bagging and Boosting

Defining features

| Bagging | Boosting |
|---|---|
| Bagging relies on randomness | Boosting is adaptive and deterministic |
| Bootstrapped sample | Complete initial sample |
| Each model must perform well over the whole sample | Each model has to perform better than the previous one on outliers |
| Every model has the same weight | Models are weighted according to their performance |

Advantages and disadvantages

| Bagging | Boosting |
|---|---|
| Reduces model variance | Variance might rise |
| Not a simple model anymore | Not a simple model anymore |
| Can be parallelized | Cannot be parallelized |
| Less overfitting of noise: better than boosting when the data is noisy | Models are weighted according to their performance |
| Bagging is usually more efficient than boosting | In specific cases, boosting might achieve far better accuracy |
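The bagging column can be illustrated with a short Python sketch: each model is trained on a bootstrapped sample and all models vote with the same weight. The `fit` callback and the function name are hypothetical. `TreeBagger`, used elsewhere in these slides, applies the same idea to decision trees.

```python
import random
from collections import Counter


def bagged_predict(x, y, x_new, fit, n_models=25, seed=0):
    """Train `fit` on bootstrap resamples and majority-vote the predictions.

    fit(x_train, y_train) must return a function mapping one sample to a label.
    """
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([x[i] for i in idx], [y[i] for i in idx]))
    # Every model has the same weight: plain majority vote per new sample.
    return [Counter(m(s) for m in models).most_common(1)[0][0] for s in x_new]
```

The models are independent of each other, so the training loop could run in parallel. Boosting cannot do this because each model depends on the errors of the previous one.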
![Page 18: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/18.jpg)
Big Data: Learning over Distributed Data
![Page 19: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/19.jpg)
Distributed memory: MDCS & the MAP/REDUCE paradigm
![Page 20: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/20.jpg)
Big data & Machine learning
“It’s not who has the best algorithm that wins. It’s who has the most data”
![Page 21: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/21.jpg)
Quick overview
Exploratory analysis
Clustering
Classification
![Page 22: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/22.jpg)
Aims of this presentation
awareness of the range of methods for multivariate data
reasonable understanding of algorithms
![Page 23: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/23.jpg)
Data Mining
• Exploratory Data Analysis
• Clustering
• Classification
• Regression
[Figure: scatter plot of the eight groups of points, Group1 to Group8]
• Categorical
• Ordinal
• Discontinuous
![Page 24: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/24.jpg)
Exploratory Data Analysis
Why exploratory analysis? It can be used for:
o A graphical view of the data
o “Pre-filtering”: spotting preliminary data trends and behaviour
• Means:
o Multivariate plots
o Feature transformation: principal component analysis, factor model
o Feature selection: stepwise optimization
![Page 25: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/25.jpg)
Data Exploration: Getting an overview of individual variables
Basic Histogram
>> hist(x(:,1))
Custom Number of Bins
>> hist(x(:,1),50)
By Group
>> hist(byGroup,20)
Gaussian Fit
>> histfit(x(:,2))
3D Histogram
>> hist3(x(:,1:2))
Scatter Plot
>> gscatter(x(:,1),x(:,2),groups)
Pie Chart
>> pie3(proportions,groups)
>> X = [MPG,Acceleration,Displacement,Weight,Horsepower];
Box Plot
>> boxplot(x(:,1),groups)
[Figures: output of the plotting commands above (histograms, Gaussian fit, 3D histogram, scatter plot by group, pie chart, box plot)]
![Page 26: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/26.jpg)
Data Exploration: Getting an overview of multiple variables
Plot Matrix by Group
>> gplotmatrix(x,x,groups)
Parallel Coordinates Plot
>> parallelcoords(x,'Group',groups)
Andrews’ Plot
>> andrewsplot(x,'Group',groups)
Glyph Plot
>> glyphplot(x)
Chernoff Faces
>> glyphplot(x,'Glyph','face')
[Figures: plot matrix, parallel coordinates plot and Andrews' plot of MPG, Acceleration, Displacement, Weight and Horsepower; glyph plots and Chernoff faces labelled by car model (chevrolet chevelle malibu, buick skylark 320, plymouth satellite, amc rebel sst, ford torino, ford galaxie 500, chevrolet impala, plymouth fury iii, pontiac catalina)]
![Page 27: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/27.jpg)
Principal component analysis
>> [pcs,scrs,variances] = princomp(stocks);
[Figures: variance explained (%) by principal component; stocks (Commerzbank, Deutsche Bank, Infineon, ThyssenKrupp, ...) plotted against Component 1, Component 2 and Component 3]
![Page 28: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/28.jpg)
Factor model
Alternative to PCA to improve your components
>> [Lambda,Psi,T,stats,F] = factoran(stocks,3,'rotate','promax');
[Figure: stocks plotted against the three rotated factors (Component 1, Component 2, Component 3)]
![Page 29: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/29.jpg)
Paring predictors: stepwise optimization
Some predictors might be correlated, others irrelevant
Requires Statistics Toolbox™
>> [coeff,se,pval,inOut] = stepwisefit(stocks, index); % inOut flags the predictors kept in the model
[Figures: returns and prices from 2007 to 2011, original data versus stepwise fit]
![Page 30: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/30.jpg)
Cloud of randomly generated points
• Each cluster center is randomly chosen inside specified bounds
• Each cluster contains the specified number of points per cluster
• Each cluster point is sampled from a Gaussian distribution
• Multidimensional dataset
>> clusters = 8; % number of clusters
>> points = 30; % number of points in each cluster
>> std_dev = 0.05; % common cluster standard deviation
>> bounds = [0 1]; % bounds for the cluster center
>> [x,vcentroid,proportions,groups] = cluster_generation(bounds,clusters,points,std_dev);
[Figure: scatter plot of the generated points, Group1 to Group8]
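The slides do not show the body of `cluster_generation`. A minimal Python sketch of the recipe described in the bullets could look as follows. The function name, the `dim` and `seed` parameters and the returned values are assumptions, and the `proportions` output is omitted.

```python
import random


def generate_clusters(bounds, clusters, points, std_dev, dim=2, seed=0):
    """Return (x, centroids, groups): Gaussian clouds around random centres."""
    rng = random.Random(seed)
    low, high = bounds
    # Each cluster centre is drawn uniformly inside the bounds.
    centroids = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(clusters)]
    x, groups = [], []
    for g, centre in enumerate(centroids, start=1):
        for _ in range(points):
            # Each point is sampled from a Gaussian centred on its cluster centre.
            x.append([rng.gauss(c, std_dev) for c in centre])
            groups.append(g)
    return x, centroids, groups


x, centroids, groups = generate_clusters((0, 1), clusters=8, points=30, std_dev=0.05)
```

With the parameter values from the slide this yields 8 × 30 = 240 two-dimensional points, labelled 1 to 8.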
![Page 31: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/31.jpg)
Clustering
Why clustering?
o Segment populations into natural subgroups
o Identify outliers
o As a preprocessing method: build separate models on each subgroup
• Means:
o Hierarchical clustering
o Clustering with neural networks (self-organizing map, competitive layer)
o Clustering with K-means nearest neighbours
o Clustering with K-means fuzzy logic
o Clustering using Gaussian mixture models
• Predictors: categorical, ordinal, discontinuous
[Figure: input vectors, x(1) versus x(2)]
![Page 32: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/32.jpg)
Hierarchical Cluster Analysis – what is it doing?
[Figure: points grouped by hierarchical clustering with cut-off = 0.1]
![Page 33: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/33.jpg)
Hierarchical Cluster Analysis – how do I do it ?
• Calculate pairwise distances between points
>> distances = pdist(x)
• Carry out hierarchical cluster analysis
>> tree = linkage(distances)
• Visualise as a dendrogram
>> dendrogram(tree)
• Assign points to clusters
>> assignments = cluster(tree,'cutoff',0.1)
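To show what the four steps amount to, here is a Python sketch of single-linkage clustering cut at a distance threshold. Cutting on raw distance is an assumption and may differ from the criterion MATLAB's `cluster` applies by default. The function name is hypothetical.

```python
import math


def single_linkage_clusters(points, cutoff):
    """Label points so that two points share a cluster whenever a chain of
    neighbours no farther apart than `cutoff` connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's cluster.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps chains short
            i = parent[i]
        return i

    # Merge every pair of points whose pairwise distance is within the cut-off.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    # Number the clusters 1, 2, ... in order of first appearance.
    labels, assignments = {}, []
    for i in range(len(points)):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        assignments.append(labels[root])
    return assignments
```

This shortcut skips the explicit tree: for single linkage, cutting the dendrogram at height `cutoff` gives the same groups as merging all pairs closer than `cutoff`.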
![Page 34: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/34.jpg)
Assessing the quality of a hierarchical cluster analysis
• The cophenetic correlation coefficient measures how closely the length of the tree links match the original distances between points
• How ‘faithful’ the tree is to the original data
• 0 is poor, 1 is good
>> cophenet(tree,distances)
![Page 35: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/35.jpg)
K-Means Cluster Analysis – what is it doing?
• Randomly pick K cluster centroids
• Assign points to the closest centroid
• Recalculate positions of cluster centroids
• Reassign points to the closest centroid
• Recalculate positions of cluster centroids
• ...
• Repeat until centroid positions converge
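The loop above can be sketched in a few lines of Python. This is a plain Lloyd-style illustration with hypothetical names, not MATLAB's `kmeans` implementation. It stops when the assignments no longer change, which is when the centroids stop moving.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Return (assignments, centroids) for a list of equal-length points."""
    rng = random.Random(seed)
    # Randomly pick K of the points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assign each point to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: nothing moved since the last pass
        assignments = new_assignments
        # Recalculate each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

The result depends on the random initial centroids, so in practice the algorithm is usually run several times and the best solution kept.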
![Page 36: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/36.jpg)
K-Means Cluster Analysis – how do I do it ?
Running the K-means algorithm for K fixed
>> [memberships,centroids] = kmeans(x,K);
[Figure: scatter plot of the points coloured by K-means cluster]
![Page 37: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/37.jpg)
Evaluating a K-Means analysis and choosing K
• Try a range of different K’s, and compare the point-centroid distances for each
>> for K=3:15
[clusters,centroids,distances] = kmeans(data,K);
totaldist(K-2)=sum(distances);
end
plot(3:15,totaldist);
• Create silhouette plots
>> silhouette(data,clusters)
![Page 38: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/38.jpg)
Sidebar: Distance Metrics
• Measures of how similar datapoints are – different definitions make sense for different data
• Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); %pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') %not all metrics supported
• Euclidean distance: the default
• Cityblock distance: useful for discrete variables
• Cosine distance: useful for clustering variables
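For concreteness, the three metrics named above written out in Python for two vectors of equal length. These are the usual definitions. The function names are illustrative, not part of any toolbox.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    """Sum of absolute coordinate differences (also called Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)
```

Cosine distance ignores vector length, so two return series that move together are close even when their volatilities differ.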
![Page 39: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/39.jpg)
Fuzzy c-means Cluster Analysis – what is it doing?
• Very similar to K-means
• Samples are not assigned definitively to a cluster, but have a ‘membership’ value relative to each cluster
Requires Fuzzy Logic Toolbox™
Running the fuzzy K-means algorithm for K fixed
>> [centroids, memberships] = fcm(x,K);
![Page 40: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/40.jpg)
Gaussian Mixture Models
• Assume that data is drawn from a fixed number K of normal distributions
• Fit these parameters using the EM algorithm
>> gmobj = gmdistribution.fit(x,8);
>> assignments = cluster(gmobj,x);
Plot the probability density
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
[Figure: surface plot of the fitted probability density]
![Page 41: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/41.jpg)
Evaluating a Gaussian Mixture Model clustering
• Plot the probability density function of the model
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
• Plot the posterior probabilities of observations
>> p = posterior(gmobj,data);
>> scatter(data(:,1),data(:,2),5,p(:,g)); % Do this for each group g
• Plot the Mahalanobis distances of observations to components
>> m = mahal(gmobj,data);
>> scatter(data(:,1),data(:,2),5,m(:,g)); % Do this for each group g
![Page 42: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/42.jpg)
Choosing the right number of components in a Gaussian Mixture Model
• Evaluate for a range of K and plot AIC and/or BIC
• AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are measures of the quality of the model fit, with a penalty for higher K
>> for K=3:15
gmobj = gmdistribution.fit(data,K);
AIC(K-2) = gmobj.AIC;
end
plot(3:15,AIC);
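For reference, the usual definitions of the two criteria, where $\hat{L}$ is the maximized likelihood of the fitted mixture, $m$ the number of free parameters (which grows with K) and $n$ the number of observations. The symbols are chosen here for illustration.

```latex
\begin{aligned}
\mathrm{AIC} &= 2m - 2\ln\hat{L} \\
\mathrm{BIC} &= m\ln n - 2\ln\hat{L}
\end{aligned}
```

Lower is better for both. BIC penalizes extra components more heavily than AIC as soon as $\ln n > 2$, so it tends to select a smaller K.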
![Page 43: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/43.jpg)
Neural Networks – what are they?
[Diagram: input variables, weights, bias, transfer function, output variable]
A two-layer feedforward network
Build your architecture
![Page 44: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/44.jpg)
Self Organising Maps Neural Net – what are they?
• Start with a regular grid of ‘neurons’ laid over the dataset
• The size of the grid gives the number of clusters
• Neurons compete to recognise datapoints (by being close to them)
• Winning neurons are moved closer to the datapoints
• Repeat until convergence
[Figures: SOM weight positions, Weight 1 versus Weight 2]
![Page 45: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/45.jpg)
Summary: Cluster analysis
No method owns the truth
Use the diagnostic tools to assess your clusters
Beware of local minima : global optimization
![Page 46: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/46.jpg)
Classification
Why classification? It can be used to:
o Learn how to classify from already classified observations
o Classify new observations
• Means:
o Discriminant analysis classification
o Bootstrapped aggregated decision tree classifier
o Neural network classifier
o Support vector machine classifier
[Figure: scatter plot of the eight groups of points, Group1 to Group8]
![Page 47: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/47.jpg)
Discriminant Analysis – how does it work?
• Fit a multivariate normal density to each class
o linear: fits a multivariate normal density to each group, with a pooled estimate of covariance. This is the default.
o diaglinear: similar to linear, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
o quadratic: fits multivariate normal densities with covariance estimates stratified by group.
o diagquadratic: similar to quadratic, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
• Classify a new point by evaluating its probability for each density function, and classifying to the highest probability
![Page 48: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/48.jpg)
Discriminant Analysis – how do I do it?
• Linear Discriminant Analysis
>> classes = classify(sample,training,group)
• Quadratic Discriminant Analysis
>> classes = classify(x,x,y,'quadratic')
• Naïve Bayes
>> nbGau = NaiveBayes.fit(x, y);
>> y_pred = nbGau.predict(x);
[Figures: classification regions for group1 to group8 in the (x1, x2) plane]
![Page 49: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/49.jpg)
Interpreting Discriminant Analyses
• Visualise the posterior probability surfaces
>> [XI,YI] = meshgrid(linspace(4,8), linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P] = classify([X Y], meas(:,1:2), species,'quadratic');
>> for i=1:3
ZI = reshape(P(:,i),100,100);
surf(XI,YI,ZI,'EdgeColor','none');
hold on;
end
![Page 50: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/50.jpg)
Interpreting Discriminant Analyses
• Visualise the probability density of sample observations
• An indicator of the region in which the model has support from training data
>> [XI,YI] = meshgrid(linspace(4,8), linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P,logp] = classify([X Y], meas(:,1:2), species, 'quadratic');
>> ZI = reshape(logp,100,100);
>> surf(XI,YI,ZI,'EdgeColor','none');
![Page 51: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/51.jpg)
Classifying K-Nearest Neighbours – what does it do?
• One of the simplest classifiers – a sample is classified by taking the K nearest points from the training set, and choosing the majority class of those K points
• There is no real training phase – all the work is done during the application of the model
>> classes = knnclassify(sample,training,group,K)
[Figure: K-nearest-neighbour classification regions for group1 to group8 in the (x1, x2) plane]
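The majority-vote rule described above fits in a few lines of Python. This sketch mirrors the argument order of the `knnclassify` call but is an independent illustration. It uses Euclidean distance and breaks ties in favour of the class met first among the neighbours, both of which are assumptions.

```python
import math
from collections import Counter


def knn_classify(sample, training, group, k):
    """Classify each row of `sample` by majority vote among its k nearest training rows."""
    classes = []
    for s in sample:
        # Indices of the k training points closest to s.
        nearest = sorted(range(len(training)), key=lambda i: math.dist(s, training[i]))[:k]
        votes = Counter(group[i] for i in nearest)
        classes.append(votes.most_common(1)[0][0])
    return classes
```

All the work happens at prediction time: every call scans the whole training set, which is why there is no real training phase.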
![Page 52: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/52.jpg)
Decision Trees – how do they work?
• Threshold value for a variable that partitions the dataset
• Threshold for all predictors
• Resulting model is a tree where each node is a logical test on a predictor (var1<thresh1, var2>thresh2)
![Page 53: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/53.jpg)
Decision Trees – how do I build them ?
• Build tree model
>> tree = classregtree(x,y);
>> view(tree)
• Evaluate the model on new data
>> tree(x_new)
[Figure: decision tree classification regions for group1 to group8 in the (x1, x2) plane]
![Page 54: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/54.jpg)
Enhancing the model: bagged trees
• Prune the decision tree
>> [cost,secost,ntnodes,bestlevel] = test(t, 'test', x, y);
>> topt = prune(t, 'level', bestlevel);
• Bootstrap-aggregated forest of trees
>> forest = TreeBagger(100, x, y);
>> y_pred = predict(forest,x);
• Visualise class boundaries as before
[Figure: bagged trees classification regions for group1 to group8 in the (x1, x2) plane]
![Page 55: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/55.jpg)
Pattern Recognition Neural Network– what are they?
• Two-layer (i.e. one-hidden-layer) feed forward neural networks can learn any input-output relationship given enough neurons in the hidden layer.
• No restrictions on the predictors
![Page 56: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/56.jpg)
Pattern Recognition Neural Network– how do I build them ?
• Build a neural network model
>> net = patternnet(10);
• Train the net to classify observations
>> [net,tr] = train(net,x,y);
• Apply the model to new data
>> y_pred = net(x);
[Figure: neural network classification regions for classes 1 to 8 in the (x1, x2) plane]
![Page 57: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/57.jpg)
Support Vector Machines – what are they?
• The SVM algorithm finds a boundary between the classes that maximises the minimum distance of the boundary to any of the points
• No restrictions on the predictors
• 1 vs all to classify multiple classes
![Page 58: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/58.jpg)
Support Vector Machines – how do I build them ?
• Build an SVM model
>> svmmodel = svmtrain(x,y)
• Try different kernel functions
>> svmmodel = svmtrain(x,y,'kernel_function','rbf')
• Apply the model to new data
>> classes = svmclassify(svmmodel,x_new);
[Figure: two classes (1 and 2) with the support vectors highlighted]
![Page 59: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/59.jpg)
Evaluating a Classifying Model
• Three main strategies
o Resubstitution: test the model on the same data that you trained it with
o Cross-validation
o Holdout: test on a completely new dataset
• Use cross-validation to evaluate model parameters such as the number of leaves for a tree or the number of hidden neurons.
Apply cross-validation to your classifying model
>> cp = cvpartition(y,'k',10);
>> ldaFun = @(xtrain,ytrain,xtest)(classify(xtest,xtrain,ytrain));
>> ldaCVErr = crossval('mcr',x,y,'predfun',ldaFun,'partition',cp)
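What the cross-validation call computes can be sketched in Python as below. The `fit_predict` callback plays the role of `ldaFun`. The function name is hypothetical, and the folds are plain random folds, whereas `cvpartition(y,'k',10)` stratifies by class, so this is a simplification.

```python
import random


def kfold_error(x, y, fit_predict, k=10, seed=0):
    """Overall misclassification rate when every observation is held out once.

    fit_predict(x_train, y_train, x_test) must return the predicted labels.
    """
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint test sets
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        predictions = fit_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in test_idx],
        )
        errors += sum(p != y[i] for p, i in zip(predictions, test_idx))
    return errors / len(x)
```

To tune a parameter such as the number of leaves or hidden neurons, evaluate `kfold_error` once per candidate value and keep the value with the lowest error.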
![Page 60: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/60.jpg)
Summary: Classification algorithms
No absolute best methods
Simple does not mean inefficient
Decision trees and neural networks produce models that overfit the noise: use bootstrapping and cross-validation
Parallelize
![Page 61: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/61.jpg)
Regression
Why regression? It can be used to:
o Learn to model a continuous response from observations
o Predict the response for new observations
• Means:
o Linear regressions
o Non-linear regressions
o Bootstrapped regression tree
o Neural network as a fitting tool
![Page 62: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/62.jpg)
New data set with a continuous response from one predictor
• Non-linear function to fit
• A continuous response to fit from one continuous predictor
>> [x,t] = simplefit_dataset;
[Figure: the continuous response plotted against the single predictor]
![Page 63: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/63.jpg)
Linear Regression – what is it?
• A collection of methods that find the best coefficients b such that y ≈ X*b
• Best b means minimising the least squares difference between the predicted and actual values of y
• “Linear” means linear in b: you can include extra variables, such as powers of a predictor, to model a relationship that is nonlinear in X
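For a single predictor with an intercept, the least-squares coefficients have a closed form. The Python sketch below uses hypothetical names and is not the `regress` implementation.

```python
def fit_line(x, y):
    """Least-squares coefficients (b0, b1) such that y ≈ b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope minimising the squared residuals
    b0 = mean_y - b1 * mean_x  # intercept: the fitted line passes through the means
    return b0, b1


# "Linear in b": regressing on x**2 instead of x fits y ≈ b0 + b1*x**2
# with exactly the same least-squares machinery.
x = [0, 1, 2, 3]
y = [1, 2, 5, 10]  # equals 1 + x**2
b0_sq, b1_sq = fit_line([xi ** 2 for xi in x], y)
```

The second call illustrates the last bullet: the model is nonlinear in x but still linear in the coefficients, so no nonlinear solver is needed.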
![Page 64: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/64.jpg)
Linear Regression – how do I do it ?
>> b = X\y
• Linear Regression
>> b = regress(y, [ones(size(X,1),1) X])
>> stats = regstats(y, X) % regstats adds the constant term itself
• Robust Regression: better in the presence of outliers
>> robust_b = robustfit(X,y) % NB (X,y) not (y,X)
• Ridge Regression: better if data is close to collinear
>> ridge_b = ridge(y,X,k) % k is the ridge parameter
• Apply the model to new data
>> y = newdata*b;
![Page 65: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/65.jpg)
Interpreting a linear regression model
• Examine coefficients to see which predictors have a large effect on the response
>> [b,bint,r,rint,stats]=regress(y,X)
>> errorbar(1:size(b,1),b, b-bint(:,1),bint(:,2)-b)
• Examine residuals to check for possible outliers
>> rcoplot(r,rint)
• Examine R2 statistic and p-value to check overall model significance
>> stats(1)*100 %R2 as a percentage
>> stats(3) %p-value
• Additional diagnostics with regstats
![Page 66: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/66.jpg)
Non linear curve fitting
Least-squares algorithm
>> model = @(b,x)(b(1)+b(2).*cos(b(3)*x+b(4))+b(5).*cos(b(6)*x+b(7))+b(8).*cos(b(9)*x+b(10)));
>> [ahat,r,J,cov,mse] = nlinfit(x,t,model,a0);
[Figures: fitted curve over the data; fit error]
![Page 67: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/67.jpg)
Fit Neural Network – what are they?
• Fitting networks are feedforward neural networks used to fit an input-output relationship.
• This architecture can learn any input-output relationship given enough neurons.
• No restrictions on the predictors (categorical,ordinal,discontinuous)
![Page 68: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/68.jpg)
Fit Neural Network– how do I build them ?
• Build a fit neural net model
>> net = fitnet(10);
• Train the net to fit the target
>> [net,tr] = train(net,x,t);
• Apply the model to new data
>> y_pred = net(x);
[Figure: function fit for output element 1, showing targets, outputs and fit against the input, with the errors (targets minus outputs) below]
![Page 69: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/69.jpg)
Regression trees– what are they?
• A decision tree with binary splits for regression. An object of class RegressionTree can predict responses for new data with the predict method.
• No restrictions on the predictors (categorical,ordinal,discontinuous)
![Page 70: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/70.jpg)
Regression trees – how do I use them?
• Build a regression tree model
>> rtree = RegressionTree.fit(x,t);
• Apply the model to data
>> y_tree = predict(rtree,x);
[Figures: regression tree fit over the data; fit error (of order 10^-15)]
![Page 71: Machine learning for_finance](https://reader033.fdocuments.in/reader033/viewer/2022060117/5586c8dfd8b42a241f8b464e/html5/thumbnails/71.jpg)
Summary
Data Mining
• Exploration
  o Univariate: pie chart, histogram, etc.
  o Multivariate: feature selection and transformation
• Modelling
  o Clustering
    - Partitive: K-means, Gaussian mixture model, SOM
    - Hierarchical
  o Classification: discriminant, decision tree, neural network, support vector machine
  o Regression