Data Driven Modelling using MATLAB - cs.bham.ac.ukszh/teaching/matlabmodeling/Lecture17_body... ·...
Data Driven Modelling
Data Driven Modelling using MATLAB
Shan He
School for Computational Science, University of Birmingham
Module 06-23836: Computational Modelling with MATLAB
Outline
Outline of Topics
What is data driven modelling?
Regression Analysis in MATLAB
Artificial Neural Networks
Conclusion
What is data driven modelling?
- For equation-based and agent-based models, we assume the model is known.
- However, sometimes we have a large amount of data but very little prior knowledge.
- Finding the model in the first place is the most difficult and important question.
- A new research field: data driven modelling (DDM).
- Based on the data, a model is built from the connections between the system state variables (e.g., input, internal and output variables), with only limited assumptions about the system.
Goals/purposes of data driven modelling
- Extract and recognize patterns in data
- Interpret or explain observations
- Test validity of hypotheses
- Search the space of hypotheses
Tasks of data driven modelling
- Classification: assign a class to an input data point.
- Association: identify associations between the variables characterising the system, which are then used in subsequent prediction.
- Regression: predict a real value associated with an input data point.
- Clustering: determine groups of data points with within-group similarity.
It is new and old!
- It was previously called observational modelling.
- It is based on methods from statistics, e.g., regression.
- These methods usually cannot handle nonlinear systems.
- In recent years, machine learning techniques have been applied.
- We will learn how to use regression and Artificial Neural Networks to build data-driven models in MATLAB.
Data driven modelling process
- Data preparation: obtain, check and clean the data.
- Feature selection: needed if you have high-dimensional data.
- Specify assumptions based on domain knowledge.
- Develop a model based on the assumptions.
- Specify a loss function, e.g., the mean squared error between the model output and the real data.
- Use algorithms to minimise the loss on the training data.
- Test the model using the testing data.
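The process can be sketched end to end in a few lines of base MATLAB. This is only an illustrative sketch: the synthetic quadratic data, the 70/30 split, the choice of a degree-2 polynomial as the model and the helper name mseLoss are all assumptions made for the example, not part of the lecture.

```matlab
% Data preparation: hypothetical noisy samples of a quadratic (assumed data)
rng(0);                                   % fix the seed so the run is repeatable
x = linspace(-2, 2, 100)';
y = 1 + 2*x - 0.5*x.^2 + 0.1*randn(size(x));

% Split the data into training (70%) and testing (30%) sets
idx      = randperm(numel(x));
nTrain   = round(0.7 * numel(x));
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);

% Model assumption: a degree-2 polynomial.
% polyfit minimises the squared error on the training data.
p = polyfit(x(trainIdx), y(trainIdx), 2);

% Loss function: mean squared error between model output and data
mseLoss  = @(yHat, yTrue) mean((yHat - yTrue).^2);
trainErr = mseLoss(polyval(p, x(trainIdx)), y(trainIdx));
testErr  = mseLoss(polyval(p, x(testIdx)),  y(testIdx));
fprintf('Training MSE: %.4f, test MSE: %.4f\n', trainErr, testErr);
```

A test error much larger than the training error would suggest the model assumption is too flexible for the amount of data available.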
What tools can we use?
- Statistics:
  - Linear regression
  - Nonlinear regression
  - Logistic regression
  - Probit regression
- Machine learning techniques:
  - Decision tree
  - Artificial Neural Network
  - Nearest Neighbours
  - Support Vector Machine
  - Association rule learning
Regression Analysis in MATLAB
Linear regression analysis in MATLAB
- For linear regression, we can use polynomial curve fitting.
- MATLAB function: p = polyfit(x,y,n)
- It finds the coefficients of a polynomial p(x) of degree n that fits the data, p(x(i)) to y(i), in a least squares sense.
- The output p is a row vector of length n+1 containing the polynomial coefficients in descending powers:

$$p(x) = p_1 x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}$$

- To evaluate the polynomial at the data points: y = polyval(p,x)
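As a minimal illustration of the two calls, the sketch below fits a straight line (n = 1) to a handful of made-up points. The data values are hypothetical and chosen to lie exactly on y = 2x + 1, so the recovered coefficients are easy to check by eye.

```matlab
% Hypothetical data lying exactly on the line y = 2x + 1
x = (0:5)';
y = 2*x + 1;

% Fit a degree-1 polynomial; p holds coefficients in descending powers,
% i.e. p(1) is the slope and p(2) is the intercept
p = polyfit(x, y, 1);

% Evaluate the fitted polynomial at the data points
yFit = polyval(p, x);

fprintf('slope = %.3f, intercept = %.3f\n', p(1), p(2));
```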
A very simple example: fitting error function
- Regression: we aim to fit data points sampled from the error function erf(x), which is twice the integral of the Gaussian distribution with mean 0 and variance 1/2:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$
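One possible way to set up this example is sketched below. The sampling interval [0, 2.5], the step of 0.1 and the polynomial degree 6 are assumptions chosen for illustration; the slides do not state which values were used.

```matlab
% Sample the error function on an assumed interval [0, 2.5]
x = (0:0.1:2.5)';
y = erf(x);

% Fit a degree-6 polynomial (the degree is an assumption for this sketch)
p = polyfit(x, y, 6);

% Evaluate the fit at the sample points and measure the worst-case error
f = polyval(p, x);
maxErr = max(abs(f - y));
fprintf('Maximum absolute error on the samples: %.2e\n', maxErr);
```

The fit is only meaningful inside the sampled interval: erf(x) is bounded, whereas any non-constant polynomial eventually grows without bound, so extrapolating far beyond the data gives poor results.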
A more complex example: fitting traffic data
- Hourly traffic counts at three intersections for a single day.
- Regression: we aim to fit the data with polyfit and evaluate the fitted polynomial with polyval.
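A sketch of such a fit follows. It assumes the data is the count.dat sample file that ships with MATLAB (a 24-by-3 matrix of hourly counts), which matches the description but is not confirmed by the slides. If the file is not found, the code falls back to a hypothetical synthetic series so that it still runs. The polynomial degree 4 is also an arbitrary choice.

```matlab
% Load the traffic counts, or fall back to hypothetical synthetic data
if exist('count.dat', 'file')
    load count.dat                       % creates the matrix "count"
else
    h = (1:24)';
    base = 50 + 40*sin(pi*(h - 6)/12);   % assumed daily pattern, illustration only
    count = round([base, 1.5*base, 0.8*base]);
end

t = (1:size(count, 1))';                 % hour of the day
y = count(:, 1);                         % counts at the first intersection

% Fit a degree-4 polynomial; the extra outputs S and mu ask polyfit to
% centre and scale t, which improves numerical conditioning
[p, S, mu] = polyfit(t, y, 4);

% Evaluate the fit using the same centring and scaling
yFit = polyval(p, t, S, mu);
rmse = sqrt(mean((y - yFit).^2));
fprintf('RMSE of the degree-4 fit: %.2f vehicles\n', rmse);
```

Repeating the fit for several degrees and comparing the errors is a simple way to see the trade-off between underfitting and overfitting on this kind of data.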
Logistic regression
- Sometimes called the logistic model or logit model.
- Can be used for predicting the outcome of a binary dependent variable, i.e., classification.
- MATLAB function: b = glmfit(X,y,distr)
- Output: a vector b of coefficient estimates for a generalized linear regression of the responses in y on the predictors in X, using the distribution distr. For an n-by-p predictor matrix X, b is (p+1)-by-1 by default because glmfit adds a constant (intercept) term. For logistic regression, distr is 'binomial'.
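The call can be tried on a small synthetic classification problem, as in the sketch below. It requires the Statistics Toolbox. The data, the "true" coefficients and the 0.5 decision threshold are hypothetical choices made for illustration only.

```matlab
% Hypothetical binary classification data with two predictors
rng(1);                                   % fix the seed so the run is repeatable
n = 200;
X = randn(n, 2);
trueB = [-0.5; 2; -1];                    % assumed intercept and slopes
prob  = 1 ./ (1 + exp(-(trueB(1) + X*trueB(2:3))));
y = double(rand(n, 1) < prob);            % 0/1 class labels

% Logistic regression: binomial distribution (the logit link is its default)
b = glmfit(X, y, 'binomial');             % b(1) is the intercept

% Predicted class probabilities, then a hard decision at 0.5
pHat  = glmval(b, X, 'logit');
yPred = double(pHat >= 0.5);
accuracy = mean(yPred == y);
fprintf('Training accuracy: %.2f\n', accuracy);
```

The accuracy here is measured on the data used for fitting; a held-out test set would be needed for an honest estimate.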
Australian Credit Card Assessment
- Task: assess applications to an Australian bank for a credit card based on a number of attributes.
- 2 classes: granted (44.5% of the instances) or denied (55.5% of the instances).
- 14 attributes: names and values have been changed to meaningless symbols to protect the confidentiality of the data.
- Mixed-type inputs: 5 continuous, 4 binary and 5 nominal.
- Many missing values.
Military Trauma survival prediction
Artificial Neural Networks
What are Artificial Neural Networks (ANNs)?
[Figure: network diagram with input, hidden and output layers]
- ANN: a mathematical or computational model inspired by biological neural networks.
- Consists of an interconnected group of artificial neurons.
What are Artificial Neural Networks (ANNs)?
- Non-linear statistical data modelling tools:
  - Model complex relationships between inputs and outputs;
  - Discover patterns in data.
- Can be used for classification, association, regression and clustering.
- MATLAB Neural Network Toolbox
Example: Prediction of number of sun spots
- The sunspot series is a record of the activity of the surface of the sun.
- Important: telecommunication can be disrupted by a sufficiently large solar flare.
- Time series data for sunspot activity over the last 300 years.
- Sunspot activity is cyclical, reaching a maximum about every 11 years.
- Challenging: the sunspot series is nonlinear, non-stationary and non-Gaussian.
Prediction of sunspot number by ANNs
- Task: we use recorded sunspot data to train our ANN to predict the sunspot number based on the sunspot numbers of the previous 3 years.
- Training data: sunspot numbers from 1705 – 1884
- Test data: sunspot numbers from 1884 – 1987
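The task above could be set up roughly as follows. This sketch requires the Neural Network Toolbox and makes several assumptions that the slides do not confirm: it uses the sunspot.dat sample file shipped with MATLAB (falling back to a hypothetical synthetic series if the file is missing), a feedforward network with 5 hidden neurons, and the default training algorithm. The lecture's own data file and network settings may differ.

```matlab
% Load yearly sunspot numbers, or fall back to a hypothetical series
if exist('sunspot.dat', 'file')
    load sunspot.dat                      % columns: year, sunspot number
    yearAll = sunspot(:, 1);
    s = sunspot(:, 2);
else
    yearAll = (1700:1987)';
    s = 50 + 45*sin(2*pi*(yearAll - 1700)/11);   % assumed 11-year cycle
end

% Lagged inputs: predict s(t) from s(t-3), s(t-2), s(t-1)
N  = numel(s);
X  = [s(1:N-3), s(2:N-2), s(3:N-1)]';     % 3-by-(N-3), one column per sample
T  = s(4:N)';                             % targets
yr = yearAll(4:N)';                       % year of each target

% Split by year, following the slide: train up to 1884, test afterwards
trainIdx = yr <= 1884;
testIdx  = yr >  1884;

% Feedforward network; the hidden layer size of 5 is an arbitrary choice
net = feedforwardnet(5);
net.divideFcn = 'dividetrain';            % use all supplied samples for training
net.trainParam.showWindow = false;        % suppress the training GUI
net = train(net, X(:, trainIdx), T(trainIdx));

% Predict the held-out years and report the error
pred = net(X(:, testIdx));
testRmse = sqrt(mean((pred - T(testIdx)).^2));
fprintf('Test RMSE: %.2f\n', testRmse);
```

Because the network weights are initialised randomly, the test error varies from run to run; averaging over several runs gives a fairer picture.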
New direction for ANNs: Deep Learning
- ANNs fell out of favour in the 1990s because they were slow and inefficient.
- In 2006, Prof. Geoff Hinton made a breakthrough: deep learning.
- Excels at unsupervised learning, e.g., recognising handwritten words.
- Key idea: learn categories incrementally, e.g., lower-level categories (letters) → higher-level categories (words).
- Google, Microsoft and other big names have jumped on the bandwagon.
- Microsoft Project: Speech Recognition
Conclusion
- If you know the underlying mechanisms of the system (even partially), DO NOT use data-driven modelling methods.
- How to choose your tools: start from simple tools.
  - Regression → Decision Tree → ANNs (SVM, Random Forest) → Hybrid methods, e.g., Evolutionary ANNs
- Also consider interpretability: simpler tools are easier to interpret.
Assignment
- Based on the sunspot number prediction example, use linear regression (polyfit) and ANNs to model the Hudson Bay Company fur record data.
- Investigate how to use a decision tree for the Australian Credit Card Assessment problem. Compare the results with ANNs and logistic regression.