Artificial Neural Network

Artificial Neural Network Dessy Amirudin May 2016 Data Science Indonesia Bootcamp

Transcript of Artificial Neural Network

Page 1: Artificial Neural Network

Artificial Neural Network

Dessy Amirudin

May 2016

Data Science Indonesia Bootcamp

Page 2: Artificial Neural Network

Intro

Page 3: Artificial Neural Network

Brain Plasticity

http://www.nytimes.com/2000/04/25/science/rewired-ferrets-overturn-theories-of-brain-growth.html

auditory cortex

“One learning algorithm” to rule them all

Page 4: Artificial Neural Network

Any sensor input!

Hearing with vibration: http://www.eaglemanlab.net/sensory-substitution

Human echolocation: http://www.sciencemag.org/news/2014/11/how-blind-people-use-batlike-sonar

Page 5: Artificial Neural Network

Any sensor input!

Seeing with sound: https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/

3rd Eye Frog: https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/

Page 6: Artificial Neural Network

Neuron

http://learn.genetics.utah.edu/

Page 7: Artificial Neural Network

Neuron

http://learn.genetics.utah.edu/

Page 8: Artificial Neural Network

Neural Network

input hidden layer output

Page 9: Artificial Neural Network

Neural Network in Brief History

• Algorithm to mimic the brain
• Was widely used in the 80s and early 90s
• Popularity diminished in the late 90s. Why?
• Recent resurgence: state-of-the-art technique for many applications
• Can be used for regression and classification

Page 10: Artificial Neural Network

Recent Applications of NN

• Speech recognition
• Image recognition and search
• Playlist recommendation
• Skype Translate

Page 11: Artificial Neural Network

Other Applications of NN

FINANCIAL
• Stock market prediction
• Credit worthiness
• Credit rating

MEDICAL
• Medical diagnosis
• Electronic noses

SALES & MARKETING
• Churn prediction
• Targeted marketing
• Service usage forecast

& many more

Page 12: Artificial Neural Network

One Layer Neural Network

input → output

input to node: $\sum_i w_i a_i = w^T a$

output of node: $\phi(w^T a)$

$\phi$ is the activation function

Page 13: Artificial Neural Network

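Some Common Activation Functions

linear: $\phi(w^T a) = w^T a$

step (one common convention): $\phi(w^T a) = 1$ if $w^T a \ge 0$, otherwise $0$

sigmoid: $\phi(w^T a) = \frac{1}{1 + e^{-w^T a}}$

tanh: $\phi(w^T a) = \tanh(w^T a) = \frac{e^{w^T a} - e^{-w^T a}}{e^{w^T a} + e^{-w^T a}}$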
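The four activation functions can be sketched in R as below. This is a minimal illustration, not code from the slides: the names `linear`, `step_fn`, `sigmoid`, and `node_output`, the threshold at zero for the step function, and the example weights are all assumptions.

```r
# Common activation functions applied to z = w^T a
linear  <- function(z) z
step_fn <- function(z) as.numeric(z >= 0)   # assumed convention: 1 if z >= 0, else 0
sigmoid <- function(z) 1 / (1 + exp(-z))
# tanh() is already built into R

# Output of a single node: activation of the weighted sum of its inputs
node_output <- function(w, a, phi) phi(sum(w * a))

# Hypothetical weights and inputs; a[1] = 1 plays the role of the bias input
w <- c(-1, 2, 2)
a <- c(1, 0.5, 0.5)

node_output(w, a, linear)    # the weighted sum itself
node_output(w, a, step_fn)
node_output(w, a, sigmoid)
node_output(w, a, tanh)
```

Swapping the `phi` argument is the only change needed to compare how each activation transforms the same weighted sum.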

Page 14: Artificial Neural Network

Revisit One Layer Neural Network

a. If the activation function is linear, what will happen?

b. If the activation function is sigmoid, what will happen?

Page 15: Artificial Neural Network

Do we really need many layers?

Page 16: Artificial Neural Network

Look at a classification problem

• A linear classification model is not enough
• Add quadratic or cubic terms as necessary

Suppose we have a classification problem with n = 100 features.
Adding all quadratic terms, the number of variables becomes ~5000.
Adding all cubic and quadratic terms, the number of variables becomes ~170K.

http://sebastianraschka.com
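The feature counts quoted on this slide can be checked with a short calculation. The sketch below assumes that "all quadratic terms" means every product $x_i x_j$ with $i \le j$ and "all cubic terms" means every product $x_i x_j x_k$ with $i \le j \le k$; the variable names are illustrative.

```r
n <- 100

# Number of distinct products of 2 features (repeats allowed): C(n + 1, 2)
quadratic <- choose(n + 1, 2)

# Number of distinct products of 3 features (repeats allowed): C(n + 2, 3)
cubic <- choose(n + 2, 3)

quadratic          # about 5 thousand
cubic              # about 170 thousand
quadratic + cubic  # quadratic and cubic terms together

# The same count for a 100 x 100 pixel image (10000 raw features)
pixel_quadratic <- choose(100 * 100 + 1, 2)
pixel_quadratic    # about 50 million
```

The count grows roughly as $n^2/2$ for quadratic terms and $n^3/6$ for cubic terms, which is why hand-built polynomial features become impractical for image-sized inputs.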

Page 17: Artificial Neural Network
Page 18: Artificial Neural Network

Dog vs Cat

vs

100 x 100 pixels (example): ~10000 variables
Adding all quadratic terms, the number of variables becomes ~50 million

Page 19: Artificial Neural Network

Multilayer Network

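[Figure: multilayer network in which each node applies the sigmoid activation $\phi(w^T a)$ to its weighted inputs, with weights $w$ on the connections between layers.]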

Page 20: Artificial Neural Network

Recall a sigmoid function

Page 21: Artificial Neural Network

AND Function

$a_0 = 1$ is the bias value. The activation function is sigmoid.

a1   a2   output
0    0    0
0    1    0
1    0    0
1    1    1

Suppose we assign the weights $w_0 = -20$, $w_1 = 15$, $w_2 = 15$. The AND logic will then be correct.
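The AND weights from this slide ($w_0 = -20$, $w_1 = 15$, $w_2 = 15$) can be verified numerically. This is a small sketch; the data frame layout and the rounding of the sigmoid output to 0 or 1 are illustrative choices.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# All four input combinations of a1 and a2
inputs <- expand.grid(a1 = c(0, 1), a2 = c(0, 1))

# Weights from the slide; w0 multiplies the bias a0 = 1
w <- c(w0 = -20, w1 = 15, w2 = 15)

# Weighted sum and sigmoid activation for each row
inputs$s <- w[["w0"]] * 1 + w[["w1"]] * inputs$a1 + w[["w2"]] * inputs$a2
inputs$activation <- sigmoid(inputs$s)
inputs$output <- round(inputs$activation)

print(inputs)
```

Only the row with both inputs equal to 1 has a positive weighted sum (10), so only that row produces an activation close to 1.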

Page 22: Artificial Neural Network

OR Function

$a_0 = 1$ is the bias value.

a1   a2   output
0    0    0
0    1    1
1    0    1
1    1    1

Task 1: Find values of $w_0$, $w_1$, and $w_2$ that make the OR logic correct.

What are the values of the weights if the logic is NOT ($a_1$ OR $a_2$)?

Page 23: Artificial Neural Network

XOR Function

a1   a2   output
0    0    0
0    1    1
1    0    1
1    1    0

$a_0 = 1$ is the bias value.

Can you find the weights for the XOR function?

XOR = NOT [ ($a_1$ AND $a_2$) OR NOT ($a_1$ OR $a_2$) ]

a1   a2   AND   NOT (OR)   output
0    0    0     1          0
0    1    0     0          1
1    0    0     0          1
1    1    1     0          0

Page 24: Artificial Neural Network

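Multilayered Network for XOR Representation

[Figure: network with input layer $a_0^{1}, a_1^{1}, a_2^{1}$, hidden layer $a_0^{2}, a_1^{2}, a_2^{2}$, and one output node. The two hidden nodes compute AND and NOT OR. Weights from layer 1 to layer 2: $w_{01}^{12}, w_{02}^{12}, w_{11}^{12}, w_{12}^{12}, w_{21}^{12}, w_{22}^{12}$. Weights from layer 2 to the output: $w_{01}^{23}, w_{11}^{23}, w_{21}^{23}$.]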

Now, a multilayered network is necessary
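A two-layer network of this shape can be sketched in R as follows. The AND weights come from the earlier slide; the NOT OR weights (10, -20, -20), their reuse at the output node, and the names `W12`, `W23`, and `xor_net` are assumptions chosen for illustration, and other weight choices work equally well.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Layer 1 -> layer 2 weights. Rows: bias, a1, a2. Columns: hidden nodes.
W12 <- cbind(AND = c(-20, 15, 15),    # from the AND slide
             NOR = c(10, -20, -20))   # assumed weights for NOT (a1 OR a2)

# Layer 2 -> output weights (bias, AND node, NOR node).
# The output is NOT (AND OR NOR), so it reuses the assumed NOR weights.
W23 <- c(10, -20, -20)

xor_net <- function(a1, a2) {
  layer1 <- c(1, a1, a2)                          # add bias a0 = 1
  hidden <- sigmoid(as.numeric(layer1 %*% W12))   # AND and NOR activations
  layer2 <- c(1, hidden)                          # add bias to hidden layer
  sigmoid(sum(W23 * layer2))
}

inputs <- expand.grid(a1 = c(0, 1), a2 = c(0, 1))
inputs$output <- round(mapply(xor_net, inputs$a1, inputs$a2))
print(inputs)
```

No single node can produce this table, because XOR is not linearly separable; the hidden layer supplies the two intermediate features that make the final decision linear.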

Page 25: Artificial Neural Network

How to assign the weight?

Page 26: Artificial Neural Network

Intro to Optimization

How do we find the minimum value of this function?

Page 27: Artificial Neural Network

Gradient Descent Method

• Suppose the function is $f(x)$; the descent direction is given by the negative of the first derivative, $-f'(x)$
• Parameters to start the algorithm:

α = learning parameter, usually set to a small value such as 0.001
ϵ = convergence parameter, usually set to a very small value such as 1e-6

Page 28: Artificial Neural Network

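Gradient Descent Method

To minimize a function $f(x)$:
• Initialize with $k = 0$ and some values of α and ϵ
• Start from a random point $x_0$
• Calculate the cost function $f(x_k)$
• Update the value of $x$ as $x_{k+1} = x_k - \alpha f'(x_k)$
• Calculate the cost difference $\delta = |f(x_{k+1}) - f(x_k)|$
• If $\delta < \epsilon$, STOP; otherwise increase $k$ by one and repeat from the cost calculation

We can write linear regression learning as an optimization problem:

$\min_{\beta} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2$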

Page 29: Artificial Neural Network

Exercise 1
• Load “auto_data.csv”
• Create a linear regression model with dependent variable (y) = “weight” and independent variable (x) = “mpg”
• What is the value of the intercept?
• What is the value of mpg’s coefficient?
• What is the value of the MSE?
• Plot mpg ~ weight and its model
• Can you write R code to find the optimum values of the intercept and mpg’s coefficient using the gradient descent method?
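One possible shape for the gradient descent part of this exercise is sketched below. Since “auto_data.csv” is not reproduced here, the sketch uses R's built-in `mtcars` data (`wt` as weight, `mpg` as the predictor) as a stand-in. Standardizing `mpg`, the learning rate of 0.1, the tolerance of 1e-10, and all function names are assumptions made so the loop converges quickly; they are not taken from the slides.

```r
# Stand-in data: mtcars replaces auto_data.csv (assumption)
x <- as.numeric(scale(mtcars$mpg))   # standardized predictor
y <- mtcars$wt                       # response: weight
X <- cbind(1, x)                     # design matrix with a bias column

# Cost (MSE) and its gradient with respect to the coefficients
cost <- function(beta) mean((y - X %*% beta)^2)
grad <- function(beta) -2 * as.numeric(t(X) %*% (y - X %*% beta)) / length(y)

gradient_descent <- function(beta, alpha = 0.1, eps = 1e-10, max_iter = 10000) {
  for (k in seq_len(max_iter)) {
    beta_new <- beta - alpha * grad(beta)       # step against the gradient
    delta <- abs(cost(beta) - cost(beta_new))   # cost difference
    beta <- beta_new
    if (delta < eps) break                      # converged
  }
  list(beta = beta, iterations = k, mse = cost(beta))
}

set.seed(1)
fit_gd <- gradient_descent(rnorm(2))   # start from a random point
fit_lm <- lm(y ~ x)                    # reference solution

fit_gd$beta
coef(fit_lm)
```

Without standardizing the predictor, the cost surface is badly conditioned and a much smaller learning rate with many more iterations would be needed, which is a useful thing to try as part of the exercise.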

Page 30: Artificial Neural Network
Page 31: Artificial Neural Network

Forward and Backward Propagation

Page 32: Artificial Neural Network

Forward Propagation

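• $a^{1}$ = input

• $s_j^{l+1} = \sum_i w_{ij}^{l,l+1} a_i^{l}$

• $a_j^{l+1} = \mathrm{sigm}(s_j^{l+1})$ (add bias)

[Figure: four-layer network with activations $a_i^{1}, a_i^{2}, a_i^{3}, a_i^{4}$, node inputs $s_i^{2}, s_i^{3}, s_i^{4}$, and weights $w_{ij}^{12}, w_{ij}^{23}, w_{ij}^{34}$.]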

Get the output using Forward Propagation

How to update weight using gradient descent?
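Forward propagation through a four-layer network like the one pictured can be sketched as below. The layer sizes, the random weights, and the names `sizes`, `weights`, and `forward` are hypothetical; the only structure taken from the slide is the repeated pattern of adding a bias, taking weighted sums, and applying the sigmoid.

```r
sigm <- function(s) 1 / (1 + exp(-s))

# Hypothetical layer sizes: 3 inputs, hidden layers of 4 and 3 nodes, 1 output
sizes <- c(3, 4, 3, 1)

set.seed(42)
# One weight matrix per pair of adjacent layers (w12, w23, w34).
# Rows index the source nodes (first row = bias), columns the target nodes.
weights <- lapply(1:3, function(l) {
  matrix(rnorm((sizes[l] + 1) * sizes[l + 1]), nrow = sizes[l] + 1)
})

forward <- function(input, weights) {
  a <- input                          # activations of layer 1 = input
  for (w in weights) {
    s <- as.numeric(c(1, a) %*% w)    # add bias, then weighted sum per node
    a <- sigm(s)                      # activations of the next layer
  }
  a                                   # activations of the output layer
}

out <- forward(c(0.5, -1.2, 2.0), weights)
out
```

With random weights the output is meaningless; the point is only that one loop over the weight matrices produces the network output for any depth.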

Page 33: Artificial Neural Network

Backward Propagation

Define error

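[Figure: single-path network from input to output: $a^1$ → ($w^{12}$) → $s^2$ → (sigm) → $a^2$ → ($w^{23}$) → $s^3$ → (sigm) → $a^3$ → ($w^{34}$) → $s^4$ → (sigm) → $a^4$]

Given a training example, update the weights.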
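For the single-path chain $a^1 \to s^2 \to a^2 \to s^3 \to a^3 \to s^4 \to a^4$, one backward pass can be sketched as follows. The error definition $E = \frac{1}{2}(a^4 - y)^2$ is an assumption, since the error formula on this slide is not preserved in the transcript; the starting weights, the training example, and the learning rate are also hypothetical.

```r
sigm <- function(s) 1 / (1 + exp(-s))

# Forward pass through the chain; w holds the scalar weights w12, w23, w34
forward <- function(w, a1) {
  a2 <- sigm(w[["w12"]] * a1)
  a3 <- sigm(w[["w23"]] * a2)
  a4 <- sigm(w[["w34"]] * a3)
  list(a1 = a1, a2 = a2, a3 = a3, a4 = a4)
}

# Assumed error: half the squared difference between output and target
error <- function(w, a1, y) 0.5 * (forward(w, a1)$a4 - y)^2

# Backward pass: chain rule from the output back to the first weight
backward <- function(w, a1, y) {
  f  <- forward(w, a1)
  d4 <- (f$a4 - y) * f$a4 * (1 - f$a4)          # dE/ds4
  d3 <- d4 * w[["w34"]] * f$a3 * (1 - f$a3)     # dE/ds3
  d2 <- d3 * w[["w23"]] * f$a2 * (1 - f$a2)     # dE/ds2
  c(w12 = d2 * f$a1, w23 = d3 * f$a2, w34 = d4 * f$a3)
}

w <- c(w12 = 0.5, w23 = -0.3, w34 = 0.8)   # hypothetical starting weights
g <- backward(w, a1 = 1.5, y = 1)          # gradient for one training example

alpha <- 0.1
w_new <- w - alpha * g                     # gradient descent update
```

Each delta reuses the one after it, which is why the errors are computed from the output layer backward.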

Page 34: Artificial Neural Network

Backward Propagation

In the case of one output with many hidden layers, the formulation for one particular node in a hidden layer becomes:

[Equation: update rule for a hidden-layer node]

Page 35: Artificial Neural Network

Neural Network Tips
• Most of the time, one hidden layer is enough
• The number of hidden neurons is usually between the input layer size and the output layer size
• The number of neurons in the hidden layer is usually 2/3 of the input size

HOWEVER, this is not always true. The best way is to keep experimenting.

Page 36: Artificial Neural Network

Exercise

Page 37: Artificial Neural Network

Neural Network for Regression
• Load the MASS library
• Use the “Boston” data
• Predict the median value of the house (medv)

Do the following:

Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data

Model 1:
- Create the linear regression model using the train data set (using lm or glm)
- Predict “medv” for the test data set
- Calculate the RSS of the test set
- Calculate the TSS of the test set
- Calculate the R2 of the test set
- Calculate the MSE of the test set
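The data preparation and Model 1 steps can be sketched as below. The seed, the use of `floor()` for the split size, and the variable names are illustrative choices, so the exact numbers will differ from any other split.

```r
library(MASS)   # provides the Boston data set

set.seed(123)   # arbitrary seed so the split is reproducible
n <- nrow(Boston)
train_idx <- sample(n, size = floor(0.75 * n))
train <- Boston[train_idx, ]
test  <- Boston[-train_idx, ]

# Model 1: linear regression of medv on all other columns
lm_fit <- lm(medv ~ ., data = train)
pred   <- predict(lm_fit, newdata = test)

# Test-set metrics
rss <- sum((test$medv - pred)^2)              # residual sum of squares
tss <- sum((test$medv - mean(test$medv))^2)   # total sum of squares
r2  <- 1 - rss / tss
mse <- rss / nrow(test)

c(RSS = rss, TSS = tss, R2 = r2, MSE = mse)
```

The same four metric lines can be reused for the neural network models by replacing `pred` with their test-set predictions, which keeps the comparison like for like.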

Page 38: Artificial Neural Network

Regression Continues

Model 2:
- Load the “neuralnet” library
- Create the regression model using the neural network algorithm with one hidden layer with 8 nodes. Follow the code:

    library(neuralnet)
    # Build the formula medv ~ <all other columns> from the column names
    n <- names(train)
    f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
    # One hidden layer with 8 nodes; linear.output = TRUE for regression
    nn <- neuralnet(f, data = train, hidden = c(8), linear.output = TRUE)

What happened? Do you see a message like this?
“algorithm did not converge in 1 of 1 repetition(s) within the stepmax”

• Predict “medv” for the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R2 of the test set
• Calculate the MSE of the test set
• Plot the model graph

Compare with the result from the linear model.

Page 39: Artificial Neural Network

Neural Network Plot

Use plot(nn), where nn is the fitted neural network model, to plot the graph

Page 40: Artificial Neural Network

Neural Network Additional Tips

• Preprocess the data using normalization
• Scaling to the interval [0,1] or [-1,1] usually tends to give better results
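Min-max scaling to [0,1] for the Boston data can be sketched as below. Taking the minimum and maximum from the train set only, and the helper names `rescale` and `unscale_medv`, are assumptions rather than something the slides prescribe.

```r
library(MASS)

set.seed(123)   # arbitrary seed so the split is reproducible
idx   <- sample(nrow(Boston), size = floor(0.75 * nrow(Boston)))
train <- Boston[idx, ]
test  <- Boston[-idx, ]

# Column-wise minimum and maximum, taken from the train set only
mins <- sapply(train, min)
maxs <- sapply(train, max)

# Map each column to [0, 1] using the train-set range
rescale <- function(df) {
  as.data.frame(scale(df, center = mins, scale = maxs - mins))
}
train_s <- rescale(train)
test_s  <- rescale(test)

# Convert scaled medv values (for example, predictions) back to original units
unscale_medv <- function(z) {
  z * (maxs[["medv"]] - mins[["medv"]]) + mins[["medv"]]
}

range(as.matrix(train_s))   # the train set now lies within [0, 1]
```

The scaled frames can be passed to the model in place of the raw ones; predictions then need `unscale_medv()` before RSS, TSS, R2, and MSE are computed, so the metrics stay comparable with the linear model.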

Page 41: Artificial Neural Network

Exercise 2

Same as Model 2 in the previous exercise, but normalize the data first.

• Predict “medv” for the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R2 of the test set
• Calculate the MSE of the test set

Compare with the result from the linear model. Can you improve it? What is the lesson learned?

Page 42: Artificial Neural Network

Neural Network for Binomial Classification

Page 43: Artificial Neural Network

Data Exploration and Modeling

Use the “credit_dlqn.csv” data

Explore the data
• How many variables are there?
• What are the types of the variables?
• Are there any other variables that you think are needed to create a credit model?

Do the following:

Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data

Model 1:
• Use logistic regression to predict default within 2 years

Model 2:
• Use a neural network to predict default within 2 years
• Use one hidden layer with the number of nodes equal to 2/3 of the input size (equal to 7)

Page 44: Artificial Neural Network

Continue Experimenting

How long does the model take to finish?

NOTE: A neural network can be very slow to converge. Depending on the business objective, as a Data Scientist you have to be very considerate when choosing an algorithm.

• Try another number of nodes in the hidden layer, such as 2 nodes
• How is the result?
• How is the accuracy compared to logistic regression?

Page 45: Artificial Neural Network

Recall on Confusion Table

• Source: Wikipedia
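A confusion table and the metrics derived from it can be computed in R as sketched below. The `actual` and `predicted` vectors are made-up labels for illustration only, not results from the credit data.

```r
# Hypothetical labels: 1 = default, 0 = no default
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0)

# Confusion table with predictions in rows and actual classes in columns
cm <- table(Predicted = factor(predicted, levels = c(0, 1)),
            Actual    = factor(actual,    levels = c(0, 1)))
print(cm)

tp <- cm["1", "1"]   # true positives
fp <- cm["1", "0"]   # false positives
fn <- cm["0", "1"]   # false negatives
tn <- cm["0", "0"]   # true negatives

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)   # share of predicted defaults that are real
recall    <- tp / (tp + fn)   # share of real defaults that are caught

c(accuracy = accuracy, precision = precision, recall = recall)
```

For a model that outputs probabilities, `predicted` would come from applying a cutoff such as 0.5, and moving that cutoff trades precision against recall.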

Page 46: Artificial Neural Network

Assignment – Due Next Week
• Increase the precision of the neural network model. Use neural networks with different parameters
• In a Word document, describe what improvement you obtained, what your method is, and why it works or does not work
• Submit your code and Word document to [email protected] before 23 May 2016 23:59:59

Page 47: Artificial Neural Network

References

• Machine Learning. Course on Coursera by Andrew Ng, 2013.

• James G., Witten D., Hastie T. and Tibshirani R. An Introduction to Statistical Learning. Springer. 2014.