Artificial Neural Network

Post on 07-Jan-2017


Dessy Amirudin

May 2016

Data Science Indonesia Bootcamp

Intro

Brain Plasticity

http://www.nytimes.com/2000/04/25/science/rewired-ferrets-overturn-theories-of-brain-growth.html

auditory cortex

“One learning algorithm” to rule them all

Any sensor input!

Hearing with vibration: http://www.eaglemanlab.net/sensory-substitution

Human echolocation: http://www.sciencemag.org/news/2014/11/how-blind-people-use-batlike-sonar

Any sensor input!

Seeing with sound: https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/

3rd Eye Frog: https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/

Neuron

http://learn.genetics.utah.edu/


Neural Network

input hidden layer output

Neural Network in Brief History

• Algorithm to mimic the brain
• Was widely used in the 80s and early 90s
• Popularity diminished in the late 90s. Why?
• Recent resurgence: state-of-the-art technique for many applications
• Can be used for regression and classification

Recent Applications of NN

• Speech recognition
• Image recognition and search
• Playlist recommendation
• Skype Translate

Other Applications of NN

FINANCIAL
• Stock market prediction
• Credit worthiness
• Credit rating

MEDICAL
• Medical diagnosis
• Electronic noses

SALES & MARKETING
• Churn prediction
• Targeted marketing
• Service usage forecast

& many more

One Layer Neural Network

input output

input node $= \sum_i w_i a_i = w^T a$

output node $= \phi(w^T a)$

$\phi$ is the activation function

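Some Common Activation Functions

linear: $\phi(w^T a) = w^T a$

step: $\phi(w^T a) = 1$ if $w^T a \ge 0$, otherwise $0$ (one common threshold convention)

sigmoid: $\phi(w^T a) = \dfrac{1}{1 + e^{-w^T a}}$

tanh: $\phi(w^T a) = \tanh(w^T a) = \dfrac{e^{w^T a} - e^{-w^T a}}{e^{w^T a} + e^{-w^T a}}$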
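The four activation functions can be written in a few lines of R. This is a minimal sketch: the function names and the threshold of 0 for the step function are assumptions, not taken from the slides.

```r
# Common activation functions, each applied to z = w^T a.
linear  <- function(z) z
step_fn <- function(z) ifelse(z >= 0, 1, 0)   # threshold at 0 is an assumed convention
sigmoid <- function(z) 1 / (1 + exp(-z))
# tanh() is already available in base R

z <- c(-2, 0, 2)
activations <- data.frame(
  z       = z,
  linear  = linear(z),
  step    = step_fn(z),
  sigmoid = sigmoid(z),
  tanh    = tanh(z)
)
print(activations)
```

Printing the table side by side shows how the sigmoid squeezes its input into (0, 1) while tanh squeezes it into (-1, 1).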

Revisit One Layer Neural Network

a. If the activation function is linear, what will happen?

b. If the activation function is sigmoid, what will happen?

Do we really need many layers?

Look at a classification problem

• A linear classification model is not enough
• Add quadratic or cubic terms as necessary

Suppose we have a classification problem with n = 100 features
Adding all quadratic terms, the number of variables becomes ~5000
Adding all cubic and quadratic terms, the number of variables becomes ~170K

http://sebastianraschka.com

Dog vs Cat

100 x 100 pixels (example): ~10,000 variables
Adding all quadratic terms, the number of variables becomes ~50 million
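The variable counts quoted for n = 100 features and for a 100 x 100 image can be checked with a short calculation. The sketch below assumes that "all quadratic terms" means every distinct product of two features (squares included), and similarly for cubic terms; the helper name n_terms is hypothetical.

```r
# Number of distinct monomials of exact degree d in n variables
# (combinations with repetition): choose(n + d - 1, d).
n_terms <- function(n, d) choose(n + d - 1, d)

quad_100   <- n_terms(100, 2)     # quadratic terms for n = 100
cubic_100  <- n_terms(100, 3)     # cubic terms for n = 100
quad_image <- n_terms(10000, 2)   # quadratic terms for 100 x 100 pixels

c(quadratic = quad_100, cubic = cubic_100, image_quadratic = quad_image)
```

The counts are 5,050, 171,700 and 50,005,000, which match the rounded figures of ~5000, ~170K and ~50 million.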

Multilayer Network

[Figure: a node with incoming weights $w$ that applies the sigmoid activation $\phi(w^T a)$]

Recall a sigmoid function

AND Function

$a_1$  $a_2$  output
0      0      0
0      1      0
1      0      0
1      1      1

$a_0 = 1$, this is the bias value
The activation function is sigmoid
Suppose we assign the weights $w_0 = -20$, $w_1 = 15$, $w_2 = 15$
The AND logic will be correct
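The AND weights can be checked numerically. The sketch below evaluates the sigmoid of the weighted sum for all four input combinations; the variable names are illustrative.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# All four input combinations, with the bias input a0 = 1 in the first column.
a <- cbind(a0 = 1, a1 = c(0, 0, 1, 1), a2 = c(0, 1, 0, 1))
w <- c(-20, 15, 15)                         # w0, w1, w2 from the slide

output <- as.vector(sigmoid(a %*% w))       # sigmoid(w^T a) for every row
and_table <- cbind(a[, c("a1", "a2")], output = round(output, 3))
print(and_table)
```

Only the row with both inputs equal to 1 produces an output close to 1; the other three rows stay close to 0.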

OR Function

$a_1$  $a_2$  output
0      0      0
0      1      1
1      0      1
1      1      1

$a_0 = 1$, this is the bias value

Task 1: Find values of $w_0$, $w_1$ and $w_2$ that make the OR logic correct

What are the values of the weights if the logic is NOT ($a_1$ OR $a_2$)?

XOR Function

$a_1$  $a_2$  output
0      0      0
0      1      1
1      0      1
1      1      0

$a_0 = 1$, this is the bias value

Can you find the weights for the XOR function?

$a_1$ XOR $a_2$ = NOT [ ($a_1$ AND $a_2$) OR NOT ($a_1$ OR $a_2$) ]

$a_1$  $a_2$  AND  NOT (OR)  output
0      0      0    1         0
0      1      0    0         1
1      0      0    0         1
1      1      1    0         0

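Multilayered Network for XOR Representation

[Figure: three-layer network. Layer 1: bias $a_0^1$ and inputs $a_1^1$, $a_2^1$. Layer 2: bias $a_0^2$ and hidden nodes $a_1^2$, $a_2^2$, which compute AND and NOT OR. Weights $w_{01}^{12}$, $w_{02}^{12}$, $w_{11}^{12}$, $w_{12}^{12}$, $w_{21}^{12}$, $w_{22}^{12}$ connect layer 1 to layer 2; weights $w_{01}^{23}$, $w_{11}^{23}$, $w_{21}^{23}$ connect layer 2 to the output.]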

Now, a multilayered network is necessary
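The two-layer XOR construction can be sketched in R with hand-picked weights. The AND weights come from the earlier slide; the weights for the NOT OR node and for the output node are assumptions chosen for illustration, and other values work as well.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Layer 1: all four input combinations, with the bias in the first column.
layer1 <- cbind(1, c(0, 0, 1, 1), c(0, 1, 0, 1))

# Layer 1 -> 2 weights: one column per hidden node (rows: bias, a1, a2).
w12 <- cbind(AND = c(-20, 15, 15),     # AND weights from the slide
             NOR = c(10, -20, -20))    # NOT (a1 OR a2): an assumed choice

# Layer 2 -> 3 weights: output = NOT (AND OR NOR), also an assumed choice.
w23 <- c(10, -20, -20)

layer2     <- cbind(1, sigmoid(layer1 %*% w12))   # hidden activations plus bias
xor_output <- as.vector(sigmoid(layer2 %*% w23))

print(round(xor_output))
```

The rounded outputs follow the XOR truth table, which no single-layer choice of $w_0$, $w_1$, $w_2$ can reproduce.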

How to assign the weight?

Intro to Optimization

How do we find the minimum value of this function?

Gradient Descent Method

• Suppose the function is $f(x)$; the descent direction is the negative of the first derivative, $-f'(x)$
• Parameters to start the algorithm:
α = learning parameter, usually set to a small value such as 0.001
ϵ = convergence parameter, usually set to a very small value such as 1e-6

Gradient Descent Method
• Initialize with k = 0 and some value of α and ϵ
• Start from a random $x$ as $x_0$
• Calculate the cost function $f(x_k)$
• Update the value of $x$ as $x_{k+1} = x_k - \alpha f'(x_k)$
• Calculate the cost difference $\delta = |f(x_{k+1}) - f(x_k)|$
- If δ < ϵ, STOP; otherwise set k = k + 1 and repeat
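The gradient descent loop can be sketched for a one-dimensional function. The example function, its derivative, the starting point and the helper name gradient_descent are all hypothetical; a larger α and a smaller ϵ than the typical values above are used so that this small example converges quickly and accurately.

```r
# Gradient descent for a one-dimensional function f with derivative grad.
gradient_descent <- function(f, grad, x0, alpha = 0.001, eps = 1e-6,
                             max_iter = 100000) {
  x    <- x0
  cost <- f(x)
  for (k in seq_len(max_iter)) {
    x_new    <- x - alpha * grad(x)      # step against the derivative
    cost_new <- f(x_new)
    delta    <- abs(cost_new - cost)     # cost difference between iterations
    x    <- x_new
    cost <- cost_new
    if (delta < eps) break               # converged
  }
  list(x = x, cost = cost, iterations = k)
}

# Hypothetical example: f(x) = (x - 3)^2 + 2 has its minimum at x = 3.
f    <- function(x) (x - 3)^2 + 2
grad <- function(x) 2 * (x - 3)

result <- gradient_descent(f, grad, x0 = 0, alpha = 0.1, eps = 1e-12)
print(result)
```

The max_iter cap is a safety net that is not part of the slide's procedure; it stops the loop if α is too small to reach the tolerance.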

We can write linear regression learning as an optimization problem

$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta^T x_i \right)^2$$

Exercise 1
• Load “auto_data.csv”
• Create a linear regression model with dependent variable (y) = “weight” and independent variable (x) = “mpg”
• What is the value of the intercept?
• What is the value of mpg’s coefficient?
• What is the MSE’s value?
• Plot mpg ~ weight and its model

• Can you write R code to find the optimum values of the intercept and mpg’s coefficient using the gradient descent method?
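One possible approach to the last question is sketched below. Since “auto_data.csv” is not available here, the built-in mtcars data set stands in for it (wt as the weight, mpg as the predictor); this substitution, the standardization of mpg and the chosen α and ϵ are all assumptions. The mean squared error is minimized instead of the sum, which gives the same coefficients.

```r
# Stand-in data: mtcars replaces the exercise file "auto_data.csv".
x <- mtcars$mpg
y <- mtcars$wt

# Standardize x so that a single learning rate suits both coefficients.
z <- (x - mean(x)) / sd(x)
X <- cbind(1, z)

mse <- function(beta) mean((y - X %*% beta)^2)

beta  <- c(0, 0)
alpha <- 0.1
eps   <- 1e-12
cost  <- mse(beta)
repeat {
  gradient <- as.vector(-2 * t(X) %*% (y - X %*% beta) / length(y))
  beta     <- beta - alpha * gradient
  cost_new <- mse(beta)
  delta    <- abs(cost_new - cost)
  cost     <- cost_new
  if (delta < eps) break
}

# Convert the coefficients back to the original mpg scale.
slope     <- beta[2] / sd(x)
intercept <- beta[1] - slope * mean(x)

print(c(intercept = intercept, mpg = slope))
print(coef(lm(wt ~ mpg, data = mtcars)))   # reference fit for comparison
```

Without the standardization step, the raw mpg values would force a much smaller α and many more iterations before the intercept converges.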

Forward and Backward Propagation

Forward Propagation

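• $a^1$ = input
• $s_j^2 = \sum_i w_{ij}^{12} a_i^1$, $a_j^2 = \text{sigm}(s_j^2)$ (add bias)
• Repeat for layers 3 and 4 with $w_{ij}^{23}$ and $w_{ij}^{34}$

[Figure: four-layer network with activations $a_i^1$, $a_i^2$, $a_i^3$, $a_i^4$, weighted sums $s_i^2$, $s_i^3$, $s_i^4$, and weights $w_{ij}^{12}$, $w_{ij}^{23}$, $w_{ij}^{34}$ between consecutive layers]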

Get the output using Forward Propagation
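Forward propagation through a network of sigmoid layers can be sketched as a loop over weight matrices. The layout of each matrix (one row per source node with the bias first, one column per target node), the layer sizes and the random weights are assumptions made for this illustration.

```r
sigm <- function(s) 1 / (1 + exp(-s))

# weights is a list of matrices; each one maps a layer (plus bias) to the next,
# with one row per source node (bias first) and one column per target node.
forward <- function(input, weights) {
  a <- input
  for (w in weights) {
    s <- as.vector(t(w) %*% c(1, a))   # add the bias, then take the weighted sum
    a <- sigm(s)                       # activation of the next layer
  }
  a
}

set.seed(1)
weights <- list(
  matrix(rnorm(3 * 3), nrow = 3),   # w12: (2 inputs + bias) -> 3 nodes
  matrix(rnorm(4 * 2), nrow = 4),   # w23: (3 nodes + bias)  -> 2 nodes
  matrix(rnorm(3 * 1), nrow = 3)    # w34: (2 nodes + bias)  -> 1 output
)

output <- forward(c(0.5, -1), weights)
print(output)
```

With a single weight matrix holding the AND weights (-20, 15, 15), the same function reproduces the one-layer AND example.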

How to update weight using gradient descent?

Backward Propagation

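Define error

[Figure: input → $a^1$ → $s^2$ → $a^2$ → $s^3$ → $a^3$ → $s^4$ → $a^4$ → output, with weights $w^{12}$, $w^{23}$, $w^{34}$ between layers and a sigmoid (sigm) turning each $s$ into the next $a$]

Given a training example, update the weights

Backward Propagation
The case of one output with many hidden layers: the formulation for one particular node in a hidden layer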
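Backward propagation for a small sigmoid network is sketched below. The slides' own error definition is not recoverable from the transcript, so the squared error $E = \frac{1}{2}(a^{out} - y)^2$ is assumed here, along with the weight-matrix layout (bias in the first row) and the function names. For a hidden node the error term is the sigmoid derivative times the weighted sum of the error terms of the next layer.

```r
sigm <- function(s) 1 / (1 + exp(-s))

# Forward pass that keeps the activations of every layer (without bias).
forward_all <- function(input, weights) {
  acts <- list(input)
  for (w in weights) {
    prev <- acts[[length(acts)]]
    acts[[length(acts) + 1]] <- sigm(as.vector(t(w) %*% c(1, prev)))
  }
  acts
}

# Gradients of E = 0.5 * sum((a_out - y)^2) for every weight matrix.
backward <- function(input, y, weights) {
  acts  <- forward_all(input, weights)
  n     <- length(weights)
  grads <- vector("list", n)
  a_out <- acts[[n + 1]]
  delta <- (a_out - y) * a_out * (1 - a_out)       # output-layer error term
  for (l in n:1) {
    grads[[l]] <- outer(c(1, acts[[l]]), delta)    # dE/dw between layers l, l+1
    if (l > 1) {
      a <- acts[[l]]
      # hidden error term: sigm'(s) times the weighted sum of next-layer terms
      delta <- as.vector(weights[[l]][-1, , drop = FALSE] %*% delta) * a * (1 - a)
    }
  }
  grads
}

# One gradient descent update for a single training example.
update_weights <- function(input, y, weights, alpha = 0.1) {
  grads <- backward(input, y, weights)
  Map(function(w, g) w - alpha * g, weights, grads)
}

set.seed(42)
weights <- list(matrix(rnorm(9), nrow = 3), matrix(rnorm(4), nrow = 4))
x <- c(0.3, -0.7)
y <- 1
error <- function(ws) {
  acts <- forward_all(x, ws)
  0.5 * sum((acts[[length(acts)]] - y)^2)
}
new_weights <- update_weights(x, y, weights)
print(c(before = error(weights), after = error(new_weights)))
```

A finite-difference comparison of one weight at a time is a simple way to check that the gradients from backward are correct.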

Neural Network Tips
• Most of the time, one hidden layer is enough
• The number of neurons in the hidden layer is between the input layer size and the output layer size
• The number of neurons in the hidden layer is usually 2/3 of the input size

HOWEVER, this is not always TRUE. The best way is to keep experimenting.

Exercise

Neural Network for Regression
Load the MASS library
Use the “Boston” data
Predict the median value of the house (medv)
Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data

Model 1:
- Create the linear regression model using the train data set (using lm or glm)
- Predict “medv” from the test data set
- Calculate the RSS of the test set
- Calculate the TSS of the test set
- Calculate the R2 of the test set
- Calculate the MSE of the test set
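The data preparation and Model 1 steps might look like the sketch below. The seed, the variable names and the use of all other columns as predictors are assumptions, and the numbers will change with a different random split.

```r
library(MASS)   # provides the Boston data set

set.seed(123)   # arbitrary seed so that the split is reproducible
train_idx <- sample(nrow(Boston), size = round(0.75 * nrow(Boston)))
train <- Boston[train_idx, ]
test  <- Boston[-train_idx, ]

# Model 1: linear regression on all other variables
lm_fit  <- lm(medv ~ ., data = train)
lm_pred <- predict(lm_fit, newdata = test)

rss <- sum((test$medv - lm_pred)^2)             # residual sum of squares
tss <- sum((test$medv - mean(test$medv))^2)     # total sum of squares
r2  <- 1 - rss / tss
mse <- rss / nrow(test)

print(c(RSS = rss, TSS = tss, R2 = r2, MSE = mse))
```

Keeping train and test under these names lets the same objects be reused for the neural network model.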

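Regression Continues
Model 2:
- Load the “neuralnet” library: library(neuralnet)
- Create the regression model using the neural network algorithm with one hidden layer with 8 nodes. Follow the code:

    # build the formula medv ~ (all other columns of the train set)
    n <- names(train)
    f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
    # one hidden layer with 8 nodes; linear output for regression
    nn <- neuralnet(f, data = train, hidden = c(8), linear.output = TRUE)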

What happened? Do you see a message like this?
“algorithm did not converge in 1 of 1 repetition(s) within the stepmax”

• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R2 of the test set
• Calculate the MSE of the test set
• Plot the model graph
Compare with the result from the linear model.

Neural Network Plot

Use plot(nn), where nn is the fitted neural network model, to plot the graph

Neural Network Additional Tips

• Preprocess the data using normalization
• Scaling to the interval [0,1] or [-1,1] usually tends to give better results
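Scaling to [0,1] can be sketched with base R's scale() function. The toy data frames and the helper names are hypothetical; reusing the minimum and maximum of the train set for the test set is one reasonable choice, not something the slides prescribe.

```r
# Min-max scaling of every column to [0, 1] using given minima and maxima.
min_max_scale <- function(data, mins, maxs) {
  as.data.frame(scale(data, center = mins, scale = maxs - mins))
}

train <- data.frame(x = c(1, 5, 9), y = c(10, 20, 40))   # hypothetical toy data
test  <- data.frame(x = c(3, 7),    y = c(25, 30))

mins <- apply(train, 2, min)
maxs <- apply(train, 2, max)

train_scaled <- min_max_scale(train, mins, maxs)
test_scaled  <- min_max_scale(test, mins, maxs)   # reuse the train-set ranges

# Map a scaled prediction back to the original units of column "y".
unscale_y <- function(y_scaled) y_scaled * (maxs["y"] - mins["y"]) + mins["y"]

print(train_scaled)
print(test_scaled)
```

When the target is scaled as well, predictions have to be mapped back with something like unscale_y before computing RSS or MSE in the original units.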

Exercise 2
Same as Model 2 of the previous exercise, but with normalized data.

• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R2 of the test set
• Calculate the MSE of the test set

Compare with the result from the linear model. Can you improve it?
What is the lesson learned?

Neural Network for Binomial Classification

Data Exploration and Modeling
Use the “credit_dlqn.csv” data
Explore the data
• How many variables are there?
• What are the types of the variables?
• Are there any other variables that you think are needed to create a credit model?

Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data

Model 1:
• Use logistic regression to predict default within 2 years

Model 2:
• Use a neural network to predict default within 2 years
• Use one hidden layer with the number of nodes equal to 2/3 of the input size (equal to 7)

Continue Experimenting
How long does the model take to finish?

NOTE: A neural network is very slow to converge. Depending on the objective of the business, as a Data Scientist you have to be very considerate when choosing an algorithm.

• Try another number of nodes in the hidden layer, such as 2 nodes
• How is the result?
• How is the accuracy compared to logistic regression?

Recall on Confusion Table

• Source: Wikipedia
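A confusion table and the usual metrics can be computed in base R as sketched below. The label vectors are made up for illustration; in the exercise they would be the actual test-set defaults and the classes predicted by the model.

```r
# Hypothetical labels: 1 = default within 2 years, 0 = no default
actual    <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0)

confusion <- table(Predicted = factor(predicted, levels = c(0, 1)),
                   Actual    = factor(actual,    levels = c(0, 1)))

tp <- confusion["1", "1"]   # true positives
fp <- confusion["1", "0"]   # false positives
fn <- confusion["0", "1"]   # false negatives
tn <- confusion["0", "0"]   # true negatives

accuracy  <- (tp + tn) / sum(confusion)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(confusion)
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

Fixing the factor levels keeps the table 2 x 2 even when a model never predicts one of the classes.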

Assignment – Due Next Week
• Increase the precision of the neural network model. Use neural networks with different parameters
• In a Word document, describe the improvement you obtained, what your method is, and why it works or why it doesn’t work
• Submit your code and Word document to trainer.datascience@gmail.com before 23 May 2016 23:59:59
