
Neural Networks

thanks to: www.cs.vu.nl/~elena/slides

Basics of neural network theory and practice for supervised and unsupervised learning.

Most popular Neural Network models:
• architectures
• learning algorithms
• applications


Neural Networks

• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:

– Knowledge about the learning task is given in the form of examples.

– Inter-neuron connection strengths (weights) are used to store the acquired information (the training examples).

– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.


Learning

• Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output).
– Neural Network models: perceptron, feed-forward, radial basis function, support vector machine.

• Unsupervised Learning
– Finding similar groups of documents in the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone).
– Neural Network models: self-organizing maps, Hopfield networks.

Neurons


Network architectures

• Three different classes of network architectures:

– single-layer feed-forward
– multi-layer feed-forward
– recurrent

In feed-forward architectures the neurons are organized in acyclic layers.

• The architecture of a neural network is linked with the learning algorithm used to train it.

Single Layer Feed-forward

[Figure: an input layer of source nodes projecting directly onto an output layer of neurons]

Multi-layer feed-forward

[Figure: a 3-4-2 network: input layer (3 source nodes), hidden layer (4 neurons), output layer (2 neurons)]

Recurrent network

• Recurrent network with hidden neuron(s): the unit-delay operator z⁻¹ implies a dynamic system.

[Figure: recurrent network with input, hidden, and output units; feedback connections pass through unit-delay (z⁻¹) elements]

The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:

1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm.

2. An adder function (linear combiner) which computes the weighted sum of the inputs:

u = \sum_{j=1}^{m} w_j x_j

3. An activation function \varphi (squashing function) for limiting the amplitude of the output of the neuron:

y = \varphi(u + b)

where b is the bias.
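As a concrete illustration, here is a minimal Python sketch of this neuron model (the function name and the choice of sign as activation are illustrative, not from the original slides):

    import numpy as np

    def neuron(x, w, b):
        """Basic neuron: weighted sum of inputs plus bias, passed through an activation."""
        u = np.dot(w, x)   # adder: u = sum_j w_j * x_j
        v = u + b          # induced local field
        return np.sign(v)  # activation function (here: a hard limiter)

    # Example: two inputs with hand-chosen weights
    y = neuron(x=np.array([1.0, -2.0]), w=np.array([0.5, 0.25]), b=0.1)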


The Neuron

[Figure: neuron model; input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm, combined by the summing function Σ together with the bias b to give the local field v, and passed through the activation function φ(·) to produce the output y]

Bias of a Neuron

• Bias b has the effect of applying an affine transformation to u:

v = u + b

• v is the induced field of the neuron, where

u = \sum_{j=1}^{m} w_j x_j

[Figure: plot of v versus u; the bias b shifts the line v = u vertically]

Bias as extra input

• The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b

[Figure: neuron model as before, with the extra input x0 = +1 and weight w0 alongside inputs x1, …, xm and weights w1, …, wm, summing function Σ, activation φ(·), local field v, output y]
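A short Python check that the two formulations agree (an illustrative sketch):

    import numpy as np

    def local_field(x, w, b):
        # explicit bias
        return np.dot(w, x) + b

    def local_field_augmented(x, w, b):
        # bias folded in as weight w0 = b on a constant extra input x0 = +1
        x_aug = np.concatenate(([1.0], x))
        w_aug = np.concatenate(([b], w))
        return np.dot(w_aug, x_aug)

    x = np.array([2.0, -1.0]); w = np.array([0.5, 0.25]); b = 0.1
    assert np.isclose(local_field(x, w, b), local_field_augmented(x, w, b))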


Dimensions of a Neural Network

• Various types of neurons

• Various network architectures

• Various learning algorithms

• Various applications


Face Recognition

90% accuracy in learning head pose and in recognizing 1 of 20 faces.

Handwritten digit recognition


Learning in NN

• Hebb (1949): learning by modifying connections

• Widrow & Hoff (1960): learning by comparing with a target

Architecture

• We consider the following architecture: a feed-forward NN with one layer.

• It is sufficient to study single layer perceptrons with just one neuron:

Single layer perceptrons

• Generalization to single layer perceptrons with more neurons is easy because:

• The output units are independent of each other
• Each weight only affects one of the outputs

Perceptron: Neuron Model

• Uses a non-linear (McCulloch-Pitts) model of the neuron:

[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v; the output is y = φ(v)]

• φ is the sign function:

φ(v) = +1 if v ≥ 0
       −1 if v < 0
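In code the decision rule is a one-liner (a sketch; the function name is illustrative):

    import numpy as np

    def perceptron_output(x, w, b):
        """McCulloch-Pitts neuron: hard limiter (sign) applied to the local field."""
        v = np.dot(w, x) + b
        return 1 if v >= 0 else -1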


Perceptron: Applications

• The perceptron is used for classification: classify correctly a set of examples into one of two classes C1, C2:

If the output of the perceptron is +1, the input is assigned to class C1.

If the output is −1, the input is assigned to class C2.

Perceptron: Classification

• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

\sum_{i=1}^{m} w_i x_i + b = 0

[Figure: two-dimensional input space (x1, x2); the decision boundary is the line w1 x1 + w2 x2 + b = 0, and the decision region for C1 is w1 x1 + w2 x2 + b ≥ 0]
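A two-dimensional sketch of this rule in Python (the weights are chosen only for illustration):

    import numpy as np

    def classify(x, w, b):
        """Assign x to C1 on the positive side of the hyperplane, otherwise to C2."""
        return "C1" if np.dot(w, x) + b >= 0 else "C2"

    w = np.array([2.0, -1.0]); b = 0.0            # boundary 2*x1 - x2 = 0
    print(classify(np.array([1.0, 1.0]), w, b))   # -> C1
    print(classify(np.array([-1.0, 1.0]), w, b))  # -> C2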


Perceptron: Limitations

• The perceptron can only model linearly separable functions.

• The perceptron can be used to model the following Boolean functions:
• AND
• OR
• COMPLEMENT

• But it cannot model XOR. Why?

Perceptron: Learning Algorithm

• Variables and parameters:

x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T

w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T

b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter

The fixed-increment learning algorithm

• Initialization: set w(1) = 0.

• Activation: activate the perceptron by applying input example (vector x(n) and desired response d(n)).

• Compute actual response of the perceptron:

y(n) = sgn[w^T(n) x(n)]

• Adapt weight vector: if d(n) and y(n) are different, then

w(n + 1) = w(n) + η d(n) x(n)

where d(n) = +1 if x(n) ∈ C1, and d(n) = −1 if x(n) ∈ C2.

• Continuation: increment time step n by 1 and go to the Activation step.
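Putting the steps together, a minimal Python sketch of the fixed-increment rule (assuming inputs are augmented with a leading +1 so that the bias is w[0]; names are illustrative):

    import numpy as np

    def train_perceptron(X, d, eta=1.0, max_epochs=100):
        """Fixed-increment perceptron learning.
        X: one example per row, each already augmented with a leading +1.
        d: desired responses in {+1, -1}."""
        w = np.zeros(X.shape[1])                         # Initialization: w(1) = 0
        for _ in range(max_epochs):
            updated = False
            for x_n, d_n in zip(X, d):
                y_n = 1 if np.dot(w, x_n) >= 0 else -1   # y(n) = sgn(w^T(n) x(n))
                if y_n != d_n:
                    w = w + eta * d_n * x_n              # adapt weight vector
                    updated = True
            if not updated:                              # full epoch with no updates: done
                break
        return w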


Example

Consider the 2D training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)} (elements of class +1)
C2 = {(-1,-1), (-1,1), (0,1)} (elements of class −1)

Use the perceptron learning algorithm to classify these examples.

• w(1) = [1, 0, 0]^T, η = 1

Trick

Consider the augmented training set C′1 ∪ C′2, with first entry fixed to 1 (to deal with the bias as an extra weight):

C′1: (1, 1, 1), (1, 1, −1), (1, 0, −1)
C′2: (1, −1, −1), (1, −1, 1), (1, 0, 1)

Replace x with −x for all x ∈ C′2 and use the following simpler update rule:

w(n+1) = w(n) + x(n)   if w(n)·x(n) ≤ 0
w(n+1) = w(n)          otherwise

Example

• Training set after application of the trick:

(1, 1, 1), (1, 1, −1), (1, 0, −1), (−1, 1, 1), (−1, 1, −1), (−1, 0, −1)

• Application of the perceptron learning algorithm:

Adjusted pattern | Weight applied w(n) | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(1, 1, −1)       | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(1, 0, −1)       | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(−1, 1, 1)       | (1, 0, 0)           | −1        | Yes     | (0, 1, 1)
(−1, 1, −1)      | (0, 1, 1)           |  0        | Yes     | (−1, 2, 0)
(−1, 0, −1)      | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)

End of epoch 1

Example

Adjusted pattern | Weight applied w(n) | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)
(1, 1, −1)       | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)
(1, 0, −1)       | (−1, 2, 0)          | −1        | Yes     | (0, 2, −1)
(−1, 1, 1)       | (0, 2, −1)          |  1        | No      | (0, 2, −1)
(−1, 1, −1)      | (0, 2, −1)          |  3        | No      | (0, 2, −1)
(−1, 0, −1)      | (0, 2, −1)          |  1        | No      | (0, 2, −1)

End of epoch 2

At epoch 3 no updates are performed (check!), so execution of the algorithm stops.
Final weight vector: (0, 2, −1), i.e. the decision hyperplane is 2x1 − x2 = 0.
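The two epochs above can be reproduced with a short script implementing the trick-based rule (a sketch; the negated C′2 patterns are hard-coded):

    import numpy as np

    # Augmented patterns; the C'2 patterns are already replaced by their negatives
    patterns = np.array([( 1, 1,  1), ( 1, 1, -1), ( 1, 0, -1),
                         (-1, 1,  1), (-1, 1, -1), (-1, 0, -1)], dtype=float)

    w = np.array([1.0, 0.0, 0.0])     # w(1) = [1, 0, 0]^T
    while True:
        updated = False
        for x in patterns:
            if np.dot(w, x) <= 0:     # misclassified or on the boundary
                w = w + x             # w(n+1) = w(n) + x(n)
                updated = True
        if not updated:               # an epoch with no updates: stop
            break

    print(w)                          # -> [ 0.  2. -1.], i.e. 2*x1 - x2 = 0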


Example

[Figure: the six training points in the (x1, x2) plane, C1 points marked + and C2 points marked −, separated by the decision boundary 2x1 − x2 = 0]

Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. Then the perceptron convergence algorithm converges after n0 iterations, with n0 ≤ nmax, on the training set C1 ∪ C2.

XOR is not linearly separable.
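A quick empirical check (illustrative; reuses the train_perceptron sketch from above, extended to report convergence): the fixed-increment rule settles on the linearly separable AND data but keeps updating forever on XOR.

    import numpy as np

    def train_perceptron(X, d, eta=1.0, max_epochs=1000):
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            updated = False
            for x_n, d_n in zip(X, d):
                y_n = 1 if np.dot(w, x_n) >= 0 else -1
                if y_n != d_n:
                    w = w + eta * d_n * x_n
                    updated = True
            if not updated:
                return w, True         # converged
        return w, False                # still updating after max_epochs

    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # +1-augmented
    d_and = np.array([-1, -1, -1,  1])   # AND: linearly separable
    d_xor = np.array([-1,  1,  1, -1])   # XOR: not linearly separable

    print(train_perceptron(X, d_and)[1])  # True
    print(train_perceptron(X, d_xor)[1])  # False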


Adaline: Adaptive Linear Element

• Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the squared error, which is a function of the weights:

E(w(n)) = \frac{1}{2} e^2(n)

e(n) = d(n) - \sum_{j=0}^{m} x_j(n) w_j(n)

• We can find the minimum of the error function E by means of the steepest descent method.

Steepest Descent Method

w(n+1) = w(n) - η (gradient of E(w(n)))

• start with an arbitrary point
• find a direction in which E is decreasing most rapidly
• make a small step in that direction

gradient of E(w) = \left[ \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_m} \right]
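A generic sketch of the method in Python (eta is the step size η; the quadratic example is only for illustration):

    import numpy as np

    def steepest_descent(grad_E, w0, eta=0.1, steps=100):
        """Minimize E by repeatedly stepping against its gradient."""
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            w = w - eta * grad_E(w)   # w(n+1) = w(n) - eta * gradient of E at w(n)
        return w

    # Example: E(w) = ||w||^2 / 2 has gradient w, so descent drives w toward 0
    w_min = steepest_descent(lambda w: w, w0=[3.0, -2.0])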


Least-Mean-Square algorithm (Widrow-Hoff algorithm)

• Approximation of the gradient of E: with

E(w(n)) = \frac{1}{2} e^2(n), \qquad e(n) = d(n) - \sum_{j=0}^{m} x_j(n) w_j(n),

we have

\frac{\partial E(w(n))}{\partial w(n)} = e(n) \frac{\partial e(n)}{\partial w(n)} = -e(n) x^T(n)

• The update rule for the weights becomes:

w(n+1) = w(n) + η e(n) x(n)

Summary of the LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)

User-selected parameter η > 0

Initialization: set ŵ(1) = 0

Computation: for n = 1, 2, … compute

e(n) = d(n) - ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
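A minimal Python sketch of this loop (X rows are +1-augmented input vectors as before; eta and the data are illustrative):

    import numpy as np

    def lms(X, d, eta=0.05, epochs=50):
        """Widrow-Hoff / LMS: one weight update per presented example."""
        w = np.zeros(X.shape[1])              # initialization: w(1) = 0
        for _ in range(epochs):
            for x_n, d_n in zip(X, d):
                e_n = d_n - np.dot(w, x_n)    # e(n) = d(n) - w^T(n) x(n)
                w = w + eta * e_n * x_n       # w(n+1) = w(n) + eta * x(n) e(n)
        return w

    # Example: fit the linearly separable set from the perceptron example
    X = np.array([[1, 1, 1], [1, 1, -1], [1, 0, -1],
                  [1, -1, -1], [1, -1, 1], [1, 0, 1]], dtype=float)
    d = np.array([1, 1, 1, -1, -1, -1], dtype=float)
    w = lms(X, d)
    print(np.sign(X @ w))                     # matches d for a small enough eta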


Comparison of LMS and the Perceptron

• Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.

• Model of a neuron:
Perceptron: non-linear; hard-limiter activation function (McCulloch-Pitts model).
LMS: linear.

• Learning process:
Perceptron: a finite number of iterations.
LMS: continuous.