Self Organizing Feature Maps - 123seminarsonly.com · 2011-11-30




Self Organizing Feature Maps

SOM is an unsupervised neural network technique that approximates an unlimited number of input data by a finite set of models arranged in a grid, where neighbor nodes correspond to more similar models.

The models are produced by a learning algorithm that automatically orders them on the one or two-dimensional grid along with their mutual similarity.

A Self-Organizing (Kohonen) network fires a group of neurons instead of a single one.

The group somehow produces a “picture” of the cluster. Fundamentally, SOM is competitive learning, but the weight changes are applied over a neighborhood: find the winner neuron, then apply the weight change to the winner and its “neighbors”.


Self-Organization

system - a group of interacting parts functioning as a whole and distinguishable from its surroundings (environment) by recognizable boundaries.

system property - the resultant system no longer solely exhibits the collective properties of the parts themselves (“the whole is more than the sum of its parts”)

organization - the arrangement of selected parts so as to promote a specific function

external organization - system organization imposed by external factors

self organization - evolution of a system into an organized form in the absence of external constraints.

Page 3: Self Organizing Feature Maps - 123seminarsonly.com · 2011-11-30 · Self Organizing Feature Maps SOM is an unsupervised neural network technique that approximates an unlimited number

Can things self-organize ?

Yes, any system that takes a form that is not imposed from outside (by walls, machines or forces) can be said to self-organize

e.g. crystallization, fish schooling, brain, organism structures, economies, planets, galaxies, universe

Properties:

- absence of centralized control (competition)
- multiple equilibria (possible attractors)
- global order (emergence from local interactions)
- redundancy (insensitive to damage)
- self-maintenance (repair)
- complexity (multiple parameters)
- hierarchies (multiple self-organized levels)

Page 4: Self Organizing Feature Maps - 123seminarsonly.com · 2011-11-30 · Self Organizing Feature Maps SOM is an unsupervised neural network technique that approximates an unlimited number

The brain is a self-organizing system that can learn by itself by changing (adding, removing, strengthening) the interconnections between neurons.

Page 5: Self Organizing Feature Maps - 123seminarsonly.com · 2011-11-30 · Self Organizing Feature Maps SOM is an unsupervised neural network technique that approximates an unlimited number

Feature maps

What is the result of brain’s self-organization?

formation of feature maps in the brain that have a linear or planar topology (that is, they extend in one or two dimensions)

Examples:

- tonotopic map: sound frequencies are spatially mapped into regions of the cortex in an orderly progression from low to high frequencies.
- retinotopic map: the visual field is mapped in the visual cortex (occipital lobe), with higher resolution for the centre of the visual field.
- somatosensory map: mapping of touch.


Why are feature maps important?

Sensory experience is multidimensional. E.g. sound is characterised by pitch, intensity, timbre, noise, etc.

The brain maps the external multidimensional representation of the world (including its spatial relations) into a similar 1- or 2-dimensional internal representation.

That is, the brain processes the external signals in a topology-preserving way

So, if we are to have a hope of mimicking the way the brain learns, our system should be able to do the same thing.


Data: vectors X^T = (X_1, ..., X_d) from a d-dimensional space.

Grid (lattice) of nodes, with local processor (called neuron) in each node.

Local processor # j has d adaptive parameters W(j).

Goal: change W(j) parameters to recover data clusters in X space.
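The setup above (a lattice of nodes, each carrying d adaptive parameters W(j)) can be sketched as follows. This is a minimal illustration assuming NumPy; the grid size and all variable names are my own choices, not from the slides.

```python
import numpy as np

# Illustrative sketch: a 10x10 SOM lattice for d-dimensional data,
# one adaptive weight vector W(j) per grid node.
rng = np.random.default_rng(0)

grid_rows, grid_cols, d = 10, 10, 3          # lattice size and input dimension
# Weights initialized randomly in the input space; shape (rows, cols, d)
W = rng.random((grid_rows, grid_cols, d))

# Lattice positions of the nodes, used later for neighborhood distances
positions = np.stack(np.meshgrid(np.arange(grid_rows),
                                 np.arange(grid_cols),
                                 indexing="ij"), axis=-1)
print(W.shape, positions.shape)              # (10, 10, 3) (10, 10, 2)
```

Training then amounts to changing W so that the weight vectors settle onto the data clusters in X space.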



[Figure: a 2-D grid of processors, with a weight vector W(i) assigned to each input neuron, mapped into a 3-D feature space (axes x, y, z); o = data points, ' = network W(i) parameters. 2-D network, 3-D data.]


[Figure: an N-dimensional data space with a 2-D grid of neurons; x = data points, o = positions of the neuron weights; the weights of the 2-D grid point to locations in the N-D data space.]

Training process


Network Architecture


Input layer

- accepts a multidimensional input pattern from the environment; an input pattern is represented by a vector, e.g. a sound may consist of pitch, timbre, background noise, intensity, etc.

- each neuron in the input layer represents one dimension of the input pattern

- an input neuron distributes its assigned element of the input vector to the competitive layer.


Competitive layer

- each neuron in the competitive layer receives a sum of weighted inputs from the input layer

- every neuron in the competitive layer is associated with a collection of other neurons which make up its 'neighbourhood'

- the competitive layer can be organized in 1, 2, or ... n dimensions; typical implementations use 1 or 2 dimensions

- upon receipt of a given input, some of the neurons will be sufficiently excited to fire; this event can have either an inhibitory or an excitatory effect on the neighbourhood

- the model has been copied from biological systems, and is known as the 'on-center, off-surround' architecture, also known as lateral feedback / inhibition.


Lateral feedback / inhibition
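The 'on-center, off-surround' lateral interaction is often modeled with a "Mexican hat" profile, here sketched as a difference of Gaussians: excitation close to the firing neuron, inhibition further away. The specific widths and the 0.5 scaling are illustrative assumptions of mine, not values from the slides.

```python
import numpy as np

# "Mexican hat" lateral feedback: excitatory near the center (small lattice
# distance), inhibitory in the surround (larger distance).
def mexican_hat(distance, sigma_excite=1.0, sigma_inhibit=3.0):
    excite = np.exp(-np.asarray(distance)**2 / (2 * sigma_excite**2))
    inhibit = 0.5 * np.exp(-np.asarray(distance)**2 / (2 * sigma_inhibit**2))
    return excite - inhibit

d = np.arange(0, 10)
print(np.round(mexican_hat(d), 3))   # positive at d=0, negative in the surround
```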


Output layer

organization of the output layer is application-dependent

strictly speaking, not necessary for proper functioning of a Kohonen network

the "output" of the network is the way we choose to view the interconnections between nodes in the competitive layer

if nodes are arranged along a single dimension, the output can be seen as a continuum.


Self-Organizing Feature Maps (SOFMs)

The Kohonen model provides a topological mapping: it places a fixed number of input patterns from the input layer onto a lower-dimensional output (Kohonen) layer.

Training in the Kohonen network begins with the winner’s neighborhood of a fairly large size. Then, as training proceeds, the neighborhood size gradually decreases.

Kohonen SOMs result from the synergy of three basic processes

•Competition,

•Cooperation,

•Adaptation


COMPETITION OF KSOFM

Each neuron in an SOM is assigned a weight vector with the same dimensionality N as the input space.

Any given input pattern is compared to the weight vector of each neuron and the closest neuron is declared the winner.

The Euclidean norm is commonly used to measure distance.
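The competition step can be sketched in a few lines. Assuming NumPy and the weight-array layout used here (one weight vector per lattice node); the function name is my own.

```python
import numpy as np

# Competition: the winner (best-matching unit) is the neuron whose weight
# vector is closest, in Euclidean norm, to the input pattern x.
def find_winner(W, x):
    """W: (rows, cols, d) weight array; x: (d,) input. Returns (row, col)."""
    dists = np.linalg.norm(W - x, axis=-1)       # distance of every node to x
    return np.unravel_index(np.argmin(dists), dists.shape)

W = np.zeros((2, 2, 2))
W[1, 0] = [0.9, 0.9]
print(find_winner(W, np.array([1.0, 1.0])))      # → (1, 0)
```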


CO-OPERATION OF KSOFM

The activation of the winning neuron is spread to neurons in its immediate neighborhood.

This allows topologically close neurons to become sensitive to similar patterns.

The winner’s neighborhood is determined by the lattice topology: distance in the lattice is a function of the number of lateral connections to the winner.

The size of the neighborhood is initially large, but shrinks over time.

An initially large neighborhood promotes a topology-preserving mapping.

Smaller neighborhoods allow neurons to specialize in the latter stages of training.


How to define a topological neighborhood that is neurobiologically correct?

Let h_{j,i} denote the topological neighborhood centered on winning neuron i, encompassing a set of excited neurons, a typical one of which is denoted j.

The topological neighborhood is symmetric about the maximum point, and its amplitude decreases monotonically with increasing lateral distance.

Neighborhood function of neurons in the grid:

    h_{j,i} = exp( - d_{j,i}^2 / (2 σ^2) )
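The Gaussian neighborhood function h_{j,i} = exp(-d_{j,i}^2 / 2σ^2) can be evaluated over the whole lattice at once. A minimal sketch assuming NumPy; the helper name and grid size are mine.

```python
import numpy as np

# Gaussian neighborhood: 1.0 at the winner, decaying with squared lattice
# distance d_ji^2, with width controlled by sigma.
def neighborhood(positions, winner, sigma):
    d2 = np.sum((positions - np.array(winner))**2, axis=-1)  # squared lattice distance
    return np.exp(-d2 / (2 * sigma**2))

positions = np.stack(np.meshgrid(np.arange(5), np.arange(5),
                                 indexing="ij"), axis=-1)
h = neighborhood(positions, winner=(2, 2), sigma=1.0)
print(h[2, 2], round(h[2, 3], 3))   # 1.0 at the winner, ~0.607 one step away
```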


CO-OPERATION OF KSOFM

Linear array of cluster unit

Neighborhoods for rectangular Grid Neighborhoods for Hexagonal Grid


ADAPTATION OF KSOFM

During training, the winner neuron and its topological neighbors are adapted to make their weight vectors more similar to the input pattern that caused the activation.

Neurons that are closer to the winner will adapt more heavily than neurons that are further away.

The magnitude of the adaptation is controlled with a learning rate, which decays over time to ensure convergence of the SOM.


Network in Operation

Competition in a SOFM (The Emergence of Order from Chaos)

- each neuron in the competitive layer receives a (complex) mixture of excitatory and inhibitory signals from the neurons in its neighbourhood, and from the input layer.

- lateral inhibition is used to stabilize the network and prevent "meltdown" due to over-excitation in the competitive layer, or starvation due to poor selection of the threshold value.

- for a given input, the neuron which responds most strongly is permitted to adjust the weights of the neurons which make up its neighbourhood, including itself.

- this is a "winner takes all" strategy to the learning process.

neurones in this layer are competing to 'learn' the input vectors.


Network Initialization and Training Techniques

Initialization: originally it was thought acceptable to randomize the input weights and neighbourhood associations. This may give rise to dead vectors: neurons which will never fire. If we know something about the distribution of inputs, it may be useful to initialize the neurons to mirror this distribution.

ideally, we want each neuron to be able to 'win', or be in the neighbourhood of a winning neuron, for at least one input from the training set


SOM: learning

Upon repeated presentations of the training examples, the weight vectors of the neurons are adapted and tend to follow the distribution of the examples.

This results in a topological ordering of the neurons, where neurons adjacent to each other tend to have similar weight vectors.

The input space of patterns is mapped onto a discrete output space of neurons.


Neighborhood function of neurons in the grid. Gaussian neighborhood function:

    h_{j,i}(n) = exp( - d_{j,i}^2 / (2 σ(n)^2) )

d_{j,i}: lateral distance of neurons i and j

- in a 1-dimensional lattice: | j - i |
- in a 2-dimensional lattice: || r_j - r_i ||, where r_j is the position of neuron j in the lattice.

σ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process.

In the learning algorithm σ is updated at each iteration during the ordering phase using the following exponential decay update rule, with parameters σ_0 and T_1:

    σ(n) = σ_0 exp( - n / T_1 )


UPDATE RULE

    w_j(n+1) = w_j(n) + α(n) h_{j,i(x)}(n) ( x - w_j(n) )

Exponential decay update of the learning rate:

    α(n) = α_0 exp( - n / T_2 )
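One adaptation step with the update rule w_j(n+1) = w_j(n) + α(n) h_{j,i(x)}(n) (x - w_j(n)), together with the exponential decay of the learning rate, can be sketched as follows. Parameter values and function names are illustrative.

```python
import numpy as np

# Learning-rate schedule: alpha(n) = alpha_0 * exp(-n / T2)
def alpha(n, alpha0=0.1, T2=1000.0):
    return alpha0 * np.exp(-n / T2)

# Adaptation: move every weight vector toward x, scaled by the learning
# rate and by its neighborhood value h (1 at the winner, smaller elsewhere).
def update(W, x, h, a):
    """W: (rows, cols, d) weights; h: (rows, cols) neighborhood; a: learning rate."""
    return W + a * h[..., None] * (x - W)

W = np.zeros((3, 3, 2))
h = np.ones((3, 3))                       # winner's neighborhood covers everything
W = update(W, np.array([1.0, 1.0]), h, alpha(0))
print(W[0, 0])                            # each weight moved 10% of the way: [0.1 0.1]
```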


Two-phase learning approach

Self-organizing or ordering phase. The learning rate and spread of the Gaussian neighborhood function are adapted during the execution of SOM, using for instance the exponential decay update rule.

Convergence phase. The learning rate and Gaussian spread have small fixed values during the execution of SOM.

Self-organizing or ordering phase: topological ordering of the weight vectors. May take 1000 or more iterations of the SOM algorithm. The choice of parameter values is important, for instance:

- α(n): α_0 = 0.1, T_2 = 1000 ⇒ decrease gradually, α(n) ≥ 0.01
- h_{j,i(x)}(n): σ_0 big enough, T_1 = 1000 / log σ_0

With this parameter setting, initially the neighborhood of the winning neuron includes almost all neurons in the network, then it shrinks slowly with time.
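The ordering phase above can be put together into a small end-to-end sketch, using the quoted parameter schedule (α_0 = 0.1, T_2 = 1000, σ_0 covering the lattice, T_1 = 1000 / log σ_0). The grid size, data, and variable names are my own; this is a minimal illustration, not a tuned SOM.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, d = 8, 8, 2
W = rng.random((rows, cols, d))                   # random initial weights
positions = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                 indexing="ij"), axis=-1)

alpha0, T2 = 0.1, 1000.0                          # learning-rate schedule
sigma0 = max(rows, cols) / 2.0                    # neighborhood starts wide
T1 = 1000.0 / np.log(sigma0)

data = rng.random((500, d))                       # training examples in [0, 1]^2
for n in range(1000):                             # ordering phase: ~1000 iterations
    x = data[rng.integers(len(data))]
    dists = np.linalg.norm(W - x, axis=-1)        # competition: find the winner
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    sigma = sigma0 * np.exp(-n / T1)              # shrinking neighborhood
    a = alpha0 * np.exp(-n / T2)                  # decaying learning rate
    d2 = np.sum((positions - np.array(winner))**2, axis=-1)
    h = np.exp(-d2 / (2 * sigma**2))              # cooperation
    W += a * h[..., None] * (x - W)               # adaptation

# Each update is a convex combination, so the weights stay inside the data range.
```

A convergence phase would follow, with α(n) held near 0.01 and the neighborhood reduced to the nearest neighbors.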


Training: Convergence Phase

Convergence phase: fine-tune the weight vectors. Must run for at least 500 times the number of neurons in the network ⇒ thousands or tens of thousands of iterations.

Choice of parameter values:

- α(n) maintained on the order of 0.01.
- Neighborhood function such that the neighborhood of the winning neuron contains only its nearest neighbors; it eventually reduces to one or zero neighboring neurons.


Selection of Network Size

Q: "How many neurones do I need in the competitive Layer?”

A: "How much error can you live with?”

generally speaking, more neurons means less error.

higher numbers of neurones will support finer-grained classification of the inputs.

the size of the competitive layer is governed by the number of parameters represented in the input vector

caveat: the more neurones present, the longer training will take...


The Training Set

most important element of the network planning process

we want the training set to mirror the probability distribution of inputs in the environment

within the training set, we should randomize the presentation of input classes

we may need to adjust the elasticity of the learning rate to accommodate large sets

Note: No guarantee of convergence for neural networks with more than one dimension in the competitive layer


Desired Properties of Maps

• Approximation of input space: generally a many-one mapping of input space into weight space

• Topology-preserving: points close together in input space should map to points close together in weight space.

• Density-preserving: regions of similar density should map to regions of proportional density.

• Feature selection: given data from an input space with a nonlinear distribution, the self-organizing map is able to select a set of best features for approximating the underlying distribution.


7) Pros & Cons of SOFMs

Pros:

- excellent for classification problems

- can greatly reduce computational complexity

- high sensitivity to frequent inputs

- new ways of associating related data

- no need of supervised learning rules


Cons:

- system is a black box

- error rate may be unacceptable

- no guarantee of network convergence for higher-dimension networks

- many problems can't be effectively represented by a SOFM

- a large training set may be required

- for large classification problems, training can be lengthy

- can be difficult to come up with the input vector.

- associations developed by SOFM not always easily understood by people.


8) Applications

• Discovering similarities in data (clustering)
• Data mining
• KDD (Knowledge Discovery in Databases)

Natural language processing: linguistic analysis, parsing, learning languages, hyphenation patterns.

Optimization: configuration of telephone connections, VLSI design, time series prediction, scheduling algorithms.

Signal processing: adaptive filters, real-time signal analysis, radar, sonar, seismic, ultrasound (USG), ECG, EEG and other medical signals ...

Image recognition and processing: segmentation, object recognition, texture recognition ...

Content-based retrieval: examples of WebSOM, Cartia, Visier , PicSom – similarity based image retrieval.

Brain research: modeling of formation of various topographical maps in motor, auditory, visual and somatotopic areas.

AI and robotics: analysis of data from sensors, control of robot’s movement (motor maps), spatial orientation maps.

Information retrieval and text categorization.

Bioinformatics: clustering of genes, protein properties, chemical compounds.

Business: economic data, business and financial data ...

Data compression (images and audio), information filtering.

Medical and technical diagnostics.