CS494/594: Overview of Self-Organizing Maps (web.eecs.utk.edu/~leparker/Courses/CS594-spring06/...)

CS494/594:

Overview of Self-Organizing Maps (Material mostly derived from http://www.ai-junkie.com/ann/som/som1.html)

April 13, 2006

Instructor: Dr. Lynne E. Parker


Introduction: Self-Organizing Maps

• Invented by Prof. Teuvo Kohonen, Academy of Finland, in the 1970s-1980s

• Provides a way of representing multidimensional data in much lower-dimensional spaces (e.g., 1-2 dimensions)
– This is similar to the data compression technique called “vector quantization” (clustering for the purpose of data compression)
– Also: creates a network that stores information so as to maintain the topological relationships within the training set

• Not intended for optimal classification or statistical pattern recognition; SOMs are an abstraction method

Example: Mapping of colors (RGB) into 2 dimensions

• Here is a SOM trained to recognize the 8 different colors shown on the right

• Colors are presented to the network as 3D vectors (one component each for red, green, blue)

• Network has learned to represent them in 2D space

• Note:
– Colors are clustered into distinct regions
– Regions of similar properties are adjacent to each other

SOMs are related to K-Means Clustering

K-Means:
– You choose the number of clusters
– Input examples are processed one at a time, and the closest centroid is updated

SOM:
– You choose the size and shape of the network of clusters; but the SOM won’t force a matching to a particular number of clusters
– Input examples are also processed one at a time and the closest centroid is updated; but the “neighbors” of that centroid are updated as well
– High-dimensional observations are projected to a two-dimensional coordinate system, which provides similarity between clusters

Why are SOMs of interest?

• Hierarchical clustering is fairly fragile, especially with large data sets
– SOMs scale well to large data sets

• K-means clustering finds local features of data, but doesn’t provide an overall organization
– SOMs provide global structure

• Parametric clustering assumes you know the underlying distribution
– SOMs are unsupervised

SOM: Unsupervised Clustering

• Here, an input vector is presented to the Self-Organizing Map WITHOUT the “correct” answer supplied

• That is, it is unsupervised

• Contrast with backpropagation-trained neural networks, which use supervised learning

From here on, even though SOMs will look like neural nets, forget (for now!) what you know about neural nets, in terms of neurons, activation functions, feedforward connections, backpropagation, etc. SOMs are different!!

Network Architecture

• 2D lattice of “nodes”, each of which is fully connected to input layer

• Here is a small SOM: 4x4 nodes connected to input layer

• Each node:
– Has a specific topological position (i.e., an x,y coordinate in the lattice)
– Contains vector of weights of the same dimension as the input vectors

• Input vector V = (v1, v2, v3, …, vd) => weight vector W = (w1, w2, w3, …, wd)

NOTE: Yellow lines between nodes only represent adjacency; these are NOT weighted connections. (Green is Input Layer)
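The node description above can be sketched as a small data structure. This is only an illustrative sketch: the `Node` class, the `make_lattice` helper, and the choice of 3 input dimensions for the 4x4 example are assumptions, not something the slides specify.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    x: int          # column of the node in the lattice
    y: int          # row of the node in the lattice
    weights: list   # one weight per input dimension


def make_lattice(width, height, dim, rng):
    """Build a width x height lattice; every node gets a dim-long weight vector."""
    return [Node(x, y, [rng.random() for _ in range(dim)])
            for y in range(height) for x in range(width)]


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
lattice = make_lattice(4, 4, 3, rng)   # 4x4 nodes, 3 inputs (assumed, e.g. RGB)
print(len(lattice), "nodes, each with", len(lattice[0].weights), "weights")
```

Note that adjacency is implied by the (x, y) coordinates alone; nothing is stored for the lines drawn between nodes, which matches the note that they are not weighted connections.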

Another SOM

• 40x40 SOM

• Each node has 3 weights: one for each element of the input vector (i.e., corresponding to red, green, blue)

• Each node is drawn as a rectangular cell

• Each “cluster” is a feature classifier, so the graphical output is like a feature map of the input space

SOM Topology

• A couple of ways of representing the SOM topology:

Learning Algorithm Overview

1. Each node’s weights are initialized.

2. A vector is chosen randomly from training data and presented to lattice.

3. Every node is examined to calculate which one’s weights are most like the input vector. The winning node is called the “Best Matching Unit (BMU)”.

4. The radius of the neighborhood of the BMU is calculated.
• Starts large, but diminishes with each time step.
• Any node within the radius is “inside the BMU’s neighborhood”

5. Each neighbor node’s (i.e., node from step 4) weights are adjusted to make them more like the input vector. The closer the node is to the BMU, the more its weights get altered.

6. Repeat back to step 2 for N iterations.
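The six steps above can be sketched end to end as follows. This is a minimal sketch, not the course's implementation: the function name `train_som`, the initial radius (half the lattice width), the time constant (N / ln of the initial radius), the default starting learning rate of 0.1, and the eight example colors are all assumptions chosen for illustration. It requires a lattice wider than 2 nodes so that the initial radius exceeds 1.

```python
import math
import random


def train_som(data, width, height, n_iters, l0=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: one weight vector per lattice position (x, y), random in [0, 1).
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(width) for y in range(height)}
    sigma0 = max(width, height) / 2.0      # assumed initial radius
    lam = n_iters / math.log(sigma0)       # assumed time constant
    for t in range(1, n_iters + 1):
        v = rng.choice(data)               # Step 2: random training vector
        # Step 3: BMU = node whose weights are closest to v (squared distance
        # gives the same winner as Euclidean distance).
        bmu = min(weights, key=lambda p: sum((a - b) ** 2
                                             for a, b in zip(v, weights[p])))
        # Step 4: neighborhood radius (and learning rate) shrink over time.
        sigma = sigma0 * math.exp(-t / lam)
        rate = l0 * math.exp(-t / lam)
        # Step 5: pull every node inside the radius toward v; nodes nearer
        # the BMU on the lattice are pulled harder.
        for (x, y), w in weights.items():
            d2 = (x - bmu[0]) ** 2 + (y - bmu[1]) ** 2
            if d2 <= sigma ** 2:
                theta = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += theta * rate * (v[i] - w[i])
    return weights                         # Step 6 is the loop itself


# Eight hypothetical RGB training colors (the slides do not list the actual ones).
colors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
          [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1]]
som = train_som(colors, 6, 6, 500)
print(len(som), "nodes trained")
```

Because each update moves a weight vector part of the way toward an input vector, the weights stay inside the range spanned by the training data.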

Learning Algorithm – More Details

1. Each node’s weights are initialized.
• Usually to small random values between 0 and 1.

2. A vector is chosen randomly from training data and presented to lattice.

Learning Algorithm – More Details (con’t.)

3. Every node is examined to calculate which one’s weights are most like the input vector. The winning node is called the “Best Matching Unit (BMU)”.

• Iterate through all nodes, calculating Euclidean distance between each node’s weight vector and the current input vector.

• Distance calculation:

$$\mathrm{dist} = \sqrt{\sum_{i=1}^{d} (v_i - w_i)^2}$$

• Node with closest weight vector is called the BMU.
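As a small worked example of step 3, the sketch below computes the Euclidean distance from one input vector to four weight vectors and picks the closest. The helper names `euclidean` and `find_bmu` and the toy weight values are assumptions made for illustration.

```python
import math


def euclidean(v, w):
    """Euclidean distance between input vector v and weight vector w."""
    return math.sqrt(sum((vi - wi) ** 2 for vi, wi in zip(v, w)))


def find_bmu(weights, v):
    """weights maps lattice position (x, y) -> weight vector; return the
    position whose weight vector is closest to v."""
    return min(weights, key=lambda pos: euclidean(v, weights[pos]))


# A hypothetical 2x2 lattice with 3D weights.
weights = {
    (0, 0): [0.0, 0.0, 0.0],
    (1, 0): [1.0, 0.0, 0.0],
    (0, 1): [0.0, 1.0, 0.0],
    (1, 1): [0.9, 0.9, 0.9],
}
v = [0.8, 0.1, 0.1]            # a mostly-red input
bmu = find_bmu(weights, v)
print("BMU:", bmu)             # the node at (1, 0), whose weights are pure red
```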


4. The radius of the neighborhood of the BMU is calculated.

• Radius starts large, but diminishes with each time step:

$$\sigma(t) = \sigma_0 \, e^{-t/\lambda}, \qquad t = 1, 2, 3, \ldots$$

where
$\sigma(t)$ = radius at time $t$
$\sigma_0$ = width of lattice at time $t_0$
$\lambda$ = time constant
$t$ = current time step

(Figure: yellow node is BMU; green arrow is radius)


Decreasing radius/neighborhood over time:

(Figure: sequence of lattices with increasing time; yellow node is BMU, green arrow is radius)

• In practice, the BMU will also move, according to the input vector presented to the network

• Over time, neighborhood shrinks to the size of just 1 node – the BMU
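To get a feel for how quickly the neighborhood shrinks, the sketch below tabulates the exponentially decaying radius at a few time steps. The initial radius of 20, the 1000 iterations, and the choice of time constant (picked so the radius reaches exactly 1 node at the last step) are assumptions for illustration only.

```python
import math


def radius(t, sigma0, lam):
    """Neighborhood radius at time step t: sigma0 * exp(-t / lam)."""
    return sigma0 * math.exp(-t / lam)


sigma0 = 20.0                        # hypothetical initial radius
n_iters = 1000                       # hypothetical number of iterations
lam = n_iters / math.log(sigma0)     # assumed: radius hits 1 at t = n_iters

for t in (1, 250, 500, 750, 1000):
    print(t, round(radius(t, sigma0, lam), 2))
```

With this choice of time constant the radius falls steeply at first and flattens out as it approaches a single node.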


5. Each neighbor node’s (i.e., node from step 4) weights are adjusted to make them more like the input vector. The closer the node is to the BMU, the more its weights get altered.

Update equation for all nodes in neighborhood (including the BMU itself):

$$W(t+1) = W(t) + \Theta(t)\, L(t)\, \bigl(V(t) - W(t)\bigr)$$

where $t$ = time step, $L(t)$ = learning rate (which decreases over time), and $\Theta(t)$ = influence of distance (from BMU) on learning.

Decay of learning rate:

$$L(t) = L_0 \, e^{-t/\lambda}, \qquad t = 1, 2, 3, \ldots$$

(typically, the learning rate begins around $L_0 = 0.1$ and ends up near 0)

Distance influence:

$$\Theta(t) = e^{-\mathrm{dist}^2 / (2\sigma^2(t))}, \qquad t = 1, 2, 3, \ldots$$

(where $\mathrm{dist}$ is the node’s distance from the BMU and $\sigma(t)$ is the current radius value)
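A single application of the update rule can be sketched as below, to show that the BMU itself (distance 0) moves the full learning-rate fraction toward the input while a more distant neighbor moves less. The function names and the numeric values are assumptions chosen for illustration.

```python
import math


def learning_rate(t, l0, lam):
    """Exponentially decaying learning rate L(t)."""
    return l0 * math.exp(-t / lam)


def influence(dist, sigma):
    """Gaussian influence of a node at lattice distance dist from the BMU."""
    return math.exp(-(dist ** 2) / (2 * sigma ** 2))


def update(w, v, dist, t, l0, sigma, lam):
    """Return the new weight vector W(t+1) for one node."""
    step = influence(dist, sigma) * learning_rate(t, l0, lam)
    return [wi + step * (vi - wi) for wi, vi in zip(w, v)]


w = [0.0, 0.0, 0.0]      # current weights (hypothetical)
v = [1.0, 1.0, 1.0]      # input vector (hypothetical)
at_bmu = update(w, v, dist=0.0, t=1, l0=0.1, sigma=3.0, lam=100.0)
farther = update(w, v, dist=2.0, t=1, l0=0.1, sigma=3.0, lam=100.0)
print(at_bmu[0] > farther[0])   # the BMU moves more than its neighbor
```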

Applications of SOMs

• Commonly used as visualization aids

• Helpful for seeing relationships within vast amounts of data

• Example: World Poverty Map
– Use SOM to classify statistical data describing various quality-of-life factors:
  • State of health
  • Nutrition
  • Educational services
  • Etc.
– Countries with similar quality-of-life factors end up clustered together

Example: World Poverty Map

• Countries with better quality-of-life are in upper left

• Countries that are most poverty-stricken are in lower right

• Here, a “hexagonal grid” is used; this style of display is commonly called a “unified distance matrix”, or “u-matrix”. Each hexagon is a node in the SOM.

(Poverty map based on 39 indicators from World Bank Statistics, 1992)

Example: World Poverty Map (con’t.)

• Can then transfer to world map plot:

• This visualization approach makes it much easier to understand the data

Another Example: Animal Classification

• Animals ordered by SOM

• Animals described by attributes (e.g., size, living space)
– Size: small=0, medium=1, big=2
– Living space: Land=0, Water=1, Air=2

                Mouse   Lion    Horse   Shark   Dove
Size            small   medium  big     big     small
Living space    Land    Land    Land    Water   Air
Encoding        (0/0)   (1/0)   (2/0)   (2/1)   (0/2)

(this is just a sampling of the data)
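The attribute coding above can be sketched as a lookup that turns each animal's description into a numeric input vector for the SOM. The dictionary and function names are assumptions for illustration; only three of the animals are shown.

```python
# Numeric codes taken from the attribute definitions above.
SIZE = {"small": 0, "medium": 1, "big": 2}
LIVING_SPACE = {"land": 0, "water": 1, "air": 2}


def encode(size, living_space):
    """Turn a (size, living space) description into a 2D input vector."""
    return (SIZE[size], LIVING_SPACE[living_space])


animals = {
    "Mouse": ("small", "land"),
    "Shark": ("big", "water"),
    "Dove": ("small", "air"),
}
vectors = {name: encode(*attrs) for name, attrs in animals.items()}
print(vectors)
```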

Example: Self-Organizing Maps

A grouping according to similarity has emerged:

(Figure: SOM map of the animals, with regions labeled “birds”, “peaceful”, and “hunters”)

Animal names and their attributes:

                   Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is       Small     1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
         Medium    0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
         Big       0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has      2 legs    1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
         4 legs    0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hair      0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hooves    0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
         Mane      0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
         Feathers  1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to Hunt      0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
         Run       0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
         Fly       1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
         Swim      0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

Teuvo Kohonen, Self-Organizing Maps, Springer, 2001

Example: Visualization of Song Collections on a PDA

• SOM visualization and interaction framework

Neumayer, Lidy, Rauber, Content-based organization of digital audio collections, Fifth Workshop Interactive Musiknetwork, 2005.

Case Study: Applying SOMs to Recognize Topographic Patterns in EEG Data

Remember this?

EEG electrodes reading brain waves:
• Rotation task, left brain
• Rotation task, right brain
• Resting task, with eye blink
• Counting task

SOM, EEG Case Study (con’t.)

[From IEEE Transactions on Biomedical Engineering, 42(11): 1062-1068, 1995]

• Objective: Develop method to understand background EEG activity, then use this later to find correlates of learning in disabled children

• Input: extractions from short-time power spectra of EEG channels

• Node in SOM: represents model for clusters of similar input patterns

• “Instantaneous topographic pattern in EEG”: corresponds to location of sample
– Changes in time correspond to trajectory

• SOM learned to distinguish between these classes:
– alpha
– alpha attenuation
– theta of drowsiness
– eye movements
– EMG artifacts
– Electrode artifact


• Collect data on children with minor learning disabilities (while lying down)

• Data feature extraction
– Apply FFT on 1.28s windows of data every 0.64 seconds
– Power spectrum reduced to 7 features by integrating values with weighting functions
– Dimensions reduced to 154

• SOM lattice design: 300 nodes in hexagonal formation
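The windowing in the feature-extraction step (1.28 s windows taken every 0.64 s, so consecutive windows overlap by half) can be sketched as below. The function name `window_spans` and the 3.2 s recording length are assumptions for illustration; the FFT and the weighting functions themselves are not shown.

```python
def window_spans(duration_s, window_s=1.28, hop_s=0.64):
    """Return (start, end) times, in seconds, of every full analysis window
    that fits inside a recording of the given duration."""
    spans = []
    k = 0
    # Small tolerance so a window ending exactly at the end is not lost
    # to floating-point rounding.
    while k * hop_s + window_s <= duration_s + 1e-9:
        start = k * hop_s
        spans.append((round(start, 2), round(start + window_s, 2)))
        k += 1
    return spans


spans = window_spans(3.2)    # hypothetical 3.2 s stretch of EEG
for start, end in spans:
    print(start, end)
```

Each window would then be passed through an FFT to obtain the short-time power spectrum from which the features are computed.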


• After learning, 6 clusters result (3 shown here individually): “Continuous alpha”, “Muscle activity”, “Eye movements”


• Main findings:
– SOM is able to recognize topographic patterns in EEG data
– It can recognize eye movements and muscle activity
– It can recognize “background” alpha activity

• Uses:
– Aid in analysis of brain activity in neuropsychological experiments
– Used in diagnostics for online monitoring and analysis

Strengths and Limitations of SOMs

• Strengths:
– Neighborhood relationships amongst clusters give you information on the “similarity” of different clusters
– Very handy for visualization

• Limitations:
– User must choose parameters (although this is true for any learning algorithm)
– Not guaranteed to converge (although it usually does in practice)
– Resulting cluster may not correspond to a single natural cluster (mostly due to dimensionality reduction)
