K-means method for Signal Compression: Vector Quantization.

Transcript of K-means method for Signal Compression: Vector Quantization.

Page 1: K-means method for Signal Compression: Vector Quantization.

K-means method for Signal Compression: Vector Quantization

Page 2: K-means method for Signal Compression: Vector Quantization.

Blocks of signals: a sequence of audio samples, or a block of image pixels. Formally, a vector, e.g. (0.2, 0.3, 0.5, 0.1).

A vector quantizer maps k-dimensional vectors in the vector space R^k into a finite set of vectors

Y = {y_i : i = 1, 2, ..., N}.

Each vector y_i is called a code vector or a codeword, and the set of all the codewords is called a codebook. Associated with each codeword y_i is a nearest-neighbor region called its Voronoi region, defined by

V_i = {x ∈ R^k : ||x − y_i|| ≤ ||x − y_j|| for all j ≠ i}.

The set of Voronoi regions partitions the entire space R^k.

Voronoi Region

Page 3: K-means method for Signal Compression: Vector Quantization.

Codewords in 2-dimensional space.  Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.

Two Dimensional Voronoi Diagram
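The encoder's job is exactly this nearest-codeword lookup: each input vector is mapped to the codeword whose Voronoi region contains it. A minimal sketch of that mapping in Python (the array shapes and the small example codebook are illustrative assumptions, not from the slides):

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each k-dimensional input vector to the index of its nearest codeword.

    vectors:  array of shape (num_vectors, k)
    codebook: array of shape (N, k), the codewords y_1 .. y_N
    Returns, for each input vector, the index of the Voronoi region it falls into.
    """
    # Squared Euclidean distance from every input vector to every codeword
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Tiny illustrative example (made-up numbers)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
vectors  = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]])
print(quantize(vectors, codebook))   # -> [0 1 2]
```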

Page 4: K-means method for Signal Compression: Vector Quantization.

The Schematic of a Vector Quantizer (signal compression)

Page 5: K-means method for Signal Compression: Vector Quantization.

Compression Formula

Amount of compression: the codebook size is K and the input vectors have dimension L. In order to inform the decoder of which code vector is selected, we need to use log2(K) bits. E.g., we need 8 bits to index 256 code vectors.

Rate: each code vector contains the reconstruction values of L source output samples, so the number of bits per vector component is log2(K) / L.

K is called the "level" of the vector quantizer.
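A quick check of these formulas, using the slide's example of 256 code vectors (the block size L = 16, e.g. 4×4 image blocks, is an assumed value for illustration):

```python
import math

K = 256   # codebook size from the slide's example
L = 16    # assumed block size, e.g. 4x4 image blocks

index_bits = math.log2(K)   # bits needed to identify one codeword -> 8.0
rate = index_bits / L       # bits per vector component -> 0.5
print(index_bits, rate)
```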

Page 6: K-means method for Signal Compression: Vector Quantization.

Vector Quantizer Algorithm

1. Determine the number of codewords, N,  or the size of the codebook.

2. Select N codewords at random, and let that be the initial codebook.  The initial codewords can be randomly chosen from the set of input vectors.

3. Using the Euclidean distance measure, cluster the vectors around each codeword.  This is done by taking each input vector and finding the Euclidean distance between it and each codeword.  The input vector belongs to the cluster of the codeword that yields the minimum distance.

Page 7: K-means method for Signal Compression: Vector Quantization.

Vector Quantizer Algorithm (contd.)

4. Compute the new set of codewords.  This is done by obtaining the average of each cluster: add up the vectors in the cluster, component by component, and divide by the number of vectors in the cluster,

y_i = (1/m) Σ_{j=1}^{m} x_ij

where i indexes the vector components (x, y, z, ... directions) and m is the number of vectors in the cluster.

5. Repeat steps 3 and 4 until either the codewords don't change or the change in the codewords is small.
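A compact sketch of steps 1-5 in Python with NumPy; the stopping tolerance and iteration cap below are arbitrary choices for illustration, not part of the algorithm as stated:

```python
import numpy as np

def train_codebook(vectors, N, max_iters=100, tol=1e-6, seed=0):
    """K-means style codebook training for vector quantization.

    vectors: training data, shape (num_vectors, k)
    N:       number of codewords (codebook size)
    """
    rng = np.random.default_rng(seed)
    # Step 2: initial codebook = N training vectors chosen at random
    codebook = vectors[rng.choice(len(vectors), size=N, replace=False)].copy()

    for _ in range(max_iters):
        # Step 3: assign each vector to its nearest codeword (Euclidean distance)
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Step 4: new codewords = average of each cluster
        new_codebook = codebook.copy()
        for i in range(N):
            members = vectors[labels == i]
            if len(members) > 0:   # empty cells are discussed on the next slide
                new_codebook[i] = members.mean(axis=0)

        # Step 5: stop when the codewords barely change
        if np.abs(new_codebook - codebook).max() < tol:
            codebook = new_codebook
            break
        codebook = new_codebook

    return codebook
```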

Page 8: K-means method for Signal Compression: Vector Quantization.

Other Algorithms

Problem: k-means is a greedy algorithm and may fall into a local minimum.

Four methods for selecting the initial vectors:
Random
Splitting (with a perturbation vector)
Animation
Train with different subsets
PNN (pairwise nearest neighbor)

Empty cell problem: no input vector corresponds to an output vector (codeword). Solution: give the empty cell a vector from another cluster, e.g. the most populated cluster.

Page 9: K-means method for Signal Compression: Vector Quantization.

VQ for image compression

Take N×M blocks of the image as vectors, so L = N·M. If there are K vectors in the codebook, we need log2(K) bits to identify a codeword. Rate: log2(K) / L bits per pixel.

The higher the value of K, the better the quality, but the lower the compression ratio.

Overhead to transmit the codebook:

Codebook size K    Overhead (bits/pixel)
16                 0.03125
64                 0.125
256                0.5
1024               2

Train with a set of images.
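The table's overhead numbers are consistent with, for example, a 256×256 image cut into 4×4 blocks (L = 16) with each codeword component stored as 8 bits; those parameters are an assumption used only to illustrate the arithmetic, since the slide does not state them:

```python
import math

width, height = 256, 256   # assumed image size
L = 16                     # assumed block size: 4x4 pixels
bits_per_component = 8     # assumed precision of stored codeword components

for K in (16, 64, 256, 1024):
    rate = math.log2(K) / L                                    # bits/pixel for the indices
    overhead = K * L * bits_per_component / (width * height)   # bits/pixel for the codebook
    print(K, rate, overhead)   # overhead: 0.03125, 0.125, 0.5, 2.0
```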

Page 10: K-means method for Signal Compression: Vector Quantization.

K-Nearest Neighbor Learning

22c:145, University of Iowa

Page 11: K-means method for Signal Compression: Vector Quantization.

Different Learning Methods

Parametric Learning
• The target function is described by a set of parameters (examples are forgotten)
• E.g., structure and weights of a neural network

Instance-based Learning
• Learning = storing all training instances
• Classification = assigning a target function value to a new instance
• Referred to as "lazy" learning

Page 12: K-means method for Signal Compression: Vector Quantization.

Instance-based Learning

It's very similar to a desktop!!

Page 13: K-means method for Signal Compression: Vector Quantization.

General Idea of Instance-based Learning

Learning: store all the data instances.
Performance: when a new query instance is encountered,
• retrieve a similar set of related instances from memory
• use them to classify the new query

Page 14: K-means method for Signal Compression: Vector Quantization.

Pros and Cons of Instance Based Learning

Pros
• Can construct a different approximation to the target function for each distinct query instance to be classified
• Can use more complex, symbolic representations

Cons
• Cost of classification can be high
• Uses all attributes (does not learn which are most important)

Page 15: K-means method for Signal Compression: Vector Quantization.

Instance-based Learning
• K-Nearest Neighbor Algorithm
• Weighted Regression
• Case-based Reasoning

Page 16: K-means method for Signal Compression: Vector Quantization.

k-Nearest Neighbor (kNN) Learning

• Most basic type of instance-based learning
• Assumes all instances are points in n-dimensional space
• A distance measure is needed to determine the "closeness" of instances
• Classify an instance by finding its nearest neighbors and picking the most popular class among the neighbors

Page 17: K-means method for Signal Compression: Vector Quantization.

1-Nearest Neighbor

Page 18: K-means method for Signal Compression: Vector Quantization.

3-Nearest Neighbor

Page 19: K-means method for Signal Compression: Vector Quantization.

Important Decisions
• Distance measure
• Value of k (usually odd)
• Voting mechanism
• Memory indexing

Page 20: K-means method for Signal Compression: Vector Quantization.

Euclidean Distance

Typically used for real-valued attributes.

An instance x (often called a feature vector) is described by its attribute values

⟨a_1(x), a_2(x), ..., a_n(x)⟩.

Distance between two instances x_i and x_j:

d(x_i, x_j) = sqrt( Σ_{r=1}^{n} ( a_r(x_i) − a_r(x_j) )² )

Page 21: K-means method for Signal Compression: Vector Quantization.

Discrete Valued Target Function

Training algorithm: for each training example ⟨x, f(x)⟩, add the example to the list training_examples.

Classification algorithm: given a query instance x_q to be classified, let x_1 ... x_k be the k training examples nearest to x_q. Return

f̂(x_q) ← argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(x_i))

where δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise.
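A direct transcription of this algorithm into Python (brute-force nearest-neighbor search, no indexing), using only the definitions above; the toy data at the bottom is an assumed example:

```python
import math
from collections import Counter

def euclidean(a, b):
    """d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(training_examples, x_q, k):
    """Return the most common class among the k training examples nearest to x_q."""
    neighbors = sorted(training_examples, key=lambda ex: euclidean(ex[0], x_q))[:k]
    votes = Counter(f_x for _, f_x in neighbors)   # sum of delta(v, f(x_i)) for each class v
    return votes.most_common(1)[0][0]

# Assumed toy data: (feature vector, class label)
training_examples = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
                     ((4.0, 4.2), "B"), ((3.9, 3.8), "B")]
print(knn_classify(training_examples, (1.1, 1.0), k=3))   # -> "A"
```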

Page 22: K-means method for Signal Compression: Vector Quantization.

Continuous valued target function

Algorithm computes the mean value of the k nearest training examples rather than the most common value

Replace the final line in the previous algorithm with

f̂(x_q) ← ( Σ_{i=1}^{k} f(x_i) ) / k
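The change amounts to one line in the earlier sketch; a self-contained variant for real-valued targets (toy data again assumed for illustration):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_regress(training_examples, x_q, k):
    """Predict the mean of f(x_i) over the k training examples nearest to x_q."""
    neighbors = sorted(training_examples, key=lambda ex: euclidean(ex[0], x_q))[:k]
    return sum(f_x for _, f_x in neighbors) / k

# Assumed toy data: (feature vector, real-valued target)
data = [((1.0,), 2.0), ((2.0,), 2.4), ((3.0,), 3.1), ((10.0,), 9.5)]
print(knn_regress(data, (2.5,), k=3))   # mean of 2.4, 3.1, 2.0 -> 2.5
```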

Page 23: K-means method for Signal Compression: Vector Quantization.

Training dataset

Customer ID Debt Income Marital Status Risk

Abel High High Married Good

Ben Low High Married Doubtful

Candy Medium Very low Unmarried Poor

Dale Very high Low Married Poor

Ellen High Low Married Poor

Fred High Very low Married Poor

George Low High Unmarried Doubtful

Harry Low Medium Married Doubtful

Igor Very Low Very High Married Good

Jack Very High Medium Married Poor

Page 24: K-means method for Signal Compression: Vector Quantization.

k-NN with k = 3.

Distance: the score for an attribute is 1 for a match and 0 otherwise; the distance (really a similarity, higher = closer) is the sum of the scores over the attributes.

Voting scheme: proportionate voting in case of ties.
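A sketch of this matching scheme applied to the training table from the previous slide. The query used below is the "Zeb" example from the next slide, and the tie handling, letting every customer tied with the k-th nearest neighbor vote, is one reading of "proportionate voting", not necessarily the slides' exact rule:

```python
from collections import Counter

# Training data from the slides: (Debt, Income, Marital Status) -> Risk
customers = {
    "Abel":   (("High", "High", "Married"), "Good"),
    "Ben":    (("Low", "High", "Married"), "Doubtful"),
    "Candy":  (("Medium", "Very low", "Unmarried"), "Poor"),
    "Dale":   (("Very high", "Low", "Married"), "Poor"),
    "Ellen":  (("High", "Low", "Married"), "Poor"),
    "Fred":   (("High", "Very low", "Married"), "Poor"),
    "George": (("Low", "High", "Unmarried"), "Doubtful"),
    "Harry":  (("Low", "Medium", "Married"), "Doubtful"),
    "Igor":   (("Very Low", "Very High", "Married"), "Good"),
    "Jack":   (("Very High", "Medium", "Married"), "Poor"),
}

def similarity(a, b):
    """Score 1 per matching attribute, 0 otherwise; summed over attributes."""
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(query, k=3):
    scores = {name: similarity(attrs, query) for name, (attrs, _) in customers.items()}
    best = sorted(scores, key=scores.get, reverse=True)
    # Everyone tied with the k-th nearest neighbor gets a vote
    cutoff = scores[best[k - 1]]
    voters = [name for name in best if scores[name] >= cutoff]
    votes = Counter(customers[name][1] for name in voters)
    return votes.most_common(1)[0][0]

print(classify(("High", "Medium", "Married")))   # the Zeb query
```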

Page 25: K-means method for Signal Compression: Vector Quantization.

Customer ID Debt Income Marital Status Risk

Abel High High Married Good

Ben Low High Married Doubtful

Candy Medium Very low Unmarried Poor

Dale Very high Low Married Poor

Ellen High Low Married Poor

Fred High Very low Married Poor

George Low High Unmarried Doubtful

Harry Low Medium Married Doubtful

Igor Very Low Very High Married Good

Jack Very High Medium Married Poor

Query: Zeb High Medium Married ?

Page 26: K-means method for Signal Compression: Vector Quantization.

Customer ID Debt Income Marital Status Risk

Abel High High Married Good

Ben Low High Married Doubtful

Candy Medium Very low Unmarried Poor

Dale Very high Low Married Poor

Ellen High Low Married Poor

Fred High Very low Married Poor

George Low High Unmarried Doubtful

Harry Low Medium Married Doubtful

Igor Very Low Very High Married Good

Jack Very High Medium Married Poor

Query: Yong Low High Married ?

Page 27: K-means method for Signal Compression: Vector Quantization.

Customer ID Debt Income Marital Status Risk

Abel High High Married Good

Ben Low High Married Doubtful

Candy Medium Very low Unmarried Poor

Dale Very high Low Married Poor

Ellen High Low Married Poor

Fred High Very low Married Poor

George Low High Unmarried Doubtful

Harry Low Medium Married Doubtful

Igor Very Low Very High Married Good

Jack Very High Medium Married Poor

Query: Vasco High Low Married ?

Page 28: K-means method for Signal Compression: Vector Quantization.

Voronoi Diagram

Decision surface formed by the training examples (two attributes).

Page 29: K-means method for Signal Compression: Vector Quantization.

Examples of one attribute

Page 30: K-means method for Signal Compression: Vector Quantization.

Distance-Weighted Nearest Neighbor Algorithm

• Assign weights to the neighbors based on their distance from the query point
• The weight may be the inverse square of the distance
• All training points may influence a particular instance (Shepard's method)
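A minimal sketch of inverse-square distance weighting for a real-valued target, in the Shepard style where every training point contributes; the exact kernel shown on the next slide may differ, and the toy data is assumed:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def weighted_predict(training_examples, x_q):
    """Every training point votes, weighted by 1 / d^2."""
    num, den = 0.0, 0.0
    for x_i, f_x in training_examples:
        d = euclidean(x_i, x_q)
        if d == 0.0:
            return f_x            # query coincides with a training point
        w = 1.0 / d ** 2          # inverse-square distance weight
        num += w * f_x
        den += w
    return num / den

data = [((0.0,), 1.0), ((1.0,), 2.0), ((4.0,), 10.0)]   # assumed toy data
print(weighted_predict(data, (0.5,)))
```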

Page 31: K-means method for Signal Compression: Vector Quantization.

Kernel function for Distance-Weighted Nearest Neighbor

Page 32: K-means method for Signal Compression: Vector Quantization.

Examples of one attribute

Page 33: K-means method for Signal Compression: Vector Quantization.

Remarks

+ Highly effective inductive inference method for noisy training data and complex target functions
+ The target function for the whole space may be described as a combination of less complex local approximations
+ Learning is very simple
− Classification is time consuming

Page 34: K-means method for Signal Compression: Vector Quantization.

Curse of Dimensionality

- When the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. This sparsity is problematic for any method that requires statistical significance. 

Page 35: K-means method for Signal Compression: Vector Quantization.

Curse of Dimensionality

Suppose there are N data points of dimension n in the space [−1/2, 1/2]^n.

The k-neighborhood of a point is defined to be the smallest hypercube containing its k nearest neighbors.

Let d be the average side length of a k-neighborhood. Then the volume of an average k-neighborhood is d^n.

So d^n / 1^n = k/N, or d = (k/N)^(1/n).

Page 36: K-means method for Signal Compression: Vector Quantization.

d = (k/N)^(1/n)

N           k    n    d
1,000,000   10   2    0.003
1,000,000   10   3    0.02
1,000,000   10   17   0.5
1,000,000   10   200  0.94

When n is big, all the points are outliers.
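The table entries follow directly from the formula; a quick check (values printed with finer rounding than the slide):

```python
N, k = 1_000_000, 10
for n in (2, 3, 17, 200):
    d = (k / N) ** (1.0 / n)   # average side length of a k-neighborhood
    print(n, round(d, 3))      # 0.003, 0.022, 0.508, 0.944
```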

Page 37: K-means method for Signal Compression: Vector Quantization.

Curse of Dimensionality

Page 38: K-means method for Signal Compression: Vector Quantization.

Curse of Dimensionality