PCA Networks
Unsupervised Learning Networks
PCA is a representation network useful for signal, image, and video processing.
To analyze multi-dimensional input vectors, the representation that retains maximum information is principal component analysis (PCA).
PCA
• Per component: extract the most significant features.
• Inter-component: avoid duplication or redundancy between the neurons.
PCA Networks
An estimate R̂x of the autocorrelation matrix is obtained by taking the time average over the sample vectors:
R̂x = (1/M) Σt x(t) x(t)ᵀ
Given the eigendecomposition
Rx = U Λ Uᵀ
the optimal matrix W is formed by the first m eigenvectors of Rx, yielding the estimate
x̂(t) = W a(t),  where a(t) = Wᵀ x(t)
The errors of the optimal estimate are [Jain89]:
• matrix-2-norm error = λm+1
• least-mean-square error = Σ_{i=m+1}^{n} λi
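To make the error formulas concrete, here is a minimal numpy sketch (synthetic zero-mean data; all variable names are hypothetical) that estimates R̂x, truncates to the first m eigenvectors, and checks both error expressions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, m = 8, 5000, 3

# Synthetic zero-mean sample vectors x(t) as rows of X.
A = rng.standard_normal((n, n))
X = rng.standard_normal((M, n)) @ A.T

# Time-average estimate: R = (1/M) sum_t x(t) x(t)^T
R = X.T @ X / M

# Eigendecomposition R = U Lambda U^T, sorted in decreasing eigenvalue order.
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]

W = U[:, :m]                               # first m eigenvectors
E = R - W @ np.diag(lam[:m]) @ W.T         # residual of the rank-m approximation

print(np.linalg.norm(E, 2), lam[m])        # matrix-2-norm error = lambda_{m+1}
print(np.trace(E), lam[m:].sum())          # LMS error = sum_{i=m+1}^n lambda_i
```

Both printed pairs agree up to floating-point error.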
First PC
To enhance the correlation between the input x(t) and the extracted component a(t), it is natural to use a Hebbian-type rule:
a(t) = w(t)ᵀ x(t)
w(t+1) = w(t) + β x(t) a(t)
Oja Learning Rule
The pure Hebbian rule lets ||w|| grow without bound; the Oja learning rule is equivalent to a normalized Hebbian rule (derivation left as an exercise):
Δw(t) = β [x(t) a(t) − w(t) a(t)²]
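A minimal sketch (numpy; β, sample count, and seed are arbitrary choices) showing the Oja rule converging to the principal eigenvector while keeping ||w|| ≈ 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, beta = 5, 20000, 0.001
A = rng.standard_normal((n, n))
X = rng.standard_normal((M, n)) @ A.T        # zero-mean samples x(t)

w = rng.standard_normal(n)
w /= np.linalg.norm(w)
for x in X:
    a = w @ x                                # a(t) = w(t)^T x(t)
    w += beta * (x * a - w * a**2)           # Hebbian term minus Oja decay term

lam, U = np.linalg.eigh(X.T @ X / M)         # reference eigenvectors of Rx
print(abs(w @ U[:, -1]), np.linalg.norm(w))  # ~1 and ~1 (sign is arbitrary)
```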
Convergence Theorem: Single Component
By the Oja learning rule, w(t) converges asymptotically (with probability 1) to
w = w(∞) = e1
where e1 is the principal eigenvector of Rx.
Proof:
Δw(t) = β [x(t) a(t) − w(t) a(t)²] = β [x(t) x(t)ᵀ w(t) − a(t)² w(t)]
Take the average over a block of data and re-denote ť as the block time index, writing σ(ť) for the block average of a(t)²:
Δw(ť) = β [Rx − σ(ť)I] w(ť)
Δw(ť) = β [UΛUᵀ − σ(ť)I] w(ť)
Δw(ť) = β U[Λ − σ(ť)I] Uᵀ w(ť)
ΔUᵀw(ť) = β [Λ − σ(ť)I] Uᵀw(ť)
Defining Θ(ť) = Uᵀw(ť) = [θ1(ť) θ2(ť) … θn(ť)]ᵀ,
ΔΘ(ť) = β [Λ − σ(ť)I] Θ(ť)

Convergence Rates
Each of the eigen-components is enhanced/dampened by
θi(ť+1) = [1 + β′λi − β′σ(ť)] θi(ť)
so the relative dominance of the principal component grows, with growth ratio
(1 + β′[λi − σ(ť)]) / (1 + β′[λ1 − σ(ť)])
for component i relative to component 1; the ratio is below 1 for all i > 1.
Simulation: Decay Rates of PCs
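The slide above refers to a simulation figure; a minimal sketch under an arbitrarily chosen eigenvalue spectrum, printing the decaying ratios |θi(ť)/θ1(ť)| under the Oja rule, could look like:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
lam_true = np.array([4.0, 2.0, 1.0, 0.5])    # chosen spectrum, lambda_1 dominant
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = Q @ np.diag(lam_true) @ Q.T              # Rx with known eigenvectors (columns of Q)
L = np.linalg.cholesky(C)

beta = 0.01
w = rng.standard_normal(n)
w /= np.linalg.norm(w)
for t in range(5001):
    x = L @ rng.standard_normal(n)           # sample with E[x x^T] = C
    a = w @ x
    w += beta * (x * a - w * a**2)           # Oja rule
    if t % 1000 == 0:
        theta = Q.T @ w                      # Theta(t) = U^T w(t)
        print(t, np.abs(theta[1:] / theta[0]))  # ratios shrink toward 0
```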
Multiple Principal Components
How can multiple components be extracted? Let W denote an n×m weight matrix and a(t) = W(t)ᵀ x(t); then
ΔW(t) = β [x(t) − W(t) a(t)] a(t)ᵀ
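A sketch of this update (often called the symmetric Oja subspace rule; note it recovers the principal subspace rather than the individual eigenvectors), assuming a(t) = W(t)ᵀx(t):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, beta = 6, 2, 0.002
A = rng.standard_normal((n, n))

W = 0.1 * rng.standard_normal((n, m))
for _ in range(30000):
    x = A @ rng.standard_normal(n)           # zero-mean sample, E[x x^T] = A A^T
    a = W.T @ x                              # a(t) = W^T x(t)
    W += beta * np.outer(x - W @ a, a)       # dW = beta [x - W a] a^T

# Check: columns of W should span the top-m eigenspace of Rx = A A^T.
lam, U = np.linalg.eigh(A @ A.T)
P_true = U[:, -m:] @ U[:, -m:].T             # projector onto the true subspace
Qw, _ = np.linalg.qr(W)
print(np.linalg.norm(Qw @ Qw.T - P_true))    # small when the subspaces agree
```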
Deflation Method
A concern is duplication/redundancy between components. Assume that the first component is already obtained; then the output value can be "deflated" by the following transformation:
x̃(t) = (I − w1 w1ᵀ) x(t)
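A sketch of the deflation procedure (numpy; the oja helper is hypothetical): extract w1 with the Oja rule, deflate the data, and rerun the same rule to obtain w2:

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 5, 0.001
A = rng.standard_normal((n, n))
X = rng.standard_normal((20000, n)) @ A.T

def oja(samples, beta):
    """Single-component Oja rule over a stream of sample vectors."""
    w = rng.standard_normal(samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        a = w @ x
        w += beta * (x * a - w * a**2)
    return w

w1 = oja(X, beta)                            # first PC; ||w1|| ~ 1 at convergence
X_defl = X - (X @ w1)[:, None] * w1          # x~ = (I - w1 w1^T) x
w2 = oja(X_defl, beta)                       # second PC from the deflated data

lam, U = np.linalg.eigh(X.T @ X / len(X))
print(abs(w1 @ U[:, -1]), abs(w2 @ U[:, -2]))  # both close to 1
```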
Lateral Orthogonalization Network
The basic idea is to allow the old hidden units to influence the new units so that the new ones do not duplicate information (in full or in part) already provided by the old units. By this approach, the deflation process is effectively implemented in an adaptive manner.
APEX Network (multiple PCs)
APEX: Adaptive Principal-component EXtractor.
The Oja rule for the i-th component (e.g. i = 2):
Δwi(t) = β [x(t) ai(t) − wi(t) ai(t)²]
Dynamic orthogonalization rule (e.g. i = 2, j = 1):
Δαij(t) = β [ai(t) aj(t) − αij(t) ai(t)²]
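A sketch of APEX for the second component (numpy; w1 is taken as already extracted, here read off an eigendecomposition for simplicity):

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta = 5, 0.001
A = rng.standard_normal((n, n))
X = rng.standard_normal((30000, n)) @ A.T

lam, U = np.linalg.eigh(X.T @ X / len(X))
w1 = U[:, -1]                                # first component, assumed available

w2 = 0.1 * rng.standard_normal(n)
alpha = 0.0                                  # lateral weight alpha_21
for x in X:
    a1 = w1 @ x
    a2 = w2 @ x - alpha * a1                 # output with lateral inhibition
    w2 += beta * (x * a2 - w2 * a2**2)       # Oja rule for component 2
    alpha += beta * (a1 * a2 - alpha * a2**2)  # dynamic orthogonalization

print(abs(w2 @ U[:, -2]), abs(w1 @ w2))      # w2 -> e2, w1^T w2 -> 0
```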
Convergence Theorem: Multiple Components
The Hebbian weight matrix W(t) in APEX converges asymptotically (with probability 1) to the matrix formed by the m largest principal components:
W(∞) = W
where W is the matrix formed by the m row vectors wiᵀ, with
wi = wi(∞) = ei
Proof (for i = 2, j = 1), with a2(t) = x(t)ᵀw2(t) − α(t) a1(t):
Δw2(t) = β [x(t) a2(t) − w2(t) a2(t)²]
Δα(t) = β [a1(t) a2(t) − α(t) a2(t)²]
Premultiplying the first rule by w1ᵀ and using a1(t) = w1ᵀx(t):
w1ᵀΔw2(t) = β [w1ᵀx(t) a2(t) − w1ᵀw2(t) a2(t)²]
Δ[w1ᵀw2(t)] = β [a1(t) a2(t) − w1ᵀw2(t) a2(t)²]
Subtracting the update for α(t):
Δ[w1ᵀw2(t) − α(t)] = −β [w1ᵀw2(t) − α(t)] a2(t)²
[w1ᵀw2(t+1) − α(t+1)] = [1 − β a2(t)²] [w1ᵀw2(t) − α(t)]
so w1ᵀw2(t) − α(t) → 0, i.e. α(t) → w1ᵀw2(t), and in the limit
a2(t) = x(t)ᵀw2(t) − α(t) a1(t) → x(t)ᵀ[I − w1w1ᵀ] w2(t)
which is exactly the deflation transformation, implemented adaptively.
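A numeric check of the contraction step above (self-contained; any unit-norm stand-in for w1 works, since the identity only uses a1(t) = w1ᵀx(t)):

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta = 5, 0.001
X = rng.standard_normal((10, n)) @ rng.standard_normal((n, n)).T
w1 = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]  # stand-in for a trained w1

w2 = 0.1 * rng.standard_normal(n)
alpha = 0.5                                  # deliberately mismatched start
for x in X:
    a1 = w1 @ x
    a2 = w2 @ x - alpha * a1
    gap = w1 @ w2 - alpha                    # gap before the update
    w2 += beta * (x * a2 - w2 * a2**2)
    alpha += beta * (a1 * a2 - alpha * a2**2)
    print((w1 @ w2 - alpha) / gap, 1 - beta * a2**2)  # the two columns match
```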
Learning Rates of APEX
Averaging over a block of data (block index ť, with σ(ť) the block average of a2(t)²):
[w1ᵀw2(ť+1) − α(ť+1)] = [1 − β′σ(ť)] [w1ᵀw2(ť) − α(ť)]
so the choice β′ = 1/σ(ť) drives the averaged residual to zero in a single block. Practical sample-based learning rates:
• β = 1/[Σt a2(t)²]
• β = 1/[Σt γᵗ a2(t)²]   (γ: a forgetting factor)
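A sketch of the second (forgetting-factor) choice as a running discounted sum (numpy; the clip value and initial mass are arbitrary safeguards), applied here to the single-component Oja rule:

```python
import numpy as np

rng = np.random.default_rng(7)
n, gamma = 5, 0.99
X = rng.standard_normal((20000, n)) @ rng.standard_normal((n, n)).T

w = rng.standard_normal(n)
w /= np.linalg.norm(w)
s = 10.0                                     # initial mass, keeps early steps moderate
for x in X:
    a = w @ x
    s = gamma * s + a**2                     # discounted sum of a(t)^2
    beta = min(1.0 / s, 0.05)                # adaptive step size, clipped for safety
    w += beta * (x * a - w * a**2)

lam, U = np.linalg.eigh(X.T @ X / len(X))
print(abs(w @ U[:, -1]))                     # close to 1
```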
Other Extensions
• PAPEX: Hierarchical Extraction
• DCA: Discriminant Component Analysis
• ICA: Independent Component Analysis