Topic Models for Signal Processing - Carnegie Mellon University


Page 1: Topic Models for Signal Processing - Carnegie Mellon University

Topic Models for Signal Processing

Bhiksha Raj, Carnegie Mellon University
Paris Smaragdis, University of Illinois, Urbana-Champaign
http://topicmodelsforaudio.com

ICASSP 2011 Tutorial: Applications of Topic Models for Signal Processing

Page 2: Topic Models for Signal Processing - Carnegie Mellon University

The Engineer and the Musician

Once upon a time a rich potentate discovered a previously unknown recording of a beautiful piece of music. Unfortunately, it was badly damaged.

He greatly wanted to find out what it would sound like if it were not.

So he hired an engineer and a musician to solve the problem.

Page 3: Topic Models for Signal Processing - Carnegie Mellon University

The Engineer and the Musician

The engineer worked for many years. He spent much money and published many papers.

Finally he had a somewhat scratchy restoration of the music.

The musician listened to the music carefully for a day, transcribed it, broke out his trusty keyboard and replicated the music.

Page 4: Topic Models for Signal Processing - Carnegie Mellon University

The Prize

Who do you think won the princess?


Page 5: Topic Models for Signal Processing - Carnegie Mellon University

Sounds – an example

• A sequence of notes
• Chords from the same notes
• A piece of music from the same (and a few additional) notes

Page 6: Topic Models for Signal Processing - Carnegie Mellon University

Sounds – another example

• A sequence of sounds
• A proper speech utterance from the same sounds

Page 7: Topic Models for Signal Processing - Carnegie Mellon University

Template Sounds Combine to Form a Signal

• The component sounds "combine" to form complex sounds
  – Notes form music
  – Phoneme-like structures combine in utterances
• Sound in general is composed of such "building blocks" or themes
  – Which can be simple (e.g. notes) or complex (e.g. phonemes)
  – Our definition of a building block: the entire structure occurs repeatedly in the process of forming the signal
• Goal: Learn these building blocks automatically, from data
  – Learn to be the musician
  – But through automated analysis of data

Page 8: Topic Models for Signal Processing - Carnegie Mellon University

The representation

[Figure: a waveform (amplitude vs. time) and its spectrogram (frequency vs. time)]

• Requirement: Building blocks must combine additively
  – Linear: The presence of two notes does not distort either note
  – Constructive: Notes do not cancel – adding one note will not diminish another
• Spectral representations: the magnitude spectra of uncorrelated signals add
  – In theory, power spectra add; in practice, this holds better for magnitude spectra
• We represent signals spectrographically, as a sequence of magnitude spectral vectors estimated from (overlapping) segments of the signal
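
As a concrete illustration, here is a minimal numpy sketch of this representation: frame the signal into overlapping windows and take magnitude spectra. The frame length, hop size, and Hann window here are illustrative assumptions consistent with the 75% overlap mentioned later, not values fixed by the tutorial.

    import numpy as np

    def magnitude_spectrogram(x, frame_len=1024, hop=256):
        """Sequence of magnitude spectra from overlapping, windowed frames."""
        window = np.hanning(frame_len)
        n_frames = 1 + (len(x) - frame_len) // hop
        frames = np.stack([x[t * hop : t * hop + frame_len] * window
                           for t in range(n_frames)])
        # rfft keeps the non-negative frequencies; magnitude, not power
        return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

    # e.g. a 1 s two-tone test signal at 16 kHz
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
    S = magnitude_spectrogram(x)
    print(S.shape)  # (513, 59)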

Page 9: Topic Models for Signal Processing - Carnegie Mellon University

Discovering Building Blocks

• The magnitude spectrogram is a matrix of numbers
  – The f-th frequency component in frame t is the (t,f)-th entry in the matrix
• Standard matrix decomposition methods may be applied to discover additive building blocks:
  – PCA, ICA...
• Constraint: PCA, ICA etc. will result in bases with negative components
  – Not physically meaningful (what is a negative power spectrum?)

[Figure: spectrogram of two intermittent tones, and the PCA bases learned from the sound – the bases contain negative values]
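
A small synthetic sketch of the point above: PCA bases of a toy two-tone "spectrogram" contain negative entries. The toy matrix and bin choices are assumptions made for illustration.

    import numpy as np

    # Toy "spectrogram": two intermittent tones, 8 frequency bins x 200 frames
    rng = np.random.default_rng(0)
    tone1 = np.array([0, 1, 0, 0, 0, 0, 0, 0.0])
    tone2 = np.array([0, 0, 0, 0, 1, 0, 0, 0.0])
    gains = rng.random((2, 200)) * (rng.random((2, 200)) > 0.5)  # on/off gating
    V = np.outer(tone1, gains[0]) + np.outer(tone2, gains[1])

    # PCA via SVD of the mean-removed matrix
    U, s, _ = np.linalg.svd(V - V.mean(axis=1, keepdims=True))
    print(U[:, :2].round(2))  # the leading "bases" contain negative entries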


Page 11: Topic Models for Signal Processing - Carnegie Mellon University

Magnitude Spectra are like Histograms

• Magnitude spectrograms comprise sequences of magnitude spectra
• Magnitude spectra are sequences of non-negative numbers
  – Which can be viewed as scaled histograms
• Histograms may be viewed as the outcome of draws from a multinomial distribution
• Analysis techniques related to multinomial processes apply

Page 12: Topic Models for Signal Processing - Carnegie Mellon University

What follows

• A tutorial in two parts
• Part A:
  – Model definition
  – Equations, update rules and other squiggles
• Part B:
  – Applications and extensions and the other fun stuff
• What is covered:
  – Application of multinomial and topic models to audio
• What is not covered:
  – Text and text-related models

Page 13: Topic Models for Signal Processing - Carnegie Mellon University

Part A: Overview

• Multinomial distributions and topic models
• Signal representations
• Application of PLCA to signals
  – Bag of frequencies model
  – Bag of spectrograms model
• Latent variable models with priors
  – Sparsity and entropic priors
  – Cross-entropic priors

Page 14: Topic Models for Signal Processing - Carnegie Mellon University

Part A: Overview

• Convolutional models
• High-dimensional data
  – Tensors and higher-dimensional data

Page 15: Topic Models for Signal Processing - Carnegie Mellon University

Part B: Overview

• Low-Rank Models of Signals
• Separation of Monophonic Mixtures
• Recognition in Mixtures
• Temporal demixing
• Pitch Tracking
• User Interfaces
• Multimodality

Page 16: Topic Models for Signal Processing - Carnegie Mellon University

The Multinomial Distribution

• The outcome of a draw can take one of a finite number of discrete values
  – A man drawing coloured balls from an urn
• The probability of retrieving a ball of a given color depends on the fraction of balls in the urn that are of that color
  – P(red) = no. red balls / total balls

Page 17: Topic Models for Signal Processing - Carnegie Mellon University

A probability distribution over words

[Figure: an urn of balls labeled "lion", "cat", "dog", and the resulting distribution P(word)]

• The balls have words written on them
  – I.e. draws now obtain words, not colors
• P(word) = no. of balls with the given word on them / total balls

Page 18: Topic Models for Signal Processing - Carnegie Mellon University

The probability of a collection

[Figure: the drawn collection regrouped as num(dog) x dog, num(cat) x cat, num(lion) x lion]

• Probability of drawing a specific collection of words:

  P(n(word_1) \text{ of } word_1, n(word_2) \text{ of } word_2, \ldots) = C \prod_{word} P(word)^{n(word)}

  \log P(n(word_1) \text{ of } word_1, n(word_2) \text{ of } word_2, \ldots) = \sum_{word} n(word) \log P(word) + C'

• The constant accounts for permutations

Page 19: Topic Models for Signal Processing - Carnegie Mellon University

Estimating probabilities

[Figure: given a collection of drawn words, find the underlying distribution P(word)]

• Given: a collection of words, find the underlying probability distribution
  – A problem we encounter repeatedly
• The maximum likelihood estimate:

  \{\hat{P}(word)\} = \arg\max_{\{P(word)\}} \sum_{word} n(word) \log P(word)

  \hat{P}(word) = \frac{n(word)}{\sum_{word'} n(word')}
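
A minimal sketch of this estimate: the ML multinomial is just the normalized count vector. The example draws are made up.

    from collections import Counter

    draws = ["dog", "dog", "lion", "cat", "dog", "lion"]
    counts = Counter(draws)
    total = sum(counts.values())
    P = {word: n / total for word, n in counts.items()}
    print(P)  # {'dog': 0.5, 'lion': 0.333..., 'cat': 0.166...}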

Page 20: Topic Models for Signal Processing - Carnegie Mellon University

The expected counts

[Figure: an urn of lion/cat/dog balls and the distribution P(word)]

• Given:
  – The probabilities of all words, P(word)
  – The total number of draws, N
• What is:
  – The expected number of draws of any word?
• Answer: E[n(word)] = N \cdot P(word)

Page 21: Topic Models for Signal Processing - Carnegie Mellon University

The inverse problem

[Figure: an urn of lion/cat/dog balls and the distribution P(word)]

• Given: the number of times "lion" was drawn, N(lion)
• How many times was "dog" drawn?

  N(dog) = N(lion) \frac{P(dog)}{P(lion)}

Page 22: Topic Models for Signal Processing - Carnegie Mellon University

The inverse multinomial

• Given P(word) for all words
• Observed n(word_1), n(word_2), ..., n(word_k)
• What is n(word_{k+1}), n(word_{k+2}), ...?

  E[n(word_i)] = \frac{N_o}{P_o} P(word_i), \quad i > k

• N_o is the total number of observed counts
  – n(word_1) + n(word_2) + ...
• P_o is the total probability of observed events
  – P(word_1) + P(word_2) + ...
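
A tiny sketch of this expected-count reconstruction; the distribution and observed counts are made-up numbers.

    # Expected counts of unobserved words, given observed counts and P(word)
    P = {"lion": 0.25, "dog": 0.5, "cat": 0.25}
    observed = {"lion": 10, "dog": 22}            # n(word_1 .. word_k)
    N_o = sum(observed.values())                  # total observed counts
    P_o = sum(P[w] for w in observed)             # total observed probability
    expected = {w: N_o * p / P_o for w, p in P.items() if w not in observed}
    print(expected)  # {'cat': 32 * 0.25 / 0.75 = 10.67}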

Page 23: Topic Models for Signal Processing - Carnegie Mellon University

A Modified Experiment

[Figure: a sequence of outcomes "dog dog lion cat dog lion ..." drawn from several urns, each with its own mix of lion/cat/dog balls]

• Multiple pots with different distributions
  – The picker first selects a pot with probability P(pot)
  – Then he draws a ball from the pot
• We only see the outcomes!
• Can we estimate the pots?

Page 24: Topic Models for Signal Processing - Carnegie Mellon University

A mixture multinomial

[Figure: the outcome sequence "dog dog lion cat dog lion ..." and several urns of lion/cat/dog balls]

• The outcomes are a mixture of draws from many pots
  – I.e. a mixture of draws from several multinomials
• The process is a mixture multinomial
• Probability of a word:

  P(X) = \sum_Z P(Z) P(X|Z)

• Z is the pot
• X are the words

Page 25: Topic Models for Signal Processing - Carnegie Mellon University

Mixture multinomials are multinomials too

[Figure: several urns of lion/cat/dog balls]

• From the outside viewer's perspective, it is just a multinomial
  – The outcome is one of a countable, finite set of values
• Constraint: probabilities are composed from component multinomials

  P(X) = \sum_Z P(Z) P(X|Z)

Page 26: Topic Models for Signal Processing - Carnegie Mellon University

Multinomials and Simplexes

• Multinomials are probability vectors
  – If the vocab size is V, they live in a (V-1)-dimensional subspace of V-space
  – Strictly in the positive orthant
• Mixture multinomials reside within simplexes specified by the component multinomials

Page 27: Topic Models for Signal Processing - Carnegie Mellon University

Estimating Mixture Multinomials

  \log P(n(X_1) \text{ of } X_1, n(X_2) \text{ of } X_2, \ldots) = \sum_X n(X) \log \sum_Z P(Z) P(X|Z)

• Direct optimization is not possible
• Expectation Maximization: introduce Q(X,Z) and apply Jensen's inequality

  \sum_X n(X) \log \sum_Z P(Z) P(X|Z) = \sum_X n(X) \log \sum_Z Q(X,Z) \frac{P(Z) P(X|Z)}{Q(X,Z)}

  \geq \sum_X n(X) \sum_Z Q(X,Z) \log \frac{P(Z) P(X|Z)}{Q(X,Z)}

Page 28: Topic Models for Signal Processing - Carnegie Mellon University

Estimating Mixture Multinomials

• Introducing Q(X,Z):

  \sum_X n(X) \log \sum_Z P(Z) P(X|Z) \geq \sum_X n(X) \sum_Z Q(X,Z) \log \frac{P(Z) P(X|Z)}{Q(X,Z)}

• Optimizing with respect to Q(X,Z):

  Q(X,Z) = \frac{P(Z) P(X|Z)}{\sum_{Z'} P(Z') P(X|Z')}

• Optimizing P(Z) and P(X|Z):

  P(Z) = \frac{\sum_X n(X) Q(X,Z)}{\sum_{Z'} \sum_X n(X) Q(X,Z')}

  P(X|Z) = \frac{n(X) Q(X,Z)}{\sum_{X'} n(X') Q(X',Z)}
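
A minimal numpy sketch of these EM updates for a mixture of multinomials. The count vector, K, and iteration count are illustrative; note that a single bag of counts does not identify the mixture uniquely, so this only illustrates the mechanics of the updates.

    import numpy as np

    def em_mixture_multinomial(n, K, iters=100, seed=0):
        """EM for a mixture of K multinomials given a count vector n (length V).
        Returns P(Z) of shape (K,) and P(X|Z) of shape (V, K)."""
        rng = np.random.default_rng(seed)
        V = len(n)
        Pz = np.full(K, 1.0 / K)
        Pxz = rng.random((V, K))
        Pxz /= Pxz.sum(axis=0)
        for _ in range(iters):
            # E-step: Q(X,Z) -- every instance of X is fragmented the same way
            Q = Pxz * Pz                       # (V, K), proportional
            Q /= Q.sum(axis=1, keepdims=True)
            # M-step: re-count the fragments
            frag = n[:, None] * Q              # n(X) Q(X,Z)
            Pz = frag.sum(axis=0) / frag.sum()
            Pxz = frag / frag.sum(axis=0, keepdims=True)
        return Pz, Pxz

    n = np.array([30.0, 10.0, 20.0])  # counts of e.g. dog, cat, lion
    Pz, Pxz = em_mixture_multinomial(n, K=2)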

Page 29: Topic Models for Signal Processing - Carnegie Mellon University

EM as counting

[Figure: the outcome sequence "dog dog lion cat dog lion ..." and several urns of lion/cat/dog balls]

• If the exact urn for each drawn instance was known:
  – Compute the distribution of each urn using urn-specific counts

Page 30: Topic Models for Signal Processing - Carnegie Mellon University

EM as counting

[Figure: the outcome sequence and urns, as before]

• When the exact urn is unknown:
  – Fragment each instance according to Q(X,Z)
    • Based on a current estimate of the probabilities
    • Every instance of X will be fragmented the same way
  – Compute the distribution of each urn by counting fragments

Page 31: Topic Models for Signal Processing - Carnegie Mellon University

EM as fragmentation and counting

• Fragmentation:

  Q(X,Z) = \frac{P(Z) P(X|Z)}{\sum_{Z'} P(Z') P(X|Z')}

• Counting:

  P(Z) = \frac{\sum_X n(X) Q(X,Z)}{\sum_{Z'} \sum_X n(X) Q(X,Z')}

  P(X|Z) = \frac{n(X) Q(X,Z)}{\sum_{X'} n(X') Q(X',Z)}

• We will use the "fragmentation and counting" explanation for the most part in the remaining slides
  – Instead of deriving detailed solutions
  – Which are probably easy to derive anyway

Page 32: Topic Models for Signal Processing - Carnegie Mellon University

Modification: Given P(X|Z)

• Fragmentation:

  Q(X,Z) = \frac{P(Z) P(X|Z)}{\sum_{Z'} P(Z') P(X|Z')}

• Counting:

  P(Z) = \frac{\sum_X n(X) Q(X,Z)}{\sum_{Z'} \sum_X n(X) Q(X,Z')}

• Given P(X|Z) (the urns) and a collection of counts n(X):
  – In what manner were the urns selected?
  – What is P(Z)?

Page 33: Topic Models for Signal Processing - Carnegie Mellon University

Topics and Bags of Words

• The bag of words model:
  – Put all words in a "document" into a "bag"
  – Only considers the occurrence of words
  – Sequence information is not used
• An alternate vector representation:

  The Quick Brown Fox Jumped Over Lazy Dog Had Cat For Supper
  [ 4    1     1    1     1     1    1    2   1   1   1    1  ]

  – Each number represents the number of instances of a word
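
A minimal sketch of building that count vector against a fixed vocabulary:

    from collections import Counter

    doc = ("the quick brown fox jumped over the lazy dog "
           "the dog had the cat for supper").split()
    vocab = ["the", "quick", "brown", "fox", "jumped", "over",
             "lazy", "dog", "had", "cat", "for", "supper"]
    counts = Counter(doc)
    bow = [counts[w] for w in vocab]
    print(bow)  # [4, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]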

Page 34: Topic Models for Signal Processing - Carnegie Mellon University

Topics and Documents

[Figure: three bags of words – sports, politics, and alien abduction – each drawn from its own urn with its own word distribution]

• Bags of words for different topics have different distributions
  – Drawn from different pots
  – Different words with different frequencies

Page 35: Topic Models for Signal Processing - Carnegie Mellon University

Documents as mixtures of topics

[Figure: the sports, politics, and alien-abduction urns mixing into a single newspaper bag]

• The bag of words model for a newspaper mixes topics
• Given word distributions for the topics and a bag of words for the newspaper, find the composition?
  – Estimate mixture weights

Page 36: Topic Models for Signal Processing - Carnegie Mellon University

Documents as mixtures of topics

[Figure: a newspaper document inside the simplex spanned by the sports, politics, and alien-abduction topics]

• A document lies in the topic simplex
  – Mixture weights are inversely related to the distance to each topic
  – There may be multiple ways of mixing topics to obtain the same document

Page 37: Topic Models for Signal Processing - Carnegie Mellon University

Documents as mixtures of topics

[Figure: mapping from the topic simplex, with corners (1,0,0) = topic1, (0,1,0) = topic2, (0,0,1) = topic3, to the mixture weight simplex]

• The mixture multinomial model maps a document from a point within the topic simplex to a point in the mixture weight simplex
  – Different locations in the mixture weight simplex may result in the same document vector
    • Depending on the arrangement of the topic vectors

Page 38: Topic Models for Signal Processing - Carnegie Mellon University

Documents differ in composition

[Figure: SI, NYT, and Weekly World News placed at different points of the simplex with corners (1,0,0) = sports, (0,0,1) = politics, (0,1,0) = alien abduction]

• The relative proportion of different topics varies
  – I.e. P(Z) has different structure for different documents
  – I.e. the location within the mixture weight and/or topic simplex differs for different documents

Page 39: Topic Models for Signal Processing - Carnegie Mellon University

A priori probabilities on mixtures

[Figure: a probability distribution P(Θ) over the mixture weight simplex, where Θ = {P(Z)}]

• The location of a document in the mixture weight simplex may vary
• The distribution of locations of documents from a given category can be captured by a priori probability distributions

Page 40: Topic Models for Signal Processing - Carnegie Mellon University

Dirichlet Distributions

  P(\Theta; \alpha) = \frac{1}{B(\alpha)} \prod_i \theta_i^{\alpha_i - 1}, \qquad B(\alpha) = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma\!\left(\sum_i \alpha_i\right)}

• The most common model for distributions of a priori probabilities is the Dirichlet distribution
  – Mathematical tractability
  – Conjugate prior to the multinomial

Page 41: Topic Models for Signal Processing - Carnegie Mellon University

Estimation with prior

  \hat{\Theta} = \arg\max_\Theta \log P(X|\Theta) + \log P(\Theta) = \arg\max_\Theta \sum_{word} n(word) \log P(word) + \log P(\Theta)

• Utilize the a priori distribution to better estimate mixture weights to explain new documents
  – Resolves the issue of multiple solutions
  – Improves the estimate in all cases
• This is the maximum a posteriori estimator
• Compare to the maximum likelihood estimator below:

  \hat{\Theta} = \arg\max_\Theta \sum_{word} n(word) \log P(word)
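
A small sketch of the MAP estimator for the multinomial itself. The closed form (counts + α − 1, renormalized) is the standard result for a Dirichlet prior and is not spelled out on the slide; it assumes α_i >= 1.

    import numpy as np

    def map_multinomial(counts, alpha):
        """MAP estimate of a multinomial under a Dirichlet(alpha) prior.
        Reduces to the ML estimate when alpha == 1 everywhere."""
        counts = np.asarray(counts, dtype=float)
        alpha = np.asarray(alpha, dtype=float)
        p = counts + alpha - 1.0   # assumes alpha_i >= 1
        return p / p.sum()

    counts = [30, 10, 20]
    print(map_multinomial(counts, alpha=[1, 1, 1]))  # ML: [0.5, 0.167, 0.333]
    print(map_multinomial(counts, alpha=[5, 5, 5]))  # smoothed toward uniform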

Page 42: Topic Models for Signal Processing - Carnegie Mellon University

Estimating the Dirichlets (and others)

• Estimate mixture weights on training data, then estimate the Dirichlet parameters
• Jointly estimate mixture weights and Dirichlets on training data
• Jointly estimate topics, mixture weights and Dirichlet parameters from training data
• Requires complex estimation procedures that are not relevant to us

Page 43: Topic Models for Signal Processing - Carnegie Mellon University

More complex models

• Other forms of priors
  – E.g. correlated topics
• Priors on priors
• Capturing temporal sequences
  – E.g. Markov chains on topics
• Etc.
• Not directly relevant

Page 44: Topic Models for Signal Processing - Carnegie Mellon University

Mixture multinomials and audio

• Audio data representations are similar to histograms
• Topic model frameworks apply.
• We will generically refer to topic models and extensions applied to audio as PLCA
  – Probabilistic Latent Component Analysis

Page 45: Topic Models for Signal Processing - Carnegie Mellon University

Representing Audio

• Basic representation: Spectrogram
  – Computed using a short-time Fourier transform
  – Segment the audio into "frames" of 20-64 ms
    • Adjacent frames overlap by 75%
  – "Window" the segments
  – Compute a Fourier spectrum from each frame

Page 46: Topic Models for Signal Processing - Carnegie Mellon University

Spectrogram

S1(1) S2(1) S3(1) S4(1) S5(1) S6(1) …

S1(2) S2(2) S3(2) S4(2) S5(2) S6(2) …

S1(3) S2(3) S3(3) S4(3) S5(3) S6(3) …

… … … … … … …

• The spectrogram is a series of complex vectors.
  – It can be decomposed into magnitude and phase
• We will, however, work primarily on the magnitude
• Why: magnitude spectra of uncorrelated signals combine additively
  – In theory power spectra are additive, but in practice this holds better for magnitudes

Page 47: Topic Models for Signal Processing - Carnegie Mellon University

The Spectrogram as a Histogram

• A generative model for one frame of a spectrogram
• A magnitude spectrum represents energy against discrete frequencies
• This may be viewed as a histogram of draws from a multinomial

[Figure: the magnitude spectrum of frame t viewed as a histogram over frequencies f, with P_t(f) the probability distribution underlying the t-th spectral vector]
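
A minimal sketch of this generative picture, assuming we give the frame a count budget N: the frame's spectrum, normalized, is the multinomial from which the "draws" come.

    import numpy as np

    rng = np.random.default_rng(0)

    # An (assumed) magnitude spectrum for one frame, over 6 frequency bins
    spectrum = np.array([0.2, 3.0, 0.5, 2.2, 0.4, 0.1])

    # The underlying multinomial over frequencies for this frame
    Pt = spectrum / spectrum.sum()

    # Treat the frame as N draws from that multinomial
    N = 1000
    histogram = rng.multinomial(N, Pt)
    print(Pt.round(3), histogram)  # histogram / N approximates Pt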

Page 48: Topic Models for Signal Processing - Carnegie Mellon University

A Mixture Multinomial Model

• The "picker" has multiple urns
  – For each draw he selects an urn and then draws a ball from it
• The overall probability of drawing f is a mixture multinomial

[Figure: multiple draws accumulating into a histogram over frequencies]

Page 49: Topic Models for Signal Processing - Carnegie Mellon University

The Picker Generates a Spectrogram

• The picker has a fixed set of urns
  – Each urn has a different probability distribution over f
• He draws the spectrum for the first frame
  – In which he selects urns according to some probability P_0(z)
• Then draws the spectrum for the second frame
  – In which he selects urns according to some probability P_1(z)
• And so on, until he has constructed the entire spectrogram
  – The number of draws in each frame represents the energy in that frame
    • Actually the total magnitude spectral value

Page 55: Topic Models for Signal Processing - Carnegie Mellon University

The PLCA Model

• The urns are the same for every frame
  – These are the component multinomials, or bases, for the source that generated the signal
• The only difference between frames is the probability with which he selects the urns

  P_t(f) = \sum_z P_t(z) P(f|z)

  – P_t(f): frame-specific spectral distribution
  – P(f|z): SOURCE-specific bases
  – P_t(z): frame (time) specific mixture weight

Page 56: Topic Models for Signal Processing - Carnegie Mellon University

Spectral View of Component Multinomials

• For audio, each component multinomial (urn) is actually a normalized histogram over frequencies P(f|z)
  – I.e. a spectrum
• Component multinomials represent spectral building blocks for the given sound source
• The spectrum for every analysis frame is explained as an additive combination of these basic spectral patterns

Page 57: Topic Models for Signal Processing - Carnegie Mellon University

Learning Bases

• By "learning" the mixture multinomial model for any sound source we "discover" these "basis" spectral structures for the source
• The model can be learnt from spectrograms of training audio from the source using the EM algorithm
  – May not even require large amounts of audio

Page 58: Topic Models for Signal Processing - Carnegie Mellon University

Likelihood of a Spectrogram

  P(\{S_t(f)\ \forall t,f\}) = \prod_t \prod_f \left( \sum_z P_t(z) P(f|z) \right)^{S_t(f)}

  \log P(\{S_t(f)\ \forall t,f\}) = \sum_t \sum_f S_t(f) \log \sum_z P_t(z) P(f|z)

• S_t(f) is the f-th frequency of the magnitude spectrogram in the t-th frame
• Once again, direct maximum likelihood estimation is not possible
• EM estimation is required

Page 59: Topic Models for Signal Processing - Carnegie Mellon University

Learning the Bases

• Simple EM solution
  – Except that the bases are learned from all frames

• Fragmentation:

  P_t(z|f) = \frac{P_t(z) P(f|z)}{\sum_{z'} P_t(z') P(f|z')}

• Counting:

  P_t(z) = \frac{\sum_f P_t(z|f) S_t(f)}{\sum_{z'} \sum_f P_t(z'|f) S_t(f)}

  P(f|z) = \frac{\sum_t P_t(z|f) S_t(f)}{\sum_{f'} \sum_t P_t(z|f') S_t(f')}    (the "basis" distribution)
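
A minimal numpy sketch of these updates, assuming a non-negative magnitude spectrogram S of shape (F, T) and K bases; the initialization and iteration count are illustrative.

    import numpy as np

    def plca(S, K, iters=200, seed=0):
        """PLCA via EM. S: (F x T) non-negative magnitude spectrogram.
        Returns P(f|z) of shape (F, K) and Pt(z) of shape (K, T)."""
        rng = np.random.default_rng(seed)
        F, T = S.shape
        Pfz = rng.random((F, K)); Pfz /= Pfz.sum(axis=0)
        Ptz = rng.random((K, T)); Ptz /= Ptz.sum(axis=0)
        for _ in range(iters):
            # Fragmentation: Pt(z|f) for every (f, t), shape (F, K, T)
            joint = Pfz[:, :, None] * Ptz[None, :, :]
            post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
            # Counting: weight the posteriors by the observed "counts" S
            frag = post * S[:, None, :]
            Ptz = frag.sum(axis=0)
            Ptz /= Ptz.sum(axis=0, keepdims=True) + 1e-12
            Pfz = frag.sum(axis=2)
            Pfz /= Pfz.sum(axis=0, keepdims=True) + 1e-12
        return Pfz, Ptz

The columns of Pfz are the learned spectral bases; the columns of Ptz are the per-frame mixture weights.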

Page 62: Topic Models for Signal Processing - Carnegie Mellon University

Learning Building Blocks

[Figure: a speech signal decomposed into bases and basis-specific spectrograms P_t(z) P(f|z); the bases P(f|z) shown are learned from Bach's Fugue in G minor, with frequency vs. time plots and the weights P_t(z)]

Page 63: Topic Models for Signal Processing - Carnegie Mellon University

What about other data

• Faces
  – Trained 49 multinomial components on 2500 faces
    • Each face unwrapped into a 361-dimensional vector
  – Discovers parts of faces

Page 64: Topic Models for Signal Processing - Carnegie Mellon University

Given Bases, Find Composition

  P_t(f) = \sum_z P_t(z) P(f|z)

[Figure: a spectrum S_t(f) decomposed over the given bases]

• Iterative process:
  – Compute the a posteriori probability of the z-th topic for each frequency f in the t-th spectrum:

    P_t(z|f) = \frac{P_t(z) P(f|z)}{\sum_{z'} P_t(z') P(f|z')}

  – Compute the mixture weight of the z-th basis:

    P_t(z) = \frac{\sum_f P_t(z|f) S_t(f)}{\sum_{z'} \sum_f P_t(z'|f) S_t(f)}
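
A sketch of this weights-only estimation: the same fragmentation-and-counting loop as before, but with the bases Pfz held fixed (e.g. taken from the plca() sketch above).

    import numpy as np

    def plca_weights(S, Pfz, iters=100):
        """Estimate Pt(z) for spectrogram S (F x T) given fixed bases Pfz (F x K)."""
        F, K = Pfz.shape
        T = S.shape[1]
        Ptz = np.full((K, T), 1.0 / K)
        for _ in range(iters):
            joint = Pfz[:, :, None] * Ptz[None, :, :]           # (F, K, T)
            post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
            Ptz = (post * S[:, None, :]).sum(axis=0)            # count fragments
            Ptz /= Ptz.sum(axis=0, keepdims=True) + 1e-12
        return Ptz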

Page 65: Topic Models for Signal Processing - Carnegie Mellon University

Bag of Frequencies vs. Bag of Spectrograms

• The PLCA model described is a "bag of frequencies" model
  – Similar to "bag of words"
• It composes the spectrogram one frame at a time
  – The contribution of the bases to a frame does not affect other frames
• Random variables:
  – Frequency
  – Possibly also the total number of draws in a frame, N_t

Page 66: Topic Models for Signal Processing - Carnegie Mellon University

Bag of Frequencies PLCA model

[Figure: over time, frames T=0, 1, ..., k use mixture weights P_0(Z), P_1(Z), ..., P_k(Z) over the same bases P(f|z), Z = 0 ... M]

• Bases are simple distributions over frequencies
• The manner of selection of urns/components varies from analysis frame to analysis frame

Page 67: Topic Models for Signal Processing - Carnegie Mellon University

Bag of Spectrograms PLCA Model

[Figure: "super-pots" Z = 1 ... M, each containing a time pot P(T|Z) and a frequency pot P(F|Z)]

• Compose the entire spectrogram all at once
• Complex "super-pots" include two sub-pots
  – One pot has a distribution over frequencies: these are our bases
  – The second has a distribution over time
• Each draw:
  – Select a super-pot
  – Draw "F" from the frequency pot
  – Draw "T" from the time pot
  – Increment the histogram at (T,F)

  P(t,f) = \sum_Z P(Z) P(t|Z) P(f|Z)

Page 68: Topic Models for Signal Processing - Carnegie Mellon University

The bag of spectrograms

[Figure: a draw selects a super-pot Z, then T from P(T|Z) and F from P(F|Z), and increments the histogram at (T,F); repeated N times this builds the spectrogram]

• Drawing procedure:

  P(t,f) = \sum_Z P(Z) P(t|Z) P(f|Z)

• Fundamentally equivalent to the bag of frequencies model
  – With some minor differences in estimation

ICASSP 2011 Tutorial: Applications of Topic Models for Signal Processing – Smaragdis, Raj

Page 69: Topic Models for Signal Processing - Carnegie Mellon University

Estimating the bag of spectrograms

• EM update rules:

  P(z|t,f) = \frac{P(z) P(f|z) P(t|z)}{\sum_{z'} P(z') P(f|z') P(t|z')}

  P(z) = \frac{\sum_t \sum_f P(z|t,f) S_t(f)}{\sum_{z'} \sum_t \sum_f P(z'|t,f) S_t(f)}

  P(f|z) = \frac{\sum_t P(z|t,f) S_t(f)}{\sum_{f'} \sum_t P(z|t,f') S_t(f')}

  P(t|z) = \frac{\sum_f P(z|t,f) S_t(f)}{\sum_{t'} \sum_f P(z|t',f) S_{t'}(f)}

• Can learn all parameters
• Can learn P(T|Z) and P(Z) only, given P(f|Z)
• Can learn only P(Z)

Page 70: Topic Models for Signal Processing - Carnegie Mellon University

Bag of frequencies vs. bag of spectrograms

• Fundamentally equivalent
• Difference in estimation:
  – Bag of spectrograms: for a given total N and P(Z), the total "energy" assigned to a basis is determined
    • Increasing its energy at one time will necessarily decrease its energy elsewhere
  – No such constraint for bag of frequencies
    • More unconstrained
• Bag of frequencies is more amenable to the imposition of a priori distributions
• Bag of spectrograms is a more natural fit for other models

Page 71: Topic Models for Signal Processing - Carnegie Mellon University

The PLCA Tensor Model

[Figure: super-pots Z, each with sub-pots P(A|Z), P(B|Z), P(C|Z)]

• The bag of spectrograms can be extended to multivariate data:

  P(a,b,c,\ldots) = \sum_Z P(Z) P(a|Z) P(b|Z) P(c|Z) \cdots

• The EM update rules are essentially identical to the bivariate case

Page 72: Topic Models for Signal Processing - Carnegie Mellon University

How many bases

• The previous examples assumed knowledge of the number of bases
• How do we know the correct number of bases?
  – In general we do not
  – There is no good mechanism to learn this automatically
• Must determine as many bases as the data/math will allow
  – With appropriate constraints for best discovery of bases

Page 73: Topic Models for Signal Processing - Carnegie Mellon University

A Geometric View

[Figure: probability simplex with corners (1,0,0), (0,1,0), (0,0,1); x = normalized training spectrum, o = learned component multinomial]

• Normalized spectrograms/spectral vectors live on the probability simplex
• ML estimation approximates these as linear combinations of components
  – Spectral vectors lying outside the region enclosed by the components are modeled with error
    • As measured by the KL distance between the approximation and the true vector
• ML estimation learns parameters to minimize this error

Page 74: Topic Models for Signal Processing - Carnegie Mellon University

PLCA as Matrix Decompositions

• Bag of frequencies model:

  Spectrogram (D x T)  =  P(F|Z) components (D x K)  ·  P_t(Z) weights (K x T)  ·  diag(n(t)) energy (T x T)

• Bag of spectrograms model:

  Spectrogram (D x T)  =  P(F|Z) components (D x K)  ·  P(Z) (K x K diagonal)  ·  P(T|Z) weights (K x T)

• PLCA decomposes the spectrogram matrix as a product of matrices
  – The first matrix represents the multinomial components
  – The second represents activations
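
A self-contained sketch of the bag-of-frequencies factorization read generatively: build a spectrogram from normalized components, normalized weights, and per-frame energies. All shapes and numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    D, K, T = 20, 5, 50
    Pfz = rng.random((D, K)); Pfz /= Pfz.sum(axis=0)   # components, D x K
    Ptz = rng.random((K, T)); Ptz /= Ptz.sum(axis=0)   # weights, K x T
    n_t = rng.random(T) * 100                          # per-frame energy, diag(n(t))
    S = (Pfz @ Ptz) * n_t[None, :]                     # D x T "spectrogram"
    # each column of Pfz @ Ptz sums to 1, so the frame energies are exactly n(t)
    print(np.allclose(S.sum(axis=0), n_t))             # True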

Page 75: Topic Models for Signal Processing - Carnegie Mellon University

A Mathematical Constraint

  Spectrogram (D x T)  =  P(F|Z) components (D x K)  ·  weights (K x T), where the weights are P_t(Z) or P(Z)P(T|Z)

• Estimate the RHS to minimize the KL divergence between the LHS and the RHS
• Unique estimates are possible if K < D
  – Dimensionality reduction
  – But with non-zero error
• K >= D will result in non-unique solutions
  – The learned bases will be mixed up

Page 76: Topic Models for Signal Processing - Carnegie Mellon University

No limitation in nature

• Music: hundreds of instrument types, dozens of notes
  – Thousands of bases
• Speech: hundreds of sound patterns
• The arithmetic limitation on "K" is artificial
• Reason for the limitation: there is no requirement for the bases to be "complex"
  – For K >= D, bases may be individual frequencies
    • The model imposes no constraint
  – "Natural" bases, on the other hand, are complex sounds
• Requirement: a learning procedure that permits learning more bases than the number of dimensions

Page 77: Topic Models for Signal Processing - Carnegie Mellon University

An Alternate View

• Allow any arbitrary number of bases (urns)
• Specify that for any specific frame, only a small number of bases may be used
  – Although there are many spectral structures, any given frame only has a few of these
• Only a small number of urns are used in any frame
  – The mixture weights must be sparse

Page 78: Topic Models for Signal Processing - Carnegie Mellon University

Sparse overcomplete estimation

  (D x T)  =  (D x K) · (K x T)

• Learning more bases than dimensions results in overcomplete representations
• The activation (second) matrix must be sparse to accommodate this
  – I.e. sparse, overcomplete representations
• Sparse activation matrix => the number of non-zero entries in the matrix is small
  – I.e. the columns of the matrix have small L0 norm

Page 79: Topic Models for Signal Processing - Carnegie Mellon University

A Brief Segue: Compressive Sensing and Sparse Estimation

• Given: Y, where Y = AX
  – Y is D x 1, A is D x K, X is K x 1
• dim(X) >> dim(Y)
  – K >> D
• Given A, estimate X
• Underspecified: no unique solution for X
  – Unique solutions if X is sparse

Page 80: Topic Models for Signal Processing - Carnegie Mellon University

Sparse Estimation and CS

• Y = AX,  Y = D x 1,  X = K x 1,  D << K
  – Given: X is sparse with no more than "S" components
• True solution:
  – Solve for every "S-sparse" subvector of X
  – Find the one that results in zero error in Y
  – Requires D >= 2S for a unique solution
  – NP complete

Page 81: Topic Models for Signal Processing - Carnegie Mellon University

Sparse Estimation and CS

• Approximate solution 1:
  – Minimize ||Y − AX||₂² + |X|₁
  – The X value that minimizes both the error and the L1 norm of X
  – Guaranteed to result in the optimal sparse estimate of X under strict conditions
• Approximate solution 2:
  – Minimize ||Y − AX||₂² constrained to |X|₀ = S
  – IHT, COSAMP, ...
• These solutions are also applicable to a larger class of problems than quadratic error minimization

Page 82: Topic Models for Signal Processing - Carnegie Mellon University

Sparse Estimation of Topic Models

• Similar to CS and other sparse estimation problems
  – KL distance, rather than squared error
• L1 minimization does not apply
  – Our data are probability vectors with L1 norm = 1
• L0 minimization techniques do apply
  – With appropriate generalization
  – But they are often ineffective
    • Poor solutions

Page 83: Topic Models for Signal Processing - Carnegie Mellon University

Sparsity through priors

[Figure: probability simplex; corners such as (1,0,0) are very sparse, edge points such as (a,0,b) are somewhat sparse, and interior points (a,b,c) are not sparse]

• Mixture weights reside in the probability simplex
• Corners and edges of the probability simplex represent "sparse" regions
• An a priori probability distribution that favors edges and corners will result in sparse estimates

Page 84: Topic Models for Signal Processing - Carnegie Mellon University

Sparse Priors

• Dirichlet distributions with appropriate parameters can force sparse solutions
  – As shown
• But the objective (probability) surface is shallow in the middle and steep at the sparse points
  – Unstable
• Instead, we will use an entropic prior

Page 85: Topic Models for Signal Processing - Carnegie Mellon University

Entropy as a measure of sparsity

[Figure: a one-spike histogram over six values (entropy = 0) vs. a uniform histogram (Shannon entropy = log(6) = max)]

• Entropy is also a measure of sparsity
  – Fewer possibilities == greater predictability == lower entropy
• Different entropy measures:
  – Shannon entropy of a distribution Θ = {P_i}:

    H(\Theta) = -\sum_i P_i \log P_i

  – Renyi entropy (tends to the Shannon entropy as α → 1):

    H_\alpha(\Theta) = \frac{1}{1-\alpha} \log \sum_i P_i^\alpha
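
A minimal sketch computing both measures on a very sparse and a maximally dense distribution:

    import numpy as np

    def shannon_entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                      # 0 log 0 = 0 by convention
        return -np.sum(p * np.log(p))

    def renyi_entropy(p, alpha):
        p = np.asarray(p, dtype=float)
        return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

    sparse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    dense = [1 / 6.0] * 6
    print(shannon_entropy(sparse), shannon_entropy(dense))  # 0.0, log(6)
    print(renyi_entropy(dense, alpha=0.5))                  # also log(6)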

Page 86: Topic Models for Signal Processing - Carnegie Mellon University

The Entropic Prior

  P(\Theta) = \frac{1}{J} \exp(-\beta H(\Theta))

• J is a normalization factor
• For positive β, distributions with low entropy (favoring sparsity) have higher probability
  – Larger β promotes sparsity more aggressively
• Any definition of entropy is acceptable

Page 87: Topic Models for Signal Processing - Carnegie Mellon University

Entropic Prior vs. Dirichlet Prior

• Left: Dirichlet prior. Right: (Shannon) entropic prior
• Entropic prior: a smoother surface, promotes sparsity from all points
  – More stable estimates

Page 88: Topic Models for Signal Processing - Carnegie Mellon University

Estimation with sparsity

• Maximum a posteriori estimation:

  \hat{\Theta} = \arg\max_\Theta P(X|\Theta) P(\Theta) = \arg\max_\Theta \log P(X|\Theta) + \log P(\Theta)

• X is the spectrogram {S_t(f)}; the likelihood of the spectrogram:

  \log P(\{S_t(f)\ \forall t,f\}) = \sum_t \sum_f S_t(f) \log \sum_z P_t(z) P(f|z)

• Θ = the set of multinomial parameters: {P_t(Z)} and {P(f|Z)}
• Sparsity can be enforced on the bases P(f|Z) too:

  P(\Theta) \propto \exp\Big(-\alpha_z \sum_t H(\{P_t(Z)\})\Big) \exp\Big(-\alpha_f \sum_z H(\{P(f|Z)\})\Big)

• α_z, α_f are binary 1/0 indicators that specify whether sparsity is needed

Page 89: Topic Models for Signal Processing - Carnegie Mellon University

MAP estimation with Sparsity

  \{\hat{P}_t(Z)\}, \{\hat{P}(f|Z)\} = \arg\max_{\{P_t(Z)\},\{P(f|Z)\}} \sum_t \sum_f S_t(f) \log \sum_z P_t(z) P(f|z) - \alpha_z \sum_t H(\{P_t(Z)\}) - \alpha_f \sum_z H(\{P(f|Z)\}) + C

• Exactly the same as ML estimation, with additional penalty terms derived from entropy
• Either the mixture weights P_t(Z) or the bases P(f|Z) (or both) can be estimated with sparsity
  – By setting α_z and α_f to 1 if sparsity is needed and 0 otherwise
• The solutions are very similar to the ML estimates with EM

Page 90: Topic Models for Signal Processing - Carnegie Mellon University

Sparse Estimate with Entropic Priors

• Fragment and count:

  P_t(z|f) = \frac{P_t(z) P(f|z)}{\sum_{z'} P_t(z') P(f|z')}

  count_t(z) = \sum_f S_t(f) P_t(z|f) \qquad count(f|z) = \sum_t S_t(f) P_t(z|f)

• ML (non-sparse) estimates:

  P_t(Z) = \frac{count_t(Z)}{\sum_{Z'} count_t(Z')} \qquad P(f|Z) = \frac{count(f|Z)}{\sum_{f'} count(f'|Z)}

• Sparse estimates:

  P_t(Z) = g_{sparse}(\{count_t(Z)\}, \alpha_z) \qquad P(f|Z) = g_{sparse}(\{count(f|Z)\}, \alpha_f)

Page 91: Topic Models for Signal Processing - Carnegie Mellon University

What is gsparse()

• For H() = Shannon entropy, g_{sparse}(\{\omega(i)\}, \beta) runs a number of iterations of the fixed point

  P(i) = \frac{-\omega(i)/\beta}{W\!\left(-\omega(i)\, e^{1+\lambda/\beta}/\beta\right)}

  – λ is the Lagrange multiplier chosen so that \sum_i P(i) = 1
  – W() is Lambert's W function

• For H() = Renyi entropy, g_{sparse}(\{\omega(i)\}, \beta) runs a number of iterations of

  Q(i) = \omega(i) - \frac{\alpha\beta}{1-\alpha} \frac{P(i)^\alpha}{\sum_i P(i)^\alpha}

  P(i) = \frac{Q(i)}{\sum_i Q(i)}

Page 92: Topic Models for Signal Processing - Carnegie Mellon University

Synthetic Example: Estimation with Sparsity

• Top and middle panels: non-sparse estimator
  – As the number of bases increases, the bases migrate towards the corners of the unit simplex
• Bottom panel: sparse estimator
  – The simplex formed by the bases shrinks to fit the data

Page 93: Topic Models for Signal Processing - Carnegie Mellon University

The Vowels and Music Examples

• Sparsity is applied only to the mixture weights (though the solution is not overcomplete)
• Left panel, non-sparse learning: most bases have significant energy in all frames
• Right panel, sparse learning: fewer bases are active within any frame

Page 94: Topic Models for Signal Processing - Carnegie Mellon University

Sparsity on components vs. weights

• Increasing the sparsity of the mixture weights makes the bases denser
  – And vice versa
  – Preserving net information

[Figure: sparse bases with high-entropy mixture weights vs. high-entropy bases with sparse mixture weights]

Page 95: Topic Models for Signal Processing - Carnegie Mellon University

Cross-Entropy as a prior

• So far we have considered sparsity of individual parameters
  – E.g. P(Z), P(f|Z) for each Z, etc.
• We can also impose constraints on relations between different bases
  – E.g. make P(f|Z) for different Zs maximally different
    • Make individual bases maximally dissimilar

Page 96: Topic Models for Signal Processing - Carnegie Mellon University

Cross-Entropy Prior

• The "symmetric" cross entropy:

  H(P,Q) = -\sum_i P_i \log Q_i - \sum_i Q_i \log P_i

• The Shannon cross-entropic prior:

  P(\{P(f|Z_1)\}, \{P(f|Z_2)\}, \ldots) \propto \exp\Big(\nu \sum_{Z,Z'} H(\{P(f|Z)\}, \{P(f|Z')\})\Big)

  – The solution can be manipulated by varying ν

• The objective function to optimize:

  \{\hat{P}_t(Z)\}, \{\hat{P}(f|Z)\} = \arg\max \sum_t \sum_f S_t(f) \log \sum_z P_t(z) P(f|z) + \nu \sum_{Z,Z'} H(\{P(f|Z)\}, \{P(f|Z')\}) + C

Page 97: Topic Models for Signal Processing - Carnegie Mellon University

Estimation with Cross-Entropic Priors

  P(f|Z) = \frac{count(f|Z) - \nu \sum_{Z' \neq Z} P(f|Z')}{\sum_{f'} \left( count(f'|Z) - \nu \sum_{Z' \neq Z} P(f'|Z') \right)}

• Can be extended to minimize the cross-entropy between groups of bases:

  P(f|Z_a) = \frac{count(f|Z_a) - \nu \sum_{Z_b} P(f|Z_b)}{\sum_{f'} \left( count(f'|Z_a) - \nu \sum_{Z_b} P(f'|Z_b) \right)}

Page 98: Topic Models for Signal Processing - Carnegie Mellon University

Temporal Priors

[Figure: a spectral trajectory through the simplex formed by the bases P(f|Z_1), P(f|Z_2), P(f|Z_3)]

• Other mechanisms may impose "temporal" priors
  – Imposing temporal constraints on the trajectory through the simplex
  – More from Paris

Page 99: Topic Models for Signal Processing - Carnegie Mellon University

The limits of Sparsity

  (D x T)  =  (D x K) · (K x T)

• Sparse estimation permits K >> D
• The largest value with unique solutions is K = T
  – The training data themselves become the bases
  – The weights matrix becomes an identity matrix
• For K > T the solution becomes indeterminate

Page 100: Topic Models for Signal Processing - Carnegie Mellon University

Example based representation

• Use the training vectors themselves as bases
• A justification:
  – "Learning": learned bases are linear combinations of normalized data
    • They can lie in regions not visited by the data
  – A data-based representation has no such restrictions
• Need not store all training vectors
  – Random sampling is sufficiently effective
• The "building blocks" metaphor is no longer valid, however

Page 101: Topic Models for Signal Processing - Carnegie Mellon University

Patterns Beyond a Record

• The techniques discussed so far are effective at extracting structure
• However, structure is discovered at the level of entire records
  – Each basis spans the entire frequency axis
• What about structures that extend beyond a record?
  – Or substructures within a record?

Page 102: Topic Models for Signal Processing - Carnegie Mellon University

Patterns extend beyond a single frame

• Four bars from a music example
• The spectral patterns are actually patches
  – Not all frequencies fall off in time at the same rate
• The basic unit is a spectral patch, not a spectrum
• Extend the model to consider this phenomenon

Page 103: Topic Models for Signal Processing - Carnegie Mellon University

Shift-Invariant Model

[Figure: super-pots Z = 1 ... M, each with sub-pots P(T|Z) and P(t,f|Z)]

• Employs the bag of spectrograms model
• Each "super-pot" has two sub-pots
  – One sub-pot now stores bi-variate distributions
    • Each ball has a (t,f) pair marked on it – the bases
  – Balls in the other sub-pot merely have a time "T" marked on them – the "location"

Page 104: Topic Models for Signal Processing - Carnegie Mellon University

The shift-invariant model

[Figure: a draw selects a super-pot Z, then T from P(T|Z) and (t,f) from P(t,f|Z), and increments the histogram at (T+t, f); repeat N times]

  P(t,f) = \sum_Z P(Z) \sum_T P(T|Z) P(t-T, f|Z)
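
A minimal numpy sketch of this generative equation: each patch basis is placed at every time location, weighted by P(z) P(T|z), which is a convolution in time. The toy shapes are assumptions.

    import numpy as np

    def shift_invariant_reconstruct(Pz, PTz, patches):
        """P(t,f) = sum_z P(z) sum_T P(T|z) P(t-T, f|z).
        Pz: (K,), PTz: (K, T_loc), patches: (K, t_patch, F)."""
        K, T_loc = PTz.shape
        _, t_patch, F = patches.shape
        out = np.zeros((T_loc + t_patch - 1, F))
        for z in range(K):
            for T in range(T_loc):
                # place patch z at location T, weighted by P(z) P(T|z)
                out[T : T + t_patch] += Pz[z] * PTz[z, T] * patches[z]
        return out

    # Toy example: 2 patch bases of size 3 x 4 (time x freq), 10 locations
    rng = np.random.default_rng(0)
    patches = rng.random((2, 3, 4))
    patches /= patches.sum(axis=(1, 2), keepdims=True)
    PTz = rng.random((2, 10)); PTz /= PTz.sum(axis=1, keepdims=True)
    Pz = np.array([0.6, 0.4])
    P_tf = shift_invariant_reconstruct(Pz, PTz, patches)
    print(P_tf.sum())  # ~1: still a distribution over (t, f)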

Page 105: Topic Models for Signal Processing - Carnegie Mellon University

Estimating Parameters

• The maximum likelihood estimate follows the fragmentation and counting strategy
• Two-step fragmentation
  – Each instance is fragmented into the super-pots
  – The fragment in each super-pot is further fragmented into each time-shift
    • Since one can arrive at a given (t,f) by selecting any T from P(T|Z) and the appropriate shift t−T from P(t,f|Z)

Page 106: Topic Models for Signal Processing - Carnegie Mellon University

Shift-invariant model: Update Rules

• Given data (spectrogram) S(t,f)
• Initialize P(Z), P(T|Z), P(t,f|Z)
• Iterate:

Model:

$$P(t,f,Z) = P(Z)\sum_T P(T \mid Z)\,P(t-T,\, f \mid Z)$$

Fragment:

$$P(Z \mid t,f) = \frac{P(t,f,Z)}{\sum_{Z'} P(t,f,Z')}
\qquad
P(T \mid Z,t,f) = \frac{P(T \mid Z)\,P(t-T,\, f \mid Z)}{\sum_{T'} P(T' \mid Z)\,P(t-T',\, f \mid Z)}$$

Count:

$$P(Z) = \frac{\sum_{t,f} P(Z \mid t,f)\,S(t,f)}{\sum_{Z'}\sum_{t,f} P(Z' \mid t,f)\,S(t,f)}
\qquad
P(T \mid Z) = \frac{\sum_{t,f} P(Z \mid t,f)\,P(T \mid Z,t,f)\,S(t,f)}{\sum_{T'}\sum_{t,f} P(Z \mid t,f)\,P(T' \mid Z,t,f)\,S(t,f)}$$

$$P(t,f \mid Z) = \frac{\sum_{T} P(Z \mid T,f)\,P(T-t \mid Z,T,f)\,S(T,f)}{\sum_{t',f'}\sum_{T} P(Z \mid T,f')\,P(T-t' \mid Z,T,f')\,S(T,f')}$$
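As a concrete reference, here is one way to implement the fragment-and-count iteration above in NumPy. This is a sketch under our own naming and shape conventions (patches of kt frames spanning the full frequency axis, time shifts only); it is not the tutorial's code.

```python
import numpy as np

def siplca_time(S, M, kt, n_iter=50, eps=1e-12, seed=0):
    """Sketch of the fragment-and-count (EM) updates above for the
    time-shift-invariant model. S: nonnegative spectrogram (nt, nf)."""
    nt, nf = S.shape
    nT = nt - kt + 1                                  # admissible shifts T
    rng = np.random.default_rng(seed)
    Pz = np.full(M, 1.0 / M)                          # P(Z)
    PT = rng.random((M, nT)); PT /= PT.sum(1, keepdims=True)           # P(T|Z)
    Pk = rng.random((M, kt, nf)); Pk /= Pk.sum((1, 2), keepdims=True)  # P(t,f|Z)
    for _ in range(n_iter):
        # Model: Lam(t,f) = sum_Z P(Z) sum_T P(T|Z) P(t-T, f|Z)
        Lam = np.zeros((nt, nf))
        for z in range(M):
            for T in range(nT):
                Lam[T:T + kt] += Pz[z] * PT[z, T] * Pk[z]
        R = S / (Lam + eps)
        # Fragment + count in one pass: c holds the expected counts
        # attributed to component z at shift T (posterior times S)
        nPT = np.zeros_like(PT)
        nPk = np.zeros_like(Pk)
        for z in range(M):
            for T in range(nT):
                c = Pz[z] * PT[z, T] * Pk[z] * R[T:T + kt]
                nPT[z, T] = c.sum()
                nPk[z] += c
        Pz = nPT.sum(1) / (nPT.sum() + eps)                # new P(Z)
        PT = nPT / (nPT.sum(1, keepdims=True) + eps)       # new P(T|Z)
        Pk = nPk / (nPk.sum((1, 2), keepdims=True) + eps)  # new P(t,f|Z)
    return Pz, PT, Pk
```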

Page 107: Topic Models for Signal Processing - Carnegie Mellon University

An Example

• Two distinct sounds occurring with different repetition rates within a signal

[Figure panels: input spectrogram; discovered “patch” bases; contribution of individual bases to the recording]

Page 108: Topic Models for Signal Processing - Carnegie Mellon University

Shift-Invariance in Two Dimensions

• Patterns may be substructures
– Repeating patterns that may occur anywhere

– Not just in the same frequency or time location
• More apparent in image data


Page 109: Topic Models for Signal Processing - Carnegie Mellon University

The two‐D Shift‐Invariant Model

[Figure: M super-pots, Z = 1 … M, each holding two sub-pots, P(T,F|Z) and P(t,f|Z)]

• Both sub-pots are distributions over (time, frequency) pairs
– One sub-pot represents the basic pattern
• The basis
– The other sub-pot represents the location


Page 110: Topic Models for Signal Processing - Carnegie Mellon University

The shift‐invariant model

[Figure: the generative draw – pick a super-pot Z; draw a location (T,F) from P(T,F|Z) and a (t,f) pair from P(t,f|Z); the ball lands at (T+t, f+F); repeat N times]

$$P(t,f) = \sum_Z \sum_{T,F} P(Z)\,P(T,F \mid Z)\,P(t-T,\, f-F \mid Z)$$
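Note that the inner sum over (T,F) in the equation above is exactly a 2-D convolution of the location distribution with the patch, so the model can be evaluated with standard signal-processing tools. A minimal sketch under our own names, assuming SciPy is available:

```python
import numpy as np
from scipy.signal import fftconvolve

# Each component of the mixture is location * patch (2-D convolution);
# array names are ours, not the tutorial's.
def model_2d(Pz, Ploc, Pk):
    # Pz: (M,) mixture weights; Ploc[z]: P(T,F|z); Pk[z]: P(t,f|z)
    return sum(Pz[z] * fftconvolve(Ploc[z], Pk[z], mode='full')
               for z in range(len(Pz)))
```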

Page 111: Topic Models for Signal Processing - Carnegie Mellon University

Two-D Shift Invariance: Estimation

• Fragment-and-count strategy
• Fragment into super-pots, but also into each T and F

– Since a given (t,f) can be obtained from any (T,F) 

Model:

$$P(t,f,Z) = P(Z)\sum_{T,F} P(T,F \mid Z)\,P(t-T,\, f-F \mid Z)$$

Fragment:

$$P(Z \mid t,f) = \frac{P(t,f,Z)}{\sum_{Z'} P(t,f,Z')}
\qquad
P(T,F \mid Z,t,f) = \frac{P(T,F \mid Z)\,P(t-T,\, f-F \mid Z)}{\sum_{T',F'} P(T',F' \mid Z)\,P(t-T',\, f-F' \mid Z)}$$

Count:

$$P(Z) = \frac{\sum_{t,f} P(Z \mid t,f)\,S(t,f)}{\sum_{Z'}\sum_{t,f} P(Z' \mid t,f)\,S(t,f)}
\qquad
P(T,F \mid Z) = \frac{\sum_{t,f} P(Z \mid t,f)\,P(T,F \mid Z,t,f)\,S(t,f)}{\sum_{T',F'}\sum_{t,f} P(Z \mid t,f)\,P(T',F' \mid Z,t,f)\,S(t,f)}$$

$$P(t,f \mid Z) = \frac{\sum_{T,F} P(Z \mid T,F)\,P(T-t,\, F-f \mid Z,T,F)\,S(T,F)}{\sum_{t',f'}\sum_{T,F} P(Z \mid T,F)\,P(T-t',\, F-f' \mid Z,T,F)\,S(T,F)}$$
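Because both the model and the posterior sums are convolutions and correlations, the 2-D fragment-and-count updates can be written compactly with SciPy. The following is a sketch under our own conventions (names, shapes, initialization), not the tutorial's implementation:

```python
import numpy as np
from scipy.signal import fftconvolve, correlate

def siplca_2d(S, M, kshape, n_iter=50, eps=1e-12, seed=0):
    """Sketch of the 2-D fragment-and-count updates above.
    S: nonnegative array (na, nb); kshape: patch size (kt, kf)."""
    na, nb = S.shape
    kt, kf = kshape
    rng = np.random.default_rng(seed)
    Pz = np.full(M, 1.0 / M)                           # P(Z)
    Ploc = rng.random((M, na - kt + 1, nb - kf + 1))   # P(T,F|Z)
    Ploc /= Ploc.sum((1, 2), keepdims=True)
    Pk = rng.random((M, kt, kf))                       # P(t,f|Z)
    Pk /= Pk.sum((1, 2), keepdims=True)
    for _ in range(n_iter):
        # Model: each component is location * patch (2-D convolution)
        Lam = sum(Pz[z] * fftconvolve(Ploc[z], Pk[z], mode='full')
                  for z in range(M))
        R = S / (Lam + eps)
        for z in range(M):
            # 'valid' correlations realize the sums over (t,f) and (T,F)
            # in the count equations above
            nloc = Ploc[z] * correlate(R, Pk[z], mode='valid')
            nk = Pk[z] * correlate(R, Ploc[z], mode='valid')
            Pz[z] *= nloc.sum()
            Ploc[z] = nloc / (nloc.sum() + eps)
            Pk[z] = nk / (nk.sum() + eps)
        Pz /= Pz.sum()
    return Pz, Ploc, Pk
```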

Page 112: Topic Models for Signal Processing - Carnegie Mellon University

Shift-Invariance: Comments

• P(T,F|Z) and P(t,f|Z) are symmetric

– Cannot control which of them learns the patterns and which the locations

• Answer: constraints
– Constrain the size of P(t,f|Z)

• I.e. the size of the basic patch

– Impose sparsity on the location distribution P(T,F|Z)
• Patches occur only occasionally, and their locations are inherently sparse

– Sparsity is obtained simply by applying the sparsity operation to the counts, as before
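The exact sparsity operation is defined earlier in the tutorial; as a stand-in, one simple and commonly used trick is to raise the counts to a power greater than one and renormalize, which sharpens peaks and pushes small values toward zero. A hedged sketch, with our own function name:

```python
import numpy as np

def sparsify(counts, beta=1.5, eps=1e-12):
    # Stand-in for the tutorial's sparsity operation: raising the counts
    # to a power beta > 1 and renormalizing sharpens peaks and suppresses
    # small values, biasing the estimate of P(T,F|Z) toward sparse
    # location maps. Applied to the location counts in each EM iteration.
    p = np.maximum(counts, 0.0) ** beta
    return p / (p.sum() + eps)
```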


Page 113: Topic Models for Signal Processing - Carnegie Mellon University

Shift‐Invariance in Many Dimensions

• The generic notion of “shift-invariance” can be extended to multivariate data
– Not just two-D data like images and spectrograms

• Shift invariance can be applied to any subset of variables


Page 114: Topic Models for Signal Processing - Carnegie Mellon University

Example: 2‐D shift invariance

• Sparse decomposition is employed in this example
– Otherwise the locations of faces (bottom right panel) are not precisely determined

Page 115: Topic Models for Signal Processing - Carnegie Mellon University

Example: 3‐D shift invariance

• The original figure has multiple handwritten renderings of three characters
– In different colours

• The algorithm learns the three characters and identifies their locations in the figure

[Figure panels: input data; discovered patches; patch locations]


Page 116: Topic Models for Signal Processing - Carnegie Mellon University

Beyond shift‐invariance: transform invariance

• The draws from the urns may not only be shifted, but also transformed
• The arithmetic remains very similar to the shift-invariant model
– We must now impose one of an enumerated set of transforms to (t,f), after shifting them by (T,F)
– In the estimation, the precise transform applied is an unseen variable (see the sketch below)
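A minimal sketch of what “an enumerated set of transforms” could look like, assuming (purely for illustration; the tutorial does not fix the set) that the transforms are the four 90-degree rotations of a square patch. The model simply gains one extra sum over the transform index r:

```python
import numpy as np
from scipy.signal import fftconvolve

# Illustrative only: four 90-degree rotations of a square patch as the
# enumerated transform set; names and shapes are our own convention.
def transform_invariant_model(Pz, Pr, Ploc, Pk):
    # Pz: (M,); Pr[z]: (4,) P(r|Z); Ploc[z]: P(T,F|Z); Pk[z]: square patch
    lam = 0.0
    for z in range(len(Pz)):
        for r in range(4):
            patch = np.rot90(Pk[z], k=r)       # apply transform r to (t,f)
            lam = lam + Pz[z] * Pr[z][r] * fftconvolve(Ploc[z], patch,
                                                       mode='full')
    return lam
# In estimation, r is the unseen variable: the posterior is computed over
# (Z, r, T, F) jointly, then fragmented and counted as before.
```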


Page 117: Topic Models for Signal Processing - Carnegie Mellon University

Example: Transform Invariance

• Top left: original figure
• Bottom left: the two bases discovered
• Bottom right:

– Left panel: positions of “a”
– Right panel: positions of “l”

• Top right: estimated distribution underlying the original figure

Page 118: Topic Models for Signal Processing - Carnegie Mellon University

Relationship to Other Techniques

• PCA/ICA: none
– The topic model does not impose orthogonality or independence constraints

• Although they can be imposed via other priors

• PARAFAC:
– Tensor models provide multivariate decompositions similarly to PARAFAC

• Minimizing KL divergence, rather than L2


Page 119: Topic Models for Signal Processing - Carnegie Mellon University

Relationship to other Techniques

• NMF:
– Basic PLCA is provably identical to NMF, within a scaling constant

– However, PLCA provides a handle for an additional statistical framework

• Entropic and cross-entropic priors
• Anti-priors

– Nevertheless, fundamentally similar
• With greater mathematical elegance and ease of algorithm development (a small sketch of the correspondence follows)
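The claimed equivalence can be made explicit. If V ≈ WH is a KL-NMF of a nonnegative matrix normalized to sum to one, the factors renormalize into the PLCA quantities P(f|Z), P(Z), and P(t|Z); the function name below is ours:

```python
import numpy as np

def nmf_to_plca(W, H):
    # If V ~ W @ H (KL-NMF of a matrix normalized to sum to 1), the
    # factors renormalize into the basic (non-shifted) PLCA quantities:
    Pf_z = W / W.sum(axis=0, keepdims=True)        # P(f|Z): normalized bases
    g = W.sum(axis=0) * H.sum(axis=1)              # unnormalized P(Z)
    Pz = g / g.sum()
    Pt_z = H / H.sum(axis=1, keepdims=True)        # P(t|Z): normalized activations
    return Pf_z, Pz, Pt_z
```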


Page 120: Topic Models for Signal Processing - Carnegie Mellon University

Over to Paris
