May 3rd, 2010 Update


Monday, May 3rd

Outline

  Audio spatialization
  Performance evaluation (source separation)
  Source separation
  System overview
  Demonstration (system)
  Concentration measure and W-disjoint orthogonality
  Adaptive time-frequency representation (TFR)
  Demonstration (adaptive TFR)

Audio spatialization

  Audio spatialization: a spatial rendering technique that converts the available audio into the desired listening configuration.
  Analysis: separating the individual sources.
  Re-synthesis: re-creating the desired listener-end configuration.

Performance evaluation [1]

  ISR = Image to Spatial-distortion Ratio
  SIR = Source to Interference Ratio
  SAR = Source to Artifacts Ratio
  SDR = Source to Distortion Ratio

Performance evaluation

  The estimated source image can be decomposed as the true source image plus three error components: spatial distortion, interference, and artifacts.

Performance evaluation
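Once an estimated source image has been split into a true source image and the three error components, the four measures are energy ratios in dB. The sketch below assumes the energy-ratio definitions commonly used with this decomposition in the evaluation literature around [1]; the function names and the toy signals are hypothetical, and the harder step of actually computing the decomposition is not shown.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def ratio_db(num, den):
    """Energy ratio of two signals, in dB."""
    return 10.0 * math.log10(energy(num) / energy(den))

def add(*signals):
    """Sample-wise sum of equally long signals."""
    return [sum(vals) for vals in zip(*signals)]

def separation_metrics(s_img, e_spat, e_interf, e_artif):
    """ISR, SIR, SAR and SDR (dB) from an already known decomposition."""
    return {
        "ISR": ratio_db(s_img, e_spat),
        "SIR": ratio_db(add(s_img, e_spat), e_interf),
        "SAR": ratio_db(add(s_img, e_spat, e_interf), e_artif),
        "SDR": ratio_db(s_img, add(e_spat, e_interf, e_artif)),
    }

# Made-up components for illustration only (not data from the slides).
s_img = [1.0, -1.0, 1.0, -1.0]
e_spat = [0.1, -0.1, 0.1, -0.1]
e_interf = [0.01, 0.01, -0.01, -0.01]
e_artif = [0.001, -0.001, -0.001, 0.001]
metrics = separation_metrics(s_img, e_spat, e_interf, e_artif)
```

Each error term only enters the denominator of "its" ratio (and of SDR), which is why a method can score well on SIR while scoring poorly on SAR.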

Source separation [2,3]

  Source separation: obtaining estimates of the underlying sources from a set of observations at the sensors.
    Time-frequency transform
    Source analysis: estimation of mixing parameters
    Source synthesis: estimation of sources
    Inverse time-frequency representation

Mixing model

  Anechoic mixing model
  Mixtures: x_i
  Sources: s_j
  Under-determined (M < N), where M = number of mixtures and N = number of sources

  Figure: Anechoic mixing model. Audio is observed at the microphones with differing intensity and arrival times (because of propagation delays) but with no reverberation.
  Source: P. O'Grady, B. Pearlmutter and S. Rickard, Survey of sparse and non-sparse methods in source separation, International Journal of Imaging Systems and Technology, 2005

Mixtures

  Figure labels: Source 1, Source 2, Source 3; Mixtures (stereo)
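The anechoic model described above (each mixture is a sum of attenuated, delayed copies of the sources, with no reverberation) can be sketched for the under-determined stereo case, M = 2 and N = 3. The gains, whole-sample delays, impulse "sources" and function names below are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def delayed(signal, d):
    """Delay a signal by d whole samples (0 <= d < len), keeping its length."""
    return [0.0] * d + list(signal[:len(signal) - d])

def anechoic_mix(sources, gains, delays):
    """x_i[t] = sum_j gains[i][j] * s_j[t - delays[i][j]], with no reverberation."""
    n = len(sources[0])
    mixtures = []
    for gain_row, delay_row in zip(gains, delays):
        x = [0.0] * n
        for s, g, d in zip(sources, gain_row, delay_row):
            for t, v in enumerate(delayed(s, d)):
                x[t] += g * v
        mixtures.append(x)
    return mixtures

# Three toy sources (unit impulses at different times), two mixtures (stereo).
sources = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
gains = [[1.0, 1.0, 1.0],    # channel 1: reference, unit gain
         [0.5, 1.0, 2.0]]    # channel 2: relative attenuations
delays = [[0, 0, 0],         # channel 1: no delay
          [1, 0, 1]]         # channel 2: relative delays in samples
left, right = anechoic_mix(sources, gains, delays)
```

Real propagation delays are generally fractional, so a practical implementation would apply them in the frequency domain; whole-sample delays are used here only for readability.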

Time-frequency transform

  function TFRStereo
    Inputs: Mixture (stereo), Sampling frequency, DFT size, Window size, Hop size
    Outputs: Mixture TFRs
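A minimal sketch of what a function with this interface might do, assuming a Hann-windowed short-time Fourier transform applied to each channel. The name `tfr_stereo`, the window type and the naive pure-Python DFT are assumptions made for a self-contained illustration; the slides do not show the actual TFRStereo implementation.

```python
import cmath
import math

def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(ms * fs / 1000)

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def dft(frame, n_fft):
    """Naive DFT of a frame zero-padded to n_fft; keeps bins 0..n_fft/2."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(padded))
        for k in range(n_fft // 2 + 1)
    ]

def stft(x, win_len, hop, n_fft):
    """Return a list of frames, each a list of complex frequency bins."""
    w = hann(win_len)
    return [
        dft([x[start + k] * w[k] for k in range(win_len)], n_fft)
        for start in range(0, len(x) - win_len + 1, hop)
    ]

def tfr_stereo(left, right, win_len, hop, n_fft):
    """TFR of each channel of a stereo mixture."""
    return stft(left, win_len, hop, n_fft), stft(right, win_len, hop, n_fft)

# Settings quoted with the system demonstration: 50 ms window, 25 ms hop, 22050 Hz.
win_samples = ms_to_samples(50, 22050)   # 1102 samples
hop_samples = ms_to_samples(25, 22050)   # 551 samples

# Toy stereo signal, small sizes so the naive DFT stays fast.
left = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
right = [0.5 * v for v in left]
tfr_left, tfr_right = tfr_stereo(left, right, win_len=8, hop=4, n_fft=8)
```

With the quoted settings the window (1102 samples) is shorter than the DFT size (2048), so each frame is zero-padded before the transform, which is what `dft` does here.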

Source analysis (estimation of mixing parameters)

  function SourceAnalysis
    Inputs: Mixture TFRs
    Outputs: 2-D histogram, Mixing parameters
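One way to picture this step is the DUET-style approach from [2]: for every time-frequency bin, the ratio of the two mixture TFRs gives a relative attenuation and a relative delay, and a weighted 2-D histogram of these pairs shows one peak per source. The sketch below is an assumption-laden illustration, not the SourceAnalysis code itself: the function names, bin widths and weighting are hypothetical, and it only returns the strongest peak.

```python
import cmath
import math
from collections import defaultdict

def bin_parameters(x1, x2, omega):
    """Relative attenuation and delay (in samples) for one time-frequency bin."""
    r = x2 / x1
    return abs(r), -cmath.phase(r) / omega

def histogram_2d(tfr1, tfr2, n_fft, a_step=0.1, d_step=0.1):
    """Weighted histogram over (attenuation, delay) cells."""
    hist = defaultdict(float)
    for frame1, frame2 in zip(tfr1, tfr2):
        for k in range(1, len(frame1)):  # skip DC, where the delay is unobservable
            x1, x2 = frame1[k], frame2[k]
            if abs(x1) < 1e-12 or abs(x2) < 1e-12:
                continue
            a, d = bin_parameters(x1, x2, 2 * math.pi * k / n_fft)
            hist[(round(a / a_step), round(d / d_step))] += abs(x1 * x2)
    return hist

def strongest_peak(hist, a_step=0.1, d_step=0.1):
    """Centre of the histogram cell with the largest weight."""
    ia, idl = max(hist, key=hist.get)
    return ia * a_step, idl * d_step

# Toy single-source example: channel 2 is channel 1 scaled by 0.5 and delayed by 0.5 samples.
n_fft = 8
true_a, true_d = 0.5, 0.5
frame1 = [1 + 0j, 1 + 1j, 2 - 1j, 0.5j, -1 + 0j]
frame2 = [true_a * cmath.exp(-1j * (2 * math.pi * k / n_fft) * true_d) * v
          for k, v in enumerate(frame1)]
hist = histogram_2d([frame1], [frame2], n_fft)
a_hat, d_hat = strongest_peak(hist)
```

The delay estimate is only unambiguous while the phase difference stays within (-pi, pi), which limits the usable microphone spacing or frequency range.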

Source synthesis (estimation of sources)

  function SourceSynthesis
    Inputs: Mixing parameters, Mixture TFRs, Estimation technique (DUET/LQBP)
    Outputs: Estimated source masks, Estimated source TFRs
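For the DUET option, source synthesis can be sketched as binary masking: each time-frequency bin is assigned to the source whose attenuation and delay best explain the two mixture values, and the source TFR is rebuilt from the masked bins. This follows the likelihood-based assignment described in the DUET literature around [2]; the function name and toy data are hypothetical, and the LQBP option of [3] is not covered.

```python
import cmath
import math

def duet_separate(tfr1, tfr2, params, n_fft):
    """params: list of (attenuation, delay) per source.
    Returns (masks, sources), each indexed as [source][frame][bin]."""
    n_src = len(params)
    masks = [[[0] * len(f) for f in tfr1] for _ in range(n_src)]
    sources = [[[0j] * len(f) for f in tfr1] for _ in range(n_src)]
    for m, (frame1, frame2) in enumerate(zip(tfr1, tfr2)):
        for k, (x1, x2) in enumerate(zip(frame1, frame2)):
            omega = 2 * math.pi * k / n_fft
            # Mismatch between the bin and each source's mixing model.
            costs = [abs(a * cmath.exp(-1j * d * omega) * x1 - x2) ** 2 / (1 + a * a)
                     for a, d in params]
            j = costs.index(min(costs))
            a, d = params[j]
            masks[j][m][k] = 1
            # Combine both channels, aligned to channel 1.
            sources[j][m][k] = (x1 + a * cmath.exp(1j * d * omega) * x2) / (1 + a * a)
    return masks, sources

# Toy example: bins 0-1 belong to source 0, bins 2-3 to source 1.
n_fft = 8
params = [(0.5, 0.5), (2.0, -0.5)]
owner = [0, 0, 1, 1]
frame1 = [1 + 0j, 1 + 1j, 2 - 1j, 0.5j]
frame2 = [params[j][0] * cmath.exp(-1j * params[j][1] * 2 * math.pi * k / n_fft) * v
          for k, (j, v) in enumerate(zip(owner, frame1))]
masks, est = duet_separate([frame1], [frame2], params, n_fft)
```

The hard assignment is only sensible when at most one source is active in each bin, which is the W-disjoint orthogonality assumption.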



Inverse time-frequency transform

  function InverseTFR
    Inputs: Estimated source TFRs, Sampling frequency
    Outputs: Estimated sources
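The inverse step can be sketched as overlap-add: each frame is transformed back to the time domain, the frames are summed at their hop positions, and the result is divided by the summed analysis windows. The names, the Hann window and the naive DFT pair are assumptions for a self-contained round trip, not the InverseTFR implementation.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def dft(x):
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(v * cmath.exp(2j * math.pi * k * i / n) for k, v in enumerate(spec)).real / n
            for i in range(n)]

def stft(x, win_len, hop):
    w = hann(win_len)
    return [dft([x[s + k] * w[k] for k in range(win_len)])
            for s in range(0, len(x) - win_len + 1, hop)]

def inverse_tfr(frames, win_len, hop, length):
    """Overlap-add the inverse DFT of each frame, then undo the window sum."""
    w = hann(win_len)
    out = [0.0] * length
    norm = [0.0] * length
    for m, spec in enumerate(frames):
        frame = idft(spec)
        for k in range(win_len):
            out[m * hop + k] += frame[k]
            norm[m * hop + k] += w[k]
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]

# Round trip on a toy signal.
x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
frames = stft(x, win_len=8, hop=4)
y = inverse_tfr(frames, win_len=8, hop=4, length=len(x))
```

In a separation system the frames passed to the inverse would be the masked source TFRs rather than the unmodified mixture frames used in this round trip.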

  Figure labels: Orig. source 1, Orig. source 2, Orig. source 3; Source 1, Source 2, Source 3

Demonstration (system)

  Labels: Mixture, Original, Estimated

  Method  Measure   No. of sources (2)    No. of sources (3)
  DUET    SAR       15.55   8.17          13.71   9.09  13.23
          SDR       15.10   7.64          13.11   8.62  11.61
          SIR       25.71  19.83          22.06  21.20  23.65
          ISR       27.65  19.76          24.37  20.47  18.48
  LQBP    SAR       51.41  44.28           8.49   3.88   7.34
          SDR       51.33  44.21           7.77   3.37   5.58
          SIR       69.45  69.15          13.66   6.84  10.66
          ISR       76.28  62.32          17.03  10.73  24.62

  DFT size = 2048
  Window size = 50 ms
  Hop size = 25 ms
  Sampling frequency = 22050 Hz
  All values are in dB.

Concentration measure

  Requirement for source separation: W-disjoint orthogonality
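For reference, W-disjoint orthogonality (WDO) is usually stated as below in the DUET literature around [2]. The notation is assumed here rather than taken from the slides: $S_j(\tau,\omega)$ is the windowed time-frequency representation of source $j$.

```latex
S_j(\tau,\omega)\, S_k(\tau,\omega) = 0
\qquad \forall\, \tau,\ \omega, \quad j \neq k
```

In words: at most one source is active in any time-frequency bin, so each bin of a mixture can be attributed to a single source.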

Sparsity is an indicator of WDO [4]

Thus, a sparser TFR is expected to satisfy the WDO criterion to a greater extent.

Commonly used sparsity measures [5]: kurtosis, Gini index

Adaptive TFR

  Source separation demands WDO, and hence a sparse time-frequency representation (TFR).
  Some observations:
    Music/speech signals: different frequency components are present at different time instants.
    Different analysis window lengths provide different sparsity [4].
  Therefore, to obtain a sparser TFR:
    For each time instant, use the analysis window length that gives the highest sparsity [6].
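The two sparsity measures and the per-instant window choice can be sketched as follows. The Gini index uses the sorted-magnitude form common in the sparsity-measure literature around [5], and the kurtosis is the scale-invariant fourth-moment ratio of the magnitudes; both forms, the candidate lengths, the Hann window and all function names are assumptions, and the selection rule is only a plain "try every length, keep the sparsest" reading of the idea from [6].

```python
import cmath
import math

def gini_index(mags):
    """Gini index of non-negative magnitudes: 0 = uniform, near 1 = very sparse."""
    c = sorted(mags)
    n, total = len(c), sum(c)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * sum(v / total * (n - k - 0.5) / n for k, v in enumerate(c))

def kurtosis(mags):
    """Scale-invariant fourth-moment ratio; larger means more concentrated."""
    s2 = sum(v ** 2 for v in mags)
    s4 = sum(v ** 4 for v in mags)
    return len(mags) * s4 / (s2 * s2) if s2 else 0.0

def spectrum_mags(x, center, win_len, n_fft):
    """Magnitude spectrum of a Hann-windowed frame centred on `center`."""
    start = center - win_len // 2
    frame = [x[start + k] * (0.5 - 0.5 * math.cos(2 * math.pi * k / win_len))
             for k in range(win_len)]
    frame += [0.0] * (n_fft - win_len)
    return [abs(sum(v * cmath.exp(-2j * math.pi * b * i / n_fft) for i, v in enumerate(frame)))
            for b in range(n_fft // 2 + 1)]

def best_window(x, center, candidates, n_fft, measure=gini_index):
    """Pick the candidate window length whose spectrum is sparsest at `center`."""
    scores = {length: measure(spectrum_mags(x, center, length, n_fft))
              for length in candidates}
    return max(scores, key=scores.get), scores

# Toy signal and hypothetical candidate lengths (in samples).
x = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
chosen, scores = best_window(x, center=32, candidates=[8, 16, 32], n_fft=32)
```

Both measures are insensitive to overall scaling, so frames analysed with different window lengths can be compared without extra normalisation.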

Adaptive TFR

  function TFRStereo (modified)
    Inputs: Mixture (stereo), Sampling frequency, DFT size, Window size, Window size default, Concentration measure
    Outputs: Mixture TFRs, Adapted window sequence

Inverse adaptive TFR

  Constraint: the TFR should be invertible.
  Solution: select analysis windows such that they satisfy the constant overlap-add (COLA) criterion [7].

Analysis windows (COLA)
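The COLA criterion asks that copies of the window, shifted by the hop size, sum to a constant. A minimal check for a single fixed window is sketched below; the function names and tolerance are hypothetical, and the adaptive case, where neighbouring frames use different window lengths, needs a more careful construction that this sketch does not cover.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def overlap_sums(window, hop):
    """Sum of all hop-shifted window copies, at each position within one hop."""
    return [sum(window[k] for k in range(p, len(window), hop)) for p in range(hop)]

def is_cola(window, hop, tol=1e-9):
    """True if the shifted windows add up to the same value at every position."""
    sums = overlap_sums(window, hop)
    return max(sums) - min(sums) <= tol

half_overlap_ok = is_cola(hann(8), 4)   # 50% overlap
odd_hop_ok = is_cola(hann(8), 3)        # hop that does not divide the window evenly
```

A periodic Hann window with 50% overlap passes the check, which matches the 2:1 window-to-hop ratio used in the system demonstration (50 ms window, 25 ms hop).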

  function InverseTFR (modified)
    Inputs: Estimated source TFRs, Sampling frequency, Adapted window sequence, Window size default
    Outputs: Estimated sources

Demonstration (adaptive TFR)

  Labels: Original

  TFR                  Measure  Source 1  Source 2  Source 3
  ATFR (20:10:90 ms)   SAR      16.70      3.75      9.43
                       SDR      14.19      2.83      7.04
                       SIR      21.35     10.94     11.20
                       ISR      20.05      8.66     17.65
  TFR (60 ms)          SAR      15.81      3.20      8.66
                       SDR      13.60      2.46      6.25
                       SIR      22.62     11.78     10.61
                       ISR      19.62      9.54     19.24

  All values are in dB.

Demonstration (adaptive TFR)

  Labels: Original

  TFR                  Measure  Source 1  Source 2  Source 3
  ATFR (20:10:90 ms)   SAR      12.30      8.90      3.68
                       SDR      11.80      8.78      4.32
                       SIR      22.78     19.34     13.47
                       ISR      18.53     18.22     11.51
  TFR (60 ms)          SAR      12.13      8.79      3.18
                       SDR      11.76      8.69      3.76
                       SIR      22.76     18.92     16.24
                       ISR      19.55     16.32     12.16

  All values are in dB.

References

  [1] E. Vincent, R. Gribonval and C. Fevotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, 2006

  [2] A. Jourjine, S. Rickard and O. Yilmaz, Blind separation of disjoint orthogonal signals: demixing n sources from 2 mixtures, IEEE Conference on Acoustics, Speech and Signal Processing, 2000

  [3] R. Saab, O. Yilmaz, M. J. McKeown and R. Abugharbieh, Underdetermined anechoic blind source separation via lq basis pursuit with q