P. C. Pandey (pcpandey@ee.iitb.ac.in), EE Dept., IIT Bombay

P. C. Pandey, "Signal processing for persons with sensorineural hearing loss: Challenges and some solutions," AICTE Sponsored Faculty Development Programme on Signal Processing and Applications, Dept. of Electrical Engineering, VJTI, Mumbai, Feb. 23-27, 2015.

Part B: Sliding-band Dynamic Range Compression
(Ref: N. Tiwari & P. C. Pandey, NCC 2014, Paper No. 1569847357)


Overview
1. Introduction
2. Sliding-band Dynamic Range Compression
3. Offline & Real-time Implementations
4. Test Results
5. Summary & Conclusion

1. Introduction

Dynamic range compression
To present sounds comfortably within the limited dynamic range of the listener by amplifying low-level sounds without making high-level sounds uncomfortably loud.

Processing steps
- Input level estimation
- Gain calculation based on input level
- Multiplication of input with gain function
- Output resynthesis

Classification of compression schemes
- On the basis of signal level calculation: single-band or multiband
- On the basis of gain control method: feedback or feed-forward

Single-band dynamic range compression

Processing
- Gain dependent on the dynamically varying signal level.
- Parameters: compression threshold (Th), compression ratio (CR), attack & release time

Problems
- Compensation for frequency-dependent loudness growth not feasible.
- Power mostly contributed by low-frequency components, so the level of high-frequency components is controlled by the low-frequency components.
- Result: inaudibility of high-frequency components, distortions in the temporal envelope.
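The role of the threshold Th and compression ratio CR can be illustrated with a minimal sketch of a static level mapping. It assumes the common rule of unity slope below Th and slope 1/CR above it; the function names and the example values (Th = 80 dB, CR = 2) are illustrative assumptions, not taken from the slides, and attack/release dynamics are left out.

```python
def static_output_db(input_db, th_db, cr):
    """Static compression curve, all levels in dB.

    Assumed rule: output follows the input below the threshold,
    and rises by only 1/CR dB per input dB above it.
    """
    if input_db <= th_db:
        return input_db
    return th_db + (input_db - th_db) / cr


def static_gain_db(input_db, th_db, cr):
    """Gain in dB that maps the input level to the compressed output level."""
    return static_output_db(input_db, th_db, cr) - input_db


if __name__ == "__main__":
    # Hypothetical settings: Th = 80 dB, CR = 2.
    for level in (60, 80, 100, 120):
        out = static_output_db(level, th_db=80, cr=2)
        print(f"input {level} dB -> output {out} dB")
```

In a single-band scheme this one gain is applied to the whole signal, which is why a strong low-frequency component can pull down the gain seen by weak high-frequency components.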

Multiband dynamic range compression

General scheme of processing
- Spectral components of the input signal are divided into multiple bands, and the gain for each band is calculated on the basis of the signal power in that band.
- Parameters (band specific): compression threshold Th, compression ratio CR, attack & release time for detection.

- Lippmann et al. (1980): 16-channel compression. 9% improvement in recognition score over linear amplification.
- Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit fixed-point DSP processor. Less spectral distortion due to the smoothed frequency response of the FIR filter.
- Stone et al. (1999): Comparison of single and four-channel compression schemes & effect of varying CR, Th, and attack & release times. Intelligibility & quality tests showed no specific preference for schemes.
- Li et al. (2000): Wavelet-based compression (7-octave sub-band analysis using a wavelet filter bank & resynthesis after applying a logarithmic compression on the wavelet coefficients). Increase in intelligibility without introducing noticeable distortions.
- Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor. Taylor's series approximation used for the compression function to reduce computations in gain calculation.

Disadvantages of multiband compression
- Spurious spectral distortions
- Reduction in spectral contrasts and modulation depth
- Distortion in spectral shape of formants lying across the band boundaries
- Distortion of formant transitions across the adjacent bands
- Time-varying magnitude response without corresponding variation in the phase response, leading to quality degradation
- Audible distortions, perceptible discontinuities, adverse effect on the perception of certain speech cues

Example of distortion due to multiband dynamic range compression during spectral transition

[Figure: waveforms vs. Time (s). Swept sinusoidal input: constant amplitude, 125-250 Hz linearly swept frequency, 200 ms sweep duration. Processed output: multiband compression with 18 auditory critical bands, CR = 30, Ta = 6.4 ms, Tr = 192 ms.]

Investigation objective

Real-time dynamic range compression to compensate for the frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor.
- Low distortions
- Low computational complexity & memory requirement
- Low signal delay (algorithmic + computational)

Proposed scheme: Sliding-band dynamic range compression
- Proposed for significantly reducing the temporal and spectral distortions associated with the single-band and multiband compression currently used in hearing aids.
- Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application.

Investigations using offline & real-time implementations
- Selection of processing parameters
- Evaluation of the implementations: informal listening, PESQ measure

2. Sliding-band Dynamic Range Compression

Processing
Applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in an auditory critical bandwidth centered at it & in accordance with the specified hearing thresholds, compression ratios, and attack and release times.

- Short-time spectral analysis: windowing, zero-padding, DFT calculation
- Spectral modification: gain calculation, output spectrum calculation
- Resynthesis: IDFT calculation, windowing, overlap-add

Spectral modification:

- Pmc(k): power at upper comfortable listening level
- CR(k): compression ratio

Short-time spectral analysis: windowing (length L, shift S), zero-padding, N-point DFT
Resynthesis: N-point IDFT, overlap-add

Gain calculation

Auditory critical bandwidth (in Hz, with f in kHz; freq. sample = k, freq. = f):
$BW(k) = 25 + 75\,(1 + 1.4 f^2)^{0.69}$
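The band-power estimate that drives the gain can be sketched as follows: the critical bandwidth at the frequency of bin k sets how many neighbouring bins are summed. This is a minimal sketch, not the authors' code; the function names, the rounding of the half-bandwidth to whole bins, and the clipping of the band at the spectrum edges are assumptions made for illustration.

```python
def critical_bandwidth_hz(f_hz):
    """Auditory critical bandwidth in Hz; the formula takes f in kHz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def band_power(power_spectrum, k, fs, n_fft):
    """Sum of |X(j)|^2 over the critical band centered at bin k.

    power_spectrum holds |X(j)|^2 for bins 0 .. n_fft/2.
    Band edges are rounded to whole bins and clipped to the
    valid bin range (both are assumptions of this sketch).
    """
    bin_hz = fs / n_fft
    half_bins = int(round(critical_bandwidth_hz(k * bin_hz) / (2.0 * bin_hz)))
    lo = max(0, k - half_bins)
    hi = min(len(power_spectrum) - 1, k + half_bins)
    return sum(power_spectrum[lo:hi + 1])


if __name__ == "__main__":
    flat = [1.0] * 257  # flat test spectrum, bins 0..256 of a 512-point DFT
    for k in (0, 51, 200):
        print(k, band_power(flat, k, fs=10_000, n_fft=512))
```

Because the band is re-centered on every bin, adjacent bins see almost the same band power, so the gain varies smoothly across frequency instead of jumping at fixed band edges.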

Target gain calculation
- Power at upper comfortable listening level: $P_{mc}(k)$
- Compression ratio: $CR(k)$
- Input power: $P_{ic}(k)$, output power: $P_{oc}(k)$
- Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$

Compression relation
- dB scale: $[P_{oc}(k) / P_{mc}(k)]_{dB} = [P_{ic}(k) / P_{mc}(k)]_{dB} \,/\, CR(k)$
- linear scale: $P_{oc}(k) / P_{mc}(k) = [P_{ic}(k) / P_{mc}(k)]^{1/CR(k)}$

Target gain for kth spectral sample
$[G_t(k)]_{dB} = [1 - 1/CR(k)]\,[P_{mc}(k) / P_{ic}(k)]_{dB}$

Gain changed in steps from the previous value towards the target value, with settable attack and release times
- Fast attack: to keep the output level from exceeding the UCL during transients
- Slow release: to avoid the pumping effect or amplification of breathing
- Number of steps during attack phase = $s_a$; number of steps during release phase = $s_r$
- Target gain corresponding to min. input level = $G_{max}$; target gain corresponding to max. input level = $G_{min}$
- Gain ratio for attack phase: $\gamma_a = (G_{max} / G_{min})^{1/s_a}$
- Gain ratio for release phase: $\gamma_r = (G_{max} / G_{min})^{1/s_r}$

Gain for ith window & kth spectral sample
$G(i,k) = \max[\,G(i-1,k)/\gamma_a,\ G_t(i,k)\,]$ for $G_t(i,k) < G(i-1,k)$
$G(i,k) = \min[\,G(i-1,k)\,\gamma_r,\ G_t(i,k)\,]$ for $G_t(i,k) > G(i-1,k)$

Attack time $T_a = s_a S / f_s$, release time $T_r = s_r S / f_s$ [$f_s$ = sampling freq., S = window shift]

Implementation related challenges
- Modifications in the short-time magnitude spectrum without corresponding changes in the phase spectrum can cause audible distortions.
- Computational complexity: log or series approximation based gain calculations not suitable for use in sliding-band compression.

Solutions
- Analysis-synthesis using least-square error based signal estimation from modified STFT (Griffin & Lim, 1984): processing artifacts reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
- Look-up table based gain calculation: two-dimensional look-up table relating the input power with gain as a function of frequency.
  Permits the compression function most suited to compensate for the abnormal loudness growth.

3. Offline & Real-time Implementations

Implementation for offline processing
- Implementation using Matlab 7.10 for evaluating the proposed technique and the effect of processing parameters.
- Processing parameters: fs = 10 kHz, frame length = 25.6 ms (L = 256), overlap = 75% (S = 64), FFT size N = 512
- 2D look-up table for frequency-dependent compression based on a linear relation between input-dB and output-dB, with settable CR(k) and Pmc(k).
- Input range: 20 log intervals (trade-off: small gain increments vs. look-up table size). Look-up table with 256 × 20 entries.
- Attack and release times
  - sa = 1, Ta = 6.4 ms: fast attack to avoid uncomfortable level during transients
  - sr = 30, Tr = 192 ms: slow release to avoid pumping & amplification of breathing

Implementation for real-time processing
- Implementation on a 16-bit fixed-point DSP board to examine suitability of the technique for use in hearing aids.
- DSP chip: TI TMS320C5515
  - 16 MB memory space (320 KB on-chip RAM with 64 KB dual-access data memory)
  - Three 32-bit programmable timers
  - 4 DMA controllers, each with 4 channels
  - FFT hardware accelerator (up to 1024-point FFT)
  - Max. clock speed: 120 MHz
- DSP board: eZdsp
  - 4 MB on-board NOR flash for user program
  - Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8-192 kHz sampling
- Software development: C using TI's CCStudio ver. 4.0

- Input-output operations: DMA-based I/O with cyclic buffers
- ADC and DAC: one codec (left channel) with 16-bit quantization
- Processing parameters (same as for offline processing): fs = 10 kHz, L = 256, S = 64, N = 512
- Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary
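The parameter values quoted for the two implementations are mutually consistent, which a short calculation can confirm using Ta = sa S / fs and Tr = sr S / fs. The variable names below are illustrative, and reading the 256 rows of the look-up table as the N/2 frequency samples is an assumption of this sketch rather than something the slides state.

```python
FS_HZ = 10_000          # sampling frequency
L = 256                 # window length in samples
S = 64                  # window shift in samples
N = 512                 # FFT size
SA, SR = 1, 30          # attack and release steps
LEVEL_INTERVALS = 20    # log intervals of input level in the look-up table


def to_ms(samples, fs_hz=FS_HZ):
    """Duration of a number of samples in milliseconds."""
    return 1000.0 * samples / fs_hz


frame_ms = to_ms(L)                      # frame length
shift_ms = to_ms(S)                      # frame shift
overlap = 1.0 - S / L                    # fractional overlap
attack_ms = to_ms(SA * S)                # Ta = sa * S / fs
release_ms = to_ms(SR * S)               # Tr = sr * S / fs
lut_entries = (N // 2) * LEVEL_INTERVALS # assumed: one row per frequency sample

if __name__ == "__main__":
    print(f"frame {frame_ms} ms, shift {shift_ms} ms, overlap {overlap:.0%}")
    print(f"Ta {attack_ms} ms, Tr {release_ms} ms, LUT entries {lut_entries}")
```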

Implementation details

Data transfers & buffering operations (S = L/4)
- DMA cyclic buffers
  - 5-block S-sample input buffer
  - 2-block S-sample output buffer
- Pointers (incremented cyclically on DMA interrupt)
  - Current input block
  - Just-filled input block
  - Current output block
  - Write-to output block
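The input side of this buffering scheme can be simulated in a few lines. This is a sketch under stated assumptions, not the DSP code: the class name `InputRing` and its methods are hypothetical, and forming the L-sample frame from the just-filled block plus the three blocks before it is one plausible reading of why five blocks are used with S = L/4 (four blocks for the frame, one being filled by DMA).

```python
S = 64          # block length = window shift
L = 4 * S       # window length


class InputRing:
    """Five-block cyclic input buffer with a pointer advanced per DMA interrupt."""

    def __init__(self, n_blocks=5, block_len=S):
        self.blocks = [[0] * block_len for _ in range(n_blocks)]
        self.current = 0  # index of the block the DMA is filling

    def on_dma_interrupt(self, new_samples):
        """Simulate the DMA completing the current block, then advance cyclically."""
        self.blocks[self.current] = list(new_samples)
        self.current = (self.current + 1) % len(self.blocks)

    def just_filled(self):
        """Index of the most recently completed block."""
        return (self.current - 1) % len(self.blocks)

    def frame(self):
        """L-sample frame: the four most recently filled blocks, oldest first."""
        n = len(self.blocks)
        out = []
        for back in range(4, 0, -1):
            out.extend(self.blocks[(self.current - back) % n])
        return out


if __name__ == "__main__":
    ring = InputRing()
    for b in range(6):  # six blocks of consecutive sample indices
        ring.on_dma_interrupt(range(b * S, (b + 1) * S))
    f = ring.frame()
    print(len(f), f[0], f[-1])
```

The output side would work the same way with two blocks, one being played out by the DMA while the other is written by the overlap-add stage.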

Signal delay: algorithmic: 1 frame (25.6 ms); computational: frame shift (6.4 ms)

4. Test Results

Tests for verification and evaluation

Offline processing
- Verification of the compression technique for speech input with a large level variation and examination of the effect of different sets of processing parameters.
- Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
- Comparison of distortions introduced by different compression techniques during spectral transitions.

Real-time processing
- Comparison of the processed outputs from offline & real-time implementations: informal listening, PESQ measure (0-4.5).
- Signal delay & computational requirement.

Results from offline processing

Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 6.4 & 192 ms.

[Figure: panels vs. Time (s): Input waveform scaling factor; Unprocessed waveform; Processed, Tr = 6.4 ms, low Pmc; Processed, Tr = 192 ms, low Pmc; Processed, Tr = 6.4 ms, high Pmc; Processed, Tr = 192 ms, high Pmc.]

Processing of different speech materials with varying levels: no audible roughness or distortion during informal listening.

Distortions during spectral transitions: example of swept sinusoidal input.

[Figure: waveforms vs. Time (s): sliding-band compression output; multiband compression (18 auditory critical bands) output; single-band compression output; input. Input: constant amplitude, 125-250 Hz linearly swept frequency, 200 ms sweep duration. CR = 30, Ta = 6.4 ms, Tr = 192 ms.]

Results from real-time processing
- Informal listening: real-time output perceptually similar to the offline output
- PESQ for real-time w.r.t. offline: 3.5
- Signal delay = 36 ms
- Use of processing capacity: 41% (lowest acceptable clock: 50 MHz, max = 120 MHz)
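The real-time figures can be cross-checked with simple arithmetic. The sketch below assumes that the quoted 50 MHz follows from scaling the 120 MHz maximum clock by the 41% load, and that the algorithmic and computational delays are one frame and one frame shift; the remaining part of the 36 ms is not broken down in the slides, so it is only reported as a difference. All variable names are illustrative.

```python
MAX_CLOCK_MHZ = 120.0
CAPACITY_USED = 0.41
FS_HZ, L, S = 10_000, 256, 64
MEASURED_DELAY_MS = 36.0

# Clock needed if the load scales linearly with clock speed (an assumption).
required_clock_mhz = CAPACITY_USED * MAX_CLOCK_MHZ

algorithmic_ms = 1000.0 * L / FS_HZ      # one frame
computational_ms = 1000.0 * S / FS_HZ    # one frame shift
accounted_ms = algorithmic_ms + computational_ms
remaining_ms = MEASURED_DELAY_MS - accounted_ms  # not itemized in the slides

if __name__ == "__main__":
    print(f"required clock ~ {required_clock_mhz:.1f} MHz")
    print(f"accounted delay {accounted_ms:.1f} ms, remaining {remaining_ms:.1f} ms")
```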

[Figure: waveforms vs. Time (s): unprocessed; offline processed; real-time processed. Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 192 ms, low Pmc.]

5. Summary & Conclusions

Summary: Development & investigation of sliding-band compression scheme
- Realized using modified fixed-frame analysis-synthesis for low computational complexity & without distortions associated with phase discontinuities.
- Suitable for speech & non-speech audio, with provision for settable attack time, release time, & compression ratios.
- Implemented using a 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay, 41% use of processing capacity, indicating scope for combination with other processing techniques.

Conclusion: Sliding-band compression can be used to compensate for frequency-dependent loudness recruitment without introducing the distortions associated with single-band & multiband compression.

To be continued in the next part.

[Figure: static input-output characteristic of compression. Axes: Input dB SPL, Output dB SPL. Labels: Linear, Compression, Limiting, CR = 1, CR = 2, Th.]

[Figure: block diagram of multiband compression. Blocks: Input Signal; BPF-1, BPF-2, ..., BPF-n; Detector, Gain Calc. (with Th and CR), and Delay in each band; Output Signal.]

[Figure: block diagram of sliding-band compression. Main path: Input Signal, Short-time Spectral Analysis, Spectral Modification, Resynthesis Using Overlap-add, Output Signal. Spectral modification: Input Short-time Complex Spectrum; Band Samples of the kth Sample-Centered Band; Level Estimation; Target Gain Calc. (CR(k), Pmc(k)); Gain Calc. (Attack Time, Release Time); kth Sample; Modified Short-time Complex Spectrum.]
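The target-gain relation and the stepwise attack/release update from the gain-calculation slides can be combined into a short sketch. Gains are treated here as power ratios, following Gt(k) = Poc(k) / Pic(k); an amplitude multiplier would be the square root. The function names and the example range Gmax/Gmin = 100 are assumptions made for illustration, and the case of an unchanged target (which the slides do not list) simply keeps the previous gain.

```python
import math


def target_gain_db(p_ic, p_mc, cr):
    """[Gt]dB = (1 - 1/CR) * [Pmc / Pic]dB, with powers as linear values."""
    return (1.0 - 1.0 / cr) * 10.0 * math.log10(p_mc / p_ic)


def step_ratio(g_max, g_min, steps):
    """Per-frame gain ratio so that the full gain range is covered in `steps` frames."""
    return (g_max / g_min) ** (1.0 / steps)


def update_gain(g_prev, g_target, gamma_a, gamma_r):
    """One frame of the attack/release update for a single spectral sample."""
    if g_target < g_prev:      # level went up: reduce gain quickly (attack)
        return max(g_prev / gamma_a, g_target)
    if g_target > g_prev:      # level went down: raise gain slowly (release)
        return min(g_prev * gamma_r, g_target)
    return g_prev


if __name__ == "__main__":
    # Hypothetical gain range of 20 dB (power ratio 100), sa = 1, sr = 30.
    gamma_a = step_ratio(100.0, 1.0, 1)
    gamma_r = step_ratio(100.0, 1.0, 30)
    print(target_gain_db(p_ic=1e-2, p_mc=1.0, cr=2.0))
    g = 1.0
    for frame in range(30):
        g = update_gain(g, 100.0, gamma_a, gamma_r)
    print(g)
```

With sa = 1 the gain can fall across its whole range in a single frame shift, while sr = 30 spreads the recovery over 30 shifts, matching the fast-attack and slow-release settings.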

[Figure: block diagram of the real-time implementation. Codec: ADC, DAC. Processor: Input Cyclic Buffer, Input Windowing & FFT, Spectral Modification, IFFT & Output Windowing, Overlap-Add, Output Cyclic Buffer. Input Signal, Output Signal.]

[Figure: data transfers and buffering. Input Samples, DMA Input Cyclic Buffer (Input Block, Just Filled Block), Input Data Buffer (L samples), Mult. by Modified Hamming Window, L words followed by N - L words, FFT, IFFT, Mult. by Mod. Hamm. Window & Overlap-Add, Output Data Buffer (S samples), DMA Output Cyclic Buffer (Write to Block, Output Block), Output Samples.]
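The modified Hamming window used before the FFT and again before overlap-add can be checked numerically. The sketch below uses the form commonly attributed to Griffin & Lim (1984), a Hamming window with a half-sample offset and a scale factor chosen for a shift of L/4; treat the exact constants as an assumption, since the slides do not give the formula. The property of interest is that the squared, shifted windows sum to one, so that windowing at analysis and at synthesis followed by overlap-add returns an unmodified signal unchanged.

```python
import math


def modified_hamming(length, a=0.54, b=-0.46):
    """Modified Hamming window scaled for 75% overlap (shift = length / 4)."""
    scale = 1.0 / math.sqrt(4.0 * a * a + 2.0 * b * b)
    return [
        scale * (a + b * math.cos(2.0 * math.pi * (n + 0.5) / length))
        for n in range(length)
    ]


def overlapped_square_sum(window, shift):
    """Sum of squared window values over all overlapping frames, per position."""
    frames = len(window) // shift
    return [
        sum(window[n + m * shift] ** 2 for m in range(frames))
        for n in range(shift)
    ]


if __name__ == "__main__":
    w = modified_hamming(256)
    sums = overlapped_square_sum(w, 64)
    print(min(sums), max(sums))
```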