
VAD in CLUE

Andy Pepperell


Need for VAD

• Want middle boxes to be able to switch video/audio without having to decode all audio
  – Not all MCUs are fully transcoding!

• Want to be able to determine “active” video for intra-room segment switching

• VAD algorithms must be consistent
  – Categorization of audio media
  – Calculation of energy values (dB)
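Consistency in the energy calculation matters because a middle box can only compare levels from different providers if every provider computes them the same way. As a minimal sketch, assuming 16-bit PCM frames and borrowing the convention of RFC 6464 (the RTP client-to-mixer audio level header extension), where a level is reported as 0 to 127 in -dBov and 127 means silence, a frame's RMS level could be computed as below. The function name and the use of RMS over a single frame are illustrative assumptions, not something the slides specify.

```python
import math
from typing import Sequence

# Assumption: input frames are signed 16-bit PCM samples.
FULL_SCALE = 32767


def audio_level_dbov(samples: Sequence[int]) -> int:
    """Return a frame's RMS level as a positive -dBov value: 0 (loudest) .. 127 (silence)."""
    if not samples:
        return 127  # treat an empty frame as silence
    # Mean of squared samples, normalised to the full-scale (overload) point.
    mean_square = sum((s / FULL_SCALE) ** 2 for s in samples) / len(samples)
    if mean_square == 0:
        return 127  # digital silence
    dbov = 10 * math.log10(mean_square)  # at or below ~0 dBov for in-range input
    # Report the magnitude in whole dB, clamped to the 0..127 range.
    return max(0, min(127, round(-dbov)))
```

For example, a full-scale square wave gives 0, a half-amplitude square wave gives 6 (about -6 dBov), and an all-zero frame gives 127.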


Single vs multiple audio streams

• Some consumers may receive a single, pre-mixed audio stream from the provider, whereas others may receive multiple separate streams in a linear array
  – Want parity between these two cases so that all consumers are equally capable
• Single speaker rooms should not be disadvantaged for segment switching


Capture set example

Media capture(s)  | Description
VC0, VC1, VC2     | 3-camera view of room
VC3               | 1-camera view of room
AC0, AC1, AC2     | 3-microphone version of room audio
AC3               | Single pre-mixed version of room audio

A consumer choosing to receive and render AC3 should be able to switch between VC0, VC1 and VC2 as well as one that chooses to receive AC0, AC1 and AC2 separately.
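For the consumer that receives AC0, AC1 and AC2 separately, one plausible switching rule is to show the video capture covering the same segment as the loudest audio capture. The sketch below assumes each audio capture carries a per-stream level in RFC 6464-style -dBov (0 loudest, 127 silence) and that ACn covers the same segment as VCn; the mapping table and function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical pairing: each spatial audio capture covers the same room segment
# as the video capture with the same index.
AUDIO_TO_VIDEO: Dict[str, str] = {"AC0": "VC0", "AC1": "VC1", "AC2": "VC2"}

SILENCE = 127  # assumed RFC 6464-style -dBov value for silence


def select_video_from_streams(levels: Dict[str, int]) -> Optional[str]:
    """Return the video capture paired with the loudest non-silent audio capture."""
    # Ignore captures with no paired video (e.g. AC3) and silent streams.
    active = {ac: lvl for ac, lvl in levels.items()
              if ac in AUDIO_TO_VIDEO and lvl < SILENCE}
    if not active:
        return None
    loudest = min(active, key=lambda ac: active[ac])  # lower -dBov means louder
    return AUDIO_TO_VIDEO[loudest]
```

A consumer that only receives AC3 has no per-segment levels to compare, which is the gap the active position information on the next slide is meant to fill.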


Details

• Idea is to include (potentially multiple) active position information with VAD, as well as an “overall” VAD for the audio stream
  – For example, if the leftmost segment of 3 is “loudest”, then the audio stream would indicate a specific audio VAD value at the active position
    • Centre of left segment might be 16 (/100)
• Active positions could be determined by other means, e.g. button press
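To make the active position idea concrete, the sketch below assumes positions are expressed on a 0 to 100 scale running left to right (as the slide's "16 (/100)" suggests) and that each active position carries a level in RFC 6464-style -dBov. A consumer receiving only the pre-mixed AC3 could then map the loudest active position onto one of VC0, VC1 and VC2. All class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ActivePosition:
    position: int  # assumed 0..100 scale, left (0) to right (100)
    level: int     # assumed -dBov: 0 loudest .. 127 silence


@dataclass
class VadInfo:
    overall_level: int  # "overall" VAD value for the whole (pre-mixed) stream
    active_positions: List[ActivePosition] = field(default_factory=list)


def segment_centre(index: int, num_segments: int) -> int:
    """Centre of segment `index` on the 0..100 scale, truncated (leftmost of 3 -> 16)."""
    return (2 * index + 1) * 100 // (2 * num_segments)


def position_to_segment(position: int, num_segments: int) -> int:
    """Map a 0..100 position to the index of the segment containing it."""
    if num_segments < 1 or not 0 <= position <= 100:
        raise ValueError("need num_segments >= 1 and 0 <= position <= 100")
    return min(num_segments - 1, position * num_segments // 100)


def select_video(vad: VadInfo, video_captures: Sequence[str]) -> Optional[str]:
    """Pick the video capture covering the loudest active position, if any."""
    if not vad.active_positions or not video_captures:
        return None
    loudest = min(vad.active_positions, key=lambda p: p.level)
    return video_captures[position_to_segment(loudest.position, len(video_captures))]
```

In the three-segment example, the centre values 16, 50 and 83 map back to segments 0, 1 and 2, so a provider that reports a segment centre as its active position lets the single-stream consumer reach the same decision as one comparing AC0, AC1 and AC2 directly.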


Messaging implications

• In the stream configure message from consumer to provider, the consumer should be able to specify VAD characteristics
  – Algorithm for the provider to use, if a choice is available
  – Maximum number of active positions to include in the provider’s audio

• Need to consider security of VAD information
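The two configure-message parameters could be handled on the provider side roughly as follows. This is a sketch under stated assumptions: the field names, the algorithm identifiers, the fallback to the provider's first-listed algorithm, and the "keep the loudest" truncation rule are all hypothetical, since the slides do not define a message format.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# (position on an assumed 0..100 scale, level in -dBov where lower is louder)
Position = Tuple[int, int]


@dataclass
class VadConfig:
    """Hypothetical VAD fields a consumer might carry in its stream configure message."""
    algorithm: Optional[str]   # requested algorithm id; None = provider's choice
    max_active_positions: int  # upper bound on active positions per audio stream


def choose_algorithm(config: VadConfig, offered: Sequence[str]) -> str:
    """Honour the consumer's request if the provider offers it, else use the provider's first choice."""
    if not offered:
        raise ValueError("provider offers no VAD algorithm")
    if config.algorithm is not None and config.algorithm in offered:
        return config.algorithm
    return offered[0]


def limit_active_positions(config: VadConfig, positions: List[Position]) -> List[Position]:
    """Keep at most max_active_positions entries, loudest (lowest -dBov) first."""
    return sorted(positions, key=lambda p: p[1])[:max(0, config.max_active_positions)]
```

Capping the number of active positions bounds the per-packet VAD overhead, and reporting fewer positions may also limit how much the VAD data reveals about the room, which ties into the security point above.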