Audio data and metadata model for radio & television ...

39
Audio data and metadata model for radio & television automation systems Mostafa Tavallaei December 13, 2017 1

Transcript of Audio data and metadata model for radio & television ...

Page 1: Audio data and metadata model for radio & television ...

Audio data and metadata model

for radio & television

automation systems

Mostafa Tavallaei

December 13, 2017 1

Page 2: Audio data and metadata model for radio & television ...

Audio in Radio & TV:

2

1. Container

2. Interface

3. Codec

Page 3: Audio data and metadata model for radio & television ...

Container

3

• BWF as an old but useful format

• The specification of the BW64 (Broadcast Wave

64Bit) audio file format

• ADM, Audio Definition Model

Page 4: Audio data and metadata model for radio & television ...

BWF:

• The BWF file format (Broadcast Wave Format) ,

Recommendation ITU-R BS.1352

• The (BWF) is based on the “Microsoft WAVE audio file format”

which is a type of file specified in the Microsoft “Resource

Interchange File Format”, RIFF

• The basic building block of the RIFF file format, called a chunk,

contains a group of tightly related pieces of information.

4

Page 5: Audio data and metadata model for radio & television ...

BWF: • It consists of a chunk identifier, an integer value

representing the length in bytes and the information

carried.

• A RIFF file is made up of a collection of chunks.

• BWF file includes a <Broadcast Audio Extension> chunk

and a <Universal Broadcast Audio Extension> chunk Fig. 1

5

Page 6: Audio data and metadata model for radio & television ...

BWF:

6

Page 7: Audio data and metadata model for radio & television ...

7

• Maximum file size of less than 4Gbytes.

• No support for advanced multichannel audio.

• Inadequate support for technical metadata.

BW64 format aims to overcome these limitations, and

maintain as much compatibility as possible with the

Recommendation ITU-R BS.1352 format

BWF has a number of limitations, most notably:

Page 8: Audio data and metadata model for radio & television ...

BW64 RECOMMENDATION ITU-R BS.2088-0

8

• The BW64 (Broadcast Wave 64Bit) format is based on RIFF

“Long-form file format for the international exchange

of audio programme materials with metadata”

• enable to contain large multichannel files • efficient metadata including the Audio Definition Model

(ADM) specified in Recommendation ITU-R BS.2076.

Page 9: Audio data and metadata model for radio & television ...

BW64 format description:

9

A BW64 format file should start with the mandatory “WAVE” header and at least the following chunks:

<WAVE-form> ->

BW64(‘WAVE’

<ds64-ck> // ds64 chunk for 64-bit addressing

<fmt-ck> // Format of the audio signal: PCM/non-

PCM

<chna-ck> // chna chunk for ADM look-up

<axml-ck> // axml chunk for carrying XML metadata

<wave-data>) // sound data

Page 10: Audio data and metadata model for radio & television ...

BW64

10

• There is an increasing demand on Audio Definition

Model (ADM) metadata ITU-R BS.2076.

• includes a definition of the <axml> chunk for storing and

transferring metadata as XML

Page 11: Audio data and metadata model for radio & television ...

BW64 format description:

11

Existing chunks defined as part of the RIFF/WAVE standard:

The RIFF/WAVE standard uses a number of chunks that are already defined.

These are:

• <RIFF>

• <fmt>

• <data>

Page 12: Audio data and metadata model for radio & television ...

BW64 format description:

12

The RIFF/WAVE is a subset of the ITU-R BS.1352 format.

Recommendation ITU-R BS.1352 contains these additional

chunks:

• <bext>

• <ubxt>

These chunks will not be included in the BW64 format,

which provides a more flexible solution carrying broadcast

metadata.

Page 13: Audio data and metadata model for radio & television ...

BW64 format description:

13

New Chunks and structure in the BW64 format:

• <BW64>

• <ds64>

• <JUNK>

• <axml>

• <chna>

Using the <ds64> (data size 64) chunk to enable the use of files greater than 4 Gbyte in size

• The reason for the 4 Gbyte barrier is the 32-bit addressing in

RIFF/WAVE and BWF

• With 32 bits a maximum of 4294967296 bytes = 4 Gbyte can be

addressed

• The structure of a basic conventional RIFF/WAVE file is shown in Fig.

2, where the chunkSize fields are 32-bit numbers representing the

sizes of their chunks

Page 14: Audio data and metadata model for radio & television ...

BW64 format description:

14

FIGURE 2

Basic RIFF/WAVE file structure

Just changing the size of every

field in a BWF to 64-bit would

produce a file that is not

compatible with the standard

RIFF/WAVE format – an

obvious but important

observation

Page 15: Audio data and metadata model for radio & television ...

BW64 format description:

15

The approach is to define a new 64-bit based RIFF called BW64 that is identical to the original RIFF/WAVE format, except for the following changes:

• The ID ‘BW64’ is used instead of ‘RIFF’ in the first four bytes of the file

• A mandatory <ds64> chunk is added, which has to be the first chunk after the “BW64 chunk”

Page 16: Audio data and metadata model for radio & television ...

BW64 format description:

16

The ‘ds64’ chunk has two mandatory 64-bit integer values, which replace two 32-bit fields of the RIFF/WAVE format:

• bw64Size (replaces the size field of the <RIFF>

chunk)

• dataSize (replaces the size field of the <data> chunk)

Page 17: Audio data and metadata model for radio & television ...

BW64 format description:

17

For all two 32-bit fields of the RIFF/WAVE format

the following rule applies:

If the 32-bit value in the field is not “-1” (= FFFFFFFF

hex) then this 32-bit value is used.

If the 32-bit value in the field is “-1” the 64-bit value in

the ‘ds64’ chunk is used instead.

Page 18: Audio data and metadata model for radio & television ...

BW64 format description:

18

The complete structure of the BW64 file format is

illustrated in Fig. 3, where the chunkSize values for the

<BW64> and <data> chunks are set to −1, to allow them to

use 64-bit size values from the <ds64> chunk.

FIGURE 3

BW64 file structure

Page 19: Audio data and metadata model for radio & television ...

Achieving compatibility between RIFF/WAVE and BW64:

19

• Some production audio files will be smaller than 4 Gbyte and they should therefore stay in the short-form RIFF/WAVE format

• What about recording ??

• enable the recording application to switch from RIFF/WAVE to BW64 on the fly at the 4 Gbyte size limit, while the recording is still going on

Page 20: Audio data and metadata model for radio & television ...

Achieving compatibility between RIFF/WAVE and BW64:

20

• a <JUNK> chunk that is of the same size as a <ds64> chunk

• This reserved space has no meaning for short-form WAVE

• it will become the <ds64> chunk, if a transition to BW64 is necessary.

The diagram in Fig. 4 shows the <JUNK> placeholder chunk placed before the <fmt > chunk

FIGURE 4

File structure with JUNK chunk

Page 21: Audio data and metadata model for radio & television ...

BW64 format description:

21

• At the beginning of a recording, a BW64-aware

application will create a standard RIFF/WAVE with a

’JUNK’ chunk as the first chunk.

• While recording, it will check the RIFF and data sizes. If

they exceed 4 Gbyte, the application will:

• Replace the chunkID <JUNK> with <ds64> chunk. (This transforms the

previous <JUNK> chunk into a <ds64> chunk).

• Insert the RIFF size, 'data' chunk size and sample count in the <ds64>

chunk

• Set RIFF size, 'data' chunk size and sample count in the 32 bit fields to −1 =

FFFFFFFF hex

• Replaces the ID ‘RIFF’ with ‘BW64’ in the first four bytes of the file

• Continue with the recording.

Page 22: Audio data and metadata model for radio & television ...

Compatibility with Recommendation ITU-R BS.1352

22

• BWF carries broadcast metadata in the <bext> and <ubxt> chunks. These chunks have fixed length fields limited to the specified fields preventing any other broadcast

related metadata being carried. • The <axml> chunk in BW64 can carry any

XML metadata

Page 23: Audio data and metadata model for radio & television ...

XML (Extensible Markup Language)

• XML is similar to HTML

• Both XML and HTML contain markup symbols to

describe the contents of a page or file

• HTML, describes the content only in terms of how it is

to be displayed and interacted with.

23

Page 25: Audio data and metadata model for radio & television ...

ADM

Audio Definition Model, RECOMMENDATION ITU-R BS.2076

• This Recommendation describes the structure of a

metadata model that allows the format and content of

audio files to be reliably described.

• This model specifies how XML metadata can be

generated to provide the definitions of tracks in an audio

file

25

Page 26: Audio data and metadata model for radio & television ...

ADM

Audio Definition Model, RECOMMENDATION ITU-R BS.2076

• Each individual track within a file or stream should be

able to be correctly rendered, processed or distributed

according to the accompanying metadata

• Audio Definition Model is an open standard to ensure

compatibility across all systems

• The purpose of this model is to formalize the description

of audio.

26

It is not a format for carrying

audio!!

Page 27: Audio data and metadata model for radio & television ...

ADM Cooking Analogy

• To help explain what the ADM actually does, it may be useful to consider a cooking analogy.

• The recipe for a cake will contain a list of ingredients, instructions on how to combine those ingredients and how to bake the cake.

• The ADM is like a set of rules for writing the list of ingredients; it gives a clear description of each item, for example: 2 eggs, 200g butter, 200g sugar …

27

Page 28: Audio data and metadata model for radio & television ...

ADM Cooking Analogy

• The ADM provides the instructions for combining

ingredients but does not tell you how to do the

mixing or baking; in the audio world that is what the

renderer does.

• The ADM is in general compatible with wave-file

based formats such as BW64

28

Page 29: Audio data and metadata model for radio & television ...

ADM • This model will initially use XML as its specification

language

• When it is used with BW64, the XML can be embedded

in the <axml> chunk of the file

• The model is divided into two sections:

format part

content part

29

Page 30: Audio data and metadata model for radio & television ...

ADM • The format part describes the technical nature of the

audio so it can be decoded or rendered correctly.

• The content part describes what is contained in the

audio, so will describe things like the language of any

dialogue, the loudness and so on.

30

Page 31: Audio data and metadata model for radio & television ...

ADM • Some of the format elements may be defined before we

have any audio signals, whereas the content parts can

usually only be completed after the signals have been

generated.

• An audioContent element describes the content of one

component of a programme (e.g. background music)

31

Page 32: Audio data and metadata model for radio & television ...

AudioContent

32

• Sub-elements:

Page 33: Audio data and metadata model for radio & television ...

Audio Content

33

• Attributes:

• Sub-elements:

Page 34: Audio data and metadata model for radio & television ...

Audio Content

34

• Elements:

• Dialogue :

Page 35: Audio data and metadata model for radio & television ...

Audio Content

35

Page 36: Audio data and metadata model for radio & television ...

AudioProgramme • An audioProgramme element refers to a set of one or

more audioContents that are combined to create a full

audio programme.

• It contains start and end timecodes for the programme

which can be used for alignment with video timecodes.

• Loudness metadata is also included to allow the

programme’s loudness to be used.

36

Page 37: Audio data and metadata model for radio & television ...

Audio Content

37

Page 38: Audio data and metadata model for radio & television ...

Audio Content

38

Page 39: Audio data and metadata model for radio & television ...

39

THANK YOU!!

QUESTIONS??

[email protected]