Multimedia Services: Audio
Sep-2015
Dani Gutiérrez Porset, Associate Professor
Communications Engineering
Eman ta zabal zazu
Thanks, Licences and Tools
● Thanks to the people and organizations who have taken part in free software and free knowledge projects, especially the Wikimedia Foundation and KDE
● This presentation is licensed under CC BY-SA 3.0 ES: http://creativecommons.org/licenses/by-sa/3.0/es/
● This presentation was made with KDE, LibreOffice, Inkscape, Gimp, Chromium and Firefox
Sources and References
● Images are from the Wikimedia Foundation unless another source is referenced. Logos and trademarks belong to their respective organizations
● Texts:
– Wikipedia pages and referenced articles and material
– “Guide to Voice and Video over IP” - Sun, Mkwawa, Jammeh, Ifeachor
– “Video over IP” - Wes Simpson
– “Computer Networking, a top-down approach” - Kurose, Ross
Index
● Introduction
● Codecs
● Speech
● Files, Containers and Formats
● Audio wires and connectors
Human ear
● Time domain:
– Not able to hear short-time signals (< 1 msec)
– Loud signals mask quieter signals that are near in time
● Frequency domain:
– Range of audible frequencies
– Loud signals at one pitch mask quieter signals at a nearby pitch
Introduction
Audio Applications
● Speech, VoIP
● CD, DVD sound
● Digital Audio/Video Broadcasting (DAB, DVB)
● Internet streaming
● Studio/transmitter link
● Theatrical movie presentation
● MIDI (similar to vector images): technical standard for musical instruments that describes the pitch, duration, velocity,... of notes. An output hardware device or software synthesizes the real audio
Audio analog signals
Waveform: time-domain view
Spectrum: frequency-domain view
Audacity screenshots. Dani Gutiérrez
Modulation families
● Analog bandpass channel:
– Analog baseband signal: analog modulation, e.g. AM, FM
– Digital signal: digital modulation, e.g. PSK, FSK, ASK, QAM
● Analog baseband channel:
– Analog baseband signal: pulse modulation, analog over analog, e.g. PAM, PWM
– Digital signal: digital baseband modulation, e.g. Unipolar, NRZ, Manchester
● Digital channel:
– Analog baseband signal: pulse modulation, analog over digital, e.g. PCM
Analog-over-digital modulations = Digitization
● Pulse-code modulation (PCM)
– Differential PCM (DPCM)
– Adaptive DPCM (ADPCM)
● Delta modulation (DM or Δ-modulation)
● Adaptive delta modulation (ADM) or continuously variable slope delta modulation (CVSDM)
● Delta-sigma modulation (ΔΣ)
● Pulse-density modulation (PDM), e.g. used in Super Audio CD (“DSD” trademark from Sony and Philips)
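As an illustration of the simplest scheme in this list, here is a minimal sketch of plain delta modulation: each sample is coded with one bit that tells the decoder to step its estimate up or down. The function names and the fixed step size are assumptions made for this example; practical systems (ADM, CVSDM) adapt the step.

```python
def delta_modulate(samples, step):
    """Encode each sample as one bit: 1 = step up, 0 = step down."""
    bits = []
    estimate = 0.0  # encoder tracks the same staircase the decoder will rebuild
    for s in samples:
        bit = 1 if s >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_demodulate(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


bits = delta_modulate([0.0, 0.5, 1.0, 0.0], step=0.5)
staircase = delta_demodulate(bits, step=0.5)
```

If the signal changes faster than one step per sample, the staircase cannot follow it (slope overload), which is the motivation for the adaptive variants.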
Pulse Code Modulation
1. Sampling (>= 2 x bandwidth of the analog signal). Errors depend on the frequency cutoff and clock accuracy
2. Quantization: uniform (LPCM = Linear PCM) or non-uniform (PCMA = A-law, PCMU = μ-law). Errors: granularity
3. Coding: number of bits per sample

Mode                  Bandwidth (Hz)  Sampling (kHz)
Narrowband (NB)       300–3400        8
Wideband (WB)         50–7000         16
Super-wideband (SWB)  50–14000        32
Fullband (FB)         20–20000        48
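The three PCM steps can be sketched in a few lines. This is only an illustration under simple assumptions: the input is normalized to [-1, 1], quantization is uniform (LPCM), and all function names are invented for the example.

```python
import math

# Mode table from the slide: (low Hz, high Hz, sampling rate in kHz).
MODES = {
    "NB": (300, 3400, 8),
    "WB": (50, 7000, 16),
    "SWB": (50, 14000, 32),
    "FB": (20, 20000, 48),
}


def satisfies_nyquist(high_hz, sampling_khz):
    """Step 1: the sampling rate must be at least twice the highest frequency."""
    return sampling_khz * 1000 >= 2 * high_hz


def quantize_uniform(x, bits):
    """Step 2: map x in [-1.0, 1.0] to a signed integer code (uniform / LPCM)."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return round(clipped * max_code)


def pcm_bitrate_kbps(sampling_khz, bits_per_sample, channels=1):
    """Step 3: bits per sample times samples per second gives the bitrate."""
    return sampling_khz * bits_per_sample * channels


# Sample a 1 kHz tone at the narrowband rate and quantize it to 8 bits.
fs = MODES["NB"][2] * 1000
samples = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(8)]
codes = [quantize_uniform(s, 8) for s in samples]
```

With these definitions, `pcm_bitrate_kbps(8, 8)` gives 64 kbps for narrowband speech at 8 bits per sample.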
Audio Codecs
● Aim of a codec: to convert and to compress, for storage and transmission over distinct media, e.g.
– AMR-NB: lossy, for speech
– Dolby Digital: lossy, for cinema and HDTV broadcast
– Dolby TrueHD: lossless, for home entertainment
● Conversion types:
– Analog to Digital (+ Digital to Analog)
– Digital to Digital
● Bitrate (kbits/s) at the codec output
http://en.wikipedia.org/wiki/Analog-to-digital_converter
http://en.wikipedia.org/wiki/Audio_coding_format
Codecs
Classifications of Audio Codecs
● Source:
– Nature of source: speech, music, cellular (2G GSM, 3G AMR)...
– Source signal bandwidth (NB, WB, SWB, FB) and sampling rate
● Processing:
– Compression techniques and algorithms (depend on the nature and bandwidth of the source signal): frame-based or sample-based, delay, CBR or VBR, number of channels,...
– Complexity (computation time)
● Result:
– Resulting bitrate (most between 4.8 and 16 kbps)
– One or more bitrates (adaptive)
– Lossless or lossy
– Latency or delay (inherent to each algorithm)
– Quality
● Legal & Costs:
– Creator (ITU-T, IETF, ETSI, Skype,...)
– Licenses
– Costs for encoder and player
http://en.wikipedia.org/wiki/Comparison_of_audio_coding_formats
Audio compression
● Based on psychoacoustics:
– Threshold of hearing (frequencies)
– Simultaneous masking
● Algorithm families used for lossy compression:
– Time domain: Linear predictive coding (LPC), mainly for speech: CELP, ACELP, VSELP, LPC, RPE-LTP,...
– Freq domain:
● Modified discrete cosine transform (MDCT), e.g. CELT
● Applied to the full band or to sub-bands (SBC): break the signal into freq bands and encode each one independently, e.g. MP3
– Some codecs combine both, e.g. G.718 uses CELP and MDCT
Compression ratio
Digital input stream → Codec → Output stream
● $f$ = sampling freq (kHz)
● $b_s$ = bits/sample
● $b$ = bitrate (kbps)
$$\text{Compression ratio (related to input)} = \frac{f \times b_s}{b}$$
$$\text{Compression ratio (related to 64 kbps)} = \frac{64}{b}$$
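A small worked example of both ratios. The function names are invented for this sketch, and the input (8 kHz, 16-bit linear PCM) and the 8 kbps output rate are assumed values chosen to resemble a typical narrowband speech codec.

```python
def ratio_vs_input(f_khz, bits_per_sample, bitrate_kbps):
    """Compression ratio related to the codec's own digital input stream."""
    return (f_khz * bits_per_sample) / bitrate_kbps


def ratio_vs_64k(bitrate_kbps):
    """Compression ratio related to the 64 kbps reference stream."""
    return 64 / bitrate_kbps


# Assumed example: 8 kHz sampling, 16 bits/sample in, 8 kbps out.
r_input = ratio_vs_input(8, 16, 8)   # 128 kbps in, 8 kbps out
r_ref = ratio_vs_64k(8)
```

The same codec therefore shows a ratio of 16 against its 128 kbps input but only 8 against the 64 kbps reference, so it matters which baseline a comparison table uses.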
Frame-based vs Sample-based
● Sample-based: one sample at a time, e.g. PCM and ADPCM
● Frame-based: more than one sample is taken, to study the correlation between nearby samples; the frame length can be fixed or variable. E.g. G.723.1 and G.729
Audio Codecs and Delays
● Some delays are more or less appropriate for certain types of transmission:
– Low latency: less compression, higher bitrate, e.g. for real time in VoIP or satellite communications
– High latency: higher compression, lower bitrate, e.g. for stored media, broadcasting or recording
● Origin of latencies:
– Processing, depends on hardware
– Inherent to each algorithm or codec (buffering is needed):
● Frame size (msec): related to the number of samples inside the frame
● Look-ahead time: needed when studying the correlation between the current and the next frame
● Delay calculations:
– In sender: Algorithm delay = Frame length + Look-ahead time
– In both: Codec delay = 2 x Frame length + Look-ahead time
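The two delay formulas can be checked with a short sketch. The frame and look-ahead values below are the commonly quoted figures for G.729 (10 ms, 5 ms) and G.723.1 (30 ms, 7.5 ms); they are assumptions brought in for the example, and the function names are invented.

```python
def algorithm_delay_ms(frame_ms, look_ahead_ms):
    """Delay at the sender: one full frame plus the look-ahead."""
    return frame_ms + look_ahead_ms


def codec_delay_ms(frame_ms, look_ahead_ms):
    """Delay over sender and receiver: two frames plus the look-ahead."""
    return 2 * frame_ms + look_ahead_ms


def samples_per_frame(frame_ms, sampling_khz):
    """Frame size in samples for a given sampling rate."""
    return frame_ms * sampling_khz


# (frame length in ms, look-ahead in ms); assumed, commonly quoted values.
CODECS = {"G.729": (10, 5), "G.723.1": (30, 7.5)}

delays = {
    name: (algorithm_delay_ms(f, la), codec_delay_ms(f, la))
    for name, (f, la) in CODECS.items()
}
```

Under these assumptions G.729 gives 15 ms and 25 ms, while G.723.1 gives 37.5 ms and 67.5 ms, which shows how a longer frame trades latency for compression.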
Audio Codecs: CBR vs VBR
● CBR: Constant bitrate. Older
● VBR: Variable bitrate:
– Frames of a file have distinct bitrates depending on the variability of the information, higher during more complex periods
– Better quality vs size, but more complex to encode
– Typical in lossless compression (e.g. FLAC, Apple Lossless) and in some lossy compressions (e.g. MP3, Opus, Vorbis, AAC)
– Encoding in single-pass (“on the fly”) or multipass (not for real time or live streaming)
– Input parameter: fixed quality, max/min bitrate, average bitrate, file size
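To make the difference concrete, the sketch below computes per-frame and average bitrates for a hypothetical VBR stream. The frame sizes and the 20 ms frame duration are made-up illustration values, not taken from any real encoder.

```python
def frame_bitrates_kbps(frame_bits, frame_ms):
    """Bitrate of each frame; bits per millisecond equals kbps."""
    return [bits / frame_ms for bits in frame_bits]


def average_bitrate_kbps(frame_bits, frame_ms):
    """Average bitrate of the whole stream."""
    return sum(frame_bits) / (frame_ms * len(frame_bits))


# Hypothetical VBR stream: the third frame is a more complex passage.
vbr_frames = [640, 1280, 2560, 640]   # bits per 20 ms frame
per_frame = frame_bitrates_kbps(vbr_frames, 20)
average = average_bitrate_kbps(vbr_frames, 20)
```

A CBR encoder targeting the same 64 kbps average would spend 1280 bits on every frame, including the simple ones.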
Audio Codecs and Channels
● Mix two (stereo) or more channels of similar information, reducing size while keeping high quality, instead of storing and sending independent channels
● Techniques (used in e.g. MP3, AAC, Vorbis) that may be combined for a signal:
– Simple Stereo (SS): independent channels. No compression
– Mid-side Stereo (MS):
● Middle = (L+R)/2, Side = (L-R)/2
● Can benefit if the signal is more “mono-like”, compressing the new “Middle channel”
– Intensity Stereo:
● Based on psychoacoustics, replaces both channels with a single signal plus directional information
● Better at low bitrates, worse at high bitrates
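The mid-side transform above is invertible, as this minimal sketch shows. The function names and sample values are invented for the example; real codecs apply the transform per frame or per frequency band before quantization.

```python
def ms_encode(left, right):
    """Middle = (L+R)/2, Side = (L-R)/2, sample by sample."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = Middle + Side, R = Middle - Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A nearly mono signal: the channels differ only in the last sample.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
decoded = ms_decode(mid, side)
```

For this "mono-like" input the side channel is almost all zeros, so it costs very few bits to encode, which is where the saving comes from.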
Audio Codecs comparisons
Source: http://www.opus-codec.org/comparison/
Example of Audio Codec: MP3
● Versions: MPEG-1, MPEG-2, MPEG-2.5 Audio Layer III
● The specification defines the decoder more precisely than the encoder. There are distinct encoder implementations, e.g. LAME
● Distinct bitrates and sampling rates depending on version
● Channels: 2 in MPEG-1 mode and up to 5.1 in MPEG-2
● Algorithms: MDCT Hybrid Subband
● Supports CBR and VBR
● Licensing and patent war
Other examples of typical Audio Codecs
● AAC (Advanced Audio Coding), from ISO and IEC. Part of MPEG-2 and MPEG-4. Designed to replace MP3. A patent license is required for coding, not for streaming or distributing content
● Vorbis, from the Xiph.Org Foundation: typically inside Ogg or WebM containers. Based on MDCT. Open, royalty-free
● Opus, from IETF. Suitable for interactive real-time use. Based on CELT and SILK. Open, royalty-free
http://en.wikipedia.org/wiki/Category:Audio_codecs
Speech case
● Distinct from music
● Interactive
● Voiced speech: harmonics (at frequencies that depend on whether the speaker is male or female)
● Unvoiced signal: like white noise
Speech
Speech Codecs
● Aim: intelligibility and speaker identification
● Specialized codecs, e.g.:
– Better for music: Vorbis
– Better for speech: GSM, Speex,...
● Distinct times:
– Speech frame: time to encode a frame of speech
– RTP packet voice duration: time to packetize and send to the network, e.g. for PCM: 20 msec
● Sometimes a VoIP tool provides several codecs that can be selected manually or automatically and changed during the conversation
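The 20 msec packet duration for PCM translates directly into payload size and network bandwidth. The sketch below assumes 64 kbps PCM and adds the usual uncompressed header sizes (RTP 12 bytes, UDP 8 bytes, IPv4 without options 20 bytes); link-layer overhead is deliberately left out, and the function names are invented.

```python
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC list or extensions
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # assumption: IPv4 without options


def payload_bytes(codec_kbps, packet_ms):
    """Voice bytes carried by one packet (kbps x ms = bits)."""
    return codec_kbps * packet_ms / 8


def packets_per_second(packet_ms):
    return 1000 / packet_ms


def ip_bandwidth_kbps(codec_kbps, packet_ms):
    """Bandwidth at the IP layer, headers included."""
    packet_bytes = (payload_bytes(codec_kbps, packet_ms)
                    + RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)
    return packet_bytes * 8 * packets_per_second(packet_ms) / 1000


pcm_payload = payload_bytes(64, 20)
pcm_rate = packets_per_second(20)
pcm_bandwidth = ip_bandwidth_kbps(64, 20)
```

Under these assumptions each packet carries 160 bytes of voice, 50 packets are sent per second, and the 64 kbps stream occupies 80 kbps at the IP layer.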
Speech Codecs: Techniques and Codec Comparison
● Compression: remove short-term correlation (~ 1 msec) and long-term correlation (~ 5 to 10 msec)
● Techniques: Waveform, Parametric (Vocoders for speech), Hybrid
Source: http://www-mobile.ecs.soton.ac.uk/speech_codecs/common_classes.html
G.711 Codec
● Reference codec for comparison
● G.711 = “PCM of voice frequencies”: 8k samples/sec x 8 bits/sample = 64 kbps
● Voice quantisation: non-uniform logarithmic quantisation, because of the nature of voice: lower-level speech signals have a higher PDF (Probability Density Function) than higher-level ones
● Variations:
– µ-law (North America, Japan): 14 bits to 8 bits
– A-law (Europe): 13 bits to 8 bits
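The effect of logarithmic quantisation can be illustrated with the continuous µ-law companding curve (µ = 255). This is a sketch only: G.711 itself uses a segmented, piecewise-linear approximation of this curve, and the 8-bit rounding below is a simplification introduced for the example.

```python
import math

MU = 255  # mu-law parameter


def mu_compress(x):
    """Continuous mu-law curve for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def to_code(value, bits=8):
    """Simplified signed quantizer used only for this illustration."""
    return round(value * (2 ** (bits - 1) - 1))


quiet = 0.01                                   # a low-level speech sample
uniform_code = to_code(quiet)                  # uniform 8-bit quantization
companded_code = to_code(mu_compress(quiet))   # quantize after companding
g711_bitrate_bps = 8000 * 8
```

A quiet sample that lands on code 1 with uniform quantization is spread up to code 29 after companding, so low-level speech, which is the most frequent, gets much finer resolution.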
Speech compression: Waveform-based technique
● Method: remove redundancy in the waveform and reconstruct it
● Complexity: low
● Results: 16 kbps to 64 kbps
● Examples of codecs:
– PCM
– ADPCM (Adaptive Differential PCM), for NB and WB
Speech compression: Parametric-based technique
● Method:
– Take segments of short periods (~20 msec) and classify them as voiced or unvoiced
– The voice parameters of each segment are obtained via speech analysis, encoded and sent
● Complexity: high
● Results: better compression ratios, bad quality
● Examples of codecs:
– LPC (Linear Prediction Coding): 1.2 to 4.8 kbps. Used for secure wireless communications
Speech compression: Hybrid-based technique
● “Analysis-by-Synthesis coding”
● Method: combines waveform and parametric techniques
● Examples of codecs:
– CELP (Code-Excited Linear Prediction): 4.8 to 16 kbps. Mobile/wireless/satellite communications achieving toll quality (MOS over 4.0)
– Other modern codecs: G.729, G.723.1, AMR, iLBC, SILK
Speech codec examples
Source: Cisco. Voice Over IP - Per Call Bandwidth Consumption
Other important speech codecs:
● SILK, from Skype. Based on LPC. Not royalty-free
● iSAC (internet Speech Audio Codec): wideband and super wideband, open, royalty-free
● iLBC (internet Low Bitrate Codec): narrowband, open, royalty-free for WebRTC
● AMR (Adaptive Multi-Rate) or AMR-NB (Narrow Band)
● AMR-WB (Wideband)
● Speex
http://en.wikipedia.org/wiki/Category:Speech_codecs
Audio files
● No. of channels: one (“mono”), two (“stereo”) or multichannel
● Compression and codecs:
– No compression: raw PCM,...
– Lossless: FLAC, Apple Lossless .m4a, WMA lossless,...
– Lossy: MP3, Vorbis, AAC,...
Files, Containers and Formats
http://en.wikipedia.org/wiki/Audio_file_format
Examples of Audio containers
● WAV:
– Instance of [the more general] RIFF
– Chunks:
● One or more chunks, e.g. 2 channels for stereo
● Can contain compressed audio data and non-audio data
– Metadata for each chunk:
● Encoding (typically LPCM uncompressed), No. of channels, bits/channel, sample rate
● Labels: artist, comments,...
● From video: Ogg, MPEG-4 Part 14 or MP4
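The WAV metadata listed above can be inspected with Python's standard `wave` module. The sketch below writes a short mono 16-bit LPCM tone into an in-memory buffer and reads the header back; the tone parameters and the helper name `write_tone` are arbitrary choices for the example.

```python
import io
import math
import struct
import wave


def write_tone(fileobj, freq_hz=440, n_frames=800, rate=8000):
    """Write a mono, 16-bit LPCM sine tone as a WAV stream."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)   # samples per second
        frames = b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
            for i in range(n_frames)
        )
        w.writeframes(frames)


buf = io.BytesIO()
write_tone(buf)

# The container starts with the RIFF identifier and the WAVE form type.
header = buf.getvalue()[:12]

buf.seek(0)
with wave.open(buf, "rb") as r:
    params = r.getparams()  # nchannels, sampwidth, framerate, nframes, ...
```

Here `params` reports one channel, 2 bytes per sample, 8000 Hz and 800 frames, the same fields the slide lists as per-chunk metadata.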
Audio physical formats
● CD:
– Reference: “Red book”
– Digital audio encoding:
● 2-channel
● Signed 16-bit
● Linear PCM
● 44,100 Hz
– Similar to but distinct from WAV: no headers, tracks that match the CD's sector sizes
● Other media and their associated modulations and lossless codecs:
– Super Audio CD (SACD): Pulse density modulation + Direct Stream Transfer
– DVD-Audio, Blu-ray, (HD DVD): Meridian Lossless Packing
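The CD encoding parameters above fix the data rate of the audio stream. The sketch below derives it; the sector figures (2352 audio bytes per sector, 75 sectors per second) are the commonly quoted Red Book values and are included here as an assumption for a cross-check.

```python
CHANNELS = 2
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100

# Assumed Red Book sector layout, used only as a cross-check.
SECTOR_AUDIO_BYTES = 2352
SECTORS_PER_SECOND = 75


def cd_bitrate_bps():
    """Raw LPCM bitrate of CD audio."""
    return CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ


def cd_bytes_per_second():
    return cd_bitrate_bps() // 8


def cd_bytes_per_minute():
    return cd_bytes_per_second() * 60


bitrate = cd_bitrate_bps()
per_second = cd_bytes_per_second()
per_minute = cd_bytes_per_minute()
```

This gives 1,411,200 bit/s, or 176,400 bytes per second, which equals 75 sectors of 2352 bytes; one minute of audio takes about 10.6 MB.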