Analysis of Audio Compression Algorithms Sanjeev Sharma.


Analysis of Audio Compression Algorithms

Sanjeev Sharma

What will be covered?

What are the audio file formats?
Why so many?
History of the most popular format (MP3)
NeoAudio Transcoder
MP3 file format explained
MP3 Algorithm/ Features/ Issues
VQF vs. MP3
Ogg Vorbis vs. MP3

Some Audio Formats

Uncompressed
– RIFF: Resource Interchange File Format (Windows)
– AIFF: Audio Interchange File Format (Mac)
– AU: Audio (Unix)

Compressed
– MP3: MPEG-1/2 Layer 3
– VQF: [Transform-domain Weighted Interleave] Vector Quantization Format
– Ogg Vorbis

Why so many formats?

Different hardware/ operating systems need different file structure/ device drivers

– Apple plays AIFF (uncompressed) AIFC (compressed)

– Sun or DEC (Unix) play ‘au’, ‘snd’

– PCs (Windows) play ‘RIFF’/‘wav’ (uncompressed), ‘wma’ (compressed audio; ‘wmv’ is the matching video format)

Why so many formats? (Cont’d)

Several companies came out with their own proprietary technologies
– InterWave by VocalTec (www.vocaltec.com)
– TrueSpeech by DSP Group, Inc (www.dspg.com)
– RealAudio by Real Networks (www.real.com)
– ToolVox by VoxWare (www.voxware.com)
– Perceptual Audio Coder (PAC) by Lucent (www.lucent.com)

Why so many formats? (Cont’d)

Proprietary Technologies (Cont’d)
– Adaptive Transform Audio Coding (ATRAC) by Sony (http://www.sony.net/Products/ATRAC3)
– TwinVQ or VQF from NTT/ Yamaha (http://www.yamaha-xg.com)
– Windows Media Audio by Microsoft (http://www.microsoft.com/windows/windowsmedia)

Why so many formats? (Cont’d)

Several companies collaborated to define non-proprietary open standards
– Specification available to all
– But different economics involved

In general
– The MP3 encoder is not free and has IPR restrictions
– The Ogg Vorbis encoder is free and open source

MPEG

Stands for Moving Picture Experts Group

MPEG-1
– First phase, started in 1988, finalized in 1992
– Three operating modes with increasing complexity and performance: Layer 1, Layer 2, Layer 3

MPEG-2
– Originally (1994) only added two extensions to MPEG-1: backwards-compatible multi-channel coding, and coding at lower sampling frequencies
– Later gave up backwards compatibility in favor of Advanced Audio Coding (AAC)

MPEG (Cont’d)

MPEG-3
– Created to define High Definition Television (HDTV) video coding
– Later rolled into MPEG-2 itself

MPEG-4
– Finished in late 1998
– Emphasis on new functionalities rather than compression efficiency: mobile/stationary user terminals, database access, communications, interactive services

MPEG (Cont’d)

MPEG-7
– Does NOT define a compression algorithm
– Content representation standard for multimedia information search, filtering, management and processing

MPEG Layers

Layer 1
– possesses the lowest complexity
– specifically targeted to applications where the complexity of the encoder plays an important role.

Layer 2
– requires a more complex encoder as well as a slightly more complex decoder.
– is able to suppress more redundancy in the signal and applies the psychoacoustic model in a more efficient way.

MPEG Layers (Cont’d)

Layer 3
– increased complexity
– targeted to applications needing the lowest data rates, through its suppression of redundancy in the signal and its improved extraction of feebly audible frequencies using its filter
– MP3 stands for MPEG-1/2 Layer 3 and not MPEG-3!!

Personal Car Stereo

Installed a Sony CDX-MP450X in my car hoping I would be able to enjoy my MP3s while driving
Burnt an MP3 CD to play on the car stereo (~150 songs)
Most of the MP3s were skipped, only some actually played
Investigated to find the difference
Turned out that the player was able to decode only high bit rate files
Installed free software (NeoAudio) on the computer to do the ‘transcoding’
– Conversion from one sampling rate and/or bit rate to another
‘On the fly’ converted files play, but with ‘clicks’
Intermediate conversion to wav and then transcoding to mp3 gave perfect results!

NeoAudio Transcoder

Transcoder Settings

Transcoding Options

– Choose Encoder
– Choose MPEG Version
– Choose Bitrate
– Choose Mode
– Choose Quality
– Choose Samplerate

Choose Encoder

Choose Encoder Version

Choose Bitrate

Choose Mode and Quality

Choose Samplerate

MP3 File Format

The file itself is split into frames
– One frame is an audio clip of 24 ms at 48 kHz sampling
Each frame has a 4-byte frame header
Constant Bit Rate (CBR) files have similar frame headers
Variable Bit Rate (VBR) files have different info in each frame header
– Lower bitrates may be used in frames where it will not affect quality
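The 24 ms figure can be checked from the number of samples in a frame. A minimal sketch, assuming the 1152 samples per frame used by MPEG-1 Layer III (that constant is not stated on the slide, and the function name is only illustrative):

```python
# Assumption: one MPEG-1 Layer III frame carries 1152 PCM samples per channel.
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Return the playing time of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


for rate in (48000, 44100, 32000):
    print(f"{rate} Hz -> {frame_duration_ms(rate):.2f} ms per frame")
```

At 48 kHz this gives the 24 ms quoted above; at lower sampling rates the same number of samples lasts longer.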

MP3 Frame Header

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
– A - Frame sync (all 11 bits set)
– B - MPEG Audio version ID (2 bit)
– C - Layer description (2 bit)
– D - Protection bit (1 bit)
– E - Bitrate index (4 bit)
– F - Sampling rate frequency index (2 bit)
– G - Padding bit (1 bit)
– H - Private bit (1 bit)
– I - Channel Mode (2 bit)
– J - Mode Extension (2 bit)
– K - Copyright (1 bit)
– L - Original (1 bit)
– M - Emphasis (2 bit)
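The bit layout above maps directly onto shifts and masks. The following is a rough sketch of a header splitter; the function and key names are made up for illustration, and the sample bytes FF FB 90 00 are just a commonly seen header pattern, not data from the talk:

```python
def parse_frame_header(header: bytes) -> dict:
    """Split a 4-byte MP3 frame header into the fields A..M listed above."""
    if len(header) != 4:
        raise ValueError("an MP3 frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")  # treat the header as one 32-bit word
    fields = {
        "sync":           (word >> 21) & 0x7FF,  # A: 11 bits
        "version_id":     (word >> 19) & 0x3,    # B: 2 bits
        "layer":          (word >> 17) & 0x3,    # C: 2 bits
        "protection_bit": (word >> 16) & 0x1,    # D: 1 bit
        "bitrate_index":  (word >> 12) & 0xF,    # E: 4 bits
        "sampling_index": (word >> 10) & 0x3,    # F: 2 bits
        "padding":        (word >> 9) & 0x1,     # G: 1 bit
        "private":        (word >> 8) & 0x1,     # H: 1 bit
        "channel_mode":   (word >> 6) & 0x3,     # I: 2 bits
        "mode_extension": (word >> 4) & 0x3,     # J: 2 bits
        "copyright":      (word >> 3) & 0x1,     # K: 1 bit
        "original":       (word >> 2) & 0x1,     # L: 1 bit
        "emphasis":       word & 0x3,            # M: 2 bits
    }
    if fields["sync"] != 0x7FF:
        raise ValueError("frame sync not found (first 11 bits must all be set)")
    return fields


print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

The raw field values still have to be looked up in the tables on the following slides to get a version, layer, bitrate and sampling rate.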

MPEG Audio version ID (B)

00 - MPEG Version 2.5 (unofficial)
01 - reserved
10 - MPEG Version 2 (ISO/IEC 13818-3)
11 - MPEG Version 1 (ISO/IEC 11172-3)

Layer Description (C)

00 - reserved
01 - Layer III
10 - Layer II
11 - Layer I

Protection Bit (D)

0 - Protected by CRC (16-bit CRC follows header)
1 - Not protected

Bitrate Index (E)

Bitrates in kbps (V = MPEG version, L = layer)

bits   V1,L1   V1,L2   V1,L3   V2,L1   V2,L2&L3
0000   free    free    free    free    free
0001   32      32      32      32      8
0010   64      48      40      48      16
0011   96      56      48      56      24
0100   128     64      56      64      32
0101   160     80      64      80      40
0110   192     96      80      96      48
0111   224     112     96      112     56
1000   256     128     112     128     64
1001   288     160     128     144     80
1010   320     192     160     160     96
1011   352     224     192     176     112
1100   384     256     224     192     128
1101   416     320     256     224     144
1110   448     384     320     256     160
1111   bad     bad     bad     bad     bad
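In code, the bitrate table above is a plain lookup. A small sketch with the values copied from the table; the column keys and the function name are illustrative choices, not part of any library:

```python
# Bitrates in kbps for index 0001..1110, copied from the table above.
BITRATES_KBPS = {
    "V1,L1":    [32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    "V1,L2":    [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    "V1,L3":    [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    "V2,L1":    [32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    "V2,L2&L3": [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}


def bitrate_kbps(column: str, index: int):
    """Look up a 4-bit bitrate index; returns None for the 'free' format."""
    if index == 0b0000:
        return None  # "free": bitrate is not taken from the table
    if not 0b0001 <= index <= 0b1110:
        raise ValueError("bitrate index 1111 is not allowed")
    return BITRATES_KBPS[column][index - 1]


print(bitrate_kbps("V1,L3", 0b1001))  # MPEG-1 Layer III, index 1001
```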

Sampling rate frequency index (F)

Sampling rates in Hz

bits   MPEG1     MPEG2     MPEG2.5
00     44100     22050     11025
01     48000     24000     12000
10     32000     16000     8000
11     reserved  reserved  reserved
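The sampling rate depends on both the version ID (B) and the sampling index (F). A short sketch that combines the two tables; the dictionary and function names are hypothetical:

```python
# Sampling rates in Hz, keyed by the 2-bit version ID (field B).
SAMPLE_RATES_HZ = {
    0b11: (44100, 48000, 32000),  # MPEG Version 1
    0b10: (22050, 24000, 16000),  # MPEG Version 2
    0b00: (11025, 12000, 8000),   # MPEG Version 2.5 (unofficial)
}


def sample_rate_hz(version_id: int, sampling_index: int) -> int:
    """Translate fields B and F of the frame header into a rate in Hz."""
    if version_id not in SAMPLE_RATES_HZ:
        raise ValueError("version ID 01 is reserved")
    if sampling_index == 0b11:
        raise ValueError("sampling index 11 is reserved")
    return SAMPLE_RATES_HZ[version_id][sampling_index]


print(sample_rate_hz(0b11, 0b00))
```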

Padding Bit (G), Private Bit (H)

Padding Bit (G)
– 0 - frame is not padded
– 1 - frame is padded with one extra slot
– Padding is used to fit the bit rates exactly

Private Bit (H)
– May be freely used for specific needs of an application
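Why padding is needed becomes clearer from the frame length. The sketch below uses the usual MPEG-1 Layer III relation, frame length = 144 × bitrate / sampling rate + padding, where one slot is one byte; that formula is an assumption brought in from outside the slides, and the function name is illustrative:

```python
def frame_length_bytes(bitrate_kbps: int, sample_rate_hz: int, padding_bit: int) -> int:
    """Length of one MPEG-1 Layer III frame in bytes (header included).

    Assumes the common formula 144 * bitrate / sample_rate, rounded down,
    plus one extra byte (slot) when the padding bit is set.
    """
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding_bit


# 128 kbps at 44.1 kHz needs 417.96 bytes per frame on average, so an
# encoder mixes unpadded and padded frames to hit the bit rate exactly.
print(frame_length_bytes(128, 44100, 0), frame_length_bytes(128, 44100, 1))
# At 48 kHz the division is exact and no padding is required.
print(frame_length_bytes(128, 48000, 0))
```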

Channel Mode (I)

00 - Stereo
01 - Joint stereo (Stereo)
10 - Dual channel (2 mono channels)
11 - Single channel (Mono)

Mode Extension (J)

Applicable to Joint Stereo only

Complete frequency range of MPEG file is divided into 32 subbands

For Layer I & II these two bits determine frequency range (bands) where intensity stereo is applied.

For Layer III these two bits determine which type of joint stereo is used (intensity stereo or Middle/Side stereo).

Value   Layer 1/2 Intensity Stereo   Layer 3 Intensity Stereo   Layer 3 MS Stereo
00      Bands 4 to 31                Off                        Off
01      Bands 8 to 31                On                         Off
10      Bands 12 to 31               Off                        On
11      Bands 16 to 31               On                         On
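Read as code, the table says that for Layer III each of the two bits is an independent switch, while for Layers I and II the value selects the first subband that uses intensity stereo. A small sketch under that reading (function names are hypothetical):

```python
def layer3_joint_stereo(mode_extension: int) -> dict:
    """Decode the 2-bit mode extension for Layer III joint stereo."""
    if not 0 <= mode_extension <= 3:
        raise ValueError("mode extension is a 2-bit field")
    return {
        "intensity_stereo": bool(mode_extension & 0b01),  # low bit
        "ms_stereo": bool(mode_extension & 0b10),         # high bit
    }


def layer12_intensity_bands(mode_extension: int) -> range:
    """Subbands (out of 0..31) coded with intensity stereo in Layer I/II."""
    if not 0 <= mode_extension <= 3:
        raise ValueError("mode extension is a 2-bit field")
    first_band = 4 * (mode_extension + 1)  # 4, 8, 12 or 16
    return range(first_band, 32)


print(layer3_joint_stereo(0b10))
print(list(layer12_intensity_bands(0b01))[:3])
```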

Copyright (K), Original (L), Emphasis (M)

Copyright (K)
– 0 - Audio is not copyrighted
– 1 - Audio is copyrighted

Original (L)
– 0 - Copy of original media
– 1 - Original media

Emphasis (M)
– Tells the decoder to 're-equalize' the sound after a Dolby-like noise suppression
– 00 - none
– 01 - 50/15 µs
– 10 - reserved
– 11 - CCITT J.17

Perceptual Audio Coder (PAC)

Original work attributed to Lucent (http://www.bell-labs.com/org/1133/Research/SpeechAudioCoding/audio.html)

Became the framework of MPEG-2 encoders

MP3 Encoder/ Decoder

[Block diagram]
Encoder: Audio In, Analysis Filterbank, Perceptual Model, Quantization & Encoding, Encoding of bitstream, Bitstream Out
Decoder: Bitstream In, Decoding of bitstream, Inverse Quantization, Synthesis Filterbank, Audio Out

MP3 Encoder/ Decoder (Cont’d)

Filter Bank
– Encoder decomposes the input signal into subsampled spectral components (time/frequency domain)
– Forms an analysis/synthesis system in combination with the decoder filterbank

Perceptual Model
– Works on either the time domain signal or the analysis filterbank output
– Computes an estimate of the actual (time and frequency dependent) masking threshold
– Uses rules known from psychoacoustics
– Psychoacoustics: relationship between what arrives at the ear and what we hear

MP3 Encoder/ Decoder (Cont’d)

Quantization and coding
– Spectral components are quantized and coded, keeping the quantization noise below the masking threshold

Encoding of bitstream
– Bitstream formatter assembles the bitstream
– Bitstream consists of quantized and coded spectral coefficients, plus side information such as bit allocation information
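To make "keeping the quantization noise below the masking threshold" concrete, here is a toy sketch. It is not the MP3 quantizer (which is non-uniform and works per scale factor band); it assumes a plain uniform quantizer, whose noise power is commonly modelled as step²/12, and all names are illustrative:

```python
import math


def step_for_threshold(mask_power: float) -> float:
    """Largest uniform step whose modelled noise power (step**2 / 12)
    does not exceed the masking threshold for this band."""
    return math.sqrt(12.0 * mask_power)


def quantize(values, step):
    """Map each spectral value to the index of the nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Decoder side: rebuild approximate spectral values from the indices."""
    return [i * step for i in indices]


band = [7.0, -2.0, 13.0]          # made-up spectral values for one band
step = step_for_threshold(3.0)    # made-up masking threshold (power units)
indices = quantize(band, step)
print(step, indices, dequantize(indices, step))
```

A louder masker raises the threshold, which allows a coarser step and therefore smaller indices that cost fewer bits.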

MPEG Flexibility

Flexibility needed to fit into several applications

Flexibility achieved with
– Different operating modes: single channel, dual channel (two independent channels), stereo (no joint stereo coding), joint stereo
– Different sampling frequencies: 32 kHz, 44.1 kHz, 48 kHz (MPEG-1); half of the above (MPEG-2); one quarter of the MPEG-1 rates (MPEG-2.5, proprietary Fraunhofer extension)

MPEG Flexibility (Cont’d)

Flexibility achieved with different bit rates
– Bitrate defines the compression ratio
– Min 32 kbps to max 320 kbps for MPEG-1
– Min 8 kbps to max 160 kbps for the MPEG-2 Low Sampling Frequencies extension (LSF)
– Variable bit rate also possible (each segment has its own bit rate)
– Sweet spot: 128 kbps for a stereo signal at 48 kHz sampling rate. Bit rates higher than this improve quality very slowly; bit rates lower than this degrade quality very fast
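"Bitrate defines the compression ratio" can be worked out with simple arithmetic. The sketch below takes CD-quality PCM (44.1 kHz, 16 bits, 2 channels) as the uncompressed reference, which is an assumption chosen for illustration; the function names are hypothetical:

```python
def pcm_bitrate_kbps(sample_rate_hz: int = 44100,
                     bits_per_sample: int = 16,
                     channels: int = 2) -> float:
    """Bit rate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(encoded_kbps: float, **pcm_params) -> float:
    """How many times smaller the encoded stream is than the PCM source."""
    return pcm_bitrate_kbps(**pcm_params) / encoded_kbps


print(f"PCM reference: {pcm_bitrate_kbps():.1f} kbps")
for kbps in (320, 128, 96, 32):
    print(f"{kbps} kbps -> about {compression_ratio(kbps):.1f}:1")
```

With these assumptions, 128 kbps corresponds to roughly 11:1 compression.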

MP3 Quality

Not all encoders are created equal

Quantization and encoding block forms
– an inner control loop that adjusts the quantization step to fit the available Huffman codes (rate loop)
– an outer control loop with the perceptual block that keeps quantization noise under the masking threshold (noise control loop)

Hence the encoder needs to be ‘tuned’ for different bitrates
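The inner (rate) loop can be pictured as "make the step coarser until the frame fits its bit budget". The following toy sketch shows only that idea: the bit estimate is a crude stand-in for real Huffman tables, the growth factor is arbitrary, and the outer noise control loop is left out entirely. All names and numbers are made up for illustration.

```python
def estimate_bits(indices) -> int:
    """Crude stand-in for Huffman coding: 1 bit for a zero, otherwise
    one sign bit plus the number of magnitude bits."""
    return sum(1 if i == 0 else abs(i).bit_length() + 1 for i in indices)


def rate_loop(spectrum, bit_budget: int, step: float = 1.0, growth: float = 1.25):
    """Coarsen the quantizer step until the estimated size fits the budget."""
    if bit_budget < len(spectrum):
        raise ValueError("budget too small: every value costs at least 1 bit")
    while True:
        indices = [round(x / step) for x in spectrum]
        if estimate_bits(indices) <= bit_budget:
            return step, indices
        step *= growth  # coarser step -> smaller indices -> fewer bits


spectrum = [40.0, -12.0, 5.0, 0.5]   # made-up spectral values
step, indices = rate_loop(spectrum, bit_budget=12)
print(step, indices, estimate_bits(indices))
```

A real encoder then checks the resulting noise against the masking threshold in the outer loop and may re-run the rate loop, which is where encoder tuning comes in.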

MP3 IPR Issues

MPEG is an open standard
But it is informative only
The ISO approved standard is based on work by the Fraunhofer Institute, which is protected by several patents.
In September 1998, the Fraunhofer Institute sent a letter to several developers of "free" ISO-source based encoders saying that all developers and publishers of MPEG audio Layer 3 (MP3) encoders based on ISO source must pay a license fee to Fraunhofer.
Fraunhofer joined with Thomson Multimedia (AKA RCA) to create a joint patent portfolio: mp3licensing.com

Sample MP3/MP3 Patents

– Digital coding process
– Digital adaptive transformation coding method
– Process for the detecting of errors in the transmission of frequency-coded digital signals
– Process for reducing frequency interlacing during acoustic or optical signal transmission and/or recording
– Method for reducing data in the transmission and/or storage of digital signals of several dependent channels
– Process for reducing data in the transmission and/or storage of digital signals of several interdependent channels
– Etc.

LAME

LAME Ain’t an MP3 Encoder
LAME is an educational tool to be used for learning about MP3 encoding
The goal of the LAME project is to use the open source model to improve the psychoacoustics, noise shaping and speed of MP3

Free Software?

Several free programs such as NeoAudio use the LAME plug-in, despite the cryptic note on the official homepage (http://www.mp3dev.org)
– “Using the LAME encoding engine (or other mp3 encoding technology) in your software may require a patent license in some countries.”

NeoAudio and LAME are open source software under the GNU General Public License

VQF or TwinVQ

Started by NTT/ Yamaha Corp
Some claim that VQF produces audio files with better compression and better sound quality than MP3.
Others say the sound quality of a VQF file is neither better nor worse than that of an MP3 file, it is just different.
Needs more processing power for encoding/ decoding
Supported in MPEG-4
Support for VQF has waned as of late

MP3 vs. VQF

[Figure: power spectra of MP3 at 128 kbps, the original at 1411 kbps, and VQF at 96 kbps]

MP3 vs. VQF (Cont’d)

Colors vary from red (peaks in power spectra) to blue and violet (the lowest signal power). - VIBGYOR

MP3 vs. VQF (Cont’d)

1. The MP3 psychoacoustic model completely excludes some high frequencies (colored blue) when it decides that they are irrelevant. Clearly, the VQF designers decided not to exclude any part of the spectrum.

2. MP3 preserves power spectra peaks (colored red) very well, but it has problems with the "green" and "yellow" parts; this can be heard by a careful listener. VQF does not preserve the peaks at the highest frequencies as well, but it beats MP3 at everything else (especially at mid-frequencies).

MP3 vs. VQF (Cont’d)

VQF vs. MP3 (Cont’d)

Conclusion?
– It seems that MP3 has a better psychoacoustic model.
– VQF sounds (and looks) more natural.

Ogg Vorbis

Started in 1993
Development picked up in fall 1998, after Fraunhofer started asking royalties for MP3 projects
Ogg is a container format for audio, video, and metadata
Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg
– other formats, such as FLAC and Speex, can also be embedded in Ogg

Why Ogg?

MP3 vs. Ogg

MP3 vs. Ogg

MP3 vs. Ogg

Frequencies over 16 kHz are lost in both
Cutoff more severe for MP3, around 15 kHz
Ogg does maintain, although diminishing, some of the higher frequencies

That’s All Folks!