Embracing Mp3: mastering audio in the age of mobile music

Chris Camilleri

Advisor: Dr. Kenneth Peacock

Spring 2009

Submitted in partial fulfillment of the requirements for the

Master of Music in Music Technology

in the Department of Music and Performing Arts Professions

in The Steinhardt School

New York University

Table Of Contents

Abstract
Preface
1. Introduction and rationale
2. Mastering audio
    2.1 Role of a mastering engineer
    2.2 Tools of the trade
    2.3 Tip of the iceberg
3. MPEG data compression for digital audio
    3.1 A brief history
    3.2 Inside the MPEG encoder
    3.3 MPEG-1, Layer III (Mp3)
    3.4 MPEG-2 (AAC) and the future of encoding
4. An objective test of MPEG encoders
    4.1 Related work
    4.2 Methodology
    4.3 Results and analysis
    4.4 Sources of error
5. Discussion
6. Conclusion
Selected References
Appendices

Abstract

The aim of this study was to develop a guide to MPEG audio compression relevant to today’s mastering engineer. Two of the most popular encoders and bit rates were tested over a diverse set of 15 popular music samples using professional mastering tools. The Mp3 encoder was shown to have a statistically insignificant effect on peak/RMS level and phase correlation. The AAC encoder had similar success in phase, but caused significant change in peak/RMS level. When tested for high frequency shape, AAC outperformed Mp3, especially at higher bit rates. It is projected that the AAC encoder and its future descendants will be the ideal model of data compression technology for the purpose of mastering professional audio.

Preface

A young audio professional faces many practical dilemmas for the first time. Accordingly, there are three paths he can take: ask the advice of someone more experienced, seek the wisdom of literature, or simply take a plunge into the abyss and hope he comes out with a lesson learned, if not the answer he sought. I must admit that my typical approach to these dilemmas is choice three. I like to find my own way; it’s quite selfish, indeed. There was one particular issue, however, that inspired me to contribute some small part to the increasingly generous bed of knowledge we call literature. Of course, this meant that I first had to be aware of what other professionals had already contributed. Discontented with the answers I had found alone while chasing my own solution, I took the advice of my mentors (who, to my surprise, could also not answer the question in a concrete way) and bought a book, and then a few more.

The issue with which I was concerned was, in the broadest sense, mastering music in the 21st century. The gap I saw existed somewhere between a mastered piece of music and its digitally compressed replica. My question was two-fold: how best can we prepare a master to be compressed into a portable file, and subsequently how can we make the best use of data compression technologies we currently use? While exploring, I found very little in the way of a direct answer to my question. One book, however, was responsible for inspiring me to pursue the questions necessary to fill in the gap. It was Bob Katz’s Mastering Audio: the art and the science. And it was his professional philosophy and practical, yet objective, approach that I found especially appealing.

I set out to tackle an issue that had not yet, by my calculation, been adequately addressed in the literature. This issue was paramount in importance, I reasoned, because the adaptation of a master to its distributed format was essentially a final step in the mastering process. It did not make sense to me that we, as professionals, could treat the last leg of the race as something of a mystery. After all, it is this last leg by which the energy and calculation of the entire race is given its meaning.

The document that follows is a piece of the puzzle that I found to be missing. Its main concern is not with the application of novel mastering techniques, but with my belief that data compression is integral to the mastering process. The document is inspired by the work of Katz and informed by several cornerstone technical essays on data compression. My work is less a treatise on mastering than it is an objective evaluation of the effects of data compression. Through this exercise, I hope we may make better decisions throughout the mastering process, and make the race worthwhile.

1. Introduction and rationale

Throughout 100 years of recorded audio, each consumer audio format has

demanded specific treatment during the mastering process. Very often, these were

necessary ways of dealing with physical phenomena. Vinyl, for instance, had limited

low frequency range; waveforms too large caused the stylus to jump. On the other

hand, tape had a characteristic hiss and saturated quality that, while pleasing to the ear,

nonetheless created equalization and dynamics concerns that had to be addressed in

mastering.

In the age of digital audio, we have started to move away from the physical

restrictions of one consumer format, yet mastering remains an integral part of the

production process. The pristine quality of digital audio has provided mastering

engineers and recording artists with the most freedom of any recorded format. The

practice of mastering has, in the words of mastering guru Bob Katz, become not only a

science but an art as well.1 And while digital formats have been the standard in

recording for nearly 20 years, a rift has emerged between professional and consumer

audio—one that has been inadequately assessed and unfortunately disregarded.

Today’s consumer market is one based heavily on the Internet. Downloads,

streaming, and portability are catchwords with which we have all become familiar.

Two formats particularly important in the digital consumer market are Mp3 and AAC,

progeny of the Moving Picture Experts Group (MPEG). Their technologies employ

psychoacoustic models and other types of data compression to diminish file size in as

inaudible a manner as possible, thereby making files easier to distribute throughout the

1 Katz, 2.

Internet. While Mp3—the older of the two—is still the most popular of these formats,

primarily due to its longevity and lack of DRM copyright protection, the differences

between AAC and Mp3 are gigantic in the scope of audio mastering. Regrettably, these

differences have been virtually disregarded by professionals in the field. Without any

literature to justify their ignorance, the professional mastering community seems to

have taken a laissez-faire stance when it comes to the final process in mastering:

conversion to today’s most popular consumer formats—not Compact Disc.

The aim of this study is to welcome these ‘mobile’ audio formats by observing

their technical specifications, extending research into the specific field of mastering, and

drawing conclusions about the appropriate use and future development of these

formats. The research will stem from a literature review in which the responsibilities of

a mastering engineer will be introduced and specifications of Mp3 and AAC encoders

will be addressed. A technical discussion of encoder mechanisms and quality

considerations will cover filter bank analysis, quantization coding, bandwidth limiting,

pre-echo artifacts, and other relevant topics. The review will inspire original research

directed at several audio principles vital to the mastering process, aimed at

demonstrating the relevance of this topic to the mastering engineer through objective

tests using digitally encoded music. Such topics include: dynamic contour, phase

correlation, and spectral content. Following the data analysis will be a discussion that

ties together theoretical background with objective data. In this discussion, practical

methods of treating audio for mastering to digitally compressed formats will be

presented and the differences between Mp3 and AAC will be revealed.

2. Mastering audio

2.1 Role of a mastering engineer

Insofar as mastering is both an art and a science, it should be treated as such.2

There exist quantifiable aspects of the trade, ways in which we can predict how a piece

of audio will behave in a given signal chain. At the same time, the aesthetic properties

of mastered music are typically quite different—sonically speaking—than the

unmastered predecessor, meaning that mastering is the last creative process in creating

a piece of commercial music. Much of what is now possible in mastering can be

attributed to digital technology’s virtually nonexistent physical limitations and nearly

limitless degree of fidelity. In the words of famed engineer Geoff Emerick, “Mastering

engineers were originally ‘the men in white coats’ who cut the records and were not

allowed to have any creativity.”3 The way that mastering is done for the Compact Disc

(CD) is drastically different than the way it was done for tape and vinyl. Thus, I argue

mastering for digitally compressed files, the first non-physical digital format, should too

be given a different treatment than mastering for CD.

Mastering requires a highly trained ear; pitch perception must be at least ½

octave, which equates to a few hundred hertz in the lower range and a few thousand in

the higher range.4 The art and science of mastering is an exacting process, not unlike

splitting a pea. A mastering engineer is trained to fix problems of two types: aesthetic

and inaesthetic, or technical. Aesthetic issues are those concerned mainly with artistic

communication: equalization, compression, loudness, reverberation and effects, and

2 Katz, 23. 3 Katz, 24. 4 Katz, 47.

stereo width. Technical issues pertain to such problems as dropouts, phasing, hiss,

noise, hum, and the list goes on.5 Note that neither of the lists under these two

responsibilities appears to be entirely artistic or scientific. In other words, there is

always a scientific rationale for why an equalization choice might work. On the other

hand, there may sometimes be an artistic value in allowing—or, in some cases,

adding—digital overloads or noise. The line between aesthetic and technical properties

is thin. The job of the mastering engineer is to recognize these issues and deal with

them in a way that is appropriate to the music, in both a scientifically cogent and

artistically meaningful way.

2.2 Tools of the trade

When a mastering engineer is working, there are several important tools he uses

to aid his ears in analysis. While good ears are the keys to a good master, they are

perhaps the easiest to fool of our five senses, which is why visual mastering tools are

vital to the process. The Fletcher-Munson curves tell us that at low volumes bass

frequencies seem to be faint or non-existent, whereas high volumes distort our

perception in the opposite way.6 Moreover, frequency octaves are tens or hundreds of

cycles apart in the lower range of human hearing, and thousands of cycles apart in the

higher range, delivering to the mind more information about low frequencies than

high.7 Thus, ears must be trusted delicately and not without a measure of skepticism; as

we will see, the gullible nature of our auditory system is both a blessing and a curse.

5 Katz, 50. 6 Pohlmann, 323. 7 Pohlmann, 320.

In his book, Katz points to four key principles vital to mastering audio. Each of

these requires the utmost in accuracy: monitoring, metering, phase, and spectral

content. The first is peripheral to the investigation contained in this document, but

should be mentioned anyhow. When it comes to listening during the mastering

process, one must pay respect to the aforementioned inconsistencies of human hearing.

A stepped (incremental by decibel) monitor controller is the ideal way to adjust volume,

such that the engineer is always intellectually aware of how loud he is hearing the

music.8 This will ensure that dynamics and equalization decisions are made

conscientiously and not through the veil of fooled perception. In my personal studio,

where this luxury cannot be afforded, I keep a Sound Pressure Level (SPL) meter on

hand to aid my judgment.

The next topic, metering, is one that is closely related to monitoring. The most

direct visual cue a mastering engineer can follow is that which his meters deliver: level,

or volume. For monitor calibration and precision metering, Katz developed the K-

System in the early 1990s. Many of today’s digital meters, such as those implemented

in Spectrafoo (a program used extensively in this investigation), have adopted this

system. There are three scales defined by the K-System specification: K-20 is intended

for music with wide dynamic range (i.e. classical music), K-14 is for the majority of

recordings (pop, folk, rock, etc.), and the K-12 meter is for those recordings destined for

radio broadcast. Each of the numbers following the ‘K’ designation refers to the

decibels of headroom past the 0 dB (83 dB calibrated monitor level) threshold.9 In

addition to peak and RMS level cues, metering gives the mastering engineer a visual

8 Katz, 170. 9 Katz, 186.

representation of the music’s dynamic range—the difference in decibels between

highest and lowest levels.10
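The peak and RMS readings described above can be reproduced numerically. The following is a minimal sketch, not the algorithm of any particular meter: it assumes floating-point samples scaled so that 1.0 is digital full scale, and the helper names (`to_db`, `peak_db`, `rms_db`) are hypothetical. It reports the mathematical RMS; some meters use a different reference convention for RMS readings.

```python
import math

def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(value) if value > 0 else float("-inf")

def peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    return to_db(max(abs(s) for s in samples))

def rms_db(samples):
    """Root-mean-square level, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))

# Test signal (an assumption for illustration): one second of a
# full-scale 1 kHz sine at a 48 kHz sampling rate.
rate = 48000
sine = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

peak = peak_db(sine)
rms = rms_db(sine)
crest_factor = peak - rms  # peak-to-RMS distance in dB

print(f"peak {peak:.2f} dBFS, RMS {rms:.2f} dBFS, crest factor {crest_factor:.2f} dB")
```

For a pure sine the peak-to-RMS distance is about 3 dB; dense program material shows a larger figure, which is one way a meter makes dynamic contour visible.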

The third concept is relative phase, or polarity. Simply put, a perfectly monaural

signal will have a correlation reading of +1, meaning at any instant the left and right channels are

perfectly in phase with one another. On the other hand, a purely stereo (dual-mono)

signal, with no correlation between left and right channels, will read 0, while a reading of -1

indicates channels that are identical but opposite in polarity. A good stereo recording usually contains a significant amount of mono

information; its polarity tends toward +1, varying between 0 and +1 throughout the

duration of the piece. Errors in phase can not only skew the stereo image but also create

comb filtering, which results in inadvertently changed, adversely affected frequency

content.11
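One common way to obtain such a reading is the normalized zero-lag correlation between the two channels. The sketch below assumes that definition; the function name `correlation` and the test signals are illustrative and are not taken from Katz or from any metering product.

```python
import math

def correlation(left, right):
    """Normalized zero-lag correlation of two channels, in the range -1..+1."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy else 0.0

n = 4800
tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
inverted = [-s for s in tone]                                      # polarity flipped
quadrature = [math.cos(2 * math.pi * 5 * i / n) for i in range(n)]  # 90 degrees apart

mono = correlation(tone, tone)             # identical channels, close to +1
flipped = correlation(tone, inverted)      # opposite polarity, close to -1
unrelated = correlation(tone, quadrature)  # uncorrelated channels, close to 0

print(round(mono, 3), round(flipped, 3), round(unrelated, 3))
```

A hardware or software meter would typically compute this over a short sliding window rather than over the whole signal, so the reading moves with the music.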

The final principle, spectral content, is—apart from dynamics—the most

important way in which a mastering engineer makes his mark. In Katz’s opinion, spectrum displays

provide “protection from the manifold and varied bugs of digital audio, but are no

substitute for the ear,”12 one more tool on the mastering engineer’s belt that allows him

to be confident in the decisions he makes. The Fast Fourier Transform implemented in

Spectrafoo’s Spectrogram allows the mastering engineer to see a color-coded map of the

frequency content in a piece of music, changing with each moment. The spectrogram

can be helpful in confirming technical problems, especially in the upper range of human

hearing where sensitivity is limited.

10 Katz, 323. 11 Katz, 206. 12 Katz, 216.

Throughout the following investigation we will pay special attention to the

topics described above, striving to develop valid new research that can accurately speak

the language of professional mastering engineers, and hopefully improve the sonic

quality of digitally compressed music.

2.3 Tip of the iceberg

In Mastering Audio: the art and the science, Bob Katz briefly addresses

mastering for compressed audio:

“No special preparation is needed… though level and sound translation should be considered… There is an advantage to coding an mp3 from a 24-bit file as it will have subtly better sound quality than from a 16-bit source.”13

So while he gives a solid piece of advice to his readers, one that we will investigate with

scrutiny below, Katz does little to address the larger issues of mastering in the 21st

century. Interestingly, he offers an in-depth and rather insightful appendix dedicated

to the mastering of music for FM radio transmission, but the provocative statement

above reveals Katz’s laissez-faire, uninformed, or simply outdated philosophy towards

mastering for music made mobile by the Internet. The rest of this document will seek to

extend the comprehensive lexicon of mastering developed by Katz to include the

modern consumer formats with which we have become so familiar throughout the

decade past.

13 Katz, 35.

3. MPEG data compression for digital audio

3.1 A brief history

As the potential of personal computing and the Internet rose in the late 1980s, it

became clear to audio professionals that consumers would benefit from smaller file

sizes, making it easier to store and transmit data. As a result, the Moving Picture

Experts Group (MPEG) took on the responsibility of developing a family of encoders to

significantly compress audio and video data with little or no perceivable loss.14

And while bandwidth, server speed, and hard disk space have increased exponentially

in recent years, these formats have nevertheless become the lifeblood of music,

television, radio, and film transmission. They are essential to the digital economy and

will remain staples to our Internet-based culture into the foreseeable future.

Today, Pulse Code Modulated (PCM) file types are a predominantly

‘professional’ method of storing digital audio. The PCM specification is employed by

such formats as WAV and AIFF, allowing a straight representation of binary sample

values.15 As an example, it is the AIFF format that has been used to encode audio for

the twenty years of CD’s existence. The primary drawback to PCM audio is its

notoriously large data footprint due to a high bit depth, typically 16 or 24 bits. One

must appreciate this level of accuracy, achieved by manufacturers designing digital

recording technology. High bit resolution ensures that analog signals are converted to

digital numbers with the utmost integrity.

To use a simple example of the PCM footprint, ten megabytes (MB) of hard disk

space are required for every one minute of CD audio at 44.1 kHz sampling rate, 16-bit

14 Fries, 141. 15 Fries, 137.

resolution. Each of a file’s parameters contributes to its overall size: duration,

mono/stereo, sampling rate, and bit depth. A typical CD track may occupy 40 to 60 MB

of hard disk space.16
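The footprint figures above follow directly from the four parameters just listed. As a rough check, the sketch below multiplies them out; it assumes 1 MB = 1,000,000 bytes, and the function name `pcm_megabytes` is hypothetical.

```python
def pcm_megabytes(seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Size of uncompressed PCM audio in megabytes (1 MB = 1,000,000 bytes)."""
    total_bytes = seconds * sample_rate * channels * (bit_depth // 8)
    return total_bytes / 1_000_000

one_minute = pcm_megabytes(60)          # 10.584 MB per minute of CD audio
four_minute_track = pcm_megabytes(240)  # 42.336 MB
same_track_24_bit = pcm_megabytes(240, bit_depth=24)

print(one_minute, four_minute_track, same_track_24_bit)
```

A four-minute stereo track at CD resolution therefore lands near the lower end of the 40 to 60 MB range quoted above.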

Merely ten years ago, a 28.8 or 56 kilobits per second (kbps) Internet connection

was the mode of most consumer-level Internet usage. The idea of transmitting full

music albums over the web was simply an absurd dream. Today, with the help of MPEG

data compression models and more robust pathways of communication, that dream has

become a reality. Many Americans now have affordable access to broadband speeds of

5 Mbps or more, but the advantages of MPEG encoding have kept compressed audio a

part of our everyday Internet lexicon. Consumers are accustomed to the portability of

Mp3s and the reasonable hard disk consumption of an encoded music library. At 50-

100 MB per album, as opposed to 700 MB for one CD, the difference is stark and

commensurate with the downsizing of our personal computing devices. Moreover,

new evidence is beginning to suggest that the younger generation of music listeners

prefers the sound of compressed audio to that of PCM audio.17 So as the Internet age

continues full-steam-ahead, it is the author’s humble opinion that the world of

professional audio should climb aboard. The best way a mastering engineer can

prepare to treat audio appropriately is to first understand the technology with which it

will be encoded. The following paragraphs will discuss the inner designs of Mp3 and

AAC encoders, members of the current MPEG family.

16 Fries, 130-131. 17 Dougherty.

3.2 Inside the MPEG encoder

MPEG audio encoders employ two types of compression: lossless and lossy.

Lossless compression uses simple mathematics to restructure the language with which

audio is stored. This process, called Huffman encoding, compresses data by using an

efficient vocabulary to code redundant information in the bit stream.18 It is referred to

as lossless compression because it does not cause any degradation of the original audio.

Indeed, an equal but opposing algorithm can return the compressed information to its

original arrangement and size. When used alone, Huffman encoding can achieve up to

a 2:1 compression ratio.
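To make the idea of an "efficient vocabulary" concrete, the sketch below builds a Huffman code for a short list of integer values and confirms that decoding restores the input exactly. It is a generic textbook construction, not the fixed code tables an MPEG encoder actually uses; the sample values and function names are invented for illustration.

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    inverse = {code: symbol for symbol, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes make this unambiguous
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized values dominated by zeros.
samples = [0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 1, 0, 0, -1]
table = huffman_table(samples)
bits = encode(samples, table)
restored = decode(bits, table)
fixed_bits = 2 * len(samples)  # four distinct values need 2 bits each

print(len(bits), "bits instead of", fixed_bits, "and lossless:", restored == samples)
```

Because the most frequent value receives a one-bit code, the coded stream is shorter than a fixed-length representation, yet nothing is discarded.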

Lossy compression employs psychoacoustic models of the human auditory

system to significantly reduce file size by removing inaudible information from the bit

stream. This is designed around the principle of masking. The term masking refers to

time and/or frequency based events that cause one sound object to pass unnoticed due

to the existence of another, stronger object. In short, the algorithm attempts to mimic

the human ear, parsing the hearing range into computable sub categories and searching

for information that is either too soft to be perceived or is momentarily masked by

another sound in the same range.19 Lossy encoding technology is the heart of MPEG

encoding, achieving much higher compression ratios than those of lossless types. It is

with the effects of lossy compression that this document is primarily concerned. The

rest of this chapter will concentrate on the way in which lossy compression technologies

analyze the content of an audio file, reduce its footprint, and create a replica with near-

perfect sound characteristics at a fraction of the disk space.

18 Fries, 149. 19 Fries, 147.

Input signal

Given its structure, there are two key audio-related components of any PCM file

that may be altered to reduce its size: sampling frequency and bit depth. Of course, the

Nyquist theorem advises against decreasing the sampling rate below 40 kHz, which is

twice the highest frequency in the human range. If decreased any further, audible

aliasing would occur, significantly degrading the audio.20 Therefore, an MPEG encoder

is designed to maintain sampling frequency while employing algorithms to remove

unnecessary content in the file’s bit stream, decreasing the overall bit depth

significantly.
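The aliasing mentioned above can be illustrated with a little arithmetic. The sketch below assumes an ideal sampler with no anti-aliasing filter in front of it, and the function name `alias_frequency` is hypothetical; it returns the frequency at which a pure tone would appear after sampling.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a sampled tone after folding about the Nyquist limit."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# An 18 kHz tone is below the 22.05 kHz Nyquist limit of CD audio and is preserved.
at_cd_rate = alias_frequency(18000, 44100)

# At a 30 kHz sampling rate the limit is 15 kHz, so the same tone folds down to 12 kHz.
at_reduced_rate = alias_frequency(18000, 30000)

print(at_cd_rate, at_reduced_rate)
```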

We can imagine that a 16-bit PCM file recorded in stereo at 44.1 kHz has a bit-

rate of 1,411 kilobits per second (kbps). Mathematically speaking, 44.1 kHz is

multiplied by 2 channels for a stereo file, and multiplied again by 16 bits to equal

1,411.2 or approximately 1,411 kbps.21 Decreasing this bit-rate by just a 4:1 ratio

compresses the file to 25% of its original footprint. Most encoded music files achieve

ratios much higher than this—up to 11:1! Typical bit-rates include (in kbps): 320, 256,

128, and 64. Generally speaking, higher bit-rates deliver results that most accurately

resemble the original PCM audio signal. The chain of analysis and size-reduction tools

described below will demonstrate how this process is executed so effectively.
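The bit-rate arithmetic above can be checked in a few lines. This sketch simply multiplies out the PCM parameters and divides by each target rate; the function name `pcm_bitrate_kbps` is hypothetical, and the ratios ignore container overhead.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, channels=2, bit_depth=16):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * channels * bit_depth / 1000

cd_rate = pcm_bitrate_kbps()  # 1411.2 kbps

# Compression ratio implied by each typical encoder bit rate.
ratios = {target: cd_rate / target for target in (320, 256, 128, 64)}

for target, ratio in ratios.items():
    print(f"{target} kbps is roughly {ratio:.1f}:1")
```

By this arithmetic the 11:1 figure corresponds to a 128 kbps file, while 320 kbps is only slightly beyond the 4:1 example.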

The chain

The encoder consists of four main elements: a filter bank, a perceptual model, a

quantizer, and finally a bit stream encoder. The filter bank is essentially an analyzer,

20 Pohlmann, 23. 21 Fries, 150.

based on Fourier’s principle.22 The encoder processes in parallel fashion a direct feed of

the time-based signal through the perceptual model and the frequency domain output

of the filter bank to determine the masking threshold, as a function of both time and

frequency. The spectral components are then fed through the quantizer, which

redefines the audio content into a digital bit stream. It is important that the

quantization noise—a byproduct of the significantly reduced bit rate—is kept below the

masking threshold at any given moment of the audio sample. Finally, the bit stream

encoder assembles the file, employing a special coding architecture utilizing such

principles as Huffman encoding to minimize file size.23

Figure 1. A diagram showing the four main components of an MPEG encoder. Note the parallel processing of audio

by the analysis filter bank and perceptual model.24

Output modes

There are three stereo modes that can be employed by an MPEG encoder, and

two bit-rate modes. Joint stereo is the most commonly implemented stereo mode,

because it has the most effective compression method, utilizing middle/side (m/s)

22 A complex, continuous, and periodic waveform can be separated into its component parts through relatively simple trigonometry. Pohlmann, 5. 23 Brandenburg, 2. 24 Inspired by Brandenburg, 2.

encoding. It characterizes any stereo file by two categories: information that is identical

in both channels, and the difference. Simple stereo is a bulkier model, which stays truer

to a normal stereo signal by encoding each channel independently. The final stereo

mode is intensity stereo, providing the highest level of compression by encoding only

stereo information determined as vital to the stereo image. This mode has the highest

potential for sound quality degradation. Two bit-rate modes include constant bit-rate

(CBR) and variable bit-rate (VBR). In virtually all cases, VBR is preferred because it

increases efficiency by varying the number of bits depending on the complexity of the

music. For example, a solo guitar passage is given fewer bits than a passage featuring a

full rock band. By providing bit allocation where needed, VBR mode helps to maintain

a more constant signal-to-noise ratio.25 Due to the limited scope of this paper as well as

the general market acceptance of these standards, joint stereo and VBR will be the

default modes used in the investigation.
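The middle/side idea behind joint stereo can be sketched with the familiar sum-and-difference matrix. This is only an illustration of the principle on time-domain samples; a real encoder applies the transform to spectral data and may use a different scaling. The function names and sample values are invented.

```python
def to_mid_side(left, right):
    """Split a stereo pair into common (mid) and difference (side) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# A mostly centered signal: the channels differ at one sample only.
left = [0.50, 0.25, -0.75, 0.125]
right = [0.50, 0.25, -0.25, 0.125]

mid, side = to_mid_side(left, right)
restored_left, restored_right = to_left_right(mid, side)

print("mid: ", mid)
print("side:", side)  # mostly zeros, so it is cheap to code
```

When the channels are largely identical, the side signal is close to silence and needs very few bits, which is where the saving comes from.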

Theoretical limitations

Because sample rate is preserved in conversion, every object of distortion in a

compressed file can be traced to quantization. Keeping in mind that the theoretical

reduction in dynamic range is 6 dB for every one-bit reduction of word length,26 if

quantization noise can be kept below the masking threshold in each subband partition,

the resulting audio should be indistinguishable from the original signal.27 Of course,

there are many cases where this is impossible, especially at low bitrates where the bank

25 Fries, 149. 26 Pohlmann, 329-330. 27 Brandenburg, 5.

of available bits is especially limited. Two of the most egregious artifacts of conversion

include bandwidth loss and quantization noise, specifically pre-echoes. Bandwidth loss

results when the encoder does not have enough bits to encode a block of music with the

quality demanded by the masking threshold (i.e. allowed noise per critical band). In

most cases this results in dropped windows (frequency bands) in the high frequency

range.28 One can imagine low bit rate encoders being especially prone to bandwidth

loss. Pre-echo is the descriptive name given to quantization noise that is spread out

over time—often occurring even before the corresponding sonic event responsible for the

noise. This type of error is particularly common when the musical signal contains a

sudden increase in energy, such as a percussive attack. “If the attack occurs well within

the analysis window, this [audible] error will precede the actual attack.”29 These issues

point to the importance of continued development in the encoding technology, and also

inspire the proper use of those currently available.
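The rule of thumb of roughly 6 dB per bit quoted above can be observed numerically. The sketch below quantizes a full-scale sine with a plain uniform quantizer, without dither or noise shaping, and measures the signal-to-noise ratio at two word lengths. The test tone, the sample count, and the function names are arbitrary choices made for illustration.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1, 1] to the nearest step of a uniform grid with 2**bits steps."""
    step = 2.0 / (2 ** bits)
    return round(sample / step) * step

def snr_db(bits, n=48000):
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal = [math.sin(2 * math.pi * 997 * i / n) for i in range(n)]
    noise = [quantize(s, bits) - s for s in signal]
    signal_power = sum(s * s for s in signal)
    noise_power = sum(e * e for e in noise)
    return 10 * math.log10(signal_power / noise_power)

snr_16 = snr_db(16)
snr_12 = snr_db(12)
loss_per_bit = (snr_16 - snr_12) / 4  # expected to be close to 6 dB

print(f"16-bit: {snr_16:.1f} dB, 12-bit: {snr_12:.1f} dB, per bit: {loss_per_bit:.2f} dB")
```

A perceptual encoder exploits exactly this trade: it removes bits from a subband until the resulting noise sits just under the masking threshold for that band.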

3.3 MPEG-1, Layer III (Mp3)

The first step in Mp3 encoding is analysis of the audio bit stream via the

polyphase filter bank. Incidentally, this step is responsible for the greatest inaccuracy of

all the mechanisms associated with the encoding process. The filter bank is given the

task of dividing the audio signal into frequency sub-bands, analyzing the content, and

passing it onto the psychoacoustic model. Unlike a typical discrete Fourier transform,

the filter bank is not time invariant.30 As a result, any conversion between time and

28 Brandenburg, 7. 29 Brandenburg, 7. 30 Pohlmann, 687.

frequency domains is not a lossless transformation. The MPEG filter bank provides

“good time resolution with reasonable frequency resolution.”31 Furthermore, it does

not accurately reflect the human auditory system’s frequency-dependent ‘critical

bands,’ after which the MPEG’s set of 32 fixed sub-bands is modeled. Additionally,

the overlap of these sub-bands results in certain frequencies appearing in adjacent filter

outputs. This design flaw can adversely affect further stages of conversion by

misrepresenting the signal’s frequency content.32

Mp3 encoding also features a Modified Discrete Cosine Transform (MDCT),

which occurs after the filter bank transformation. The MDCT can partially cancel some

aliasing caused by overlapping windows in the filter bank; however, because the MDCT

processing provides better frequency resolution than the filter bank, it sacrifices time

resolution. In fact, its time resolution is so poor that it does not carry the phase of the

waveform in any identifiable form.33 During encoding, the quantization of MDCT

values often results in errors spread over the duration of a window, producing audible

distortions. These ‘pre-echoes’ appear because the temporal masking of noise occurring

before the given signal is weaker than the masking of noise afterward.34 The MDCT

features two window sizes, which can be alternated at each sub-band outputted from

the filter bank. This switching is dependent on content. The encoder generally relies

upon a longer window, while a shorter one handles transients. The shorter window has

a length of 8 ms, which is one-third the length of the longer window (24 ms) but still not

31 Pan, 3. 32 Pan, 4. 33 Watkinson, 314. 34 Pan, 8.

short enough to completely eliminate pre-echo artifacts. Additionally, buffering is used

to reduce these artifacts.35

In the next stage, each subband is quantized with as few bits as necessary to

make quantization noise inaudible. The psychoacoustic model accomplishes this using

a Fast Fourier Transform.36 There are two psychoacoustic models available, though the

second is preferred for higher resolution conversion. It features a larger window of

1,024 samples, and a double-windowing calculation for more accurate assessment of the

masking threshold in Mp3 encoding. Each model features a slightly different process

determining the masking threshold and computing the signal-to-mask ratio for each

subband, passing this value onto the next mechanism of the encoder.37

Two particularly important advantages of the Mp3 encoding stage include

entropy coding and the ‘bit reservoir.’ Entropy coding, described above as Huffman

coding, achieves a higher data compression ratio through reorganization of the

outputted bits. This stage improves coding efficiency and helps to control error

propagation. The second advantage unique to newer MPEG encoders is the use of a ‘bit

reservoir,’ which is essentially a buffer. During the process of bit allocation, the encoder

can donate bits to a reservoir when it needs fewer than the average allotted number of bits

to code a particular frame. Later, when the encoder needs more than the average

number, it can borrow bits from the reservoir. During instances such as those described

above, where the short window of the MDCT is still not quick enough to catch the

transient, the quantizing can be made more accurate by relying on the bit reservoir—

35 Watkinson, 314. 36 Pan, 5. 37 Pan, 5-7.

buffer—to avoid raising the output bit rate whilst also avoiding pre-echo artifacts which

would result if the coefficients were heavily quantized.38 This helps to protect the

integrity of the musical material and maintain a consistent noise floor. Of course, the

encoder cannot simply create bits; it must ‘borrow’ them from past frames.39 During

less active musical moments, the buffer contents are deliberately emptied by a

coarsening (lowering) of the bit rate; with a reduced input rate and relatively fixed

output, the buffer is able to make room for transients and more active musical passages

that require extra input quantization.40 This feature can only be employed when the

encoder’s user control value is set to Variable Bit Rate (VBR).
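The donate-and-borrow behavior of the bit reservoir can be pictured with a toy model. The sketch below is not the allocation logic of any real Mp3 encoder: the frame budget, the reservoir capacity, the per-frame demands, and the function name `allocate` are all invented for illustration. It only shows that unused bits from quiet frames can later be spent on a demanding one.

```python
def allocate(demands, frame_budget, reservoir_cap):
    """Grant bits per frame, saving unused bits for later frames (toy model)."""
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_budget + reservoir  # this frame's share plus savings
        used = min(demand, available)         # bits can be borrowed, never created
        granted.append(used)
        reservoir = min(available - used, reservoir_cap)
    return granted

# Three quiet frames, one transient-heavy frame, then another quiet frame.
demands = [300, 300, 300, 900, 300]
granted = allocate(demands, frame_budget=400, reservoir_cap=500)

print(granted)
```

In this run the fourth frame receives 700 bits rather than its nominal 400, because the three preceding frames each left 100 bits unspent; the total still stays within five frames' worth of budget.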

3.4 MPEG-2 Advanced Audio Coding (AAC) and the future of audio encoders

MPEG-2 AAC, a newer encoder than Mp3 by about 5 years, offers several

improvements over Mp3, making it a more attractive model for the data compression

needs of today’s professionals and consumers. On a practical level, AAC supports a

broader range of sample rates, from 8 kHz up to 96 kHz, and enables the coding of

multi-channel (surround) audio.41 On a technical level, AAC offers higher frequency

resolution during analysis, and features a standard switched MDCT filter bank, rather

than the cascaded filter bank and MDCT blocks used in Mp3. The MDCT in AAC has

an impulse response of 2.6 ms for short blocks at 48 kHz, compared with Mp3’s 8 ms,

dramatically reducing the degree to which pre-echo artifacts enter the bit stream. AAC

38 Watkinson, 315. 39 Pan, 9. 40 Watkinson, 315. 41 Pohlmann, 381.

also gives greater attention to lossless coding tools, providing more efficient Huffman

coding and joint stereo (m/s) encoding, ultimately achieving the same quality as Mp3 at

70% of the size.42 This results in files with higher compression ratios, which not only

require less space on storage devices but also extend the battery life of personal

listening devices by nearly 50%.

There are a few peculiarities associated with the AAC encoder, given that its

design is more modern than that of Mp3. Because the filter bank is a switched MDCT

circuit, short windows cannot be accessed indiscriminately but have to occur in

multiples of eight blocks to preserve block phase between channels. Eight short blocks

plus two transition windows are equal to the length of three long blocks. Following this

filter bank is a predictive coding module that finds redundancy between coefficients

within blocks. Working in the frequency domain, after the analysis, this produces a

prediction error correlating to the input time-domain signal; essentially, a distortion that

is time-aligned with the input. Pre-echo is virtually eliminated by this prediction

circuit, referred to as temporal noise shaping (TNS)—perhaps the most significant

improvement of the AAC encoder over Mp3.43

The development of encoding technologies has become increasingly important

since the early 1990s when Mp3 and AAC were born. The past decade has

demonstrated their market success, and the demand for more robust technologies is

palpable. A new family of MPEG encoders, including Mp3HD and HD-AAC, is just on

the horizon. While they are still in their infancy, a few details have been confirmed.

The first and most revolutionary aspect of these technologies is their lossless encoding

42 Brandenburg, 6. 43 Watkinson, 318-319.

architecture. Mp3HD will be better suited to capturing the full quality of CD audio,

and HD-AAC will go beyond the quality of CD audio up to 24-bit, 192 kHz.44 The

philosophy of HD-AAC’s superior sample rate scalability stems from a desire to bring

24-bit, 96 kHz (or higher) recording format to consumers. Both technologies will strive

for compatibility with the current family of Mp3 and AAC players, containing

embedded Mp3 or AAC side information within the larger, HD-level file. In the case of

HD-AAC, metadata will be far more robust, and new error correction and editing

capabilities will be integrated into the file structure.45 While it is impossible to predict

when or which of these technologies will oust the current models, it is my guess that

HD-AAC will have a better chance of taking the market. It is yet to be seen whether the

new AAC encoder will—like its predecessor—become a proprietary technology under

Apple Inc.; however, its impressive scalability and robust metadata handling both point

to a future where mobile media could be taken to new levels of quality and integration.

While these new technologies will eventually swallow the technologies we have

today—Mp3 and AAC—we must nevertheless learn to use these tools to the utmost.

The next chapter takes a look at today’s popular encoding formats, comparing different

configurations of each encoder to CD-quality audio in an effort to demonstrate, in an

objective way, the performance of each encoder in several critical topics within the

scope of mastered audio.

44 Thomson.net. 45 Fraunhofer.de.

4. An Objective Test of MPEG encoders

4.1 Related work

As described above, the key to audio data compression is quantization. During

one subjective test, where expert listeners were asked to distinguish between encoded

and full resolution signals under optimal listening conditions, a ratio of 6:1—a 16-bit

signal at 48 kHz compressed to 256 kbits/s—was audibly indistinguishable from the

source (PCM) file. The clips used were specially chosen for this specific test because

they were difficult to compress.46 In his article, “A Tutorial on MPEG/Audio

Compression,” author Davis Pan cites this study but gives no explanation for why a

signal may or may not be ‘difficult to compress.’ This gap is, in an acute way, the

inspiration for my research. By targeting the areas of an encoder that make it pliable,

we may be able to more successfully test an encoder in an objective way.
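
The 6:1 ratio quoted above follows directly from the PCM bit rate. The short Python sketch below reproduces that arithmetic and applies it to the CD-standard source used later in this chapter; it assumes two-channel (stereo) audio, and the function names are illustrative.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed PCM bit rate in kbps (stereo assumed by default)."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(sample_rate_hz, bit_depth, encoded_kbps, channels=2):
    """Ratio of the PCM bit rate to the encoded bit rate."""
    return pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels) / encoded_kbps


# 16-bit, 48 kHz stereo reduced to 256 kbps, as in the listening test cited above.
print(compression_ratio(48000, 16, 256))            # 6.0

# 16-bit, 44.1 kHz CD audio at the two bit rates tested in this study.
print(round(compression_ratio(44100, 16, 128), 2))  # 11.03
print(round(compression_ratio(44100, 16, 256), 2))  # 5.51
```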

In several of the theoretical papers examined in this study, the authors conclude

that objective classification of an encoder’s effect is not useful. One argues that

subjective tests with trained listeners are more suited to measure the performance of a

perceptual encoder.47 In his guide, Brandenburg declares, “The dynamic range of both

MP3 and AAC is well beyond the equivalent of 24 bit D/A resolution,” further

purporting “the [average] frequency response is perfectly even (0 dB deviation).”48

However, he goes on to conclude that it makes no sense to objectively test compressed

signals for these objective qualities of sound, qualities that are especially important to the

mastering engineer. To look at objective measurements of test signal inputs, he argues,

46 Pan, 3. 47 Fries, 152-153. 48 Brandenburg, 7-8.

may describe to an expert the effects of conversion but is a dangerous tool on which to

rely solely.49

My outlook empathizes with the perspective of a mastering engineer. He has no

audience for feedback; his work is solitary and technology-based. His decisions are

fueled only by information relayed by his senses, through listening and observing

meters. Insofar as this document seeks to communicate something to a community of

mastering engineers, the language of objectivity is the only acceptable means of

dialogue. The mastering engineer requires something more tangible than trends based

on subjective tests. For this reason, the language and tools of professional mastering

engineers will be used exclusively throughout the experiment.

4.2 Methodology

To briefly restate, the four domains of importance to the mastering engineer,

on both technical and aesthetic levels, are monitoring, metering, spectral content, and

phase. Once again, the topic of monitoring is somewhat irrelevant to the experiment at

hand; listening does not have a direct role in the procedure. Therefore, it is with

metering, phase, and spectral content that we are primarily concerned. After a brief

overview of the experimental method, the particular details of each topic will be

explicated separately.

49 Brandenburg, 9.

Overview

The experiment consisted of five tests: peak level, RMS level, phase correlation,

spectral response, and spectral shape. Each test was done using three of Spectrafoo’s

featured processors: Power History, Spectrograph, and Correlation History. In every

case, a Hanning window of 4096 samples was used. The goal of the five tests was to

compare objective data obtained from the PCM sample with that of four samples

encoded with Mp3 and AAC technologies, at both 128 kbps and 256 kbps.50 All five of

these tests were carried out in a quantitative fashion. In the paragraphs below, the

methodology for each test is described in detail.
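
For a sense of scale, the window length translates into time and frequency resolution as follows. This Python sketch assumes the common symmetric Hann definition and the 44.1 kHz sample rate of the CD-standard source; how Spectrafoo implements its window internally is not documented here and may differ.

```python
import math


def hann_window(n):
    """Symmetric Hann window of n samples (zero at both ends)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def bin_spacing_hz(sample_rate_hz, n):
    """Frequency spacing between adjacent analysis bins."""
    return sample_rate_hz / n


window = hann_window(4096)
print(round(bin_spacing_hz(44100, 4096), 2))  # 10.77 (Hz between bins)
print(round(4096 / 44100 * 1000, 1))          # 92.9 (ms of audio per window)
```

Under these assumptions each analysis frame covers roughly a tenth of a second of audio and resolves frequencies about 11 Hz apart, which is fine enough to separate the 100 Hz, 1 kHz, and 10 kHz measurement points used in the spectral tests.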

Sample set

A total of 15 musical samples were tested, spanning the broad popular music

market. More specifically, three genres were represented: Rock, Hip-Hop, and Pop. These genres were

chosen based on the high relevance of this music to the mastering world. They spanned

small, medium, and large budget productions. Five songs from each genre were tested

using the methodology described above. Each group was assembled with a diverse set

of sonic traits. The Rock and Hip-Hop categories featured only male vocalists, while

the Pop category was exclusively female. The oldest example used was recorded in

2003; most examples were produced in the past couple of years. From this, it can be

concluded that implications of the test results and analysis are limited to the scope of

musical content provided by this population.

50 Mp3 encoder LAME v3.98, licensed under LGPL. AAC encoder © Apple Inc 2009.

Phase correlation, peak and RMS

Figure 2. A snapshot of Spectrafoo’s Correlation History and Power History processors.

Phase correlation, and peak and RMS levels were captured on a continuous basis

using Spectrafoo’s Correlation History and Power History processors. Ten points, one-

per-second, were collected from a pre-determined location in the music—usually 30-

seconds into the piece.51 This was carried out for all five variations of each piece, one

PCM (i.e. control) sample and four encoded samples. After all the data had been

collected, averages were calculated for each of the three tests: Peak, RMS, and Phase.

During analysis, the data of each of the four encoded samples were set against those of

the control across each of the 15 corresponding examples in a t-test, further described

below.
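
The three quantities measured here can be sketched in a few lines of Python. This is a minimal illustration, not Spectrafoo's algorithm: it assumes samples normalized to ±1.0 full scale, an RMS reading without the +3 dB sine calibration that some meters apply, and phase correlation defined as the zero-lag normalized cross-correlation of the left and right channels. The inputs are assumed to be non-silent.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    """RMS level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def phase_correlation(left, right):
    """+1 for identical channels, -1 for polarity-inverted channels."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den


# Hypothetical test signal: one second of a 441 Hz sine at half of full scale.
sample_rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 441 * n / sample_rate) for n in range(sample_rate)]
inverted = [-s for s in tone]

print(round(peak_dbfs(tone), 2))                    # -6.02
print(round(rms_dbfs(tone), 2))                     # -9.03
print(round(phase_correlation(tone, tone), 2))      # 1.0
print(round(phase_correlation(tone, inverted), 2))  # -1.0
```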

51 In a few cases where the musical material had a lengthy exposition, the sample was collected at a slightly later point on the timeline.

Spectral content

Figure 3. A snapshot of Spectrafoo’s Spectrogram processor.

In the first test of a two-part examination, three values were collected for each

sample—at 100 Hz, 1 kHz, and 10 kHz. These points provided readings of variation in

low, mid, and high frequency areas. The second test looked at rolloff and cutoff

frequencies in the high register (near or above 10 kHz). Data for this test was collected

in a pseudo-objective manner: the numbers were taken straight from Spectrafoo’s

Spectrograph, but decisions for where rolloff and cutoff frequencies occurred had to be

made subjectively. While it is generally known that MPEG encoding creates

innumerable spectral variations, pinpointing each of these methodically was not the

aim of this experiment. Rather, the goal of this endeavor was to understand the musical

consequences of MPEG encoding from the perspective of a mastering engineer. The

spectral content tests were meant to examine broad consequences of four MPEG

encoding types. Thus, rather than simply examining each example individually, a grand

mean for each of the three genre groups was calculated. The results have been provided

below in two parts: Spectral Response and Spectral Shape.
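
The rolloff and cutoff decisions in this study were made by eye. For comparison, the Python sketch below shows one way such decisions could be made by rule. It was not used in the experiment; the noise-floor threshold, the slope threshold, and the spectrum readings are all hypothetical.

```python
def find_cutoff_hz(spectrum, floor_db=-90.0):
    """Highest frequency whose level is above floor_db, or None if there is none."""
    above = [freq for freq, level in spectrum if level > floor_db]
    return max(above) if above else None


def find_steep_rolloff_hz(spectrum, min_hz=10_000, slope_db_per_khz=10.0):
    """First frequency at or above min_hz where the level falls faster than the slope."""
    points = sorted(spectrum)
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        drop_per_khz = (l0 - l1) / ((f1 - f0) / 1000)
        if f0 >= min_hz and drop_per_khz > slope_db_per_khz:
            return f0
    return None


# Hypothetical (frequency in Hz, level in dB) readings for one encoded file.
spectrum = [
    (10000, -40.0), (12000, -44.0), (14000, -48.0), (15000, -50.0),
    (16000, -54.0), (17000, -80.0), (18000, -120.0), (20000, -120.0),
]
print(find_steep_rolloff_hz(spectrum))  # 16000
print(find_cutoff_hz(spectrum))         # 17000
```

A rule of this kind would make the Spectral Shape readings repeatable, at the cost of having to justify the chosen thresholds.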

Statistical analysis

Analysis was done with appropriate statistics for each of the 15 examples. These

calculations began with determining joint stereo peak and RMS averages for the control

set and each of the four encoded variants. Mean and standard error of the mean were

then calculated for all five samples, and four t-tests were done comparing the control set

(PCM) to each of the four variants to determine statistical significance. The t-test was a

level-one test with two tails; a p-value of 0.05 was set as the threshold for statistical

difference. This threshold implied that, if no true difference existed, a significant

difference would still be found by ‘chance’ 5 times out of 100. Values lower than 0.05 were considered significant.
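
The calculation for a single example can be sketched as follows. This Python illustration assumes a paired, two-tailed t-test on the ten one-per-second readings, which is one reading of the test described above; the readings themselves are hypothetical. Rather than computing a p-value, it compares the t statistic with 2.262, the standard two-tailed critical value for 9 degrees of freedom at the 0.05 level.

```python
import math
import statistics

T_CRIT_DF9 = 2.262  # two-tailed 5% critical value of Student's t, 9 degrees of freedom


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample stdev / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def paired_t_statistic(control, encoded):
    """t statistic of the per-reading differences (assumes they are not all identical)."""
    diffs = [e - c for c, e in zip(control, encoded)]
    mean_diff, sem_diff = mean_and_sem(diffs)
    return mean_diff / sem_diff


# Hypothetical peak readings in dB, ten points per file.
control = [-3.1, -2.9, -3.4, -3.0, -2.8, -3.3, -3.2, -2.7, -3.0, -3.1]
encoded = [-3.6, -3.5, -3.8, -3.5, -3.4, -3.8, -3.6, -3.3, -3.5, -3.7]

t_stat = paired_t_statistic(control, encoded)
significant = abs(t_stat) > T_CRIT_DF9
print(mean_and_sem(control), mean_and_sem(encoded))
print(t_stat, significant)
```

In this made-up case the encoded file sits about 0.5 dB below the control on every reading, a drop far larger than the spread of the differences, so the result is flagged as significant.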

4.3 Results and analysis

The following graphs depict data for all 15 examples. The samples have all been

given code names, corresponding to the appropriate genre and order in which they

were analyzed. Items of interest in each table are Mean, Standard Error of the Mean

(SEM), and T-Test. As a note to the reader: a great deal of information is presented over

the following few pages; it is partially dissected throughout, and further digested in the

section following, called Discussion. For the benefit of future studies in this field, the

raw data and relevant calculations for each of the 15 examples can be found in the

Appendices attached to the end of this document.

Peak (joint stereo)

Figure 4. A visualization of the ‘Rock’ data, comparing control set Peak level to four MPEG-encoded variants.52 The

pop-out portion of the graph displays results for example R02, between the dB scale values -2 and -4. It illustrates a

more tangible difference between results that are significantly different from the control and those that are not.53

52 For all graphs, error-bars show SEM values and starred bars correspond to a statistically significant difference (p-value < 0.05). 53 The pop-out is provided merely as a visual aid; while it is only given for this particular sample, it could just as well have been included for any and has no numerical impact on the data.

Figure 5. A visualization of the ‘Hip-Hop’ data, comparing control set Peak level to four MPEG-encoded variants.

Figure 6. A visualization of the ‘Pop’ data, comparing control set Peak level to four MPEG-encoded variants.

The figures above demonstrate variations that occur in peak levels during

conversion of 16-bit, 44.1 kHz PCM audio to Mp3 and AAC type files. In none of the 15

cases examined were the Mp3-encoded samples significantly different from the original,

control set. In 13 of the 15 examples, the AAC conversion at both 128 kbps and 256

kbps resulted in peak levels that differed significantly from the control group. In all but

one case [P02], the AAC encoded files had average peak levels lower than that of the

control. Mp3 conversion had a less predictable, yet consistently insignificant effect on

peak level.

RMS (joint stereo)

Figure 7. A visualization of the ‘Rock’ data, comparing control set RMS level to four MPEG-encoded variants.

Figure 8. A visualization of the ‘Hip-Hop’ data, comparing control set RMS level to four MPEG-encoded variants.

Figure 9. A visualization of the ‘Pop’ data, comparing control set RMS level to four MPEG-encoded variants.

Similar to the peak level test above, the RMS level test delivers evidence of a

diminishing effect that AAC encoding has on CD-standard PCM audio. The AAC

encoder demonstrated a statistically significant level decrease in more than 13 of the 15

examples. In one case [P02], only the 128 kbps AAC file was found to be significantly

different after t-test analysis. Each of the 15 cases, however, shows a decrease in

average RMS levels for both AAC bit rates. Once again, in none of the 15 examples did

the Mp3 encoder show a significant effect on RMS level.

Phase Correlation

Figure 10. A visualization of the ‘Rock’ data, comparing control set Phase to four MPEG-encoded variants.

Figure 11. A visualization of the ‘Hip-Hop’ data, comparing control set Phase to four MPEG-encoded variants.

Figure 12. A visualization of the ‘Pop’ data, comparing control set Phase to four MPEG-encoded variants.

As evident in the charts above, phase correlation is preserved quite well during

conversion for both MPEG formats, and across both 128 kbps and 256 kbps bit rates. In

only two cases [H03 and P03], both encoded with the AAC encoder at 256 kbps, was the

phase value of the compressed file significantly different from that of the control. The

nature of minute variations among the four encoded types was unpredictable. In some

cases, phase would increase insignificantly, in others it would decrease. The overall

picture this paints is one that shows the success of MPEG encoders in preservation of

stereo phase correlation and mono-compatibility.

Spectral Response

Figure 13. A visual comparison of ‘Rock’ control set Spectral Response to those of four MPEG-encoded variants.

Figure 14. A visual comparison of ‘Hip-Hop’ control set Spectral Response to those of four MPEG-encoded variants.

Figure 15. A visual comparison of ‘Pop’ control set Spectral Response to those of four MPEG-encoded variants.

The charts above provide insight into peak levels at three frequencies: 100 Hz, 1

kHz, and 10 kHz. Low, middle, and high register peak levels demonstrate the justice

that each encoder and setting does to the original source. Keep in mind that this test

was slightly different than the previous three tests, in which each of the 15 examples

was separately analyzed. The Spectral Response charts are composed of averaged data

for five examples across each of the three genres examined. In most cases, variations

were not significant. Five of six instances that were statistically different came from the

AAC encoder, two at 128 kbps and three at 256 kbps. The one instance of statistical

significance in the Mp3 encoded samples was the 256 kbps average [‘Rock’ at 100 Hz],

which was below the control group average at that frequency.

Those data that were not significant still reveal a great deal about the respective

character of each encoder. In all cases, AAC encoded averages were below the control.

The Mp3 averages at 128 kbps were less predictable, while the 256 kbps Mp3 levels

tended to fall near or slightly below the control values. The next test, Spectral Shape,

depicts spectral consequences of encoding that occur near and above the 10 kHz limit of

this test.

Spectral Shape

The charts below are oriented differently than those of the four previous tests.

Once again, these data were analyzed per genre rather than individually across the 15

examples. No t-test calculations were made; this test seemed unsuitable for data that

were not collected in a wholly objective manner, as described in the methodology

above.

Figure 16. A visual comparison of ‘Rock’ control set Spectral Shape to those of four MPEG-encoded variants.

Figure 17. A visual comparison of ‘Hip-Hop’ control set Spectral Shape to those of four MPEG-encoded variants.

Figure 18. A visual comparison of ‘Pop’ control set Spectral Shape to those of four MPEG-encoded variants.

The first set of observations made was for Natural Rolloff. All three genres

displayed a natural (i.e. musical) rolloff in the high frequencies, somewhere between 5

and 10 kHz. In the case of Rock, it was closer to 5 kHz, while Hip-Hop and Pop

approached 10 kHz. The Steep Rolloff typically fell near the Cutoff frequency. In cases

where the encoder had a cutoff frequency below the natural (control) cutoff, there was

often a steep rolloff in the high frequencies one or two kilohertz before the cutoff. For

instance, one example may have had an original cutoff frequency at 21 kHz, but its

Mp3-encoded variant at 128 kbps rolled off steeply at 16 kHz and cut off completely at

17 kHz. This was the case for virtually every example encoded by the Mp3 encoder at

both bitrates and the AAC encoder at 128 kbps. To give a general idea of these

characteristic rolloff and cutoff ranges, I have created the table below:

Encoder        Steep Rolloff   Cutoff
Mp3 128 kbps   15-16 kHz       17 kHz
AAC 128 kbps   16-18 kHz       18 kHz
Mp3 256 kbps   17-19 kHz       19-20 kHz
AAC 256 kbps   -               -

Table 1. Approximate steep rolloff and cutoff frequencies for Mp3 and AAC encoders at two bit rates.

Curiously, there are no data given in the table above for the AAC encoder at 256

kbps. Perhaps the most staggering discovery made in this analysis was the fact that

AAC at 256 kbps reproduced control values identically in virtually every example.

There were no instances of a steep (i.e. forced) rolloff. Even in cases where frequencies

above 20 kHz (the limit of human hearing) were present, the AAC encoder did not cut

them off.

With these five tests completed over a total of 15 musical examples, the results

point to several characteristics of the MPEG family that are important to the mastering

engineer preparing content for digital distribution. The next chapter, Discussion, will

tie together observations made in the quantitative data analysis with the theoretical

background to generate new ways of thinking about mastering for compressed audio.

4.4 Sources of error

This experiment was designed with the intention of avoiding human error; it was

purposely objective—no listening involved—with analysis based on numerical output

from an entirely digital system. The final test, Spectral Shape, was the exception; the

sole investigator was responsible for interpreting spectrogram charts and—aided by

digital measurement devices—deciding where rolloff and cutoff frequencies occurred.

Throughout the execution of the five tests described above, a consistent set of hardware

and software tools was used.

It should also be noted that Spectrafoo can read only AIFF format audio, to the

exclusion of all other audio file types. All files for each of the 15 examples had to

be converted to AIFF format. This was done in a uniform manner using Pro Tools. It is

assumed that any sonic ramifications this step contributed to the audio were applied in

a universal way to all testing material (i.e. all 75 files analyzed).

During the conversion of formats to AIFF, it was discovered that the Mp3 files

contained a slight overall delay, putting both 128 and 256 kbps files equally out of phase

with the PCM and AAC formats (which had an in-phase relationship with one another).

Simple time aligning of the Mp3 files was completed in Pro Tools by hand in order to

sync the start point of all five formats prior to converting them to AIFF for examination

in Spectrafoo. This was done to ensure that level, phase, and spectral discrepancies

collected in the experimental stage were truly due to the particular differences of each

format rather than a characteristic error associated with the encoding process.

Lastly, while all of the numbers were double-checked for accuracy, the process of

data collection was prone to human error. Since there was no way to natively export

numbers from Spectrafoo devices into a spreadsheet, this had to be done by hand. It

was tedious, careful work. The calculations done during the analysis ensure that minor

errors associated with this phase did not have drastic effects on the outcome of the

experiment.

5. Discussion

Through research and experimentation, a wealth of information pertaining to the

construction and implementation of MPEG audio encoders has been assembled. The

principles integral to the art and science of mastering were applied to test MPEG

technologies. The task now is to hold a light to the newly understood behaviors of

audio encoders to determine how best to use data compression in mastering. Thus, we

regard data compression as the final step in the mastering process, either occurring after

the digital CD master is printed, or ideally as an entity entirely separate from the CD

master. The following sections will approach these two realities, and construct useful

strategies for appropriately treating audio during the encoding process.

If we consider the compression of audio data as a step that takes place after the

CD master has been made, we assume two important characteristics about the audio: it

has a sample rate of 44.1 kHz and a bit depth of 16. Tests conducted in this document

could only gather data under this reality; professionally produced PCM audio at higher

sample rates and bit depths is currently unavailable at the consumer level.54 Thus, although the

MPEG encoder may be designed to work at higher sample rates and bit depths, the

current practice of compressing 16-bit files limits the MPEG encoder to some degree,

inheriting the physical peculiarities associated with the audio CD. It makes sense, then,

that as encoding technologies develop and 16-bit CD audio retires we should see a

trend toward 24-bit (and higher) resolution at the consumer level.

54 One current format, SACD, contains a 1-bit (high quantization error) file system called Direct Stream Digital, offering practical dynamic range of 120 dB and frequency response up to 100 kHz. This is roughly equivalent to 96 kHz, 24-bit PCM. The benefits of such a format have been limited to a small consumer-base and its recordings typically fall into ‘softer’ genres: classical, jazz, and acoustic. Pohlmann, 309-313.

5.1 Preparing MPEG files from a CD Master

Let me first address the results of the peak/RMS level test. To summarize

briefly, Mp3 files at 128 and 256 kbps showed no statistical difference when compared

to the CD master levels while AAC files at both bit rates were culpable of a statistically

significant decrease in peak and RMS levels in nearly every case. While this seems to

show a propensity for the AAC encoder towards error, the trend may not be so simply

characterized. This type of error might in fact help the AAC encoder to deliver better

results at lower bitrates. Keep in mind that CDs today are often peak limited to within

0.1 dB (or less!) of digital distortion. While the Mp3 encoder performed well in

maintaining appropriate peak and RMS levels at both bit rates, lower bit rate files

tended to have a less predictable error rate on a global level. In the case of Mp3, this

often led to a situation where peaks that came close to 0.0 dB resulted in overloads, and

thus led to consistent distortion. Of course, the mastering engineer can often use digital

clipping as an artistic tool but the MPEG encoder’s clipping rate cannot be calculated or

controlled in any measurable way. The AAC encoder’s tendency to decrease the overall

level by about 0.5 dB acts to protect against the unpredictability of level discrepancies

natural to the encoding process. A cushion is created that can drastically reduce the

number of digital clips that occur. So while the Mp3 encoder does not suffer from the

statistically significant level differences, it is still just as culpable of slight variation

during the encoding process as the AAC encoder. Without the overall level attenuation,

however, the Mp3 encoder is more prone to distortion.
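
The protective effect of this attenuation is simple arithmetic, sketched below in Python. The 0.5 dB attenuation and the 0.1 dB limiting ceiling come from the discussion above; the 0.3 dB encoding overshoot is a hypothetical figure chosen only to illustrate the point.

```python
def db_to_linear(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def clips(peak_dbfs, encoder_error_db, attenuation_db=0.0):
    """True if the peak would exceed 0 dBFS after encoding."""
    return peak_dbfs + encoder_error_db - attenuation_db > 0.0


# A master limited to -0.1 dBFS with a hypothetical +0.3 dB overshoot from encoding.
print(clips(-0.1, 0.3))                      # True: no cushion, the peak overloads
print(clips(-0.1, 0.3, attenuation_db=0.5))  # False: the 0.5 dB cushion absorbs it
print(round(db_to_linear(-0.5), 3))          # 0.944: the amplitude factor of -0.5 dB
```

A half-decibel reduction scales the waveform by only about 5.6 percent, yet in this illustration it is enough to keep the overshooting peak below full scale.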

With regards to phase correlation, there were just two averages out of 15

calculated that were statistically different than the control group. Both of these were

AAC encoded files at 256 kbps. While this evidence is not entirely compelling, it seems

to hint at side effects of the AAC joint stereo (m/s) encoder, which is designed to be

even more robust than the Mp3 encoder. Given the relatively limited data pool,

concerns that the AAC encoder significantly skews phase are not unreasonable but also

not completely reliable.

Perhaps the most intriguing topic of discussion is that of spectral content among

the different encoding types. Once again, statistics tell us that the AAC encoder veers

from the control (input) values to a greater degree than the Mp3 encoder. Five out of

six statistically significant results came from the AAC encoder, three at 256 kbps and

two at 128 kbps. If we relate these peak spectral values to the peak level values,

however, we may conclude that these statistically significant results may be due to the

overall level difference issue described above. In other words, while the AAC encoder

might create spectral results that are different, it may not mean they are less accurate

than those of the Mp3 encoder. Further analysis, linking the proportionality of AAC’s

spectral response with its peak levels, might put these concerns to rest.

On the previous page, the topic of digital distortion was briefly addressed in its

relation to peak levels. Recall that Mp3 files at 128 kbps had consistently more

overloads than any of the other three file types that were tested. With respect to the

unpredictability of its peak value handling, it would seem prudent to look at spectral

content to gain a better understanding of the frequency area where these digital

overloads occur. While the data collected for this study was limited to three key

frequency regions, far too limited to develop a general set of rules for this phenomenon,

we can tell from the test that the effects of encoding are subtle. Mp3 does not drastically

change frequency content in any of the tested ranges, apart from its rolloff and cutoff

properties in the very high register (above 16 kHz).

5.2 Practical guidelines for encoding with Mp3 and AAC

Combining the theoretical background we have on MPEG encoding with the

knowledge gained through the new research contained in this document, we can put

into practice a set of guidelines to follow for the purpose of achieving the best results

when preparing audio for Mp3 and AAC formats.

Preparing a master for Mp3

When preparing a master with the expectation of conversion to Mp3, there are

three key principles we have followed: metering, phase, and frequency content. If the

master peak levels are consistently within the highest dB of dynamic range, overloads

are likely to occur—especially at low bit rates. Peak limiting slightly lower than -0.1 dB,

even just half a decibel, can dramatically reduce the number of overloads. Given that

phase correlation was virtually a non-issue when encoding at both 128 kbps and 256

kbps, it can be projected with confidence that phase will translate well despite Mp3’s

joint stereo (m/s) treatment. Finally, frequency content was preserved more

successfully by the higher—256 kbps—bit rate. High frequency rolloff and cutoff did

occur at both bit rates, though this phenomenon was far less severe at the higher bit rate

setting. To limit the guesswork involved in mastering for Mp3, it could be useful to

apply a low pass filter to the master prior to conversion. The steepness and placement

of this filter along the frequency axis is ultimately dependent upon the desired

conversion settings, but one can gather from the analysis above that lower bit rates call

for more liberal use of bandwidth limiting; a 16.5 kHz cutoff for Mp3 at 128 kbps is not

unreasonable. Finally, use of the VBR setting during conversion is recommended to

achieve the most efficient use of space and maintain a consistent noise floor.
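
As an illustration of the bandwidth limiting suggested above, the Python sketch below designs a simple windowed-sinc low-pass filter at 16.5 kHz for 44.1 kHz audio and checks its response. The tap count and the Hamming window are illustrative choices, not recommendations drawn from this study; in practice a mastering engineer would more likely reach for a dedicated equalizer.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Windowed-sinc low-pass FIR taps (odd num_taps), normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x = 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        # Hamming window tapers the truncated response to reduce stopband ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude response of the filter at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))


taps = lowpass_fir(16500, 44100)
for freq in (1000, 10000, 16500, 20000):
    print(freq, round(gain_db(taps, freq, 44100), 1))
```

With these settings the filter leaves the low and middle registers essentially untouched and strongly attenuates content near 20 kHz, so the encoder is no longer left to decide how to discard it.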

Preparing a master for AAC

The preparation for conversion of a PCM master to AAC takes into account the

same key mastering principles outlined above: metering, phase, and frequency content.

The AAC encoder allows the mastering engineer more freedom, because its restrictions

on frequency content are less severe, and it responds to peak limited material by

decreasing overall level to within 1 dB of the original signal. This truly is the encoder to

use for a hands-off approach to mastering for mobile formats. Preparing a good master

for CD will, in most cases, ensure a good translation to AAC with minimal distortion or

bandwidth truncation. To be on the safe side, it might be advisable to bandwidth limit

mastered material if the audio is to be converted to lower bitrates (128 kbps and below).

As in the case of Mp3 conversion, this will take some of the uncertainty in the high

frequency range out of the equation. Additionally, using the VBR setting will ensure

the most robust translation.

5.3 Forecast and future work

Given the results of this experiment and the theoretical superiority of the AAC

encoder—and its likely dominance over the compressed audio market into the future—

we can make the argument that AAC is the preferred format for mastering popular

music. The rationale is simple: AAC is a more highly compressed format, proven in an

objective way to deliver quality at or near CD quality. While Mp3 at 256 kbps

outperformed AAC at the same bit rate in most of the statistical tests, the advantages

AAC delivers outweigh these results: smaller data footprint, better frequency

response, and a safer way to manage peak limited material during conversion. There

are other concepts not covered in this document that could help to strengthen this

argument, specifically including signal to noise ratio and pre-echo artifacts.

Other, larger topics of discussion still loom. What can we say about converting

24-bit (or higher) PCM files to MPEG formats? The experiment above was unable to

cover this topic due to the unavailability of such formats at the consumer level. In

theory, however, we hold that both Mp3 and AAC are endowed with resolution at or

above that of 24-bit PCM audio. We have not proven, but can support with some

confidence, Bob Katz’s assertion that dithering to 16 bits is not necessary when

mastering for MPEG formats; it only introduces more noise into the system.
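The noise penalty of dithering down to 16 bits can be illustrated numerically. The sketch below is not part of the experiment: it assumes a 997 Hz test tone at half of full scale, TPDF dither spanning plus or minus one LSB, and hypothetical helper names. It compares the error of a signal dithered to 16 bits with the error of the same signal simply kept at 24 bits.

```python
import math
import random

def quantize(x, bits):
    """Round x (full scale = -1.0..+1.0) to the nearest step of a `bits`-bit grid."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def tpdf_dither(bits, rng):
    """Triangular-PDF dither: difference of two uniform values, +/- 1 LSB wide."""
    step = 2.0 / (2 ** bits)
    return (rng.random() - rng.random()) * step

def requantization_noise_db(bits, dither, num_samples=20000, seed=1):
    """Mean-square error (dB re full scale) after reducing a test tone to `bits`."""
    rng = random.Random(seed)
    err_sq = 0.0
    for i in range(num_samples):
        x = 0.5 * math.sin(2 * math.pi * 997 * i / 44100)
        d = tpdf_dither(bits, rng) if dither else 0.0
        err_sq += (quantize(x + d, bits) - x) ** 2
    return 10 * math.log10(err_sq / num_samples)

if __name__ == "__main__":
    n16 = requantization_noise_db(16, dither=True)
    n24 = requantization_noise_db(24, dither=False)
    print(f"16-bit with TPDF dither: {n16:.1f} dB")
    print(f"24-bit, no dither:       {n24:.1f} dB")
```

Under these assumptions the dithered 16-bit path should show an error floor several tens of dB above the 24-bit path, which is the extra noise the encoder would then have to spend bits on.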

A second question: What happens when these encoders are used on other

genres? The experiment above dealt with popular genres: Rock, Hip-Hop, and Pop—

three of the ‘loudest’ types of music produced and mastered. The greatest dynamic

range featured in all 15 of the examples tested was less than 8 dB. If we were to apply

the same tests to Classical, Jazz, and other material with dynamic range 10 or 20 times

greater than the material examined above, would the results be similar? We can guess,

but not prove, that AAC would outperform Mp3 due to its more efficient analysis and

quantization systems. These would help limit pre-echo artifacts—noise that would be

especially difficult to mask with quieter, sparser program material. We could also

conjecture that the quality gap between 128 kbps and 256 kbps for both

encoders would be larger. Noise, distortion, and inconsistencies in frequency response

introduced by conversion would be even more audible, possibly creating a situation

where only high bit rate files would be acceptable.

The final point of interest not covered in this document is related to the specific

brands of technologies we employ for conversion. In this particular study, two

encoders were used. Only one configuration of each of these two encoders was

featured, specifically the LAME encoder for Mp3 and Apple’s AAC encoder. While

the Apple encoder is currently the only available configuration for AAC, many Mp3

encoders have appeared over the format’s decade of popularity. The LAME encoder was used

because it is widely preferred within Mp3 communities. We might guess that other

configurations, such as those used by iTunes or Digidesign, would not have performed

as well as the LAME encoder. While the AAC encoder has been

widely accepted, and might not undergo the same type of development that Mp3 has

faced over the years, it is interesting to ponder potential benefits of a situation in which

it did.

Moreover, as we look to the future of MPEG audio compression, AAC is poised

to take over the market with its HD-AAC format. This format will not only eliminate

the need for PCM archiving, because it will feature lossless encoding up to 192 kHz and

24-bit, but will outperform the Mp3HD specification that is also in development. When

the new generation of encoders takes over, many of the topics discussed in this

document will be less important to the mastering engineer. Alas, a new set of concerns

will surely arise.

6. Conclusion

Through this investigation, we have assembled several contributions ancillary to

the field of mastering. The most functional of these is the wealth of objective data

comparing the performance of Mp3 and AAC encoders on CD-quality masters. A total

of 15 samples were tested over three observable principles key to mastering—level,

phase, and spectral content. The resulting data were analyzed, and the numbers were

tabulated in graphs and charts within the document above and, for the benefit of future

studies, revealed in raw form within the appendices below. We found through this

process that when compared to control data, Mp3 outperformed AAC in nearly all

categories tested. However, we have reasoned through a combination of experimental

evidence and theoretical research that the error associated with quantization during

encoding is significant. This comprehensive understanding of the encoding process

may suggest that the AAC encoder is not only better equipped to minimize

quantization errors, but also to deal with those which it cannot avoid.
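For reference, the level and phase measurements summarized above can be expressed compactly. This sketch reflects how such meters work in general and is not a description of the professional tools used in the study; the function names are hypothetical, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """Root-mean-square level, in dB relative to full scale."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")

def phase_correlation(left, right):
    """Normalised L/R correlation: +1 in phase, 0 unrelated, -1 out of phase."""
    lr = sum(l * r for l, r in zip(left, right))
    ll = sum(l * l for l in left)
    rr = sum(r * r for r in right)
    if ll == 0 or rr == 0:
        return 0.0
    return lr / math.sqrt(ll * rr)

if __name__ == "__main__":
    # One second of a 441 Hz tone at half of full scale (exactly 441 cycles).
    left = [0.5 * math.sin(2 * math.pi * 441 * i / 44100) for i in range(44100)]
    right = [-s for s in left]          # polarity-inverted copy
    print(f"peak {peak_dbfs(left):.2f} dBFS, RMS {rms_dbfs(left):.2f} dBFS")
    print(f"correlation with inverted copy: {phase_correlation(left, right):.2f}")
```

Comparing these three figures before and after encoding is, in essence, what the level and phase portions of the test report.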

Supplementary to the experimental report is a vocabulary we have developed to

objectively describe the characteristics of each encoder within the field of audio

mastering. This has been touched upon in several instances above, but to recap: Mp3 is

generally a simpler encoder. It preserves level and phase better than the AAC encoder,

but struggles with bit depth and quantization error, especially at high frequencies.

Without speaking too subjectively about practical ramifications, quantization distortion

and limited frequency response give Mp3 a ‘crunchier, warmer’ tone. AAC, on the

other hand, is a leaner encoder that limits overloads through an overall level-limiting

effect. Together with its improved frequency response, this makes AAC a ‘cleaner’-sounding

algorithm for converting mastered music to mobile formats.

Lastly, through research and experimentation we have proposed a set of

guidelines that can be followed when using data compression as a final step in the

mastering process. At the end of the day, the choices professionals make boil down to a

preference for one particular sound over another. Mastering engineers spend the

majority of their time worrying about how the PCM version of a song or piece of music

will sound; they often do not consider the ramifications of data compression. Given this

reality, it may make the most sense for the consumer or music distributor to encode files

using the most accurate technology available—in this case, the AAC encoder. This will

ensure that the engineer’s vision for the music is preserved. If, on the other hand, a

master sounds too ‘harsh’ or ‘cold,’ it may be appropriate for the file to be encoded with

an Mp3 encoder. This will add some distortion and warmth through quantization error

and frequency bandwidth limiting. Either way, we are faced with the fact that

consumers and distributors have the power to essentially make the final creative

decision in the mastering process.

Ultimately, this choice comes down to personal preference (subjectivity, if you

will). In most cases, the digital music distributor decides what encoding option is best

suited to the needs of the company and/or the consumer. Thus, the decision is not

determined entirely by the pursuit of sound quality, but may be influenced by other

factors: copyright protection, server space, proprietary concerns, etc. As audio

professionals, we must insist that sound quality play an important role in the

production process, from start to finish. And while the decision between formats is

subjective, it can only help to have objective material on which to base our choices.

Selected References

Brandenburg, Karlheinz. “Mp3 and AAC Explained.” AES 17th International Conference on High Quality Audio Coding. AES, 1999.

Dougherty, Dale. O’Reilly Radar. O’Reilly Media. Posted: March 1 2009. Accessed: March 15 2009. http://radar.oreilly.com/2009/03/the-sizzling-sound-of-music.html

“HD-AAC.” Fraunhofer IIS. Posted: April 2009. Accessed: April 23 2009. http://www.iis.fraunhofer.de/EN/bf/amm/projects/lossless/index.jsp

Fries, Bruce and Marty. The Mp3 and Internet Audio Handbook: Your Guide to the Digital Music Revolution!. Burtonsville: TeamCom Books, 2000.

Katz, Bob. Mastering Audio: the art and the science. Second ed. Burlington: Focal Press, 2007.

Pan, Davis. “A Tutorial on MPEG/Audio Compression.” Institute of Electrical and Electronics Engineers Multimedia Journal. IEEE, 1995.

Pohlmann, Ken. Principles of Digital Audio. Fifth ed. New York: McGraw-Hill, 2005.

“Thomson Introduces mp3HD File Format.” Thomson. Posted: March 19, 2009. Accessed: April 23 2009. http://www.thomson.net/GlobalEnglish/Corporate/News/PressReleases/Pages/Thomson-Introduces-mp3HD-FileFormat.aspx

Watkinson, John. The Art of Digital Audio. Third ed. Oxford: Focal Press, 2001.

Appendix 1: Data Summary (as seen in Figures 4-18)

Rock

Sample | WAV: Mean SEM | M128: Mean SEM T | M256: Mean SEM T | A128: Mean SEM T | A256: Mean SEM T

Peak
R01 | -3.9645 1.3395 | -4.3795 1.3678 0.3212 | -4.3310 1.4117 0.3909 | -4.7175 1.3432 0.0820 | -4.7485 1.4102 0.0794
R02 | -2.4990 0.6660 | -2.4110 0.6948 0.3730 | -2.5040 0.6511 0.8835 | -2.6955 0.7012 0.0048 | -2.6375 0.6732 0.0005
R03 | -1.2820 0.3020 | -1.1795 0.3064 0.1124 | -1.2415 0.3127 0.0890 | -1.5435 0.3260 0.0028 | -1.5915 0.3147 0.0000
R04 | -7.0255 1.0397 | -7.0515 1.0134 0.7517 | -7.0510 1.0259 0.4178 | -7.2945 0.9815 0.0429 | -7.2030 1.0414 0.0000
R05 | -3.7025 0.6187 | -3.7675 0.6237 0.4533 | -3.7170 0.6294 0.5305 | -4.0420 0.6161 0.0000 | -4.0725 0.6152 0.0000

RMS
R01 | -13.8225 1.3926 | -13.8450 1.3914 0.5437 | -13.8035 1.3817 0.1625 | -14.1855 1.3783 0.0000 | -14.1840 1.3928 0.0000
R02 | -11.5080 0.6814 | -11.5450 0.7194 0.6400 | -11.5215 0.6777 0.1958 | -11.6775 0.6807 0.0000 | -11.6620 0.6771 0.0000
R03 | -9.1600 0.4437 | -9.1555 0.4362 0.9508 | -9.1850 0.4527 0.3142 | -9.5595 0.4385 0.0000 | -9.5000 0.4387 0.0000
R04 | -15.2280 0.9016 | -15.6525 1.0122 0.4213 | -15.7395 0.9828 0.3225 | -15.9170 0.9864 0.2075 | -15.8805 0.9757 0.2231
R05 | -12.5775 0.6389 | -12.7135 0.6535 0.2529 | -12.5795 0.6392 0.7779 | -12.8810 0.6298 0.0000 | -12.9240 0.6361 0.0000

Correlation
R01 | 0.4200 0.0774 | 0.3920 0.0780 0.0621 | 0.4220 0.0747 0.7895 | 0.4150 0.0786 0.3629 | 0.4190 0.0767 0.8227
R02 | 0.6650 0.0539 | 0.6600 0.0609 0.8057 | 0.6420 0.0662 0.3306 | 0.6370 0.0676 0.2930 | 0.6470 0.0648 0.4528
R03 | 0.3930 0.0493 | 0.3900 0.0545 0.7220 | 0.4040 0.0496 0.0751 | 0.3940 0.0551 0.9238 | 0.3930 0.0525 1.0000
R04 | 0.6430 0.0529 | 0.6360 0.0534 0.2977 | 0.6440 0.0520 0.8971 | 0.6580 0.0553 0.1382 | 0.6420 0.0539 0.7976
R05 | 0.6550 0.0366 | 0.6650 0.0341 0.3974 | 0.6560 0.0384 0.8638 | 0.6570 0.0370 0.6926 | 0.6540 0.0372 0.8321

Spectral Response
Band | WAV: GM SEM | M128: GM SEM T | M256: GM SEM T | A128: GM SEM T | A256: GM SEM T
100 Hz | -23.6100 1.6724 | -23.3060 1.6667 0.1542 | -23.8320 1.6339 0.0418 | -24.2440 1.6729 0.0030 | -23.9080 1.5795 0.0958
1 kHz | -25.2660 2.5587 | -25.4660 2.5775 0.1352 | -25.5340 2.4336 0.4400 | -25.9700 2.4732 0.0214 | -25.8420 2.5832 0.1611
10 kHz | -34.1480 1.8388 | -34.1460 1.5636 0.9978 | -34.9900 1.7035 0.3062 | -35.1200 1.5429 0.3475 | -34.9680 1.8734 0.1738

Spectral Shape (kHz)
Measure | WAV: GM SEM | M128: GM SEM | M256: GM SEM | A128: GM SEM | A256: GM SEM
Natural Roll | 6.8000 1.6248 | 6.8000 1.6248 | 6.8000 1.6248 | 7.2000 1.8547 | 7.0000 1.7321
Steep Roll | 15.5000 2.133073 | 14.9000 1.1 | 15.0000 1.830301 | 15.1000 1.16619 | 15.5000 2.133073
Cutoff | 19.66667 0.718795 | 16.9 0.367423 | 19.2 0.2 | 17.8 0.122474 | 19.66667 0.718795

Hip-Hop

Sample | WAV: Mean SEM | M128: Mean SEM T | M256: Mean SEM T | A128: Mean SEM T | A256: Mean SEM T

Peak
H01 | -2.4405 0.5355 | -2.3055 0.5563 0.1459 | -2.3655 0.5611 0.1692 | -2.8415 0.4916 0.0002 | -2.8010 0.5434 0.0000
H02 | -7.4675 1.2599 | -7.4745 1.2669 0.9408 | -7.4885 1.2777 0.6910 | -7.8950 1.2405 0.0000 | -7.9410 1.2790 0.0000
H03 | -3.4520 0.8696 | -3.0785 0.7854 0.4696 | -3.4320 0.8663 0.1439 | -3.6340 0.8526 0.0006 | -3.5885 0.8616 0.0008
H04 | -2.4500 0.5774 | -2.4170 0.5863 0.6764 | -2.5190 0.5859 0.0561 | -2.8440 0.5699 0.0000 | -2.8660 0.5741 0.0000
H05 | -5.7115 1.4107 | -5.6615 1.3898 0.4223 | -5.6920 1.4085 0.0561 | -6.0950 1.4286 0.0000 | -6.0805 1.4133 0.0000

RMS
H01 | -10.7215 0.5693 | -10.7005 0.5840 0.6853 | -10.7200 0.5694 0.7263 | -11.1340 0.5718 0.0000 | -11.0905 0.5759 0.0000
H02 | -16.7745 1.6949 | -16.8850 1.7209 0.1337 | -16.7940 1.7022 0.2645 | -17.3210 1.7174 0.0000 | -17.2415 1.7043 0.0000
H03 | -11.7680 1.2891 | -11.4045 1.1295 0.5802 | -11.7650 1.2872 0.5203 | -11.9345 1.2728 0.0004 | -11.8830 1.2828 0.0000
H04 | -11.2345 0.6857 | -11.2160 0.6853 0.7673 | -11.2315 0.6770 0.8356 | -11.5985 0.6884 0.0000 | -11.6260 0.6946 0.0000
H05 | -14.6690 1.8235 | -14.7010 1.8272 0.4843 | -14.6450 1.8175 0.1006 | -15.0660 1.8260 0.0000 | -15.0445 1.8262 0.0000

Correlation
H01 | 0.8440 0.0179 | 0.8400 0.0201 0.5830 | 0.8440 0.0187 1.0000 | 0.8390 0.0200 0.5129 | 0.8260 0.0311 0.5216
H02 | 0.5870 0.1161 | 0.5850 0.1137 0.8703 | 0.5790 0.1134 0.0697 | 0.5900 0.1186 0.6164 | 0.5860 0.1167 0.7804
H03 | 0.7380 0.0436 | 0.7700 0.0563 0.4352 | 0.7360 0.0449 0.6926 | 0.7410 0.0449 0.6970 | 0.7460 0.0426 0.0031
H04 | 0.8030 0.0504 | 0.8070 0.0492 0.5830 | 0.8070 0.0491 0.4620 | 0.8110 0.0476 0.3513 | 0.8110 0.0483 0.0868
H05 | 0.7190 0.0516 | 0.7210 0.0521 0.7804 | 0.7140 0.0527 0.2126 | 0.7150 0.0543 0.5450 | 0.7200 0.0519 0.7804

Spectral Response
Band | WAV: GM SEM | M128: GM SEM T | M256: GM SEM T | A128: GM SEM T | A256: GM SEM T
100 Hz | -22.8780 0.7576 | -23.0240 0.4042 0.7736 | -22.8020 0.410115 0.871321 | -23.1440 0.497439 0.509381 | -22.9940 1.57953 0.688935
1 kHz | -24.0760 2.4467 | -24.0280 2.3266 0.9044 | -23.8040 2.099539 0.619649 | -24.6320 2.194323 0.311957 | -24.1300 2.583237 0.929109
10 kHz | -30.5400 1.6772 | -30.8920 1.7414 0.4234 | -31.1700 1.580383 0.079988 | -31.4600 1.824245 0.080149 | -31.4360 1.873364 0.00782

Spectral Shape (kHz)
Measure | WAV: GM SEM | M128: GM SEM | M256: GM SEM | A128: GM SEM | A256: GM SEM
Natural Roll | 8.5000 2.0187 | 8.5000 2.0187 | 8.5000 2.0187 | 8.5000 2.0187 | 8.5000 2.0187
Steep Roll | 20.3 0.538516 | 16.6 0.6 | 19.2 0.2 | 16.9 0.4 | 20.4 0.509902
Cutoff | 20 0 | 17.8 0.3 | 19.7 0.339116 | 18.5 0.158114 | 20 0

Pop

Sample | WAV: Mean SEM | M128: Mean SEM T | M256: Mean SEM T | A128: Mean SEM T | A256: Mean SEM T

Peak
P01 | -5.1230 1.1737 | -5.1765 1.2446 0.6349 | -5.0790 1.1636 0.1543 | -5.4950 1.2391 0.0029 | -5.4480 1.1712 0.0000
P02 | -3.2115 0.7620 | -3.1475 0.7266 0.5737 | -3.1820 0.7796 0.3291 | -3.1695 0.8148 0.5999 | -3.1260 0.7688 0.1607
P03 | -0.8775 0.3270 | -0.8970 0.3107 0.7425 | -0.8815 0.3375 0.8209 | -1.2020 0.3580 0.0002 | -1.2855 0.3412 0.0000
P04 | -0.8600 0.6108 | -0.7780 0.6156 0.1878 | -0.8720 0.6122 0.5265 | -1.2640 0.6359 0.0001 | -1.2480 0.6146 0.0000
P05 | -4.8110 0.9318 | -4.7375 0.9416 0.3537 | -4.7700 0.9232 0.1005 | -5.1405 0.9206 0.0002 | -5.1200 0.9242 0.0000

RMS
P01 | -13.6965 1.3630 | -13.6800 1.3904 0.6346 | -13.6790 1.3612 0.4925 | -14.0630 1.3824 0.0000 | -14.0365 1.3623 0.0000
P02 | -12.4455 1.0031 | -12.4995 1.0092 0.3831 | -12.4490 1.0044 0.6326 | -12.5280 1.0006 0.0418 | -12.4515 0.9993 0.5652
P03 | -9.0145 0.6945 | -9.0370 0.6765 0.4970 | -9.0250 0.6929 0.1640 | -9.4840 0.6718 0.0000 | -9.4700 0.6954 0.0000
P04 | -8.5080 0.8314 | -8.3685 0.8281 0.2784 | -8.5160 0.8319 0.2322 | -9.0115 0.8305 0.0000 | -8.9585 0.8295 0.0000
P05 | -13.6230 1.0099 | -13.5865 0.9843 0.5513 | -13.6150 1.0103 0.5095 | -13.9810 1.0084 0.0000 | -13.9335 1.0034 0.0000

Correlation
P01 | 0.6050 0.0660 | 0.6080 0.0653 0.6637 | 0.6020 0.0660 0.1934 | 0.6140 0.0653 0.2789 | 0.6030 0.0644 0.5086
P02 | 0.6400 0.0342 | 0.6450 0.0315 0.6707 | 0.6430 0.0328 0.5599 | 0.6420 0.0312 0.8504 | 0.6450 0.0311 0.3809
P03 | 0.3670 0.0777 | 0.3710 0.0739 0.5911 | 0.3610 0.0780 0.1114 | 0.3720 0.0763 0.3809 | 0.3590 0.0784 0.0224
P04 | 0.6900 0.0623 | 0.6940 0.0648 0.6255 | 0.6970 0.0604 0.1727 | 0.6910 0.0630 0.8638 | 0.6970 0.0631 0.2091
P05 | 0.6340 0.0451 | 0.6330 0.0452 0.9380 | 0.6280 0.0444 0.2172 | 0.6310 0.0459 0.5763 | 0.6330 0.0453 0.8589

Spectral Response
Band | WAV: GM SEM | M128: GM SEM T | M256: GM SEM T | A128: GM SEM T | A256: GM SEM T
100 Hz | -22.1860 1.3914 | -22.3480 1.3474 0.3897 | -22.5080 1.4371 0.1146 | -22.528 1.241404 0.11554 | -22.718 1.346994 0.023607
1 kHz | -21.3800 2.5063 | -21.0360 2.4590 0.3106 | -21.2360 2.4818 0.2602 | -21.69 2.628515 0.136359 | -21.502 2.475878 0.41442
10 kHz | -28.2600 1.3439 | -28.4560 1.3426 0.6922 | -28.2840 1.2857 0.9256 | -29.25 1.587898 0.082693 | -29.048 1.466235 0.042031

Spectral Shape (kHz)
Measure | WAV: GM SEM | M128: GM SEM | M256: GM SEM | A128: GM SEM | A256: GM SEM
Natural Roll | 9.6000 0.9138 | 9.6000 0.9138 | 9.6000 0.9138 | 9.6000 0.9138 | 9.6000 0.9138
Steep Roll | 19.25 0.67082 | 16.2 0.2 | 18.125 0.460977 | 16.4 0.244949 | 19.25 0.67082
Cutoff | 20.75 0.223607 | 17.2 0.3 | 19.5 0 | 17.8 0.122474 | 20.75 0.223607
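The Mean and SEM columns above can be reproduced from the raw data in Appendix 2. The sketch below is a reconstruction, not the author's documented procedure. It assumes that the left and right peak readings of each of the ten segments are averaged first, that raw levels are recorded as magnitudes below full scale (hence the sign flip), and that the T column holds the two-tailed p-value of a paired t-test against the WAV control. The peak readings are copied from the first WAV and Mp3 128 kbps blocks of Appendix 2; the function names are hypothetical.

```python
import math
import statistics

# Peak readings for the first sample in Appendix 2 (ten segments per channel).
WAV_PEAK_L = [2.83, 0.53, 3.45, 2.77, 0.47, 15.51, 2.27, 4.47, 1.42, 1.43]
WAV_PEAK_R = [2.58, 8.08, 3.94, 5.57, 0.39, 15.19, 3.07, 3.45, 0.10, 1.77]
M128_PEAK_L = [2.78, 8.08, 3.74, 3.38, 0.89, 15.35, 2.47, 4.38, 1.30, 1.43]
M128_PEAK_R = [2.39, 8.37, 4.16, 5.54, 0.29, 14.57, 2.87, 3.69, 0.56, 1.35]

def segment_levels(left, right):
    """Average L and R per segment and restore the negative sign (assumed)."""
    return [-(l + r) / 2 for l, r in zip(left, right)]

def mean_and_sem(values):
    """Arithmetic mean and standard error of the mean (sample std / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def paired_t(control, treated):
    """Paired t statistic: mean of the differences over their standard error."""
    diffs = [t - c for c, t in zip(control, treated)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

if __name__ == "__main__":
    wav = segment_levels(WAV_PEAK_L, WAV_PEAK_R)
    m128 = segment_levels(M128_PEAK_L, M128_PEAK_R)
    for name, levels in (("WAV", wav), ("M128", m128)):
        mean, sem = mean_and_sem(levels)
        print(f"{name}: mean {mean:.4f}, SEM {sem:.4f}")
    print(f"paired t (M128 vs WAV): {paired_t(wav, m128):.3f}")
```

Under these assumptions the results agree with the R01 peak row of the table (WAV -3.9645 and 1.3395, M128 -4.3795 and 1.3678). The paired t statistic has a magnitude of about 1.05 on nine degrees of freedom, which is consistent with the listed T value of 0.3212 if that column is read as a two-tailed p-value.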

Appendix 2: Raw Data

WAV (16/44.1)

Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation

0 2.83 10.36 2.58 10.83 0.23 100 Hz -23.65

1 0.53 17.37 8.08 18.95 0.69 1 kHz -22.51

2 3.45 15.71 3.94 14.03 0.02 10 kHz -28.99

3 2.77 13.78 5.57 12.66 0.23

4 0.47 10.57 0.39 10.39 0.52 Natural Roll 10 kHz

5 15.51 24.42 15.19 24.19 0.87 Steep Roll 16 kHz

6 2.27 12.21 3.07 12.75 0.47 Cut off 19 kHz

7 4.47 13.05 3.45 12.48 0.39

8 1.42 11.61 0.10 10.08 0.30

9 1.43 10.39 1.77 10.62 0.48

Mp3 128 kbps

0 2.78 10.35 2.39 10.83 0.25 100 Hz -23.10

1 8.08 17.35 8.37 18.74 0.63 1 kHz -23.08

2 3.74 15.74 4.16 14.16 0.02 10 kHz -28.88

3 3.38 14.11 5.54 12.65 0.17

4 0.89 10.58 0.29 10.50 0.47 Natural Roll 10 kHz

5 15.35 24.56 14.57 24.15 0.88 Steep Roll 16 kHz

6 2.47 12.35 2.87 12.94 0.49 Cut off 16 kHz

7 4.38 12.83 3.69 12.34 0.39

8 1.30 11.75 0.56 10.08 0.22

9 1.43 10.34 1.35 10.55 0.40

Mp3 256 kbps

0 2.77 10.39 2.47 10.83 0.20 100 Hz -23.65

1 8.75 17.37 7.91 18.94 0.67 1 kHz -22.92

2 3.67 15.58 3.59 14.09 0.07 10 kHz -28.92

3 2.76 13.77 5.44 12.63 0.23

4 0.47 10.57 0.32 10.38 0.52 Natural Roll 10 kHz

5 15.82 24.25 14.85 24.11 0.87 Steep Roll 16 kHz

6 1.96 12.16 3.24 12.80 0.46 Cut off 19 kHz

7 4.44 13.03 3.42 12.48 0.42

8 1.44 11.62 0.07 10.07 0.30

9 1.44 10.39 1.79 10.61 0.48

AAC 128 kbps

0 3.40 10.88 2.93 11.33 0.25 100 Hz -24.15

1 8.86 17.52 8.04 19.23 0.71 1 kHz -23.27

2 3.95 16.03 4.09 14.38 0.02 10 kHz -30.30

3 3.60 14.08 5.60 13.09 0.21

4 0.90 11.06 0.54 10.88 0.50 Natural Roll 10 kHz

5 15.75 24.73 14.63 24.56 0.88 Steep Roll 16 kHz

6 2.94 12.47 3.39 12.99 0.45 Cut off 17.5 kHz

7 4.74 13.43 3.86 12.78 0.38

8 1.60 11.95 1.59 10.55 0.29

9 2.08 10.77 1.86 11.00 0.46

AAC 256 kbps

0 3.36 10.72 2.93 11.18 0.22 100 Hz -23.82

1 8.88 17.75 8.43 19.33 0.68 1 kHz -23.23

2 3.79 16.10 4.39 14.36 0.04 10 kHz -29.62

3 3.12 14.13 5.97 13.01 0.23

4 0.83 10.94 0.79 10.74 0.51 Natural Roll 10 kHz

5 15.87 24.74 15.69 24.58 0.88 Steep Roll 16 kHz

6 2.47 12.61 3.51 13.12 0.45 Cut off 19 kHz

7 4.75 13.40 3.82 12.81 0.41

8 2.06 11.98 0.42 10.50 0.29

9 1.80 10.73 2.09 10.95 0.48

Spectral Content

WAV (16/44.1)
Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation
0 4.68 13.72 4.83 12.78 0.70 100 Hz -21.04
1 4.94 11.55 4.53 10.59 0.70 1 kHz -16.97
2 2.32 12.22 2.19 11.22 0.64 10 kHz -36.38
3 7.82 17.77 5.38 15.11 0.36
4 0.71 9.76 0.89 8.67 0.90 Natural Roll 4 kHz
5 1.31 11.58 0.93 11.27 0.70 Steep Roll none
6 0.87 11.16 1.61 12.94 0.60 Cut off 21.5 kHz
7 1.96 10.59 1.70 10.42 0.93
8 1.96 10.94 0.74 9.95 0.64
9 0.31 9.92 0.30 8.00 0.48

Mp3 128 kbps
Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation
0 4.62 13.86 5.25 13.20 0.69 100 Hz -20.62
1 5.38 11.92 4.32 10.86 0.70 1 kHz -16.89
2 2.51 12.27 2.59 11.34 0.67 10 kHz -34.78
3 7.46 18.20 5.23 15.34 0.37
4 0.24 9.83 0.66 8.50 0.90 Natural Roll 4 kHz
5 1.20 11.49 0.99 11.27 0.73
6 0.00 11.08 1.12 12.30 0.66
7 1.93 10.24 2.04 10.17 0.91
8 1.58 11.10 0.64 10.06 0.66
9 0.46 9.97 0.00 7.90 0.31

Mp3 256 kbps
0 4.58 13.73 4.85 12.79 0.70 100 Hz -21.48
1 4.90 11.60 4.58 10.60 0.71 1 kHz -17.70
2 2.37 12.19 2.28 11.29 0.61 10 kHz -36.64
3 7.61 17.73 5.30 15.11 0.35
4 1.01 9.83 1.04 8.71 0.91 Natural Roll 4 kHz
5 1.22 11.60 0.85 11.24 0.71 Steep Roll 18.5 kHz
6 1.03 11.21 1.65 12.90 0.62 Cut off 19.5 kHz
7 1.89 10.68 1.75 10.47 0.92
8 2.04 10.91 0.64 9.95 0.63
9 0.26 9.89 0.23 8.00 0.26

AAC 128 kbps
Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation
0 4.78 13.84 5.09 12.99 0.71 100 Hz -21.92
1 5.21 11.60 4.53 10.66 0.68 1 kHz -17.85
2 2.25 12.31 2.48 11.49 0.64 10 kHz -35.23
3 8.63 18.00 5.76 15.20 0.37
4 1.29 10.10 0.73 8.82 0.89 Natural Roll 4 kHz
5 1.30 11.67 1.08 11.43 0.68 Steep Roll 17 kHz
6 0.88 11.68 2.14 12.89 0.58 Cut off 18 kHz
7 2.37 10.87 1.81 10.64 0.95
8 2.18 11.12 0.87 10.11 0.64
9 0.53 10.05 0.00 8.08 0.23

AAC 256 kbps
Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation
0 4.71 13.80 4.92 12.91 0.70 100 Hz -21.67
1 5.09 11.69 4.64 10.72 0.70 1 kHz -17.81
2 2.34 12.45 2.39 11.42 0.63 10 kHz -36.07
3 8.16 17.93 5.61 15.22 0.37
4 0.92 9.93 0.95 8.84 0.90 Natural Roll 4 kHz
5 1.37 11.72 0.99 11.43 0.73 Steep Roll none
6 1.38 11.33 1.56 13.04 0.63 Cut off 21.5 kHz
7 2.06 10.76 1.64 10.58 0.92
8 2.27 11.08 0.86 10.13 0.63
9 0.43 10.08 0.46 8.18 0.26

Spectral Content

0 0.00 5.21 0.00 6.12 0.45 100 Hz -19.69
1 1.71 6.30 4.91 10.91 0.49 1 kHz -28.05
2 1.11 8.86 1.35 9.54 0.30 10 kHz -36.05
3 2.02 10.62 2.58 10.87 0.23
4 0.71 9.49 1.65 9.52 0.57 Natural Roll 2 kHz
5 1.67 10.71 0.00 9.69 0.50 Steep Roll 10.5 kHz
6 0.42 6.97 1.64 11.91 0.20 Cut off 16 kHz
7 1.12 10.67 0.89 8.45 0.44
8 0.37 7.09 0.79 10.45 0.10
9 0.21 8.68 0.44 11.05 0.62

Spectral Content

0 2.91 11.57 2.73 13.05 0.59 100 Hz -29.21
1 6.63 13.97 8.25 15.93 0.66 1 kHz -31.93
2 9.47 8.99 10.41 19.43 0.41 10 kHz -37.92
3 8.92 18.96 8.49 17.95 0.62
4 1.89 10.82 0.63 12.32 0.82 Natural Roll 8 kHz
5 8.06 15.05 9.24 17.19 0.59 Steep Roll 20 kHz
6 8.89 17.72 8.93 16.83 0.80 Cut off none
7 9.10 18.80 9.01 18.87 0.64
8 3.06 12.56 3.43 9.68 0.39
9 10.29 17.45 10.17 17.42 0.91

0 2.71 11.32 2.86 12.58 0.61 100 Hz -29.21
1 6.97 14.16 8.52 16.04 0.66 1 kHz -32.09
2 9.15 18.89 10.23 19.38 0.38 10 kHz -38.10
3 9.27 18.98 8.35 17.95 0.64
4 1.43 10.46 2.04 11.73 0.79 Natural Roll 8 kHz
5 7.28 15.25 9.55 16.97 0.56 Steep Roll 16 kHz
6 8.64 17.68 8.65 16.90 0.81 Cut off 17.5 kHz
7 9.16 18.80 9.05 19.03 0.62
8 2.68 12.41 3.60 9.76 0.39
9 10.43 17.32 10.46 17.44 0.90

0 3.15 11.71 2.89 13.21 0.60 100 Hz -29.38
1 6.82 14.10 8.32 16.02 0.66 1 kHz -33.42
2 9.75 19.14 10.45 19.56 0.40 10 kHz -40.24
3 9.26 19.20 8.62 18.11 0.64
4 1.94 10.99 0.84 12.46 0.83 Natural Roll 8 kHz
5 8.24 15.18 9.39 17.35 0.59 Steep Roll 20 kHz
6 9.07 17.85 9.04 16.97 0.79 Cut off none
7 9.26 18.94 9.24 19.08 0.62
8 3.37 12.72 3.53 9.85 0.38
9 10.55 17.61 10.33 17.56 0.91

Spectral Content

WAV (16/44.1)
Peak (Left) RMS (Left) Peak (Right) RMS (Right) Correlation Spectral Content
0 2.14 12.46 4.59 14.21 0.44 100 Hz -24.65
1 2.75 11.95 3.62 12.29 0.62 1 kHz -26.97
2 3.30 12.04 2.25 11.67 0.58 10 kHz -30.45
3 2.90 11.10 3.61 12.40 0.68
4 0.52 8.34 0.56 8.96 0.79 Natural Roll 10 kHz
5 6.48 15.18 6.95 15.89 0.78 Steep Roll none
6 0.97 10.77 1.74 10.85 0.72 Cut off none
7 4.00 13.19 7.15 15.40 0.51
8 3.98 12.20 4.98 13.26 0.70
9 4.72 13.39 6.84 16.00 0.73

0 2.17 12.44 4.62 14.23 0.45 100 Hz -24.92
1 2.80 11.98 3.59 12.32 0.60 1 kHz -27.92
2 3.26 12.10 2.26 11.57 0.56 10 kHz -34.00
3 2.97 11.13 3.60 12.40 0.71
4 0.59 8.32 0.61 8.98 0.79 Natural Roll 10.5 kHz
5 6.57 15.07 7.09 16.01 0.79 Steep Roll none
6 0.77 10.74 1.64 10.84 0.73 Cut off 19.5 kHz
7 4.12 13.15 7.15 15.37 0.50
8 4.02 12.23 5.00 13.29 0.68
9 4.62 13.46 6.89 15.96 0.75

Spectral Content

0 3.57 9.99 3.32 10.21 0.79 100 Hz -22.70
1 3.31 11.33 3.51 11.60 0.93 1 kHz -21.52
2 1.04 11.84 1.10 11.95 0.84 10 kHz -28.05
3 2.24 10.15 2.33 10.52 0.84
4 1.01 8.75 1.01 9.00 0.96 Natural Roll 11.5 kHz
5 0.83 9.13 1.01 8.96 0.82 Steep Roll 19 kHz
6 1.74 9.62 2.06 9.88 0.84 Cut off 20 kHz
7 2.45 10.80 2.62 11.36 0.80
8 5.85 14.76 7.07 15.24 0.82
9 1.76 8.95 0.98 10.39 0.80

0 3.77 10.10 3.28 10.23 0.77 100 Hz -23.52
1 3.51 11.45 3.50 11.47 0.92 1 kHz -22.57
2 0.94 11.58 1.00 11.61 0.85 10 kHz -28.51
3 1.35 10.40 1.71 10.67 0.84
4 0.46 8.57 0.90 8.83 0.96 Natural Roll 11.5 kHz
5 0.82 9.13 0.87 9.03 0.77 Steep Roll 16 kHz
6 1.82 9.60 2.19 9.88 0.86 Cut off 17.5 kHz
7 2.32 11.09 2.10 11.15 0.79
8 5.79 14.79 7.02 15.50 0.85
9 1.44 8.71 1.32 10.22 0.79

0 3.95 10.38 3.50 10.58 0.75 100 Hz -22.70
1 3.75 11.70 4.05 11.90 0.93 1 kHz -23.29
2 1.83 12.34 1.79 12.43 0.82 10 kHz -29.37
3 2.85 10.63 2.86 11.03 0.86
4 1.31 9.09 1.31 9.31 0.95 Natural Roll 11.5 kHz
5 1.15 9.57 1.56 9.49 0.80 Steep Roll 16.5 kHz
6 2.31 10.16 2.52 10.23 0.84 Cut off 18 kHz
7 3.06 11.37 2.73 11.68 0.82
8 5.97 15.20 6.87 15.61 0.85
9 2.03 9.23 1.43 10.75 0.77

Spectral Content

0 5.11 15.67 5.32 16.43 0.82 100 Hz -23.05
1 0.23 6.76 0.77 7.97 0.98 1 kHz -30.33
2 11.36 20.95 11.27 21.08 0.59 10 kHz -35.97
3 9.45 19.30 10.38 20.11 0.85
4 1.86 8.36 4.02 9.37 0.79 Natural Roll 5.5 kHz
5 9.22 18.77 7.57 17.66 0.41 Steep Roll 20 kHz
6 6.43 14.16 7.25 15.03 0.57 Cut off none
7 5.50 16.03 5.43 16.69 0.73
8 11.64 23.60 13.70 23.42 -0.32
9 11.20 22.57 11.64 21.56 0.45

0 4.88 15.75 5.06 16.46 0.77 100 Hz -23.20
1 0.37 6.83 0.71 7.84 0.98 1 kHz -30.45
2 11.02 21.32 10.65 21.69 0.67 10 kHz -36.00
3 9.86 19.51 11.18 20.39 0.84
4 1.58 8.58 4.29 9.54 0.77 Natural Roll 5.5 kHz
5 9.58 18.64 7.66 17.71 0.40 Steep Roll 16 kHz
6 6.25 14.04 7.09 14.88 0.52 Cut off 17.5 kHz
7 5.50 15.91 5.55 16.89 0.75
8 11.61 23.97 13.94 23.83 -0.30
9 11.40 22.30 11.31 21.62 0.45

0 5.26 15.64 5.45 16.41 0.80 100 Hz -23.01
1 0.18 6.76 0.69 7.98 0.98 1 kHz -29.32
2 11.52 21.08 11.39 21.20 0.58 10 kHz -36.13
3 9.56 19.29 10.23 20.11 0.84
4 1.91 8.37 4.04 9.39 0.77 Natural Roll 5.5 kHz
5 9.33 18.76 7.63 17.61 0.41 Steep Roll 19 kHz
6 6.21 14.14 6.88 15.02 0.56 Cut off 19.5 kHz
7 5.42 16.02 5.35 16.72 0.72
8 11.36 23.82 13.91 23.40 -0.30
9 11.53 22.56 11.92 21.60 0.43

0 5.71 16.24 5.95 16.81 0.84 100 Hz -23.54
1 0.83 7.28 1.10 8.33 0.98 1 kHz -31.00
2 11.40 21.78 12.14 21.70 0.59 10 kHz -36.96
3 9.64 19.81 11.55 21.01 0.86
4 2.42 8.98 4.30 9.74 0.79 Natural Roll 5.5 kHz
5 9.71 19.24 7.94 18.22 0.41 Steep Roll 16.5 kHz
6 6.90 14.62 7.52 15.44 0.54 Cut off 18.5 kHz
7 5.92 16.42 6.01 17.24 0.75
8 11.87 24.08 13.94 23.85 -0.34
9 11.71 23.37 11.34 22.26 0.48

0 5.61 16.15 5.77 16.90 0.82 100 Hz -23.47
1 0.74 7.13 1.20 8.43 0.98 1 kHz -30.63
2 11.56 21.45 11.53 21.63 0.57 10 kHz -37.42
3 9.68 19.78 10.93 20.69 0.86
4 2.41 8.80 4.30 9.77 0.80 Natural Roll 5.5 kHz
5 9.63 19.22 7.98 18.14 0.41 Steep Roll 20 kHz
6 6.92 14.61 7.63 15.52 0.55 Cut off none
7 5.95 16.45 5.77 17.10 0.74
8 12.21 24.06 14.92 23.92 -0.32
9 12.08 23.00 12.00 22.08 0.45

Spectral Content

0 5.79 16.06 7.86 16.39 0.61 100 Hz -24.64
1 3.80 15.79 7.03 16.32 0.58 1 kHz -18.81
2 6.05 14.88 6.44 15.14 0.92 10 kHz -32.71
3 1.01 10.18 1.53 11.01 0.63
4 0.37 6.00 1.52 6.79 0.93 Natural Roll 2 kHz
5 0.37 7.02 1.67 8.07 0.77 Steep Roll 19.5 kHz
6 1.79 7.65 2.85 8.98 0.77 Cut off none
7 0.38 8.41 1.32 8.80 0.87
8 6.47 16.31 8.87 18.32 0.57
9 1.21 11.19 2.71 12.05 0.73

Spectral Content

0 7.04 13.21 5.16 13.22 0.95 100 Hz -23.84
1 3.13 13.56 2.11 13.44 0.79 1 kHz -20.11
2 0.10 8.82 0.64 8.79 0.91 10 kHz -29.17
3 3.82 10.73 6.62 15.25 0.48
4 1.11 10.23 1.41 10.42 0.88 Natural Roll 11.5 kHz
5 1.26 8.32 1.51 8.52 0.73 Steep Roll 21 kHz
6 2.30 15.19 1.53 13.31 0.75 Cut off none
7 2.39 9.41 2.85 9.63 0.98
8 1.16 9.24 1.87 9.70 0.93
9 1.43 11.95 1.56 11.75 0.63

0 7.49 13.68 5.63 13.59 0.98 100 Hz -23.11
1 3.25 13.64 3.05 14.03 0.79 1 kHz -20.84
2 0.35 9.23 1.43 9.34 0.90 10 kHz -28.68
3 4.19 11.01 6.81 15.37 0.53
4 1.69 10.75 1.69 10.93 0.84 Natural Roll 11.5 kHz
5 1.65 8.57 1.81 8.76 0.74 Steep Roll 16.5 kHz
6 2.94 15.60 1.64 13.84 0.77 Cut off 18.5 kHz
7 2.27 9.69 3.06 10.06 0.97
8 1.61 9.44 2.02 10.08 0.96
9 2.06 12.32 2.24 12.04 0.63

0 7.33 13.56 5.52 13.56 0.96 100 Hz -23.03
1 3.68 13.97 2.97 14.02 0.81 1 kHz -21.14
2 0.14 9.16 1.30 9.31 0.92 10 kHz -29.57
3 4.22 11.08 6.89 15.55 0.51
4 1.41 10.65 1.83 10.80 0.87 Natural Roll 11.5 kHz
5 1.70 8.65 1.79 8.89 0.72 Steep Roll 21 kHz
6 2.84 15.66 1.82 13.88 0.77 Cut off none
7 2.77 9.74 3.43 9.98 0.98
8 1.56 9.62 2.15 10.05 0.93
9 2.01 12.28 1.96 12.11 0.64

Spectral Content

0 2.84 11.03 4.36 11.52 0.59 100 Hz -21.47
1 15.49 22.83 7.08 22.92 0.73 1 kHz -28.41
2 0.74 11.35 2.12 12.38 0.64 10 kHz -27.55
3 13.38 24.38 14.88 25.70 0.62
4 0.63 10.61 1.47 10.21 0.88 Natural Roll 12 kHz
5 3.56 16.30 3.86 13.88 0.95 Steep Roll 19 kHz
6 7.22 13.14 6.79 13.63 0.84 Cut off 19 kHz
7 3.37 12.79 5.60 14.40 0.91
8 8.37 17.90 7.73 17.23 0.47
9 1.50 5.69 2.24 6.13 0.58

0 3.16 11.22 4.70 11.83 0.57 100 Hz -21.67
1 15.93 23.51 7.27 23.26 0.78 1 kHz -28.42
2 1.09 11.77 2.57 12.77 0.59 10 kHz -27.68
3 13.56 24.08 16.35 26.10 0.60
4 0.99 10.90 1.88 10.44 0.86 Natural Roll 12 kHz
5 3.66 17.09 3.94 14.48 0.98 Steep Roll 18.5 kHz
6 7.58 13.52 7.26 13.82 0.81 Cut off 19 kHz
7 3.85 13.20 5.98 14.49 0.91
8 8.97 18.37 8.61 17.92 0.50
9 1.78 5.92 2.77 6.63 0.55

Spectral Content

0 6.39 14.01 7.67 14.64 0.83 100 Hz -19.01
1 3.15 12.41 2.70 12.52 0.52 1 kHz -17.41
2 8.75 19.11 9.57 20.40 0.66 10 kHz -23.16
3 4.00 13.74 3.95 11.95 0.72
4 7.35 14.42 6.66 13.43 0.67 Natural Roll 11.5 kHz
5 1.76 9.74 2.19 8.79 0.21 Steep Roll none
6 13.59 22.52 11.33 21.21 0.78 Cut off 21 kHz
7 2.55 12.38 4.11 12.85 0.50
8 2.91 12.34 3.20 12.51 0.82
9 0.38 7.88 0.25 7.08 0.34

Spectral Content

0 0.00 3.26 0.00 4.13 -0.20 100 Hz -18.86
1 0.06 9.17 1.31 10.81 0.42 1 kHz -29.86
2 0.00 6.57 0.00 6.70 0.58 10 kHz -30.00
3 2.03 9.50 2.35 9.97 0.40
4 1.92 9.65 2.08 10.25 0.70 Natural Roll 7 kHz
5 0.00 9.42 0.79 9.69 0.39 Steep Roll 17 kHz
6 0.00 9.37 0.16 11.61 0.49 Cut off 20 kHz
7 3.36 11.33 2.08 9.62 0.43
8 1.14 11.07 0.27 10.12 0.28
9 0.00 9.58 0.00 8.47 0.18

0 0.00 3.28 0.00 4.16 -0.21 100 Hz -18.73
1 0.02 9.18 1.31 10.80 0.42 1 kHz -29.81
2 0.00 6.57 0.00 6.71 0.58 10 kHz -30.29
3 2.04 9.51 2.50 10.02 0.39
4 2.05 9.68 2.16 10.28 0.69 Natural Roll 7 kHz
5 0.00 9.49 0.62 9.72 0.38 Steep Roll 17 kHz
6 0.00 9.38 0.08 11.60 0.50 Cut off 19.5 kHz
7 3.20 11.28 2.24 9.64 0.40
8 1.16 11.09 0.25 10.09 0.28
9 0.00 9.54 0.00 8.48 0.18

0 0.00 3.85 0.00 4.62 -0.19 100 Hz -19.35
1 0.52 9.50 1.37 11.32 0.42 1 kHz -30.69
2 0.08 7.32 0.15 7.42 0.58 10 kHz -32.00
3 2.66 9.94 2.67 10.30 0.43
4 2.28 10.18 2.61 10.76 0.69 Natural Roll 7 kHz
5 0.01 9.90 1.49 10.11 0.36 Steep Roll 16 kHz
6 0.00 9.68 0.65 12.01 0.49 Cut off 17.5 kHz
7 3.76 11.80 2.52 10.08 0.45
8 1.35 11.37 0.80 10.51 0.30
9 1.12 10.06 0.00 8.95 0.19

Spectral Content

0 5.19 13.04 7.13 13.70 0.70 100 Hz -22.98
1 1.16 10.54 2.18 11.74 0.95 1 kHz -15.35
2 0.01 6.69 0.01 7.95 0.65 10 kHz -28.90
3 0.01 7.40 0.01 7.77 0.62
4 0.61 9.41 0.30 9.62 0.74 Natural Roll 10 kHz
5 0.02 7.72 0.12 8.74 0.75 Steep Roll none
6 0.01 8.34 0.12 8.63 0.43 Cut off 21 kHz
7 0.01 9.11 0.27 8.37 0.32
8 0.01 3.42 0.01 3.21 0.86
9 0.01 7.41 0.01 7.35 0.88

0 5.65 13.48 7.51 14.13 0.70 100 Hz -23.40
1 1.53 10.98 2.58 12.15 0.95 1 kHz -15.78
2 0.00 7.14 0.39 8.41 0.70 10 kHz -29.05
3 0.35 7.85 0.65 8.21 0.61
4 0.91 9.82 0.64 10.06 0.75 Natural Roll 10 kHz
5 0.41 8.19 0.63 9.20 0.75 Steep Roll none
6 0.49 8.77 0.62 9.08 0.43 Cut off 21 kHz
7 0.44 9.64 0.58 8.88 0.32
8 0.32 3.87 0.35 3.66 0.87
9 0.44 7.84 0.47 7.81 0.89

Spectral Content
