Understanding Audio Sonograms

Posted on October 28, 2009

UNDERSTANDING SONOGRAMS

Recent technology has given us the ability to "see" into many heretofore invisible parts of the world. From X-rays of broken bones, to CAT scans of the brain, to ultrasound images of yet-to-be-born infants, modern technology reveals important and useful information about our world.

One such tool that can be very helpful for birders is the sonogram, or more accurately, the audio spectrogram. Audio spectrograms (AS) allow birders to "see" inside a bird vocalization and can provide important clues on how to differentiate one call or song from another. Sometimes they help by showing subtle variations in short calls; other times by helping the birder recognize differences in the larger patterns of complex songs. Once these differences are discerned in spectrograms, they often become much easier to hear and differentiate in the field.

In this article I'll provide some guidance on how to read audio spectrograms, and then I'll use them to demonstrate how they can be helpful in differentiating the easy-to-confuse songs of the thrashers found in SE Arizona.

WHAT ARE AUDIO SPECTROGRAMS

An audio spectrogram is a two-dimensional graphical representation of an audio source. Spectrograms are created using a principle, called Fourier analysis, that states that complex phenomena like sounds, other physical phenomena, or even equations can be more easily understood when they are broken down into smaller pieces.

To make it possible to "see" a sound, a Fourier analysis is made of the audio and the resulting information is converted into graphical form. Here's how the process works.

First of all, the target sound is broken down into very short sections or samples, usually a millisecond or less in length. So a one-second sound would be broken down into 1,000 or more short samples.

The analyzer then checks each sample to see if there is a sound present at that moment in time. If audio is present, then it checks the sound at each of many different frequencies to determine which frequencies are present at that time. The presence of any audio content in a band is then graphically represented by a short line or dot at each frequency present at the time of the sample. The intensity or loudness of the audio at each frequency is usually represented as a lighter or darker line on a continuum from very soft (light mark) to very loud (much darker mark).

This analysis is repeated for every short time "sample" for as long as the sound lasts. The resulting graphic is a collection of all of these instantaneous representations of frequency content placed on a time line. The horizontal axis is time, showing the length of the audio. The vertical axis is frequency, with dots or lines showing what, if any, content there was at each frequency.

SOME SIMPLE AUDIO EXAMPLES

Let's look at a very simple example of an audio spectrogram. Suppose we wanted to make an audio spectrogram of a whistle that started at a low pitch and gradually and smoothly rose to a very high pitch over 30 seconds. And let's say the analyzer sampled the sound at each of 30 frequencies, once every second. The resulting graphical representation of the sound would show one dot at each sample for the frequency of the tone at that time. It would look like this:
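For readers who like to experiment, here is a minimal sketch of that whistle analysis in Python, using only the standard library. It is an illustration under stated assumptions, not how any particular analyzer works: the 8,000 samples-per-second rate, the 8 millisecond slice length, the half-second whistle, and all of the function names are choices made here to keep the example small. The code cuts the sound into consecutive slices, measures the loudness at each frequency in every slice, and reports the loudest frequency per slice, which corresponds to the "one dot per sample" described above.

```python
import cmath
import math

SAMPLE_RATE = 8000   # audio samples per second (assumed for this sketch)
FRAME_SIZE = 64      # samples per time slice: 8 ms at 8,000 Hz (assumed)


def rising_whistle(duration_s, start_hz, end_hz):
    """Generate a pure tone whose pitch rises smoothly from start_hz to end_hz."""
    samples = []
    phase = 0.0
    total = int(duration_s * SAMPLE_RATE)
    for n in range(total):
        freq = start_hz + (end_hz - start_hz) * n / total
        phase += 2 * math.pi * freq / SAMPLE_RATE
        samples.append(math.sin(phase))
    return samples


def frame_magnitudes(frame):
    """Naive discrete Fourier transform: loudness at each frequency in one slice."""
    size = len(frame)
    mags = []
    for k in range(size // 2):
        total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size))
        mags.append(abs(total))
    return mags


def spectrogram(samples):
    """Cut the sound into consecutive slices and analyze each one."""
    return [frame_magnitudes(samples[i:i + FRAME_SIZE])
            for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)]


def loudest_frequency(mags):
    """Frequency in Hz of the strongest component in one time slice."""
    k = max(range(len(mags)), key=lambda i: mags[i])
    return k * SAMPLE_RATE / FRAME_SIZE


whistle = rising_whistle(0.5, 500, 3000)
track = [loudest_frequency(m) for m in spectrogram(whistle)]
print(track)
```

Read from left to right, the printed frequencies climb from near the starting pitch to near the ending pitch, which is the rising line the spectrogram of such a whistle would show.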

Now let's look at the audio spectrogram of a simple sine wave. If you remember back to your physics class (you weren't sleeping, were you?) a sine wave is the purest of all tones. It consists of only one pitch, with no overtones, and is similar to the sound you would hear from a flute or a very pure whistle. A sonogram of a one-second sine wave at one pitch would have only one line, representing the pitch of the sound, and the length would be one second's worth of distance on the graphic.

A second pure whistle of the same length, but at a lower pitch, would have a single line also lasting a second, but the line would be lower on the graphic than the first line.

Here's an audio spectrogram of 12 pure tones, each lasting 1 second. The tones are in groups of 3 tones at the same pitch. Each group is lower in pitch than the prior group. The whole selection lasts about 4 seconds, from the 7 second mark to the 11 second mark on the time scale.

Notice that the frequency of the first set of tones is about 1 kHz, or 1,000 cycles per second. (Middle C on a piano is about 260 cycles per second.)
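To relate the frequency scale of a spectrogram to musical pitch, a frequency can be converted to its nearest note name. The sketch below assumes equal temperament with the A above middle C tuned to 440 Hz and uses the common convention that middle C is written C4; the function name is made up for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(frequency_hz):
    """Name of the equal-tempered note closest to frequency_hz (A4 = 440 Hz)."""
    # Count semitones above or below A4, which is MIDI note number 69.
    midi = round(69 + 12 * math.log2(frequency_hz / 440.0))
    octave = midi // 12 - 1          # MIDI note 60 is C4, middle C
    return f"{NOTE_NAMES[midi % 12]}{octave}"


print(nearest_note(260))    # C4
print(nearest_note(1000))   # B5
```

So a tone at about 1 kHz sits close to the B almost two octaves above middle C.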

Here's a sonogram of "row row row your boat" performed with a flute which has no overtones.

If you take two sounds of different pitches but equal volume and play them at the same time, you would hear them both at once and it would sound like a chord. The AS would look like this:

Now let's take five sounds and stack them on top of each other. This time we'll make all but the lowest much softer and place each an octave higher than the one below it. Instead of sounding like a chord, it would sound like just one pitch, the pitch of the lowest note. However, it would sound much richer than a simple sine wave.

If you remember back to your physics class again, this is what happens when a bow excites a string and the resulting sound consists of one or more harmonics. The more harmonics, the richer the sound.

When you are reading a sonogram for a bird song, it's important to remember that the more harmonics visible in the audio spectrogram, the richer the sound.
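One way to check whether a stack of simultaneous tones is a set of harmonics is to see whether each tone is a whole-number multiple of the lowest one. The following is a small sketch of that test; the function name, the 3 percent tolerance, and the example frequencies are assumptions made for illustration.

```python
def harmonic_numbers(frequencies_hz, tolerance=0.03):
    """Return each tone's harmonic number if every tone is a whole-number
    multiple of the lowest tone; return None if any tone does not fit."""
    fundamental = min(frequencies_hz)
    numbers = []
    for freq in sorted(frequencies_hz):
        ratio = freq / fundamental
        nearest = round(ratio)
        if abs(ratio - nearest) > tolerance * nearest:
            return None          # this tone is not a harmonic of the lowest
        numbers.append(nearest)
    return numbers


# Five tones, each an octave above the one below it, starting at 440 Hz.
octave_stack = [440 * 2 ** i for i in range(5)]
print(harmonic_numbers(octave_stack))      # [1, 2, 4, 8, 16]

# Three unrelated pitches: heard as a chord, not as one rich tone.
print(harmonic_numbers([440, 587, 701]))   # None
```

Tones stacked in octaves land on harmonics 1, 2, 4, 8 and 16 of the lowest note, which is why the ear fuses them into a single, richer pitch.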

Here's what a simple upslurred bird call or song would look like.

Now here's part of a song that is a fast upward then downward slur and contains two harmonics, making the song sound fairly "rich".

If the pitch of the sound varies very quickly during its duration, but only by a small change in pitch, then this would be visible on an AS as a ripple or wave in the graphic. Here's an example of part of "row row row your boat" played with a sine wave that has vibrato, or fast, small variations in its pitch.

Notice the different notes, and also that each note has very fast changes in pitch caused by the vibrato.
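The ripple that vibrato produces can be imitated with a few lines of code. This is only a sketch: the 1,000 Hz note, the 30 Hz pitch swing, the 6 wobbles per second, and the function names are all assumed values chosen for illustration. The text plot runs time down the page rather than across it, so the ripple appears as a column of stars that sways left and right.

```python
import math


def vibrato_pitch_track(center_hz, depth_hz, rate_hz, duration_s, step_s=0.01):
    """Pitch of a note with vibrato, read off every step_s seconds."""
    steps = round(duration_s / step_s)
    return [center_hz + depth_hz * math.sin(2 * math.pi * rate_hz * i * step_s)
            for i in range(steps)]


def draw_ripple(track, low_hz, high_hz, width=40):
    """Crude text plot: one row per time step, with '*' positioned by pitch."""
    rows = []
    for pitch in track:
        column = round((pitch - low_hz) / (high_hz - low_hz) * (width - 1))
        rows.append(" " * column + "*")
    return rows


track = vibrato_pitch_track(center_hz=1000, depth_hz=30, rate_hz=6, duration_s=0.5)
for row in draw_ripple(track, low_hz=950, high_hz=1050):
    print(row)
```

The note never strays far from its central pitch, so the ear still hears one note, but the small, fast swings show up clearly as a wave.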

If the pitch of the sound varies quickly and very widely, then it would sound like a trill. Here's the sonogram of a trill, in this case a Cedar Waxwing. Notice the harmonics that show the trill is fairly round or rich and not "dry".

Finally, if you stack a lot of unrelated sounds very close in pitch to each other, instead of hearing a rich tone, you'll hear noise. The "ultimate" noise is actually the simultaneous sounding of equal levels of every possible pitch you can hear as a human. Here's a sonogram of part of a Seaside Sparrow's song that contains almost pure, pitchless noise.
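The difference between a pure tone and noise can also be put in numbers. The sketch below is an illustration under stated assumptions rather than a standard measurement: it analyzes one short slice of sound and reports what share of the frequency positions hold a meaningful amount of energy. The slice length of 128 samples, the 25 percent threshold, the random seed, and the function names are all choices made here.

```python
import cmath
import math
import random


def magnitudes(frame):
    """Loudness at each frequency position in one slice (naive Fourier analysis)."""
    size = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size)))
            for k in range(1, size // 2)]


def occupied_fraction(frame, threshold=0.25):
    """Share of frequency positions at least `threshold` times as strong
    as the strongest position in the slice."""
    mags = magnitudes(frame)
    peak = max(mags)
    return sum(1 for m in mags if m >= threshold * peak) / len(mags)


size = 128
rng = random.Random(1)   # fixed seed so the example is repeatable

# A pure tone that completes exactly 16 cycles within the slice.
pure_tone = [math.sin(2 * math.pi * 16 * n / size) for n in range(size)]
# Random samples: a simple stand-in for broad-spectrum noise.
noise = [rng.uniform(-1.0, 1.0) for _ in range(size)]

print(occupied_fraction(pure_tone))   # a tiny share: one line on the spectrogram
print(occupied_fraction(noise))       # a large share: a dark block
```

The pure tone puts its energy at a single frequency, while the noise spreads energy across most of the spectrum, which is what the dense "mass" in a noisy sonogram represents.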

In summary, the more simultaneous sounds that are harmonics of the lowest tone, the richer the sound. The more simultaneous sounds that are not harmonics, the noisier or raspier the sound. Keep this in mind as we now look at some bird songs.

ON TO BIRD SONGS

So what does all of this nonsense, that once put you to sleep in physics class and is starting to sound pretty soporific now, have to do with birds? Well, plenty. In order to read an audio spectrogram effectively, you need to be able to interpret it in two main ways.

First of all, you need to understand the tone of the bird by seeing how many sounds are stacked at any one moment of time and if they look like harmonics or just amount to some kind of noise.

And secondly, you need to interpret the rhythm and patterns of the song as it unfolds over time.

Let's take a look at a couple of simple bird songs that demonstrate some of the basics we have been discussing above.

Here is an audio spectrogram of the very clear tones of a Lesser Yellowlegs. Notice the harmonics, denoting a rich tone, and the downward slur of each note.

Here is the familiar spring song of a Northern Cardinal (familiar at least for those of us in the Eastern US). Notice all of the harmonics, denoting a very rich song. The song starts with two very slow upslurs. It then continues with a very steady, fairly rapid sequence of rich tones. If you look closely you can see that each of the rapid notes has a prominent downslur.

The Ovenbird's song starts quietly and increases in volume. As you can see in this sonogram, it also increases in richness of tone. You can also see the two parts of each song element ("tea cher, tea cher").

COMPARING CHICKADEE CALLS

Now let's examine two different species' vocalizations and see how an AS might help us learn to differentiate them in the field.

Here's an AS of a Black-capped Chickadee's "phoebe" call.

There are several useful things to notice about this call.

First of all, the sound is very pure. There is a basic pitch and a couple of harmonics. We're pretty sure they are harmonics and not noise because they are evenly distributed above the fundamental pitch. Both notes of the two-part call have the same level of "purity" since they contain about the same harmonic content. So they will sound similar in quality.

The first tone falls a bit in pitch, but not a lot. The second tone is lower than the first.

There's a clean break between the two notes, so they will sound distinct and separate.

Now let's take a look at a Carolina Chickadee's call. Although these calls are often confused in the field, they are actually very different. And this difference is quite evident when you look at the audio spectrograms.

The most obvious difference is that the Carolina Chickadee's (CACH) call has four notes vs. the Black-capped Chickadee's (BCCH) two. But let's look a bit deeper to see some more revealing differences, since the BCCH can double its call or the CACH truncate its call.

Another striking difference between the two calls is the pitch difference between the first and second notes in the CACH's call. Whereas the BCCH's two notes were very close to the same pitch, 4.5 kHz to 3.75 kHz, there is a big pitch jump from the first note to the second note in the CACH: from 6 kHz to 3.5 kHz. And oddly enough, the second note of the CACH's call is lower than the notes of the BCCH! This can explain some confusion caused by field guides that describe the CACH as being a higher call than the BCCH. Indeed it starts higher, but the second notes are lower.

Take a look at the first two notes. Notice the differences in the harmonics between the first note, with only one harmonic, and the second note, with three harmonics. The first note is thin and pure sounding, the second more complex. Certainly the two notes do not sound as similar to each other as the BCCH's notes, which have basically the very same tonal quality.

Now look at the graphic area between the first two notes of the CACH's call. You can clearly see a line between the two notes that extends lower than the second note. Since the line indicates many different frequencies in the same very short period of time, this part of the call will not be a clear note, but rather some kind of noise. And it will sound a bit lower than the second note. Since it's short and noisy, then, it will sound a bit like a hiccup or glitch in the song. This glitch is very obvious when you listen to the CACH's song and is very different from the two pure, simple notes of the BCCH.

Finally, the length of the CACH's call is about 1.5 seconds, the same length as the two notes of the BCCH, which therefore sounds slower and more relaxed.

As you can see, an audio spectrogram can make it easier to "see" inside vocalizations and find the important differences between two species.

TRILLS AND NOISY CALLS

Trills usually consist of the same note or inflected note repeated many times in a very short period of time.

Here are four fairly similar bird songs that are trills. The audio spectrograms reveal some interesting points about each song.

Notice that both the Worm-eating and Pine Warblers start their songs softly and then quickly increase to full volume. The Pine Warbler's trill also has a lot of variability in pitch.

The Dark-eyed Junco's song shows two very distinct harmonics. This indicates that the song is much richer than the other trills. There is also a rich, very short note between each iteration of the trill that would help in learning the song.

The Chipping Sparrow's song is the simplest, with less internal variation in each individual note and an abrupt beginning and ending. Spending more time with these spectrograms will reveal some other differences as well.

As another example of trills, here's the song of a Savannah Sparrow.

Notice the song starts with some relatively clear slurs, and that the buzzes have some remnants of harmonics and a distinct change of pitch. Since the Savannah Sparrow's trill shows much denser lines than the trills in the above warbler examples, it will sound much noisier. However, it won't sound as noisy as the Seaside or Nelson's Sharp-tailed Sparrows in the example below.

NOISY VOCALIZATIONS

I mentioned above that noise is the simultaneous sounding of many unrelated, close pitches. In an audio spectrogram it looks like a big block of dark color. If the graphic of the noise shows black from the very bottom to the top of the AS, then there will be no pitch content at all. Our ear will not hear the sound as having any pitch. However, if the "blob" is concentrated in one part of the audio spectrogram, then the noise can sound high or low, especially in relation to other noises in the song.

Here are two songs that contain noise.

The noise at the end of the Seaside Sparrow's song is a very dense, fairly even "mass" in the AS. This indicates that the noise has very little pitch, but just sounds like broad-spectrum noise.

In addition to the noise, it's interesting to notice that before the last large section of noise there are three "notes" that are broad, simultaneous soundings of many pitches near each other. That indicates this section won't sound like a clear tone or a whistle.

But the noise isn't monolithic. Notice that there are actually 4 different wide lines on top of each other, with space between them. This indicates the sound will have some of the characteristics of a tone with harmonics. So this section of the Seaside Sparrow's song will have a much more "musical" or rich sound for these three notes. And in fact this change of characteristic within the song is useful in separating the Seaside Sparrow from other sparrows that share its reedy habitat.

The Nelson's Sharp-tailed Sparrow's song has a less "broad" noise characteristic. The graphic for the first section of noise is concentrated in the lower 2/5ths of the spectrogram. This indicates that there will be some sense of pitch to the noise. It won't sound rich as there are no indications of harmonic content. However, you will hear a definite change in pitch of the noise as the song progresses from the first section of noise to the second section.

ANALYZING CALL NOTES

Short call notes are the bane of many birders. Songs of many species can sound very similar, and many otherwise audio-oriented birders balk at the threshold of learning call notes. Here again, audio spectrograms can help out, at least in some cases.

Let's look at the notes of five different birds that can be found calling near each other during migration on the East Coast of the US.

A couple of initial ideas: Call notes can vary from being almost pure "pitched" noise to a whistle-like tone. As we have discussed, in an audio spectrogram, noise is indicated by a block of black. As mentioned above, if the block covers the whole audio spectrum, then there will be no indication of pitch. If the block is restricted to one small part of the audio spectrum, our ear will hear the referenced vocalization as having some pitch, maybe sounding bassier or darker or higher or lighter than other birds or other parts of the same call. If there are harmonics, even harmonics that approach being noise themselves, as seen above, the vocalization will be richer than noise without harmonics.

With this in mind, let's look at the call notes of these five birds.
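The rule of thumb about noise blocks can be written as a small calculation on one vertical slice of a spectrogram. This is a sketch with made-up numbers: the darkness values, the 10 kHz top of the display, the 25 percent threshold, and the function name are all assumptions for illustration. It finds where the dark block starts and ends and how much of the display it covers.

```python
def noise_band(column, top_hz, threshold=0.25):
    """Locate the dark block in one time slice of a spectrogram.

    column holds darkness values from the bottom of the display (0 Hz)
    to the top (top_hz). Returns (low_hz, high_hz, share_of_display).
    """
    peak = max(column)
    dark = [i for i, value in enumerate(column) if value >= threshold * peak]
    bin_hz = top_hz / len(column)
    low_hz = dark[0] * bin_hz
    high_hz = (dark[-1] + 1) * bin_hz
    return low_hz, high_hz, (high_hz - low_hz) / top_hz


# Dark from bottom to top: no sense of pitch.
full_spectrum = [0.9, 1.0, 0.8, 0.9, 1.0, 0.9, 0.8, 1.0, 0.9, 0.8]
# Dark only in the lower two fifths: noise that sounds relatively low.
lower_block = [0.9, 1.0, 0.8, 0.9, 0.1, 0.0, 0.1, 0.0, 0.0, 0.1]

print(noise_band(full_spectrum, top_hz=10000))   # (0.0, 10000.0, 1.0)
print(noise_band(lower_block, top_hz=10000))     # (0.0, 4000.0, 0.4)
```

A block that covers the whole display gives the ear nothing to place as high or low, while a block confined to one region will be heard as noise with a rough pitch in that region.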

The calls of the Hooded Warbler and the Common Yellowthroat (COYE) are basically pure noise. Not full-spectrum noise, but there are no harmonics to add any richness to the call. Looking closer, we can see that the COYE's call is actually made up of several very fast iterations of noise with small intervening spaces. This call, then, will sound more like a very fast rattle than a pure monolithic sound. If you listen for this fast variation, the call becomes much easier to ID.

In contrast, the Hooded Warbler's call is very monolithic. Notice also that it is, relatively speaking, a very long call note, and trails off towards the end of the call note. In fact, the note is more than twice as long as the COYE's call note.

The Chipping Sparrow's call note is very short and simple. It's basically one very fast event. Notice that, even though it is the highest of the call notes we're discussing, it has one harmonic, showing that it is a fairly rich tone.

The Yellow and Magnolia Warblers both have much more pitched call notes. Both show harmonics and considerable variation in pitch within the note. The Yellow Warbler's call note ends lower than it starts and has a fast up and down movement, with most of the energy of the call in the downward movement.

The Magnolia Warbler's note is more gentle sounding, with most of the energy of the call at the highest point in the call and then a short falling off of the pitch.

Of course when discussing call notes, we're referring to very short vocalizations. The variations we've seen are taking place often in less than 1/10th of a second. Although the Hooded Warbler's call is twice as long as that of the Common Yellowthroat, the difference is only 1/20th of a second. These differences are difficult to pick up when hearing the birds in the field. However, one of the contentions of this article is that if you study audio spectrograms, particularly of these difficult vocalizations, you can discover much more easily what you need to listen for in the field. And these discoveries will in fact help you become much better at IDing birds from their call notes. In other words, it IS possible to hear these differences. But it helps a lot to know what you are listening for!

ON TO THE THRASHERS

Now let's see how audio spectrograms can help us distinguish the differences between the songs of a difficult group of birds, the thrashers found in Southeast Arizona. We'll examine the songs considering the following criteria: the rhythm of the song, whether there are stops or spaces in the song, the number of different song elements and how they vary, and the tonal or "pitched" range of the song.

These audio spectrograms are of the first 7 or so seconds of the thrasher songs found in the Stokes Field Guide to Western Birds, if you'd like to listen along.

The song of the Crissal Thrasher is a good place to start.

Notice first that the song is divided into very obvious sections with very visible short or longer pauses separating the sections. The sections all look fairly different from each other, including a two-part slow slur, a trill, and a three-part faster slur. There are also clear variations in pitch from section to section.

This song will sound divided into sections and will have a lot of variation.

Now let's contrast the Crissal's AS with the similar LeConte's Thrasher.

This song also has some long pauses, in fact even longer and more relaxed pauses than the Crissal's song. Now notice that the sections of the song all look much more similar to each other than the Crissal's repertoire, indicating less variation in the song.

Also, you can see a number of "spikes" or short, tall lines indicating chip-like fast notes. These chips seem to punctuate much of the song, unlike the Crissal which has a lot more variety made up mostly of slurs and trills.

The rhythm of the LeConte's song is also fairly similar from section to section. There isn't nearly as much variation as you've seen in the Crissal's song.

Finally, the pitch of this song stays in the same general range. Although individual sections have slurs, the basic pitch from section to section is very similar. Again this is in contrast with the more variable Crissal's song.

The Bendire's Thrasher has a very constant, almost run-on song indicated by a steady rhythm with virtually no pauses. This is very different from the previous two species, which have variable and fairly long pauses from section to section.

The Bendire's song shows elements that are often repeated three or four times, so the song will have some feel of repetition. However, the sections are not set off from each other by pauses as they are in the Crissal.

Notice that the pitch of the elements falls in the same basic range, but that in many song elements there is a fairly wide range of harmonics. That indicates the song will be fairly "rich" sounding: not thin and not deep. However, there won't be much of a sense of change in pitch. So the song will sound like a very rich, run-on collection of repeated sections.

The Sage Thrasher also has a run-on song with few pauses. Notice, however, that there are "pick up" chips, or very light single lines, throughout the song. These are not present so consistently in any of the other thrasher songs.

Also, the pitch range is very limited and much lower, with fewer harmonics than the Bendire's. That indicates the song will seem to be of consistent tonal quality and pitch throughout, and will seem lower and quite a bit less rich than the Bendire's.

Finally, let's consider the Curve-billed Thrasher.

Unlike the Bendire's and Sage Thrashers' songs, the song contains distinct pauses and much more variability in pitch. This suggests the Crissal's song. But notice that the pauses are very brief and infrequent, unlike the longer and more variable and relaxed pauses in the Crissal's song.

Also, the Curve-billed song contains many shorter, chip-like sounds as distinct elements within the vocalization. This makes the song sound a bit harsher and can be very easy to pick out in the field.

So the Curve-billed Thrasher's song will seem faster, with fewer and shorter pauses, and will have distinct chips that are song elements, not pick up chips as in the Sage Thrasher's song.

In summary, the audio spectrograms make it very clear that three species of thrasher have significant, regular pauses in their songs and two do not. This should make it very easy to distinguish, for example, a Bendire's from a Crissal Thrasher's song.

In the run-on songs, the Bendire's song is much richer, contains more repeated elements, and is not punctuated by chips as is the more monotonic Sage Thrasher's song.

For the group with pauses, the Crissal is fairly relaxed, and contains a lot of variety and pitch change. The LeConte's has even longer pauses, but the elements are much more similar, are repeated more times, and are often punctuated by chips as part of the repeated song elements. The Curve-billed song is more "nervous" with fewer and shorter pauses. It contains chips that are their own elements, not part of a larger pattern. And there is much less repetition of song elements than in the more relaxed LeConte's and Crissal songs.

CONCLUSION

Hopefully this article has been able to demonstrate how useful audio spectrograms can be in aiding you in identifying difficult songs and calls. Of course any exercise that helps you focus more intently on a song will be beneficial. Audio spectrograms offer a unique opportunity for you to analyze and compare the rhythm, tonal quality, and overall form of bird vocalizations and, by seeing "inside" the song, become more familiar with the harder-to-hear elements. Once you are in the field you will then be able to focus more easily on these elements and identify the vocalizations of previously difficult-to-ID species.

Copyright 2009 Tom Stephenson