
I pledge on my honor that I have not given or received any unauthorized assistance on this assignment/examination. I further pledge that I have not copied any material from a book, article, the Internet, or any other source, except where I have expressly cited the source. Signature _______________________________ Date: ___________________

Acoustic Detection of Foreign Sounds in an Urban Environment

Submitted to MSC Summer Research Institute

1 Castle Point Terrace Hoboken, NJ 07030

By:
Alvaro Murillo, University of Alaska Fairbanks
Anthony Bianco, Stevens Institute of Technology
Laurie Prinz, Stevens Institute of Technology
Raúl Huertas, University of Puerto Rico Mayaguez
Yegor Sinelnikov, Stevens Institute of Technology

“Written and presented with the support of the Maritime Security Center, A Department of Homeland Security Science and Technology Center of Excellence.”

July 28th, 2016

This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2014-ST-061-ML0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.


TABLE OF CONTENTS

Abstract
Executive Summary
Introduction
Materials & Methods
    List of Materials
    Microphone and F8 Multitrack Sensitivity
    Calibration
    Audio and Spectrogram Synchronization
    Bill of Material
    USMMA Procedure
    Penn Plaza Pavilion Procedures
    Anechoic Chamber Procedure
    Hoboken Pier Procedure
    Test Locations
Definitions and Equations
Results & Discussion
    USMMA
    Penn Plaza Pavilion
    Anechoic Chamber
    Hoboken Pier
Potential Filtering Process
Conclusion
Recommendations
Appendices
    Calibration Factor Verifications Penn Plaza Pavilion
    Calibration Factor Verifications USMMA
    Calibration Factor Penn Plaza Pavilion
    Calibration Factor Verifications Anechoic Chamber
    USMMA Data Organization
    Penn Plaza Data Organization
    Anechoic Chamber Data Organization
    Additional Figures
References
Acknowledgement


LIST OF FIGURES

Fig 1: Process of Sound Event Recognition
Fig 2: Microphone Setup at Penn Plaza (left) and Zoom F8 (right)
Fig 3: United States Merchant Marine Academy Experimental Setup
Fig 4: Penn Plaza Pavilion Experimental Setup
Fig 5: Stevens Institute of Technology Anechoic Chamber Experimental Setup
Fig 6: Joint Research with Buoy Team at Hoboken Pier
Fig 7: United States Merchant Marine Academy Experimental Research
Fig 8: Recording 5 Spectrogram; Boat Traveling Downwind
Fig 9: Recording 5 Spectrogram; Boats Acoustic Signature Fades Away; 0-2 kHz
Fig 10: Recording 6 Spectrogram; Boat Traveling Downwind
Fig 11: Recording 6 GPS Track (left), Distance vs Time (top), Azimuth vs Time (bottom)
Fig 12: Recording 6 Sound Pressure Level vs Distance (left), Sound Pressure Level vs Time (right)
Fig 13: Recording 10 Spectrogram; Boat Traveling Upwind
Fig 14: Recording 10 GPS Track (left), Distance vs Time (top), Azimuth vs Time (bottom)
Fig 15: Recording 10 Sound Pressure Level vs Distance (left), Sound Pressure Level vs Time (right)
Fig 16: Recording 10 Lloyd Mirror Effect
Fig 17: Experimental Research at Penn Plaza Pavilion
Fig 18: Temperature Records throughout Penn Plaza Experiment
Fig 19: Humidity Records throughout Penn Plaza Experiment
Fig 20: Wind Speed Records throughout Penn Plaza Experiment
Fig 21: Sound Pressure Records throughout Penn Plaza Experiment
Fig 22: Error Bar Chart of Sound Pressure Level as a function of Time at Penn Plaza Recording with Standard Deviation Range
Fig 23: Penn Plaza Pavilion Occurrence of Typical Events
Fig 24: Recording 3 Whistling in Penn Plaza Pavilion at 8:31 AM
Fig 25: Recording 3 Car Horn in Penn Plaza Pavilion
Fig 26: Recording 3 Police Sirens in Penn Plaza Pavilion
Fig 27: Experimental Research at Stevens Institute of Technology Anechoic Chamber
Fig 28: Recording 8 Spectrogram; Gunshot 0-24 kHz
Fig 29: Recording 9 Spectrogram; Sport Whistle 0-24 kHz
Fig 30: Recording 10 Spectrogram; Wooden Whistle 0-24 kHz
Fig 31: Recording 11 Spectrogram; Metal Whistle 0-24 kHz
Fig 32: Recording 1 Spectrogram; Megaphone Siren 0-24 kHz
Fig 33: Recording 12 Spectrogram; Male Yelling, 94 dB
Fig 34: Recording 15 Spectrogram; Female Yelling, 85 dB
Fig 35: Recording 2 Spectrogram; Anechoic Chamber, No event
Fig 36: Experimental Research at Hoboken Pier
Fig 37: Hoboken Pier Helicopter 9:27:46 AM
Fig 38: Environmental Acoustic Data Recording 3, helicopter occurs at 9:27:46 AM
Fig 39: Buoy Hydrophone Data recording 32, helicopter occurs at 9:27:46 AM
Fig 40: Recording 4 Spectrogram; Loud Horn from Vessel
Fig 41: Prevalence of 4.5 kHz boat harmonic over neighboring frequency band
Fig 42: Frequency Bands for Potential Filter


LIST OF TABLES

Table 1. Environmental Acoustics Bill of Material
Table 2. List of sound events, distances, and maximum sound pressure levels


ABSTRACT

New York City has one of the liveliest soundscapes in the world: heavy traffic, car horns, sirens, loud neighborhoods, construction equipment, and barking dogs are just a few of the sources that create an intense and dynamic noise environment. Identifying the characteristics of a sound source in such a boisterous environment may have substantial benefits from a security perspective. The capability to filter out known sound events enhances the likelihood of acoustic detection and identification of a potentially unlawful target. This capability will strengthen the maritime security domain by eventually giving authorities the proper instruments to identify potential threats. The Maritime Security Center’s Summer Research Institute team, focusing on an Environmental Acoustics Project, conducted environmental air acoustic measurements at Penn Plaza Pavilion, the United States Merchant Marine Academy, the Stevens Institute of Technology Anechoic Chamber, and Hoboken Pier using a linear arrangement of calibrated microphones. The spectro-temporal signatures of typical city sounds were identified in the recordings. A-weighted sound pressure levels were calculated for individual sound events and throughout the records. This report documents the experimental observations and provides examples of city sounds recorded in a noisy environment and in an anechoic chamber. A real-time algorithm that isolates foreign sounds from typical city sounds within a record of noise, such as identifying a bird chirp during rush hour at Penn Plaza, has substantial potential to become a useful tool with various applications in the maritime and urban security domains.


EXECUTIVE SUMMARY

This report provides a thorough analysis of identifying a sound source and detecting it over a long distance, determining the sound pressure level, and filtering unwanted sounds from a given environment. The analysis draws on multiple recordings at the United States Merchant Marine Academy, Penn Plaza Pavilion, the Stevens Institute of Technology Anechoic Chamber, and Hoboken Pier. Calculations and analysis were performed with a robust spectrogram function in Matlab with a fully functional graphical user interface that allowed the researchers to adjust the Fourier transform parameters, the intensity levels, and the frequency range in order to isolate a clear spectrogram of a particular sound. The results of the data analysis show that a single sound source can be isolated and identified within a collection of noise. After the calibration factor was determined, the sound pressure level at Penn Plaza Pavilion was calculated at different time intervals. In summary, the researchers were able to identify the acoustic signatures of multiple sound events over a long distance that would otherwise have gone unnoticed by the human ear. The sound pressure level at Penn Plaza Pavilion was determined with great accuracy, which helped to characterize the intensity level throughout the day. In addition, patterns from the acoustic signatures of particular events were used to filter out undesirable sound characteristics in a noisy environment. The researchers also found that the microphones were a limiting factor for the experiment because they were not well suited to the outdoor environment.


INTRODUCTION

Environmental acoustics is the study of sound and vibration from noise sources in the environment. Agencies are concerned with the control of these noises because unwanted noise can have significant impacts on human safety. Detecting specific events in urban environments is one task security agencies need to address in order to identify a possible intruder. Since maritime security agencies are looking for improvements in the security domain, technologies that help identify sound sources will significantly strengthen port security. This year, the Maritime Security Center is conducting research related to Environmental Acoustics, Maritime Cybersecurity, and Underwater Buoy Noise with a group of future engineers from around the nation and faculty members from Stevens Institute of Technology. In order to help security agencies, the Environmental Acoustics Team conducted a series of tests in several urban locations to determine the sound pressure level and examine the detection and classification of a sound source. The information collected was organized in a database and accompanied by thorough documentation of environmental conditions, landscape surveillance, and auxiliary measurements. Sound pressure levels were calculated and sound sources were identified using signal processing algorithms in the temporal and frequency domains.

Sound event recognition [1] comprises three steps: detection, feature extraction, and classification, as shown in Figure 1 below. First, sound events are detected in the continuous audio signal. Second, sound events are segmented for feature extraction. Third, the extracted features are classified based on a training set, which is continuously updated in this process. Each step has specific aims and challenges.

Figure 1. Process of Sound Event Recognition

Detection aims to find segments that differ from the underlying background noise. The challenge is setting a suitable threshold. Examples of detection measures are the zero-crossing rate, higher-order statistics, pitch estimation, and spectral divergence [2]. Feature extraction aims to extract attributes that discriminate between different classes of sound while minimizing the variation within each class. The challenge is selecting the most suitable feature set. Examples are Mel-frequency cepstral coefficients, the temporal evolution of the signal, harmonic or perceptual information, and sound information across time and frequency [3,4,5]. Classification aims to assign a label to a sound event based on the extracted features. The challenge is computational cost. Examples are similarity distance measures, k-nearest neighbors, dynamic time warping, Gaussian mixture models, hidden Markov models, artificial neural networks, and support vector machines [6,7].
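As a minimal illustration of the detection step, the Python sketch below flags frames whose short-time energy exceeds a threshold set relative to the median background level, and also computes the zero-crossing rate mentioned above. It is not the method used in this study: the frame length, the 10 dB margin, the function names, and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_events(x, frame_len=1024, hop=512, margin_db=10.0):
    """Flag frames whose energy exceeds the median level by margin_db.

    The median-plus-margin rule is an assumed, simple threshold choice.
    """
    frames = frame_signal(x, frame_len, hop)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    threshold = np.median(energy_db) + margin_db
    return energy_db > threshold, zero_crossing_rate(frames)

# Synthetic example: weak noise with a loud 1 kHz burst in the middle.
fs = 48000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(fs)
t = np.arange(4800) / fs
x[24000:28800] += 0.5 * np.sin(2 * np.pi * 1000 * t)
flags, zcr = detect_events(x)
```

In this synthetic case the flagged frames cluster around the burst, and the tonal burst has a much lower zero-crossing rate than the surrounding broadband noise. In a real urban recording the background level changes over time, so a fixed margin over a global median would likely need to be replaced by a running estimate.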

Sound event recognition is of interest in acoustic surveillance and environment monitoring applications. Recently, speech and sound event extraction and classification techniques have been developed. Although both speech and sound event extraction methods are based on similar signal processing concepts, there are a number of differences. Sound events have a wider variety of frequency content, duration, and profile, and cannot be split into words or phonemes. Furthermore, environmental noise, distortion, reverberation, and overlapping sources complicate sound event recognition. Together, these factors make sound events more suitable for classification based on their visual time-frequency representation, that is, spectrogram image processing.

Visual analysis of spectrograms has been explored in speech processing [8]. While spectrograms have been useful in human speech analysis, they have demonstrated limited success as a voiceprint of human vocalization [9]. Nevertheless, spectrograms became a major tool in studies of how people pronounce different words and syllables.

The large amount of information contained in a spectrogram makes it attractive for sound event recognition. Sound events are typically shorter in time and lexically less connected than human speech, leading to certain advantages of operating in the spectrogram image domain [10], [11]. Processing a spectrogram as an image opens up the wide range of techniques developed in conventional image processing [e.g., 12,13,14,15]. The representation of a sound event as an image in the time-frequency domain has inspired the development of novel image processing methods [16,17]. Moreover, processing a spectrogram as an image may create a methodological and algorithmic base for a fusion of acoustic and video processing in surveillance applications [18].


A typical spectrogram of a sound event consists of an overlap of harmonic lines, curves, diffuse patterns, and a time-dependent background, and in this respect resembles a conventional image. Some sound events are easily differentiated by the appearance of their spectrograms. Examples of a traffic whistle, a starter gun shot, and a police siren are shown in subsequent sections of this report. Numerous spectrogram image processing techniques exist in the literature. Spectrogram image based processing has demonstrated good results in the classification of environmental sounds [19]. For improved detection performance, noise can be removed by means of image processing operations [20]. A comprehensive review of sound event recognition methods can be found elsewhere [21].

Image processing of sound events is an area of active ongoing research with both pros and cons. The cons include transient temporal and spectral variability in environmental noise and sound events, which is not present in the background of conventional images, and the lack of the solid geometrical constraints employed in image pattern recognition. The pros include the large variety of image processing and machine learning algorithms that can be applied to spectrograms to enable feature extraction and classification of sound events. The pros also include well-developed image processing methodologies that effectively reduce noise, and substantial interdisciplinary efforts supported by steady advances in microprocessor and system communication technologies.


MATERIALS AND METHODS

List of Materials:

● Zoom F8 Multitrack Recorder

● Behringer B-5

● Digital Sound Level Meter

● ND9 Sound Calibrator

● Matlab 2015

● Modified Tripod

● Wind Protection

Figure 2. Microphone Setup at Penn Plaza (left) and Zoom F8 (right)


Behringer B-5 Microphone and F8 Multitrack Recorder Sensitivity:

Since the Behringer B-5 microphones are meant for indoor recording, the group had to overcome a few obstacles while using them outside. The researchers discovered that the microphones were sensitive to humidity and would produce static if exposed to humidity for too long. After this discovery, the microphones were used in humid areas only for a limited amount of time. The frequency response of the Behringer B-5 microphone is 20 Hz to 20 kHz. The maximum SPL is 140 dB and the equivalent SPL is 16 dB.

The F8 Multitrack Recorder was designed for professional filmmakers and sound designers, so it offers several useful features. It has 8 channels with a low noise floor of -127 dB and gain of up to 75 dB. The F8 Recorder records at 24-bit/192 kHz resolution and offers 10 dB of headroom. Although the F8 Recorder applies a time stamp, the group found that it was sometimes inaccurate.

Calibration

Calibration Option 1

1. Use sound level calibrator with microphone of interest

2. Record signal

3. Integrate signal in 1 second time window to calculate sound pressure level in dBA

Calibration Option 2

1. Use digital sound level meter

2. Record signal

3. Integrate signal in 1 second time window to calculate sound pressure level in dBA
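Both options reduce to the same computation: find the offset that maps the recorded reference level to its known value, then apply that offset to the level computed in each 1-second window. The Python sketch below illustrates this under stated assumptions. It is not the Matlab algorithm used in this study; the function names and the synthetic signals are hypothetical, and the A-weighting filter needed for a true dBA value is omitted for brevity, so the levels here are unweighted.

```python
import numpy as np

def level_db(x):
    """Uncalibrated RMS level of a signal in dB (digital units)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms + 1e-20)

def calibration_factor(cal_recording, reference_db=94.0):
    """Offset (dB) that maps the recorded calibrator tone to its known level."""
    return reference_db - level_db(cal_recording)

def spl_per_second(x, fs, cal_db):
    """Sound pressure level for each full 1-second window of the recording."""
    n_windows = len(x) // fs
    windows = x[: n_windows * fs].reshape(n_windows, fs)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-20) + cal_db

# Synthetic check: a calibrator tone, then a signal at one tenth of its amplitude.
fs = 48000
t = np.arange(3 * fs) / fs
cal_tone = 0.2 * np.sin(2 * np.pi * 1000 * t)   # stands in for the 94 dB tone
cal_db = calibration_factor(cal_tone)
quieter = 0.1 * cal_tone                         # 20 dB below the calibrator
levels = spl_per_second(quieter, fs, cal_db)     # approximately [74, 74, 74]
```

A signal at one tenth of the calibrator amplitude comes out 20 dB below the reference level, which is a quick sanity check on any calibration factor.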

An ND9 Sound Calibrator was used to calibrate the microphones throughout the summer research. The calibrator can produce tones at two different levels: 94 dBA and 114 dBA. Depending on the application, one of them is selected for the calibration process. Since the environment of interest is very noisy, the 94 dBA tone was selected. The Environmental Acoustic team made several recordings in order to verify that the microphones were calibrated. The calibration factor for those recordings was calculated using a Matlab algorithm that computes the intensity of the sound in decibels for a selected recording. A Sound Pressure Level (SPL) meter was used to determine the real-time sound pressure level of the calibration recordings, the environmental surroundings, or a single sound source. The meter took a measurement every 125 milliseconds.

The initial calibration factor calculated for a particular recording served as a baseline that the researchers adjusted depending on the sound source, environment, or equipment. Signal processing was accomplished using a Matlab Graphical User Interface (GUI) that calculated the time-frequency representation of a selected recording. The researcher chose a specific time interval and retrieved the corresponding calibration factor for that interval by analyzing the sound pressure level calculated by the Matlab algorithm. The sound pressure level from the SPL meter measurements was then compared with the calculated SPL. The objective was to minimize the discrepancy between the Matlab algorithm and the SPL meter, with small to nonexistent deviations between the microphones. This process was repeated extensively until a correct calibration factor was determined.
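One simple way to carry out such a comparison is to treat the mean difference between the meter readings and the computed levels as a constant correction to the calibration factor, and to report what remains as the residual discrepancy. The Python sketch below assumes the two series are already time-aligned and that a single constant offset is adequate; the function name and all numerical values are illustrative and are not measurements from this study.

```python
import numpy as np

def refine_calibration(cal_db, computed_db, meter_db):
    """Return an updated calibration factor and the RMS residual discrepancy.

    computed_db: levels computed from the recording using the current cal_db
    meter_db:    time-aligned readings from the sound level meter
    """
    computed_db = np.asarray(computed_db, dtype=float)
    meter_db = np.asarray(meter_db, dtype=float)
    offset = np.mean(meter_db - computed_db)        # constant bias between the two
    residual = meter_db - (computed_db + offset)    # what a constant cannot explain
    return cal_db + offset, np.sqrt(np.mean(residual ** 2))

# Illustrative values only.
meter = [71.2, 73.5, 70.8, 76.1]
computed = [69.9, 72.4, 69.6, 74.7]
new_cal, rms_residual = refine_calibration(100.0, computed, meter)
```

A large residual after the offset is removed would suggest that the mismatch is not a simple gain error, for example because of timing misalignment or different frequency weighting in the two measurements.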

Audio and Spectrogram Synchronization

The Matlab spectrogram script developed in the earlier stages of the internship generated separate audio and spectrogram recordings of the data collected during the various experiments. The program did this by dividing a 10-second clip into 100 frames with a 90% overlap. The researchers then had to adjust both the audio and spectrogram recordings so that the two files were synchronized. The two recordings were then merged using Windows Movie Maker, which allowed the audio and the spectrogram to play in step. This process was carried out for the four channels of all 11 recordings at Penn Plaza Pavilion, which generated 85 gigabytes of data, and for the United States Merchant Marine Academy recordings, which generated 19 gigabytes.
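To illustrate what a 90% overlap between successive analysis windows means in practice, the Python sketch below computes a magnitude spectrogram with NumPy. It is not the Matlab script described above: the 1024-sample Hann window, the function name, and the test tone are assumptions, and the exact framing used by the authors may differ.

```python
import numpy as np

def spectrogram_db(x, fs, win_len=1024, overlap=0.9):
    """Magnitude spectrogram in dB using a Hann window and fractional overlap."""
    hop = max(1, int(round(win_len * (1.0 - overlap))))   # samples between frames
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers
    return freqs, times, 20.0 * np.log10(spectrum + 1e-12)

# One second of a 2 kHz tone sampled at 48 kHz.
fs = 48000
t = np.arange(fs) / fs
freqs, times, S = spectrogram_db(np.sin(2 * np.pi * 2000 * t), fs)
peak_hz = freqs[np.argmax(S.mean(axis=0))]
```

With 90% overlap each new frame advances by only one tenth of the window length, which gives a smooth time axis at the cost of roughly ten times as many frames as a non-overlapping analysis. That trade-off is consistent with the large data volumes reported above.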


Bill of Materials:

Table 1. Environmental Acoustics Bill of Material

Equipment | Dimensions (L x W x H) | Weight | Quantity | Cost
Zoom F8 Multitrack Field Recorder | Height 2.1", Width 7", Depth 5.5" | 2.1 lbs | 1 | $999
Behringer B-5 | 0.8 x 0.8 x 4.7 inches | 8.5 oz | 4 | $69.99
goSTAND Portable Mic and Tablet Stand | 18 x 4 x 4 inches | 3.1 lbs | 2 | $49.99
Audix DCLIP Microphone Clip | 11 x 12.5 x 2 inches | 2.9 oz | 4 | $14.95
GLS Audio 6ft Patch Cable Cords | 8.2 x 8 x 2.5 inches | 2.0 lbs | 1 | $39.99
K&M 23510 Adjustable Bar | 10 x 1 x 1 inches | 9.1 oz | 2 | $19.99
Digital Sound Level Meter | 10.7 x 8.8 x 2.6 inches | 1.8 lbs | 1 | $47.99
ND9 Sound Calibrator | 4 x 7 x 2 inches | 13.6 oz | 1 | $169.99

The equipment values reflect the pricing on Amazon.com as of July 28th, 2016.


Methodology: United States Merchant Marine Academy

Figure 3. United States Merchant Marine Academy Experimental Setup

1. Find a location where boats frequently sail and where the microphones are best protected from wind and environmental noise.
2. Assemble both tripods and position the umbrella in the direct path of the wind.
3. Connect the cables to the microphones and the F8 Multitrack Field Recorder. Take note of which microphone is connected to which channel of the recorder.
4. Attach the directional cap and diffused metal head to each microphone and then place the microphones on the tripods.
5. Point the tripods and microphones in the direction that will record the sounds of the boat.
6. Run a test recording to determine whether the equipment is fully functional, and establish a quiet area where no one can walk or speak.
7. Establish communication with the vessel and have a person on board record its GPS track.
8. Wait for an opportunity with the least amount of environmental activity and signal the vessel to accelerate and move in a zigzag-like pattern. Have a researcher establish a quiet boundary and press record on the Multitrack Field Recorder.
9. Write down the start time, end time, noise level, humidity, wind speed, boat speed, and distance the boat traveled.
10. Repeat steps 8-9 until the desired number of recordings is reached.

Penn Plaza Pavilion

Figure 4. Penn Plaza Pavilion Experimental Setup

1. Find a location near Penn Plaza Pavilion that is protected from the environment (rain & wind).
2. Assemble the tripod and attach the directional cap and diffused metal head to each microphone. Connect each microphone to the Zoom F8 Recorder and then attach the microphones to the tripods. Take note of which microphone is connected to which channel on the Multitrack Recorder.
3. Take note of the surrounding environment.
    a. Time the traffic signals to know how often a steady stream of cars will be going by
    b. Time how often the train comes and how long it takes to go by
    c. Measure distances to nearby buildings, street corners, etc.
4. Record in 10- or 15-minute segments. Take notes throughout the recording.
    a. Take short video segments
    b. Take notes of unusual noises (fire trucks, screaming, etc.)
    c. Take pictures of things that make unusual noises
    d. Document time start/time end
    e. Document noise level (dBA)
    f. Document temperature
    g. Document humidity
    h. Document wind speed/direction
    i. Document how much it is raining
    j. Document the direction the microphones are facing
5. After recording, transfer the data to a computer and run a verification analysis to determine whether adjustments to the microphones or Multitrack Recorder need to be made.
6. Walk around the perimeter, recording observations of the environment.
7. Repeat steps 4-6 every 30 minutes.

Stevens Institute of Technology Anechoic Chamber

Figure 5. Stevens Institute of Technology Anechoic Chamber Experimental Setup

1. Assemble the modified tripod at the farthest corner of the room. (The farthest corner is chosen to prevent the signal from being saturated by a loud sound.)
2. Establish two points in the room: one in the center of the room and one in the corner opposite the microphones. Record the distance from each of those points to the microphones.
3. Attach the directional cap to each microphone, connect each microphone to the Zoom F8 Recorder, and then attach the microphones to the tripods. Take note of which microphone is connected to which channel of the recorder.
4. Establish safety guidelines since a firearm will be present.
    a. Safety glasses and ear protection must be worn by the person who fires the firearm
    b. Always be mindful of which direction the firearm is facing
    c. Always assume the firearm is loaded
    d. Keep finger off the trigger until ready to fire
5. When the recording begins, the professor will count for 5 seconds before firing the first shot. (This gives the other researchers enough time to cover their ears.)
6. While the recording is taking place, the other researchers will take a video recording of the Digital Sound Level Meter.
7. Repeat steps 5-6 for the different calibers.
8. Once the firearm recording has concluded, have the professor return the firearm to the campus police office.
9. Record other significant sounds, such as a whistle, yelling, and a blender, together with the Digital Sound Level Meter readings.

Hoboken Pier Joint Research

Figure 6. Joint Research with Buoy Team at Hoboken Pier

1) Assemble the tripods and attach the microphones. (Remember to record which microphones are being used and the connection configuration.)
2) Establish communication with the Buoy Team so that both teams are coordinated.
3) Point the microphones in the same direction as the GoPro video recording.
4) Record long segments, roughly 30 minutes to 1 hour, while slightly adjusting the direction the microphones are facing. (As the boat maneuvers along the Hudson, both the audio and video recordings must capture its movements.)
5) Take diligent notes throughout the audio recording.
    a. Record unusual noises
    b. Document time start/time end
    c. Document noise level (dBA)
    d. Document temperature
    e. Document humidity
    f. Document wind speed/direction
6) Repeat steps 3-5 until the researchers have collected a predetermined amount of data.

Test Locations

United States Merchant Marine Academy: All data collection was completed on June 20, 2016.
Penn Plaza Pavilion: All data collection was completed on June 28, 2016.
Stevens Institute of Technology Anechoic Chamber: All data collection was completed on July 6, 2016.
Joint Research with Buoy Team along Hoboken Pier: All data collection was completed on July 14, 2016.
Different Elevation Recordings of Babbio building: All data collection was completed on July 14, 2016.


DEFINITIONS AND FORMULAS

Definitions:

● Lp1 is the sound pressure level at microphone one
● Lp2 is the sound pressure level at microphone two
● P0 is the reference pressure, equal to 20 × 10^-6 Pascal
● P is the sound pressure relative to atmospheric pressure
● R1 is the distance from microphone one to the sound source
● R2 is the distance from microphone two to the sound source

Equivalent sound pressure equations:

Equations (1), (2), and (3): Equation 1 gives dB as a function of pressure, Equation 2 is used to

calculate the dB level caused by the addition of multiple sound sources, and Equation 3 yields

the dB at point 2 based upon the two distances from the sound source and the dB level at point

1.
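The descriptions above match the standard free-field relations, which can be written with the symbols defined in this section as follows. These forms are an assumption based on those descriptions: Equation 2 assumes incoherent sources whose levels are denoted here by the added symbol L_{p,i}, and Equation 3 assumes spherical spreading from a point source.

```latex
\begin{align}
L_p &= 20 \log_{10}\!\left(\frac{P}{P_0}\right) \tag{1} \\
L_{\text{total}} &= 10 \log_{10}\!\left(\sum_{i=1}^{n} 10^{L_{p,i}/10}\right) \tag{2} \\
L_{p2} &= L_{p1} - 20 \log_{10}\!\left(\frac{R_2}{R_1}\right) \tag{3}
\end{align}
```

Under Equation 3, each doubling of the distance from the source lowers the level by about 6 dB.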


RESULTS & DISCUSSION

United States Merchant Marine Academy

The objective of the experimental research was to measure the acoustic signature of a moving boat and to estimate the detection distance in an environment with significant acoustic interference from helicopters and planes.

Figure 7: United States Merchant Marine Academy Experimental Research

During the experiment, two Behringer B-5 microphones were used to detect the boat's acoustic noise from several distances while the boat made different maneuvering patterns. Environmental noise interference from helicopters, airplanes, birds, and people added a degree of difficulty when attempting to distinguish the boat’s acoustic signature. However, the Environmental Acoustic team was able to overcome these challenges and successfully identified the sound of the boat’s engine. Figure 8 displays the spectrogram of the boat’s acoustic signature. At 19 seconds into the audio recording, the boat begins to accelerate, causing the engine speed to increase and making the boat’s acoustic signature visible. The boat was traveling in the direction of the wind. Wind velocity was recorded as 9.66 km/h.


As the boat moves away from the microphones, the higher frequencies dissipate faster than the lower ones. Forty-four seconds into the audio recording, there is a vertical line due to a Large Rusted Metal Object (LRMO) creating a dinging noise (see appendices for image). Throughout recording 5, the microphones were able to detect the boat’s frequency up to 159 meters away from the recording station. The boat’s signal begins to fade away around 80 seconds into the recording as an airplane begins to fly over the equipment (see Fig. 9).

Figure 8. Recording 5 Spectrogram; Boat Traveling Downwind

There is a strong frequency component between 0 and 0.2 kHz. It was established that the portion between 0 and 0.1 kHz was created by the equipment itself. This was determined via testing in the Anechoic Chamber (see Figure 35). This still leaves the possibility of using frequencies between 0.1 and 0.2 kHz for detection of the boat. At the end of the audio recording the boat was 198 meters away.

Figure 9. Recording 5 Spectrogram; Boats Acoustic Signature Fades Away; 0-2 kHz
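The idea of separating the 0 to 0.1 kHz band from the 0.1 to 0.2 kHz band can be sketched by summing spectral energy over each band. The Python example below is an illustration on a synthetic signal, not an analysis of the recordings: the function name, the Hann window, and the two test tones (a 50 Hz hum standing in for equipment noise and a weaker 150 Hz tone standing in for a source of interest) are assumptions.

```python
import numpy as np

def band_level_db(x, fs, f_lo, f_hi):
    """Relative level (dB) of the signal energy between f_lo and f_hi in Hz."""
    spectrum = np.fft.rfft(x * np.hanning(len(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return 10.0 * np.log10(np.sum(np.abs(spectrum[in_band]) ** 2) + 1e-20)

# Synthetic one-second signal: strong 50 Hz hum plus a weaker 150 Hz tone.
fs = 8000
t = np.arange(fs) / fs
x = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 150 * t)
hum_db = band_level_db(x, fs, 0.0, 100.0)
target_db = band_level_db(x, fs, 100.0, 200.0)
```

Here the two band levels differ by about 20 dB, matching the tenfold amplitude ratio of the tones, which shows that the weaker component remains measurable once the lower band is excluded.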


Figures 10 and 13 display the boat travelling the same distance away from the microphones downwind and upwind, respectively. In Figure 10, the boat traveled to a distance of 221 meters just as an airplane began to fly overhead. The recording is stopped just as the airplane begins to fill the spectrogram. At about 40 seconds into the recording, the higher frequencies between 1 and 1.5 kHz begin to dissipate as the boat increases in distance from the microphones.

Figure 10. Recording 6 Spectrogram; Boat Traveling Downwind

Figure 11 displays three separate images. The left side is the GPS track with a bright green highlight of the duration of the recording. The top right shows the distance from the microphones as a function of time, once again highlighted in green for the duration of the recording. Finally, the bottom right shows the azimuth from the boat to the microphones. Figure 11 indicates, on the GPS track, that the path the boat took was downwind. The distance graph shows that the boat moves from 53 meters to 221 meters over the duration of the recording. Since the boat was almost directly south of the microphones, the azimuth is approximately 0 with variation as the boat moves east and west.

Figure 11. Recording 6 GPS Track (left), Distance vs Time (top), Azimuth vs Time (bottom)

Figure 12. Recording 6 Sound Pressure Level vs Distance (left), Sound Pressure Level vs Time (right)

The left graph of Figure 12 displays the sound pressure level versus distance for recording 6.

The right graph of Figure 12 displays the sound pressure level versus time for recording 6.

Since the boat was increasing in distance with time, both graphs look similar. However, the

velocity of the boat was not constant, resulting in stretching and compression of the graph from

time to distance. Recording 6 clearly indicated the decrease in sound pressure level as the boat

increased in distance.
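As a rough point of reference for the trend in Figure 12, the expected drop in sound pressure level between the start and end distances of recording 6 can be estimated. This is a sketch that assumes ideal free-field spherical spreading from a point source; that assumption is not stated in the report, and wind, ground reflection, and background noise are ignored.

```latex
\Delta L = 20 \log_{10}\!\left(\frac{r_2}{r_1}\right)
         = 20 \log_{10}\!\left(\frac{221\ \mathrm{m}}{53\ \mathrm{m}}\right)
         \approx 12.4\ \mathrm{dB}
```

This is an idealized figure, not a measured result. The measured curve may depart from it wherever aircraft or other background sources dominate the recording.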

In Figure 13 the lines also begin to

dissipate as the boat travels away.

However, the remainders of the lines are

much stronger when the boat is travelling

upwind.

Figure 13. Recording 10 Spectrogram; Boat Traveling Upwind

Figure 14 indicates, on the GPS track, that the path the boat took was upwind. The distance graph shows that the boat moves from 99 meters to 662 meters over the duration of the recording. The boat started moving northwest of the microphones, but soon adjusted the direction it was heading, resulting in the Azimuth changing suddenly at the start of the recording but then flattening out as the boat maintained a constant direction.

Figure 14. Recording 10 GPS Track (Left), Distance vs Time (Top), Azimuth vs Time (Bottom)

Figure 15. Recording 10 Sound Pressure Level vs Distance (left), Sound Pressure Level vs Time (right)

Large peaks are clearly visible throughout the recording in both graphs in Figure 15. Although the boat moved out to 662 meters, the peaks interfered with the sound pressure level measurement. The number of peaks, how often they occurred, and how long they lasted severely limited the ability to draw any relation between boat distance and sound pressure level.

Figure 16 shows how the frequencies that were established by the boat were distorted when an airplane flew directly above the recording station. The time-frequency representation of the airplane displayed a natural phenomenon called the Lloyd Mirror effect. The sound wave propagating from the airplane reaches the microphones both directly and after being reflected from the ground, and the two arrivals interfere with each other. This effect causes the frequency to appear in a wave-like pattern.

Figure 16. Recording 10 Lloyd Mirror Effect

Penn Plaza Pavilion

The objective of the experimental research was to measure urban city noise and create a database for evaluating sound event recognition through image processing.

Figure 17. Experimental Research at Penn Plaza Pavilion

During the experiment, 4 microphones recorded the area surrounding Penn Plaza in 10 minute increments from 7:30am to 1:00pm. The microphones picked up all environmental noise, such as cars, sirens, horns, whistles, construction, and people. The sound pressure level was determined at different times throughout the day. The following table displays the absolute sound pressure level of one of the recordings collected throughout the experiment at Penn Plaza Pavilion. There were a total of 11 recordings taken throughout the experiment, each of which has a unique calibration factor for each of the four channels.

Appendix E displays the data collected during the first recording at Penn Plaza Pavilion. The

start time of the recording, temperature, humidity, wind speed, and noise level were all

documented at the beginning of the recording. During each recording, events that were out of

the ordinary and would be easily distinguishable in a spectrogram were documented. Once the

experiment was finished, observational data was organized into an Excel sheet in a specific

format to allow the MATLAB script to synchronize with the data contained. For the calibration and

distance rows, each column adjacent to the label represents each microphone. The first cell

after the calibration label is microphone 1, the second cell is microphone 2, etc. The row labeled

“distance” is zero for all of the recordings at Penn Plaza because there was no specific object

being recorded. This template was followed for each experiment performed. The calibration

factors for each microphone were later determined and included in the data collection Excel

sheet.

The following graph (Figure 18) displays the temperature throughout the day while recording at

Penn Plaza. At the beginning of the day, the temperature started at 70°F and slowly increased.

The temperature remained constant at 72°F for about an hour and a half and then dropped back

to 70°F at 10:30am. The temperature then increased again for the remainder of the day.

Figure 18. Temperature Records throughout Penn Plaza Experiment

The following graph (Figure 19) shows the humidity records throughout the day while recording

at Penn Plaza. At the beginning of the day, the humidity was around 75%. Around 10:00am the

humidity increased dramatically for the remainder of the day. The increase in humidity caused

problems with the microphones. The more humid it was, the more static the microphones

produced on the recording.

Figure 19. Humidity Records throughout Penn Plaza Experiment

The following graph (Figure 20) shows the wind speed, in miles per hour, throughout the day at Penn Plaza. The wind speed varied throughout the day. It was highest at the beginning of the day and then decreased dramatically between 10:00am and 11:00am. The wind speed did not have a great effect on the recordings because the microphones were protected from the wind during the experiment.

Figure 20. Wind Speed Records throughout Penn Plaza Experiment

The following graph (Figure 21) displays the noise level in dBA throughout the day at Penn Plaza. The noise level was highest in the early morning at 8:00am due to rush hour traffic. The sound level decreased at 9:30am and then increased again at 10:00am. Contrary to initial predictions, there was not a spike in noise level during lunch time.

Figure 21. Sound Pressure Records throughout Penn Plaza Experiment

Figure 22 displays the average sound pressure level for each microphone from 7:30am to 1:00pm at Penn Plaza Pavilion. There was a sudden increase in the sound pressure level starting at 10:00am, which can be attributed to the opening of the dining services that were adjacent to the recording station. Multiple pedestrians were walking to and from the dining services around 10:00am. Furthermore, the standard deviation range for each of the four channels, shown as a vertical bar, is provided at the corresponding recording start time.

Figure 22. Error Bar Chart of Sound Pressure Level vs Time at Penn Plaza Recording with Standard Deviation Range.

The following figure (Figure 23)

displays the occurrences of events

recorded throughout the Penn Plaza

Pavilion research experiment. The

most common events throughout

the day were car horns, people

talking nearby, and traffic police

blowing a whistle.

Figure 23. Penn Plaza Pavilion Occurrence of Typical Events

The spectrograms of the three most frequent events at the Penn Plaza Pavilion are provided in the following three figures.

Figure 24 displays the difference in acoustic signatures between a police officer using a whistle at 8:31 AM to direct traffic and a pedestrian whistling with their fingers.

The whistles from the police officer

were present throughout the

recordings and were located 45

meters from the microphone station

whereas the person whistling was 32

meters away.

Figure 24. Recording 3 Whistling in Penn Plaza Pavilion at 8:31 AM

Figure 25 displays a vehicle honking

its horn in traffic at 8:39 AM. The event

was estimated to be roughly 35 meters

away from the recording station. In

order to isolate the event, the frequency scale, intensity range, and Fourier transform settings were adjusted to give a clear representation of the sound event. This particular sound event was difficult to isolate because of the wide range of frequencies and the disparity in intensities that were present during the honking of the vehicle.

Figure 25. Recording 3 Car Horn in Penn Plaza Pavilion

The police sirens that were present throughout the experiment at Penn Plaza Pavilion generated unique harmonic patterns at 8:30 AM that went from high to low pitch (see Figure 26). Even though the police sirens were perceived to be a continuous sound, their patterns were not connected throughout the spectrogram. Between 6.5 and 8.0 seconds, there appears to be a gap in the harmonic signature. However, the sirens could still be heard.

Figure 26. Recording 3 Police Sirens in Penn Plaza Pavilion

The following spectrograms show notable events that occurred throughout the Penn Plaza Pavilion recordings.

Penn Plaza Pavilion Additional Spectrograms, a-f:

a. Coughing, 71.1 dBA

b. NYPD Truck Siren, 81.3 dBA

c. Pedestrians Talking, 73.5 dBA

d. Janitorial Rolling Bucket, 79.9 dBA

e. Construction Equipment, 75.3 dBA

f. Baby Crying, 79.9 dBA

Stevens Institute of Technology Anechoic Chamber

The objective of the experimental research was to record characteristic sounds from a set of sound events: gunshots, screams, whistles, and sirens. The secondary goal was to estimate their absolute A-weighted sound pressure level.

Figure 27. Experimental Research at Stevens Institute of Technology Anechoic Chamber

Table 2 shows the data that was collected in the Anechoic Chamber for a variety of events. Listed are the events and the maximum recorded sound pressure levels from microphones 1 and 2, as well as the distances to those microphones. The distances and sound pressure levels were used to calculate the sound pressure level at 1 meter if the microphone was not located there. Equation 3 was used for this calculation.

Table 2. List of sound events, distances, and maximum sound pressure levels.

Event type          Distance to mic 1 (m)   Mic 1 max SPL (dB)   Distance to mic 2 (m)   Mic 2 max SPL (dB)   SPL at 1 meter (dB)
Gunshot             4.27                    108.8                5.19                    98.5                 121.4
Gunshot (misfire)   4.27                    74.5                 5.19                    57.6                 87.1
Man scream          1                       106.7                1.91                    88.5                 106.7
Girl scream         1                       99.9                 1.91                    80.3                 99.9
Whistle (wood)      1                       93                   1.91                    74.4                 93
Whistle (sport)     1                       113.3                1.91                    93.5                 113.3
Whistle (steel)     1                       107.2                1.91                    91                   107.2
Siren               1                       117.5                1.91                    93.6                 117.5
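Equation 3 is not reproduced in this part of the report. The sketch below assumes it is the standard spherical spreading relation, SPL(1 m) = SPL(r) + 20 log10(r). Under that assumption the microphone 1 columns of Table 2 reproduce the tabulated "SPL at 1 meter" values. The function name `spl_at_one_meter` is hypothetical and is not taken from the researchers' scripts.

```python
import math


def spl_at_one_meter(spl_db: float, distance_m: float) -> float:
    """Refer a level measured at distance_m back to 1 m.

    Assumes free-field spherical spreading (an assumption, not
    confirmed by the report): SPL(1 m) = SPL(r) + 20*log10(r).
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return spl_db + 20.0 * math.log10(distance_m)


# Microphone 1 rows from Table 2: (event, distance in m, max SPL in dB)
rows = [
    ("Gunshot", 4.27, 108.8),
    ("Gunshot (misfire)", 4.27, 74.5),
    ("Siren", 1.0, 117.5),
]

for event, distance_m, spl_db in rows:
    print(f"{event}: {spl_at_one_meter(spl_db, distance_m):.1f} dB at 1 m")
```

For events recorded with microphone 1 already at 1 meter, the correction term is zero and the measured value is carried over unchanged, as in the table.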

Figure 28 displays the acoustic signature of a .22 caliber blank pistol being shot in the anechoic chamber at a distance of 5.19 meters. The experiment was conducted in order to identify the acoustic signature of the firearm without any interference from an outside source. The intensity level of the firearm was far greater than the Multitrack Recorder threshold, which caused the audio recording to become saturated.

Figure 28. Recording 8 Spectrogram; Gunshot 0-24kHz

However, Figure 28 clearly indicates the acoustic characteristics of the firearm: a great intensity at a high frequency followed by low intensity below 10 kHz for 0.2 seconds. The low intensity component was likely produced by the gas discharging from the firearm.

Additional experiments involving three different whistles were conducted at the anechoic

chamber and are provided in the following three spectrograms.

Figure 29 demonstrates the acoustic

signature of the sport whistle, which

was recorded to be the loudest of the

three whistles. The intensity of the

sport whistle at 3 kHz was the

greatest and had a deafening effect.

Six distinctive frequency lines were

established each time the whistle

was blown.

Figure 29. Recording 9 Spectrogram; Sport Whistle 0-24kHz

In comparison, the wooden whistle, which sounded similar to a conductor's whistle on a train, was not as intense and could be heard without ear protection. The wooden whistle produced two distinct lines under 5 kHz and less distinct lines above 5 kHz (see Fig. 30).

Figure 30. Recording 10 Spectrogram; Wooden Whistle 0-24kHz

In contrast, Figure 31 displays the

spectrogram of a metal whistle

being blown. This is characterized

as an intense low frequency pitch.

Five distinctive lines are generated

by the metal whistle. As demonstrated above, all three whistles generated their own unique acoustic signatures, which are characterized by their low frequency intensity, distinctive frequency patterns, and horizontal frequency configurations.

Figure 31. Recording 11 Spectrogram; Metal Whistle 0-24 kHz

Additional recordings were taken of a megaphone siren at the anechoic chamber (see Fig. 32). The siren generated a unique acoustic signature with a wave-like pattern. This is not to be misinterpreted as the Lloyd Mirror effect; rather, it is the megaphone sweeping from a high to a low pitch.

Figure 32. Recording 1 Spectrogram; Megaphone Siren 0-24kHz

The following two spectrograms in

Figure 33 and Figure 34

demonstrate the difference in

yelling patterns between a male

and female. Figure 33 displays a

high intensity sound being

generated under 5 kHz; whereas in

Figure 34 a harmonic pattern is

generated with a higher intensity

being evenly distributed throughout

a greater frequency range.

Figure 33. Recording 12 Spectrogram; Male Yelling

The noticeable difference between

the two spectrograms is the

beginning and end of the recorded

yelling. The male’s scream has an abrupt beginning where his vocal register sat at a low frequency, whereas the female’s scream began by gradually increasing to a high frequency, resembling a ladder. The ending of the male’s scream occurred at lower frequencies than the female’s scream.

Figure 34. Recording 15 Spectrogram; Female Yelling

As stated earlier, the origins of the high intensity, low frequencies that were present under 0.2 kHz at the United States Merchant Marine Academy experimental research could not be determined. It was hypothesized that the low frequencies could have been caused by equipment interference, or could have come from a source in the surroundings that was missed.

Figure 35. Recording 2 Spectrogram; Anechoic Chamber, No event

Figure 35 displays the spectrogram of an audio recording within the anechoic chamber in

complete silence. By conducting this experiment, the possibility of an external sound source influencing the data was eliminated. This supported the idea that the equipment was in fact producing low frequency interference below 0.1 kHz.

Joint Research with Buoy Team at Hoboken Pier

The goal was to conduct simultaneous acoustic recording in water and in air, together with video, to enable fusion signal processing of different sound events.

Figure 36. Experimental Research at Hoboken Pier

Throughout the recording at the Hoboken Pier, 39 helicopters and multiple ships were recorded.

The Environmental Acoustics audio recording and the Buoy video recording allowed for the

synchronization of a spectrogram with a video representation of what sound events occurred

over the deployed buoy. This would allow a correlation to be determined between sounds generated above and below the water. The Hoboken Pier data offers considerable scope for further study of how sound transfers between two media and of the effects the sound wave experiences while doing so.

The acoustic signature of the helicopter shown in

Figure 37 was recorded by both the Buoy Team and

Environmental Acoustic Team as it flew over the

Hoboken Pier at 9:27:46 AM.

Figure 37. Hoboken Pier Helicopter 9:27:46 AM

Figure 38* displays the hydrophone recording of the acoustic signature of the helicopter that flew overhead at 9:27:46 AM. The acoustic signature of the helicopter was registered at a relatively low frequency, which made its detection quite difficult. The frequency range, intensity range, and Fourier transform settings were adjusted in order to distinguish the acoustic signature of the helicopter.

Figure 38*. Buoy Hydrophone Data recording 32, helicopter occurs at 9:27:46 AM

The Environmental Acoustic Team recorded the same helicopter’s acoustic signature. However, due to the lack of a well-established synchronization procedure between the Buoy Team and the Environmental Acoustic Team, it was not possible to conclude with absolute certainty which of the two acoustic signatures displayed in Figure 39 belongs to the helicopter. If the audio recording is assumed to have begun exactly at 9:17:00, then the acoustic signature of the helicopter at the far right of Figure 39 (see Appendix K) would correspond with the acoustic signature displayed in Figure 38. However, if the audio recording began at 9:17:59, then it is possible for the acoustic signature of the helicopter to correspond with the left side of Figure 39 (see Appendix L).

Figure 39. Environmental Acoustic Recording 3, helicopter present far left and far right

Figure 38*: It was later determined by the Acoustic Engineers at the Pond House that the acoustic signature displayed on Figure 38 may not belong to the helicopter but rather an interference from the hydrophone.
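The effect of the two candidate start times on where the 9:27:46 AM helicopter should appear in the recording can be worked out directly. The following is a minimal sketch using only the clock times quoted above. The helper name `offset_seconds` is hypothetical, and the sketch assumes both clocks refer to the same day and time zone.

```python
from datetime import datetime


def offset_seconds(recording_start: str, event_time: str) -> float:
    """Seconds from the start of a recording to an event (same-day clock times)."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(recording_start, fmt)
    event = datetime.strptime(event_time, fmt)
    return (event - start).total_seconds()


helicopter_time = "9:27:46"
for assumed_start in ("9:17:00", "9:17:59"):
    seconds = offset_seconds(assumed_start, helicopter_time)
    print(f"start {assumed_start}: helicopter expected {seconds:.0f} s into the recording")
```

The two assumptions place the event 59 seconds apart within the recording. That difference is why an uncertain start time leaves it unclear which of the two signatures in Figure 39 belongs to the helicopter.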

Figure 40 displays the frequency

characteristics that were generated

when a large vessel blew its horn.

Figure 40. Recording 4 Spectrogram; Loud Horn from Vessel

POTENTIAL FILTERING PROCESS

While the boat’s harmonics are visible on the spectrogram and their presence is visually detectable, an algorithmic approach is required for real detection systems. Sound pressure levels were calculated in a narrow band around one of the boat’s harmonics and in a band between the harmonics. The window containing the harmonic was between 4 and 4.8 kHz, and the window without any harmonics was between 4.8 and 5.6 kHz. The consistent prevalence of signal in the band containing the boat harmonic is shown in Figure 41.

Figure 41. Prevalence of 4.5 kHz boat harmonic over neighboring frequency band

Figure 42: Frequency Bands for Potential Filter

In Figure 42, R1 and R2 show potential frequency ranges selected for the filter. R1 is 0.49 kHz to 0.51 kHz and contains the frequency produced by the boat engine. R2 is 0.45 kHz to 0.49 kHz and captures the gap between the boat engine frequencies. R2 does not capture the target

frequency. By separating these out, the sound pressure level for each individual frequency band

can be calculated. Once this is done, the sound pressure levels can be compared. If the sound

pressure level of frequency band R1 is greater than R2 then this may be indicative of the boat’s

presence in the recording. If the sound pressure levels are approximately equal, this may

indicate that no boat is present during the recording. By expanding this basic example, an entire

set of frequency ranges that encompass all the boat’s frequencies and gaps between them

could be created. This set could be used as a full filter by constantly calculating the sound

pressure levels for each band and making comparisons to those around it. An algorithm that is

capable of detecting a boat based upon the relative sound pressure levels of the known boat

frequency ranges could then be written.
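The band comparison described above can be sketched as follows. This is an illustrative outline and not the researchers' filter. It uses NumPy's FFT, reports relative levels in dB with an arbitrary reference (not calibrated absolute sound pressure levels), and the names `band_level_db` and `boat_indicated` and the 6 dB decision margin are all assumptions. Because R1 and R2 have different widths, the sketch compares mean power per frequency bin, not total band power.

```python
import numpy as np


def band_level_db(signal, fs, f_lo, f_hi):
    """Mean spectral power in [f_lo, f_hi) Hz, in dB (arbitrary reference)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    if not in_band.any():
        raise ValueError("band contains no FFT bins")
    power = np.mean(np.abs(spectrum[in_band]) ** 2)
    return 10.0 * np.log10(power + 1e-20)  # small offset avoids log(0)


def boat_indicated(signal, fs, r1=(490.0, 510.0), r2=(450.0, 490.0), margin_db=6.0):
    """True if band R1 exceeds band R2 by more than margin_db (hypothetical margin)."""
    difference = band_level_db(signal, fs, *r1) - band_level_db(signal, fs, *r2)
    return bool(difference > margin_db)


# Synthetic demonstration: broadband noise with and without a 500 Hz tone.
fs = 8000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, t.size)
tone = 0.5 * np.sin(2 * np.pi * 500.0 * t)

print("noise + tone:", boat_indicated(noise + tone, fs))
print("noise only:  ", boat_indicated(noise, fs))
```

A full filter along the lines described would repeat the same comparison for every harmonic band and the gap next to it, and could evaluate the signal in short frames instead of over a whole recording.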

Conclusions

The Environmental Acoustic Team conducted several successful research experiments

throughout the 2016 Maritime Security Center’s Summer Research Institute. The overall

objective was to determine the absolute sound pressure level and identify a single sound source

within a given environment. Both were accomplished and thoroughly verified. Additional analysis

of the collected data was conducted, such as identifying a boat's acoustic signature over

distance in a noisy environment. In addition, comparisons between a boat travelling downwind

in contrast to a boat traveling upwind were also analyzed. The acoustic signatures of multiple

sound events, such as gunshots and police sirens, were registered. Specific characteristics, such as the gas expelled by a gunshot, were distinguished, which would not have been possible without suitable computer software. Furthermore, the researchers were able to distinguish characteristics of the acoustic signatures of a helicopter and a boat above water in contrast to their acoustic signatures below water. In addition to the above-water recognition of these sound events, their underwater analogues were also observed and analyzed.

Recommendations

Despite the large amount of analysis completed throughout the program’s duration, a vast amount of potential has been left unutilized. This is especially true of the data processing from Penn Plaza. With such a rich environment of sound sources and the capability of characterizing the urban noise environment, applications for this data have unfortunately been left unrealized. With more time, an algorithm could be developed to parse through recordings, identify sound sources along with their location of origin and sound pressure level, and then determine whether each event was something typical of the environment, like a car horn or a whistle, or potentially a source of interest, such as a gunshot or a scream. The MATLAB capabilities for searching through spectrograms make such an algorithm feasible. These computational capabilities would then be paired with selected events for analysis and recognition. This would be achieved using a technique already in place in video processing called Binary Large Object (BLOB) processing¹. The image is analyzed and groups of connected pixels are recorded. Using this technique, it is possible for the algorithm to identify objects and, in the case of the Penn Plaza recordings, to identify specific BLOBs created by distinct frequencies. Anechoic chamber recordings offer the purest acoustic signature for events, and can likewise be used as a reference for the algorithm to compare events to. In addition to feeding the algorithm already known acoustic signatures, the algorithm can be extended further with machine learning, where the algorithm gathers data from its own recordings and then uses past recordings as a whole as a reference when comparing for typical events. This is akin to using the entire urban noise environment as an “event” for the algorithm to detect.
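The BLOB idea can be illustrated on a small thresholded spectrogram. This is a minimal sketch under stated assumptions, not the proposed algorithm: the toy spectrogram values, the threshold, and the function name `label_blobs` are all hypothetical, and connectivity is taken to be 4-connected (cells sharing an edge).

```python
from collections import deque

import numpy as np


def label_blobs(mask):
    """Label 4-connected groups of True cells; returns (labels, count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c]:
                continue
            # New blob: flood-fill outward from this cell.
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy spectrogram in dB (rows = frequency bins, columns = time frames).
spectrogram_db = np.full((6, 8), -60.0)
spectrogram_db[1, 1:6] = -10.0  # horizontal line, e.g. a steady tone
spectrogram_db[3:6, 6] = -12.0  # vertical line, e.g. a short impulsive event
threshold_db = -30.0            # hypothetical threshold

labels, count = label_blobs(spectrogram_db > threshold_db)
print("blobs found:", count)
print(labels)
```

Each labeled group could then be summarized by its extent in time and frequency and compared against reference shapes, such as those taken from the anechoic chamber recordings.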

1 Moeslund, Thomas B. Introduction to Video and Image Processing: Building Real Systems and Applications. London: Springer, 2012. Web. Undergraduate Topics in Computer Science.

APPENDICES

Calibration Factor Verifications

As stated in the subsection titled ‘Calibration Factor’, the calibration factor for each channel of every recording had to be determined. This was a significant procedure that adjusts the data to counteract any potential interference from the microphones, multitrack recorder, and cables. After the calibration factor for each channel of every recording had been determined, it then had to undergo a verification process that either confirmed or refuted the calibration factor. If all four microphones displayed a relatively similar sound pressure level, in sync with the Digital Sound Level Meter, throughout a predetermined time interval, then the researchers kept the calibration factors for that particular recording. If a single microphone deviated by a noticeable amount for a prolonged period, then the researchers had to make specific adjustments to the data and analyze it further, and occasionally discard the calibration factor for that particular recording altogether and start from the beginning.
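The acceptance rule described above can be sketched as a simple per-channel check. The report does not give numeric thresholds for a "noticeable" deviation or a "prolonged" period, so the 2 dB tolerance and the 10 percent limit below are hypothetical placeholders, as is the function name `verify_calibration`.

```python
import numpy as np


def verify_calibration(channel_spl_db, meter_spl_db,
                       tolerance_db=2.0, max_fraction_out=0.1):
    """Return one pass/fail flag per channel.

    channel_spl_db: array of shape (n_channels, n_frames), calibrated levels.
    meter_spl_db:   sound level meter reading (scalar, or one value per frame).
    A channel fails if it deviates from the meter by more than tolerance_db
    in more than max_fraction_out of the frames (both limits hypothetical).
    """
    channel_spl_db = np.asarray(channel_spl_db, dtype=float)
    deviation = np.abs(channel_spl_db - np.asarray(meter_spl_db, dtype=float))
    fraction_out = np.mean(deviation > tolerance_db, axis=1)
    return [bool(f <= max_fraction_out) for f in fraction_out]


# Illustrative values only: three channels track a 71.1 dBA meter reading,
# while the fourth reads 5 dB high throughout.
meter = 71.1
offsets = np.array([0.3, -0.5, 1.0, 5.0])
channels = meter + offsets[:, None] * np.ones((4, 10))

print(verify_calibration(channels, meter))
```

A failed channel would prompt the adjustment or re-derivation of that recording's calibration factor, as described above.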

Penn Plaza Pavilion Calibration Verification

Appendix A. Calibration Verification for Penn Plaza

United States Merchant Marine Academy Calibration Verification

Appendix B. Calibration Verification for United States Merchant Marine Academy

Stevens Institute of Technology Anechoic Chamber Calibration Verification

The calibration verification for the experiment conducted at Stevens Institute of Technology Anechoic Chamber is provided below. The calibration verification from the anechoic chamber differs from those at Penn Plaza Pavilion and the United States Merchant Marine Academy. Due to the compact confinement of the anechoic chamber, the distance between the microphones has a substantial impact in determining the sound pressure level, whereas in an open environment the distance between the sound events and the microphones is so great that the distance between the microphones does not have a substantial impact. This is the reason there is a slight deviation in sound pressure level between the two microphones at the anechoic chamber but not at Penn Plaza Pavilion or the United States Merchant Marine Academy.

Appendix C. Calibration Verification for SIT Anechoic Chamber

Calibration Factors

Appendix D. Calibration Factor USMMA

Sequence                                        Time Start   Calibration (mic 1)   Calibration (mic 2)
160620_006 boat moving away down wind.WAV       9:42 AM      3.85E-07              4.80E-07
160620_010 boat moving away up wind.WAV         10:19 AM     5.05E-07              5.90E-07
160620_009 boat moving closer down wind.WAV     10:07 AM     5.15E-07              4.95E-07

Appendix E. Calibration Factor Penn Plaza

Record Time   Meter Reading (dBA)   Pressure Calibration Factor (one entry per channel)
7:33          71.1                  2.05E-05   7.75E-06   2.85E-05   1.09E-05
8:00          72.6                  3.17E-05   9.40E-06   3.74E-05   2.30E-05
8:30          71.1                  8.20E-06   7.70E-06   1.04E-05   4.60E-06
9:00          72                    5.90E-06   6.45E-06   8.80E-06   2.80E-06
9:30          70.2                  7.20E-06   1.01E-05   4.20E-06   0.00E+00
10:00         72.3                  1.90E-05   5.55E-06   7.50E-06   3.02E-06
10:30         71.8                  2.85E-05   8.90E-06   1.20E-06   6.50E-06
11:00         71.2                  5.10E-05   1.04E-05   1.05E-05   6.00E-06
11:30         69.1                  4.60E-05   7.40E-06   7.70E-06   3.70E-06
12:30         70                    3.90E-05   7.90E-06   9.60E-06   4.50E-06
13:00         69.9                  4.10E-05   7.50E-06   9.20E-06   4.10E-06

Appendix F. Calibration Factor Anechoic Chamber SEQUENCE 160706_001.WAV

Time Start


SEQUENCE 160706_005.WAV Time Start


Data Organization

Appendix H. USMMA Data Organization

SEQUENCE 160620_001 Noise.WAV

Time Start Temperature (F)

Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:02 AM 72 69 5 64

Calibration 1 1 0 0 0 0 0 0

Distance (m) 0 0 0 0 0 0 0 0

Events time (sec)

Duration (sec) Description

1 14 talking

16 1 knocking

17 6 birds chirping

18 48 airplane

28 4 talking

58 8 birds chirping

SEQUENCE 160620_002 Noise.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:04 AM 72 69 5 60.3


Distance (m) 0 0 0 0 0 0 0 0

Events time (sec) Duration (sec) Description

0 20 boat

19 47 airplane

47 3 birds chirping

SEQUENCE 160620_003 Green Barge.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:07 AM 72 69 4 65.7


Distance (m) 0 0 0 0 0 0 0 0

Events time (sec)


0 52 helicopter

27 63 airplane

77 3 birds chirping

SEQUENCE 160620_004 Noise.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:34 AM 74 66 5 61


Distance (m) 0 0 0 0 0 0 0 0

Events time (sec)


0 1 talking

14 1 chair tag

18 3 birds chirping

27 7 birds chirping

36 7 chair tag

SEQUENCE 160620_005 boat moving away down wind.WAV

Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

9:37 AM 75 62 6 61.6

Calibration 3.85E-07 4.80E-07 0 0 0 0 0 0

Distance (m) 36.39 45.63 65.72 92.23 121.37 147.69 169.33 197.84

Events time (sec)


0 1 talking

7 1 talking

16 41 airplane

26 1 knocking

31 22 birds chirping

44 1 knocking


62 1 knocking

72 1 knocking


77 20 airplane

78 1 knocking

1 1 plane

1 0.2 LRMO



Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:42 AM 75 62 6 74.7

Calibration 3.85E-07 4.80E-07 0 0 0 0 0 0

Distance (m) 53 93 123 141 159 173 187 221

Events time (sec)


0 1 talking

0 26 airplane

31 1 knocking

38 1 knocking

42 1 knocking

45 1 knocking

65 13 airplane



Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:44 AM 75 62 6 65.6

Calibration 3.85E-07 4.80E-07 0 0 0 0 0 0

Distance (m) 0 0 0 0 0 0 0 0

Events time (sec)


0 1 talking

5 1 knocking

8 19 helicopter

11 1 knocking

26 1 talking

SEQUENCE 160620_008 boat standing still out there.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

9:56 AM 76 61 6 51.7


Distance (m) 925.8 925.8 925.8 925.8 925.8 925.8 925.8 925.8

Events time (sec)


0 34 airplane

2 1 birds chirping

14 1 birds chirping

18 1 knocking

22 1 knocking

24 2 birds chirping

30 1 birds chirping

31 1 knocking


46 2 LRMO

54 6 airplane

SEQUENCE 160620_009 boat moving closer down wind.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

10:07 AM 77 60 7 67.5


Distance (m) 1107.8 1000.43 870.96 725.38 556.63 392.76 230.16 80

Events time (sec)


9 1 birds chirping

15 1 knocking

21 2 birds chirping

28 4 birds chirping

30 62 airplane

38 2 birds chirping

57 3 birds chirping

63 3 birds chirping

88 1 birds chirping


126 49 airplane



149 1 knocking

182 56 airplane




258 4 knocking


279 46 airplane

337 55 airplane

411 1 knocking

418 48 airplane

SEQUENCE 160620_010 boat moving away up wind.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

10:19 AM 77 59 7 68.6

Calibration 5.05E-07 5.90E-07 0 0 0 0 0 0

Distance (m) 99 142 198 256 266 363 533 662

Events time (sec)


18 1 birds chirping

25 35 airplane

89 38 airplane


140 1 LRMO

149 1 knocking

160 1 knocking


208 24 airplane

212 1 knocking

216 1 knocking

232 1 knocking

264 55 airplane




353 1 knocking


370 68 airplane

434 1 knocking

SEQUENCE 160620_011 boat moving closer up wind.WAV


Humidity (%)

Wind speed (mph)

Noise Level (dBA)

10:28 AM 77 59 7 63.7


Distance (m) 871.21 935.14 820.39 668.03 560.5 433.12 311.16 189.69

Events time (sec)


11 1 birds chirping

40 51 airplane

44 1 knocking

47 1 knocking

51 6 knocking

70 1 birds chirping

74 1 knocking

80 35 knocking

102 49 airplane


158 1 knocking

165 60 helicopter

175 1 knocking

180 1 LRMO

Appendix I. Penn Plaza Data Organization

SEQUENCE Penn Plaza 01 0733.WAV

Time Start Temperature (F) Humidity (%)

Wind speed (mph)

Noise Level (dBA)

7:33 AM 70 75 10 71.1

Calibration 2.50E-05 9.00E-06 2.00E-05 1.25E-05

Distance (m) 0 0 0 0

Events time (sec) Duration (sec) Description

60 60 car horns

300 60 talking

420 60 sirens

540 60 walking

540 60 car horns



Wind speed (mph)

Noise Level (dBA)

8:00 AM 72 75 11 72.6




60 600 construction

60 60 walking

60 60 whistle

120 60 Bus

120 60 walking

120 60 car horns

180 18 whistle

180 60 Car brake screeching

180 60 car horns

180 60 coughing

240 60 Bus

240 12 whistle

300 60 walking

300 60 whistle

300 60 walking

360 60 sirens

420 60 whistle

420 60 Bus

480 60 car horns

480 60 bus

480 120 walking



Wind speed (mph)

Noise Level (dBA)

8:30 AM 72 76 11 71.7




60 60 sirens

60 60 cement washer

60 60 whistle

60 60 talking

120 60 cement washer

120 60 whistle

180 7.2 coughing

180 60 walking

240 60 walking

300 60 whistle

300 18 coughing

360 60 walking

360 12 coughing

420 60 walking

420 12 radio

420 60 coughing


540 60 whistle



Wind speed (mph)

Noise Level (dBA)

9:00 AM 72 76 11 72




60 60 construction

60 60 cement washer

60 60 talking

60 60 cement washer

120 60 whistle


120 60 car horns

120 60 whistle

180 60 car horns


240 60 car horns

300 60 whistle

300 60 car horns

360 60 walking


420 60 car horns


540 60 car horns



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

9:30 AM 72 76 10 70.2

Calibration 1 6.40E-06 9.00E-06 3.30E-06



60 60 car horns

60 60 walking

60 18 construction


180 60 whistle

180 60 talking

180 60 walking

240 120 construction

240 60 whistle

300 60 radio


300 60 truck backing up

300 60 car horns

360 60 whistle

420 120 talking

480 60 loud truck drove past

480 60 car horns

540 60 whistle



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:00 AM 71 75 10 72.3

Calibration 2.80E-05 4.75E-06 8.25E-06 1



60 60 car horns

60 60 talking

60 120 construction

120 60 talking

120 60 car horns

180 60 talking

180 60 bus

240 60 car horns

240 60 talking

300 60 car horns

300 60 bus horn

360 60 whistle

420 60 walking

420 60 talking


420 60 walking

480 60 car horns

540 60 talking



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:30 AM 70 77 7 71.8


Distance (m) 0 0 0 0


60 60 construction

60 60 whistle

60 60 walking

60 30 sirens

120 60 walking

120 60 talking

180 60 whistle

180 60 car horns

180 60 whistle

180 60 car horns

240 60 whistle

240 30 construction

300 60 car horns

300 60 bus

300 30 construction

300 60 sirens

360 60 whistle


360 60 car horns



480 12 talking

540 60 car horns

540 60 whistle

540 60 car horns



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:00 AM 72 80 4 71.2




60 60 construction

60 60 talking

60 60 car horns

60 60 walking

60 60 whistle

120 60 construction

120 60 radio

120 60 car horns

180 60 talking

180 60 car horns

240 60 whistle

240 60 construction

300 60 car horns

360 60 bus

360 60 sirens

420 60 fire truck

480 60 construction


540 60 talking

540 60 child screams

540 60 car horns



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:30 AM 72 87 6 69.1




60 60 car horns

60 60 talking

60 60 car horns

60 60 walking

60 60 talking

120 60 car horns

180 60 talking

300 60 car horns

300 60 walking

360 120 car horns

480 60 sirens

480 60 construction

540 60 bus horn

540 60 talking



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

12:30 PM 72 90 5 70





86 60 talking

159 60 whistle

168 60 car horns

183 30 talking

206 60 3 Rolling luggage, 4m

221 90 talking

292 60 whistle

401 60 talking

505 60 walking

539 60 talking



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

1:00 PM 73 90 4 69.9




60 60 talking

60 18 car horns

60 60 talking

180 30 whistle

180 120 talking

300 60 construction

360 60 coughing

420 120 talking


Appendix J. Anechoic Chamber Data Organization

SEQUENCE 160706_001.WAV

Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:12 AM 86 83.2 0 36.1

Calibration 2.87E-05 1.35E-05 0 0 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


5 56 siren

SEQUENCE 160706_002.WAV


Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:23 AM 86 83.2 0 35

Calibration 3.45E-05 2.90E-05 0 0 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1




Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:29 AM 86 83.2 0 41.2

Calibration 1.72E-05 1.21E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


4 1 gun shot

8 1 misfire

12 1 gun shot

16 1 gun shot

21 1 gun shot

25 1 misfire

29 1 misfire


33 1 misfire

37 1 misfire

41 1 misfire

44 1 misfire

48 1 misfire



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:34 AM 86 83.2 0 39.8

Calibration 3.05E-05 1.28E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


4 1 gun shot

9 1 misfire

15 1 gun shot

20 1 gun shot

26 1 gun shot

32 1 misfire

38 1 gun shot

43 1 misfire



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:52 AM 86 83.2 0 35

Calibration 3.45E-05 9.80E-06 1 1 0 0 0 0

Distance (m) 4.27 5.19 1 1 1 1 1 1


7 1 gun shot

21 1 gun shot

26 1 misfire


31 1 misfire

36 1 gun shot

41 1 misfire

46 1 gun shot

51 1 misfire



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:54 AM 86 83.2 0 42.2

Calibration 2.43E-05 2.20E-05 1 1 0 0 0 0

Distance (m) 4.27 5.19 1 1 1 1 1 1


4 1 gun shot

9 1 gun shot

15 1 gun shot

20 1 misfire

25 1 misfire

30 1 gun shot

35 1 misfire

40 1 gun shot



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

10:58 AM 86 83.2 0 39

Calibration 4.51E-05 1.64E-05 1 1 0 0 0 0

Distance (m) 4.27 5.19 1 1 1 1 1 1


5 1 gun shot

10 1 gun shot

15 1 gun shot


21 1 gun shot

26 1 gun shot

32 1 gun shot

37 1 gun shot

42 1 gun shot



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:02 AM 86 83.2 0 35.9

Calibration 2.98E-05 1.08E-05 1 1 0 0 0 0

Distance (m) 4.27 5.19 1 1 1 1 1 1


5 1 gun shot

13 1 gun shot

20 1 gun shot

26 1 gun shot

33 1 gun shot

39 1 gun shot

44 1 gun shot

50 1 gun shot



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:15 AM 86 83.2 0 37.2

Calibration 3.85E-05 1.65E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


4 1 whistle

8 2 whistle

14 2 whistle


19 1 whistle



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:16 AM 86 83.2 0 36.8

Calibration 4.00E-05 1.32E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


4 1 whistle

8 1 whistle

12 1 whistle



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:17 AM 86 83.2 0 36.5

Calibration 3.65E-05 1.14E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


3 1 whistle

7 1.5 whistle

11 2 whistle

16 2 whistle

20 2 whistle



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:24 AM 86 83.2 0 36


Distance (m) 1 1.91 1 1 1 1 1 1



5 1 screaming



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:28 AM 86 83.2 0 37.2

Calibration 4.05E-05 1.55E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


5 1 screaming



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:29 AM 86 83.2 0 36.3

Calibration 3.80E-05 1.33E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


4 1 screaming



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:33 AM 86 83.2 0 36

Calibration 3.95E-05 1.13E-05 1 1 0 0 0 0

Distance (m) 1 1.91 1 1 1 1 1 1


2 1 screaming

9 1 screaming

15 1 screaming



Time Start Temperature (F) Humidity (%) Wind speed (mph) Noise Level (dBA)

11:40 AM 86 83.2 0 36.1


Distance (m) 1 1.91 1 1 1 1 1 1


3 18 blender liquefy

26 18 blender milkshake

47 18 blender smoothie

69 22 blender pulsing

Additional Images:

Appendix K. Large Rusted Metal Object (LRMO) at United States Merchant Marine Academy

Appendix L. Environmental Acoustic Recording 3: helicopter acoustic signature, assuming recording 3 at Hoboken Pier began at 9:17:00 AM.


Appendix M. Environmental Acoustic Recording 3: helicopter acoustic signature, assuming recording 3 at Hoboken Pier began at 9:17:59 AM.
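The two assumed start times for recording 3 (9:17:00 AM and 9:17:59 AM) bracket a 59-second uncertainty in when any event occurred on the clock. The short Python sketch below shows one way to convert an event's offset within a recording into the corresponding clock-time window. The function names, the placeholder calendar date, and the 120-second offset are illustrative assumptions and are not taken from the recordings.

```python
from datetime import datetime, timedelta

# Candidate start times for Hoboken Pier recording 3, as stated in the
# captions of Appendices L and M. The calendar date is a placeholder
# (assumption); only the time of day is used.
PLACEHOLDER_DATE = (2016, 1, 1)
START_EARLY = datetime(*PLACEHOLDER_DATE, 9, 17, 0)
START_LATE = datetime(*PLACEHOLDER_DATE, 9, 17, 59)


def clock_time(start, offset_sec):
    """Return the clock time (HH:MM:SS) of an event offset_sec into a recording."""
    return (start + timedelta(seconds=offset_sec)).strftime("%H:%M:%S")


def clock_window(offset_sec):
    """Return the (earliest, latest) clock time under the two assumed starts."""
    return clock_time(START_EARLY, offset_sec), clock_time(START_LATE, offset_sec)


# Hypothetical event 120 s into the recording (illustrative offset only).
print(clock_window(120))
```

Under these assumptions, an event 120 seconds into the recording falls somewhere between 09:19:00 and 09:19:59, so any comparison against an external timestamp should allow for that one-minute window.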


REFERENCE

[1] Potamitis, I., & Ganchev, T. (2008). Generalized recognition of sound events: Approaches

and applications. In Multimedia Services in Intelligent Environments (pp. 41-79). Springer Berlin

Heidelberg.

[2] Robust speech recognition and understanding. I-Tech Education and Publishing, 2007. [3] Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE transactions on acoustics, speech, and signal processing,28(4), 357-366. [4] Picone, J. W. (1993). Signal modeling techniques in speech recognition.Proceedings of the IEEE, 81(9), 1215-1247. [5] Butko, T. (2011). Feature Selection for Multimodal: Acoustic Event Detection. Universitat Politècnica de Catalunya. [6] O’Shaughnessy, D. (2008). Invited paper: Automatic speech recognition: History, methods and challenges. Pattern Recognition, 41(10), 2965-2979. [7] Cowling, M., & Sitte, R. (2002). Analysis of speech recognition techniques for use in a non-speech sound recognition system. [8] Zue, V. (1985). Notes on spectrogram reading. Mass. Inst. Tech. Course, 6. [9] Bolt, R. H., Cooper, F. S., David Jr, E. E., Denes, P. B., Pickett, J. M., & Stevens, K. N. (1970). Speaker identification by speech spectrograms: A scientists' view of its reliability for legal purposes. The Journal of the Acoustical Society of America, 47(2B), 597-612. [10] Chu, S., Narayanan, S., & Kuo, C. C. J. (2009). Environmental sound recognition with time–frequency audio features. IEEE Transactions on Audio, Speech, and Language Processing, 17(6), 1142-1158. [11] Ghoraani, B., & Krishnan, S. (2011). Time–frequency matrix feature extraction and classification of environmental audio signals. IEEE transactions on audio, speech, and language processing, 19(7), 2197-2209. [12] Uchida, S., & Sakoe, H. (2005). A survey of elastic matching techniques for handwritten character recognition. IEICE transactions on information and systems, 88(8), 1781-1790. [13] Ashbrook, A., & Thacker, N. A. (1998). Tutorial: Algorithms For 2-Dimensional Object Recognition. Imaging Science and Biomedical Engineering Division, Medical School, University of Manchester, Manchester. 
[14] Mundy, J. L. (2006). Object recognition in the geometric era: A retrospective. In Toward category-level object recognition (pp. 3-28). Springer Berlin Heidelberg. [15] http://www.visionbib.com/bibliography/contents.html [16] Sharma, N. S., Yakubovskiy, A. M., & Zimmerman, M. J. (2013, November). SCUBA diver detection and classification in active and passive sonars—A unified approach. In Technologies for Homeland Security (HST), 2013 IEEE International Conference on (pp. 189-194). IEEE. [17] Yakubovskiy, A., Salloum, H., Sutin, A., Sedunov, A., Sedunov, N., & Masters, D. (2015, October). Feature extraction for acoustic classification of small aircraft. In Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015 IEEE Workshop on (pp. 1-5). IEEE. [18] Bunin, B., Sutin, A., Kamberov, G., Roh, H. S., Luczynski, B., & Burlick, M. (2008, April). Fusion of acoustic measurements with video surveillance for estuarine threat detection. In SPIE

http://www.visionbib.com/bibliography/contents.html

http://www.visionbib.com/bibliography/contents.html

81

Defense and Security Symposium (pp. 694514-694514). International Society for Optics and Photonics.

[19] Dennis, J., Tran, H. D., & Chng, E. S. (2013). Image feature representation of the subband power

distribution for robust sound event classification. IEEE Transactions on Audio, Speech, and Language

Processing, 21(2), 367-377.

[20] Gonzales, R. C., & Woods, R. E. Digital Image Processing. 2002. New Jersey: Prentice Hall, 6, 681.

[21] Chachada, S., & Kuo, C. C. J. (2014). Environmental sound recognition: A survey. APSIPA

Transactions on Signal and Information Processing, 3, e14.

82

Acknowledgement

This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2014-ST-061-ML0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.