Talis Vertriest
Urban Data Mining applied to sound sensor networks
Academic year 2015-2016
Faculty of Engineering and Architecture
Department of Information Technology
Chair: Prof. dr. ir. Daniël De Zutter
Master's dissertation submitted in order to obtain the academic degree of Master of Science in Industrial Engineering and Operations Research
Supervisor: Prof. dr. ir. Dick Botteldooren
Counsellor: Prof. dr. ir. Bert De Coensel
"The author gives permission to make this master dissertation available for
consultation and to copy parts of this master dissertation for personal use. In all
cases of other use, the copyright terms have to be respected, in particular with regard
to the obligation to state explicitly the source when quoting results from this master
dissertation."
(June 1, 2016)
Preface
While my name is on the front cover of this thesis, I am by no means its sole
contributor. There are a number of people behind this project who deserve to be both
acknowledged and thanked: committed supervisors, generous friends and a warm and
supportive family.
I would like to thank my thesis committee members, Professor Dick Botteldooren and
Professor Bert De Coensel for their guidance and unrelenting support through this
process. Both have routinely shared their passion and knowledge, which are to the
great benefit of this thesis.
I thank all my friends, and in particular my friend and postdoctoral scientist in
neurosciences Dr. Ken Veys, for his valuable contribution and advice. This research
looks very different because of his expertise, technical help, and the additional
computing power of his MacBook Pro.
I would like to express my deepest appreciation for my parents who did everything in
their power and beyond to fire fight my worries, concerns and anxieties, and have
worked to instill great confidence in both myself and my work.
Most importantly of all, I feel blessed that I was able to accomplish this thesis, and I
express deep gratitude towards Mother Nature and all the people that contributed to
who I am today. In the same vein, I would like to extend great thanks to the
University of Ghent, to all professors and to everybody involved in the education of
our society.
Abstract
Title: URBAN DATA MINING APPLIED TO SOUND SENSOR NETWORKS
Name: TALIS VERTRIEST
Supervisor: PROF. DR. IR. DICK BOTTELDOOREN
Counsellor: PROF. DR. IR. BERT DE COENSEL
Degree: MASTER OF SCIENCE
Major Field: INDUSTRIAL ENGINEERING AND OPERATIONS
RESEARCH
Department: INFORMATION TECHNOLOGY
Faculty: ENGINEERING AND ARCHITECTURE
University: UNIVERSITY OF GHENT
Academic year: 2015-2016
Almost every activity or event produces sound patterns, making sound a valuable
source of information in the analysis of environments. As one of our senses, sound
directly contributes to the human perception of places. From sound information alone,
people are able to distinguish danger from safe situations and unusual events from
normal activity. This thesis develops a program that attempts to detect those (point)
anomalies that people would define as abnormal, as well as contextual anomalies,
which are less obvious to human perception.
Raw audio signals are not suitable as a direct input to a classifier. As a consequence,
the data is transformed into a representation that lends itself to successful learning, a
process known as feature extraction. This thesis focuses on unsupervised learning
because of its broad applicability. More specifically, it applies data exploration rather
than field knowledge for feature extraction. Spectrograms are treated as series of
meaningless numbers instead of as representations of audio. Gaussian Mixture Models
describe the data per minute, and their parameters define the features.
Whether or not supplemented with time features, those Gaussian features are the key
ingredient of the feature vectors. For classification, the feature vectors are clustered
again, using different techniques and dimensionalities depending on the type of
anomaly that is searched for (point, contextual or conceptual). Conceptual anomalies
are beyond the scope of this thesis, but for point anomalies as well as contextual
anomalies, Gaussian Mixture Models outperform intelligent K-means and form the
basic clustering technique for this research.
In parallel, a better-known, rather classical method is applied, based on spectral
features. Both techniques are compared on their computational intensity and
results, revealing the qualities of the newly designed technique based on data
exploration for feature extraction and unsupervised learning for classification.
Key words: Sound sensors - Anomaly detection - Data exploration - Unsupervised
Learning - Gaussian Mixture Models
Extended Abstract
Urban Data Mining applied to sound sensor networks
Talis Vertriest
Supervisor(s): Dick Botteldooren, Bert De Coensel
Abstract This thesis develops a system that detects unusual
situations in a timely manner, inspired by the corresponding
ability that humans exhibit quite effortlessly in their everyday
life. It uses audio information from sound sensors as its sole
input. The technique used for feature extraction is data
exploration. More specifically, a Gaussian Mixture Model
describes each one-minute timeframe without overlap, thereby
capturing temporal as well as spectral relations in one single
model. The parameters of these GMMs form the key ingredients
of the feature vectors. These can be augmented with additional
time features, depending on the type of anomaly that is sought.
Three types of anomalies are distinguished: unusual events,
unusual minutes and contextual anomalies, with the focus on the
first type. Classification is based on unsupervised machine
learning: a GMM classifier clusters the feature vectors, which,
once suspected to be anomalous, are subject to human
supervision for labelling. The characteristics of the labelled
samples are in turn learned by the system, reducing human
supervision over time and converging to a zero total error rate.
A linear program defines the optimal mutual error ratio.
Keywords Sound sensors, Data Exploration, Gaussian
Mixture Model, Anomaly Detection, Unsupervised Machine
Learning, Error Ratio
I. INTRODUCTION
Today, more than half of the world's population lives in
urban areas, highlighting the need to improve urban
environments. Among the human senses, hearing is second only
to vision, and it offers additional advantages in terms of
complexity and versatility, making it an excellent modality for
understanding urban and conceptual settings.
The main focus of audio classification systems is Speech
Recognition (SR), because of the many evident application
fields and its well-defined area of content. Speech
Recognition has traditionally relied on field knowledge for
feature extraction, a popular choice being the Mel Frequency
Cepstral Coefficients (MFCC).
In the past years, with the rapid growth of technology, the
awareness of the many useful applications of audio
classification grew and Environmental Sound Recognition
(ESR) gained attention. However, the hand-crafted features
that are successful for speech recognition perform poorly in
noisy environments, and the need for new techniques grew.
The advantage of sound event recognition is that only certain
events are sought, and these are usually provided as labelled
samples. The features to be extracted can thus be studied in
relation to their content. This is called semi-supervised
learning.
T. Vertriest is with the Information Technology Department, Ghent
University (UGent), Gent, Belgium. E-mail: [email protected] .
Environmental anomaly detection is a much broader field,
and its scope goes beyond that of the recognition-based
research discussed above. Speech recognition and sound event
recognition only search for certain real-time events and reject
all other unclassified data as useless, while in anomaly
detection the whole dataset is of interest, whether it includes a
certain event or not. The reason for this is the input data.
Urban soundscapes, in contrast to speech or specific audio
events, capture an almost unlimited variety of sounds with a
very high level of noise. Furthermore, many sources interfere
simultaneously. The audio input can thus no longer be
compared with a taxonomy of possible contents; in other
words, no labelled data is available anymore. Every signal is
part of the system and helps define whether new incoming
data are normal or abnormal, accumulating to Big Data.
II. APPROACHES FOR ANOMALY DETECTION IN BIG SOUND
DATA
A. Anomaly Types
Three types of anomalies are distinguished, based on what a
human supervisor would identify as anomalous. Unusual events
are single unusual events, occupying only a certain frequency
range in a certain time frame, e.g. a gunshot or a thunderstorm.
Unusual timespans (one minute) contain a strange
combination of possibly normal events, e.g. someone playing
music during a heavy storm. Contextual anomalies describe an
unusual moment rather than unusual content. Children playing
on the street, for example, occur frequently, but not in the
middle of the night. The focus of this research is on unusual
events, although the search engines for the other anomaly
types are also set up.
B. Methodology
With the focus of audio classification systems on
recognition, where a form of labelled data is available, the
traditional path for feature extraction is through field
knowledge, in the form of band filters and spectral and
temporal features. For classification, the provision of labelled
data allows supervised machine learning, or semi-supervised
machine learning when labelled data is replaced by a
well-informed supervisor.
For this research, no labelled data is available and, because
the audio data is irreversibly transformed into spectrograms by
a Discrete Fourier Transform (DFT), direct and accurate
supervision is inconceivable. Feature extraction is therefore
based on data exploration, and the classification process
happens through unsupervised machine learning. In parallel,
a classical approach based on spectral features is run for
comparison and to obtain better insight for decision making.
Figure 1: Classification Methodology
C. Related work
Existing research depends either on the extraction of
classically known low-level features, or labelled data is
available so that features can be derived by data exploration.
The major part of research even combines field knowledge for
feature extraction with supervised learning for classification. It
lies in human nature to apply knowledge rather than to dive
into the unknown, and furthermore, this approach performs
quite accurately for the fields that are most attractive and have
thus received most attention: speech and music. Ntalampiras et
al. [1], for example, rely on MFCCs and have labelled data
available. Their evident conclusion is that the approach only
works accurately for samples involving speech. Radhakrishnan
et al. [2] also rely on MFCCs for feature extraction and have
samples of 'normal' audio in their search for anomalies. This
research already belongs to the semi-supervised category
because it is unknown what is being looked for. Data
exploration for feature extraction, as well as unsupervised
classification, is still in its infancy. A combination of both
seems nonexistent, which is why this thesis is unique: it
combines data exploration with unsupervised classification.
The reason is very simple: standard low-level features have
been proven to be ineffective for environmental audio signals,
there is little to no additional information about what types of
features could be significant, there is no labelled data available
and there is no possibility to create labels through human
supervision. Interesting work for feature extraction is the idea
of scattering by Salamon and Bello, applied in two of their
papers [3] [4]. Although they start from the Mel spectrum, it is
an interesting starting point for discovering features. Cai, Lu et
al. [5] also scatter the feature vectors and apply statistical
parameters. For this thesis, instead of scattering low-level
features, the spectrograms could be scattered and described by
a more complex statistical model instead of basic parameters.
Classification inspiration comes from the same papers [4] [5]
[6] for their use of K-means, but especially the use of GMMs
is attractive for both feature extraction and classification,
inspired by Ntalampiras et al. [1].
III. PROPOSED MODEL: GMM
A. Feature Selection
By scattering different spectrograms in one single graph,
relations between data points in frequency as well as in time
become visible. A Gaussian Mixture Model of five components
describes 480 subsequent spectrograms (one minute), capturing
spectral and temporal features in one model. It exploits the
known fact that data points close in time or frequency do not
differ much from each other.
Figure 2 & 3: GMM per 480 spectrograms
Five parameters describe each of the five Gaussian
components: two are allocated to the mean and three to the
covariance. Each minute is thus described by a feature vector
of 25 values, a data reduction by a factor of roughly 600.
B. Classification
The classification technique depends on the type of
anomaly. First, unusual events are in fact unusual Gaussian
components. When plotting the mean values of all Gaussian
components, their histogram suggests a GMM classifier.
Different dimensionalities have been tested: 2D, clustering only
the mean values; 5D, clustering the mean values and
covariances; and 5D standardized, whereby means and
covariances are rescaled. The non-rescaled 5D method performs
best. The feature vectors are thus clustered by a 5D GMM
classifier, and Gaussian components with a low probability
density under the built model are flagged as anomalous.
Figure 4: Histogram of mean values of all Gaussian components
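The low-density alerting rule can be sketched in a simplified 2D form; the mixture parameters, candidate points and threshold below are invented for illustration (the thesis uses a 5D model fitted to real data):

```python
import math

def gauss2d_pdf(x, mean, cov):
    """Density of a 2D Gaussian with full covariance cov = [[a, b], [b, c]]."""
    a, b = cov[0]
    _, c = cov[1]
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis form via the explicit 2x2 inverse
    m = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

def mixture_pdf(x, weights, means, covs):
    return sum(w * gauss2d_pdf(x, mu, cv)
               for w, mu, cv in zip(weights, means, covs))

# Hypothetical 2-component model of "normal" Gaussian-component features
weights = [0.7, 0.3]
means = [(0.0, 0.0), (4.0, 4.0)]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.2], [0.2, 1.0]]]

threshold = 1e-4  # hypothetical alerting threshold
candidates = [(0.1, -0.2), (4.2, 3.9), (9.0, -6.0)]
alerts = [p for p in candidates
          if mixture_pdf(p, weights, means, covs) < threshold]
print(alerts)  # only the far-away point is flagged as anomalous
```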
Unusual minutes are detected by a statistical approach. Each
minute is described by a feature vector of five values: the
cluster labels of its constituent Gaussian components. Two
different measures can be calculated per feature vector, the
joint probability and the joint correlation, depending on an
assumption that is not scientifically proven. Joint probability
assumes the Gaussian components, or underlying audio events,
to be independent of each other, while joint correlation assumes
them to be dependent. If one of the two values is lower than a
prescribed threshold, the minute is flagged as anomalous.
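Under the independence assumption, the joint probability of a minute can be sketched as the product of the empirical cluster frequencies of its five component labels. The label history, minute and threshold below are invented for illustration:

```python
from collections import Counter
from math import prod

# Hypothetical cluster labels of all Gaussian components seen so far
history = [1, 1, 2, 3, 1, 2, 2, 1, 4, 1, 2, 3, 1, 2, 1, 1]
freq = Counter(history)
total = len(history)

def joint_probability(minute_labels):
    """P(minute) under independence: product of per-cluster frequencies."""
    return prod(freq[c] / total for c in minute_labels)

threshold = 1e-3  # hypothetical alerting threshold
minute = [1, 1, 2, 4, 4]  # cluster labels of one minute's 5 components
p = joint_probability(minute)
if p < threshold:
    print("minute flagged as anomalous:", p)
```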
Contextual anomalies add a continuous time feature
representing the hour of the day. The feature vectors per
minute consist of 12 values: the 2D cluster-centroid means of
the five constituent Gaussian components and two clock
coordinates for the time feature. Clustering is done with a 12D
GMM.
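The two clock coordinates can be sketched as a point on the unit circle of a 24-hour clock, so that 23:00 and 01:00 end up close together; this is a standard cyclic encoding, and the exact convention used in the thesis may differ:

```python
import math

def clock_coordinates(hour):
    """Map the hour of day (0-24, possibly fractional) onto the unit circle."""
    angle = 2 * math.pi * hour / 24.0
    return math.cos(angle), math.sin(angle)

# 23:00 and 01:00 are nearby on the circle, unlike their raw hour values
x1, y1 = clock_coordinates(23)
x2, y2 = clock_coordinates(1)
dist = math.hypot(x1 - x2, y1 - y2)
print(dist)  # a small distance, despite |23 - 1| = 22
```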
C. Anomaly Threshold Definition
In order to define one or more thresholds for anomaly
assignment, one must consider the types of errors occurring,
their impact and interaction. A linear program defines the
optimal threshold:
min  FN · H · C_fn + FP · C_fp − TP · R_tp

or, as functions of the threshold t:

min_t  f_FN(t) · H · C_fn + f_FP(t) · C_fp − f_TP(t) · R_tp

where:
FN: number of false negatives
FP: number of false positives
TP: number of true positives
H: factor for moral damage to human beings
C_fn: cost per false negative
C_fp: cost per false positive
R_tp: revenue per true positive
t: threshold
In order to solve this LP, the error counts as functions of the
threshold, as well as the precise cost per error, must be known.
As this is never the case in reality, different thresholds must be
tried in order to learn these parameters.
The equation above defines the optimal threshold for the
optimal ratio of errors, in a steady state. However, it does not
reduce the total error rate. Therefore, machine learning is
applied after the supervision of alerted anomalies by an
authorized person. Assuming that the opinion of the supervisor
is always correct, the alerted anomalies are grouped into true
positives and false positives. The characteristics of the false
positives are learned by another GMM classifier that is applied
to newly incoming alerted anomalies, converging to a zero
total error rate.
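Since the error functions and costs are unknown in practice, the objective can only be evaluated empirically over a grid of thresholds. A minimal sketch follows; all error counts and cost parameters are invented, and the true-positive revenue is taken with a negative sign since it offsets the costs:

```python
# Hypothetical error counts per threshold t: a looser threshold yields
# more alerts (more TP and FP, fewer FN).
scenarios = {          # t: (FN, FP, TP)
    0.001: (40, 10, 60),
    0.010: (20, 40, 80),
    0.100: (5, 200, 95),
}
H, C_fn, C_fp, R_tp = 10.0, 5.0, 1.0, 2.0  # invented cost parameters

def objective(fn, fp, tp):
    # Misclassification cost minus the revenue of true detections
    return fn * H * C_fn + fp * C_fp - tp * R_tp

best_t = min(scenarios, key=lambda t: objective(*scenarios[t]))
print(best_t)  # the threshold with the lowest total cost
```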
IV. CLASSICAL APPROACH: SPECTRAL FEATURES
A. Feature Extraction
Each spectrogram is described by nine low-level spectral
features: spectral energy, spectral centroid, spectral spread,
spectral roll-off point, spectral entropy, spectral kurtosis (or
flatness), spectral skewness, spectral slope and noisiness.
Downsampling by a factor of 1000 was necessary because of
the excessive data size and computational demands.
After standardizing the data, Principal Component Analysis
(PCA) decorrelates and recombines these features into pseudo-
independent features. Kaiser's stopping rule, also called the
eigenvalue-one criterion, is applied. With an extra margin of
one, three remaining features are considered optimal.
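As an illustration of such a low-level feature, the spectral centroid of a magnitude spectrum is its amplitude-weighted mean frequency. The bin frequencies and amplitudes below are invented:

```python
def spectral_centroid(freqs, magnitudes):
    """Amplitude-weighted mean frequency of a magnitude spectrum."""
    total = sum(magnitudes)
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total

# Invented 4-bin spectrum: most energy in the low bins
freqs = [125.0, 250.0, 500.0, 1000.0]
mags = [0.8, 0.6, 0.3, 0.1]
sc = spectral_centroid(freqs, mags)
print(sc)  # centroid pulled toward the low frequencies
```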
B. Classification
A 3D GMM classifier is applied to the feature vectors,
defining seven clusters. For a valuable comparison with the
newly proposed model, the feature vectors are divided into
timeframes of 480 spectrograms without overlap. Per minute,
the number of anomalous feature vectors is counted, and the
minutes are ordered by descending anomaly count.
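The per-minute ranking step can be sketched as follows; the labels below are invented, with 1 marking a feature vector flagged as anomalous:

```python
# Hypothetical per-spectrogram anomaly labels (1 = anomalous), 480 per minute
labels = [0] * 1440
labels[10] = labels[500] = labels[501] = labels[502] = 1  # invented anomalies

window = 480  # spectrograms per minute, no overlap
counts = [sum(labels[i:i + window]) for i in range(0, len(labels), window)]
ranking = sorted(range(len(counts)), key=lambda m: counts[m], reverse=True)
print(ranking)  # minutes ordered by descending anomaly count
```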
V. RESULTS NEW APPROACH
Unusual events cannot be visually divided into categories;
they rather appear to be misfits with respect to the number of
Gaussian components applied. All of these anomalies are
reprocessed and assigned their preferred number of
components, using the Akaike Information Criterion. The
newly created feature vectors are classified with the same
threshold as before. Roughly one sixth of the anomalies is still
considered anomalous, but after supervision, the newly
assigned Gaussian components turn out not to be representative
of the underlying anomalous scatter points, for which reason
the reprocessing is abandoned. The supervision of the
anomalies defined by five Gaussian components results in 52%
true positives and 48% false positives, a ratio that remains
similar even amongst the least anomalous samples at that
threshold. This suggests many false negatives. The threshold
must be broadened, and supervision is of crucial importance;
no hard-coded threshold can be set. In addition, machine
learning on the false positives will help convergence to a zero
error rate.
Unusual minutes still await supervision. Only 35.7% of the
anomalies based on joint probability coincide with those
defined by unusual events, and none of those based on joint
correlation do.
Contextual anomalies are hard to examine, because a deep
knowledge of the environment is necessary, as all the alerted
anomalies visually appear to be normal.
VI. CONCLUSION AND OUTLOOK
A. Conclusion
Acoustic information is a highly valuable source of
information for environmental context awareness. One of the
main difficulties of this thesis is that the transformed data
cannot be reversed to its original audio waves, which makes
acoustic supervision impossible. Another difficulty is that the
data is of significant size, calling for computationally efficient
techniques and creative thinking. The unsupervised approach
has the advantage that all results originate directly from the
input data; no other knowledge can be mistakenly applied.
The approach of Gaussian Mixture Models does not only
allow significant data reduction, it also captures both spectral
relations and temporal relations in a single model.
Looking at the results for the unusual events, the created
model fits the data very accurately, and where it does not, a
supervisor helps classify true positives and false positives. The
latter are input for another GMM classifier that is gradually
updated and, over time, not only replaces the human supervisor
but also reduces the total error rate.
Instead of applying a hard threshold on the nomination of
anomalies, a more intuitive and morally sound technique is
applied. The rate of false positives is initially set too high, and
a human supervisor assigns each anomaly a label: 'false
positive' or 'true positive'. The false positives are stored and
their characteristics are learned by the system. This self-
enhancement, also called machine learning, gradually
decreases the rate of false positives and increases the accuracy
of the system.
Besides the significant data reduction, the speed of the
program and the advantages of unsupervised learning, another
advantage of this research is that the developed technique can
be applied to any environment. The technique will learn the
location's specific features and increase accuracy levels over
time.
B. Outlook
The duration of a thesis project allows only a certain depth
of research, so evidently there is room for improvement.
Conceptual anomalies are not addressed in this research.
The GMMs only capture small-scale temporal relations,
between one minute and one day. However, the evolution of
the environment over time is also very important and could
reveal trends, seasonality, and so on.
Another interesting topic for future work is to build
taxonomies for different types of environments. Instead of
using a huge training set every time the program is applied to a
new environment, the knowledge of similar locations could be
used to converge faster and improve the level of anomaly
accuracy.
ACKNOWLEDGEMENTS
I would like to thank my thesis committee members,
Professor Dick Botteldooren and Professor Bert De Coensel
for their guidance and unrelenting support through this
process. Both have routinely shared their passion and
knowledge, which are to the great benefit of this thesis.
REFERENCES
[1] S. Ntalampiras, I. Potamitis, N. Fakotakis. (s.a.). On acoustic surveillance of hazardous situations. University of Patras, Greece: Department of Electrical and Computer Engineering.
[2] R. Radhakrishnan, A. Divakaran, P. Smaragdis. (2005). Audio Analysis for Surveillance Applications. Cambridge: Mitsubishi Electric Research Labs.
[3] R. Radhakrishnan, A. Divakaran, P. Smaragdis. (2005). Audio Analysis for Surveillance Applications. In IEEE WASPAA'05, pp. 158-161.
[4] J. Salamon, J.P. Bello. (s.a.). Feature Learning with Deep Scattering for Urban Sound Analysis. New York University: Center for Urban Science and Progress.
[5] R. Cai, L. Lu, A. Hanjalic. (s.a.). Unsupervised Content Discovery in Composite Audio. Delft University of Technology: Department of Mediamatics; Tsinghua University: Department of Computer Science.
[6] J. Salamon, J.P. Bello. (s.a.). Unsupervised Feature Learning for Urban Sound Classification. New York University: Center for Urban Science and Progress, Music and Audio Research Laboratory.
Table of contents
Preface II
Abstract III
Extended Abstract IV
Table of contents VIII
List of figures XI
List of Matlab Graphs XII
1. Introduction 1
1.1. Motivation 2
1.2. Challenges of Environmental Anomaly Detection 5
1.2.1. Big Data 5
1.2.2. Taxonomy 6
2. Approaches for Anomaly Detection in Big Sound Data 9
2.1. Concept Introduction 9
2.1.1. Input Data types 9
2.1.2. Anomaly types 10
2.1.3. Methodology 12
2.2. Feature extraction 13
2.2.1. Field Knowledge 13
Low-level spectral features 14
Low-level harmonic features 15
Low-level perceptual features 16
Mid-level Temporal Features 17
2.2.2. Data exploration 19
2.3. Classification 19
2.3.1. Supervised 20
2.3.2. Unsupervised 22
2.4. Related work 26
2.4.1. Supervised 26
2.4.2. Unsupervised 28
2.4.3. Conclusions 30
3. Proposed Model New Approach: GMM. 32
3.1. Concept 32
3.2. Data Preparation 32
3.2.1. Missing Data 32
3.2.2. Reorganization 32
3.3. Programming Language 33
3.3.1. Efficiency 33
3.4. Feature Selection 34
3.4.1. Failed try-out 34
3.4.2. Gaussian Mixture Model per minute 36
3.5. Classification 39
3.5.1. Point anomaly: Unusual event 39
2D GMM Clustering 39
5D GMM Clustering 43
5D GMM Clustering (Standardized) 43
2D iK-means Clustering 45
5D iK-means Clustering 46
Conclusion Clustering techniques 46
3.5.2. Point anomaly: Unusual minute 47
Joint Probability 47
Joint Correlation 47
3.5.3. Contextual anomaly 48
3.5.4. Anomaly Threshold Definition 49
Possible System Outcomes 49
Human defined anomalies 49
Threshold 51
Machine Learning Threshold 53
4. Classical approach: Spectral Features 55
4.1. Feature Extraction 55
4.1.1. Spectral Features 55
4.1.2. Principal Component Analysis (PCA) 55
4.2. Classification 57
4.2.1. Point anomaly 57
Gaussian Mixture Model 58
Temporal Smoothing 58
5. Results new approach 58
5.1. Unusual events 59
5.1.1. GMM 2D 59
Anomaly types 59
Model shortcomings 63
Threshold 66
5.1.2. GMM 5D 66
Types of anomalies 67
Model shortcomings 70
Reconstruction of GMM on anomalies 72
Threshold 76
5.1.3. GMM 5D standardized 76
5.1.4. 2D vs. 5D vs. 5D standardized 77
5.2. Unusual minutes 77
5.2.1. Joint Probability 78
5.2.2. Joint Correlation 78
5.3. Contextual anomalies 79
6. Results Classical Approach 80
6.1. Point anomalies 80
7. Model Extension 81
7.1. Feature Extraction 81
7.2. Classification 81
8. Conclusion and Outlook 82
8.1. Conclusion 82
8.2. Outlook 83
Bibliography 85
A. Appendix 1
A.1. Features 1
A.1.1. Spectral Centroid (SC) 1
A.1.2. Spectral Spread (SS) 2
A.1.3. Spectral Roll-off Point (SRP) 2
A.1.4. Spectral Entropy (SE) 2
A.1.5. Spectral Kurtosis or flatness 3
A.1.6. Mel Frequency Cepstral Coefficients (MFCC) 3
A.1.7. Bark bands 5
A.1.8. Zero Crossing Rate (ZCR) 5
A.1.9. Spectral Flux (SF) 6
A.1.10. Short Time Energy (STE) 7
A.1.11. Temporal Centroid (TC) 7
A.1.12. Energy Entropy (EE) 7
A.1.13. Autocorrelation (AC) 8
A.1.14. Root Mean Square (RMS) 9
A.2. Matlab Code 10
A.2.1. Workspace_Generator 10
A.2.2. Mid-level_GMM_generator 11
A.2.3. Cluster Gaussian Components 5D 14
A.2.4. Define anomalies based on clustering 15
A.2.5. Plot minutes of anomalous Gaussian components 17
A.2.6. Cluster based on 5D iKmeans 20
A.2.7. Anomalies based on 5D iKmeans 21
A.2.8. Clustering defined by spectral features 23
A.2.9. Anomalies based on spectral feature clustering 25
A.2.10. Unusual minutes Joint Probability 27
A.2.11. Unusual minutes Joint Correlation 28
A.2.12. Plot Unusual minutes based on Joint Probability 30
A.2.13. Plot Unusual minutes based on Joint Correlation 35
A.2.14. Cluster Contextual feature vectors 40
A.2.15. Define contextual anomalies based on clustering 42
A.2.16. Plot spectrograms contextual anomalies 43
List of figures
Figure 1: Point anomaly 11
Figure 2: Contextual anomaly 12
Figure 3: Classification methodology 13
Figure 4: Audio Features 14
Figure 5: MFCC extraction process 17
Figure 6: 1-Dimensional GMM 34
Figure 7: 1-Dimensional Gaussian curve fitting 35
Figure 8: 24-hour clock 48
Figure 9. MFCC extraction process 4
List of Matlab Graphs
Matlab graph 1: GMM per spectrogram 36
Matlab graph 2: GMM on a 1 minute spectrogram 37
Matlab graph 3: GMM of one scattered minute 38
Matlab graph 4: Mean values of all Gaussian components 40
Matlab graph 5: Histogram of mean values of all Gaussian components 41
Matlab graph 6: GMM of mean values of all Gaussian components (2D) 42
Matlab graph 7: GMM of mean values of all Gaussian components (3D) 43
Matlab graph 8: 2D presentation of 5D GMM clustering 43
Matlab graph 9: 2D presentation of 5D GMM clustering (standardized) 45
Matlab graph 10: K-means defined clusters 46
Matlab graph 11: mean values of 1257 anomalies defined by 2D GMM 59
Matlab graph 12: Anomaly type 1: high values at high frequencies 60
Matlab graph 13: Anomaly type 2: High amplitude spread at high frequency 61
Matlab graph 14: High amplitudes at low frequencies 62
Matlab graph 15: Anomaly type 4: Low total amplitude variability 63
Matlab graph 16: Event switch during minute 64
Matlab graph 17: Event switch during minute 65
Matlab graph 18: Inaccurate data fitting 65
Matlab graph 19: Inaccurate data fitting 66
Matlab graph 20: Mean values of 1257 anomalies defined by 5D GMM 67
Matlab graph 21: Anomaly type 1: Microphone failure or resumption 68
Matlab graph 22: Anomaly type 2 69
Matlab graph 23: Anomaly type 2 70
Matlab graph 24: Anomaly type 2 70
Matlab graph 25: Anomaly based on 9 Gaussian components 71
Matlab graph 26: Anomaly based on 9 Gaussian components 72
Matlab graph 27: Anomaly based on 9 Gaussian components 72
Matlab graph 28: anomalous Gaussian out of 5 73
Matlab graph 29: anomalous Gaussian out of 9 74
Matlab graph 30: anomalous Gaussian out of 5 75
Matlab graph 31: anomalous Gaussian out of 9 76
Matlab graph 32: Mean values of 1257 anomalies defined by 5D standardized GMM clustering 77
Matlab graph 33: Unusual minute based on joint probability 78
Matlab graph 34: Unusual minute based on joint correlation 79
Matlab graph 35: unusual minute by their time context 80
1. Introduction
The world is urbanizing rapidly, with more than half of the population now living in
cities. Improving urban environments for the well-being of the increasing number of
urban citizens is becoming one of the most important challenges of the 21st century.
Many sources, such as microphones, cameras, gyroscopes, accelerometers, luminance
sensors, Global Positioning System (GPS) receivers, etc., are available for sensing and
capturing various types of environmental information. Auditory signals are chosen for
a number of reasons.
reasons. Firstly, among the human senses, hearing is second only to vision in
recognizing social and conceptual settings. This is due partly to the richness in
information of audio signals. Secondly, cheap but practical microphones can be
embedded in almost all types of places and devices. Thirdly, auditory-based context
recognition or classification consumes significantly fewer computing resources than
camera-based systems. In addition, unlike visual sources of information such as
camera and video, audio signals cannot be obscured by solid objects and are
multidirectional, i.e., they can be received from any direction.
This research focuses on urban audio signals, which provide information about the
context of the environment beyond that provided by speech. Nonetheless,
environmental sound research is still in its infancy and traditionally overshadowed by
the popular field of automatic speech recognition (ASR), but recent growth of big data
and urban data analysis has opened up a range of novel application areas, including
acoustic surveillance, environmental context detection and healthcare applications.
The goal of this research is to build a system that detects unusual situations in a timely
manner, such as hazardous situations, demonstrations, strikes, etc., using incoming
audio as its only input, inspired by the corresponding ability that humans exhibit quite
effortlessly in their everyday life. Furthermore, it lays the basis for understanding the
surrounding environment on a larger time scale in order to detect trends, seasonality
and the development of cities. Such a system should be characterized by accuracy and
flexibility, meaning that with slight alterations the system can work properly in
different kinds of environments.
1.1. Motivation
Anomaly detection in big sensor data is an interesting and growing field, applicable
to a broad range of sensors. Whether they monitor electrical outlets, water pipes,
telecommunications, stock exchange rates, or feed cameras, microphones or one of
many other systems, in all these areas it is important to detect when faults or
anomalies occur. They all follow the template of large amounts of data arriving very
frequently, calling for an efficient and fast approach to process this Big Data in order
to provide real-time support. The applications of anomaly detection are endless,
varying from the simple detection of faulty sensors, for example electrical and water
sensors, to more complicated detection, for example unusual behaviour of stock
exchange rates. Furthermore, organizations that supply sensing services compete in
an industry that has seen huge growth in recent years. One way these organizations
can gain market share is by providing more insightful services beyond standard
sensing. These could lead to different benefits such as energy saving, prediction of
financial portfolios, surveillance, etc. Among the different types of input data, audio
signals in particular are a very interesting source of information with many
advantages. First of all, microphones can easily be installed in all kinds of places, as
they require a minimum of space and power supply. Secondly, they are cheap and
easily obtainable. Thirdly, compared to cameras, the input data is much smaller, and
for many applications it provides a broader picture of the environment, as
microphones are multidirectional. Where cameras only collect information from one
direction at a time and can be defeated by covering them, sound sensors record the
interference of everything that happens close by, and the signals are difficult to
suppress.
Sonic records are always an interference of different events coming from different
sources. At their very basis, those sources can be divided into two groups: sound and
noise. Although both are mixtures of sound waves at different frequencies, sound
waves are considered to be ordered, while noise signals are considered to be
disordered. In other words, the mixture of sound waves can be easily separated into
individual frequencies, with some being more dominant than others, while on the
other hand, noise contains all possible frequencies with no presence of a dominant
frequency. This fundamental differentiation translates into different types of sound
classification systems:
Speech Recognition (SR) is the most developed research area, because sound signals
such as speech and harmonic instruments are easy to decompose into a limited
number of components. Furthermore, the microphone is usually pointed at a limited
number of sources at an acceptable distance, reducing the level of interference. As a
result, each spectrogram relates directly to its content, and field knowledge can be
used for feature extraction. Mel Frequency Cepstral Coefficients (MFCC) are the
dominant features used for speech recognition, often complemented with other
spectral features. Furthermore, speech consists of a limited number of words, spoken
in a limited number of languages. A dictionary of examples serves as training data
and enables supervised learning. Speech recognition is thus well suited to supervised
learning and therefore overshadows the more complicated research on 'noise' signals
such as environmental sonic signals.
Environmental Sound Recognition has gained a lot of attention in recent years with
the growing awareness of its many useful applications. Unlike speech or music
signals, environmental acoustic signals are difficult to model due to their highly
unpredictable nature. They have a much wider variety in frequency content, and thus
a broad noise-like spectrum. Furthermore, environmental signals are usually recorded
from a considerable distance. Many of these unstructured sources interfere, which
makes it challenging to select the features that best represent the data. Feature
selection for environmental audio signals is therefore a major constraint on the
accuracy of the system. Spectral features have high recognition accuracy in clean
conditions, but they perform poorly in unstructured environments such as urban
soundscapes. Therefore, depending on the environment and the knowledge that can be
extracted from the labeled data, either field knowledge or data exploration is chosen
for feature selection. For classification, sound event recognition can still make use of
supervised learning, where different samples of 'known' events serve as training data.
Different recordings of gunshots, for example, can form the basis for the detection of
gunshots in urban areas. Despite its usefulness, only those events determined in
advance can be recognised, leaving a lot of information undiscovered.
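The gunshot example can be sketched as a tiny supervised classifier. The two-dimensional feature vectors below are synthetic stand-ins, not real acoustic features, and scikit-learn's SVC is one arbitrary choice of classifier:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Invented training data: "gunshot" frames cluster in one region of
# feature space, "background" frames in another.
gunshot = rng.normal(loc=[5.0, 8.0], scale=0.5, size=(50, 2))
background = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(50, 2))

X = np.vstack([gunshot, background])
y = np.array([1] * 50 + [0] * 50)      # labels: 1 = gunshot, 0 = background

clf = SVC(kernel="rbf").fit(X, y)      # supervised training on labelled data

print(clf.predict([[4.8, 7.9]]))       # → [1]  (recognised as a gunshot)
print(clf.predict([[1.1, 2.2]]))       # → [0]  (recognised as background)
```

Any frame that belongs to neither class is still forced into one of them, which is precisely the limitation noted above: only events determined in advance can be recognised.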
Environmental Anomaly Detection defines the scope of this thesis. It is a very broad
research area whose capabilities go beyond those of speech recognition and
environmental event recognition. Note that the title does not contain the word
'Recognition'. While speech recognition and sound event recognition only search for
certain real-time events, rejecting all other unclassified data as useless, they do not
cope with the full spectrum of big data. In anomaly detection, the whole dataset is of
interest, whether it includes a certain event or not. No labelled data are available
anymore; every signal is part of the system and helps to define whether new
incoming data are normal or abnormal. These 'Big Data' are one of the main
challenges of this thesis.
Another difficulty is that the original sound signals are transformed into spectrograms
and there is no ability to reconstruct the original sound waves. This makes intuitive
human supervision based on the sense of hearing impossible. As stated before, the
environment of this research is highly noisy and has a broad spectrum of sources.
Human interpretation is the best taxonomy available so far; without this option, new
techniques for feature extraction and classification must be developed:
Field knowledge for feature extraction is only possible with a detailed taxonomy of
urban sounds. Because no environment is ever the same and all are constantly
changing over time, decent taxonomies for environmental sound signals are lacking,
especially for urban environments. Furthermore, my personal background in audio
science is too limited to make reliable decisions based on spectral features. For these
reasons, Data Exploration was chosen rather than Field Knowledge.
Features, however they were extracted, become the new data input for classification.
Supervised learning uses the features of known or labelled data to search for
relationships and focuses on those, rejecting all unrelated features. Unsupervised
learning, however, does not need labelled training data and searches for patterns
based on correlations within the data and their frequency distributions. This reveals a
totally new range of possibilities and applications, because by clustering all data
instead of only labelled data, the system searches beyond the borders of prescribed
events. In urban environments, for example, not only gunshots, screaming people and
sirens are useful to detect; unexpected anomalies such as storms or demonstrations
might also be of interest to detect and report at an early stage. Unfortunately, there is
no such thing as a free lunch; unsupervised learning involves many challenges and
difficulties, which are described in the next part.
1.2. Challenges of Environmental Anomaly
Detection
1.2.1. Big Data
It is still difficult for a machine listening system to demonstrate the same capabilities
as human listeners for the analysis of sounds other than speech and music, generally
referred to as environmental sounds. Realistic environments consist of multiple and
simultaneous sources in reverberant conditions. Typical tasks in audio scene analysis
include scene classification and event detection and recognition, but not research on a
larger time scale. Acquiring large-scale labeled databases is still problematic, and
such databases are most likely collected under heterogeneous sets of acoustic
conditions, encompassing only limited variations and qualities. Consequently, the
quality of the results depends on the available training data set, restricting the
strength of the research. To date, most of the methods developed are probably not
tractable on big data, so there is a need for new approaches that are efficient on large
datasets, or Big Data.
Big Data is an umbrella term for massive and complex datasets composed of a variety
of data structures. In 2001, Gartner analyst Doug Laney defined data growth
challenges and opportunities as being three-dimensional: volume, velocity, and
variety. Volume refers to the growing amount of data stored, from terabytes to
petabytes and beyond. With the growing number of applications of acoustic sensing,
the urge to cope with higher levels of precision on larger time scales grows, resulting
in exponentially increasing amounts of data. Besides the difficulty of processing these
huge amounts of data, volume also carries one of the biggest advantages of big data: a
paradigm shift in the types of algorithms used, from computationally expensive
algorithms to computationally inexpensive ones. The inexpensive algorithms may
have a much higher training error. However, when normalized over the entire large
dataset, this higher training error converges to a small prediction error, i.e. the error
accumulated when predicting new values with a trained predictor. The prediction
error of the inexpensive algorithm falls within ranges similar to those of the
computationally more expensive algorithm applied to a much smaller time frame. Big
Data thus allows cheaper algorithms to achieve similar results, and this thesis makes
use of this advantage. Velocity can be divided into two different aspects: the velocity of
incoming data and the processing speed. The rate of incoming data has some
important characteristics: the smaller the interarrival times and the smaller their
variability, the better the environment can be understood and the more precise the
statements that can be made about its evolution. The assumption that (acoustic)
scenes vary only slightly per small time frame makes it possible to recognize and
define unusual changes by gathering data over certain time spans. The second aspect
is the requirement to process the big data at real-time speed, or faster. When the
system cannot meet real-time processing velocity, the waiting time grows without
bound and the system becomes useless. But once again, the size of the data is
inversely related to the accuracy of the results, and a good balance between these two
desirable but incompatible features must be achieved. This thesis drastically reduces
the input data during the feature selection process, making use of Gaussian Mixture
Models. This enables the system to operate fast and successfully, even with the
accelerating expansion of input data. Variety refers to the component of big data that
involves the requirement to include additional metadata in the form of tables, other
databases, photos, web-retrieved data, social media and primitive datatypes such as
integers, floats and characters. Within the acoustic sensing domain, other sensors can
provide a variety of additional information that can enrich the research. In this thesis,
however, metadata are not included, directly pointing out an interesting topic for
further research. Spectrograms and time are the sole contributors to the input data.
In recent years, additional "V's" have been added to this 3V's model to cope with the
evolving requirements for addressing and understanding Big Data, including veracity,
value, and visualization. Currently, businesses are acutely aware that the huge, ever-
growing volumes of data can be used to generate valuable new business opportunities
through Big Data processing and analysis. The challenge of this research is to develop
a fast and accurate system to detect anomalies based on environmental acoustic
signals.
1.2.2. Taxonomy
The ultimate but utopian goal of this thesis, and of any acoustic classification
research, is a precise reconstruction of the original environment, by decomposing the
input signal and allocating each component to its source. Unfortunately, there are
some constraints on audio signals in general, as well as on their recordings. The first
question is whether a single auditory sensor, irrespective of its recording quality, can
provide enough information for scene reconstruction. Secondly, are the best sonic
sensors available today sensitive enough, or are a higher frequency range and
precision required to allow better research? Thirdly, is it possible to develop one
general, multi-purpose system, or does every environment need customized research?
To even understand these questions, a basic understanding of sound is required.
Sound is a vibration that is transmitted as longitudinal and transverse waves,
depending on the transmission medium. Without a medium, a sound wave cannot
propagate and simply does not exist. Sound waves that travel through gases, plasma
and liquids are longitudinal waves. Through solids, however, sound can be
transmitted as both longitudinal waves and transverse waves. In this thesis project,
only longitudinal waves will be taken into account because the medium that surrounds
the auditory sensors, as well as our ears, is air. Although many sound waves will have
travelled through solids before as transverse waves, as soon as they leave the solid
into the air, they are converted to longitudinal waves. To interpret sound waves, the
receiver must have something that can vibrate in order to translate the vibrating
medium. That can be our eardrums, or auditory sensors that convert the sound waves
into an electrical current that holds all the information.
Simple sound waves result from simple harmonic motion (SHM) and are made up of
a single frequency component. They are also called notes or harmonics, can be
represented as a sinusoidal waveform, i.e. a sine or a cosine wave, and are
characterised by their amplitude and frequency. Waves can move through each other,
which means that they can be in the same place at the same time; when this happens,
the amplitudes of the waves simply add together and form complex sound waves.
Imagine two waves of the same frequency. If the compressions and the rarefactions of
the two waves line up, they strengthen each other and create a wave with a higher
intensity. This type of interference is known as constructive. When the compressions
and rarefactions are out of phase, their interaction creates a wave with a dampened
intensity, or even silence. This is destructive interference. The number of interfering
source waves can be infinite and their frequencies can differ, each combination
creating a different complex wave pattern, either periodic or aperiodic. However, the
vast majority of sounds experienced in nature and daily life are aperiodic.
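Constructive and destructive interference can be checked numerically; this minimal NumPy sketch uses an arbitrarily chosen 5 Hz sine:

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
wave = np.sin(2 * np.pi * 5 * t)                      # a 5 Hz sine wave

# In phase: compressions and rarefactions line up and reinforce each other.
constructive = wave + np.sin(2 * np.pi * 5 * t)
# Half a period out of phase: the two waves cancel each other.
destructive = wave + np.sin(2 * np.pi * 5 * t + np.pi)

print(round(constructive.max(), 3))   # → 2.0  (doubled amplitude)
print(round(destructive.max(), 3))    # → 0.0  (silence)
```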
Aperiodicity means that successive disturbances are not equally spaced in time and
are not of constant shape either. In other words, aperiodic waves do not have a regular
repeating pattern and are perceived as noises. They do not have a harmonic basis, i.e.
the component frequencies are not integer multiples of a fundamental frequency or in
other words, the component frequencies of which they are made up, are not related to
each other. Transient signals are sudden pressure fluctuations that are not sustained or
repeated over time. Examples are the consonants in speech, a hammer hitting a table,
the slamming of a door and the popping of a balloon.
Urban environments are very different from other environments such as audiences,
households, theatres, schools, etc., in the sense that the data input consists almost
entirely of transient signals and is actually perceived as noise. Very few periodic
waves, like human voices or acoustic instruments, occur, making it difficult to build a
common vocabulary. Furthermore, urban landscapes are subject to a large set of
simultaneously occurring events, making it difficult to label urban audio data.
Previous work has focused on audio tracks carefully extracted from movies or
television, recorded in specific environments such as elevators, households, audiences
and office spaces. The classification of sounds into semantic groups may thus vary
from study to study, making it hard to compare results. These major hindrances,
together with my limited personal knowledge of audio signals, were the motivation
for a totally different approach: Unsupervised Learning. By treating the input data as
random numbers, without reference to their spectral characteristics, taxonomy
becomes redundant. Frequency of occurrence now becomes the basis for anomaly
detection.
2. Approaches for Anomaly
Detection in Big Sound Data
This chapter describes the different techniques and approaches for anomaly detection
in Big Sound Data. Although the input data for this research come from an urban
environment, a more general overview is given, because the methods are similar and
the goal of this thesis is to develop a system that can be applied to different types of
environments.
2.1. Concept Introduction
2.1.1. Input Data types
One of the major considerations in using an anomaly detection algorithm is the type
of records: categorical and/or numerical.
Categorical data represent characteristics such as the weather (raining, cloudy,
sunny), the microphone's brand or the type of day (public holiday, weekend,
weekday). Categorical data can take on numerical values (such as "1" indicating that
it is raining, "2" for cloudy and "3" for sunny), but those numbers have no
mathematical meaning; they cannot, for example, be added together.
Numerical data can be divided into continuous, discrete and binary data. Continuous
data represent measures; their possible values cannot be counted and can only be
described using intervals on the real number line. The original audio signals are a
continuous representation of amplitude over time. The Fourier Transform decomposes
the signal into the frequencies it is made of; the resulting representation of amplitude
versus frequency is called a spectrogram. For ease of record keeping, and to balance
the size of the data against the quality of the information, the original sound waves
have been transformed using a Discrete Fourier Transform, rounding the continuous
data to discrete values.
For this research specifically, the audio spectrum ranges from 20 Hz to 20 kHz and is
divided into 31 1/3-octave bands. To define the centre frequencies, the 19th 1/3-
octave band is taken as the centre band, with its centre frequency set to around
1000 Hz. All lower 1/3-octave centre frequencies can then be derived from each other
using the formula f_(n-1) = f_n / 2^(1/3), and conversely, all higher centre
frequencies using f_(n+1) = 2^(1/3) * f_n.
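The recursion above generates all 31 centre frequencies once band 19 is anchored at 1000 Hz; a minimal sketch:

```python
# Centre frequencies of the 31 one-third-octave bands, derived from the
# recursion above with band 19 anchored at exactly 1000 Hz.
ratio = 2 ** (1 / 3)                        # one-third-octave step, ~1.26
centres = [1000.0 * ratio ** (n - 19) for n in range(1, 32)]

print(len(centres))                         # → 31
print(round(centres[18], 1))                # band 19 → 1000.0 Hz
print(round(centres[19] / centres[18], 3))  # adjacent-band ratio → 1.26
```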
2.1.2. Anomaly types
Another important consideration for the algorithms is the relationships within the
data itself. Many applications assume that no relationships exist between the records;
these are generally considered point anomaly scenarios. Other applications assume
that relationships may exist and, based on the type of relationship, the anomalies are
referred to as contextual or conceptual anomalies. For anomaly detection, algorithms
rely on the assumption that anomalies are far less frequent than normal records in the
dataset. Furthermore, an underlying assumption in most of these algorithms is that
anomalies are dynamic in behaviour, meaning that it is very difficult to determine all
the types of anomalies for the entire dataset, and for the future.
A point anomaly is usually a single record that is considered abnormal with respect to
the other records in the dataset; in other words, an outlier. However, environmental
sound signals are noisy and thus highly volatile. A single outlying sample therefore
does not intrinsically correspond to a real-life anomalous situation, and filtering such
samples out would not reveal any useful information. Instead, a period of those noisy
signals can be gathered and described by a mathematical model. Depending on how
extreme these outliers are and how frequently they occurred in that time frame, they
will only influence the model when they are significant. The models of those time
frames can then be studied, and those outliers will be seen as point anomalies.
Section 3.4.2 describes the formation of these models and the choices made regarding
components and time windows. Figure 1 shows in green a normal minute described
by a Gaussian Mixture Model consisting of 5 components. The red Gaussian
component does not occur frequently in time and is thereby considered a point
anomaly. The cause behind such unusual Gaussian components will be discussed in
the results section; from now on, when anomalies are discussed, these modelled
components are meant rather than single records.
Figure 1: Point anomaly
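The per-minute modelling just described can be sketched with scikit-learn's GaussianMixture. This toy uses two components on synthetic one-dimensional levels (the thesis fits five components to spectrogram data), and the 10 % weight threshold is an invented value:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Stand-in for one minute of band levels: mostly "normal" frames around
# 60 dB, plus a handful of rare high-level frames.
minute = np.concatenate([rng.normal(60, 2, 570),
                         rng.normal(90, 1, 30)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(minute)

# The component with the smallest mixture weight covers the rare frames;
# a component whose weight stays below the threshold is flagged.
rare = int(np.argmin(gmm.weights_))
print(gmm.weights_[rare] < 0.1)       # → True  (30/600 = 5 % of the frames)
print(round(gmm.means_[rare, 0]))     # → 90
```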
A contextual anomaly is an anomalous record (Gaussian component) within a specific
context. For example, a Gaussian component, or a combination of them, may only be
considered anomalous when evaluated in the context of temporal and spatial
information: a reading of certain values may be normal during a random night, but
anomalous during daytime. Pictorially this is shown in Figure 2. Note that the figure
only represents amplitude as a function of time and is a strong simplification to
visualize the concept.
Figure 2: Contextual anomaly
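The day/night reading can be made concrete with a toy context check; the expected levels and tolerance below are invented purely for illustration:

```python
# Hypothetical per-context sound-level profiles: (expected level in dB,
# tolerated deviation). The numbers are illustrative, not measured.
profiles = {"night": (30.0, 5.0),
            "day":   (65.0, 5.0)}

def is_contextual_anomaly(level_db, context):
    """Flag a reading that deviates too far from its context's profile."""
    expected, tolerance = profiles[context]
    return abs(level_db - expected) > tolerance

print(is_contextual_anomaly(32.0, "night"))  # → False (normal at night)
print(is_contextual_anomaly(32.0, "day"))    # → True  (anomalous by day)
```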
A conceptual anomaly is a record or component that is anomalous with respect to the
entire dataset. More specifically, this means that the record may not be considered
anomalous alone, or in a temporal perspective; however, when combined within a
collection of sequential records, it may be anomalous when it does not behave
according to the derived patterns or frequency distributions. Due to a lack of time,
this type of anomaly is beyond the scope of this thesis and points to an interesting
topic for future research.
2.1.3. Methodology
Like many other pattern classification tasks, audio classification is made up of three
fundamental components: sensing, feature extraction and classification. The sensing
section is described above; it was decided beforehand and determines the input data,
and thus the starting point, for this research. The main challenge is the choice of route
and the development of the algorithms, in order to obtain interesting and reliable
results. Figure 3 shows the optional approaches for feature extraction as well as
classification. Each approach is considered based on related work. The results and
possible applications of the different approaches and their algorithms form the basis
for the approach chosen for this specific research.
Figure 3: Classification methodology
2.2. Feature extraction
2.2.1. Field Knowledge
Field knowledge relates the spectrogram to its original audio content, based on
knowledge of how audio features reflect acoustic events. A systematic taxonomy of
all audio features is outside the scope of this thesis, but a distinction can nevertheless
be made from a time viewpoint: low-level features and mid-level features. Low-level
features are also called spectral or frequency features, while mid-level features are
also called temporal features. Cepstral features are not further distinguished in this
work and are counted among the spectral features. For this thesis, the raw audio
signals have been converted to spectrograms that serve as input data; feature
extraction therefore starts from the spectrogram instead of the raw signal. The
possible features to extract are numerous. The following list is therefore far from
complete, but covers the most frequently used features of audio signals. For a detailed
description and the mathematical equation of each of them, the reader is referred to
Appendix A.
Figure 4: Audio Features
Low-level features are spectrogram characteristics and are derived for each single
sample. They can discover point anomalies without any relation to another sample.
Low-level features can be extracted straight from the Short Time Fourier Transform
(spectrogram), or after the application of a harmonic or perceptual model.
Low-level spectral features
Low-level spectral features are computed instantaneously from the Short Time
Fourier Transform (STFT) of the signal. The frequency domain reveals the spectral
distribution of a signal: for each frequency (or frequency band/bin), the domain
provides the corresponding magnitude and phase. Since phase variation has little
effect on the sound we hear, features that evaluate the phase information are usually
ignored. Consequently, we focus on features that capture basic properties of the
spectral distribution of the audio signal. The references for the low-level spectral
features are: [1][2][3][4][5][6].
- The Spectral Energy (SpE) equals the energy of the signal: the sum of the
power of each amplitude value.
- The Spectral Centroid (SC) represents the "balancing point", or midpoint, of
the spectral power distribution of a signal. It is related to the brightness of a
sound: the higher the centroid, the brighter (more high-frequency) the sound.
A.1.1
- The Spectral Spread (SS) is the second central moment of the spectrum. It
signifies whether the power spectrum is concentrated around the centroid or
spread out over the spectrum. A.1.2
- The Spectral Roll-off Point (SRP) is the N% percentile of the power spectral
distribution, where N is usually 85% or 95%. It is the frequency below which
N% of the magnitude distribution is concentrated. A.1.3
- The Spectral Entropy (SE) defines the complexity of the spectrum: the lower
the value, the more 'ordered' or linear the spectrum. A.1.4
- The Spectral Kurtosis, or flatness, gives a measure of the flatness of a
distribution around its mean value: it indicates the peakedness or flatness of
the distribution. A.1.5
- The Spectral Skewness is a measure of the asymmetry of the data around the
sample mean.
- The Spectral Slope represents the rate of decrease of the spectral amplitude. It
is computed by linear regression of the spectral amplitude.
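Several of the descriptors above can be computed directly from a single spectrum frame. The sketch below follows the verbal definitions given here (not necessarily the exact equations of Appendix A) and uses a trivial single-bin test spectrum:

```python
import numpy as np

def spectral_features(magnitudes, freqs):
    """Centroid, spread, 85 % roll-off and entropy of one spectrum frame."""
    p = magnitudes ** 2                # power per frequency bin
    p_norm = p / p.sum()               # normalised power distribution

    centroid = (freqs * p_norm).sum()                            # balance point
    spread = np.sqrt(((freqs - centroid) ** 2 * p_norm).sum())   # 2nd moment
    rolloff = freqs[np.searchsorted(np.cumsum(p_norm), 0.85)]    # 85 % point
    entropy = -(p_norm * np.log2(p_norm + 1e-12)).sum()          # complexity
    return centroid, spread, rolloff, entropy

# Test frame: all energy concentrated in the 200 Hz bin.
freqs = np.array([100.0, 200.0, 300.0, 400.0])
mags = np.array([0.0, 1.0, 0.0, 0.0])

c, s, r, e = spectral_features(mags, freqs)
print(c, s, r)   # → 200.0 0.0 200.0
```

With all energy in one bin, the centroid sits on that bin, the spread is zero and the roll-off point coincides with the centroid, as the definitions predict.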
Low-level harmonic features
Low-level harmonic features are derived after the application of a harmonic model. At
each time frame, the peaks of the Short Time Fourier Transform (STFT) of the
windowed signal segment are estimated. Peaks close to multiples of the fundamental
frequency at this frame are then chosen in order to estimate the sinusoidal harmonic
frequency and amplitude [7].
- The fundamental frequency of a harmonic signal is the frequency whose
integer multiples best explain the content of the signal spectrum. The
fundamental frequency has been computed using the maximum likelihood
algorithm [8]. A higher resolution than that of the present dataset would be
needed, so this feature cannot be used here.
- The noisiness is the ratio of the energy of the noise (the non-harmonic part) to
the total energy. It is close to 1 for a purely noisy signal and 0 for a purely
harmonic signal.
Low-level perceptual features
To obtain low-level perceptual features, the audio signal is processed by a filter bank
to compress the signal without humanly noticeable changes. From the resulting
signals, linear and non-linear predictors are computed, after which spectral features
can be derived, together with model-specific coefficients. Before the advent of
modern digital signal processing, band-pass filters were the only way to obtain a
time-frequency representation: they divide the input signal into frequency bands, and
the magnitude of each filter's output controls a transducer that records the
spectrogram as an image on paper. For the scope of this thesis, however, filters are
considered part of the feature extraction techniques applied to the already created
spectrogram.
- For this thesis, the third-octave filter bank has been applied to the
spectrogram and is the starting point for further research.
- The Log-Gabor filter, named after Dennis Gabor, is an improvement of the
original Gabor filter, which is primarily used for edge detection in image
processing. It offers simultaneous localization of spatial and frequency
information, whereas the Fourier Transform only provides frequency
information.
Examples of Linear band conversions:
- Linear Frequency Cepstral Coefficients (LFCC) are similar to Mel Frequency
Cepstral Coefficients (see below), but whereas the MFCC filter banks become
wider at higher frequencies, the linear (LFCC) filter banks are equally spaced.
- Linear Predictive Coding (LPC) is a mathematical operation in which future
values of a discrete-time signal are estimated as a linear function of previous
samples.
Examples of logarithmic conversions:
- Mel Frequency Cepstral Coefficients (MFCC) originate from automatic
speech recognition but have evolved into one of the standard techniques in
most domains of audio recognition, such as environmental sound
classification. They represent the timbral information (spectral envelope) of a
signal. Computation of the MFCC includes conversion of the Fourier
coefficients to the Mel scale. After conversion, the obtained vectors are
logarithmized and decorrelated by a discrete cosine transform (DCT) in order
to remove redundant information. Figure 5 shows the process of MFCC
feature extraction [9][12].
Figure 5: MFCC extraction process
- Gammatone Frequency Cepstral Coefficients (GFCC) are a variant of the
MFCC using the cubic root of the time frequency representation instead of the
log, in combination with a gammatone weighted filter bank instead of a Mel
weighted filter bank.
- Although the Mel bands are used for the Mel Frequency Cepstral
Coefficients, the Bark bands are a better approximation of the human auditory
system. The latter will be used for the calculation of the loudness, specific
loudness, sharpness and spread. A.1.7
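The Mel pipeline described above (filter bank, logarithm, DCT, as in Figure 5) can be sketched as follows; the filter and coefficient counts are arbitrary illustrative choices, not the values used in this thesis:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(power_spectrum, fs, n_filters=12, n_coeffs=5):
    """MFCC of one power-spectrum frame: Mel filter bank -> log -> DCT."""
    n_bins = len(power_spectrum)
    freqs = np.linspace(0.0, fs / 2.0, n_bins)

    # Triangular filters equally spaced on the Mel scale (wider in Hz
    # towards higher frequencies, as noted for MFCC vs. LFCC above).
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))

    mel_energies = fbank @ power_spectrum         # filter bank stage
    log_energies = np.log(mel_energies + 1e-10)   # logarithmize
    return dct(log_energies, norm="ortho")[:n_coeffs]  # decorrelate (DCT)

spectrum = np.abs(np.fft.rfft(np.random.default_rng(0).normal(size=512))) ** 2
coeffs = mfcc_frame(spectrum, fs=16000)
print(coeffs.shape)   # → (5,)
```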
Mid-level Temporal Features
Unlike low-level features, mid-level features capture relations between subsequent
frames and are able to recognise events, and thus detect temporal anomalies. The size
of the frames and their overlap are of crucial significance.
- Zero-crossing rate (ZCR) is the most common zero-crossing-based audio
feature. It is defined as the number of time-domain zero crossings within a
processing frame [10]. A.1.8
- The Spectral Flux (SF) defines the amount of frame-to-frame fluctuation in
time: i.e., it measures the change in the shape of the power spectrum [11].
A.1.9
- Short Time Energy (STE) is one of the energy-based audio features. It is easy
to calculate and provides a convenient representation of the amplitude
variation over time. It indicates the loudness of an audio signal and is thus a
reliable indicator for silence detection [12][13][14][15]. A.1.10
- Temporal Centroid (TC) is the time average over the envelope of a signal in
seconds. It is the point in time where most of the energy of the signal is
located on average. A.1.11
- Energy Entropy (EE) can be interpreted as a measure of abrupt changes in the
energy level of an audio signal. A.1.12
- Autocorrelation (AC) represents the correlation of a signal with a time-shifted
version of itself for different time lags. It reveals repeating patterns and their
periodicities in a signal and can be employed, for example, for the estimation
of the fundamental frequency of a signal. This allows distinguishing between
sounds with a harmonic spectrum and those with a non-harmonic spectrum,
e.g., between musical sounds and noise [1]. A.1.13
- Root Mean Square (RMS) is a measurement of energy in a signal. A.1.14
- The mean, variance and standard deviation are the classic, simple and useful
features for gaining fast insight into data.
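A few of the temporal features above reduce to one-liners; the frames below are arbitrary test values:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Number of sign changes within one processing frame (ZCR)."""
    return int(np.count_nonzero(np.diff(np.signbit(frame))))

def short_time_energy(frame):
    """Sum of squared amplitudes: a convenient loudness indicator (STE)."""
    return float(np.sum(frame ** 2))

def root_mean_square(frame):
    """RMS: a measurement of the energy in a signal."""
    return float(np.sqrt(np.mean(frame ** 2)))

print(zero_crossing_rate(np.array([1.0, -2.0, 3.0, 4.0])))  # → 2
print(short_time_energy(np.array([1.0, -2.0])))             # → 5.0

t = np.arange(0, 1, 1 / 1000)                 # 1 s at 1 kHz
tone = np.sin(2 * np.pi * 10 * t)             # full periods of a sine
print(round(root_mean_square(tone), 3))       # → 0.707
```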
2.2.2. Data exploration
Audio classification systems have traditionally relied on hand-crafted audio features,
with the Mel Frequency Cepstral Coefficients being the most popular choice. MFCC
simulate human hearing by extracting those features that are the most important for
human interpretation. Consequently, they are highly effective when it comes to music
and speech analysis, but they might overlook important aspects in noisy
environments. Data exploration searches for characteristics of the data without any
interpretation assumption. By clustering data in intelligent ways, characteristics can
be revealed and used as features. As this is a form of clustering, the same algorithms
as for classification-oriented clustering can be used; these are listed in the next
paragraph. Data exploration is obviously more complex than field knowledge, but
enables new insights and flexibility across environments and applications.
2.3. Classification
Algorithms for detecting anomalies, or for any other kind of classification, can be
categorized based on the types of data labels known a priori. If the algorithm is given a
set of example records labelled as anomalous, it is referred to as a supervised learning
algorithm. The examples, commonly termed training samples, are used to teach the
classifier how to assign an unseen feature vector to the correct class. In contrast, if the
algorithm has no notion of the labels of the data, anomalous or otherwise, it is
referred to as an unsupervised learning algorithm. A third category, semi-supervised
learning algorithms, involves approaches that assume the training data
only has labelled instances for the normal data type. This is a more common approach
to anomaly detection than supervised learning, as it is normally difficult to
identify all the abnormal classes. Finally, most anomaly detection applications in
practice have their anomalies defined through human effort. As a result, it is much
more common in practice to find datasets that are unlabelled, or partially unlabelled,
requiring the use of an unsupervised or semi-supervised learning algorithm. Many
algorithms can be used in both a supervised and an unsupervised manner, so the
categorization below is open to discussion; what matters is understanding the
algorithms in order to apply them creatively and innovatively. See
the referred appendix for a more detailed description of each.
2.3.1. Supervised
- Gaussian Mixture Models (GMM) are a form of statistical modelling.
Statistical modelling approaches rely on the assumption that normal records
occur in high probability regions of a stochastic model, while anomalies occur
in the low probability region. These techniques fit a statistical model to the
given training dataset and then apply a statistical inference model to determine
if the test record belongs to the model with high probability. There are two
types of statistical modelling techniques: parametric and non-parametric
models. Parametric techniques assume that normal data is generated by a
parametric distribution; such an example is the Gaussian model and the
Regression model. Non-parametric techniques make few assumptions about
the underlying distribution of the data and include techniques such as
histogram-based, and kernel-based approaches. GMMs are used in classifying
different audio classes. It is an intuitive approach when the model consists of
several Gaussian components, which can be seen to model acoustic features.
In classification, each class is represented by a GMM and refers to its model.
Once the GMM is trained, it can be used to predict which class a new sample
probably belongs to. The potential of Gaussian mixture models to represent an
underlying set of acoustic classes by individual Gaussian components, in
which the spectral shape of the acoustic class is parameterized by the mean
vector and the covariance matrix, is significant, especially if the dataset
(feature data points) is large. Moreover, these models have the ability to form
a smooth approximation to the arbitrarily shaped observation densities in the
absence of other information. With Gaussian mixture models, each sound is
modelled as a mixture of several Gaussian clusters in the feature space. There
are several advantages to the statistical modelling approach to anomaly
detection that make it work for Big sensor Data cases. First, if the assumptions
hold true, the approach provides a statistically justified solution to the
problem. Second, the anomaly score output by the statistical model can be
associated with a confidence interval, which provides additional information
the algorithm can use to improve its efficiency and performance. Finally, data
in the Big sensor Data context normally follows a normal, or Gaussian,
distribution.
However, there are several disadvantages to the statistical modelling approach.
First, if the underlying assumption that the data follows a particular
distribution fails, the statistical technique will fail. Second, selecting the
test hypothesis is not a straightforward task, as there exist several types of
test statistics. While these disadvantages certainly affect the Big sensor Data
scenario, there are ways to overcome them while retaining the aforementioned
advantages. For example, the algorithm can be evaluated a priori with respect
to a variety of test statistics; comparing the results of this step allows the
most appropriate test statistic to be selected for the given use case.
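The idea of scoring records by their likelihood under a fitted mixture can be sketched as follows, using a minimal one-dimensional EM implementation in Python on synthetic data (real acoustic feature vectors are of course multi-dimensional):

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=100):
    # Minimal EM for a one-dimensional Gaussian mixture.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread initial means
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    return w, mu, var

def log_likelihood(x, w, mu, var):
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(8, 1, 500)])  # two "acoustic classes"
w, mu, var = fit_gmm_1d(x)
score = log_likelihood(np.array([0.0, 4.0]), w, mu, var)
print(score[0] > score[1])  # True: the point between the clusters is anomalous
```

Records falling in low-probability regions of the fitted model receive a low log-likelihood, which can serve directly as an anomaly score.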
- Hidden Markov Models (HMM) are a tool for representing probability
distributions over sequences of observations. The model gets its name from two
defining properties. First, it assumes that the observation at a certain time was
generated by some process whose state is hidden from the observer. Second, it
assumes that the state of this hidden process satisfies the Markov property:
that is, given the value of the previous state, the current state is independent of
all the states prior to the previous one. In other words, the state at some time
encapsulates all we need to know about the history of the process in order to
predict the future of the process.
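The Markov property is what makes inference tractable: the forward algorithm below computes the probability of an observation sequence while carrying only the current state distribution (a Python sketch with hypothetical 'quiet'/'loud' states; all probabilities are invented for the example):

```python
import numpy as np

A = np.array([[0.9, 0.1],   # state transition matrix P(next state | current state)
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],   # emission matrix P(observed symbol | state)
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])   # initial state distribution

def forward(obs):
    # Forward algorithm: only the current state distribution (alpha) is
    # carried along; the Markov property makes the full history redundant.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()   # P(observation sequence | model)

print(forward([0, 0, 1]))  # ~0.1068
```

Summing `forward` over all possible sequences of a fixed length gives 1, which is a quick consistency check on the model.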
- The K-Nearest Neighbor (K-NN) algorithm is, despite its simplicity, well
suited for both binary and multi-class problems. Its outstanding characteristic
is that it does not require a training stage in the strict sense; the training
samples are used directly by the classifier during the classification stage.
The key idea behind this classifier is that, given an unknown pattern (feature
vector), we first detect its k nearest neighbours in the training set and count
how many of those belong to each class. In the end, the feature vector is
assigned to the class with the highest number of neighbours.
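The classification stage can be sketched in a few lines of Python, using toy two-class data:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    # No training stage: the labelled samples are used directly.
    # Find the k nearest neighbours and take a majority vote.
    d = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

train_x = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_x, train_y, np.array([4.8, 5.1])))  # 1
```

The choice of k and of the distance metric are the only tuning decisions; here plain Euclidean distance is used.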
- Support Vector Machines (SVM) are classifiers that have been successfully
employed in numerous machine-learning fields and are a very effective method
for general-purpose pattern recognition. Given a set of points that belong to
either of two classes, an SVM finds a hyperplane leaving the largest possible
fraction of points of the same class on the same side, while maximizing the
distance of either class from the hyperplane.
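A linear SVM can be trained, for instance, with Pegasos-style stochastic subgradient descent on the hinge loss (a Python sketch on toy separable data; this is one of several possible training methods, not necessarily the one used in the cited literature):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    # Pegasos-style stochastic subgradient descent on the hinge loss:
    # approximates the maximum-margin hyperplane. Labels are -1/+1 and a
    # bias term is absorbed by appending a constant 1 to every sample.
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (w @ Xb[i]) < 1:             # sample violates the margin
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w = train_linear_svm(X, y)
Xb = np.hstack([X, np.ones((len(X), 1))])
print(np.all(np.sign(Xb @ w) == y))  # True
```

The regularization weight `lam` trades margin width against training errors; kernelized SVMs extend the same idea to non-linear decision boundaries.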
- Bayesian networks (BN) are commonly used in multi-class anomaly detection
scenarios. Bayesian networks are a classification-based approach whereby the
network estimates the probability of observing a class label given a test record.
The class label with the highest probability is chosen as the predicted class
label for the test record. There are a few advantages to the classification-based
Bayesian network approach. First, multi-class Bayesian networks use a
powerful algorithm to distinguish between instances belonging to different
classes. Second, the testing phase is fast. This is important in anomaly
detection for sensor networks, as all the sensor readings must be evaluated in
real time; without a fast testing phase this is impossible. Bayesian networks
also have some disadvantages for anomaly detection that eliminate them from
contextual-based approaches. First, where there are multiple anomaly class
labels or multiple normal class labels, the system relies on readily available
class labels for all the normal classes. As was previously discussed, this is
generally not possible for practical purposes: human intervention is normally
required to label training records, and it is not always available. Second,
Bayesian approaches assign discrete labels to the test record. This is
disadvantageous for sensor networks, where a meaningful anomaly score is often
desired. When evaluating the effectiveness of an anomaly detection approach, it
is useful to discriminate between anomaly scores. This is even more evident in
Big Data contexts, as a continuous anomaly score can help reduce the overall
number of anomalies that need to be evaluated.
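In its simplest form, the class-posterior idea can be illustrated with a naive Bayes model, i.e. a Bayesian network in which the features are conditionally independent given the class (a Python sketch with invented data):

```python
import numpy as np

class GaussianNaiveBayes:
    # The simplest Bayesian network: all features are conditionally
    # independent given the class. Prediction picks the class label with
    # the highest posterior probability for the test record.
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log P(c | x) is proportional to log P(c) + sum_j log N(x_j; mu_cj, var_cj)
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(np.log(self.prior) + ll, axis=1)]

X = np.array([[1.0, 1.0], [1.2, 0.9], [6.0, 6.1], [5.8, 6.2]])
y = np.array([0, 0, 1, 1])
print(GaussianNaiveBayes().fit(X, y).predict(np.array([[6.0, 6.0], [1.0, 1.1]])))  # [1 0]
```

The discrete-label limitation discussed above is visible here: `predict` returns only the argmax class, whereas keeping the underlying log-posteriors would yield a continuous score.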
2.3.2. Unsupervised
- Neural networks, in particular Deep Learning Networks, are one of the
primary tools for feature learning. A neural network is a complex hierarchy of
linear and nonlinear nodes that can learn complex functions automatically,
provided they are fed enough input data. Deep networks are many-layered
versions of neural networks, and their depth allows them to learn
hierarchies of features, enabling them to automatically learn very high-level
features about a signal. Neural networks were invented as early as the 1940s,
and were researched extensively until the early 1990s, when successes in other
machine learning algorithms eventually took the focus in the machine learning
community. However, in the last decade, important advancements in training
deep networks, combined with the general increased computing power
available in society brought neural networks back to prominence. Across
computational research genres, but specifically in the image and audio
research, deep neural nets have recently shown promising results. One of the
biggest problems with deep neural networks is that the derivatives that are
backpropagated during supervised training become extremely weak, so that they
have minimal effect by the time they reach the beginning of the network. It was
shown that the use of greedy layer-wise training, in particular, is what
brought neural networks back to prominence. This initializes the network
weights in an unsupervised fashion, one layer at a time, while freezing the
weights of the other layers. An unsupervised algorithm such as k-means or
sparse coding is typically used for this. Unsupervised initialization of the network
significantly improves the performance of neural networks.
- Self Organizing Maps (SOM) or Self Organizing Feature Maps (SOFM) are a
type of artificial neural network, invented by Professor Kohonen. They use a
data visualization technique to reduce the dimensions of data through
self-organizing neural networks. The problem that data visualization
attempts to solve is that humans simply cannot visualize high-dimensional data,
so techniques are created to help us understand it. SOMs reduce dimensions by
producing a map, usually of one or two dimensions, which plots the similarities
of the data by grouping similar data items together. SOMs thus accomplish two
things: they reduce dimensions and display similarities.
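A minimal one-dimensional SOM can be sketched as follows (Python, on synthetic data; the grid size and the learning-rate and neighbourhood schedules are illustrative choices, not the canonical ones):

```python
import numpy as np

def train_som(data, grid=5, iters=2000, seed=0):
    # One-dimensional SOM: each of the `grid` units holds a weight vector.
    # The best matching unit (BMU) and its neighbours are pulled toward
    # each presented sample, so nearby units end up representing similar data.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(grid, data.shape[1]))
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # best matching unit
        lr = 0.5 * (1 - t / iters)                           # decaying learning rate
        sigma = max(grid / 2 * (1 - t / iters), 0.5)         # shrinking neighbourhood
        h = np.exp(-((np.arange(grid) - bmu) ** 2) / (2 * sigma ** 2))
        w += lr * h[:, None] * (x - w)                       # pull neighbourhood toward x
    return w

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.3, (50, 3)),   # two well-separated 3-D clusters
                  rng.normal(4, 0.3, (50, 3))])
w = train_som(data)
units = [int(np.argmin(np.linalg.norm(w - p, axis=1))) for p in data]
print(sorted(set(units)))
```

After training, the two 3-D clusters map onto disjoint groups of units on the 1-D grid, which is exactly the dimension reduction plus similarity display described above.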
- K-means Clustering relies on a single underlying assumption for anomaly
detection: normal data records occur in dense neighbourhoods, while
anomalies occur far from their closest neighbours. The major consideration for
nearest neighbour-based approaches is that a suitable distance or similarity
metric must be used for comparing data records. For some datasets this is
simple; for example, continuous features can be evaluated with the classical
K-means algorithm that uses Euclidean distance. For multiple features, each
feature is compared individually and then aggregated. Using K-means
clustering is usually straightforward and only requires the correct similarity
metric. The definition of the similarity metric is thus one of the major
drawbacks for K-means based approaches. In many cases, including the audio
sensor data scenario, it is difficult to determine a suitable similarity metric for
the aggregation and variety of unstructured, semi-structured, and structured
data. The performance of the K-means algorithm relies heavily on this
similarity metric, and thus when it is difficult to determine the distance
measure, the technique performs poorly.
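The assumption above translates directly into an anomaly score, namely the distance to the nearest centroid (a Python sketch on synthetic two-cluster data):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Classical K-means with Euclidean distance as the similarity metric.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids

def anomaly_score(x, centroids):
    # Normal records lie near a centroid; anomalies lie far from all of them.
    return np.min(np.linalg.norm(centroids - x, axis=1))

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
c = kmeans(X, k=2)
print(anomaly_score(np.array([2.5, 2.5]), c) > anomaly_score(np.array([0.1, 0.0]), c))  # True
```

The score is only as meaningful as the distance metric: swapping Euclidean distance for an unsuitable metric degrades both the clustering and the anomaly ranking.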
- Spherical k-Means [17] is a slight variation on the traditional K-Means
algorithm, where the centroids are constrained to the unit sphere at each
update step. In effect, cosine distance is used to measure similarity to
points in the input space instead of the Euclidean distance typically used in
traditional K-means.
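The unit-sphere constraint amounts to one extra normalization per update (a Python sketch; seeding one centroid in each group is done only for a stable demonstration):

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, init=None, seed=0):
    # K-means on the unit sphere: assignment uses cosine similarity and
    # each updated centroid is re-normalized back onto the sphere.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(Xn), k, replace=False) if init is None else np.asarray(init)
    c = Xn[idx]
    for _ in range(iters):
        labels = np.argmax(Xn @ c.T, axis=1)          # most similar centroid
        sums = np.array([Xn[labels == j].sum(axis=0) for j in range(k)])
        norms = np.linalg.norm(sums, axis=1, keepdims=True)
        c = sums / np.where(norms == 0, 1, norms)     # back onto the unit sphere
    return c, labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([5, 0], 0.5, (30, 2)),      # directions near (1, 0)
               rng.normal([0, 5], 0.5, (30, 2))])     # directions near (0, 1)
c, labels = spherical_kmeans(X, k=2, init=[0, 30])
print(np.allclose(np.linalg.norm(c, axis=1), 1.0))    # True
```

Because only direction matters, two vectors of very different magnitude but similar orientation fall into the same cluster, which is the property exploited in the scattering-feature work cited above.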
- The intelligent K-means or iK-means algorithm, designed by Mirkin in 2005,
addresses another drawback of the standard K-means. The standard K-means
algorithm needs the number of clusters K as an input parameter, while this is
actually a variable that is sought after. Predefining the number of clusters K is
solved by the iK-means algorithm, based on the following principle: the
farther a point is from the centroid, the more interesting it becomes. iK-means
uses the basic ideas of Principal Component Analysis (PCA) and selects those
points farthest from the centroid that correspond to the maximum data scatter.
These Anomalous Patterns (AP) are formed as explained below:
1. Calculate the centre of gravity cg of the given data points.
2. Repeat:
3. Create a centroid c farthest from cg.
4. Create a cluster S of the data points that are closer to c than to cg.
5. Update the centroid of S as sg.
6. Set c = sg.
7. Discard small clusters using a prespecified threshold.
8. Until the stopping criterion is met.
At the end of the iK-means, only the good centroids will be left, as small
anomalous pattern clusters are discarded.
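The anomalous-pattern extraction can be sketched as below (a Python sketch; following Mirkin's formulation, the anomalous centroid c is updated while the grand centre cg stays fixed per pattern, and a few guards the listed steps leave implicit are added):

```python
import numpy as np

def anomalous_patterns(X, min_size=5, max_clusters=20):
    # Repeatedly grow a cluster around the point farthest from the grand
    # centre cg, keep it only if it exceeds the size threshold, remove its
    # points, and continue on the remaining data.
    remaining = X.copy()
    centroids = []
    for _ in range(max_clusters):
        if len(remaining) < min_size:
            break
        cg = remaining.mean(axis=0)                              # centre of gravity
        c = remaining[np.argmax(np.linalg.norm(remaining - cg, axis=1))]
        while True:
            members = (np.linalg.norm(remaining - c, axis=1)
                       < np.linalg.norm(remaining - cg, axis=1))  # closer to c than to cg
            if not members.any():
                break
            new_c = remaining[members].mean(axis=0)              # update the centroid
            if np.allclose(new_c, c):
                break
            c = new_c
        if not members.any():
            break
        if members.sum() >= min_size:                            # discard small clusters
            centroids.append(c)
        remaining = remaining[~members]
    return np.array(centroids)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.7, (50, 2)),           # main bulk of the data
               rng.normal(10, 0.3, (8, 2)),           # anomalous pattern
               np.array([[-4.0, 4.0], [4.0, -4.0]])]) # two isolated outliers
print(anomalous_patterns(X, min_size=5)[0].round())   # first AP found near (10, 10)
```

The two isolated outliers are extracted but discarded by the size threshold, while the eight-point anomalous pattern survives, so no number of clusters K has to be predefined.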
- Sparse coding is based on the fact that sparse feature vectors contain mostly
zero values and only one or a few non-zero values. Although these feature
vectors can be classified by traditional machine learning algorithms, there are
various recently developed algorithms that explicitly take advantage of the
sparse nature of the data, leading to massive speedups in time, as well as
improved performance. Because of their speed, these algorithms perform well
on very large collections of data such as audio big data.
- Non-negative Matrix Factorization (NMF) is an algorithm for describing the
data as the product of a set of bases and a set of activations, both of which are
non-negative. It is useful for finding parts based decompositions of data. Since
all the components are non-negative, each basis contributes only additively to
the whole. This promotes a solution in which high-energy foreground events
and constant low-energy background energy may be described by different
bases, and therefore separated in the feature presentation. Most applications of
NMF to audio processing decompose spectral magnitude frames (columns of a
spectrogram), and have each NMF bases consist of a single short time frame.
Since we are interested in learning bases that correspond to entire events, we
use the convolutive information of NMF. In this version, bases consist of
spectro-temporal patches of a number of spectral frames stacked together. The
pattern described by each frame is then activated as a whole to contribute to
the reconstruction of the data. Additionally we would like to ensure some level
of sparsity in the activations of these bases. This is in order to encourage the
bases to learn more foreground event patterns and fewer patterns that mimic
the background, which would be activated non-sparsely over large segments
of the data. This NMF algorithm allows us both to locate transients in time,
and to build a dictionary of event-patch code words, within a single
optimization framework, avoiding the separate transient detection and patch
clustering of our earlier approach.
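The basic (non-convolutive) decomposition can be sketched with the classical multiplicative updates (a Python sketch on a toy rank-2 "spectrogram"; the convolutive and sparsity-constrained variants described above build on these same updates):

```python
import numpy as np

def nmf(V, r, iters=500, seed=0):
    # Lee-Seung multiplicative updates: V is approximated by W @ H with
    # all factors non-negative, so every basis contributes only additively.
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + 0.1
    H = rng.random((r, V.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update bases
    return W, H

# Toy "spectrogram": two spectral patterns active over different frames
V = np.outer([1.0, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3.0, 0, 1], [0, 0, 1, 1])
W, H = nmf(V, r=2)
print(np.allclose(W @ H, V, atol=0.1))  # True
```

Each column of `W` recovers one spectral pattern and each row of `H` its activation over time, which is the parts-based separation of foreground and background described above.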
- Principal Component Analysis or PCA whitening [19] does what its name says:
it finds the principal components of the data. It is often useful to measure
data in terms of its principal components rather than on the standard x-y axes.
The principal components are the directions with the most variance, where the
data is most spread out. A set of data points is deconstructed into its pairs
of eigenvectors and eigenvalues. An eigenvector is a direction along which the
highest variance occurs, while an eigenvalue is a number telling how much
variance there is in the data in that direction. The eigenvector with the
highest eigenvalue is therefore the principal component. PCA whitening is used
instead of ZCA whitening; it serves to decorrelate the inputs from each other,
a step that can significantly improve the quality of the input representation.
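The whitening transform can be sketched directly from the eigendecomposition of the covariance matrix (Python; `eps` is a small regularizer added to the eigenvalues to avoid division by near-zero variance):

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    # PCA whitening: rotate the centred data onto the principal axes
    # (eigenvectors of the covariance) and rescale each axis by
    # 1/sqrt(eigenvalue), so the outputs are decorrelated with unit variance.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigval, eigvec = np.linalg.eigh(cov)
    return Xc @ eigvec / np.sqrt(eigval + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])  # correlated inputs
Z = pca_whiten(X)
print(np.allclose(Z.T @ Z / len(Z), np.eye(2), atol=0.01))  # True
```

After whitening, the empirical covariance of `Z` is (up to `eps`) the identity matrix: the strong correlation between the two input dimensions has been removed.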
2.4. Related work
As stated before, algorithms are used for feature extraction in the form of clustering,
as well as for classification purposes. Furthermore, some algorithms generally stated
as supervised algorithms can be used in an unsupervised manner. For the division of
the related work, what matters is the main algorithm and the way it is used,
i.e. supervised or unsupervised. Feature extraction in the form of data exploration is thus
considered unsupervised. Conclusions will be made based on their effectiveness and
application purposes.
2.4.1. Supervised
- Ntalampiras, Potamitis and Fakotakis [20] describe acoustic surveillance of
hazardous situations. They make use of MFCCs together with additional
low-level features, such as the fundamental frequency and the audio spectrum
flatness, based on the MPEG-7 standard. Gaussian Mixture Models are used for
clustering and Hidden Markov Models for classification, using labelled data of
explosions, gunshots and screaming in a subway as training data. First, a
GMM of 19 models is constructed, after which an HMM classifies each
atypical situation. Screaming has been detected perfectly, gunshots with 93%
accuracy and explosions with only 86%. This research confirms that noisy
sounds are very difficult to classify when based on classic low-level features
and the need for a new approach is prominent.
- Cotton, Courtenay et al [21] compare spectral features to spectro-temporal
features for acoustic event detection, applied to soundtracks. The sample rate
is 12kHz. For the spectro-temporal features, they made use of convolutive
Non-negative Matrix Factorization (NMF) with the goal of separating event
features (activations) from the constant background (basis patches). Each patch
consists of 32 frames, and 20 basis patches are distinguished. For each event
type, the system seeks the combination of the basis patch and the activation
patch that contributes the most energy. For the activation patch assignment, a
sliding
maximum of each activation dimension is derived and normalized so that each
basis has a maximum activation of 1 over the entire dataset. The parallel
comparison approach uses low-level features, more specifically 25 MFCCs. The
frame spacing for the low-level features is much smaller, only 10ms instead of
250ms. The features of both techniques are classified making
use of Hidden Markov Models. The observation matrix is trained under the
assumption that each event class can be modelled by a simple Gaussian
distribution. A transition matrix is built for the stream of labels. To conclude,
the MFCC's outperform the features found by NMF for the original sound
track data set. With the addition of noise, the accuracy of MFCC decreases
significantly while NMF remains stable.
- Radhakrishnan, Divakaran and Smaragdis [22] explain the use of Gaussian
Mixture Models for the analysis of audio for surveillance applications. Usual
background examples serve as labelled input data; this research can thus
actually be classified as 'semi-supervised'. A GMM models this usual
background, and the likelihood of new arriving data under the
background model is used to flag suspicious events. In the absence of a
background model is used to flag suspicious events. In the absence of a
suspicious event, the GMM is incrementally updated. Thus in case of an
inlier, the model is updated, while an outlier can have two different outcomes:
false alarm or a potential risk. The sampling rate is 125Hz and feature vectors
consist of 12 MFCCs, which are to be modelled by a GMM. The penalty
technique used to minimize the expression is the Minimum Description
Length (MDL). This optimizes the GMM in terms of complexity: as few
factors (classes) as possible without too much loss of descriptive power. Such
a GMM is
made for each sound class (such as 'rush hour' or 'night'). Furthermore,
possible anomalies are investigated in a temporal way: the sequence of
classified normalities and anomalies is smoothed to filter out isolated
instantaneous anomalies, as they are probably noise.
2.4.2. Unsupervised
- Hao et al [23] describe parameter-free audio-motif discovery in large data
archives. The method claims to be parameter free, but its one significant
parameter is the size of the events that are sought. It assumes that this
frame length is known, so it cannot fully be seen as unsupervised
learning. A misestimation of the window length drastically changes the
results, as no overlap is used. Their research is thus most useful for the
recognition of events that look very much alike. It randomly searches for
windows and
uses the highest count of non-trivial matches as a distance measure. The
system quits when probabilities of finding a better window decrease below a
certain threshold.
- Justin Salomon and Juan Pablo Bello [17] introduce a data exploration method
for feature extraction. The technique, named the scattering transform, starts
from the Mel spectrogram. The sampling rate is 44kHz and the time window
is 370ms, which means that 16280 frames are scattered. It is phase invariant,
just as the system of this thesis. It applies a hierarchically ordered wavelet
filter bank to the signal. The first order is the octave, consisting of 8
divisions and thus comparable to the similarly sized Mel spectrogram. For each
frequency octave band, the high-frequency amplitude modulations are captured by
the second-order coefficients. Higher-order coefficients can be calculated, but
most of the signal's energy is captured by the first and second order.
Clustering for
classification is done using spherical K-means. Unlike the traditional K-means
clustering, the centroids must lie on the unit sphere.
- Another study by Salomon and Bello [26] is 'Unsupervised feature learning
for urban sound classification.' It also starts from the Mel spectrogram and
applies PCA whitening to it. The feature vectors are downsampled by
taking the mean and variance of all those captured by a certain time window.
The K-means algorithm clusters these averaged feature vectors and builds a
codebook with the K centroids as words. A Random Forest Classifier is used for
the classification process but is not further described here. The interesting
parts of this paper are the gathering of feature vectors over a certain time
window and the K-means clustering technique used to construct the codebook.
- Gomes and Batista [24] also make use of motifs, this time specifically
applied to urban sounds. To transform the raw data into useful information,
they make use of SAX, which is comparable with the Fourier Transform but needs
less space. Each clustered spectrum is represented as a letter from the
alphabet; the bigger the alphabet, the higher the resolution. A series of
letters is thus obtained, which is divided into segments of equal length using
the Piecewise
Aggregation Approximation (PAA) algorithm. Each segment is assigned its
average value, together with the amplitude. Subsequences are then
compared, and the most similar subsequences are called a motif. When they look
exactly the same, a higher resolution is applied to see if they actually are the
same. The biggest problem in this approach is again the predefined length of
the segments.
- Lee, Largmann, Pham and Ng [25] used Convolutional Deep Belief Networks
for audio classification in an unsupervised manner. The deep belief network is
a probabilistic model composed of one visible layer and many hidden layers.
Each hidden layer unit learns a statistical relationship between the units in the
lower layer; the higher layer representations thus tend to become more
complex. The hidden layers are trained one at a time, bottom-up. In this paper,
the convolutional deep belief networks are applied on unlabelled auditory data
such as speech and music. The contribution to this thesis is thus rather small,
but it is interesting as it is truly unsupervised and a successful alternative for
baseline features such as the MFCC, as the learned features correspond to
phones and phonemes.
- Cai, Lu and Hanjalic [27] developed unsupervised content discovery in
composite audio. This research can be divided into two parts: audio element
discovery and the spotting of key elements. Spectral features together with the
MFCCs add up to 29 dimensions for the feature vectors. Again, windows
gather different feature vectors, and their mean and standard deviation become
the new features. The window length is 1 second and the overlap 0.5s. The K-
means algorithm is used to cluster these mean and standard deviation feature
vectors. After clustering, a time series can be generated and smoothed to
eliminate anomalies caused by error or noise. Key elements can be spotted by
their occurrence frequency. An analogy to text is made, assuming key elements
are shorter in length and less common. One unifying importance score for key
elements is obtained, and the first K elements are taken as key elements. Now
that key elements are distinguished from background, a temporal analysis takes
place to detect which key elements occur together and which background
accompanies them; in other words, scenes are detected. If the affinity (based
on a threshold) between two key elements is low, a new scene starts; otherwise
they are seen as one. Scenes are again clustered with K-means, using the
Bayesian Information Criterion (BIC) as complexity penalty.
2.4.3. Conclusions
The main conclusion that can be drawn from the related work is that research depends
either on the extraction of classically known low-level features, or on the
availability of labelled data so that features can be derived by data
exploration. The major part of
research even combines both field knowledge for feature extraction and supervised
learning for classification. It lies in human nature to apply existing
knowledge rather than to dive into the unknown, and furthermore, this performs
quite accurately for the fields that are most attractive and have thus received
the most attention: speech and music. Ntalampiras et al [20], for example, rely
on MFCCs and have labelled data available. Their evident conclusion is that the
approach only works accurately for samples involving speech.
Radhakrishnan et al [22] also rely on MFCCs for feature extraction and have
samples of 'normal' audio available in the search for anomalies. This research
already belongs to the semi-supervised category because it is unknown what is
being looked for. Obviously, as this thesis treats urban sound signals, the
related work above does not focus on low-level features combined with
supervised learning; the listing is thus not representative of the focus of
audio analysis in general. Data exploration for feature extraction, as well as
unsupervised classification, is still in its infancy. A combination of both
seems even nonexistent, and this is why this thesis is unique: it combines data
exploration with unsupervised classification. The reason is very simple:
standard low-level features have been proven to be ineffective for
environmental audio signals, there is little to no additional information about
what types of features could be significant, there is no labelled data
available, and there is no possibility to create labels through human
supervision.
Interesting work on feature extraction is the idea of scattering by Salomon and
Bello, applied in two of their papers [17][26]. Although they start from the Mel
spectrum, it is an interesting starting point for discovering features. Cai, Lu
et al [27] also scatter the feature vectors and apply statistical parameters.
For this thesis, instead of scattering low-level features, the spectrograms
could be scattered and described by a more complex statistical model instead of
basic parameters. Classification inspiration comes from the same papers
[17][26][27] for their use of K-means, but especially the use of GMMs is
attractive for both feature extraction and classification, inspired by
Ntalampiras et al [20].
3. Proposed Model: New Approach (GMM)
3.1. Concept
The approach of this thesis is data exploration for feature extraction, combined with
unsupervised learning for classification. In chapter 4, a more classical approach is
applied, starting from spectral features and using similar techniques for classification.
The results are compared and their similarities and dissimilarities are discussed in
chapters 5 and 6.
3.2. Data Preparation
3.2.1. Missing Data
The data is provided in gunzip files per hour, containing roughly eight
spectrograms a second. There is a little deviation in the inter-arrival time of
the data, but it is negligible. In order to detect missing data, and make it
part of the system, empty files have been filled with zero values and included
in the research. The main idea is that malfunction or defects of a microphone
can become part of the research. When malfunctions and defects are integrated,
trends in them can also be revealed and alerts raised when a shift occurs. The
minute in which the microphone fails or resumes will in any case be detected as
a point anomaly, because it consists partially of data and partially of zero
values, which otherwise never occurs. With regard to contextual anomalies, the
zero values can also be useful to detect the times at which it is unusual for
microphones to fail. For example, the system might find a failure quite normal,
unless it always happens in wintertime and all of a sudden it fails during
summer. This might help discover causes of breakdown or other disturbances.
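The zero-filling idea can be sketched as follows (a Python sketch for illustration; the actual work was done in Matlab, and the default hour-matrix shape of roughly eight spectrograms per second with a hypothetical number of frequency bands is an assumption):

```python
import numpy as np

def fill_missing_hours(hourly_data, rows=8 * 3600, cols=31):
    # Hours with no recorded spectrograms become all-zero matrices of the
    # expected shape (rows: ~8 spectrograms/second over an hour; cols: the
    # number of frequency bands, a hypothetical value here), so microphone
    # failures stay part of the dataset and can be analysed as anomalies.
    return [h if h is not None and h.size > 0 else np.zeros((rows, cols))
            for h in hourly_data]

hours = [np.ones((2, 3)), None, np.empty((0, 0))]   # toy shapes for the demo
filled = fill_missing_hours(hours, rows=2, cols=3)
print([h.shape for h in filled])  # [(2, 3), (2, 3), (2, 3)]
```

Because the zero matrices have the same shape as real hours, all downstream processing runs unchanged, and the all-zero pattern itself becomes detectable.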
3.2.2. Reorganization
For ease of use, the original data is reorganized into one file per month with two
cells per hour, one containing the data matrix and one containing the time details.
3.3. Programming Language
Matlab is chosen for its efficiency and ready-made functions. There is also a lot of
external information available, which is helpful when addressing a niche field.
3.3.1. Efficiency
Throughout this project, great care has been taken with efficiency in code writing.
Basic rules have been applied, such as pre-allocation, the use of functions,
moving loop-independent variables out of loops, and the use of disk variables
(matfile) to access data without loading the whole workspace.
The first stage of this project was running on a MacBook air 11" and the runtime was
consequently a restrictive element. Matlab includes a Parallel Computing Toolbox,
which makes it possible to pool the processor cores. 'For' loops are to be replaced with 'parfor'
loops and must meet some specific demands in order to be effective. The most
difficult demand to meet is that each iteration of the loop must be independent of any
other iteration. Nested loops are thus very difficult to implement, as everything must
be written to a cell from which it can be looked up in another iteration. Parfor is
truly effective in runtime, as it doubles the execution speed. On the other hand, the
implementation is very time-consuming and an intensive thinking process; it is
thus only used for very computationally intensive tasks in this thesis.
Another technique is to cluster different servers, which could be applied with the help
of a friend's MacBook Pro 15". A cluster connects different processors and uses their
cores in parallel. Unfortunately, the used version of Matlab does not have access to
the Distributed Computing Server toolbox. However, there exists a system that can
get around the Matlab toolbox and operates just as successfully. It is called Matlab MPI,
the successor of pMatlab, and was developed by MIT. The difficulty is the setup, as
it needs a passwordless ssh connection between the two processors, which requires
bypassing the security measures that are implemented automatically. The efficiency
of Matlab MPI again depends on the complexity of the task, as the setup of the cluster
and the division of tasks can take more than an hour. For very computationally
intensive scripts, it does compensate for the setup time and is thus used accordingly.
3.4. Feature Selection
3.4.1. Failed try-out
An initial attempt was to create low-level spectral features by generating a Gaussian
Mixture Model per spectrogram. There are three major issues with this approach. First,
to generate a GMM, scatter points drawn from the distribution are needed, as shown in
Figure 6.
Figure 6: 1-Dimensional GMM
The data points of the spectrogram, however, are weighted data points rather than
scatter points, as shown in Figure 7. Matlab's GMM fitting is not suited for such
weighted curve fitting; it needs scattered data points, so the weighted data must first
be expanded, which involves significant data growth.
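The expansion can be illustrated as follows: each weighted bin is repeated in proportion to its weight before a scatter-based fitter can be applied. A minimal sketch in Python (the function name and the scale parameter are illustrative, not from the thesis code):

```python
import numpy as np

def expand_weighted_points(values, weights, scale=1):
    # Turn weighted (histogram-like) data into plain scatter points by
    # repeating each value proportionally to its integer-scaled weight.
    counts = np.round(np.asarray(weights) * scale).astype(int)
    return np.repeat(np.asarray(values, dtype=float), counts)

# A 3-bin weighted column becomes 6 scatter points:
pts = expand_weighted_points([10.0, 20.0, 30.0], [1, 2, 3])
```

Applied to a full spectrogram, this repetition is exactly the data growth the text refers to.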
Figure 7: 1-Dimensional Gaussian curve fitting
Second, in order to find the optimal number of components to describe the
spectrogram, the number of components is initially set high and then decreased,
applying the AIC penalty to determine the best number of components. The Akaike
Information Criterion (AIC) is a measure of the relative quality of statistical models
for a given set of data: given a collection of candidate models, the AIC estimates the
quality of each model relative to the others. Let L be the maximum value of the
likelihood function for the model and let k be the number of estimated parameters of
the model. Then the AIC value of the model is the following:
𝐴𝐼𝐶 = 2𝑘 − 2𝑙𝑛(𝐿)
Given a set of candidate models for the dataset, the preferred model is the one with
the lowest AIC value. Because the simulated scatter data occurs in bins, the fit is
ill-conditioned and Matlab raises an error when too many normal components are
requested. This is a problem because the error already occurs as soon as the maximum
number of components is set slightly higher than the optimum, which is exactly what
running the AIC technique requires.
To solve this problem, another GMM function, freely available on the Internet, has
been applied that bypasses the ill-conditioning error. It is the EM_GM function
written by Patrick P.C. Tsui of the Department of Electrical and Computer Engineering
at the University of Waterloo. This function, which also uses the AIC criterion, does
not encounter any hindrance in computing the GMMs, but it reveals that each histogram
needs a different number of Gaussian components to accurately fit the distribution.
The number of components can therefore not be hard coded, which makes the approach
computationally very expensive.
Third, despite all the computational effort and data reconstruction, the result is
not as precise as hoped for, as shown in Matlab graph 1.
Matlab graph 1: GMM per spectrogram
For the reasons described above, i.e. the data expansion, the ill-conditioning and
the poor fit, the idea is abandoned.
3.4.2. Gaussian Mixture Model per minute
Creating low-level spectral features by means of a descriptive model or curve fit has
proven to be expensive, both in computation time and in data size. A better approach
is to describe spectral and temporal features in one model. By scattering several
consecutive spectrograms in one single graph, relations between data points in
frequency as well as in time can be captured. This significantly reduces the data
size and, furthermore, exploits the known fact that data points close to each other
in time or frequency do not differ much; they can thus be described by a single entity.
Different time windows have been tried and a 1-minute time frame gives the most
interesting impression. Looking at Matlab graph 2, several distinct events seem to
occur, each concentrated around a mean and thus approaching a normal distribution. A
Gaussian Mixture Model could therefore describe the whole minute accurately and,
furthermore, distinguish each of these seemingly independent events.
Matlab graph 2: GMM on a 1 minute spectrogram
To give a better impression of the GMM fit, Matlab graph 3 shows the 3D
representation of the GMM, which looks promising as it reproduces the original
scatter points quite precisely.
Matlab graph 3: GMM of one scattered minute
In order to save computational power, hard coding the number of Gaussian components
is tested for feasibility. One hundred random minutes have been scattered and their
optimal number of components has been determined with the AIC criterion. For 94 of
those 100 minutes, five components come out as the optimal number, which justifies
hard coding the number of components to five, saving a lot of runtime.
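The selection step can be sketched with scikit-learn's GaussianMixture, which exposes the same AIC measure; this is an illustrative Python reconstruction, not the thesis's Matlab code, and the synthetic two-blob data merely stands in for one scattered minute.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_n_components(points, max_components=8, seed=0):
    # Fit GMMs for a range of component counts and keep the count
    # with the lowest AIC, as in the selection procedure above.
    aics = []
    for n in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=n, random_state=seed).fit(points)
        aics.append(gmm.aic(points))
    return int(np.argmin(aics)) + 1

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs as a stand-in for time-frequency scatter:
pts = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                 rng.normal(5.0, 0.3, (200, 2))])
n = best_n_components(pts)
```

Once such a survey shows that one component count dominates, that count can be fixed and the loop dropped, exactly the runtime saving described above.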
The result is that each minute, originally described by 480 spectrograms of 31 digits
each (14880 in total), is now described by a GMM of five components of five digits
each (25 in total), a data reduction by a factor of 595. Of the five parameters
describing each Gaussian component, two are allocated to the mean and three to the
covariance:
The mean value is defined by its coordinates, one on the frequency axis and one on
the amplitude axis.
The variance of a Gaussian distribution in two-dimensional space cannot be fully
characterized by a single number, nor do the variances in the x and y directions
contain all of the necessary information; a 2x2 matrix, the covariance matrix, is
needed to fully characterize the two-dimensional variation. Because the covariance of
a random variable with itself is simply that random variable's variance, each element
on the principal diagonal of the covariance matrix is the variance of one of the
random variables. The covariance matrix of each Gaussian component is thus a
symmetric 2x2 matrix and can be described by three digits: a, b and c.
[𝑎 𝑏; 𝑏 𝑐] = covariance matrix of a 2D Gaussian

𝑎 = cos²𝜃 / (2𝜎𝑥²) + sin²𝜃 / (2𝜎𝑦²)
𝑏 = −sin(2𝜃) / (4𝜎𝑥²) + sin(2𝜃) / (4𝜎𝑦²)
𝑐 = sin²𝜃 / (2𝜎𝑥²) + cos²𝜃 / (2𝜎𝑦²)
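These three formulas can be checked numerically: with the sign convention used here, [a b; b c] equals half the inverse of the rotated covariance matrix. A small Python sketch (the cross-check itself is my addition, not part of the thesis):

```python
import numpy as np

def abc_from_angle(theta, sx, sy):
    # Exponent coefficients of a rotated 2-D Gaussian,
    # f ~ exp(-(a*dx^2 + 2*b*dx*dy + c*dy^2)).
    a = np.cos(theta)**2 / (2 * sx**2) + np.sin(theta)**2 / (2 * sy**2)
    b = -np.sin(2 * theta) / (4 * sx**2) + np.sin(2 * theta) / (4 * sy**2)
    c = np.sin(theta)**2 / (2 * sx**2) + np.cos(theta)**2 / (2 * sy**2)
    return a, b, c

theta, sx, sy = 0.7, 1.5, 0.4
a, b, c = abc_from_angle(theta, sx, sy)
# Rotated covariance matrix consistent with this sign convention:
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma = R.T @ np.diag([sx**2, sy**2]) @ R
M = np.linalg.inv(Sigma) / 2  # should equal [[a, b], [b, c]]
```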
3.5. Classification
The obtained Gaussian parameters or features are the key ingredients for the feature
vectors, the input of the classifiers. Different dimensions and different
classification algorithms will be used, depending on the type of anomaly that is
sought. In chapter five, the results will be compared with each other and with the
results of the classical approach based on spectral features.
3.5.1. Point anomaly: Unusual event
A point anomaly in this project is something unusual on the time scale of a minute:
either a single unusual Gaussian component (an unusual event), or an uncommon
combination of components (an unusual minute). The single unusual Gaussian component,
assumed to reflect an underlying unusual audio event, is discussed first.
2D GMM Clustering
To understand how events behave in time, how many types of events exist and which
distribution they follow, all Gaussian components or 'events' of the entire dataset
have to be observed. Each event consists of five digits, but to give an intuitively
understandable visualization, only the first two digits, the mean values, are plotted
in Matlab graph 4. Note that the missing minutes, which were filled with zero values,
have also been forced into five Gaussian components; their mean values are thus
plotted at zero amplitude.
Matlab graph 4: Mean values of all Gaussian components
This graph is so densely filled that a third dimension, in the form of a histogram,
is needed to reveal the distribution of the scatter points. Matlab graph 5 shows that
at high frequencies, the variance of the distribution is much lower, which was not at
all visible in Matlab graph 4. At high frequencies, nearly identical events are thus
expected almost all the time, while medium to medium-low frequencies seem to include
many different possible events. Low frequencies again tend to aggregate around one
main type of event.
Matlab graph 5: Histogram of mean values of all Gaussian components
The histogram again suggests a combination of Gaussian distributions. A Gaussian
Mixture Model of these mean values is generated and shown in both 2D and 3D in
Matlab graph 6 and Matlab graph 7. The AIC criterion assigns 15 clusters.
Matlab graph 6: GMM of mean values of all Gaussian components (2D)
Matlab graph 7: GMM of mean values of all Gaussian components (3D)
5D GMM Clustering
So far, the Gaussian components have only been clustered based on their mean values.
The same GMM clustering technique will now be applied to all five dimensions of each
Gaussian component. Visually, this is hard to represent. Therefore, after the
clustering process, the mean values of the centroids are plotted in Matlab graph 8,
so they can be compared visually to the 2D clustering. The AIC criterion assigns 57
clusters.
Matlab graph 8: 2D presentation of 5D GMM clustering
5D GMM Clustering (Standardized)
The input (feature vectors) of the previous 5D GMM clustering were the original
values of the mean and covariance. Two independent measures are thus combined in one
feature vector. Because their scales differ, they won't be treated as equally
important when clustered: the features with the largest absolute range have more
influence in the clustering process than those with a smaller absolute range. Feature
scaling rescales all features to the same significance level. The two usual
feature-scaling techniques are standardization and normalization.
Standardization or Z-score normalization rescales the features so that they have a
normal distribution with 𝜇 = 0 and 𝜎 = 1. The standard scores, also called Z-scores,
are calculated as follows:
𝑧 = (𝑥 − 𝜇) / 𝜎
The mean value 𝜇 and the standard deviation 𝜎 are calculated jointly for the first
two digits or columns of all feature vectors, and jointly for the last three digits
or columns, not separately for each digit of the feature vector, as the mean values
are supposed to be on one scale and the covariances on another.
An alternative approach is normalization or min-max scaling. In this approach, the
data is scaled to a fixed range, usually 0 to 1. The cost of having this bounded
range is that the standard deviations become much smaller, which suppresses the
effect of outliers and goes against the goal of the clustering process: the search
for outliers. A min-max scaling is typically done via the following equation:
𝑋𝑛𝑜𝑟𝑚 = (𝑋 − 𝑋𝑚𝑖𝑛) / (𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛)
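The grouped standardization can be sketched in Python; this is an illustration of the column grouping described above (two mean columns on one scale, three covariance columns on another), with toy data rather than the thesis's feature vectors.

```python
import numpy as np

def zscore_groupwise(X, groups):
    # Standardize columns group by group: one mu/sigma is shared by
    # the two mean columns, another by the three covariance columns.
    Z = np.empty_like(X, dtype=float)
    for cols in groups:
        mu = X[:, cols].mean()
        sigma = X[:, cols].std()
        Z[:, cols] = (X[:, cols] - mu) / sigma
    return Z

X = np.arange(20.0).reshape(4, 5)        # 4 feature vectors of 5 digits
Z = zscore_groupwise(X, [[0, 1], [2, 3, 4]])
```

After this transformation, each column group has zero mean and unit standard deviation overall, so both scales carry equal weight in the Euclidean distance.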
Standardization or Z-score normalization is therefore the better approach for this
project. After clustering, the data is scaled back to the original scale for the
visual representation and comparison. Again, a 2D plot of the resulting clusters is
constructed, in Matlab graph 9. The AIC criterion assigns 42 clusters.
Matlab graph 9: 2D presentation of 5D GMM clustering (standardized)
2D iK-means Clustering
A very popular technique for unsupervised classification is the K-means algorithm. It
is one of the simplest clustering algorithms and works well with large datasets,
large numbers of clusters and long feature vectors. The major drawback of the
classical K-means algorithm is that the number of clusters K must be predefined. As
described in the algorithm section above, the intelligent K-means or iK-means solves
this and is applied here. It is difficult to understand what happens and how the
Gaussian components are clustered, because K-means only defines borderlines between
clusters, with no indication of the probability distribution inside each cluster.
Matlab graph 10 shows that the intelligent K-means (2D) assigns 15 clusters.
Matlab graph 10: K-means defined clusters
According to the algorithm, each cluster contains at least one value, so no empty
zones appear. It is unfortunate that every region of the graph must belong to some
cluster: some clusters are almost empty yet occupy large areas, in which data points
are easily far away from their centroid and thus seen as anomalies, even though they
may be close to the border of more densely occupied clusters and thus not far from
the overall centre of gravity. The details are discussed in the results chapter
(chapter 5). The results of the iK-means clustering are not used for classification.
5D iK-means Clustering
For the same reasons as described in the section on 2D iK-means clustering, 5D
iK-means clustering is not used for classification. The anomalies that were found and
an assessment of their quality are discussed in chapter 5.
Conclusion on clustering techniques
A detailed evaluation of the different techniques is given in section 5.1. The
conclusion is that 5D GMM clustering, not standardized, is the best clustering
technique. Therefore, 5D GMM clustering (not standardized) will be used as the basis
for each type of classification.
3.5.2. Point anomaly: Unusual minute
To detect unusual minutes, i.e. unusual combinations of Gaussian components,
statistical methods are used, based on the previous clustering of each Gaussian
component. Each audio event is now reduced to its assigned cluster number, so point
anomalies in the form of unusual events are 'erased' and do not influence this
classification. Consequently, each minute is described by 5 digits: the index numbers
of the clusters to which its components are assigned. Two different approaches can be
distinguished, based on the mutual dependence assumption of the Gaussian components:
they are assumed to be either independent of or dependent on each other, and the
anomalies are calculated by joint probability or joint correlation respectively.
Joint Probability
This classification assumes that individual audio events or Gaussian components have
no relation to each other; for example, rain has nothing to do with how much traffic
there is at that moment. The joint probability of a minute is calculated by
multiplying the frequency probabilities of the clusters occurring in that minute.
These joint probability values are assumed to be normally distributed, and a certain
confidence interval serves as a threshold for 'unusual' minutes. In other words, if
the joint probability of a minute is very low, it is flagged as an anomaly. This
technique is performed on the results of both the 2D and the 5D GMM clustering. For
the results and plots of these anomalies, see section 5.2.1.
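The computation can be sketched in a few lines of Python; the cluster labels and minute groupings below are toy illustrations, not thesis data.

```python
import numpy as np

def joint_probability(minutes, n_clusters):
    # Frequency probability of each cluster over the whole dataset.
    counts = np.bincount(np.concatenate(minutes), minlength=n_clusters)
    p = counts / counts.sum()
    # Joint probability of a minute: product of the frequency
    # probabilities of its clusters (independence assumption).
    return np.array([np.prod(p[np.asarray(m)]) for m in minutes])

minutes = [[0, 0, 1], [0, 1, 2], [2, 2, 2]]   # toy cluster indices
jp = joint_probability(minutes, 3)
# The minute with the lowest joint probability is the 'most unusual'.
```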
Joint Correlation
On the other hand, Gaussian components or audio events can be assumed to be
correlated; for example, when it rains heavily, you do not expect many human voices
in the streets. The correlation matrix of the events is calculated and, for each
minute, the correlation coefficients of every combination of clusters that occurs are
summed. This single value per minute is called the joint correlation. Again, these
values are assumed to be normally distributed, and with a certain defined threshold,
anomalies are detected. A very low joint correlation means that many negatively
correlated events occur at the same time. For the results of the unusual minutes
based on joint correlation, see section 5.2.2.
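The joint correlation can be sketched similarly; the example minutes are again illustrative (real minutes contain five components each).

```python
import numpy as np
from itertools import combinations

def joint_correlation(minutes, n_clusters):
    # Binary occurrence matrix: one row per minute, one column per cluster.
    occ = np.zeros((len(minutes), n_clusters))
    for i, m in enumerate(minutes):
        occ[i, m] = 1.0
    # Correlation between the cluster-occurrence columns.
    C = np.corrcoef(occ, rowvar=False)
    # Joint correlation of a minute: sum of the correlation
    # coefficients over every pair of clusters present in it.
    return np.array([sum(C[i, j] for i, j in combinations(sorted(set(m)), 2))
                     for m in minutes])

minutes = [[0, 1], [0, 1], [2], [0, 2]]
jc = joint_correlation(minutes, 3)
```

In this toy example the last minute combines two negatively correlated clusters and therefore gets the lowest joint correlation.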
3.5.3. Contextual anomaly
Contextual anomalies are normal events that occur in a strange context of time. Here,
time is the deciding factor, not an outlying Gaussian component. For this reason, all
five Gaussian components are assigned to their cluster and replaced by that
centroid's mean parameters. The centroids used are generated by 5D GMM clustering,
not standardized. Each feature vector thus consists of 12 digits: five times two mean
values, plus two digits that represent the hour of the day in a continuous way. The
time digits are defined as follows. Imagine a clock face that is divided into 24
hours instead of 12, as in Figure 8. A single pointer then defines the time of day
and its x and y coordinates describe that time in a continuous manner. The continuity
is important because, for example, 23h is in reality close to 00h, but numerically
these values are the furthest apart, which would produce wrong results when
clustered, because clustering uses Euclidean distance. As an example, 6h becomes
[1,0] and 18h becomes [-1,0], and so on. The time is not discretized to whole hours
but approaches continuity, as minutes and seconds are also accounted for in the
pointer.
Figure 8: 24-hour clock
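One concrete convention for this encoding (0h at the top of the dial, running clockwise, so that 6h maps to [1, 0]) can be sketched as follows; the function is an illustration, not the thesis code.

```python
import math

def clock_coordinates(hour, minute=0, second=0):
    # Map the time of day onto the unit circle of a 24-hour clock so
    # that times near midnight stay close in Euclidean distance.
    frac = (hour + minute / 60 + second / 3600) / 24
    angle = 2 * math.pi * frac
    return math.sin(angle), math.cos(angle)

x6, y6 = clock_coordinates(6)     # pointer due east: (1, 0)
x18, y18 = clock_coordinates(18)  # pointer due west: (-1, 0)
```

Under this mapping, 23h and 0h lie only a short chord apart on the unit circle, which is exactly the continuity argued for above.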
Classification is now done by 12D GMM clustering, but this time all data must be
standardized. The scale of the mean values obviously differs from that of the clock
coordinates, as those axes are freely chosen; standardization puts all values on the
same scale, making sure that the time factor remains significant. For the results of
the contextual anomalies, see section 5.3.
3.5.4. Anomaly Threshold Definition
Possible System Outcomes
In general, anomaly detectors are not perfect. Specifically, anomaly detectors
typically navigate a trade-off between two kinds of errors:
False Positives - Type 1 error: A type 1 error occurs when an anomaly detector
incorrectly rejects a benign input. A high false positive rate can significantly impair
the utility of the anomaly detector. Each false positive denies some part of the
functionality of the system to the user.
False Negatives - Type 2 error: A type 2 error occurs when an anomaly detector
incorrectly accepts a malicious input, leaving the system open to attack.
Making an anomaly detector more sensitive increases the false positive rate and
decreases the false negative rate (and vice versa). Appropriately balancing these two
rates is therefore essential to obtaining an effective anomaly detector. Anomaly
detectors are typically tuned by running the detector on a representative set of
inputs to develop an intuitive understanding of how it will operate in practice. Such
techniques are ad hoc and not guided by any theoretically well-founded framework or
analysis; they therefore provide no guarantees on the effectiveness of the anomaly
detector and no guidance on how to test it effectively.
In this thesis, no representative set of inputs is available as a basis for hard
coding a threshold. Instead, the interaction between a human supervisor and a
constantly adapting threshold should improve the accuracy of the system over time,
with the goal that the supervisor eventually only confirms true positives.
Human defined anomalies
The first step in defining the anomaly threshold is a clear understanding of the
anomalies themselves. Different types of anomalies can be distinguished:
Binary anomalies are easy to understand. The anomaly that is sought is well known and
unanimously agreed to be anomalous, for example the occurrence of cancer in a person,
or the presence of an oil reservoir at a geographic location. Based on the
consequences of both false positives and false negatives, their desired rates can be
defined, together with the associated threshold for anomaly detection. For the
detection of cancer, for example, the consequence of a wrong assignment is enormous:
the life of a person depends on it, and a false negative could be fatal. A false
positive, on the other hand, is mainly a moral mistake, but won't affect the person's
physical health. A relatively high false positive rate and a small false negative
rate are therefore acceptable. Another example is the detection of oil fields. There,
you would rather be very sure that there is in fact an oil field before huge
investments in drilling installations are made. In this case, false positives should
be low compared to false negatives, since a false positive assumes an oil field
discovery where there is none, which implies a huge financial loss.
Non-binary anomalies are harder to interpret. The anomaly that is sought is not
exactly defined, but subject to the opinion of analysts and supervisors. It is
therefore difficult to measure the false positive and false negative rates, and
consequently to define the preferred ratio. A good example is any type of anomaly
detection in high-dimensional spaces. Anomalies are then anything that differs from
normal behaviour, whatever that might be. It is unclear in advance which anomalies
are likely to be harmful and which are totally acceptable. Only by supervising and
labelling the different types of anomalies can a more consistent and general
statement about the false positive and false negative ratio be made.
For urban environments, harmful or suspicious events are assumed to be very rare.
According to the United Nations Office on Drugs and Crime [28], one murder and 113
rape cases per 100 000 inhabitants are reported in Paris, good for 2508 cases yearly.
There are no direct numbers available for pickpocketing, but various unofficial
reports name Barcelona as the city with the highest pickpocket rate in the world, at
6000 cases a year. Paris is listed as the fifth worst pickpocket city, but no
specific numbers are provided, so for this example Paris is assumed to be as bad as
Barcelona. Furthermore, not all crime cases are reported, so in total roughly 30000
cases a year are estimated. The area covered by a single microphone placed in an
urban environment is around 2500 m², obstacles taken into account. In other words,
each microphone could report 0.5 suspicious events a year. These thoughts and
calculations are far from accurate and useless from a strictly scientific point of
view, but they are interesting to keep in mind when defining a threshold for anomaly
detection. They indicate that false negatives in the sense of crime are very unlikely
to occur (assuming that the system's recognition abilities are on point). However,
crime and violence are far from the only anomalies of interest. The false positives
will be crucial in the decision on the final anomaly threshold.
For every type of anomaly sought, the feature vectors are assigned an anomaly
measure. Whether it is the probability density function for single components or the
joint probability or joint correlation for combined occurrences, each clustering
technique has its anomaly measure. The feature vectors or input entities of the
entire dataset are ordered by that measure, in ascending order: the first entry in
the list is the 'strangest' case according to the computer, and further down the
list, cases get more 'normal'. The difficulty in this project is that the anomalies
cannot be transformed back to their original audio waves, which excludes the more
intuitive acoustical supervision. Instead, the corresponding spectrograms are plotted
and subjected to visual interpretation only.
Threshold
At first, the software system produces a list of positives, defined by an initial
threshold. That list contains both true positives and false positives, and the only
way to distinguish them is human supervision. Human supervision is time consuming and
expensive, so the list to be supervised must be limited, yet contain as many true
positives as possible. So where should the initial threshold be set? This trade-off
can be expressed in terms of one single measure: cost. Hereby, an important
assumption must be made: the human supervisor is always right; his or her verdict is
assumed to be correct. For the optimal cost analysis, the parameters to set, or the
questions to be answered, are the following:
- What is the cost of a false negative?
First of all, remember that a false negative means that an anomaly occurred but did
not get detected: the anomaly was never reported to the supervisor because the system
set its threshold too narrow. It is a failure of the system and in fact the worst
possible outcome, to be avoided by all means. The cost of an undetected anomaly can
be a direct cost, such as a damage claim for a harmed person or object due to the
lack of intervention. It can also be indirect, such as a loss of confidence in the
system and thus a future loss in sales. The total cost of a false negative can be
written as the sum of the direct and indirect costs: 𝐶𝑓𝑛 = 𝐶𝑑 + 𝐶𝑖.
- What is the cost of a true negative?
A true negative is a neutral outcome: no harmful situation occurred and no
supervision had to take place. This cost could even be interpreted as a negative cost
because it provides only benefits. A revenue and profit study is, however, beyond the
scope of this thesis, and the cost of a true negative is therefore set to zero; it
does not contribute to the equation.
- What is the cost of a false positive?
Each positive, whether false or true, is subject to human supervision. A false
positive is a harmless situation detected as harmful, so the operation cost is a pure
loss because the supervision was unnecessary. The cost of a false positive can be
written as: 𝐶𝑓𝑝 = 𝐶𝑜𝑝.
- What is the cost of a true positive?
A true positive is morally the least desirable situation; however, it is exactly the
purpose of the software and a confirmation of the accuracy of the system. Operation
costs are incurred, but they are compensated by direct and indirect revenues. Direct
revenues are, for example, more expensive policies in environments with an increased
exposure to danger, while indirect revenues can come from the growing credibility of
the system. The net cost is negative because customers are willing to pay for this
outcome. It can be written as: 𝑅𝑡𝑝 = 𝐶𝑜𝑝 − 𝑅𝑑 − 𝑅𝑖.
It is easy to see that true positives and true negatives are the preferred outcomes,
while false positives and false negatives involve only costs. A failure of the system
is bad but unfortunately impossible to exclude. The two errors are negatively
correlated, so a trade-off must be made. False negatives can involve damage to
humans, so their cost is weighted by a factor H (0 < H < 1) depending on the
environment, while false positives only bring financial costs. To define the
threshold, the following equation is used:
min𝑡 (𝐹𝑁/𝐻) ∗ 𝐶𝑓𝑛 + 𝐹𝑃 ∗ 𝐶𝑓𝑝 + 𝑇𝑃 ∗ 𝑅𝑡𝑝
or
min𝑡 (𝑓𝐹𝑁(𝑡)/𝐻) ∗ 𝐶𝑓𝑛 + 𝑓𝐹𝑃(𝑡) ∗ 𝐶𝑓𝑝 + 𝑓𝑇𝑃(𝑡) ∗ 𝑅𝑡𝑝
where:
𝐹𝑁: number of False Negatives
𝐹𝑃: number of False Positives
𝑇𝑃: number of True positives
𝐻: factor for moral damage to human beings
𝐶𝑓𝑛: Cost per false negative
𝐶𝑓𝑝: Cost per false positive
𝑅𝑡𝑝: Revenue per True positive
Note that FN, FP and TP are actually functions of the threshold t, whereby 𝑓𝐹𝑃(𝑡) is
exponential while 𝑓𝑇𝑃(𝑡) and 𝑓𝐹𝑁(𝑡) are logarithmic. This equation, however, can
only be solved when all parameters except the threshold are known. In reality, they
are almost never known beforehand, so different thresholds must be tried to learn the
functions.
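The threshold selection can be sketched in Python; the candidate thresholds and the (FN, FP, TP) counts below are made-up illustrative numbers, since in practice they must be estimated from supervised feedback.

```python
def total_cost(fn, fp, tp, C_fn, C_fp, R_tp, H):
    # Cost model from the text: false negatives amplified by 1/H
    # (0 < H < 1), each false positive costs one supervision, each
    # true positive yields net revenue (R_tp is negative).
    return (fn / H) * C_fn + fp * C_fp + tp * R_tp

def best_threshold(counts, C_fn, C_fp, R_tp, H):
    # counts maps a candidate threshold t to its (FN, FP, TP) triple.
    return min(counts,
               key=lambda t: total_cost(*counts[t], C_fn, C_fp, R_tp, H))

# Hypothetical counts at three candidate thresholds:
counts = {0.01: (4, 2, 6), 0.05: (1, 10, 9), 0.10: (0, 30, 10)}
t = best_threshold(counts, C_fn=100, C_fp=5, R_tp=-20, H=0.5)
```

With these illustrative numbers, the loosest threshold wins because eliminating false negatives outweighs the extra supervision cost.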
Machine Learning Threshold
The previous equation is stationary: 𝑓𝐹𝑃(𝑡), 𝑓𝑇𝑃(𝑡) and 𝑓𝐹𝑁(𝑡) do not change over
time, and neither does the dependency on a human supervisor. In fact, in the same
manner as the system in this thesis learns what normal Gaussian components are, it
can learn from the supervisor's verdicts and study the characteristics of the
confirmed true positives and false positives. The next time a positive comes in, the
system applies a second classification based on the listed true and false positives,
and only after that second classification is the supervisor notified. This
significantly decreases the need for the supervisor and increases the accuracy of the
system. The functions now evolve towards their best performance and the equation can
be applied for an optimal anomaly recognition rate.
4. Classical approach: Spectral Features
A more classical approach based on spectral features is carried out in order to
compare end results and to increase insight into the main approach.
4.1. Feature Extraction
4.1.1. Spectral Features
Each spectrogram is described by nine characteristics, low-level spectral features:
Spectral Energy, Spectral Centroid, Spectral Spread, Spectral Roll-off Point,
Spectral Entropy, Spectral Kurtosis (or flatness), Spectral Skewness, Spectral Slope
and Noisiness. The calculation of each of them can be found in A.1, or more
specifically in the Matlab code in A.2.8. The exact choice of features is not
critical, because Principal Component Analysis will decorrelate them and recombine
them into valuable features.
4.1.2. Principal Component Analysis (PCA)
In general, Principal Component Analysis (PCA) obtains a linear subspace of the
original data. It uses an orthogonal transformation to convert a set of observations
of possibly correlated variables (features) into a set of values of linearly
uncorrelated variables called principal components. Principal components reveal the
underlying structure of the data: they are the directions of maximal variance, also
called eigenvectors. However, variance is an absolute number, not a relative one.
This means that the variance of some features will be much larger than that of
others, purely because of the scale on which the feature is calculated. If the data
is not standardized, the largest variance, and thus the largest eigenvector, will
implicitly be determined by that one feature. To avoid this scale-dependent behaviour
of PCA, the data is centered and scaled (standardized) by subtracting from each
feature its mean and dividing the result by its standard deviation.
Furthermore, PCA assumes that the underlying components or features are normally
distributed. If they are, then PCA actually acts as an independent component
analysis, since uncorrelated Gaussian variables are statistically independent.
However, if the underlying components are not normally distributed, PCA merely
generates decorrelated variables, which are not necessarily statistically
independent. In that case, other algorithms, such as Independent Component Analysis
(ICA), might be a better choice. The distribution of each of the spectral features
rejects the hypothesis of normality in the one-sample Kolmogorov-Smirnov test;
however, an approximation of normality is sufficient. The total dataset contains
around 262 million spectrograms, so processing all of them is slow. Instead, a sample
is taken and its distribution plotted in a histogram. The sample size depends on four
factors: population size, margin of error, confidence level and response distribution
(or standard deviation). The necessary sample size is then:
𝑛 = (𝑍-𝑠𝑐𝑜𝑟𝑒)² ∗ 𝑆𝑡𝑑𝐷𝑒𝑣 ∗ (1 − 𝑆𝑡𝑑𝐷𝑒𝑣) / (𝑚𝑎𝑟𝑔𝑖𝑛 𝑜𝑓 𝑒𝑟𝑟𝑜𝑟)²
Note that the population size is not included in the equation: above 20000, the
population size no longer matters and this simplified version can be used. For a 95%
confidence level, a response distribution of 0.5 and a margin of error of 5%, the
recommended sample size is 385. To build in a margin, a sample size of 1000 is used,
selected randomly without replacement. The distribution of each of the nine spectral
features is shown below.
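The sample-size calculation can be written out directly; 385 follows from Z = 1.96, a response distribution of 0.5 and a 5% margin (the function name is illustrative).

```python
import math

def cochran_sample_size(z, p, margin):
    # Simplified Cochran formula n = z^2 * p * (1 - p) / margin^2,
    # valid for large populations (above roughly 20000).
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n = cochran_sample_size(1.96, 0.5, 0.05)  # 385
```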
Most of the features are close enough to a normal distribution, which makes PCA a
valuable technique for feature reduction.
After standardizing each feature individually and applying the Principal Component
Analysis, the difficulty lies in determining the number of factors or components that
account for a large share of the overall variance. To that end, two stopping rules
have been considered to determine when to stop adding factors:
Amount of explained variance: the chosen factors should together explain at least 70%
of the total variance. To understand the meaning of "total variance" in a principal
component analysis, remember that the observed variables are standardized, which
means that each variable has a mean of zero and a variance of one. The "total
variance" in the dataset is simply the sum of the variances of these observed
variables, and each observed variable contributes one unit. This rule is not very
useful for this project, as it is not clear whether each variable makes a useful
contribution to the data.
Kaiser's stopping rule/eigenvalue-one criterion: according to this criterion, the
components with eigenvalues higher than 1 are chosen. As shown in Table 1, only two
newly composed components have an eigenvalue higher than one. However, two components
instinctively seem very few, and the third value of 0.6135 is still substantial.
Therefore, three components are selected.
3.6044  1.4307  0.6135  0.2058  0.1016  0.0416  0.0147  0.0054  0.0002
Table 1. Eigenvalues of spectral features
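The eigenvalues of Table 1 are those of the correlation matrix of the standardized features, and Kaiser's rule then reduces to counting values above 1. A minimal Python/NumPy sketch with random stand-in data (the actual spectral features are not reproduced here):

```python
import numpy as np

def pca_eigenvalues(X: np.ndarray) -> np.ndarray:
    """Eigenvalues of the correlation matrix of X (samples x features),
    sorted in descending order. After standardization each feature
    contributes one unit of variance, so the eigenvalues sum to the
    number of features."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return eigvals[::-1]

# Kaiser's rule: keep components with eigenvalue > 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))    # stand-in for the 9 spectral features
ev = pca_eigenvalues(X)
n_keep = int((ev > 1.0).sum())
print(ev.round(3), n_keep)
```

With the real, correlated features the spectrum is far more concentrated, as Table 1 shows; with independent random data all eigenvalues hover around 1.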
4.2. Classification
4.2.1. Point anomaly
Only one type of anomaly is detected with the classical approach based on spectral features. After classification, each spectrogram is labelled as normal or abnormal. The next step is to search along the temporal dimension for the number of abnormal spectrograms in a certain timeframe. To be able to compare the results to the main approach, a frame of one minute, or 480 spectrograms, is used. Again, no overlap is used, to simulate the minutes of the main approach. If the number of abnormal spectrograms in a certain frame exceeds a certain threshold, that frame or minute is classified as a point anomaly.
Gaussian Mixture Model
The feature vectors now consist of three values and form the input for classification. The distribution of the new features would be Gaussian if all underlying features were normally distributed. As the underlying features are known not to be normally distributed, their histograms are again plotted for a closer view. If they were perfectly normally distributed, a GMM with only three clusters would be needed.
The histograms reveal that the newly composed features are not perfectly normally distributed. For this reason, the number of clusters is determined by the AIC criterion, which results in seven clusters.
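The AIC-based choice of the number of mixture components can be sketched as follows. This is an illustrative scikit-learn re-implementation, not the thesis' Matlab code, and the demo data is synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_n_components(X: np.ndarray, max_components: int = 12, seed: int = 0) -> int:
    """Fit GMMs with 1..max_components clusters and return the count
    that minimizes the Akaike Information Criterion (AIC)."""
    aics = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        aics.append(gmm.aic(X))
    return int(np.argmin(aics)) + 1

# Three well-separated blobs: the AIC should favour roughly three clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 3)) for c in (0.0, 5.0, 10.0)])
print(best_n_components(X))
```

The same loop, run on the three composed features, yields the seven clusters mentioned above.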
Temporal Smoothing
The basic idea of smoothing is to filter out noise and random anomalies so that only real anomalies remain. Consider the following sequence of spectrograms, where 'n' stands for normal and 'a' for anomalous: n-n-n-n-n-n-n-n-n-n-a-n-n-n-n-n-n-n-n-n-n-... The single anomaly in such a sequence is most likely due to noise and gets labelled as normal. In this research, temporal smoothing is not used. Instead, a simpler approach is used, which counts the number of anomalies per minute. A threshold on the number of anomalies per minute defines whether the minute is anomalous or not.
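The per-minute counting rule can be sketched in a few lines. The threshold value below is a placeholder, since the text leaves its exact value open:

```python
import numpy as np

def anomalous_minutes(labels: np.ndarray, frame: int = 480, threshold: int = 10) -> np.ndarray:
    """Split a stream of per-spectrogram labels (1 = anomalous, 0 = normal)
    into non-overlapping one-minute frames of 480 spectrograms each and
    flag a minute when its anomaly count exceeds the threshold."""
    n_frames = len(labels) // frame
    counts = labels[: n_frames * frame].reshape(n_frames, frame).sum(axis=1)
    return counts > threshold

labels = np.zeros(960, dtype=int)   # two minutes of normal spectrograms
labels[480:500] = 1                 # 20 anomalous spectrograms in minute 2
print(anomalous_minutes(labels))    # only the second minute is flagged
```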
5. Results new approach
In 3.5.4 Anomaly Threshold Definition, it is stated that all feature vectors are ordered and plotted in descending order of how anomalous they are. In practice, it would be impractical, both in computational effort and in clarity of interpretation, to plot all of the feature vectors. Only the 0.05 percentile most anomalous vectors are selected as candidate anomalies. The exact number of anomalies representing that 0.05 percentile is reported below for each anomaly type.
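Selecting the 0.05 percentile can be sketched as follows (a hypothetical Python helper; the thesis works in Matlab and defines the exact ranking criterion in 3.5.4):

```python
import numpy as np

def candidate_anomalies(scores: np.ndarray, percentile: float = 0.05) -> np.ndarray:
    """Indices of the `percentile`-% lowest-density feature vectors,
    ordered most-anomalous first. `scores` are the probability density
    values assigned by the classifier: lower density = more anomalous."""
    cutoff = np.percentile(scores, percentile)
    idx = np.where(scores <= cutoff)[0]
    return idx[np.argsort(scores[idx])]

rng = np.random.default_rng(0)
scores = rng.random(10_000)
cands = candidate_anomalies(scores)
print(len(cands))   # roughly 0.05% of 10,000, i.e. around 5 candidates
```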
5.1. Unusual events
5.1.1. GMM 2D
Matlab graph 11 visualizes the 0.05 percentile anomalous Gaussian components, based on their mean values only. This 2D representation is intuitive: components with a low probability density are seen as anomalous.
Matlab graph 11: mean values of 1257 anomalies defined by 2D GMM
Anomaly types
The underlying spectrogram of each of these anomalous Gaussian components is plotted in its occurring minute, starting with the most anomalous one. The Gaussian components are also plotted, to see whether they give a proper estimation of the spectrogram and an accurate anomaly indication. Among the most anomalous Gaussian components, four types can be distinguished:
Anomaly type 1: High amplitudes at high frequency values, shown in Matlab graph 12. The Gaussian component fits the spectrogram quite accurately, so the anomaly detection is correct.
Matlab graph 12: Anomaly type 1: high values at high frequencies
Anomaly type 2: High amplitude spread at high frequency, shown in Matlab graph 13. These peaks occur mostly at a single frequency band, but sometimes over several adjacent frequency bands. They are described by a dedicated Gaussian component and correctly reported as anomalous.
Matlab graph 13: Anomaly type 2: High amplitude spread at high frequency
Anomaly type 3: High amplitudes at low frequencies, shown in Matlab graph 14. These occur mostly over two frequency bands and get a dedicated Gaussian component assigned.
Matlab graph 14: High amplitudes at low frequencies
Anomaly type 4: Low total amplitude variability, shown in Matlab graph 15. It frequently happens that one Gaussian component covers almost all frequency bands. As visible in the figure, that Gaussian component represents the spectrogram in combination with the other Gaussians, but on its own it does not fit the data points and thus does not represent a particular underlying audio event. These minutes probably need a different number of Gaussian components for accurate data fitting. They are indeed anomalous, but they raise the question of an alternative model for data representation.
Matlab graph 15: Anomaly type 4: Low total amplitude variability
Model shortcomings
Anomaly type 4 exposes an important issue at the very foundation of the model. Each minute is different and might need a different number of Gaussian components to describe it optimally. While 95% of the minutes point to 5 components as the optimal number, the remaining 5% favour a different number. Evidently, these 5% are anomalous in a way, and therefore of interest, but they are described inaccurately. The following figures show some of these anomalies.
Example 1: Event switch during the minute, shown in Matlab graph 16 and Matlab graph 17. By restricting the number of components to five, the model is forced to 'merge' different audio information into one component, reducing its accuracy.
Matlab graph 16: Event switch during minute
Matlab graph 17: Event switch during minute
Example 2: Inaccurate data fitting, shown in Matlab graph 18 and Matlab graph 19. Because the data is not perfectly normally distributed, an overlapping Gaussian component may cover some of that asymmetric skew. However, that Gaussian on its own does not represent any underlying audio event, and that minute might be better described by fewer components.
Matlab graph 18: Inaccurate data fitting
Matlab graph 19: Inaccurate data fitting
Threshold
There is no need to define a threshold for anomalies here, because this technique will not be used in the end. All techniques are first compared mutually in 5.1.4 2D vs. 5D vs. 5D standardized and with the classical approach in 6 Results Classical Approach.
5.1.2. GMM 5D
Both the mean values and the variances of the Gaussian components now count for clustering. In other words, the feature vector consists of five values instead of two. The 2D plot in Matlab graph 20 only represents the mean values of the anomalies and immediately shows that the variance has a significant impact on clustering.
Matlab graph 20: Mean values of 1257 anomalies defined by 5D GMM
Types of anomalies
Anomaly type 1: Microphone failure or resumption, shown in Matlab graph 21. The minute in which a microphone fails or resumes gives both zero values and non-zero values in one single spectrogram. The zero values strongly affect the Gaussian components, especially their variance. Note that this happens due to the hard-coded number of Gaussian components; otherwise the model might fit an additional dedicated Gaussian component with a normally behaving variance, which would not be flagged as anomalous.
Matlab graph 21: Anomaly type 1: Microphone failure or resumption
Anomaly type 2: All other anomalies, with examples in Matlab graph 22, Matlab graph 23 and Matlab graph 24. In essence, all flagged anomalies are spectrograms that cannot be captured accurately in five components. Data points with different distributions have to be squeezed into a single component, resulting in unlikely variances. Matlab graph 22 is an example of a spectrogram that needs more components to be described accurately. Matlab graph 23 would clearly split the skew of the low frequency values and the noise at the high frequency values into different components. Matlab graph 24 is less straightforward; the Gaussian component even seems redundant.
Matlab graph 22: Anomaly type 2
Matlab graph 23: Anomaly type 2
Matlab graph 24: Anomaly type 2
Model shortcomings
Each one of the anomalies represents a bad fit and points to a weakness of the system. The problem is that the anomaly does not really point to specific underlying data. It seems that most of the anomalies need more Gaussian components to describe the minute accurately. When the AIC criterion is applied to all anomalies, an average of nine comes out as the optimal number of components. The question arises whether five Gaussian components, although optimal for 94% of the minutes, is actually the right number to work with. To test this, all minutes are recalculated from the beginning, this time with nine components. For those newly described minutes, the anomalies are calculated in exactly the same way as with five components. Some of the resulting anomalies are shown below.
Matlab graph 25: Anomaly based on 9 Gaussian components
Matlab graph 26: Anomaly based on 9 Gaussian components
Matlab graph 27: Anomaly based on 9 Gaussian components
A new problem arises, namely overfitting. Because nine components are optimal for only a few minutes, all too often there are Gaussian components that take on unusual variances because of the overfitting, not because of an underlying anomaly. The idea of nine Gaussian components for all minutes is abandoned.
Reconstruction of GMM on anomalies
When supervising the anomalies based on five Gaussian components, as noted before, it is obvious that many of those minutes are actually underfitted, i.e. more Gaussian components are needed to describe the minute. This means that many minutes are probably not even anomalous, but only labelled as anomalous due to a misfit. To solve this, the anomalies assigned with five Gaussian components get a second treatment. The GMM on those minutes is reconstructed, without hard-coding the number of Gaussian components per minute. Each minute is thus assigned its optimal number of components. Each of these newly constructed components goes through the classifying process and the remaining anomalous Gaussian components are observed. Of the 1257 anomalies, only 265 remain. Those 265 anomalies are compared to the supervision applied to the 1257 anomalies. Only 28% of the 265 are defined as true positives by supervision, doing worse than the original anomaly selection and suggesting many false negatives if this technique is followed. Consider the following minute:
Matlab graph 28: anomalous Gaussian out of 5
Matlab graph 29: anomalous Gaussian out of 9
The second fit is barely better than the first, yet requires a lot of computational power to obtain. The same happens for almost all minutes. Another example is shown below.
Matlab graph 30: anomalous Gaussian out of 5
Matlab graph 31: anomalous Gaussian out of 9
The second round, with a customized number of Gaussians applied to each anomalous candidate, is abandoned as it does not give better results.
Threshold
An initial threshold is set at a probability density function (pdf) value of 7.77e-13. All 1257 results are reviewed by the study group and around 70% of them are true positives, leaving 30% false positives. Interestingly, the number of true positives does not decrease towards the end of the list, suggesting that there are many false negatives and the threshold must be broadened. After the threshold has been changed a couple of times, a regression function can be fitted and an optimal threshold can be set. From then on, all false positives join a new population, on which a similar 5D GMM classifier is applied. Every newly incoming candidate anomaly now goes through this second classifier, and only when it is still suggested to be a true positive is it supervised.
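The two-stage screening described above can be sketched as follows. This is an illustrative Python/scikit-learn sketch, not the thesis implementation: the helper names, the component count and the 10th-percentile cutoff are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

PDF_THRESHOLD = 7.77e-13   # the initial hard threshold quoted in the text

def stage1_candidates(pdf_values):
    """Stage 1: flag every feature vector whose probability density
    falls below the hard threshold."""
    return np.where(pdf_values < PDF_THRESHOLD)[0]

def fit_false_positive_model(fp_vectors, n_components=5):
    """Stage 2: fit a GMM on the supervised false positives. The cutoff
    is the 10th percentile of their log-likelihoods, so most known
    false positives would be suppressed by stage2_filter below."""
    model = GaussianMixture(n_components=n_components, random_state=0).fit(fp_vectors)
    cutoff = np.percentile(model.score_samples(fp_vectors), 10)
    return model, cutoff

def stage2_filter(model, cutoff, candidates):
    """Keep only candidates the false-positive model does NOT explain
    well, i.e. the ones that still look like genuine anomalies."""
    return candidates[model.score_samples(candidates) <= cutoff]

# Demo: a candidate inside the false-positive cloud is suppressed,
# while a remote one survives.
rng = np.random.default_rng(0)
fp = rng.normal(0.0, 1.0, (300, 5))
model, cutoff = fit_false_positive_model(fp)
cands = np.vstack([np.zeros((1, 5)), np.full((1, 5), 10.0)])
print(stage2_filter(model, cutoff, cands).shape[0])
```

As in the text, the second stage only reduces the supervisor's workload; every surviving candidate still goes to supervision.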
5.1.3. GMM 5D standardized
Matlab graph 32: Mean values of 1257 anomalies defined by 5D standardized GMM clustering
At first sight, the anomalies seem quite similar to those created by non-standardized 5D clustering. Therefore, the anomalies of the three techniques are compared not only with each other, but also with a totally different, more classical approach based on spectral features. Both the mutual comparison of techniques in 5.1.4 and the comparison with the classical approach in 6 point to non-standardized 5D clustering as the optimal technique. The standardized results are therefore not discussed further.
5.1.4. 2D vs. 5D vs. 5D standardized
The anomalies assigned by each of the three different cluster dimensions are compared below. The order is not taken into account, only the presence of the same minute among the detected anomalous minutes of each technique.
                 2D      5D      5D standardized
2D               100%    34.4%   35.9%
5D               34.4%   100%    37.4%
5D standardized  35.9%   37.4%   100%
Table 2: Comparison 2D, 5D and 5D standardized clustering
The table confirms the expectation that 5D clustering without standardization and 5D clustering with standardization overlap the most, while 2D clustering overlaps more with standardized 5D, because the standardization process enlarges the impact of the mean relative to the variance. Unfortunately, this information does not reveal which of the techniques is best. Therefore, the decision is based on visual supervision of the spectrograms and, additionally, all techniques are compared with a totally different approach: the classical approach based on spectral features. Based on supervision, the non-standardized method is preferred by professor Botteldooren, because the measures of both mean and variance are expressed in the same units, and standardizing them independently might distort their relationship. For the comparison with the classical approach, see Chapter 6. There too, non-standardized 5D clustering comes out as the best approach.
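The overlap percentages of Table 2 amount to set intersections over minute identifiers; a minimal sketch (Python, with made-up minute identifiers):

```python
def overlap_matrix(sets: dict) -> dict:
    """Pairwise overlap of anomalous-minute sets, as in Table 2: the
    fraction of minutes flagged by one technique that is also flagged
    by another. Order is ignored; only membership counts."""
    names = list(sets)
    return {
        (a, b): len(sets[a] & sets[b]) / len(sets[a])
        for a in names for b in names
    }

# Toy example: four flagged minutes each, two in common -> 0.5 overlap.
sets = {
    "2D": {1, 2, 3, 4},
    "5D": {3, 4, 5, 6},
}
print(overlap_matrix(sets)[("2D", "5D")])  # -> 0.5
```

Because all three techniques flag exactly 1257 minutes, the resulting matrix is symmetric, as in Table 2.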
5.2. Unusual minutes
5.2.1. Joint Probability
The results of the joint probability still await supervision. The method of detection is described in 3.5.2. 35.7% of the assigned anomalies also occur among the anomalous events.
Matlab graph 33: Unusual minute based on joint probability
5.2.2. Joint Correlation
Unusual minutes based on joint correlation also still await supervision and can be treated with the same supervision/threshold system as anomalous events. None of these anomalies also occur among the unusual events. Compared to the anomalies assigned by joint probability, they seem less anomalous. An example is shown below.
Matlab graph 34: Unusual minute based on joint correlation
5.3. Contextual anomalies
Contextual anomalies are defined based on their timing; for this reason, they appear visually normal. A human supervisor who knows the environment or neighbourhood should review these. It is close to impossible for me as an individual, without further data on the setting of the microphone, to decide whether such a minute is actually anomalous. As visible in the Matlab graphs, the minutes do not appear to be unusual.
Matlab graph 35: unusual minute by their time context
6. Results Classical Approach
The Classical Approach has some shortcomings compared to the new approach. First of all, the program is computationally very heavy to run, too heavy in fact. Sampling is necessary, which decreases the quality of the result. Furthermore, the technique does not capture any temporal relationships, while the new approach, with one GMM per minute, describes both spectral and temporal relationships in a single model. As this technique needs much more research to make it fully valuable, the spectrograms of its anomalies are not studied. This setup is only used to get more insight into the new approach, and of course also for the sake of learning the algorithms that are used.
6.1. Point anomalies
As stated before, only point anomalies are defined by this technique. The same percentile of 0.05% is used, which again results in 1257 anomalous minutes, just as many as in the new approach. It is now essential to compare the dates of the anomalies. Each row [year month day hour minute] of the classical approach anomalies matrix is looked up for existence in the matrix of the new approach. Keep in mind that the two techniques differ totally in their ideology: where the new approach searches for one anomalous Gaussian component per minute (out of 5), the classical approach considers the total minute, not just a fragment of it. For this reason, significant differences are expected. The table below shows the comparison results.
                      2D      5D      5D standardized
Number of anomalies   1257    1257    1257
Matching anomalies    332     435     377
Percentage matching   26.4%   34.6%   29.9%
Table 3: Comparison classical approach with main approach
5D clustering without standardization comes out best, and visually it is also the most promising method. For this reason, 5D clustering is used as the basis for unusual minutes and contextual anomalies.
7. Model Extension
This chapter is an extension of the main model, intended to give a better understanding of its flexibility and general applicability. The concept is to take all nodes (microphones) as input for the feature extraction. The classifier is then applied only to node 275, the subject node of this thesis. The same threshold should now assign far fewer anomalies and reveal the 'general' anomalies instead of only the local ones.
7.1. Feature Extraction
Due to time constraints, not all data of all nodes is processed. The following nodes are included in the extension: 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 264, 265, 266, 268, 273 and 274. Per node, the features of around six months are generated by GMMs, in exactly the same manner as described in 3.4.2: again, each minute is described by five Gaussian components, each consisting of five values, two for the mean and three for the variance.
7.2. Classification
The features of all those nodes are clustered in exactly the same manner as in 3.5. Then, the feature vectors of node 275 are classified by this new model and the results are compared. Unfortunately, the program could not finish running before the due date of the written report. Therefore, the results will be sent to the study group and included in the public defense.
8. Conclusion and Outlook
8.1. Conclusion
Acoustic information is a highly valuable source of information for environmental
context awareness. Based on the human ability to interpret sound signals in an
effortless way, this research aims to develop a system that performs as close as
possible to the performance of a human supervisor. One of the main difficulties of this
thesis is that the transformed data is not reversible to its original audio waves, which
makes acoustic supervision impossible. Another difficulty is that the data is of significant size, calling for computationally efficient techniques and creative thinking. This thesis only uses the incoming data as input, without other assumptions, labelled datasets, or metadata. This unsupervised approach has the advantage that all results originate directly from the input data; no other knowledge can be mistakenly applied. The model that this thesis proposes makes use of Gaussian Mixture Models for feature extraction. More specifically, all the spectrograms of a certain timeframe are modelled by one GMM. This approach not only allows significant data reduction, it also captures both spectral and temporal relations in a single model. The newly created features, which are the parameters of these Gaussian components, now serve to form different types of feature vectors, depending on the type of anomaly that is sought.
Looking at the results for the unusual events, the created model fits the data very accurately, and where it does not, a supervisor helps classify true positives and false positives. The latter are input for another GMM classifier that is gradually updated and, over time, not only replaces the human supervisor but also reduces the total error rate.
Besides those anomalous events, the combination of events within a timeframe can also behave unusually. These still await supervision, but can be treated with a similar approach as the unusual events.
Finally, moments that are totally normal from an acoustic point of view can still be unusual given their timing. A deeper understanding of the environment is necessary to examine these and, again, the false positives can become the input of a new classifier.
Instead of applying a hard threshold to the nomination of anomalies, a more intuitive and defensible technique is applied. The rate of false positives is initially set too high and a human supervisor assigns each anomaly a label: 'false positive' or 'true positive'. The false positives are stored and their characteristics are learned by the system. This self-enhancement, also called machine learning, gradually decreases the rate of false positives and increases the accuracy of the system.
Besides the significant data reduction, the speed of the program and the advantages of unsupervised learning, another advantage of this research is that the developed technique can be applied to any environment. The technique will learn the location's specific features and increase its accuracy over time.
8.2. Outlook
The duration of a thesis project allows only a certain depth of research, so evidently there is room for improvement.
A first possibility is to describe each minute by its optimal number of Gaussian components and then apply the classification. This is, however, a very computationally intensive approach, though it could still operate faster than real time once the system runs on newly incoming data. The training set, however, is too big for that approach. Furthermore, the level of difficulty increases significantly, as some of the feature vectors then have variable length. Although the accuracy of the system would improve, it is doubtful whether this compensates for the increased complexity and computational intensity.
Furthermore, conceptual anomalies are not addressed in this research. The GMMs only capture small-scale temporal relations, between one minute and one day. However, the evolution of the environment over time is also very important and could reveal trends, seasonality, and so on.
Another interesting topic for future work is to build taxonomies for different types of environments. Instead of using a huge training set every time this program is applied to a new environment, knowledge of similar locations could be used to converge faster and improve the level of anomaly accuracy.
A. Appendix
A.1. Features
A.1.1. Spectral Centroid (SC)
Spectral centroid represents the “balancing point”, or the midpoint of the spectral
power distribution of a signal. It is related to the brightness of a sound. The higher the
centroid, the brighter (high frequency) the sound is. A spectral centroid provides a
noise-robust estimate of how the dominant frequency of a signal changes over time.
As such, spectral centroids are an increasingly popular tool in several signal
processing applications, such as speech processing. Spectral centroid is obtained by
evaluating the “center of gravity” using the Fourier transform’s frequency and
magnitude information. The individual centroid of a spectral frame is defined as the
average frequency weighted by amplitudes, divided by the sum of the amplitudes. The
following equation shows how to compute the spectral centroid, SC_i, of the i-th audio frame:

SC_i = Σ_{k=0}^{K−1} k · |X_i(k)|² / Σ_{k=0}^{K−1} |X_i(k)|²
Here, X_i(k) is the amplitude corresponding to bin k (in the DFT spectrum of the signal) of the i-th audio frame and K is the size of the frame. The result of the spectral centroid is a bin index within the range 0 < SC < K − 1. It can be converted either to Hz (using the following equation) or to a parameter between zero and one by dividing it by the frame size K. The frequency of bin index k can be computed from the block (frame) length K and sample rate f_s by:

f(k) = (f_s / K) · k
Low results indicate significant low frequency components and insignificant high
frequency components (low brightness) and vice versa.
A.1.2. Spectral Spread (SS)
The spectral spread is the second central moment of the spectrum. It is a measure that
signifies if the power spectrum is concentrated around the centroid or spread out over
the spectrum. In order to compute it, one has to take the deviation of the spectrum
from the spectral centroid, according to the following equation:

SS_i = sqrt( Σ_{k=0}^{K−1} (k − SC_i)² · |X_i(k)|² / Σ_{k=0}^{K−1} |X_i(k)|² )
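The centroid and spread equations translate directly into code. A Python/NumPy sketch: the appendix sums over the full DFT, while here only the positive-frequency half (rfft) is used, which is equivalent for real-valued signals:

```python
import numpy as np

def spectral_centroid_spread(frame: np.ndarray):
    """Spectral centroid and spread of one audio frame, following the
    two equations above. Both are returned as (fractional) bin indices."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    k = np.arange(len(power))
    centroid = np.sum(k * power) / np.sum(power)
    spread = np.sqrt(np.sum((k - centroid) ** 2 * power) / np.sum(power))
    return centroid, spread

# A pure tone concentrates the power in one bin: the centroid sits at
# that bin and the spread is close to zero.
fs, K = 8000, 1024
t = np.arange(K) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
c, s = spectral_centroid_spread(tone)
print(c * fs / K)   # centroid converted to Hz, close to 1000
```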
A.1.3. Spectral Roll-off Point (SRP)
The spectral rolloff point is the N% percentile of the power spectral distribution,
where N is usually 85% or 95%. The spectral rolloff point is the frequency below
which N% of the magnitude distribution is concentrated. It increases with the
bandwidth of a signal. Spectral rolloff is extensively used in music information
retrieval and speech/music segmentation. The spectral rolloff point is calculated as
follows:
SRP = f(N) = (f_s / K) · N

where N is the largest bin that fulfills:

Σ_{k=0}^{N} |X(k)|² ≤ TH · Σ_{k=0}^{K−1} |X(k)|²
Here X(k) are the magnitude components, k the frequency index and f(N) the (frequency) spectral roll-off point containing (100·TH)% of the energy. TH is a threshold between 0 and 1; commonly used values are 0.85 and 0.95. This measure is useful in distinguishing voiced speech from unvoiced: unvoiced audio has a high proportion of energy in the high-frequency range of the spectrum, whereas most of the energy for voiced speech and music is contained in the lower bands.
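The roll-off inequality reduces to a cumulative sum over the power spectrum. A Python/NumPy sketch, again using the positive-frequency half of the spectrum:

```python
import numpy as np

def spectral_rolloff(frame: np.ndarray, th: float = 0.85) -> int:
    """Largest bin N whose cumulative spectral energy is still at most
    TH times the total energy, following the inequality above."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    cumulative = np.cumsum(power)
    below = np.where(cumulative <= th * cumulative[-1])[0]
    return int(below[-1]) if below.size else 0

# For white noise the spectrum is roughly flat, so the roll-off point
# sits near TH times the number of bins.
rng = np.random.default_rng(0)
noise = rng.normal(size=1024)
print(spectral_rolloff(noise))
```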
A.1.4. Spectral Entropy (SE)
Spectral entropy is computed in a similar manner to the entropy of energy, although this time the computation takes place in the frequency domain. More specifically, we first divide the spectrum of the short-term frame into L sub-bands (bins). The energy E_f of the f-th sub-band, f = 0, ..., L−1, is then normalized by the total spectral energy, that is:

n_f = E_f / Σ_{f=0}^{L−1} E_f
The entropy of the normalized spectral energies n_f is finally computed according to the equation:

H = − Σ_{f=0}^{L−1} n_f · log2(n_f)
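The two equations above combine into a short function. A Python/NumPy sketch (the number of sub-bands L is a free parameter; 8 is an illustrative choice):

```python
import numpy as np

def spectral_entropy(frame: np.ndarray, n_subbands: int = 8) -> float:
    """Spectral entropy per the two equations above: sub-band energies
    are normalized into a distribution n_f, whose Shannon entropy is
    returned (in bits)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    usable = len(power) - len(power) % n_subbands   # drop remainder bins
    sub = power[:usable].reshape(n_subbands, -1).sum(axis=1)
    n_f = sub / sub.sum()
    n_f = n_f[n_f > 0]                              # avoid log2(0)
    return float(-np.sum(n_f * np.log2(n_f)))

# A pure tone concentrates energy in one sub-band (entropy near 0),
# while white noise spreads it evenly (entropy near log2(8) = 3 bits).
fs = 8000
t = np.arange(1024) / fs
print(spectral_entropy(np.sin(2 * np.pi * 1000.0 * t)))
```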
A.1.5. Spectral Kurtosis or flatness
The kurtosis gives a measure of the flatness of a distribution around its mean value. It is computed from the 4th-order moment. The kurtosis K indicates the peakedness or flatness of the distribution: K = 3 corresponds to a normal distribution, K < 3 to a flatter distribution and K > 3 to a more peaked distribution.
A.1.6. Mel Frequency Cepstral Coefficients (MFCC)
MFCCs originate from automatic speech recognition but evolved into one of the standard techniques in most domains of audio recognition, such as environmental sound classification. They represent the timbral information (spectral envelope) of a signal. Computation of MFCCs includes conversion of the Fourier coefficients to the Mel scale. After conversion, the obtained vectors are log-transformed and decorrelated by the discrete cosine transform (DCT) in order to remove redundant information. Figure 9 shows the process of MFCC feature extraction.
Figure 9. MFCC extraction process
The first step, pre-processing, consists of pre-emphasizing, frame blocking and windowing of the signal. The aim of this step is to model small (typically 20 ms) sections of the signal (frames) that are statistically stationary. The window function, typically a Hamming window, removes edge effects. The next step takes the Discrete Fourier Transform (DFT) of each frame. Only the logarithm of the amplitude spectrum is retained: phase information is discarded because perceptual studies have shown that the amplitude of the spectrum is much more important than the phase, and the logarithm of the amplitude is taken because the perceived loudness of a signal has been found to be approximately logarithmic. After the discrete Fourier transform, the power spectrum is transformed to the Mel-frequency scale. This step smooths the spectrum and emphasizes perceptually meaningful frequencies. The Mel-frequency scale is based on the mapping between actual frequency and the pitch perceived by the human auditory system. The mapping is approximately linear below 1 kHz and logarithmic above. This is done using a filter bank consisting of triangular filters, spaced uniformly on the Mel scale. An approximate conversion between a frequency value in Hertz (f) and in mel is given by:
mel(f) = 2595 · log10(1 + f/700)
Finally, the cepstral coefficients are calculated from the mel-spectrum by taking the discrete cosine transform (DCT) of the logarithm of the mel-spectrum. This calculation is given by:

c_i = Σ_{k=0}^{K−1} log(S_k) · cos( (iπ/K) · (k − 1/2) )
where c_i is the i-th MFCC, S_k is the output of the k-th filter bank channel (i.e. the weighted sum of the power spectrum bins on that channel) and K is the number of coefficients (the number of Mel filter banks). The value used for K is usually between 20 and 40, mostly 23.
The components of MFCCs are the first few DCT coefficients that describe the coarse spectral shape. The first DCT coefficient represents the average power (energy) in the spectrum. The second coefficient approximates the broad shape of the spectrum and is related to the spectral centroid. The higher-order coefficients represent finer spectral details (e.g., pitch). In practice, the first 8-13 MFCC coefficients are used to represent the shape of the spectrum. The higher-order coefficients are ignored since they provide mostly redundant information. However, some applications require more higher-order coefficients to capture pitch and tone information. The Mel spectrum is particularly useful in machine learning tasks, because it is stable to deformation under a Euclidean norm, unlike the spectrogram. However, the averaging used to create the Mel spectrum causes a significant loss of high-frequency information unless the window size is kept small.
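The Hertz-to-mel mapping above, together with its inverse (needed to place the triangular filter edges), can be written directly; an illustrative Python sketch:

```python
import math

def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when converting uniformly spaced
    mel points back to filter-edge frequencies in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The two functions are exact inverses, and 700 Hz maps to 2595 · log10(2) ≈ 781 mel.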
A.1.7. Bark bands
Although the Mel bands are used for the Mel Frequency Cepstral Coefficients, the Bark bands are a better approximation of the human auditory system. The latter will be used for the calculation of the loudness, specific loudness, sharpness and spread. Conversion from Hz to the Bark scale:
B = 13 · arctan(f/1315.8) + 3.5 · arctan(f/7518)
where B is the frequency expressed in Bark, and f in Hertz. The linear frequency axis is converted into the Bark scale. The Bark-scale axis is then divided into 24 equally spaced bands. The energies of the bins k of the FFT corresponding to each Bark band z are then summed up to form the contribution to band z.
ampl_band_v(z) = Σ_{k=begin(z)}^{end(z)} A_k²

where A_k is the amplitude of bin k of the FFT.
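The conversion and the per-band energy summation can be sketched as follows (Python for illustration; the uniform division of the Bark axis into 24 bands follows the description above, and the helper names are assumptions):

```python
import math

def hz_to_bark(f):
    """B = 13*arctan(f/1315.8) + 3.5*arctan(f/7518)."""
    return 13.0 * math.atan(f / 1315.8) + 3.5 * math.atan(f / 7518.0)

def bark_band_energies(magnitudes, sample_rate, n_bands=24):
    """Sum squared FFT-bin amplitudes into n_bands equally spaced
    Bark bands covering 0 .. sample_rate/2."""
    nyquist = sample_rate / 2.0
    top = hz_to_bark(nyquist)
    bands = [0.0] * n_bands
    n = len(magnitudes)
    for k, a in enumerate(magnitudes):
        freq = k * nyquist / max(n - 1, 1)
        z = min(int(hz_to_bark(freq) / top * n_bands), n_bands - 1)
        bands[z] += a * a
    return bands
```

Since every bin is assigned to exactly one band, the total spectral energy is preserved by the summation.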
A.1.8. Zero Crossing Rate (ZCR)
The Zero Crossing Rate is the most common type of zero-crossing-based audio feature. It is defined as the number of time-domain zero crossings within a processing frame. It indicates the frequency at which the signal amplitude changes sign. The ZCR allows for a rough estimation of the dominant frequency and the spectral centroid. We used the following equation to compute the average zero-crossing rate:
ZCR = (1/2N) · Σ_{n=1}^{N} |sgn(x(n)) − sgn(x(n−1))|
where x is the time-domain signal, sgn is the signum function, and N is the size of the processing frame. The signum function implementation can be defined as:

sgn(x) = 1 if x ≥ 0, −1 if x < 0
One of the most attractive properties of the ZCR is that it is very fast to calculate. As it is a time-domain feature, there is no need to calculate spectra. Furthermore, a system which uses only ZCR-based features would not even need analog-to-digital conversion, but only the information on whenever the sign of the signal changes. However, the ZCR can be sensitive to noise. Though using a threshold value (level) near zero can significantly reduce the sensitivity to noise, determining an appropriate threshold level is not easy.
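The definition maps directly to code; a minimal Python sketch using the sgn convention above:

```python
def zero_crossing_rate(x):
    """ZCR = (1/2N) * sum |sgn(x[n]) - sgn(x[n-1])|,
    with sgn(v) = 1 for v >= 0 and -1 for v < 0."""
    sgn = lambda v: 1 if v >= 0 else -1
    n = len(x)
    return sum(abs(sgn(x[i]) - sgn(x[i - 1]))
               for i in range(1, n)) / (2.0 * n)
```

A frame that flips sign at every sample, e.g. [1, −1, 1, −1], gives ZCR = 0.75, while a constant frame gives 0.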
A.1.9. Spectral Flux (SF)
The Spectral Flux is a norm of the frame-to-frame spectral amplitude difference vector (here, the sum of the absolute magnitude differences). It quantifies the amount of frame-to-frame fluctuation in time, i.e., it measures the change in the shape of the power spectrum. It is computed via the energy difference between consecutive frames as follows:
SF_f = Σ_{k=0}^{K−1} | |X_f(k)| − |X_{f−1}(k)| |
where f is the index of the frame and K is the frame length. Spectral flux is an efficient feature for speech/music discrimination, since in speech the frame-to-frame spectra fluctuate more than in music, particularly in unvoiced speech.
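Following the summation above (absolute magnitude differences between consecutive frames), a minimal sketch in Python:

```python
def spectral_flux(mag_curr, mag_prev):
    """SF = sum over bins of | |X_f(k)| - |X_{f-1}(k)| |."""
    return sum(abs(a - b) for a, b in zip(mag_curr, mag_prev))
```

Identical consecutive spectra give a flux of zero; any spectral change increases it.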
A.1.10. Short Time Energy (STE)
The short-time energy is one of the energy-based audio features. Li and Zhang used it to classify audio signals. It is easy to calculate and provides a convenient representation of the amplitude variation over time. It indicates the loudness of an audio signal. The STE is a reliable indicator for silence detection. It is defined as the normalized sum of the squared time-domain samples of audio data, as shown in the following equation:
STE = (1/N) · Σ_{n=1}^{N} x(n)²
where x(n) is the value of the sample (in the time domain) and N is the total number of samples in the processing window (frame size). The STE of an audio signal may be affected by the gain value of the recording device. Usually we normalize the value of the STE to reduce this effect.
ZCR and STE are widely used in speech and music recognition applications. Speech, for example, has a high variance in ZCR and STE values, while in music these values are normally much more constant. ZCR and STE have also been used in ESR applications due to their simplicity and low computational complexity.
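The STE of a frame is a one-liner; an illustrative Python sketch matching the normalized sum above:

```python
def short_time_energy(frame):
    """STE = (1/N) * sum(x[n]^2) over the processing frame."""
    return sum(x * x for x in frame) / len(frame)
```

A silent frame yields 0, which is what makes the STE a convenient silence detector.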
A.1.11. Temporal Centroid (TC)
The Temporal Centroid is the time average over the envelope of a signal, in seconds. It is the point in time where, on average, most of the energy of the signal is located.
TC = ( Σ_{n=1}^{N} n · |x(n)|² ) / ( Σ_{n=1}^{N} |x(n)|² )
Note that the computation of temporal centroid is equivalent to that of spectral
centroid in the frequency domain.
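As an energy-weighted average of the sample index (Python sketch for illustration; converting the returned index to seconds would require dividing by the sampling rate):

```python
def temporal_centroid(x):
    """TC = sum(n * |x[n]|^2) / sum(|x[n]|^2), with n starting at 1."""
    num = sum((n + 1) * v * v for n, v in enumerate(x))
    den = sum(v * v for v in x)
    return num / den
```

For a single impulse, the centroid sits exactly at the impulse position.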
A.1.12. Energy Entropy (EE)
The short-term entropy of energy can be interpreted as a measure of abrupt changes in the energy level of an audio signal. In order to compute it, we first divide each short-term frame into K sub-frames of fixed duration. Then, for each sub-frame j, we compute its energy as for the STE and divide it by the total energy, E_shortframe, of the short-term frame.
e_j = E_shortframe_j / E_shortframe_i

where

E_shortframe_i = Σ_{k=1}^{K} E_shortframe_k
At a final step, the entropy H(i) of the sequence e_j is computed according to the equation:

H(i) = − Σ_{j=1}^{K} e_j · log2(e_j)
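Putting the three steps together (sub-frame energies, normalization, entropy), a minimal Python sketch with K sub-frames of equal length:

```python
import math

def energy_entropy(frame, k=4):
    """Entropy of the normalized sub-frame energies e_j."""
    step = len(frame) // k
    energies = [sum(v * v for v in frame[i * step:(i + 1) * step])
                for i in range(k)]
    total = sum(energies)
    h = 0.0
    for e in energies:
        if e > 0:
            p = e / total
            h -= p * math.log2(p)
    return h
```

A constant-level frame gives the maximum entropy log2(K); a frame whose energy is concentrated in one sub-frame (an abrupt burst) gives an entropy near zero.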
A.1.13. Autocorrelation (AC)
The autocorrelation domain represents the correlation of a signal with a time-shifted version of the same signal, for different time lags. It reveals repeating patterns and their periodicities in a signal and can be employed, for example, for the estimation of the fundamental frequency of a signal. This allows distinguishing between sounds that have a harmonic spectrum and a non-harmonic spectrum, e.g., between musical sounds and noise. The autocorrelation of a signal is calculated as follows:
AC = f_xx[τ] = x[τ] ∗ x[−τ] = Σ_{n=0}^{N} x(n) · x(n + τ)
where τ is the lag (discrete delay index), f_xx[τ] is the corresponding autocorrelation value, N is the length of the frame and n the sample index; when τ = 0, f_xx[τ] becomes the signal's power. Similar to the way the RMS is computed, the autocorrelation also steps through windowed portions of a signal, where each window frame's samples are multiplied with each other and then summed according to the above equation. This is repeated, where one frame is kept constant while the other, x(n + τ), is updated by shifting the input x(n) via τ.
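A direct (O(N) per lag) Python sketch of the summation above:

```python
def autocorrelation(x, lag):
    """f_xx[lag] = sum_n x[n] * x[n + lag] over one frame.
    At lag 0 this equals the frame's power (sum of squares)."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))
```

For a signal with a period of 2 samples, the autocorrelation peaks again at lag 2, which is how the fundamental period can be read off.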
A.1.14. Root Mean Square (RMS)
As with the STE, the RMS value is a measure of the energy in a signal. The RMS value is, however, defined as the square root of the average of the squared signal, as seen in the following equation:

RMS = √( (1/N) · Σ_{n=1}^{N} x(n)² )
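The RMS is the square root of the STE; an illustrative Python sketch:

```python
import math

def rms(frame):
    """RMS = sqrt((1/N) * sum(x[n]^2))."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))
```

Unlike the STE, the RMS has the same unit as the signal itself, which makes it easier to interpret as an amplitude.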
A.2. Matlab Code
A.2.1. Workspace_Generator
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% workspace generator node 275 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear; clc; rand('seed',1)
%start of data: 16-10-2014 at 08:00
year=2014;
month=10;
day=16;
hour=08;
zero=num2str(0);
streepje=('-');
maand=1;
%open all files of node 275 into the cell array filesmaand
for aantalmaanden=1:13
filesmaand=cell(1,2);
countfile=0;
while maand==aantalmaanden
countfile=countfile+1;
if day<10
day_string=num2str(day);
day_string=strcat(zero, day_string);
else
day_string=num2str(day);
end
if month<10
month_string=num2str(month);
month_string=strcat(zero, month_string);
else
month_string=num2str(month);
end
if hour<10
hour_string=num2str(hour);
hour_string=strcat(zero,hour_string);
else
hour_string=num2str(hour);
end
year_string=num2str(year);
date=strcat(year_string,streepje,month_string,streepje,day_string,str
eepje,hour_string);
init=('node_275_utc_');
format1=('.txt.gz');
format2=('.txt');
filename_zip=strcat(init,date,format1);
filename_unzip=strcat(init,date,format2);
if exist(filename_unzip,'file')
tmp=importdata(filename_unzip);
filesmaand(countfile,1)={tmp.data};
filesmaand(countfile,2)={[year month day hour]};
elseif exist(filename_zip,'file')
gunzip(filename_zip);
tmp=importdata(filename_unzip);
filesmaand(countfile,1)={tmp.data};
filesmaand(countfile,2)={[year month day hour]};
else
filesmaand(countfile,1)={zeros(28800,31)};
filesmaand(countfile,2)={[year month day hour]};
end
if hour==23
hour=0;
day=day+1;
%roll over to the next month when the day exceeds the number of
%days in the current month (eomday handles month lengths and leap years)
if day>eomday(year,month)
day=1;
month=month+1;
maand=maand+1;
if month==13
month=1;
year=year+1;
end
end
else
hour=hour+1;
end
clearvars ans2;
end
puntmat=('.mat');
filenaam=strcat(year_string,streepje,month_string,puntmat);
save(filenaam,'filesmaand','-v7.3');
end
A.2.2. Mid-level_GMM_generator
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Mid-level GMM generator: 5 components %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear;
clc;
rand('seed',1)
N_7_k=5;
N_7_maxiter=100;
timespan=480;
N_7_xx=linspace(1,31,31)';
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
%load data per month
for maand=10:22
month=maand;
N_7_featurevectorcell=cell(1,2);
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
filename=strcat(year_string,streepje,month_string);
files=matfile(filename);
filesmaand=files.filesmaand;
[filesmaand_rijen, filesmaand_kolommen]=size(filesmaand);
starttijd=1;
for h=1:filesmaand_rijen
hour_to_open=filesmaand{h,1};
[hour_to_open_rij, hour_to_open_kolom]=size(hour_to_open);
if mod((hour_to_open_rij-starttijd),timespan)==0
binarynumber=0;
elseif h==filesmaand_rijen
binarynumber=-1;
else
binarynumber=1;
end
rowsfeaturevectormatrix=N_7_k*...
(floor((hour_to_open_rij-...
(starttijd-1))/timespan)+binarynumber);
columfeaturevectormatrix=N_7_k;
N_7_featurevectormatrix=zeros(rowsfeaturevectormatrix,...
columfeaturevectormatrix);
N_7_featurevectormatrixcell=cell...
(rowsfeaturevectormatrix/N_7_k,1);
forloopaantal=floor((hour_to_open_rij-...
(starttijd-1))/timespan);
parfor r=1:forloopaantal
N_7_matrix=zeros(timespan*31,2);
for i=1:timespan
spectra=hour_to_open...
(starttijd+((r-1)*timespan)+i-1,:);
additionmatrix=[N_7_xx spectra'];
N_7_matrix((i-1)*31+1:...
((i-1)*31)+31,1:2)=additionmatrix;
end
N_7_GMModel=fitgmdist(N_7_matrix,N_7_k,'MaxIter'...
,N_7_maxiter,'RegularizationValue',0.1,'Start',...
'randSample','CovarianceType','full');
N_7_featurevectormatrixje=zeros(N_7_k,5);
for c=1:N_7_k
component_mu=N_7_GMModel.mu(c,:);
component_sigma=N_7_GMModel.Sigma(:,:,c);
N_7_featurevectormatrixje(c,:)=...
[component_mu(1,1) component_mu(1,2)...
component_sigma(1,1) component_sigma(1,2)...
component_sigma(2,2)];
end
N_7_featurevectormatrixcell(r,1)=...
{N_7_featurevectormatrixje};
end
starttijd=starttijd+forloopaantal*timespan;
%from here on, take the remainder of the matrix together with the next file
N_7_matrix=zeros(timespan*31,2);
counter2=0;
%aantalvanvorigecel=filetoopenrij-starttijd;
if binarynumber==1
for o=starttijd:hour_to_open_rij
spectra=hour_to_open(o,:);
additionmatrix=[N_7_xx spectra'];
N_7_matrix(counter2*31+1:(counter2*31)+31,1:2)...
=additionmatrix;
counter2=counter2+1;
end
aantalvannieuwecel=timespan-...
(hour_to_open_rij-starttijd+1);
starttijd=1;
hour_to_open=filesmaand{h+1,1};
for u=starttijd:aantalvannieuwecel
spectra=hour_to_open(u,:);
additionmatrix=[N_7_xx spectra'];
N_7_matrix(counter2*31+1:(counter2*31)...
+31,1:2)=additionmatrix;
counter2=counter2+1;
end
N_7_GMModel=fitgmdist(N_7_matrix,N_7_k,...
'MaxIter',N_7_maxiter,'RegularizationValue',0.1,'Start'...
,'randSample','CovarianceType','full');
N_7_featurevectormatrixje=zeros(N_7_k,5);
for c=1:N_7_k
%counter1=counter1+1;
component_mu=N_7_GMModel.mu(c,:);
component_sigma=N_7_GMModel.Sigma(:,:,c);
N_7_featurevectormatrixje(c,:)=...
[component_mu(1,1) component_mu(1,2) ...
component_sigma(1,1) component_sigma(1,2) ...
component_sigma(2,2)];
end
N_7_featurevectormatrixcell((rowsfeaturevectormatrix...
/5),1)={N_7_featurevectormatrixje};
end
for b=0:forloopaantal+binarynumber-1
N_7_featurevectormatrix((b*N_7_k)+1:(b*N_7_k)+5,:)...
=N_7_featurevectormatrixcell{b+1,1};
end
N_7_featurevectorcell(h,1)={N_7_featurevectormatrix};
N_7_featurevectorcell(h,2)={filesmaand(h,2)};
starttijd=aantalvannieuwecel+1;
end
features_string=('features');
filenaamfeature=strcat(features_string,streepje,year_string,...
streepje,month_string,puntmat);
save(filenaamfeature,'N_7_featurevectorcell','-v7.3');
end
sprintf('done')
A.2.3. Cluster Gaussian Components 5D
%%%%%%%%%%%%%%%%%%%%%%%%%
% Gaussian cluster 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%
clear
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]...
=size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
cel=N_7_featurevectorcell{k,2};
[rijblabla kolomblabla]=size(N_7_featurevectorcell{k,1});
matrixblabla=[];
for h=1:rijblabla
matrixblabla=[matrixblabla;cel{1,1}];
end
gaussiancel{counter1,2}=matrixblabla;
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,1:5);
%now fit a Gaussian mixture model to all Gaussians
N_10_kmax=60;
N_10_maxiter=100;
N_10_replicates=10;
%Create and select the best GMM with AIC and determine k
N_10_AIC=zeros(1,N_10_kmax);
N_10_GMModels=cell(1,N_10_kmax);
N_10_options=statset('MaxIter',N_10_maxiter);
for k=1:N_10_kmax
fprintf('fitting GMM with %d components\n',k);
N_10_GMModels{k}=fitgmdist(N_10_matrix,k,...
'MaxIter',N_10_maxiter,'RegularizationValue',0.1,...
'Start','randSample','CovarianceType','full');
N_10_AIC(k)=N_10_GMModels{k}.AIC;
save('N_10_GMModels.mat','N_10_GMModels','-v7.3');
end
[minAIC,numComponents]=min(N_10_AIC);
N_10_bestModel=N_10_GMModels{numComponents};
for c=1:numComponents
component_mu=N_10_bestModel.mu(c,:);
component_sigma=N_10_bestModel.Sigma(:,:,c);
N_10_featurevectormatrix(c,:)=[component_mu(1,1)...
component_mu(1,2) component_sigma(1,1)...
component_sigma(1,2) component_sigma(2,2)];
end
%plot the 3d gaussians to see if it fits
figure;
hold on
for u=1:numComponents
mu=N_10_featurevectormatrix(u,1:2);
SIGMA=[N_10_featurevectormatrix(u,3)...
N_10_featurevectormatrix(u,4);...
N_10_featurevectormatrix(u,4) N_10_featurevectormatrix(u,5)];
X = mvnrnd(mu,SIGMA,100);
gmm=fitgmdist(X,1,'MaxIter',N_10_maxiter,...
'RegularizationValue',0.1,'Start','randSample',...
'CovarianceType','full');
ezsurf(@(x1,x2)pdf(gmm,[x1 x2]),[0 31],[0 80],100);
end
hold off;
%scatter(N_10_matrix(:,1),N_10_matrix(:,2));
%figure;
%ezcontour(@(x1,x2)pdf(N_10_bestModel,[x1 x2]),[0 31],[0 80],100);
%figure;
%ezsurf(@(x1,x2)pdf(N_10_bestModel,[x1 x2]),[0 31],[0 80],100);
sprintf('done')
A.2.4. Define anomalies based on clustering
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Gaussian anomaly 5d based on pdf %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%load N_10_bestModel in gaussian5d
%save gaussiananomaly5d
%save gaussiananomalies5d
%save gaussiananomalydata5d
%save gaussiananomalydatums5d
%clear
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,streepje,...
month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen,...
featurevectorcell_kolommen]=size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
cel=N_7_featurevectorcell{k,2};
[rijblabla kolomblabla]=size(N_7_featurevectorcell{k,1});
matrixblabla=[];
for h=1:rijblabla
matrixblabla=[matrixblabla;cel{1,1}];
end
gaussiancel{counter1,2}=matrixblabla;
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,1:5);
y=pdf(N_10_bestModel,N_10_matrix);
percentielthreshold= prctile(y,0.05);
minderdan=sum(y<percentielthreshold);
%give data for each anomaly
[rijen kolommen]=size(gaussianmat);
gaussiananomaly5d=gaussianmat(:,1);
gaussiananomalydata5d=gaussianmat(:,6:9);
gaussiananomalies5d=[];
gaussiananomalydatums5d=[];
pdfen=[];
for u=1:rijen
if y(u)<percentielthreshold
gaussiananomaly5d(u)=1;
gaussiananomalies5d=[gaussiananomalies5d;...
N_10_matrix(u,:) y(u)];
gaussiananomalydatums5d=[gaussiananomalydatums5d;...
gaussiananomalydata5d(u,:) y(u) u];
pdfen=[pdfen; y(u)];
else
gaussiananomaly5d(u)=0;
gaussiananomalydata5d(u,:)=0;
end
end
pdfen=sortrows(pdfen,1);
gaussiananomalies5d=sortrows(gaussiananomalies5d,6);
gaussiananomalies5d=gaussiananomalies5d(:,1:5);
gaussiananomalydatums5d=sortrows(gaussiananomalydatums5d,5);
[pointanomalies kolomblabla]=size(gaussiananomalydatums5d);
indexgaussian5d=cluster(N_10_bestModel,N_10_matrix);
featurevectorgaussian5d=zeros(rijen,5);
for h=1:rijen
index=indexgaussian5d(h);
featurevectorgaussian5d(h,1:5)=N_10_bestModel.mu(index,1:5);
end
hold on;
scatter(gaussiananomalies5d(:,1),gaussiananomalies5d(:,2),...
[],'filled');
hold off;
A.2.5. Plot minutes of anomalous Gaussian components
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrogram abnormal gaussian 5D %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load gaussiananomaly5d.mat
% load gaussiananomalydata5d.mat
% load gaussiananomalydatums5d.mat
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=...
size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
cel=N_7_featurevectorcell{k,2};
[rijblabla kolomblabla]=size(N_7_featurevectorcell{k,1});
matrixblabla=[];
for h=1:rijblabla
matrixblabla=[matrixblabla;cel{1,1}];
end
gaussiancel{counter1,2}=matrixblabla;
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,6:9);
[rijen, kolommen]=size(gaussiananomaly5d);
%in which minute does the anomaly occur?
anomalycounter=0;
[rijendata kolomendata]=size(gaussiananomalydatums5d);
welkeminuut=zeros(rijendata,5);
for n=1:rijendata
anomalycounter=anomalycounter+1;
welkeminuut(anomalycounter,1)=gaussiananomalydatums5d(n,1);
welkeminuut(anomalycounter,2)=gaussiananomalydatums5d(n,2);
welkeminuut(anomalycounter,3)=gaussiananomalydatums5d(n,3);
welkeminuut(anomalycounter,4)=gaussiananomalydatums5d(n,4);
uur=gaussiananomalydatums5d(n,4);
hour=uur;
nogeencounter=0;
idx2=gaussiananomalydatums5d(n,6);
while hour==uur
nogeencounter=nogeencounter+1;
blabla=idx2-nogeencounter;
if blabla==0
hour=38;
else
hour=N_10_matrix(blabla,4);
end
end
welkeminuut(anomalycounter,5)=ceil(nogeencounter/5);
end
[rijenanomalies,kolommenanomalies]=size(welkeminuut);
for z=1:rijenanomalies
figure;
jaar=welkeminuut(z,1);
maand=welkeminuut(z,2);
dag=welkeminuut(z,3);
uur=welkeminuut(z,4);
minuut=welkeminuut(z,5);
jaar_string=num2str(jaar);
maand_string=num2str(maand);
if maand<10
maand_string=strcat(zero,maand_string);
end
file=strcat(jaar_string,streepje,maand_string,formatmat);
file=matfile(file);
filesmaand=file.filesmaand;
[rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
vector=[jaar maand dag uur];
countertje=1;
h=1;
blabla1=filesmaand{h,2};
while isequal(blabla1,vector)==0
h=h+1;
blabla1=filesmaand{h,2};
countertje=countertje+1;
end
uurmatrix=filesmaand{countertje,1};
xx=linspace(1,31,31);
h=figure;
[uurrij, uurkolom]=size(uurmatrix);
c = linspace(0,1,480);
if minuut==60
if uurrij<28800
counter8=0;
for g=(minuut*480)-479:uurrij
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
uurmatrix2=filesmaand{countertje+1,1};
for b=1:28800-uurrij
counter8=counter8+1;
scatter(xx,uurmatrix2(b,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
end
else
if uurrij<28799
counter8=0;
for g=(minuut*480)-479:min(uurrij,(minuut*480))
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
end
end
rijtezoeken=[jaar maand dag uur];
[ja, positieanomaly]=ismember(rijtezoeken,N_10_matrix,'rows');
positieanomaly=positieanomaly+(minuut-1)*5;
for s=1:5
mu = [gaussianmat(positieanomaly+s-1,1)...
gaussianmat(positieanomaly+s-1,2)];
Sigma = [gaussianmat(positieanomaly+s-1,3)...
gaussianmat(positieanomaly+s-1,4); gaussianmat...
(positieanomaly+s-1,4) gaussianmat(positieanomaly+s-1,5)];
x1 = 0:1:31; x2 = 0:1:80;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
mvncdf([0 0],[1 1],mu,Sigma);
contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
xlabel('frequency'); ylabel('amplitude');
line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
hold on;
end
positieanomaly=gaussiananomalydatums5d(z,6);
scatter(gaussianmat(positieanomaly,1),...
gaussianmat(positieanomaly,2),200,[0 .6 .2],'d','LineWidth',3);
caption=sprintf('anomalous gaussian, datum: %d-%d-%d %dh, minuut: %d',...
jaar,maand,dag,uur,minuut);
title(caption, 'FontSize', 15);
hold off
saveas(h,sprintf('gaussian5d_anomalies_FIG_%d.fig',z));
close all;
%if you prefer to print only the anomalous gaussian contour
% figure;
%
% mu = [gaussianmat(positieanomaly,1) gaussianmat(positieanomaly,2)];
% Sigma = [gaussianmat(positieanomaly,3) gaussianmat(positieanomaly,4);
% gaussianmat(positieanomaly,4) gaussianmat(positieanomaly,5)];
% x1 = 0:1:31; x2 = 0:1:80;
% [X1,X2] = meshgrid(x1,x2);
% F = mvnpdf([X1(:) X2(:)],mu,Sigma);
% F = reshape(F,length(x2),length(x1));
% mvncdf([0 0],[1 1],mu,Sigma);
% contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
% xlabel('frequency'); ylabel('amplitude');
% line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
% hold on;
% scatter(gaussianmat(positieanomaly,1),...
% gaussianmat(positieanomaly,2),200,[0 .6 .2],'d','LineWidth',3);
%
% caption=sprintf('anomalous gaussian, datum: %d-%d-%d %dh, minuut: %d',...
% jaar,maand,dag,uur,minuut);
% title(caption, 'FontSize', 15);
%
% hold off
% saveas(h,sprintf('onegaussian5d_anomalies_FIG_%d.fig',z));
% close all;
end
A.2.6. Cluster based on 5D iKmeans
%%%%%%%%%%%%%%%%%%%%%%%
% cluster k-means 5d %
%%%%%%%%%%%%%%%%%%%%%%%
%first put everything into one matrix
clear
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=size...
(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,1:5);
%kmeans does not work here because it would require far too large matrices
%use iK-means instead
[Centroids QtdEntitiesInCluster] = i_Kmeans...
(N_10_matrix, 1, false, 60);
%cluster vertically only
%[indexnumber Centrevector]=kmeans(N_7_featurevectormatrix(:,1),5);
% x1 = min(N_10_matrix(:,1)):0.01:max(N_10_matrix(:,1));
% x2 = min(N_10_matrix(:,2)):0.01:max(N_10_matrix(:,2));
% [x1G,x2G] = meshgrid(x1,x2);
% XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot
%
% [clusternumber Centrevector] = kmeans(N_10_matrix,58);
%
%
% idx2Region = kmeans(XGrid,58,'MaxIter',1,'Start',Centrevector(:,1:2));
%
% figure;
% gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
% [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0;...
% 0.5,0,0;0,0.5,0;0,0,0.5;...
% 0.2,0,0;0,0.2,0;0,0,0.2;...
% 0,0.8,0.8;0.8,0,0.8;0,0,0.8;...
% 0,0.35,0.35;0.35,0,0.35;0.35,0.35,0;...
% 0,0.75,0.75;0.75,0,0.75;0.75,0.75,0;...
% 0.5,0,0;0,0.5,0;0,0,0.5;...
% 0.2,0,0;0,0.2,0;0,0,0.2;...
% 0,0.8,0.8;0.8,0,0.8;0,0,0.8;...
% 0,0.35,0.35;0.35,0,0.35;0.35,0.35,0;...
% 0,0.75,0.75;0.75,0,0.75;0.75,0.75,0;...
% 0.5,0,0;0,0.5,0;0,0,0.5;...
% 0.2,0,0;0,0.2,0;0,0,0.2;...
% 0,0.8,0.8;0.8,0,0.8;0,0,0.8;...
% 0,0.35,0.35;0.35,0,0.35;0.35,0.35,0;...
% 0,0.75,0.75;0.75,0,0.75;0.75,0.75,0;...
% 0.5,0,0;0,0.5,0;0,0,0.5;...
% 0.2,0,0;0,0.2,0;0,0,0.2;...
% ],'..');
% hold on;
% plot(N_10_matrix(:,1),N_10_matrix(:,2));
% title 'Mean values of gaussian components';
% xlabel 'Frequency band mean';
% ylabel 'Amplitude mean';
% %legend('Region 1','Region 2','Region 3','Region 4','Region 5','Data','Location','Best');
% hold off;
A.2.7. Anomalies based on 5d iKmeans
%%%%%%%%%%%%%%%%%%%%%%%%%
% posterior kmeans 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%
%load Centroids5d in kmeans5d
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,...
year_string,streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=size...
(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,1:5);
[rijen kolommen]=size(gaussianmat);
distancematrix=zeros(rijen,19);
for y=1:rijen
for o=1:19
D = pdist2(Centroids(o,:),N_10_matrix(y,1:5));
distancematrix(y,o)=D;
end
end
%mindistance=zeros(rijen,1);
distancetranspose=transpose(distancematrix);
[mindistance, indexkmeans5d]=min(distancetranspose);
mindistance=transpose(mindistance);
indexkmeans5d=transpose(indexkmeans5d);
% for t=1:rijen
% mindistance(t)=min(distancematrix(t,:));
% end
percentielthreshold= prctile(mindistance,99.99);
meerdan=sum(mindistance>percentielthreshold);
kmeansanomaly5d=gaussianmat(:,1);
kmeansanomalydata5d=gaussianmat(:,6:9);
kmeansanomalies5d=[];
kmeansanomalydatums5d=[];
for u=1:rijen
if mindistance(u)>percentielthreshold
kmeansanomaly5d(u)=1;
kmeansanomalies5d=[kmeansanomalies5d;N_10_matrix(u,:)];
kmeansanomalydatums5d=[kmeansanomalydatums5d;...
kmeansanomalydata5d(u,:)];
else
kmeansanomaly5d(u)=0;
kmeansanomalydata5d(u,:)=0;
end
end
[pointanomalies kolomblabla]=size(kmeansanomalydatums5d);
featurevectorkmeans5d=zeros(rijen,5);
for h=1:rijen
index=indexkmeans5d(h);
featurevectorkmeans5d(h,:)=Centroids(index,:);
end
%plot the gmm and the anomaly points
%one colour per event (consecutive anomalies form one event)
kleurlijn=[1];
kleurcounter=1;
for k=2:pointanomalies
if isequal(kmeansanomalydatums5d(k,:),...
kmeansanomalydatums5d(k-1,:));
else
kleurcounter=kleurcounter+1;
end
kleurlijn=[kleurlijn kleurcounter];
end
figure;
% for t=1:19
% mu = N_10_bestModel.mu(t);
% Sigma = N_10_bestModel.Sigma(:,:,t);
% x1 = 0:1:31; x2 = 0:1:80;
% [X1,X2] = meshgrid(x1,x2);
% F = mvnpdf([X1(:) X2(:)],mu,Sigma);
% F = reshape(F,length(x2),length(x1));
%
% mvncdf([0 0],[1 1],mu,Sigma);
% contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
% xlabel('x'); ylabel('y');
% line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
% hold on;
% end
% hold off;
%ezcontour(@(x1,x2)pdf(N_10_bestModel,[x1 x2]),[0 31],[0 100],100);
%hold on;
scatter(kmeansanomalies5d(:,1),kmeansanomalies5d(:,2)...
,[],kleurlijn,'filled');
%hold off;
A.2.8. Clustering defined by spectral features
%%%%%%%%%%%%%%%%%%%%%%%%%
% posterior kmeans 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%
%load Centroids5d in kmeans5d
24
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string...
,streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=size...
(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
N_10_matrix=gaussianmat(:,1:5);
[rijen kolommen]=size(gaussianmat);
distancematrix=zeros(rijen,19);
for y=1:rijen
for o=1:19
D = pdist2(Centroids(o,:),N_10_matrix(y,1:5));
distancematrix(y,o)=D;
end
end
%mindistance=zeros(rijen,1);
distancetranspose=transpose(distancematrix);
[mindistance, indexkmeans5d]=min(distancetranspose);
mindistance=transpose(mindistance);
indexkmeans5d=transpose(indexkmeans5d);
% for t=1:rijen
% mindistance(t)=min(distancematrix(t,:));
% end
percentielthreshold= prctile(mindistance,99.99);
meerdan=sum(mindistance>percentielthreshold);
kmeansanomaly5d=gaussianmat(:,1);
kmeansanomalydata5d=gaussianmat(:,6:9);
kmeansanomalies5d=[];
kmeansanomalydatums5d=[];
for u=1:rijen
if mindistance(u)>percentielthreshold
kmeansanomaly5d(u)=1;
kmeansanomalies5d=[kmeansanomalies5d;N_10_matrix(u,:)];
kmeansanomalydatums5d=[kmeansanomalydatums5d;kmeansanomalydata5d(u,:)];
else
kmeansanomaly5d(u)=0;
kmeansanomalydata5d(u,:)=0;
end
end
[pointanomalies kolomblabla]=size(kmeansanomalydatums5d);
featurevectorkmeans5d=zeros(rijen,5);
for h=1:rijen
index=indexkmeans5d(h);
featurevectorkmeans5d(h,:)=Centroids(index,:);
end
%plot the gmm and the anomaly points
%one color per event (consecutive anomalies form one event)
kleurlijn=[1];
kleurcounter=1;
for k=2:pointanomalies
if ~isequal(kmeansanomalydatums5d(k,:),...
kmeansanomalydatums5d(k-1,:))
kleurcounter=kleurcounter+1;
end
kleurlijn=[kleurlijn kleurcounter];
end
figure;
% for t=1:19
% mu = N_10_bestModel.mu(t);
% Sigma = N_10_bestModel.Sigma(:,:,t);
% x1 = 0:1:31; x2 = 0:1:80;
% [X1,X2] = meshgrid(x1,x2);
% F = mvnpdf([X1(:) X2(:)],mu,Sigma);
% F = reshape(F,length(x2),length(x1));
%
% mvncdf([0 0],[1 1],mu,Sigma);
% contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
% xlabel('x'); ylabel('y');
% line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
% hold on;
% end
% hold off;
%ezcontour(@(x1,x2)pdf(N_10_bestModel,[x1 x2]),[0 31],[0 100],100);
%hold on;
scatter(kmeansanomalies5d(:,1),kmeansanomalies5d(:,2)...
,[],kleurlijn,'filled');
%hold off;
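The listing above flags a frame as anomalous when its distance to the nearest k-means centroid exceeds a high percentile of all such distances. The standalone Python sketch below illustrates that logic on made-up data (the points, the single centroid, and the 75th-percentile cut-off are hypothetical; MATLAB's prctile uses a slightly different interpolation rule):

```python
import math

def nearest_centroid_distances(points, centroids):
    # distance from each point to its closest centroid
    return [min(math.dist(p, c) for c in centroids) for p in points]

def percentile(values, q):
    # linear-interpolation percentile over the sorted values
    s = sorted(values)
    pos = (q / 100) * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (0.2, 0.1)]
centroids = [(0.0, 0.0)]
dists = nearest_centroid_distances(points, centroids)
thr = percentile(dists, 75)
anomalies = [i for i, d in enumerate(dists) if d > thr]  # indices above threshold
```

In the thesis code the same role is played by pdist2, the minimum over the 19 centroids, and prctile(mindistance,99.99).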
A.2.9. Anomalies based on spectral feature clustering
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% posterior GMM spectral features %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
PCAspectralfeatures=('PCAspectralfeatures');
streepje=('-');
formatmat=('.mat');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
year=2014;
zero=('0');
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(PCAspectralfeatures,streepje,...
year_string,streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_6_featurevectorcellpca=feature_file.N_6_featurevectorcellpca;
[featurevectorcell_rijen, featurevectorcell_kolommen]=size...
(N_6_featurevectorcellpca);
for k=1:featurevectorcell_rijen
hour_to_open=N_6_featurevectorcellpca{k,1};
data_to_open=N_6_featurevectorcellpca{k,2};
[rij kolom]=size(hour_to_open);
nieuwekolom=zeros(rij,2);
data_to_open=[data_to_open nieuwekolom];
anomalies=0;
for h=1:rij
counter1=counter1+1;
minuut=floor(h/480)+1;
data_to_open(h,5)=minuut;
anomalies=anomalies+gaussiananomaly(counter1,1);
if mod((h),480)==0
data_to_open(h,6)=anomalies;
for c=1:479
data_to_open((h-c),6)=anomalies;
end
anomalies=0;
elseif h==rij
for w=0:mod(rij,480)-1
data_to_open(h-w,6)=anomalies;
end
anomalies=0;
end
end
N_6_featurevectorcellpca{k,2}=data_to_open;
% gaussiancel{counter1,1}=N_6_featurevectorcell{k,1};
% gaussiancel{counter1,2}=N_6_featurevectorcell{k,2};
end
PCAspectralfeatures_anomalies=('PCAspectralfeatures_anomalies');
feature_file=strcat(PCAspectralfeatures_anomalies,streepje...
,year_string,streepje,month_string,formatmat);
save(feature_file,'N_6_featurevectorcellpca','-v7.3');
end
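The nested loop above aggregates the per-frame anomaly flags into per-minute counts: each minute spans 480 frames, the flags in a block are summed, and the sum is written back onto every frame of that block, with a shorter final block handled separately. A pure-Python sketch of the same bookkeeping (the block size of 4 and the flag vector are invented for illustration):

```python
def per_block_counts(flags, block=480):
    # sum flags per block and broadcast the sum to every row of the block;
    # the last block may be shorter, as at the end of an hour
    out = [0] * len(flags)
    for start in range(0, len(flags), block):
        chunk = flags[start:start + block]
        total = sum(chunk)
        for i in range(start, start + len(chunk)):
            out[i] = total
    return out

flags = [0, 1, 0, 1, 1, 0, 0]              # 7 frames, hypothetical anomaly flags
counts = per_block_counts(flags, block=4)  # full block holds 2, partial block 1
```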
A.2.10. Unusual minutes Joint Probability
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% unusual minute joint probability 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%load indexgaussian5d.mat from gaussian 5d
%save unusualminutesjointprob
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string...
,streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=...
size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
%build a new per-minute matrix of cluster indices
[aantalrijenx5 aantalkolommen]=size(indexgaussian5d);
aantalrijen=aantalrijenx5/5;
minuutmatrix=zeros(aantalrijen,10);
counterminuut=0;
for b=1:aantalrijenx5
rij=ceil(b/5);
if mod(b,5)==0
kolom=5;
else
kolom=mod(b,5);
end
minuutmatrix(rij,kolom)=indexgaussian5d(b);
if mod(b,5)==1
minuutmatrix(rij,6:9)=gaussianmat(b,6:9);
end
if b>1
if gaussianmat(b,9)==gaussianmat(b-1,9);
counterminuut=counterminuut+1;
minuutmatrix(rij,10)=ceil(counterminuut/5);
else
counterminuut=1;
minuutmatrix(rij,10)=counterminuut;
end
else
counterminuut=1;
minuutmatrix(rij,10)=counterminuut;
end
end
string=('string');
A=zeros(max(indexgaussian5d),1);
for s=1:max(indexgaussian5d)
A(s)=(sum(indexgaussian5d(:)==s)/aantalrijenx5);
end
jointprobmatrix=zeros(aantalrijen,1);
for c=1:aantalrijen
jointprob=1;
for w=1:max(indexgaussian5d)
if sum(minuutmatrix(c,:)==w)*A(w)>0
jointprob=jointprob*sum(minuutmatrix(c,:)==w)*A(w);
end
end
jointprobmatrix(c,1)=jointprob;
end
percentielthreshold= prctile(jointprobmatrix,0.1);
minderdan=sum(jointprobmatrix<percentielthreshold);
unusualminutesjointprob=zeros(minderdan,11);
anomalycountertje=0;
gaussiananomaly5d=zeros(aantalrijen,1);
for g=1:aantalrijen
if jointprobmatrix(g,:)<percentielthreshold
anomalycountertje=anomalycountertje+1;
unusualminutesjointprob(anomalycountertje,1:10)=...
minuutmatrix(g,:);
unusualminutesjointprob(anomalycountertje,11)...
=jointprobmatrix(g,:);
gaussiananomaly5d(g,1)=1; %flag the low-probability minute
else
gaussiananomaly5d(g,1)=0;
end
end
unusualminutesjointprob=sortrows(unusualminutesjointprob,11);
unusualminutesjointprob=unusualminutesjointprob(:,1:10);
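Each minute is summarised by five Gaussian-component indices; the script estimates every component's relative frequency over the whole dataset and scores a minute by multiplying, for each component it contains, the number of occurrences within that minute by the component's overall probability. Minutes assembled from rare components therefore receive a low joint score. A Python sketch with invented labels:

```python
from collections import Counter

def cluster_frequencies(labels):
    # relative frequency of each cluster index over the whole dataset
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}

def minute_joint_probability(minute_labels, freq):
    # product over distinct clusters of (count in minute) * P(cluster)
    score = 1.0
    for c, k in Counter(minute_labels).items():
        score *= k * freq.get(c, 0.0)
    return score

labels = [1, 1, 1, 1, 2, 1, 1, 2, 2, 3]  # two minutes of five labels each
freq = cluster_frequencies(labels)
scores = [minute_joint_probability(labels[i:i + 5], freq) for i in (0, 5)]
```

The second minute mixes rarer components and scores lower; the thesis then keeps the scores below the 0.1th percentile as unusual minutes.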
A.2.11. Unusual minutes Joint Correlation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% unusual minute correlation 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%load indexgaussian5d.mat from gaussian 5d
%save unusualminutesjointcorr
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=...
size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
%build a new per-minute matrix of cluster indices
[aantalrijenx5 aantalkolommen]=size(indexgaussian5d);
aantalrijen=aantalrijenx5/5;
minuutmatrix=zeros(aantalrijen,10);
counterminuut=0;
for b=1:aantalrijenx5
rij=ceil(b/5);
if mod(b,5)==0
kolom=5;
else
kolom=mod(b,5);
end
minuutmatrix(rij,kolom)=indexgaussian5d(b);
if mod(b,5)==1
minuutmatrix(rij,6:9)=gaussianmat(b,6:9);
end
if b>1
if gaussianmat(b,9)==gaussianmat(b-1,9);
counterminuut=counterminuut+1;
minuutmatrix(rij,10)=ceil(counterminuut/5);
else
counterminuut=1;
minuutmatrix(rij,10)=counterminuut;
end
else
counterminuut=1;
minuutmatrix(rij,10)=counterminuut;
end
end
%convert the per-minute matrix to binary cluster indicators
binarymatrix=zeros(aantalrijen,max(indexgaussian5d));
for a=1:aantalrijen
for d=1:max(indexgaussian5d)
som=(sum(minuutmatrix(a,:)==d));
if som>0
binarymatrix(a,d)=1;
else
binarymatrix(a,d)=0;
end
end
end
correlations=corrcoef(binarymatrix);
jointcorrelationmatrix=zeros(aantalrijen,1);
for q=1:aantalrijen
optetellen=0;
for r=1:4
for z=r+1:5
if correlations(minuutmatrix(q,z),minuutmatrix(q,r))==1
else
optetellen=optetellen+correlations(minuutmatrix(q,z)...
,minuutmatrix(q,r));
end
end
end
jointcorrelationmatrix(q,1)=optetellen;
end
%very uncommon minutes
percentielthreshold= prctile(jointcorrelationmatrix,0.1);
minderdan=sum(jointcorrelationmatrix<percentielthreshold);
unusualminutesjointcorr=zeros(minderdan,11);
anomalycountertje=0;
gaussiananomaly5d=zeros(aantalrijen,1);
for g=1:aantalrijen
if jointcorrelationmatrix(g,:)<percentielthreshold
anomalycountertje=anomalycountertje+1;
unusualminutesjointcorr(anomalycountertje,1:10)...
=minuutmatrix(g,:);
unusualminutesjointcorr(anomalycountertje,11)...
=jointcorrelationmatrix(g,:);
gaussiananomaly5d(g,1)=1;
else
gaussiananomaly5d(g,1)=0;
end
end
unusualminutesjointcorr=sortrows(unusualminutesjointcorr,11);
unusualminutesjointcorr=unusualminutesjointcorr(:,1:10);
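Here a minute is scored by summing the pairwise correlations of the components it activates, taken from the correlation matrix of the binary minute-by-component matrix; pairs whose correlation is exactly 1 (a component paired with itself) are skipped, as in the loop above. A Python sketch with a hypothetical correlation table:

```python
from itertools import combinations

def minute_correlation_score(minute_labels, corr):
    # sum correlations over all pairs of the five labels,
    # skipping self-pairs whose correlation is exactly 1
    score = 0.0
    for a, b in combinations(minute_labels, 2):
        r = corr[(min(a, b), max(a, b))]
        if r != 1.0:
            score += r
    return score

# hypothetical symmetric correlation table for three components
corr = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0,
        (1, 2): 0.8, (1, 3): -0.4, (2, 3): 0.1}
score = minute_correlation_score([1, 2, 3, 1, 2], corr)
```

Minutes mixing components that rarely co-occur collect negative terms and fall below the 0.1th-percentile threshold.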
A.2.12. Plot Unusual minutes based on Joint Probability
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrograms low joint probabilities 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load unusualminutesjointprob from unusual minutes 5d normal
% load N_10_featurevectormatrix from gaussian5d normal
% load gaussiananomalydata5d from gaussian5d normal
% print the unusualminutes
[rij kolom]=size(unusualminutesjointprob);
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=...
size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
N_10_matrix=gaussianmat(:,6:9);
for z=1:rij
jaar=unusualminutesjointprob(z,6);
maand=unusualminutesjointprob(z,7);
dag=unusualminutesjointprob(z,8);
uur=unusualminutesjointprob(z,9);
minuut=unusualminutesjointprob(z,10);
% second option to define minuut, same result
% hour=uur;
% nogeencounter=0;
% [~, idx2]=ismember(unusualminutesjointprob...
% (z,6:9),N_10_matrix,'rows');
% while hour==uur
% nogeencounter=nogeencounter+1;
% blabla=idx2-nogeencounter;
% hour=N_10_matrix(blabla,4);
% end
% minuut=ceil(nogeencounter/5);
jaar_string=num2str(jaar);
maand_string=num2str(maand);
if maand<10
maand_string=strcat(zero,maand_string);
end
file=strcat(jaar_string,streepje,maand_string,formatmat);
file=matfile(file);
filesmaand=file.filesmaand;
[rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
vector=[jaar maand dag uur];
countertje=1;
h=1;
blabla1=filesmaand{h,2};
while isequal(blabla1,vector)==0
h=h+1;
blabla1=filesmaand{h,2};
countertje=countertje+1;
end
uurmatrix=filesmaand{countertje,1};
xx=linspace(1,31,31);
h=figure;
[uurrij, uurkolom]=size(uurmatrix);
c = linspace(0,1,480);
if minuut==60
if uurrij<28800
counter8=0;
for g=(minuut*480)-479:uurrij
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
uurmatrix2=filesmaand{countertje+1,1};
for b=1:(28800-uurrij) %continue with the first rows of the next hour
counter8=counter8+1;
scatter(xx,uurmatrix2(b,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
end
else
if uurrij<28799
counter8=0;
for g=(minuut*480)-479:min(uurrij,(minuut*480))
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
end
%[~,indx]=ismember(unusualminutesjointprob(z,6:9),...
%gaussiananomalydata5d,'rows');
%positieanomaly=indx+minuut-1;
%positieanomaly=positiematrix(z);
hold on;
for w=1:5
rijtezoeken=[jaar maand dag uur];
[ja, positieanomaly]=ismember(rijtezoeken,N_10_matrix,...
'rows');
positieanomaly=positieanomaly+(minuut-1)*5;
idx=indexgaussian5d(positieanomaly+w-1);
mu=[N_10_featurevectormatrix(idx,1)...
N_10_featurevectormatrix(idx,2)];
Sigma=[N_10_featurevectormatrix(idx,3)...
N_10_featurevectormatrix(idx,4);...
N_10_featurevectormatrix(idx,4)...
N_10_featurevectormatrix(idx,5)];
%mu = [N_10_featurevectormatrix(unusualminutesjointprob(z,w),1)...
% N_10_featurevectormatrix(unusualminutesjointprob(z,w),2)];
%Sigma = [N_10_featurevectormatrix(unusualminutesjointprob(z,w),3)...
% N_10_featurevectormatrix(unusualminutesjointprob(z,w),4);...
% N_10_featurevectormatrix(unusualminutesjointprob(z,w),4)...
% N_10_featurevectormatrix(unusualminutesjointprob(z,w),5)];
x1 = 0:1:31; x2 = 0:1:80;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
mvncdf([0 0],[1 1],mu,Sigma);
contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
xlabel('frequency'); ylabel('amplitude');
line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
hold on;
scatter(mu(1), mu(2),200,[0 .6 .2],'d','LineWidth',3);
end
% alternative method to find the minute
% for h=0:4
%
% rijtezoeken=[jaar maand dag uur];
% [ja, positieanomaly]...
% =ismember(rijtezoeken,N_10_matrix,'rows');
% positieanomaly=positieanomaly+(minuut-1)*5;
% mu = [gaussianmat(positieanomaly+h,1)...
% gaussianmat(positieanomaly+h,2)];
% Sigma = [gaussianmat(positieanomaly+h,3)...
% gaussianmat(positieanomaly+h,4);...
% gaussianmat(positieanomaly+h,4)...
% gaussianmat(positieanomaly+h,5)];
% x1 = 0:1:31; x2 = 0:1:80;
% [X1,X2] = meshgrid(x1,x2);
% F = mvnpdf([X1(:) X2(:)],mu,Sigma);
% F = reshape(F,length(x2),length(x1));
%
% mvncdf([0 0],[1 1],mu,Sigma);
% contour(x1,x2,F,[.0001 .001 .01 .05:.1:...
% .95 .99 .999 .9999]);
% xlabel('frequency'); ylabel('amplitude');
% line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
% hold on;
% scatter(mu(1), mu(2),200,[0 .6 .2],'d','LineWidth',3);
%
% end
hold off
caption=sprintf('low joint-probability minute, datum: %d-%d-%d %dh, minuut:%d',jaar,maand,dag,uur,minuut);
title(caption, 'FontSize', 15);
saveas(h,sprintf('unusualminute_jointprob_5d_FIG_%d.fig',z));
close all;
end
% minute plotted with its proper gaussian components instead of cluster centroids
% %for z=1:rij
% for z=1:1
% jaar=unusualminutesjointprob(z,6);
% maand=unusualminutesjointprob(z,7);
% dag=unusualminutesjointprob(z,8);
% uur=unusualminutesjointprob(z,9);
% minuut=unusualminutesjointprob(z,10);
% % hour=uur;
% % nogeencounter=0;
% % [~, idx2]=ismember...
% % (unusualminutesjointprob(z,6:9),N_10_matrix,'rows');
% % while hour==uur
% % nogeencounter=nogeencounter+1;
% % blabla=idx2-nogeencounter;
% % hour=N_10_matrix(blabla,4);
% % end
% % minuut=ceil(nogeencounter/5);
% jaar_string=num2str(jaar);
% maand_string=num2str(maand);
% if maand<10
% maand_string=strcat(zero,maand_string);
% end
%
%
%
% file=strcat(jaar_string,streepje,maand_string,formatmat);
% file=matfile(file);
% filesmaand=file.filesmaand;
%
% [rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
% vector=[jaar maand dag uur];
% countertje=1;
% h=1;
% blabla1=filesmaand{h,2};
% while isequal(blabla1,vector)==0
% h=h+1;
% blabla1=filesmaand{h,2};
% countertje=countertje+1;
% end
% uurmatrix=filesmaand{countertje,1};
% xx=linspace(1,31,31);
% h=figure;
%
% [uurrij, uurkolom]=size(uurmatrix);
% c = linspace(0,1,480);
% if minuut==60
% if uurrij<28800
% counter8=0;
% for g=(minuut*480)-479:uurrij
% counter8=counter8+1;
% scatter(xx,uurmatrix(g,:),[],...
% [(1-c(counter8)) 0 c(counter8)]);
% hold on
% end
% uurmatrix2=filesmaand{countertje+1,1};
% for b=28800-uurrij
% counter8=counter8+1;
% scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
% 0 c(counter8)]);
% hold on
% end
% else
%
% counter8=0;
% for g=(minuut*480)-479:(minuut*480)
% counter8=counter8+1;
% scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
% 0 c(counter8)]);
% hold on
% end
% end
% else
% if uurrij<28799
% counter8=0;
% for g=(minuut*480)-479:min(uurrij,(minuut*480))
% counter8=counter8+1;
% scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
% 0 c(counter8)]);
% hold on
% end
% else
% counter8=0;
% for g=(minuut*480)-479:(minuut*480)
% counter8=counter8+1;
% scatter(xx,uurmatrix(g,:),[],[(1-c(counter8)) ...
% 0 c(counter8)]);
% hold on
% end
% end
% end
%
% %[~,indx]=ismember(unusualminutesjointprob(z,6:9),...
% gaussiananomalydata5d,'rows');
% %positieanomaly=indx+minuut-1;
% %positieanomaly=positiematrix(z);
% hold on;
%
%
% for s=1:5
%
% rijtezoeken=[jaar maand dag uur];
% [ja, positieanomaly]=ismember(rijtezoeken,N_10_matrix,...
% 'rows');
% positieanomaly=positieanomaly+(minuut-1)*5;
% mu = [gaussianmat(positieanomaly+s-1,1)...
% gaussianmat(positieanomaly+s-1,2)];
% Sigma = [gaussianmat(positieanomaly+s-1,3)...
% gaussianmat(positieanomaly+s-1,4);...
% gaussianmat(positieanomaly+s-1,4)...
% gaussianmat(positieanomaly+s-1,5)];
% x1 = 0:1:31; x2 = 0:1:80;
% [X1,X2] = meshgrid(x1,x2);
% F = mvnpdf([X1(:) X2(:)],mu,Sigma);
% F = reshape(F,length(x2),length(x1));
%
% mvncdf([0 0],[1 1],mu,Sigma);
% contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
% xlabel('frequency'); ylabel('amplitude');
% line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
% hold on;
% scatter(mu(1), mu(2),200,[0 .6 .2],'d','LineWidth',3);
%
% end
%
% hold off
% caption=sprintf('low joint-probability minute, datum: %d-%d-%d %dh, minuut:%d',jaar,maand,dag,uur,minuut);
% title(caption, 'FontSize', 15);
%
%
% saveas(h,sprintf('unusualminuteowngaussian_jointprob_5d_FIG_%d.fig',z));
% close all;
% end
A.2.13. Plot Unusual minutes based on Joint Correlation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrograms low joint correlations 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load unusualminutesjointcorr from unusual minutes 5d normal
% load N_10_featurevectormatrix from gaussian5d normal
% load gaussiananomalydata5d from gaussian5d normal
[rij kolom]=size(unusualminutesjointcorr);
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
%print the unusualminutes
year=2014;
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=10:22
month=maand;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]...
=size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
N_10_matrix=gaussianmat(:,6:9);
for z=1:rij
jaar=unusualminutesjointcorr(z,6);
maand=unusualminutesjointcorr(z,7);
dag=unusualminutesjointcorr(z,8);
uur=unusualminutesjointcorr(z,9);
%minuut=unusualminutesjointcorr(z,10);
hour=uur;
nogeencounter=0;
[~, idx2]=...
ismember(unusualminutesjointcorr(z,6:9),N_10_matrix,'rows');
while hour==uur
nogeencounter=nogeencounter+1;
blabla=idx2-nogeencounter;
hour=N_10_matrix(blabla,4);
end
minuut=ceil(nogeencounter/5);
jaar_string=num2str(jaar);
maand_string=num2str(maand);
if maand<10
maand_string=strcat(zero,maand_string);
end
file=strcat(jaar_string,streepje,maand_string,formatmat);
file=matfile(file);
filesmaand=file.filesmaand;
[rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
vector=[jaar maand dag uur];
countertje=1;
h=1;
blabla1=filesmaand{h,2};
while isequal(blabla1,vector)==0
h=h+1;
blabla1=filesmaand{h,2};
countertje=countertje+1;
end
uurmatrix=filesmaand{countertje,1};
xx=linspace(1,31,31);
h=figure;
[uurrij, uurkolom]=size(uurmatrix);
c = linspace(0,1,480);
if minuut==60
if uurrij<28800
counter8=0;
for g=(minuut*480)-479:uurrij
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
uurmatrix2=filesmaand{countertje+1,1};
for b=1:(28800-uurrij) %continue with the first rows of the next hour
counter8=counter8+1;
scatter(xx,uurmatrix2(b,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
else
if uurrij<28799
counter8=0;
for g=(minuut*480)-479:min(uurrij,(minuut*480))
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
end
%[~,indx]=ismember(unusualminutesjointprob(z,6:9)...
%,gaussiananomalydata5d,'rows');
%positieanomaly=indx+minuut-1;
%positieanomaly=positiematrix(z);
hold on;
for w=1:5
mu = [N_10_featurevectormatrix...
(unusualminutesjointcorr(z,w),1) ...
N_10_featurevectormatrix(unusualminutesjointcorr(z,w),2)];
Sigma = [N_10_featurevectormatrix...
(unusualminutesjointcorr(z,w),3) N_10_featurevectormatrix...
(unusualminutesjointcorr(z,w),4); N_10_featurevectormatrix...
(unusualminutesjointcorr(z,w),4) N_10_featurevectormatrix...
(unusualminutesjointcorr(z,w),5)];
x1 = 0:1:31; x2 = 0:1:80;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
mvncdf([0 0],[1 1],mu,Sigma);
contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
xlabel('frequency'); ylabel('amplitude');
line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
hold on;
scatter(mu(1), mu(2),200,[0 .6 .2],'d','LineWidth',3);
end
caption=sprintf('low joint-correlation minute, datum: %d-%d-%d %dh, minuut:%d',jaar,maand,dag,uur,minuut);
title(caption, 'FontSize', 15);
hold off
saveas(h,sprintf('unusualminutes_jointcorr_5d_FIG_%d.fig',z));
close all;
end
%most usual minutes
percentielthreshold= prctile(jointcorrelationmatrix,99.99);
meerdan=sum(jointcorrelationmatrix>percentielthreshold);
unusualminutesjointcorr=zeros(meerdan,10);
anomalycountertje=0;
for g=1:aantalrijen
if jointcorrelationmatrix(g,:)>percentielthreshold
anomalycountertje=anomalycountertje+1;
unusualminutesjointcorr(anomalycountertje,:)...
=minuutmatrix(g,:);
end
end
%print the usualminutes
for z=1:meerdan
jaar=unusualminutesjointcorr(z,6);
maand=unusualminutesjointcorr(z,7);
dag=unusualminutesjointcorr(z,8);
uur=unusualminutesjointcorr(z,9);
minuut=unusualminutesjointcorr(z,10);
jaar_string=num2str(jaar);
maand_string=num2str(maand);
if maand<10
maand_string=strcat(zero,maand_string);
end
file=strcat(jaar_string,streepje,maand_string,formatmat);
file=matfile(file);
filesmaand=file.filesmaand;
[rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
vector=[jaar maand dag uur];
countertje=1;
h=1;
blabla1=filesmaand{h,2};
while isequal(blabla1,vector)==0
h=h+1;
blabla1=filesmaand{h,2};
countertje=countertje+1;
end
uurmatrix=filesmaand{countertje,1};
xx=linspace(1,31,31);
h=figure;
[uurrij, uurkolom]=size(uurmatrix);
c = linspace(0,1,480);
if minuut==60
if uurrij<28800
counter8=0;
for g=(minuut*480)-479:uurrij
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],[(1-c(counter8))...
0 c(counter8)]);
hold on
end
uurmatrix2=filesmaand{countertje+1,1};
for b=1:(28800-uurrij) %continue with the first rows of the next hour
counter8=counter8+1;
scatter(xx,uurmatrix2(b,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
else
if uurrij<28799
counter8=0;
for g=(minuut*480)-479:min(uurrij,(minuut*480))
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
end
%positieanomaly=positiematrix(z);
%mu = [gaussianmat(positieanomaly,1)...
% gaussianmat(positieanomaly,2)];
%Sigma = [gaussianmat(positieanomaly,3)...
% gaussianmat(positieanomaly,4);...
% gaussianmat(positieanomaly,4) gaussianmat(positieanomaly,5)];
%x1 = 0:1:31; x2 = 0:1:80;
%[X1,X2] = meshgrid(x1,x2);
%F = mvnpdf([X1(:) X2(:)],mu,Sigma);
%F = reshape(F,length(x2),length(x1));
%mvncdf([0 0],[1 1],mu,Sigma);
%contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
%xlabel('frequency'); ylabel('amplitude');
%line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
%scatter(gaussianmat(positieanomaly,1),gaussianmat(positieanomaly,2),200,[0 .6 .2],'d','LineWidth',3);
caption=sprintf('high minute correlation, datum: %d-%d-%d %dh, minuut:%d',jaar,maand,dag,uur,minuut);
title(caption, 'FontSize', 15);
hold off
saveas(h,sprintf('highcorrelation_2d_FIG_%d.fig',z));
close all;
end
A.2.14. Cluster Contextual feature vectors
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% contextual anomalies based on 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%load indexgaussian5d.mat from gaussian 5d normal
%load N_10_featurevectormatrix
%save N_12_bestModel
%save matrix_to_cluster_standardized
%save matrix_to_cluster
%save minuutmatrix
%save minuutmatrix3
%save N_12_featurevectormatrix
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat...
(features,streepje,year_string,...
streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=...
size(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
%build a new per-minute feature vector matrix (12 values)
[aantalrijenx5 aantalkolommen]=size(indexgaussian5d);
aantalrijen=aantalrijenx5/5;
minuutmatrix=zeros(aantalrijen,17);
counterminuut=0;
for b=1:aantalrijenx5
rij=ceil(b/5);
if mod(b,5)==0
kolom=5;
else
kolom=mod(b,5);
end
minuutmatrix(rij,kolom)=indexgaussian5d(b);
if mod(b,5)==1
minuutmatrix(rij,13:16)=gaussianmat(b,6:9);
end
if b>1
if gaussianmat(b,9)==gaussianmat(b-1,9);
counterminuut=counterminuut+1;
minuutmatrix(rij,17)=ceil(counterminuut/5);
else
counterminuut=1;
minuutmatrix(rij,17)=counterminuut;
end
else
counterminuut=1;
minuutmatrix(rij,17)=counterminuut;
end
end
minuutmatrix2=minuutmatrix;
minuutmatrix3=zeros(aantalrijen,25);
for w=1:aantalrijen
for a=1:5
minuutmatrix2(w,(a*2)-...
1)=N_10_featurevectormatrix(minuutmatrix(w,a),1);
minuutmatrix2(w,(a*2))=N_10_featurevectormatrix...
(minuutmatrix(w,a),2);
minuutmatrix3(w,(a*5)-4)=N_10_featurevectormatrix...
(minuutmatrix(w,a),1);
minuutmatrix3(w,(a*5)-3)=N_10_featurevectormatrix...
(minuutmatrix(w,a),2);
minuutmatrix3(w,(a*5)-2)=N_10_featurevectormatrix...
(minuutmatrix(w,a),3);
minuutmatrix3(w,(a*5)-1)=N_10_featurevectormatrix...
(minuutmatrix(w,a),4);
minuutmatrix3(w,(a*5))=N_10_featurevectormatrix...
(minuutmatrix(w,a),5);
end
hour=minuutmatrix(w,16);
minuut=minuutmatrix(w,17);
hour_float=hour+(minuut/6)/10;
timeangle=(360/24)*hour_float;
minuutmatrix2(w,11)=cosd(timeangle);
minuutmatrix2(w,12)=sind(timeangle);
end
minuutmatrix=minuutmatrix2;
matrix_to_cluster=minuutmatrix2(:,1:12);
matrix_standardize1=matrix_to_cluster(:,1:10);
matrix_standardize2=matrix_to_cluster(:,11:12);
gem1=mean2(matrix_standardize1);
stdev1=std2(matrix_standardize1);
gem2=mean2(matrix_standardize2);
stdev2=std2(matrix_standardize2);
matrix_standardize1=(matrix_standardize1-gem1)/stdev1;
matrix_standardize2=(matrix_standardize2-gem2)/stdev2;
matrix_to_cluster(:,1:10)=matrix_standardize1;
matrix_to_cluster(:,11:12)=matrix_standardize2;
matrix_to_cluster_standardized=matrix_to_cluster;
% norm1 = matrix_normalize1 - min(matrix_normalize1(:));
% norm1 = norm1 ./ max(norm1(:));
% norm2 = matrix_normalize2 - min(matrix_normalize2(:));
% norm2 = norm2 ./ max(norm2(:));
% matrix_to_cluster_normalized=zeros(502672,12);
% matrix_to_cluster_normalized(:,1:10)=norm1;
% matrix_to_cluster_normalized(:,11:12)=norm2;
%now cluster this matrix with a GMM
N_12_kmax=60;
N_12_maxiter=100;
N_12_replicates=10;
%Create and select the best GMM with AIC and determine k
N_12_AIC=zeros(1,N_12_kmax);
N_12_GMModels=cell(1,N_12_kmax);
N_12_options=statset('MaxIter',N_12_maxiter);
%'Replicates',N_10_replicates,
for k=1:N_12_kmax
N_12_GMModels{k}=fitgmdist(matrix_to_cluster,k...
,'MaxIter',N_12_maxiter,'RegularizationValue'...
,0.1,'Start','randSample','CovarianceType','full');
N_12_AIC(k)=N_12_GMModels{k}.AIC;
save('N_12_GMModels.mat','N_12_GMModels','-v7.3');
end
[minAIC,numComponents]=min(N_12_AIC);
N_12_bestModel=N_12_GMModels{numComponents};
N_12_featurevectormatrix=zeros(numComponents,12);
for c=1:numComponents
component_mu=N_12_bestModel.mu(c,:);
component_sigma=N_12_bestModel.Sigma(:,:,c);
for t=1:12
N_12_featurevectormatrix(c,t)=component_mu(1,t) ;
end
end
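The contextual feature vector encodes the time of day as cosd/sind of an angle, so 23:59 and 00:00 end up as neighbours on the unit circle rather than 24 hours apart. A quick Python check of that encoding (radians here, degrees in the MATLAB code):

```python
import math

def time_to_circle(hour, minute):
    # map time of day onto the unit circle; one full turn = 24 hours
    hour_float = hour + minute / 60.0
    angle = 2 * math.pi * hour_float / 24.0
    return math.cos(angle), math.sin(angle)

p_midnight = time_to_circle(0, 0)    # (1.0, 0.0)
p_late = time_to_circle(23, 59)
gap = math.dist(p_midnight, p_late)  # small: the two times are adjacent
```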
A.2.15. Define contextual anomalies based on clustering
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Posterior contextual anomaly 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%open N_12_bestModel from contextual 5d
%open matrix_to_cluster_standardized
%load minuutmatrix
%save contextualanomaly5d
%save contextualanomalies5d
%save contextualanomalydata5d
%save contextualanomalydatums5d
%save indexgaussian5d
%save featurevectorgaussian
[rijen kolommen]=size(matrix_to_cluster_standardized);
y=pdf(N_12_bestModel,matrix_to_cluster_standardized);
percentielthreshold= prctile(y,0.05);
minderdan=sum(y<percentielthreshold);
contextualanomaly5d=matrix_to_cluster_standardized(:,1);
contextualanomalydata5d=minuutmatrix(:,13:16);
contextualanomalies5d=[];
contextualanomalydatums5d=[];
for u=1:rijen
if y(u)<percentielthreshold
contextualanomaly5d(u)=1;
contextualanomalies5d=[contextualanomalies5d...
;minuutmatrix(u,1:12)];
contextualanomalydatums5d=[contextualanomalydatums5d...
;contextualanomalydata5d(u,:) y(u) u];
else
contextualanomaly5d(u)=0;
contextualanomalydata5d(u,:)=0;
end
end
contextualanomalydatums5d=sortrows(contextualanomalydatums5d,5);
[pointanomalies kolomblabla]=size(contextualanomalydatums5d);
indexgaussian5d=cluster(N_12_bestModel,matrix_to_cluster_standardized);
featurevectorgaussian5d=zeros(rijen,12);
for h=1:rijen
index=indexgaussian5d(h);
featurevectorgaussian5d(h,:)=N_12_bestModel.mu(index,:);
end
%plot the gmm and the anomaly points
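A contextual anomaly is a minute whose density under the fitted GMM falls below the 0.05th percentile. The thesis evaluates pdf(N_12_bestModel, ...) in 12 dimensions; the one-dimensional standard-library stand-in below keeps the idea visible (the weights, components, and samples are invented):

```python
from statistics import NormalDist

def gmm_pdf(x, weights, components):
    # density of a one-dimensional Gaussian mixture
    return sum(w * NormalDist(mu, sd).pdf(x)
               for w, (mu, sd) in zip(weights, components))

weights = [0.5, 0.5]
components = [(0.0, 1.0), (10.0, 1.0)]  # two well-separated modes
samples = [0.1, 9.8, 5.0]               # the last sits between the modes
dens = [gmm_pdf(x, weights, components) for x in samples]
threshold = sorted(dens)[0]             # flag only the lowest-density sample
flagged = [i for i, d in enumerate(dens) if d <= threshold]
```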
A.2.16. Plot spectrograms of contextual anomalies
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrograms contextual anomalies 5d %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%load contextualanomaly5d
%load contextualanomalies5d
%load contextualanomalydata5d
%load contextualanomalydatums5d
%load minuutmatrix3
clc
year=2014;
zero=num2str(0);
streepje=('-');
formatmat=('.mat');
puntmat=('.mat');
features=('features');
gaussiancel=cell(1,2);
counter1=0;
counter2=0;
for maand=6:18 %loop over months 10/2014 through 10/2015
month=maand+4;
if month>=13
year=2015;
month=month-12;
end
month_string=num2str(month);
if month<10
month_string=strcat(zero,month_string);
end
year_string=num2str(year);
feature_file=strcat(features,streepje,...
year_string,streepje,month_string,formatmat);
feature_file=matfile(feature_file);
N_7_featurevectorcell=feature_file.N_7_featurevectorcell;
[featurevectorcell_rijen, featurevectorcell_kolommen]=size...
(N_7_featurevectorcell);
for k=1:featurevectorcell_rijen
counter1=counter1+1;
gaussiancel{counter1,1}=N_7_featurevectorcell{k,1};
gaussiancel{counter1,2}=N_7_featurevectorcell{k,2};
end
end
gaussianmat=cell2mat(gaussiancel);
gaussianmat=gaussianmat(2401:end,:);
N_10_matrix=gaussianmat(:,6:9);
[rijen, kolommen]=size(contextualanomaly5d);
aantalanomaliesvorig=0;
aantalanomalies=0;
eind=(rijen)-1;
%if there is an anomaly, which minute of that hour is it?
anomalycounter=0;
[rijendata kolomendata]=size(contextualanomalydatums5d);
welkeminuut=zeros(rijendata,5);
for n=1:rijendata
anomalycounter=anomalycounter+1;
welkeminuut(anomalycounter,1)=contextualanomalydatums5d(n,1);
welkeminuut(anomalycounter,2)=contextualanomalydatums5d(n,2);
welkeminuut(anomalycounter,3)=contextualanomalydatums5d(n,3);
welkeminuut(anomalycounter,4)=contextualanomalydatums5d(n,4);
uur=contextualanomalydatums5d(n,4);
hour=uur;
nogeencounter=0;
idx2=contextualanomalydatums5d(n,6);
while hour==uur
nogeencounter=nogeencounter+1;
blabla=idx2-nogeencounter;
hour=N_10_matrix(blabla,4);
end
welkeminuut(anomalycounter,5)=ceil(nogeencounter/5);
end
[rijenanomalies,kolommenanomalies]=size(welkeminuut);
%for z=1:3
for z=1:rijenanomalies
jaar=welkeminuut(z,1);
maand=welkeminuut(z,2);
dag=welkeminuut(z,3);
uur=welkeminuut(z,4);
minuut=welkeminuut(z,5);
jaar_string=num2str(jaar);
maand_string=num2str(maand);
if maand<10
maand_string=strcat(zero,maand_string);
end
file=strcat(jaar_string,streepje,maand_string,formatmat);
file=matfile(file);
filesmaand=file.filesmaand;
[rijenfilesmaand, kolommenfilesmaand]=size(filesmaand);
vector=[jaar maand dag uur];
countertje=1;
h=1;
blabla1=filesmaand{h,2};
while isequal(blabla1,vector)==0
h=h+1;
blabla1=filesmaand{h,2};
countertje=countertje+1;
end
uurmatrix=filesmaand{countertje,1};
xx=linspace(1,31,31);
h=figure;
[uurrij, uurkolom]=size(uurmatrix);
c = linspace(0,1,480);
if minuut==60
if uurrij<28800
counter8=0;
for g=(minuut*480)-479:uurrij
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
uurmatrix2=filesmaand{countertje+1,1};
%continue this minute with the first rows of the next hour's matrix
for b=1:(28800-uurrij)
counter8=counter8+1;
scatter(xx,uurmatrix2(b,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
else
if uurrij<28799
counter8=0;
for g=(minuut*480)-479:min(uurrij,(minuut*480))
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
else
counter8=0;
for g=(minuut*480)-479:(minuut*480)
counter8=counter8+1;
scatter(xx,uurmatrix(g,:),[],...
[(1-c(counter8)) 0 c(counter8)]);
hold on
end
end
end
for t=1:5
hold on;
mu = [minuutmatrix3(z,(t*5)-4) minuutmatrix3(z,(t*5)-3)];
Sigma = [minuutmatrix3(z,(t*5)-2) ...
minuutmatrix3(z,(t*5)-1); minuutmatrix3(z,(t*5)-1)...
minuutmatrix3(z,(t*5))];
x1 = 0:1:31; x2 = 0:1:80;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
contour(x1,x2,F,[.0001 .001 .01 .05:.1:.95 .99 .999 .9999]);
xlabel('frequency'); ylabel('amplitude');
line([0 0 1 1 0],[1 0 0 1 1],'linestyle','--','color','k');
hold on;
scatter(mu(1), mu(2),200,[0 .6 .2],'d','LineWidth',3);
end
caption=sprintf('contextual anomaly, date: %d-%d-%d %dh, minute: %d',...
jaar,maand,dag,uur,minuut);
title(caption, 'FontSize', 15);
hold off
saveas(h,sprintf('contextual_anomalies_FIG_%d.fig',z));
close all;
end
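The contour overlay in the loop above evaluates a bivariate normal density (mvnpdf) over a frequency–amplitude grid for each of the five fitted Gaussians. A self-contained sketch of that density evaluation, with an illustrative mean and covariance rather than the fitted minuutmatrix3 parameters:

```python
import math

def mvn_pdf_2d(x, y, mu, sigma):
    """Bivariate normal density at (x, y); sigma is a symmetric 2x2 covariance."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21          # determinant of the covariance
    dx, dy = x - mu[0], y - mu[1]
    # squared Mahalanobis distance, using the closed-form 2x2 inverse
    q = (s22 * dx * dx - 2 * s12 * dx * dy + s11 * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# illustrative parameters: frequency bin 15, amplitude 40, mild correlation
mu = (15.0, 40.0)
sigma = ((4.0, 1.0), (1.0, 9.0))
peak = mvn_pdf_2d(mu[0], mu[1], mu, sigma)  # maximum: 1 / (2*pi*sqrt(det))
```

Evaluating this function on a meshgrid and drawing level sets reproduces the contour/ellipse plot that MATLAB builds with meshgrid, mvnpdf, and contour.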