
Piano Transcription using Wavelet Decomposition and Neural Networks

Esben Madsen, Johnni Thomsen Pedersen and Louise Baldus Vestergaard
Group 742, Supervisor: Søren Krarup Olesen

Abstract

This paper examines the possibility of transcribing the notes played on a piano using a simple feature extraction and neural networks. Earlier transcription systems using neural networks have used considerably complex algorithms to handle polyphonic music.

We have implemented the Daubechies D4 wavelet decomposition in ANSI C for feature extraction and suggested a neural network feed-forward structure for note detection.

88 networks were constructed, one for each note on a piano. Training of the 88 networks has been done with a reduced set of training data; each note of one specific piano has been used. The results achieved were inconclusive; the overall performance is not satisfactory, perhaps due to too little training data or a non-optimal wavelet decomposition. From the results we find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network, but the amount of training data must be greater and other wavelet decompositions should be examined.

Keywords: Piano transcription, wavelet decomposition, neural networks.

1 Introduction

Various solutions have been proposed for creating an automatic transcription system for music composers and musicians in general [1]. As there are many different genres of music and instruments, most of the previous transcription systems have focused on either classification of instruments [2], detection of notes [3], rhythm [4], or a combination of these [5]. As a full transcription system is out of scope for this research, the aim has been limited to transcription of one specific piano, with focus on feature extraction.

1.1 Note Characteristics

Piano notes consist of a fundamental frequency, F0, and overtones. The overtone partial frequencies are slightly inharmonic [1, p. 203]. The piano is a pitched instrument [1, p. 167], and feature extraction of the piano notes should be solved by some kind of pitch detection. Since piano sounds consist of several frequency components, pitch detection should either aim to find the fundamental frequency played, or to extract the significant features of the note. The notes of a piano span a little more than eight octaves, hence the frequency contents of the separate tones and overtones might overlap when notes are played simultaneously. For a transcription to be useful for musicians, it should be represented on a note sheet, hence the time at which the note is played is important too.

1.2 Time vs Frequency Analysis

Music can be interpreted in the time and/or frequency domain. In the time domain, music is recorded and played, and detection of the notes present could be achieved with cross-correlation methods. In the frequency domain it can be represented and understood by e.g. the Fourier Transform (FT) [1, p. 21]. The FT reveals the frequency contents of a signal, but it is only well defined for infinite-length, stationary, continuous sine waves. This is not fully useful for music transcription, since the signal processed is neither infinite nor stationary [1]. Instead a time-frequency representation should be chosen.


2 System Description

The constructed transcription system can basically be characterized as a pattern recognition system, and thus a neural network, as used in this system, is one of several options.

The transcription system converts a piece of sampled piano music to a representation of notes on a time scale. The system was divided into the blocks shown in figure 1.

FIGURE 1: Block diagram of the system

The two most important blocks were the feature extraction and the note recognition, and they received the most attention in this paper.

The feature extraction block had the purpose of compressing the massive amount of information in the sound data into as few meaningful features as possible, which could then be analyzed further. This block was implemented as a wavelet decomposition, described in section 4.

The note recognition block was used for analyzing the features extracted from the signal. This was implemented with neural networks; the implementation is described in section 5.

The decision/discrimination block was meant to give a reliable binary result based on the note recognition. This was done simply with a threshold on the output of the neural networks.

Presentation of the output could be done in various ways. For a fully working system, presentation on a note sheet and/or as a MIDI file would be desired, but for this research, the presentation of data was done simply by viewing the output from the decision/discrimination block.

3 Method overview

In the following, the term “pitch” is used. The definition of pitch in this paper is the same as used in the MIDI protocol. Most musicians would probably label the 88 keys on a piano A0 to C8, with A4 having a fundamental frequency of 440 Hz. Thus A0 has a fundamental frequency of 27.5 Hz. Since music is perceived dyadically, there is uneven spacing between the fundamental frequencies of the notes, so it is often practical to have an expression for these with even spacing. The MIDI definition is:

Pitch = 69 + 12 · log2(f / 440)    (1)

A4 now corresponds to pitch 69. The lowest note on the piano is pitch 21, the highest is pitch 108.
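Equation 1 translates directly into a few lines of C. This is an illustrative sketch; the function name is ours, and log2 is written with natural logarithms to stay within ANSI C:

#include <math.h>

/* Equation (1): convert a frequency in Hz to a (fractional) MIDI
 * pitch. freq_to_pitch(440.0) returns 69.0 (A4); rounding gives
 * the integer pitch number. */
double freq_to_pitch(double f)
{
    return 69.0 + 12.0 * log(f / 440.0) / log(2.0);
}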

As preprocessing, the music is split into blocks of 4096 samples, corresponding to a little under 1/10 of a second, since the sampling frequency is 44.1 kHz. The processing of a sound block is illustrated in figure 2, following these steps (a C sketch of the loop is given after the list):

1. A block of 4096 samples of the wave file is picked out.

2. The block is processed, using the implemented wavelet transform, which is described in section 4.

3. 112 predefined coefficients of the wavelet decomposition are extracted.

4. The coefficients are fed to each of the 88 neural networks, each network charged with the detection of a specific note.

5. Each neural network then presents a value on its output, indicating how likely it is that the note this specific network was trained to recognize was present in the block.
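The five steps can be summarized in a short C sketch. All helper functions here – read_block(), d4_dwt(), extract_features() and ann_run() – are stand-ins for the project's file I/O, wavelet and network modules, not the actual interfaces:

double block[4096], features[112], output[88];
int note;

while (read_block(block, 4096)) {          /* step 1: next 4096 samples */
    d4_dwt(block, 4096);                   /* step 2: wavelet transform */
    extract_features(block, features);     /* step 3: 112 coefficients  */
    for (note = 0; note < 88; note++)      /* step 4: feed all networks */
        output[note] = ann_run(note, features);
    /* step 5: output[note] estimates how likely the note is present */
}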


FIGURE 2: Data flow through the system

The array of neural networks will enable detection of polyphonic piano music.

4 Feature Extraction

For the feature extraction, it was decided to test whether a wavelet decomposition would suffice, compared to the more complex methods previously used, like for instance the networks of adaptive oscillators used by Marolt in [5].

The wavelet transforms have some interesting features compared to other methods of extracting frequency content, e.g. the Fourier transform:

• In its commonly used form, the scale of the transform is dyadic, that is, each frequency band contains one octave.

• The algorithms are very efficient – the complexity of a Discrete Wavelet Transform (DWT) for some algorithms is O(N). For other algorithms the complexity can in the worst case be up to O(N · log(N)), like the Fast Fourier Transform (FFT) [6, p. 40].

• A wavelet expansion (inverse transform) can give a better description and separation of local changes than a Fourier transform [6, p. 7], which by definition can only represent a signal as a combination of sines.

• A wavelet can be designed for a specific type of signal, and is thus able to represent discontinuities or sharp corners with only a few coefficients, where the Fourier transform would require a lot of coefficients.

The mother and father¹ wavelet pair chosen for implementation was the Daubechies D4 algorithm, which is used in a wide range of applications and often used for examples in the literature on wavelets, due to it being both very simple and very efficient.

The DWT was implemented using the lifting scheme² [7], which improves the complexity by roughly 50% compared to the standard filter bank implementation [7, p. 264].

To balance frequency and time resolution, an input of 4096 samples (with sample rate 44.1 kHz) is selected, giving an output of the same size, and the DWT is performed recursively, giving 12 subbands of length 2^0 to 2^11, each band containing one octave (not necessarily equal to the piano octaves).
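The lifting steps for D4 are compact. The following sketch shows one decomposition level using the Daubechies-Sweldens factorization, plus the recursive driver producing the dyadic subbands; it uses periodic wrap-around at the block edges, which is our assumption – the article does not specify its edge treatment:

#include <stdlib.h>
#include <math.h>

/* One level of the forward D4 transform via the lifting scheme. On
 * return, the first n/2 entries of x hold the approximation
 * (scaling) coefficients and the last n/2 the detail (wavelet)
 * coefficients. */
static void d4_lift_forward(double *x, int n)
{
    const double s3 = sqrt(3.0);
    int half = n / 2, i;
    double *s = malloc(half * sizeof *s);   /* even samples */
    double *d = malloc(half * sizeof *d);   /* odd samples  */

    for (i = 0; i < half; i++) {            /* split */
        s[i] = x[2 * i];
        d[i] = x[2 * i + 1];
    }
    for (i = 0; i < half; i++)              /* first update step */
        s[i] += s3 * d[i];
    for (i = 0; i < half; i++)              /* predict step */
        d[i] -= 0.25 * s3 * s[i]
              + 0.25 * (s3 - 2.0) * s[(i - 1 + half) % half];
    for (i = 0; i < half; i++)              /* second update step */
        s[i] -= d[(i + 1) % half];
    for (i = 0; i < half; i++) {            /* normalize and merge */
        x[i]        = (s3 - 1.0) / sqrt(2.0) * s[i];
        x[half + i] = (s3 + 1.0) / sqrt(2.0) * d[i];
    }
    free(s);
    free(d);
}

/* Full decomposition: reapply the one-level transform to the
 * approximation part until one coefficient remains. For n = 4096
 * this yields the 12 dyadic detail subbands described above. */
void d4_dwt(double *x, int n)
{
    int len;
    for (len = n; len >= 2; len /= 2)
        d4_lift_forward(x, len);
}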

¹ The mother wavelet is the wavelet function, and the father wavelet is the scaling function.
² The lifting scheme is an efficient way of implementing a wavelet decomposition.


Frequency range (Hz)   Pitch (MIDI)   Number of samples   Used samples
21-43                  21-28                 4
43-86                  29-40                 8
86-172                 41-52                16                 16
172-345                53-64                32                 32
345-689                65-76                64                 32
689-1378               77-88               128                 32
1.38-2.76 k            89-100              256
2.76-5.51 k            101-108             512
5.5-11.0 k                                 1024
11-22 k                                    2048
Total                                      4096                112

TABLE 1: Contents of the wavelet decomposition and the used samples

Table 1 illustrates how parts of the wavelet decomposition are selected for further analysis. As the upper two bands (5.5-11 and 11-22 kHz) do not contain much relevant information, disregarding onset detection [1, p. 108] (the highest fundamental frequency is just above 4 kHz), these are discarded, hereby already reducing the data to 1/4 of its original size. Furthermore, by looking at the transforms of recorded samples, the scope is limited to the four octave bands from 86 Hz to 1378 Hz, as these contain the most information/amplitude response overall. Finally, only the first 32 samples from each of the two bands with 64 and 128 samples are used, giving a total of 112 samples to use as input to the neural networks.
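With the in-place layout produced by d4_dwt() above, where the detail band of length L occupies indices [L, 2L), the 112 features of Table 1 can be gathered as follows. The index arithmetic is our assumption about the implementation, not taken from the article:

/* Collect the 112 feature samples: all of the 16- and 32-sample
 * bands (86-345 Hz) and the first 32 samples of the 64- and
 * 128-sample bands (345-1378 Hz). */
void extract_features(const double *dwt, double *feat)
{
    int i, k = 0;
    for (i = 16; i < 64; i++)     /* 86-172 Hz and 172-345 Hz bands */
        feat[k++] = dwt[i];
    for (i = 64; i < 96; i++)     /* first 32 of the 345-689 Hz band */
        feat[k++] = dwt[i];
    for (i = 128; i < 160; i++)   /* first 32 of the 689-1378 Hz band */
        feat[k++] = dwt[i];
    /* k == 112 here */
}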

There are of course other types of wavelets and scaling functions than the one used, including Haar, which is the simplest type, as well as more complex constructions like the Cohen-Daubechies-Feauveau wavelet; however, a review of multiple types is out of scope of this article.

5 Note Recognition

To detect which notes were present in the piano music, some note recognition was needed. As previously mentioned, neural networks were chosen for this task, since they have been used by others with good results [1][5].

A neural network consists of at least an input and an output layer, and possibly some hidden layers, each containing a number of neurons. The neural network takes as input a feature set, which has to be generated from the input signal, and via a weighting vector an output result is achieved.

Neural networks are trained prior to recognition. This training can be supervised, telling the network what the desired output is, given a specific input. It is also possible to train a network unsupervised; typically, the training algorithm will then try to classify the input into two or more preprogrammed classes.

Neural networks have a huge advantage when it comes to runtime complexity. When trained sufficiently, they are fast to execute and don't take up much memory. Due to their nature, a large amount of training data is needed for supervised learning in order to make the results sufficiently reliable. If the training data doesn't contain enough “general” information about the class to be detected, the network can respond very well to training data and not well at all to unknown data. This fact, and the time and computational resources required during training, are the foremost drawbacks. It is also very difficult to construct a “cookbook” neural network; a lot of iterations are needed.

The major strength of neural networks is their efficiency at run-time. Once properly trained, the execution takes up a very small amount of memory. In the case of the feed-forward network, there is no need to save any calculations, except the results needed for the next neuron.

A neuron consists of weighting coefficients for each input; the weighted inputs are summed, possibly biased, and an activation function handles the summed output [8, p. 11].

The activation function can theoretically be any kind of function, but in actual implementations three types are predominant: the threshold function, the piecewise-linear function and the sigmoid function.

According to [8, p. 14], the sigmoid is the most commonly used. As no exact structures of neural networks are mentioned in the method descriptions of either [5] or [9], the sigmoid activation function was chosen.
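A single neuron of this kind is easily written out. This sketch is for illustration only; the project's networks were built with MATLAB and the FANN library rather than by hand:

#include <math.h>

/* One neuron: weighted sum of n inputs plus a bias, passed through
 * the logistic sigmoid activation function. */
double neuron_output(const double *in, const double *w, double bias, int n)
{
    double sum = bias;
    int i;
    for (i = 0; i < n; i++)
        sum += w[i] * in[i];
    return 1.0 / (1.0 + exp(-sum));
}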

The network was chosen to have:


• supervised learning, since training and test data are available

• multilayer structure, due to the complexity of the desired system

• feed-forward, fully connected structure

It was chosen to implement a network with the following neuron and layer structure:

• 112 neurons in the input layer, one for each of the extracted sample values

• Three hidden layers with 20, 30 and 30 neurons respectively

• An output layer with one neuron

This structure of the network is outlined in the bottom of figure 2; a sketch of constructing it with the FANN library follows below.
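Expressed with the FANN library used in the project's C code (see section 19 of the worksheets), the structure could be created as follows. This is a sketch; the helper name is ours and no training parameters from the project are implied:

#include "fann.h"

/* Create one 112-20-30-30-1 fully connected feed-forward network
 * with sigmoid activations, matching the layer structure above. */
struct fann *make_note_network(void)
{
    struct fann *ann = fann_create_standard(5, 112, 20, 30, 30, 1);
    fann_set_activation_function_hidden(ann, FANN_SIGMOID);
    fann_set_activation_function_output(ann, FANN_SIGMOID);
    return ann;
}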

5.1 System Development

5.1.1 Data

Extraction of commercially prerecorded piano sequences has been beyond the scope of this article. Instead it has been attempted to use a rather reduced sample space. For the test and training notes respectively, a separate sequence of all 88 piano notes was recorded to a wave file with a resolution of 16 bit and a sample rate of 44.1 kHz. Each note was played, keeping the key depressed somewhere between one half and one second. Both sequences were recorded using the same piano. As the conditioning of both training and test data was identical, they will from here on be referred to simply as “the data”. An examination of the recorded wave files showed a considerable increase in signal energy at the onset of each note. This is due to the percussive nature of the hammer hitting the strings [1, p. 108]. Each note was extracted into its own wave file, containing 30,000 samples, starting 2000 samples before the maximum energy level was reached for each note.

Separate data were constructed for each specific neural network, as displayed in figure 4. This way each network could be trained with its target note present in half of the data. The data contained 1,000 sequences consisting of 4096 samples of mixed piano notes, totalling a little more than 4 million 16-bit samples. It was made sure that roughly half of the sequences contained the note that the specific network was to detect. Each sequence consisted of 0 to 3 simultaneously played piano notes, with uniform distribution of both the number of notes played and which notes were played, not considering the target note. To take the hammer stroke into account and get more varied data from the sample set, the 4096 samples to be extracted from each single note were chosen at random: each note could be taken arbitrarily from the first sample to the 25,000th sample. This was done to minimize the risk of overfitting the networks. Each network was trained with 5 epochs (repetitions) of training data. Figure 3 shows an example of data for the network to detect pitch 69.
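A sketch of the sequence generation just described might look as follows in C (the article's actual generator is pianomix.m in MATLAB; the array layout and names here are assumptions):

#include <stdlib.h>

#define SEQ_LEN    4096
#define NOTE_LEN   30000
#define MAX_OFFSET 25000

/* Build one training sequence for the network detecting `target`:
 * the target note is present in roughly half the sequences, plus
 * 0 to 3 other notes drawn uniformly, each read from a random
 * offset within its first 25,000 samples. */
void make_sequence(double notes[88][NOTE_LEN], int target, double *seq)
{
    int i, k, n_extra = rand() % 4;      /* 0..3 additional notes   */
    int with_target = rand() % 2;        /* target in ~half of data */

    for (i = 0; i < SEQ_LEN; i++)
        seq[i] = 0.0;
    if (with_target) {
        int off = rand() % MAX_OFFSET;
        for (i = 0; i < SEQ_LEN; i++)
            seq[i] += notes[target][off + i];
    }
    for (k = 0; k < n_extra; k++) {
        int note = rand() % 88;          /* uniform choice of note */
        int off = rand() % MAX_OFFSET;
        for (i = 0; i < SEQ_LEN; i++)
            seq[i] += notes[note][off + i];
    }
}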

FIGURE 3: Example of 5 consecutive generated sequences


FIGURE 4: A flowchart of note extraction and training of the neural networks

6 Results

MATLAB has been used for generation and training of the neural networks.

There is a huge difference in how well a given network performs. Figure 5 shows the Type I and Type II errors for all networks, where a Type I error is a false positive (false hit) and a Type II error is a false negative (miss). These errors are taken from a dataset of 1000 sequences. Until around pitch 60, Type II errors are by far predominant, meaning that extremely few hits are detected. Around pitch 80 the amounts of Type I and Type II errors roughly even out, but without much consistency from pitch to pitch. All in all, 79.4% of all errors are Type II errors.

FIGURE 5: Type I (solid red line) and Type II (dashed blue line) errors for each network

Disregarding whether a given net response is considered a hit or a miss, figure 6 shows both the least-square error and the mean absolute error.

FIGURE 6: Least-square error (solid red line) and mean absolute error (dashed blue line)

As the end result, we achieved correct detection of the target note in 33.1% of the actual note occurrences, and correct detection of the target note's absence in 82.9% of the cases. An overview is displayed in table 2.


              Output
              1        0
Input   1     33.1%    66.9%
        0     17.7%    82.9%

TABLE 2: Overview of results – Type I and Type II errors as well as correctly detected notes.

It should be noted that pitch 71 performs exceptionally well, so it was decided to investigate further. A new training and test sequence was run, this time with up to 10 different notes present and 10,000 sequences. The note was correctly “detected” in 72.1% of the cases when present and correctly “not detected” in 97.7% when missing.

7 Discussion

Overall the results are ambiguous. The root cause is considered to be the limited training data available. In the case of pitch 71, it is well beyond reasonable doubt that the results are not haphazard. The extremely fine results for pitch 71 could be caused by a large correlation between training and test data. But since it still performs well, even when using up to 10 simultaneous notes, we speculate that the main cause is the choice of feature set used. This indicates that the network response relies heavily on which parts of the wavelet decomposition are used. We find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network. Further research should focus on evaluation of the best-fitting mother wavelet as well as the selection of coefficients from the wavelet decomposition.

The cause of the notable rise in variance for Type I and II errors in figure 5 can be explained by the choice of feature set. As seen in table 1, frequencies above 1.4 kHz are not represented. According to equation 1, this roughly corresponds to pitch 87. That means that the fundamental frequencies of pitches 88-108 are not represented in the feature set, and that pitches 75-87 are represented only by their fundamental frequency.

Our decision algorithm is rather crude; if a network outputs more than 0.5, we consider it a hit. A more plausible method would be to employ a statistical framework, based both on accumulated a priori knowledge of how frequently each note is played, and on which intervals seem reasonable; a minor second occurs with far lower probability than, for example, an octave. Since the outputs from each network are not binary, they can easily be weighted to accommodate a different statistical probability set depending on music style.

8 Conclusion

A simplified framework for polyphonic piano note recognition has been made. The goal has been to determine whether a wavelet decomposition could be used for feature extraction for a neural network, and this has been achieved only to some extent. The overall results do not fully confirm the usability of wavelets for decomposition, but a certain pitch consistently performs convincingly. Our test results have not shown whether the decomposition provides an insufficient feature set for the network, whether the network suffers from a non-optimal design, or whether the reduced sample set used is to blame. Further studies in the field should examine this issue.

A D4 wavelet decomposition lifting scheme has successfully been implemented in ANSI C. It has been concluded that wavelet decomposition is very efficient regarding execution speed compared to the FFT. The actual implementation of D4 is based on Daubechies' own publications and is more efficient than the default decomposition method, which uses filter banks.

9 Acknowledgements

We would like to thank Uwe Hartmann for an introduction to neural networks.

References

[1] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6.

[2] Perfecto Herrera-Boyer, Geoffroy Peeters and Shlomo Dubnov. Automatic classification of musical instrument sounds. Journal of New Music Research, 2003.

[3] Anssi Klapuri. Automatic transcription of music. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, August 6-9, 2003.

[4] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6. Chapter 4: Beat Tracking and Musical Metre Analysis by Stephen Hainsworth.

[5] Matija Marolt. Transcription of polyphonic piano music with neural networks. In Proceedings of the Workshop on Current Research Directions in Computer Music, Barcelona, Spain, November 15-17, 2001.

[6] C. Sidney Burrus, Ramesh A. Gopinath and Haitao Guo. Introduction to Wavelets and Wavelet Transforms. Prentice Hall, 1st edition, 1998. ISBN 0-13-489600-9.

[7] Ingrid Daubechies and Wim Sweldens. Factoring wavelet transforms into lifting steps. Journal of Fourier Analysis and Applications, 1998. http://www.springerlink.com/content/r0n381423k7v8655/.

[8] Simon Haykin. Adaptive Filter Theory. Information and System Sciences. Prentice Hall, 4th edition, 2002. ISBN 0130901261.

[9] Juan Pablo Bello, Giuliano Monti and Mark Sandler. Techniques for automatic music transcription. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR-00), Plymouth, Massachusetts, USA, October 2000.


Contents

1 Preface

2 General Guidelines

3 List of Abbreviations

I Analysis

4 External Restraints

5 Initial Specification of Requirements
   5.1 Platform
   5.2 Instrument
   5.3 Harmony
   5.4 Detection Speed
   5.5 Success Rate

6 Detection of Note Onset
   6.1 Purpose
   6.2 Methods

7 Methods for Detection of Monophonic and Polyphonic Signals
   7.1 Purpose
   7.2 Methods
      7.2.1 Off-line
      7.2.2 On-line

8 Pitch Detection
   8.1 Approaching the Problem from a Physical Angle
   8.2 Previous Pitch Detection Studies

9 Wavelets and Assessment of Efficiency
   9.1 Purpose
   9.2 Analysis
   9.3 Conclusion

10 Neural Network
   10.1 Method Overview
   10.2 Suitability

11 Data Preprocessing for a Neural Network Proposed by Others

12 Deciding Method for Further Analysis
   12.1 Preliminary Analysis
      12.1.1 Statistical Methods
      12.1.2 Methods based on Auditory Models
      12.1.3 Neural Networks
      12.1.4 Wavelets
   12.2 Decision

13 General Construction
   13.1 Purpose
   13.2 Analysis
   13.3 Sampled Piano Music
   13.4 Feature Extraction
   13.5 NN
   13.6 Decision/Discrimination
   13.7 Presentation

14 General Thoughts
   14.1 Optimizing the Networks
   14.2 Optimizing the Wavelet Decomposition
   14.3 Optimization of the Decision Algorithm

II Design

15 Architectural Considerations for the Neural Network
   15.1 Adjustable Network Elements
   15.2 Classes of Networks
   15.3 Layers
   15.4 Types of Neurons

16 Overall Architecture
   16.1 Wavelet Decomposition
   16.2 Neural Network Structure
      16.2.1 Feature Set versus Pitch
      16.2.2 Training and Test

III Implementation

17 Software design
   17.1 Userguide
   17.2 Implementation

18 Real time considerations

19 FANN – Fast Artificial Neural Network library

20 Port Audio

IV C Source Code

21 Makefile
22 main.c
23 fileio.h
24 fileio.c
25 waveread.h
26 waveread.c
27 wavelet.h
28 wavelet.c
29 ann.h
30 ann train.c
31 ann test.c
32 ann run.c

V Matlab Source Code

33 pianocomp.m
34 pianomix.m
35 featureextraction.m
36 NNgen.m
37 resultpresentation.m

1 Preface

Transcription of musical scores has through time been a tedious manual task, taken on only by highly trained musicians. As computer technology in the 80's matured to produce the “personal computer”, sporadic research in automated music transcription suddenly became more focused; a relatively cost-effective platform was now available.

Today there is still room for both improvement and development of the methods used, as a universal method of transcription has not been discovered. Not only the instrument(s) to be transcribed, but also the style of music has a profound impact on the efficiency of a given method.

The (grand) piano is by most considered the “reference instrument”, probably due to conventions inherited from classical music. It is assumed that transcription of music played on the piano will have the broadest interest to potential customers. Based on this assumption, this documentation will analyze potentially efficient methods for transcription of piano-generated music and describe the implementation of one such method. This actual implementation will be called Musician's Transcription Tool (MTT).

2 General Guidelines

This documentation is to be viewed as a collection of work sheets. The aim has been to arrange these in a plausible manner, but this will not necessarily always be the case. It is suggested to use the table of contents to look up relevant information regarding a subject.

Quotations and references to other works will be put in the footnotes on a given page. Also, a complete compilation of the used literature is included at the end of this document.


This project has been composed by:

Esben Madsen

Johnni Thomsen Pedersen

Louise Baldus Vestergaard


3 List of Abbreviations

ANN     Artificial Neural Network, commonly called Neural Network (NN)
D4      Daubechies 4-tap wavelet
DWT     Discrete Wavelet Transform
Dyadic  Related by a factor 2 (like octaves)
MIDI    Musical Instrument Digital Interface
MTT     Musician's Transcription Tool
NN      Neural Network, sometimes referred to as Artificial Neural Network (ANN)


Part I

Analysis

4 External Restraints

Since this is a university project for the 7th semester, there are certain frames, primarily set by the Study Guidelines, concerning objectives and documentation.¹ The purpose of this semester project is design, implementation and analysis of a solution to a practically occurring problem which naturally requires stochastic signal processing methods and/or transmission of signals.

The project period runs from September 1st to December 19th 2008. The project is documented in three ways:

• A scientific article

• A poster with presentation at SEMCON 08

• Edited worksheets, which document the details of the project

The study guidelines further state the goals for this project unit¹:

“The project unit takes its starting point in a practical problem, which reflects the students' chosen specialization, and where signal processing methods and/or transmission of signals is a natural aspect.

• Through a stepwise refinement process of the given application, a set of specifications is generated. There is no requirement for a real-time implementation (HW and/or SW), thus the specification can relate to the behavioural level only. However, a real-time implementation is allowed in the projects, and the specification therefore has to be extended at all relevant points, in case such an implementation is included.

¹ esn.aau.dk/fileadmin/esn/Studieordning/Cand_SO_ed_aalborg_sep08.pdf p. 14


• Algorithms for the complete functionality (or parts hereof) are designed, and are next applied for 1) a functional simulation, and possibly 2) a real-time implementation.

• In terms of the design phase, an analysis of the algorithmic computational and numerical properties is conducted.

• The implementation is next compared to the specification, and a comparison and evaluation is performed.


5 Initial Specification of Requirements

The following is a description of the specific demands for the required base functionality of the MTT. This specification has not been used for the actual implementation, but serves to document the process.

5.1 Platform

The MTT must be able to run on a PC or laptop with minimum specs:

• 1.8 GHz P4 processor or equivalent

• 1 GB RAM

• Soundcard capable of recording and playback at 16 bits @ 44.1 kHz sample rate

• Either Windows XP or Linux installed

5.2 Instrument

The MTT is to be able to detect and transcribe notes played on both upright and grand pianos.

5.3 Harmony

The MTT is to be able to detect and transcribe up to 10 simultaneous notes.

5.4 Detection Speed

The MTT is to be able to detect and transcribe notes played with a time resolution of 50 ms.

5.5 Success Rate

The MTT is to achieve at least 80% correct detection over a broad range of musical styles.


6 Detection of Note Onset

6.1 Purpose

It is assumed that to best detect the pitch of a note, a proper placement of said note in time is needed. The beginning of a note is called the note “onset”. This document will propose different methods for detecting that onset.

6.2 Methods

Efficient onset detection methods vary considerably with the instrument in question. If an instrument has a large transient at onset (e.g. percussion instruments, piano and guitar), it is suggested to view the music from a power perspective². The suggested algorithm is:

E_j(n) = Σ_{k∈k_j} |STFT_x^W(n, k)|²    (1)

Where:
STFT_x^W is the short-time Fourier transform of x(n)
k is the discrete frequency index
W is the window used to weigh x(n)
n is the time at which the window is centered

To further optimize equation 1, a three-point linear regression is proposed². It is of interest to find the gradient of E_j(n) in order to detect the start of the transient. For this specific three-point linear regression, the following equation defines the gradient:

D_j(n) = (E_j(n+1) − E_j(n−1)) / 3    (2)

Where:
E_j(n) is the energy envelope function
D_j(n) is the gradient of E_j(n)

Although this method allows power measurement in distinct frequency bands, it seems rather complex to calculate. It would be a good idea to compare it to a simpler method, and interesting to determine if the onset information can be obtained using the broad-spectrum signal. This could be done as in equation 3:

² Klapuri & Davy 2006, p. 107-109

E(j, n) = Σ_{k=n−j/2}^{n+j/2} x(k)²    (3)

In both cases a decision algorithm is needed to discriminate onset periods from rest.
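Both measures fit in a few lines of C. The sketch below uses hypothetical names and leaves boundary handling to the caller:

/* Equation (3): broadband signal energy in a window of width j
 * centred on sample n. */
double window_energy(const double *x, int n, int j)
{
    double e = 0.0;
    int k;
    for (k = n - j / 2; k <= n + j / 2; k++)
        e += x[k] * x[k];
    return e;
}

/* Equation (2): three-point gradient of the energy envelope E. */
double energy_gradient(const double *E, int n)
{
    return (E[n + 1] - E[n - 1]) / 3.0;
}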


7 Methods for Detection of Monophonic and Polyphonic Signals

7.1 Purpose

To detect the notes in music, it is necessary to correctly identify and, in the case of multiple notes, separate the fundamental frequencies that the signal consists of.

The main source for this worksheet is chapter 7 of Klapuri & Davy³.

7.2 Methods

The methods for detecting the fundamental frequencies (F0) in a polyphonic signal can roughly be separated into the statistical approach, which this worksheet will focus on, and an approach based on an auditory model. The auditory model is based on the way humans perceive and separate concurrent sounds, and will not be examined further – chapter 8 of Klapuri³ can be used as a reference on this topic.

The statistical methods can basically be separated into off-line approaches, which are based on analysis of a constant signal, and on-line approaches, which only use the current sample or frame to estimate the signals.

7.2.1 Off-line

Off-line methods rely on analyzing a signal that does not change in the chosen interval (no new or lost notes). As there must be no transition between notes in the processed waveform, an onset and offset detection must be made beforehand. The signal is then modeled with a parameter estimation. Due to the signals being “complete” (no transitions), these methods are very accurate, but also prove rather computationally heavy.

The Bayesian off-line model is mathematical and probabilistic, and it leads to the simplest model that explains a given waveform. The estimation of multiple fundamental frequencies (F0s) is complex and possibly computationally heavy, which is probably the reason this method has not been given much attention³ (p. 203-204). Often the estimation is a maximum a posteriori (MAP) or minimum mean square error (MMSE) estimation. Apart from signal detection, the model may also be used for source separation (detection of instruments), compression, pitch correction and other useful applications³ (p. 203-204). A proof of the performance is seen in the article “Bayesian analysis of polyphonic western tonal music”⁴, which reports a 100% accuracy on one F0 and 71% on four fundamental frequencies.

³ Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006

7.2.2 On-line

The on-line methods use only the current sample or frame of a sampled signal for the analysis, and therefore have no requirement for a separate onset/offset detection.

The Cemgil on-line processing is a MAP estimation, where the frequencies are divided in a grid and then a “piano roll”⁵ estimation is performed for each frequency in the grid³ (p. 220).

On-line methods based on sliding windows include an approach by Dubois and Davy, where the signal is perceived as a Gaussian random walk for both frequencies and amplitude (the number of notes can increase, decrease or remain constant)³ (p. 221-223). Another approach is described by Vincent and Plumbley: frequencies are divided in a fixed grid like in the Cemgil model, but the parameter priors are independent of neighbouring frames. The unknown parameters are then MAP estimated and finally, the parameters of different frames are linked together and reestimated³ (p. 223-225).

There are also on-line methods based on the Bayesian model, which mostly consist of modeling the signal spectrogram and following harmonic trajectories³ (p. 225). Yeh and Röbel have proposed a model based on generation of “candidate notes” that are evaluated with a score function; further examination of this requires a look at the external sources, as the short text in the book³ (p. 225) is rather confusing. Dubois and Davy have introduced a method based on spectrogram modeling with zero-mean white Gaussian noise; this method is an extension of their model based on the sliding window. Thornburg et al. have proposed a method for melody extraction, which is therefore only capable of monophonic recognition.

⁴ M. Davy, S. Godsill & J. Idier, “Bayesian analysis of polyphonic western tonal music”, Journal of the Acoustical Society of America, 2005
⁵ Derived from “self-playing” pianos, the piano roll is a representation of whether each single note is present on a time scale.


Sterian et al. use, in their model, a Kalman filter to extract sinusoidal partials and group these into their sources.


8 Pitch Detection

The main sources for this worksheet are a web page by Professor Marina Bosi from Stanford University⁶ and chapter 4 in the book by Anssi Klapuri and Manuel Davy⁷.

According to Klapuri & Davy⁷ there are four key characteristics of music which are important when working with sound signals: pitch, loudness, duration and timbre. The topic of this worksheet is pitch detection.

Pitch is defined as

a perceptual attribute which follows the ordering of sounds on a frequency-related scale extending from low to high. More exactly, pitch is defined as the frequency of a sine wave that is matched to the target sound by the human listener. Fundamental frequency (F0) is the corresponding physical term, and is defined for periodic or nearly periodic sounds only.

There are various ways to detect pitches in music. One way is to simulate the human ear, since this is one of the most complex yet precise detectors. However, the complexity of this model is out of scope for this project, hence it will not be examined.

8.1 Approaching the Problem from a Physical Angle

Time domain detection can be done by observing the signal to detect periodicity. One could count the number of zero crossings, but though this is an easy and cheap method, it is also very inaccurate, since small variations of the signal around the zero line might induce fatal errors. One more complex, yet also more precise, way of time domain detection is autocorrelation.

Autocorrelation is a tool to find patterns in a signal and determine fundamental frequencies. If the input is periodic, the autocorrelation function will be as well. If the signal is harmonic, the autocorrelation function will have peaks at lags corresponding to multiples of the fundamental period. This method is suitable for e.g. speech recognition due to the low frequency range of speech signals. It might, however, be very expensive, because it includes a lot of multiply-add calculations.

⁶ http://ccrma-www.stanford.edu/~pdelac/154/m154paper.htm
⁷ Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006
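A minimal sketch of the autocorrelation approach in C: search for the lag with the strongest correlation and convert it to a frequency estimate. For a 50-2000 Hz search range at fs = 44100 Hz, min_lag = 22 and max_lag = 882 would be reasonable; the function name and interface are ours:

double acf_pitch(const double *x, int n, double fs,
                 int min_lag, int max_lag)
{
    int lag, i, best_lag = min_lag;
    double best = -1.0;

    for (lag = min_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (i = 0; i + lag < n; i++)
            r += x[i] * x[i + lag];   /* the multiply-add heavy part */
        if (r > best) {
            best = r;
            best_lag = lag;
        }
    }
    return fs / best_lag;             /* lag of strongest periodicity */
}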

Frequency domain detection is another approach: the signal is examined in the frequency domain in order to detect its frequency spectrum. Here, too, there are different ways to detect pitch.

The signal can be broken down into small segments, which can each be evaluated by multiplying the signal with a window to get a Short-Time Fourier Transform (STFT) of the segment. One of the disadvantages of this method is that the signal is broken into equally sized segments, which is disadvantageous since the spacing between the notes is nonlinear: a fixed segment length cannot provide a resolution that matches the note spacing in both the low and the high end of the spectrum.

8.2 Previous Pitch Detection Studies

Various scientists and acoustic engineers have examined the problem of transcribing pitches in music. Some of the more interesting results are derived by (sources for the following were found in Klapuri and Davy, section 8.4)⁷:

• Martin, who applied Ellis's model to process signals consisting of more than two simultaneous sounds.

• Godsmark and Brown, who examined auditory scene analysis models. They discovered that, by applying these models, they were able to transcribe four simultaneous sounds.

• Marolt, who examined ways to transcribe piano music. Since this is our main topic, his discoveries will be examined further later on. For now, it is enough to know that he applied time-delay neural networks to identify each piano key sound, and by doing this carefully, he was able to transcribe with good precision.


9 Wavelets and Assessment of Efficiency

9.1 Purpose

The Fourier transform is the decomposition of a given signal into a series of sines. Each sine in the decomposition will feature both infinite energy and extremely strong autocorrelation. A consequence of the Fourier transform is the lack of joint time/frequency information, meaning that greater resolution in frequency requires more samples, thereby making it impossible to determine at what instant a given component is added. In the attempt to decide which notes are played at a given time, the Fourier transform may therefore not be suitable.

Another approach to signal decomposition was suggested around 1910 by Haar⁸. He concluded that if a signal was to be decomposed without suffering from the lack of time/frequency information, the waveform used as the key element needed three main features: finite energy, weak autocorrelation and scalability. A wavelet is one such waveform. The following is an analysis of wavelet ability and efficiency/complexity. An illustration of the time/frequency resolution of the FFT algorithm and wavelets can be seen in figure 1.

[Figure: two time-frequency tilings side by side; in the wavelet tiling, low frequencies are better resolved in frequency and high frequencies are better resolved in time]

Figure 1: Comparison of time-frequency resolution for wavelets and the FFT.

⁸ Stéphane Mallat, “A Wavelet Tour of Signal Processing”, 1999, p. 7
⁹ http://en.wikipedia.org/wiki/Image:Wavelet_-_Morlet.png, July 15 2005, all copyrights declined


Figure 2: A Morlet wavelet⁹

9.2 Analysis

A wavelet is scalable and can be placed arbitrarily in time. Therefore the “generic” wavelet is dubbed the mother wavelet ψ. All wavelets in a given decomposition stem from this wavelet and are called child wavelets. These are written as

ψ_{a,b}(t) = (1/√a) · ψ((t − b)/a)    (4)

Where:
a is the scaling factor, which governs the frequency represented by the wavelet
b is the placement in time

Different mother wavelets have been proposed, and some types of wavelets are often more suited than others for a given application. A “mother wavelet” is then the generic wavelet of a given type, e.g. Haar, Daubechies, the “Mexican hat” or Morlet, which is seen in figure 2.

A discrete wavelet transform (DWT) is the decomposition of the (discrete) signal into various child wavelets. The shorter wavelets will easily be able to represent very fast signal transitions, while the longer wavelets represent the slower, lower frequencies. A very nice feature, in relation to audio processing, is the dyadic nature of the decomposition. This means that analysis in octaves can easily be accommodated.

This becomes apparent when looking at the actual implementation: the type of mother wavelet to be used determines the filter coefficients.

¹⁰ http://en.wikipedia.org/wiki/Image:Wavelets_-_DWT_Freq.png, July 15 2005


Figure 3: Wavelet decomposition is dyadic¹⁰

Figure 4: Continuous downsampling by a factor 2¹¹

As far as efficiency goes, the computational complexity is O(n), i.e. a linear rise. This means that the Discrete Time Wavelet Transform (DTWT) is even more efficient than the FFT¹².

9.3 Conclusion

The DWT could be used to determine diverse features of a music signal. A discussion and choice of mother wavelet is needed. Whether or not a specific decomposition is suitable as input to a neural network is to be determined.

¹¹ en.wikipedia.org/wiki/Image:Wavelets_-_Filter_Bank.png, July 15 2005
¹² C. Sidney Burrus et al., “Introduction to Wavelets and Wavelet Transforms”, 1998, p. 40


10 Neural Network

This worksheet is about the Neural Network method. The concept will be described, and the applicability for polyphonic music transcription will be considered. The purpose of this worksheet is to get an overview of the NN method, in order to determine whether or not it is a suitable tool for music transcription in this project.

10.1 Method Overview

An NN is a system which can be trained to recognize or identify nonlinearities when processing a signal. The method is suitable for systems where the user has some preliminary knowledge relevant for the classification. Before the network block, a feature extraction of the input signal must be made. It can be done in various ways, e.g. with wavelets or ear models¹³. The NN method is inspired by the biological nervous system. It consists of weighted neuron signals and a comparison algorithm. Neurons are models of the way the biological nervous system perceives what it is exposed to. The weighting algorithm is adjusted by training the system, using data with known output values: the output of the weighted neuron signals is compared to the known output, and in each iteration the weighting function is adjusted. The result of these iterations is the trained system. By training the system to recognize the signals to the extent possible, the system should be able to process any related input with the weights it has achieved. Figure 5 shows a block diagram of these relations.

10.2 Suitability

There are both advantages and disadvantages to applying the NN method as a tool in this project. Earlier studies by various scientists¹⁴ have shown that it is possible to accomplish usable results in music transcription by applying a feature extraction and the NN method. However, there are no lectures regarding this method this semester, and the complexity of NN is quite high.

¹³ Article: “Automatic music transcription and audio source separation”, M. D. Plumbley et al., 2002
¹⁴ E.g. Matija Marolt (PhD from University of Ljubljana), A. Klapuri (Tampere University of Technology, Finland)

Figure 5: Block diagram showing the training of a neural network; source: MATLAB documentation on Neural Networks

The method seems suitable for music transcription, and when examining the feature extraction analysis block, considering e.g. wavelets, it might be possible to develop a more suitable transcription system than the results already achieved by other scientists.


11 Data Preprocessing for a Neural Network Proposed by Others

Through studies of the literature regarding the field of music transcription, it seems the amount of research is somewhat scattered across genre, instrument and note recognition.

One of the sources of information for this project has been Matija Marolt from University of Ljubljana, Slovenia¹⁵. He has, over the last decade, published articles on music transcription using neural networks. The aim of the transcription has varied a bit, but the emphasis has been on piano transcription. Together with a colleague from University of Ljubljana, Marko Privosnik, he has worked on a piano transcription system called SONIC¹⁶. In their publication, they describe how they extract partials (meaning the data they feed into the neural network for training) by feeding the piano signal through the following steps¹⁶:

1. A Gammatone filterbank, which splits the signal into several frequency channels.

2. A Meddis hair cell model, which converts each gammatone filter output into a probabilistic representation of firing activity in the auditory nerve.

3. A network of up to ten adaptive oscillators, which has phase, frequency and output as adjustable variables, and extracts partials for the note recognition.

Their system was tested with different piano pieces in different recordings, and it was able to detect up to 95% of the notes, with 13% extra notes detected by fault. More test results can be viewed in their article¹⁶. The preprocessing seems quite complex, and induces thoughts on whether it could be done in a simpler way.

¹⁵ Source: http://www.fri.uni-lj.si/en/personnel/271/oseba.html
¹⁶ Source: M. Marolt, M. Privosnik, “SONIC: a system for transcription of piano music”, in Kluev, V., D'Attelis, C. E., Mastorakis, N. E. (eds.), Advances in Automation, Multimedia and Video Systems and Modern Computer Science, WSES Press, 2001. (http://lgm.fri.uni-lj.si/matic/clanki/malta2001.pdf)


12 Deciding Method for Further Analysis

After a preliminary analysis of different methods for piano transcription, a decision has to be made on which methods to analyze further and ultimately implement.

This document attempts to summarize the results of the initial analysis, in order to form a basis for the decision.

The requirements for the system state that a real-time implementation is wanted, so methods with high computational complexity at run-time are unwanted. The system is also required to give a representation of every combination of played tones, and hence of whether each individual tone has been played at a given time.

12.1 Preliminary Analysis

The initial analysis has been focused on different ways to attack the problem, ranging from solutions like a purely statistical approach and auditory-based models to methods based on wavelets and neural networks.

12.1.1 Statistical Methods

In the analysis of statistical methods to estimate fundamental frequencies, a wide range of different methods is explained in the book by Klapuri & Davy¹⁷.

From the analysis it is concluded that a wide range of different methods are available, many of them quite usable, but for most of them a heavy computational load is to be expected, and therefore a real-time implementation may not easily be achieved.

12.1.2 Methods based on Auditory Models

The auditory models are based on how the human ear works and how humans perceive music. Chapter 8 of Klapuri & Davy¹⁷ gives a good introduction to a range of these.

The concrete methods have so far only been examined superficially, but include for instance separation of the tone bands using a filter bank, channel and peak selection, as well as pitch-perception models. A method used by Matija Marolt was to identify tones using adaptive oscillators¹⁸ for preprocessing of the signal to use as input to a neural network.

¹⁷ Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006

12.1.3 Neural Networks

A neural network takes as input a feature set, which will have to be generated from the input signal, and a result is calculated via a weighting vector.

Neural networks have a huge advantage when it comes to the computational complexity of running the trained networks, but due to their nature, a large amount of training data is needed in order to make the results sufficiently reliable, and the training requires a lot of computation.

Earlier studies by Matija Marolt have shown significant results on using neural networks¹⁹ with preprocessing of the data by groups of adaptive oscillators¹⁸.

12.1.4 Wavelets

Wavelets provide a way to transform a given signal into frequency components, like the Fourier transform, and give the opportunity to study each frequency component with a resolution that matches the scale. This feature, along with the fact that wavelets, unlike the statistical methods, are not computationally very complex, makes them a good candidate for feature extraction from a recorded signal.

12.2 Decision

Based on the key points above, it has been decided that the further analysis will focus on the use of neural networks for the transcription. To use NN, a preprocessing of the data is necessary to minimize the computational load. This preprocessing is a feature extraction, and further analysis of wavelets will be performed in order to decide whether these can be used for the preprocessing of data for the neural networks.

¹⁸ Matija Marolt, “Networks of Adaptive Oscillators for Partial Tracking and Transcription of Music Recordings”, Journal of New Music Research, vol. 33, no. 1, pp. 49-59, 2004.
¹⁹ Matija Marolt, “A connectionist approach to automatic transcription of polyphonic piano music”, IEEE Transactions on Multimedia, vol. 6, no. 3, pp. 439-449, 2004.


The project will from this point on focus on the combination of neural networks and wavelets.


13 General Construction

13.1 Purpose

To clarify what building blocks are needed to realize the software of the transcription tool.

13.2 Analysis

A generic construction, based on a neural network (NN), is shown in figure 6.

Figure 6: Block diagram of a system based on neural networks.

13.3 Sampled Piano Music

This is a sampled piece of piano music of arbitrary length. It is assumed that the bit resolution is 16 and the sample rate is 44100 Hz, so as to comply with the wave format featured on industrially manufactured CDs.

13.4 Feature Extraction

The music signal has to be transformed to another representation, one that somehow makes it easier to differentiate the different piano notes. The obvious representation would be frequency components, e.g. via the Fast Fourier Transform (FFT) or the Discrete Wavelet Transform (DWT). The optimal feature set would be one that is easily recognizable/unique for a given note and also one that has minimal variation from piano to piano.

13.5 NN

The NN could handle all note detection simultaneously or be split up into 88 different networks, each handling a specific note. It is to be determined which type of neuron is the most suitable, so the number of neurons can be minimized without compromising detection effectiveness.

13.6 Decision/Discrimination

As some piano notes have a somewhat strong correlation, especially the octaves of a given note, it is very likely that a “false hit” will be registered from time to time. A discrimination algorithm should be able to remove some errors. Some errors can be detected and rectified by simple rules: if notes C1²⁰, C3, E3, G3 and G5 were detected, G5 would most likely be a false reading, as the first four notes would require two hands to play. The key would be to find the best balance between false hits and no detection of notes actually played.

13.7 Presentation

Some kind of representation is needed. The notes could simply be written directly to a file, to a score, or be presented on the PC monitor. It is not to be a focal point, but it should be effective as a diagnostics tool during the design phase.

20The representation of notes is called scientific pitch notation. A4 is the note with fundamental frequency 440 Hz.


14 General Thoughts

The purpose of this worksheet is to brainstorm on the different possibilities for implementation and further development.

14.1 Optimizing the Networks

The most straightforward way of making an optimal neural network would be to make an input neuron for each point of data in the decomposition and train the network. Accommodating this would, however, require massive RAM storage and a lot of time. To optimize the execution speed of the networks at run time, it could be determined which of the input neurons are associated with the weights holding the biggest absolute values. An example would be to keep the 100 most sensitive inputs and then retrain, as sketched below.
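A minimal sketch of this pruning idea (our own illustration; the flat weight layout is an assumption, and with FANN the weights would first have to be read out through its connection API):

#include <math.h>
#include <stdlib.h>

/* One (index, sensitivity) pair per input neuron. */
typedef struct { int index; float sensitivity; } input_rank;

static int by_sensitivity_desc(const void *a, const void *b)
{
    float d = ((const input_rank *)b)->sensitivity
            - ((const input_rank *)a)->sensitivity;
    return (d > 0) - (d < 0);
}

/* Rank input neurons by the summed |weight| of their outgoing links and
 * write the 'keep' most sensitive input indices to kept[].
 * weights[i*num_hidden + j] is the link from input i to hidden neuron j. */
void select_inputs(const float *weights, int num_input, int num_hidden,
                   int keep, int *kept)
{
    input_rank *rank = malloc(num_input * sizeof(input_rank));
    int i, j;
    for (i = 0; i < num_input; i++) {
        rank[i].index = i;
        rank[i].sensitivity = 0.0f;
        for (j = 0; j < num_hidden; j++)
            rank[i].sensitivity += fabsf(weights[i * num_hidden + j]);
    }
    qsort(rank, num_input, sizeof(input_rank), by_sensitivity_desc);
    for (i = 0; i < keep; i++)
        kept[i] = rank[i].index;  /* retrain the network on these inputs only */
    free(rank);
}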

14.2 Optimizing the Wavelet Decomposition

While network optimization is carried out, it would be interesting to try different wavelet decompositions, to determine whether some types are more effective than others. It would also be very interesting to see if some wavelets are more appropriate in a given interval. Perhaps it is a good idea to detect higher pitched notes using a shorter decomposition?

14.3 Optimization of the Decision Algorithm

If it were possible to make some kind of “music style detection”, that knowledge could form the basis of a probabilistic decision. Jazz would most likely feature some signature chord modulations that are not seen in classical music, and vice versa.


Part II

Design

15 Architectural Considerations for the Neural Network

This worksheet contains an overview of possible methods for structuring the network of neurons and the types of neurons. The main sources for this worksheet are ”Neural Networks, a Comprehensive Foundation” by Simon Haykin, chapter 1,21 and the article ”An Introduction to Computing with Neural Nets” by Richard P. Lippmann22.

15.1 Adjustable Network Elements

A network is a construction of neurons and links between them, as shown in figure 9. It can be constructed in various ways. The structure of neurons and links in the network can be adjusted to achieve the best results for the network.

The network consists of one input layer with a number of neurons, one output layer with a number of neurons and possibly a number of hidden layers, not necessarily containing the same number of neurons per layer. There are three parameters to adjust:

• Number of input neurons (source nodes)

• Number of hidden layers and neurons (computation nodes) in these

• Number of output neurons (computation nodes)

The number of hidden layers and the number of neurons in all layers, as well as the type of neurons, must be decided in order to design the network, but before this is done, some considerations regarding the structure must be made.
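In FANN, the neural network library used in the implementation part of this report, these three structural parameters map directly onto the network-creation call. A minimal sketch (the numbers are placeholders only, not the values chosen for this project):

#include "fann.h"

int main(void)
{
    /* num_layers counts the input, hidden and output layers together;
     * the remaining arguments give the number of neurons per layer. */
    const unsigned int num_layers = 3;
    const unsigned int num_input  = 112;  /* input neurons (source nodes) */
    const unsigned int num_hidden = 5;    /* hidden neurons */
    const unsigned int num_output = 1;    /* output neurons */

    struct fann *ann = fann_create_standard(num_layers, num_input,
                                            num_hidden, num_output);
    /* ... train and use the network ... */
    fann_destroy(ann);
    return 0;
}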

21Source info: ”Neural Networks, a Comprehensive Foundation”, second edition, 1999, Simon Haykin, Prentice Hall, ISBN 0-13-273350-1, chapter 1.

22Source info: An Introduction to Computing with Neural Nets, IEEE ASSP Magazine, April 1987, Richard P. Lippmann.


15.2 Classes of Networks

The simplest network is a single-layer network, with an input layer and an output layer. Depending on the kind of data the network should handle, different kinds of network types are available. Figure 7 shows a tree diagram of some different types of networks23. For more information on the specific classes and their algorithms, Richard P. Lippmann gives a more profound description in his article on neural nets22.

The first thing to determine is whether the input signal is binary or continuous. This clarifies which kinds of algorithms are most suitable for solving the problem.

The second thing to determine is whether or not there are data to train the system. If data are available, it is possible to apply supervised learning. If there is no training data, the system must be trained unsupervised. This is done by initializing the system with a very simple structure, and then gradually optimizing it by feeding the output data into the system to adjust the structure.

Figure 7: A taxonomy of six neural nets that can be used as classifiers. Classical algorithms which are most similar to the neural net models are listed along the bottom. The figure and caption text are from Richard P. Lippmann’s article ”An Introduction to Computing with Neural Nets”, p. 6, figure 323.

23Richard P. Lippmann, An Introduction to Computing with Neural Nets, IEEE ASSP Magazine, April 1987.


15.3 Layers

Figure 8: Neural network structure for a fully connected feed-forward single-layer network, consisting of an input layer and an output layer24.

Figure 9: Neural network structure for a fully connected feed-forward multilayer network, consisting of an input layer, one hidden layer and an output layer25.

There are two significant categories of layered networks:

• Single-layer networks, as shown in figure 8

• Multilayer networks, as shown in figure 9

Both net structures can be used for binary as well as continuous inputs22. In multilayer networks, the hidden layers are included to enable the possibility of extracting higher-order statistics21. The multilayer network, with hidden layers, is beneficial when the size of the input layer is large21.

24Source: http://commons.wikimedia.org/wiki/Image:SingleLayerNeuralNetwork_english.png

25Source: http://commons.wikimedia.org/wiki/Image:MultiLayerNeuralNetwork_english.png


In both figures 8 and 9 the networks are structured as feed-forward networks. It is possible to use feedback in networks, but this will not be examined further in this project. For more information, see the studies of e.g. Matija Marolt26.

15.4 Types of Neurons

According to Haykin, a neuron is defined as: an information-processing unit that is fundamental to the operation of a neural network21.

Figure 10 shows a nonlinear model of a neuron.

Figure 10: Nonlinear model of a neuron. Source: Haykin, p. 11, fig. 1.5

In the figure, three elements are shown21:

• A set of connecting links

• An adder, including a possible bias

• An activation function

Together these three elements form the neuron. Each connecting link has its own weight, which is applied to its respective input signal. After the input signals are weighted, they are summed and possibly biased21:

u_k = \sum_{j=1}^{m} w_{jk} x_j    (5)

v_k = u_k + b_k    (6)

26”A Connectionist Approach to Automatic Transcription of Polyphonic Piano Music”, IEEE Transactions on Multimedia, vol. 6, no. 3, June 2004, Matija Marolt.

The bias will be described further below. The next element is the activation function:

y_k = \varphi(v_k) = \varphi(u_k + b_k)    (8)

In the activation function the output signal is normalized. According to Haykin21, the typical normalization range is [0,1] or [-1,1].

There are three main categories of activation functions:

• Threshold function, also known as Heaviside function

• Piece-wise linear function

• Sigmoid function

The three function types are shown in figure 11.


Figure 11: (a) Threshold function, (b) Piecewise-linear function, (c) Sigmoid function for varying slope parameter a. Source: Haykin, p. 13, fig. 1.8
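As a sketch, the three categories can be written as plain C functions (following Haykin's definitions as we read them; realizing the symmetric, [-1,1]-ranged sigmoid with tanh is our own choice):

#include <math.h>

/* Threshold (Heaviside) function: 1 for v >= 0, otherwise 0. */
float phi_threshold(float v) { return (v >= 0.0f) ? 1.0f : 0.0f; }

/* Piecewise-linear function: linear in [-1/2, 1/2], saturated outside. */
float phi_piecewise(float v) {
    if (v >= 0.5f)  return 1.0f;
    if (v <= -0.5f) return 0.0f;
    return v + 0.5f;
}

/* Sigmoid (logistic) function with slope parameter a; range (0,1). */
float phi_sigmoid(float v, float a) { return 1.0f / (1.0f + expf(-a * v)); }

/* Symmetric sigmoid with range (-1,1). */
float phi_sigmoid_symmetric(float v, float a) { return tanhf(a * v); }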

16 Overall Architecture

It has been decided that the core functionality, namely the ability to detect certain piano notes, is to be implemented as a NN. The network basically consists of 88 smaller, parallel NNs, each governing the detection of a single piano note. Feature extraction for the NNs is to be done via discrete wavelet decomposition. Prior to implementation, it is to be decided which wavelet band expresses the most aggressive response, in terms of energy, to a given note. This is done to reduce the amount of data fed to the NNs. As these two building blocks are considered essential to the project, they are the only focal points from now on. Figure 12 is a graphical representation of the architecture.

Figure 12: The implemented principle of note detection. The wave file is decomposed into 12 wavelet bands of dyadically declining lengths (the last, single-sample output is a residual). The NN that detects a given pitch is fed with the decomposition band that produces the most power when the note is played. A detection threshold value is set to determine hit/no hit.
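A minimal sketch of this band selection (our own illustration; it assumes the in-place dyadic layout produced by the wavelet_db4 routine in Part IV, where the detail coefficients of level j occupy indices [N/2^j, N/2^(j-1)) ):

/* Return the dyadic level (1 = finest/highest-frequency band) whose
 * detail coefficients carry the most energy after an in-place
 * decomposition of n samples. */
int strongest_level(const float *wavelet, int n)
{
    int level = 0, best = 0, lo, hi, i;
    float energy, max_energy = -1.0f;
    for (hi = n; hi > 1; hi = lo) {
        lo = hi / 2;
        level++;
        energy = 0.0f;
        for (i = lo; i < hi; i++)
            energy += wavelet[i] * wavelet[i];
        if (energy > max_energy) {
            max_energy = energy;
            best = level;
        }
    }
    return best;   /* with n = 4096 this scans the 12 bands of figure 12 */
}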

16.1 Wavelet Decomposition

Not written yet...

16.2 Neural Network Structure

Structuring a neural network is not an exact science; hence, choices must be made by qualified guessing.


Since the project group has not previously worked with neural networks, a meeting with Uwe Hartmann was arranged27. Below is some of the advice he gave on the choice of neuron type:

• Go for the sigmoid function. It is simple and commonly used.

• Choose the soft curve; the specific function is less important.

• Range between -1 and +1, not 0 and 1

Based on the literature studies and the meeting with Uwe Hartmann, the following choices have been made:

Here, something should be written about the choice of neuron type, the number of layers and the number of neurons per layer... once it has been written.

16.2.1 Feature Set versus Pitch

Not sure whether this section should be included, but somewhere it should be mentioned how we decided which wavelet taps to use for each individual NN.

16.2.2 Training and Test

As the training (and test) set for a given NN, a wave file and a detection vector need to be constructed. The wave file should be a sequence of different mono- and polyphonic notes. The detection vector is simply the input for the back propagation in the NN. The training notes are acoustically recorded piano notes, played on an arbitrarily chosen piano. The test notes are a second recording of the same piano. As the Musician Transcription Tool needs to be able to handle 10 different simultaneous notes, a composition algorithm like the following is suggested:

1. Decide whether the target note is to be included in the composition or not (P=0.5)

2. Decide randomly the number of notes to be composed (between 0 and 10)

27Uwe Hartmann is a Senior Professor at Aalborg University, and has conducted research and held courses in Neural Networks.


3. If the target note is included and more than 0 notes are to be composed, write 1 to the detection vector

4. Decide randomly what other notes are to be played

5. Add all the needed notes and normalize

6. If a longer sequence is needed, continue from step 1.

This algorithm is implemented in the Matlab function pianomix.m in Part V.


Part III

Implementation

17 Software Design

This worksheet aims to describe the structure of the transcription program. The training, test and run functionality will be implemented in the same program, hereby reducing the amount of written code, as most of the actual actions are identical for all three cases. A simplified diagram of the program can be seen in figure 13. As a design tool, a user guide has been written.

Figure 13: A simplified chart of the principle of the transcription program


17.1 User Guide

The program, which is called transcribe, can be run in three ways:

train For training the system

test For testing the system

run To get transcription results from the program

The run type and the files to use are chosen by input arguments to the program. There is no limit to the number of arguments; however, only one run type is allowed.

To use the program for training a network, the first argument must be train and the remaining arguments must come in groups of three:

1. The wavefile to be analysed

2. The correct results for training (a file containing ones and zeros fitting the wavefile; each one or zero must be the result for a block of 4096 samples28)

3. The wanted name of the data file (combination of wavelet data and correct results for training) and net-file (where the neural network is saved)
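For illustration (our own example, not from the report's data): a check-file covering five blocks of 4096 samples, in which the target note sounds in blocks 2-4, would simply read:

0
1
1
1
0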

Example: transcribe train A4.wav A4_hit Net/NN_A4 C8.wav C8_hit Net/NN_C8

This will train the networks for the .wav files by comparing the calculated wavelets with the defined results, and the resulting networks will be saved in the folder Net with the names NN_A4.net and NN_C8.net.

The syntax for testing the network is identical, but the first argument must then be test, and the argument groups now mean:

1. The wavefile to be analysed

2. The correct results for testing (a file in the same format as for training)

3. The name of the net-file (where the neural network is saved)

284096 samples per decomposition was selected from a wish to have a sufficiently fine time resolution and a sufficiently high frequency resolution; due to the wavelet transform method, the block length must also be a power of 2.


Example: transcribe test A4tst.wav A4tst_hit Net/NN_A4 C8tst.wav C8tst_hit Net/NN_C8

This will test the two networks that were trained before, and return the mean square error of each.

Finally, to run the transcription, use the argument run followed by all the .wav files to analyse. The results will be written to text files with a fixed name for each neural network.

Example: transcribe run Musik.wav Muzak.wav

This will output two files for each of the 88 neural networks containing the float output values of the networks, in the format “[filename]_[netname].test”; e.g. Musik.wav_A4.test will contain the results from the network trained to recognize A4, after processing the file Musik.wav.

17.2 Implementation

The software has been implemented in C, and the decision to combine the functionality into one program has prevented a lot of duplicated work. In particular, the main parts of training and test differ by only a few lines of code.

The overall implementation is working; however, the output of the neural networks seems to be wrong. Reading a wave file and decomposing it into wavelets works and has been successfully compared with Matlab results. The neural networks can be trained and tested with a test file, but when inputting data for classification, the output values are not well spread over the range of -1 to 1 as they ought to be. In fact, a great many different inputs can generate the exact same output; from a wave file with 7000 decompositions, only 123 different output values were detected.

Since the neural networks used are implemented via an external library, which we have not developed, a quick solution was to skip further development of the program and use corresponding algorithms in Matlab. For a real-time implementation, as was originally wanted, the C implementation can be used as a good basis.

18 Real-Time Considerations

An initially desired feature of the system was the possibility of transcribing the music in real time on “regular hardware”. Although interaction with the sound card has not been implemented, the analysis of wave files indicates that it is indeed possible to run the transcription in real time.

The wav files used for training were mono, 44.1 kHz, 16 bit, with a length of 10 minutes and 50 seconds. The program in this case only runs a single network, but running the trained networks takes almost no time compared to the wavelet transform and file writing, and thus this is not assumed to be a significant contribution. On a 3-year-old laptop with a 1.5 GHz single-core processor and 512 MB DDR RAM, running Ubuntu Linux, the performance was measured by processing a single note while using the programs top (displaying resource usage) and time (measuring the “active” time of a process). The results were:

• CPU usage: about 90% (88.5-92.9)

• Memory usage: max 114 MB (the entire wave file, which is 55 MB, is loaded and parts of it buffered)

• Time usage: 59 seconds (real time) – 54 seconds of actual processing

Since the processing time is less than a tenth of the playing time on this hardware, a real-time implementation should be achievable, even without optimization of the algorithms.

19 FANN – Fast Artificial Neural Network Library

This is the documentation for the FANN library, written as a quick guide to the parts relevant for this project. Because the output showed strange results (a lot of similar numbers as output; only a handful of different outputs in the range -1 to 1 for several thousand different inputs), the C code based on this was not finished completely.
The library can be found at www.leenissen.dk/fann/.
At www.leenissen.dk/fann/html/, a reference manual exists for all functions.
The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/fann.zip


Contents:
--------
1 Contents of the folder
1.1 How to read
2 Compiling FANN
2.1 Linux
2.2 Windows
3 Using FANN
3.1 Training an ANN
3.2 Using an ANN
3.3 Testing an ANN
4 Compiling code

***************************************************************
* 1 * Contents of the folder *
***************************************************************

fann-2.0.0.zip            C and Python Library Source Code
                          for all platforms
                          Support for:
                          Gnu Makefile, Visual Studio 6/.Net,
                          Borland C++ Builder and other
                          standard compilers.

fann_doc_complete_1.0.pdf Complete documentation of V1.0

Test/                     Folder for test code showing
                          how FANN is used
Test/*.data               data files for input to FANN
Test/make_testdata.m      generation of data files for the example
                          (from pattern recognition files)
Test/Makefile             Makefile that enables automatic
                          building and linking
Test/test_train.c         Program that trains an ANN from the
                          train[1-4].data files
Test/test_test.c          Program that tests an ANN with the
                          test[1-4].data files
Test/test_run.c           Program that classifies from
                          input - NOT FINISHED
Test/*.net                Trained networks

1.1 How to read

- When the term $FANNDIR is used in this file, it means the
  folder in which fann-2.0.0.zip has been unpacked.
- code examples are indented with 1 tab

***************************************************************
* 2 * Compiling FANN *
***************************************************************

Unpack fann-2.0.0.zip and compile it for the platform on which
it is to be used

2.1 Linux

Enter the folder and execute the following:

	./configure
	make
	sudo make install
	sudo ldconfig

(ldconfig is to make sure that the library can also be found
when the program is run)

2.2 Windows

The folders "MicrosoftVisualC++6.0" and "MicrosoftVisualC++.Net"
contain project files for the respective Visual Studio versions

***************************************************************
* 3 * Using FANN *
***************************************************************

FANN includes functions that are used directly to train, save
and use a network. When you are done using an ANN, it must be
destroyed again so that it does not occupy all the memory.

3.1 Training an ANN

Before the training can be carried out, the training data must
be available in the correct form in a file - see 3.1.1

The training is then done in 3 steps (see 3.1.2):

- Define parameters
- train the network on the file
- save and destroy the network

3.1.1 Training data

Training data must be saved in a file in the following format

[number of training patterns] [inputs per pattern] [outputs per pattern]
[input 1,1] [input 1,2] [...]
[output 1,1] [output 1,2] [...]
[input 2,1] [input 2,2] [...]
[output 2,1] [output 2,2] [...]
[...]

e.g. the training data for training an xor function looks like
this (4 sets, 2 inputs, 1 output)

	4 2 1
	-1 -1
	-1
	-1 1
	1
	1 -1
	1
	1 1
	-1

3.1.2 Example of use:

An ANN is defined as a struct - the numbers here are from an example

	const unsigned int num_input = 2;
	const unsigned int num_output = 1;
	const unsigned int num_layers = 3;
	const unsigned int num_neurons_hidden = 3;
	const float desired_error = (const float) 0.001;
	const unsigned int max_epochs = 500000;
	const unsigned int epochs_between_reports = 1000;

	struct fann *ann = fann_create_standard(num_layers,
		num_input, num_neurons_hidden, num_output);

Then the transfer functions of the neurons are defined:

	fann_set_activation_function_hidden(ann,
		FANN_SIGMOID_SYMMETRIC);
	fann_set_activation_function_output(ann,
		FANN_SIGMOID_SYMMETRIC);

Other possible transfer functions are: FANN_LINEAR,
FANN_LINEAR_PIECE, FANN_LINEAR_PIECE_SYMMETRIC, FANN_SIGMOID,
FANN_SIGMOID_SYMMETRIC, FANN_SIGMOID_SYMMETRIC_STEPWISE,
FANN_SIGMOID_STEPWISE, FANN_THRESHOLD,
FANN_THRESHOLD_SYMMETRIC, FANN_GAUSSIAN,
FANN_GAUSSIAN_SYMMETRIC, FANN_ELLIOT, FANN_ELLIOT_SYMMETRIC
(see the descriptions in $FANNDIR/src/include/fann_data.h)

fann_set_activation_function_hidden sets the transfer function
for all hidden neurons.
Instead, a transfer function can be set for a single neuron
with the function

	fann_set_activation_function(ann, FUNCTION, layer, neuron)

or for a whole layer with

	fann_set_activation_function_layer(ann, FUNCTION, layer)

A file is loaded and trained on:

	fann_train_on_file(ann, "xor.data", max_epochs,
		epochs_between_reports, desired_error);

By default the training algorithm FANN_TRAIN_RPROP is used, but
this can be changed by calling

	fann_set_training_algorithm(ann, ALGORITHM)

where ALGORITHM is one of the following: FANN_TRAIN_INCREMENTAL,
FANN_TRAIN_BATCH, FANN_TRAIN_RPROP, FANN_TRAIN_QUICKPROP
(see the descriptions in $FANNDIR/src/include/fann_data.h)

The network is saved:

	fann_save(ann, "xor_float.net");

The network is destroyed to free the memory:

	fann_destroy(ann);

3.2 Using an ANN

When an ANN is to be used, it must first either be trained as
above or loaded from a file that was trained in that way.
Then the inputs are defined and the output is calculated.

Variables are created for the in- and outputs:

	fann_type *calc_out;
	fann_type input[2];

fann_type is the type of the weights, and is either float,
double or int depending on whether fann.h/floatfann.h,
doublefann.h or fixedfann.h is included - in our case it will
probably be float...

Create the network from the saved file:

	struct fann *ann = fann_create_from_file("xor_float.net");

Define the inputs:

	input[0] = -1;
	input[1] = 1;

Calculate and print the output:

	calc_out = fann_run(ann, input);
	printf("xor test (%f,%f) -> %f\n", input[0], input[1],
		calc_out[0]);

The network is destroyed to free the memory:

	fann_destroy(ann);

3.3 Testing an ANN

An ANN can be tested by giving it a single input with the function

	fann_test(ann, input, desired_output);

It is also possible to test on a whole data set with the function

	fann_test_data(ann, data);

Both update the MSE of the network, which can be read with

	fann_get_MSE(ann)

There is no function for testing from a file, but inspired by
how fann_train_on_file is realized
(in $FANNDIR/src/fann_train_data.c), it should be possible
like this:

	struct fann_train_data *data =
		fann_read_train_from_file("filename.data");
	fann_test_data(ann, data);

***************************************************************
* 4 * Compiling code *
***************************************************************

When code is to be compiled, it is essential that the right
libraries are loaded.
A Makefile has been created for the purpose, which works on
linux:

	make           compiles all files
	make all       compiles all files
	make clean     cleans up
	make filename  compiles filename.c
	make runtest   compiles and runs test_train and test_test

20 PortAudio

PortAudio is a cross-platform audio API and was examined as an option for a real-time implementation of the system. Time did not allow a real-time implementation, but an unfinished worksheet on how to use the API is shown below.
PortAudio can be found at www.portaudio.com/

The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/port_audio.zip

Contents:
--------
1 Contents of the folder
2 Compiling PortAudio
2.1 Linux
2.2 Windows
3 Compiling the examples
3.1 Linux
3.2 Windows
4 Structure of the programs
4.1 Writing a callback function
4.2 Initializing PortAudio
4.3 Opening a stream
4.4 Starting, stopping and aborting a stream
4.5 Closing a stream and terminating PortAudio
4.6 Miscellaneous functions
4.7 Searching for devices
4.8 Blocking I/O functions (an alternative to the callback)

***************************************************************
* 1 * Contents of the folder *
***************************************************************

pa_stable_v19_20071207.tar.gz  the PortAudio distribution itself;
                               must be compiled for the system
                               in use (Windows/Linux)

Eksempler                      folder with examples and guides
                               on how to compile them

***************************************************************
* 2 * Compiling PortAudio *
***************************************************************

2.1 Linux

Guide:
www.portaudio.com/trac/wiki/TutorialDir/Compile/Linux

1) Unpack pa_stable_v19_20071207.tar.gz
2) Open the folder in a terminal
3) Type ./configure
4) Type make

if all goes well, it is now built

2.2 Windows

Guide (Visual Studio):
www.portaudio.com/trac/wiki/TutorialDir/Compile/Windows

Guide (free tools):
www.portaudio.com/trac/wiki/TutorialDir/Compile/WindowsMinGW

1) Unpack pa_stable_v19_20071207.tar.gz (if you cannot open
   tar files, get e.g. http://peazip.sourceforge.net/)
2) Follow the guide

hopefully all goes well...

***************************************************************
* 3 * Compiling the examples *
***************************************************************

3.1 Linux

Generating a sawtooth (example from the distribution - sound out):

* Main file: patest_saw.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_saw.bin
  patest_saw.c libportaudio.a
*
* Explanation:
* -lasound : link with the asound library (ALSA sound)
* -ljack : link with the jack lib (JACK sound)
* -lpthread : link with the pthread lib (Posix threads - threaded
  programming; not strictly necessary for this file...)
* -o <filename> : the executable file is saved as <filename>
* patest_saw.c : this file
* libportaudio.a : library that lets the executable be used
  without PortAudio being installed

Record a sound and play it back
(from the distribution: sound in --> save to file --> sound out)

* Main file: patest_read_record.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_read_record.bin
  patest_read_record.c libportaudio.a
*
* Explanation - see above

3.2 Windows (from the guide - not tested)

in any project, in which you require portaudio, you can just
link with portaudio_x86.lib, (or _x64) and of course include
the relevant headers (portaudio.h, and/or pa_asio.h,
pa_x86_plain_converters.h) Your new exe should now use
portaudio_xXX.dll.

3.2.1 MinGW

3.2.2 Visual Studio and the like

***************************************************************
* 4 * Structure of the programs *
***************************************************************

This is based on the official tutorial at
http://www.portaudio.com/trac/wiki/TutorialDir/TutorialStart

It is also recommended to read portaudio.h, which contains
information about all the functions.

General structure of a PortAudio program:

* Write a "callback" function that PortAudio calls when sound
  is to be processed - it must not be too computationally heavy!
* Initialize the PA library and open a stream for audio I/O.
* Start the stream. The callback function is now called
  repeatedly by PortAudio in the background.
* In the callback, sound data can be read from inputBuffer
  and/or written to outputBuffer.
* Stop the stream by returning 1 from the callback, or by
  calling a stop function.
* Close the stream and terminate the PA library.
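As a minimal sketch of this structure (our own illustration,
assuming PortAudio v19's callback API and the default audio
devices), a simple pass-through program could look like this:

	#include "portaudio.h"

	/* Pass-through callback: copies input samples to the output buffer. */
	static int passthrough(const void *input, void *output,
	                       unsigned long frames,
	                       const PaStreamCallbackTimeInfo *timeInfo,
	                       PaStreamCallbackFlags statusFlags,
	                       void *userData)
	{
	    const float *in = (const float *)input;
	    float *out = (float *)output;
	    unsigned long i;
	    for (i = 0; i < frames; i++)
	        out[i] = in ? in[i] : 0.0f;   /* input may be NULL */
	    return paContinue;
	}

	int main(void)
	{
	    PaStream *stream;
	    if (Pa_Initialize() != paNoError) return 1;
	    /* 1 input channel, 1 output channel, float samples, 44100 Hz */
	    Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, 44100,
	                         paFramesPerBufferUnspecified, passthrough, NULL);
	    Pa_StartStream(stream);  /* the callback now runs in the background */
	    Pa_Sleep(5000);          /* let it run for 5 seconds */
	    Pa_StopStream(stream);
	    Pa_CloseStream(stream);
	    Pa_Terminate();
	    return 0;
	}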

4.1 Writing a callback function

4.2 Initializing PortAudio

4.3 Opening a stream

4.4 Starting, stopping and aborting a stream

4.5 Closing a stream and terminating PortAudio

4.6 Miscellaneous functions

4.7 Searching for devices

4.8 Blocking I/O functions (an alternative to the callback)

Part IV

C Source Code

This is the collection of source code for the off-line transcription system. Except for the actual execution of the neural networks, which gives “funny” results, everything works as expected.

21 Makefile

The Makefile is used for building and linking the program with GNU make.

# The makefile requires that the fann library is installed

GCC=gcc

SOURCES.c= main.c wavelet.c waveread.c ann_train.c ann_test.c ann_run.c fileio.c
INCLUDES=
CFLAGS= -O3 -lm -lfann
SLIBS=
PROGRAM= transcribe

OBJECTS= $(SOURCES.c:.c=.o)

.KEEP_STATE:

debug := CFLAGS= -g

all debug: $(PROGRAM)

$(PROGRAM): $(INCLUDES) $(OBJECTS)
	$(LINK.c) -o $@ $(OBJECTS) $(SLIBS)

clean:
	rm -f $(PROGRAM) $(OBJECTS)

22 main.c

The main part links the entire program together and deals with the input arguments that determine whether the program is used for training, testing or running, as well as which files to operate on.


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "wavelet.h"
#include "waveread.h"
#include "ann.h"
#include "fileio.h"

int main(int argc, char **argv){
	char *filename, *netname, *datname, *checkname, *action, run[]="run", train[]="train", test[]="test";
	wavheader whd;
	int argnum, i, j, k, *buffer, bufsize, siglen, half, sigstart;
	float *wavelet, *indata, calcout;
	FILE *ifp, *ofp, *ftest;

	if(argc==1){
		printf("This program needs arguments to work...\nThe first argument must be one of:\nrun\tto run the program\ntest\tto test the program\ntrain\tto train the program\n");
		exit(EXIT_FAILURE);
	}
	action=argv[1];
	if(strcmp(action,run)&&strcmp(action,train)&&strcmp(action,test)){
		printf("The first argument must be one of:\nrun\tto run the program\ntest\tto test the program\ntrain\tto train the program\n\nrunning program, since no other action is specified...\n\n");
		action=run;
		i=1;
	} else {
		i=2;
	}

	for(argnum=i; argnum<argc; argnum++){
		filename = argv[argnum];
		printf("file %d of %d: %s\n", argnum-1, argc-2, filename);
		if((ifp = fopen(filename, "rb"))==NULL){
			fprintf(stderr, "Could not open the file %s for reading\n", filename);
		} else { // prevent doing things with nonexisting file

			/* *****************************************
			 * In this part, the wave-file is read... *
			 ***************************************** */

			// Read the wave header:
			whd=wavread_head(ifp, filename);
			bufsize = whd.datasize / whd.blockalign;

			// Print relevant info:
			printf("%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n", "Filesize", whd.filesize, "Number of channels", whd.num_chan, "Samplerate", whd.samplerate, "Byterate", whd.byterate, "Block alignment", whd.blockalign, "Bits per sample", whd.bits, "Data size", whd.datasize, "Number of samples", bufsize);

			// Read the contents into an int buffer:
			buffer=(int *)malloc(sizeof(int[bufsize]));
			// 8 bit wave is unsigned - 16 & 24 bit are signed - left and right shift to get the sign bit correct
			if(whd.bits>8){
				for(i=0; i<bufsize; i++){
					buffer[i]=getfileints(ifp, whd.blockalign) << (sizeof(unsigned long)*8-whd.bits) >> (sizeof(unsigned long)*8-whd.bits);
				}
			} else {
				for(i=0; i<bufsize; i++){
					buffer[i]=getfileints(ifp, whd.blockalign);
				}
			}
			fclose(ifp);

#ifdef DEBUG
			// Write the values in the file tmp for comparing with the original wave in matlab
			int_out(buffer, bufsize, "debug_wave");
			// The test was a success: variance of c_read./original (with correction for 0/0-instances) was 0
#endif
			printf("wave-file loaded...\n");

			/* ************************************************************************
			 * Here the check-file is opened and a header is written to the          *
			 * test/training data-file, if a network is to be trained or tested      *
			 ************************************************************************ */
			if((!strcmp(action,train))||(!strcmp(action,test))){
				// next value must be the results for training/testing
				if((argnum+1)<argc){
					checkname=argv[argnum+1];
					if((ifp = fopen(checkname, "r"))==NULL){
						fprintf(stderr, "Could not open the check-file %s for reading\n", checkname);
					} else {
						// next value must be the data/net-name
						if(argnum+2<argc){
							netname=(char *)malloc((strlen(argv[argnum+2])+strlen(".net")+1)*sizeof(char));
							sprintf(netname, "%s.net", argv[argnum+2]);
							datname=(char *)malloc((strlen(argv[argnum+2])+strlen(".data")+1)*sizeof(char));
							sprintf(datname, "%s.data", argv[argnum+2]);
							if((ofp = fopen(datname, "w"))==NULL){
								fprintf(stderr, "Could not open the file %s for writing", datname);
							} else { // write train/test-file header: #number #input #output
								fprintf(ofp, "%d %d %d\n", 6999, A4_LEN, 1);
								fclose(ofp);
							}
						} else {
							fprintf(stderr, "Not enough arguments - net/data name missing\n");
						}
					}
				} else {
					fclose(ifp); // close check-file
					fprintf(stderr, "Not enough arguments - check-file missing\n");
				}
			}
			/* ******************************************************
			 * Here the wav-file is "chopped in pieces" and each   *
			 * piece transformed using a D4 wavelet decomposition  *
			 ****************************************************** */

			// Define the signal length
			siglen=4096; // 0.093 s at 44100 Hz - 0.043 s at 96000 Hz
			wavelet=(float *)malloc(sizeof(float[siglen]));

			printf("executing wavelet transform: ");
			i=0;
			while(((i+1)*siglen) <= bufsize){ // Go through the buffer until a whole siglen cannot be used
				sigstart=i*siglen;
				printf("\b\b\b\b\b%5d", i);
				// Copy part of the signal to the wavelet array
				for(j=0; j < siglen; j++){
					wavelet[j]=(float)buffer[sigstart+j];
				}
				// Perform the wavelet decomposition (implemented in wavelet.c) - the result of the decomposition is in the wavelet array
				wavelet_db4(wavelet, siglen);
				// Write output of each decomposition to files
				filename=(char *)malloc((strlen("_wl1234") + strlen(argv[argnum]) + 5) * sizeof(char));
				sprintf(filename, "%s_wl%d", argv[argnum], i);
				if((ftest=fopen(filename, "r"))==NULL){ // only write if the file does not exist...
					float_out(wavelet, siglen, filename);
				} else {
					fclose(ftest);
				}
				free(filename);
				// Perform training/test/run-actions for the specific transform
				if(!strcmp(action,run)){
					indata=(float *)malloc(sizeof(float[A4_LEN]));
					for(k=A4_ST; k<A4_END; k++){
						indata[k-A4_ST]=wavelet[k];
					}
					// Define names of the net to load and result file to write
					filename=(char *)malloc((strlen("_A#4.test")+strlen(argv[argnum])+1)*sizeof(char));
					sprintf(filename, "%s_%s%d.test", argv[argnum], "A", 4);
					netname=(char *)malloc((strlen("A#4.net")+1)*sizeof(char));
					sprintf(netname, "%s%d.net", "A", 4);

					ann_run(netname, indata, filename);
					free(filename);
					free(netname); //*/

				} else if((!strcmp(action,train))||(!strcmp(action,test))){
					// here we must generate a test/training file
					// A4/440Hz/pitch 69 --> wl-interval: 32-63 (32 values)
					k=fscanf(ifp, "%d", &j);
					if(!feof(ifp)){
						if((ofp = fopen(datname, "a"))==NULL){
							fprintf(stderr, "\nCould not open the file %s for writing\n", datname);
						} else { // write train/test-file lines: [inputs]\n[output]
							for(k=A4_ST; k<A4_END; k++){
								fprintf(ofp, "%f ", wavelet[k]);
							}
							fprintf(ofp, "\n%d\n", j);
							fclose(ofp);
						}
					}
				}

				i++;
			} // end wavelet-loop
			printf("\n\n");

			// Perform training/test/run-actions for the entire wavefile
			if((!strcmp(action,train))||(!strcmp(action,test))){
				fclose(ifp); // close check-file
				argnum=argnum+2; // in train and test, 2 extra args are needed for data/netname and checkfile
			}
			if(!strcmp(action,run)){

			} else if(!strcmp(action,train)){
				printf("inputs from %d to %d (%d total)\n", A4_ST, A4_END, A4_LEN);
				ann_train(datname, netname, A4_LEN);
				free(netname);
				free(datname);
			} else if(!strcmp(action,test)){
				ann_test(datname, netname);
				free(netname);
				free(datname);
			}

			free(buffer);
			free(wavelet);
		} // end else (to skip bad arguments/filenames)
	} // for-loop running through args
	exit(EXIT_SUCCESS);
}


23 fileio.h

This file merely holds the prototypes for the file input and output functions.

#ifndef FILEIO_H
#define FILEIO_H

// subfunction for returning an integer of size bytes from the file
int getfileints(FILE *ifp, int size);

void float_out(float *buffer, int bufsize, char *filename);
void int_out(int *buffer, int bufsize, char *filename);

#endif

24 fileio.c

This file contains 3 functions: one for reading an integer of variable length from a binary file and two for writing an array of either integers or floats to a text file.

#include <stdio.h>
#include <stdlib.h>

#include "fileio.h"

/* Returns an integer of "size" bytes from the file */

int getfileints(FILE *ifp, int size){
	int retval=0;

	if(fread(&retval, size, 1, ifp) != 1){
		if(feof(ifp)){
			printf("Premature end of file.");
		} else {
			printf("File read error.");
		}
		exit(EXIT_FAILURE);
	}
	return(retval);
}

/* The following functions are used for outputting data to files for reading in matlab or a regular text editor... */

void float_out(float *buffer, int bufsize, char *filename){
	// Write the values in the file [filename] for comparing with the original results in matlab
	int i;
	FILE *ofp;
	if((ofp = fopen(filename, "w"))==NULL){
		fprintf(stderr, "Could not open the file %s for writing\n", filename);
	} else {
		for(i=0; i<bufsize; i++){
			fprintf(ofp, "%f\n", buffer[i]);
		}
		fclose(ofp);
	}
}

void int_out(int *buffer, int bufsize, char *filename){
	// Write the values in the file [filename] for comparing with the original results in matlab
	int i;
	FILE *ofp;
	if((ofp = fopen(filename, "w"))==NULL){
		fprintf(stderr, "Could not open the file %s for writing\n", filename);
	} else {
		for(i=0; i<bufsize; i++){
			fprintf(ofp, "%d\n", buffer[i]);
		}
		fclose(ofp);
	}
}

25 waveread.h

The header for the waveread functions contains a typedef of a struct to hold relevant information from the header of the wave file, as well as prototypes for the functions.

/* Header file for waveread.c */

#ifndef WAVEREAD_H
#define WAVEREAD_H

typedef struct wh {
	char *filename;
	int filesize;
	int samplerate;
	int num_chan;
	int byterate;
	int blockalign;
	int bits;
	int datasize;
} wavheader;

// prototypes:

// "main" function - returns a struct containing relevant info from the header and quits the program if the file is not a valid wave-file.
wavheader wavread_head(FILE *ifp, char *filename);

// subfunction for checking string parts of the header - exits if they are not as expected.
void chkheadstr(FILE *ifp, char *filename, char *header);

#endif

26 waveread.c

The waveread file contains functions to read the header of a wave file and check that the file follows a supported format.

#include <stdio.h>
#include <stdlib.h>

#include "waveread.h"
#include "fileio.h"

/* The main function below is necessary to compile this file standalone */

/*
int main(int argc, char **argv){
	char *filename;
	int filesize;
	wavheader whd;
	FILE *ifp, *ofp;

	filename = argv[1];
	printf("%d: %s\n", argc-1, filename);
	if((ifp = fopen(filename, "rb"))==NULL){
		fprintf(stderr, "Could not open the file %s for reading", filename);
	}

	// Read the header:
	whd=wavread_head(ifp, filename);

	// Print relevant info:
	printf("Filesize: %d\n", whd.filesize);
	printf("Number of channels: %d\n", whd.num_chan);
	printf("Samplerate: %d\n", whd.samplerate);
	printf("Byterate: %d\n", whd.byterate);
	printf("Block alignment: %d\n", whd.blockalign);
	printf("Bits per sample: %d\n", whd.bits);
	printf("Data size: %d\n", whd.datasize);

	// Read the contents into an int buffer:
	bufsize = whd.datasize / whd.blockalign;
	printf("Number of samples: %d\n", bufsize);

	int buffer[bufsize];
	for(i=0; i<bufsize; i++){
		buffer[i]=getheadint(ifp, whd.blockalign) << (sizeof(unsigned long)*8-whd.bits);
	}

	fclose(ifp);
	exit(EXIT_SUCCESS);
}
*/

wavheader wavread_head(FILE *ifp, char *filename){

	char headchk[4];
	int samplerate, filesize, num_chan;
	wavheader wavhead;

	// Check for RIFF header
	chkheadstr(ifp, filename, "RIFF");

	// Get filesize (rest of file)
	wavhead.filesize = getfileints(ifp, 4);

	// Check for the WAVE and "fmt " headers
	chkheadstr(ifp, filename, "WAVE");
	chkheadstr(ifp, filename, "fmt ");

	// Check for whether the codec is PCM:
	if(getfileints(ifp, 4)!=16){
		printf("The file %s is not a valid Wave-file (Not PCM codec)\n", filename);
		fclose(ifp);
		exit(EXIT_FAILURE);
	}
	// Check whether the data is uncompressed
	if(getfileints(ifp, 2)!=1){
		printf("The file %s is compressed, and cannot be used\n", filename);
		fclose(ifp);
		exit(EXIT_FAILURE);
	}

	// get the number of channels
	wavhead.num_chan = getfileints(ifp, 2);

	// get samplerate
	wavhead.samplerate = getfileints(ifp, 4);

	// get byterate
	wavhead.byterate = getfileints(ifp, 4);

	// get Block alignment
	wavhead.blockalign = getfileints(ifp, 2);

	// get Bits per sample
	wavhead.bits = getfileints(ifp, 2);

	// Check SubChunk2 ID ("data")
	chkheadstr(ifp, filename, "data");

	// get Data Size
	wavhead.datasize = getfileints(ifp, 4);

	return(wavhead);
}

/* This function checks the current header for matching the correct string */

void chkheadstr(FILE *ifp, char *filename, char *header){
	char headchk[4];

	if(fread(headchk, 4, 1, ifp) != 1){
		if(feof(ifp)){
			printf("Premature end of file.");
		} else {
			printf("File read error.");
		}
		exit(EXIT_FAILURE);
	}
	if(memcmp(headchk, header, 4)){
		printf("The file %s is not a valid Wave-file (No \"%s\" header)\n", filename, header);
		fclose(ifp);
		exit(EXIT_FAILURE);
	}
}

27 wavelet.h

This file just contains the prototype for the wavelet decomposition.

#ifndef WAVELET_H
#define WAVELET_H

void wavelet_db4(float *wavelet, int siglen);

#endif

28 wavelet.c

The implemented wavelet decomposition, using the lifting scheme version of the D4 algorithm.
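For reference, the lifting steps implemented below (from the bearcave.com formulation cited in the source comments) can be written as follows, where s_i denotes the even/approximation samples placed in the first half of the array and d_i the odd/detail samples in the second half, with wrap-around indexing at the boundaries:

Update 1:  s_i \leftarrow s_i + \sqrt{3}\, d_i
Predict:   d_i \leftarrow d_i - \frac{\sqrt{3}}{4} s_i - \frac{\sqrt{3}-2}{4} s_{i-1}
Update 2:  s_i \leftarrow s_i - d_{i+1}
Normalize: s_i \leftarrow \frac{\sqrt{3}-1}{\sqrt{2}} s_i,   d_i \leftarrow \frac{\sqrt{3}+1}{\sqrt{2}} d_i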

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "waveread.h"
#include "wavelet.h"

// return the Daubechies D4 wavelet decomposition from {buffer[sigstart] to buffer[sigstart+siglen]} in wavelet
void wavelet_db4(float *wavelet, int siglen){
	int n, i, first, last, half;
	float tmp, sqrt3=sqrt(3), sqrt2=sqrt(2); // calculate the square roots only once instead of doing it in every iteration to save computation.

	// The D4 decomposition
	for(n=siglen; n > 1; n = n >> 1){ // forward transform - length is halved each iteration

		// Split step (even elements are placed in the first half and odd elements in the second half)
		first=1;
		last=n-1;
		while(first<last){
			for(i=first; i<last; i=i+2){
				tmp=wavelet[i];
				wavelet[i]=wavelet[i+1];
				wavelet[i+1]=tmp;
			}
			first++;
			last--;
		}

		// Forward transform step - coded from equations in section "A Lifting Scheme Version of the Daubechies D4 Transform" at http://www.bearcave.com/misl/misl_tech/wavelets/daubechies/index.html
		// The transform is performed in 4 steps:
		// 1) Update (add u1(odd) to even)
		// 2) Predict (subtract p(even) from odd)
		// 3) Update (add u2(odd) to even)
		// 4) Normalize

		half = n/2;

		// Update 1
		for(i = 0; i < half; i++){
			wavelet[i] = wavelet[i] + sqrt3 * wavelet[half+i];
		}

		// Predict
		wavelet[half] = wavelet[half] - (sqrt3/4.0)*wavelet[0] - (((sqrt3-2)/4.0)*wavelet[half-1]);
		for(i = 1; i < half; i++){
			wavelet[half+i] = wavelet[half+i] - (sqrt3/4.0)*wavelet[i] - (((sqrt3-2)/4.0)*wavelet[i-1]);
		}

		// Update 2
		for(i = 0; i < half-1; i++){
			wavelet[i] = wavelet[i] - wavelet[half+i+1];
		}
		wavelet[half-1] = wavelet[half-1] - wavelet[half];

		// Normalize
		for(i = 0; i < half; i++){
			wavelet[i] = ((sqrt3-1.0)/sqrt2) * wavelet[i];
			wavelet[i+half] = ((sqrt3+1.0)/sqrt2) * wavelet[i+half];
		}
	}
}

29 ann.h

The header file for the neural networks contains definitions of relevant parameters that can be changed at compile time, such as the sigmoid functions, training algorithms, network characteristics etc. The prototypes for the training, testing and run functions are also specified.

#ifndef ANN_H
#define ANN_H

/* ************************************
 * For training:                     *
 ************************************ */

// Comment in the training algorithm

//#define ANN_TRAIN_ALG FANN_TRAIN_INCREMENTAL
//#define ANN_TRAIN_ALG FANN_TRAIN_BATCH
#define ANN_TRAIN_ALG FANN_TRAIN_RPROP	// this is the default setting
//#define ANN_TRAIN_ALG FANN_TRAIN_QUICKPROP

// Comment in the wanted transfer function

//#define ANN_NEURON_TF FANN_LINEAR
//#define ANN_NEURON_TF FANN_LINEAR_PIECE
//#define ANN_NEURON_TF FANN_LINEAR_PIECE_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID
#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC_STEPWISE
//#define ANN_NEURON_TF FANN_SIGMOID_STEPWISE
//#define ANN_NEURON_TF FANN_THRESHOLD
//#define ANN_NEURON_TF FANN_THRESHOLD_SYMMETRIC
//#define ANN_NEURON_TF FANN_GAUSSIAN
//#define ANN_NEURON_TF FANN_GAUSSIAN_SYMMETRIC
//#define ANN_NEURON_TF FANN_ELLIOT
//#define ANN_NEURON_TF FANN_ELLIOT_SYMMETRIC

#define NUM_OUTPUT 1	// - output neurons
#define NUM_LAYERS 3	// - layers
#define NUM_HIDDEN 5	// - hidden neurons
#define DES_ERR 0.001	// desired error
#define MAX_EPOCHS 20000	// 500000 // maximum number of training steps
#define EPOCHS_BETWEEN_REPORTS 1000	// how many steps between reports? (display current error)

/* ********************************
 * Limits for the networks       *
 ******************************** */
#define A4_ST 16	//32
#define A4_END 128	//64
#define A4_LEN A4_END-A4_ST

/* ********************************
 * Prototypes                    *
 ******************************** */

int ann_train(char *in_filename, char *out_filename, const unsigned int num_input);
int ann_test(char *test_filename, char *net_filename);
float ann_run(char *net_filename, float *indata, char *out_filename);

#endif

67

30 ann_train.c

This function is called when a network is to be trained. It reads from the specified input file, trains with the selected number of inputs and writes the network to the output file. The files must be opened before calling the function.

#include <stdio.h>

#include "fann.h"
#include "ann.h"

int ann_train(char *in_filename, char *out_filename, const unsigned int num_input){

	const unsigned int num_output = NUM_OUTPUT;
	const unsigned int num_layers = NUM_LAYERS;
	const unsigned int num_neurons_hidden = NUM_HIDDEN;
	const float desired_error = (const float) DES_ERR;
	const unsigned int max_epochs = MAX_EPOCHS;
	const unsigned int epochs_between_reports = EPOCHS_BETWEEN_REPORTS;

	// Create FANN struct, defining the ANN
	struct fann *ann;

	// ANN is initialized from definitions in ann.h
	ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);

	// Define transfer functions for the neurons - ANN_NEURON_TF is defined in ann.h
	fann_set_activation_function_hidden(ann, ANN_NEURON_TF);
	fann_set_activation_function_output(ann, ANN_NEURON_TF);

	// Define training algorithm (not necessary - RPROP is default) - ANN_TRAIN_ALG is defined in ann.h
	fann_set_training_algorithm(ann, ANN_TRAIN_ALG);

	// train network
	printf("Training on %s:\n", in_filename);
	fann_train_on_file(ann, in_filename, max_epochs, epochs_between_reports, desired_error);

	// Save network
	printf("The network is saved in %s:\n\n", out_filename);
	fann_save(ann, out_filename);

	// Destroy network to free memory:
	fann_destroy(ann);

	return(0);
}

31 ann_test.c

This function does almost the same as the training, except that the network is not trained; instead, the mean square error of running the network on data of the same form as the training data is calculated.

#include <stdio.h>

#include "fann.h"
#include "ann.h"

int ann_test(char *test_filename, char *net_filename){

	int i;

	// Create the structs defining the ANN and the test data - these are treated one at a time
	struct fann *ann;
	struct fann_train_data *data;

	// Load network
	printf("Opening net: %s\n", net_filename);
	ann = fann_create_from_file(net_filename);
	if(ann == NULL)
	{
		fprintf(stderr, "Error: The net file %s cannot be opened for reading\n", net_filename);
		return(1);
	}

	// Test on file
	printf("Testing on file: %s\n", test_filename);

	data = fann_read_train_from_file(test_filename);
	if(data == NULL)
	{
		fprintf(stderr, "Error: the test file %s cannot be opened for reading\n", test_filename);
		return(1);
	}

	fann_test_data(ann, data);
	printf("Test result: MSE of ANN %s = %f\n", net_filename, fann_get_MSE(ann));

	// Destroy network to free memory:
	fann_destroy(ann);

	return(0);
}

32 ann_run.c

This function is used to produce an output from an already trained network. There appears to be something strange somewhere in the neural network part, as the output always contains a lot of floating point numbers of the exact same value.

#include <stdio.h>
#include <stdlib.h>

#include "fann.h"
#include "ann.h"

float ann_run(char *net_filename, float *indata, char *out_filename){

	float *calc_out;
	struct fann *ann;

	FILE *outfile;

	// printf("ann: run: indata[31]=%f\n", indata[31]);

	// Load network
	// printf("Opening net: %s\n", net_filename);
	ann = fann_create_from_file(net_filename);

	// Open file for writing results
	// printf("Open outfile %s for writing\n", out_filename);
	if((outfile = fopen(out_filename, "a"))==NULL)
	{
		fprintf(stderr, "Error: %s cannot be opened for writing\n", out_filename);
		// Destroy network to free memory:
		fann_destroy(ann);
		return(1);
	} else {

		calc_out=fann_run(ann, indata);
		fprintf(outfile, "%f\n", calc_out[0]);

		// printf("result: %f\n", calc_out[0]);
		fclose(outfile);
	}
	// Destroy network to free memory:
	fann_destroy(ann);

	return(calc_out[0]);
}


Part V

Matlab Source Code

33 pianocomp.m

% The script takes in a wave sequence of the 88 recorded test and training
% notes, cuts them into chunks of 30,000 samples and puts them in a matrix.
% "backup" is the raw wave file holding the training or test data, respectively.
% Before the script is run, notestring is copied into backup.

detectednotes = 88;  % can be changed if, for example, not all 88 notes are 'hit', e.g. due to noise
notelength = 30000;  % the 30,000 samples to be extracted from each individual note
maxarray = zeros(1,detectednotes); % array holding the indices of the largest peaks

notestring = [zeros(1,50000) backup]; % zero-padding

newstring = zeros(1,detectednotes*notelength); % will hold one string with all the trimmed notes

for i = 1:detectednotes
    [m p] = max(notestring); % returns the index of the largest (positive) amplitude
    maxarray(i) = p;         % the index is saved
    notestring(p-50000:p+50000) = zeros(1,100001); % the peak is overwritten with zeros so it cannot give further hits
end

maxarray = sort(maxarray); % the indices are sorted

% all the notes are put into one long string
for i = 1:detectednotes
    newstring(((i-1)*notelength)+1:i*notelength) = backup(maxarray(i)-54999:maxarray(i)-25000);
end

testmatrix = zeros(88,notelength); % the 'test' matrix is swapped for the 'training' matrix, and the script is run again

for i = 1:88
    testmatrix(i,:) = newstring(((i-1)*notelength)+1:i*notelength); % the notes are put into a matrix
end
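A minimal usage sketch (not part of the original project; the file name is hypothetical): the script expects the raw recording in the row vector backup before it is run.

% Hypothetical usage: read the raw recording of all 88 notes into "backup"
% and run the script; wavread returns a column vector, hence the transpose.
backup = wavread('trainingnotes.wav')';
pianocomp;  % leaves the 88 cut notes in the 88 x 30000 matrix testmatrix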

34 pianomix.m

function [wavefile, detectionvector] = pianomix(pitch, notes, no_of_simultaneous, notematrix)
% Takes "pitch", "notes" = the number of sequences to generate, "no_of..." =
% the maximum number of simultaneous notes, and the matrix of notes.
% Returns a wave file and a vector telling in which sequences the desired
% note is present.

pitch = pitch - 20; % conditioned to fit the matrix

if ((pitch < 1) || pitch > 88)
    disp('Pitch out of range!');
    return
end

if ((notes < 1) || notes > 50001)
    disp('Notes out of range!');
    return
end

wavefile = zeros(1,4096*notes);

detectionvector = zeros(1,notes); % 0 if the note is absent, 1 otherwise

for i = 1:notes

    pitchvector = zeros(1,10); % every non-zero value represents one component

    if rand < 0.5 % the desired note is included
        detectionvector(i) = 1;
        pitchvector(1) = pitch;

        extranotes = floor(no_of_simultaneous*rand); % random integer between 0 and 9, both inclusive

        if extranotes
            j = 1;
            while j <= extranotes
                pitchvector(j+1) = ceil(88*rand); % random integer between 1 and 88, both inclusive
                if (length(unique(nonzeros(pitchvector(1:j+1))))==j+1)
                    j = j+1; % otherwise find a new random note, since the one found is already included
                end
            end
        end

    else % the desired note is not included
        detectionvector(i) = 0;

        extranotes = floor((no_of_simultaneous+1)*rand); % random integer between 0 and 10, both inclusive

        if extranotes
            j = 1;
            while j <= extranotes
                pitchvector(j) = ceil(88*rand); % random integer between 1 and 88, both inclusive
                if ((length(unique(nonzeros(pitchvector(1:j))))==j) && pitchvector(j)~=pitch)
                    j = j+1; % otherwise find a new random note, since the one found
                             % is already included or is the desired note
                end
            end
        end

    end

    % the 4096 samples are taken from different places within the 30,000 samples
    for k = 1:length(nonzeros(pitchvector))
        time = round(15000*rand);
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096) + notematrix(pitchvector(k), time+10001:time+14096); % mix composition
    end

    % normalization
    if (max(wavefile(((i-1)*4096)+1:i*4096)) > 1)
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096)/(max(abs(wavefile(((i-1)*4096)+1:i*4096))+0.01)); % normalization
    end
end
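A minimal usage sketch under the same assumptions as in NNgen.m (the note matrix produced by pianocomp.m has been saved to trainingmatrix.mat):

% Hypothetical usage: generate 1000 sequences of 4096 samples for MIDI
% pitch 60 (middle C) with at most 10 simultaneous notes.
load trainingmatrix;
[wavefile, detectionvector] = pianomix(60, 1000, 10, trainingmatrix);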

35 featureextraction.m

function [output] = featureextraction(signal, pitch)

% The signal must be 4096 samples
% (the pitch argument is not used by the function)

%------------------------------ wavelet.m ---------------------------------

% Based on matlab code from http://www.control.auc.dk/~alc/Fnct-31.m
% which is associated with the book
% "Ripples in Mathematics - The Discrete Wavelet Transform"
% Arne Jensen, Anders la Cour-Harbo, Springer-Verlag 2001.
% ISBN 3-540-41662-5.
% See also http://www.control.auc.dk/~alc/ripples.html

S = signal(1:4096);
wl = [];

N = length(S);

while N > 1
    s1 = S(1:2:N-1) + sqrt(3)*S(2:2:N);                                 % update 1
    d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*[s1(N/2) s1(1:N/2-1)]; % predict
    s2 = s1 - [d1(2:N/2) d1(1)];                                        % update 2
    s = (sqrt(3)-1)/sqrt(2)*s2;                                         % normalize
    d = (sqrt(3)+1)/sqrt(2)*d1;                                         % normalize

    wl = [d wl]; % save WL transform in vector (like the c-code)
    N = N/2;     % prepare for next step
    S = s;
end
wl = [s wl];

%---------------------------------------------------------------------------

output = [wl(17:32) wl(33:64) wl(65:96) wl(129:160)];
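Each pass of the while loop applies the lifting steps of the Daubechies D4 transform from Ripples in Mathematics (update, predict with circular wrap-around at the block ends, update, normalize), halves N, and recurses on the low-pass half; wl then holds the coefficients ordered from coarsest to finest, and the last line keeps 112 of them (16 + 32 + 32 + 32) as the feature vector. A small sanity check, not part of the original code:

% Hypothetical sanity check: decompose one 4096-sample frame and confirm
% the feature vector has the 112 coefficients the networks expect.
frame = randn(1,4096);             % stand-in for a real piano frame
f = featureextraction(frame, 60);  % the second argument is unused
disp(length(f))                    % prints 112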

36 NNgen.m

% The script creates, trains and generates test data for the networks

pitch = [21 108]; % the interval of pitches to be trained and tested

Antal_testtoner = 1000;
max_simultane_toner = 10;

Result = zeros(108,Antal_testtoner);    % the actual result (rows 21..108 are used)
Netoutput = zeros(108,Antal_testtoner); % the network response (rows 21..108 are used)
waveletarray = zeros(Antal_testtoner,112);

for m = pitch(1):pitch(2)

    load trainingmatrix;
    % the wave file for training is made, along with the vector of 0s and 1s
    [wavefile, detectionvector] = pianomix(m, Antal_testtoner, max_simultane_toner, trainingmatrix);
    % wavelet decomposition
    for i = 1:Antal_testtoner
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end

    clear wavefile trainingmatrix

    net = newff(waveletarray', detectionvector, [20 30 30]); % creates a new
    % feed-forward neural network with 112 input neurons, 3 hidden
    % layers with 20, 30 and 30 neurons respectively, and 1 output neuron

    net = init(net); % ensures default initialization of weights and biases before training

    net.trainParam.epochs = 5; % the network is exposed to the data set 5 times

    net = train(net, waveletarray', detectionvector); % the network is trained

    % the network is saved
    save(strcat('NN', num2str(m)));

    %-------------------------------------- test --------------------------------------

    load testmatrix;

    % the wave file for testing is generated and the detection vector is written
    % into row m of the matrix, matching the indexing used in resultpresentation.m
    [wavefile, Result(m,:)] = pianomix(m, Antal_testtoner, max_simultane_toner, testmatrix);

    for i = 1:Antal_testtoner
        % wavelet decomposition
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end

    clear wavefile testmatrix

    % the network is simulated with the test data, giving the network response
    Netoutput(m,:) = sim(net, waveletarray');
    clear net

    m % print progress
end
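Because save is called without variable names, each NN<m>.mat file stores the whole workspace, including the trained net. A hedged sketch (not part of the original script) of how a saved network could be reloaded and reused:

% Hypothetical usage: restore the workspace saved for pitch 60 and run the
% trained network on a matrix of feature vectors (one 112-element row each).
load('NN60');
response = sim(net, waveletarray');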

37 resultpresentation.m

errortype = zeros(7,108); % 1: correct hits, 2: correct misses, 3: false hits,
                          % 4: false misses, 5: LSE, 6: mean error, 7: sign examination

for i = 21:108

    true_hits = 0;
    true_misses = 0;
    false_hits = 0;
    false_misses = 0;

    for j = 1:1000

        if (round(Netoutput(i,j))==1 && Result(i,j)==1)
            true_hits = true_hits + 1;
        end
        if (round(Netoutput(i,j))==0 && Result(i,j)==0)
            true_misses = true_misses + 1;
        end
        if (round(Netoutput(i,j))==1 && Result(i,j)==0)
            false_hits = false_hits + 1;
        end
        if (round(Netoutput(i,j))==0 && Result(i,j)==1)
            false_misses = false_misses + 1;
        end

    end

    errortype(1,i) = true_hits;
    errortype(2,i) = true_misses;
    errortype(3,i) = false_hits;
    errortype(4,i) = false_misses;
    errortype(5,i) = sum(power(Netoutput(i,:)-Result(i,:),2))/1000;
    errortype(6,i) = sum(abs(Netoutput(i,:)-Result(i,:)))/1000;
    errortype(7,i) = sum(Result(i,:)-Netoutput(i,:))/1000;

end

% x = [21:108];
%
% plotformatering;
% plot(x, errortype(3,21:108), 'r', x, errortype(4,21:108), 'b');
% set(gca, 'XLim', [21 108]);
%
% plotformatering;
% plot(x, errortype(5,21:108), 'r', x, errortype(6,21:108), 'b');
% set(gca, 'XLim', [21 108]);
%
% plotformatering;
% plot(x, errortype(7,21:108), 'r');
% set(gca, 'XLim', [21 108]);
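As a hedged follow-up (not in the original script), the counts gathered above can be condensed into one overall detection accuracy over the 88 networks with 1000 test sequences each:

% Hypothetical summary: fraction of the 88*1000 sequences classified
% correctly (true hits plus correct misses).
accuracy = sum(errortype(1,21:108) + errortype(2,21:108))/(88*1000);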
