Piano Transcription using Wavelet Decomposition and Neural Networks
Esben Madsen, Johnni Thomsen Pedersen and Louise Baldus Vestergaard
Group 742, Supervisor: Søren Krarup Olesen
Abstract
This paper examines the possibility of transcribing the notes played on a piano using simple feature extraction and neural networks. Earlier transcription systems using neural networks have used considerably more complex algorithms to handle polyphonic music.
We have implemented the Daubechies D4 wavelet decomposition in ANSI C for feature extraction and suggested a feed-forward neural network structure for note detection.
88 networks were constructed, one for each note on a piano. Training of the 88 networks has been done with a reduced set of training data; each note of one specific piano has been used. The results achieved were inconclusive; the overall performance is not satisfactory, perhaps due to too little training data or a non-optimal wavelet decomposition. From the results we find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network, but the amount of training data must be greater and other wavelet decompositions should be examined.
Keywords: Piano transcription, wavelet decomposition, neural networks.
1 Introduction
Various solutions have been proposed for creating an automatic transcription system for music composers and musicians in general [1]. As there are many different genres of music and instruments, most of the previous transcription systems have focused on either classification of instruments [2], detection of notes [3] or rhythm played [4], or a combination of these [5]. As a full transcription system is out of scope for this research, the aim has been limited to transcription of one specific piano, with focus on feature extraction.
1.1 Note Characteristics
Piano notes consist of a fundamental frequency, F0, and overtones. The overtone partial frequencies are slightly inharmonic [1, p. 203]. The piano is a pitched instrument [1, p. 167], and feature extraction of the piano notes should be solved by some kind of pitch detection. Since piano sounds consist of several frequency components, pitch detection should either aim to find the fundamental frequency played, or to extract the significant features of the note. The notes of a piano span a little more than seven octaves, hence the frequency contents of the separate tones and overtones might overlap when notes are played simultaneously. For a transcription to be useful for musicians, it should be represented on a note sheet, hence the time at which a note is played is important too.
1.2 Time vs Frequency Analysis
Music can be interpreted in the time and/or frequency domain. In the time domain, music is recorded and played. Detection of the notes present could be achieved with cross-correlation methods. In the frequency domain it can be represented and understood by e.g. the Fourier Transform (FT) [1, p. 21]. The FT reveals the frequency contents of a signal, but it is only well defined for infinite-length, stationary, continuous sine waves. This is not fully useful for music transcription, since the signal processed is neither infinite nor stationary [1]. Instead a time-frequency representation should be chosen.
2 System Description
The constructed transcription system can basically be characterized as a pattern recognition system, and thus the neural network used in this system is one of several options.
The transcription system converts a piece of sampled piano music to a representation of notes on a time scale. The system was divided into the blocks shown in figure 1.
FIGURE 1: Block diagram of the system
The two most important blocks were the feature extraction and the note recognition, and they received the most attention in this paper.
The feature extraction block had the purpose of compressing the massive amount of information in the sound data into as few meaningful features as possible, which could then be analyzed further. This block was implemented as a wavelet decomposition, described in section 4.
The note recognition block was used for analyzing the features extracted from the signal. This was implemented with neural networks, and the implementation is described in section 5.
The decision/discrimination block was meant to give a reliable binary result based on the note recognition. This was done simply with a threshold on the output of the neural networks.
Presentation of the output could be done in various ways. For a fully working system, presentation on a note sheet and/or as a MIDI file would be desired, but for this research, the presentation of data was done simply by viewing the output from the decision/discrimination block.
3 Method overview
In the following the term "pitch" is used. The definition of pitch in this paper is the same as used in the MIDI protocol. Most musicians would probably call the 88 keys on a piano A0 to C8, with A4 having a fundamental frequency of 440 Hz. Thus A0 would have a fundamental frequency of 27.5 Hz. Since music is perceived dyadically, there is uneven spacing between the fundamental frequencies of the notes. It is often practical to have an expression for these with even spacing. The MIDI definition is:
Pitch = 69 + 12 · log2(f / 440)    (1)
A4 now corresponds to pitch 69. The lowest note on the piano is pitch 21, the highest is pitch 108.
As preprocessing, the music is split into blocks of 4096 samples, corresponding to a little under 1/10 of a second, since the sampling frequency is 44.1 kHz. The processing of a sound block is illustrated in figure 2, following these steps:
1. A block of 4096 samples of the wave file is picked out.
2. The block is processed using the implemented wavelet transform, which is described in section 4.
3. 112 predefined coefficients of the wavelet decomposition are extracted.
4. The coefficients are fed to each of the 88 neural networks, each network charged with the detection of a specific note.
5. Each neural network then presents a value on its output. The value indicates how likely it is that the note, which this specific network was trained to recognize, was present in the block.
FIGURE 2: Data flow through the system
The array of neural networks will enable detection of polyphonic piano music.
4 Feature Extraction
For the feature extraction, it was decided to test whether a wavelet decomposition would suffice, compared to the more complex methods previously used, like for instance the networks of adaptive oscillators used by Marolt in [5].
The wavelet transform has some interesting features compared to other methods of extracting frequency content, e.g. the Fourier transform.
• In its commonly used form, the scale of the transform is dyadic, that is, each frequency band contains one octave.
• The algorithms are very efficient: the complexity of a Discrete Wavelet Transform (DWT) for some algorithms is O(N). For other algorithms the complexity can in the worst case be up to O(N · log(N)), like the Fast Fourier Transform (FFT) [6, p. 40].
• A wavelet expansion (inverse transform) can give a better description and separation of local changes than a Fourier transform [6, p. 7], which by definition can only represent a signal as a combination of sines.
• A wavelet can be designed for a specific type of signal, and is thus able to represent discontinuities or sharp corners with only a few coefficients, where the Fourier Transform would require a lot of coefficients.
The mother and father¹ wavelet pair chosen for implementation was the Daubechies D4 wavelet, which is used in a wide range of applications and often used for examples in literature on wavelets, due to it being both very simple and very efficient.
The DWT was implemented using the lifting scheme² [7], which improves the complexity by roughly 50% compared to the standard filter bank implementation [7, p. 264].
To balance frequency and time resolution, an input of 4096 samples (with sample rate 44.1 kHz) is selected, giving an output of the same size, and the DWT is performed recursively, giving 12 subbands of length 2^0 to 2^11, each band containing one octave (not necessarily equal to the piano octaves).
¹The mother wavelet is the wavelet function, and the father wavelet is the scaling function.
²The lifting scheme is an efficient way of implementing a wavelet decomposition.
Frequency range (Hz)   Pitch (MIDI)   Number of samples   Used samples
21-43                  21-28          4
43-86                  29-40          8
86-172                 41-52          16                  16
172-345                53-64          32                  32
345-689                65-76          64                  32
689-1378               77-88          128                 32
1.38-2.76 k            89-100         256
2.76-5.51 k            101-108        512
5.5-11.0 k                            1024
11-22 k                               2048
Total                                 4096                112

TABLE 1: Contents of the wavelet decomposition and the used samples
Table 1 illustrates how parts of the wavelet decomposition are selected for further analysis. As the upper two bands (5.5-11 and 11-22 kHz) do not contain much relevant information, disregarding onset detection [1, p. 108] (the highest fundamental frequency is just above 4 kHz), these are discarded, hereby already reducing the data to 1/4. Furthermore, by looking at the transforms of recorded samples, the scope is limited to only including the four octave bands from 86 Hz to 1378 Hz, as these contain the most information/amplitude response overall. Finally, only the first 32 samples from each of the two bands with 64 and 128 samples are used, giving a total of 112 samples to use as input to the neural networks.
There are of course other types of wavelets and scaling functions than the one used, including the Haar wavelet, which is the most simple type, as well as more complex constructions, like the Cohen-Daubechies-Feauveau wavelet; however, a review of multiple types is out of the scope of this article.
5 Note Recognition
To detect which notes were present in the piano music, some note recognition was needed. As previously mentioned, neural networks were chosen for this task, since they have been used by others with good results [1] [5].
A neural network consists of at least an input and an output layer, and possibly some hidden layers, each containing a number of neurons. The neural network takes as input a feature set, which has to be generated from the input signal, and via a weighting vector an output result is achieved.
Neural networks are trained prior to recognition. This training can be supervised, telling the network what the desired output is, given a specific input. It is also possible to train a network unsupervised. Typically, the training algorithm will then try to classify the input into two or more preprogrammed classes.
Neural networks have a huge advantage when it comes to runtime complexity. When trained sufficiently, they are fast to execute and don't take up much memory. Due to their nature, a large amount of training data is needed for supervised learning in order to make the results sufficiently reliable. If the training data doesn't contain enough "general" information about the class to be detected, the network can respond very well to training data and not well at all to unknown data. This fact, and the time and computational resources required during training, is the foremost drawback. It is also very difficult to construct a "cookbook" neural network; a lot of iterations are needed.
The major strength of neural networks is their efficiency during run-time. Once properly trained, the execution takes up a very small amount of memory. In the case of the feed-forward network, there is no need to save any calculations, except the results needed for the next neuron.
A neuron consists of weighting coefficients for each input; the weighted inputs are summed, possibly biased, and an activation function handles the summed output [8, p. 11].
The activation function can theoretically be any kind of function, but in actual implementations three types are predominant: the threshold function, the piecewise-linear function and the sigmoid function.
According to [8, p. 14], the sigmoid is the most commonly used. As no exact structures of neural networks are mentioned in the method descriptions of either [5] or [9], the sigmoid activation function was chosen.
The network was chosen to have:
• supervised learning, since training and test data are available
• multilayer structure, due to the complexity of the desired system
• feed-forward, fully connected structure
It was chosen to implement a network with the following neuron and layer structure:
• 112 neurons in the input layer, one for each of the extracted sample values
• Three hidden layers with 20, 30 and 30 neurons respectively
• An output layer with one neuron
This structure of the network is outlined at the bottom of figure 2.
5.1 System Development
5.1.1 Data
Extraction of commercially prerecorded piano sequences has been beyond the scope of this article. Instead it has been attempted to use a rather reduced sample space. For the test and training notes respectively, a separate sequence of all 88 piano notes was recorded to a wave file with a resolution of 16 bit and a sample rate of 44.1 kHz. Each note was played, keeping the key depressed somewhere between half a second and one second. Both sequences were recorded using the same piano. As the conditioning of both training and test data was identical, they will from here on be referred to simply as "the data". An examination of the recorded wave files showed a considerable increase in signal energy at the onset of each note. This is due to the percussive nature of the hammer hitting the strings [1, p. 108]. Each note was extracted into its own wave file, containing 30,000 samples, starting 2000 samples before the maximum energy level was reached for that note.
Separate data were constructed for each specific neural network, as displayed in figure 4. This way each network could be trained with its target note present in half of the data. The data contained 1000 sequences consisting of 4096 samples of mixed piano notes, totalling a little more than 4 million 16-bit samples. It was made sure that roughly half of the sequences contained the note that the specific network was to detect. Each sequence consisted of 0 to 3 simultaneously played piano notes with a uniform distribution of both the number of notes played and which notes were played, not considering the target note. To take the hammer stroke into account and get more varied data from the sample set, the 4096 samples to be extracted from each single note were chosen at random. Each note could be taken arbitrarily from the first sample up to sample 25,000. This was done to minimize the risk of overfitting the networks. Each network was trained with 5 epochs (repetitions) of training data. Figure 3 shows an example of data for the network to detect pitch 69.
FIGURE 3: Example of 5 consecutive generated sequences
FIGURE 4: A flowchart of note extraction and training of the neural networks
6 Results
MatLab has been used for generation and training of the neural networks.
There is a huge difference in how well a given network performs. Figure 5 shows the type I and type II errors for all networks, where a type I error is a false positive and a type II error is a false negative (a miss). These errors are taken from a dataset of 1000 sequences. Until around pitch 60, type II errors are by far predominant, meaning that extremely few hits are detected. Around pitch 80 the amounts of type I and type II errors roughly even out, but without much consistency from pitch to pitch. All in all, 79.4% of all errors are type II errors.
FIGURE 5: Type I (solid red line) and type II (blue stippled line) errors for each network
Disregarding whether a given net response is considered a hit or a miss, figure 6 shows both the least square error and the mean absolute error.
FIGURE 6: Least square error (solid red line) and the mean absolute error (stippled blue line)
As the end result we achieved correct detection of the target note in 33.1% of the actual note occurrences and correct absence of the target note in 82.9% of the cases. An overview is displayed in table 2.
           Output 1   Output 0
Input 1    33.1%      66.9%
Input 0    17.7%      82.9%

TABLE 2: Overview of results – type I and type II errors as well as correctly detected notes.
It should be noted that pitch 71 performs exceptionally well, so it was decided to investigate further. A new training and test sequence was run, this time with up to 10 different notes present and 10,000 sequences. The note was correctly "detected" in 72.1% of the cases when present and correctly "not detected" in 97.7% when missing.
7 Discussion
Overall the results are ambiguous. The root cause is considered to be the limited training data. In the case of pitch 71, it is well beyond reasonable doubt that the results are not haphazard. The extremely fine results for pitch 71 could be caused by a large correlation between training and test data. But since it still performs well, even when using up to 10 simultaneous notes, we speculate that the main cause is the choice of feature set used. This indicates that the network response relies heavily on which parts of the wavelet decomposition are used. We find it acceptable to conclude that it is possible to use wavelet decomposition as a feature extraction tool for a neural network. Further research should focus on evaluation of the best fitting mother wavelet as well as selection of coefficients from the wavelet decomposition.
The notable rise in variance for type I and II errors in figure 5 can be explained by the choice of feature set. As seen in table 1, frequencies above 1.4 kHz are not represented. According to formula 1, this roughly corresponds to pitch 87. That means that the fundamentals of pitches 88-108 are not represented in the feature set, and that pitches 75-87 are only represented by their fundamental frequencies.
Our decision algorithm is rather crude; if a network outputs more than 0.5, we consider it a hit. A more plausible method would be to employ a statistical framework, based both on accumulated a priori knowledge of the frequency with which each note is played, and on which intervals seem reasonable; an interval of a minor second occurs with far lower probability than, for example, an octave. Since the outputs from each network are not binary, they can easily be weighed to accommodate a different statistical probability set depending on music style.
8 Conclusion
A simplified framework for polyphonic piano note recognition has been made. The goal has been to determine whether a wavelet decomposition could be used as feature extraction for a neural network, and this has been achieved only to some extent. The overall results do not fully confirm the usability of wavelets for decomposition, but a certain pitch performs consistently and convincingly. Our test results have not shown whether the decomposition provides an insufficient feature set for the network, the network suffers from a non-optimal design, or the reduced sample set used is to blame. Further studies in the field should examine this issue.
A D4 wavelet decomposition lifting scheme has successfully been implemented in ANSI C. It has been concluded that the wavelet decomposition is very efficient regarding execution speed compared to the FFT. The actual implementation of D4 is based on Daubechies' own publications and is more efficient than the default decomposition method, which uses filter banks.
9 Acknowledgements
We would like to thank Uwe Hartmann for an introduction to neural networks.
References
[1] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6.
[2] Perfecto Herrera-Boyer, Geoffroy Peeters, and Shlomo Dubnov. Automatic classification of musical instrument sounds. Journal of New Music Research, 2003.
[3] Anssi Klapuri. Automatic transcription of music. Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, August 6-9, 2003.
[4] Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 1st edition, 2006. ISBN 0-387-30667-6. Chapter 4: Beat Tracking and Musical Metre Analysis by Stephen Hainsworth.
[5] Matija Marolt. Transcription of polyphonic piano music with neural networks. Proceedings of Workshop on Current Research Directions in Computer Music, Barcelona, Spain, November 15-17, 2001.
[6] C. Sidney Burrus, Ramesh A. Gopinath, and Haitao Guo. Introduction to Wavelets and Wavelet Transforms. Prentice Hall, 1st edition, 1998. ISBN 0-13-489600-9.
[7] Ingrid Daubechies and Wim Sweldens. Fac-toring wavelet transforms into lifting steps.Journal of Fourier Analysis and Applications,1998. http://www.springerlink.com/content/r0n381423k7v8655/.
[8] Simon Haykin. Adaptive Filter Theory. Information and System Sciences. Prentice Hall, 4th edition, 2002. ISBN 0130901261.
[9] Juan Pablo Bello, Giuliano Monti, and Mark Sandler. Techniques for automatic music transcription. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR-00), Plymouth, Massachusetts, USA, October 2000.
Contents

1 Preface
2 General Guidelines
3 List of Abbreviations

I Analysis
4 External Restraints
5 Initial Specification of Requirements
  5.1 Platform
  5.2 Instrument
  5.3 Harmony
  5.4 Detection Speed
  5.5 Success Rate
6 Detection of Note Onset
  6.1 Purpose
  6.2 Methods
7 Methods for Detection of Monophonic and Polyphonic Signals
  7.1 Purpose
  7.2 Methods
    7.2.1 Off-line
    7.2.2 On-line
8 Pitch Detection
  8.1 Approaching the Problem from a Physical Angle
  8.2 Previous Pitch Detection Studies
9 Wavelets and Assessment of Efficiency
  9.1 Purpose
  9.2 Analysis
  9.3 Conclusion
10 Neural Network
  10.1 Method Overview
  10.2 Suitability
11 Data Preprocessing for a Neural Network Proposed by Others
12 Deciding Method for Further Analysis
  12.1 Preliminary Analysis
    12.1.1 Statistical Methods
    12.1.2 Methods based on Auditory Models
    12.1.3 Neural Networks
    12.1.4 Wavelets
  12.2 Decision
13 General Construction
  13.1 Purpose
  13.2 Analysis
  13.3 Sampled Piano Music
  13.4 Feature Extraction
  13.5 NN
  13.6 Decision/Discrimination
  13.7 Presentation
14 General Thoughts
  14.1 Optimizing the Networks
  14.2 Optimizing the Wavelet Decomposition
  14.3 Optimization of the Decision Algorithm

II Design
15 Architectural Considerations for the Neural Network
  15.1 Adjustable Network Elements
  15.2 Classes of Networks
  15.3 Layers
  15.4 Types of Neurons
16 Overall Architecture
  16.1 Wavelet Decomposition
  16.2 Neural Network Structure
    16.2.1 Feature Set versus Pitch
    16.2.2 Training and Test

III Implementation
17 Software design
  17.1 Userguide
  17.2 Implementation
18 Real time considerations
19 FANN – Fast Artificial Neural Network library
20 Port Audio

IV C Source Code
21 Makefile
22 main.c
23 fileio.h
24 fileio.c
25 waveread.h
26 waveread.c
27 wavelet.h
28 wavelet.c
29 ann.h
30 ann_train.c
31 ann_test.c
32 ann_run.c

V Matlab Source Code
33 pianocomp.m
34 pianomix.m
35 featureextraction.m
36 NNgen.m
37 resultpresentation.m
1 Preface
Transcription of musical scores has through time been a tedious manual task to be taken on only by highly trained musicians. As computer technology in the 1980s matured to produce the "personal computer", sporadic research in automated music transcription suddenly became more focused; a relatively cost-effective platform was now available.
Today there is still room for both improvements and development of the methods used, as a universal method of transcription has not been discovered. Not only the instrument(s) to be transcribed, but also the style of music has a profound impact on the efficiency of a given method.
The (grand) piano is by most considered the "reference instrument", probably due to conventions inherited from classical music. It is assumed that transcription of music played on the piano will have the broadest interest to potential customers. Based on this assumption, this documentation will analyze potentially efficient methods for transcription of piano-generated music and describe the implementation of one such method. This actual implementation will be called Musician's Transcription Tool (MTT).
2 General Guidelines
This documentation is to be viewed as a collection of work sheets. The aim will be to arrange these in a plausible manner, but this will not necessarily always be the case. It is suggested to use the table of contents to look up relevant information regarding a subject.
Quotations and references to other works will be put in the footnote on a given page. Also, a complete compilation of the used literature will be included at the end of this document.
3 List of Abbreviations
ANN     Artificial Neural Network; commonly called Neural Network (NN)
D4      Daubechies 4-tap wavelet
DWT     Discrete Wavelet Transform
Dyadic  Related by a factor 2 (like octaves)
MIDI    Musical Instrument Digital Interface
MTT     Musician's Transcription Tool
NN      Neural Network; sometimes referred to as Artificial Neural Network (ANN)
Part I
Analysis
4 External Restraints

Since this is a university project for the 7th semester, there is a certain framework, primarily set by the Study Guidelines, concerning objectives and documentation.¹ The purpose of this semester project is the design, implementation and analysis of a solution to a practically occurring problem, which naturally requires stochastic signal processing methods and/or transmission of signals.
The project period runs from September 1st to December 19th 2008. The project is documented in three ways:
• A scientific article
• A poster with a presentation at SEMCON 08.
• Edited worksheets, which document the details of the project
From the study guidelines, further information states the goals for this project unit¹:
“The project unit takes its starting point in a practical problem, which reflects the students’ chosen specialization, and where signal processing methods and/or transmission of signals is a natural aspect.
• Through a stepwise refinement process of the given application, a set of specifications is generated. There is no requirement for a real-time implementation (HW and/or SW), thus the specification can relate to the behavioural level only. However, a real-time implementation is allowed in the projects, and the specification therefore has to be extended at all relevant points, in case such an implementation is included.
¹esn.aau.dk/fileadmin/esn/Studieordning/Cand_SO_ed_aalborg_sep08.pdf p. 14
• Algorithms for the complete functionality (or parts hereof) are designed, and are next applied for 1) a functional simulation, and possibly 2) a real-time implementation.
• In terms of the design phase, an analysis of the algorithmic computational and numerical properties is conducted.
• The implementation is next compared to the specification, and a comparison and evaluation is performed.
”
5 Initial Specification of Requirements
The following is a description of the specific demands for the required base functionality of the MTT. This specification has not been used for the actual implementation, but serves to document the process.
5.1 Platform
The MTT must be able to run on a PC or laptop with minimum specs:
• 1.8 GHz P4 processor or equivalent
• 1 GB ram
• Soundcard capable of recording and playback in 16 bits at a 44.1 kHz sample rate.
• Either Windows XP or Linux installed
5.2 Instrument
The MTT is to be able to detect and transcribe notes played on both upright and grand pianos.
5.3 Harmony
The MTT is to be able to detect and transcribe up to 10 simultaneous notes.
5.4 Detection Speed
The MTT is to be able to detect and transcribe notes played with a time resolution of 50 ms.
5.5 Success Rate
The MTT is to achieve at least 80% correct detection over a broad range of musical styles.
6 Detection of Note Onset
6.1 Purpose
It is assumed that to best detect the pitch of a note, a proper placement of said note in time is needed. The beginning of a note is called the note "onset". This document will propose different methods for detecting that onset.
6.2 Methods
Efficient onset detection methods vary considerably with the instrument in question. If an instrument has a large transient at onset (e.g. percussion instruments, piano and guitar), it is suggested to view the music from a power perspective². The suggested algorithm is:

E_j(n) = \sum_{k \in k_j} |\mathrm{STFT}_x^W(n, k)|^2    (1)

where:
STFT_x^W is the short-time Fourier transform of x(n)
k is the discrete frequency index
W is the window used to weigh x(n)
n is the time at which the window is centered
To further optimize equation 1, a three-point linear regression is proposed². It is of interest to find the gradient of E_j(n) in order to detect the start of the transient. For the specific three-point linear regression, the following equation defines this gradient:

D_j(n) = \frac{E_j(n+1) - E_j(n-1)}{3}    (2)

where:
E_j(n) is the energy envelope function
D_j(n) is the gradient of E_j(n)
Although this method allows power measurement in distinct frequency bands, it seems rather complex to calculate. It would be a good idea to compare this method to a simpler one, and it would be interesting to determine whether the onset information can be obtained using the broad-spectrum signal. This could be done as in equation 3:

E(j, n) = \sum_{k = n - j/2}^{n + j/2} x(k)^2    (3)

²Klapuri & Davy 2006, p. 107-109
In both cases a decision algorithm is needed to discriminate onset periods from the rest of the signal.
7 Methods for Detection of Monophonic and
Polyphonic Signals
7.1 Purpose
To detect the notes in music, it is necessary to correctly identify and, in the case of multiple notes, separate the fundamental frequencies that the signal consists of.
The main source for this worksheet is chapter 7 of Klapuri & Davy³.
7.2 Methods
The methods for detecting the fundamental frequencies (F0) in a polyphonic signal can roughly be separated into the statistical approach, which this worksheet will focus on, and an approach based on an auditory model. The auditory model is based on the way humans perceive and separate concurrent sounds, and will not be examined further; chapter 8 of Klapuri3 can be used as a reference on this topic.
The statistical methods can basically be separated into the off-line approaches, which are based on analysis of a constant signal, and the on-line approaches, which use only the current sample or frame to estimate the signals.
7.2.1 Off-line
Off-line methods rely on analyzing a signal that does not change in the chosen interval (no new or lost notes). As there must be no transition between notes in the processed waveform, an onset and offset detection must be made beforehand. The signal is then modeled with a parameter estimation. Because the signals are “complete” (no transitions), these methods are very accurate, but they also prove rather computationally heavy.
The Bayesian off-line model is mathematical and probabilistic, and it leads to the simplest model that explains a given waveform. The estimation of multiple fundamental frequencies (F0’s) is complex and possibly computationally heavy, which is probably the reason this method has not been given much attention3 (p. 203-204). Often the estimation is a maximum a posteriori (MAP) or minimum mean square error (MMSE) estimation. Apart from
3 Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006
signal detection, the model may also be used for source separation (detection of instruments), compression, pitch correction and other useful applications3 (p. 203-204). Evidence of the performance is seen in the article “Bayesian analysis of polyphonic western tonal music”4, which reports a 100% accuracy on one F0 and 71% on four fundamental frequencies.
7.2.2 On-line
The on-line methods use only the current sample or frame in a sampled signal for the analysis, and therefore have no requirement for a separate onset/offset detection.
The Cemgil on-line processing is a MAP estimation, where the frequencies are divided into a grid and then a “piano roll”5 estimation is performed for each frequency in the grid3 (p. 220).
On-line methods based on sliding windows include an approach by Dubois and Davy, where the signal is perceived as a Gaussian random walk for both frequencies and amplitude (the number of notes can increase, decrease or remain constant)3 (p. 221-223). Another approach is described by Vincent and Plumbley: frequencies are divided into a fixed grid like in the Cemgil model, but the parameter priors are independent of neighbouring frames. The unknown parameters are then MAP estimated and finally, the parameters of different frames are linked together and reestimated3 (p. 223-225).
There are also on-line methods based on the Bayesian model, which mostly consist of modeling the signal spectrogram and following harmonic trajectories3 (p. 225).

Yeh and Röbel have proposed a model that is based on generation of “candidate notes”, which are evaluated with a score function. Further examination of this requires a look at the external sources, as the short text in the book3 (p. 225) is rather confusing.

Dubois and Davy have introduced a method based on spectrogram modeling with zero-mean white Gaussian noise. This method is an extension of their model based on the sliding window.

Thornburg et al. have proposed a method for melody extraction, so it is only possible to do monophonic recognition.
4 M. Davy, S. Godsill & J. Idier, “Bayesian analysis of polyphonic western tonal music”, Journal of the Acoustical Society of America, 2005
5 Derived from “self playing” pianos, the piano roll is a representation of whether each single note is present on a time scale.
Sterian et al. use, in their model, a Kalman filter to extract sinusoidal partials and group these into their sources.
8 Pitch Detection
The main sources for this worksheet are a web page by Professor Marina Bosi from Stanford University6, and chapter 4 in the book by Anssi Klapuri and Manuel Davy7.
According to Klapuri & Davy7 there are four key characteristics of music which are important when working with sound signals: pitch, loudness, duration and timbre. The topic of this worksheet is pitch detection.
Pitch is defined as
a perceptual attribute which follows the ordering of sounds on a frequency-related scale extending from low to high. More exactly, pitch is defined as the frequency of a sine wave that is matched to the target sound by the human listener. Fundamental frequency (F0) is the corresponding physical term, and is defined for periodic or nearly periodic sounds only.
There are various ways to detect pitches in music. One way is to simulatethe human ear, since this is one of the most complex yet precise detectors.However, the complexity of this model is out of scope for this project, henceit will not be examined.
8.1 Approaching the Problem from a Physical Angle
Time domain detection can be done by observing the signal to detect periodicity. One could count the number of zero crossings, but though this is an easy and cheap method, it is also very inaccurate, since small variations of the signal around the zero line might induce fatal errors. A more complex, yet also more precise, way of time domain detection is autocorrelation.
Autocorrelation is a tool to find patterns in a signal and determine fundamental frequencies. If the input is periodic, the autocorrelation function will be as well. If the signal is harmonic, the autocorrelation function will have peaks at multiples of the fundamental frequency. This method is suitable for e.g. speech recognition due to the low frequency range of speech
6 http://ccrma-www.stanford.edu/~pdelac/154/m154paper.htm
7 Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006
signals. The method might, however, be computationally expensive, because it includes a lot of multiply-add calculations.
Frequency domain detection is another approach. Here the signal is examined in the frequency domain in order to detect the frequency spectrum of the signal. Here, too, there are different ways to detect pitch.
The signal can be broken down into small segments, each of which can be evaluated by multiplying the signal with a window to get a Short Time Fourier Transform (STFT) of the segment. One of the disadvantages of this method is that the signal is broken into equally sized segments, which is disadvantageous since the spacing between the notes is nonlinear. This means that less information is available in the high frequencies than in the low.
8.2 Previous Pitch Detection Studies
Various scientists and acoustic engineers have examined the problem of transcribing pitches in music. Some of the more interesting results are derived by (sources for the following were found in Klapuri and Davy, section 8.4)7:
• Martin, who applied Ellis’s model to process signals consisting of more than two simultaneous sounds.

• Godsmark and Brown, who examined auditory scene analysis models. They discovered that, by applying these models, they were able to transcribe 4 simultaneous sounds.

• Marolt, who examined ways to transcribe piano music. Since this is our main topic, his discoveries will be examined further later on. For now, it is enough to know that he applied time-delay neural networks to identify each piano key sound, and by doing this carefully, he was able to transcribe with good precision.
9 Wavelets and Assessment of Efficiency
9.1 Purpose
The Fourier transform is the decomposition of a given signal into a series of sines. Each sine in the decomposition features both infinite energy and extremely strong autocorrelation. A consequence of the Fourier transform is the lack of combined time/frequency information, meaning that greater resolution in frequency requires more samples, thereby making it impossible to determine at what instant a given component is added. In the attempt to decide which notes are played at a given time, the Fourier transform may not be suitable.
Another approach to signal decomposition was suggested around 1910 by Haar8. He concluded that if a signal was to be decomposed without suffering from the lack of time/frequency information, the waveform used as the key element needed three main features: finite energy, weak autocorrelation and scalability. A wavelet is one such waveform. The following will be an analysis of wavelet ability and efficiency/complexity. An illustration of the time/frequency resolution of the FFT algorithm and wavelets can be seen in figure 1.
Figure 1: Comparison of time-frequency resolution for wavelets and FFT. In the wavelet case, low frequencies are better resolved in frequency and high frequencies are better resolved in time.
8 Stéphane Mallat, “A Wavelet Tour of Signal Processing” (1999), p. 7
9 http://en.wikipedia.org/wiki/Image:Wavelet_-_Morlet.png, July 15 2005, all copyrights declined
Figure 2: A Morlet wavelet9
9.2 Analysis
A wavelet is scalable and can be placed arbitrarily in time. Therefore the “generic” wavelet is dubbed the mother wavelet ψ. All wavelets in a given decomposition stem from this wavelet and are called child wavelets. These are written as
\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi\left(\frac{t - b}{a}\right) \qquad (4)
Where:
a is the scaling factor, which governs the frequency represented by the wavelet
b is the placement in time
Different mother wavelets have been proposed, and some types of wavelets are often more suited than others for a given application. A “mother wavelet” is then the generic wavelet of a given type, e.g. Haar, Daubechies, the “Mexican hat” or Morlet, which is seen in figure 2.
A discrete wavelet transform (DWT) is the decomposition of the (discrete) signal into various child wavelets. The shorter wavelets will easily represent very fast signal transitions, while the longer wavelets represent slower frequencies. A very nice feature, in relation to audio processing, is the dyadic nature of the decomposition. This means that analysis in octaves can easily be accommodated.
When seeing the actual implementation, this becomes apparent. The type of mother wavelet to be used determines the filter coefficients.
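To make this concrete, one level of the Daubechies D4 decomposition can be sketched in C as a lowpass/highpass filter pair followed by downsampling by two (a minimal sketch; the periodic boundary handling is an assumption, as several conventions exist):

```c
#include <math.h>
#include <stddef.h>

/* One level of the Daubechies D4 DWT: the input of even length n is split
 * into n/2 approximation (lowpass) and n/2 detail (highpass) coefficients.
 * Boundaries wrap around (periodic extension). */
static void d4_step(const double *x, size_t n, double *approx, double *detail)
{
    const double s3 = sqrt(3.0), norm = 4.0 * sqrt(2.0);
    /* the four D4 lowpass filter coefficients */
    const double h[4] = { (1+s3)/norm, (3+s3)/norm, (3-s3)/norm, (1-s3)/norm };
    for (size_t i = 0; i < n / 2; i++) {
        double a = 0.0, d = 0.0;
        for (size_t j = 0; j < 4; j++) {
            double xj = x[(2*i + j) % n];
            a += h[j] * xj;                          /* lowpass  */
            d += (j % 2 ? -1.0 : 1.0) * h[3-j] * xj; /* highpass */
        }
        approx[i] = a;
        detail[i] = d;
    }
}
```

Repeating the step on the approximation output halves the band each time, which is exactly the dyadic (octave-by-octave) structure described above.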
10 http://en.wikipedia.org/wiki/Image:Wavelets_-_DWT_Freq.png, July 15 2005
Figure 3: Wavelet decomposition is dyadic10
Figure 4: Continuous downsampling by a factor of 211
As far as efficiency goes, the computational complexity is O(n), i.e. a linear rise. This means that the discrete wavelet transform is even more efficient than the FFT12.
9.3 Conclusion
The DWT could be used to determine diverse features of a music signal. A discussion and choice of mother wavelet is needed. Whether or not a specific decomposition is suitable as input to a neural network is to be determined.
11 en.wikipedia.org/wiki/Image:Wavelets_-_Filter_Bank.png, July 15 2005
12 C. Sidney Burrus et al., “Introduction to Wavelets and Wavelet Transforms” (1998), p. 40
10 Neural Network
This worksheet is about the neural network (NN) method. The concept will be described, and the applicability for polyphonic music transcription will be considered. The purpose of this worksheet is to get an overview of the NN method, in order to determine whether or not it is a suitable tool for music transcription in this project.
10.1 Method Overview
An NN is a system which can be trained to recognize or identify nonlinearities when processing a signal. The method is suitable for systems where the user has some preliminary knowledge relevant for the classification. Before the network block, a feature extraction of the input signal must be made. It can be done in various ways, e.g. wavelets or ear models13. The NN method is inspired by the biological nervous system. It consists of weighted neuron signals and a comparison algorithm. Neurons are models for the way the biological nervous system perceives what it is exposed to. The weighting algorithm is adjusted by training the system, using data with known output values. The output of the weighted neuron signals is compared to the known output, and in each iteration the weighting function is adjusted. The result of these iterations is the trained system. By training the system to recognize the signals to the extent possible, the system should be able to process any related input by its achieved algorithms. Figure 5 shows a block diagram of these relations.
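The iterative weight adjustment described above can be illustrated with the delta rule for a single linear neuron (a deliberate simplification of the backpropagation used for multilayer networks; the names and the learning rate are illustrative):

```c
#include <stddef.h>

/* One training iteration for a single linear neuron: compute the output,
 * compare it with the known target and adjust the weights by the delta rule.
 * eta is the learning rate. Returns the error before the update. */
static double train_step(double *w, const double *x, size_t n,
                         double target, double eta)
{
    double y = 0.0;
    for (size_t i = 0; i < n; i++)
        y += w[i] * x[i];          /* weighted sum: the neuron output  */
    double err = target - y;       /* comparison with the known output */
    for (size_t i = 0; i < n; i++)
        w[i] += eta * err * x[i];  /* weight adjustment                */
    return err;
}
```

Repeating this over the whole training set, the error shrinks and the weights converge toward the trained system.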
10.2 Suitability
There are both advantages and disadvantages to applying the NN method as a tool in this project. Earlier studies by various scientists14 have shown that it is possible to accomplish usable results of music transcription by applying a feature extraction and the NN method. However, there are no
13 Article: “Automatic music transcription and audio source separation”, M. D. Plumbley et al., 2002
14 e.g. Matija Marolt (PhD from University of Ljubljana), A. Klapuri (Tampere University of Technology, Finland)
Figure 5: Block diagram showing the training of a neural network; Source: Matlab documentation on Neural Networks
lectures regarding this method this semester, and the complexity of NNs is quite high.
The method seems suitable for music transcription, and when examining the feature extraction analysis block, considering e.g. wavelets, it might be possible to develop a more suitable transcription system than the results already achieved by other scientists.
11 Data Preprocessing for a Neural Network
Proposed by Others
Through studies of literature regarding the field of music transcription, it seems the amount of research is somewhat scattered across genre, instrument and note recognition.
One of the sources of information for this project has been Matija Marolt from the University of Ljubljana, Slovenia15. Over the last decade he has published articles on music transcription using neural networks. The aim of the transcription has varied a bit, but the emphasis has been on piano transcription. Together with a colleague, Marko Privosnik, from the University of Ljubljana, he has worked on a piano transcription system called SONIC16. In their publication, they describe how they extract partials (meaning the data they feed into the neural network for training) by feeding the piano signal through the following steps16:
1. A Gammatone filterbank, which splits the signal into several frequency channels.

2. A Meddis hair cell model, which converts each gammatone filter output into a probabilistic representation of firing activity in the auditory nerve.

3. A network of up to ten adaptive oscillators, which have phase, frequency and output as adjustable variables, and extract partials for the note recognition.
Their system was tested with different piano pieces in different recordings, and it was able to detect up to 95% of the notes, with 13% extra notes detected by fault. More test results can be viewed in their article16. The preprocessing seems quite complex, and induces thoughts on whether it could be done in a simpler way.
15 Source: http://www.fri.uni-lj.si/en/personnel/271/oseba.html
16 Source: M. Marolt, M. Privosnik, “SONIC: a system for transcription of piano music”, in Kluev, V., D’Attelis, C. E., Mastorakis, N. E. (eds.), Advances in automation, multimedia and video systems and modern computer science, WSES Press, 2001. (http://lgm.fri.uni-lj.si/matic/clanki/malta2001.pdf)
12 Deciding Method for Further Analysis
After a preliminary analysis of different methods for piano transcription, a decision has to be made on which methods to analyze further and ultimately implement.

This document attempts to summarize the results of the initial analysis, in order to form a basis for the decision.
The requirements for the system state that a real time implementation is wanted, so methods with a high computational complexity of the running system are unwanted. The system is also required to give a representation of every combination of played tones, and hence of whether each individual tone has been played at a given time.
12.1 Preliminary Analysis
The initial analysis has been focused on different ways to attack the problem, ranging from solutions like a purely statistical approach and auditory based models to methods based on wavelets and neural networks.
12.1.1 Statistical Methods
In the analysis of statistical methods to estimate fundamental frequencies, a wide range of different methods is explained in the book by Klapuri & Davy17.

From the analysis it is concluded that a wide range of different methods is available, many of them quite usable, but for most of them a heavy computational load is to be expected, and therefore a real time implementation may not easily be achieved.
12.1.2 Methods based on Auditory Models
The auditory models are based on how the human ear works and how humans perceive music. Chapter 8 of Klapuri & Davy17 gives a good introduction to a range of these.
The concrete methods have by now only been examined superficially, but include for instance separation of the tone bands using a filter bank or
17 Anssi Klapuri & Manuel Davy, “Signal Processing Methods for Music Transcription”, 2006
channel and peak selection as well as pitch-perception models. A method used by Matija Marolt was to identify tones using adaptive oscillators18 for preprocessing of the signal to use as input to a neural network.
12.1.3 Neural Networks
A neural network takes as input a feature set, which will have to be generated from the input signal, and a result is calculated via a weighting vector.
Neural networks have a huge advantage when it comes to the computational complexity of running the trained networks, but due to their nature, a large amount of training data is needed in order to make the results sufficiently reliable, and the training requires a lot of computation.
Earlier studies by Matija Marolt have shown significant results using neural networks19 with preprocessing of the data by groups of adaptive oscillators18.
12.1.4 Wavelets
Wavelets provide a way to transform a given signal into frequency components, like the Fourier transform, and provide the opportunity to study each frequency component with a resolution that matches the scale. This feature of the wavelet, along with the fact that it, unlike the statistical methods, is not computationally very complex, makes it a good candidate for feature extraction from a recorded signal.
12.2 Decision
Based on the key points of the above, it has been decided that the further analysis will focus on the use of neural networks for the transcription. To use NNs, a preprocessing of the data is necessary to minimize the computational load. This preprocessing is a feature extraction, and further analysis of wavelets will be performed in order to decide whether these can be used for the preprocessing of data for the neural networks.
18 Matija Marolt, “Networks of Adaptive Oscillators for Partial Tracking and Transcription of Music Recordings”, Journal of New Music Research, vol. 33, no. 1, pp. 49-59, 2004.
19 Matija Marolt, “A connectionist approach to automatic transcription of polyphonic piano music”, IEEE Transactions on Multimedia, vol. 6, no. 3, pp. 439-449, 2004.
13 General Construction
13.1 Purpose
To clarify what building blocks are needed to realize the software of the transcription tool.
13.2 Analysis
A generic construction, based on a neural network (NN), is viewed in figure 6.
Figure 6: Block diagram of a system based on neural networks.
13.3 Sampled Piano Music
This is a sampled piece of piano music of arbitrary length. It is assumed that the bit resolution is 16 and the sample rate is 44100 Hz, so as to comply with the wave format featured on industrially manufactured CDs.
13.4 Feature Extraction
The music signal has to be transformed to another representation, one that somehow makes it easier to differentiate the different piano notes. The obvious representation would be frequency components, e.g. via the Fast Fourier Transform (FFT) or Discrete Wavelet Transform (DWT). The optimal feature set would be one that is easily recognizable/unique for a given note and also one that has minimal variation from piano to piano.
13.5 NN
The NN could handle all note detection simultaneously or be split up into 88 different networks, each handling a specific note. It is to be determined which type of neuron is optimal, so the number of neurons can be minimized without compromising detection effectiveness.
13.6 Decision/Discrimination
As some piano notes have a somewhat strong correlation, especially the octaves of a given note, it is very likely that a “false hit” will be registered from time to time. A discrimination algorithm should be able to remove some errors. Some errors can be detected and rectified by simple rules: if the notes C120, C3, E3, G3 and G5 were detected, G5 would most likely be a false reading, as the first four notes would require two hands to play. The key would be to find the best balance between false hits and no detection of notes actually played.
13.7 Presentation
Some kind of presentation is needed. The notes could simply be written directly to a file or a score, or be presented on the PC monitor. It is not to be a focal point, but it should be effective as a diagnostics tool during the design phase.
20 The representation of notes is called scientific pitch notation. A4 is the note with fundamental frequency 440 Hz.
14 General Thoughts
The purpose of this worksheet is to brainstorm on the different possibilities for implementation and further development.
14.1 Optimizing the Networks
The most straightforward way to make an optimal neural network would be to make an input neuron for each point of data in the decomposition and train the network. However, it would take massive RAM storage, and a lot of time, to accommodate this. To optimize the execution speed of the networks at run time, it could be determined which of the input neurons are associated with the weights holding the biggest absolute values. An example would be to keep the 100 most sensitive inputs and then retrain.
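The suggested input pruning could be sketched in C as follows (an illustrative sketch; the function name and the simple selection strategy are our own, and a partial sort would be preferable for large networks):

```c
#include <math.h>
#include <stddef.h>

/* Mark the `keep` inputs whose weights have the largest absolute value.
 * keep_mask[i] is set to 1 for retained inputs, 0 otherwise.
 * A simple repeated-maximum selection pass, O(keep * n). */
static void select_inputs(const double *w, size_t n, size_t keep, int *keep_mask)
{
    for (size_t i = 0; i < n; i++)
        keep_mask[i] = 0;
    for (size_t k = 0; k < keep && k < n; k++) {
        size_t best = 0;
        double best_abs = -1.0;
        /* find the largest-magnitude weight not yet selected */
        for (size_t i = 0; i < n; i++)
            if (!keep_mask[i] && fabs(w[i]) > best_abs) {
                best_abs = fabs(w[i]);
                best = i;
            }
        keep_mask[best] = 1;
    }
}
```

After selecting e.g. the 100 most sensitive inputs this way, the network would be rebuilt with only the retained inputs and then retrained.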
14.2 Optimizing the Wavelet Decomposition
At the same time as network optimization is carried out, it would be interesting to try different wavelet decompositions, to determine if some types were more effective than others. It would also be very interesting to see if some wavelets were more appropriate in a given interval. Perhaps it is a good idea to detect higher pitched notes using a shorter decomposition?
14.3 Optimization of the Decision Algorithm
If it were possible to make some kind of “music style detection”, the knowledge could form the basis for a probabilistic decision. Jazz would most likely feature some signature chord modulation that is not seen in classical music, and vice versa.
Part II
Design
15 Architectural Considerations for the Neu-
ral Network
This worksheet contains an overview of possible methods for structuring the network of neurons and the types of neurons. The main sources for this worksheet are ”Neural Networks, a Comprehensive Foundation” by Simon Haykin, chapter 121, and the article ”An Introduction to Computing with Neural Nets” by Richard P. Lippmann22.
15.1 Adjustable Network Elements
A network is a construction of neurons and links between them, like shown in figure 9. It can be constructed in various ways. Adjustment of the structure of neurons and links in the network can be done to achieve the best results for the network.
The network consists of one input layer with a number of neurons, one output layer with a number of neurons and possibly a number of hidden layers, not necessarily including the same number of neurons per layer. There are three parameters to adjust:
• Number of input neurons (source nodes)

• Number of hidden layers and neurons (computation nodes) in these

• Number of output neurons (computation nodes)
The numbers of hidden layers and neurons in all layers, as well as the type of neurons, must be decided in order to design the network, but before this is done, some considerations regarding the structure must be made.
21 Source info: ”Neural Networks, a Comprehensive Foundation”, second edition, 1999, Simon Haykin, Prentice Hall, ISBN 0-13-273350-1, chapter 1
22 Source info: ”An Introduction to Computing with Neural Nets”, IEEE ASSP Magazine, April 1987, Richard P. Lippmann
15.2 Classes of Networks
The simplest network is a single layer network, with an input layer and an output layer. Depending on which kind of data the network should handle, different kinds of network types are available. Figure 7 shows a tree diagram of some different types of networks23. For more information on the specific classes and their algorithms, Richard P. Lippmann has a more profound description in his article on neural nets22.
The first thing to determine is whether the input signal is binary or continuous. This clarifies which kind of algorithms are most suitable for solving the problem.

The second thing to determine is whether or not there are data to train the system. If data are available, it is possible to apply supervised learning. If there is no training data, the system must be trained unsupervised. This is done by initializing the system with a very simple structure, and then gradually optimizing it by feeding the output data into the system to adjust the structure.
Figure 7: A taxonomy of six neural nets that can be used as classifiers. Classical algorithms which are most similar to the neural net models are listed along the bottom. The figure and caption text are from Richard P. Lippmann’s article ”An Introduction to Computing with Neural Nets”, p. 6, figure 323
23Richard P. Lippmann,An Introduction to Computing with Neural Nets, IEEE ASSPmagazine, April 1987
15.3 Layers
Figure 8: Neural network structure for a fully connected feed forward single-layer network, consisting of an input layer and an output layer24.

Figure 9: Neural network structure for a fully connected feed forward multi layer network, consisting of an input layer, one hidden layer and an output layer25.
There are two significant categories of layered networks:
• Single-layer networks, as shown in figure 8
• Multi layer networks, as shown in figure 9
Both net structures can be used for binary as well as continuous inputs22. In multi-layer networks, the hidden layers are included to enable the possibility of extracting higher-order statistics21. Multilayer networks, with hidden layers, are beneficial when the size of the input layer is large21.
24 Source: http://commons.wikimedia.org/wiki/Image:SingleLayerNeuralNetwork_english.png
25 Source: http://commons.wikimedia.org/wiki/Image:MultiLayerNeuralNetwork_english.png
In both figure 8 and figure 9 the networks are structured as feed forward networks. It is possible to use feedback in networks, but this will not be examined further in this project. For more information, see the studies of e.g. Matija Marolt26.
15.4 Types of Neurons
According to Haykin, a neuron is defined as: An information-processing unit that is fundamental to the operation of a neural network 21.
Figure 10 shows a nonlinear model of a neuron. In the figure, three
Figure 10: Nonlinear model of a neuron. Source: Haykin, p. 11, fig. 1.5
elements are shown21:
• A set of connecting links
• An adder, including a possible bias
• An activation function
Together these three elements form the neuron. The connecting links each have their own weighting of their respective input signals. After the input
26 ”A Connectionist Approach to Automatic Transcription of Polyphonic Piano Music”, IEEE Transactions on Multimedia, vol. 6, no. 3, June 2004, Matija Marolt
signals are weighted, they are added, and perhaps biased21:

u_k = \sum_{j=1}^{m} w_{jk} x_j \qquad (5)

v_k = u_k + b_k \qquad (6)
The bias will be described further below. The next element is the activationfunction:
y_k = \varphi(v_k) = \varphi(u_k + b_k) \qquad (8)
In the activation function the output signal is normalized. According to Haykin21, the typical normalizing range is [0,1] or [-1,1].
There are three main categories of activation functions:
• Threshold function, also known as Heaviside function
• Piece-wise linear function
• Sigmoid function
The three function types are shown in figure 11.
Figure 11: (a) Threshold function, (b) Piecewise-linear function, (c) Sigmoid function for varying slope parameter a. Source: Haykin, p. 13, fig. 1.8
16 Overall Architecture
It has been decided that the core functionality, namely the ability to detect certain piano notes, is to be implemented as a NN. The network basically consists of 88 smaller, parallel NNs, each governing the detection of a single piano note. Feature extraction for the NNs is to be done via discrete wavelet decomposition. Prior to implementation, it is to be decided which given wavelet expresses the most aggressive response, in terms of energy, from a
given note. This is done to reduce the data fed to the NNs. As these two building blocks are considered essential to the project, they are the only focal points from now on. Figure 12 is a graphical representation of the architecture.
Figure 12: The implemented principle of note detection. The wavefile is decomposed into 12 wavelets (the last 1-bit output is a residual) of dyadically declining lengths. The NN that detects a given pitch is fed with the decomposition that produces the most power if the note is played. A detection threshold value is set to determine hit/no hit.
16.1 Wavelet Decomposition
Not written yet...
16.2 Neural Network Structure
Structuring a neural network is not an exact science, hence choices must bemade by qualified guessing.
Since the project group has not previously worked with neural networks, a meeting with Uwe Hartmann was arranged27. Below is some of the advice he gave on the choice of neuron type:
• Go for the sigmoid function. It is simple and commonly used.
• Choose the soft curve; the specific function is less important.
• Range between -1 and +1, not 0 and 1
Based on literature studies and the meeting with Uwe Hartmann, the following choices have been made:
Here something should be written about the choice of neuron type, the number of layers and the number of neurons per layer... once it has been written.
16.2.1 Feature Set versus Pitch
Not sure we should include this section, but somewhere it should be mentioned how we have decided to use the wavelet taps we feed to the individual NN.
16.2.2 Training and Test
As the training (and test) set for a given NN, a wavefile and a detection vector need to be constructed. The wavefile should be a sequence of different mono- and polyphonic notes. The detection vector is simply the input for the back propagation in the NN. The training notes are acoustically recorded piano notes, played on an arbitrarily chosen piano. The test notes are a second recording of the same piano. As the Musician Transcription Tool needs to be able to handle 10 different simultaneous notes, it is suggested to make a composition algorithm like this:
1. Decide whether target note is to be included in composition or not(P=0.5)
2. Decide randomly the number of notes to be composed (between 0 and 10)
27 Uwe Hartmann is a Senior Professor at Aalborg University, and has conducted research and held courses in Neural Networks.
3. If target note is included and more than 0 notes are to be composed,write 1 to the detection vector
4. Decide randomly what other notes to be played
5. Add all the needed notes and normalize
6. If a longer sequence is needed, continue from step 1.
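The steps above can be sketched as follows (a minimal outline; the actual mixing and normalizing of note recordings is stubbed out, and the use of the C standard library rand() is an illustrative assumption):

```c
#include <stdlib.h>

/* One entry of the training composition described above: decide whether the
 * target note participates (step 1), pick how many notes sound in total
 * (step 2), and return the corresponding detection-vector value (step 3).
 * Choosing the other notes and adding/normalizing the audio (steps 4-5)
 * is left out of this sketch. */
static int compose_entry(int *include_target)
{
    *include_target = rand() % 2;   /* step 1: P = 0.5       */
    int num_notes = rand() % 11;    /* step 2: 0 to 10 notes */
    /* step 3: the detection vector is 1 only if the target note is
     * present and at least one note is actually played */
    return (*include_target && num_notes > 0) ? 1 : 0;
}
```

Repeating this per block yields a wavefile and a matching detection vector of the length required in step 6.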
Part III
Implementation
17 Software design
This worksheet aims to describe the structure of the transcription program. The training, test and run functionality will be implemented in the same program, hereby reducing the amount of written code, as most of the actual actions are identical for all three cases. A simplified diagram of the program can be seen in figure 13. As a design tool, a userguide has been written.
Figure 13: A simplified chart of the principle of the transcription program
17.1 Userguide
The program, which is called transcribe, can be run in three ways:
train For training the system
test For testing the system
run To get transcription results from the program
The run type and the files to use are chosen via input arguments to the program. There is no limit to the number of arguments; however, only one run type is allowed.
To use the program for training a network, the first argument must be train and the remaining arguments must be in groups of three:
1. The wavefile to be analysed
2. The correct results for training (a file containing ones and zeros fitting the wavefile – each one or zero must be the result for a block of 4096 samples[28])
3. The wanted name of the data (combination of wavelet data and correctresults for training) and net-file (where the neural network is saved)
Example: transcribe train A4.wav A4_hit Net/NN_A4 C8.wav C8_hit Net/NN_C8
This will train the networks for the .wav-files by comparing the calculated wavelets with the defined results, and the resulting networks will be saved in the folder Net with the names NN_A4.net and NN_C8.net
The syntax for testing the network is identical, but the first argumentmust then be test and the argument groups now mean:
1. The wavefile to be analysed
2. The correct results for testing (a file in the same format as for training)
3. The name of the net-file (where the neural network is saved)
[28] A block of 4096 samples for decomposition was selected from a wish to have a sufficiently low time resolution and a sufficiently high frequency resolution; in addition, due to the wavelet transform method, the block length must be a power of 2.
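The power-of-two constraint in the footnote can be verified with a standard bit trick; the function below is illustrative and not part of the project code.

```c
/* A positive n is a power of two iff exactly one bit is set,
 * in which case n & (n - 1) clears that bit and yields zero. */
int is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```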
Example: transcribe test A4tst.wav A4tst_hit Net/NN_A4 C8tst.wav C8tst_hit Net/NN_C8
This will test the two networks that were trained before and return the mean square error of each.
Finally, to run the transcription, use the argument run followed by all the .wav-files to analyse. The results will be written to text files with a fixed name for each neural network.
Example: transcribe run Musik.wav Muzak.wav
This will output two files for each of the 88 neural networks containing the float output values of the networks, of the format "[filename]_[netname].test", e.g. Musik.wav_A4.test will contain the results from the network trained for recognizing A4, after processing the file Musik.wav.
17.2 Implementation
The software has been implemented in C, and the decision to combine the functionality into one program has prevented a lot of duplicated work. In particular, the main parts of training and test only differ by a few lines of code.
The overall implementation is working; however, the output of the neural networks seems to be wrong. Reading a wavefile and decomposing it into wavelets works and has been successfully compared with MATLAB results. The neural networks can be trained and tested with a test file, but when inputting data for classification, the output values are not well spread over the range of -1 to 1 as they ought to be. In fact, a great many different inputs generate exactly the same output: from a wavefile with 7000 decompositions, only 123 different output values were detected.
Since the neural networks are implemented via an external library, which we have not developed, a quick solution was to skip further development of the program and use corresponding algorithms in MATLAB. For a real-time implementation, as was originally wanted, the C implementation can serve as a good basis.
18 Real time considerations
An initially desired feature of the system was the possibility to transcribe the music in real time on "regular hardware". Although interaction with the sound card has not been implemented, the analysis of wave files indicates that it is indeed possible to run the transcription in real time.
The wav files used for training were mono, 44.1 kHz, 16 bit, with a length of 10 minutes and 50 seconds. The program in this case only runs a single network, but running the trained networks takes almost no time compared to the wavelet transform and file writing, so this is not assumed to be a significant contribution. On a 3-year-old laptop with a 1.5 GHz single core processor and 512 MB DDR RAM, running Ubuntu Linux, the performance was measured by processing a single note while using the programs top (displaying resource usage) and time (measuring "active" time for a process). The results were:
• CPU usage: about 90% (88.5-92.9)
• Memory usage: max 114 MB (the entire wave, which is 55 MB, is loaded and parts of it buffered)
• Time usage: 59 seconds (real time) – 54 seconds of actual processing
Since the processing time is less than a tenth of the playing time on this hardware, a real time implementation should be achievable, even without optimization of algorithms.
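As a sanity check on the conclusion above, the real-time factor can be computed directly: 54 s of processing for 10 min 50 s (650 s) of audio gives a factor of about 0.083. The helper below is illustrative only.

```c
/* Real-time factor: processing time divided by audio duration.
 * A value below 1.0 means the system keeps up with real time;
 * the measurement above gives 54 / 650 = approx. 0.083. */
double realtime_factor(double processing_s, double audio_s)
{
    return processing_s / audio_s;
}
```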
19 FANN – Fast Artificial Neural Network
library
This is the documentation for the FANN library, written as a quick guide to the parts relevant for this project. Because the output showed strange results (a lot of similar numbers as output – only a handful of different outputs in the range -1 to 1 with several thousand different inputs), the C code based on this was not finished completely.
The library can be found at www.leenissen.dk/fann/.
At www.leenissen.dk/fann/html/, a reference manual exists for all functions.
The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/fann.zip
Contents:
--------
1 Contents of the folder
1.1 How to read
2 Compiling FANN
2.1 Linux
2.2 Windows
3 Using FANN
3.1 Training an ANN
3.2 Using an ANN
3.3 Testing an ANN
4 Compiling code
***************************************************************
* 1 * Contents of the folder *
***************************************************************
fann-2.0.0.zip            C and Python Library Source Code
                          for all platforms
                          Support for:
                          Gnu Makefile, Visual Studio 6/.Net,
                          Borland C++ Builder and other
                          standard compilers.
fann_doc_complete_1.0.pdf Complete documentation of V1.0
Test/                     Folder for test code showing
                          how FANN is used
Test/*.data               data files for input to FANN
Test/make_testdata.m      generation of data files for the example
                          (from pattern recognition files)
Test/Makefile             Makefile that enables building and
                          linking automatically
Test/test_train.c         Program that trains an ANN from the
                          train[1-4].data files
Test/test_test.c          Program that tests an ANN from the
                          test[1-4].data files
Test/test_run.c           Program that classifies from
                          input - NOT FINISHED
Test/*.net                Trained networks
1.1 How to read
- Where this file uses the term $FANNDIR, it means the folder
  in which fann-2.0.0.zip has been unpacked.
- code examples are indented with 1 tab
***************************************************************
* 2 * Compiling FANN *
***************************************************************
Unpack fann-2.0.0.zip and compile it for the platform on which
it is to be used.
2.1 Linux
Enter the folder and run the following:
	./configure
	make
	sudo make install
	sudo ldconfig
(ldconfig ensures that the library can also be found when the
program is run)
2.2 Windows
The folders "MicrosoftVisualC++6.0" and "MicrosoftVisualC++.Net"
contain project files for the respective Visual Studio versions.
***************************************************************
* 3 * Using FANN *
***************************************************************
FANN includes functions that are used directly to train,
save and use a network. When you are done using an ANN,
it must be destroyed again so it does not keep occupying
all the memory.
3.1 Training an ANN
Before training can be performed, the training data must be
available in the correct form in a file - see 3.1.1
Training then takes place in 3 steps (see 3.1.2):
- Define parameters
- train the network on the file
- save and destroy the network
3.1.1 Training data
Training data must be stored in a file in the following format
	[number of training patterns] [inputs per pattern] [outputs per pattern]
	[input 1,1] [input 1,2] [...]
	[output 1,1] [output 1,2] [...]
	[input 2,1] [input 2,2] [...]
	[output 2,1] [output 2,2] [...]
	[...]
e.g. the training data for training an xor function looks like
this (4 sets, 2 inputs, 1 output)
	4 2 1
-1 -1
-1
-1 1
1
1 -1
1
1 1
-1
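A file in the format above can be produced with plain stdio. The sketch below writes the xor example shown; the function name `write_xor_data` and the output path are hypothetical, not part of the project sources.

```c
#include <stdio.h>

/* Writes a FANN training file: a header line
 * "[patterns] [inputs] [outputs]" followed by alternating
 * input lines and output lines. Returns 0 on success. */
int write_xor_data(const char *path)
{
    const int in[4][2] = { {-1, -1}, {-1, 1}, {1, -1}, {1, 1} };
    const int out[4]   = { -1, 1, 1, -1 };
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%d %d %d\n", 4, 2, 1);  /* 4 sets, 2 inputs, 1 output */
    for (int i = 0; i < 4; i++) {
        fprintf(fp, "%d %d\n", in[i][0], in[i][1]);
        fprintf(fp, "%d\n", out[i]);
    }
    fclose(fp);
    return 0;
}
```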
3.1.2 Example of use:
An ANN is defined as a struct - the numbers here are from an example
const unsigned int num_input = 2;
const unsigned int num_output = 1;
const unsigned int num_layers = 3;
const unsigned int num_neurons_hidden = 3;
const float desired_error = (const float) 0.001;
const unsigned int max_epochs = 500000;
const unsigned int epochs_between_reports = 1000;
struct fann *ann = fann_create_standard(num_layers,
num_input, num_neurons_hidden, num_output);
Then the activation functions for the neurons are defined:
fann_set_activation_function_hidden(ann,
FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output(ann,
FANN_SIGMOID_SYMMETRIC);
Other possible activation functions are: FANN_LINEAR,
FANN_LINEAR_PIECE, FANN_LINEAR_PIECE_SYMMETRIC, FANN_SIGMOID,
FANN_SIGMOID_SYMMETRIC, FANN_SIGMOID_SYMMETRIC_STEPWISE,
FANN_SIGMOID_STEPWISE, FANN_THRESHOLD,
FANN_THRESHOLD_SYMMETRIC, FANN_GAUSSIAN,
FANN_GAUSSIAN_SYMMETRIC, FANN_ELLIOT, FANN_ELLIOT_SYMMETRIC
(see descriptions in $FANNDIR/src/include/fann_data.h)
fann_set_activation_function_hidden sets the activation
function for all hidden neurons.
Alternatively, an activation function can be set for a single
neuron with the function
	fann_set_activation_function(ann, FUNCTION, layer, neuron)
or for a whole layer with
	fann_set_activation_function_layer(ann, FUNCTION, layer)
A file is loaded and trained on:
fann_train_on_file(ann, "xor.data", max_epochs,
epochs_between_reports, desired_error);
By default the training algorithm FANN_TRAIN_RPROP is used, but
this can be changed by calling
	fann_set_training_algorithm(ann, ALGORITHM)
where ALGORITHM is one of the following: FANN_TRAIN_INCREMENTAL,
FANN_TRAIN_BATCH, FANN_TRAIN_RPROP, FANN_TRAIN_QUICKPROP
(see descriptions in $FANNDIR/src/include/fann_data.h)
The network is saved:
fann_save(ann, "xor_float.net");
The network is destroyed to free the memory:
fann_destroy(ann);
3.2 Using an ANN
When an ANN is to be used, it must first either be trained as
above or loaded from a file trained in that way.
Then the inputs are defined and the output is computed.
Variables are created for the inputs and outputs:
fann_type *calc_out;
fann_type input[2];
fann_type is the type of the weights, and is either float, double
or int depending on whether fann.h/floatfann.h, doublefann.h or
fixedfann.h is included - in our case it will probably be float...
Create the network from the saved file:
struct fann *ann = fann_create_from_file("xor_float.net");
Define the inputs:
input[0] = -1;
input[1] = 1;
Compute and print the output:
calc_out = fann_run(ann, input);
printf("xor test (%f,%f) -> %f\n", input[0], input[1],
calc_out[0]);
The network is destroyed to free the memory:
fann_destroy(ann);
3.3 Testing an ANN
An ANN can be tested by giving a single input with the function
	fann_test(ann, input, desired_output);
A whole data set can also be tested with the function
fann_test_data(ann, data);
Both update the MSE of the network, which can be read with
	fann_get_MSE(ann)
There is no function for testing from a file, but inspired by
how fann_train_on_file is implemented
(in $FANNDIR/src/fann_train_data.c), it should be possible
to do it like this:
struct fann_train_data *data =
fann_read_train_from_file("filename.data");
fann_test_data(ann, data);
***************************************************************
* 4 * Compiling code *
***************************************************************
When code is to be compiled, it is essential that the right
libraries are loaded.
A Makefile has been created for the purpose, which works on
Linux:
	make            compiles all files
	make all        compiles all files
	make clean      cleans up
	make filename   compiles filename.c
	make runtest    compiles and runs test_train and test_test
20 Port Audio
PortAudio is a cross-platform audio API, and was examined as an option for a real-time implementation of the system. Time did not allow a real-time implementation, but an unfinished worksheet on how to use the API is shown below.
PortAudio can be found at www.portaudio.com/
The files mentioned below can be found as a zip archive at http://kom.aau.dk/group/08gr742/port_audio.zip
Contents:
--------
1 Contents of the folder
2 Compiling PortAudio
2.1 Linux
2.2 Windows
3 Compiling the examples
3.1 Linux
3.2 Windows
4 Structure of programs
4.1 Writing a callback function
4.2 Initialising PortAudio
4.3 Opening a stream
4.4 Starting, stopping and aborting a stream
4.5 Closing a stream and terminating PortAudio
4.6 Miscellaneous functions
4.7 Searching for devices
4.8 Blocking I/O functions (alternative to callback)
***************************************************************
* 1 * Contents of the folder *
***************************************************************
pa_stable_v19_20071207.tar.gz the PortAudio distribution itself;
	must be compiled for the system used (Windows/Linux)
Eksempler	folder with examples and guides
	on how to compile them
***************************************************************
* 2 * Compiling PortAudio *
***************************************************************
2.1 Linux
Guide:
www.portaudio.com/trac/wiki/TutorialDir/Compile/Linux
1) Unpack pa_stable_v19_20071207.tar.gz
2) Open the folder in a terminal
3) Type ./configure
4) Type make
if all goes well, it will now be built
2.2 Windows
Guide (Visual studio):
www.portaudio.com/trac/wiki/TutorialDir/Compile/Windows
Guide (Free tools):
www.portaudio.com/trac/wiki/TutorialDir/Compile/WindowsMinGW
1) Unpack pa_stable_v19_20071207.tar.gz (if you cannot open
tar files, get e.g. http://peazip.sourceforge.net/)
2) Follow the guide
hopefully all goes well...
***************************************************************
* 3 * Compiling the examples *
***************************************************************
3.1 Linux
Generating a sawtooth (example from the distribution - sound out):
* Main file: patest_saw.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_saw.bin
	patest_saw.c libportaudio.a
*
* Explanation:
* -lasound : link with the asound library (ALSA sound)
* -ljack : link with the jack lib (JACK sound)
* -lpthread : link with the pthread lib (Posix threads - threaded
	programming; not strictly necessary for this file...)
* -o <filename> : executable file is saved as <filename>
* patest_saw.c : this file
* libportaudio.a : library that allows the executable to be
	used without portaudio being installed
Record a sound and play it back
(from the distribution: sound in --> save to file --> sound out)
* Main file: patest_read_record.c
* portaudio.h and libportaudio.a must be in the same folder
*
* Command:
* gcc -lasound -ljack -lpthread -o patest_read_record.bin
	patest_read_record.c libportaudio.a
*
* Explanation - see above
3.2 Windows (from the guide - not tested)
In any project in which you require portaudio, you can just
link with portaudio_x86.lib (or _x64) and of course include
the relevant headers (portaudio.h, and/or pa_asio.h,
pa_x86_plain_converters.h). Your new exe should now use
portaudio_xXX.dll.
3.2.1 MinGW
3.2.2 Visual Studio etc.
***************************************************************
* 4 * Structure of programs *
***************************************************************
This is based on the official tutorial at
http://www.portaudio.com/trac/wiki/TutorialDir/TutorialStart
It is also recommended to read portaudio.h, which contains
information about all functions.
General structure of a PortAudio program:
* Write a "callback" function that PortAudio calls when
	sound is to be processed - it must not be too
	computationally heavy!
* Initialise the PA library and open a stream for audio I/O.
* Start the stream. The callback function is now called
	repeatedly by PortAudio in the background.
* In the callback, sound data can be read from inputBuffer
	and/or written to outputBuffer.
* Stop the stream by returning 1 from the callback, or by
	calling a stop function.
* Close the stream and terminate the PA library.
4.1 Writing a callback function
4.2 Initialising PortAudio
4.3 Opening a stream
4.4 Starting, stopping and aborting a stream
4.5 Closing a stream and terminating PortAudio
4.6 Miscellaneous functions
4.7 Searching for devices
4.8 Blocking I/O functions (alternative to callback)
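The general structure above can be sketched as a minimal pass-through program. This is a sketch only, using the PortAudio v19 callback API; it was not tested in this project, and the buffer size of 4096 frames merely mirrors the block size used elsewhere in the report.

```c
#include <stdio.h>
#include "portaudio.h"

/* Called by PortAudio whenever audio must be processed; here it
 * simply copies the input buffer to the output buffer. */
static int passthrough_cb(const void *input, void *output,
                          unsigned long frames,
                          const PaStreamCallbackTimeInfo *time_info,
                          PaStreamCallbackFlags status, void *user)
{
    const float *in = (const float *)input;
    float *out = (float *)output;
    unsigned long i;
    for (i = 0; i < frames; i++)
        out[i] = in ? in[i] : 0.0f;   /* input may be NULL */
    return paContinue;               /* keep the stream running */
}

int main(void)
{
    PaStream *stream;
    if (Pa_Initialize() != paNoError)
        return 1;
    /* mono in, mono out, float samples, 44.1 kHz, 4096 frames/buffer */
    if (Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, 44100, 4096,
                             passthrough_cb, NULL) != paNoError)
        return 1;
    Pa_StartStream(stream);
    Pa_Sleep(5000);                  /* let the callback run for 5 s */
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}
```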
Part IV
C Source Code
This is the collection of source code for the off-line transcription system. Except for the actual execution of the neural networks, which gives "funny" results, everything is working as expected.
21 Makefile
The Makefile is used for building and linking the program with GNU make.
# The makefile requires that the fann library is installed

GCC=gcc

SOURCES.c= main.c wavelet.c waveread.c ann_train.c ann_test.c ann_run.c fileio.c
INCLUDES=
CFLAGS= -O3 -lm -lfann
SLIBS=
PROGRAM= transcribe

OBJECTS= $(SOURCES.c:.c=.o)

.KEEP_STATE:

debug := CFLAGS= -g

all debug: $(PROGRAM)

$(PROGRAM): $(INCLUDES) $(OBJECTS)
	$(LINK.c) -o $@ $(OBJECTS) $(SLIBS)

clean:
	rm -f $(PROGRAM) $(OBJECTS)
22 main.c
The main part ties the entire program together and deals with the input arguments, which determine whether the program is used for training, testing or running, as well as which files to operate on.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "wavelet.h"
#include "waveread.h"
#include "ann.h"
#include "fileio.h"

int main(int argc, char **argv) {
    char *filename, *netname, *datname, *checkname, *action,
         run[]="run", train[]="train", test[]="test";
    wavheader whd;
    int argnum, i, j, k, *buffer, bufsize, siglen, half, sigstart;
    float *wavelet, *indata, calc_out;
    FILE *ifp, *ofp, *ftest;

    if (argc==1) {
        printf("This program needs arguments to work...\nThe first argument must be one of:\nrun\tto run the program\ntest\tto test the program\ntrain\tto train the program\n");
        exit(EXIT_FAILURE);
    }
    action=argv[1];
    if (strcmp(action,run) && strcmp(action,train) && strcmp(action,test)) {
        printf("The first argument must be one of:\nrun\tto run the program\ntest\tto test the program\ntrain\tto train the program\n\nrunning program, since no other action is specified...\n\n");
        action=run;
        i=1;
    } else {
        i=2;
    }

    for (argnum=i; argnum<argc; argnum++) {
        filename= argv[argnum];
        printf("file %d of %d: %s\n", argnum-1, argc-2, filename);
        if ((ifp = fopen(filename,"rb"))==NULL) {
            fprintf(stderr,"Could not open the file %s for reading\n", filename);
        } else { // prevent doing things with nonexisting file

            /* *****************************************
             * In this part, the wave-file is read... *
             ***************************************** */

            // Read the wave header:
            whd=wavread_head(ifp, filename);
            bufsize = whd.datasize / whd.blockalign;

            // Print relevant info:
            printf("%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n%s: %d\n", "Filesize",whd.filesize, "Number of channels",whd.num_chan, "Samplerate",whd.samplerate, "Byterate",whd.byterate, "Block alignment",whd.blockalign, "Bits per sample",whd.bits, "Data size",whd.datasize, "Number of samples",bufsize);

            // Read the contents into an int buffer:
            buffer=(int *) malloc(sizeof(int[bufsize]));
            // 8 bit wave is unsigned - 16 & 24 bit are signed - left and right shift to get the sign bit correct
            if (whd.bits>8) {
                for (i=0; i<bufsize; i++) {
                    buffer[i]=getfileints(ifp, whd.blockalign) << (sizeof(unsigned long)*8-whd.bits) >> (sizeof(unsigned long)*8-whd.bits);
                }
            } else {
                for (i=0; i<bufsize; i++) {
                    buffer[i]=getfileints(ifp, whd.blockalign);
                }
            }
            fclose(ifp);

#ifdef DEBUG
            // Write the values in the file tmp for comparing with the original wave in matlab
            int_out(buffer, bufsize, "debug_wave");
            // The test was a success: variance of c_read./original (with correction for 0/0-instances) was 0
#endif
            printf("wave-file loaded...\n");

            /* ************************************************************************
             * Here the check-file is opened and a header is written to the          *
             * test/training data-file, if a network is to be trained or tested      *
             ************************************************************************ */
            if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                // next value must be the results for training/testing
                if ((argnum+1)<argc) {
                    checkname=argv[argnum+1];
                    if ((ifp = fopen(checkname,"r"))==NULL) {
                        fprintf(stderr,"Could not open the check-file %s for reading\n", checkname);
                    } else {
                        // next value must be the data/net-name
                        if (argnum+2<argc) {
                            netname=(char *) malloc((strlen(argv[argnum+2])+strlen(".net")+1)*sizeof(char));
                            sprintf(netname,"%s.net",argv[argnum+2]);
                            datname=(char *) malloc((strlen(argv[argnum+2])+strlen(".data")+1)*sizeof(char));
                            sprintf(datname,"%s.data",argv[argnum+2]);
                            if ((ofp = fopen(datname,"w"))==NULL) {
                                fprintf(stderr,"Could not open the file %s for writing",datname);
                            } else { // write train/test-file header #number #input #output
                                fprintf(ofp,"%d %d %d\n",6999,A4_LEN,1);
                                fclose(ofp);
                            }
                        } else {
                            fprintf(stderr,"Not enough arguments - net/data name missing\n");
                        }
                    }
                } else {
                    fclose(ifp); // close check-file
                    fprintf(stderr,"Not enough arguments - check-file missing\n");
                }
            }
            /* ******************************************************
             * Here the wav-file is "chopped in pieces" and each   *
             * piece transformed using a D4 wavelet decomposition  *
             ****************************************************** */

            // Define the signal length
            siglen=4096; // 0.093 s at 44100 Hz - 0.043 s at 96000 Hz
            wavelet=(float *) malloc(sizeof(float[siglen]));

            printf("executing wavelet transform: ");
            i=0;
            while (((i+1)*siglen) <= bufsize) { // Go through the buffer until a whole siglen cannot be used
                sigstart=i*siglen;
                printf("\b\b\b\b\b%5d",i);
                // Copy part of the signal to the wavelet array
                for (j=0; j < siglen; j++) {
                    wavelet[j]=(float) buffer[sigstart+j];
                }
                // Perform the wavelet decomposition (implemented in wavelet.c) - the result of the decomposition is in the wavelet array
                wavelet_db4(wavelet, siglen);
                // Write output of each decomposition to files
                filename=(char *) malloc((strlen("_wl1234") + strlen(argv[argnum]) + 5) * sizeof(char));
                sprintf(filename,"%s_wl%d",argv[argnum],i);
                if ((ftest=fopen(filename,"r"))==NULL) { // only write if the file does not exist...
                    float_out(wavelet, siglen, filename);
                } else {
                    fclose(ftest);
                }
                free(filename);
                // Perform training/test/run-actions for the specific transform
                if (!strcmp(action,run)) {
                    indata=(float *) malloc(sizeof(float[A4_LEN]));
                    for (k=A4_ST; k<A4_END; k++) {
                        indata[k-A4_ST]=wavelet[k];
                    }
                    // Define names of the net to load and result file to write
                    filename=(char *) malloc((strlen("_A#4.test")+strlen(argv[argnum])+1)*sizeof(char));
                    sprintf(filename,"%s_%s%d.test",argv[argnum],"A",4);
                    netname=(char *) malloc((strlen("A#4.net")+1)*sizeof(char));
                    sprintf(netname,"%s%d.net","A",4);

                    ann_run(netname, indata, filename);
                    free(filename);
                    free(netname); //*/

                } else if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                    // here we must generate a test/training file
                    // A4/440Hz/pitch 69 --> wl-interval: 32-63 (32 values)
                    k=fscanf(ifp,"%d",&j);
                    if (!feof(ifp)) {
                        if ((ofp = fopen(datname,"a"))==NULL) {
                            fprintf(stderr,"\nCould not open the file %s for writing\n",datname);
                        } else { // write train/test-file lines [inputs]\n[output]
                            for (k=A4_ST; k<A4_END; k++) {
                                fprintf(ofp,"%f ",wavelet[k]);
                            }
                            fprintf(ofp,"\n%d\n",j);
                            fclose(ofp);
                        }
                    }

                }

                i++;
            } // end wavelet-loop
            printf("\n\n");

            // Perform training/test/run-actions for the entire wave file
            if ((!strcmp(action,train)) || (!strcmp(action,test))) {
                fclose(ifp); // close check-file
                argnum=argnum+2; // in train and test, 2 extra args are needed for data/netname and checkfile
            }
            if (!strcmp(action,run)) {

            } else if (!strcmp(action,train)) {
                printf("inputs from %d to %d (%d total)\n",A4_ST,A4_END,A4_LEN);
                ann_train(datname, netname, A4_LEN);
                free(netname);
                free(datname);
            } else if (!strcmp(action,test)) {
                ann_test(datname, netname);
                free(netname);
                free(datname);
            }

            free(buffer);
            free(wavelet);
        } // end else (to skip bad arguments/filenames)
    } // for-loop running through args
    exit(EXIT_SUCCESS);
}
23 fileio.h
This file merely holds the prototypes for the file input and output functions.
#ifndef FILEIO_H
#define FILEIO_H

// subfunction for returning an integer of size bytes from the file
int getfileints(FILE *ifp, int size);

void float_out(float *buffer, int bufsize, char *filename);
void int_out(int *buffer, int bufsize, char *filename);

#endif
24 fileio.c
This file contains 3 functions: one for reading an integer of a given length from a binary file, and two for writing an array of either integers or floats to a text file.
#include <stdio.h>
#include <stdlib.h>

#include "fileio.h"

/* Returns an integer of "size" bytes from the file */

int getfileints(FILE *ifp, int size) {
    int retval=0;

    if (fread(&retval, size, 1, ifp) != 1) {
        if (feof(ifp)) {
            printf("Premature end of file.");
        } else {
            printf("File read error.");
        }
        exit(EXIT_FAILURE);
    }
    return(retval);
}

/* The following functions are used for outputting data to files for reading
 * in matlab or a regular text editor...
 */

void float_out(float *buffer, int bufsize, char *filename) {
    // Write the values in the file [filename] for comparing with the original results in matlab
    int i;
    FILE *ofp;
    if ((ofp = fopen(filename,"w"))==NULL) {
        fprintf(stderr,"Could not open the file %s for writing\n", filename);
    } else {
        for (i=0; i<bufsize; i++) {
            fprintf(ofp,"%f\n",buffer[i]);
        }
        fclose(ofp);
    }
}

void int_out(int *buffer, int bufsize, char *filename) {
    // Write the values in the file [filename] for comparing with the original results in matlab
    int i;
    FILE *ofp;
    if ((ofp = fopen(filename,"w"))==NULL) {
        fprintf(stderr,"Could not open the file %s for writing\n", filename);
    } else {
        for (i=0; i<bufsize; i++) {
            fprintf(ofp,"%d\n",buffer[i]);
        }
        fclose(ofp);
    }
}
25 waveread.h
The header for the waveread functions contains a typedef of a struct to hold relevant information from the header of the wave file, as well as prototypes for the functions.
/* Header file for waveread.c */

#ifndef WAVEREAD_H
#define WAVEREAD_H

typedef struct wh {
    char *filename;
    int filesize;
    int samplerate;
    int num_chan;
    int byterate;
    int blockalign;
    int bits;
    int datasize;
} wavheader;

// prototypes:

// "main" function - returns a struct containing relevant info from the header and quits the program if the file is not a valid wave-file.
wavheader wavread_head(FILE *ifp, char *filename);

// subfunction for checking string parts of the header - exits if they are not as expected.
void chkheadstr(FILE *ifp, char *filename, char *header);

#endif
26 waveread.c
The waveread file contains functions to read the header of a wave file and check that the file follows a supported format.
#include <stdio.h>
#include <stdlib.h>

#include "waveread.h"
#include "fileio.h"

/* The main function below is necessary to compile this file standalone */

/*
int main(int argc, char **argv){
    char *filename;
    int filesize;
    wavheader whd;
    FILE *ifp, *ofp;

    filename= argv[1];
    printf("%d : %s\n", argc-1, filename);
    if ((ifp = fopen(filename,"rb"))==NULL){
        fprintf(stderr,"Could not open the file %s for reading", filename);
    }

    // Read the header:
    whd=wavread_head(ifp, filename);

    // Print relevant info:
    printf("Filesize: %d\n",whd.filesize);
    printf("Number of channels: %d\n",whd.num_chan);
    printf("Samplerate: %d\n",whd.samplerate);
    printf("Byterate: %d\n",whd.byterate);
    printf("Block alignment: %d\n",whd.blockalign);
    printf("Bits per sample: %d\n",whd.bits);
    printf("Data size: %d\n",whd.datasize);

    // Read the contents into an int buffer:
    bufsize = whd.datasize / whd.blockalign;
    printf("Number of samples: %d\n", bufsize);

    int buffer[bufsize];
    for (i=0; i<bufsize; i++){
        buffer[i]=getheadint(ifp, whd.blockalign) << (sizeof(unsigned long)*8-whd.bits);
    }

    fclose(ifp);
    exit(EXIT_SUCCESS);
}
*/

wavheader wavread_head(FILE *ifp, char *filename) {

    char headchk[4];
    int samplerate, filesize, num_chan;
    wavheader wavhead;

    // Check for RIFF header
    chkheadstr(ifp, filename, "RIFF");

    // Get filesize (rest of file)
    wavhead.filesize = getfileints(ifp, 4);

    // Check for the WAVE and "fmt " headers
    chkheadstr(ifp, filename, "WAVE");
    chkheadstr(ifp, filename, "fmt ");

    // Check for whether the codec is PCM:
    if (getfileints(ifp,4)!=16) {
        printf("The file %s is not a valid Wave-file (Not PCM codec)\n", filename);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }
    // Check whether the data is uncompressed
    if (getfileints(ifp,2)!=1) {
        printf("The file %s is compressed, and cannot be used\n", filename);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }

    // get the number of channels
    wavhead.num_chan = getfileints(ifp,2);

    // get samplerate
    wavhead.samplerate = getfileints(ifp, 4);

    // get byterate
    wavhead.byterate = getfileints(ifp, 4);

    // get Block alignment
    wavhead.blockalign = getfileints(ifp,2);

    // get Bits per sample
    wavhead.bits = getfileints(ifp,2);

    // Check SubChunk2 ID ("data")
    chkheadstr(ifp, filename, "data");

    // get Data Size
    wavhead.datasize = getfileints(ifp, 4);

    return(wavhead);
}

/* This function checks the current header for matching the correct string */

void chkheadstr(FILE *ifp, char *filename, char *header) {
    char headchk[4];

    if (fread(headchk, 4, 1, ifp) != 1) {
        if (feof(ifp)) {
            printf("Premature end of file.");
        } else {
            printf("File read error.");
        }
        exit(EXIT_FAILURE);
    }
    if (memcmp(headchk, header, 4)) {
        printf("The file %s is not a valid Wave-file (No \"%s\" header)\n", filename, header);
        fclose(ifp);
        exit(EXIT_FAILURE);
    }

}
27 wavelet.h
This file just contains the prototype for the wavelet decomposition.
#ifndef WAVELET_H
#define WAVELET_H

void wavelet_db4(float *wavelet, int siglen);

#endif
28 wavelet.c
The implemented wavelet decomposition, using the lifting scheme on the D4 algorithm.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#include "waveread.h"
#include "wavelet.h"

// return the Daubechies D4 wavelet decomposition of wavelet[0..siglen-1] in place
void wavelet_db4(float *wavelet, int siglen){
    int n, i, first, last, half;
    float tmp, sqrt3 = sqrt(3), sqrt2 = sqrt(2); // calculate the square roots only once instead of doing it in every iteration to save computation.

    // The D4 decomposition
    for(n = siglen; n > 1; n = n >> 1){ // forward transform - length is halved each iteration

        // Split step (even elements are placed in the first half and odd elements in the second half)
        first = 1;
        last = n - 1;
        while(first < last){
            for(i = first; i < last; i = i + 2){
                tmp = wavelet[i];
                wavelet[i] = wavelet[i+1];
                wavelet[i+1] = tmp;
            }
            first++;
            last--;
        }

        // Forward transform step - coded from the equations in the section "A Lifting Scheme Version
        // of the Daubechies D4 Transform" at http://www.bearcave.com/misl/misl_tech/wavelets/daubechies/index.html
        // The transform is performed in 4 steps:
        // 1) Update (add u1(odd) to even)
        // 2) Predict (subtract p(even) from odd)
        // 3) Update (add u2(odd) to even)
        // 4) Normalize

        half = n/2;

        // Update 1
        for(i = 0; i < half; i++){
            wavelet[i] = wavelet[i] + sqrt3 * wavelet[half+i];
        }

        // Predict
        wavelet[half] = wavelet[half] - (sqrt3/4.0)*wavelet[0] - (((sqrt3-2)/4.0)*wavelet[half-1]);
        for(i = 1; i < half; i++){
            wavelet[half+i] = wavelet[half+i] - (sqrt3/4.0)*wavelet[i] - (((sqrt3-2)/4.0)*wavelet[i-1]);
        }

        // Update 2
        for(i = 0; i < half-1; i++){
            wavelet[i] = wavelet[i] - wavelet[half+i+1];
        }
        wavelet[half-1] = wavelet[half-1] - wavelet[half];

        // Normalize
        for(i = 0; i < half; i++){
            wavelet[i] = ((sqrt3-1.0)/sqrt2) * wavelet[i];
            wavelet[i+half] = ((sqrt3+1.0)/sqrt2) * wavelet[i+half];
        }
    }
}
29 ann.h
The header file for the neural networks. It contains definitions of relevant parameters that can be changed at compile time, such as the sigmoid functions, training algorithms, network characteristics, etc. The prototypes for the training, testing and run functions are also specified.
#ifndef ANN_H
#define ANN_H

/* ************************************
 * For training:                      *
 ************************************ */

// Comment in the training algorithm

//#define ANN_TRAIN_ALG FANN_TRAIN_INCREMENTAL
//#define ANN_TRAIN_ALG FANN_TRAIN_BATCH
#define ANN_TRAIN_ALG FANN_TRAIN_RPROP // this is the default setting
//#define ANN_TRAIN_ALG FANN_TRAIN_QUICKPROP

// Comment in the wanted transfer function

//#define ANN_NEURON_TF FANN_LINEAR
//#define ANN_NEURON_TF FANN_LINEAR_PIECE
//#define ANN_NEURON_TF FANN_LINEAR_PIECE_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID
#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC
//#define ANN_NEURON_TF FANN_SIGMOID_SYMMETRIC_STEPWISE
//#define ANN_NEURON_TF FANN_SIGMOID_STEPWISE
//#define ANN_NEURON_TF FANN_THRESHOLD
//#define ANN_NEURON_TF FANN_THRESHOLD_SYMMETRIC
//#define ANN_NEURON_TF FANN_GAUSSIAN
//#define ANN_NEURON_TF FANN_GAUSSIAN_SYMMETRIC
//#define ANN_NEURON_TF FANN_ELLIOT
//#define ANN_NEURON_TF FANN_ELLIOT_SYMMETRIC

#define NUM_OUTPUT 1      // - output neurons
#define NUM_LAYERS 3      // - layers
#define NUM_HIDDEN 5      // - hidden neurons
#define DES_ERR 0.001     // desired error
#define MAX_EPOCHS 20000  // 500000 // maximum number of training steps
#define EPOCHS_BETWEEN_REPORTS 1000 // how many steps between reports? (display current error)

/* ********************************
 * Limits for the networks        *
 ******************************** */
#define A4_ST 16   //32
#define A4_END 128 //64
#define A4_LEN (A4_END-A4_ST)

/* ********************************
 * Prototypes                     *
 ******************************** */

int ann_train(char *infilename, char *outfilename, const unsigned int num_input);
int ann_test(char *testfilename, char *netfilename);
float ann_run(char *netfilename, float *indata, char *outfilename);

#endif
30 ann_train.c
This function is called when a network is to be trained. It reads from the specified input file, trains with the selected number of inputs, and writes the network to the output file. The files must be opened before calling the function.
#include <stdio.h>

#include "fann.h"
#include "ann.h"

int ann_train(char *infilename, char *outfilename, const unsigned int num_input){

    const unsigned int num_output = NUM_OUTPUT;
    const unsigned int num_layers = NUM_LAYERS;
    const unsigned int num_neurons_hidden = NUM_HIDDEN;
    const float desired_error = (const float) DES_ERR;
    const unsigned int max_epochs = MAX_EPOCHS;
    const unsigned int epochs_between_reports = EPOCHS_BETWEEN_REPORTS;

    // Create FANN struct, defining the ANN
    struct fann *ann;

    // ANN is initialized from definitions in ann.h
    ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);

    // Define transfer functions for the neurons - ANN_NEURON_TF is defined in ann.h
    fann_set_activation_function_hidden(ann, ANN_NEURON_TF);
    fann_set_activation_function_output(ann, ANN_NEURON_TF);

    // Define training algorithm (not necessary - RPROP is default) - ANN_TRAIN_ALG is defined in ann.h
    fann_set_training_algorithm(ann, ANN_TRAIN_ALG);

    // Train network
    printf("Training on %s:\n", infilename);
    fann_train_on_file(ann, infilename, max_epochs, epochs_between_reports, desired_error);

    // Save network
    printf("The network is saved in %s:\n\n", outfilename);
    fann_save(ann, outfilename);

    // Destroy network to free memory:
    fann_destroy(ann);

    return(0);
}
31 ann_test.c
This function does almost the same as the training, except that the network isn't trained; instead, the mean square error of running the network on data of the same form as the training data is calculated.
#include <stdio.h>

#include "fann.h"
#include "ann.h"

int ann_test(char *testfilename, char *netfilename){

    int i;

    // Create structs defining the ANN and the test data
    struct fann *ann;
    struct fann_train_data *data;

    // Load network
    printf("Opening net: %s\n", netfilename);
    ann = fann_create_from_file(netfilename);
    if(ann == NULL)
    {
        fprintf(stderr, "Error: The net file %s cannot be opened for reading\n", netfilename);
        return(1);
    }

    // Test on file
    printf("Testing on file: %s\n", testfilename);

    data = fann_read_train_from_file(testfilename);
    if(data == NULL)
    {
        fprintf(stderr, "Error: the test file %s cannot be opened for reading\n", testfilename);
        return(1);
    }

    fann_test_data(ann, data);
    printf("Test result: MSE of ANN %s = %f\n", netfilename, fann_get_MSE(ann));

    // Destroy network to free memory:
    fann_destroy(ann);

    return(0);
}
32 ann_run.c
This function is used for giving an output from an already trained network. There appears to be something strange somewhere in the neural network part, as the output always gives a lot of floating point numbers of the exact same value.
#include <stdio.h>
#include <stdlib.h>

#include "fann.h"
#include "ann.h"

float ann_run(char *netfilename, float *indata, char *outfilename){

    float *calc_out;
    struct fann *ann;

    FILE *outfile;

    // printf("ann: run: indata[31]=%f\n", indata[31]);

    // Load network
    // printf("Opening net: %s\n", netfilename);
    ann = fann_create_from_file(netfilename);

    // Open file for writing results
    // printf("Open outfile %s for writing\n", outfilename);
    if((outfile = fopen(outfilename, "a")) == NULL)
    {
        fprintf(stderr, "Error: %s cannot be opened for writing\n", outfilename);
        // Destroy network to free memory:
        fann_destroy(ann);
        return(1);
    } else {
        calc_out = fann_run(ann, indata);
        fprintf(outfile, "%f\n", calc_out[0]);

        // printf("result: %f\n", calc_out[0]);
        fclose(outfile);
    }
    // Destroy network to free memory:
    fann_destroy(ann);

    return(calc_out[0]);
}
Part V
Matlab Source Code
33 pianocomp.m
% The script takes a wave sequence of the 88 recorded test and training notes, cuts them into pieces of 30,000 samples and builds a matrix from them
% "backup" is the raw wave file containing the training or test data, respectively
% before the script is run, notestring is copied into backup

detectednotes = 88;  % can be changed if, for example, not all 88 notes are "hit", e.g. because of noise.
notelength = 30000;  % the 30,000 samples to be extracted from each individual note
maxarray = zeros(1, detectednotes); % array holding the indices of the largest peaks

notestring = [zeros(1,50000) backup]; % zero padding

newstring = zeros(1, detectednotes*notelength); % will contain one string with all the cut-out notes

for i = 1:detectednotes
    [m p] = max(notestring); % returns the index of the largest (positive) amplitude
    maxarray(i) = p; % the index is saved
    notestring(p-50000:p+50000) = zeros(1,100001); % the peak is overwritten with zeros, to avoid further hits
end

maxarray = sort(maxarray); % the indices are sorted

% all the notes are put into one long string
for i = 1:detectednotes
    newstring(((i-1)*notelength)+1:i*notelength) = backup(maxarray(i)-54999:maxarray(i)-25000);
end

testmatrix = zeros(88, notelength); % the "test" matrix is replaced by the "training" matrix, and the script is run again

for i = 1:88
    testmatrix(i,:) = newstring(((i-1)*notelength)+1:i*notelength); % the notes are placed in a matrix
end
34 pianomix.m
function [wavefile, detectionvector] = pianomix(pitch, notes, no_of_simultaneous, notematrix)
% Takes "pitch", "notes" = the number of sequences to generate, "no_of..." = the maximum number of simultaneous notes, and the matrix of notes
% Returns a wave file and a vector telling in which sequences the wanted note is present

pitch = pitch - 20; % conditioned to the matrix

if((pitch < 1) || pitch > 88)
    disp('Pitch out of range!');
    return
end

if((notes < 1) || notes > 50001)
    disp('Notes out of range!');
    return
end

wavefile = zeros(1,4096*notes);

detectionvector = zeros(1,notes); % 0 if the note is not present, 1 otherwise

for i = 1:notes

    pitchvector = zeros(1,10); % every non-zero value represents a component

    if rand < 0.5 % the wanted note is included
        detectionvector(i) = 1;
        pitchvector(1) = pitch;

        extranotes = floor(no_of_simultaneous*rand); % random integer between 0 and 9, both included

        if extranotes
            j = 1;
            while j <= extranotes
                pitchvector(j+1) = ceil(88*rand); % random integer between 1 and 88, both included
                if(length(unique(nonzeros(pitchvector(1:j+1))))==j+1)
                    j = j+1; % find a new random note if the one found is already included
                end
            end
        end

    else % the wanted note is not included
        detectionvector(i) = 0;

        extranotes = floor((no_of_simultaneous+1)*rand); % random integer between 0 and 10, both included

        if extranotes
            j = 1;
            while j <= extranotes
                pitchvector(j) = ceil(88*rand); % random integer between 1 and 88, both included
                if((length(unique(nonzeros(pitchvector(1:j))))==j) && pitchvector(j)~=pitch)
                    j = j+1; % find a new random note if the one found is already included, or if it is the wanted note
                end
            end
        end

    end

    % the 4096 samples are extracted from different places within the 30,000 samples
    for k = 1:length(nonzeros(pitchvector))
        time = round(15000*rand);
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096) + notematrix(pitchvector(k), time+10001:time+14096); % mix composition
    end

    % normalization
    if(max(wavefile(((i-1)*4096)+1:i*4096))>1)
        wavefile(((i-1)*4096)+1:i*4096) = wavefile(((i-1)*4096)+1:i*4096)/(max(abs(wavefile(((i-1)*4096)+1:i*4096))+0.01)); % normalization
    end
end
35 featureextraction.m
function [output] = featureextraction(signal, pitch)

% The signal must be 4096 samples

%------------------------------ wavelet.m ---------------------------------

% Based on matlab code from http://www.control.auc.dk/~alc/Fnct-31.m
% which is associated with the book
% "Ripples in Mathematics - The Discrete Wavelet Transform"
% Arne Jensen, Anders la Cour-Harbo, Springer-Verlag 2001.
% ISBN 3-540-41662-5.
% See also http://www.control.auc.dk/~alc/ripples.html

S = signal(1:4096);
wl = [];

N = length(S);

while N > 1
    s1 = S(1:2:N-1) + sqrt(3)*S(2:2:N);                                 % update 1
    d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*[s1(N/2) s1(1:N/2-1)]; % predict
    s2 = s1 - [d1(2:N/2) d1(1)];                                        % update 2
    s = (sqrt(3)-1)/sqrt(2) * s2;                                       % normalize
    d = (sqrt(3)+1)/sqrt(2) * d1;                                       % normalize

    wl = [d wl]; % save WL transform in vector (like the c-code)
    N = N/2;     % prepare for next step..
    S = s;
end
wl = [s wl];

%---------------------------------------------------------------------------

output = [wl(17:32) wl(33:64) wl(65:96) wl(129:160)]; % 16+32+32+32 = 112 coefficients, matching the networks' 112 inputs
36 NNgen.m
% The script creates, trains and generates test data for the networks

pitch = [21 108]; % the interval of pitches to be trained and tested

Antal_testtoner = 1000;   % number of test notes
max_simultane_toner = 10; % maximum number of simultaneous notes

Result = zeros(88, Antal_testtoner);    % the actual result
Netoutput = zeros(88, Antal_testtoner); % the network response
waveletarray = zeros(Antal_testtoner, 112);

for m = pitch(1):pitch(2)

    load trainingmatrix;
    % the wave file for training is made, along with the vector of 0s and 1s
    [wavefile, detectionvector] = pianomix(m, Antal_testtoner, max_simultane_toner, trainingmatrix);
    % wavelet decomposition
    for i = 1:Antal_testtoner
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end

    clear wavefile trainingmatrix

    net = newff(waveletarray', detectionvector, [20 30 30]); % creates a new
    % neural network of the feed-forward type, with 112 input neurons, 3 hidden
    % layers with 20, 30 and 30 neurons respectively, and 1 output neuron

    net = init(net); % ensures default initialization of weights and biases before training

    net.trainParam.epochs = 5; % the network is exposed to the data set 5 times

    net = train(net, waveletarray', detectionvector); % the network is trained

    % the network is saved
    save(strcat('NN', num2str(m)));

    %-------------------------------------- test ------------------------------------

    load testmatrix;

    % the wave file for testing is generated and the detection vector is written to a matrix
    [wavefile, Result(m,:)] = pianomix(m, Antal_testtoner, max_simultane_toner, testmatrix);

    for i = 1:Antal_testtoner
        % wavelet decomposition
        waveletarray(i,:) = featureextraction(wavefile(((i-1)*4096)+1:i*4096), m);
    end

    clear wavefile testmatrix

    % the network is simulated with the test data and returns the network response
    Netoutput(m,:) = sim(net, waveletarray');
    clear net

    m % progress indicator
end
37 resultpresentation.m
errortype = zeros(7); % 1: correct hits, 2: correct misses, 3: false hits
% 4: false misses, 5: LSE, 6: mean error, 7: sign examination

for i = 21:108

    true_hits = 0;
    true_misses = 0;
    false_hits = 0;
    false_misses = 0;

    for j = 1:1000

        if(round(Netoutput(i,j))==1 && Result(i,j)==1)
            true_hits = true_hits + 1;
        end
        if(round(Netoutput(i,j))==0 && Result(i,j)==0)
            true_misses = true_misses + 1;
        end
        if(round(Netoutput(i,j))==1 && Result(i,j)==0)
            false_hits = false_hits + 1;
        end
        if(round(Netoutput(i,j))==0 && Result(i,j)==1)
            false_misses = false_misses + 1;
        end

    end

    errortype(1,i) = true_hits;
    errortype(2,i) = true_misses;
    errortype(3,i) = false_hits;
    errortype(4,i) = false_misses;
    errortype(5,i) = sum(power(Netoutput(i,:)-Result(i,:),2))/1000;
    errortype(6,i) = sum(abs(Netoutput(i,:)-Result(i,:)))/1000;
    errortype(7,i) = sum(Result(i,:)-Netoutput(i,:))/1000;

end

% x = [21:108];
%
% plotformatering;
% plot(x, errortype(3,:), 'r', x, errortype(4,:), 'b');
% set(gca, 'XLim', [21 108]);
%
% plotformatering;
% plot(x, errortype(5,:), 'r', x, errortype(6,:), 'b');
% set(gca, 'XLim', [21 108]);
%
% plotformatering;
% plot(x, errortype(7,:), 'r');
% set(gca, 'XLim', [21 108]);