User Manual of Mining Mouse Vocalizations
Prepared by Jesin Zakaria and Eamonn Keogh
CREATE SPECTROGRAM
Run the code createSpectro.m to:
1. create a spectrogram from a .wav file
2. idealize the spectrogram
3. extract candidate syllables from the idealized spectrogram
Try the following example. Set:
rec = '..\031611KOKO02MATED.wav'; % path and name of the .wav file
D = '...\031611KOKO02MATEDspectro\'; % folder that will contain the syllables
Set the range of the for loop according to the size of main memory and the length of the recording.
In each iteration we create the spectrogram of two minutes of the recording; this value can be changed to process a longer section of the recording per iteration.
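The loop range described above can be sketched as follows. This is a minimal illustration, not the code in createSpectro.m: the sampling rate, the recording length and all variable names are assumptions chosen for the example.

```matlab
% Sketch: split a recording into two-minute chunks of sample indices.
FS = 250000;                 % assumed sampling rate in Hz (hypothetical value)
totalSamples = 6*60*FS;      % assumed six-minute recording (hypothetical value)
chunkSec = 120;              % two minutes per iteration, as in the manual
chunkLen = chunkSec*FS;      % samples per chunk
nChunks = ceil(totalSamples/chunkLen);
ranges = zeros(nChunks, 2);  % one [t1 t2] sample range per iteration
for k = 1:nChunks
    t1 = (k-1)*chunkLen + 1;
    t2 = min(k*chunkLen, totalSamples);  % the last chunk may be shorter
    ranges(k,:) = [t1 t2];
    % [Y, FS] = wavread(rec,[t1,t2]);    % read only this chunk, as in the manual's example
end
```

Increasing chunkSec processes a longer section per iteration at the cost of more memory.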
RUNNING TIME
Because processing is faster than real time, we did not include a running time analysis in our paper. For example:
It took on average (12.95 + 12.81 + 12.67)/3 = 12.81 seconds to create the spectrogram of a two-minute recording.
It took 85.7 seconds to extract connected components from the idealized spectrogram of a six-minute recording.
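rec = 'C:\Users\Jesin\Desktop\temp\031611KOKO02MATED.wav';
t1 = 124000*250; t2 = 125000*250;   % first and last sample to read (seconds 124 to 125 at 250 samples per millisecond)
[Y, FS] = wavread(rec,[t1,t2]);      % on MATLAB releases without wavread, audioread(rec,[t1,t2]) takes the same sample range
[y,F,T,P] = spectrogram(Y,512,256,512,FS,'yaxis');
C = -10*log10(P);
C(C<35) = 0; C(C>80) = 0;            % discard values outside the range 35 to 80
C(C~=0) = 1;                         % binarize what remains
imshow(~C);                          % display the idealized spectrogram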
Figure 1: Idealized spectrogram of a laboratory mice recording from 124 to 125 seconds (x-axis: time in seconds; y-axis: frequency in kHz, 40 to 100), created with the code above.
EXTRACT CANDIDATE SYLLABLES
In createSpectro.m we marked the part of the code that extracts candidate syllables.
The results of all filtering steps are included in the extractcandidatesyllable.zip folder:
The folder …\031611KOKO02MATEDspectro contains all connected components with duration >10 and <300 and within the frequency range 30 to 110 kHz.
The folder …\031611KOKO02MATED contains all candidate syllables after filtering out some noise and excluding all but one of the syllables that appear at the same timestamp.
The folder …\sametime contains the syllables that were excluded for appearing at the same timestamp.
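The duration and frequency filter on connected components can be sketched as below. This is only an illustration under assumptions: the bounding-box layout and the sample values are hypothetical, and the manual does not state the unit of the duration limits, so the same unspecified unit is assumed for the start and end times.

```matlab
% Sketch: keep components with duration >10 and <300 within 30 to 110 kHz.
% Hypothetical bounding boxes, one row per component: [tStart tEnd fLow fHigh]
% tStart/tEnd use the (unstated) duration unit; fLow/fHigh are in kHz.
boxes = [100 105 40  60;    % too short
         200 250 50  90;    % kept
         300 700 35  80;    % too long
         400 450 20  70];   % extends below 30 kHz
dur  = boxes(:,2) - boxes(:,1);
keep = dur > 10 & dur < 300 & boxes(:,3) >= 30 & boxes(:,4) <= 110;
kept = boxes(keep,:);       % components that pass both filters
```

In practice the boxes would come from the connected components of the idealized spectrogram rather than from a hand-written matrix.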
CLASSIFY CANDIDATE SYLLABLES
Run the code classifySyllables.m.
Require:
1. labelGrndTruth.txt, which contains the labels of the ground truth
2. theta.txt, which contains the thresholds for each class. The mean, sigma, mean+sigma and mean+2*sigma for each class of syllables in the ground truth are in columns 1, 2, 4 and 5 of theta.txt
3. Normalized ground truth
4. Candidate syllable bitmaps
5. List of candidate syllables in sorted order
Result:
For our sample example,
‘dis031611KOKO02MATED.txt’ contains the distances of the candidate syllables to the ground truth
‘label 031611KOKO02MATED.txt’ contains the labels of all the candidate syllables
To see the class distribution, uncomment the code for class distribution in classifySyllables.m.
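The per-class threshold values stored in theta.txt (mean, sigma, mean+sigma, mean+2*sigma) can be computed along these lines. The distances are made-up numbers, the variable names are hypothetical, and the use of the sample standard deviation (MATLAB's default for std) is an assumption; the manual does not say which variant was used.

```matlab
% Sketch: threshold values for one class from its within-class distances.
d = [0.10 0.20 0.30 0.40 0.50];  % hypothetical distances for one class
mu    = mean(d);
sigma = std(d);                  % sample standard deviation (assumption)
thr1  = mu + sigma;              % mean + sigma
thr2  = mu + 2*sigma;            % mean + 2*sigma

% Hypothetical use: accept a candidate for this class if its distance
% does not exceed the chosen threshold.
dNew = 0.55;
accepted = dNew <= thr2;
```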
Normalization method
In our paper we stated that all the candidate syllables and the ground truth are normalized before computing the GHT distance between them. For brevity, however, we did not include the details of our normalization method there, nor did we validate it.
The following slides present the details of our normalization method.
GHT is calculated without normalizing the syllables.
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
Syllables that are not clustered correctly are marked with a red circle.
GHT is calculated after normalizing the syllables by dividing x and y by the larger dimension (row or column).
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
Some syllables are still not clustered correctly, as is evident from the following figure.
Same set of syllables after normalization
Normalization method used in our paper
GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively.
Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes).
All the syllables except one (marked with an arrow) are clustered correctly, as is evident from the following figure.
Same set of syllables after normalization
Normalization method used in our paper, applied to confusing classes
GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively.
Set: 16 syllables of class 1 and 27 syllables of class 9 (confusing classes).
Same set of syllables after normalization
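The two normalization variants compared above can be sketched on a toy bitmap. This is an illustration only: the bitmap and variable names are hypothetical, and reading "dividing x and y by the size of row and column respectively" as x divided by the number of columns and y divided by the number of rows is an assumption about the intended meaning.

```matlab
% Sketch: normalize the pixel coordinates of a syllable bitmap.
B = false(4, 8);                 % hypothetical bitmap: 4 rows, 8 columns
B(2,3) = true; B(4,8) = true;    % two set pixels
[r, c] = find(B);                % row (y) and column (x) indices of set pixels
[nRow, nCol] = size(B);

% Variant 1: divide both coordinates by the larger dimension.
m  = max(nRow, nCol);
xl = c / m;
yl = r / m;

% Variant 2 (the reading assumed for the paper's method):
% divide each coordinate by the extent of its own axis.
xs = c / nCol;
ys = r / nRow;
```

With variant 2 both coordinates span the range up to 1 regardless of the bitmap's aspect ratio, whereas variant 1 preserves the aspect ratio.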
EDITING GROUND TRUTH
[Figure: classification accuracy (y-axis, 0 to 1) versus adding more instances (x-axis, 0 to 700); curves: for edited ground truth, for all the labeled syllables]
Run accuracyGrndTrth.m to generate the plot.
It requires:
editMatrix.txt
dis692.txt
label692.txt
DESCRIPTION OF THE FILES
In our paper we mentioned the 692 syllables annotated by the domain expert. Instead of using all 692 syllables as ground truth, we used a data editing technique that resulted in a set of 108 syllables, which we used as the GROUNDTRUTH for our experiments.
1. editMatrix.txt contains the result of editing the 692 annotated syllables. Columns 2, 3, 4 and 5 hold the number of syllables added to the ground truth, the class label of the syllable, the total number of syllables classified using the edited ground truth, and the accuracy rate.
2. dis692.txt contains the GHT distances of the 692 annotated syllables.
3. label692.txt contains the class labels of the 692 syllables.
groundtruth.zip contains the sets of 692 syllables and 108 syllables that we mentioned in our paper.
MOTIF DISCOVERY
Run findMotif.m to find motifs from a vocalization
[Figure: motif occurrences at 944.7 – 945.2 sec and 194.8 – 195.2 sec]
Instruction:
In findMotif.m, change the locations of the folder that will contain the motifs, the .wav file, the list of syllables and the labels of the syllables.
Also create folders, e.g. …/motif/6 and …/motif/7, before running the code. These folders will contain motifs of length 6, 7, etc.
motif.zip contains motifs from the attached .wav file.
Clustering mice vocalizations
Run clusterMtf.m to cluster motifs from mice vocalizations
The folder ‘dendo_mice’ contains all the required files used to generate the dendrograms of figure 12 and figure 13.
[Figure: motif with syllable labels d, d, q, d (ddqd); ‘q’ means unknown class]
QUERY
Similarity search / Query by content
Some additional results are attached here.
The 10 nearest neighbors (10 NN) from four vocalizations are presented.
[Figure: query results for the motif qaiaiacia; ‘q’ means unknown class]
Motif Significance
Run mtfSgnfnc.m to assess significance of motifs based on their z-score.
The folder ‘../mtfSgnfcn’ contains all the required files used to generate the plot of figure 17.
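A z-score for a motif can be sketched as below. This is not the code in mtfSgnfnc.m: the counts are made-up numbers, and the choice of reference distribution (counts of the same motif under some null model, such as shuffled syllable sequences) is an assumption, since the manual does not describe it.

```matlab
% Sketch: z-score of an observed motif count against reference counts.
observed   = 12;            % hypothetical count of the motif in the recording
nullCounts = [3 5 4 6 2];   % hypothetical counts under an assumed null model
z = (observed - mean(nullCounts)) / std(nullCounts);
% A large positive z means the motif occurs far more often than the
% reference counts would suggest.
```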
Contrast sets
createContrastset.m is used to create the contrast sets.
contratset.m is used to extract the patterns in the contrast sets from a vocalization.
The folder ‘../contrastSet’ contains some examples of the contrast sets that we mentioned in our paper. It also contains the files needed by createContrastset.m.
‘contrastset.txt’ contains the list of substrings sorted in descending order of their information gain.
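Sorting substrings by information gain can be illustrated as follows. The manual does not say how the information gain is computed, so the standard definition (class entropy minus class entropy conditioned on whether the substring is present) is assumed; the substrings and counts are hypothetical.

```matlab
% Sketch: information gain of substrings between two sets of vocalizations.
H = @(p) -sum(p(p>0) .* log2(p(p>0)));   % Shannon entropy in bits

% One 2x2 count table per substring (hypothetical data):
% rows = substring present / absent, columns = set A / set B.
names  = {'dd', 'ai'};
tables = {[5 5; 5 5], [8 2; 2 8]};

gains = zeros(1, numel(tables));
for k = 1:numel(tables)
    n = tables{k};
    N = sum(n(:));
    Hcond = 0;                           % entropy of the set label given presence
    for i = 1:size(n,1)
        Hcond = Hcond + sum(n(i,:))/N * H(n(i,:)/sum(n(i,:)));
    end
    gains(k) = H(sum(n,1)/N) - Hcond;    % information gain of substring k
end

[gainsSorted, order] = sort(gains, 'descend');
sortedNames = names(order);              % substrings in descending order of gain
```

A substring that is equally common in both sets gets a gain of 0, while one concentrated in a single set ranks near the top.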
Questions or comments? Email [email protected]