Mining Mouse Vocalizations

Jesin Zakaria

Department of Computer Science and Engineering

University of California Riverside

[Figure image: recording of laboratory mice; x-axis: Time (second), y-axis: frequency (kHz)]

Mouse Vocalizations

Figure 1: top) A waveform of a sound sequence produced by a lab mouse, middle) A spectrogram of the sound, bottom) An idealized version of the spectrogram

[Figure image: spectrogram fragments annotated with the syllable labels A, X, Q, P and C]

The intuition behind symbolizing the spectrogram

Figure 3: The two fragments of data shown in Figure 2.bottom aligned to produce the maximum overlap. (Best viewed in color)

Figure 4: The data shown in Figure 2 augmented by labeled syllables

Figure 2: top) Two 0.5 second spectrogram representations of fragments of the vocal output of a male mouse. bottom) Idealized (by human intervention) versions of the above

[Figure image: panels "original" and "idealized"; x-axis: Time (second), y-axis: frequency (kHz)]

Background

Figure 6: top) Original spectrogram, bottom) Idealized spectrogram (after thresholding and binarization)
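The thresholding and binarization step named in the Figure 6 caption can be illustrated with a minimal sketch. It assumes the spectrogram is a plain matrix of intensities and that a single global threshold is used; the function name `binarize`, the sample values, and the threshold of 0.5 are illustrative assumptions, not details taken from the slides.

```python
def binarize(spectrogram, threshold):
    """Return a 0/1 matrix: 1 where the intensity exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row]
            for row in spectrogram]


# Hypothetical 3x4 patch of spectrogram intensities (rows = frequency bins,
# columns = time frames). The values and the threshold are illustrative only.
patch = [
    [0.1, 0.9, 0.8, 0.2],
    [0.0, 0.7, 0.1, 0.1],
    [0.3, 0.2, 0.1, 0.6],
]
idealized = binarize(patch, threshold=0.5)
```

A fixed global threshold is the simplest choice; an adaptive or per-band threshold may suit noisy recordings better.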


Figure 7: left) A real spectrogram of a mouse vocalization can be approximated by samples of handwritten Farsi digits (right). Some Farsi digits were rotated or transposed to enhance the similarity

[Figure image: x-axis: Time (sec), from 90.1 to 91.1]

Figure 5: A snippet spectrogram that has seven syllables

[Figure image: panel labels SP, I and L; annotation: connected components]

Figure 8: from left to right) snippet spectrogram, matrix corresponding to an idealized spectrogram I, matrix corresponding to the set of connected components L, MBRs (minimum bounding rectangles) of the candidate syllables
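The pipeline in the Figure 8 caption (binary matrix I, labeled components L, one MBR per candidate syllable) can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes 8-connectivity and a breadth-first flood fill, and the function name `connected_components` and the toy matrix are hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Label 8-connected groups of 1-pixels and return (labels, mbrs).

    mbrs maps each label to (min_row, min_col, max_row, max_col).
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    mbrs = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue  # background pixel, or already assigned
            next_label += 1
            labels[r][c] = next_label
            box = [r, c, r, c]
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                box = [min(box[0], y), min(box[1], x),
                       max(box[2], y), max(box[3], x)]
                # Visit the eight neighbours (assumed connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            mbrs[next_label] = tuple(box)
    return labels, mbrs


# Toy idealized spectrogram I containing two candidate syllables.
I = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
L, boxes = connected_components(I)
```

Each entry of `boxes` gives the rectangle that would be cropped from the spectrogram as one candidate syllable.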

Extracting syllables from spectrogram

[Figure image: sixteen original syllable classes labeled A to P; reduced set of eleven classes labeled a to k; annotation: New Class]

Figure 9: Sixteen syllables provided by domain experts

Figure 11: Ambiguity reduction of the original set of syllable classes. Representative examples from the reduced set of eleven classes are labeled as small letters

Editing Ground truth

Figure 10: Thick/red curve represents the accuracy of classifying syllables of edited ground truth. Thin/blue curve represents the accuracy of classifying 692 labeled syllables using edited ground truth

[Plot: x-axis "Adding more instances" (0 to 700); y-axis "Classification Accuracy" (0 to 1); legend: "for edited ground truth", "for all the labeled syllables"]

Data mining Mouse Vocalizations

[Figure image: snippet spectrograms annotated with their extracted syllable strings, for example cccc, ccgc and ciacia]

Figure 12: A clustering of eight snippets of mouse vocalization spectrograms using the string edit distance on the extracted syllables (spectrograms are rotated 90 degrees for visual clarity)
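The string edit distance behind Figure 12 can be sketched as the standard Levenshtein distance with unit costs, computed for every pair of transcribed snippets. The unit costs, the function names, and the four sample strings are assumptions made for illustration; the slides do not state the exact cost model or the linkage used for the clustering.

```python
def edit_distance(s, t):
    """Levenshtein distance between two syllable strings (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a from s
                            curr[j - 1] + 1,          # insert b into s
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(strings):
    """Pairwise edit distances, suitable as input to a clustering routine."""
    return [[edit_distance(a, b) for b in strings] for a in strings]


# Illustrative transcriptions only; not the exact eight snippets of Figure 12.
snippets = ["cccc", "ccgc", "ciacia", "ciac"]
D = distance_matrix(snippets)
```

The matrix `D` could then be handed to any hierarchical clustering routine that accepts precomputed distances.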

Figure 13: A clustering of the same eight snippets of mouse vocalization shown in Figure 12 using the correlation method. The result appears near random

Clustering mouse vocalizations

[Figure image: query image, transcribed as cccc]


Similarity search / Query by content

Figure 15: top) The query image from [2] was transcribed to cccc. Similar patterns are found in CT (first row) and KO (second row) mouse vocalizations in our collection

Figure 14: top) A query image from [1]. The syllable labels have been added by our algorithm to produce the query ciabqciacia. bottom) The two best matches found in our dataset; the corresponding symbolic strings are ciafqcicia and ciqbqcaacja, with edit distances 2 and 3, respectively
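Query by content over a long transcribed recording can be sketched as approximate substring matching: find the stretch of the transcript whose edit distance to the query is smallest. The sketch below uses the classic dynamic program in which a match may start at any text position. Whether the original system searched this way is an assumption, and the surrounding context in `transcript` is hypothetical; only the query and the embedded match come from the caption.

```python
def best_substring_match(query, text):
    """Return (distance, end) for the substring of text closest to query.

    end is the index just past the last character of the first best match.
    """
    m = len(query)
    col = list(range(m + 1))  # distances against the empty text prefix
    best = (col[m], 0)
    for j, ch in enumerate(text, start=1):
        new = [0]  # a match may start at any position in the text
        for i in range(1, m + 1):
            new.append(min(col[i - 1] + (query[i - 1] != ch),  # match/substitute
                           col[i] + 1,       # extra character in the text
                           new[i - 1] + 1))  # query character left unmatched
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best


# Query from the caption; the match ciafqcicia is embedded in made-up context.
transcript = "ddci" + "ciafqcicia" + "cccc"
distance, end = best_substring_match("ciabqciacia", transcript)
```

Here `distance` comes out as 2, in agreement with the edit distance reported for the first match.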

[1] J. M. S. Grimsley, et al., Development of Social Vocalizations in Mice. PLoS ONE 6(3): e17460 (2011).

[2] T. E. Holy, Z. Guo, Ultrasonic songs of male mice. PLoS Biol 3(12): e386 (2005).

[Figure image: query image with syllable labels; matches: ciafqcicia (edit distance 2) and ciqbqcaacja (edit distance 3)]

[Figure image: motif occurrences at 944.7 – 945.2 sec and 194.8 – 195.2 sec]


Figure 16: A motif that occurred in two different time intervals of a vocalization. The left and right occurrences correspond to the symbolic strings ciaciacia and ciacjacia

[Plot: x-axis "Z-score" (0 to 3.5); y-axis "# of substrings (log scale)"; bar counts shown: 3983, 4416, 18, 118, 11; panels below: motif 1 and motif 2 with syllable labels]

Figure 17: top) Distribution of z-scores. bottom) Two sets of motifs from spectral space with z-scores of approximately two and three, respectively
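One common way to attach a z-score to a substring is to compare its observed count with the count expected under a background model. The sketch below assumes an independent-symbol background estimated from the transcript itself, plus a binomial approximation that ignores overlap between occurrences. The slides do not say which null model was used, so the model and the function names here are assumptions.

```python
import math
from collections import Counter


def count_overlapping(text, word):
    """Count occurrences of word in text, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == word)


def motif_z_score(text, word):
    """z-score of the count of word under an i.i.d. symbol model (assumed)."""
    n, k = len(text), len(word)
    positions = n - k + 1
    freq = Counter(text)
    # Probability of seeing word at one position if symbols were independent.
    p_word = math.prod(freq[ch] / n for ch in word)
    expected = positions * p_word
    std = math.sqrt(positions * p_word * (1 - p_word))
    if std == 0:
        return 0.0  # degenerate background; no meaningful score
    return (count_overlapping(text, word) - expected) / std


# Toy transcript: "ab" occurs 4 times where about 1.75 would be expected.
z = motif_z_score("abababab", "ab")
```

A higher-order Markov background would give more conservative scores for strings dominated by one repeated syllable.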

Assessing Motif Significance using z-score

[Figure image panels: "Overrepresented in Control" and "Overrepresented in Knock-out"]

Figure 18: Examples of contrast set phrases. top) Three examples of a phrase ciacia that is overrepresented in KO, appearing 24 times in KO but never in CT. bottom) Two examples of a phrase dccccc that appears 39 times in CT and just twice in KO

Contrast set mining using information gain
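Information gain for a candidate phrase can be sketched by treating "contains the phrase" as a binary feature and measuring how much it reduces the entropy of the CT/KO label. The sketch assumes the unit of analysis is a recording (or snippet) and that the counts are recordings containing the phrase; the group sizes of 100 used below are hypothetical, since the slides give only phrase counts.

```python
import math


def entropy(a, b):
    """Binary entropy (in bits) of a two-class count pair."""
    total = a + b
    h = 0.0
    for count in (a, b):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(ct_with, ct_total, ko_with, ko_total):
    """Gain from splitting CT/KO items on presence of a phrase."""
    n = ct_total + ko_total
    with_n = ct_with + ko_with
    without_n = n - with_n
    h_before = entropy(ct_total, ko_total)
    h_with = entropy(ct_with, ko_with)
    h_without = entropy(ct_total - ct_with, ko_total - ko_with)
    return h_before - (with_n / n) * h_with - (without_n / n) * h_without


# Counts for ciacia from the Figure 18 caption (0 in CT, 24 in KO);
# the totals of 100 items per group are hypothetical.
gain_ciacia = information_gain(0, 100, 24, 100)
```

Ranking all candidate phrases by this gain and keeping the top few would yield a contrast set in the spirit of Figure 18.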
