Music Writer idendify

Writer Identification in Music Score Documents without Staff-Line Removal

Anirban Jyoti Hati

Dept. of EEE

Birla Institute of Technology

Pilani, India

[email protected]

Partha Pratim Roy

CVPR Unit

Indian Statistical Institute

Kolkata, India

[email protected]

Umapada Pal

CVPR Unit

Indian Statistical Institute

Kolkata, India

[email protected]

Abstract- Writer identification from musical scores is a challenging task. A few works on writer identification in musical sheets have been published in the literature but, to the best of our knowledge, all of these works were performed after removal of staff lines from the musical scores. In this paper we propose a symbol-independent writer identification framework using HMM in music scores without removing staff lines. The writing style of each writer is modelled using a sliding window based LGH feature. To identify the writer of an input musical sheet, all music lines are fed to writer-specific HMM models and each model returns a log-likelihood score for the given input. The log-likelihood scores from the HMM models are compared and the writer corresponding to the maximum score is considered the identified writer of the test sample. Next, a page level log-likelihood score is computed for writer identification in each page sample. We have compared our proposed approach with a Gaussian Mixture Model (GMM) based writer identification system on the CVC-MUSCIMA data set. The results obtained from an experiment on 50 writers show that the HMM based approach outperforms the GMM based approach.

Keywords- Writer Identification, Music Score Documents, Gaussian Mixture Model, Hidden Markov Model.

I. INTRODUCTION

Writer identification of musical sheets is a more challenging task than writer identification of text documents, as the number of musical notes in a music sheet is smaller than the number of text symbols in a text page. Generally in a music page, notes are written on staff lines. Music sheets also contain other symbols like clefs, accidentals, time signatures, dynamics, text, etc. (see Fig. 1). There are few works on writer identification for music scores. In all these works, music sheets were subjected to staff line removal and other pre-processing, which eases the identification of the writer. To the best of our knowledge, we are the first to propose writer identification in musical pages without removing staff lines or doing any kind of pre-processing.

According to our literature survey, the initial proposal for writer identification from music scores was given by Bruder et al. [1]. They extracted features from a collection of music scores and defined a tree structure for each feature to cluster the feature set; the K-NN method was used for this purpose. Fornes et al. [4] proposed a writer identification scheme on musical scores after removing staff lines. Experiments were performed on 175 music lines from seven writers (each writer contributed 25 music lines) and achieved 95% writer identification. Later, Fornes et al. [5] proposed a writer identification method based on textural features. Gabor features and gray-scale co-occurrence matrix features were extracted from the images after staff line removal, and a K-NN classifier was used for classification. Gordo et al. [6] proposed a bag of visual terms method to identify the writer of graphical documents of old music scores from symbol analysis. Recently, Gordo et al. [16] proposed a writer identification method with Bag of Notes after staff-line removal from the music document image.

Fig. 1. Example of a musical sheet showing handwritten music-symbols along with staff-lines.

A writer identification competition on music scores was organized at ICDAR 2011 [8]. Music sheets without staff lines were considered. In this competition, Hassane and Al-Maadeed proposed three methods under the PRIP02 entry: (i) edge-based directional probability distribution features [10], (ii) grapheme features [11], and (iii) a combination of both edge and grapheme based features. They reported a 77% writer identification rate combining both features. Djeddi et al. (TUA03) proposed five methods: (i) a 5 nearest neighbour classifier with city block distance metric, (ii) Support Vector Machine (one against one), (iii) Support Vector Machine (one against all), (iv) multilayer perceptron, and (v) a combination of the four previous classifiers. They reported 76% accuracy using method (iii).

In this paper, we propose a symbol-independent writer identification framework using HMM in music scores without removing staff lines. A flowchart of our algorithm is presented in Fig. 2. The music page is first segmented into music score lines that contain staff lines and music symbols. Next, the local gradient histogram (LGH) based feature is applied to each music-score line to capture the writing style. These features are used to construct an HMM model for each writer. For identification of an unknown music sheet, the input image is segmented into music lines, and these lines are then fed to each of the HMM models. The HMM of each writer returns a log-likelihood score. Based on these scores, the writer of the target music-score line image is decided. Finally, a page-level score is computed from these line-based scores and the writer is identified for that page sample. We have compared our HMM based writer identification approach with GMM models and show that the HMM based approach outperforms GMM.

The rest of the paper is organized as follows. In Section II, we explain the feature extraction process from the music sheet image. Section III explains the writer identification approach using HMM. We demonstrate the performance of our proposed algorithm on the CVC-MUSCIMA dataset in Section IV. Finally, conclusions and future work are presented in Section V.

Fig. 2. Block diagram of the proposed framework

II. FEATURE EXTRACTION

To extract features from music lines, we first segment the music-score lines from the musical document. Next, a sliding window based feature extraction algorithm is applied. These processes are described in the following sub-sections.

Music-Score Line Segmentation: Each music page is subjected to score line segmentation. For this segmentation task, music pages are first eroded with a suitable structuring element to remove all of the musical symbols present. Erosion followed by dilation makes the staff lines more prominent. After these morphological operations, noise elimination methods are used to remove the unwanted noise. The variation of intensity in the image then provides the positions of the staff lines, which help us to segment the score lines out of the original music page. Figure 3 shows the score lines segmented from a page.

Fig. 3. Result of score line segmentation from musical sheet shown in Fig. 1.
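As a rough illustration of the final step of this segmentation, the sketch below locates score lines from the row-wise ink profile of a binary page. It is a simplification under stated assumptions and not the system used in this paper: the page is a list of rows with 1 for ink, the erosion and dilation steps are replaced by a plain row-density threshold, and the names `staff_rows`, `segment_score_lines`, `min_fill`, `max_gap`, and `margin` are hypothetical.

```python
def staff_rows(page, min_fill=0.6):
    """Return indices of rows whose ink ratio suggests a staff line."""
    width = len(page[0])
    return [y for y, row in enumerate(page) if sum(row) / width >= min_fill]


def segment_score_lines(page, min_fill=0.6, max_gap=3, margin=1):
    """Group staff-line rows into score lines and return (top, bottom) bands.

    Staff rows no more than max_gap apart are assigned to the same staff.
    margin extra rows are kept above and below each staff (assumed values).
    """
    rows = staff_rows(page, min_fill)
    if not rows:
        return []
    bands = []
    start = prev = rows[0]
    for y in rows[1:]:
        if y - prev > max_gap:      # large vertical gap: a new staff begins
            bands.append((start, prev))
            start = y
        prev = y
    bands.append((start, prev))
    height = len(page)
    return [(max(0, t - margin), min(height - 1, b + margin)) for t, b in bands]


def make_page():
    """Toy page: two staves of three lines each, plus one symbol stroke."""
    width, page = 10, []
    for y in range(20):
        is_staff = y in (2, 4, 6, 13, 15, 17)
        page.append([1] * width if is_staff else [0] * width)
    page[3][5] = 1  # a small stroke between staff lines, below the threshold
    return page


bands = segment_score_lines(make_page())
print(bands)
```

On real scans the threshold and gap would need tuning, and skewed or curved staff lines would break the row-profile assumption, which is one reason a morphological approach is preferable in practice.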

LGH Feature: We have used the off-the-shelf Local Gradient Histogram (LGH) feature [13] in our proposed system. In this feature extraction approach, a sliding window traverses the image from left to right in order to produce a sequence of overlapping sub-images. Each window is sub-divided into 4 × 4 (4 rows and 4 columns) regular cells, and a histogram of gradient orientations is calculated from all pixels in each cell.

For feature extraction, a Gaussian filter is first applied to the sub-image I(x, y) to obtain the smoothed image S(x, y). Next, the horizontal and vertical gradient components $G_x$ and $G_y$ of S(x, y) are determined as follows.

$G_x(x, y) = S(x + 1, y) - S(x - 1, y)$

$G_y(x, y) = S(x, y + 1) - S(x, y - 1)$

Then, the gradient magnitude m and direction $\theta$ are obtained for each pixel with coordinates (x, y) as

$m(x, y) = \sqrt{G_x^2 + G_y^2}$ and $\theta(x, y) = \tan^{-1}\left(\frac{G_y}{G_x}\right)$

The gradient field is then quantized into an L-bin histogram. Each bin specifies a particular octant in the angular radian space. The histogram is formed by adding up m(x, y) to the bin indicated by the quantized $\theta(x, y)$. For example, the concatenation of the 16 histograms of 8 angular bins provides a 128-dimensional feature vector for each sliding window position.
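The feature computation described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, with several assumptions: the Gaussian smoothing step is omitted, borders are handled by replicating edge pixels, `atan2` is used so that directions cover the full circle, and the function names and default window width are hypothetical rather than taken from [13].

```python
import math


def gradients(img):
    """Central-difference gradients Gx, Gy with replicated borders."""
    h, w = len(img), len(img[0])
    gx = [[img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)] for x in range(w)]
          for y in range(h)]
    gy = [[img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x] for x in range(w)]
          for y in range(h)]
    return gx, gy


def lgh_window(img, x0, win_w, cells=4, bins=8):
    """Concatenated cell histograms for the window starting at column x0."""
    h = len(img)
    gx, gy = gradients([row[x0:x0 + win_w] for row in img])
    feat = [0.0] * (cells * cells * bins)
    for y in range(h):
        for x in range(win_w):
            m = math.hypot(gx[y][x], gy[y][x])                  # magnitude
            theta = math.atan2(gy[y][x], gx[y][x]) % (2 * math.pi)
            b = min(int(theta / (2 * math.pi) * bins), bins - 1)
            cy = min(y * cells // h, cells - 1)                 # cell row
            cx = min(x * cells // win_w, cells - 1)             # cell column
            feat[(cy * cells + cx) * bins + b] += m
    return feat


def lgh_sequence(img, win_w=8, overlap=0.5, cells=4, bins=8):
    """Slide a window left to right and return one vector per position."""
    step = max(1, int(win_w * (1 - overlap)))
    width = len(img[0])
    return [lgh_window(img, x0, win_w, cells, bins)
            for x0 in range(0, width - win_w + 1, step)]


# Toy line image: 16 rows x 24 columns with a vertical edge at column 12.
image = [[1.0 if x >= 12 else 0.0 for x in range(24)] for _ in range(16)]
sequence = lgh_sequence(image)
print(len(sequence), len(sequence[0]))
```

With 4 × 4 cells and 8 bins each vector has 128 entries, matching the dimension quoted in the text; changing `bins` to 16 gives 256.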

III. WRITER IDENTIFICATION

A. Line Level writer identification: The writer identification system is built from HMM based recognizers of handwritten music lines. HMM is used for writer identification of the segmented music score-lines.

To identify the writer, we create an HMM for each writer category. For a classifier of C categories, we choose the model which best matches the observation from C HMMs $\lambda_m = \{A_m, B_m, \pi_m\}$, where $m = 1, \ldots, C$ and $\sum_{m=1}^{C} P(\lambda_m) = 1$. This means that when an unknown sequence O of unknown category is given, we calculate $P(\lambda_m \mid O)$ for each HMM $\lambda_m$ and select $\lambda_{m^*}$, where

$m^* = \arg\max_{m} P(\lambda_m \mid O)$

$P(\lambda_m \mid O) = \frac{P(O \mid \lambda_m) \, P(\lambda_m)}{P(O)}$

where $P(O)$ is the density function for O irrespective of the category and is computed by:

$P(O) = \sum_{m=1}^{C} P(O \mid \lambda_m) \, P(\lambda_m)$

The term $P(O \mid \lambda_m)$ is called the likelihood function for O given $\lambda_m$. $P(\lambda_m)$ is called the marginal or prior probability of $\lambda_m$. The standard solution, performed by the Viterbi algorithm, computes the probability $P(O \mid \lambda_m)$ of that sequence being generated by $\lambda_m$.
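The decision rule above can be illustrated with a small numerical sketch. It assumes the per-writer log-likelihoods log P(O | λ_m) are already available and, unless priors are passed in, that all writers are equally likely; the function names and the example scores are made up for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def posteriors(log_likelihoods, priors=None):
    """Return P(lambda_m | O) for each model from log P(O | lambda_m)."""
    n = len(log_likelihoods)
    priors = priors or [1.0 / n] * n           # uniform priors by default
    joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    log_evidence = log_sum_exp(joint)          # log P(O)
    return [math.exp(j - log_evidence) for j in joint]


def identify(log_likelihoods, priors=None):
    """Index of the writer model with the highest posterior."""
    post = posteriors(log_likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)


scores = [-1205.3, -1198.7, -1210.1]   # made-up log-likelihoods, 3 writers
post = posteriors(scores)
winner = identify(scores)
print(winner, [round(p, 4) for p in post])
```

Working in the log domain matters here because the raw likelihoods of long feature sequences underflow ordinary floating point; with uniform priors the arg max over posteriors coincides with the arg max over log-likelihoods.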

B. Page level writer identification: The recognizer returns a log-likelihood score from each writer-specific HMM model for a given test line. Let the log-likelihood scores of a line image be $S = \{S_1, S_2, \ldots, S_N\}$ for N writers. The probability $P = \{P_1, P_2, \ldots, P_N\}$ of each writer being chosen is thus calculated by $P_i = \exp(S_i)$. According to the probability scores, the writers are ranked as $R = \{R_1, R_2, \ldots, R_N\}$. Fig. 4 shows the distribution of normalized probability scores (NPS) for all four lines of the music page in Fig. 1. We show the NPS to visualize the probability distribution more clearly. The writer id of Fig. 1 is 9. We observe that the HMM estimates the correct rank for lines 1, 3 and 4, but in the case of line 2 some other writers (i.e. 6, 15, 25, and 35) are wrongly ranked above the true writer.

Fig. 4. NPS distribution for music score line 1 to 4. (For better view of the image, see soft copy of the pdf version)

To avoid the confusion of line-wise detection and identify the original writer of a music page, we assign a weight value $W = \{W_1, W_2, \ldots, W_N\}$ to the writers according to their rank ($R_i$). The next step is to calculate the final scores of the writers. For m score lines corresponding to a music page, the final score $F_i$ of the i-th writer is estimated as

$F_i = \sum_{j=1}^{m} \left[ P_{ij} \times W_{ij} \right]$

where $P_{ij}$ and $W_{ij}$ are the probability score and weight assignment for the j-th line, respectively. We have tried a number of functions listed in Table I for weight assignment and found that the Inverted distance function provides the best result. The detailed analysis of the results is given in Table II.

Table I: Function for weighted sum calculation

Decay function               Description
Uniform function             W = K
Inverted distance            W = 1/n
Inverted distance squared    W = 1/n^2
Exponential decay            W = exp(-(...))
Sinusoidal                   W = Sin[... (5)]
Linear negative slope        W = (-n + N)

The writer having the maximum final score, $\max(F_1, F_2, \ldots, F_N)$, is considered the identified writer of that page. Figure 5 shows the normalized final scores for all writers. The Inverted distance function is used to compute the page level score.

Fig. 5. Normalized final scores for all writers corresponding to the music page shown in Fig. 1. Blue color indicates scores obtained using weighted function and red denotes scores with uniform function. (For better view of the image, see soft copy of the pdf version)
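The page-level fusion can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the experiments: the per-line probabilities are normalized before weighting (as in the NPS plots), the inverted distance weight is assumed to be 1/r for a writer at rank r, and all function names and example scores are hypothetical.

```python
import math


def normalized_probs(log_scores):
    """exp(S_i) for each writer, normalized to sum to 1 (stable form)."""
    peak = max(log_scores)
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def ranks(probs):
    """Rank 1 for the most probable writer, 2 for the next, and so on."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    result = [0] * len(probs)
    for position, writer in enumerate(order, start=1):
        result[writer] = position
    return result


def page_scores(line_log_scores, weight=lambda r: 1.0 / r):
    """F_i = sum over lines j of P_ij * W_ij, with W_ij derived from rank.

    The default weight (1 / rank) is an assumed form of inverted distance.
    """
    n_writers = len(line_log_scores[0])
    final = [0.0] * n_writers
    for log_scores in line_log_scores:
        probs = normalized_probs(log_scores)
        for i, r in enumerate(ranks(probs)):
            final[i] += probs[i] * weight(r)
    return final


# Made-up log-likelihoods: 3 lines x 3 writers; writer 0 loses only line 2.
lines = [[-100.0, -102.0, -105.0],
         [-101.0, -100.5, -104.0],
         [-99.0, -103.0, -101.0]]
final = page_scores(lines)
page_writer = max(range(len(final)), key=final.__getitem__)
print(page_writer, [round(f, 3) for f in final])
```

Passing `weight=lambda r: 1.0` reproduces the uniform function, so the same routine can be used to compare weighting schemes on identical line scores.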

IV. EXPERIMENTAL RESULTS

Dataset: We have performed our experiment on musical sheets from the CVC-MUSCIMA dataset [6, 8]. This dataset consists of 1,000 music pages written by 50 different writers. Every writer has 20 different music pages, which show a clear difference in writing style. Adult and expert musicians from different geographic locations were chosen for making this dataset. This ensures a mature and distinct writing style for every writer. Here we define the dataset in a constrained way [16], where the training pieces of a given fold are the same for each writer. The whole dataset is divided into two parts, one for training and the other for testing.

As mentioned earlier, the images are subjected to music-score line segmentation. All segmented score-lines are directly fed to the feature extraction process. We did not apply any other pre-processing steps like binarization, noise removal, staff line removal, text removal, etc. In the following, we present the performance of writer identification at line level using HMM. Next, we discuss the performance improvement in page level identification. Finally, we compare the result of page level identification with the GMM based approach.

Line Wise Identification Performance: From each score line image, the LGH feature is extracted using a sliding window. The window of fixed width slides through the image with a 50% overlapping ratio and at each position it computes the feature by dividing the region into n rectangular cells. After extracting features from the training images of a particular writer, the obtained sequences of feature vectors were used to train the HMM model for that writer. Thus, 50 HMMs were created for the fifty different writers in the dataset. During testing, a feature sequence is extracted from each query test image and the HMM of each writer generates a log-likelihood score.
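To make the scoring step concrete, the sketch below computes a Viterbi log-score for a feature sequence under a small left-to-right HMM. It is a toy version under loud assumptions: each state emits from a single diagonal-covariance Gaussian instead of a mixture, the models have 2 states and 1-D features, and every parameter value is invented for illustration.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def viterbi_log_score(seq, log_start, log_trans, means, variances):
    """Log-probability of the single best state path for the sequence."""
    n = len(log_start)
    delta = [log_start[s] + log_gauss(seq[0], means[s], variances[s])
             for s in range(n)]
    for x in seq[1:]:
        delta = [max(delta[p] + log_trans[p][s] for p in range(n))
                 + log_gauss(x, means[s], variances[s])
                 for s in range(n)]
    return max(delta)


def safe_log(p):
    """log(p), with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0 else NEG_INF


# Two toy writer models sharing a left-to-right topology (assumed values).
start = [safe_log(1.0), safe_log(0.0)]
trans = [[safe_log(0.7), safe_log(0.3)],
         [safe_log(0.0), safe_log(1.0)]]
writer_a = dict(means=[[0.0], [5.0]], variances=[[1.0], [1.0]])
writer_b = dict(means=[[2.0], [2.5]], variances=[[1.0], [1.0]])

sequence = [[0.1], [-0.2], [4.8], [5.3]]
score_a = viterbi_log_score(sequence, start, trans, **writer_a)
score_b = viterbi_log_score(sequence, start, trans, **writer_b)
print(score_a > score_b)
```

The sequence is attributed to the model with the larger score. A full system would replace the single Gaussian with a mixture per state and estimate all parameters from the training sequences.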

We have tuned parameters such as the feature dimension and the window width. The feature dimension varies with the number of histogram cells and the angular resolution. Considering 16 orientations (T = 16), we get a 256-dimensional feature vector from each window position. The feature dimension is adjusted by testing with other dimensions (i.e. 128, 512). Figures 6 (a) and 6 (b) present the writer identification accuracy for varying feature dimension and window size. The highest line-wise identification accuracy, 59.52%, was obtained with window size 34 and feature dimension 256.

Fig. 6. Identification accuracy for different (a) feature dimensions and (b) window sizes. (For better view of the image, see soft copy of the pdf version)

During training of the HMMs, parameters such as the number of states and the number of Gaussian distributions govern the proposed architecture. Fig. 7 presents a detailed analysis of the performance observed on the dataset in terms of the number of Gaussians and the number of states. From the experiments with the HMM parameters, we chose 64 Gaussians and 4 states for each model.

Fig. 7. Writer identification performance against (a) number of Gaussians and (b) number of states (For better view of the image, see soft copy of the pdf version)

Page Wise Identification Performance: A music page contains multiple score lines. Some of them provide more information towards correct identification of the writer and ease the task, whereas others make it more challenging. To identify the original page level writer, we have applied different decay functions as explained in Section III-B. Table II presents the performance with the different weight functions. With the inverted distance function, we have obtained the best identification rate. There is a significant improvement in page level writer identification in Figs. 6, 7, 8, and 9, where we have chosen the same parameter set (i.e. window size, feature dimension, number of states, number of Gaussians) as for line level performance.

Table II: Accuracy of different weight functions for combining line-wise identification scores.

Weight function Accuracy (%)

Uniform function 77.78

Inverted distance 84.13

Inverted distance squared 83.33

Exponential decay 82.54

Sinusoidal 82.54

Linear negative slope 80.16

Fig. 8 shows the writer identification results with different top choices. Here, Top N denotes that the true writer is present among the N-best hypotheses. It is to be noted that with 6 top choices, the page level identification result reached 100%. With these 6 choices, the line level performance was 85.76%.

Fig. 8. Writer identification accuracy with different top choices. (For better view of the image, see soft copy of the pdf version)
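The Top N measure can be computed as in the short sketch below. The function name and the example scores are hypothetical; any per-writer score where larger means more likely (final page scores or line log-likelihoods) could be used as input.

```python
def top_n_accuracy(score_lists, true_writers, n):
    """Percentage of samples whose true writer is among the n best scores."""
    hits = 0
    for scores, truth in zip(score_lists, true_writers):
        best = sorted(range(len(scores)), key=lambda i: -scores[i])[:n]
        hits += truth in best
    return 100.0 * hits / len(true_writers)


# Made-up final scores for 4 pages over 3 writers, with the true writer ids.
scores = [[0.7, 0.2, 0.1],
          [0.3, 0.4, 0.3],
          [0.2, 0.5, 0.3],
          [0.1, 0.3, 0.6]]
truth = [0, 0, 2, 2]
curve = [top_n_accuracy(scores, truth, n) for n in (1, 2, 3)]
print(curve)
```

By construction the curve is non-decreasing in N and reaches 100% once N equals the number of writers.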

Error Analysis: Though our methodology offers good overall accuracy, the identification process is still susceptible to error due to the presence of staff lines, the small number of music symbols present in a music page, parts of a page that do not contain any useful information, and music pages whose style is confused with that of other writers. Fig. 9 shows the similar writing styles of writers 17 and 42, which lead to misrecognition of the music pages of writer 17.

Fig. 9. Examples of images (a) from writer 17 and (b) from writer 42 where we observed high confusion.

To identify the writers that are more prone to wrong identification (Fig. 10), we have used the following measure.

$\text{Error}(\%) = 100 \times \left( \frac{E - O}{E} \right)$

where E (expected accuracy) is the accuracy corresponding to successful identification of all writers (ideal case) and O (observed accuracy) is the page-wise identification accuracy. Fig. 10 presents the error analysis of the 50 writers from the dataset. It is to be noticed that writers 1, 4, 17 and 34 have the highest error percentage of 67%. The overall error percentage (Error) is 15.86%.

Fig. 10. Error in page wise identification for different writers

Comparison with GMM-based Approach: We have compared our HMM-based approach with a Gaussian Mixture Model (GMM) based approach. GMM [12] is used here to create a model for each music writer. The distribution of the feature vectors extracted from a person's handwriting is modelled by a Gaussian mixture density. For a D-dimensional feature vector, x, the mixture density for a specific writer is defined as

$p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, p_i(x)$.    (1)

The density is a weighted linear combination of M uni-modal Gaussian densities, $p_i(x)$, each parameterized by a $D \times 1$ mean vector, $\mu_i$, and a $D \times D$ covariance matrix, $C_i$. The parameters of a writer's density model are denoted as $\lambda = \{w_i, \mu_i, C_i\}$, $i = 1, \ldots, M$, where the mixture weights, $w_i$, sum up to one. We use diagonal covariance matrices [14] in this paper. During decoding, the feature vectors of $X = \{x_1, \ldots, x_T\}$ are assumed to be independent. The log-likelihood of a model $\lambda$ for a sequence of feature vectors X is defined as

$\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$

where $p(x_t \mid \lambda)$ is computed according to Equation (1).
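The two GMM formulas above translate directly into code. The sketch below is illustrative only: it evaluates the mixture density with diagonal covariances in the log domain and sums frame log-densities under the independence assumption; the two writer models and the feature vectors are made-up values, and parameter estimation (for example by EM) is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of one diagonal-covariance Gaussian component."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_density(x, weights, means, variances):
    """log p(x | lambda) = log sum_i w_i p_i(x), computed stably."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def gmm_log_likelihood(X, weights, means, variances):
    """log p(X | lambda) = sum_t log p(x_t | lambda), frames independent."""
    return sum(gmm_log_density(x, weights, means, variances) for x in X)


# Two made-up writer models with M = 2 components over 2-D features.
writer_a = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
writer_b = ([0.3, 0.7], [[-4.0, 4.0], [6.0, -2.0]], [[1.0, 1.0], [2.0, 2.0]])

X = [[0.2, -0.1], [2.9, 3.2], [0.1, 0.4]]
ll_a = gmm_log_likelihood(X, *writer_a)
ll_b = gmm_log_likelihood(X, *writer_b)
print(ll_a > ll_b)
```

Unlike the HMM, this score is unchanged if the frames of X are shuffled, since no temporal structure is modelled.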

In the GMM based experiment we retained the same parameter setup. First, LGH features are extracted from the music-line images and these are fed to the GMM for writer identification. Next, page level writers are determined by the previously explained algorithm. We obtained less than 50% accuracy with the GMM based method.

To measure the scalability of the system, we show (Fig. 11) how identification performance depends on the number of writers. It is observed that HMM works very well for a small number of writers. Up to 7 writers, the accuracies were 100% using HMM. The line level accuracy is more than 80% with 7 writers. With GMM, the performance falls with an increasing number of writers, which indicates that HMM is the better choice. The advantage of GMMs over HMMs is their shorter training time. GMMs are less complex, as each consists of only one state and one output distribution function. There is a significant decrease in accuracy for more than 10 writers.

Fig. 11. Performance of scalability of HMM and GMM-based writer identification approaches with increasing the number of writers. (For better view of the image, see soft copy of the pdf version)

Comparison with Other Systems: Comparing with the writer identification results on the CVC-MUSCIMA dataset, we find that the best result obtained in the ICDAR competition was 77%. Recently, Gordo et al. [16] reported 99.7% by using bag of notes features extracted from segmented music-symbols. All these existing works rely on staff line removal, which is not always possible for noisy, tampered and torn music pages. Our proposed approach achieved 84.13% accuracy without removing staff-lines.

V. CONCLUSION AND FUTURE WORK

We have presented here a novel approach for writer identification without removing staff lines or doing any kind of preprocessing except music line segmentation. The methodology is generic and can be used for other datasets. From the results, it is clear that the HMM based approach performs better than GMM on music score images. For better identification performance, differentiation of the writers based on their log-likelihood scores is very important. In this context, the Inverted distance function performs better than the others. In future we plan to investigate the effects of different kinds of noise in the experiment. The performance can also be improved by including other features along with LGH.

REFERENCES

[1] I. Bruder, T. Ignatova, L. Milewski, Integrating knowledge components for writer identification in a digital archive of historical music scores, in: Proceedings of the Joint ACM/IEEE Conference on Digital Libraries, 2004, pp. 397.

[2] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 39:1-38, 1977.

[3] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot, A comparative evaluation of variance flooring techniques in HMM based speaker verification, in: Proc. of the 5th Int. Conf. on Spoken Language Processing, 1998, pp. 2379-2382.

[4] A. Fornes, J. Llados, G. Sanchez, H. Bunke, Writer identification in old handwritten music scores, in: Proc. of the International Workshop on Document Analysis Systems, 2008, pp. 347-353.

[5] A. Fornes, J. Llados, G. Sanchez, H. Bunke, On the use of textural features for writer identification in old handwritten music scores, in: Proc. of the International Conference on Document Analysis and Recognition, 2009, pp. 996-1000.

[6] A. Gordo, A. Fornes, E. Valveny, J. Llados, A bag of notes approach to writer identification in old handwritten musical scores, in: Proceedings of the International Workshop on Document Analysis Systems, 2010, pp. 247-254.

[7] A. Fornes, J. Llados, G. Sanchez, X. Otazu, H. Bunke, A combination of features for symbol-independent writer identification in old music scores, International Journal on Document Analysis and Recognition 13 (2010), pp. 243-259.

[8] A. Fornes, A. Dutta, A. Gordo, J. Llados, The ICDAR 2011 music scores competition: staff removal and writer identification, in: Proceedings of the International Conference on Document Analysis and Recognition, 2011, pp. 1511-1515.

[9] C. Hertel and H. Bunke, A set of novel features for writer identification, in: Audio- and Video-Based Biometric Person Authentication (AVBPA), 2003, pp. 679-687.

[10] S. Al-Maadeed, E. Mohammed and D. Al Kassis, Writer identification using edge-based directional probability distribution features for Arabic words, in: International Conference on Computer Systems and Applications (AICCSA), 2008, pp. 582-590.

[11] S. Al-Maadeed, A.-A. Al-Kurbi, A. Al-Muslih, R. Al-Qahtani and H. Al Kubisi, Writer identification of Arabic handwriting documents using grapheme features, in: Int'l Conf. on Computer Systems and Applications (AICCSA), 2008, pp. 923-924.

[12] A. Schlapbach and H. Bunke, Off-line Identification Using Gaussian Mixture Models, in: Proc. of the International Conference on Pattern Recognition.

[13] J. R. Serrano and F. Perronnin, Handwritten word-spotting using hidden Markov models and universal vocabularies, in: Proceedings of the International Conference on Pattern Recognition.

[14] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot, A comparative evaluation of variance flooring techniques in HMM based speaker verification, in: Proc. of the 5th Int. Conf. on Spoken Language Processing, 1998, pp. 2379-2382.

[15] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 10:19-41, 2000.

[16] A. Gordo, A. Fornes, and E. Valveny, Writer identification in handwritten musical scores with bag of notes, Pattern Recognition 46 (2013), pp. 1337-1346.
