
  • Writer Identification in Music Score Documents

    without Staff-Line Removal

    Anirban Jyoti Hati

    Dept. of EEE

    Birla Institute of Technology

    Pilani, India

    [email protected]

    Partha Pratim Roy

    CVPR Unit

    Indian Statistical Institute

    Kolkata, India

    [email protected]

    Umapada Pal

    CVPR Unit

    Indian Statistical Institute

    Kolkata, India

    [email protected]

    Abstract- Writer identification from musical scores is a challenging task. A few works on writer identification in musical sheets have been published in the literature but, to the best of our knowledge, all of them were performed after removal of staff lines from the musical scores. In this paper we propose a symbol-independent writer identification framework using hidden Markov models (HMMs) in music scores without removing staff lines. The writing style of each writer is modelled using sliding-window-based Local Gradient Histogram (LGH) features. To identify the writer of an input musical sheet, all music lines are fed to writer-specific HMMs and each model returns a log-likelihood score for the given input. The log-likelihood scores from the HMMs are compared and the writer corresponding to the maximum score is considered the identified writer of the test sample. Next, a page-level log-likelihood score is computed for writer identification in each page sample. We have compared our proposed approach with a Gaussian Mixture Model (GMM) based writer identification system on the CVC-MUSCIMA data set. The results obtained from an experiment on 50 writers show that the HMM-based approach outperforms the GMM-based approach.

    Keywords- Writer Identification, Music Score Documents,

    Gaussian Mixture Model, Hidden Markov Model.

    I. INTRODUCTION

    Writer identification in musical sheets is more challenging than in text documents, as the number of musical notes in a music sheet is smaller than the number of text symbols in a text page. Generally, in a music page, notes are written on staff lines. Music sheets also contain other symbols such as clefs, accidentals, time signatures, dynamics, text, etc. (see Fig. 1). There are only a few works on writer identification for music scores. In all of these works, the music sheets were subjected to staff-line removal and other pre-processing, which eases the identification of the writer. To the best of our knowledge, we are the first to propose writer identification in musical pages without removing staff lines or applying any other pre-processing.

    According to our literature survey, the initial proposal for writer identification from music scores was given by Bruder et al. [1]. They extracted features from a collection of music scores, defined a tree structure for each feature to cluster the feature set, and used the K-NN method for identification. Fornes et al. [4] proposed a writer identification scheme for musical scores after removing staff lines. Experiments were performed on 175 music lines from seven writers (each writer contributed 25 music lines) and achieved a 95% writer identification rate. Later, Fornes et al. [5] proposed a writer identification method based on textural features. Gabor features and gray-scale co-occurrence matrix features were extracted from the images after staff-line removal, and a K-NN classifier was used for classification. Gordo et al. [6] proposed a bag-of-visual-terms method to identify the writer of graphical documents of old music scores from symbol analysis. Recently, Gordo et al. [16] proposed a writer identification method with a bag of notes after staff-line removal from the music document image.

    Fig. 1. Example of a musical sheet showing handwritten music-symbols along with staff-lines.

    A writer identification competition on music scores was organized at ICDAR 2011 [8]. Music sheets without staff lines were considered. In this competition, Hassane and Al-Maadeed (PRIP02) proposed three methods: (i) edge-based directional probability distribution features [10], (ii) grapheme features [11], and (iii) a combination of both edge-based and grapheme-based features. They reported a 77% writer identification rate combining both features. Djeddi et al. (TUA03) proposed five methods: (i) a 5-nearest-neighbour classifier with the city block distance metric, (ii) Support Vector Machine (one against one), (iii) Support Vector Machine (one against all), (iv) multilayer perceptron, and (v) a combination of the four previous classifiers. They reported 76% accuracy using method (iii).

    In this paper, we propose a symbol-independent writer identification framework using HMMs in music scores without removing staff lines. A flowchart of our algorithm is presented in Fig. 2. The music page is first segmented into music-score lines that contain staff lines and music symbols. Next, the local gradient histogram (LGH) feature is applied to each music-score line to capture the writing style. These features are used to construct an HMM for each writer. To identify an unknown music sheet, the input image is segmented into music lines and these lines are fed to each of the HMMs. Each HMM returns a log-likelihood score for its writer. Based on these scores, the writer of the target music-score line image is decided. Finally, a page-level score is computed from these line-based scores and the writer is identified for that page sample. We have compared our HMM-based writer identification approach with GMMs and show that the HMM-based approach outperforms the GMM-based one.

    The rest of the paper is organized as follows. In Section II, we explain the feature extraction process from the music sheet image. Section III explains the writer identification approach using HMM. We demonstrate the performance of our proposed algorithm on the CVC-MUSCIMA dataset in Section IV. Finally, conclusions and future work are presented in Section V.

    Fig. 2. Block diagram of the proposed framework

    II. FEATURE EXTRACTION

    To extract features from music lines, we first segment the music-score lines from the musical document. Next, a sliding-window-based feature extraction algorithm is applied. These processes are described in the following sub-sections.

    Music-Score Line Segmentation: Each music page is subjected to score-line segmentation. For this task, the music pages are first eroded with a suitable structuring element to remove all of the musical symbols present. Erosion followed by dilation makes the staff lines more prominent. After these morphological operations, noise elimination methods are used to remove unwanted noise. The variation of intensity in the image then provides the positions of the staff lines, which help us to segment the score lines out of the original music page. Figure 3 shows the score lines segmented from a page.
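As a rough illustration of the last step (locating staff lines from the row-wise intensity variation and cutting the page into score lines), the following sketch works on a binarized page held as nested lists. It skips the erosion/dilation and noise removal entirely and simply thresholds the horizontal projection profile. The function name and the `ink_ratio`, `max_gap` and `margin` values are assumptions made for this example and do not come from the paper.

```python
def segment_score_lines(image, ink_ratio=0.5, max_gap=20, margin=10):
    """Return one (top, bottom) row range per music-score line.

    image: list of rows, each a list of 0/1 values (1 = ink).
    ink_ratio, max_gap and margin are illustrative defaults, not paper values.
    """
    height = len(image)
    width = len(image[0])
    # A row covered mostly by ink is assumed to belong to a staff line.
    staff_rows = [y for y, row in enumerate(image)
                  if sum(row) / width >= ink_ratio]
    # Staff-line rows that lie close together form one staff (one score line).
    bands = []
    for y in staff_rows:
        if bands and y - bands[-1][1] <= max_gap:
            bands[-1][1] = y
        else:
            bands.append([y, y])
    # Pad each band so symbols above and below the staff are kept.
    return [(max(0, top - margin), min(height - 1, bottom + margin))
            for top, bottom in bands]


# Synthetic page: two staves of five lines each, no symbols.
page = [[0] * 40 for _ in range(120)]
for first_row in (20, 80):
    for k in range(5):
        page[first_row + 4 * k] = [1] * 40

print(segment_score_lines(page))  # [(10, 46), (70, 106)]
```

On real scans the fixed threshold would need tuning, and the morphological opening described above would be applied first so that dense symbols do not disturb the projection profile.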

    Fig. 3. Result of score line segmentation from musical sheet shown in Fig. 1.

    LGH Feature: We have used an off-the-shelf Local Gradient Histogram (LGH) feature [13] in our proposed system. In this feature extraction approach, a sliding window traverses the image from left to right in order to produce a sequence of overlapping sub-images. Each window is sub-divided into 4 × 4 (4 rows and 4 columns) regular cells, and a histogram of gradient orientations is calculated from all pixels in each cell.

    For feature extraction, a Gaussian filter is first applied to the sub-image I(x, y) to obtain the smoothed image S(x, y). Next, the horizontal and vertical gradient components $G_x$ and $G_y$ of S(x, y) are determined as follows.

    $$G_x(x, y) = S(x + 1, y) - S(x - 1, y), \qquad G_y(x, y) = S(x, y + 1) - S(x, y - 1)$$

    Then, the gradient magnitude m and direction $\theta$ are obtained for each pixel with coordinates (x, y) as

    $$m(x, y) = \sqrt{G_x^2 + G_y^2} \quad \text{and} \quad \theta(x, y) = \tan^{-1}\left(\frac{G_y}{G_x}\right)$$

    The gradient field $(G_x, G_y)$ is summarized in an L-bin histogram. Each bin corresponds to a particular sector of the angular space (an octant when L = 8). The histogram is formed by adding m(x, y) to the bin indicated by the quantized $\theta(x, y)$. For example, the concatenation of the 16 histograms of 8 angular bins provides a 128-dimensional feature vector for each sliding window position.
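The feature computation for a single window can be sketched as follows. This is a simplified illustration, not the exact feature of [13]: the Gaussian smoothing step is omitted (the window is assumed to be smoothed already), the orientation is computed with `atan2` over the full circle, and each pixel votes for exactly one bin. The function name and defaults are chosen for this example only.

```python
import math


def lgh_window(window, rows=4, cols=4, bins=8):
    """Concatenate per-cell gradient-orientation histograms for one window.

    window: list of rows of grey values, assumed already smoothed.
    Returns a list of rows * cols * bins values (128 with the defaults).
    """
    h, w = len(window), len(window[0])
    feature = [0.0] * (rows * cols * bins)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = window[y][x + 1] - window[y][x - 1]   # horizontal gradient
            gy = window[y + 1][x] - window[y - 1][x]   # vertical gradient
            m = math.hypot(gx, gy)                     # gradient magnitude
            if m == 0.0:
                continue
            theta = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(theta / (2 * math.pi) * bins), bins - 1)
            # Cell index of this pixel in the rows x cols grid.
            cell = (y * rows // h) * cols + (x * cols // w)
            feature[cell * bins + b] += m
    return feature


# A 16 x 16 window containing a single vertical edge.
edge = [[0.0 if x < 8 else 1.0 for x in range(16)] for _ in range(16)]
vector = lgh_window(edge)
print(len(vector))  # 128
```

Setting `bins=16` with the same 4 × 4 grid gives the 256-dimensional variant mentioned in the experiments.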

    III. WRITER IDENTIFICATION

    A. Line level writer identification: The writer identification system is built from HMM-based recognizers of handwritten music lines. An HMM is used for writer identification of the segmented music-score lines.

    To identify the writer, we create an HMM for each writer category. For a classifier of C categories, we choose the model which best matches the observation from the C HMMs $\lambda_m = \{A_m, B_m, \pi_m\}$, where m = 1, ..., C and $\sum_{m=1}^{C} P(\lambda_m) = 1$. This means that when an unknown sequence O of unknown category is given, we calculate $P(\lambda_m \mid O)$ for each HMM and select $\lambda_{c^*}$, where

    $$c^* = \arg\max_{m} P(\lambda_m \mid O)$$

    $$P(\lambda_m \mid O) = \frac{P(O \mid \lambda_m)\, P(\lambda_m)}{P(O)}$$

    where $P(O)$ is the density function for O irrespective of the category and is computed by:

    $$P(O) = \sum_{m=1}^{C} P(O \mid \lambda_m)\, P(\lambda_m)$$

    The term $P(O \mid \lambda_m)$ is called the likelihood function for O given $\lambda_m$. $P(\lambda_m)$ is called the marginal or prior probability of $\lambda_m$. The standard solution, performed by the Viterbi algorithm, computes the probability $P(O \mid \lambda_m)$ that the sequence was generated by $\lambda_m$.
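The decision rule can be illustrated with a few lines that turn per-writer log-likelihoods, as an HMM decoder would return them, into posteriors and pick the maximum. This is only a sketch: the log-likelihood values are made up, the priors default to uniform (an assumption, the text does not state the priors used), and the function names are hypothetical. The computation is done in the log domain because likelihoods of long sequences underflow when exponentiated directly.

```python
import math


def posterior_from_loglik(loglik, priors=None):
    """Return the posterior of each model given its log-likelihood."""
    n = len(loglik)
    if priors is None:
        priors = [1.0 / n] * n          # assumption: uniform priors
    joint = [ll + math.log(p) for ll, p in zip(loglik, priors)]
    # Evidence P(O) via the log-sum-exp trick for numerical stability.
    peak = max(joint)
    log_evidence = peak + math.log(sum(math.exp(j - peak) for j in joint))
    return [math.exp(j - log_evidence) for j in joint]


def identify_writer(loglik, priors=None):
    """Index of the writer model with the highest posterior."""
    post = posterior_from_loglik(loglik, priors)
    return max(range(len(post)), key=post.__getitem__)


scores = [-1200.0, -1195.0, -1210.0]   # made-up log-likelihoods for 3 writers
print(identify_writer(scores))          # 1
```

With uniform priors the arg max over posteriors coincides with the arg max over log-likelihoods, which is the rule used at line level.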

    B. Page level writer identification: The recognizer returns a log-likelihood score from each writer-specific HMM for a given test line. Let the log-likelihood scores of a line image be S = {S1, S2, ..., SN} for N writers. The probability P = {P1, P2, ..., PN} of each writer is then calculated as $P_i = \exp(S_i)$. According to the probability scores, the writers are ranked as R = {R1, R2, ..., RN}. Fig. 4 shows the distribution of normalized probability scores (NPS) for all four lines of the music page in Fig. 1. We show the NPS to visualize the probability distribution better. The writer id of Fig. 1 is 9. We observe that the HMM gives the correct writer the best rank for lines 1, 3 and 4, but in the case of line 2 some other writers (i.e. 6, 15, 25 and 35) wrongly obtain a better rank.

    Fig. 4. NPS distribution for music score line 1 to 4. (For better view of the image, see soft copy of the pdf version)

    To avoid the confusion of line-wise detection and identify the original writer of a music page, we assign weight values W = {W1, W2, ..., WN} to the writers according to their ranks (Ri). The next step is to calculate the final scores of the writers. For m score lines corresponding to a music page, the final score $F_i$ of the i-th writer is estimated as

    $$F_i = \sum_{j=1}^{m} \left[ P_{ij} \, W_{ij} \right]$$

    where $P_{ij}$ and $W_{ij}$ are the probability score and the weight assigned to the i-th writer for the j-th line, respectively. We have tried a number of functions, listed in Table I, for weight assignment and found that the inverted distance function provides the best result. The detailed analysis of the results is given in Table II.
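The page-level fusion can be sketched as below. Two choices in this sketch are assumptions rather than details taken from the paper: the per-line probabilities are normalized to sum to one (raw exponentials of large negative log-likelihoods would underflow), and the weight is taken as the reciprocal of the rank, which is one plausible reading of an "inverted distance" weight. All function names are hypothetical.

```python
import math


def line_probabilities(scores):
    """Normalized exp(S_i) for one line, computed stably."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ranks(probs):
    """Rank 1 for the most probable writer, 2 for the next, and so on."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    result = [0] * len(probs)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def page_scores(line_scores, weight=lambda rank: 1.0 / rank):
    """Final score per writer: sum over lines of probability times weight."""
    final = [0.0] * len(line_scores[0])
    for scores in line_scores:
        probs = line_probabilities(scores)
        for i, (p, r) in enumerate(zip(probs, ranks(probs))):
            final[i] += p * weight(r)
    return final


# Made-up log-likelihoods: 3 lines of one page, 3 candidate writers.
lines = [[-10.0, -12.0, -15.0],
         [-11.0, -10.5, -14.0],   # this line alone favours writer 1
         [-9.0, -13.0, -12.0]]
final = page_scores(lines)
print(final.index(max(final)))    # 0
```

Passing `weight=lambda rank: 1.0` reproduces a uniform weighting, so different decay functions can be compared by swapping the `weight` argument.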

    Table I: Functions for weighted sum calculation

    Decay function               Description
    Uniform function             = K
    Inverted distance            =
    Inverted distance squared    = (2)
    Exponential decay            = exp (( ))
    Sinusoidal                   = Sin[ (5)]
    Linear negative slope        = (n + N)

    The writer having the maximum final score max(F1, F2, ..., FN) is considered the identified writer of that page. Figure 5 shows the normalized final scores for all writers. The inverted distance function is used to compute the page-level score.

    Fig. 5. Normalized final scores for all writers corresponding to the music page shown in Fig. 1. Blue color indicates scores obtained using weighted function and red denotes scores with uniform function. (For better view of the image, see soft copy of the pdf version)

    IV. EXPERIMENTAL RESULTS

    Dataset: We have performed our experiments on musical sheets from the CVC-MUSCIMA dataset [6, 8]. This dataset consists of 1,000 music pages written by 50 different writers. Every writer has 20 different music pages, which show a clear difference in writing style. Adult and expert musicians from different geographic locations were chosen for making this dataset. This ensures a mature and distinct writing style for every writer. Here we define the dataset in a constrained way [16], where the training pieces of a given fold are the same for each writer. The whole dataset is divided into two parts, one for training and the other for testing.

    As mentioned earlier, the images are subjected to music-score line segmentation. All segmented score lines are fed directly to the feature extraction process. We did not apply any other pre-processing steps such as binarization, noise removal, staff-line removal, text removal, etc. In the following, we present the performance of writer identification at line level using HMM. Next, we discuss the performance improvement in page-level identification. Finally, we compare the result of page-level identification with the GMM-based approach.

    Line Wise Identification Performance: From each score-line image, the LGH feature is extracted using a sliding window. A window of fixed width slides through the image with a 50% overlapping ratio, and at each position it computes the feature by dividing the region into n rectangular cells. After extracting features from the training images of a particular writer, the obtained sequences of feature vectors were used to train the HMM for that writer. Thus, 50 HMMs were created for the fifty different writers in the dataset. During testing, a feature sequence is extracted from each query test image and each HMM generates the log-likelihood score of its writer.
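The window placement with a 50% overlap amounts to stepping by half the window width. A minimal sketch, assuming that a trailing strip narrower than one window is simply dropped (the paper does not say how the right edge is handled) and using a hypothetical function name:

```python
def window_starts(line_width, window_width, overlap=0.5):
    """Left x-coordinates of the sliding windows over one score line."""
    step = max(1, int(round(window_width * (1.0 - overlap))))
    # Only windows that fit completely inside the line are kept.
    return list(range(0, line_width - window_width + 1, step))


print(window_starts(100, 34))  # [0, 17, 34, 51]
```

Each start position yields one feature vector, so a line of width 100 with window width 34 produces a sequence of four observations for the HMM.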

    We have tuned parameters such as the feature dimension and the window width. The feature dimension varies with the number of histogram cells and the angular resolution. Considering 16 orientations (T = 16), we get a 256-dimensional feature vector from each window position. The feature dimension was adjusted by testing with different dimensions (i.e. 128, 512). Figures 6(a) and 6(b) present the writer identification accuracy for varying feature dimension and window size. The highest line-wise identification accuracy, 59.52%, was obtained with window size 34 and feature dimension 256.

    Fig. 6. Identification accuracy for different (a) feature dimension and (b) window size. (For better view of the image, see soft copy of the pdf version)

    During training of the HMMs, parameters such as the number of states and the number of Gaussian distributions govern the proposed architecture. Fig. 7 presents a detailed analysis of the performance on the dataset in terms of the number of Gaussians and the number of states. Based on this experiment with the HMM parameters, we chose 64 Gaussians and 4 states for each model.

    Fig. 7. Writer identification performance against (a) number of Gaussians and (b) number of states. (For better view of the image, see soft copy of the pdf version)

    Page Wise Identification Performance: A music page contains multiple score lines. Some of them provide more information towards correct identification of the writer and ease the task, whereas others make it more challenging. To identify the writer at page level, we have applied the different decay functions explained in Section III-B. Table II presents the performance with the different weight functions. With the inverted distance function, we obtained the best identification rate. There is a significant improvement in page-level writer identification in Figs. 6, 7, 8 and 9, where we have chosen the same parameter set (i.e. window size, feature dimension, number of states, number of Gaussians) as for line-level performance.

    Table II: Accuracy of different weight functions for combining line-wise identification scores.

    Weight function              Accuracy (%)
    Uniform function             77.78
    Inverted distance            84.13
    Inverted distance squared    83.33
    Exponential decay            82.54
    Sinusoidal                   82.54
    Linear negative slope        80.16

    Fig. 8 shows the writer identification results with different top choices. Here, Top N denotes that the true writer is present among the N best hypotheses. Note that with the top 6 choices, the page-level identification rate reached 100%. With these 6 choices, the line-level performance was 85.76%.
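The Top-N measure can be computed as in the sketch below, where each test sample comes with one score per writer and counts as correct if the true writer is among the N highest-scoring ones. The scores and the function name are made up for illustration.

```python
def top_n_accuracy(score_lists, true_writers, n):
    """Percentage of samples whose true writer is among the n best scores."""
    hits = 0
    for scores, truth in zip(score_lists, true_writers):
        best = sorted(range(len(scores)), key=lambda i: -scores[i])[:n]
        hits += truth in best
    return 100.0 * hits / len(true_writers)


# Three made-up test samples scored against three writers.
scores = [[0.70, 0.20, 0.10],
          [0.35, 0.40, 0.25],
          [0.10, 0.30, 0.60]]
truth = [0, 0, 1]

print(round(top_n_accuracy(scores, truth, 1), 2))  # 33.33
print(round(top_n_accuracy(scores, truth, 2), 2))  # 100.0
```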

    Fig. 8. Writer identification accuracy with different top choices. (For better view of the image, see soft copy of the pdf version)

    Error Analysis: Though our methodology offers a good overall accuracy, the identification process is still susceptible to error due to the presence of staff lines, the small number of music symbols present in a music page, parts of a page that do not contain any useful information, and music pages that are confused with those of other writers. Fig. 9 shows the similar writing styles of writers 17 and 42, which lead to misrecognition of the music pages of writer 17.

    Fig. 9. Examples of images (a) from writer 17 and (b) from writer 42 where we observed high confusion.

    To identify the writers that are more prone to wrong identification (Fig. 10), we have used the following measure.

    $$\text{Error}(\%) = 100 \times \frac{E - O}{E}$$

    where E (expected accuracy) is the accuracy corresponding to successful identification of all writers (the ideal case) and O (observed accuracy) is the page-wise identification accuracy. Fig. 10 presents the error analysis of the 50 writers in the dataset. Note that writers 1, 4, 17 and 34 have the highest error percentage, 67%. The overall error percentage (Error) is 15.86%.

    Fig. 10. Error in page wise identification for different writers

    Comparison with GMM-based Approach: We have compared our HMM-based approach with a Gaussian Mixture Model (GMM) based approach. A GMM [12] is used here to create a model for each music writer. The distribution of the feature vectors extracted from a person's handwriting is modelled by a Gaussian mixture density. For a D-dimensional feature vector x, the mixture density for a specific writer is defined as

    $$p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, p_i(x) \tag{1}$$

    The density is a weighted linear combination of M uni-modal Gaussian densities $p_i(x)$, each parameterized by a D × 1 mean vector $\mu_i$ and a D × D covariance matrix $C_i$. The parameters of a writer's density model are denoted as $\lambda = \{w_i, \mu_i, C_i\}$, i = 1, ..., M, where the mixture weights $w_i$ sum up to one. We use diagonal covariance matrices [14] in this paper. During decoding, the feature vectors of X = {x1, ..., xT} are assumed to be independent. The log-likelihood of a model $\lambda$ for a sequence of feature vectors X is defined as

    $$\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$$

    where $p(x_t \mid \lambda)$ is computed according to Equation (1).
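For a diagonal-covariance mixture, the sequence log-likelihood can be evaluated as in the sketch below. It only scores a sequence against given parameters; training the mixture (for example with EM) is out of scope here. The function names and the toy parameters are assumptions made for this illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in var)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(X, weights, means, variances):
    """Sum over frames of the log mixture density, frames taken as independent."""
    total = 0.0
    for x in X:
        terms = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(terms)  # log-sum-exp over the mixture components
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


# Two toy writer models over 1-dimensional features.
writer_a = ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])
writer_b = ([0.5, 0.5], [[5.0], [6.0]], [[1.0], [1.0]])
sequence = [[0.2], [0.8]]

print(gmm_log_likelihood(sequence, *writer_a) >
      gmm_log_likelihood(sequence, *writer_b))  # True
```

The writer whose model gives the highest log-likelihood is selected, exactly as with the HMM scores, which is why the same page-level fusion can be reused.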

    In the GMM-based experiment we retained the same parameter setup. First, LGH features are extracted from the music-line images and fed to the GMMs for writer identification. Next, page-level writers are determined by the previously explained algorithm. We obtained less than 50% accuracy with the GMM-based method.

    To measure the scalability of the system, we show (Fig. 11) how identification performance depends on the number of writers. It is observed that the HMM works very well for a small number of writers. Up to 7 writers, the accuracies were 100% using HMM. The line-level accuracy is more than 80% with 7 writers. With GMM, the performance falls as the number of writers increases. This indicates that HMM is the better choice. The advantage of GMMs over HMMs is their shorter training time. GMMs are less complex, as they consist of only one state and one output distribution function. There is a significant decrease in accuracy for more than 10 writers.

    Fig. 11. Performance of scalability of HMM and GMM-based writer identification approaches with increasing the number of writers. (For better view of the image, see soft copy of the pdf version)

    Comparison with Other Systems: Comparing with the writer identification results on the CVC-MUSCIMA dataset, we find that the best result obtained in the ICDAR competition was 77%. Recently, Gordo et al. [16] reported 99.7% using bag-of-notes features extracted from segmented music symbols. All these existing works rely on staff-line removal, which is not always possible for noisy, tampered and torn music pages. Our proposed approach achieved 84.13% accuracy without removing staff lines.

    V. CONCLUSION AND FUTURE WORK

    We have presented a novel approach for writer identification without removing staff lines or doing any kind of pre-processing except music-line segmentation. The methodology is generic and can be used for other datasets. From the results, it is clear that the HMM-based approach performs better than GMM on music score images. For better identification performance, differentiation of the writers based on their log-likelihood scores is very important. In this context, the inverted distance function performs better than the others. In future we plan to investigate the effects of different kinds of noise in the experiment. The performance can also be improved by including other features along with LGH.

    REFERENCES

    [1] I. Bruder, T. Ignatova, L. Milewski, Integrating knowledge components for writer identification in a digital archive of historical music scores, in: Proceedings of the Joint ACM/IEEE Conference on Digital Libraries, 2004, p. 397.

    [2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. In Journal of the Royal Statistical Society, 39:1-38, 1977.

    [3] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot. A comparative evaluation of variance flooring techniques in HMM based speaker verification. In Proc. of the 5th Int. Conf. on Spoken Language Processing, pages 2379-2382, 1998.

    [4] A. Fornes, J. Llados, G. Sanchez, H. Bunke, Writer identification in old handwritten music scores, In Proc. of the International Workshop on Document Analysis Systems, 2008, pp. 347-353.

    [5] A. Fornes, J. Llados, G. Sanchez, H. Bunke, On the use of textural features for writer identification in old handwritten music scores, In Proc. of the International Conference on Document Analysis and Recognition, 2009, pp. 996-1000.

    [6] A. Gordo, A. Fornes, E. Valveny, J. Llados, A bag of notes approach to writer identification in old handwritten musical scores, in: Proceedings of the International Workshop on Document Analysis Systems, 2010, pp. 247-254.

    [7] A. Fornes, J. Llados, G. Sanchez, X. Otazu, H. Bunke, A combination of features for symbol-independent writer identification in old music scores, International Journal on Document Analysis and Recognition 13 (2010), pp. 243-259.

    [8] A. Fornes, A. Dutta, A. Gordo, J. Llados, The ICDAR 2011 music scores competition: staff removal and writer identification, in: Proceedings of the International Conference on Document Analysis and Recognition, 2011, pp. 1511-1515.

    [9] C. Hertel and H. Bunke. A set of novel features for writer identification. In Audio- and Video-Based Biometric Person Authentication (AVBPA), pp. 679-687, 2003.

    [10] S. Al-Maadeed, E. Mohammed and D. Al Kassis, Writer identification using edge-based directional probability distribution features for Arabic words, International Conference on Computer Systems and Applications (AICCSA), pp. 582-590, 2008.

    [11] S. Al-Maadeed, A.-A. Al-Kurbi, A. Al-Muslih, R. Al-Qahtani and H. Al Kubisi, Writer identification of Arabic handwriting documents using grapheme features, Int'l Conf. on Computer Systems and Applications (AICCSA), pp. 923-924, 2008.

    [12] A. Schlapbach and H. Bunke, Off-line Identification Using Gaussian Mixture Models, In Proc. of the International Conference on Pattern Recognition.

    [13] J. R. Serrano and F. Perronnin, Handwritten word-spotting using hidden Markov models and universal vocabularies, In Proceedings of the International Conference on Pattern Recognition.

    [14] H. Melin, J. Koolwaaij, J. Lindberg, and F. Bimbot. A comparative evaluation of variance flooring techniques in HMM based speaker verification. In Proc. of the 5th Int. Conf. on Spoken Language Processing, pages 2379-2382, 1998.

    [15] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10:19-41, 2000.

    [16] A. Gordo, A. Fornes, and E. Valveny. Writer identification in handwritten musical scores with bag of notes. Pattern Recognition 46 (2013) 1337-1346.