
  • Fraunhofer IDMT

    Jakob Abeßer 1,3, Stefan Balke 2, Klaus Frieler 3,
    Martin Pfleiderer 3, Meinard Müller 2

    Deep Learning for Jazz Walking Bass Transcription

    1 Semantic Music Technologies Group, Fraunhofer IDMT, Ilmenau, Germany
    2 International Audio Laboratories Erlangen, Germany
    3 Jazzomat Research Project, University of Music Franz Liszt, Weimar, Germany

  • Fraunhofer IDMT 2

    Motivation: What is a Walking Bass Line?

    - Example: Miles Davis: "So What" (Paul Chambers, bass)
    - Our assumptions for this work:
      - Quarter notes (mostly chord tones)
      - Representation: beat-wise pitch values

    [Figure: score excerpt of the walking bass line over Dm7 (D, F, A, C); beat-wise pitches: C A F A D F A D A D A F A F]

  • Fraunhofer IDMT 3

    Motivation: How is this useful?

    - Harmonic analysis
      - Composition (lead sheet) vs. actual performance
    - Polyphonic transcription from ensemble recordings is challenging
      - Walking bass line can provide first clues about local harmonic changes
    - Features for style & performer classification

  • Fraunhofer IDMT 4

    Problem Setting

    - Challenges
      - Bass is typically not salient
      - Overlap with drums (bass drum) and piano (lower register)
      - High variability of recording quality and playing styles
      - Example: Lester Young: "Body and Soul"
    - Goals
      - Train a DNN to extract a bass pitch saliency representation
      - Postprocessing: use manual beat annotations to obtain beat-wise bass pitches

  • Fraunhofer IDMT 5

    Outline

    - Dataset
    - Approach
      - Bass Saliency Mapping
      - Semi-Supervised Model Training
      - Beat-Informed Late Fusion
    - Evaluation
    - Results
    - Summary & Outlook

  • Fraunhofer IDMT 6

    Dataset

    - Weimar Jazz Database (WJD) [1]
      - 456 high-quality jazz solo transcriptions
      - Annotations: solo melody, beats, chords, segments (phrase, chorus, ...)
      - 41 files with bass annotations
    - Data augmentation (denoted "+"): pitch-shifting by ±1 semitone with the sox library [2] (see the sketch below)
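
A minimal sketch of this augmentation step, assuming WAV input files and the sox command-line tool on the PATH (file names here are hypothetical); sox's pitch effect takes its argument in cents, so ±1 semitone is ±100:

```python
import subprocess

def pitch_shift(in_wav: str, out_wav: str, semitones: int) -> None:
    """Pitch-shift a WAV file using sox's 'pitch' effect (argument in cents)."""
    subprocess.run(["sox", in_wav, out_wav, "pitch", str(semitones * 100)],
                   check=True)

# Triple the training material: the original plus +/-1 semitone versions.
for shift in (-1, 1):
    pitch_shift("solo.wav", f"solo_{shift:+d}st.wav", shift)
```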

  • Fraunhofer IDMT 7

    Bass-Saliency Mapping

    - Data-driven approach
      - Use a Deep Neural Network (DNN) to learn a mapping from a magnitude spectrogram to a bass saliency representation
    - Spectral analysis
      - Resampling to 22.05 kHz
      - Constant-Q magnitude spectrogram (librosa [3]; see the sketch below)
      - Pitch range: MIDI 28 (41.2 Hz) to MIDI 67 (392 Hz), i.e. 40 pitches
    - Multilabel classification
      - Input dimensions: 40 × N_ContextFrames
      - Output dimensions: 40
      - Learning target: bass pitch annotations from the WJD
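
The feature extraction can be sketched with librosa [3] as follows; the hop length and the amount of context are assumptions, chosen so that 2 × context + 1 = 5 frames, within the 3-5 context frames reported on the next slide:

```python
import numpy as np
import librosa

MIDI_MIN, N_PITCHES = 28, 40  # MIDI 28 (41.2 Hz) up to MIDI 67 (392 Hz)

def bass_cqt(path: str, hop_length: int = 512) -> np.ndarray:
    """Constant-Q magnitude spectrogram restricted to the 40-pitch bass range."""
    y, sr = librosa.load(path, sr=22050)  # resample to 22.05 kHz
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop_length,
                           fmin=librosa.midi_to_hz(MIDI_MIN),
                           n_bins=N_PITCHES, bins_per_octave=12))
    return C.T  # shape: (n_frames, 40)

def stack_frames(X: np.ndarray, context: int = 2) -> np.ndarray:
    """Stack +/-`context` neighbour frames -> (n_frames, 40 * (2*context + 1))."""
    pad = np.pad(X, ((context, context), (0, 0)), mode="edge")
    return np.hstack([pad[i:i + len(X)] for i in range(2 * context + 1)])
```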

  • Fraunhofer IDMT 8

    DNN Hyperparameters

    - Layer-wise training [4, 5]
      - Least-squares estimate for weight & bias initialization
    - 3-5 fully connected layers, MSE loss
    - Frame stacking (3-5 context frames) & feature normalization
    - Activation functions: ReLU (hidden layers), sigmoid (final layer)
    - Dropout & L2 weight regularization
    - Adadelta optimizer
    - Mini-batch size = 500
    - 500 epochs / layer
    - Learning rate = 1

    (A simplified model sketch follows below.)
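
A minimal Keras sketch of such a network, trained end-to-end for brevity rather than with the layer-wise, least-squares-initialised scheme of [4, 5]; the hidden-layer width, dropout rate, and L2 strength are assumptions not stated on the slide:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

N_CONTEXT, N_PITCHES = 5, 40   # 5 stacked context frames -> 40 saliency outputs

# End-to-end variant for brevity; the talk instead uses layer-wise training
# with least-squares weight/bias initialisation [4, 5].
model = keras.Sequential([keras.Input(shape=(N_PITCHES * N_CONTEXT,))])
for _ in range(3):                       # 3-5 fully connected layers
    model.add(layers.Dense(512,          # hidden width: hypothetical choice
                           activation="relu",
                           kernel_regularizer=regularizers.l2(1e-4)))
    model.add(layers.Dropout(0.3))       # dropout rate: hypothetical choice
model.add(layers.Dense(N_PITCHES, activation="sigmoid"))  # saliency in [0, 1]

model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1.0),
              loss="mse")                # MSE loss, as on the slide
# model.fit(X_train, T_train, batch_size=500, epochs=500)
```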

  • Fraunhofer IDMT 9

    Semi-Supervised Training

    - Goal: Select predictions on unseen data as additional training data

    [Figure: semi-supervised training workflow (F: features, T: targets)]

  • Fraunhofer IDMT 10

    Semi-Supervised Training: Sparsity-Based Selection

    - Train model on labelled dataset D1+ (3,899 notes)
    - Predict on unlabelled dataset D2+ (11,697 notes)
    - Select additional training data whose saliency sparsity [6] exceeds a threshold t (see the sketch below)
    - Re-train
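
A sketch of the selection step, assuming the sparsity of a predicted saliency frame is measured with Hoyer's sparseness [6] (1 = a single active pitch, 0 = uniformly spread); how the selected predictions are converted into training targets is left out:

```python
import numpy as np

def hoyer_sparseness(frame: np.ndarray, eps: float = 1e-9) -> float:
    """Hoyer's sparseness [6] of one saliency frame, in [0, 1]."""
    n = frame.size
    l1 = np.abs(frame).sum()
    l2 = np.sqrt((frame ** 2).sum()) + eps
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def select_frames(saliency: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask over D2+ frames whose prediction is sparse enough to keep."""
    return np.array([hoyer_sparseness(f) > threshold for f in saliency])

# Workflow: train on D1+, predict saliency on D2+, keep frames that pass the
# threshold t, then re-train on the union of D1+ and the selected frames.
```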

  • Fraunhofer IDMT 11

    Beat-Informed Late Fusion

    - Use manual beat annotations from the Weimar Jazz Database
    - Find the most salient pitch per beat (see the sketch below)

    [Figure: frame-wise saliency grouped into beats 1-4 (axes: beats, frames)]
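
A sketch of this fusion step, assuming frame-wise saliency values are summed within each inter-beat interval (the slide only states that the most salient pitch per beat is picked):

```python
import numpy as np

def beatwise_pitches(saliency: np.ndarray, frame_times: np.ndarray,
                     beat_times: np.ndarray, midi_min: int = 28) -> np.ndarray:
    """Most salient MIDI pitch per beat from a (n_frames, 40) saliency map."""
    pitches = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        in_beat = (frame_times >= start) & (frame_times < end)
        acc = saliency[in_beat].sum(axis=0)   # aggregate frames in this beat
        pitches.append(midi_min + int(np.argmax(acc)))
    return np.asarray(pitches)
```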

  • Fraunhofer IDMT 12

    Evaluation

    - Use manual beat annotations from the Weimar Jazz Database for a beat-wise comparison (see the sketch below)
    - Compare against state-of-the-art bass transcription algorithms:
      - D: Dittmar, Dressler, and Rosenbauer [8]
      - SG: Salamon, Serrà, and Gómez [9]
      - RK: Ryynänen and Klapuri [7]
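
The slides do not spell the metric out; a plausible beat-wise accuracy, assuming an exact pitch match on the annotated beat grid, would be:

```python
import numpy as np

def beatwise_accuracy(estimated: np.ndarray, reference: np.ndarray) -> float:
    """Fraction of beats whose estimated bass pitch matches the annotation."""
    return float(np.mean(np.asarray(estimated) == np.asarray(reference)))
```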

  • Fraunhofer IDMT 13

    Example

    - Chet Baker: "Let's Get Lost" (0:04–0:09)

    [Figure: bass saliency maps for the initial models (M1: without data augmentation, M1+: with data augmentation), the semi-supervised models M2^0,+ ... M2^3,+ with sparsity thresholds t0 ... t3, and the baselines (D: Dittmar et al., SG: Salamon et al., RK: Ryynänen & Klapuri)]

  • Fraunhofer IDMT 14

    Results

  • Fraunhofer IDMT 15

    Summary

    - Data-driven approach seems to enhance non-salient instruments.
    - Beneficial:
      - Data augmentation & dataset enlargement
      - Frame stacking (stable bass pitches)
      - Beat-informed late fusion
    - Semi-supervised training did not improve accuracy but made bass-saliency maps sparser
    - Model is limited to the training set's pitch range

  • Fraunhofer IDMT 16

    Acknowledgements

    - Thanks to the authors for sharing bass transcription algorithms / results.
    - Thanks to the students for transcribing the bass lines.
    - German Research Foundation:
      - DFG-PF 669/7-2: Jazzomat Research Project (2012-2017)
      - DFG MU 2686/6-1: Stefan Balke and Meinard Müller

  • Fraunhofer IDMT 17

    References

    [1] Weimar Jazz Database: http://jazzomat.hfm-weimar.de
    [2] sox: http://sox.sourceforge.net
    [3] McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., and Nieto, O., "librosa: Audio and Music Signal Analysis in Python," in Proc. of the Scientific Computing with Python Conference (SciPy), Austin, Texas, 2015.
    [4] Uhlich, S., Giron, F., and Mitsufuji, Y., "Deep neural network based instrument extraction from music," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2135–2139, Brisbane, Australia, 2015.
    [5] Balke, S., Dittmar, C., Abeßer, J., and Müller, M., "Data-Driven Solo Voice Enhancement for Jazz Music Retrieval," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.
    [6] Hoyer, P. O., "Non-negative Matrix Factorization with Sparseness Constraints," J. of Machine Learning Research, 5, pp. 1457–1469, 2004.
    [7] Ryynänen, M. P. and Klapuri, A., "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music J., 32, pp. 72–86, 2008.
    [8] Dittmar, C., Dressler, K., and Rosenbauer, K., "A Toolbox for Automatic Transcription of Polyphonic Music," in Proc. of the Audio Mostly Conf., pp. 58–65, 2007.
    [9] Salamon, J., Serrà, J., and Gómez, E., "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming," Int. J. of Multimedia Inf. Retrieval, 2, pp. 45–58, 2013.

  • Fraunhofer IDMT 18

    - Jazzomat Research Project & Weimar Jazz Database: http://jazzomat.hfm-weimar.de/
    - Python code and trained model available: https://github.com/jakobabesser/walking_bass_transcription_dnn
    - Additional online demos: https://www.audiolabs-erlangen.de/resources/MIR/2017-AES-WalkingBassTranscription

    Thank You!