MuseGAN: Multi-track Sequential Generative Adversarial ......cross-modal temporal interdependency...

35
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, Yi-Hsuan Yang Research Center of IT Innovation, Academia Sinica *these authors contributed equally to this work

Transcript of MuseGAN: Multi-track Sequential Generative Adversarial ......cross-modal temporal interdependency...

  • MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, Yi-Hsuan YangResearch Center of IT Innovation, Academia Sinica

    *these authors contributed equally to this work

  • Outline。Goals & Challenges。Data。Proposed Model。Results & Evaluation。Future Works

    2

    Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

    https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

  • Goals & Challenges

  • [Source Code]https://github.com/salu133445/musegan

    [Demo Page]https://salu133445.github.io/musegan/

    Goals

    Generate pop music。of multiple tracks

    。in piano-roll format

    。using GAN with CNNs4

  • Challenge IMultitrack Interdependency

    5

    vocal

    piano

    bass

    drums

    strings

    music & clip by phycause

    Multi-track GAN

  • Challenge IIMusic Texture

    6

    melody

    chord(harmony)

  • Challenge IIMusic Texture

    7

    melody

    chord(harmony)

    Convolutional Neural Networks

  • Challenge IIITemporal Structure

    8

    paragraph 1 paragraph 2 paragraph 3

    phrase 1 phrase 2 phrase 3 phrase 4

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    song

    phrase 2

    4/4 time

  • Challenge IIITemporal Structure

    9

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    phrase 2 Fixed Structure

    4/4 time

    Convolutional Neural Networks

  • Data

  • Data Representation

    11

    pitch

    time

    Bar 1 Bar 2 Bar 3 Bar 4time step

    Piano-roll

    polyphonic multi-track

    (with symbolic timing)

  • Data Representation

    12

    pitch

    time

    Bar 1 Bar 2 Bar 3 Bar 4

    A3

    t0 t1

    Piano-roll

    polyphonic multi-track

    (with symbolic timing)

  • Data Representation

    13

    Multi-track Piano-roll

    pitch

    time

    tracks

    polyphonic multi-track

    (with symbolic timing)

  • Data Representation

    14

    96 time steps

    84pitches 5 tracks

    4 bars

    a 4×96×84×5 tensor

    Drums

    GuitarPiano

    Strings

    Bass

  • Data

    LPD (Lakh Pianoroll Dataset)。>170,000 multi-track piano-rolls。Derived from Lakh MIDI Dataset。Mainly pop songs

    Pypianoroll (Python package)。Manipulation & Visualization。Efficient I/O。Parse/Write MIDI files。On PYPI (pip installable)

    15

    [Dataset]https://salu133445.github.io/lakh-pianoroll-dataset

    [Pypianoroll]https://salu133445.github.io/pypianoroll/

  • Proposed Model

  • D 1/0

    real samples

    Gz~pz G(z)

    random noise fake samples

    Generative Adversarial Networks

    17

    X~pX

  • D 1/0

    log(1-D(X)) + log(D(G(z)))

    log(1-D(G(z)))

    GeneratorMake G(z) indistinguishable

    from real data for DDiscriminator

    Tell G(z) as fake data from X being real ones

    real samples

    Gz~pz G(z)

    random noise fake samples

    Generative Adversarial Networks

    18

    X~pX

    4-bar phrases of 5 tracks

  • MuseGAN – An Overview

    19

    Gtemp

    4 latent variables1 random noise

    TemporalGenerator

    BarGenerator

    4 piano-roll matrices

    Gbar

  • Bar Generator

    Generator

    20

    zzzzz

    zzzz

    zzzz

    GGGGG

  • Generator

    21

    z

    Bar Generator

    zzzzzzzzz

    zzzz

    GGGGGNo Coordination

    Coordination

    track-dependent

    track-independent

  • zzzzz

    Generator

    22

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    Temporal Generator

  • zzzzz

    Generator

    23

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    TimeDependent Independent

    TrackDependent Melody Groove

    Independent Chords Style

  • MuseGAN

    24

  • Network Architecture

    25

    zzzzz

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    Input 32dense 1024reshape to 2, 1 × 512 channels 2, 1, 512transconv 512 2 × 1 2, 1 4, 1, 512transconv 256 2 × 1 2, 1 8, 1, 256transconv 256 2 × 1 2, 1 16, 1, 256transconv 128 2 × 1 2, 1 32, 1, 128transconv 128 3 × 1 3, 1 96, 1, 128transconv 64 1 × 7 1, 7 96, 7, 64transconv 𝑀𝑀 1 × 12 1, 12 96, 84,𝑀𝑀Output 96, 84 × 𝑀𝑀 channels

    Input 32transconv 1024 2 1 3, 1024transconv 32 3 1 4, 32Output 32

    Temporal Generator

  • Results

  • Results

    27

    More samples available on demo pagehttps://salu133445.github.io/musegan/

    Sample 1 Sample 2

    BassDrumsGuitarStringsPiano

    Step 0 Step 700 Step 2500 Step 7900

    Drum pattern

    Chords

    Bass LineBass

    Drums

    Guitar

    Strings

    Piano

    https://salu133445.github.io/musegan/

  • step2000 4000 6000 800010

    4

    106

    108

    1010

    1012

    0

    Negative Critic Loss

    Neg

    ativ

    e C

    ritic

    Los

    s

    Objective Metrics

    28

    # of Used Pitch Classes

    # of

    Use

    d Pi

    tch

    Cla

    sses

    step

    Qualified Note Rate

    Qua

    lifie

    d N

    ote

    Rat

    e

    step

    Monitor the Training

  • User Study

    29

    H: harmoniousR: rhythmicMS: musically structuredC: coherentOR: overall rating

    composer jamming hybrid

  • Accompaniment System

    30

    Generation from Scratch nothing 5-track

    Accompaniment System single-track 5-track

    Conditional GAN

  • Summary。MuseGAN

    。a novel GAN for multi-track sequence generation。multi-track, polyphonic music。human-AI cooperative scenario

    。Lakh Pianoroll Dataset (LPD) (new dataset)

    。Pypianoroll (new Python package)

    31

  • Future Works

  • Future Works

    Full Song GenerationChallenges。hierarchical temporal structure。variable-length sequence generation

    33

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    phrase 2

    paragraph 1 paragraph 2 paragraph 3

    phrase 1 phrase 2 phrase 3 phrase 4

    song

  • Future Works

    Cross-modal GenerationChallenge。cross-modal temporal interdependency

    Applications in Music。music + lyrics

    。music + video

    34

  • Q&AMuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

    Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

    zzzzz

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

    MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentOutlineGoals & ChallengesGoalsChallenge I�Multitrack InterdependencyChallenge II�Music TextureChallenge II�Music TextureChallenge III�Temporal StructureChallenge III�Temporal StructureDataData RepresentationData RepresentationData RepresentationData RepresentationDataProposed ModelGenerative Adversarial NetworksGenerative Adversarial NetworksMuseGAN – An OverviewGeneratorGeneratorGeneratorGeneratorMuseGANNetwork ArchitectureResultsResultsObjective MetricsUser StudyAccompaniment SystemSummaryFuture WorksFuture WorksFuture WorksQ&A