MuseGAN: Multi-track Sequential Generative Adversarial ......cross-modal temporal interdependency...

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, Yi-Hsuan YangResearch Center of IT Innovation, Academia Sinica

*these authors contributed equally to this work

Outline。Goals & Challenges。Data。Proposed Model。Results & Evaluation。Future Works

2

Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

Goals & Challenges

[Source Code]https://github.com/salu133445/musegan

[Demo Page]https://salu133445.github.io/musegan/

Goals

Generate pop music。of multiple tracks

。in piano-roll format

。using GAN with CNNs4

Challenge IMultitrack Interdependency

5

vocal

piano

bass

drums

strings

music & clip by phycause

Multi-track GAN

Challenge IIMusic Texture

6

melody

chord(harmony)

Challenge IIMusic Texture

7

melody

chord(harmony)

Convolutional Neural Networks

Challenge IIITemporal Structure

8

paragraph 1 paragraph 2 paragraph 3

phrase 1 phrase 2 phrase 3 phrase 4

bar 1 bar 2 bar 3 bar 4

beat 1 beat 2 beat 3 beat 4

step 1 step 2 ··· step 24

song

phrase 2

4/4 time

Challenge IIITemporal Structure

9




phrase 2 Fixed Structure

4/4 time

Convolutional Neural Networks

Data Representation

11

pitch

time

Bar 1 Bar 2 Bar 3 Bar 4time step

Piano-roll

polyphonic multi-track

(with symbolic timing)

Data Representation

12

pitch

time

Bar 1 Bar 2 Bar 3 Bar 4

A3

t0 t1

Piano-roll



Data Representation

13

Multi-track Piano-roll

pitch

time

tracks



Data Representation

14

96 time steps

84pitches 5 tracks

4 bars

a 4×96×84×5 tensor

Drums

GuitarPiano

Strings

Bass

Data

LPD (Lakh Pianoroll Dataset)。>170,000 multi-track piano-rolls。Derived from Lakh MIDI Dataset。Mainly pop songs

Pypianoroll (Python package)。Manipulation & Visualization。Efficient I/O。Parse/Write MIDI files。On PYPI (pip installable)

15

[Dataset]https://salu133445.github.io/lakh-pianoroll-dataset

[Pypianoroll]https://salu133445.github.io/pypianoroll/

Proposed Model

D 1/0

real samples

Gz~pz G(z)

random noise fake samples

Generative Adversarial Networks

17

X~pX

D 1/0

log(1-D(X)) + log(D(G(z)))

log(1-D(G(z)))

GeneratorMake G(z) indistinguishable

from real data for DDiscriminator

Tell G(z) as fake data from X being real ones

real samples

Gz~pz G(z)

random noise fake samples

Generative Adversarial Networks

18

X~pX

4-bar phrases of 5 tracks

MuseGAN – An Overview

19

Gtemp

4 latent variables1 random noise

TemporalGenerator

BarGenerator

4 piano-roll matrices

Gbar

Bar Generator

Generator

20

zzzzz

zzzz

zzzz

GGGGG

Generator

21

z

Bar Generator

zzzzzzzzz

zzzz

GGGGGNo Coordination

Coordination

track-dependent

track-independent

zzzzz

Generator

22

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

Temporal Generator

zzzzz

Generator

23

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

TimeDependent Independent

TrackDependent Melody Groove

Independent Chords Style

MuseGAN

24

Network Architecture

25

zzzzz

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

Input 32dense 1024reshape to 2, 1 × 512 channels 2, 1, 512transconv 512 2 × 1 2, 1 4, 1, 512transconv 256 2 × 1 2, 1 8, 1, 256transconv 256 2 × 1 2, 1 16, 1, 256transconv 128 2 × 1 2, 1 32, 1, 128transconv 128 3 × 1 3, 1 96, 1, 128transconv 64 1 × 7 1, 7 96, 7, 64transconv 𝑀𝑀 1 × 12 1, 12 96, 84,𝑀𝑀Output 96, 84 × 𝑀𝑀 channels

Input 32transconv 1024 2 1 3, 1024transconv 32 3 1 4, 32Output 32

Temporal Generator

Results

Results

27

More samples available on demo pagehttps://salu133445.github.io/musegan/

Sample 1 Sample 2

BassDrumsGuitarStringsPiano

Step 0 Step 700 Step 2500 Step 7900

Drum pattern

Chords

Bass LineBass

Drums

Guitar

Strings

Piano

https://salu133445.github.io/musegan/

step2000 4000 6000 800010

4

106

108

1010

1012

0

Negative Critic Loss

Neg

ativ

e C

ritic

Los

s

Objective Metrics

28

# of Used Pitch Classes

# of

Use

d Pi

tch

Cla

sses

step

Qualified Note Rate

Qua

lifie

d N

ote

Rat

e

step

Monitor the Training

User Study

29

H: harmoniousR: rhythmicMS: musically structuredC: coherentOR: overall rating

composer jamming hybrid

Accompaniment System

30

Generation from Scratch nothing 5-track

Accompaniment System single-track 5-track

Conditional GAN

Summary。MuseGAN

。a novel GAN for multi-track sequence generation。multi-track, polyphonic music。human-AI cooperative scenario

。Lakh Pianoroll Dataset (LPD) (new dataset)

。Pypianoroll (new Python package)

31

Future Works

Future Works

Full Song GenerationChallenges。hierarchical temporal structure。variable-length sequence generation

33




phrase 2

paragraph 1 paragraph 2 paragraph 3

phrase 1 phrase 2 phrase 3 phrase 4

song

Future Works

Cross-modal GenerationChallenge。cross-modal temporal interdependency

Applications in Music。music + lyrics

。music + video

34

Q&AMuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

zzzzz

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentOutlineGoals & ChallengesGoalsChallenge I�Multitrack InterdependencyChallenge II�Music TextureChallenge II�Music TextureChallenge III�Temporal StructureChallenge III�Temporal StructureDataData RepresentationData RepresentationData RepresentationData RepresentationDataProposed ModelGenerative Adversarial NetworksGenerative Adversarial NetworksMuseGAN – An OverviewGeneratorGeneratorGeneratorGeneratorMuseGANNetwork ArchitectureResultsResultsObjective MetricsUser StudyAccompaniment SystemSummaryFuture WorksFuture WorksFuture WorksQ&A

MuseGAN: Multi-track Sequential Generative Adversarial ......cross-modal temporal interdependency...

Documents

Transcript of MuseGAN: Multi-track Sequential Generative Adversarial ......cross-modal temporal interdependency...