
Post on 13-Jan-2020


MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, Yi-Hsuan Yang

Research Center of IT Innovation, Academia Sinica

Demo Page https://salu133445.github.io/musegan/

*these authors contributed equally to this work

Outline
• Goals & Challenges
• Data
• Proposed Model
• Results & Evaluation
• Future Works

Source Code https://github.com/salu133445/musegan

Goals

Generate pop music
• of multiple tracks
• in piano-roll format
• using GAN with CNNs


Challenge I: Multitrack Interdependency

[Figure: a multitrack clip with vocal, piano, bass, drums, and strings tracks; music & clip by phycause]

→ Multi-track GAN

Challenge II: Music Texture

[Figure labels: melody, chord (harmony)]

→ Convolutional Neural Networks

Challenge III: Temporal Structure

[Figure: hierarchical structure of a song in 4/4 time]
song: paragraph 1, paragraph 2, paragraph 3
paragraph: phrase 1, phrase 2, phrase 3, phrase 4
phrase 2: bar 1, bar 2, bar 3, bar 4
bar: beat 1, beat 2, beat 3, beat 4
beat: step 1, step 2, ···, step 24

Challenge III: Temporal Structure

[Figure: the levels from phrase 2 down to the time step, in 4/4 time, marked as a Fixed Structure]
phrase 2: bar 1, bar 2, bar 3, bar 4
bar: beat 1, beat 2, beat 3, beat 4
beat: step 1, step 2, ···, step 24

→ Convolutional Neural Networks
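The fixed structure on this slide implies some simple index arithmetic: with 4 beats per bar and 24 steps per beat, a bar spans 96 time steps and a 4-bar phrase spans 384. The sketch below illustrates that arithmetic only; the constant and function names are hypothetical and are not taken from the MuseGAN source code.

```python
# Resolution read off the slide: 4/4 time, 24 time steps per beat, 4 bars per phrase.
STEPS_PER_BEAT = 24
BEATS_PER_BAR = 4
BARS_PER_PHRASE = 4

STEPS_PER_BAR = STEPS_PER_BEAT * BEATS_PER_BAR        # 96
STEPS_PER_PHRASE = STEPS_PER_BAR * BARS_PER_PHRASE    # 384


def locate(step):
    """Map a 0-based time step within a phrase to 0-based (bar, beat, step-in-beat)."""
    if not 0 <= step < STEPS_PER_PHRASE:
        raise ValueError("step outside the 4-bar phrase")
    bar, within_bar = divmod(step, STEPS_PER_BAR)
    beat, within_beat = divmod(within_bar, STEPS_PER_BEAT)
    return bar, beat, within_beat


print(locate(0))     # (0, 0, 0)
print(locate(100))   # (1, 0, 4)
print(locate(383))   # (3, 3, 23)
```

Because every level has a fixed size, the position of a time step in the hierarchy is fully determined by its index, which is the property that makes a convolutional layout applicable.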

Data Representation: Piano-roll

[Figure: a piano-roll spanning Bar 1 to Bar 4; axes: pitch and time (in time steps); one note labeled A3 between t0 and t1]
• polyphonic
• multi-track
• (with symbolic timing)

Data Representation: Multi-track Piano-roll

[Figure: stacked piano-rolls; axes: pitch, time, tracks]
• polyphonic
• multi-track
• (with symbolic timing)

Data Representation

[Figure: a 4-bar multi-track piano-roll]
• 4 bars
• 96 time steps (per bar)
• 84 pitches
• 5 tracks: Drums, Guitar, Piano, Strings, Bass

→ a 4×96×84×5 tensor
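As a rough illustration of the 4×96×84×5 layout, the sketch below builds an empty phrase tensor with NumPy and writes one note into it. The boolean encoding, the track order, and the example pitch index are assumptions made for the example; only the four dimensions come from the slide.

```python
import numpy as np

# Dimensions from the slide: bars x time steps x pitches x tracks.
N_BARS, N_STEPS, N_PITCHES, N_TRACKS = 4, 96, 84, 5
TRACKS = ["Bass", "Drums", "Guitar", "Piano", "Strings"]  # assumed order

# Assumed binary encoding: True where a note is sounding.
phrase = np.zeros((N_BARS, N_STEPS, N_PITCHES, N_TRACKS), dtype=bool)

# Hypothetical note: pitch index 40 on the piano track, held for the
# first beat (24 time steps) of bar 0.
phrase[0, 0:24, 40, TRACKS.index("Piano")] = True

# Joining the bars along the time axis gives one (384, 84) piano-roll per track.
piano = phrase[..., TRACKS.index("Piano")].reshape(N_BARS * N_STEPS, N_PITCHES)

print(phrase.shape)      # (4, 96, 84, 5)
print(piano.shape)       # (384, 84)
print(int(piano.sum()))  # 24
```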

Data

LPD (Lakh Pianoroll Dataset)
• >170,000 multi-track piano-rolls
• Derived from Lakh MIDI Dataset
• Mainly pop songs

Pypianoroll (Python package)
• Manipulation & Visualization
• Efficient Save/Load
• Parse/Write MIDI files
• On PyPI (pip installable)

[Dataset] https://salu133445.github.io/musegan/dataset

[Pypianoroll] https://salu133445.github.io/pypianoroll/

Generative Adversarial Networks

[Figure: GAN diagram]
• random noise z ~ p(z) → Generator G → fake data G(z)
• real data X: 4-bar phrases of 5 tracks
• Discriminator D (a critic, WGAN-GP): real/fake
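For reference, the critic objective of WGAN-GP as defined in the original WGAN-GP formulation can be written as follows. The symbols P_r (real data distribution), P_g (distribution of generated samples G(z)), and the penalty weight λ are standard notation rather than names taken from the slides, and the slides do not state which λ was used.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The first two terms estimate the (negated) Wasserstein distance between real and generated data, and the third term penalizes critic gradients whose norm deviates from 1 at points interpolated between real and generated samples. The negated critic loss is often tracked as a rough convergence signal.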

MuseGAN – An Overview

[Figure: 1 random noise → temporal generator G_temp → 4 latent variables → bar generator G_bar → 4 piano-roll matrices]

MuseGAN: Bar Generator

[Figure: latent vectors z feeding the bar generators G]

MuseGAN: Bar Generator

[Figure: latent vectors z feeding the bar generators G]
• track-dependent z: No Coordination
• track-independent z: Coordination

MuseGAN: Bar Generator

[Figure: build-up of the same diagram, with latent vectors z, a block labeled Gz, and bar generators G]

MuseGAN: Bar Generator

[Figure: the same diagram, with the latent vectors classified as follows]

                     Time-dependent   Time-independent
Track-dependent      Melody           Groove
Track-independent    Chords           Style

MuseGAN: Bar Generator

[Figure: the same diagram, with the latent vectors labeled Chords, Style, Melody, and Groove]
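The four kinds of latent vectors on these slides can be illustrated with a small NumPy sketch that assembles the input each bar generator would receive. This is an assumption-laden illustration, not the authors' implementation: the latent size `Z_DIM`, the untrained random linear map standing in for a temporal generator, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BARS, N_TRACKS, Z_DIM = 4, 5, 32   # Z_DIM is an assumed latent size


def temporal_generator(z, weights):
    """Stand-in temporal generator: one noise vector -> one latent vector per bar."""
    # weights has shape (N_BARS, Z_DIM, Z_DIM); a real model would use a trained network.
    return np.tanh(np.einsum("d,bde->be", z, weights))


# Untrained stand-in weights: one shared temporal generator plus one per track.
w_shared = rng.normal(size=(N_BARS, Z_DIM, Z_DIM)) / np.sqrt(Z_DIM)
w_track = rng.normal(size=(N_TRACKS, N_BARS, Z_DIM, Z_DIM)) / np.sqrt(Z_DIM)

# Time-independent vectors are sampled once per phrase.
style = rng.normal(size=Z_DIM)                # track-independent
groove = rng.normal(size=(N_TRACKS, Z_DIM))   # track-dependent

# Time-dependent vectors differ per bar and come from the temporal generators.
chords = temporal_generator(rng.normal(size=Z_DIM), w_shared)  # track-independent
melody = np.stack([
    temporal_generator(rng.normal(size=Z_DIM), w_track[i])
    for i in range(N_TRACKS)
])                                            # track-dependent, (N_TRACKS, N_BARS, Z_DIM)


def bar_generator_input(bar, track):
    """Concatenate the four vectors seen by the bar generator of `track` at `bar`."""
    return np.concatenate([style, groove[track], chords[bar], melody[track, bar]])


inputs = np.array([[bar_generator_input(b, t) for t in range(N_TRACKS)]
                   for b in range(N_BARS)])
print(inputs.shape)   # (4, 5, 128)
```

In this layout the first 32 entries are identical for every bar and track (Style), the next 32 change only with the track (Groove), the next 32 change only with the bar (Chords), and the last 32 change with both (Melody).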

Results

[Figure: Sample 1, Sample 2]

More samples on the demo page: https://salu133445.github.io/musegan/

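Monitor the Training

[Figure: generated piano-rolls (tracks: Bass, Drums, Guitar, Strings, Piano) at Step 0, Step 700, Step 2500, Step 6000, and Step 7900, annotated with Drum pattern, Chords, and Bass Line]

[Figure: Negative Critic Loss against training step; x-axis: step (2000 to 8000); y-axis ticks from 10^4 to 10^12]

Objective Metrics (each plotted against training step)
• UPC: number of used pitch classes per bar
• QN: ratio of qualified notes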
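The two objective metrics named on this slide can be sketched on a binary piano-roll as follows. The slide does not say what makes a note "qualified", so the minimum length of 3 time steps is an assumed parameter, as is the convention that pitch index 0 is a C; both function names are hypothetical.

```python
import numpy as np


def used_pitch_classes(bar):
    """UPC for one bar; `bar` is a boolean (time steps, pitches) piano-roll."""
    active_pitches = np.flatnonzero(bar.any(axis=0))
    # Assumes pitch index 0 is a C, so that index % 12 gives the pitch class.
    return len(set(int(p) % 12 for p in active_pitches))


def qualified_note_ratio(roll, min_length=3):
    """QN: share of notes lasting at least `min_length` time steps (assumed threshold)."""
    # Pad with one silent step on each side so every note has an onset and an offset.
    padded = np.pad(roll.astype(int), ((1, 1), (0, 0)))
    diff = np.diff(padded, axis=0)
    lengths = []
    for pitch in range(roll.shape[1]):
        onsets = np.flatnonzero(diff[:, pitch] == 1)
        offsets = np.flatnonzero(diff[:, pitch] == -1)
        lengths.extend(offsets - onsets)
    if not lengths:
        return 0.0
    return float(np.mean(np.array(lengths) >= min_length))


bar = np.zeros((96, 84), dtype=bool)
bar[0:24, 0] = True     # one-beat note, pitch class 0
bar[24:48, 12] = True   # one-beat note, same pitch class an octave higher
bar[48:50, 7] = True    # 2-step fragment, pitch class 7

print(used_pitch_classes(bar))    # 2
print(qualified_note_ratio(bar))  # 2 of the 3 notes qualify
```

Note that a binary piano-roll cannot distinguish one long note from repeated notes of the same pitch with no gap between them, so this count treats any unbroken run as a single note.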

User Study

[Figure: user-study scores for the composer, jamming, and hybrid models]
• H: harmonious
• R: rhythmic
• MS: musically structured
• C: coherent
• OR: overall rating

Summary

• MuseGAN
  ◦ a novel GAN for multi-track sequence generation
  ◦ multi-track, polyphonic music
  ◦ human-AI cooperative scenario (see the paper)
• Lakh Pianoroll Dataset (LPD) (new dataset!!)
• Pypianoroll (new package!!)

Future Works

Full Song Generation

[Figure: Hierarchical Temporal Structure]
song: paragraph 1, paragraph 2, paragraph 3
paragraph: phrase 1, phrase 2, phrase 3, phrase 4
phrase 2: bar 1, bar 2, bar 3, bar 4
bar: beat 1, beat 2, beat 3, beat 4
beat: step 1, step 2, ···, step 24

Future Works

Cross-modal Generation
• Music + Video
• Music + Lyrics
• Video + Text

Q&A