Post on 23-Oct-2020
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Hao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan Yang
Music and AI Lab, Research Center of IT Innovation, Academia Sinica
*These authors contributed equally to this work.
Outline
- Goals & Challenges
- Data
- Proposed Model
- Results & Evaluation
- Recent Work
- Future Works
Source Code: https://github.com/salu133445/musegan
Demo Page: https://salu133445.github.io/musegan/
Goals & Challenges
Goals
Generate pop music
- of multiple tracks
- in piano-roll format
- using a GAN with CNNs
Challenge I: Multitrack Interdependency
Example tracks: vocal, piano, bass, drums, strings (music & clip by phycause)
Approach: Multi-track GAN
Challenge II: Music Texture
Elements: melody, chord (harmony)
Approach: Convolutional Neural Networks
Challenge III: Temporal Structure
Figure: temporal hierarchy of a song in 4/4 time: song > paragraphs (1 to 3) > phrases (1 to 4) > bars (1 to 4) > beats (1 to 4) > steps (1 to 24)
Fixed structure: 4 bars per phrase, 4 beats per bar, 24 steps per beat
Approach: Convolutional Neural Networks
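The fixed structure above implies simple index arithmetic. The sketch below assumes 0-based indices and a single 4-bar phrase. The constant names and the `locate` helper are illustrative and do not come from the MuseGAN code.

```python
# Fixed temporal structure from the slide: 4 bars x 4 beats x 24 steps.
STEPS_PER_BEAT = 24
BEATS_PER_BAR = 4
BARS_PER_PHRASE = 4

STEPS_PER_BAR = STEPS_PER_BEAT * BEATS_PER_BAR        # time steps in one bar
STEPS_PER_PHRASE = STEPS_PER_BAR * BARS_PER_PHRASE    # time steps in one phrase


def locate(step_index):
    """Map a 0-based time step within a phrase to 0-based (bar, beat, step)."""
    if not 0 <= step_index < STEPS_PER_PHRASE:
        raise ValueError("step_index lies outside the 4-bar phrase")
    bar, within_bar = divmod(step_index, STEPS_PER_BAR)
    beat, step = divmod(within_bar, STEPS_PER_BEAT)
    return bar, beat, step
```

With these constants a bar spans 96 time steps, which matches the time axis of the piano-roll tensor used later in the deck.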
Data
Data Representation
Piano-roll (figure): pitch against time over Bar 1 to Bar 4; polyphonic, with symbolic timing. Each column is one time step; for example, a note A3 spans the time steps from t0 to t1.
Multi-track piano-roll (figure): pitch × time × tracks; polyphonic and multi-track, with symbolic timing.
Each sample is a 4 × 96 × 84 × 5 tensor: 4 bars, 96 time steps, 84 pitches, 5 tracks (Drums, Guitar, Piano, Strings, Bass).
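To make the tensor layout concrete, here is a minimal NumPy sketch of a 4 × 96 × 84 × 5 multi-track piano-roll. The track order, the pitch indices and the `add_note` helper are assumptions made for illustration. The slides do not state which MIDI pitch index 0 corresponds to.

```python
import numpy as np

N_BARS, N_STEPS, N_PITCHES, N_TRACKS = 4, 96, 84, 5
# Track order is an assumption; the slide only lists the five names.
TRACKS = ["Drums", "Guitar", "Piano", "Strings", "Bass"]


def empty_phrase():
    """Return a silent 4-bar, 5-track binary piano-roll."""
    return np.zeros((N_BARS, N_STEPS, N_PITCHES, N_TRACKS), dtype=bool)


def add_note(phrase, track, bar, pitch, start, end):
    """Switch on `pitch` for time steps start..end-1 in one bar of one track."""
    phrase[bar, start:end, pitch, TRACKS.index(track)] = True


phrase = empty_phrase()
# Hypothetical notes: pitch values index the 84-pitch axis, not MIDI numbers.
add_note(phrase, "Piano", bar=0, pitch=33, start=0, end=24)  # one beat long
add_note(phrase, "Bass", bar=0, pitch=9, start=0, end=96)    # one bar long
```

A note is therefore a run of consecutive active time steps at one pitch, which is why the representation can hold polyphonic music with symbolic timing.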
Data
LPD (Lakh Pianoroll Dataset)
- >170,000 multi-track piano-rolls
- Derived from Lakh MIDI Dataset
- Mainly pop songs
[Dataset] https://salu133445.github.io/lakh-pianoroll-dataset
Pypianoroll (Python package)
- Manipulation & visualization
- Efficient I/O
- Parse/write MIDI files
- On PyPI (pip installable)
[Pypianoroll] https://salu133445.github.io/pypianoroll/
Proposed Model
Generative Adversarial Networks
- Generator G: maps random noise z ~ p_z to fake samples G(z); its goal is to make G(z) indistinguishable from real data for D.
- Discriminator D: receives real samples X ~ p_X and fake samples G(z) and outputs 1/0; its goal is to tell the fake G(z) apart from the real X.
- Discriminator objective: log(1 - D(X)) + log(D(G(z)))
- Generator objective: log(1 - D(G(z)))
- In MuseGAN, the real samples X are 4-bar phrases of 5 tracks.
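The two objectives can be evaluated numerically with a small NumPy sketch. It assumes that D outputs the probability that its input is real and that each network minimizes its own expression. Neither assumption is stated on the slide, and the function names and the clipping constant are illustrative.

```python
import numpy as np

EPS = 1e-7  # keeps the logarithms finite; an implementation detail, not from the slide


def discriminator_objective(d_real, d_fake):
    """log(1 - D(X)) + log(D(G(z))), averaged over a batch."""
    d_real = np.clip(d_real, EPS, 1.0 - EPS)
    d_fake = np.clip(d_fake, EPS, 1.0 - EPS)
    return float(np.mean(np.log(1.0 - d_real) + np.log(d_fake)))


def generator_objective(d_fake):
    """log(1 - D(G(z))), averaged over a batch."""
    d_fake = np.clip(d_fake, EPS, 1.0 - EPS)
    return float(np.mean(np.log(1.0 - d_fake)))


# A discriminator that separates real from fake well scores lower than a
# confused one, and a generator that fools D scores lower than one that fails.
sharp = discriminator_objective(np.array([0.9]), np.array([0.1]))
confused = discriminator_objective(np.array([0.5]), np.array([0.5]))
fooling = generator_objective(np.array([0.9]))
failing = generator_objective(np.array([0.1]))
```

Under this reading the discriminator is rewarded for pushing D(X) toward 1 and D(G(z)) toward 0, while the generator is rewarded for pushing D(G(z)) toward 1.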
MuseGAN – An Overview
- Temporal Generator (G_temp): maps 1 random noise vector to 4 latent variables
- Bar Generator (G_bar): maps the 4 latent variables to 4 piano-roll matrices
Generator (figure series)
- Bar Generator: one generator G per track (5 in total), each fed with random vectors z
- No Coordination: track-dependent z, one per track
- Coordination: track-independent z, shared by all tracks
- Temporal Generator: turns a random vector z into the sequence of latent vectors fed to the Bar Generator
MuseGAN (figure): the Bar Generator receives four kinds of random vectors z, which differ in whether they depend on time and on track.

                    Time-dependent   Time-independent
Track-dependent     Melody           Groove
Track-independent   Chords           Style
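One way to picture the four kinds of random vectors is to assemble the input of every bar generator call. The sketch below is only an illustration under several assumptions: each vector has 32 dimensions, the four vectors are concatenated, and `temporal_generator` is a random stand-in rather than the trained network. All names are hypothetical.

```python
import numpy as np

N_TRACKS, N_BARS, Z_DIM = 5, 4, 32


def temporal_generator(z, seed=0):
    """Hypothetical stand-in: expand one vector into one vector per bar."""
    weights = np.random.default_rng(seed).standard_normal((N_BARS, Z_DIM, Z_DIM))
    return np.tanh((weights / np.sqrt(Z_DIM)) @ z)  # shape (N_BARS, Z_DIM)


def bar_generator_inputs(rng):
    """Build one input vector per (track, bar) from the four kinds of z."""
    z_style = rng.standard_normal(Z_DIM)                       # shared, fixed over time
    z_groove = rng.standard_normal((N_TRACKS, Z_DIM))          # per track, fixed over time
    z_chords = temporal_generator(rng.standard_normal(Z_DIM))  # shared, varies over time
    z_melody = np.stack([                                      # per track, varies over time
        temporal_generator(rng.standard_normal(Z_DIM), seed=i + 1)
        for i in range(N_TRACKS)
    ])
    inputs = np.empty((N_TRACKS, N_BARS, 4 * Z_DIM))
    for i in range(N_TRACKS):
        for t in range(N_BARS):
            inputs[i, t] = np.concatenate(
                [z_style, z_groove[i], z_chords[t], z_melody[i, t]]
            )
    return inputs


inputs = bar_generator_inputs(np.random.default_rng(42))
```

The slices of each input make the table concrete: the first 32 values are identical everywhere (style), the next 32 change only with the track (groove), the next 32 change only with the bar (chords), and the last 32 change with both (melody).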
Network Architectures

Bar Generator
layer       filters   kernel   stride   output shape
Input       -         -        -        32
dense       1024      -        -        -
reshape     -         -        -        2, 1, 512 (2, 1 × 512 channels)
transconv   512       2 × 1    2, 1     4, 1, 512
transconv   256       2 × 1    2, 1     8, 1, 256
transconv   256       2 × 1    2, 1     16, 1, 256
transconv   128       2 × 1    2, 1     32, 1, 128
transconv   128       3 × 1    3, 1     96, 1, 128
transconv   64        1 × 7    1, 7     96, 7, 64
transconv   M         1 × 12   1, 12    96, 84, M
Output      -         -        -        96, 84 × M channels

Temporal Generator
layer       filters   kernel   stride   output shape
Input       -         -        -        32
transconv   1024      2        1        3, 1024
transconv   32        3        1        4, 32
Output      -         -        -        32
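The Bar Generator output shapes can be checked with the usual size rule for a transposed convolution without padding, out = (in - 1) × stride + kernel. The sketch below assumes no padding and sets M to 5, the number of tracks; both are assumptions, and the function names are illustrative.

```python
def transconv_output_shape(in_shape, filters, kernel, stride):
    """Output (height, width, channels) of an unpadded transposed convolution."""
    height, width, _ = in_shape
    out_height = (height - 1) * stride[0] + kernel[0]
    out_width = (width - 1) * stride[1] + kernel[1]
    return (out_height, out_width, filters)


def bar_generator_shapes(n_tracks=5):
    """Propagate the reshaped input (2, 1, 512) through the listed layers."""
    layers = [  # (filters, kernel, stride), copied from the Bar Generator table
        (512, (2, 1), (2, 1)),
        (256, (2, 1), (2, 1)),
        (256, (2, 1), (2, 1)),
        (128, (2, 1), (2, 1)),
        (128, (3, 1), (3, 1)),
        (64, (1, 7), (1, 7)),
        (n_tracks, (1, 12), (1, 12)),  # M output channels, assumed to be 5
    ]
    shape = (2, 1, 512)
    shapes = []
    for filters, kernel, stride in layers:
        shape = transconv_output_shape(shape, filters, kernel, stride)
        shapes.append(shape)
    return shapes
```

The first five layers upsample only the time axis (2 to 96 steps) and the last two upsample only the pitch axis (1 to 84 pitches), so the final shape is one bar of 96 time steps by 84 pitches.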
Results
Figure: generated piano-rolls for Sample 1 and Sample 2, with tracks Bass, Drums, Guitar, Strings and Piano.
Figure: generated piano-rolls for the same five tracks at training steps 0, 700, 2500 and 7900.
Annotated features: drum pattern, chords, bass line.
More samples are available on the demo page: https://salu133445.github.io/musegan/
Objective Metrics (Monitor the Training)
Figure: three curves plotted against the training step:
- Negative Critic Loss (log-scale vertical axis)
- # of Used Pitch Classes
- Qualified Note Rate
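Two of these metrics can be computed directly from a binary piano-roll. The sketch below works on one bar of one track with shape (time, pitch). It assumes that a pitch index modulo 12 identifies the pitch class and that a note is "qualified" when it lasts at least 3 time steps. The slide does not state the threshold, so `min_length` is a labeled assumption, and the function names are illustrative.

```python
import numpy as np


def used_pitch_classes(bar):
    """Number of distinct pitch classes with at least one active time step."""
    active_pitches = np.flatnonzero(bar.any(axis=0))
    return len(np.unique(active_pitches % 12))


def note_lengths(bar):
    """Lengths (in time steps) of all runs of consecutive active steps."""
    lengths = []
    for pitch_column in bar.T:
        padded = np.concatenate(([0], pitch_column.astype(int), [0]))
        changes = np.diff(padded)
        onsets = np.flatnonzero(changes == 1)
        offsets = np.flatnonzero(changes == -1)
        lengths.extend(offsets - onsets)
    return np.array(lengths, dtype=int)


def qualified_note_rate(bar, min_length=3):
    """Share of notes lasting at least `min_length` steps (assumed threshold)."""
    lengths = note_lengths(bar)
    if lengths.size == 0:
        return 0.0
    return float(np.mean(lengths >= min_length))


bar = np.zeros((96, 84), dtype=bool)
bar[0:24, 0] = True    # long note
bar[0:24, 12] = True   # same pitch class, one octave higher
bar[30:32, 7] = True   # 2-step fragment, not qualified
bar[40:43, 7] = True   # 3-step note, qualified
```

A falling qualified note rate signals fragmented output, which is the failure mode discussed later under Known Issue.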
User Study
Rated aspects: H (harmonious), R (rhythmic), MS (musically structured), C (coherent), OR (overall rating)
Models compared: composer, jamming, hybrid
Accompaniment System (Conditional GAN)

Mode                      Input          Output
Generation from Scratch   nothing        5-track
Accompaniment System      single-track   5-track
Summary
- MuseGAN
  - a novel GAN for multi-track sequence generation
  - multi-track, polyphonic music
  - human-AI cooperative scenario
- Lakh Pianoroll Dataset (LPD) (new dataset)
- Pypianoroll (new Python package)
Recent Work
Known Issue
Naïve binarization methods can easily lead to overly fragmented notes.
Figure: raw generator output compared with its binarized versions (Bernoulli sampling, hard thresholding).
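The two binarization methods named above can be sketched in a few lines of NumPy. The example values, the 0.5 threshold and the `count_notes` helper are assumptions chosen to show how a single sustained note can break into fragments.

```python
import numpy as np


def hard_threshold(values, threshold=0.5):
    """Binarize by comparing each real-valued output with a fixed threshold."""
    return values > threshold


def bernoulli_sample(values, rng):
    """Binarize by treating each real-valued output as a firing probability."""
    return rng.random(values.shape) < values


def count_notes(pitch_column):
    """Count runs of consecutive active time steps at one pitch."""
    padded = np.concatenate(([0], pitch_column.astype(int)))
    return int(np.sum(np.diff(padded) == 1))


# Hypothetical generator outputs for one pitch over six time steps: the
# values hover around 0.5, as they might for a softly predicted long note.
raw = np.array([0.9, 0.6, 0.4, 0.7, 0.45, 0.8])
thresholded = hard_threshold(raw)
sampled = bernoulli_sample(raw, np.random.default_rng(0))
```

Here hard thresholding splits what looks like one note into three short ones, and Bernoulli sampling adds randomness on top, so the fragmentation can differ from run to run.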
BinaryMuseGAN
- use binary neurons at the output layer of the generator
- use the straight-through estimator to estimate the gradients for the binary neurons (which involve a nondifferentiable operation)

                Generator's outputs   Real data
MuseGAN         real-valued           binary-valued
BinaryMuseGAN   binary-valued         binary-valued

Hao-Wen Dong and Yi-Hsuan Yang, "Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation," to appear at ISMIR, 2018.
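The idea behind a straight-through estimator can be shown with a hand-written forward and backward pass. The sketch below covers a deterministic binary neuron and one common variant of the estimator, in which the threshold is treated as the identity during backpropagation and the sigmoid derivative is kept. Whether BinaryMuseGAN uses exactly this variant is not stated on the slide, and the function names are illustrative.

```python
import numpy as np


def binary_neuron_forward(pre_activation):
    """Forward pass: sigmoid followed by a hard threshold at 0.5."""
    probs = 1.0 / (1.0 + np.exp(-pre_activation))
    outputs = (probs > 0.5).astype(float)  # nondifferentiable step
    return outputs, probs


def binary_neuron_backward(grad_outputs, probs):
    """Backward pass: skip the threshold, keep the sigmoid derivative."""
    return grad_outputs * probs * (1.0 - probs)


pre_activation = np.array([-2.0, 0.0, 0.5, 3.0])
outputs, probs = binary_neuron_forward(pre_activation)
grad_pre_activation = binary_neuron_backward(np.ones_like(outputs), probs)
```

The forward pass emits strictly binary values, so the discriminator sees the same kind of data for generated and real samples, while the backward pass still delivers a nonzero gradient to the generator.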
Qualitative Comparison
Figure: piano-rolls from five settings: raw, pretrained (+BS, Bernoulli sampling), pretrained (+HT, hard thresholding), proposed (+SBNs, stochastic binary neurons), proposed (+DBNs, deterministic binary neurons).
Future Works
Full Song Generation
Challenges
- hierarchical temporal structure
- variable-length sequence generation
Figure: temporal hierarchy of a song: song > paragraphs > phrases > bars > beats > steps

Cross-modal Generation
Challenge
- cross-modal temporal interdependency
Applications in Music
- music + lyrics
- music + video
Q&A