1
Expressive music performance
Chapt. 7
2
1. Why study expressiveness
Understanding human communication strategies
Non-verbal communication: expressiveness tells "how to take" the explicit message
Disambiguate language expressions (e.g. in a movie)
To embody expressive knowledge in machines To adapt HCI to the basic forms of human behavior
Artistic applications, games and entertainment Auditory Display, Music Retrieval, Education
What is expressiveness (in music performance)
refers to the effect of auditory parameters of music performance, covering acoustic, psychoacoustic, and/or musical factors
refers to the variation of auditory parameters away from a prototypical performance, but within stylistic constraints
is not deviation relative to the musical score, but to the sound world that the score attempts to represent
is not the same as emotion: "to express" is used in the intransitive sense of the verb
no specific feeling is necessarily being expressed; rather, the music performance sounds "expressive" to differing degrees
depends on historical and cultural context
3 Fabian, Timmers, Schubert 2014
Expressiveness: content and layers
Content | Layer | Sample dimensions
Musical expressiveness | Compositional structure | 'objective' expressiveness
Musical expressiveness | Performance | expressive amount, appropriate expressiveness
Emotional expressiveness | Compositional structure | compositional features
Emotional expressiveness | Performance | expressive deviations
Sensory / motor expressiveness | Compositional structure | sensory qualities
Sensory / motor expressiveness | Performance | KE space, action metaphor
4
5
1. Expressiveness in music performance
Expressiveness refers to:
aspects of a musical performance that are under the control of the performer
the means used by the performer to convey the composer's message
the performer's contribution to enriching the musical message
[Figure: key velocity vs. note number; dynamic profiles vs. musical structure, with markers at bars 2, 4, 8, 10, 12, 16]
6
Expressiveness related to the musical piece
Strategies to highlight the musical structure and to disambiguate passages where several structural interpretations are possible
Dramatic narrative developed by the performer
Emotional content or expressive intention
Physical and motor constraints or problems (e.g. fingering)
Stylistic expectations based on:
cultural norms (e.g. jazz vs. classical music)
the actual performance situation (e.g. audience engagement)
Musical instrument (e.g. W. Carlos)
Palmer, 1997
7
Expressiveness added by the performer
Performers may play the same piece according to diverse nuances:
emotions, affects (perceived, induced)
KANSEI (i.e. sensibility, feeling, sensitivity): includes emotions and affects, as well as other sensorial and descriptive adjectives or actions
expressive intention: evidences the explicit intent of the performer in communicating expression
Artistic expression
Aggressive Gloomy Active
8
Music communication chain
[Diagram (Friberg 1995): score carrying implicit information → performer's expressive intentions → performance carrying expressive cues; accessible information indicated along the chain]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Diagram: graphical score → Optical Music Recognition → digital score (MIDI, MusicXML) → models for expressive performance → control signals → instrument or source/signal models → sound]
Musical Turing Test
A B Prelude No. 3 by Nino Rota
which audio clip was performed by a computer?
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Diagram: score event → sound event; frequency/pitch P(n), metric position MP(n), metric length ML(n)]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Figure: onset time [s] vs. metric position [beats]; nominal performance]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Figure: onset time [s] vs. metric position [beats]; nominal vs. expressive performance]
Computational models for expressive music performance
Expressive deviations are correlated with specific rhythmic, melodic and harmonic structures. The syntax of the musical "language" (subdivision into motifs, phrases, periods) plays an important role. Models of expressive deviations have been developed since the 1980s (Sundberg et al. 1983, Todd 1992, Widmer 2003, …). There are also connotative aspects (emotions, sensations, …), which depend on the performer's expressive intentions.
15
Accessible information
Symbolic information:
score: note list; performance instructions, e.g. agogic marks
features extracted from the score (the musical structure is not evident)
annotations (e.g. structure, prosody, etc.)
Physical information:
audio signal: acoustic and perceptual features extracted from audio
performance actions: e.g. MIDI
performer gestures
listener's physiological response
16
17
Accessible information
Expressive information:
basic emotions: Happiness, Anger, Sadness, Fear, Tenderness
emotion space → Valence / Arousal
sensorial space → Kinetics / Energy
expressive intentions: metaphorical description
Frequently, the focus is on expressive deviations
18
Expressive deviations (variations)
Systematic presence of deviations from the musical notation
Deviations are only the external surface
Measurable:
from MIDI performances: timing, key velocity
from the audio signal (partially)
Reference for computing deviations:
the score, for both theoretical (the score represents the music structure) and practical (it is easily available) reasons
global measurements (over a longer time span) as reference for local ones
mean performance
neutral performance
Mozart K622: neutral Mozart K622: score
19
Timing variations by 3 pianists (Träumerei, first 8 bars)
[Figure: inter-onset interval deviations ΔDR [%] vs. score position (bars 1-9) for Horowitz (1965), Schnabel and Brendel; the deadpan performance corresponds to 0% deviation]
20
Why do these deviations exist?
The score serves mainly: as a means to convey instructions to the performer, and as an aid for memorization and preservation.
The score may serve as a representation of cognitive elements of the composition rather than the physical parameters.
The performer interprets these symbols, taking into account the implicit information and his/her personal artistic feeling and aim.
The liberty of the musicians to exhibit their own personal interpretation has varied, but has rarely been completely denied.
see e.g. prosody in speech → musical prosody
21
22
Information and music performance
Expressive performance parameters
Physical information level: timing of musical events and tempo, dynamics and articulation, . . .
Symbolic information level: score, musical structure (?)
23
Example of symbolic information
Tree representing the hierarchical structure of the first 16 bars extracted from Mozart sonata K 545
rectangles represent structural elements, circles represent notes.
24
Information representation
Event information representation: the score as a sequence of events EV[·]
Pitch value FR[n]
Time parameters: onset time O[n] and duration DR[n]
IOI[n] = O[n + 1] − O[n] (inter-onset interval)
L[n] = DR[n]/IOI[n] (legato)
Intensity I[n], or key velocity KV[n] for MIDI events
Timbre-related parameters: brightness BR[n]; energy envelope: attack duration AD[n], envelope centroid EC[n]
25
Time information representation
space metaphor: score position x[n]
measured in musical units (one musical unit = whole note value) or in metrical units (beats or bars)
length: symbolic duration of an event
distance: interval between events
Both can be measured in musical or metrical units, e.g. in 3/4 meter:
the length of 1 bar = 3 beats (quarter notes) = 0.75 musical units (whole notes)
the corresponding distance between the event at the beginning of a bar and the event at the beginning of the next bar = 3 beats (quarter notes) = 0.75 musical units (whole notes)
26
Time information representation
the symbolic time reference can be absolute (as in the so-called piano roll notation) or relative: the distance from a previous event is specified
score as a sequence of events; distance between two adjacent events: note or rest value NV[n]
score position: x[n + 1] = x[n] + NV[n]
symbolic time in seconds: nominal time t_nom, or score time
nominal onset time O_nom[n], nominal duration DR_nom[n]
nominal inter-onset interval: IOI_nom[n] = O_nom[n+1] − O_nom[n]
27
Time information representation: Tempo
Tempo: measured by the metronome
v = score length / performance duration
t_nom = x/v, O_nom[n] = x[n]/v
e.g. with MM = 120 quarter notes per minute, v = 2 quarters (beats) per second, so a beat lasts 0.5 seconds.
A piece of 16 bars in 3/4 (e.g. a waltz) has a total length of 12 musical units, i.e. 48 beats.
Played exactly at MM = 100 quarter notes per minute, this piece lasts 28.8 seconds:
v = 12/28.8 = 0.417 musical units per second, or v = 48/28.8 = 1.67 beats per second (see the sketch below).
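A minimal Python sketch (not from the slides) of these computations: score positions accumulate the note values NV[n], and nominal onsets follow O_nom[n] = x[n]/v.

    def nominal_onsets(note_values, mm=100, beats_per_unit=4):
        """note_values: symbolic durations in whole-note (musical) units.
        mm: metronome marking in quarter notes (beats) per minute.
        beats_per_unit: beats per musical unit (4 quarters per whole note)."""
        v = mm / 60.0 / beats_per_unit   # tempo v in musical units per second
        onsets, x = [], 0.0              # x = score position in musical units
        for nv in note_values:
            onsets.append(x / v)         # O_nom[n] = x[n] / v
            x += nv                      # x[n+1] = x[n] + NV[n]
        return onsets, x / v             # onsets and total nominal duration

    # 16 bars of 3/4 = 48 quarter notes = 12 musical units; at MM = 100
    # the piece lasts 12 / (100/60/4) = 28.8 s, as in the example above.
    onsets, duration = nominal_onsets([0.25] * 48, mm=100)
    print(round(duration, 1))  # 28.8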
28
Time information representation
at the physical level: performance time t, measured in seconds
onset time O[n], duration DR[n]
inter-onset interval: IOI[n] = O[n + 1] − O[n]
IOI deviation: ΔR[n] = IOI[n] / IOI_nom[n] [%]
legato: L[n] = DR[n]/IOI[n]
29
Time information representation
the relation between performance time and score position: x(t) or t(x)
Tempo:
mean tempo
main tempo: the prevailing tempo
basic tempo: the central tendency
local tempo
event tempo
Tempo as a function of score position
Duration of a measure
Relative IOI
30
Example of time map
Tempo v(x)
Time shift: timing measured as deviations from a regular pulse
Time map: t(x)
31
Time shift and micro-pause
time shift: deviation from expected
32
Model representations
Time representation
Discrete values → events, e.g. articulation or timing of individual notes
Continuous values → functions, e.g. vibrato
Granularity
numerical (absolute, relative) values, e.g. time interval in ms, relative IOI
categorical, e.g. staccato vs. legato, shortening vs. lengthening
Time scale
single note, e.g. attack time, vibrato
local: a few notes, e.g. articulation of a melodic gesture
more global: e.g. phrase crescendo
33
2.1 Research on music performance: modeling
1930s-40s: Seashore collected and analysed many objective measurements from performances
1940s-70s: not much activity
1960s: beginning of computer music
1970s-…: computational approaches
Model development strategies:
a. based on intuition
b. based on measurements: statistical and mathematical models
c. analysis by synthesis, e.g. rule based
d. data driven
34
a. Models for understanding, based on intuition
The human analyst:
selects musical parameters
devises a theory or a mathematical model
assesses empirical validity, by testing on real performance data or by expert evaluation
Outputs: expressive deviations, expressiveness recognition
35
Performance expressive parameters
Main expressive parameters:
tempo and timing (global, local), dynamics, articulation
for some instruments and the voice: vibrato, tremolo, intonation
for contemporary music: timbre, virtual space position
How do parameters interact?
36
b. Modeling strategies for understanding
Analysis-by-measurement:
based on the analysis of deviations measured in recorded human performances
aims at recognizing regularities in the deviation patterns and at describing them by means of a mathematical model relating the score to the expressive values
Performing controlled experiments: by manipulating one parameter in a performance (e.g. the instruction to play at a different tempo), the measurements reveal something of the underlying mechanisms.
37
Analysis by measurements
Analysis-by-measurement:
Selection of performances.
Measurement of the physical properties of every note.
Reliability control and classification of performances.
Selection and analysis of the most relevant variables.
Statistical analysis and development of a mathematical interpretation model of the data.
38
c. Analysis by synthesis
Analysis-by-synthesis models
Development method:
the model formalizes the intuition and knowledge of expert musicians
performance generation by model simulation
evaluation by experts
refinements
The system acts as a student
KTH rule system
39
Analysis by synthesis: method
Collect data from analysis by measurements
Further steps:
start with a hypothesized principle
repeat:
realize it in terms of a synthetic performance
evaluate it by listening
if needed, modify the hypothesized principle or formulate a new rule
until the performance is satisfactory
40
Analysis-by-synthesis: pros and cons
Advantages:
the perception of the music is directly used in the evaluation
the general validity of a hypothesis can be directly tested by applying it to other music examples
the feedback loop is very short
Disadvantages:
conclusions are based on the expertise of just a few people
parameters can be chosen rather arbitrarily
The success depends on:
the formulation of the hypotheses
competent listeners
d. Data driven approach
Search for and discover dependencies in very large data sets via machine-learning and data-mining techniques; goal: 'explaining' the given data as well as possible.
No preliminary hypothesis: a bottom-up approach
Advantages:
rooted in large amounts of empirical data
the machine may discover truly novel and unexpected things
good results in extracting local musically relevant patterns
Problems:
difficult to incorporate musicological knowledge into the process
human-interpretable high-level models remain challenging
41 Widmer et al.
42
Example
Dynamics deviation learned from the training pieces applied to Chopin Waltz Op.18, Op.64 no.2
43
Example
Tempo deviation learned from the training pieces applied to Chopin Waltz Op.18, Op.64 no.2
Models
Models based on measurements: Todd's model, final retard
Analysis by synthesis: KTH rule system, CaRo system
Data driven: SaxEx, YQX
Interactive systems: VirtualPhilharmony, expressive sound movements
44
45
d1. Case-based reasoning
Case-based reasoning is appropriate for problems where:
many examples of solved problems can be obtained
a large part of the knowledge involved in the solution of problems is tacit, difficult to verbalize and generalize
new solved problems can be revised by humans and memorized (= learning by experience)
A case consists of a problem, its solution and, typically, annotations about how the solution was derived.
SaxEx system (Arcos, 1997): spectral processing; example: Autumn Leaves
Inexpressive Expressive
[Diagram, CBR cycle: New Problem → Retrieve (similar cases) → Adapt → Solution → Review → Retain]
46
Example: KTH performance rule system
Reasoning: rule system, if <condition> then <action>
Weighted addition of many rules: dev[n] = Σ_i k_i f_i(note[n])
KTH performance rule system: differentiation rules, grouping rules, synchronization rules, meta-rules
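To illustrate the weighted rule addition dev[n] = Σ_i k_i f_i(note[n]), here is a small Python sketch with two toy rules; the rule functions are invented stand-ins, not the published KTH rule definitions.

    import dataclasses

    @dataclasses.dataclass
    class Note:
        pitch: int       # MIDI pitch
        duration: float  # nominal duration in seconds

    def contrast_rule(notes, n):
        """Toy stand-in: push durations away from the mean duration."""
        mean = sum(nt.duration for nt in notes) / len(notes)
        return 0.1 * (notes[n].duration - mean)

    def final_lengthening(notes, n):
        """Toy stand-in: lengthen the last note of the excerpt."""
        return 0.05 * notes[n].duration if n == len(notes) - 1 else 0.0

    RULES = [(2.2, contrast_rule), (1.0, final_lengthening)]  # pairs (k_i, f_i)

    def deviation(notes, n, rules=RULES):
        # dev[n] = sum_i k_i * f_i(note[n] in its context)
        return sum(k * f(notes, n) for k, f in rules)

    notes = [Note(60, 0.25), Note(62, 0.5), Note(64, 0.25), Note(65, 1.0)]
    print([round(deviation(notes, n), 3) for n in range(len(notes))])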
47
Types of rules
Differentiation rules enhance the differences between scale tones and between note values.
Grouping rules show which tones belong together and which do not.
In music the "belonging-togetherness" exists at several levels simultaneously: tones constituting melodic gestures belong together, as do tones constituting phrases. The rules mark, by means of micro-pauses and lengthening of tones, the boundaries between all these tone groups.
Ensemble rules keep the order in ensembles. They achieve synchronization by lengthening and shortening the individual tones.
Meta-rules for emotional expressive performance: choice of rules and weights.
48
KTH Rules examples
Differentiation rules: Duration contrast, High sharp, Melodic charge
Grouping rules: Phrase arch, Punctuation, Final retard
Ensemble rules: Melodic synchronization, Bar synchronization
Macro-rules for emotional expressive performance
49
Differentiation rule: Duration Contrast
The contrast between long and short note values, such as half note and eighth note, is sometimes enhanced by musicians. They play the short notes shorter and the long notes longer than nominally written in the score. No contrast:
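A hedged Python sketch of a duration-contrast transformation in the spirit described above; the thresholds and scaling factor are invented for the illustration, not the published KTH rule parameters. k > 0 enhances the contrast, k < 0 inverts it (compare the k values in the sound examples that follow).

    def duration_contrast(durations, k, short=0.3, long=0.7):
        """durations: nominal note durations in seconds (illustrative thresholds)."""
        out = []
        for d in durations:
            if d <= short:
                out.append(d * (1 - 0.05 * k))  # short notes played shorter
            elif d >= long:
                out.append(d * (1 + 0.05 * k))  # long notes played longer
            else:
                out.append(d)                   # medium values left unchanged
        return out

    print(duration_contrast([0.25, 1.0, 0.25, 1.0], k=2.2))   # medium contrast
    print(duration_contrast([0.25, 1.0, 0.25, 1.0], k=-2.2))  # inverted contrast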
50
Duration contrast
Medium, k = 2.2
Exaggerated, k = 4.4
51
Duration contrast
Inverted, k = -2.2
52
Differentiation of Pitch Categories (2)
MELODIC CHARGE: This rule accounts for the "remarkableness" of the tones in relation to the underlying harmony. Sound level, duration and vibrato extent are increased in proportion to the melodic charge value
Emphasis rule
53
Example: Melodic charge
J. S. Bach, fugue theme of the Kyrie from the B minor Mass, BWV 232
No rule / k = 1.5
54
Microlevel Grouping (1)
PUNCTUATION: the melody can be divided into small musical gestures, normally consisting of a few notes. This rule tries to identify and perform these gestures. It consists of two parts:
the gesture analysis
the application of the gestures in the performance
The gesture analysis is a complex system of 14 subrules, in which concepts from LEAP ARTICULATION and ACCENTS are used. The identified gestures are performed by inserting micro-pauses at the boundaries.
55
PUNCTUATION rule
The main principles for the finder rules are:
in a melodic leap, with different weights for different contexts
after the longest of five notes
after an appoggiatura
before a note surrounded by longer notes
after a note followed by two or more shorter notes of equal duration
The eliminator rules remove marks or reduce weights in these cases:
after very short notes
in a melodic step motion
when several duration rules interact
at two adjoining marks in a tone repetition
56
Punctuation rule
J. S. Bach: fugue theme from the Fantasia and Fugue in G minor, BWV 542. In this example another rule is also applied.
No rule / Medium / Exaggerated
57
Macro-grouping rule: Phrase arch
PHRASE ARCH: tempo curves in the form of arches, with an initial accelerando and a final ritardando, are applied to the phrase structure. The sound level is coupled with the tempo variation so as to create crescendi and diminuendi.
In this example two other rules are also applied: duration contrast and punctuation.
F. Mendelssohn, Aria No. 18 from "St. Paul", Op. 36.
58
Phrase arch
Medium, k = 1.5
Exaggerated, k = 2.5
59
Phrase arch
Inverted, k = -1.5
60
Macro rules: for emotions
Macro-rules for emotional expressive performance: choice of rules and weights
Examples: Fear, Anger
61
Emotional expression: SADNESS
[Figure: IOI deviations, dB deviations and articulation for a SAD performance compared with a DEAD-PAN one]
62
e. Modeling expressive intentions: CaRo
CaRo model: expressive deviations from an input neutral performance
elementary transformations that do not destroy the relation between the deviations and the musical structure
[Diagram, CaRo architecture: expressive intentions → musical expressiveness model → MIDI performance, rendered via MIDI, or symbolic description of the audio performance → audio processing of the audio performance, represented by a sinusoidal model [ {ai},{fi},{phii} ] computed off-line]
63
Example: measured deviations
Pianist A's dynamics profiles, each measured in a single performance
The dynamics profiles show that there are many differences between the performances. In spite of these differences, some similarities can be found among the profiles.
[Figure: key velocity vs. note number; pianist A's dynamics profiles labeled na, br, da, ha, so, he, li, pa, fl]
64
Normalized measured deviations
Pianist A's dynamics profiles, normalized to zero mean and unit variance
[Figure: normalized key velocity vs. note number; profiles na, br, da, ha, so, he, li, pa, fl]
65
Average of measured deviations
Grand average of pianist A's dynamics profiles
[Figure: normalized key velocity vs. note number]
66
Computing deviations: variables k and m
The expressive profile is estimated from the neutral performance and its mean, with k scaling the deviations around the mean and m scaling the mean itself:
estimated profile ŷ[n] = k (x[n] − x̄) + m x̄, where x[n] = neutral performance and x̄ = mean of the profile
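A minimal Python sketch of this transformation under the assumption just stated (the exact published CaRo formulation may differ in detail):

    def expressive_profile(neutral, k, m):
        """neutral: key velocities (or another parameter) of the neutral performance."""
        mean = sum(neutral) / len(neutral)
        return [k * (v - mean) + m * mean for v in neutral]

    # k = m = 1 returns the neutral profile unchanged; k > 1 exaggerates the
    # deviations, k < 1 flattens them; m shifts the overall dynamic level.
    print(expressive_profile([60, 70, 50, 65], k=1.5, m=1.0))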
67
Examples
MIDI examples: Corelli Op. V (neutral, bright)
Post-processing examples (input → output): Mozart K622 (neutral, nominal) rendered as bright, dark, hard, light, soft, heavy
Audio examples: heavy, light, hard
Interaction: the control space
Interactive Interface Multimodal Control Multimedia Context
68
69
Multimodal Control
Humans communicate their own intentions in a multimodal way. For human-machine communication, we may distinguish two main classes of possible interfaces:
Graphic panel dedicated to the control: the control variables are directly displayed on the panel, and the user has to learn how to use it.
Multimodal: the user interacts 'freely' through movements and non-verbal communication → the task of the interface is to analyze and understand the human intention.
[Diagram: movement → movement analysis → expressive music synthesis → sound]
70
Control spaces
The control space is the user interface which controls, at an abstract level, the expressive content and the interaction between the user and the audio objects of the multimedia product.
Perceptual expressive space: derived by multidimensional analysis of various professionally performed pieces, ranging from Western classical to popular music.
Synthetic expressive space: allows the author to organize his own abstract space by defining expressive points and positioning them in the space.
71
Mapping strategies
From a position (x, y) in the control space to the expressive variables k and m:
Perceptual expressive space: linear transformation of the position
Synthetic expressive space: quadratic interpolation
From the parameters P of the neutral performance and of the score to the expressive parameters Pe
72
Examples
drawing a trajectory in the control space, the expressive content changes accordingly
Score Expressive
Mozart K 545
from Natural
Mozart K332
73
Expressive audio rendering
Signal processing techniques based on the sinusoidal model: time stretching, pitch shifting, amplitude envelope control, brightness control
Rendering of the expressive deviations:
Local tempo: time stretching (different for attack, sustain and release)
Legato: time stretching and pitch shifting (with ad-hoc spectral processing)
Intensity: amplitude envelope and brightness
Envelope shape: amplitude envelope (with triangular-shaped functions)
Example
Expressive post-processing: time-frequency processing for legato
74
75
Real-time expressive processing: Corelli, Violin Sonata in A major, Op. V (excerpt)
Corelli Op. V neutral
Att Vel Leg Int Inv
76
Once upon a time…
• The control space was also implemented as an applet
• It is used to interactively change the sound commentary of a multimedia product
• The author can associate different expressive intentions with the characters
CaRo 2.0
77
78
84
f. Expressiveness of virtual sound movements
The composer asked for expressive sound movements
Experiment stimuli:
speed (fast, slow)
path (continuous circular, discontinuous random)
articulation (legato, staccato)
timbre (white noise, harmonic sound)
Results (valence-arousal dimensions):
similar space locations for similar movements
strong relation between speed and the arousal axis
timbre has an influence
legato-staccato is a stronger parameter than path
De Gotzen, IEEE MM 04
85
A. Guarnieri Medea (2002)
Video opera: PalaFenice, Venezia, 2002
sound movement in space as a musical parameter
expressive matching between instrumental gestures and sound movement gestures
metaphor: the audience enters the physical movement
trombone player's gesture → gesture analysis → expressive spatialization
Movement of the trumpets' sound as a function of its dynamics
Guarnieri, Medea: the amplitude envelope is extracted and transformed into a longitudinal spatialization movement
Alvise Vidolin 86
Stereo reduction: left-right movement (left / right)
87
Medea, by Adriano Guarnieri
Teatro La Fenice, Venezia, 2003; 72-element choir, 62-element orchestra, 4 solo singers
Live electronics: reinforcement, spatialization, real-time sound processing
Another test
A B
Which is your favorite performance?
Allegro Burlesco by Kuhlau Op. 88, n. 3
Another test
A B
Which is your favorite performance?
90
A dynamic model of phrasing (Todd)
Music and motion:
musical movement has two degrees of freedom, tonal movement and rhythmic movement
this movement is similar to, and imitates, motion in physical space
the object of motion in physical space, to which musical movement alludes, is a body or a limb
91
Definitions
Definitions:
x = score position
t = performance time
a = acceleration
u = initial velocity
x(t) or t(x)
92
Linear tempo model
tempo is supposed to vary linearly in time
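With the definitions above, the linear-tempo assumption gives the standard kinematic relations (a short reconstruction; the slide itself shows no formulas):

    v(t) = u + a t, \qquad
    x(t) = u t + \tfrac{1}{2} a t^{2}, \qquad
    t(x) = \frac{-u + \sqrt{u^{2} + 2 a x}}{a}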
93
Energy, tempo and intensity
A piece can be decomposed in a hierarchical sequence of segments Every segment is characterized by an accelerando-ritardando pattern The analogy is the movement of a particle of mass m in a V-shaped potential well of length L Energy (E) = Kinetic energy + Potential energy (V)
Kinetic energy = m v2 /2
The complete x(t) mapping results shaped as piece-wise parabolas.
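A one-line reconstruction of why the segments come out parabolic (my derivation, consistent with the analogy above): in

    E = \tfrac{1}{2} m v^{2} + V(x),

with V(x) linear on each half of the V-shaped well, the force F = −dV/dx is piecewise constant; hence v(t) is piecewise linear (an accelerando followed by a ritardando) and x(t) is a piecewise parabola.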
94
Example: simulation vs. performances
Chopin Prelude Simulation (bold line) vs Performance data (dotted line)
95
Example
Theme from the six variations composed by Beethoven on the duet Nel cor più non mi sento
96
Example
Nel cor più non mi sento
Music and Structure (Todd, 1985)
97
Phrase structure
Predicted timing (IOI)
98
Multilevel decomposition (Widmer & Tobudic)
Multilevel decomposition of the dynamics curve of a performance of Mozart Sonata K.279:1:1, mm. 31-38
[Figure: reconstruction and residual]
Music and Motion (Friberg & Sundberg, 1999)
100
101
[Figure: pianists' tempo (or IOI) as a function of symbolic time]
Final retard
The final retard is a three-phase quasi-mechanical movement:
a given constant tempo
a quadratic (parabolic) function for the tempo decrease, derived from a mechanical retard with constant force
a linear final tempo curve
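A hedged sketch of the constant-force phase (my reconstruction from the mechanical analogy; the published Friberg & Sundberg model may be parameterized differently): with a constant braking force F over a retard of length L, the kinetic energy decreases linearly with position,

    \tfrac{1}{2} m v(x)^{2} = \tfrac{1}{2} m v_{0}^{2} - F x
    \quad\Rightarrow\quad
    v(x) = v_{0} \sqrt{1 - \bigl(1 - r^{2}\bigr) \tfrac{x}{L}},

where v₀ is the pre-retard tempo and r = v(L)/v₀ the final tempo ratio; the corresponding x(t) is parabolic.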
102
Mathematical models: Mazzola model
Developed by Guerino Mazzola and colleagues in Zurich Mathematical music theory and performance model
Applies analysis and performance components computer-aided analysis tools for musical structure
each aspect implemented in a Rubbette (plugin) performance is generated with the Rubettes uses “Stemma/Operator” theory for mapping
MetroRUBETTE (inner) metrical analysis result is different than Lerdahl and Jackendoff metrical analysis used linear mapping between metrical weight and tone intensity to generate a performance
Empirical Evaluation not evaluated against real performances
103
YQX: a model for expressive music performance
Data: performances of Mozart Piano Sonatas by Roland Batik; performances of Chopin piano pieces by Nikita Magaloff
Method: model score-performance dependencies as a probabilistic network
104
Probabilistic network
Bayesian model: relations between score context and target variables
Model p(Y | Q, X) as a Gaussian distribution N(µ, σ²) whose mean varies linearly with X
Estimate the parameters by linear regression (least squares); σ² is the average residual error
Trained by estimating the parameters separately for each target variable
The dependency on the discrete variables Q is modeled by computing a separate model for each possible combination of values of the discrete variables.
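A minimal Python sketch of such a conditional linear-Gaussian model (illustrative; the actual YQX implementation is not detailed in the slides):

    import numpy as np

    def fit_linear_gaussian(X, y):
        """Fit p(y|x) = N(w.x + b, sigma^2) by least squares."""
        A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
        w, *_ = np.linalg.lstsq(A, y, rcond=None)  # stacked [w; b]
        sigma2 = float(np.mean((A @ w - y) ** 2))  # average residual error
        return w, sigma2

    def fit_per_context(Q, X, y):
        """One separate linear-Gaussian model per combination of the
        discrete feature values Q (one tuple of categories per note)."""
        models = {}
        for q in set(map(tuple, Q)):
            idx = [i for i, qi in enumerate(map(tuple, Q)) if qi == q]
            models[q] = fit_linear_gaussian(X[idx], y[idx])
        return models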
105
Features/Targets
Targets (Y): timing, loudness, articulation
Discrete Features (Q): pitch interval, rhythmic context, I-R label, I-R position
Continuous Features (X): I-R closure, duration ratio
106
Performance Rendering
1. Read expressive hints from the score (trill, cresc., rit., p, f, etc.); set up a basic tempo curve and a basic loudness curve
2. Extract features from the score
3. Predict expressive deviations from the model; for every note and every target: enter the features as evidence in the network, and infer the most likely timing/loudness/articulation value
4. Combine the basic curves and the predicted deviations (for tempo and loudness respectively)
107
YQX system
YQX playing "My Nocturne" at Rencon 2008
108 https://www.youtube.com/watch?v=IS3519TUkzg
109
Tempo and articulation prediction.
Tempo and articulation are predicted by a Bayesian network modelling the dependencies between score and performance as conditional probability distributions. The score model comprises simple score descriptors (rhythmic, melodic and harmonic) and higher-level features from the Implication-Realization (I-R) model of melodic expectation by E. Narmour (I-R labels and a derivation of I-R closure).
110
Tempo prediction
Local tempo: a per-note prediction of long-term tempo changes; to take the surrounding performance context into account, the Bayesian network is unfolded in time, and an adapted version of the Viterbi algorithm for HMMs is used to predict the series of tempo changes that is optimal with respect to the complete piece
Note timing: note-to-note deviations from the local tempo, using only the immediate score context and the value predicted for the previous note
Global tempo: extracted from tempo annotations in the score (e.g. andante, a tempo)
These are combined to form the final tempo
Articulation: predicted from the immediate score context only
111
Note level rules
A data-driven rule extraction algorithm for musical expression. Two of the rules are used to further enhance the aesthetic qualities of the rendered performances (see the sketch below):
staccato rule: if two successive notes have the same pitch, and the second of the two is longer, then the first note is played staccato
delay-next rule: if two notes of the same length are followed by a longer note, then the last note is played with a slight delay
Professional musicians tend to emphasize the melody by playing the melody notes slightly ahead of time.
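A small Python sketch of the two note-level rules as predicates over a note list (the data structures are invented for the illustration):

    from dataclasses import dataclass

    @dataclass
    class Note:
        pitch: int       # MIDI pitch
        duration: float  # nominal duration, in beats

    def staccato(notes, n):
        """Staccato rule: same pitch as the next note, and the next is longer."""
        return (n + 1 < len(notes)
                and notes[n].pitch == notes[n + 1].pitch
                and notes[n + 1].duration > notes[n].duration)

    def delay_next(notes, n):
        """Delay-next rule: two equally long notes precede this longer note."""
        return (n >= 2
                and notes[n - 2].duration == notes[n - 1].duration
                and notes[n].duration > notes[n - 1].duration)

    melody = [Note(60, 1), Note(60, 2), Note(62, 1), Note(62, 1), Note(64, 2)]
    print([n for n in range(len(melody)) if staccato(melody, n)])    # [0]
    print([n for n in range(len(melody)) if delay_next(melody, n)])  # [4]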
Basis function modeling
A set of basis functions φ is combined in a function f, parametrized by a vector of weights w, to approximate the dynamics measured from a recorded performance of the score. Dynamics as a function f(φ;w) of basis functions (φ1,⋯,φK), representing dynamic annotations and metrical basis functions.
112 Gratchen et al. IEEE Multimedia 2017
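In the simplest (linear) reading of this setup, f is a weighted sum of the basis functions and w is fitted by least squares; a small Python sketch under that assumption:

    import numpy as np

    def fit_dynamics(Phi, y):
        """Phi: (n_notes, K) matrix, one column per basis function phi_k
        (e.g. a ramp for a crescendo wedge, an indicator for an 'f' mark,
        metrical accent positions). y: measured dynamics (e.g. velocities)."""
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w

    def predict_dynamics(Phi, w):
        return Phi @ w   # f(phi; w) = sum_k w_k * phi_k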
Interactive performance systems
Real-time control of expressive performance systems
VirtualPhilharmony (Baba, Hashida, Katayose): a conducting interface that enables users to perform expressive music with conducting actions, providing its player with the sensation of conducting a real orchestra.
113
[…] experience, conducting systems using audio signals are superior to those using MIDI signals. However, the audio conducting systems have to resolve a crucial problem regarding the deterioration of the sound quality caused by the phase vocoder.
"Maestro Musik" [3] and "Wii Music" [4] are video games that simulate conducting. The former uses a special baton device and the latter uses a Wii Remote, giving precedence to entertainment value rather than the actual sensation of conducting. As a result, the feeling of conducting misses an element of reality, and the user often fails to accurately conduct the performance from a musical standpoint because of the oversensitive feedback.
2.2 Heuristics of conducting an orchestra
As mentioned above, conducting systems have been built to quest for musical expression in computer-generated music or for entertainment. However, the existing conducting systems so far provide their users with little sensation of conducting a real orchestra, from the experts' point of view.
In real conducting, a conductor faces the members of an orchestra. The members do not always obey the directions of the conductor completely: each of the members has her/his own musicality and intentions, just like the conductor. The interaction between a conductor and the orchestra yields real musical performances. The biggest issue of conducting systems is that few of them consider the orchestra members' musicality or intentions when generating music performances.
iFP, mentioned above, is one of the few systems that deal with the interaction between the player and the machine's musicality. The scheduler of iFP mixes the player's intention of tempo and dynamics with that described in an expressive performance template. It schedules the timing of the notes to be issued, using the predicted tempo given by the following equation:

P_{n+1} = α A_n + β B_n + γ C_n    (1)

where P_{n+1} is the predicted tempo of the next beat, A_n is the moving average of the last four beats from the performance history, B_n is the difference between the current and the last tempo, and C_n is the tempo that the template indicates. α, β and γ are the weighting coefficients for each variable (0 ≤ α, β, γ ≤ 1). In iFP, the balance of these weight parameters characterizes the performance taste, that is, the interaction between the player and the template that stands for the virtual orchestra. But the weight parameters used in iFP are static, regardless of the music style and the tempo. The weight parameters should be adjusted to fit the music style (e.g. a march, a waltz), and the template itself should also be adjusted dynamically in accordance with the tempo.
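A minimal Python sketch of this beat-tempo predictor (the variable names follow equation (1); the weights and the surrounding scheduling logic of iFP/V.P. are illustrative, not taken from the paper):

    def predict_tempo(history, template_tempo, alpha=0.7, beta=0.5, gamma=0.3):
        """history: beat tempi conducted so far (most recent last), in BPM.
        Returns P_{n+1} = alpha*A_n + beta*B_n + gamma*C_n."""
        recent = history[-4:]
        A = sum(recent) / len(recent)                               # moving average
        B = history[-1] - history[-2] if len(history) > 1 else 0.0  # tempo change
        C = template_tempo                                          # template tempo
        return alpha * A + beta * B + gamma * C

    # An accelerating conductor pulled toward a faster template tempo:
    print(predict_tempo([100, 102, 104, 106], template_tempo=110))  # 106.1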
The goal of V.P. is to provide its player with the sensation of conducting a real orchestra. We implemented a function called the "Concertmaster Function", in which a set of heuristics about the musicality and intentions of orchestra members is accumulated for this goal. A human concertmaster conveys the instructions of a conductor to the orchestra members, and tells the orchestra members' will to the conductor. Generally, the leader of the first violin section of an orchestra takes charge as concertmaster. In V.P., the concertmaster plays the role of adjusting the intention of the player and the template. Concretely, heuristics such as which of the past beats are crucial when predicting the future beat position, or how the data of a template should be modified to fit the tempo change, are examined and applied through the analysis of virtuoso music performances.
(Figure 1. Overview of V.P.)
3. SYSTEM DESIGN OF V.P.
V.P. is a conducting interface whose basic idea is derived from iFP. Figure 1 shows an overview of V.P. In V.P., as in many other conducting systems, the tempo of a beat is detected by gesture sensors from the conducting motions that a player draws. It is a conducting interface based on beat tapping: it captures only the vertical dimension of a conducting gesture. V.P. is implemented on Max/MSP by Cycling '74. In V.P., the tempo of the next beat is predicted from the history of beats that the player has given, and an expressive performance template is used, as in iFP. We introduce heuristics of conducting an orchestra so that the template works more naturally.
3.1 Gesture sensing device
In V.P., the local minimum point of the conducting motion is captured by various sensors; this point is regarded as the beat-point. Three methods were used for the acquisition of the beat-point, to correspond to various conditions of the stage: 1) a glove interface using an accelerometer sensor, 2) a baton using an infrared sensor (Wii Remote), and 3) a capacitance sensor (MIDI Theremin).
3.1.1 Glove interface using an accelerometer sensor
The V.P. players attach an original glove equipped with an accelerometer sensor to their right hands. The LilyPad Arduino [7] is used for the accelerometer sensor and the microcontroller. The three-dimensional acceleration information, with a sampling time of 10 ms, is sent to the system through Bluetooth using a BlueSMiRF [8]. The gravity component is extracted from the three-dimensional acceleration data; this direction is labeled as the vertical component, and the local minimum values of the vertical component are considered beat-points. Figure 2-a shows pulling on the glove.
We had the option of using the Wii Remote, which is also equipped with an acceleration sensor, as in UBS Virtual Maestro [4]; however, the Wii Remote (160 g) is much heavier than an ordinary conductor's baton (10 g). We were concerned that the heaviness of the Wii Remote would worsen the player's conducting motion, because the center of gravity of the Wii Remote is far from the player's hand. In addition, although the weight of the original glove is about 100 g, this weight is not a problem, because the player uses a real baton whose center of gravity is nearer the player's hand than it is with the Wii Remote.
Gesture sensors
Glove interface using an accelerometer
Baton using an infrared sensor (Wii Remote)
Capacitance sensor (MIDI Theremin)
114
There are two advantages to using the acceleration sensor: one is not being limited by the V.P. player's placement and direction, and the other is not needing to use caution when making the conducting motion. V.P. players can construct any conducting figure they like, compared with the two methods described below, which have large restrictions on position.
In contrast, the acceleration value increases and decreases significantly as a result of various factors other than the vertical movement, such as rotation. Therefore, beat tracking using acceleration sensors will introduce more mistakes than the other sensing inputs.
3.1.2 Baton using an infrared sensor (Wii Remote)
An infrared LED (TOSHIBA TLN115A, 950 nm) is attached to the baton tip. The system traces the LED's motion in two dimensions using the CMOS sensor (infrared camera) built into a Wii Remote. The sampling time is 10 ms. The beat-point is extracted from the local minimum values of the vertical component (Figure 2-b).
Beat detection using this location information acquires data more accurately than using the accelerometer. However, there are many restricting factors when using infrared sensors, including outside light, obstacles between the LED and the camera, the range of the camera, and the irradiation angle of the LED. As a result, the detection mistakes will be greater than with the other methods.
3.1.3 Capacitance sensor (MIDI Theremin)
An electrostatic capacitance sensor (the MIDI Theremin, see Figure 2-c), one of the main interfaces since iFP, is a device that captures the distance from the sensor to the hand through the change in electrostatic capacitance, converting the distance into a MIDI pitch-bend signal, which then controls the pitch of the sound source. The sampling time is 11 ms, and the local minimum values of the pitch-bend signals are treated as beat-points. Nakra's paper [6] states that a sampling time of less than 10 ms is necessary for extracting the beat-point precisely; in the case of the MIDI Theremin, however, 11 ms is the minimum sampling time. On the other hand, because the capacitance sensor captures the distance directly, it can extract the beat-point more precisely than the other sensors.
The MIDI Theremin is the most stable device of the three sensors, because it makes fewer mistakes in the recognition and detection of beat-points. However, the available range of measurement is narrow (about 20 cm from the antenna), increasing the restrictions on user location compared with the other sensors.
3.2 Expressive performance template
V.P. utilizes an expressive performance template, as well as the score data. V.P. uses CrestMuseXML (CMX) for reading this detailed information about the score. CMX is a group of common data formats used to treat various types of information about music [9]. It deals with MusicXML, which describes score information, and DeviationInstanceXML (DIXML), which describes deviation information. The deviation information used in V.P. is: 1) attack deviation for all notes, 2) release deviation for all notes, 3) volume deviation for all notes and 4) tempo deviation for all beats. The MusicXML data are available from various commercial music production software packages, such as "Finale". The DIXML data are obtained from expressive performance data in SMF format using our original software.
(Figure 2. Gesture sensors: (a) glove with accelerometer, (b) baton with LED and Wii Remote, (c) MIDI Theremin, from the left)
(Figure 3. Real performance and V.P.)
3.3 Concertmaster Function
Figure 3 shows the role of the concertmaster in a real orchestra and in V.P. As mentioned in Section 2.2, the concertmaster of V.P. plays the role of coordinating the intention of the conductor (player) and that of the virtual orchestra, which is derived from the performance template. The scheduler of V.P. estimates the player's tempo of the next beat from the history of beat positions that the player has given, and calculates the scheduled tempo of the next beat using the estimated player's tempo and the tempo described in the template.
In a real performance, each member has a performance model of how the music should be performed. The model is constructed in his/her head based on the member's own musicality and various information about the music, including the music style, the beat time, the suitable tempo, etc. The performance is created by changing the model along with the tempo shown by the conductor. The concertmaster's role is to combine the members with the conductor smoothly.
In V.P., because there is no orchestra, the performance model does not exist either. One of the roles of the expressive performance template is to substitute for this model, because the expressive performance in the template is created based on the performance model of the orchestra members. The template should therefore not be static, but dynamic. In iFP, because the template was static, there were cases where the player failed in the performance when the player's tempo was different from the template's tempo. In V.P., the expressive performance template is dynamic: it is revised according to the music style or the predicted tempo, similarly to the performance model of real […]
Real performance and VP
115
tempo and beat prediction; control of the performance model; smooth combination of the template and the conductor
https://www.youtube.com/watch?v=t_ClcuIcn5k
VirtualPhilharmony
116
117
3. Models for music production
Models for music performance production:
classical music: focus on pitched sounds, melodic and harmonic organization; linguistic paradigm
contemporary music: focus on sound and timbre; acoustics, psychoacoustics; performance as timbre control
From structural models to sound control models: sound synthesis, live electronics
118
Performance models for sound synthesis
mapping strategies: how to map input values and gestures to acoustic parameters; one-to-one, few-to-many, scaling (see the toy sketch below)
processes, algorithmic control: low-level parameters generated by algorithms; control of the algorithms (e.g. masks)
different abstraction levels: from abstract to concrete time scales, from note to sound event
[Diagram: score → performance model → process/gesture → sound synthesis → sound; the electronic music performer operates the chain]
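A toy Python sketch of the three mapping strategies named above (all parameter names are invented for the illustration):

    def one_to_one(pressure):
        return {"amplitude": pressure}                    # one control, one parameter

    def scaled(pressure, lo=0.2, hi=0.9):
        return {"amplitude": lo + (hi - lo) * pressure}   # same mapping, range-scaled

    def few_to_many(pressure, speed):
        # two gesture features drive several synthesis parameters at once
        return {"amplitude": pressure,
                "brightness": 0.5 * pressure + 0.5 * speed,
                "vibrato_rate": 4.0 + 2.0 * speed}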
119
Performance models for live electronics
Live electronics performer: processes the instrumental sound; combined effect (e.g. deviations of deviations)
[Diagram: score → instrument performer (musical instrument) → live electronics → sound; the live electronics performer controls the processing, with audio feedback]
120
Automatic performance: problems
The idea of automatic expressive music performance, especially when applied to the performance of classical music that was not written for this purpose, is questionable. The possibility of completely modeling and rendering the artistic creativity implied in the performance is still to be demonstrated.
State-of-the-art models: rendering of some relevant aspects of a musically acceptable performance, but not sufficient for a full artistic appreciation.
121
Automatic performance
Automatic performance can be acceptable:
when a real artistic value is not necessary (even if useful)
when the alternative is a mechanical performance of the score (as in many sequencers)
in entertainment applications
when it is not necessary to preserve the exact artistic environment of the composition, as in popular music
when the music is expressly created bearing in mind the use of technology
Toward model-listener interaction (the conductor paradigm)?
122
b. Models for artistic creation
In the information society era, artists are increasingly using technology in their artworks:
sound generation and processing
algorithmic composition
interactive performance
How to control sound synthesis or processing engines (systems, algorithms, etc.)?
a typical performance topic
the need to establish and compute the relation of musical and compositional aspects with sound parameters
joint model development with the composer
Performing environment
123
c. Performance models for education
Another important application of performance models, even of classical music, is in education. The knowledge embodied in performance models can help the teacher to:
rationalize the performance strategies
better convey his/her teaching effort
124
Discussion
Need for a well-founded approach based on strong scientific knowledge:
from classical music performance studies to performance models for new music creation
from the practical knowledge of new music creators to new performance models
Models for artistic creation:
a joint effort of scientists and musicians
really new tools, not merely inspired by the problems and solutions of the past
Models for expressiveness extraction and processing
Toward models of/for the performing arts