1
Expressive music performance
Chapt. 7
2
1. Why study expressiveness
Understanding human communication strategies
Non-verbal communication: expressiveness tells "how to take" the explicit message
Disambiguate language expressions (e.g. in a movie)
To embody expressive knowledge in machines To adapt HCI to the basic forms of human behavior
Artistic applications, games and entertainment Auditory Display, Music Retrieval, Education
What is expressiveness (in music performance)
refers to the effect of auditory parameters of music performance, covering acoustic, psychoacoustic, and/or musical factors
refers to the variation of auditory parameters away from a prototypical performance, but within stylistic constraints
is not deviation relative to the musical score, but to the sound world that the score attempts to represent
is not the same as emotion: "to express" is used in the intransitive sense of the verb
no specific feeling is necessarily being expressed; rather, the music performance sounds "expressive" to differing degrees
depends on historical and cultural context
3 Fabian, Timmers, Schubert 2014
Expressiveness: content and layers
Content | Layer | Sample dimensions
Musical expressiveness | Compositional structure | 'objective' expressiveness
Musical expressiveness | Performance | expressive amount, appropriate expressiveness
Emotional expressiveness | Compositional structure | compositional features
Emotional expressiveness | Performance | expressive deviations
Sensory / motor expressiveness | Compositional structure | sensory qualities
Sensory / motor expressiveness | Performance | KE space, action metaphor
4
5
1. Expressiveness in music performance
Expressiveness refers to:
aspects of a musical performance that are under the control of the performer
the means used by the performer to convey the composer's message
the performer's contribution to enriching the musical message
[Figure: key velocity vs. note number; dynamic profiles vs. musical structure, with markers at bars 2, 4, 8, 10, 12, 16]
6
Expressiveness related to the musical piece
Strategies to highlight the musical structure and to disambiguate passages where several structural interpretations are possible
Dramatic narrative developed by the performer
Emotional content or expressive intention
Physical and motor constraints or problems (e.g. fingering)
Stylistic expectations based on:
cultural norms (e.g. jazz vs. classical music)
the actual performance situation (e.g. audience engagement)
Musical instrument (e.g. W. Carlos)
Palmer, 1997
7
Expressiveness added by the performer
Performers may play the same piece according to diverse nuances:
emotions, affects (perceived, induced)
KANSEI (i.e. sensibility, feeling, sensitivity): includes emotions and affects, as well as other sensorial and descriptive adjectives or actions
expressive intention: evidences the explicit intent of the performer in communicating expression
Artistic expression
Aggressive Gloomy Active
8
Music communication chain
[Diagram (Friberg 1995): score carrying implicit information → performer's expressive intentions → performance carrying expressive cues; accessible information indicated along the chain]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Diagram: graphical score → Optical Music Recognition → digital score (MIDI, MusicXML) → models for expressive performance → control signals → instrument or source/signal models → sound]
Musical Turing Test
A B Prelude No. 3 by Nino Rota
which audio clip was performed by a computer?
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Diagram: score event → sound event; frequency/pitch P(n), metric position MP(n), metric length ML(n)]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Figure: onset time [s] vs. metric position [beats]; nominal performance]
Computational models for expressive music performance
Translating into sound the symbols in which a musical score is encoded
[Figure: onset time [s] vs. metric position [beats]; nominal vs. expressive performance]
Computational models for expressive music performance
Expressive deviations are correlated with specific rhythmic, melodic and harmonic structures. The syntax of the musical "language" (subdivision into motifs, phrases, periods) plays an important role. Models of expressive deviations have been developed since the 1980s (Sundberg et al. 1983, Todd 1992, Widmer 2003, …). There are also connotative aspects (emotions, sensations, …), which depend on the performer's expressive intentions.
15
Accessible information
Symbolic information:
score: note list; performance instructions, e.g. agogic marks
features extracted from the score (the musical structure is not evident)
annotations (e.g. structure, prosody, etc.)
Physical information:
audio signal: acoustic and perceptual features extracted from audio
performance actions: e.g. MIDI
performer gestures
listener's physiological response
16
17
Accessible information
Expressive information:
basic emotions: Happiness, Anger, Sadness, Fear, Tenderness
emotion space → Valence / Arousal
sensorial space → Kinetics / Energy
expressive intentions: metaphorical description
Frequently, the focus is on expressive deviations
18
Expressive deviations (variations)
Systematic presence of deviations from the musical notation
Deviations are only the external surface
Measurable:
from MIDI performances: timing, key velocity
from the audio signal (partially)
Reference for computing deviations:
the score, for both theoretical (the score represents the music structure) and practical (it is easily available) reasons
global measurements (over a longer time span) as reference for local ones
mean performance
neutral performance
Mozart K622: neutral Mozart K622: score
19
Timing variations by 3 pianists (Träumerei, first 8 bars)
[Figure: inter-onset interval deviations ΔDR [%] vs. score position (bars 1-9) for Horowitz (1965), Schnabel and Brendel; the deadpan performance corresponds to 0% deviation]
20
Why do these deviations exist?
The score serves mainly: as a means to convey instructions to the performer, and as an aid for memorization and preservation.
The score may serve as a representation of cognitive elements of the composition rather than the physical parameters.
The performer interprets these symbols, taking into account the implicit information and his/her personal artistic feeling and aim.
The liberty of the musicians to exhibit their own personal interpretation has varied, but has rarely been completely denied.
see e.g. prosody in speech → musical prosody
21
22
Information and music performance
Expressive performance parameters
Physical information level: timing of musical events and tempo, dynamics and articulation, . . .
Symbolic information level: score, musical structure (?)
23
Example of symbolic information
Tree representing the hierarchical structure of the first 16 bars extracted from Mozart sonata K 545
rectangles represent structural elements, circles represent notes.
24
Information representation
Event information representation: the score as a sequence of events EV[·]
Pitch value FR[n]
Time parameters: onset time O[n] and duration DR[n]
IOI[n] = O[n + 1] − O[n] (inter-onset interval)
L[n] = DR[n]/IOI[n] (legato)
Intensity I[n], or key velocity KV[n] for MIDI events
Timbre-related parameters: brightness BR[n]; energy envelope: attack duration AD[n], envelope centroid EC[n]
25
Time information representation
space metaphor: score position x[n]
measured in musical units (one musical unit = whole note value) or in metrical units (beats or bars)
length: symbolic duration of an event
distance: interval between events
Both can be measured in musical or metrical units, e.g. in 3/4 meter:
the length of 1 bar = 3 beats (quarter notes) = 0.75 musical units (whole notes)
the corresponding distance between the event at the beginning of a bar and the event at the beginning of the next bar = 3 beats (quarter notes) = 0.75 musical units (whole notes)
26
Time information representation
the symbolic time reference can be absolute (as in the so-called piano roll notation) or relative: the distance from a previous event is specified
score as a sequence of events; distance between two adjacent events: note or rest value NV[n]
score position: x[n + 1] = x[n] + NV[n]
symbolic time in seconds: nominal time t_nom, or score time
nominal onset time O_nom[n], nominal duration DR_nom[n]
nominal inter-onset interval: IOI_nom[n] = O_nom[n+1] − O_nom[n]
27
Time information representation: Tempo
Tempo: measured by the metronome
v = score length / performance duration
t_nom = x/v, O_nom[n] = x[n]/v
e.g. with MM = 120 quarter notes per minute, v = 2 quarters (beats) per second, so a beat lasts 0.5 seconds.
A piece of 16 bars in 3/4 (e.g. a waltz) has a total length of 12 musical units, i.e. 48 beats.
Played exactly at MM = 100 quarter notes per minute, this piece lasts 28.8 seconds:
v = 12/28.8 = 0.417 musical units per second, or v = 48/28.8 = 1.67 beats per second (see the sketch below).
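A minimal Python sketch (not from the slides) of these computations: score positions accumulate the note values NV[n], and nominal onsets follow O_nom[n] = x[n]/v.

    def nominal_onsets(note_values, mm=100, beats_per_unit=4):
        """note_values: symbolic durations in whole-note (musical) units.
        mm: metronome marking in quarter notes (beats) per minute.
        beats_per_unit: beats per musical unit (4 quarters per whole note)."""
        v = mm / 60.0 / beats_per_unit   # tempo v in musical units per second
        onsets, x = [], 0.0              # x = score position in musical units
        for nv in note_values:
            onsets.append(x / v)         # O_nom[n] = x[n] / v
            x += nv                      # x[n+1] = x[n] + NV[n]
        return onsets, x / v             # onsets and total nominal duration

    # 16 bars of 3/4 = 48 quarter notes = 12 musical units; at MM = 100
    # the piece lasts 12 / (100/60/4) = 28.8 s, as in the example above.
    onsets, duration = nominal_onsets([0.25] * 48, mm=100)
    print(round(duration, 1))  # 28.8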
28
Time information representation
at the physical level: performance time t, measured in seconds
onset time O[n], duration DR[n]
inter-onset interval: IOI[n] = O[n + 1] − O[n]
IOI deviation: ΔR[n] = IOI[n] / IOI_nom[n] [%]
legato: L[n] = DR[n]/IOI[n]
29
Time information representation
the relation between performance time and score position: x(t) or t(x)
Tempo:
mean tempo
main tempo: the prevailing tempo
basic tempo: the central tendency
local tempo
event tempo
Tempo as a function of score position
Duration of a measure
Relative IOI
30
Example of time map
Tempo v(x)
Time shift: timing measured as deviations from a regular pulse
Time map: t(x)
31
Time shift and micro-pause
time shift: deviation from expected
32
Model representations
Time representation
Discrete values → events, e.g. articulation or timing of individual notes
Continuous values → functions, e.g. vibrato
Granularity
numerical (absolute, relative) values, e.g. time interval in ms, relative IOI
categorical, e.g. staccato vs. legato, shortening vs. lengthening
Time scale
single note, e.g. attack time, vibrato
local: a few notes, e.g. articulation of a melodic gesture
more global: e.g. phrase crescendo
33
2.1 Research on music performance: modeling
1930s-40s: Seashore collected and analysed many objective measurements from performances
1940s-70s: not much activity
1960s: beginning of computer music
1970s-…: computational approaches
Model development strategies:
a. based on intuition
b. based on measurements: statistical and mathematical models
c. analysis by synthesis, e.g. rule based
d. data driven
34
a. Models for understanding, based on intuition
The human analyst:
selects musical parameters
devises a theory or a mathematical model
assesses empirical validity, by testing on real performance data or by expert evaluation
Outputs: expressive deviations, expressiveness recognition
35
Performance expressive parameters
Main expressive parameters:
tempo and timing (global, local), dynamics, articulation
for some instruments and the voice: vibrato, tremolo, intonation
for contemporary music: timbre, virtual space position
How do parameters interact?
36
b. Modeling strategies for understanding
Analysis-by-measurement:
based on the analysis of deviations measured in recorded human performances
aims at recognizing regularities in the deviation patterns and at describing them by means of a mathematical model relating the score to the expressive values
Performing controlled experiments: by manipulating one parameter in a performance (e.g. the instruction to play at a different tempo), the measurements reveal something of the underlying mechanisms.
37
Analysis by measurements
Analysis-by-measurement:
Selection of performances.
Measurement of the physical properties of every note.
Reliability control and classification of performances.
Selection and analysis of the most relevant variables.
Statistical analysis and development of a mathematical interpretation model of the data.
38
c. Analysis by synthesis
Analysis-by-synthesis models
Development method:
the model formalizes the intuition and knowledge of expert musicians
performance generation by model simulation
evaluation by experts
refinements
The system acts as a student
KTH rule system
39
Analysis by synthesis: method
Collect data from analysis by measurements
Further steps:
start with a hypothesized principle
repeat:
realize it in terms of a synthetic performance
evaluate it by listening
if needed, modify the hypothesized principle or formulate a new rule
until the performance is satisfactory
40
Analysis-by-synthesis: pros and cons
Advantages:
the perception of the music is directly used in the evaluation
the general validity of a hypothesis can be directly tested by applying it to other music examples
the feedback loop is very short
Disadvantages:
conclusions are based on the expertise of just a few people
parameters can be chosen rather arbitrarily
The success depends on:
the formulation of the hypotheses
competent listeners
d. Data driven approach
Search for and discover dependencies in very large data sets via machine-learning and data-mining techniques; goal: 'explaining' the given data as well as possible.
No preliminary hypothesis: a bottom-up approach
Advantages:
rooted in large amounts of empirical data
the machine may discover truly novel and unexpected things
good results in extracting local musically relevant patterns
Problems:
difficult to incorporate musicological knowledge into the process
human-interpretable high-level models remain challenging
41 Widmer et al.
42
Example
Dynamics deviation learned from the training pieces applied to Chopin Waltz Op.18, Op.64 no.2
43
Example
Tempo deviation learned from the training pieces applied to Chopin Waltz Op.18, Op.64 no.2
Models
Models based on measurements: Todd's model, final retard
Analysis by synthesis: KTH rule system, CaRo system
Data driven: SaxEx, YQX
Interactive systems: VirtualPhilharmony, expressive sound movements
44
45
d1. Case-based reasoning
Case-based reasoning is appropriate for problems where:
many examples of solved problems can be obtained
a large part of the knowledge involved in the solution of problems is tacit, difficult to verbalize and generalize
new solved problems can be revised by humans and memorized (= learning by experience)
A case consists of a problem, its solution and, typically, annotations about how the solution was derived.
SaxEx system (Arcos, 1997): spectral processing; example: Autumn Leaves
Inexpressive Expressive
[Diagram, CBR cycle: New Problem → Retrieve (similar cases) → Adapt → Solution → Review → Retain]
46
Example: KTH performance rule system
Reasoning: rule system, if <condition> then <action>
Weighted addition of many rules: dev[n] = Σ_i k_i f_i(note[n])
KTH performance rule system: differentiation rules, grouping rules, synchronization rules, meta-rules
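To illustrate the weighted rule addition dev[n] = Σ_i k_i f_i(note[n]), here is a small Python sketch with two toy rules; the rule functions are invented stand-ins, not the published KTH rule definitions.

    import dataclasses

    @dataclasses.dataclass
    class Note:
        pitch: int       # MIDI pitch
        duration: float  # nominal duration in seconds

    def contrast_rule(notes, n):
        """Toy stand-in: push durations away from the mean duration."""
        mean = sum(nt.duration for nt in notes) / len(notes)
        return 0.1 * (notes[n].duration - mean)

    def final_lengthening(notes, n):
        """Toy stand-in: lengthen the last note of the excerpt."""
        return 0.05 * notes[n].duration if n == len(notes) - 1 else 0.0

    RULES = [(2.2, contrast_rule), (1.0, final_lengthening)]  # pairs (k_i, f_i)

    def deviation(notes, n, rules=RULES):
        # dev[n] = sum_i k_i * f_i(note[n] in its context)
        return sum(k * f(notes, n) for k, f in rules)

    notes = [Note(60, 0.25), Note(62, 0.5), Note(64, 0.25), Note(65, 1.0)]
    print([round(deviation(notes, n), 3) for n in range(len(notes))])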
47
Types of rules
Differentiation rules enhance the differences between scale tones and between note values.
Grouping rules show which tones belong together and which do not.
In music the "belonging-togetherness" exists at several levels simultaneously: tones constituting melodic gestures belong together, as do tones constituting phrases. The rules mark, by means of micro-pauses and lengthening of tones, the boundaries between all these tone groups.
Ensemble rules keep the order in ensembles. They achieve synchronization by lengthening and shortening the individual tones.
Meta-rules for emotional expressive performance: choice of rules and weights.
48
KTH Rules examples
Differentiation rules: Duration contrast, High sharp, Melodic charge
Grouping rules: Phrase arch, Punctuation, Final retard
Ensemble rules: Melodic synchronization, Bar synchronization
Macro-rules for emotional expressive performance
49
Differentiation rule: Duration Contrast
The contrast between long and short note values, such as half note and eighth note, is sometimes enhanced by musicians. They play the short notes shorter and the long notes longer than nominally written in the score. No contrast:
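A hedged Python sketch of a duration-contrast transformation in the spirit described above; the thresholds and scaling factor are invented for the illustration, not the published KTH rule parameters. k > 0 enhances the contrast, k < 0 inverts it (compare the k values in the sound examples that follow).

    def duration_contrast(durations, k, short=0.3, long=0.7):
        """durations: nominal note durations in seconds (illustrative thresholds)."""
        out = []
        for d in durations:
            if d <= short:
                out.append(d * (1 - 0.05 * k))  # short notes played shorter
            elif d >= long:
                out.append(d * (1 + 0.05 * k))  # long notes played longer
            else:
                out.append(d)                   # medium values left unchanged
        return out

    print(duration_contrast([0.25, 1.0, 0.25, 1.0], k=2.2))   # medium contrast
    print(duration_contrast([0.25, 1.0, 0.25, 1.0], k=-2.2))  # inverted contrast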
50
Duration contrast
Medium, k = 2.2
Exaggerated, k = 4.4
51
Duration contrast
Inverted, k = -2.2
52
Differentiation of Pitch Categories (2)
MELODIC CHARGE: This rule accounts for the "remarkableness" of the tones in relation to the underlying harmony. Sound level, duration and vibrato extent are increased in proportion to the melodic charge value
Emphasis rule
53
Example: Melodic charge
J. S. Bach, fugue theme of the Kyrie from the B minor Mass, BWV 232
No rule / k = 1.5
54
Microlevel Grouping (1)
PUNCTUATION: the melody can be divided into small musical gestures, normally consisting of a few notes. This rule tries to identify and perform these gestures. It consists of two parts:
the gesture analysis
the application of the gestures in the performance
The gesture analysis is a complex system of 14 subrules, in which concepts from LEAP ARTICULATION and ACCENTS are used. The identified gestures are performed by inserting micro-pauses at the boundaries.
55
PUNCTUATION rule
The main principles for the finder rules are:
in a melodic leap, with different weights for different contexts
after the longest of five notes
after an appoggiatura
before a note surrounded by longer notes
after a note followed by two or more shorter notes of equal duration
The eliminator rules remove marks or reduce weights in these cases:
after very short notes
in a melodic step motion
when several duration rules interact
at two adjoining marks in a tone repetition
56
Punctuation rule
J. S. Bach: fugue theme from the Fantasia and Fugue in G minor, BWV 542. In this example another rule is also applied.
No rule / Medium / Exaggerated
57
Macro-grouping rule: Phrase arch
PHRASE ARCH: tempo curves in the form of arches, with an initial accelerando and a final ritardando, are applied to the phrase structure. The sound level is coupled with the tempo variation so as to create crescendi and diminuendi.
In this example two other rules are also applied: duration contrast and punctuation.
F. Mendelssohn, Aria No. 18 from "St. Paul", Op. 36.
58
Phrase arch
Medium, k = 1.5
Exaggerated, k = 2.5
59
Phrase arch
Inverted, k = -1.5
60
Macro rules: for emotions
Macro-rules for emotional expressive performance: choice of rules and weights
Examples: Fear, Anger
61
Emotional expression: SADNESS
[Figure: IOI deviations, dB deviations and articulation for a SAD performance compared with a DEAD-PAN one]
62
e. Modeling expressive intentions: CaRo
CaRo model: expressive deviations from an input neutral performance
elementary transformations that do not destroy the relation between the deviations and the musical structure
[Diagram, CaRo architecture: expressive intentions → musical expressiveness model → MIDI performance, rendered via MIDI, or symbolic description of the audio performance → audio processing of the audio performance, represented by a sinusoidal model [ {ai},{fi},{phii} ] computed off-line]
63
Example: measured deviations
Pianist A's dynamics profiles, each measured in a single performance
The dynamics profiles show that there are many differences between the performances. In spite of these differences, some similarities can be found among the profiles.
[Figure: key velocity vs. note number; pianist A's dynamics profiles labeled na, br, da, ha, so, he, li, pa, fl]
64
Normalized measured deviations
Pianist A's dynamics profiles, normalized to zero mean and unit variance
[Figure: normalized key velocity vs. note number; profiles na, br, da, ha, so, he, li, pa, fl]
65
Average of measured deviations
Grand average of pianist A's dynamics profiles
[Figure: normalized key velocity vs. note number]
66
Computing deviations: variables k and m
The expressive profile is estimated from the neutral performance and its mean, with k scaling the deviations around the mean and m scaling the mean itself:
estimated profile ŷ[n] = k (x[n] − x̄) + m x̄, where x[n] = neutral performance and x̄ = mean of the profile
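A minimal Python sketch of this transformation under the assumption just stated (the exact published CaRo formulation may differ in detail):

    def expressive_profile(neutral, k, m):
        """neutral: key velocities (or another parameter) of the neutral performance."""
        mean = sum(neutral) / len(neutral)
        return [k * (v - mean) + m * mean for v in neutral]

    # k = m = 1 returns the neutral profile unchanged; k > 1 exaggerates the
    # deviations, k < 1 flattens them; m shifts the overall dynamic level.
    print(expressive_profile([60, 70, 50, 65], k=1.5, m=1.0))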
67
Examples
MIDI examples: Corelli Op. V (neutral, bright)
Post-processing examples (input → output): Mozart K622 (neutral, nominal) rendered as bright, dark, hard, light, soft, heavy
Audio examples: heavy, light, hard
Interaction: the control space
Interactive Interface Multimodal Control Multimedia Context
68
69
Multimodal Control
Humans communicate their own intentions in a multimodal way. For human-machine communication, we may distinguish two main classes of possible interfaces:
Graphic panel dedicated to the control: the control variables are directly displayed on the panel, and the user has to learn how to use it.
Multimodal: the user interacts 'freely' through movements and non-verbal communication → the task of the interface is to analyze and understand the human intention.
[Diagram: movement → movement analysis → expressive music synthesis → sound]
70
Control spaces
The control space is the user interface which controls, at an abstract level, the expressive content and the interaction between the user and the audio objects of the multimedia product.
Perceptual expressive space: derived by multidimensional analysis of various professionally performed pieces, ranging from Western classical to popular music.
Synthetic expressive space: allows the author to organize his own abstract space by defining expressive points and positioning them in the space.
71
Mapping strategies
From a position (x, y) in the control space to the expressive variables k and m:
Perceptual expressive space: linear transformation of the position
Synthetic expressive space: quadratic interpolation
From the parameters P of the neutral performance and of the score to the expressive parameters Pe
72
Examples
drawing a trajectory in the control space, the expressive content changes accordingly
Score Expressive
Mozart K 545
from Natural
Mozart K332
73
Expressive audio rendering
Signal processing techniques based on the sinusoidal model: time stretching, pitch shifting, amplitude envelope control, brightness control
Rendering of the expressive deviations:
Local tempo: time stretching (different for attack, sustain and release)
Legato: time stretching and pitch shifting (with ad-hoc spectral processing)
Intensity: amplitude envelope and brightness
Envelope shape: amplitude envelope (with triangular-shaped functions)
Example
Expressive post-processing: time-frequency processing for legato
74
75
Real-time expressive processing: Corelli, Violin Sonata in A major, Op. V (excerpt)
Corelli Op. V neutral
Att Vel Leg Int Inv
76
Once upon a time…
• The control space was also implemented as an applet
• It is used to interactively change the sound commentary of a multimedia product
• The author can associate different expressive intentions with the characters
CaRo 2.0
77
78
84
f. Expressiveness of virtual sound movements
The composer asked for expressive sound movements
Experiment stimuli:
speed (fast, slow)
path (continuous circular, discontinuous random)
articulation (legato, staccato)
timbre (white noise, harmonic sound)
Results (valence-arousal dimensions):
similar space locations for similar movements
strong relation between speed and the arousal axis
timbre has an influence
legato-staccato is a stronger parameter than path
De Gotzen, IEEE MM 04
85
A. Guarnieri Medea (2002)
Video opera: PalaFenice, Venezia, 2002
sound movement in space as a musical parameter
expressive matching between instrumental gestures and sound movement gestures
metaphor: the audience enters the physical movement
trombone player's gesture → gesture analysis → expressive spatialization
Movement of the trumpets' sound as a function of its dynamics
Guarnieri, Medea: the amplitude envelope is extracted and transformed into a longitudinal spatialization movement
Alvise Vidolin 86
Stereo reduction: left-right movement (left / right)
87
Medea, by Adriano Guarnieri
Teatro La Fenice, Venezia, 2003; 72-element choir, 62-element orchestra, 4 solo singers
Live electronics: reinforcement, spatialization, real-time sound processing
Another test
A B
Which is your favorite performance?
Allegro Burlesco by Kuhlau Op. 88, n. 3
Another test
A B
Which is your favorite performance?
90
A dynamic model of phrasing (Todd)
Music and motion:
musical movement has two degrees of freedom, tonal movement and rhythmic movement
this movement is similar to, and imitates, motion in physical space
the object of motion in physical space, to which musical movement alludes, is a body or a limb
91
Definitions
Definitions:
x = score position
t = performance time
a = acceleration
u = initial velocity
x(t) or t(x)
92
Linear tempo model
tempo is supposed to vary linearly in time
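With the definitions above, the linear-tempo assumption gives the standard kinematic relations (a short reconstruction; the slide itself shows no formulas):

    v(t) = u + a t, \qquad
    x(t) = u t + \tfrac{1}{2} a t^{2}, \qquad
    t(x) = \frac{-u + \sqrt{u^{2} + 2 a x}}{a}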
93
Energy, tempo and intensity
A piece can be decomposed in a hierarchical sequence of segments Every segment is characterized by an accelerando-ritardando pattern The analogy is the movement of a particle of mass m in a V-shaped potential well of length L Energy (E) = Kinetic energy + Potential energy (V)
Kinetic energy = m v2 /2
The complete x(t) mapping results shaped as piece-wise parabolas.
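A one-line reconstruction of why the segments come out parabolic (my derivation, consistent with the analogy above): in

    E = \tfrac{1}{2} m v^{2} + V(x),

with V(x) linear on each half of the V-shaped well, the force F = −dV/dx is piecewise constant; hence v(t) is piecewise linear (an accelerando followed by a ritardando) and x(t) is a piecewise parabola.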
94
Example: simulation vs. performances
Chopin Prelude Simulation (bold line) vs Performance data (dotted line)
95
Example
Theme from the six variations composed by Beethoven on the duet Nel cor più non mi sento
96
Example
Nel cor più non mi sento
Music and Structure (Todd, 1985)
97
Phrase structure
Predicted timing (IOI)
98
Multilevel decomposition (Widmer & Tobudic)
Multilevel decomposition of the dynamics curve of a performance of Mozart Sonata K.279:1:1, mm. 31-38
[Figure: reconstruction and residual]
Music and Motion (Friberg & Sundberg, 1999)
100
101
[Figure: pianists' tempo (or IOI) as a function of symbolic time]
Final retard
The final retard is a three-phase quasi-mechanical movement:
a given constant tempo
a quadratic (parabolic) function for the tempo decrease, derived from a mechanical retard with constant force
a linear final tempo curve
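A hedged sketch of the constant-force phase (my reconstruction from the mechanical analogy; the published Friberg & Sundberg model may be parameterized differently): with a constant braking force F over a retard of length L, the kinetic energy decreases linearly with position,

    \tfrac{1}{2} m v(x)^{2} = \tfrac{1}{2} m v_{0}^{2} - F x
    \quad\Rightarrow\quad
    v(x) = v_{0} \sqrt{1 - \bigl(1 - r^{2}\bigr) \tfrac{x}{L}},

where v₀ is the pre-retard tempo and r = v(L)/v₀ the final tempo ratio; the corresponding x(t) is parabolic.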
102
Mathematical models: Mazzola model
Developed by Guerino Mazzola and colleagues in Zurich Mathematical music theory and performance model
Applies analysis and performance components computer-aided analysis tools for musical structure
each aspect implemented in a Rubbette (plugin) performance is generated with the Rubettes uses “Stemma/Operator” theory for mapping
MetroRUBETTE (inner) metrical analysis result is different than Lerdahl and Jackendoff metrical analysis used linear mapping between metrical weight and tone intensity to generate a performance
Empirical Evaluation not evaluated against real performances
103
YQX: a model for expressive music performance
Data: performances of Mozart Piano Sonatas by Roland Batik; performances of Chopin piano pieces by Nikita Magaloff
Method: model score-performance dependencies as a probabilistic network
104
Probabilistic network
Bayesian model: relations between score context and target variables
Model p(Y | Q, X) as a Gaussian distribution N(µ, σ²) whose mean varies linearly with X
Estimate the parameters by linear regression (least squares); σ² is the average residual error
Trained by estimating the parameters separately for each target variable
The dependency on the discrete variables Q is modeled by computing a separate model for each possible combination of values of the discrete variables.
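A minimal Python sketch of such a conditional linear-Gaussian model (illustrative; the actual YQX implementation is not detailed in the slides):

    import numpy as np

    def fit_linear_gaussian(X, y):
        """Fit p(y|x) = N(w.x + b, sigma^2) by least squares."""
        A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
        w, *_ = np.linalg.lstsq(A, y, rcond=None)  # stacked [w; b]
        sigma2 = float(np.mean((A @ w - y) ** 2))  # average residual error
        return w, sigma2

    def fit_per_context(Q, X, y):
        """One separate linear-Gaussian model per combination of the
        discrete feature values Q (one tuple of categories per note)."""
        models = {}
        for q in set(map(tuple, Q)):
            idx = [i for i, qi in enumerate(map(tuple, Q)) if qi == q]
            models[q] = fit_linear_gaussian(X[idx], y[idx])
        return models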
105
Features/Targets
Targets (Y): timing, loudness, articulation
Discrete Features (Q): pitch interval, rhythmic context, I-R label, I-R position
Continuous Features (X): I-R closure, duration ratio
106
Performance Rendering
1. Read expressive hints from the score (trill, cresc., rit., p, f, etc.); set up a basic tempo curve and a basic loudness curve
2. Extract features from the score
3. Predict expressive deviations from the model; for every note and every target: enter the features as evidence in the network, and infer the most likely timing/loudness/articulation value
4. Combine the basic curves and the predicted deviations (for tempo and loudness respectively)
107
YQX system
YQX playing "My Nocturne" at Rencon 2008
108 https://www.youtube.com/watch?v=IS3519TUkzg
109
Tempo and articulation prediction.
Tempo and articulation are predicted by a Bayesian network modelling the dependencies between score and performance as conditional probability distributions. The score model comprises simple score descriptors (rhythmic, melodic and harmonic) and higher-level features from the Implication-Realization (I-R) model of melodic expectation by E. Narmour (I-R labels and a derivation of I-R closure).
110
Tempo prediction
Local tempo: a per-note prediction of long-term tempo changes; to take the surrounding performance context into account, the Bayesian network is unfolded in time, and an adapted version of the Viterbi algorithm for HMMs is used to predict the series of tempo changes that is optimal with respect to the complete piece
Note timing: note-to-note deviations from the local tempo, using only the immediate score context and the value predicted for the previous note
Global tempo: extracted from tempo annotations in the score (e.g. andante, a tempo)
These are combined to form the final tempo
Articulation: predicted from the immediate score context only
111
Note level rules
A data-driven rule extraction algorithm for musical expression. Two of the rules are used to further enhance the aesthetic qualities of the rendered performances (see the sketch below):
staccato rule: if two successive notes have the same pitch, and the second of the two is longer, then the first note is played staccato
delay-next rule: if two notes of the same length are followed by a longer note, then the last note is played with a slight delay
Professional musicians tend to emphasize the melody by playing the melody notes slightly ahead of time.
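A small Python sketch of the two note-level rules as predicates over a note list (the data structures are invented for the illustration):

    from dataclasses import dataclass

    @dataclass
    class Note:
        pitch: int       # MIDI pitch
        duration: float  # nominal duration, in beats

    def staccato(notes, n):
        """Staccato rule: same pitch as the next note, and the next is longer."""
        return (n + 1 < len(notes)
                and notes[n].pitch == notes[n + 1].pitch
                and notes[n + 1].duration > notes[n].duration)

    def delay_next(notes, n):
        """Delay-next rule: two equally long notes precede this longer note."""
        return (n >= 2
                and notes[n - 2].duration == notes[n - 1].duration
                and notes[n].duration > notes[n - 1].duration)

    melody = [Note(60, 1), Note(60, 2), Note(62, 1), Note(62, 1), Note(64, 2)]
    print([n for n in range(len(melody)) if staccato(melody, n)])    # [0]
    print([n for n in range(len(melody)) if delay_next(melody, n)])  # [4]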
Basis function modeling
A set of basis functions φ is combined in a function f, parametrized by a vector of weights w, to approximate the dynamics measured from a recorded performance of the score. Dynamics as a function f(φ;w) of basis functions (φ1,⋯,φK), representing dynamic annotations and metrical basis functions.
112 Gratchen et al. IEEE Multimedia 2017
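In the simplest (linear) reading of this setup, f is a weighted sum of the basis functions and w is fitted by least squares; a small Python sketch under that assumption:

    import numpy as np

    def fit_dynamics(Phi, y):
        """Phi: (n_notes, K) matrix, one column per basis function phi_k
        (e.g. a ramp for a crescendo wedge, an indicator for an 'f' mark,
        metrical accent positions). y: measured dynamics (e.g. velocities)."""
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w

    def predict_dynamics(Phi, w):
        return Phi @ w   # f(phi; w) = sum_k w_k * phi_k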
Interactive performance systems
Real-time control of expressive performance systems
VirtualPhilharmony (Baba, Hashida, Katayose): a conducting interface that enables users to perform expressive music with conducting actions, providing its player with the sensation of conducting a real orchestra.
113
[…] experience, conducting systems using audio signals are superior to those using MIDI signals. However, the audio conducting systems have to resolve a crucial problem regarding the deterioration of the sound quality caused by the phase vocoder.
"Maestro Musik" [3] and "Wii Music" [4] are video games that simulate conducting. The former uses a special baton device and the latter uses a Wii Remote, giving precedence to entertainment value rather than the actual sensation of conducting. As a result, the feeling of conducting misses an element of reality, and the user often fails to accurately conduct the performance from a musical standpoint because of the oversensitive feedback.
2.2 Heuristics of conducting an orchestra
As mentioned above, conducting systems have been built to quest for musical expression in computer-generated music or for entertainment. However, the existing conducting systems so far provide their users with little sensation of conducting a real orchestra, from the experts' point of view.
In real conducting, a conductor faces the members of an orchestra. The members do not always obey the directions of the conductor completely: each of the members has her/his own musicality and intentions, just like the conductor. The interaction between a conductor and the orchestra yields real musical performances. The biggest issue of conducting systems is that few of them consider the orchestra members' musicality or intentions when generating music performances.
iFP, mentioned above, is one of the few systems that deal with the interaction between the player and the machine's musicality. The scheduler of iFP mixes the player's intention of tempo and dynamics with that described in an expressive performance template. It schedules the timing of the notes to be issued, using the predicted tempo given by the following equation:

P_{n+1} = α A_n + β B_n + γ C_n    (1)

where P_{n+1} is the predicted tempo of the next beat, A_n is the moving average of the last four beats from the performance history, B_n is the difference between the current and the last tempo, and C_n is the tempo that the template indicates. α, β and γ are the weighting coefficients for each variable (0 ≤ α, β, γ ≤ 1). In iFP, the balance of these weight parameters characterizes the performance taste, that is, the interaction between the player and the template that stands for the virtual orchestra. But the weight parameters used in iFP are static, regardless of the music style and the tempo. The weight parameters should be adjusted to fit the music style (e.g. a march, a waltz), and the template itself should also be adjusted dynamically in accordance with the tempo.
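A minimal Python sketch of this beat-tempo predictor (the variable names follow equation (1); the weights and the surrounding scheduling logic of iFP/V.P. are illustrative, not taken from the paper):

    def predict_tempo(history, template_tempo, alpha=0.7, beta=0.5, gamma=0.3):
        """history: beat tempi conducted so far (most recent last), in BPM.
        Returns P_{n+1} = alpha*A_n + beta*B_n + gamma*C_n."""
        recent = history[-4:]
        A = sum(recent) / len(recent)                               # moving average
        B = history[-1] - history[-2] if len(history) > 1 else 0.0  # tempo change
        C = template_tempo                                          # template tempo
        return alpha * A + beta * B + gamma * C

    # An accelerating conductor pulled toward a faster template tempo:
    print(predict_tempo([100, 102, 104, 106], template_tempo=110))  # 106.1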
The goal of V.P. is to provide its player with the sensation of conducting a real orchestra. We implemented a function called the "Concertmaster Function", in which a set of heuristics about the musicality and intentions of orchestra members is accumulated for this goal. A human concertmaster conveys the instructions of a conductor to the orchestra members, and tells the orchestra members' will to the conductor. Generally, the leader of the first violin section of an orchestra takes charge as concertmaster. In V.P., the concertmaster plays the role of adjusting the intention of the player and the template. Concretely, heuristics such as which of the past beats are crucial when predicting the future beat position, or how the data of a template should be modified to fit the tempo change, are examined and applied through the analysis of virtuoso music performances.
(Figure 1. Overview of V.P.)
3. SYSTEM DESIGN OF V.P.
V.P. is a conducting interface whose basic idea is derived from iFP. Figure 1 shows an overview of V.P. In V.P., as in many other conducting systems, the tempo of a beat is detected by gesture sensors from the conducting motions that a player draws. It is a conducting interface based on beat tapping: it captures only the vertical dimension of a conducting gesture. V.P. is implemented on Max/MSP by Cycling '74. In V.P., the tempo of the next beat is predicted from the history of beats that the player has given, and an expressive performance template is used, as in iFP. We introduce heuristics of conducting an orchestra so that the template works more naturally.
3.1 Gesture sensing device
In V.P., the local minimum point of the conducting motion is captured by various sensors; this point is regarded as the beat-point. Three methods were used for the acquisition of the beat-point, to correspond to various conditions of the stage: 1) a glove interface using an accelerometer sensor, 2) a baton using an infrared sensor (Wii Remote), and 3) a capacitance sensor (MIDI Theremin).
3.1.1 Glove interface using an accelerometer sensor
The V.P. players attach an original glove equipped with an accelerometer sensor to their right hands. The LilyPad Arduino [7] is used for the accelerometer sensor and the microcontroller. The three-dimensional acceleration information, with a sampling time of 10 ms, is sent to the system through Bluetooth using a BlueSMiRF [8]. The gravity component is extracted from the three-dimensional acceleration data; this direction is labeled as the vertical component, and the local minimum values of the vertical component are considered beat-points. Figure 2-a shows pulling on the glove.
We had the option of using the Wii Remote, which is also equipped with an acceleration sensor, as in UBS Virtual Maestro [4]; however, the Wii Remote (160 g) is much heavier than an ordinary conductor's baton (10 g). We were concerned that the heaviness of the Wii Remote would worsen the player's conducting motion, because the center of gravity of the Wii Remote is far from the player's hand. In addition, although the weight of the original glove is about 100 g, this weight is not a problem, because the player uses a real baton whose center of gravity is nearer the player's hand than it is with the Wii Remote.
Gesture sensors
Glove interface using an accelerometer
Baton using an infrared sensor (Wii Remote)
Capacitance sensor (MIDI Theremin)
114
There are two advantages to using the acceleration sensor: one is not being limited by the V.P. player's placement and direction, and the other is not needing to use caution when making the conducting motion. V.P. players can construct any conducting figure they like, compared with the two methods described below, which have large restrictions on position.
In contrast, the acceleration value increases and decreases significantly as a result of various factors other than the vertical movement, such as rotation. Therefore, beat tracking using acceleration sensors will introduce more mistakes than the other sensing inputs.
3.1.2 Baton using an infrared sensor (Wii Remote)
An infrared LED (TOSHIBA TLN115A, 950 nm) is attached to the baton tip. The system traces the LED's motion in two dimensions using the CMOS sensor (infrared camera) built into a Wii Remote. The sampling time is 10 ms. The beat-point is extracted from the local minimum values of the vertical component (Figure 2-b).
Beat detection using this location information acquires data more accurately than using the accelerometer. However, there are many restricting factors when using infrared sensors, including outside light, obstacles between the LED and the camera, the range of the camera, and the irradiation angle of the LED. As a result, the detection mistakes will be greater than with the other methods.
3.1.3 Capacitance sensor (MIDI Theremin)
An electrostatic capacitance sensor (the MIDI Theremin, see Figure 2-c), one of the main interfaces since iFP, is a device that captures the distance from the sensor to the hand through the change in electrostatic capacitance, converting the distance into a MIDI pitch-bend signal, which then controls the pitch of the sound source. The sampling time is 11 ms, and the local minimum values of the pitch-bend signals are treated as beat-points. Nakra's paper [6] states that a sampling time of less than 10 ms is necessary for extracting the beat-point precisely; in the case of the MIDI Theremin, however, 11 ms is the minimum sampling time. On the other hand, because the capacitance sensor captures the distance directly, it can extract the beat-point more precisely than the other sensors.
The MIDI Theremin is the most stable device of the three sensors, because it makes fewer mistakes in the recognition and detection of beat-points. However, the available range of measurement is narrow (about 20 cm from the antenna), increasing the restrictions on user location compared with the other sensors.
3.2 Expressive performance template
V.P. utilizes an expressive performance template, as well as the score data. V.P. uses CrestMuseXML (CMX) for reading this detailed information about the score. CMX is a group of common data formats used to treat various types of information about music [9]. It deals with MusicXML, which describes score information, and DeviationInstanceXML (DIXML), which describes deviation information. The deviation information used in V.P. is: 1) attack deviation for all notes, 2) release deviation for all notes, 3) volume deviation for all notes and 4) tempo deviation for all beats. The MusicXML data are available from various commercial music production software packages, such as "Finale". The DIXML data are obtained from expressive performance data in SMF format using our original software.
(Figure 2. Gesture sensors: (a) glove with accelerometer, (b) baton with LED and Wii Remote, (c) MIDI Theremin, from the left)
(Figure 3. Real performance and V.P.)
3.3 Concertmaster Function
Figure 3 shows the role of the concertmaster in a real orchestra and in V.P. As mentioned in Section 2.2, the concertmaster of V.P. plays the role of coordinating the intention of the conductor (player) and that of the virtual orchestra, which is derived from the performance template. The scheduler of V.P. estimates the player's tempo of the next beat from the history of beat positions that the player has given, and calculates the scheduled tempo of the next beat using the estimated player's tempo and the tempo described in the template.
In a real performance, each member has a performance model of how the music should be performed. The model is constructed in his/her head based on the member's own musicality and various information about the music, including the music style, the beat time, the suitable tempo, etc. The performance is created by changing the model along with the tempo shown by the conductor. The concertmaster's role is to combine the members with the conductor smoothly.
In V.P., because there is no orchestra, the performance model does not exist either. One of the roles of the expressive performance template is to substitute for this model, because the expressive performance in the template is created based on the performance model of the orchestra members. The template should therefore not be static, but dynamic. In iFP, because the template was static, there were cases where the player failed in the performance when the player's tempo was different from the template's tempo. In V.P., the expressive performance template is dynamic: it is revised according to the music style or the predicted tempo, similarly to the performance model of real […]
Real performance and VP
115
tempo and beat prediction; control of the performance model; smooth combination of the template and the conductor
https://www.youtube.com/watch?v=t_ClcuIcn5k
VirtualPhilharmony
116
117
3. Models for music production
Models for music performance production:
classical music: focus on pitched sounds, melodic and harmonic organization; linguistic paradigm
contemporary music: focus on sound and timbre; acoustics, psychoacoustics; performance as timbre control
From structural models to sound control models: sound synthesis, live electronics
118
Performance models for sound synthesis
mapping strategies: how to map input values and gestures to acoustic parameters; one-to-one, few-to-many, scaling (see the toy sketch below)
processes, algorithmic control: low-level parameters generated by algorithms; control of the algorithms (e.g. masks)
different abstraction levels: from abstract to concrete time scales, from note to sound event
[Diagram: score → performance model → process/gesture → sound synthesis → sound; the electronic music performer operates the chain]
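A toy Python sketch of the three mapping strategies named above (all parameter names are invented for the illustration):

    def one_to_one(pressure):
        return {"amplitude": pressure}                    # one control, one parameter

    def scaled(pressure, lo=0.2, hi=0.9):
        return {"amplitude": lo + (hi - lo) * pressure}   # same mapping, range-scaled

    def few_to_many(pressure, speed):
        # two gesture features drive several synthesis parameters at once
        return {"amplitude": pressure,
                "brightness": 0.5 * pressure + 0.5 * speed,
                "vibrato_rate": 4.0 + 2.0 * speed}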
119
Performance models for live electronics
Live electronics performer: processes the instrumental sound; combined effect (e.g. deviations of deviations)
[Diagram: score → instrument performer (musical instrument) → live electronics → sound; the live electronics performer controls the processing, with audio feedback]
120
Automatic performance: problems
The idea of automatic expressive music performance, especially when applied to the performance of classical music that was not written for this purpose, is questionable. The possibility of completely modeling and rendering the artistic creativity implied in the performance is still to be demonstrated.
State-of-the-art models: rendering of some relevant aspects of a musically acceptable performance, but not sufficient for a full artistic appreciation.
121
Automatic performance
Automatic performance can be acceptable:
when a real artistic value is not necessary (even if useful)
when the alternative is a mechanical performance of the score (as in many sequencers)
in entertainment applications
when it is not necessary to preserve the exact artistic environment of the composition, as in popular music
when the music is expressly created bearing in mind the use of technology
Toward model-listener interaction (the conductor paradigm)?
122
b. Models for artistic creation
In the information society era, artists are increasingly using technology in their artworks:
sound generation and processing
algorithmic composition
interactive performance
How to control sound synthesis or processing engines (systems, algorithms, etc.)?
a typical performance topic
the need to establish and compute the relation of musical and compositional aspects with sound parameters
joint model development with the composer
Performing environment
123
c. Performance models for education
Another important application of performance models, even of classical music, is in education. The knowledge embodied in performance models can help the teacher to:
rationalize the performance strategies
better convey his/her teaching effort
124
Discussion
Need for a well-founded approach based on strong scientific knowledge:
from classical music performance studies to performance models for new music creation
from the practical knowledge of new music creators to new performance models
Models for artistic creation:
a joint effort of scientists and musicians
really new tools, not merely inspired by the problems and solutions of the past
Models for expressiveness extraction and processing
Toward models of/for the performing arts