Classifying human motion for active music systems

AWASS 2012 Case Study Tutorial - Classifying human motion for active music systems Arjun Chandra University of Oslo


Transcript of Classifying human motion for active music systems

Page 1: Classifying human motion for active music systems

AWASS 2012 Case Study Tutorial - Classifying human motion for active music systems. Arjun Chandra

University of Oslo

Page 2: Classifying human motion for active music systems

2/65

Outline of the Tutorial

Why do motion classification for active music systems?

The motion classification problem.

Established solutions for the problem and demonstration.

Challenges for the week with regard to motion classification for active music systems.

Page 3: Classifying human motion for active music systems

Active Music

Videos from yesterday:

iPhone ensemble (UiO), SoloJam (UiO), performance-based music making (Wout Standaert).

Page 4: Classifying human motion for active music systems

Active Music

There is a boundary between someone performing music and someone listening/perceiving, with only limited passive interaction (tapping feet, etc.). Active music blurs this boundary and allows participation by perceivers. The end user may have little or no training in traditional musicianship or composition. The user gets control of the sonic/musical output to a greater or lesser extent, and experiences the sensation of playing music themselves.

Page 5: Classifying human motion for active music systems

Active Music

To build such a system...

Give control via mobile media devices, like iPods for example. The devices need to be intelligent in order to mediate the control of the musical output by participants.

The media device must be able to:

Sense the inputs from the participants and the environment. Process these in various forms. Coordinate the activities of the participants as they perform. Maintain musicality and the interest of the users.

Page 6: Classifying human motion for active music systems

Active Music

Key type of input:

Human motion is an integral part of the types of inputs that may be sensed by the devices. Motion and sound are very closely related! Motion is to be processed by the device in some fashion and eventually mapped to music. In a full-fledged active music system, numerous other types of inputs will be sensed, pertaining to the participant, the device itself, and the environment external to the human-device subsystem, including other participants/devices.

Page 7: Classifying human motion for active music systems

How Does All This Relate to Self-awareness?

Self-awareness is to take the form of:

Devices building models/possessing knowledge of the behaviours of the respective participants. Devices building models/possessing knowledge of themselves. Devices building models/possessing knowledge of the environment within which they get used. Such knowledge would help the devices to further reason about themselves in order to maintain musicality, maintain user interest, compute efficiently, maintain good response times, manage communication overhead, manage energy needs, manage trade-offs between such goals, etc.

Page 8: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

One first step towards mapping sensed motion into music:

Recognise patterns in human motion. We will work on such pattern recognition this week.

Triggers or fine-grained mapping:

1 The recognition may be used as a trigger, i.e. recognise the type of motion once it has been performed and trigger sound synthesis.

2 In addition, the system may also be able to anticipate which type of motion is about to be performed, or is ongoing, and thus provide the possibility of a more fine-grained mapping with sound synthesis.


Page 10: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Motion Classification Scheme

[Diagram: a 3D accelerometer stream feeds through optional pre-processing into training and recognition; training produces a trained classifier, whose identified class can then be passed on to a sound synthesiser.]

Example video for motion classification.

Page 11: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Two classic phases:

Training: take a bunch of data and build a classifier. Recognition: use the classifier on new streams to recognise patterns in these streams.

Page 12: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Some challenges whilst training:

The same type of motion can vary both spatially and temporally. The same type of motion may be performed differently depending on the mood of the user. The intentions of the user have a bearing on the performed motion. The user may be stationary or moving when performing the motion. Different users may perform the same motion differently. The motion types may grow or reduce in number over time, as the user operates the system.

Page 13: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

To make things more challenging:

Sometimes, quick training is essential. Ideally, training is online, with little or no effort from the user. Automated segmentation needs to be coupled with classification.

Page 14: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Many ways to capture motion:

Marker-based motion capture systems, e.g. the Qualisys motion capture system (Soundsaber). Vision-based systems, e.g. Kinect (piano via Kinect). Sensor-based systems, e.g. iOS devices (SoloJam), Wiimote, the Xsens full-body motion capture suit (Dance Jockey, Mobile Dance Jockey).

Page 15: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

In this case study, we capture motion data via:

Media devices, e.g. iPods, which have internal motion (acceleration) sensors: 3D accelerometers. We will use the sensor data stream as the device is moved, and classify the performed motion into a relevant category.

Page 16: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

What category?

You can choose to define the categories. You will then collect some data pertaining to the categories you choose. Once you have collected the data, you will train a classifier with it. Once trained, you will use this classifier to recognise the categorised motions within a sensor data stream.

Page 17: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Demo... Let's define some categories, train, and get some motion recognition going!

Page 18: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

As I mentioned yesterday, you are going to be provided:

Established algorithms for motion classification - two algorithms to play with and build on, to be precise. Data sets with different types of motion, which you can use to get a feel for the algorithms. Exercises pertaining to some challenging active music scenarios where motion classification is required. These will require you to build new data sets.

Let us look at the two algorithms briefly now...

Page 19: Classifying human motion for active music systems

Motion Classification Algorithms

The two algorithms are:

Dynamic Time Warping. Hidden Markov Models.

You are encouraged not to be limited by these two algorithms. Apply others that you know of.

Page 20: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Key idea:

To be able to compare two signals of different lengths. The result of such a comparison can be used in interesting ways.

You might wonder...

What should be compared to what? What are these two signals? Template matching!


Page 23: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Template matching:

Match a time-varying signal, in our case a motion data stream, against a stored set of signals. The stored set of signals are the templates, one representing each category. In effect, a motion data stream is compared against a representative from within the collected data, one for each category. The closest matching template tells us the category the stream most likely belongs to.

Page 24: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: Euclidean distance matches the two signals sample-by-sample at each time step; DTW aligns corresponding features across time - intuitive!]


Page 26: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Two signals and their cost matrix. This cost → some distance measure, e.g. Euclidean. Note the valleys (dark - low cost) and hills (light - high cost). Goal → find an alignment with minimal overall cost. The optimal alignment runs along a valley of low cost.

Page 27: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We have to find the optimal warping path in this cost matrix. Shown is the optimal warping path, i.e. the optimal alignment. How do we find this warping path? There are exponentially many.

Page 28: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Let P be a warping path. Note that P is a sequence of pairs of aligned points p_s on the two signals.

argmin_P ( Σ_{s=1}^{k} d(p_s) w_s ) / ( Σ_{s=1}^{k} w_s )

gives the optimal path, where d(p_s) is the cost, w_s is the weighting coefficient (1 in our case), and the denominator is the length of the path.

Page 29: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We first put some restrictions on the paths that may be found:

1 Monotonicity.
2 Continuity.
3 Boundary conditions.
4 Warping window.
5 Slope constraint.

Page 30: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Monotonicity.

The path is not allowed to go back in time. This prevents feature comparisons being repeated during matching.


Page 32: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 33: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Continuity.

The path is not allowed to break. This prevents the omission of features whilst matching.


Page 35: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 36: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Boundary conditions.

Start at the top-left position and end at the bottom-right position in the matrix. This prevents one of the signals being only partially considered.


Page 38: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 39: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Warping window.

A good alignment path is unlikely to wander too far from the diagonal, so stay within a window. This prevents sticking at similar features and skipping features.


Page 41: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 42: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Slope constraint.

The path is not allowed to be too steep or too flat. This prevents short parts of one signal being matched with very long parts of the other.


Page 44: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 45: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We then build an accumulated distance matrix. A nicely defined valley emerges when we do so. Building the accumulated distance matrix is done via dynamic programming. Let us see how this is done...
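The dynamic-programming construction can be sketched in a few lines of Python. This is a minimal illustration, assuming unit weights (w_s = 1) and only the monotonicity, continuity and boundary constraints; the warping window and slope constraint are omitted for brevity, and `dtw_distance` is a hypothetical helper, not code from the tutorial materials.

```python
import numpy as np

def dtw_distance(a, b):
    """Normalised DTW distance between two signals of possibly
    different lengths (1-D, or N-D with one sample per row)."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    if a.shape[0] == 1:
        a = a.T
    if b.shape[0] == 1:
        b = b.T
    n, m = len(a), len(b)
    # local cost matrix: Euclidean distance between every pair of samples
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # accumulated distance matrix, filled by dynamic programming;
    # predecessors (i-1,j), (i,j-1), (i-1,j-1) enforce monotonicity and
    # continuity, and the fixed endpoints enforce the boundary conditions
    acc = np.full((n, m), np.inf)
    steps = np.zeros((n, m), dtype=int)   # path length, for normalisation
    acc[0, 0], steps[0, 0] = cost[0, 0], 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((acc[i - 1, j], steps[i - 1, j]))
            if j > 0:
                cands.append((acc[i, j - 1], steps[i, j - 1]))
            if i > 0 and j > 0:
                cands.append((acc[i - 1, j - 1], steps[i - 1, j - 1]))
            prev_acc, prev_len = min(cands)
            acc[i, j] = cost[i, j] + prev_acc
            steps[i, j] = prev_len + 1
    # the bottom-right cell holds the un-normalised warping distance;
    # dividing by the path length gives the normalised distance
    return acc[-1, -1] / steps[-1, -1]
```

A time-stretched copy of a signal then matches it with zero distance, which an index-by-index Euclidean comparison cannot achieve.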

Page 46: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

This valley reveals the path we are after. The bottom-right corner of the matrix holds the value Σ_{s=1}^{k} d(p_s) w_s. This value is the un-normalised warping distance. Normalising this distance by the path length gives us the distance between the two signals.

Page 47: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

The two signals shown in this example are one-dimensional, but we are going to work with 3D data. The process described above works for N-dimensional data. Remember that the initial cost matrix is built using Euclidean distance.

Page 48: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

3D-DTW:

Page 49: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Training / finding the representative template for each category:

For each category, find the training example with the minimum average normalised distance against the remaining examples. See equation (7) in Gillian et al. (2011). Note that there are other ways to find templates; you are encouraged to explore them.
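The selection rule can be sketched as follows. The function is generic over the distance used; here a plain Euclidean distance stands in for the normalised DTW distance, and all names and the toy examples are illustrative, not taken from Gillian et al. (2011).

```python
import numpy as np

def select_template(examples, dist):
    """Return the index of the training example with the minimum
    average distance to the remaining examples of its category -
    a sketch of the template-selection rule described above."""
    best_idx, best_avg = 0, float("inf")
    for i, cand in enumerate(examples):
        avg = np.mean([dist(cand, ex)
                       for j, ex in enumerate(examples) if j != i])
        if avg < best_avg:
            best_idx, best_avg = i, avg
    return best_idx

# stand-in distance for the demo; a real system would plug in the
# normalised DTW distance here
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float)
                                           - np.asarray(b, float)))

examples = [[0, 1, 2], [0, 1, 2.1], [5, 5, 5]]
template_index = select_template(examples, euclid)
```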

Page 50: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Recognition

Find the normalised distance between the stream and all the templates. The category of the closest matching template (lowest normalised distance) is the classification result, provided the distance is below a threshold. See equations (10-13) in Gillian et al. (2011) for the threshold distance for each category, used to reject false positives - or come up with your own way!
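A sketch of this nearest-template-with-rejection rule; the stand-in distance, category names and thresholds are illustrative only (normalised DTW and the thresholds of Gillian et al. (2011) would be used in practice).

```python
import numpy as np

def classify_stream(stream, templates, thresholds, dist):
    """Return the category of the closest template, or None if even
    the best match is above that category's rejection threshold."""
    best_cat, best_d = None, float("inf")
    for cat, tpl in templates.items():
        d = dist(stream, tpl)
        if d < best_d:
            best_cat, best_d = cat, d
    return best_cat if best_d <= thresholds[best_cat] else None

# stand-in distance; normalised DTW would be used in practice
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float)
                                           - np.asarray(b, float)))

templates = {"shake": [0, 1, 0], "tilt": [5, 5, 5]}
thresholds = {"shake": 1.0, "tilt": 1.0}
```

Returning None (a null classification) is what rejects false positives: a stream far from every template triggers no sound at all.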

Page 51: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Key idea:

A statistical generative model of time-varying signals, with one HMM per category. An HMM can help ascertain the probability that a given observation/stream/time-varying observed signal was generated by the model. Knowing this probability, across multiple HMMs, allows us to categorise a stream.

Page 52: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Schematic of a Markov chain with 5 states (Rabiner (1989)):

The probability of being in a state depends only on the predecessor state (first order), and is independent of time. It is denoted by a_ij, with Σ_{j=1}^{N} a_ij = 1. But here, each state is observable, e.g. a weather model: P(rain, rain, rain, ... | Model)?

Page 53: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Markov chain → HMM:

[Diagram: each of the five states now emits the observation symbols v1, v2, v3 with emission probabilities b_jk.]

Page 54: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

A hidden process generates what you observe. Thus, you see this hidden process via observations only.

Page 55: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

The observation is a probabilistic function of the state! The v_j's are the possible observations in any state. We do not observe the state anymore, hence "hidden". Examples: asking a friend what the weather is like, or observing the acceleration stream of a person in another room who may or may not be moving in some way.

Page 56: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

N states, S = {S_1, S_2, ..., S_N}. q_t, the state at time t. M observation symbols (the codebook), V = {v_1, v_2, ..., v_M}. An observation sequence O = O_1 O_2 ... O_T, made up of elements from the codebook, e.g. the sequence v_1 v_2 v_1 v_3 v_2 ... of length T.

Page 57: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

The state transition matrix A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i). The emission/observation symbol matrix B = {b_jk}, where b_jk = P(v_k | q_t = S_j). The initial state probability vector π = {π_i}, where π_i = P(q_1 = S_i).

Page 58: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

An HMM λ is specified by specifying N, M, V, A, B and π. Example: N = 4, M = 3, V = {v_1, v_2, v_3}, the a_ij's, the b_jk's, and π_1 = 1. We essentially have to specify these using the motion data, one HMM per category.
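The example above (N = 4, M = 3, π_1 = 1) can be written down directly. The probability values here are illustrative placeholders, not parameters estimated from any motion data; only the shapes and the π_1 = 1 start match the slide's example.

```python
import numpy as np

# State transition matrix A = {a_ij}: a_ij = P(q_{t+1} = S_j | q_t = S_i),
# each row summing to 1 (a left-to-right topology, as an illustration).
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

# Emission matrix B = {b_jk}: b_jk = P(v_k | q_t = S_j).
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.3, 0.3, 0.4]])

# Initial state probabilities: pi_1 = 1, i.e. always start in S_1.
pi = np.array([1.0, 0.0, 0.0, 0.0])

# Every row of A and B, and pi itself, must be a distribution.
row_sums_ok = (np.allclose(A.sum(axis=1), 1.0)
               and np.allclose(B.sum(axis=1), 1.0)
               and np.isclose(pi.sum(), 1.0))
```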

Page 59: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Some procedures:

Pre-processing via vector quantisation: to build a codebook and process acceleration data in terms of observation symbols, giving observation sequences. Forward algorithm: to calculate P(O | λ_c), where λ_c denotes the c-th HMM and O is an observation sequence. Forward-backward algorithm: to estimate the parameters (A and B) of the HMM using multiple observation sequences, i.e. training. Bayes' rule: together with the forward algorithm, helps ascertain P(λ_c | O), i.e. recognition that a new observation sequence O belongs to category c.

Page 60: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Pre-processing by vector quantisation:

Any acceleration stream (a stream of 3D vectors) has a large range of values and fine granularity. We abstract each 3D vector into a code. Using k-means clustering and taking the centroid of each cluster gives the code words/vectors of the codebook. See Klingmann (2009), Sections 3.1 and 4.4, and Schloemer (2008). The index of a code word is what is used as an observation. The string of indices of the code words matching the vectors in the data/stream is then the observation sequence O.
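A minimal k-means vector-quantisation sketch of this pre-processing step. The helper names are hypothetical, and the toy demo uses far less data than any real codebook would be trained on.

```python
import numpy as np

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Cluster the training vectors with plain k-means; the cluster
    centroids become the code words of the codebook."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest code word
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def quantise(stream, codebook):
    """Map a stream of 3D acceleration vectors to the observation
    sequence O of nearest code-word indices."""
    X = np.asarray(stream, dtype=float)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

# toy demo: two well-separated clusters of "acceleration" vectors
codebook = kmeans_codebook([[0, 0, 0], [0.1, 0, 0],
                            [5, 5, 5], [5.1, 5, 5]], k=2)
O = quantise([[0, 0, 0], [5, 5, 5], [0, 0, 0]], codebook)
```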

Page 61: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Forward algorithm:

To find P(O | λ_c). The probabilities/forward variables α_t(i) need to be computed and used to find this. See Rabiner (1989), Section III-A, and Klingmann (2009), Section 3.2.4. It is an efficient way to compute P(O | λ_c) = Σ_{all Q} P(O | Q, λ_c) P(Q | λ_c), where the Q's are the many possible (N^T) state sequences that may be visited to generate O.

Page 62: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Forward algorithm (figures from Rabiner (1989)):

α_t(i) = P(O_1 ... O_t, q_t = S_i | λ_c)

P(O | λ_c) = Σ_{i=1}^{N} α_T(i)
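The α recursion can be sketched compactly. This unscaled version is fine for short sequences, but a practical implementation would scale the α's at each step to avoid numerical underflow on long sequences (cf. Rabiner (1989)); the function name is illustrative.

```python
import numpy as np

def forward(O, A, B, pi):
    """P(O | lambda) via the forward variables
    alpha_t(i) = P(O_1 ... O_t, q_t = S_i | lambda)."""
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha = pi * B[:, O[0]]         # initialisation (t = 1)
    for o in O[1:]:                 # induction over t = 2 .. T
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())       # termination: sum_i alpha_T(i)
```

This costs O(N²T) operations, instead of summing over all N^T state sequences directly.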

Page 63: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Training:

The forward-backward algorithm, for the estimation of the A and B matrices of each λ_c, given the respective training observation sequences. The α_t(i)'s and the backward variables β_t(i)'s need to be computed and used to update A and B. β_t(i) is the probability of generating the remaining part of the observation sequence from time t+1 to T, given state S_i at time t, i.e. P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ_c). See Rabiner (1989), Sections III-C and V-B, and Klingmann (2009), Sections 3.2.5, 3.2.6 and 4.5.2.

Page 64: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

A update:

a_ij = (expected number of transitions from S_i to S_j) / (expected number of transitions from S_i)

B update:

b_jk = (expected number of times in S_j observing symbol v_k) / (expected number of times in S_j)

The α_t(i)'s and β_t(i)'s are used within the above update equations.

See Section V-B in Rabiner (1989) and Section 4.5.2 in Klingmann (2009) for the variant that we will use. This variant takes care of multiple training observation sequences.
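One re-estimation pass can be sketched as follows. This is an unscaled, single-sequence sketch with π kept fixed - the multiple-sequence variant referenced above instead averages these expected counts over all training sequences - and all names are illustrative.

```python
import numpy as np

def baum_welch_step(O, A, B, pi):
    """One forward-backward (Baum-Welch) re-estimation of A and B
    from a single observation sequence O (unscaled sketch)."""
    O = np.asarray(O)
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    N, T = len(pi), len(O)
    # forward variables alpha_t(i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    # backward variables beta_t(i) = P(O_{t+1} ... O_T | q_t = S_i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    pO = alpha[-1].sum()               # P(O | lambda)
    gamma = alpha * beta / pO          # P(q_t = S_i | O, lambda)
    # expected transitions i -> j, summed over t (the xi_t(i, j) terms)
    A_num = np.zeros_like(A)
    for t in range(T - 1):
        A_num += A * np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1]) / pO
    A_new = A_num / gamma[:-1].sum(axis=0)[:, None]
    # expected times in S_j observing v_k, over expected times in S_j
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new
```

Each pass is guaranteed (by the usual EM argument) not to decrease P(O | λ), so the step is iterated until the likelihood converges.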

Page 65: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Recognition:

Let O_stream be the stream to be classified. We want to find P(λ_c | O_stream); we use the forward algorithm and Bayes' rule for this. P(λ_c | O_stream) is the probability that λ_c, i.e. the HMM indexed by c, generated the sequence O_stream. The highest probability amongst all the λ_c's tells us the category c the stream is classified into.

Page 66: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Recognition:

P(λ_c) may be calculated as the average of P(O_j | λ_c) across the training observation sequences O_j. We compute P(λ_c) and P(O_stream | λ_c) for all c's.

P(O_stream) = Σ_c P(O_stream, λ_c) = Σ_c P(O_stream | λ_c) P(λ_c)

P(λ_c | O_stream) = P(O_stream | λ_c) P(λ_c) / P(O_stream)

See Klingmann (2009), Section 3.3.
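Putting the forward algorithm and Bayes' rule together gives a complete recogniser. This sketch uses illustrative names and uniform priors rather than the prior-estimation scheme of Klingmann (2009); since the evidence P(O_stream) is shared by all categories, it cancels in the argmax and need not be computed.

```python
import numpy as np

def forward(O, A, B, pi):
    # unscaled forward algorithm: returns P(O | lambda)
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha = pi * B[:, O[0]]
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def classify(O, models, priors):
    """Return argmax_c P(lambda_c | O); `models` maps a category to
    its (A, B, pi) and `priors` maps it to P(lambda_c)."""
    scores = {c: forward(O, *m) * priors[c] for c, m in models.items()}
    return max(scores, key=scores.get)

# toy models: one HMM biased towards symbol 0, one towards symbol 1
A_chain = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
models = {"circle": (A_chain, [[0.9, 0.1], [0.9, 0.1]], pi),
          "shake":  (A_chain, [[0.1, 0.9], [0.1, 0.9]], pi)}
priors = {"circle": 0.5, "shake": 0.5}
```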

Page 67: Classifying human motion for active music systems

Challenging Active Music Scenarios

Lower-level technical challenges:

How well does the system classify when the reference point (the user) is stationary versus moving? Can we distinguish these cases? How well does the system separate impulsive and sustained actions, e.g. hitting a drum versus bowing a violin? Can it differentiate, or not, between using the right or left hand to do the "same" action?

Page 68: Classifying human motion for active music systems

Challenging Active Music Scenarios

Higher-level semantic challenges:

Can it separate gestures from actions, i.e. find the meaning-bearing part, e.g. the difference between actions that are performed with a sad, happy or angry intention? Can it distinguish between an expert and a non-expert user handling the device?