Classifying human motion for active music systems

AWASS 2012 Case Study Tutorial - Classifying human motion for active music systems Arjun Chandra University of Oslo


Transcript of Classifying human motion for active music systems

Page 1: Classifying human motion for active music systems

AWASS 2012 Case Study Tutorial - Classifying human motion for active music systems. Arjun Chandra

University of Oslo

Page 2: Classifying human motion for active music systems

2/65

Outline of the Tutorial

Why do motion classification for active music systems?

The motion classification problem.

Established solutions for the problem and demonstration.

Challenges for the week with regard to motion classification for active music systems.

Page 3: Classifying human motion for active music systems

Active Music

Videos from yesterday:

iPhone ensemble (UiO), SoloJam (UiO), performance-based music making (Wout Standaert).

Page 4: Classifying human motion for active music systems

Active Music

There is a boundary between someone performing music and someone listening/perceiving, with only limited passive interaction (tapping feet, etc.). Active music blurs this boundary and allows participation by perceivers. The end user may have little or no training in traditional musicianship or composition. The user gets control of the sonic/musical output to a greater or lesser extent, and experiences the sensation of playing music themselves.

Page 5: Classifying human motion for active music systems

Active Music

To build such a system...

Give control via mobile media devices, like iPods for example. The devices need to be intelligent in order to mediate the control of the musical output by participants.

The media device must be able to:

Sense the inputs from the participants and the environment. Process these in various forms. Coordinate the activities of the participants as they perform. Maintain musicality and the interest of the users.

Page 6: Classifying human motion for active music systems

Active Music

Key type of input:

Human motion is an integral part of the types of inputs that may be sensed by the devices. Motion and sound are very closely related! Motion is to be processed by the device in some fashion and eventually mapped to music. In a full-fledged active music system, numerous other types of inputs will be sensed, pertaining to the participant, the device itself, and the environment external to the human-device subsystem, including other participants/devices.

Page 7: Classifying human motion for active music systems

How Does All This Relate to Self-awareness?

Self-awareness is to take the form of:

Devices building models/possessing knowledge of the behaviours of the respective participants. Devices building models/possessing knowledge of themselves. Devices building models/possessing knowledge of the environment within which they get used. Such knowledge would help the devices to further reason about themselves in order to maintain musicality, maintain user interest, compute efficiently, maintain good response times, manage communication overhead, manage energy needs, manage trade-offs between such goals, etc.

Page 8: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

One first step towards mapping sensed motion into music:

Recognise patterns in human motion. We will work on such pattern recognition this week.

Triggers or fine-grained mapping:

1 The recognition may be used as a trigger, i.e. recognise the type of motion once it has been performed and trigger sound synthesis.

2 In addition, the system may also be able to anticipate which type of motion is about to be performed, or is ongoing, and thus provide the possibility of a more fine-grained mapping with sound synthesis.


Page 10: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Motion Classification Scheme

[Diagram: a 3D accelerometer stream feeds through optional pre-processing into training and recognition; training produces a trained classifier, whose identified class can then be passed on to a sound synthesiser.]

Example video for motion classification.

Page 11: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Two classic phases:

Training: take a bunch of data and build a classifier. Recognition: use the classifier on new streams to recognise patterns in these streams.

Page 12: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Some challenges whilst training:

The same type of motion can vary both spatially and temporally. The same type of motion may be performed differently depending on the mood of the user. The intentions of the user have a bearing on the performed motion. The user may be stationary or moving when performing the motion. Different users may perform the same motion differently. The motion types may grow or reduce in number over time, as the user operates the system.

Page 13: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

To make things more challenging:

Sometimes, quick training is essential. Ideally, training is online, with little or no effort from the user. Automated segmentation needs to be coupled with classification.

Page 14: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Many ways to capture motion:

Marker-based motion capture systems, e.g. the Qualisys motion capture system (Soundsaber). Vision-based systems, e.g. Kinect (piano via Kinect). Sensor-based systems, e.g. iOS devices (SoloJam), Wiimote, the Xsens full-body motion capture suit (Dance Jockey, Mobile Dance Jockey).

Page 15: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

In this case study, we capture motion data via:

Media devices, e.g. iPods, which have internal motion (acceleration) sensors: 3D accelerometers. We will use the sensor data stream as the device is moved, and classify the performed motion into a relevant category.

Page 16: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

What category?

You can choose to define the categories. You will then collect some data pertaining to the categories you choose. Once you have collected the data, you will train a classifier with it. Once trained, you will use this classifier to recognise the categorised motions within a sensor data stream.

Page 17: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

Demo... Let's define some categories, train, and get some motion recognition going!

Page 18: Classifying human motion for active music systems

Classifying Human Motion for Active Music Systems

As I mentioned yesterday, you are going to be provided:

Established algorithms for motion classification - two algorithms to play with and build on, to be precise. Data sets with different types of motion, which you can use to get a feel for the algorithms. Exercises pertaining to some challenging active music scenarios where motion classification is required. These will require you to build new data sets.

Let us look at the two algorithms briefly now...

Page 19: Classifying human motion for active music systems

Motion Classification Algorithms

The two algorithms are:

Dynamic Time Warping. Hidden Markov Models.

You are encouraged not to be limited by these two algorithms. Apply others that you know of.

Page 20: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Key idea:

To be able to compare two signals of different lengths. The result of such a comparison can be used in interesting ways.

You might wonder...

What should be compared to what? What are these two signals? Template matching!


Page 23: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Template matching:

Match a time-varying signal, in our case a motion data stream, against a stored set of signals. The stored set of signals are the templates, one representing each category. In effect, a motion data stream is compared against a representative from within the collected data, one for each category. The closest matching template tells us the category the stream most likely belongs to.

Page 24: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: Euclidean distance matches the two signals sample-by-sample at each time step; DTW aligns corresponding features across time - intuitive!]


Page 26: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Two signals and their cost matrix. This cost → some distance measure, e.g. Euclidean. Note the valleys (dark - low cost) and hills (light - high cost). Goal → find an alignment with minimal overall cost. The optimal alignment runs along a valley of low cost.

Page 27: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We have to find the optimal warping path in this cost matrix. Shown is the optimal warping path, i.e. the optimal alignment. How do we find this warping path? There are exponentially many.

Page 28: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Let P be a warping path. Note that P is a sequence of pairs of aligned points p_s on the two signals.

argmin_P ( Σ_{s=1}^{k} d(p_s) w_s ) / ( Σ_{s=1}^{k} w_s )

gives the optimal path, where d(p_s) is the cost, w_s is the weighting coefficient (1 in our case), and the denominator is the length of the path.

Page 29: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We first put some restrictions on the paths that may be found:

1 Monotonicity.
2 Continuity.
3 Boundary conditions.
4 Warping window.
5 Slope constraint.

Page 30: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Monotonicity.

The path is not allowed to go back in time. This prevents feature comparisons being repeated during matching.


Page 32: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 33: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Continuity.

The path is not allowed to break. This prevents the omission of features whilst matching.


Page 35: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 36: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Boundary conditions.

Start at the top-left position and end at the bottom-right position in the matrix. This prevents one of the signals being only partially considered.


Page 38: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 39: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Warping window.

A good alignment path is unlikely to wander too far from the diagonal, so stay within a window. This prevents sticking at similar features and skipping features.


Page 41: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 42: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Slope constraint.

The path is not allowed to be too steep or too flat. This prevents short parts of one signal being matched with very long parts of the other.


Page 44: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

[Figure: two signals plotted over time, illustrating the constraint.]

Page 45: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

We then build an accumulated distance matrix. A nicely defined valley emerges when we do so. Building the accumulated distance matrix is done via dynamic programming. Let us see how this is done...
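The dynamic-programming construction can be sketched in a few lines of Python. This is a minimal illustration, assuming unit weights (w_s = 1) and only the monotonicity, continuity and boundary constraints; the warping window and slope constraint are omitted for brevity, and `dtw_distance` is a hypothetical helper, not code from the tutorial materials.

```python
import numpy as np

def dtw_distance(a, b):
    """Normalised DTW distance between two signals of possibly
    different lengths (1-D, or N-D with one sample per row)."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    if a.shape[0] == 1:
        a = a.T
    if b.shape[0] == 1:
        b = b.T
    n, m = len(a), len(b)
    # local cost matrix: Euclidean distance between every pair of samples
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # accumulated distance matrix, filled by dynamic programming;
    # predecessors (i-1,j), (i,j-1), (i-1,j-1) enforce monotonicity and
    # continuity, and the fixed endpoints enforce the boundary conditions
    acc = np.full((n, m), np.inf)
    steps = np.zeros((n, m), dtype=int)   # path length, for normalisation
    acc[0, 0], steps[0, 0] = cost[0, 0], 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((acc[i - 1, j], steps[i - 1, j]))
            if j > 0:
                cands.append((acc[i, j - 1], steps[i, j - 1]))
            if i > 0 and j > 0:
                cands.append((acc[i - 1, j - 1], steps[i - 1, j - 1]))
            prev_acc, prev_len = min(cands)
            acc[i, j] = cost[i, j] + prev_acc
            steps[i, j] = prev_len + 1
    # the bottom-right cell holds the un-normalised warping distance;
    # dividing by the path length gives the normalised distance
    return acc[-1, -1] / steps[-1, -1]
```

A time-stretched copy of a signal then matches it with zero distance, which an index-by-index Euclidean comparison cannot achieve.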

Page 46: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

This valley reveals the path we are after. The bottom-right corner of the matrix holds the value Σ_{s=1}^{k} d(p_s) w_s. This value is the un-normalised warping distance. Normalising this distance by the path length gives us the distance between the two signals.

Page 47: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

The two signals shown in this example are one-dimensional, but we are going to work with 3D data. The process described above works for N-dimensional data. Remember that the initial cost matrix is built using Euclidean distance.

Page 48: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

3D-DTW:

Page 49: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Training / finding the representative template for each category:

For each category, find the training example with the minimum average normalised distance against the remaining examples. See equation (7) in Gillian et al. (2011). Note that there are other ways to find templates; you are encouraged to explore them.
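The selection rule can be sketched as follows. The function is generic over the distance used; here a plain Euclidean distance stands in for the normalised DTW distance, and all names and the toy examples are illustrative, not taken from Gillian et al. (2011).

```python
import numpy as np

def select_template(examples, dist):
    """Return the index of the training example with the minimum
    average distance to the remaining examples of its category -
    a sketch of the template-selection rule described above."""
    best_idx, best_avg = 0, float("inf")
    for i, cand in enumerate(examples):
        avg = np.mean([dist(cand, ex)
                       for j, ex in enumerate(examples) if j != i])
        if avg < best_avg:
            best_idx, best_avg = i, avg
    return best_idx

# stand-in distance for the demo; a real system would plug in the
# normalised DTW distance here
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float)
                                           - np.asarray(b, float)))

examples = [[0, 1, 2], [0, 1, 2.1], [5, 5, 5]]
template_index = select_template(examples, euclid)
```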

Page 50: Classifying human motion for active music systems

Motion Classification Algorithms: Dynamic Time Warping (DTW)

Recognition

Find the normalised distance between the stream and all the templates. The category of the closest matching template (lowest normalised distance) is the classification result, provided the distance is below a threshold. See equations (10-13) in Gillian et al. (2011) for the threshold distance for each category, used to reject false positives - or come up with your own way!
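A sketch of this nearest-template-with-rejection rule; the stand-in distance, category names and thresholds are illustrative only (normalised DTW and the thresholds of Gillian et al. (2011) would be used in practice).

```python
import numpy as np

def classify_stream(stream, templates, thresholds, dist):
    """Return the category of the closest template, or None if even
    the best match is above that category's rejection threshold."""
    best_cat, best_d = None, float("inf")
    for cat, tpl in templates.items():
        d = dist(stream, tpl)
        if d < best_d:
            best_cat, best_d = cat, d
    return best_cat if best_d <= thresholds[best_cat] else None

# stand-in distance; normalised DTW would be used in practice
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a, float)
                                           - np.asarray(b, float)))

templates = {"shake": [0, 1, 0], "tilt": [5, 5, 5]}
thresholds = {"shake": 1.0, "tilt": 1.0}
```

Returning None (a null classification) is what rejects false positives: a stream far from every template triggers no sound at all.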

Page 51: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Key idea:

A statistical generative model of time-varying signals, with one HMM per category. An HMM can help ascertain the probability that a given observation/stream/time-varying observed signal was generated by the model. Knowing this probability, across multiple HMMs, allows us to categorise a stream.

Page 52: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Schematic of a Markov chain with 5 states (Rabiner (1989)):

The probability of being in a state depends only on the predecessor state (first order), and is independent of time. It is denoted by a_ij, with Σ_{j=1}^{N} a_ij = 1. But here, each state is observable, e.g. a weather model: P(rain, rain, rain, ... | Model)?

Page 53: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Markov chain → HMM:

[Diagram: each of the five states now emits the observation symbols v1, v2, v3 with emission probabilities b_jk.]

Page 54: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

A hidden process generates what you observe. Thus, you see this hidden process via observations only.

Page 55: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

The observation is a probabilistic function of the state! The v_j's are the possible observations in any state. We do not observe the state anymore, hence "hidden". Examples: asking a friend what the weather is like, or observing the acceleration stream of a person in another room who may or may not be moving in some way.

Page 56: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

N states, S = {S_1, S_2, ..., S_N}. q_t, the state at time t. M observation symbols (the codebook), V = {v_1, v_2, ..., v_M}. An observation sequence O = O_1 O_2 ... O_T, made up of elements from the codebook, e.g. the sequence v_1 v_2 v_1 v_3 v_2 ... of length T.

Page 57: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

The state transition matrix A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i). The emission/observation symbol matrix B = {b_jk}, where b_jk = P(v_k | q_t = S_j). The initial state probability vector π = {π_i}, where π_i = P(q_1 = S_i).

Page 58: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

HMM elements:

[Diagram: 5-state HMM emitting symbols v1, v2, v3 with probabilities b_jk.]

An HMM λ is specified by specifying N, M, V, A, B and π. Example: N = 4, M = 3, V = {v_1, v_2, v_3}, the a_ij's, the b_jk's, and π_1 = 1. We essentially have to specify these using the motion data, one HMM per category.
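The example above (N = 4, M = 3, π_1 = 1) can be written down directly. The probability values here are illustrative placeholders, not parameters estimated from any motion data; only the shapes and the π_1 = 1 start match the slide's example.

```python
import numpy as np

# State transition matrix A = {a_ij}: a_ij = P(q_{t+1} = S_j | q_t = S_i),
# each row summing to 1 (a left-to-right topology, as an illustration).
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

# Emission matrix B = {b_jk}: b_jk = P(v_k | q_t = S_j).
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.3, 0.3, 0.4]])

# Initial state probabilities: pi_1 = 1, i.e. always start in S_1.
pi = np.array([1.0, 0.0, 0.0, 0.0])

# Every row of A and B, and pi itself, must be a distribution.
row_sums_ok = (np.allclose(A.sum(axis=1), 1.0)
               and np.allclose(B.sum(axis=1), 1.0)
               and np.isclose(pi.sum(), 1.0))
```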

Page 59: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Some procedures:

Pre-processing via vector quantisation: to build a codebook and process acceleration data in terms of observation symbols, giving observation sequences. Forward algorithm: to calculate P(O | λ_c), where λ_c denotes the c-th HMM and O is an observation sequence. Forward-backward algorithm: to estimate the parameters (A and B) of the HMM using multiple observation sequences, i.e. training. Bayes' rule: together with the forward algorithm, helps ascertain P(λ_c | O), i.e. recognition that a new observation sequence O belongs to category c.

Page 60: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Pre-processing by vector quantisation:

Any acceleration stream (a stream of 3D vectors) has a large range of values and fine granularity. We abstract each 3D vector into a code. Using k-means clustering and taking the centroid of each cluster gives the code words/vectors of the codebook. See Klingmann (2009), Sections 3.1 and 4.4, and Schloemer (2008). The index of a code word is what is used as an observation. The string of indices of the code words matching the vectors in the data/stream is then the observation sequence O.
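A minimal k-means vector-quantisation sketch of this pre-processing step. The helper names are hypothetical, and the toy demo uses far less data than any real codebook would be trained on.

```python
import numpy as np

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Cluster the training vectors with plain k-means; the cluster
    centroids become the code words of the codebook."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest code word
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def quantise(stream, codebook):
    """Map a stream of 3D acceleration vectors to the observation
    sequence O of nearest code-word indices."""
    X = np.asarray(stream, dtype=float)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

# toy demo: two well-separated clusters of "acceleration" vectors
codebook = kmeans_codebook([[0, 0, 0], [0.1, 0, 0],
                            [5, 5, 5], [5.1, 5, 5]], k=2)
O = quantise([[0, 0, 0], [5, 5, 5], [0, 0, 0]], codebook)
```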

Page 61: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Forward algorithm:

To find P(O | λ_c). The probabilities/forward variables α_t(i) need to be computed and used to find this. See Rabiner (1989), Section III-A, and Klingmann (2009), Section 3.2.4. It is an efficient way to compute P(O | λ_c) = Σ_{all Q} P(O | Q, λ_c) P(Q | λ_c), where the Q's are the many possible (N^T) state sequences that may be visited to generate O.

Page 62: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Forward algorithm (figures from Rabiner (1989)):

α_t(i) = P(O_1 ... O_t, q_t = S_i | λ_c)

P(O | λ_c) = Σ_{i=1}^{N} α_T(i)
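The α recursion can be sketched compactly. This unscaled version is fine for short sequences, but a practical implementation would scale the α's at each step to avoid numerical underflow on long sequences (cf. Rabiner (1989)); the function name is illustrative.

```python
import numpy as np

def forward(O, A, B, pi):
    """P(O | lambda) via the forward variables
    alpha_t(i) = P(O_1 ... O_t, q_t = S_i | lambda)."""
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha = pi * B[:, O[0]]         # initialisation (t = 1)
    for o in O[1:]:                 # induction over t = 2 .. T
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())       # termination: sum_i alpha_T(i)
```

This costs O(N²T) operations, instead of summing over all N^T state sequences directly.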

Page 63: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Training:

The forward-backward algorithm, for the estimation of the A and B matrices of each λ_c, given the respective training observation sequences. The α_t(i)'s and the backward variables β_t(i)'s need to be computed and used to update A and B. β_t(i) is the probability of generating the remaining part of the observation sequence from time t+1 to T, given state S_i at time t, i.e. P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ_c). See Rabiner (1989), Sections III-C and V-B, and Klingmann (2009), Sections 3.2.5, 3.2.6 and 4.5.2.

Page 64: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

A update:

a_ij = (expected number of transitions from S_i to S_j) / (expected number of transitions from S_i)

B update:

b_jk = (expected number of times in S_j observing symbol v_k) / (expected number of times in S_j)

The α_t(i)'s and β_t(i)'s are used within the above update equations.

See Section V-B in Rabiner (1989) and Section 4.5.2 in Klingmann (2009) for the variant that we will use. This variant takes care of multiple training observation sequences.
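One re-estimation pass can be sketched as follows. This is an unscaled, single-sequence sketch with π kept fixed - the multiple-sequence variant referenced above instead averages these expected counts over all training sequences - and all names are illustrative.

```python
import numpy as np

def baum_welch_step(O, A, B, pi):
    """One forward-backward (Baum-Welch) re-estimation of A and B
    from a single observation sequence O (unscaled sketch)."""
    O = np.asarray(O)
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    N, T = len(pi), len(O)
    # forward variables alpha_t(i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    # backward variables beta_t(i) = P(O_{t+1} ... O_T | q_t = S_i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    pO = alpha[-1].sum()               # P(O | lambda)
    gamma = alpha * beta / pO          # P(q_t = S_i | O, lambda)
    # expected transitions i -> j, summed over t (the xi_t(i, j) terms)
    A_num = np.zeros_like(A)
    for t in range(T - 1):
        A_num += A * np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1]) / pO
    A_new = A_num / gamma[:-1].sum(axis=0)[:, None]
    # expected times in S_j observing v_k, over expected times in S_j
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new
```

Each pass is guaranteed (by the usual EM argument) not to decrease P(O | λ), so the step is iterated until the likelihood converges.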

Page 65: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Recognition:

Let O_stream be the stream to be classified. We want to find P(λ_c | O_stream); we use the forward algorithm and Bayes' rule for this. P(λ_c | O_stream) is the probability that λ_c, i.e. the HMM indexed by c, generated the sequence O_stream. The highest probability amongst all the λ_c's tells us the category c the stream is classified into.

Page 66: Classifying human motion for active music systems

Motion Classification Algorithms: Hidden Markov Models (HMM)

Recognition:

P(λ_c) may be calculated as the average of P(O_j | λ_c) across the training observation sequences O_j. We compute P(λ_c) and P(O_stream | λ_c) for all c's.

P(O_stream) = Σ_c P(O_stream, λ_c) = Σ_c P(O_stream | λ_c) P(λ_c)

P(λ_c | O_stream) = P(O_stream | λ_c) P(λ_c) / P(O_stream)

See Klingmann (2009), Section 3.3.
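Putting the forward algorithm and Bayes' rule together gives a complete recogniser. This sketch uses illustrative names and uniform priors rather than the prior-estimation scheme of Klingmann (2009); since the evidence P(O_stream) is shared by all categories, it cancels in the argmax and need not be computed.

```python
import numpy as np

def forward(O, A, B, pi):
    # unscaled forward algorithm: returns P(O | lambda)
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    alpha = pi * B[:, O[0]]
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def classify(O, models, priors):
    """Return argmax_c P(lambda_c | O); `models` maps a category to
    its (A, B, pi) and `priors` maps it to P(lambda_c)."""
    scores = {c: forward(O, *m) * priors[c] for c, m in models.items()}
    return max(scores, key=scores.get)

# toy models: one HMM biased towards symbol 0, one towards symbol 1
A_chain = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
models = {"circle": (A_chain, [[0.9, 0.1], [0.9, 0.1]], pi),
          "shake":  (A_chain, [[0.1, 0.9], [0.1, 0.9]], pi)}
priors = {"circle": 0.5, "shake": 0.5}
```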

Page 67: Classifying human motion for active music systems

Challenging Active Music Scenarios

Lower-level technical challenges:

How well does the system classify when the reference point (the user) is stationary versus moving? Can we distinguish these cases? How well does the system separate impulsive and sustained actions, e.g. hitting a drum versus bowing a violin? Can it differentiate, or not, between using the right or left hand to do the "same" action?

Page 68: Classifying human motion for active music systems

Challenging Active Music Scenarios

Higher-level semantic challenges:

Can it separate gestures from actions, i.e. find the meaning-bearing part, e.g. the difference between actions that are performed with a sad, happy or angry intention? Can it distinguish between an expert and a non-expert user handling the device?