Human Activity Recognition Using Time Series Classification
by
Zhino Yousefi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2015 by Zhino Yousefi
Abstract
Human Activity Recognition Using Time Series Classification
Zhino Yousefi
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2015
Activity Recognition (AR) has become a basis for applications such as health care and elderly surveillance,
human position tracking, home monitoring and security applications. Embedding motion sensors in
smart phones increases the interest in using smart phones for detecting users’ activities. In this thesis, we
propose a system that learns activity trends using only the data from an accelerometer sensor, which is
the most common motion sensor in smart phones. The system uses raw traces in a training set to build a
predictor that assigns the proper label to new traces. Our approach addresses the two main challenges
in AR using smart phones. First, the system is well trained with fewer training traces compared to
benchmark approaches, and new traces can easily be added to our database. Second, since we use the
raw traces without defining any particular features, our system is more general and achieves almost
perfect accuracy.
Dedication
To my loving family, who has supported me every step of the way.
Acknowledgements
I would like to express my sincerest gratitude to my supervisor, Professor Shahrokh Valaee, for his
guidance, caring, and immense knowledge.
I would also like to thank my friendly lab mates at the WIRLAB group, who offered helpful suggestions
and provided a cheerful research atmosphere.
My very special thanks go to my friends Masoud Barakatain, Shadi Emami, Sepideh Hassanmoghadam,
Masume Sabzi, Nastaran Hajia and Niloofar Ghanbari for their great help and assistance.
Finally, I wish to give my deepest gratitude to my dear parents, Ronak Towfighi and Mohammadsharif
Yousefi, for their unconditional love and support, and to my beloved sister, Arian Yousefi, brother Rozhin
Yousefi, and brother-in-law, Alborz Rezazadeh Sereshkeh, for their valuable guidance and encouragement.
Contents
1 Introduction
2 Previous Work
  2.1 Sensor-based Activity Recognition
    2.1.1 Feature-based Classification
    2.1.2 Time Series Classification
    2.1.3 Model-based Classification
3 Preliminaries
  3.1 Dynamic Time Warping Distance
  3.2 Affinity Propagation
  3.3 Random Projection
  3.4 l1 Minimization
  3.5 Combination of Multiple Classification Results
4 Proposed Activity Recognition System
  4.1 Fixed Phone Orientation Scenario
    4.1.1 Learning Algorithm
    4.1.2 Testing Algorithm
  4.2 Unfixed Phone Orientation Scenario
    4.2.1 Learning Algorithm
    4.2.2 Testing Algorithm
5 Simulation Results
6 Conclusion and Future Work
  6.1 Future Work
  6.2 Contributions
List of Figures
2.1 Example of a Decision Tree
4.1 Supervised learning algorithm
4.2 Traces for two different activities
4.3 Training Phase
4.4 Different Distance Measurement Methods
4.5 DTW matching matrix and path for two traces from activities “walking downstairs” and “walking upstairs”
4.6 Local DTW distances along matching path for u and v
4.7 Accelerometer readings of axis x for two walking traces
4.8 DTW matching path for two traces from activity “walking”
4.9 Local DTW distances along matching path for two walking traces
4.10 Final output of learning algorithm
4.11 Testing Process (fixed phone scenario)
4.12 Choosing resemblance set (Φx)
4.13 Testing Process (unfixed phone scenario)
5.1 Distance histograms between two traces from different classes (regular DTW)
5.2 Distance histograms between two traces within the same class (regular DTW)
5.3 Distance histograms between two traces from different classes (modified DTW)
5.4 Distance histograms between two traces within the same class (modified DTW)
5.5 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (systems are trained with 10 samples from each of 4 classes)
5.6 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 3 classes)
5.7 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 2 classes)
5.8 Testing results of the proposed system (Fixed Phone Scenario) for different numbers of classes (system is trained with 10 samples from each class)
5.9 Testing results of the proposed system (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
5.10 Testing results of the NB algorithm (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
5.11 Testing results of the DT algorithm (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes)
5.12 Testing results of our proposed method (Unfixed Phone Scenario) for different numbers of classes
Chapter 1
Introduction
Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in
relevant industries due to its applications in cognitive assistance, indoor localization and tracking [31],
fitness monitoring, and human-computer interfaces [1], [2]. However, AR is not without its problems,
and the earliest attempts at addressing them date back a few decades. Early AR systems mostly targeted
estimation of total expended energy and not complex activities [3]. These were followed by systems
based on wearable sensors, which found applications in recognizing specific physical activities. Later on,
motion sensors became available in smart phones. The computational and hardware capabilities of smart phones,
along with the popularity of powerful machine-learning algorithms, resulted in the emergence of a vast
range of AR methods. Different AR systems use diverse types of sensors, such as camera, microphone,
accelerometer, gyroscope, or combinations thereof [1]. These systems utilize various approaches to detect
particular sets of activities, from simple activities like walking to more complex ones like cooking.
Due to the increasing popularity and enormous potential of AR, researchers are highly motivated
to improve the systems. This means finding a way to make AR systems work with realistic noisy data,
making them applicable for multiple users, reducing the amount of data needed to train the system, and
enhancing security and privacy. Using smart phones introduces new issues to the area of AR [4]. Sensors
provided in smart phones are not usually as accurate as those used in special devices available for AR.
Additionally, battery draining and computational complexity requirements are more crucial in mobile
devices due to limited resources [4].
From the available sensors, we use the accelerometer sensor because of its availability in mobile phones.
Therefore, in this thesis, we implement an AR method based on supervised learning of accelerometer’s
data and use multi-user data sets to evaluate our system. The results are compared with the most
common AR techniques.
Our proposed AR system can improve various applications such as localization. Activities like walking
on stairs, standing up, or being in an elevator, used along with map information of the building, provide
information regarding the location of the user and probable changes in floors. Linking a localizer and an
activity detector can thus be beneficial.
Guenbauer et al. [6] use activity classification in their indoor navigation system. Similarly, Ftrack
et al. [7] utilize activities like walking upstairs and downstairs and elevator positioning to achieve floor
detection. This type of incorporation (i.e., using multiple classes of activities) has also been done in AR
with an accuracy of 80% and has improved the localization results [8]. However, we anticipate better
results with our more reliable activity recognition system. Investigating this expectation is a suggested
subsequent work.
This thesis is organized as follows. Chapter 2 is an overview of benchmark Activity Recognition and
Wi-Fi based localization methods in the literature. In Chapter 3 we will provide an overview of basic
concepts, tools, and theories used in this thesis. Next, Chapter 4 describes our proposed accelerometer-based
activity recognition methods for smart phones. Chapter 5 presents the simulation results for our
proposed method and compares them with other common AR methods. Finally, we conclude the thesis
in Chapter 6.
Chapter 2
Previous Work
Various AR methods exist for different combinations of input type and detection algorithms. By types of
input, we mean the source of information, the number of activities, the ability to support a single user or multiple users, and so on.
Based on labelled information input, the AR system predicts the label for a new input from an unknown
activity. The algorithm for addressing the learning problem can be chosen from various supervised or
semi-supervised classification methods [1].
For the sensor selection task, we may use image and video sensors, or other sensors like motion sensors or
microphones. The main applications for vision-based activity recognition are in security, surveillance,
and improving human-computer interaction [2].
The vision-based systems either consider each frame of an activity individually or classify a sequence
of frames from an action. These frames could be from a single camera or from multiple cameras, and the
methods can be categorized based on whether there is a single user or multiple users in a picture. To
sum up, significant work has been done in the area of vision-based AR, and published research in the
computer vision literature has helped with detection accuracy. Determining the activity label is now
almost perfect for many different scenarios, as achieved in [5].
Another important area of AR methods uses motion sensors. Motion sensors like accelerometers and
gyroscopes measure linear and rotational acceleration of their movements, respectively. These sensors can
be easily mounted on a body to obtain more detailed and accurate motion information. They are also
widely available in smart phones and other mobile devices. In either case, the output of these sensing
tools can be further processed to make a predictor using machine-learning tools. Processing basically
means discovering the patterns that each activity possesses. Such a trained system can then determine
the label of activities based on information from motion sensing [4]. Since our introduced system uses
motion sensors, we will discuss it further in Section 2.1.
2.1 Sensor-based Activity Recognition
In this section, we investigate AR methods that use motion sensors. Note that the data obtained from
the sensors is in the shape of a time series, i.e., we have the reading of sensors for a number of time
instances. For different activities, the characteristics of the time series are different. Storing a number of
traces for different activities can provide us with a data set that can be used for training and testing the
system. The following is a categorization of different ways to obtain the aforementioned predictor from
this database [18]. We will compare our method with one example from each of the categories below.
2.1.1 Feature-based Classification
In feature-based techniques, we manually choose some functions that are applied to the database traces
and output static data which does not relate to time. These functions extract the features that distinguish
between various activities. For example, intuitively, the average of the norm of data from an accelerometer
should be greater for walking than for standing up, or the variance (changes in time) should be greater
for running than for walking. After assigning these numbers (mean, variance, etc.) to each trace, we
continue with classification based solely on the features.
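The feature-extraction step described above can be sketched as follows; this is a minimal illustration with synthetic traces, not the thesis's actual feature set:

```python
import math

def extract_features(trace):
    """Map a raw accelerometer trace (a list of (x, y, z) readings)
    to static, time-independent features: the mean and variance of
    the acceleration norm."""
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in trace]
    mean = sum(norms) / len(norms)
    var = sum((n - mean) ** 2 for n in norms) / len(norms)
    return mean, var

# Synthetic traces: a near-constant one (standing) and an
# oscillating one (walking).
standing = [(0.0, 0.0, 9.8)] * 50
walking = [(0.0, 0.0, 9.8 + (2.0 if i % 2 else -2.0)) for i in range(50)]

m_s, v_s = extract_features(standing)
m_w, v_w = extract_features(walking)
# The variance feature separates the two traces (v_w > v_s),
# even though both have the same mean norm.
```

A classifier then operates only on these (mean, variance) pairs, discarding the time dimension entirely.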
However, some challenges exist in using these features. In brief, some of them are:
• Defining features that capture the most useful information in databases, which are usually
application-dependent.
• Choosing proper ones (or assigning weights to them) for our special clustering or classification
method.
• Avoiding correlated features, for reasons of complexity and computational cost.
There are numerous classification methods based on static data, such as the Decision Tree family [38],
Support Vector Machines [41], and K-Nearest Neighbours [43].
Decision Tree
The following is a short overview of Decision Tree-based classification methods:
A Decision Tree (DT) classifier assumes the form of a tree, with nodes and edges, as depicted in Figure
2.1.
[Figure: a two-level decision tree. The root tests the feature “average” (< 1 leads to the leaf “standing”; > 1 leads to a test on “variance”, where < 0.1 leads to the leaf “walking” and > 0.1 leads to the leaf “running”).]
Figure 2.1: Example of a Decision Tree
Each non-leaf node tests if one feature is positive or negative (or, in a continuous case, above or below
a threshold). The root is the non-leaf node at the very top of the tree (the one that is only connected to
two edges). Each edge is a branch from a test node that leads either to another test node or to a
leaf node. Leaf nodes are tagged with class labels according to the path that connects them to the root.
Paths are consequent nodes and edges leading to a leaf node.
To establish a decision tree based on training data, we follow these steps [29]:
• Start with the feature that best splits the set of items.
• Continue finding the best feature at each test node (only for training instances that have been
partitioned).
• Stop when all of the training points that reach a node have the same label, or when all of the features
have been used along the path that reaches the current node.
The best feature for splitting the database is the feature with minimum conditional entropy: knowing
the value of that feature tells us the most about which class the data belongs to. Thus, the information
gain for a feature (B) is the entropy of the classes (C) minus the conditional entropy of the classes
given that feature:
IG(C,B) = H(C)−H(C|B) (2.1)
At the end, the feature with the greatest information gain (IG) is chosen in each step as the divider.
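As a small, self-contained sketch of Equation 2.1 for a discrete feature (the toy labels below are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(C, B) = H(C) - H(C|B): the entropy reduction obtained
    by splitting the labels on a (discrete) feature B."""
    total = len(labels)
    cond = 0.0
    for value in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == value]
        cond += len(subset) / total * entropy(subset)
    return entropy(labels) - cond

# Toy data: this feature perfectly predicts the class, so its
# information gain equals the full class entropy of 1 bit.
labels = ["walk", "walk", "run", "run"]
feature = [0, 0, 1, 1]
ig = information_gain(labels, feature)
```

An uninformative feature such as `[0, 1, 0, 1]` would instead yield a gain of 0, so the perfectly predictive feature is chosen as the divider.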
To test new data, we start from the root of the created tree and decide which edge we should follow,
based on the features of the data. We continue along these lines until we reach a leaf, whose label will be
the classifier’s recognition result. For example, assume that we have a trace of accelerometer data for an
activity with an average of 2 and a variance of 0.3. Using the classifier in Figure 2.1, we follow the left
edge of the test nodes “average” and “variance” (1 ≤ 2 and 0.1 ≤ 0.3). Then we will reach the node
“running”, which is the solution of the classifier for this trace.
One of the main advantages of DT is that it makes understandable rules. By looking at a classification
tree, we can learn which features are the most important in the classification. On the other hand, there
are some disadvantages in using a DT classifier. It is computationally expensive (NP-hard), and splitting
training data after each test node reduces the number of training data for upcoming test decisions (i.e., we
are not using all resources at each step). Moreover, DTs are not suitable to use when we have continuous
features because that will require computing proper thresholds on which to base appropriate test nodes.
2.1.2 Time Series Classification
Like static data classification, time series classification requires a learning algorithm. Most raw data
based solutions need the distance/similarity between pairs of data points. The classification algorithm
and evaluation standard are also part of the process.
Golay et al. [35] use Euclidean and cross-correlation based distances and apply the fuzzy c-means
algorithm to classify MRI brain activity images. Neural-network based approaches, as applied by A.
Wismuller et al. in [36], do not require a similarity/distance measure. As another example, Ahmed et al.
[17] have provided a novel accelerometer-based gesture recognition system using the Dynamic Time Warping
distance measure and the Affinity Propagation clustering technique.
The choice of the algorithm is determined by characteristics of the data along with complexity and
accuracy requirements of the application. The advantage of using raw data is that we do not lose
information by translating traces to features, so we do not have to define features that separate different
activities. Using raw data makes the solution more general in terms of being applicable to other sets
of activities. The disadvantage of this solution, however, is having high dimensional input (number of
samples in each time series) compared to the small number of features that can be used in a feature-based
technique. We use the time series classification approach in our proposed system.
2.1.3 Model-based Classification
This method assumes that traces from each activity have been generated based on a particular model.
In other words, a time series from each activity fits a model which, for example, could be a mixture of
probability distributions. After obtaining the model, the model parameters will function as features.
Hence when a new trace shows the same behavior as a known activity by fitting its model, it will
be labelled as that known activity. For example, Naive Bayes [30] is a simple probabilistic classifier,
explained as follows. Given features F1, ..., Fn, a probabilistic model for their associated class (C) is
P (C|F1, ..., Fn). The label for a particular set of features is found by choosing the class with
maximum conditional probability:
maximum conditional probability:
argmax_C P (C|F1, ..., Fn)
We find this model based on Bayes’ theorem:
P (C|F1, ..., Fn) = P (C)P (F1, ..., Fn|C) / P (F1, ..., Fn)    (2.2)
In the end we are looking for the class that gives the maximum P (C|F1, ..., Fn) and since P (F1, ..., Fn) is
common for different classes we just look for a class that gives the greatest P (C)P (F1, ..., Fn|C). So, we
just need to calculate the following term, and by assuming that features are conditionally independent
given the class label we will have:
P (F1, ..., Fn|C) = P (F1|C)...P (Fn|C) (2.3)
Although Naive Bayes is easy to implement and provides robust answers, its independence assumption
is not usually accurate [39].
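A minimal sketch of this decision rule for discrete features follows; the training examples and feature names are invented for illustration, not taken from the thesis:

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_tuple, class_label).
    Returns class counts (for priors P(C)) and per-feature value
    counts (for likelihoods P(F_i | C))."""
    priors = Counter(label for _, label in examples)
    likelihoods = defaultdict(Counter)  # (class, i) -> Counter of values
    for features, label in examples:
        for i, f in enumerate(features):
            likelihoods[(label, i)][f] += 1
    return priors, likelihoods, len(examples)

def classify_nb(features, priors, likelihoods, total):
    """argmax_C P(C) * prod_i P(F_i | C), assuming conditional
    independence of the features given the class."""
    best, best_score = None, -1.0
    for label, count in priors.items():
        score = count / total
        for i, f in enumerate(features):
            score *= likelihoods[(label, i)][f] / count
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training set: (high_mean, high_variance) -> activity.
data = [((1, 1), "run"), ((1, 0), "walk"), ((1, 1), "run"), ((0, 0), "stand")]
priors, lik, n = train_nb(data)
pred = classify_nb((1, 1), priors, lik, n)  # "run"
```

A production implementation would add smoothing for unseen feature values and work in log-probabilities; both are omitted here for clarity.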
Chapter 3
Preliminaries
In this chapter, we provide an overview of some of the techniques used in our proposed AR system
along with the implemented localization network. We go over Affinity Propagation clustering, which is a
primary stage for classification. The measure we use for clustering is Dynamic Time Warping Distance.
Other concepts used in the proposed system, such as Random Projection and Multiple Classification
System, are described in this chapter as well. The details of the proposed method are explained in
Chapter 4.
3.1 Dynamic Time Warping Distance
There are many different measures for calculating distances between time series. The choices vary from a
very simple one, like the Euclidean distance, to more complex measures, like the Kullback-Leibler divergence [18].
One way to define distances among time series (traces) is by finding the minimum distance between
those traces by matching their samples. This type of distance has many applications in speech recognition
[15], gesture recognition [16, 17] and time series clustering [18].
Some benefits of this type of measure are [18]:
• Minimizes the effect of shifting and time-scale changes on the distance.
• Allows for different local elasticities (i.e., if a trace is squeezed in some parts and loosened in others,
the distance will not change).
To calculate this kind of distance, assume that we have two traces, u and v. Let lu and lv indicate
the lengths of these traces, respectively. In order to find the minimum distance, we match samples of
traces together and sum up over the distances between pairs of samples. The mapping from one trace to
another is shown by vector P . The entries of P are pairs of samples that have been mapped to each
other.
P (u, v) = {p1, p2, ..., pL} (3.1)
pl = (i, j), 1 ≤ l ≤ L, 1 ≤ i ≤ lu, 1 ≤ j ≤ lv (3.2)
where pl = (i, j) means that ui has been mapped to vj. L is the length of the path that maps samples
of u to samples of v.
The mapping between the traces should have certain properties. The most important one is to be
monotonic, meaning that if pl = (i, j) and pl+1 = (i′, j′) we should have i ≤ i′ and j ≤ j′. The concept
of monotonicity essentially means to maintain the time order while matching samples.
There is an optimal monotonic path that matches the samples of u and v together in a way that the
sum of the distances between the mapped samples is minimized. If the traces are short, we can search
among all possible matchings to find the one that gives the minimum distance.
However, for long traces, this search could be very costly in terms of calculation. To solve this problem,
the solution is to use dynamic programming and find the next matching in the path using the current
matching, as done in the Dynamic Time Warping (DTW) method [10]. Using dynamic programming does not
always give the optimal result, because it does not use all of the information about the samples at the
same time. However, owing to its low complexity, it has been widely used to find the distance between time series.
To see how the matching of samples of one trace to another is done in a dynamic way, assume we want
the distance between u and v:
Step 1: Build a lu × lv matrix called D. Entry Di,j of this matrix represents the distance between
sample i and j from u and v, respectively. To find the distance between the numbers ui and vj (since they
are just numbers [static]), we can use Euclidean, absolute value of difference, or any distance measure
function that outputs non-negative distances. If we choose to utilize Euclidean, then:
Di,j = (ui − vj)2 (3.3)
Now we have the distances between all pairs of samples of the traces, collected in the matrix D.
Step 2: Compute DTWi,j for all i and j as shown in Algorithm 1. Note that DTWi,j is the minimum
distance between truncated u and v (i.e., sequence of u1, ..., ui and sequence of v1, ..., vj). At the end the
total DTW distance between traces u and v is:
DTW (u, v) = DTWlu,lv (3.4)
Initialization: i = 1, j = 1, DTW(0, 0) = 0, DTW(1 : lu, 0) = ∞, DTW(0, 1 : lv) = ∞
while j ≤ lv do
    while i ≤ lu do
        DTW(i, j) = D(i, j) + min{DTW(i, j − 1), DTW(i − 1, j), DTW(i − 1, j − 1)}
        i = i + 1
    end
    i = 1
    j = j + 1
end
Algorithm 1: Calculation of DTW distance
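The recursion of Algorithm 1 transcribes almost directly into code. The sketch below (for 1-D traces, using squared differences as the local distance) is an illustration, not the thesis's implementation:

```python
def dtw_distance(u, v):
    """Dynamic Time Warping distance between two 1-D traces
    (Algorithm 1), using squared differences as the local
    distance D(i, j)."""
    lu, lv = len(u), len(v)
    INF = float("inf")
    # dtw[i][j]: minimum cost of matching u[:i] with v[:j];
    # row 0 and column 0 encode the boundary conditions.
    dtw = [[INF] * (lv + 1) for _ in range(lu + 1)]
    dtw[0][0] = 0.0
    for i in range(1, lu + 1):
        for j in range(1, lv + 1):
            d = (u[i - 1] - v[j - 1]) ** 2
            dtw[i][j] = d + min(dtw[i][j - 1],       # advance in v only
                                dtw[i - 1][j],       # advance in u only
                                dtw[i - 1][j - 1])   # advance in both
    return dtw[lu][lv]

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # a time-shifted copy of a
c = [5.0, 5.0, 5.0, 5.0, 5.0]
# Warping absorbs the shift: dtw_distance(a, b) is 0,
# while dtw_distance(a, c) is much larger.
```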
DTWlu,lv is actually the sum of distances of pairs of samples that lead to a minimum total distance.
Each mapping can be represented by pl = (i, j) which shows sample number i from u has been matched
with sample number j from v. The vector of these mappings is P , which contains all pairs of samples that
lead to DTW (u, v). In order to find the path (P (u, v) = {p1, p2, ..., pL}) that results in the total distance
mentioned in Equation 3.4, we perform the following steps:

Data: DTWi,j for 0 ≤ i ≤ lu, 0 ≤ j ≤ lv
Result: the path P
Initialization: set pL = (lu, lv), (i, j) = (lu, lv), l = L
while i > 1 or j > 1 do
    pl−1 = argmin over (i′, j′) ∈ {(i − 1, j), (i − 1, j − 1), (i, j − 1)} of DTWi′,j′
    (i, j) = pl−1
    l = l − 1
end
Algorithm 2: Finding the path that results in DTW(u, v)

From the algorithm, we can see that DTW puts two more constraints on P (other than monotonicity).
• Boundary conditions: The warping path starts from first sample of each trace and ends with the
last samples of each, p1 = (1, 1) and pL = (lu, lv).
• Continuity (Step Size): Given pl = (i, j) and pl+1 = (i′, j′), we should have: i ≤ i′ ≤ i+ 1 and
j ≤ j′ ≤ j + 1.
Adding constraints is a typical step in defining distances. As DTW gives the minimum distance for a
path that meets these constraints, we might obtain smaller distances by allowing other step sizes and
different starting and ending matched pairs. However, we are not interested in finding the
unconstrained minimum because it introduces much higher computing costs.
3.2 Affinity Propagation
Affinity Propagation (AP) is a clustering method used mainly in pattern recognition to distinguish between different
trends in sample traces collected by sensors [20]. The head of a cluster, which represents that cluster, is called
the exemplar of that group. Because AP determines the number of clusters without initialization, all of the
data points have the same probability of being chosen as an exemplar [11]. The process of AP clustering
is explained as follows.
Assume that we have a training data set of size Q to be clustered. To output clusters, AP requires
the similarities between different pairs of data points. For the self similarities (i.e., the similarity
of each data point with itself), we usually input the same value (e.g., the median of the pairwise similarities)
to the algorithm. Optionally, the algorithm can be provided with different self similarities when there are
different preferences for each data point to be an exemplar.
Utilizing these pairwise similarities S(p, q), including the self similarities S(p, p), AP exchanges
messages called Availability (A) and Responsibility (R) in an iterative manner, as follows.

First, we initialize the Availability and Responsibility matrices (i.e., the matrices of pairwise availabilities
and responsibilities) with zeros.

Second, we update R and A for all pairs of data points p and q using the following equations
until they converge:

R(p, q) = S(p, q) − max_{q′≠q} {A(p, q′) + S(p, q′)}   (3.5)

A(p, q) = min{0, R(q, q) + Σ_{p′∉{p,q}} max{0, R(p′, q)}}   (3.6)

A(q, q) = Σ_{p′≠q} max{0, R(p′, q)}
A(p, q) indicates how appropriate it would be for p to choose q as its exemplar, taking into account the
support from other points for q being an exemplar. Similarly, R(p, q) indicates how well-suited q is to
serve as the exemplar for p, compared to other candidate exemplars [11].
Next, using A and R matrices, AP finds groups and exemplars as explained in Algorithm 3.
Data: Responsibilities and Availabilities
Result: Exemplars and their clusters
Ep = arg max_{q∈Q} {A(p, q) + R(p, q)}
if Ep = p then
    trace number p is an exemplar
end
Algorithm 3: Identification of exemplars for each data point
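The update rules (3.5)–(3.6) and the exemplar rule of Algorithm 3 can be sketched in pure Python. This is a minimal sketch; the damping factor (added for numerical stability) and the hand-picked preference in the test below are our choices, not the thesis':

```python
def affinity_propagation(S, max_iter=200, damping=0.5):
    """Minimal AP sketch following Eqs. (3.5)-(3.6); S is a Q x Q similarity
    matrix whose diagonal S[p][p] holds the preferences."""
    Q = len(S)
    R = [[0.0] * Q for _ in range(Q)]
    A = [[0.0] * Q for _ in range(Q)]
    for _ in range(max_iter):
        # Responsibility update, Eq. (3.5), with damping.
        newR = [[S[p][q] - max(A[p][k] + S[p][k] for k in range(Q) if k != q)
                 for q in range(Q)] for p in range(Q)]
        R = [[damping * R[p][q] + (1 - damping) * newR[p][q]
              for q in range(Q)] for p in range(Q)]
        # Availability update, Eq. (3.6); the diagonal is the self-availability.
        newA = [[0.0] * Q for _ in range(Q)]
        for q in range(Q):
            pos = [max(0.0, R[p][q]) for p in range(Q)]
            total = sum(pos)
            for p in range(Q):
                if p == q:
                    newA[q][q] = total - pos[q]
                else:
                    newA[p][q] = min(0.0, R[q][q] + total - pos[p] - pos[q])
        A = [[damping * A[p][q] + (1 - damping) * newA[p][q]
              for q in range(Q)] for p in range(Q)]
    # Algorithm 3: each point's exemplar maximizes A + R.
    E = [max(range(Q), key=lambda q: A[p][q] + R[p][q]) for p in range(Q)]
    exemplars = sorted({e for p, e in enumerate(E) if e == p})
    return E, exemplars
```

With similarities built from negative squared distances and a common preference on the diagonal, two well-separated groups of points yield two exemplars, one per group.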
3.3 Random Projection
Random projection is a method for dimensionality reduction [12]. From an algebraic point of view, any
matrix with fewer rows than columns, when multiplied by a vector, reduces the size of
that vector. However, in reducing the size of vectors, we do not want to lose the characteristics of those
vectors. In other words, if two vectors are close to each other in their initial space, we want
them to be close after the mapping as well. Our projection thus has to satisfy a condition known as the
Restricted Isometry Property, which is discussed later in this section.
Assume that we have a trace u with length lu. A Random Projection (RP) of u is obtained
by multiplying it by a random matrix that satisfies the RIP, as explained later in this section. If we name the random
matrix G, with size f × lu (f ≪ lu), it is multiplied by trace u written as a column vector. This
calculation results in the mapped trace ū, as shown below:

ū_{f×1} = G_{f×lu} u_{lu×1}   (3.7)
Each element of a random matrix is a random value drawn from a given distribution. For
example, if we have a random matrix with a standard normal distribution, an element takes the value g
with probability density (1/√(2π)) e^{−g²/2}.
Using RP, one usually aims to reduce the number of samples in a vector; hence, we usually want
f ≪ lu. So, the result of this mapping, ū, is a reduced-dimension version of u. The more a signal can
be down-sampled, the lower f can get. A measure of the intrinsic dimension of a signal is its
Sparsity Level; the intrinsic dimension shows how short the mapped signal can be without losing
much information. The definition of the sparsity level of a signal is explained in [21].
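A sketch of Eq. (3.7) in pure Python follows. The 1/√f scaling, which keeps norms roughly unchanged on average, is our normalization choice, and the function name is ours:

```python
import math
import random

def random_projection(u, f, seed=0):
    """Project vector u (length l_u) to f dimensions with a Gaussian random
    matrix G, scaled by 1/sqrt(f) so that norms are preserved on average."""
    rng = random.Random(seed)
    l_u = len(u)
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(f) for _ in range(l_u)]
         for _ in range(f)]
    # Matrix-vector product: u_bar = G u.
    return [sum(G[i][j] * u[j] for j in range(l_u)) for i in range(f)]
```

Projecting two nearby traces with the same G (same seed) keeps their distance approximately unchanged, which is the property the RIP formalizes.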
Restricted Isometry Property
The Restricted Isometry Property (RIP) is a relaxed form of the orthonormality property for mapping sparse signals
that guarantees energy preservation after mapping. When G is multiplied by a sparse vector v,
G performs as an approximately orthogonal matrix for projecting v. When the RIP holds for G, a small
constant ∆_v (related to the sparsity level of v) exists for which:

(1 − ∆_v)‖v‖₂² ≤ ‖Gv‖₂² ≤ (1 + ∆_v)‖v‖₂².   (3.8)
To the best of our knowledge, there exists no easy approach to check whether the RIP holds
for a particular matrix; this is an NP-hard problem in general. On the other hand, [14] discusses
that some particular classes of matrices, including Gaussian matrices, satisfy the RIP with exponentially growing
probability when the number of rows grows linearly with the sparsity level of v.
3.4 l1 minimization
Assume that we have an under-determined linear system.
t = Φ θ
When Φ has more columns than rows, this equation has infinitely many solutions for θ. If for
some reason one is interested in the sparsest θ (or θ_sparse) among those infinitely many answers, this restriction
yields a unique solution. Here, sparsity is a measure of the proportion of zero elements in a vector
relative to the number of all entries; the sparsest solution is the one with the fewest non-zero entries. In order
to ensure that we find a proper sparse answer, we must have an orthogonal Φ. We make Φ orthogonal by multiplying the above equation
by a pre-processing factor called W:

W = orth(Φᵀ)ᵀ Φ†   (3.9)

where Φ† is the pseudo-inverse of Φ, orth(Φ) denotes an orthogonal basis for Φ, and ᵀ stands for
transpose. So, after multiplying our original equation by W, we have:
Wt = orth(Φᵀ)ᵀ θ   (3.10)
Now, having Ψ = orth(Φᵀ)ᵀ and τ = Wt, we are able to look for a sparse solution among the solutions of
t = Φθ.
Since sparsity and the value of the norm are inversely related, we can minimize a p-norm of the vector θ
to find the sparsest answer among the solutions for θ [13].
Moreover, since the constraint is linear, minimizing the 1-norm (the tightest convex surrogate for
sparsity) gives the sparsest result, and the l1 minimization problem becomes:
θsparse = arg min ‖θ‖1 s.t. τ = Ψθ (3.11)
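A common way to solve Eq. (3.11) is to recast it as a linear program by splitting θ into its magnitude bounds. The sketch below uses SciPy's `linprog`; the variable-splitting formulation is standard, but the helper name is ours:

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(Psi, tau):
    """Solve theta_sparse = argmin ||theta||_1  s.t.  Psi @ theta = tau
    (Eq. 3.11), recast as an LP with slack variables s >= |theta|."""
    m, n = Psi.shape
    # Variables z = [theta; s]; minimize sum(s).
    c = np.concatenate([np.zeros(n), np.ones(n)])
    # theta - s <= 0 and -theta - s <= 0 enforce s >= |theta|.
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * n)
    # Equality constraint Psi @ theta = tau (s is unconstrained here).
    A_eq = np.hstack([Psi, np.zeros((m, n))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=tau,
                  bounds=(None, None))
    return res.x[:n]
```

By LP optimality, the returned θ satisfies the constraint and has an l1 norm no larger than that of any other feasible vector, including the sparse vector that generated τ.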
![Page 22: by Zhino Youse - University of Toronto T-Space · Introduction Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in relevant industries](https://reader036.fdocuments.in/reader036/viewer/2022070711/5ec979be972d73389648dbf5/html5/thumbnails/22.jpg)
Chapter 3. Preliminaries 14
3.5 Combination of Multiple Classification Results
When we apply a machine-learning method to noisy data, the results will be corrupted, depending on
how sensitive the classifier is to noise. In [9], it was shown how combining the decisions of Multiple
Classifiers (MC) can yield a more accurate and robust final result. In most cases, a combined answer
outperforms the individual accuracy of each classifier. Here is a brief overview of four common MC
techniques:
Majority Voting (MV)
In this method, we choose the mode of the different classification results. Thus, among the different
answers, we take the most frequent one [22].
Weighted Majority Voting (WMV)
The same idea as MV applies here, but voting is done after each decision is weighted in proportion to its
confidence. Confidence is defined and computed based on the type of classifier. For example, probabilistic
classifiers use the probability of the final answer as their confidence in that solution [24].
Most Confident Classifier (Naive Bayes)

In this method, we simply choose the classification result that has the highest confidence in its decision
[25]. Hence, the solution with the highest posterior probability will be chosen.
Behaviour-Knowledge Space (BKS)
In the training stage, we create a look-up table from the outputs of the different classifiers' decisions and the
actual labels for those data points. Next, in the testing stage, when we obtain a combination of decisions
from our classifiers, we find the corresponding label based on the aforementioned table [26].
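The first three combiners can be sketched as follows (a minimal illustration; function names are ours):

```python
from collections import Counter

def majority_vote(labels):
    """MV: take the most frequent label among the classifiers' outputs."""
    return Counter(labels).most_common(1)[0][0]

def weighted_majority_vote(labels, confidences):
    """WMV: each vote is weighted by the classifier's confidence."""
    scores = Counter()
    for label, conf in zip(labels, confidences):
        scores[label] += conf
    return scores.most_common(1)[0][0]

def most_confident(labels, confidences):
    """Most Confident Classifier: keep the single highest-confidence decision."""
    return max(zip(confidences, labels))[1]
```

Note that WMV and MV can disagree: a single high-confidence vote can outweigh a numerical majority of low-confidence votes.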
![Page 23: by Zhino Youse - University of Toronto T-Space · Introduction Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in relevant industries](https://reader036.fdocuments.in/reader036/viewer/2022070711/5ec979be972d73389648dbf5/html5/thumbnails/23.jpg)
Chapter 4
Proposed Activity Recognition
System
In this chapter, we propose an activity recognition system that uses a 3-axis accelerometer to address the
user-independent activity detection problem. Activity recognition has applications in human-computer
interaction, health care, indoor localization, and tracking. Different activities result in different trends of
acceleration in x, y and z axes. Utilizing machine learning tools, the system learns special characteristics
associated with a particular action.
However, learning these properties poses some challenges. First, traces from the same activity
might show different behaviours for various users, which makes the learning process more challenging and
complicated. Second, collecting traces for training purposes or updating the database is costly, in that
it requires users to gather data while performing different activities and to label the data manually. Our
system not only addresses the user dependency problem but also reduces the number of required training
traces. The solution is proposed in this chapter, while the simulation results are provided in Chapter 5.
Moreover, depending on the orientation of the phone and where it has been placed, the recorded
acceleration signals vary. Hence, we consider two different scenarios: fixed phone orientation and unfixed
phone orientation. In the fixed phone scenario, the phone will not be rotated with respect to the user’s
body for various traces. For the unfixed phone scenario, the orientation might vary from one trace to
another. The orientations are determined by the axis that is aligned to gravity, the axis vertical to
walking, and the axis aligned to the walking direction. Sections 4.1 and 4.2 describe the set-ups for each
problem, along with approaches for addressing each.
Chapter 4. Proposed Activity Recognition System 16
[Figure 4.1: Supervised learning algorithm — block diagram: training set (traces with labels) → learning algorithm → predictive model; testing set (traces without labels) → predictive model → expected labels]
4.1 Fixed Phone Orientation Scenario
In this section, we provide a general overview of the fixed phone orientation problem and how we address it. In
this scenario, the rotation of the phone is fixed for all traces in different classes. The main application for
this method is AR using wearable sensors, but smart phones mounted on the body or kept at a
fixed orientation also fit this method.
The general solution for an activity recognition problem is a supervised learning system that leads to
a classification. In other words, the system learns the trends of each activity class and categorizes a new
trace as its associated class. A general classification algorithm is shown in Figure 4.1.
The training traces are collected and labelled by users. These are then used to build a model that
finds the activity label for a new trace, as illustrated in Figure 4.1. The traces are the consecutive linear
acceleration measurements of a user’s phone in x, y and z axes for different activities. As discussed in
section 2.1.2, we use a raw time series in our approach.
Assume that we have labelled traces from different activities. Each trace consists of consecutive
readings from three axes of the accelerometer sensor equipped inside mobile phones. A trace u is a matrix
with lu columns and four rows. The first three rows represent acceleration data from x, y and z directions
and the last row is the corresponding sampling time for its column. lu is the number of samples for that
recording of the accelerometer. We have shown examples of these traces for two different activities in Figure 4.2.
Note that these traces have been collected by different users and might even have different lengths.
Our method solves most problems associated with raw data classification, such as traces with
different lengths and high dimensional input.
Moreover, users may walk at different paces, which makes some traces look more stretched than
Figure 4.2: Traces for two different activities
others. These traces are also asynchronous, meaning that the value at a given point could be totally different for
two traces from the same activity. For example, one walking trace might start from a peak positive
acceleration in one axis, while another walking trace might start from a negative value. We address the
asynchronous traces, along with the problems of length and elasticity differences, by introducing a modified
version of the DTW distance. We address the high dimensionality problem with Random Projection, which reduces
the number of samples in each trace, as discussed later in this section.
Since we are dealing with time series and aim to use them as raw data, we have to design a classifier
for unprocessed traces. Time series classification, like static data classification, needs a learning algorithm
to learn characteristics of each activity. The choice of the algorithm is determined by characteristics of
the data, along with the complexity and accuracy requirements of the application. The learning and
testing algorithms for the fixed phone scenario are described in sections 4.1.1 and 4.1.2.
4.1.1 Learning Algorithm
In this thesis, for the fixed phone scenario, the training process is done in three consecutive steps, as
illustrated in Figure 4.3.
First, training traces should be smoothed to filter high frequency noise. Then we group all traces
to different clusters. Regarding the measure used for clustering, we use a variation of Dynamic Time
Warping Distance. We explain each step in further detail in this section.
Figure 4.3: Training Phase
Smoothing
The filter used in our system is a very basic moving average, with a window sliding one sample at a time along
each axis of each trace. This filter assigns the simple average of the samples in each frame to the first point
of the window.

The more we want to filter out high frequency noise, the larger the window size of the filter needs
to be. However, a large filter window introduces latency in the signal path. Therefore, there is
a trade-off between noise reduction and the waiting time from the first sample's arrival to the first smoothed
sample being created. Furthermore, significant patterns in the signal should remain after filtering. The window
size is basically found through cross-validation.
There may, however, be cases where the induced latency is not acceptable for that particular application.
In these instances, other filters, like the Double Moving Average Filter or Low Pass Filter, may be used
instead.
We apply the same filter to every trace in order to remove noise and rapid changes due to sensor
faults.
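The smoothing step above can be sketched as follows (a minimal version that assigns each window's mean to the window's first sample, as described; the function name is ours):

```python
def moving_average(axis_samples, window):
    """Slide a window one sample at a time along one axis of a trace and
    assign the mean of each window to the window's first sample."""
    n = len(axis_samples)
    return [sum(axis_samples[i:i + window]) / window
            for i in range(n - window + 1)]
```

The output is shorter by window − 1 samples, which is the latency cost of smoothing discussed above.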
Modified Dynamic Time Warping Distance
A key factor in the clustering process is choosing the right similarity measure to cluster on. The
chosen distance must handle traces with different lengths and elasticities. So, the goal here is to find a metric that is
small within the traces of each class and large between traces from different classes. Selecting the right
similarity measure will assist the clustering stage.
The simplest way to accomplish this could be to sum the Euclidean distances between the samples of
two traces, but this approach does not work for traces of different lengths and is affected by shifts in the traces. Other
options, such as the Short Time Series (STS) distance [44], compare the slopes at each point instead of
the actual values, which improves on the simple Euclidean distance. Nonetheless, this is also not the best
measure for our case, because each trace is assumed to be a piecewise linear function and constraints
are therefore placed on the lengths of the traces. There are a few other measures with their own assumptions on the
data, namely probability-based distance functions [45] such as the J divergence and the symmetric Chernoff information
[46, 47]. Each of these approaches likewise has its pros and cons. Figure 4.4 shows a brief overview of
distance measures.

[Figure 4.4: Different Distance Measurement Methods — (a) feature-extraction distances: Euclidean distance, Pearson's correlation coefficient related distance, Minkowski distance, ...; (b) time-series distances: short time series distance, Dynamic Time Warping distance, dissimilarity based on the cross-correlation, ...]
As stated earlier, DTW easily deals with different lengths and local scales of our traces, and is widely
used for discrete sequences of continuous values. Therefore, among all of the measures discussed above,
DTW is the best choice for our traces. That being said, we still need to modify basic DTW to better
match our requirements. So, in this section, we explain our Modified Dynamic Time Warping Distance
(MDTW) and show how it outperforms the regular DTW.
From section 3.1, we remember that to find the DTW distance between two vectors, we first find the
matrix (D) of pairwise Euclidean distances between samples of two traces. Then, using D, we find the
DTW matrix using Algorithm 1. Next, using Algorithm 2, the matching path (P ) is discovered.
Assume that we have the accelerometer reading along axis x for trace u and trace v, as shown in
Figure 4.8. An example of a DTW matrix is also illustrated in Figure 4.5. Here, colors show the distance
between each pair of samples of traces u and v. The dark path shows the samples that have been matched
together. The vector of local cost values (Clocal) along this path is illustrated in Figure 4.6.
As explained previously, the DTW algorithm is initialized so that the first mapping is
between the first samples of the two vectors and the last mapping is between their last samples.
This may cause large distances in the very first and last mappings, as shown in Figure 4.9. Even
when two vectors are similar up to a constant shift, the algorithm needs some iterations to find
the proper mapping. Hence, some of the first and last local distances may be large, but these
values do not reflect the difference between the traces.
To see how total distance could be affected by DTW constraints, we use an example of two walking
Figure 4.5: DTW matching matrix and path for two traces from activities “walking downstairs” and “walkingupstairs”
Figure 4.6: Local DTW distances along matching path for u and v
Figure 4.7: Accelerometer readings of axis x for two walking traces
Figure 4.8: DTW matching path for two traces from activity “walking”
traces, showing how their samples have been mapped and indicating the local costs along the matching
paths in Figures 4.7, 4.8 and 4.9, respectively. To depict the pairwise matching in Figure 4.8, one trace
has been shifted up by 1 m/s2 to make the matching more visible.
Asynchronous starts, such as those in Figure 4.7, may lead to large distances, while traces in the same
class should have a small distance. However, as mentioned in section 3.1, in the basic DTW method the
total distance is the summation of the distances between the pairs of samples that have been matched
together. We can thus use alternative methods to find the total distance based on Clocal, avoiding the
addition of large values at the beginning and end of the cost vector.
Therefore, we modified DTW to only sum up local distances in the middle part of Clocal. In detail,
instead of Clocal, we use a truncated Cα,β , for which α is the index of the first point at which matching
Figure 4.9: Local DTW distances along matching path for two walking traces
is done, and β is the index of the last point at which matching is done. To select α and β, we look for
the first and last time steps whose sample values are lower than the average of all sample values in the
same trace.
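One reading of this truncation rule, applied to the local-cost vector, can be sketched as follows (the exact selection rule for α and β may differ in detail; the helper name is ours):

```python
def mdtw_from_local_costs(c_local):
    """MDTW total: sum only the middle part C_{alpha,beta} of the local-cost
    vector, where alpha/beta are the first/last indices whose cost falls
    below the mean of the vector."""
    mean_c = sum(c_local) / len(c_local)
    below = [i for i, c in enumerate(c_local) if c < mean_c]
    if not below:  # degenerate case: nothing below the mean, keep everything
        return sum(c_local)
    alpha, beta = below[0], below[-1]
    return sum(c_local[alpha:beta + 1])
```

For a cost vector with large values only at its two ends, the truncated sum keeps just the well-matched middle, as intended.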
We can also avoid fixing the beginning and ending of Clocal to the first and last samples by choosing,
among all possible mappings of a few samples at the beginning and at the end, the mapping with the
minimum distance. However, this approach still suffers from high distances at the two ends of the
distance trace, and as we increase the number of samples that we search, the complexity increases
exponentially.
So, we modify DTW to sum up local distances only for the mappings in the middle of Clocal (the
mappings that match corresponding samples with each other). Hence, in MDTW, the total distance
reflects the difference between activities and is not inflated by the different phases at the beginning of
traces from the same activity (asynchronous starts).
To see how this modification helps the clustering, we compared DTW and MDTW distances. The
dissimilarities are between pairs of traces from the same class and pairs of traces from two different
classes. Figures showing this comparison are provided in Chapter 5.
Note that, in the Fixed Phone Scenario, we can use the information of each axis individually
and thus can find the distance between each axis of a trace separately. At the end, we will have
three different distances for x, y and z of each trace (i.e., for traces u and v, the outputs are:
MDTW (ux, vx), MDTW (uy, vy) and MDTW (uz, vz)). To sum up, we deploy a modified version
of DTW distance to find the pairwise distances between acceleration traces along axes x, y and z.
From Distance to Similarity Measure
Regardless of which scenario we are considering, the output of the previous step (i.e., MDTW distances)
should pass through a function to be converted to a similarity. There are many ways to do this, the simplest of
which is to use the negative of the distances. The most common approach for conversion is through this function:
S(u, v) = 1 / (1 + MDTW(u, v))   (4.1)
The particular function to choose depends on the problem set. In our system, we use this equation:
S(u, v) = exp(−MDTW (u, v)) (4.2)
By applying this function to distances in each axis, we come up with three similarity measures.
Calculating this similarity for all pairs of training traces will lead to three similarity matrices (Sx, Sy
and Sz). These similarity matrices are utilized by the clustering algorithm, as explained in the following
section.
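Applying Eq. (4.2) entry-wise to a matrix of pairwise MDTW distances for one axis can be sketched as (the function name is ours):

```python
import math

def similarity_matrix(mdtw):
    """Eq. (4.2): S(u, v) = exp(-MDTW(u, v)), applied entry-wise to the
    matrix of pairwise distances for one axis."""
    return [[math.exp(-d) for d in row] for row in mdtw]
```

Identical traces (distance 0) map to similarity 1, and larger distances decay smoothly toward 0, which keeps all similarities on a bounded scale for the clustering step.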
Affinity Propagation Clustering
After finding similarities, we feed the similarity matrix into a clustering algorithm to group the traces
with similar behaviours. We need an approach that deals with raw data [18].
Some clustering methods, such as K-means, K-median, fuzzy c-means and genetic clustering, need to
know the number of groups beforehand [48]. They iteratively improve a random initial set of exemplars, so
their results depend on the initialization. Unlike these methods, in Affinity Propagation, clusters emerge
naturally. In other words, AP provides a general solution that does not rely on manually set parameters
and initializations. Affinity Propagation considers all data points as possible exemplars.
Then, by passing real-valued messages, it outputs a number of clusters and their exemplars. When
applying AP, there is no need to initialize the exemplars, but the number of clusters can be made larger
or smaller by modifying a factor in the AP process called the preference. We use the median of the similarities
for this factor.
Ideally, traces in each cluster share at least one common characteristic. This feature will be used to
find the labels for unknown traces, as explained in Section 4.1.2.
[Figure 4.10: Final output of learning algorithm — Hx clusters along axis x (Cluster 1, Cluster 2, Cluster 3, ..., Cluster Hx), each with an exemplar E1x, E2x, E3x, ..., EHx]
We apply AP to all three similarity matrices derived from MDTW(ux, vx), MDTW(uy, vy)
and MDTW(uz, vz). The result is three sets of clusters consisting of Hx, Hy and Hz clusters,
respectively. Moreover, each cluster in each set has an exemplar; the exemplar of cluster Ch, 1 ≤ h ≤ Hx,
is denoted Exh. Figure 4.10 illustrates a schematic for the final output of the learning algorithm.
4.1.2 Testing Algorithm
In the testing stage, we are interested in finding the label of a trace from the testing set, namely t. The
testing process is exactly what we do in the on-line phase with new traces from potentially new users.
First, we pass t through a moving average filter. Then, using the output of the previous stage (clusters
and head-clusters), we find the label for the new trace t. An overview of the testing stage in this scenario
is shown in Figure 4.11.
As depicted, the label recognizer has two main steps. In the first step, the set of exemplars that is
closest to the observed data is selected. In the second step, the goal is to find the label for t based on the
members of the clusters chosen in the first step. The details of these processing steps are provided in the
rest of this section.
[Figure 4.11: Testing Process (fixed phone scenario) — a new unlabeled trace passes through the moving average filter; the smoothed testing trace is compared to the clusters from training via the MDTW similarity measure; the similar clusters are found, their intersection over x, y, z is taken, and a majority vote recognizes the label]
Finding the Set of Similar Traces to Search In
In our approach, similar to [17], we find the exemplars close to the testing trace t by defining a threshold
DTWth:
DTWth = ρ × min_{1≤h≤H} {DTW(Eh, t)}   (4.3)
where ρ is a constant usually chosen from 1 ≤ ρ ≤ 3. After finding the similarity threshold
DTWth, the closest exemplars are those whose distance to the on-line trace is smaller than DTWth.
Having found the similar exemplars, the union of their corresponding clusters forms the similar set, our future
search space:
K = {Ch | DTW (Eh, t) ≤ DTWth, 1 ≤ h ≤ H} (4.4)
When our on-line trace t appears to be similar to an exemplar from the training stage, we can then
conclude that this test trace and the exemplar’s cluster have a common characteristic. To sum up, the
label for the new trace will be determined based on traces in clusters that have exemplars close to trace t.
Finding the closest exemplars helps to remove outliers and shrinks the size of the search space, which
consequently reduces the computational complexity and costs in the next stage.
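Equations (4.3) and (4.4) for one axis can be sketched as follows (a minimal illustration; the function and variable names are ours):

```python
def resemblance_set(exemplar_distances, clusters, rho=2.0):
    """Eqs. (4.3)-(4.4): keep the clusters whose exemplar's distance to the
    testing trace is within rho times the minimum exemplar distance."""
    th = rho * min(exemplar_distances)  # DTW_th, Eq. (4.3)
    return [c for d, c in zip(exemplar_distances, clusters) if d <= th]
```

With ρ = 2, exemplars at distances 1.0 and 1.5 fall under the threshold 2.0 while an exemplar at 5.0 is excluded, shrinking the search space for the next step.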
In the fixed phone scenario, each axis gives independent information regarding the user's movements,
and we use this diversity in our system. Assume the output similar sets for the different axes are Kx, Ky
and Kz. In other words, we repeat the process in Equations (4.3) and (4.4) for each of the three axes
of t separately.
The process is visualized in Figure 4.12 for one axis; the same procedure is applied to the other axes of the traces. In this figure, MDTW(Ex2, tx) is the minimum among all the MDTW(Exk, tx)'s. Assume that MDTW(Ex3, tx) is also lower than the threshold computed in Equation (4.3). Then, from Equation (4.4), we conclude that clusters 2 and 3 have members similar to t in axis x, and Φx in this example is the union of these two clusters.
We then use these three similar sets (for the various axes of t) to find the label of t, as explained in the next section.

Figure 4.12: Choosing resemblance set (Φx). The testing trace tx is compared via DTW(Exk, tx) against the exemplars of clusters 1 through Hx; the clusters whose exemplars fall below the threshold form the search space.
Recognition Based on Labels of Traces in the Similar Spaces of t

The key idea is to find the traces common to Kx, Ky and Kz; their intersection is O. Every trace in O has each of its axes included in the corresponding search space, so a trace that is similar to t in all three axes probably comes from the same activity. However, not all traces in O necessarily carry the same label. Since the label of each time series in O is the outcome of a classifier, we can merge these answers to obtain a more accurate solution: the most frequent label among the traces in O is taken as the label of t.

However, if we have different numbers of training traces from various activities, we can instead choose the most confident answer (i.e., the trace that has the minimum MDTW distance to t) or use the BKS method to obtain unbiased results.
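A minimal sketch of this intersection-and-majority-vote step. Trace ids and activity names are illustrative, not from the thesis:

```python
from collections import Counter

def recognize(K_x, K_y, K_z, labels):
    """Majority vote over traces similar to t in all three axes.
    K_x, K_y, K_z: sets of trace ids; labels: id -> activity name
    (ids and names here are illustrative assumptions)."""
    O = K_x & K_y & K_z          # traces whose every axis matched
    if not O:
        return None              # no trace is similar in all three axes
    votes = Counter(labels[i] for i in O)
    return votes.most_common(1)[0][0]
```

As the text notes, plain majority voting is biased when classes have unequal numbers of training traces; in that case the minimum-distance or BKS alternative should replace the `Counter` vote.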
4.2 Unfixed Phone Orientation Scenario
As mentioned previously, we have two scenarios, and the details of the training steps and the entire testing algorithm differ for each. In one, the phone's position is fixed for all traces (e.g., held in the hand or mounted on the body); in the other, the phone may rotate from one trace to another. The problem is almost identical to the Fixed Phone case, except that here the phone orientation differs from trace to trace. Hence, we cannot use the information from each axis independently, and we propose a separate solution for this latter scenario.
4.2.1 Learning Algorithm
The overall learning algorithm is almost the same as for the fixed phone case. The learning process has
three main steps, all of which are done off-line, similar to what is shown in Figure 4.3.
First, a moving average filter is applied to each trace in the training set to remove high frequency
noise. This noise might be caused by accelerometer adjustment or sensor sensitivity. The next step
is to find similarities between each pair of traces in the training set. Since similarity and dissimilarity
(distance) are inversely related, we find similarities through distances. We again use Modified Dynamic
Time Warping Distance as a dissimilarity measure, since we might have different lengths, shifts, and
elasticities from one trace to another.
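The smoothing step might be implemented as below. The window length is an assumption, since the thesis does not state the filter length:

```python
import numpy as np

def smooth(trace, window=5):
    """Moving-average filter applied independently to each axis.
    trace: array of shape (3, N) with x, y, z acceleration rows;
    window: filter length (assumed; the thesis does not state one)."""
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(axis, kernel, mode="same") for axis in trace])
```

`mode="same"` keeps the output the same length as the input, which matters later when traces are compared and zero-padded.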
However, in this scenario, we may capture a certain type of acceleration trend for the walking activity
in axis x, whereas the next time, the same changes in axis y may be due to various rotations of the
phone. So, to find the true distance, we must first combine the different axes into an overall trace (the elementwise sum of squares) and then compute the MDTW distance on that overall trace. Hence, the output of MDTW in this scenario is a single dissimilarity value for each pair of traces; i.e., for traces u and v, the output is

MDTW(utot, vtot) = MDTW(ux² + uy² + uz², vx² + vy² + vz²)

We find the matrix of pairwise distances between all traces in the training set. The similarity matrix is then derived from the distance matrix, as done in (4.2). These similarities drive Affinity Propagation to cluster the training traces; the exemplars are representatives of the groups to which they belong. Clustering is again the main stage of the entire learning process, and its results are used in the Testing Phase.

The output of this step is a set of H clusters, each cluster Ch having an exemplar Eh, 1 ≤ h ≤ H. To sum up, finding the closest exemplars helps remove outliers and shrinks the search space, thereby reducing both computational complexity and cost.
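The overall-trace construction and the pairwise distance matrix can be sketched as follows; `dist` is a placeholder for the thesis's MDTW, so any DTW implementation can stand in:

```python
import numpy as np

def overall(trace):
    """Orientation-robust overall trace: elementwise sum of squares
    of the three axes, as used in the unfixed-orientation scenario."""
    x, y, z = trace
    return x**2 + y**2 + z**2

def distance_matrix(traces, dist):
    """Symmetric matrix of pairwise dissimilarities between overall
    traces; dist is a placeholder for the thesis's MDTW."""
    totals = [overall(tr) for tr in traces]
    n = len(totals)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(totals[i], totals[j])
    return D
```

The similarity matrix fed to Affinity Propagation is then derived from `D` as in (4.2).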
4.2.2 Testing Algorithm
For the unfixed phone orientation case, we use the output of the previous stage (clusters and their exemplars) to find the label for the new trace t. However, the recognition part is entirely different from the previous scenario. To find the label for new traces, we follow the steps depicted in Figure 4.13. First, we pass the new trace through the same moving average filter used in the Learning Stage.

Intuitively, if each cluster represented a single activity, the label of the closest exemplar would be the answer. However, this technique does not perform well compared with similar techniques in the literature: AP gives no assurance that the members of each cluster share the same label; rather, it captures higher-level features beyond the class labels. To address this issue, we deploy a two-step comparison process to find the label for new traces, similar to [17]. First, we find the exemplars closest to the new trace. Second, the best match is chosen among the members of the clusters selected in the first step, so the new trace is compared only to traces in clusters whose exemplars are close to t. Finding the search space is similar to Section 4.1.2, except that t yields one similar set for the whole trace instead of one per axis. The following parts describe the other steps in greater detail.
Random Projection
The second step in the testing phase is to map all traces in the similar set K, along with the testing trace t, to another space to reduce the complexity of further comparisons. Each entry g of the random matrix G that we use follows this distribution:
g = √3 ×
    +1  with probability 1/6
     0  with probability 2/3
    −1  with probability 1/6

Figure 4.13: Testing process (unfixed phone scenario): a new unlabeled trace passes through the moving average filter; its MDTW similarity against the clusters from training selects the similar clusters; random projection followed by l1-minimization then recognizes the label.
It has been proven in [42] that this matrix satisfies the RIP condition for projections of sparse vectors. We also want this projection to make all traces the same size; hence, if we use Gf×b, f and b must be the same for all projections. To make all traces the same length before projection, we zero-pad them: the number of zeros added to a trace is the difference between its length and that of the longest trace (among all training traces and the new testing trace). Assuming K consists of K traces with lengths l1, ..., lK, the desired length after zero padding is:

b = max {l1, l2, ..., lK, lt}    (4.5)

So, the number of zeros to be added to each trace is:

l′t = b − lt,    l′k = b − lk, 1 ≤ k ≤ K    (4.6)

Zero padding appends l′k zeros to the end of trace k from K, and l′t zeros to the end of t. Note that the position of the added zeros has no effect on the mapping.
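The padding and projection setup can be sketched as follows. The matrix construction follows the stated √3·{+1, 0, −1} distribution (an Achlioptas-style sparse random projection); the fixed seed and the sizes in the usage are illustrative assumptions:

```python
import numpy as np

def random_projection_matrix(f, b, seed=0):
    """Entries are sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6},
    matching the distribution above (the seed is an illustrative choice)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([1.0, 0.0, -1.0], size=(f, b), p=[1/6, 2/3, 1/6])
    return np.sqrt(3) * signs

def zero_pad(traces, t):
    """Pad each training trace and the testing trace t with trailing
    zeros up to the common length b of Eq. (4.5); returns (Phi, t, b)."""
    b = max(max(len(u) for u in traces), len(t))               # Eq. (4.5)
    pad = lambda u: np.concatenate([np.asarray(u, float), np.zeros(b - len(u))])
    Phi = np.column_stack([pad(u) for u in traces])            # padded vectors as columns
    return Phi, pad(t), b
```

Projection is then simply `G = random_projection_matrix(f, b)` followed by `G @ Phi` and `G @ t_padded`.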
The next step is building the matrix Φ by stacking the zero-padded vectors (the φ's) from K as columns.
We can find a proper f from the minimum sparsity level among the traces in K and t. Denoting the sparsity level of trace k in K by ηk, 1 ≤ k ≤ K, and the sparsity level of t by ηt, f is

f = β × min {η1, ..., ηK, ηt}

where β is an integer constant (usually 3 or 4), and η for each trace is computed on the overall trace, i.e., the sum of the squares of the axes.
At the end, we multiply G by Φ to obtain the mapped matrix Φ̄ = GΦ, and by t to obtain t̄ = Gt. We then use the mapped traces to find the time series closest to t̄.
Finding the Closest Trace In The New Space
We find the similar set from the distances between the overall acceleration traces, as in Equation (3.4). The matching problem can then be formulated as:
t̄ = Φ̄ θ + ε    (4.7)
where θ is a sparse vector giving the weight of each trace of Φ̄ in building up t̄ (assuming the mapped new trace is a linear combination of the mapped similar-set traces). Ideally, we look for a 1-sparse θ that identifies the single trace in Φ̄ closest to t̄.
Before the problem is formulated as an l1 minimization, we need to orthogonalize Φ̄ by multiplying Equation (4.7) by W = orth(Φ̄ᵀ)ᵀ Φ̄†:

τ = W t̄ = Ψ θ + ε′    (4.8)
Now we are able to solve this problem and find a sparse answer for θ. To compare traces, we deploy l1
minimization to find the best matches to the testing trace:
θsparse = arg min ‖θ‖1 s.t. τ = Ψθ + ε′ (4.9)
This problem provides a sparse answer for θ. The index of the maximum element of θ identifies the trace in Ψ closest to t̄:

ψclosest = arg max θsparse    (4.10)
And finally:
label(t) = label(ψclosest) (4.11)
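A sketch of the l1-minimization step as a basis-pursuit linear program. For simplicity it solves min ‖θ‖₁ s.t. Ψθ = τ directly, omitting the orthogonalization by W shown above:

```python
import numpy as np
from scipy.optimize import linprog

def l1_closest(Psi, tau):
    """Basis pursuit: minimize ||theta||_1 subject to Psi @ theta = tau.
    Split theta = u - v with u, v >= 0 and solve as a linear program;
    return the index of the largest-magnitude coefficient, i.e. the
    column of Psi closest to the mapped testing trace (Eqs. 4.9-4.10)."""
    f, K = Psi.shape
    c = np.ones(2 * K)                 # objective: sum(u) + sum(v) = ||theta||_1
    A_eq = np.hstack([Psi, -Psi])      # equality constraint: Psi @ (u - v) = tau
    res = linprog(c, A_eq=A_eq, b_eq=tau, bounds=(0, None))
    theta = res.x[:K] - res.x[K:]
    return int(np.argmax(np.abs(theta))), theta
```

The label of t is then the label of the training trace at the returned index, as in Eq. (4.11).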
Chapter 5
Simulation Results
In this chapter, we discuss the properties of our database and provide comparisons for the various problems addressed throughout the thesis. First, as a key factor in the training process, we require a set of examples of proper system behaviour, called labelled data. The proposed machine learning system is first trained on the training traces and then, to see how well it predicts, evaluated on the testing database. More details about our Activity Recognition system follow.

Consider a database of activities (Q) repeated by different users multiple times. This dataset includes a number of actions, ranging from simple movements to ones representing letters. Each trace has information from all three axes of the accelerometer embedded in a smart phone: the first three rows hold the acceleration data for the x, y and z directions, and the last row holds the sampling time of each column.
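Under the layout just described, a stored trace is a 4 × N array, and splitting it might look like this (a minimal sketch; variable names are illustrative):

```python
import numpy as np

def split_trace(trace):
    """A stored trace is a 4 x N array: rows 0-2 hold the x, y, z
    accelerations and row 3 holds the sampling time of each column."""
    accel = trace[:3, :]    # shape (3, N): one row per axis
    times = trace[3, :]     # shape (N,): per-sample timestamps
    return accel, times
```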
From this set of activity traces, we choose a portion of them for training and leave the rest for testing.
Even though we have the labels for testing traces, we will use them only for validating our results and
not for finding the predictive model.
Before moving to the AR system simulation, we present results on MDTW vs. DTW distances, our main contribution in both the fixed and unfixed scenarios. Figures 5.1, 5.2, 5.3 and 5.4 show the calculated distances for each approach, both among traces of one class and between traces from different activities.

These distance histograms for within-class and between-class pairs, under DTW and MDTW, indicate the improvement of our modified DTW over regular DTW. As Figures 5.1-5.4 illustrate, within-class distances are much lower in the modified case, while classes remain distinguishable thanks to the large between-class distances that the modified DTW preserves.
Figure 5.1: Distance histograms between pairs of traces from different classes (regular DTW)

Figure 5.2: Distance histograms between pairs of traces within the same class (regular DTW)
Figure 5.3: Distance histograms between pairs of traces from different classes (modified DTW)

Figure 5.4: Distance histograms between pairs of traces within the same class (modified DTW)
Figure 5.5: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (systems are trained with 10 samples from each of 4 classes)
Fixed Phone Orientation Results
The traces used for testing our proposed system in the Fixed Phone scenario come from a very diverse database provided by [27]. They include the four activity classes of walking, standing up, walking downstairs, and walking upstairs. The database was collected by a group of 30 volunteers aged 19-48 years, wearing a smart phone (Samsung Galaxy S II) on the waist. Using the phone's embedded accelerometer, they collected 3-axial linear accelerations at a rate of 50 Hz, with traces approximately 2-2.5 seconds long. In the following simulations, the collected dataset has been randomly divided into two sets, with one set used for training and the other for evaluating the system.
Figures 5.5, 5.6 and 5.7 compare the results of our proposed method with the Decision Tree and Naive Bayes algorithms for four, three, and two classes, respectively. As the results show, the superiority of our method becomes more significant as the number of classes increases.

Moreover, the results of our proposed system for the Fixed Phone scenario with different numbers of classes (2, 3 and 4) are depicted separately in Figure 5.8.
The simulations show a nearly perfect detection of activities for training with only 10 traces from
each class. To observe the impact of the number of training traces, we tested the average accuracy versus
the number of training sets in Figure 5.9.
Figure 5.6: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 3 classes)

Figure 5.7: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 2 classes)
Figure 5.8: Testing results of the proposed system (Fixed Phone Scenario) for different numbers of classes (system is trained with 10 samples from each class)

Figure 5.9: Testing results of the proposed system (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
Figure 5.10: Testing results of the NB algorithm (Fixed Phone Scenario) for different numbers of training traces(classifying in 4 classes) using 300 features
Figure 5.11: Testing results of the DT algorithm (Fixed Phone Scenario) for different numbers of training traces(classifying in 4 classes)
Moreover, Figures 5.10 and 5.11 provide similar illustrations for NB and DT, respectively. The rate at which accuracy increases with the number of training traces is higher for NB and DT, since they start from lower accuracies. In particular, DT requires many more training traces to reach high accuracy, which can be seen as a weakness of DT. To sum up, our method surpasses the others mainly when few training traces are available and the number of classes is high, which makes it low-cost in terms of dataset collection and more practical.
Unfixed Phone Orientation Results
For the unfixed scenario, we use a database that has different orientations when collecting each activity. This database has been provided by Zhang et al. [28], who used a 3-axis accelerometer attached to the subject's front right hip. The accelerometer sampling frequency is 100 Hz, and the traces we use are about 24 seconds long (some were changed). Because they are beneficial for improving indoor tracking, we use these classes: Walking, Walking Upstairs, Walking Downstairs, and Standing Up.

Figure 5.12: Testing results of our proposed method (Unfixed Phone Scenario) for different numbers of classes

This database was collected from 14 subjects, and we used data from all subjects for both training and testing. The CDF of the resulting accuracies is provided in Figure 5.12. Since this dataset does not provide feature sets for its traces, we compared our results with the confusion matrices provided in [49] and [50] for the KNN method (K = 10) and SVM on the same dataset; those systems were trained with almost 1600 traces. The overall accuracies are 80.3%, 92.7%, and 93.2% for KNN, SVM, and our approach, respectively. To conclude, our approach provides superior results with respect to the benchmark methods in both the Unfixed and Fixed Phone Scenarios.
Chapter 6
Conclusion and Future Work
The main goal of activity recognition is to provide information about a user's meaningful movements for applications like cognitive assistance and human-computer interfaces.

As smart phones grow ever more ubiquitous, the idea of using their embedded sensors to extract users' movement and location information is becoming increasingly popular. For instance, human activities like standing up and walking stairs can reveal a user's location inside a building. Moreover, detecting activities like walking, running and standing up can help health-care applications.

The Activity Recognition process takes as input traces with known labels (i.e., activities) and produces a system that can determine the label of a new trace. These traces can be recorded by motion, sound, or vision sensors, and their trends are shaped by the activity the user performs while they are gathered. Motion sensors are either mounted on the body (on one or several body parts) or embedded in a mobile device carried by the user. Each sensing technology has its own accuracy, cost, user comfort, and privacy trade-offs, and some AR systems combine several technologies to meet their goals.
Inspired by the availability of motion sensors (especially accelerometers) in smart phones, we proposed an AR system that uses linear acceleration data from the x, y and z axes. The presented system utilizes our modified version of Dynamic Time Warping to find the similarities needed by the AP algorithm to complete the training phase. In the testing stage, we use the new unknown trace and the clusters from AP to find the label of the new time series. The training and testing traces used in our system have asynchronous starts, and our proposed method is compatible with this kind of data. We also considered two different cases for the phone orientation (fixed or not). To sum up, in this thesis, we proposed two approaches to address AR problems in two scenarios, namely fixed phone
orientation, and unfixed phone orientation.
The proposed AR method was evaluated using two datasets, one for each scenario. These datasets include traces from four activities: walking, walking downstairs, walking upstairs, and standing up. We further evaluated the system by comparing its results with Naive Bayes, Decision Tree, and the proposed method with regular DTW.

The results for the fixed phone case show significant accuracy gains over the other benchmark AR methods; as the number of training traces increases, we reach almost perfect detection. This is mostly due to the changes made in the learning part, especially the adjustments to DTW. In the fixed phone scenario, besides the MDTW mentioned previously, we provided a completely novel testing stage that exploits the diversity of information coming from the different axes. To the best of our knowledge, training separate classifiers for the different axes and combining their results with a Multi-Classifier System has not been used before to detect activities. In the unfixed scenario, MDTW again enables accurate detection of actions.

To conclude, in this thesis, we proposed a system that uses only the accelerometer sensor embedded in a smart phone. The traces used in the learning and testing phases may have different lengths and be collected by different users. Our contributions in both scenarios are mainly the modified dynamic time warping and decision-making based on multiple classifier outputs.
6.1 Future Work
Future work involves developing a real-time activity recognizer that can label activities inside a multiple-action trace. For example, if a user walks and then stops, the enhanced system should find this change point and detect the transition in what the user is performing.

Moreover, the implemented localization system can be improved by the results of these two AR systems. Combining AR from inertial sensors on mobile devices with map information, and using it to create location-specific weighting for a WiFi fingerprinting system, can improve localization and tracking systems.
Another possible extension of this work is handling tilting. In both scenarios, the phone's orientation is fixed during a single trace: it may change from one trace to another, but not within one. Using a gyroscope sensor, we could compensate for tilting within a trace, making the resulting system much more convenient.
A distinct interesting topic is using a multi-classifier system when the classes have different numbers of training traces. Since majority voting is sensitive to class imbalance, an activity with more training traces has a greater chance of being selected by the MAJ technique. Hence, if we use varying numbers of training traces per activity, we should merge the classification results with other MC methods.
Future work also involves implementing all these systems on a smart phone or other mobile devices
with embedded accelerometers. The system has a reasonable performance computationally, especially in
the testing phase, where resources are more limited. However, to gauge the performance in real life, the
proposed system should be implemented on commonly used personal devices with inertial sensors.
6.2 Contributions
The main contribution in this thesis is the modifications made to regular DTW. These changes help
solve the problem of asynchronous starts for traces from the same class. As well, the preference in AP
clustering is adjusted according to the number of training traces. In the Fixed Phone Scenario, we found
three MDTW matrices of distances between pairs of traces from one axis in each matrix. So, for example,
we have an MDTW matrix for axis x, which contains dissimilarities between readings of x axes for all
traces. Subsequently, we have three different similarity matrices and various search spaces for each one.
Then, using only the intersection of search spaces for each axis, we find the final answer by choosing the
mode for labels of traces in the overlapping set. Note that our training and testing traces have been
collected by different users and might even have different lengths and sampling rates. Also, our special
combination of different tools in both scenarios works with short datasets, meaning it can be trained
with traces that are nearly two seconds long.
Bibliography
[1] Oscar D. Lara and Miguel A. Labrador, “A Survey on Human Activity Recognition us-
ing Wearable Sensors”, IEEE Communications Surveys and Tutorials, 2013, 1192-1209,
http://dx.doi.org/10.1109/SURV.2012.110112.00192.
[2] M. S. Ryoo, “Interactive Learning of Human Activities Using Active Video Composition”, Interna-
tional Workshop on Stochastic Image Grammars (SIG), in Proceedings of International Conference
on Computer Vision (ICCV), Barcelona, Spain, November 2011.
[3] O. X. Schlmilch, B. Witzschel, M. Cantor, E. Kahl, R. Mehmke, and C. Runge, "Detection of posture and motion by accelerometry: a validation study in ambulatory monitoring", Computers in Human Behavior, vol. 15, no. 5, pp. 571-583, 1999.
[4] Henpraserttae, A.; Thiemjarus, S.; Marukatat, S., "Accurate Activity Recognition Using a Mobile Phone Regardless of Device Orientation and Location," Body Sensor Networks (BSN), 2011 International Conference on, pp. 41-46, 23-25 May 2011, doi: 10.1109/BSN.2011.8.
[5] Siddiqi MH, Ali R, Rana MS, Hong E-K, Kim ES, Lee S. “Video-Based Human Activity Recognition
Using Multilevel Wavelet Decomposition and Stepwise Linear Discriminant Analysis”. Sensors. 2014;
14(4):6370-6392.
[6] D. Gusenbauer, C. Isert, and J. Krösche, "Self-Contained Indoor Positioning on Off-the-Shelf Mobile Devices", in IEEE Indoor Positioning and Indoor Navigation (IPIN), 2010.
[7] H. Ye, T. Gu, X. Zhu, J. Xu, X. Tao, J. Lu, and N. Jin. “FTrack: Infrastructure-free Floor
Localization via Mobile Phone Sensing”. In IEEE Percom, 2012.
[8] V. Radu, M. K. Marina, "HiMLoc: Indoor Smartphone Localization via Activity Aware Pedestrian Dead Reckoning with Selective Crowdsourced WiFi Fingerprinting", in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation, 2013.
[9] J.A. Sáez, M. Galar, J. Luengo, and F. Herrera, "Tackling the problem of classification with noisy data using Multiple Classifier Systems: Analysis of the performance and robustness", Inf. Sci., 2013, pp. 1-20.
[10] Toni Giorgino (2009). “Computing and Visualizing Dynamic Time Warping Alignments in R: The
dtw Package”. Journal of Statistical Software, 31(7), 1-24.
[11] Brendan J. Frey and Delbert Dueck (2007). “Clustering by passing messages between data points”.
Science 315:972-977. doi:10.1126/science.1136800.
[12] Ella Bingham and Heikki Mannila, “Random projection in dimensionality reduction: Applications
to image and text data”, in Knowledge Discovery and Data Mining, 2001, 245-250.
[13] Baraniuk, Richard G. “Compressive sensing.” IEEE signal processing magazine 24.4 (2007).
[14] F. Yang, S. Wang, and C. Deng, “Compressive sensing of image reconstruction using multi-wavelet
transform”, IEEE 2010.
[15] C. Myers, L. Rabiner, and A. Rosenberg, "Performance tradeoffs in dynamic time warping algorithms for isolated word recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 6, pp. 623-635, 1980.
[16] A. Kuzmanic and V. Zanchi, "Hand shape classification using DTW and LCSS as similarity measures
for vision-based gesture recognition system", in EUROCON 2007: The International Conference on
"Computer as a Tool", 2007, pp. 264-269.
[17] A. Akl, C. Feng, and S. Valaee, "A novel accelerometer-based gesture recognition system", IEEE
Transactions on Signal Processing, vol. 59, pp. 6197-6205, Dec. 2011.
[18] T. Warren Liao, "Clustering of time series data - a survey", Pattern Recognition, 38(11), November
2005, pp. 1857-1874. doi:10.1016/j.patcog.2005.01.025.
[19] V. Niennattrakul and C. A. Ratanamahatana, "On clustering multimedia time series data using
k-means and dynamic time warping", in Multimedia and Ubiquitous Engineering (MUE '07),
International Conference on, 2007, pp. 733-738.
[20] Brendan J. Frey and Delbert Dueck (2007). "Clustering by passing messages between data points".
Science 315:972-977. doi:10.1126/science.1136800.
[21] E. Candès and M. Wakin, "An introduction to compressive sampling", IEEE Signal Processing
Magazine, vol. 25, no. 2, pp. 21-30, March 2008.
[22] V.D. Mazurov, A.I. Krivonogov, and V.S. Kazantsev, "Solving of optimization and identification
problems by the committee methods", Pattern Recognition 20 (1987) 371-378.
[23] Hui Liu, H. Darabi, P. Banerjee, and Jing Liu, "Survey of Wireless Indoor Positioning Techniques
and Systems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and
Reviews, vol. 37, no. 6, pp. 1067-1080, Nov. 2007.
[24] L. Shapley and B. Grofman, "Optimizing group judgmental accuracy in the presence of
interdependencies", Public Choice 43 (1984) 329-343.
[25] D.M. Titterington, G.D. Murray, L.S. Murray, D.J. Spiegelhalter, A.M. Skene, J.D.F. Habbema,
G.J. Gelpke, “Comparison of discriminant techniques applied to a complex data set of head injured
patients", Journal of the Royal Statistical Society, Series A (General) 144 (1981) 145-175.
[26] Y.S. Huang and C.Y. Suen, "A method of combining multiple experts for the recognition of
unconstrained handwritten numerals", IEEE Transactions on Pattern Analysis and Machine
Intelligence, 17 (1995), pp. 90-93.
[27] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. “Human
Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine”.
International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec
2012.
[28] Mi Zhang and Alexander A. Sawchuk, “USC-HAD: A Daily Activity Dataset for Ubiquitous Activity
Recognition Using Wearable Sensors", ACM International Conference on Ubiquitous Computing
(UbiComp) Workshop on Situation, Activity and Goal Awareness (SAGAware), Pittsburgh,
Pennsylvania, USA, September 2012.
[29] Rokach, L. and Maimon, O. (2005). "Top-down induction of decision trees classifiers - a
survey". IEEE Transactions on Systems, Man, and Cybernetics, Part C 35 (4): 476-487.
doi:10.1109/TSMCC.2004.843247.
[30] McCallum, Andrew; Nigam, Kamal (1998). “A comparison of event models for Naive Bayes text
classification”. AAAI-98 workshop on learning for text categorization 752.
[31] Radu, V. and Marina, M.K., "HiMLoc: Indoor smartphone localization via activity aware Pedestrian
Dead Reckoning with selective crowdsourced WiFi fingerprinting", Indoor Positioning and Indoor
Navigation (IPIN), 2013 International Conference on, pp. 1-10, 28-31 Oct. 2013. doi:
10.1109/IPIN.2013.6817916.
[32] A. Kushki, K. N. Plataniotis, and A. N. Venetsanopoulos, “Kernel-based positioning in wireless local
area networks", IEEE Trans. on Mobile Computing, vol. 6, no. 6, pp. 689-705, June 2007.
[33] R. Singh, L. Macchi, C. Regazzoni, and K. Plataniotis, "A statistical modelling based location
determination method using fusion in WLAN", Proceedings of the International Workshop Wireless
Ad-Hoc Networks, 2005.
[34] J. Ma, X. Li, X. Tao, and J. Lu, "Cluster filtered KNN: A WLAN-based indoor positioning scheme",
International Symposium on a World of Wireless, Mobile and Multimedia Networks, pp. 1-8, June
2008.
[35] X. Golay, S. Kollias, G. Stoll, D. Meier, A. Valavanis, and P. Boesiger, "A new correlation-based fuzzy
logic clustering algorithm for fMRI", Magn. Reson. Med. 40 (1998) 249-260.
[36] A. Wismüller, O. Lange, D.R. Dersch, G.L. Leinsinger, K. Hahn, B. Pütz, and D. Auer, "Cluster
analysis of biomedical image time series", Int. J. Comput. Vision 46 (2) (2002) 103-128.
[37] C. Feng, S. W. A. Au, S. Valaee, and Z. H. Tan, “Orientation-aware localization using affinity
propagation and compressive sensing,” IEEE International Workshop on Computational Advances
in Multi-Sensor Adaptive Processing (CAMSAP), 2009.
[38] Dalton, A. and Ó Laighin, G., "Comparing Supervised Learning Techniques on the Task of Physical
Activity Recognition", IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 1, pp. 46-52,
Jan. 2013. doi: 10.1109/TITB.2012.2223823.
[39] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. “Human
Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine”.
International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec
2012.
[40] Y. S. Lee and S. B. Cho, "Activity recognition using hierarchical hidden Markov models on a
smartphone with 3D accelerometer", in HAIS, pp. 460-467, 2011.
[41] Dernbach, Stefan; Das, B.; Krishnan, Narayanan C.; Thomas, B.L.; Cook, D.J., “Simple and
Complex Activity Recognition through Smart Phones", Intelligent Environments (IE), 2012 8th
International Conference on, pp. 214-221, 26-29 June 2012.
[42] E. Bingham and H. Mannila, "Random projection in dimensionality reduction: Applications to
image and text data", Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 245-250, 2001.
[43] Pirttikangas, S., Fujinami, K., and Nakajima, T., "Feature selection and activity recognition from
wearable sensors", in International Symposium on Ubiquitous Computing Systems (UCS2006), Seoul,
Korea, Oct. 11-13, 2006, pp. 516-527.
[44] C.S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, "Fuzzy clustering of short time
series and unevenly distributed sampling points", Proceedings of the 5th International Symposium
on Intelligent Data Analysis, Berlin, Germany, August 28-30, 2003.
[45] Mahesh Kumar, Nitin R. Patel, and Jonathan Woo, "Clustering seasonality patterns in the presence
of errors", Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada. doi:10.1145/775047.775129.
[46] Kakizawa, Y., Shumway, R.H., and Taniguchi, N., "Discrimination and clustering for multivariate
time series", J. Amer. Stat. Assoc., vol. 93, no. 441, pp. 328-340.
[47] Dahlhaus, R., "On the Kullback-Leibler information divergence of locally stationary processes",
Stochastic Process. Appl., vol. 62, pp. 139-168.
[48] T.W. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R.C. Kaste, J. O’May, “Understanding and
projecting the battle state”, 23rd Army Science Conference, Orlando, FL, December 2-5, 2002.
[49] M. Zhang and A. A. Sawchuk. “Manifold learning and recognition of human activity using body-area
sensors”. In IEEE International Conference on Machine Learning and Applications (ICMLA), pages
7-13, Honolulu, Hawaii, USA, December 2011.
[50] M. Zhang and A. A. Sawchuk. "Motion primitive-based human activity recognition using a
bag-of-features approach". In ACM SIGHIT International Health Informatics Symposium (IHI),
pages 631-640, Miami, Florida, USA, January 2012.