Human Activity Recognition Using Time Series Classification
by
Zhino Yousefi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2015 by Zhino Yousefi
Abstract
Human Activity Recognition Using Time Series Classification
Zhino Yousefi
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2015
Activity Recognition (AR) has become a basis for applications such as health care and elderly surveillance,
human position tracking, home monitoring and security applications. Embedding motion sensors in
smart phones increases the interest in using smart phones for detecting users’ activities. In this thesis, we
propose a system that learns activity trends using only the data from an accelerometer sensor, which is
the most common motion sensor in smart phones. The system uses raw traces in a training set to build a
predictor that assigns the proper label to new traces. Our approach addresses the two main challenges
in AR using smart phones. First, the system is well trained with fewer training traces compared to
benchmark approaches, and new traces can easily be added to our database. Second, since we use the
raw traces without defining any particular features, our system is more general and achieves almost
perfect accuracy.
Dedication
To my loving family, who has supported me every step of the way.
Acknowledgements
I would like to express my sincerest gratitude to my supervisor, Professor Shahrokh Valaee, for his
guidance, caring, and immense knowledge.
I would also like to thank my friendly lab mates at the WIRLAB group, who offered helpful suggestions
and provided a cheerful research atmosphere.
My very special thanks go to my friends Masoud Barakatain, Shadi Emami, Sepideh Hassanmoghadam,
Masume Sabzi, Nastaran Hajia and Niloofar Ghanbari for their great help and assistance.
Finally, I wish to give my deepest gratitude to my dear parents, Ronak Towfighi and Mohammadsharif
Yousefi, for their unconditional love and support, and to my beloved sister, Arian Yousefi, brother Rozhin
Yousefi, and brother-in-law, Alborz Rezazadeh Sereshkeh, for their valuable guidance and encouragement.
Contents
1 Introduction
2 Previous Work
  2.1 Sensor-based Activity Recognition
    2.1.1 Feature-based Classification
    2.1.2 Time Series Classification
    2.1.3 Model-based Classification
3 Preliminaries
  3.1 Dynamic Time Warping Distance
  3.2 Affinity Propagation
  3.3 Random Projection
  3.4 l1 Minimization
  3.5 Combination of Multiple Classification Results
4 Proposed Activity Recognition System
  4.1 Fixed Phone Orientation Scenario
    4.1.1 Learning Algorithm
    4.1.2 Testing Algorithm
  4.2 Unfixed Phone Orientation Scenario
    4.2.1 Learning Algorithm
    4.2.2 Testing Algorithm
5 Simulation Results
6 Conclusion and Future Work
  6.1 Future Work
  6.2 Contributions
List of Figures
2.1 Example of a Decision Tree
4.1 Supervised learning algorithm
4.2 Traces for two different activities
4.3 Training Phase
4.4 Different Distance Measurement Methods
4.5 DTW matching matrix and path for two traces from activities “walking downstairs” and “walking upstairs”
4.6 Local DTW distances along matching path for u and v
4.7 Accelerometer readings of axis x for two walking traces
4.8 DTW matching path for two traces from activity “walking”
4.9 Local DTW distances along matching path for two walking traces
4.10 Final output of learning algorithm
4.11 Testing Process (fixed phone scenario)
4.12 Choosing resemblance set (Φx)
4.13 Testing Process (unfixed phone scenario)
5.1 Distance histograms between two traces from different classes (regular DTW)
5.2 Distance histograms between two traces within the same class (regular DTW)
5.3 Distance histograms between two traces from different classes (modified DTW)
5.4 Distance histograms between two traces within the same class (modified DTW)
5.5 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (systems are trained with 10 samples from each of 4 classes)
5.6 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 3 classes)
5.7 Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 2 classes)
5.8 Testing results of the proposed system (Fixed Phone Scenario) for different numbers of classes (system is trained with 10 samples from each class)
5.9 Testing results of the proposed system (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
5.10 Testing results of the NB algorithm (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
5.11 Testing results of the DT algorithm (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes)
5.12 Testing results of our proposed method (Unfixed Phone Scenario) for different numbers of classes
Chapter 1
Introduction
Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in
relevant industries due to its applications in cognitive assistance, indoor localization and tracking [31],
fitness monitoring, and human-computer interfaces [1], [2]. However, AR is not without its problems,
and the earliest attempts at addressing them date back a few decades. Early AR systems mostly targeted
estimation of total expended energy and not complex activities [3]. These were followed by systems
based on wearable sensors, which found applications in recognizing specific physical activities. Later on,
motion sensors became available in smart phones. The computational and hardware capabilities of smart phones,
along with the popularity of powerful machine-learning algorithms, resulted in the emergence of a vast
range of AR methods. Different AR systems use diverse types of sensors, such as camera, microphone,
accelerometer, gyroscope, or combinations thereof [1]. These systems utilize various approaches to detect
particular sets of activities, from simple activities like walking to more complex ones like cooking.
Due to the increasing popularity and enormous potential of AR, researchers are highly motivated
to improve the systems. This means finding a way to make AR systems work with realistic noisy data,
making them applicable for multiple users, reducing the amount of data needed to train the system, and
enhancing security and privacy. Using smart phones introduces new issues to the area of AR [4]. Sensors
provided in smart phones are not usually as accurate as those used in special devices available for AR.
Additionally, battery draining and computational complexity requirements are more crucial in mobile
devices due to limited resources [4].
From the available sensors, we use the accelerometer sensor because of its availability in mobile phones.
Therefore, in this thesis, we implement an AR method based on supervised learning of accelerometer’s
data and use multi-user data sets to evaluate our system. The results are compared with the most
common AR techniques.
Our proposed AR system can improve various applications such as localization. Activities like walking
on stairs, standing up, or being in an elevator, used along with map information of the building, provide
information regarding the location of the user and probable changes in floors. Linking a localizer and an
activity detector can thus be beneficial.
Guenbauer et al. [6] use activity classification in their indoor navigation system. Similarly, Ftrack
et al. [7] utilize activities like walking upstairs and downstairs and elevator positioning to achieve floor
detection. This type of incorporation (i.e., using multiple classes of activities) has also been done in AR
with an accuracy of 80% and has improved the localization results [8]. However, we anticipate better
results with our more reliable activity recognition system. Investigating this expectation is a suggested
subsequent work.
This thesis is organized as follows. Chapter 2 is an overview of benchmark Activity Recognition and
Wi-Fi based localization methods in the literature. In Chapter 3 we will provide an overview of basic
concepts, tools, and theories used in this thesis. Next, Chapter 4 describes our proposed accelerometer-based
activity recognition methods for smart phones. Chapter 5 presents the simulation results for our
proposed method and compares them with other common AR methods. Finally, we conclude the thesis
in Chapter 6.
Chapter 2
Previous Work
Various AR methods exist for different combinations of input type and detection algorithms. By types of
input, we mean the source of information, the number of activities, the ability to support a single user or multiple users, and so on.
Based on labelled information input, the AR system predicts the label for a new input from an unknown
activity. The algorithm for addressing the learning problem can be chosen from various supervised or
semi-supervised classification methods [1].
For the sensor selection task, we may use image and video sensors, or other sensors like motion sensors or
microphones. The main applications for vision-based activity recognition are in security, surveillance,
and improving human-computer interaction [2].
The vision-based systems either consider each frame of an activity individually or classify a sequence
of frames from an action. These frames could be from a single camera or from multiple cameras, and the
methods can be categorized based on whether there is a single user or multiple users in a picture. To
sum up, significant work has been done in the area of vision-based AR, and published research in the
computer vision literature has helped with detection accuracy. Determining the activity label is now
almost perfect for many different scenarios, as achieved in [5].
Another important area of AR methods uses motion sensors. Motion sensors like accelerometers and
gyroscopes measure linear and rotational acceleration of their movements, respectively. These sensors can
be easily mounted on a body to obtain more detailed and accurate motion information. They are also
widely available in smart phones and other mobile devices. In either case, the output of these sensing
tools can be further processed to make a predictor using machine-learning tools. Processing basically
means discovering the patterns that each activity possesses. Such a trained system can then determine
the label of activities based on information from motion sensing [4]. Since our introduced system uses
motion sensors, we will discuss it further in Section 2.1.
2.1 Sensor-based Activity Recognition
In this section, we investigate AR methods that use motion sensors. Note that the data obtained from
the sensors is in the shape of a time series, i.e., we have the reading of sensors for a number of time
instances. For different activities, the characteristics of the time series are different. Storing a number of
traces for different activities can provide us with a data set that can be used for training and testing the
system. The following is a categorization of different ways to obtain the aforementioned predictor from
this database [18]. We will compare our method with one example from each of the categories below.
2.1.1 Feature-based Classification
In feature-based techniques, we manually choose some functions that are applied to the database traces
and output static data which does not relate to time. These functions extract the features that distinguish
between various activities. For example, intuitively, the average of the norm of data from an accelerometer
should be greater for walking than for standing up, or the variance (changes in time) should be greater
for running than for walking. After assigning these numbers (mean, variance, etc.) to each trace, we
continue with classification based solely on the features.
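The feature-extraction step described above can be sketched as follows; this is a minimal illustration with synthetic traces, not the thesis's actual feature set:

```python
import math

def extract_features(trace):
    """Map a raw accelerometer trace (a list of (x, y, z) readings)
    to static, time-independent features: the mean and variance of
    the acceleration norm."""
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in trace]
    mean = sum(norms) / len(norms)
    var = sum((n - mean) ** 2 for n in norms) / len(norms)
    return mean, var

# Synthetic traces: a near-constant one (standing) and an
# oscillating one (walking).
standing = [(0.0, 0.0, 9.8)] * 50
walking = [(0.0, 0.0, 9.8 + (2.0 if i % 2 else -2.0)) for i in range(50)]

m_s, v_s = extract_features(standing)
m_w, v_w = extract_features(walking)
# The variance feature separates the two traces (v_w > v_s),
# even though both have the same mean norm.
```

A classifier then operates only on these (mean, variance) pairs, discarding the time dimension entirely.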
However, some challenges exist in using these features. In brief, some of them are:
• Defining features that capture the most useful information in databases, which are usually
application-dependent.
• Choosing proper ones (or assigning weights to them) for our special clustering or classification
method.
• Avoiding correlated features, for reasons of complexity and computational cost.
There are numerous classification methods based on static data, such as the Decision Tree family [38],
Support Vector Machines [41], and K-Nearest Neighbours [43].
Decision Tree
The following is a short overview of Decision Tree-based classification methods:
A Decision Tree (DT) classifier assumes the form of a tree, with nodes and edges, as depicted in Figure
2.1.
[Figure: a two-level decision tree. The root tests the feature “average” (< 1 leads to the leaf “standing”; > 1 leads to a test on “variance”, where < 0.1 leads to the leaf “walking” and > 0.1 leads to the leaf “running”).]
Figure 2.1: Example of a Decision Tree
Each non-leaf node tests if one feature is positive or negative (or, in a continuous case, above or below
a threshold). The root is the non-leaf node at the very top of the tree (the one that is only connected to
two edges). Each edge is a branch from a test node that leads either to another test node or to a
leaf node. Leaf nodes are tagged with class labels according to the path that connects them to the root.
Paths are consequent nodes and edges leading to a leaf node.
To establish a decision tree based on training data, we follow these steps [29]:
• Start with the feature that best splits the set of items.
• Continue finding the best feature at each test node (only for training instances that have been
partitioned).
• Stop when all of the training points that reach a node have the same label, or when all of the features
have been used along the path that reaches the current node.
The best feature for splitting the database is the feature with minimum conditional entropy: knowing
the value of that feature tells us the most about which class the data belongs to. Thus, the information
gain for a feature (B) is the entropy of the classes (C) minus the conditional entropy of the classes
given that feature:
IG(C,B) = H(C)−H(C|B) (2.1)
At the end, the feature with the greatest information gain (IG) is chosen in each step as the divider.
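As a small, self-contained sketch of Equation 2.1 for a discrete feature (the toy labels below are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(C, B) = H(C) - H(C|B): the entropy reduction obtained
    by splitting the labels on a (discrete) feature B."""
    total = len(labels)
    cond = 0.0
    for value in set(feature_values):
        subset = [l for l, f in zip(labels, feature_values) if f == value]
        cond += len(subset) / total * entropy(subset)
    return entropy(labels) - cond

# Toy data: this feature perfectly predicts the class, so its
# information gain equals the full class entropy of 1 bit.
labels = ["walk", "walk", "run", "run"]
feature = [0, 0, 1, 1]
ig = information_gain(labels, feature)
```

An uninformative feature such as `[0, 1, 0, 1]` would instead yield a gain of 0, so the perfectly predictive feature is chosen as the divider.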
To test new data, we start from the root of the created tree and decide which edge we should follow,
based on the features of the data. We continue along these lines until we reach a leaf, whose label will be
the classifier’s recognition result. For example, assume that we have a trace of accelerometer data for an
activity with an average of 2 and a variance of 0.3. Using the classifier in Figure 2.1, we follow the left
edge of the test nodes “average” and “variance” (1 ≤ 2 and 0.1 ≤ 0.3). Then we will reach the node
“running”, which is the solution of the classifier for this trace.
One of the main advantages of DT is that it makes understandable rules. By looking at a classification
tree, we can learn which features are the most important in the classification. On the other hand, there
are some disadvantages in using a DT classifier. It is computationally expensive (NP-hard), and splitting
training data after each test node reduces the number of training data for upcoming test decisions (i.e., we
are not using all resources at each step). Moreover, DTs are not suitable to use when we have continuous
features because that will require computing proper thresholds on which to base appropriate test nodes.
2.1.2 Time Series Classification
Like static data classification, time series classification requires a learning algorithm. Most raw data
based solutions need the distance/similarity between pairs of data points. The classification algorithm
and evaluation standard are also part of the process.
Golay et al. [35] use Euclidean and cross-correlation based distances and apply the fuzzy c-means
algorithm to classify MRI brain activity images. Neural-network based approaches, as applied by A.
Wismuller et al. in [36], do not require a similarity/distance measure. As another example, Ahmed et al.
[17] have provided a novel accelerometer-based gesture recognition system using the Dynamic Time Warping
distance measure and the Affinity Propagation clustering technique.
The choice of the algorithm is determined by characteristics of the data along with complexity and
accuracy requirements of the application. The advantage of using raw data is that we do not lose
information by translating traces to features, so we do not have to define features that separate different
activities. Using raw data makes the solution more general in terms of being applicable to other sets
of activities. The disadvantage of this solution, however, is having high dimensional input (number of
samples in each time series) compared to the small number of features that can be used in a feature-based
technique. We use the time series classification approach in our proposed system.
2.1.3 Model-based Classification
This method assumes that traces from each activity have been generated based on a particular model.
In other words, a time series from each activity fits a model which, for example, could be a mixture of
probability distributions. After obtaining the model, the model parameters will function as features.
Hence when a new trace shows the same behavior as a known activity by fitting its model, it will
be labelled as that known activity. For example, Naive Bayes [30] is a simple probabilistic classifier,
explained as follows. Given features F1, ..., Fn, a probabilistic model for their associated class (C) is
P (C|F1, ..., Fn). The label for a particular set of features is found by choosing the class with
maximum conditional probability:
maximum conditional probability:
argmax_C P (C|F1, ..., Fn)
We find this model based on Bayes’ theorem:
P (C|F1, ..., Fn) = P (C)P (F1, ..., Fn|C) / P (F1, ..., Fn)    (2.2)
In the end we are looking for the class that gives the maximum P (C|F1, ..., Fn) and since P (F1, ..., Fn) is
common for different classes we just look for a class that gives the greatest P (C)P (F1, ..., Fn|C). So, we
just need to calculate the following term, and by assuming that features are conditionally independent
given the class label we will have:
P (F1, ..., Fn|C) = P (F1|C)...P (Fn|C) (2.3)
Although Naive Bayes is easy to implement and provides robust answers, its independence assumption
is not usually accurate [39].
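A minimal sketch of this decision rule for discrete features follows; the training examples and feature names are invented for illustration, not taken from the thesis:

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_tuple, class_label).
    Returns class counts (for priors P(C)) and per-feature value
    counts (for likelihoods P(F_i | C))."""
    priors = Counter(label for _, label in examples)
    likelihoods = defaultdict(Counter)  # (class, i) -> Counter of values
    for features, label in examples:
        for i, f in enumerate(features):
            likelihoods[(label, i)][f] += 1
    return priors, likelihoods, len(examples)

def classify_nb(features, priors, likelihoods, total):
    """argmax_C P(C) * prod_i P(F_i | C), assuming conditional
    independence of the features given the class."""
    best, best_score = None, -1.0
    for label, count in priors.items():
        score = count / total
        for i, f in enumerate(features):
            score *= likelihoods[(label, i)][f] / count
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training set: (high_mean, high_variance) -> activity.
data = [((1, 1), "run"), ((1, 0), "walk"), ((1, 1), "run"), ((0, 0), "stand")]
priors, lik, n = train_nb(data)
pred = classify_nb((1, 1), priors, lik, n)  # "run"
```

A production implementation would add smoothing for unseen feature values and work in log-probabilities; both are omitted here for clarity.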
Chapter 3
Preliminaries
In this chapter, we provide an overview of some of the techniques used in our proposed AR system
along with the implemented localization network. We go over Affinity Propagation clustering, which is a
primary stage for classification. The measure we use for clustering is Dynamic Time Warping Distance.
Other concepts used in the proposed system, such as Random Projection and Multiple Classification
System, are described in this chapter as well. The details of the proposed method are explained in
Chapter 4.
3.1 Dynamic Time Warping Distance
There are many different measures for calculating distances between time series. The choices vary from a
very simple one, like the Euclidean distance, to more complex measures, like the Kullback-Leibler divergence [18].
One way to define distances among time series (traces) is by finding the minimum distance between
those traces by matching their samples. This type of distance has many applications in speech recognition
[15], gesture recognition [16, 17] and time series clustering [18].
Some benefits of this type of measure are [18]:
• Minimizes the effect of shifting and time-scale changes on the distance.
• Allows for different local elasticities (i.e., if a trace is squeezed in some parts and loosened in others,
the distance will not change).
To calculate this kind of distance, assume that we have two traces, u and v. Let lu and lv indicate
the lengths of these traces, respectively. In order to find the minimum distance, we match samples of
traces together and sum up over the distances between pairs of samples. The mapping from one trace to
another is shown by vector P . The entries of P are pairs of samples that have been mapped to each
other.
P (u, v) = {p1, p2, ..., pL} (3.1)
pl = (i, j), 1 ≤ l ≤ L, 1 ≤ i ≤ lu, 1 ≤ j ≤ lv (3.2)
where pl = (i, j) means that ui has been mapped to vj. L is the length of the path that maps samples
of u to samples of v.
The mapping between the traces should have certain properties. The most important one is to be
monotonic, meaning that if pl = (i, j) and pl+1 = (i′, j′) we should have i ≤ i′ and j ≤ j′. The concept
of monotonicity essentially means to maintain the time order while matching samples.
There is an optimal monotonic path that matches the samples of u and v together in a way that the
sum of the distances between the mapped samples is minimized. If the traces are short, we can search
among all possible matchings to find the one that gives the minimum distance.
However, for long traces, this search could be very costly in terms of calculation. To solve this problem,
the solution is to use dynamic programming and find the next matching in the path using the current
matching, as done in the Dynamic Time Warping (DTW) method [10]. Using dynamic programming does not
always give the optimal result, because it does not use all of the information about the samples at the
same time. However, owing to its low complexity, it has been widely used to find the distance between time series.
To see how the matching of samples of one trace to another is done in a dynamic way, assume we want
the distance between u and v:
Step 1: Build a lu × lv matrix called D. Entry Di,j of this matrix represents the distance between
sample i and j from u and v, respectively. To find the distance between the numbers ui and vj (since they
are just numbers [static]), we can use Euclidean, absolute value of difference, or any distance measure
function that outputs non-negative distances. If we choose to utilize Euclidean, then:
Di,j = (ui − vj)2 (3.3)
Now we have the distances between all pairs of samples of the traces, collected in the matrix D.
Step 2: Compute DTWi,j for all i and j as shown in Algorithm 1. Note that DTWi,j is the minimum
distance between truncated u and v (i.e., sequence of u1, ..., ui and sequence of v1, ..., vj). At the end the
total DTW distance between traces u and v is:
DTW (u, v) = DTWlu,lv (3.4)
Initialization: i = 1, j = 1, DTW(0, 0) = 0, DTW(1 : lu, 0) = ∞, DTW(0, 1 : lv) = ∞
while j ≤ lv do
    while i ≤ lu do
        DTW(i, j) = D(i, j) + min{DTW(i, j − 1), DTW(i − 1, j), DTW(i − 1, j − 1)}
        i = i + 1
    end
    i = 1
    j = j + 1
end
Algorithm 1: Calculation of DTW distance
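The recursion of Algorithm 1 transcribes almost directly into code. The sketch below (for 1-D traces, using squared differences as the local distance) is an illustration, not the thesis's implementation:

```python
def dtw_distance(u, v):
    """Dynamic Time Warping distance between two 1-D traces
    (Algorithm 1), using squared differences as the local
    distance D(i, j)."""
    lu, lv = len(u), len(v)
    INF = float("inf")
    # dtw[i][j]: minimum cost of matching u[:i] with v[:j];
    # row 0 and column 0 encode the boundary conditions.
    dtw = [[INF] * (lv + 1) for _ in range(lu + 1)]
    dtw[0][0] = 0.0
    for i in range(1, lu + 1):
        for j in range(1, lv + 1):
            d = (u[i - 1] - v[j - 1]) ** 2
            dtw[i][j] = d + min(dtw[i][j - 1],       # advance in v only
                                dtw[i - 1][j],       # advance in u only
                                dtw[i - 1][j - 1])   # advance in both
    return dtw[lu][lv]

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # a time-shifted copy of a
c = [5.0, 5.0, 5.0, 5.0, 5.0]
# Warping absorbs the shift: dtw_distance(a, b) is 0,
# while dtw_distance(a, c) is much larger.
```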
DTWlu,lv is actually the sum of distances of pairs of samples that lead to a minimum total distance.
Each mapping can be represented by pl = (i, j) which shows sample number i from u has been matched
with sample number j from v. The vector of these mappings is P , which contains all pairs of samples that
lead to DTW (u, v). In order to find the path (P (u, v) = {p1, p2, ..., pL}) that results in the total distance
mentioned in Equation 3.4, we perform the following steps:

Data: DTWi,j for 0 ≤ i ≤ lu, 0 ≤ j ≤ lv
Result: the path P
Initialization: set pL = (lu, lv), (i, j) = (lu, lv), l = L
while i > 1 or j > 1 do
    pl−1 = argmin over (i′, j′) ∈ {(i − 1, j), (i − 1, j − 1), (i, j − 1)} of DTWi′,j′
    (i, j) = pl−1
    l = l − 1
end
Algorithm 2: Finding the path that results in DTW(u, v)

From the algorithm, we can see that DTW puts two more constraints on P (other than monotonicity).
• Boundary conditions: The warping path starts from first sample of each trace and ends with the
last samples of each, p1 = (1, 1) and pL = (lu, lv).
• Continuity (Step Size): Given pl = (i, j) and pl+1 = (i′, j′), we should have: i ≤ i′ ≤ i+ 1 and
j ≤ j′ ≤ j + 1.
Adding constraints is a typical step in defining distances. As DTW gives the minimum distance for a
path that meets these constraints, we might obtain smaller distances by allowing other step sizes and
different starting and ending matched pairs. However, we are not interested in finding the
unconstrained minimum because it introduces much higher computing costs.
3.2 Affinity Propagation
Affinity Propagation (AP) is a clustering method used mainly in pattern recognition to distinguish between different
trends in sample traces collected by sensors [20]. The head of a cluster, which represents that cluster, is called
the exemplar of that group. Because AP determines the number of clusters without initialization, all of the
data points have the same probability of being chosen as an exemplar [11]. The process of AP clustering
is explained as follows.
Assume that we have a training data set of size Q to be clustered. To output clusters, AP requires
the similarities between different pairs of data points. For the self similarities (i.e., the similarity
of each data point with itself), we usually input the same value (e.g., the median of the pairwise similarities)
to the algorithm. Optionally, the algorithm can be provided with different self similarities when there are
different preferences for each data point to be an exemplar.
Utilizing these pairwise similarities S(p, q), including the self similarities S(p, p), AP exchanges
messages called Availability (A) and Responsibility (R) in an iterative manner, as follows.

First, we initialize the Availability and Responsibility matrices (i.e., the matrices of pairwise availabilities
and responsibilities) with zeros.

Second, we update R and A for all pairs of data points p and q using the following equations
until they converge:

R(p, q) = S(p, q) − max_{q′≠q} {A(p, q′) + S(p, q′)}   (3.5)

A(p, q) = min{0, R(q, q) + Σ_{p′∉{p,q}} max{0, R(p′, q)}}   (3.6)

A(q, q) = Σ_{p′≠q} max{0, R(p′, q)}
A(p, q) indicates how appropriate it would be for p to choose q as its exemplar, taking into account the
support from other points for q being an exemplar. Similarly, R(p, q) indicates how well-suited q is to
serve as the exemplar for p, compared to other candidate exemplars [11].
Next, using A and R matrices, AP finds groups and exemplars as explained in Algorithm 3.
Data: Responsibilities and Availabilities
Result: Exemplars and their clusters
Ep = arg max_{q∈Q} {A(p, q) + R(p, q)}
if Ep = p then
    trace number p is an exemplar
end
Algorithm 3: Identification of exemplars for each data point
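The update rules (3.5)–(3.6) and the exemplar rule of Algorithm 3 can be sketched in pure Python. This is a minimal sketch; the damping factor (added for numerical stability) and the hand-picked preference in the test below are our choices, not the thesis':

```python
def affinity_propagation(S, max_iter=200, damping=0.5):
    """Minimal AP sketch following Eqs. (3.5)-(3.6); S is a Q x Q similarity
    matrix whose diagonal S[p][p] holds the preferences."""
    Q = len(S)
    R = [[0.0] * Q for _ in range(Q)]
    A = [[0.0] * Q for _ in range(Q)]
    for _ in range(max_iter):
        # Responsibility update, Eq. (3.5), with damping.
        newR = [[S[p][q] - max(A[p][k] + S[p][k] for k in range(Q) if k != q)
                 for q in range(Q)] for p in range(Q)]
        R = [[damping * R[p][q] + (1 - damping) * newR[p][q]
              for q in range(Q)] for p in range(Q)]
        # Availability update, Eq. (3.6); the diagonal is the self-availability.
        newA = [[0.0] * Q for _ in range(Q)]
        for q in range(Q):
            pos = [max(0.0, R[p][q]) for p in range(Q)]
            total = sum(pos)
            for p in range(Q):
                if p == q:
                    newA[q][q] = total - pos[q]
                else:
                    newA[p][q] = min(0.0, R[q][q] + total - pos[p] - pos[q])
        A = [[damping * A[p][q] + (1 - damping) * newA[p][q]
              for q in range(Q)] for p in range(Q)]
    # Algorithm 3: each point's exemplar maximizes A + R.
    E = [max(range(Q), key=lambda q: A[p][q] + R[p][q]) for p in range(Q)]
    exemplars = sorted({e for p, e in enumerate(E) if e == p})
    return E, exemplars
```

With similarities built from negative squared distances and a common preference on the diagonal, two well-separated groups of points yield two exemplars, one per group.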
3.3 Random Projection
Random projection is a method for dimensionality reduction [12]. From an algebraic point of view, any
matrix with fewer rows than columns, when multiplied by a vector, reduces the size of
that vector. However, in reducing the size of vectors, we do not want to lose the characteristics of those
vectors. In other words, if two vectors are close to each other in their initial space, we want
them to be close after the mapping as well. Our projection thus has to satisfy a condition known as the
Restricted Isometry Property, which is discussed later in this section.
Assume that we have a trace u with length lu. A Random Projection (RP) of u is obtained
by multiplying it by a random matrix that satisfies the RIP, as explained later in this section. If we name the random
matrix G, with size f × lu (f ≪ lu), it is multiplied by trace u written as a column vector. This
calculation results in the mapped trace ū, as shown below:

ū_{f×1} = G_{f×lu} u_{lu×1}   (3.7)
Each element of a random matrix is a random value drawn from a given distribution. For
example, if we have a random matrix with a standard normal distribution, an element takes the value g
with probability density (1/√(2π)) e^{−g²/2}.
Using RP, one usually aims to reduce the number of samples in a vector; hence, we usually want
f ≪ lu. So, the result of this mapping, ū, is a reduced-dimension version of u. The more a signal can
be down-sampled, the lower f can get. A measure of the intrinsic dimension of a signal is its
Sparsity Level; the intrinsic dimension shows how short the mapped signal can be without losing
much information. The definition of the sparsity level of a signal is explained in [21].
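A sketch of Eq. (3.7) in pure Python follows. The 1/√f scaling, which keeps norms roughly unchanged on average, is our normalization choice, and the function name is ours:

```python
import math
import random

def random_projection(u, f, seed=0):
    """Project vector u (length l_u) to f dimensions with a Gaussian random
    matrix G, scaled by 1/sqrt(f) so that norms are preserved on average."""
    rng = random.Random(seed)
    l_u = len(u)
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(f) for _ in range(l_u)]
         for _ in range(f)]
    # Matrix-vector product: u_bar = G u.
    return [sum(G[i][j] * u[j] for j in range(l_u)) for i in range(f)]
```

Projecting two nearby traces with the same G (same seed) keeps their distance approximately unchanged, which is the property the RIP formalizes.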
Restricted Isometry Property
The Restricted Isometry Property (RIP) is a relaxed form of the orthonormality property for mapping sparse signals
that guarantees energy preservation after mapping. When G is multiplied by a sparse vector v,
G performs as an approximately orthogonal matrix for projecting v. When the RIP holds for G, a small
constant ∆_v (related to the sparsity level of v) exists for which:

(1 − ∆_v)‖v‖₂² ≤ ‖Gv‖₂² ≤ (1 + ∆_v)‖v‖₂².   (3.8)
To the best of our knowledge, there exists no easy approach to check whether the RIP holds
for a particular matrix; this is an NP-hard problem in general. On the other hand, [14] discusses
that some particular classes of matrices, including Gaussian matrices, satisfy the RIP with exponentially growing
probability when the number of rows grows linearly with the sparsity level of v.
3.4 l1 minimization
Assume that we have an under-determined linear system.
t = Φ θ
When Φ has more columns than rows, this equation has infinitely many solutions for θ. If for
some reason one is interested in the sparsest θ (or θ_sparse) among those infinitely many answers, this restriction
yields a unique solution. Here, sparsity is a measure of the proportion of zero elements in a vector
relative to the number of all entries; the sparsest solution is the one with the fewest non-zero entries. In order
to ensure that we find a proper sparse answer, we must have an orthogonal Φ. We make Φ orthogonal by multiplying the above equation
by a pre-processing factor called W:

W = orth(Φᵀ)ᵀ Φ†   (3.9)

where Φ† is the pseudo-inverse of Φ, orth(Φ) denotes an orthogonal basis for Φ, and ᵀ stands for
transpose. So, after multiplying our original equation by W, we have:
Wt = orth(Φᵀ)ᵀ θ   (3.10)
Now, having Ψ = orth(Φᵀ)ᵀ and τ = Wt, we are able to look for a sparse solution among the solutions of
t = Φθ.
Since sparsity and the value of the norm are inversely related, we can minimize a p-norm of the vector θ
to find the sparsest answer among the solutions for θ [13].
Moreover, since the constraint is linear, minimizing the 1-norm (the tightest convex surrogate for
sparsity) gives the sparsest result, and the l1 minimization problem becomes:
θsparse = arg min ‖θ‖1 s.t. τ = Ψθ (3.11)
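A common way to solve Eq. (3.11) is to recast it as a linear program by splitting θ into its magnitude bounds. The sketch below uses SciPy's `linprog`; the variable-splitting formulation is standard, but the helper name is ours:

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(Psi, tau):
    """Solve theta_sparse = argmin ||theta||_1  s.t.  Psi @ theta = tau
    (Eq. 3.11), recast as an LP with slack variables s >= |theta|."""
    m, n = Psi.shape
    # Variables z = [theta; s]; minimize sum(s).
    c = np.concatenate([np.zeros(n), np.ones(n)])
    # theta - s <= 0 and -theta - s <= 0 enforce s >= |theta|.
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * n)
    # Equality constraint Psi @ theta = tau (s is unconstrained here).
    A_eq = np.hstack([Psi, np.zeros((m, n))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=tau,
                  bounds=(None, None))
    return res.x[:n]
```

By LP optimality, the returned θ satisfies the constraint and has an l1 norm no larger than that of any other feasible vector, including the sparse vector that generated τ.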
![Page 22: by Zhino Youse - University of Toronto T-Space · Introduction Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in relevant industries](https://reader036.fdocuments.in/reader036/viewer/2022070711/5ec979be972d73389648dbf5/html5/thumbnails/22.jpg)
Chapter 3. Preliminaries 14
3.5 Combination of Multiple Classification Results
When we apply a machine-learning method to noisy data, the results will be corrupted, depending on
how sensitive the classifier is to noise. In [9], it was shown how combining the decisions of Multiple
Classifiers (MC) can yield a more accurate and robust final result. In most cases, a combined answer
outperforms the individual accuracy of each classifier. Here is a brief overview of four common MC
techniques:
Majority Voting (MV)
In this method, we choose the mode of the different classification results. Thus, among the different
answers, we take the most frequent one [22].
Weighted Majority Voting (WMV)
The same idea as MV applies here, but voting is done after each decision is weighted in proportion to its
confidence. Confidence is defined and computed based on the type of classifier. For example, probabilistic
classifiers use the probability of the final answer as their confidence in that solution [24].
Most Confident Classifier (Naive Bayes)

In this method, we simply choose the classification result that has the highest confidence in its decision
[25]. Hence, the solution with the highest posterior probability will be chosen.
Behaviour-Knowledge Space (BKS)
In the training stage, we create a look-up table from the outputs of the different classifiers' decisions and the
actual labels for those data points. Next, in the testing stage, when we obtain a combination of decisions
from our classifiers, we find the corresponding label based on the aforementioned table [26].
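The first three combiners can be sketched as follows (a minimal illustration; function names are ours):

```python
from collections import Counter

def majority_vote(labels):
    """MV: take the most frequent label among the classifiers' outputs."""
    return Counter(labels).most_common(1)[0][0]

def weighted_majority_vote(labels, confidences):
    """WMV: each vote is weighted by the classifier's confidence."""
    scores = Counter()
    for label, conf in zip(labels, confidences):
        scores[label] += conf
    return scores.most_common(1)[0][0]

def most_confident(labels, confidences):
    """Most Confident Classifier: keep the single highest-confidence decision."""
    return max(zip(confidences, labels))[1]
```

Note that WMV and MV can disagree: a single high-confidence vote can outweigh a numerical majority of low-confidence votes.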
![Page 23: by Zhino Youse - University of Toronto T-Space · Introduction Activity recognition (AR) is commanding wide attention nowadays both in the research realm and in relevant industries](https://reader036.fdocuments.in/reader036/viewer/2022070711/5ec979be972d73389648dbf5/html5/thumbnails/23.jpg)
Chapter 4
Proposed Activity Recognition
System
In this chapter, we propose an activity recognition system that uses a 3-axis accelerometer to address the
user-independent activity detection problem. Activity recognition has applications in human-computer
interaction, health care, indoor localization, and tracking. Different activities result in different trends of
acceleration in x, y and z axes. Utilizing machine learning tools, the system learns special characteristics
associated with a particular action.
However, learning these properties poses some challenges. First, traces from the same activity
might show different behaviours for various users, which makes the learning process more challenging and
complicated. Second, collecting traces for training purposes or updating the database is costly, in that
it requires users to gather data while performing different activities and to label the data manually. Our
system not only addresses the user dependency problem but also reduces the number of required training
traces. The solution is proposed in this chapter, while the simulation results are provided in Chapter 5.
Moreover, depending on the orientation of the phone and where it has been placed, the recorded
acceleration signals vary. Hence, we consider two different scenarios: fixed phone orientation and unfixed
phone orientation. In the fixed phone scenario, the phone will not be rotated with respect to the user’s
body for various traces. For the unfixed phone scenario, the orientation might vary from one trace to
another. The orientations are determined by the axis that is aligned to gravity, the axis vertical to
walking, and the axis aligned to the walking direction. Sections 4.1 and 4.2 describe the set-ups for each
problem, along with approaches for addressing each.
Chapter 4. Proposed Activity Recognition System 16
[Figure 4.1: Supervised learning algorithm — block diagram: training set (traces with labels) → learning algorithm → predictive model; testing set (traces without labels) → predictive model → expected labels]
4.1 Fixed Phone Orientation Scenario
In this section, we provide a general overview of the fixed phone orientation problem and how we address it. In
this scenario, the rotation of the phone is fixed for all traces in different classes. The main application for
this method is AR using wearable sensors, but smart phones mounted on the body or kept at a
fixed orientation also fit this method.
The general solution for an activity recognition problem is a supervised learning system that leads to
a classification. In other words, the system learns the trends of each activity class and categorizes a new
trace as its associated class. A general classification algorithm is shown in Figure 4.1.
The training traces are collected and labelled by users. These are then used to build a model that
finds the activity label for a new trace, as illustrated in Figure 4.1. The traces are the consecutive linear
acceleration measurements of a user’s phone in x, y and z axes for different activities. As discussed in
section 2.1.2, we use a raw time series in our approach.
Assume that we have labelled traces from different activities. Each trace consists of consecutive
readings from three axes of the accelerometer sensor equipped inside mobile phones. A trace u is a matrix
with lu columns and four rows. The first three rows represent acceleration data from x, y and z directions
and the last row is the corresponding sampling time for its column. lu is the number of samples for that
recording of the accelerometer. We have shown examples of these traces for two different activities in Figure 4.2.
Note that these traces have been collected by different users and might even have different lengths.
Our method solves most problems associated with raw data classification, such as traces with
different lengths and high dimensional input.
Moreover, users may walk at different paces, which makes some traces look more stretched than
Figure 4.2: Traces for two different activities
others. These traces are also asynchronous, meaning that the value at a given point could be totally different for
two traces from the same activity. For example, one walking trace might start from a peak positive
acceleration in one axis, while another walking trace might start from a negative value. We address the
asynchronous traces, along with the problems of length and elasticity differences, by introducing a modified
version of the DTW distance. We address the high dimensionality problem with Random Projection, which reduces
the number of samples in each trace, as discussed later in this section.
Since we are dealing with time series and aim to use them as raw data, we have to design a classifier
for unprocessed traces. Time series classification, like static data classification, needs a learning algorithm
to learn characteristics of each activity. The choice of the algorithm is determined by characteristics of
the data, along with the complexity and accuracy requirements of the application. The learning and
testing algorithms for the fixed phone scenario are described in sections 4.1.1 and 4.1.2.
4.1.1 Learning Algorithm
In this thesis, for the fixed phone scenario, the training process is done in three consecutive steps, as
illustrated in Figure 4.3.
First, training traces should be smoothed to filter high frequency noise. Then we group all traces
to different clusters. Regarding the measure used for clustering, we use a variation of Dynamic Time
Warping Distance. We explain each step in further detail in this section.
Figure 4.3: Training Phase
Smoothing
The filter used in our system is a very basic moving average, with a window sliding one sample at a time along
each axis of each trace. This filter assigns the simple average of the samples in each frame to the first point
of the window.

The more we want to filter out high frequency noise, the larger the window size of the filter needs
to be. However, a large filter window introduces latency in the signal path. Therefore, there is
a trade-off between noise reduction and the waiting time from the first sample's arrival to the first smoothed
sample being created. Furthermore, significant patterns in the signal should remain after filtering. The window
size is basically found through cross-validation.
There may, however, be cases where the induced latency is not acceptable for that particular application.
In these instances, other filters, like the Double Moving Average Filter or Low Pass Filter, may be used
instead.
We apply the same filter to every trace in order to remove noise and rapid changes due to sensor
faults.
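The smoothing step above can be sketched as follows (a minimal version that assigns each window's mean to the window's first sample, as described; the function name is ours):

```python
def moving_average(axis_samples, window):
    """Slide a window one sample at a time along one axis of a trace and
    assign the mean of each window to the window's first sample."""
    n = len(axis_samples)
    return [sum(axis_samples[i:i + window]) / window
            for i in range(n - window + 1)]
```

The output is shorter by window − 1 samples, which is the latency cost of smoothing discussed above.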
Modified Dynamic Time Warping Distance
A key factor in the clustering process is choosing the right similarity measure to cluster on. The
chosen distance must handle traces with different lengths and elasticities. So, the goal here is to find a metric that is
small within the traces of each class and large between traces from different classes. Selecting the right
similarity measure will assist the clustering stage.
The simplest way to accomplish this could be to sum the Euclidean distances between the samples of
two traces, but this approach does not work for traces of different lengths and is affected by shifts in the traces. Other
options, such as the Short Time Series (STS) distance [44], compare the slopes at each point instead of
the actual values, which improves on the simple Euclidean distance. Nonetheless, this is also not the best
measure for our case, because each trace is assumed to be a piecewise linear function and constraints
are therefore placed on the lengths of the traces. There are a few other measures with their own assumptions on the
data, namely probability-based distance functions [45] such as the J divergence and the symmetric Chernoff information
[46, 47]. Each of these approaches likewise has its pros and cons. Figure 4.4 shows a brief overview of
distance measures.

[Figure 4.4: Different Distance Measurement Methods — (a) feature-extraction distances: Euclidean distance, Pearson's correlation coefficient related distance, Minkowski distance, ...; (b) time-series distances: short time series distance, Dynamic Time Warping distance, dissimilarity based on the cross-correlation, ...]
As stated earlier, DTW easily deals with different lengths and local scales of our traces, and is widely
used for discrete sequences of continuous values. Therefore, among all of the measures discussed above,
DTW is the best choice for our traces. That being said, we still need to modify basic DTW to better
match our requirements. So, in this section, we explain our Modified Dynamic Time Warping Distance
(MDTW) and show how it outperforms the regular DTW.
From section 3.1, we remember that to find the DTW distance between two vectors, we first find the
matrix (D) of pairwise Euclidean distances between samples of two traces. Then, using D, we find the
DTW matrix using Algorithm 1. Next, using Algorithm 2, the matching path (P ) is discovered.
Assume that we have the accelerometer reading along axis x for trace u and trace v, as shown in
Figure 4.8. An example of a DTW matrix is also illustrated in Figure 4.5. Here, colors show the distance
between each pair of samples of traces u and v. The dark path shows the samples that have been matched
together. The vector of local cost values (Clocal) along this path is illustrated in Figure 4.6.
As explained previously, the DTW algorithm is initialized so that the first mapping is
between the first samples of the two vectors and the last mapping is between their last samples.
This may cause large distances in the very first and last mappings, as shown in Figure 4.9. Even
when two vectors are similar up to a constant shift, the algorithm needs some iterations to find
the proper mapping. Hence, some of the first and last local distances may be large, but these
values do not reflect the difference between the traces.
To see how total distance could be affected by DTW constraints, we use an example of two walking
Figure 4.5: DTW matching matrix and path for two traces from activities “walking downstairs” and “walkingupstairs”
Figure 4.6: Local DTW distances along matching path for u and v
Figure 4.7: Accelerometer readings of axis x for two walking traces
Figure 4.8: DTW matching path for two traces from activity “walking”
traces, showing how their samples have been mapped and indicating the local costs along the matching
paths in Figures 4.7, 4.8 and 4.9, respectively. To depict the pairwise matching in Figure 4.8, one trace
has been shifted up by 1 m/s2 to make the matching more visible.
Asynchronous starts, such as those in Figure 4.7, may lead to large distances, while traces in the same
class should have a small distance. However, as mentioned in section 3.1, in the basic DTW method the
total distance is the summation of the distances between the pairs of samples that have been matched
together. We can thus use alternative methods to find the total distance based on Clocal, avoiding the
addition of large values at the beginning and end of the cost vector.
Therefore, we modified DTW to only sum up local distances in the middle part of Clocal. In detail,
instead of Clocal, we use a truncated Cα,β , for which α is the index of the first point at which matching
Figure 4.9: Local DTW distances along matching path for two walking traces
is done, and β is the index of the last point at which matching is done. To select α and β, we look for
the first and last time steps whose sample values are lower than the average of all sample values in the
same trace.
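One reading of this truncation rule, applied to the local-cost vector, can be sketched as follows (the exact selection rule for α and β may differ in detail; the helper name is ours):

```python
def mdtw_from_local_costs(c_local):
    """MDTW total: sum only the middle part C_{alpha,beta} of the local-cost
    vector, where alpha/beta are the first/last indices whose cost falls
    below the mean of the vector."""
    mean_c = sum(c_local) / len(c_local)
    below = [i for i, c in enumerate(c_local) if c < mean_c]
    if not below:  # degenerate case: nothing below the mean, keep everything
        return sum(c_local)
    alpha, beta = below[0], below[-1]
    return sum(c_local[alpha:beta + 1])
```

For a cost vector with large values only at its two ends, the truncated sum keeps just the well-matched middle, as intended.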
We can also avoid fixing the beginning and ending of Clocal to the first and last samples by choosing,
among all possible mappings of a few samples at the beginning and at the end, the mapping with the
minimum distance. However, this approach still suffers from high distances at the two ends of the
distance trace, and as we increase the number of samples that we search, the complexity increases
exponentially.
So, we modify DTW to sum up local distances only for the mappings in the middle of Clocal (the
mappings that match corresponding samples with each other). Hence, in MDTW, the total distance
reflects the difference between activities and is not inflated by the different phases at the beginning of
traces from the same activity (asynchronous starts).
To see how this modification helps the clustering, we compared DTW and MDTW distances. The
dissimilarities are between pairs of traces from the same class and pairs of traces from two different
classes. Figures showing this comparison are provided in Chapter 5.
Note that, in the Fixed Phone Scenario, we can use the information of each axis individually
and thus can find the distance between each axis of a trace separately. At the end, we will have
three different distances for x, y and z of each trace (i.e., for traces u and v, the outputs are:
MDTW (ux, vx), MDTW (uy, vy) and MDTW (uz, vz)). To sum up, we deploy a modified version
of DTW distance to find the pairwise distances between acceleration traces along axes x, y and z.
From Distance to Similarity Measure
Regardless of which scenario we are considering, the output of the previous step (i.e., MDTW distances)
should pass through a function to be converted to a similarity. There are many ways to do this, the simplest of
which is to use the negative of the distances. The most common approach for conversion is through this function:
S(u, v) = 1 / (1 + MDTW(u, v))   (4.1)
The particular function to choose depends on the problem set. In our system, we use this equation:
S(u, v) = exp(−MDTW (u, v)) (4.2)
By applying this function to distances in each axis, we come up with three similarity measures.
Calculating this similarity for all pairs of training traces will lead to three similarity matrices (Sx, Sy
and Sz). These similarity matrices are utilized by the clustering algorithm, as explained in the following
section.
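Applying Eq. (4.2) entry-wise to a matrix of pairwise MDTW distances for one axis can be sketched as (the function name is ours):

```python
import math

def similarity_matrix(mdtw):
    """Eq. (4.2): S(u, v) = exp(-MDTW(u, v)), applied entry-wise to the
    matrix of pairwise distances for one axis."""
    return [[math.exp(-d) for d in row] for row in mdtw]
```

Identical traces (distance 0) map to similarity 1, and larger distances decay smoothly toward 0, which keeps all similarities on a bounded scale for the clustering step.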
Affinity Propagation Clustering
After finding similarities, we feed the similarity matrix into a clustering algorithm to group the traces
with similar behaviours. We need an approach that deals with raw data [18].
Some clustering methods, such as K-means, K-median, fuzzy c-means and genetic clustering, need to
know the number of groups beforehand [48]. They iteratively improve a random initial set of exemplars, so
their results depend on the initialization. Unlike these methods, in Affinity Propagation, clusters emerge
naturally. In other words, AP provides a general solution that does not rely on manually set parameters
and initializations. Affinity Propagation considers all data points as possible exemplars.
Then, by passing real-valued messages, it outputs a number of clusters and their exemplars. When
applying AP, there is no need to initialize the exemplars, but the number of clusters can be made larger
or smaller by modifying a factor in the AP process called the preference. We use the median of the similarities
for this factor.
Ideally, traces in each cluster share at least one common characteristic. This feature will be used to
find the labels for unknown traces, as explained in Section 4.1.2.
[Figure 4.10: Final output of learning algorithm — Hx clusters along axis x (Cluster 1, Cluster 2, Cluster 3, ..., Cluster Hx), each with an exemplar E1x, E2x, E3x, ..., EHx]
We apply AP to all three similarity matrices derived from MDTW(ux, vx), MDTW(uy, vy)
and MDTW(uz, vz). The result is three sets of clusters consisting of Hx, Hy and Hz clusters,
respectively. Moreover, each cluster in each set has an exemplar; the exemplar of cluster Ch, 1 ≤ h ≤ Hx,
is denoted Exh. Figure 4.10 illustrates a schematic for the final output of the learning algorithm.
4.1.2 Testing Algorithm
In the testing stage, we are interested in finding the label of a trace from the testing set, namely t. The
testing process is exactly what we do in the on-line phase with new traces from potentially new users.
First, we pass t through a moving average filter. Then, using the output of the previous stage (clusters
and head-clusters), we find the label for the new trace t. An overview of the testing stage in this scenario
is shown in Figure 4.11.
As depicted, the label recognizer has two main steps. In the first step, the set of exemplars that is
closest to the observed data is selected. In the second step, the goal is to find the label for t based on the
members of the clusters chosen in the first step. The details of these processing steps are provided in the
rest of this section.
[Figure 4.11: Testing Process (fixed phone scenario) — a new unlabeled trace passes through the moving average filter; the smoothed testing trace is compared to the clusters from training via the MDTW similarity measure; the similar clusters are found, their intersection over x, y, z is taken, and a majority vote recognizes the label]
Finding the Set of Similar Traces to Search In
In our approach, similar to [17], we find the exemplars close to the testing trace t by defining a threshold
DTWth:
DTWth = ρ × min_{1≤h≤H} {DTW(Eh, t)}   (4.3)
where ρ is a constant usually chosen from 1 ≤ ρ ≤ 3. After finding the similarity threshold
DTWth, the closest exemplars are those whose distance to the on-line trace is smaller than DTWth.
Having found the similar exemplars, the union of their corresponding clusters forms the similar set, our future
search space:
K = {Ch | DTW (Eh, t) ≤ DTWth, 1 ≤ h ≤ H} (4.4)
When our on-line trace t appears to be similar to an exemplar from the training stage, we can then
conclude that this test trace and the exemplar’s cluster have a common characteristic. To sum up, the
label for the new trace will be determined based on traces in clusters that have exemplars close to trace t.
Finding the closest exemplars helps to remove outliers and shrinks the size of the search space, which
consequently reduces the computational complexity and costs in the next stage.
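Equations (4.3) and (4.4) for one axis can be sketched as follows (a minimal illustration; the function and variable names are ours):

```python
def resemblance_set(exemplar_distances, clusters, rho=2.0):
    """Eqs. (4.3)-(4.4): keep the clusters whose exemplar's distance to the
    testing trace is within rho times the minimum exemplar distance."""
    th = rho * min(exemplar_distances)  # DTW_th, Eq. (4.3)
    return [c for d, c in zip(exemplar_distances, clusters) if d <= th]
```

With ρ = 2, exemplars at distances 1.0 and 1.5 fall under the threshold 2.0 while an exemplar at 5.0 is excluded, shrinking the search space for the next step.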
In the fixed phone scenario, each axis gives independent information regarding the user's movements,
and we use this diversity in our system. Assume the output similar sets for the different axes are Kx, Ky
and Kz. In other words, we repeat the process in Equations (4.3) and (4.4) for each of the three axes
of t separately.
The process is visualized in Figure 4.12 for one axis; the same procedure is applied to the other axes of the traces. In this figure, MDTW(Ex2, tx) is the minimum among all the MDTW(Exk, tx)'s. Assume that MDTW(Ex3, tx) is also lower than the threshold computed in Equation (4.3). Then, from Equation (4.4), we conclude that clusters 2 and 3 have members similar to t in axis x, and Φx in this example is the union of these two clusters.
We then use these three similar sets (for the various axes of t) to find the label of t, as explained in the next section.

Figure 4.12: Choosing resemblance set (Φx). The testing trace tx is compared via DTW(Exk, tx) against the exemplars of clusters 1 through Hx; the clusters whose exemplars fall below the threshold form the search space.
Recognition Based on Labels of Traces in the Similar Spaces of t

The key idea is to find the traces common to Kx, Ky and Kz; their intersection is O. Every trace in O has each of its axes included in the corresponding search space, so a trace that is similar to t in all three axes probably comes from the same activity. However, not all traces in O necessarily carry the same label. Since the label of each time series in O is the outcome of a classifier, we can merge these answers to obtain a more accurate solution: the most frequent label among the traces in O is taken as the label of t.

However, if we have different numbers of training traces from various activities, we can instead choose the most confident answer (i.e., the trace that has the minimum MDTW distance to t) or use the BKS method to obtain unbiased results.
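A minimal sketch of this intersection-and-majority-vote step. Trace ids and activity names are illustrative, not from the thesis:

```python
from collections import Counter

def recognize(K_x, K_y, K_z, labels):
    """Majority vote over traces similar to t in all three axes.
    K_x, K_y, K_z: sets of trace ids; labels: id -> activity name
    (ids and names here are illustrative assumptions)."""
    O = K_x & K_y & K_z          # traces whose every axis matched
    if not O:
        return None              # no trace is similar in all three axes
    votes = Counter(labels[i] for i in O)
    return votes.most_common(1)[0][0]
```

As the text notes, plain majority voting is biased when classes have unequal numbers of training traces; in that case the minimum-distance or BKS alternative should replace the `Counter` vote.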
4.2 Unfixed Phone Orientation Scenario
As mentioned previously, we have two scenarios, and the details of the training steps and the entire testing algorithm differ for each. In one, the phone's position is fixed for all traces (e.g., held in the hand or mounted on the body); in the other, the phone may rotate from one trace to another. The problem is almost identical to the Fixed Phone case, except that here the phone orientation differs from trace to trace. Hence, we cannot use the information from each axis independently, and we propose a separate solution for this latter scenario.
4.2.1 Learning Algorithm
The overall learning algorithm is almost the same as for the fixed phone case. The learning process has
three main steps, all of which are done off-line, similar to what is shown in Figure 4.3.
First, a moving average filter is applied to each trace in the training set to remove high frequency
noise. This noise might be caused by accelerometer adjustment or sensor sensitivity. The next step
is to find similarities between each pair of traces in the training set. Since similarity and dissimilarity
(distance) are inversely related, we find similarities through distances. We again use Modified Dynamic
Time Warping Distance as a dissimilarity measure, since we might have different lengths, shifts, and
elasticities from one trace to another.
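The smoothing step might be implemented as below. The window length is an assumption, since the thesis does not state the filter length:

```python
import numpy as np

def smooth(trace, window=5):
    """Moving-average filter applied independently to each axis.
    trace: array of shape (3, N) with x, y, z acceleration rows;
    window: filter length (assumed; the thesis does not state one)."""
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(axis, kernel, mode="same") for axis in trace])
```

`mode="same"` keeps the output the same length as the input, which matters later when traces are compared and zero-padded.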
However, in this scenario, we may capture a certain type of acceleration trend for the walking activity
in axis x, whereas the next time, the same changes in axis y may be due to various rotations of the
phone. So, to find the true distance, we must first combine the different axes into an overall trace (the elementwise sum of squares) and then compute the MDTW distance on that overall trace. Hence, the output of MDTW in this scenario is a single dissimilarity value for each pair of traces; i.e., for traces u and v, the output is

MDTW(utot, vtot) = MDTW(ux² + uy² + uz², vx² + vy² + vz²)

We find the matrix of pairwise distances between all traces in the training set. The similarity matrix is then derived from the distance matrix, as done in (4.2). These similarities drive Affinity Propagation to cluster the training traces; the exemplars are representatives of the groups to which they belong. Clustering is again the main stage of the entire learning process, and its results are used in the Testing Phase.

The output of this step is a set of H clusters, each cluster Ch having an exemplar Eh, 1 ≤ h ≤ H. To sum up, finding the closest exemplars helps remove outliers and shrinks the search space, thereby reducing both computational complexity and cost.
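The overall-trace construction and the pairwise distance matrix can be sketched as follows; `dist` is a placeholder for the thesis's MDTW, so any DTW implementation can stand in:

```python
import numpy as np

def overall(trace):
    """Orientation-robust overall trace: elementwise sum of squares
    of the three axes, as used in the unfixed-orientation scenario."""
    x, y, z = trace
    return x**2 + y**2 + z**2

def distance_matrix(traces, dist):
    """Symmetric matrix of pairwise dissimilarities between overall
    traces; dist is a placeholder for the thesis's MDTW."""
    totals = [overall(tr) for tr in traces]
    n = len(totals)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(totals[i], totals[j])
    return D
```

The similarity matrix fed to Affinity Propagation is then derived from `D` as in (4.2).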
4.2.2 Testing Algorithm
For the unfixed phone orientation case, we use the output of the previous stage (clusters and their exemplars) to find the label for the new trace t. However, the recognition part is entirely different from the previous scenario. To find the label for new traces, we follow the steps depicted in Figure 4.13. First, we pass the new trace through the same moving average filter used in the Learning Stage.

Intuitively, if each cluster represented a single activity, the label of the closest exemplar would be the answer. However, this technique does not perform well compared with similar techniques in the literature: AP gives no assurance that the members of each cluster share the same label; rather, it captures higher-level features beyond the class labels. To address this issue, we deploy a two-step comparison process to find the label for new traces, similar to [17]. First, we find the exemplars closest to the new trace. Second, the best match is chosen among the members of the clusters selected in the first step, so the new trace is compared only to traces in clusters whose exemplars are close to t. Finding the search space is similar to Section 4.1.2, except that t yields one similar set for the whole trace instead of one per axis. The following parts describe the other steps in greater detail.
Random Projection
The second step in the testing phase is to map all traces in the similar set K, along with the testing trace t, to another space to reduce the complexity of further comparisons. Each entry g of the random matrix G that we use follows this distribution:
g = √3 ×
    +1  with probability 1/6
     0  with probability 2/3
    −1  with probability 1/6

Figure 4.13: Testing process (unfixed phone scenario): a new unlabeled trace passes through the moving average filter; its MDTW similarity against the clusters from training selects the similar clusters; random projection followed by l1-minimization then recognizes the label.
It has been proven in [42] that this matrix satisfies the RIP condition for projections of sparse vectors. We also want this projection to make all traces the same size; hence, if we use Gf×b, f and b must be the same for all projections. To make all traces the same length before projection, we zero-pad them: the number of zeros added to a trace is the difference between its length and that of the longest trace (among all training traces and the new testing trace). Assuming K consists of K traces with lengths l1, ..., lK, the desired length after zero padding is:

b = max {l1, l2, ..., lK, lt}    (4.5)

So, the number of zeros to be added to each trace is:

l′t = b − lt,    l′k = b − lk, 1 ≤ k ≤ K    (4.6)

Zero padding appends l′k zeros to the end of trace k from K, and l′t zeros to the end of t. Note that the position of the added zeros has no effect on the mapping.
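The padding and projection setup can be sketched as follows. The matrix construction follows the stated √3·{+1, 0, −1} distribution (an Achlioptas-style sparse random projection); the fixed seed and the sizes in the usage are illustrative assumptions:

```python
import numpy as np

def random_projection_matrix(f, b, seed=0):
    """Entries are sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6},
    matching the distribution above (the seed is an illustrative choice)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([1.0, 0.0, -1.0], size=(f, b), p=[1/6, 2/3, 1/6])
    return np.sqrt(3) * signs

def zero_pad(traces, t):
    """Pad each training trace and the testing trace t with trailing
    zeros up to the common length b of Eq. (4.5); returns (Phi, t, b)."""
    b = max(max(len(u) for u in traces), len(t))               # Eq. (4.5)
    pad = lambda u: np.concatenate([np.asarray(u, float), np.zeros(b - len(u))])
    Phi = np.column_stack([pad(u) for u in traces])            # padded vectors as columns
    return Phi, pad(t), b
```

Projection is then simply `G = random_projection_matrix(f, b)` followed by `G @ Phi` and `G @ t_padded`.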
The next step is building the matrix Φ by stacking the zero-padded vectors (the φ's) from K as columns.
We can find a proper f from the minimum sparsity level among the traces in K and t. Denoting the sparsity level of trace k in K by ηk, 1 ≤ k ≤ K, and the sparsity level of t by ηt, f is

f = β × min {η1, ..., ηK, ηt}

where β is an integer constant (usually 3 or 4), and η for each trace is computed on the overall trace, i.e., the sum of the squares of the axes.
At the end, we multiply G by Φ to obtain the mapped matrix Φ̄ = GΦ, and by t to obtain t̄ = Gt. We then use the mapped traces to find the time series closest to t̄.
Finding the Closest Trace In The New Space
We find the similar set from the distances between the overall acceleration traces, as in Equation (3.4). The matching problem can then be formulated as:
t̄ = Φ̄ θ + ε    (4.7)
where θ is a sparse vector giving the weight of each trace of Φ̄ in building up t̄ (assuming the mapped new trace is a linear combination of the mapped similar-set traces). Ideally, we look for a 1-sparse θ that identifies the single trace in Φ̄ closest to t̄.
Before the problem is formulated as an l1 minimization, we need to orthogonalize Φ̄ by multiplying Equation (4.7) by W = orth(Φ̄ᵀ)ᵀ Φ̄†:

τ = W t̄ = Ψ θ + ε′    (4.8)
Now we are able to solve this problem and find a sparse answer for θ. To compare traces, we deploy l1
minimization to find the best matches to the testing trace:
θsparse = arg min ‖θ‖1 s.t. τ = Ψθ + ε′ (4.9)
This problem provides a sparse answer for θ. The index of the maximum element of θ identifies the trace in Ψ closest to t̄:

ψclosest = arg max θsparse    (4.10)
And finally:
label(t) = label(ψclosest) (4.11)
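A sketch of the l1-minimization step as a basis-pursuit linear program. For simplicity it solves min ‖θ‖₁ s.t. Ψθ = τ directly, omitting the orthogonalization by W shown above:

```python
import numpy as np
from scipy.optimize import linprog

def l1_closest(Psi, tau):
    """Basis pursuit: minimize ||theta||_1 subject to Psi @ theta = tau.
    Split theta = u - v with u, v >= 0 and solve as a linear program;
    return the index of the largest-magnitude coefficient, i.e. the
    column of Psi closest to the mapped testing trace (Eqs. 4.9-4.10)."""
    f, K = Psi.shape
    c = np.ones(2 * K)                 # objective: sum(u) + sum(v) = ||theta||_1
    A_eq = np.hstack([Psi, -Psi])      # equality constraint: Psi @ (u - v) = tau
    res = linprog(c, A_eq=A_eq, b_eq=tau, bounds=(0, None))
    theta = res.x[:K] - res.x[K:]
    return int(np.argmax(np.abs(theta))), theta
```

The label of t is then the label of the training trace at the returned index, as in Eq. (4.11).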
Chapter 5
Simulation Results
In this chapter, we discuss the properties of our database and provide comparisons for the various problems addressed throughout the thesis. First, as a key factor in the training process, we require a set of examples of proper system behaviour, called labelled data. The proposed machine learning system is first trained on the training traces and then, to see how well it predicts, evaluated on the testing database. More details about our Activity Recognition system follow.

Consider a database of activities (Q) repeated by different users multiple times. This dataset includes a number of actions, ranging from simple movements to ones representing letters. Each trace has information from all three axes of the accelerometer embedded in a smart phone: the first three rows hold the acceleration data for the x, y and z directions, and the last row holds the sampling time of each column.
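Under the layout just described, a stored trace is a 4 × N array, and splitting it might look like this (a minimal sketch; variable names are illustrative):

```python
import numpy as np

def split_trace(trace):
    """A stored trace is a 4 x N array: rows 0-2 hold the x, y, z
    accelerations and row 3 holds the sampling time of each column."""
    accel = trace[:3, :]    # shape (3, N): one row per axis
    times = trace[3, :]     # shape (N,): per-sample timestamps
    return accel, times
```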
From this set of activity traces, we choose a portion of them for training and leave the rest for testing.
Even though we have the labels for testing traces, we will use them only for validating our results and
not for finding the predictive model.
Before moving to the AR system simulation, we present results on MDTW vs. DTW distances, our main contribution in both the fixed and unfixed scenarios. Figures 5.1, 5.2, 5.3 and 5.4 show the calculated distances for each approach, both among traces of one class and between traces from different activities.

These distance histograms for within-class and between-class pairs, under DTW and MDTW, indicate the improvement of our modified DTW over regular DTW. As Figures 5.1-5.4 illustrate, within-class distances are much lower in the modified case, while classes remain distinguishable thanks to the large between-class distances that the modified DTW preserves.
Figure 5.1: Distance histograms between pairs of traces from different classes (regular DTW)

Figure 5.2: Distance histograms between pairs of traces within the same class (regular DTW)
Figure 5.3: Distance histograms between pairs of traces from different classes (modified DTW)

Figure 5.4: Distance histograms between pairs of traces within the same class (modified DTW)
Figure 5.5: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (systems are trained with 10 samples from each of 4 classes)
Fixed Phone Orientation Results
The traces used for testing our proposed system in the Fixed Phone scenario come from a very diverse database provided by [27]. They include the four activity classes of walking, standing up, walking downstairs, and walking upstairs. The database was collected by a group of 30 volunteers aged 19-48 years, wearing a smart phone (Samsung Galaxy S II) on the waist. Using the phone's embedded accelerometer, they collected 3-axial linear accelerations at a rate of 50 Hz, with traces approximately 2-2.5 seconds long. In the following simulations, the collected dataset has been randomly divided into two sets, with one set used for training and the other for evaluating the system.
Figures 5.5, 5.6 and 5.7 compare the results of our proposed method with the Decision Tree and Naive Bayes algorithms for four, three, and two classes, respectively. As the results show, the superiority of our method becomes more significant as the number of classes increases.

Moreover, the results of our proposed system for the Fixed Phone scenario with different numbers of classes (2, 3 and 4) are depicted separately in Figure 5.8.
The simulations show a nearly perfect detection of activities for training with only 10 traces from
each class. To observe the impact of the number of training traces, we tested the average accuracy versus
the number of training sets in Figure 5.9.
Figure 5.6: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 3 classes)

Figure 5.7: Testing results of the proposed system (Fixed Phone Scenario) and other benchmark AR methods (system is trained with 10 samples from each of 2 classes)
Figure 5.8: Testing results of the proposed system (Fixed Phone Scenario) for different numbers of classes (system is trained with 10 samples from each class)

Figure 5.9: Testing results of the proposed system (Fixed Phone Scenario) for different numbers of training traces (classifying in 4 classes) using 300 features
Figure 5.10: Testing results of the NB algorithm (Fixed Phone Scenario) for different numbers of training traces(classifying in 4 classes) using 300 features
Figure 5.11: Testing results of the DT algorithm (Fixed Phone Scenario) for different numbers of training traces(classifying in 4 classes)
Moreover, Figures 5.10 and 5.11 provide similar illustrations for NB and DT, respectively. The rate at which accuracy increases with the number of training traces is higher for NB and DT, since they start from lower accuracies. In particular, DT requires many more training traces to reach high accuracy, which can be seen as a weakness of DT. To sum up, our method surpasses the others mainly when few training traces are available and the number of classes is high, which makes it low-cost in terms of dataset collection and more practical.
Unfixed Phone Orientation Results
For the unfixed scenario, we use a database that has different orientations when collecting each activity. This database has been provided by Zhang et al. [28], who used a 3-axis accelerometer attached to the subject's front right hip. The accelerometer sampling frequency is 100 Hz, and the traces we use are about 24 seconds long (some were changed). Because they are beneficial for improving indoor tracking, we use these classes: Walking, Walking Upstairs, Walking Downstairs, and Standing Up.

Figure 5.12: Testing results of our proposed method (Unfixed Phone Scenario) for different numbers of classes

This database was collected from 14 subjects, and we used data from all subjects for both training and testing. The CDF of the resulting accuracies is provided in Figure 5.12. Since this dataset does not provide feature sets for its traces, we compared our results with the confusion matrices provided in [49] and [50] for the KNN method (K = 10) and SVM on the same dataset; those systems were trained with almost 1600 traces. The overall accuracies are 80.3%, 92.7%, and 93.2% for KNN, SVM, and our approach, respectively. To conclude, our approach provides superior results with respect to the benchmark methods in both the Unfixed and Fixed Phone Scenarios.
Chapter 6
Conclusion and Future Work
The main goal of activity recognition is to provide information about a user's meaningful movements for applications like cognitive assistance and human-computer interfaces.

As smart phones grow ever more ubiquitous, the idea of using their embedded sensors to extract users' movement and location information is becoming increasingly popular. For instance, human activities like standing up and walking stairs can reveal a user's location inside a building. Moreover, detecting activities like walking, running and standing up can help health-care applications.

The Activity Recognition process takes as input traces with known labels (i.e., activities) and produces a system that can determine the label of a new trace. These traces can be recorded by motion, sound, or vision sensors, and their trends are shaped by the activity the user performs while they are gathered. Motion sensors are either mounted on the body (on one or several body parts) or embedded in a mobile device carried by the user. Each sensing technology has its own accuracy, cost, user comfort, and privacy trade-offs, and some AR systems combine several technologies to meet their goals.
Inspired by the availability of motion sensors (especially accelerometers) in smart phones, we proposed an AR system that uses linear acceleration data from the x, y and z axes. The presented system utilizes our modified version of Dynamic Time Warping to find the similarities needed by the AP algorithm to complete the training phase. In the testing stage, we use the new unknown trace and the clusters from AP to find the label of the new time series. The training and testing traces used in our system have asynchronous starts, and our proposed method is compatible with this kind of data. We also considered two different cases for the phone orientation (fixed or not). To sum up, in this thesis, we proposed two approaches to address AR problems in two scenarios, namely fixed phone
orientation, and unfixed phone orientation.
The proposed AR method was evaluated using two datasets, one for each scenario. These datasets include traces from four activities: walking, walking downstairs, walking upstairs, and standing up. We further evaluated the system by comparing its results with Naive Bayes, Decision Tree, and the proposed method with regular DTW.

The results for the fixed phone case show significant accuracy gains over the other benchmark AR methods; as the number of training traces increases, we reach almost perfect detection. This is mostly due to the changes made in the learning part, especially the adjustments to DTW. In the fixed phone scenario, besides the MDTW mentioned previously, we provided a completely novel testing stage that exploits the diversity of information coming from the different axes. To the best of our knowledge, training separate classifiers for the different axes and combining their results with a Multi-Classifier System has not been used before to detect activities. In the unfixed scenario, MDTW again enables accurate detection of actions.

To conclude, in this thesis, we proposed a system that uses only the accelerometer sensor embedded in a smart phone. The traces used in the learning and testing phases may have different lengths and be collected by different users. Our contributions in both scenarios are mainly the modified dynamic time warping and decision-making based on multiple classifier outputs.
6.1 Future Work
Future work involves developing a real-time activity recognizer that can label activities inside a multiple-action trace. For example, if a user walks and then stops, the enhanced system should find this change point and detect the transition in what the user is performing.

Moreover, the implemented localization system can be improved by the results of these two AR systems. Combining AR from inertial sensors on mobile devices with map information, and using it to create location-specific weighting for a WiFi fingerprinting system, can improve localization and tracking systems.
Another possible extension of this work is handling tilting. In both scenarios, the phone's orientation is fixed during a single trace: it may change from one trace to another, but not within one. Using a gyroscope sensor, we could compensate for tilting within a trace, making the resulting system much more convenient.
A distinct interesting topic is using a multi-classifier system when the classes have different numbers of training traces. Since majority voting is sensitive to class imbalance, an activity with more training traces has a greater chance of being selected by the MAJ technique. Hence, if we use varying numbers of training traces per activity, we should merge the classification results with other MC methods.
Future work also involves implementing all these systems on a smart phone or other mobile devices
with embedded accelerometers. The system has a reasonable performance computationally, especially in
the testing phase, where resources are more limited. However, to gauge the performance in real life, the
proposed system should be implemented on commonly used personal devices with inertial sensors.
6.2 Contributions
The main contribution in this thesis is the modifications made to regular DTW. These changes help
solve the problem of asynchronous starts for traces from the same class. As well, the preference in AP
clustering is adjusted according to the number of training traces. In the Fixed Phone Scenario, we found
three MDTW matrices of distances between pairs of traces from one axis in each matrix. So, for example,
we have an MDTW matrix for axis x, which contains dissimilarities between readings of x axes for all
traces. Subsequently, we have three different similarity matrices and various search spaces for each one.
Then, using only the intersection of search spaces for each axis, we find the final answer by choosing the
mode for labels of traces in the overlapping set. Note that our training and testing traces have been
collected by different users and might even have different lengths and sampling rates. Also, our special
combination of different tools in both scenarios works with short datasets, meaning it can be trained
with traces that are nearly two seconds long.
Bibliography
[1] Oscar D. Lara and Miguel A. Labrador, “A Survey on Human Activity Recognition us-
ing Wearable Sensors”, IEEE Communications Surveys and Tutorials, 2013, 1192-1209,
http://dx.doi.org/10.1109/SURV.2012.110112.00192.
[2] M. S. Ryoo, “Interactive Learning of Human Activities Using Active Video Composition”, Interna-
tional Workshop on Stochastic Image Grammars (SIG), in Proceedings of International Conference
on Computer Vision (ICCV), Barcelona, Spain, November 2011.
[3] O. X. Schlmilch, B. Witzschel, M. Cantor, E. Kahl, R. Mehmke, and C. Runge, "Detection of posture and motion by accelerometry: a validation study in ambulatory monitoring", Computers in Human Behavior, vol. 15, no. 5, pp. 571-583, 1999.
[4] Henpraserttae, A.; Thiemjarus, S.; Marukatat, S., "Accurate Activity Recognition Using a Mobile Phone Regardless of Device Orientation and Location," Body Sensor Networks (BSN), 2011 International Conference on, pp. 41-46, 23-25 May 2011, doi: 10.1109/BSN.2011.8.
[5] Siddiqi MH, Ali R, Rana MS, Hong E-K, Kim ES, Lee S. “Video-Based Human Activity Recognition
Using Multilevel Wavelet Decomposition and Stepwise Linear Discriminant Analysis”. Sensors. 2014;
14(4):6370-6392.
[6] D. Gusenbauer, C. Isert, and J. Krösche, "Self-Contained Indoor Positioning on Off-the-Shelf Mobile Devices", in IEEE Indoor Positioning and Indoor Navigation (IPIN), 2010.
[7] H. Ye, T. Gu, X. Zhu, J. Xu, X. Tao, J. Lu, and N. Jin. “FTrack: Infrastructure-free Floor
Localization via Mobile Phone Sensing”. In IEEE Percom, 2012.
[8] V. Radu, M. K. Marina, "HiMLoc: Indoor Smartphone Localization via Activity Aware Pedestrian Dead Reckoning with Selective Crowdsourced WiFi Fingerprinting", in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation, 2013.
[9] J.A. Sáez, M. Galar, J. Luengo, and F. Herrera, "Tackling the problem of classification with noisy data using Multiple Classifier Systems: Analysis of the performance and robustness", Inf. Sci., 2013, pp. 1-20.
[10] Toni Giorgino (2009). “Computing and Visualizing Dynamic Time Warping Alignments in R: The
dtw Package”. Journal of Statistical Software, 31(7), 1-24.
[11] Brendan J. Frey and Delbert Dueck (2007). “Clustering by passing messages between data points”.
Science 315:972-977. doi:10.1126/science.1136800.
[12] Ella Bingham and Heikki Mannila, “Random projection in dimensionality reduction: Applications
to image and text data”, in Knowledge Discovery and Data Mining, 2001, 245-250.
[13] Baraniuk, Richard G. “Compressive sensing.” IEEE signal processing magazine 24.4 (2007).
[14] F. Yang, S. Wang, and C. Deng, “Compressive sensing of image reconstruction using multi-wavelet
transform”, IEEE 2010.
[15] C. Myers, L. Rabiner, and A. Rosenberg, "Performance tradeoffs in dynamic time warping algorithms for isolated word recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 6, pp. 623-635, 1980.
[16] A. Kuzmanic and V. Zanchi, "Hand shape classification using DTW and LCSS as similarity measures
for vision-based gesture recognition system", in EUROCON 2007: The International Conference on
"Computer as a Tool", 2007, pp. 264-269.
[17] A. Akl, C. Feng, and S. Valaee, "A novel accelerometer-based gesture recognition system", IEEE
Transactions on Signal Processing, vol. 59, pp. 6197-6205, Dec. 2011.
[18] T. Warren Liao, "Clustering of time series data - a survey", Pattern Recognition, 38(11), November
2005, pp. 1857-1874. doi:10.1016/j.patcog.2005.01.025.
[19] V. Niennattrakul and C. A. Ratanamahatana, "On clustering multimedia time series data using
k-means and dynamic time warping", in Multimedia and Ubiquitous Engineering (MUE '07),
International Conference on, 2007, pp. 733-738.
[20] Brendan J. Frey and Delbert Dueck (2007). "Clustering by passing messages between data points".
Science 315:972-977. doi:10.1126/science.1136800.
[21] E. Candès and M. Wakin, "An introduction to compressive sampling", IEEE Signal Processing
Magazine, vol. 25, no. 2, pp. 21-30, March 2008.
[22] V.D. Mazurov, A.I. Krivonogov, and V.S. Kazantsev, "Solving of optimization and identification
problems by the committee methods", Pattern Recognition 20 (1987) 371-378.
[23] Hui Liu, H. Darabi, P. Banerjee, and Jing Liu, "Survey of Wireless Indoor Positioning Techniques
and Systems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and
Reviews, vol. 37, no. 6, pp. 1067-1080, Nov. 2007.
[24] L. Shapley and B. Grofman, "Optimizing group judgmental accuracy in the presence of
interdependencies", Public Choice 43 (1984) 329-343.
[25] D.M. Titterington, G.D. Murray, L.S. Murray, D.J. Spiegelhalter, A.M. Skene, J.D.F. Habbema,
G.J. Gelpke, “Comparison of discriminant techniques applied to a complex data set of head injured
patients", Journal of the Royal Statistical Society, Series A (General) 144 (1981) 145-175.
[26] Y.S. Huang and C.Y. Suen, "A method of combining multiple experts for the recognition of
unconstrained handwritten numerals", IEEE Transactions on Pattern Analysis and Machine
Intelligence, 17 (1995), pp. 90-93.
[27] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. “Human
Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine”.
International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec
2012.
[28] Mi Zhang and Alexander A. Sawchuk, “USC-HAD: A Daily Activity Dataset for Ubiquitous Activity
Recognition Using Wearable Sensors", ACM International Conference on Ubiquitous Computing
(UbiComp) Workshop on Situation, Activity and Goal Awareness (SAGAware), Pittsburgh,
Pennsylvania, USA, September 2012.
[29] Rokach, L. and Maimon, O. (2005). "Top-down induction of decision trees classifiers - a
survey". IEEE Transactions on Systems, Man, and Cybernetics, Part C 35 (4): 476-487.
doi:10.1109/TSMCC.2004.843247.
[30] McCallum, Andrew; Nigam, Kamal (1998). “A comparison of event models for Naive Bayes text
classification”. AAAI-98 workshop on learning for text categorization 752.
[31] Radu, V. and Marina, M.K., "HiMLoc: Indoor smartphone localization via activity aware Pedestrian
Dead Reckoning with selective crowdsourced WiFi fingerprinting", Indoor Positioning and Indoor
Navigation (IPIN), 2013 International Conference on, pp. 1-10, 28-31 Oct. 2013. doi:
10.1109/IPIN.2013.6817916.
[32] A. Kushki, K. N. Plataniotis, and A. N. Venetsanopoulos, “Kernel-based positioning in wireless local
area networks", IEEE Trans. on Mobile Computing, vol. 6, no. 6, pp. 689-705, June 2007.
[33] R. Singh, L. Macchi, C. Regazzoni, and K. Plataniotis, "A statistical modelling based location
determination method using fusion in WLAN", Proceedings of the International Workshop Wireless
Ad-Hoc Networks, 2005.
[34] J. Ma, X. Li, X. Tao, and J. Lu, "Cluster filtered KNN: A WLAN-based indoor positioning scheme",
International Symposium on a World of Wireless, Mobile and Multimedia Networks, pp. 1-8, June
2008.
[35] X. Golay, S. Kollias, G. Stoll, D. Meier, A. Valavanis, and P. Boesiger, "A new correlation-based fuzzy
logic clustering algorithm for fMRI", Magn. Reson. Med. 40 (1998) 249-260.
[36] A. Wismüller, O. Lange, D.R. Dersch, G.L. Leinsinger, K. Hahn, B. Pütz, and D. Auer, "Cluster
analysis of biomedical image time series", Int. J. Comput. Vision 46 (2) (2002) 103-128.
[37] C. Feng, S. W. A. Au, S. Valaee, and Z. H. Tan, “Orientation-aware localization using affinity
propagation and compressive sensing,” IEEE International Workshop on Computational Advances
in Multi-Sensor Adaptive Processing (CAMSAP), 2009.
[38] Dalton, A. and Ó Laighin, G., "Comparing Supervised Learning Techniques on the Task of Physical
Activity Recognition", IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 1, pp. 46-52,
Jan. 2013. doi: 10.1109/TITB.2012.2223823.
[39] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. “Human
Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine”.
International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec
2012.
[40] Y. S. Lee and S. B. Cho, "Activity recognition using hierarchical hidden Markov models on a
smartphone with 3D accelerometer", in HAIS, pp. 460-467, 2011.
[41] Dernbach, Stefan; Das, B.; Krishnan, Narayanan C.; Thomas, B.L.; Cook, D.J., “Simple and
Complex Activity Recognition through Smart Phones", Intelligent Environments (IE), 2012 8th
International Conference on, pp. 214-221, 26-29 June 2012.
[42] E. Bingham and H. Mannila, "Random projection in dimensionality reduction: Applications to
image and text data", Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 245-250, 2001.
[43] Pirttikangas, S., Fujinami, K., and Nakajima, T., "Feature selection and activity recognition from
wearable sensors", in International Symposium on Ubiquitous Computing Systems (UCS2006), Seoul,
Korea, Oct. 11-13, 2006, pp. 516-527.
[44] C.S. Möller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, "Fuzzy clustering of short time
series and unevenly distributed sampling points", Proceedings of the 5th International Symposium
on Intelligent Data Analysis, Berlin, Germany, August 28-30, 2003.
[45] Mahesh Kumar, Nitin R. Patel, and Jonathan Woo, "Clustering seasonality patterns in the presence
of errors", Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada. doi:10.1145/775047.775129.
[46] Kakizawa, Y., Shumway, R.H., and Taniguchi, N., "Discrimination and clustering for multivariate
time series", J. Amer. Stat. Assoc., vol. 93, no. 441, pp. 328-340.
[47] Dahlhaus, R., "On the Kullback-Leibler information divergence of locally stationary processes",
Stochastic Process. Appl., vol. 62, pp. 139-168.
[48] T.W. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R.C. Kaste, J. O’May, “Understanding and
projecting the battle state”, 23rd Army Science Conference, Orlando, FL, December 2-5, 2002.
[49] M. Zhang and A. A. Sawchuk. “Manifold learning and recognition of human activity using body-area
sensors”. In IEEE International Conference on Machine Learning and Applications (ICMLA), pages
7-13, Honolulu, Hawaii, USA, December 2011.
[50] M. Zhang and A. A. Sawchuk. "Motion primitive-based human activity recognition using a
bag-of-features approach". In ACM SIGHIT International Health Informatics Symposium (IHI),
pages 631-640, Miami, Florida, USA, January 2012.