SENSOR-BASED PREDICTION OF PHYSICAL ACTIVITY AND ITS IMPACTS
USING MACHINE LEARNING
Alok Kumar Chowdhury
Master of Science in Computer Science & Engineering
A Thesis by Publication submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy (PhD)
School of Electrical Engineering and Computer Science
Science and Engineering Faculty
Queensland University of Technology
May 2018
Alok Kumar Chowdhury (2018) PhD Thesis - Sensor-Based Prediction of Physical Activity and Its Impacts using Machine Learning iii
Keywords
Physical activity recognition
Machine learning
Wearable sensors
Relative physical activity intensity prediction
Energy expenditure prediction
Rate of perceived exertion
Deep learning
Ensemble learning
Decision fusion
Feature fusion
Posterior-adapted class-based fusion
Abstract
According to the World Health Organisation (WHO), around 60-85% of the world’s
population is physically inactive. Lack of physical activity (PA) is a key contributor
to the global overweight and obesity epidemic and, in turn, raises the risk of numerous
health issues, including heart disease, type 2 diabetes, high cholesterol, and some
cancers. With obesity prevention and promotion of physical activity being ongoing
public health priorities, there is an urgent need for accurate yet practical measures of
physical activity. Physical activity is defined as bodily movement that is
produced by contraction of skeletal muscle and that substantially increases energy
expenditure above resting. Hence, valid and reliable measures of physical activity and
its personal impacts, including relative physical activity intensity and energy
expenditure, are important to track, view, and share with health practitioners.
Moreover, physical activity measurement tools that give end users information and
feedback in real time offer the opportunity to design personalised and
adaptive interventions to increase physical activity, such as wellness e-coaching.
In recent years, with the rapid development of ubiquitous sensor technologies,
wearable sensors can provide accurate measurement of important movement and
physiological cues related to PA. Wearable sensors are well-received and widely used
by researchers and the general population, which creates the prospect for objective
measurement of PA in studies examining the impacts of physical activity on health.
However, most studies in the PA domain have relied on self-report methods or direct
measures of energy expenditure, which require expensive and bulky equipment,
mostly in lab-based contexts. On the other hand, wearable sensor-based methods are
inexpensive but, by and large, use simple methods like thresholding or simple
regression, which often perform poorly.
A multitude of machine learning-based PA recognition systems have been developed in
recent years. Most of these studies applied a single classifier to the features extracted
from a single accelerometer location. The use of advanced ensemble machine learning
techniques, which have the potential to improve PA recognition performance, has
not been thoroughly investigated in this domain. Also, there is a lack of methods to
effectively fuse multi-accelerometer data to improve PA recognition
performance under varying sensor positioning. Among sensor-based approaches to
determining relative intensity, none utilised multiple important modalities of
physiological data (including heart rate, electrodermal activity, and temperature) for
relative intensity prediction using machine learning. Advanced machine learning such
as deep learning has not been properly explored in the previous literature on energy
expenditure prediction.
The aim of this thesis is to introduce new methods and models to accurately predict 1)
PA type, and 2) its impacts such as relative activity intensity and energy expenditure
in simulated free-living contexts.
The specific contributions of this thesis in the context of “PA type prediction” are:
1. This research systematically compared the PA classification accuracy achieved
by conventional ensemble methods (bagged decision tree, boosted decision
tree, and random forest) and a custom multi-classifier ensemble combining
four machine learning algorithms (binary decision tree, k-nearest neighbour,
support vector machine, and neural network) using three decision fusion rules
(weighted majority voting, Naïve Bayes, and behaviour knowledge space).
Performance was evaluated in three independent PA recognition datasets. The
results revealed that combining multiple individual classifiers using ensemble
learning methods can improve activity recognition accuracy from wrist-worn
accelerometer data.
2. A novel posterior-adapted class-based weighted decision fusion was proposed
to effectively combine data from multiple accelerometers for improving physical
activity recognition. The fusion was applied to two- and three-accelerometer
location combinations. The results identified that the proposed decision fusion
was superior to the other state-of-the-art fusion algorithms, and that a
two-accelerometer combination (wrist and ankle) provided the best PA recognition
performance.
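As an illustration of the decision fusion rules named in contribution 1, weighted majority voting can be sketched as follows. This is a generic, minimal sketch of the standard rule, not the thesis's implementation and not the posterior-adapted class-based fusion of contribution 2. The function name `weighted_majority_vote`, the activity labels, and the weights are assumptions made up for the example.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Fuse one predicted label per classifier into a single label.

    predictions: list of class labels, one per classifier.
    weights: list of non-negative floats, one per classifier
             (hypothetically, each classifier's validation accuracy).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")

    # Sum the weights of all classifiers that voted for each label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # The label with the largest total wins; iterating over sorted labels
    # makes ties resolve to the alphabetically first label.
    return max(sorted(totals), key=lambda label: totals[label])


# Made-up votes from four hypothetical classifiers and their weights.
fused = weighted_majority_vote(
    ["walk", "run", "walk", "sit"],
    [0.70, 0.95, 0.60, 0.50],
)
print(fused)
```

Here the two weaker "walk" votes together outweigh the single strongest "run" vote, which is the behaviour that distinguishes weighted voting from simply trusting the best classifier.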
The specific contributions of this thesis in the context of “estimation of impacts of PA”
are:
3. This research developed both regression and classification models to
effectively predict relative PA intensity using multimodal physiological
sensor data (heart rate, RR interval, Eda, and Temp). The experiments were
based on a real-world (non-laboratory), longitudinal dataset collected from
22 people, where Borg’s RPE scale was used as the ground truth measure of
relative intensity. The results showed that features extracted from the RR interval
provided the highest prediction performance of any single
modality. However, the combination of Eda and Temp features fused with RR
features produced the best overall performance, confirming the benefits of
using multimodal data.
4. This research is the first study to propose the use of deep learning to
effectively predict energy expenditure from body-worn accelerometers in
pre-school-aged children. It also systematically compares the deep learning
approach with conventional supervised machine learning and simplified
regression approaches in different accelerometer location configurations,
including wrist and hip. The results show that deep learning achieved
performance comparable to conventional supervised learning and
significantly outperformed the simplified regression approaches.
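The feature fusion mentioned in contribution 3 can be illustrated by concatenating per-modality feature vectors before a single model is trained on them. The snippet below is a minimal sketch under that assumption and is not taken from the thesis: the function name `fuse_features`, the choice of features, and all numeric values are hypothetical.

```python
def fuse_features(features_by_modality, modalities):
    """Concatenate per-modality feature vectors in a fixed order.

    features_by_modality: dict mapping a modality name (e.g. "RR") to a
                          list of numeric features for one time window.
    modalities: the modalities to include, in concatenation order.
    """
    fused = []
    for name in modalities:
        if name not in features_by_modality:
            raise KeyError(f"no features for modality {name!r}")
        fused.extend(features_by_modality[name])
    return fused


# Made-up feature values for one window; they are not results from the thesis.
window = {
    "RR": [0.82, 0.05],  # e.g. mean and standard deviation of RR intervals
    "Eda": [1.9],        # e.g. mean electrodermal activity
    "Temp": [33.4],      # e.g. mean body temperature
}
fused_vector = fuse_features(window, ["RR", "Eda", "Temp"])
print(fused_vector)
```

Keeping the modality order explicit means every window yields a vector with the same layout, which a downstream regressor or classifier would need.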
All of the proposed methods were validated using a diverse range of datasets collected
from different participant groups (adults and children) performing different physical
activities in different contexts (laboratory-based vs outdoors).
These methods collectively deliver better algorithms and maximise the use of available
sensor information to provide accurate measurement of PA type, relative PA intensity,
and energy expenditure. They will provide better-quality information to users and
health practitioners, which is useful for providing personalised and adaptive PA
recommendations.
Table of Contents
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
List of Publications
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
1.1 Background and Motivation
1.2 Research Problem
1.2.1 Lack of Methods That Use and Compare Advanced Ensemble Learning Algorithms
1.2.2 Lack of Decision Fusion Methods to Combine Multi-Accelerometer Data
1.2.3 Lack of Methods That Use Multi-Modal Data for Relative Intensity Prediction
1.2.4 Lack of Methods That Use Deep Learning for Energy Expenditure Prediction
1.3 Research Aims and Objectives
1.4 Research Framework
1.5 Contributions of the Thesis
1.6 Significance
1.7 Thesis Outline
Chapter 2: Literature Review
2.1 Physical Activity and Health
2.1.1 Health Benefits of Physical Activity
2.1.2 Guidelines of Physical Activity
2.1.3 Risks Associated with Physical Activity
2.2 Personal Impacts from Physical Activity
2.2.1 Relative Intensity of Physical Activity
2.2.2 Energy Expenditure
2.3 Sensor-Based Methods to Measure Physical Activity and Its Impacts
2.3.1 Wearable Sensors
2.3.2 Data Pre-processing
2.3.3 Feature Extraction
2.3.4 Feature Selection
2.3.5 Learning Algorithms
2.3.6 Effect of Sensor Number, Positioning, and Combination on the Performance
2.4 Summary of Current Gaps
PART I - Classification of Physical Activities
Chapter 3: Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry
3.1 ABSTRACT
3.2 Introduction
3.3 Methods
3.3.1 Datasets
3.3.2 Classification Framework
3.3.3 Conventional Ensemble Methods
3.3.4 Custom Ensemble Methods
3.3.5 Decision Fusion Techniques
3.3.6 Performance Evaluation
3.4 Results
3.4.1 Dataset #1 Results
3.4.2 Dataset #2 Results
3.4.3 Dataset #3 Results
3.4.4 Statistical Comparison
3.5 Discussion
3.6 Acknowledgements
Chapter 4: Physical Activity Recognition using Posterior-adapted Class-based Fusion of Multi-Accelerometers Data
4.1 ABSTRACT
4.2 Introduction
4.3 Related Work
4.4 Methods
4.4.1 Pre-processing
4.4.2 Feature Extraction
4.4.3 Normalisation & Feature Selection
4.4.4 Classification Algorithms
4.4.5 Decision Fusion Techniques
4.5 Experiment
4.5.1 Datasets
4.5.2 Implementation of the Framework
4.5.3 Evaluation Approach and Metrics
4.6 Results and Discussion
4.6.1 Evaluation of Classification Algorithms
4.6.2 Evaluation of Different Fusion Techniques
4.6.3 Activity-Wise Classification Performance
4.6.4 Subject-Wise Classification Performance
4.6.5 Confusion Matrices
4.7 Conclusion
PART II - Estimation of Impacts of Physical Activities
Chapter 5: Towards Non-Laboratory Prediction of Relative Physical Activity Intensities from Multimodal Wearable Sensor Data
5.1 ABSTRACT
5.2 Introduction
5.3 Dataset Collection
5.4 Methods
5.4.1 Pre-processing
5.4.2 Feature Extraction and Selection
5.4.3 Regression Algorithms
5.5 Performance Evaluation
5.6 Experimental Results and Discussion
5.6.1 Performance from Using a Single Modality
5.6.2 Performance from Using Multiple Modality
5.7 Conclusion
Chapter 6: Prediction of Relative Physical Activity Intensity Using Multimodal Sensing of Physiological Data
6.1 ABSTRACT
6.2 Introduction
6.3 Methods
6.3.1 Participants
6.3.2 Protocol
6.3.3 Data Acquisition
6.3.4 Relative Intensity Prediction System
6.4 Results
6.4.1 Relative Intensity Classification from a Single Modality
6.4.2 Feature Fusion Results
6.4.3 Decision Fusion Results
6.4.4 Statistical Comparison
6.5 Discussion
6.6 Acknowledgements
Chapter 7: Deep Learning for Energy Expenditure Estimation in Pre-School Children
7.1 ABSTRACT
7.2 Introduction
7.3 Data Collection and Pre-Processing
7.3.1 Data Collection
7.3.2 Pre-Processing
7.4 Methods
7.4.1 Deep Learning Approach
7.4.2 Conventional Supervised Learning Approach
7.4.3 Simplified Approach
7.5 Performance Evaluation
7.6 Results and Discussion
7.6.1 Evaluation of Deep Learning Approach
7.6.2 Evaluation of Conventional Approach
7.6.3 Evaluation of Simplified Regression Approach
7.6.4 Comparison of Approaches
7.7 Conclusion
7.8 Acknowledgment
Chapter 8: Conclusion and Future Work
8.1 Summary of Achievements
8.2 Limitations
8.3 Future Work
Bibliography
Appendices
List of Figures
Figure 1.1 The framework of this research
Figure 2.1 Borg’s RPE scale
Figure 2.2 OMNI scale of perceived exertion (adult) for cycling
Figure 3.1 Flow diagram of the proposed framework
Figure 4.1 Overview of the system developed for implementing the framework
Figure 4.2 For a given test instance (x), predicting the final label by fusing the decisions from accelerometer sensors using weights
Figure 4.3 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the PAMAP2 dataset
Figure 4.4 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the MHEALTH dataset
Figure 4.5 Average F1-Scores of all single and possible accelerometer combinations across different subjects. Error bars represent 95% confidence intervals. (*) indicates statistical significance (p < 0.05)
Figure 5.1 Borg’s Rating of Perceived Exertion (6-20) scale
Figure 5.2 Prediction performances of single modality models
Figure 6.1 F1 Scores for all combinations of modalities using feature fusion;
Figure 6.2 The confusion matrix for the best combinations in each classifier
Figure 6.3 Scores for all combinations of modalities using decision fusion;
Figure 7.1 The transformed representation of (a) a single 3-axis accelerometer window, (b) a combination of two accelerometers’ windows
Figure 7.2 The CNN architecture used in this study.
Figure 7.3 Conventional approach design for each accelerometer location and a combination
Figure 7.4 Results of the approaches for each accelerometer location and a combination using (a) RMSE, and (b) R2
List of Tables
Table 2.1 Overview of the extracted features (ordered by author name)
Table 2.2 Overview of some wearable sensor-based works that used machine learning algorithms (ordered by author name)
Table 3.1 Comparison across three datasets
Table 3.2 Classification results (F1-Score) using wrist acceleration sensor of dataset #1
Table 3.3 Classification results (F1-Score) using wrist acceleration sensor of dataset #2
Table 3.4 Classification results (F1-Score) using wrist acceleration sensor of dataset #3
Table 4.1 List of features extracted from each window of an accelerometer
Table 4.2 Average F1-scores for each classification model across both datasets
Table 4.3 F1-scores for single and all possible combinations of accelerometer sensors in PAMAP2 dataset
Table 4.4 F1-scores for single and all possible combinations of accelerometer sensors in MHEALTH dataset
Table 4.5 Confusion matrix for ankle and wrist combination (A+W) in PAMAP2 dataset
Table 4.6 Confusion matrix for ankle and wrist combination (A+W) in MHEALTH dataset
Table 5.1 Prediction performances of models developed from the combination of modalities
Table 6.1 Feature set extracted from each sensor modality
Table 6.2 F1-scores of five different modalities using three classifiers
Table 7.1 List of Activities and Their Description
Table 7.2 List of features extracted from each window of an accelerometer
List of Abbreviations
Abbreviation (Alphabetically Sorted)
ACC Accelerometer Data
ANN Artificial Neural Network Classification
BDT Binary Decision Tree Classification
BKS Behaviour Knowledge Space Combiner
CFS Correlation-based Feature Selection
CNNR Convolutional Neural Network Regression
DNN Deep Neural Network Classification
Eda Electrodermal activity
EE Energy Expenditure
HR Heart-Rate
kNN k Nearest Neighbour Classification
MLR Multiple Linear Regression
MRMR Minimal Redundancy Maximum Relevance
NB Naïve Bayes Combiner
NNR Neural Network Regression
PA Physical Activity
RF Random Forest Classification
RPE Rate of Perceived Exertion
RR R-R Interval
SVM Support Vector Machine Classification
SVMR Support Vector Machine Regression
Temp Body Temperature
WMV Weighted Majority Voting
List of Publications
List of Q1 Journal Papers
1) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble
Methods for Classification of Physical Activities from Wrist Accelerometry,"
Medicine and Science in Sports and Exercise, vol. 49, no. 9, p. 1965, 2017. (Accepted)
2) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Physical
activity recognition using posterior-adapted class-based fusion of multi-
accelerometers data," IEEE Journal of Biomedical and Health Informatics, 2017.
(Accepted)
3) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, J. Zhang, and S. G. Trost,
"Prediction of Relative Physical Activity Intensity Using Multimodal Sensing of
Physiological Data," Plos One, 2018. (To be Submitted Soon)
4) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, C. Dylan, and S.
G. Trost, “Deep Learning for Energy Expenditure Prediction in Preschool Children,”
IEEE Journal of Biomedical and Health Informatics, 2018. (Under Revision)
List of Conference Papers
5) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, P. S. Pratiwi, and S. G. Trost,
"Towards Non-Laboratory Prediction of Relative Physical Activity Intensities from
Multimodal Wearable Sensor Data," in Proceedings of the 1st IEEE Life Science
Conference, 2017: IEEE. (Accepted)
6) A. K. Chowdhury, A. Farseev, P. R. Chakraborty, and V. Chandran, “Automatic
Classification of Physical Exercises from Wearable Sensors using Small Dataset from
Non-Laboratory Settings,” in Proceedings of the 1st IEEE Life Science Conference,
2017: IEEE. (Accepted)
7) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff, and S.
G. Trost, “Deep Learning for Energy Expenditure Prediction in Preschool Children,”
in International Conference on Biomedical and Health Informatics (BHI), 2018:
IEEE-EMBS. (Accepted as 1-page Abstract for Poster Presentation)
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Signature: QUT Verified Signature
Date: 01/05/2018
Acknowledgements
This PhD is attributed to the immense support received from numerous QUT
members. I would like to thank my supervision team – including Dr Jinglan Zhang
(Principal Supervisor), Professor Stewart Trost (Associate Supervisor), Professor Dian
Tjondronegoro (Associate Supervisor), and Professor Vinod Chandran (Associate
Supervisor). I am tremendously grateful to all of my supervisors for their endless
support throughout my PhD study.
I would like to express my gratitude to Dr Jinglan Zhang for accepting me as her
student from the third year of my PhD. She helped me a great deal with my technical
knowledge, writing, and presentation of results. I am so fortunate to have Professor
Stewart Trost as my supervisor. He actively helped me with designing the
methods, finding gaps, and writing up the work. He participated in every weekly meeting,
closely monitored my progress, and provided in-depth feedback to improve my work.
Professor Dian Tjondronegoro has been a great mentor towards my PhD journey and
beyond. He was my principal supervisor until he left QUT. I have learnt a lot from him
about writing, research, and professional knowledge. He always brought helpful
connections with the right people during my candidature. Professor Vinod Chandran
always provided on-time and valuable feedback on my work. He helped me a lot with
the technical issues, methods and writing.
I acknowledge the financial support received from QUT, via QUT’s
postgraduate research scholarships (QUT-PRA), and excellence top up scholarships. I
am thankful to the Institute of Health and Biomedical Innovation (IHBI), the Information
Systems School and the School of Electrical Engineering and Computer Science for all
their support throughout my journey. A sincere thank you to Tim Mcsweeney for his
diligent proofreading of this thesis.
My heartfelt thanks go to my parents and my wife for their steady mental
support, consideration, patience, help, and encouragement. This work couldn’t have
been completed without their pure devotion, sacrifice and continued prayers towards
my success. Their encouragement kept me going when there seemed to be no way
forward.
Finally, I want to thank all my colleagues in QUT. I express my gratitude to all
the anonymous students and staff of QUT who helped in my research with their
participation in my studies.
Chapter 1: Introduction
1.1 BACKGROUND AND MOTIVATION
Physical inactivity has been identified as the fourth leading risk factor for global
mortality [1]. It is one of the key contributors to the overweight and obesity epidemic
in the world’s population, and is considered one of the major contributors to chronic
diseases such as heart disease, type-2 diabetes, depression, and some cancers [2-5].
According to the Australian Institute of Health and Welfare (AIHW), overweight and
obesity continue to rise, and are considered to be the second highest contributor
to disease likelihood in Australia. The rapid development of technologies in both
developed and developing countries has made people’s lives easier and more enjoyable, but
unfortunately it has removed opportunities for physical activity during work and
leisure. Physical activity (PA) is good for people’s health and well-being as it helps
one to manage weight, control blood pressure, improve cardiovascular fitness, and
manage stress. There is strong evidence to show that regular physical activity
dramatically improves quality of life as it contributes to physical health and is effective
for the improvement of the mental state as well, such as decreasing depression or
anxiety [6-8]. Physical activity is associated with positive health outcomes among
multiple age groups including children, adults, and the elderly regardless of their
physical status/fitness [9, 10].
Despite the obvious benefits of PA, around 60 to 85% of the world’s population
does not meet the recommended minimum PA level [11]. Uncertainty with respect to the
correct frequency, intensity, and duration may be one important factor that contributes
to poor exercise adherence. Performing PA without knowing the correct dose can lead
to over- or under-exercising, which reduces the safety and effectiveness of the PA [12].
Due to different levels of physical fitness, the impacts of a given physical activity
may vary among people [13]. Other factors, such as a busy lifestyle and a lack
of social support, knowledge, facilities, or motivation, also discourage people
from following policies and community-level initiatives [14]. For example,
the World Health Organization [15] recommends that adults, 18-64 years, perform at
least 150 minutes of moderate-intensity or 75 minutes of vigorous-intensity physical
exercise weekly. Unless a person realises the benefits of PA, and learns the contexts
(e.g., type, amount, intensity, calorie, place, etc.), it remains challenging to increase
people’s adherence to PA.
A self-monitoring tool that can objectively and accurately measure PA and its
personal impacts (relative intensity, and energy expenditure) is needed to monitor
compliance with PA guidelines and help people to perform PA at the correct intensity.
Nowadays, wearable and ambient sensors have rapidly been adopted by the general
population due to the development of integrated sensor technologies such as
miniaturisation, improved user experience design, and low battery consumption. These
modern sensors can be used to collect real-time movement/acceleration and
physiological data from the users efficiently and unobtrusively. Using this modern
technology, an automated system (learning model) could be developed that can
accurately track the physical activity type, intensity, and energy expenditure and
display it to users as a daily summary [16]. Such objective tracking could increase PA
awareness, management, and promote long-term life-style changes [17], and in turn,
reduce illness and health management cost.
Physical activity usually manifests through users’ movement and physiological
responses such as heart-rate, respiration rate, skin conductance, volume of oxygen
consumption, body temperature, etc. [18, 19]. For example, accelerometer sensors can
record the acceleration or deceleration of the body which can be utilised in a learning
model for objective and direct prediction of PA. Heart-rate, skin-conductance, and
body temperature also have a strong relationship with PA type and intensity, which
can be used to personalise PA estimation [18]. A multitude of studies have been carried
out by utilising these data, captured by wearable sensors, to estimate PA classes and
energy expenditure. However, most of the approaches are either lab-based or use
simple learning algorithms which often perform poorly out of the lab. Moreover, in
general, physiological sensor data requires person-level calibration which negatively
impacts the applicability in real-world contexts [20-22].
This thesis focuses on the use of advanced machine learning and novel fusion
algorithms to improve the measurement of physical activity and its impacts on a
person, such as relative intensity and energy expenditure.
1.2 RESEARCH PROBLEM
Few studies in the PA domain have explored methods to maximise the use of multi-
modal and multi-body-positioned sensor data to improve the prediction of PA and its
personal impacts. Most of the previous methods are lab-based and use simple
learning algorithms for PA recognition [17]. The physiological modalities such as
electrodermal activity, body temperature etc., and multi-accelerometer combinations
warrant further investigations.
1.2.1 Lack of Methods That Use and Compare Advanced Ensemble Learning Algorithms
The use of machine learning for the recognition of physical activity and energy
expenditure has gained considerable research attention in the past few years [23].
Machine learning methods usually extract features in the data and then use supervised
or unsupervised learning algorithms to predict physical activity type and/or energy
expenditure. This approach usually involves training a single learning algorithm such
as support vector machine, neural networks, or random forest. Machine learning
approaches have been shown to provide better prediction for a greater variety of
physical activity metrics (e.g., activity type, walking speed) and energy cost, compared
to a non-machine learning approach such as thresholding (also known as cut-point)
[24, 25]. However, the machine learning methods used are diverse, and studies have not
consistently identified a single learning algorithm that performs well across datasets. The
differences in data processing methods and the lack of
generalisation have hindered research efforts to quantify, understand and intervene on
physical activity and sedentary behaviour. In order to increase generalisation and
overall performance, ensemble learning algorithms, which use multiple learning
models and appropriately combine them to get the most out of each model, are gaining
popularity. However, ensemble learning algorithms have not been properly used or
systematically compared in the PA domain.
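As a minimal illustration of how an ensemble can combine its base models, the sketch below applies weighted majority voting to the class labels predicted by several classifiers for one sample. The function name, the example labels, and the weights (imagined here as validation accuracies) are illustrative assumptions and not the configuration evaluated in this thesis.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: labels predicted by each base classifier for one sample.
    weights: one non-negative weight per classifier (hypothetical values,
             e.g. each classifier's accuracy on a validation set).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken by sorted label order so the result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical outputs of four base classifiers for a single data window.
labels = ["walking", "running", "walking", "running"]
weights = [0.70, 0.90, 0.60, 0.50]
print(weighted_majority_vote(labels, weights))
```

In this example the two classifiers voting for "running" carry a larger combined weight than the two voting for "walking", so the ensemble outputs "running" even though the raw vote count is tied.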
1.2.2 Lack of Decision Fusion Methods to Combine Multi-Accelerometer Data
Combining data from multiple accelerometers, placed at different body
locations, has been found to be effective in improving the accuracy of PA recognition
[26, 27]. Cleland, et al. [28] showed that the placement of accelerometer sensors on
different body locations affects the PA prediction performance. For example, while
the wrist placement is a comfortable position for the users and preferable for
recognition of simple activities (e.g., running, standing, etc.) [29], other locations such
as the hip, ankle, and chest demonstrate good performance for certain groups of
activities (e.g., lying, running, standing, etc.) [26, 28, 30]. Acceleration data from
multiple locations can be combined using a feature-level or decision-level fusion approach
[31]. Decision-level fusion has been found to be more accurate than feature fusion in
other domains [32]; however, it has not been systematically investigated for PA
recognition. It is also important to determine the best combination of sensor
placements for optimal PA recognition.
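The difference between the two fusion levels can be sketched as follows. Feature-level fusion concatenates the per-sensor feature vectors before a single classifier is trained, whereas decision-level fusion combines the outputs of one classifier per sensor. The decision rule shown here is a plain unweighted mean of class posteriors, chosen only for illustration; it is not the fusion algorithm proposed in this thesis, and all names and probability values are hypothetical.

```python
def feature_fusion(wrist_features, ankle_features):
    """Feature-level fusion: concatenate per-sensor feature vectors so that a
    single classifier can be trained on the combined vector."""
    return list(wrist_features) + list(ankle_features)


def decision_fusion(posteriors_per_sensor):
    """Decision-level fusion: average the class posteriors produced by one
    classifier per sensor, then pick the most probable class.

    posteriors_per_sensor: list of dicts mapping class label -> probability.
    """
    classes = posteriors_per_sensor[0].keys()
    n = len(posteriors_per_sensor)
    fused = {c: sum(p[c] for p in posteriors_per_sensor) / n for c in classes}
    return max(fused, key=fused.get), fused


# Hypothetical posteriors from two per-sensor classifiers for one window.
wrist = {"lying": 0.2, "standing": 0.5, "running": 0.3}
ankle = {"lying": 0.1, "standing": 0.2, "running": 0.7}
label, fused = decision_fusion([wrist, ankle])
print(label)
```

Here the wrist classifier alone would choose "standing", but the more confident ankle classifier shifts the fused decision to "running".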
1.2.3 Lack of Methods That Use Multi-Modal Data for Relative Intensity Prediction
To date, research efforts to quantify PA intensity from wearable sensors have
predominantly been based on absolute intensity [32-34]. Because such estimates do
not consider an individual’s aerobic fitness, age or health status, the predicted intensity
of PA could be above accepted thresholds for moderate-to-vigorous PA (MVPA). As
such, m-Health platforms using wearable sensor systems to monitor the intensity of
PA could be encouraging individuals to exercise at relative intensities that are neither
safe nor effective. Thus, the development of validated algorithms to predict relative
PA intensity from wearable sensor data constitutes an important research priority.
An alternative approach to measuring relative intensity that does not require
instrumentation or individual calibration in the laboratory is the use of effort
perception or ratings of perceived exertion (RPE). Effort perception scales such as the
Borg alpha-numeric RPE Category Scale are commonly used in exercise testing and
prescription contexts and have been shown to be a valid and reliable indicator of
relative PA intensity [35-38]. Yet, despite the widespread use of RPE for effort
estimation, the utility of algorithms to predict relative PA intensity based on RPE has
not been explored.
Because of the linear relationship between HR and work rate during steady-state
exercise, heart-rate-based methods are most commonly adopted for quantifying the relative
intensity of PA, but they are not effective for low relative intensities [39]. Moreover,
these approaches require knowledge of HR max, for which commonly used age-related
prediction equations are subject to considerable measurement error [40, 41]. In
addition to heart-rate, some other modalities of physiological data, including
electrodermal activity (Eda) and body temperature (Temp), can be easily obtained
using wearable sensors. These physiological indicators can provide valuable
information about the metabolic demand of exercise and can also be used to predict
relative PA intensity. However, to the best of my knowledge, the use of multiple
modalities of physiological data for relative intensity prediction has not been
previously investigated.
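To make the dependence on HR max concrete, the sketch below computes relative intensity as a percentage of heart-rate reserve. The "220 - age" rule is one commonly quoted age-related estimate and is used here purely as an assumption; the text above does not specify which prediction equations it refers to, and any such estimate carries the measurement error noted above. Function names and input values are hypothetical.

```python
def estimate_hr_max(age_years):
    """Age-predicted maximal heart rate (beats per minute).

    Assumption: the widely quoted "220 - age" rule, used for illustration only.
    """
    return 220.0 - age_years


def percent_hr_reserve(hr, hr_rest, hr_max):
    """Relative intensity as a percentage of heart-rate reserve:
    100 * (HR - HRrest) / (HRmax - HRrest)."""
    if hr_max <= hr_rest:
        raise ValueError("hr_max must be greater than hr_rest")
    return 100.0 * (hr - hr_rest) / (hr_max - hr_rest)


# Hypothetical 40-year-old with a resting HR of 60 bpm exercising at 120 bpm.
hr_max = estimate_hr_max(40)
print(percent_hr_reserve(120, 60, hr_max))
```

Because the estimated HR max appears in the denominator, an error in the age-based estimate propagates directly into the computed relative intensity.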
1.2.4 Lack of Methods That Use Deep Learning for Energy Expenditure Prediction
To date, few studies in the PA domain have used advanced machine learning
algorithms such as deep learning to predict physical activity outcomes such as energy
expenditure (EE) [42]. In most energy expenditure prediction applications, the wealth
of data generated from the sensors has not been thoroughly utilised and predictions are
based on simple linear regression [43]. Such approaches have shown poor performance
with large prediction errors [43]. The use of supervised machine learning for EE
prediction has emerged as a viable and more accurate alternative to simple linear
regression [44, 45]. Conventional machine learning involves manual extraction of
features, feature selection, and applying regression algorithms such as artificial neural
networks (ANN) [46-48], ensemble decision trees [49], and support vector machine
[50]. The performance of regression depends on the quality and number of features,
which requires domain knowledge for feature extraction and sophisticated feature
selection algorithms. Often, the same extracted features do not perform equally well
in different studies [51]. To eliminate the need for manual feature selection, deep
learning is gaining popularity and has demonstrated superior performance in other
domains [52]. Preliminary work has established deep learning as a viable strategy to
predict EE from accelerometer output in adults [42]; however, no previous study has
used deep learning to predict EE in preschool-aged children. Child-specific models
are needed given pre-schoolers’ unique movement patterns and greater energy cost of
locomotion [53-55].
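The conventional pipeline described above (manual feature extraction followed by regression) can be sketched in a few lines. The example splits a one-dimensional signal into non-overlapping windows, extracts the mean and standard deviation of each window as hand-crafted features, and fits a simple linear regression of energy expenditure on one feature. The signal, the energy expenditure values, and the choice of features are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from any dataset used in this thesis.

```python
from statistics import mean, pstdev


def window_features(signal, window_size):
    """Split a signal into non-overlapping windows and return the
    (mean, standard deviation) of each complete window."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        features.append((mean(window), pstdev(window)))
    return features


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical acceleration samples (three windows of four samples each).
signal = [1, 1, 1, 1, 1, 3, 1, 3, 0, 4, 0, 4]
features = window_features(signal, window_size=4)
std_per_window = [std for _, std in features]

# Hypothetical energy expenditure measured for each window.
energy = [1.0, 3.0, 5.0]
intercept, slope = fit_simple_linear_regression(std_per_window, energy)
print(intercept, slope)
```

The quality of such a model depends entirely on which features are extracted by hand, which is the step that deep learning aims to replace with learned representations.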
Most previous studies generally used their own datasets, with no validation of
results across variations of datasets, size of datasets and activity type selections.
Therefore, the performances of existing algorithms have been found to be inconsistent
and dependent on the sample used to generate the training data and the activity targets
under investigation [56].
1.3 RESEARCH AIMS AND OBJECTIVES
The overarching aim of this research is: To develop learning models and fusion
algorithms that can maximise the use of multi-modal and/or multi-positioned
wearable sensors data to accurately predict 1) classes of physical activity and 2) its
personal impacts (including relative physical activity intensity and energy
expenditure).
This aim gives rise to the following research objectives:
1) To develop and evaluate an advanced custom ensemble learning algorithm
for PA recognition and compare with conventional ensemble methods and
standalone state-of-the-art supervised learning algorithms.
2) To design a new decision fusion algorithm that can effectively combine
multiple accelerometers placed at different body locations to improve
PA recognition, and to identify the optimal sensor positioning (single location
or combination of multiple locations) for recognising a range of PAs.
3) To explore the use of multimodal sensor data for the prediction of relative
PA intensity using both feature and decision fusion.
4) To use deep learning for energy expenditure prediction and compare it with
conventional machine learning algorithms and simple regression.
All the proposed methods need to be evaluated using a diverse range of datasets
collected from different participant groups (adults and children) performing different
physical activities in different contexts (indoor vs outdoor).
1.4 RESEARCH FRAMEWORK
The key components of this research are shown in Figure 1.1.
Figure 1.1 The framework of this research
The research involved data collection, modelling and validation of the models.
In the data collection phase, a real-world dataset was collected using the wearable
sensors in the outdoor context. The collected data included acceleration, heart-rate,
electrodermal activity, temperature, and profile data. Apart from the collected dataset,
several datasets which were either public or collected by other parties were utilised in
the research. By leveraging advanced machine learning and fusion algorithms, a set of
learning models (both classification and regression) were developed to improve the
recognition of PA and the prediction of its impacts on persons, namely relative
intensity and energy expenditure. All of the models were validated using a diverse
range of datasets and compared with state-of-the-art methods.
1.5 CONTRIBUTIONS OF THE THESIS
The contributions of this research help to extend knowledge, methods and
techniques, in the field of physical activity recognition, and the prediction of personal
impacts of PA (e.g., relative intensity, and energy expenditure prediction). Each
contribution and the related publications are listed in the following section.
The specific contributions of this thesis in the context of “PA prediction” are:
1) Recontextualisation and systematic comparison of multi-classifier
ensembles: This research systematically compared the PA classification
accuracy achieved by conventional ensemble methods (bagged decision tree,
boosted decision tree, and random forest) and a custom multi-classifier ensemble
combining four machine learning algorithms (binary decision tree, k-nearest
neighbour, support vector machine, and neural network) using three decision
fusion rules (weighted majority voting, Naïve Bayes, and behaviour knowledge
space). Performance was evaluated in three independent PA recognition datasets.
The results revealed that combining multiple individual classifiers using ensemble
learning methods can improve activity recognition accuracy from wrist-worn
accelerometer data.
Related Publication:
1. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost,
"Ensemble Methods for Classification of Physical Activities from Wrist
Accelerometry," Medicine and Science in Sports and Exercise, vol. 49, no.
9, p. 1965, 2017. (Q1 Journal, Accepted)
2) Novel decision fusion algorithm and sensor positioning: A novel posterior-
adapted class-based weighted decision fusion was proposed to effectively
combine data from multiple accelerometers to improve physical activity recognition.
The fusion was applied to two- and three-accelerometer location combinations. The
results identified that the proposed decision fusion was superior to the other state-
of-the-art fusion algorithms, and that a two-accelerometer combination (wrist and
ankle) provided the best PA recognition performance.
Related Publication:
2. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost,
"Physical activity recognition using posterior-adapted class-based fusion of
multi-accelerometers data," IEEE Journal of Biomedical and Health
Informatics, 2017. (Q1 Journal, Accepted)
The specific contributions in the context of “PA impacts estimation” are:
3) Investigation of multimodal physiological data for relative intensity
prediction: Both regression and classification models were developed to
effectively predict the relative PA intensity using multimodal physiological sensor
data (heart-rate, RR-interval, Eda and Temp). The experiments were based on a
real-world (non-laboratory) and longitudinal dataset, collected from 22 people,
where Borg’s RPE scale was used as a ground truth measure of relative intensity.
The results showed that features extracted from RR-interval provided the highest
prediction performance compared to any other single modality. However, the
combination of Eda and Temp features fused with RR features produced the best
overall performance, confirming the benefits of using multi-modal data.
Related Publications:
3. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, P. S. Pratiwi, and S. G.
Trost, "Towards Non-Laboratory Prediction of Relative Physical Activity
Intensities from Multimodal Wearable Sensor Data," in Proceedings of the
1st IEEE Life Science Conference, 2017: IEEE. (Accepted)
4. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, J. Zhang, and S. G.
Trost, "Prediction of Relative Physical Activity Intensity Using Multimodal
Sensing of Physiological Data," Plos One, 2018. (Q1 Journal, to be
submitted soon)
4) Use of Deep learning for energy expenditure prediction: This research is the
first study that proposes the use of deep learning to effectively predict energy
expenditure from body worn accelerometers in pre-school-aged children. It also
systematically compares the deep learning approach to conventional supervised
machine learning and simplified regression approaches in different accelerometer
location configurations including wrist and hip. The results show that deep
learning can achieve performance comparable to conventional supervised
learning, and significantly outperforms the simplified regression approaches.
Related Publications:
5. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff,
and S. G. Trost, “Deep Learning for Energy Expenditure Prediction in
Preschool Children,” in International Conference on Biomedical and Health
Informatics (BHI), 2018: IEEE-EMBS. (Accepted as 1-page abstract for
poster presentation)
6. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff,
and S. G. Trost, "Deep Learning for Energy Expenditure Estimation in
Preschool Children," IEEE Journal of Biomedical and Health Informatics,
2018. (Q1 Journal, to be submitted as full-paper)
1.6 SIGNIFICANCE
The methods developed in this thesis collectively deliver better algorithms and
maximise the use of available sensor information to provide accurate measurement of
PA type, relative PA intensity, and energy expenditure. This research is significant in
a number of ways including:
1. Enhancing techniques for classifying physical activity by introducing the use of
multi-classifier ensembles, proposing a novel multi-sensor decision fusion
algorithm, and comparing sensor positioning to recommend the best use of
multiple sensors placed at different body locations for improving PA estimation.
2. Extending the knowledge on quantifying the impacts of PA, using multimodal
physiological data for relative intensity and a deep learning model for
energy expenditure, thereby helping end users to exercise at an intensity that is
both safe and effective.
Overall, this research provides new, more accurate methods for sensor-enabled
monitoring of physical activities. The impact of this work on public health is
significant. Clinicians, researchers and public health agencies could significantly
improve physical activity surveillance and more effectively identify individuals at risk.
The work also has important implications for the design of accurate and effective
technology-based physical activity monitoring and intervention applications that could
be delivered through e-Health initiatives.
1.7 THESIS OUTLINE
This section provides an overview of how the thesis contributions are organised
in different chapters.
Introduction: Introduction of the thesis is presented in Chapter 1. It briefly
describes the background, problems, aims, objectives and significance of this research.
Literature Review: Chapter 2 presents the literature review. It reviews existing
work (both traditional and wearable-sensor-based) on physical activity. Literature on
methods and techniques for feature extraction, feature selection, learning models, and
fusion is discussed. This chapter also summarises the research gaps in the current
literature.
Part 1 - Classification of Physical Activities: This part presents two
contributions/publications in two chapters (Chapters 3 and 4). Chapter 3 presents our
work on the recontextualisation of multi-classifier ensembles and systematically
compares them with other approaches (contribution 1). Chapter 4 presents our work
on novel fusion and sensor positioning (contribution 2).
Part 2 - Estimation of Impacts of Physical Activities: This part consists of
three chapters. Chapters 5 and 6 present our work on multimodal analysis for relative
intensity prediction (contribution 3). Chapter 7 presents the use of deep learning for
energy expenditure prediction (contribution 4).
Conclusion: Chapter 8 concludes the dissertation with a summary of the
research and provides future directions.
Chapter 2: Literature Review
This chapter provides an overview of the background to the research aim,
which is to develop techniques to improve the measurement of physical activity and
its associated personal impacts – in terms of relative intensity and energy
expenditure – using wearable sensor data. Section 2.1 provides a background on the
importance of physical activity for people’s health and wellbeing and the current
guidelines for achieving the right amount of exercise. Section 2.2 presents the current
methods to assess personal impacts from physical activities. Section 2.3 reviews
sensor-based learning methods for physical activity type, intensity, and energy
expenditure prediction and presents the current gaps in knowledge to establish the
motivation for this research. At the end of this chapter, we will outline the key
limitations that will be addressed, and each of the subsequent chapters will provide
further insights into the current literature of the specific problems addressed.
2.1 PHYSICAL ACTIVITY AND HEALTH
2.1.1 Health Benefits of Physical Activity
Physical activity is defined as “any bodily movement produced by the
contraction of skeletal muscle that increases energy expenditure above a basal level”
[57]. Performing adequate amounts of physical activities can improve physical and
psychological health, and one’s quality of life [58, 59]. It can also improve
cardiorespiratory fitness [60]. Physical activity provides a wide spectrum of health
benefits including reduction in the risk of early death, cardiovascular disease, type 2
diabetes, colon and breast cancer, depression and loss of cognitive function [2, 61, 62].
Further, there is moderate to strong evidence that physical activity helps to maintain
weight loss, preserve physical ability in older adults, and improve sleep quality [59,
62].
2.1.2 Guidelines of Physical Activity
On the basis of the evidence linking physical activity and health, eminent
scientific and public health organisations such as the World Health Organisation
(WHO) have issued guidelines for participation in physical activity. The guidelines are
different for different age-groups (e.g., infants, pre-schoolers, school-aged children,
adults, and older adults). This thesis focuses on the data collected from adults and pre-
schoolers age-groups. The guidelines for adults and pre-schoolers are provided below.
WHO recommends adults aged 18–64 should participate in at least 150 minutes
of moderate-intensity aerobic physical activity throughout the week or do at least 75
minutes of vigorous-intensity aerobic physical activity throughout the week or an
equivalent combination of moderate- and vigorous-intensity activity [15]. To
achieve additional health benefits, it recommends that adults increase their moderate-
intensity aerobic physical activity to 300 minutes per week, or the equivalent 150
minutes of vigorous-intensity aerobic physical activity per week; the two
intensities can be mixed in any ratio. According to The Department of Health Australia
[63], the current recommendation for Australian adults (18-64 years) is to accumulate
150 to 300 minutes (2 ½ to 5 hours) of moderate intensity physical activity or 75 to
150 minutes (1 ¼ to 2 ½ hours) of vigorous intensity physical activity, or an equivalent
combination of both moderate and vigorous activities, each week. Both guidelines
suggest that people perform muscle-strengthening activities on at least 2 days each week.
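The "equivalent combination" rule for adults can be expressed as a simple calculation. Since 75 minutes of vigorous-intensity activity is treated as equivalent to 150 minutes of moderate-intensity activity, the sketch below counts each vigorous minute as two moderate minutes and compares the total with the 150-minute weekly minimum. The function names and the example values are hypothetical, and the sketch ignores the separate muscle-strengthening recommendation.

```python
def moderate_equivalent_minutes(moderate_min, vigorous_min):
    """Weekly activity in moderate-equivalent minutes.

    One vigorous minute counts as two moderate minutes, following the
    150-moderate / 75-vigorous equivalence in the guidelines above.
    """
    return moderate_min + 2 * vigorous_min


def meets_adult_guideline(moderate_min, vigorous_min, target=150):
    """True if the weekly aerobic minimum for adults (18-64 years) is met."""
    return moderate_equivalent_minutes(moderate_min, vigorous_min) >= target


# Hypothetical weekly totals: 90 min moderate plus 30 min vigorous activity.
print(moderate_equivalent_minutes(90, 30))
print(meets_adult_guideline(90, 30))
```

Setting `target=300` checks the higher level recommended for additional health benefits.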
Australia’s 24-hour movement guideline for the early years (0-5 years)
recommends that pre-schoolers aged 3-5 years perform at least 180 minutes of
physical activity per day, spread throughout the day, of which 60 minutes should be
energetic play such as running, jumping, kicking and throwing [63, 64]. For infants
aged 0-1 year, interactive floor-based play in a supervised
and safe environment is recommended. For infants who are not yet mobile, 30 minutes
of tummy time is encouraged, which includes activities of reaching and grasping, pushing
and pulling, and crawling. For toddlers (1 to 2 years), The Department of Health
Australia [63] recommends at least 180 minutes of a variety of physical activities
including energetic play such as running, jumping and twirling spread throughout the
day.
2.1.3 Risks Associated with Physical Activity
There is a risk of injury with any type of physical activity. The risk of physical
injury is higher in contact activities (e.g., soccer, basketball) than in non-contact
activities such as walking, swimming, and gardening [65]. Though it is rare, there are also
well-documented associations between acute episodes of exercise and sudden cardiac
death [12]. It has been estimated that between 6% and 17% of men who experience sudden
cardiac death do so during acute exercise [66, 67].
Despite the risks of physical activity, the benefits of being physically active far
outweigh the risks [68]. The risk can be minimised by avoiding over-exertion, i.e.
performing physical activity at an appropriate frequency, intensity, and duration.
Taking proper safety measures, such as performing physical activity in safe places,
avoiding exercise in extreme cold or heat, and the use of proper equipment can mitigate
the risks associated with physical activity participation.
2.2 PERSONAL IMPACTS FROM PHYSICAL ACTIVITY
Physical activity results in energy expenditure and the effort required to perform
physical activity is perceived by the individual. The work rate or intensity of physical
activity can be measured in either absolute or relative terms. The absolute impact of
physical activity is the external workload or energy cost for a particular physical
activity, typically expressed as multiples of resting metabolism or Metabolic
Equivalents (METs) [33]. The relative impact of physical activity can be the perceived
intensity or relative physical activity intensity. This research focuses on both ‘relative
physical activity intensity’, and ‘energy expenditure’ as the personal impacts from
physical activity. These terms are described below.
2.2.1 Relative Intensity of Physical Activity
Relative intensity is generally expressed as a percentage of an individual’s
maximal aerobic capacity (% VO2 max, % HR reserve) or based on ratings of perceived
exertion (RPE).
For any given physical activity, relative intensity varies between persons due to
between-person differences in aerobic capacity and effort perception [13, 69]. For
example, the absolute intensity of brisk walking is 4 METs. For a young healthy person
whose maximal aerobic capacity is 10 METs, it will be equivalent to a relative intensity
of 40% of maximal capacity. But, for a person with chronic disease whose maximal
aerobic capacity is 6 METs, the relative intensity will be ~67% of the maximal
capacity. Thus, to achieve a certain absolute intensity, individuals with a lower aerobic
capacity are required to work significantly harder (in relative terms) than a person with
higher aerobic capacity and vice versa [70].
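The brisk-walking example above is a single division, sketched below for illustration. The function name is hypothetical; the MET values are the ones quoted in the text.

```python
def relative_intensity_pct(activity_mets: float, max_capacity_mets: float) -> float:
    """Relative intensity as a percentage of maximal aerobic capacity."""
    if max_capacity_mets <= 0:
        raise ValueError("maximal aerobic capacity must be positive")
    return 100.0 * activity_mets / max_capacity_mets


BRISK_WALKING_METS = 4.0  # absolute intensity quoted in the text

# Young healthy person with a maximal capacity of 10 METs.
healthy_pct = relative_intensity_pct(BRISK_WALKING_METS, 10.0)
# Person with chronic disease and a maximal capacity of 6 METs.
chronic_pct = relative_intensity_pct(BRISK_WALKING_METS, 6.0)

print(round(healthy_pct), round(chronic_pct))
```

The same absolute workload therefore maps to very different relative intensities, which is why relative measures need person-specific information.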
Measurement of Relative Intensity
Relative physical activity intensity can be expressed as a percentage of a
person’s maximal oxygen uptake (VO2max) or oxygen uptake reserve (VO2R), which
is measured using an incremental exercise test [62, 71]. VO2max can also be estimated
using a submaximal exercise test, usually done on a treadmill under clinical supervision.
VO2R is calculated by subtracting resting VO2 from VO2max. Although these measures
are considered the gold standard, they are not feasible in real-world settings because
exercise tests require sophisticated instruments, lab-based individual calibration, and
multiple visits by the participants.
Percentage of maximal heart rate (%HRmax) and percentage of heart rate reserve (%HRR)
are also popular measures of relative physical activity intensity because of the linear
relationship between heart rate and work rate during steady-state exercise [71, 72].
Heart rate based methods are valid for moderate-to-vigorous physical activities, but
they are not accurate for activities of low intensity [39]. Moreover, these approaches
require knowledge of HRmax, which can be measured with a maximal exercise test or
estimated from the person’s age. The common formula for estimating HRmax from age is
(220 – Age). The maximal exercise test requires a lab-based set-up, and age-related
prediction equations are subject to considerable measurement error [40, 41].
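As a rough illustration of the heart-rate-based measures, the sketch below applies the age-based estimate (220 – Age) from the text. The %HRR expression, (HR – resting HR) / (HRmax – resting HR), is the standard heart rate reserve definition and is an assumption here, since the text does not spell it out. Function names and the example values are hypothetical.

```python
def estimate_hr_max(age_years: float) -> float:
    """Age-based estimate of maximal heart rate: 220 - age."""
    return 220.0 - age_years


def pct_hr_max(hr_bpm: float, hr_max_bpm: float) -> float:
    """Heart rate as a percentage of maximal heart rate."""
    return 100.0 * hr_bpm / hr_max_bpm


def pct_hr_reserve(hr_bpm: float, hr_rest_bpm: float, hr_max_bpm: float) -> float:
    """Heart rate as a percentage of heart rate reserve (assumed standard definition)."""
    reserve = hr_max_bpm - hr_rest_bpm
    if reserve <= 0:
        raise ValueError("maximal heart rate must exceed resting heart rate")
    return 100.0 * (hr_bpm - hr_rest_bpm) / reserve


# Hypothetical 40-year-old with a resting HR of 60 bpm, exercising at 130 bpm.
hr_max = estimate_hr_max(40)
print(pct_hr_max(130, hr_max), pct_hr_reserve(130, 60, hr_max))
```

Because the age-based estimate carries considerable error, any percentage derived from it inherits that error.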
Self-Rated Scales for Measurement of Relative Intensity
There are also widely used self-rated scales for the measurement of perceived
exertion or relative intensity, which are accessible to everyone and usable in real
contexts. These scales describe how hard an individual perceives an activity to be.
Borg’s rating of perceived exertion (RPE) scale and OMNI perceived exertion scales
are two commonly used validated measures of the physical activity intensity.
Borg’s Rate of Perceived Exertion (RPE) Scale – Borg’s RPE scale reflects
how heavy and strenuous physical activity feels to someone, linking all sensations and
feelings of physical stress, effort, and fatigue [35]. It does not consider factors such as
muscle pain or lack of breath, but focuses on total feelings of exertion. This rating
scale ranges from 6 to 20, where 6 means “no exertion at all” and 20 means “maximal
exertion”. Borg’s RPE scale is shown in Figure 2.1.
Effort perceptions measured using Borg’s RPE scale are strongly and positively
correlated with physiological indicators of exercise intensity such as oxygen
consumption, heart rate and power output. A study on Taiwanese men [38] showed a
linear relationship between heart rate and Borg’s RPE. Scherr, et al. [36] performed
maximal treadmill tests on 2,560 participants and simultaneously collected their heart
rate, blood lactate and RPE. Ratings of perceived effort on the Borg scale were strongly
and positively correlated with heart rate (r=0.74, p<0.001) and blood lactate (r=0.83,
p<0.001).
Figure 2.1 Borg’s RPE scale
OMNI Perceived Exertion Scale – The first version of the OMNI scale was
constructed for children and adolescents [73, 74]. The OMNI scale uses a pictorial
scale which is helpful for users to report their perceived intensity. More recently, an
adult version of the OMNI scale was developed which is shown in Figure 2.2. The
OMNI scale is a 10-point scale where 0 represents “extremely easy” or “not tired at
all” and 10 represents “extremely hard” or “very, very tired” for adults and children,
respectively.
Figure 2.2 OMNI scale of perceived exertion (adult) for cycling
As with Borg’s scale, a number of studies have established the concurrent validity
of the OMNI scale. The most common criterion measures have been oxygen
consumption, heart rate and power output [73]. Utter, et al. [75] validated an OMNI
scale for young adults by correlating the RPE-OMNI with oxygen uptake (VO2),
respiratory rate (RR) and heart rate (HR). Their analysis indicated that RPE-OMNI is
positively correlated with these physiological parameters, with r = 0.67 to 0.88
(P < 0.05).
Although RPE scales are valid and useful for people to report their relative
activity intensity, these scales are difficult to implement in automated scenarios where
people might want to automatically track and record their everyday physical activity.
2.2.2 Energy Expenditure
Total energy expenditure (TEE) is the sum of the basal metabolic rate (the
amount of energy expended while at complete rest), the thermic effect of food (the
energy required to digest and absorb food) and the energy expended in physical
activity (PAEE) [76]. Basal metabolic rate is the largest component of TEE (~60%),
and the thermic effect of food is the smallest component, accounting for about 10% of
TEE. PAEE is the most variable component of TEE and accounts for approximately 30%
of TEE [77].
Measurement of Energy Expenditure
There are several methods available to estimate the human energy expenditure.
These techniques and their advantages and disadvantages are briefly described below.
Direct calorimetry measures energy expenditure over longer periods of time
(e.g. 24 hours). It is the most accurate technique for measuring energy expenditure. In
this approach, an individual is placed in a room calorimeter and energy expenditure
is calculated by directly measuring heat loss from the body.
Because a room calorimeter is required, direct calorimetry is restricted to the
laboratory and is not a practical way to measure energy expenditure [78, 79].
Indirect calorimetry measures energy expenditure by determining the rates
of O2 consumption and CO2 production [80]. In recent years, indirect calorimeters
have become portable, which enables energy expenditure to be measured reliably in
outdoor and real contexts. An indirect calorimeter typically uses a mouthpiece or
breathing mask, a turbine flow meter, and O2 and CO2 analysers. Indirect calorimetry
can also be used for short periods of time (e.g., minutes to a few hours), which makes
it the most feasible way for researchers to measure the energy expenditure of
participants.
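To make the link between gas exchange and energy expenditure concrete, the sketch below uses the abbreviated Weir equation, a widely used conversion from O2 consumption and CO2 production to kilocalories. This is an assumption for illustration: the text does not state which conversion the cited studies or devices apply, and the function name and example values are hypothetical.

```python
def weir_energy_expenditure(vo2_l_per_min: float, vco2_l_per_min: float) -> float:
    """Abbreviated Weir equation (urinary nitrogen term omitted).

    Inputs are O2 consumption and CO2 production in litres per minute;
    the result is energy expenditure in kcal per minute.
    """
    return 3.941 * vo2_l_per_min + 1.106 * vco2_l_per_min


# Hypothetical readings during moderate exercise.
kcal_per_min = weir_energy_expenditure(vo2_l_per_min=1.0, vco2_l_per_min=0.85)
print(round(kcal_per_min, 2))
```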
The doubly labelled water method was developed to measure energy expenditure. This
method is based on the kinetics of two stable isotopes of water, ²H₂O (deuterium-labelled
water) and H₂¹⁸O (oxygen-18-labelled water). These stable isotopes are naturally
occurring compounds without known toxicity at the low doses used. Deuterium-labelled
water is lost from the body through the usual routes of water loss (urine, sweat,
evaporative losses). Oxygen-18-labelled water is lost from the body at a slightly faster
rate because this isotope is also lost via carbon dioxide production in addition to all
routes of water loss. The difference in the rate of loss between the two isotopes is
therefore a function of the rate of carbon dioxide production, which reflects the rate
of energy production over time. The doubly labelled water method is reliable (3-10%
error) and can be used in free-living samples [78]. However, the oxygen-18-labelled
water and the instruments needed to measure traces of the isotopes are quite expensive.
The method also does not provide information on daily variations in energy expenditure.
2.3 SENSOR-BASED METHODS TO MEASURE PHYSICAL ACTIVITY AND ITS IMPACTS
Given the limitations and participant burden associated with the above-
mentioned laboratory-based methods, wearable sensor-based approaches have become
the method of choice for measuring physical activity and energy expenditure [21].
In this section, the wearable sensor-based methods are briefly reviewed. Section
2.3.1 presents an overview of wearable sensors and their data. The subsequent sections
summarise the steps of a learning method, namely data pre-processing, feature
extraction, feature selection, and learning algorithms. Section 2.3.6 presents the effects
of the number and positioning of sensors on the performance of learning algorithms.
2.3.1 Wearable Sensors
Accelerometers measure body movement in terms of acceleration which is
useful for determining the type of physical activity, intensity of physical activity, and
energy expenditure in real contexts [16, 33, 34, 47, 81]. Montoye, et al. [82] were
among the first to recognise the potential of accelerometers to assess the intensity of
physical activity objectively. Early accelerometer-based studies used single-axis
accelerometers [83], whereas recent studies use 3-dimensional acceleration signals
[18, 30, 48].
Physiological signals such as heart rate, respiration rate, and electrodermal
activity are also widely used in physical activity research. These physiological data are
linearly related to the intensity of physical activity, and can be considered important
cues for assessing physical activity [18, 84]. Previous researchers have combined heart
rate with accelerometer data to measure physical activity type and energy
expenditure and reported consistent improvement over approaches that use
accelerometers alone [22, 85]. Smolander, et al. [86] improved energy expenditure
estimation by using respiration together with heart rate rather than heart rate alone.
Electrodermal responses and skin temperature provide information about sweating caused
by physical exertion. Altini, et al. [18] showed improved energy expenditure prediction
when physiological signals (e.g. heart rate (HR), respiration rate, galvanic skin
response, skin humidity) were combined with accelerometer data. However,
physiological responses, such as heart rate and galvanic skin response, are often
affected by a person’s emotional stress, which leads to poor performance in low-
intensity physical activities [18]. In addition, because they are affected by gender, body
size, and fitness level, most physiological signals also require person-level
normalisation or calibration before being used in machine learning systems [20].
2.3.2 Data Pre-processing
In the data pre-processing step, the raw data are synchronised, then cleaned and
prepared for modelling [87]. The raw sensor data may contain out-of-range values
and/or missing values. Improper screening of these data prior to the analysis phase can
produce misleading results, so ensuring data quality before running an analysis
is important. The collected sensor data are verified against the sensor’s standard
distribution or compared with gold-standard equipment, and checked to ensure that the
sensor was properly calibrated as per the instructions. Any outliers, i.e. invalid or
transient data, are usually discarded from the analysis [17].
To deal with missing values, the affected data can either be discarded or the
missing values can be replaced using linear interpolation, a unique value, the
nearest value, etc. [88]. Physiological signals such as heart rate, electrodermal
activity, body temperature, etc. are usually pre-processed to reduce drift and noise.
The raw data can be smoothed using a moving average filter, or a low-pass
or band-pass filter, in sliding windows.
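Two of the pre-processing options mentioned above, linear interpolation of missing values and moving-average smoothing, can be sketched as follows. This is a minimal illustration, not the procedure of any cited study. Missing samples are represented as `None`, edge gaps fall back to the nearest valid value, and the function names and window width are assumptions.

```python
from typing import List, Optional


def interpolate_missing(samples: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation; edge gaps take the nearest valid value."""
    known = [i for i, value in enumerate(samples) if value is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    filled = []
    for i, value in enumerate(samples):
        if value is not None:
            filled.append(float(value))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(samples[right]))
        elif right is None:
            filled.append(float(samples[left]))
        else:
            fraction = (i - left) / (right - left)
            filled.append(samples[left] + fraction * (samples[right] - samples[left]))
    return filled


def moving_average(samples: List[float], width: int = 3) -> List[float]:
    """Centred moving average; the averaging window shrinks at the signal edges."""
    half = width // 2
    smoothed = []
    for i in range(len(samples)):
        start, stop = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[start:stop]) / (stop - start))
    return smoothed


heart_rate = [70, None, None, 76, None]  # hypothetical samples with gaps
filled = interpolate_missing(heart_rate)
print(filled)
print(moving_average([1, 2, 3, 4, 5], width=3))
```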
To minimise the inter-individual differences in the physiological data, individual
calibration is usually performed which involves several lab visits and tests. As such
calibration is not practical in real contexts, Altini, et al. [20] proposed automatic
normalisation of physiological data. They determined the baseline ($X_{phy_{base}}$) of the
physiological signal while the subject was lying; and highest value ($X_{phy_{high}}$) of the
physiological signal was predicted using multiple linear regression of the
physiological signal from daily living activity such as walking. The signal was then
normalised ($X_{phy_{n}}$) using the following equation.
\[ X_{phy_{n}} = \frac{X_{phy} - X_{phy_{base}}}{X_{phy_{high}} - X_{phy_{base}}} \]
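The normalisation described above rescales a physiological reading so that the person's baseline maps to 0 and their predicted highest value maps to 1. A minimal sketch follows; the function and argument names are hypothetical, and the baseline and highest values are assumed to have been obtained as described in the text.

```python
def normalise_physiological(x_phy: float, x_base: float, x_high: float) -> float:
    """Rescale a physiological sample relative to a person's baseline and highest value."""
    span = x_high - x_base
    if span <= 0:
        raise ValueError("highest value must exceed the baseline")
    return (x_phy - x_base) / span


# Hypothetical heart rate: baseline 60 bpm while lying, predicted highest 160 bpm.
normalised = normalise_physiological(110.0, x_base=60.0, x_high=160.0)
print(normalised)
```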
2.3.3 Feature Extraction
In previous physical activity studies, a wide range of features were extracted
from the raw sensor data which were used as input to the classification algorithms. At
first, the raw signal is usually segmented into sequences of consecutive windows, then
a set of features is extracted from each window [23]. The size of the window is an
important consideration and has varied from study to study. Reported window sizes
include 1-2 seconds [89], 4 seconds [21], 6.7 seconds [90], 10 seconds [30], and
60 seconds [91]. Most studies in physical activity classification used non-overlapping
windows to extract features [30, 92]. Some previous studies have obtained good results
using sliding windows with 50% overlap [90]. Overlapping windows can provide
more data points/windows to train a model when the dataset is small.
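The segmentation step can be sketched as below, with the overlap given as a fraction of the window length. The function name and the choice to drop a trailing partial window are assumptions made for illustration.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], window_size: int,
                    overlap: float = 0.0) -> List[Sequence[float]]:
    """Split a signal into fixed-length windows; a trailing partial window is dropped."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(window_size * (1.0 - overlap))))
    last_start = len(signal) - window_size
    return [signal[start:start + window_size] for start in range(0, last_start + 1, step)]


samples = list(range(10))  # stand-in for raw sensor samples
non_overlapping = sliding_windows(samples, window_size=4)
half_overlapping = sliding_windows(samples, window_size=4, overlap=0.5)
print(len(non_overlapping), len(half_overlapping))
```

With the same ten samples, 50% overlap yields four windows instead of two, which illustrates why overlap helps when the dataset is small.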
The features extracted from accelerometer data can be divided into three types:
time domain, frequency domain, and time-frequency domain. Most works utilised
time-domain features or a combination of time-domain and frequency domain features.
Some studies used physiological features along with the accelerometer features, which
are mainly statistical measures extracted from heart-rate, RR interval, skin response
etc. An overview of the extracted features from both accelerometer and physiological
sensor data is given in Table 2.1.
The time domain features are the simplest features extracted from raw
accelerometer data. These features are usually statistical measures such as mean,
median, variance, skewness, kurtosis, etc., extracted from a window of raw data, or
low pass or high pass filtered data [23, 91, 93]. Correlations between accelerometer
axis data are also used, and have been shown to improve recognition [90, 94].
Frequency domain features are extracted from the Fast-Fourier-Transformed (FFT)
window. The FFT of a time domain signal provides the amplitude of the frequency
components and the distribution of the signal energy. Some widely used frequency
domain features include entropy, signal energy, principal frequency, and magnitude of
principal frequency [51]. A few researchers have investigated both time and frequency
characteristics of the sensor data using wavelet analysis [23]. Wavelet analysis such as
the discrete wavelet transform (DWT) decomposes sensor data into a number of
coefficients based on frequency bands while temporal information is preserved. In
general, statistical measures such as the standard deviation or root mean square of
specific wavelet coefficients are used for physical activity recognition [95, 96].
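The sketch below computes a handful of the time-domain and frequency-domain features named above for a single window. It is illustrative only: a naive discrete Fourier transform stands in for an FFT library so that the example needs only the standard library, the feature definitions (population variance, energy as summed spectral power divided by window length, entropy of the normalised power spectrum) are common conventions assumed here, and all names are hypothetical.

```python
import cmath
import math
import statistics
from typing import Dict, List


def time_domain_features(window: List[float]) -> Dict[str, float]:
    """Simple statistical descriptors of one window of samples."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n  # population variance
    sd = math.sqrt(variance)
    # Skewness is undefined for a constant window; report 0.0 in that case.
    skewness = sum((x - mean) ** 3 for x in window) / (n * sd ** 3) if sd > 0 else 0.0
    return {"mean": mean, "median": statistics.median(window),
            "variance": variance, "skewness": skewness}


def frequency_domain_features(window: List[float], sampling_rate_hz: float) -> Dict[str, float]:
    """Spectral descriptors from a naive DFT (needs at least 2 samples)."""
    n = len(window)
    mean = sum(window) / n
    centred = [x - mean for x in window]  # remove the DC component
    power = []
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centred))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    peak = max(range(len(power)), key=lambda idx: power[idx])
    probs = [p / total for p in power] if total > 0 else []
    entropy = -sum(q * math.log2(q) for q in probs if q > 0)
    return {"principal_frequency_hz": (peak + 1) * sampling_rate_hz / n,
            "principal_magnitude": math.sqrt(power[peak]),
            "energy": total / n,
            "entropy": entropy}


# Hypothetical 1-second window: a 2 Hz sine sampled at 32 Hz.
FS = 32
window = [math.sin(2 * math.pi * 2 * i / FS) for i in range(FS)]
td = time_domain_features(window)
fd = frequency_domain_features(window, FS)
print(td["variance"], fd["principal_frequency_hz"])
```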
Table 2.1 Overview of the extracted features (ordered by author name)
| Reference | Sensor data and location | Accelerometer features: time-domain | Accelerometer features: frequency-domain | Accelerometer features: time-frequency | Physiological/other features |
|---|---|---|---|---|---|
| Albinali, et al. [97] | Three miniature wireless accelerometers on dominant hip, thigh and upper arm | Distances between the means of the axes, variance, correlation coefficients | Entropy, FFT peaks and frequencies | - | - |
| Altini, et al. [18] | Accelerometer and ECG data using an ECG necklace at chest. Accelerometer, GSR and skin humidity using the wristband sensor. | Mean of the absolute signal, magnitude, mean distance between axes, variance, standard deviation, interquartile range, median | Spectral energy, entropy, low frequency band signal power (0.1–0.75 Hz), high frequency band signal power (0.75–10 Hz), frequency and amplitude of the FFT coefficients | - | RR features: mean, variance and standard deviation. GSR features: mean, signal power, response rate and mean Ohmic Perturbation Duration. Skin humidity feature: mean. |
| Berchtold, et al. [98] | Mobile phone 3-axis accelerometer | Mean, variance for all 3 axes | - | - | - |
| Cleland, et al. [28] | Accelerometers on lower back, wrist, foot, chest, hip, thigh | Mean value for each axis, average mean over 3 axes, standard deviation value for each axis, average standard deviation over 3 axes, skewness value for each axis, average skewness over 3 axes, kurtosis value for each axis, average kurtosis over 3 axes, correlations of axes | Energy value for each axis (x, y, and z), average energy over 3 axes | - | - |
| Ellis, et al. [24] | Accelerometers on hip and wrist | Vector magnitude (VM), mean, standard deviation, coefficient of variation, minimum, maximum, 25th, 50th and 75th percentile, 1-s lag autocorrelation, the correlation between each axis | Dominant frequency, entropy | - | - |
| Ellis, et al. [99] | Accelerometers on wrist and hip, and heart-rate | Average, standard deviation, coefficient of variation, minimum and maximum, 25th and 75th percentiles, lag 1-second autocorrelation, third and fourth moments, skewness and kurtosis, correlation between axes, average and standard deviation of roll, pitch and yaw, principal direction of motion | Dominant frequency and power at dominant frequency, total energy and entropy, FFT coefficients | - | - |
| Freedson, et al. [100] | Single accelerometer on hip | 10th, 25th, 50th, 75th, and 90th percentiles of the second-by-second accelerometer counts, and the temporal dynamics (lag-1 autocorrelation) | - | - | - |
| Fullerton, et al. [101] | Nine accelerometers on left and right ankle, left and right hip, left and right wrist, left and right upper arm and spine | Mean, standard deviation, root mean square, peak count, peak amplitude | Spectral energy, spectral power, and signal magnitude area | - | - |
| Gyllensten and Bonomi [102] | Accelerometer on waist | Mean, standard deviation, kurtosis, skewness, range, cross-axis correlation, accelerometer angle | Spectral energy in sub-bands (0-10 Hz in bands of 1.25 Hz), spectral entropy, peak frequencies and cross-spectral densities in sub-bands | - | - |
| Kujala, et al. [39] | Heart rate monitor | - | - | - | Heart rate and its variability, respiration rate, and on/off response information |
| Lin, et al. [91] | Accelerometer on wrist, waist and ankle, and chest strapped heart rate monitor | Count, mean signal magnitude area (SMA), standard deviation of SMA and median of SMA | - | - | Heart rate features: mean, standard deviation, variance, interquartile range, skew, kurtosis, mean of HR difference series, standard deviation of HR difference series, and variance of HR difference series |
| Mackintosh, et al. [103] | Nine accelerometers on chest, left and right wrists, hips, knees, and ankles | Mean and variance of the three accelerometer axes | - | - | - |
| Montoye, et al. [48] | Four accelerometers on hip, right thigh, and both wrists | Mean, standard deviation, minimum, maximum, covariance of adjacent windows of data, and the 10th, 25th, 50th, 75th, and 90th percentiles of the raw acceleration signal on each axis | - | - | - |
| Montoye, et al. [45] | Wireless network of three accelerometers on the right wrist, thigh, and ankle, and a hip-mounted accelerometer | Mean, standard deviation | - | - | Demographic features: weight and height |
| Nyan, et al. [95] | Accelerometer on shoulder | - | - | Addition of the sum of the squared detail coefficients at level 4 and the sum of the squared detail coefficients at level 5, for each accelerometer component | - |
| Pavey, et al. [104] | Accelerometer on non-dominant wrist | Mean, SD, 10th, 25th, 50th, 75th, 90th percentile, MAD, lag one autocorrelation | Signal power, dominant frequency 0.25–3.0 Hz, dominant frequency magnitude 0.25–3.0 Hz, entropy 0.25–3.0 Hz | - | - |
| Tamura, et al. [105] | Accelerometer on waist | - | - | Sum of the squared detail coefficients at level 4 and level 5 for each accelerometer component | - |
| Tapia, et al. [106] | Five tri-axial wireless wearable accelerometers and a chest strapped heart-rate monitor | Area under curve (AUC), variance, mean, mean distances between axes, correlation coefficients | Entropy, FFT peaks and energy | - | Heart rate feature: number of heart beats above the resting HR value (BPM-RHR) |
| Trost, et al. [30] | Accelerometer on wrist and hip | Mean, standard deviation, coefficient of variation, percentiles (10th, 25th, 50th, 75th, 90th), lag one autocorrelation, skewness, kurtosis, log energy, peak intensity, zero crossings, and cross-axis correlation | Signal power | - | - |
| Trost, et al. [47] | Accelerometers on right hip | 10th, 25th, 50th, 75th, and 90th percentiles and the lag one autocorrelation | - | - | - |
2.3.4 Feature Selection
Feature selection techniques attempt to find a subset of n features out of m
features (m > n) so that the prediction performance is enhanced.
Filter-based feature selection methods are fast and simple, and evaluate the features
independently. In general, they calculate a goodness measure (e.g. correlation,
t-score, etc.) for each of the features, and rank them. They then select the best n
features, or the features whose goodness measure exceeds a threshold. Popular
filter-based feature selection algorithms used in physical activity research include
correlation-based feature selection, T-score based feature selection, ReliefF, and
Minimum Redundancy Maximum Relevance (MRMR) [21, 107-109].
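A filter-based ranking can be sketched as follows, using the absolute Pearson correlation between each feature and the target as the goodness measure. This is a simple stand-in, not an implementation of ReliefF or MRMR. The feature names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_features(features: Dict[str, List[float]], target: List[float],
                        n_select: int) -> List[str]:
    """Score each feature independently, rank by score, keep the best n_select."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_select]


# Hypothetical per-window features and a continuous target (e.g. METs).
features = {
    "mean_acc": [1, 2, 3, 4, 5],
    "noise": [2, 1, 2, 1, 2],
    "inverse": [5, 4, 3, 2, 2],
}
target = [1.0, 2.1, 2.9, 4.2, 5.0]
selected = select_top_features(features, target, n_select=2)
print(selected)
```

Because each feature is scored on its own, two strongly correlated features can both be selected, which is the redundancy issue that wrapper methods try to avoid.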
Unlike the filter-based methods, wrapper-based feature selection methods
evaluate subsets of features to detect the interaction between the features. Lin, et al.
[91] employed two wrapper-based feature selection methods in their study: sequential
forward selection (SFS) and sequential backward selection (SBS). SFS starts with
an empty set and adds features one at a time, sending each candidate subset to the
criterion function to examine its effectiveness. At each step, it adds a feature if its
inclusion in the already selected feature subset results in improved performance. SBS
has the opposite search direction to that of SFS. SBS starts with the complete set of
features, and then iteratively deletes features based on the combined performance.
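The SFS loop can be sketched as below. In practice the criterion function would be something like the cross-validated accuracy of a classifier trained on the candidate subset; here a toy criterion with made-up gains and a redundancy penalty stands in for it, so every number and name in the example is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves score_fn; stop when none does."""
    selected: List[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        scored = [(score_fn(selected + [name]), name) for name in remaining]
        trial_score, trial_name = max(scored, key=lambda pair: pair[0])
        if trial_score <= best_score:
            break  # no remaining feature improves the criterion
        selected.append(trial_name)
        remaining.remove(trial_name)
        best_score = trial_score
    return selected, best_score


# Toy criterion (hypothetical): additive gains, with a penalty when the
# redundant pair "mean" and "median" are selected together.
HYPOTHETICAL_GAIN: Dict[str, float] = {
    "mean": 0.60, "median": 0.55, "variance": 0.20, "entropy": 0.15,
}


def toy_criterion(subset: List[str]) -> float:
    score = sum(HYPOTHETICAL_GAIN[name] for name in subset)
    if "mean" in subset and "median" in subset:
        score -= 0.58
    return score


chosen, chosen_score = sequential_forward_selection(list(HYPOTHETICAL_GAIN), toy_criterion)
print(chosen, round(chosen_score, 2))
```

SBS would follow the same pattern in reverse: start from the full list and repeatedly remove the feature whose removal gives the best criterion value.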
Some filter-based feature selection methods select redundant features which
cannot provide additional information to the learning algorithm. Wrapper-based
methods, on the other hand, are computationally complex and take a significant amount
of time when the number of features is large. Few studies have compared feature
selection methods in the physical activity domain. For example, Tulum, et al. [108]
compared the performance of two filter-based feature selection methods (ReliefF and
T-score) on physical activity data. They reported the best test accuracy (97.6%) using
features selected by ReliefF, which was around 2% higher than with T-score. Conversely,
Zhang and Sawchuk [110] found that a wrapper feature selection method (SFS) provided
higher physical activity classification accuracy than a filter-based
method (ReliefF).
2.3.5 Learning Algorithms
The derived features are used as inputs to a learning algorithm. In previous
physical activity research, the algorithms employed range from simple threshold-based
methods to complex machine learning methods such as artificial neural networks,
support vector machines, hidden Markov models, etc. [53, 56, 111, 112].
Threshold Based Methods
Threshold-based methods use predefined thresholds or cut-points to classify
sensor output into classes of physical activity intensity. The most common approach
thus far has been to develop a simple regression equation that defines the relationship
between sensor output and energy expenditure. Once the regression equation has been
developed, thresholds or “cut-points” denoting the dividing lines between sedentary
and light (1.5 METs), light and moderate (3 METs), and moderate and vigorous
physical activity (6 METs) are identified. Receiver operating characteristic (ROC)
curve analysis is another widely used technique, which determines cut-point thresholds
by assessing levels of sensitivity (true positive rate) and specificity (true negative
rate) for intensity categories. These cut-points are then used to estimate the amount
of time spent in sedentary, light, moderate, and vigorous activity [113, 114].
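The cut-point approach can be sketched in two steps: a linear regression converts sensor output to METs, and the MET thresholds from the text (1.5, 3 and 6 METs) assign an intensity category. The regression coefficients below are hypothetical placeholders rather than values from any published equation, and treating each threshold as the lower bound of the higher category is an assumption.

```python
from collections import Counter
from typing import Iterable

# (upper bound in METs, category); values at or above 6 METs are vigorous.
CUT_POINTS_METS = [(1.5, "sedentary"), (3.0, "light"), (6.0, "moderate")]


def intensity_category(mets: float) -> str:
    """Map an energy cost in METs to an intensity category."""
    for upper_bound, label in CUT_POINTS_METS:
        if mets < upper_bound:
            return label
    return "vigorous"


def counts_to_mets(counts_per_min: float, intercept: float = 1.0,
                   slope: float = 0.001) -> float:
    """Hypothetical linear regression from accelerometer counts to METs."""
    return intercept + slope * counts_per_min


def minutes_per_category(counts_series: Iterable[float]) -> Counter:
    """Tally one-minute epochs by intensity category."""
    return Counter(intensity_category(counts_to_mets(c)) for c in counts_series)


epochs = [0, 200, 1500, 3000, 3500, 6000]  # hypothetical counts per minute
summary = minutes_per_category(epochs)
print(dict(summary))
```

A single regression line is applied to every epoch regardless of the activity performed, which is the limitation discussed in the next paragraph.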
Although the application of cut-points continues to be standard research practice
in the movement sciences, there is growing recognition that the relationship between
accelerometer output and energy expenditure is highly activity dependent, and that a
single regression equation cannot accurately determine energy expenditure across a
wide range of activities. Validation studies involving independent samples indicate
that regression-based cut-point approaches misclassify the true intensity of physical
activity 35% to 45% of the time [47].
Machine Learning Methods
Machine learning based classification or regression is a viable and more accurate
alternative to threshold-based methods [24, 45]. To date, a range of learning
algorithms, including supervised algorithms, unsupervised algorithms, and combinations
of the two, have been employed in the physical activity domain [23]. Table 2.2
summarises the studies in the exercise and movement sciences that implemented machine
learning methods for classification and regression.
Among the supervised algorithms, artificial neural networks [47], SVM [28, 50],
random forest [104], k-nearest neighbour [115], and decision tree [90] are widely used,
and have been reported to achieve high physical activity recognition performance using
a single accelerometer. For example, Trost, et al. [47] developed an artificial neural
network model for predicting physical activity type and energy expenditure. They
reported high (88.4%) classification accuracy for physical activity type prediction.
Ellis, et al. [99] developed a random forest classifier for physical activity recognition
and achieved 92.7% and 87.5% average overall accuracy for the hip and wrist
accelerometer, respectively. In a later study [24], they developed a 2-step activity
recognition model by combining a hidden Markov model with a random forest. They
reported a balanced accuracy of 88.1% and 83.6% for the hip and wrist, respectively.
Some studies have compared the performance of different classification
algorithms for physical activity recognition. For example, Reiss and Stricker [116]
found that decision tree classifiers worked best among several base- and meta-level
classification techniques. Bao and Intille [90] applied decision tree, k-nearest
neighbour and naïve Bayes classification to identify 20 physical activities. They
reported high accuracy for both decision tree (84%) and k-nearest neighbour (83%).
Maurer, et al. [117] reported similar performances for decision tree, naïve Bayes, and
KNN on wrist accelerometer data for 6 activities: sitting, standing, walking,
ascending stairs, descending stairs, and running. Gyllensten and Bonomi
[102] reported higher accuracy for an SVM model (lab – 95.1%, daily living –
75.6%) compared to neural network (lab – 91.4%, daily living – 74.8%) and decision
tree (lab – 92.2%, daily living – 72.2%) models using waist accelerometer data in both
laboratory and daily living settings. Ermes, et al. [118] used hip and wrist
accelerometers and reported a classification rate at least 4 percentage points higher
for a neural network (87%) than for hierarchical (83%) and decision tree (60%) classifiers.
To date, researchers have developed several regression equations that relate
accelerometer counts to physical activity related energy expenditure and absolute
intensity [33]. Most studies were conducted
in laboratory settings, where correlation values between activity count and energy
expenditure ranged from 0.58 to 0.92 during various activities [119]. Apart from linear
regression, a few neural network based regression methods [44, 47, 100], such as the
Radial Basis Function Network (RBFN) and Generalised Regression Neural Network
(GRNN), have also been used and performed better than linear
regression [91]. Trost, et al. [47] reported 30-40% lower RMSE for energy expenditure
prediction using an artificial neural network model compared to conventional
regression-based models. In a recent study, Zhu, et al. [42] successfully adopted deep
learning for energy expenditure prediction in adults. They reported 30-35% lower
RMSE using deep learning compared to an existing activity-specific linear regression
model.
Preece, et al. [23] completed a comprehensive review of learning algorithms
used for classification or regression problems in the physical activity domain. They
were unable to declare one particular machine learning technique as universally better
than others. In a recent study, Kate, et al. [56] compared the performance of eight
learning algorithms for both physical activity recognition and energy expenditure
prediction. They too were unable to find a machine learning technique that
worked best in all testing situations. Interestingly, while no single algorithm works
best in isolation, Catal, et al. [112] showed that the combination or fusion of multiple
classification algorithms using majority vote can yield better performance than a single
classifier. An ensemble of multiple classification models is advantageous as it
usually reduces the chances of overfitting and improves the generalisation of the
classification task. However, to date, relatively few studies in the physical activity
domain have evaluated the utility of decision fusion or ensemble methods
incorporating different classification/regression algorithms.
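Decision fusion by majority vote can be sketched as below. The predictions are hypothetical outputs from three classifiers over the same three windows, and breaking ties in favour of the earliest classifier in the list is an assumption, since the text does not describe a tie-breaking rule.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the earliest voter in the list."""
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label
    raise ValueError("no votes supplied")


def fuse_predictions(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """Fuse per-classifier label sequences window by window."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier)]


# Hypothetical predictions from three classifiers for three windows.
decision_tree = ["walk", "run", "sit"]
logistic_regression = ["walk", "walk", "sit"]
multilayer_perceptron = ["run", "run", "stand"]

fused = fuse_predictions([decision_tree, logistic_regression, multilayer_perceptron])
print(fused)
```

Each classifier makes one error in this toy case, but the errors fall on different windows, so the vote recovers a consistent label for all three.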
Table 2.2 Overview of some wearable sensor-based works that used machine learning algorithms (ordered by author name)
Each entry gives the reference (number of subjects), followed by the application(s), sensor data and location, learning algorithm(s), and findings.

Albinali, et al. [97] (24 subjects)
  Application(s): Physical activity recognition and energy expenditure estimation
  Sensor data and location: Three miniature wireless accelerometers on dominant hip, thigh and upper arm
  Learning algorithm(s): Activity detection: C4.5 classifier; energy expenditure estimation: several regression models
  Findings: Combining physical activity recognition with individually-calibrated regression models can improve energy expenditure estimation (15%) compared to the best estimate from the other methods.

Altini, et al. [18] (16 subjects)
  Application(s): Physical activity recognition and energy expenditure estimation
  Sensor data and location: Accelerometer and ECG data from an ECG necklace at the chest; accelerometer, GSR and skin humidity from a wristband sensor
  Learning algorithm(s): Activity detection: support vector machine; energy expenditure estimation: activity-specific multiple linear regression models
  Findings: The combination of accelerometer and physiological signals improves performance for activity recognition and energy expenditure.

Berchtold, et al. [98] (20 subjects)
  Application(s): Physical activity recognition
  Sensor data and location: Mobile phone 3-axis accelerometer
  Learning algorithm(s): Fuzzy classification
  Findings: Physical activity recognition accuracy improved up to 97%.

Catal, et al. [112] (36 users)
  Application(s): Physical activity recognition
  Sensor data and location: Accelerometer on wrist
  Learning algorithm(s): J48, logistic regression, multi-layer perceptron, and voting
  Findings: Ensemble of all classifiers using voting improved physical activity recognition.

Cleland, et al. [28]
  Application(s): Physical activity recognition
  Sensor data and location: Accelerometer on lower back, wrist, foot, chest, hip, thigh
  Learning algorithm(s): J48, naïve Bayes, neural network, SVM
  Findings: SVM provided highest accuracy. Among single locations, hip gave best performance. Compared to a single accelerometer, combining data from any two locations resulted in a significant improvement in performance. However, combining data from three or more accelerometers provided no further improvements in performance.

Ellis, et al. [24] (40 overweight women)
  Application(s): Physical activity recognition
  Sensor data and location: Accelerometer on hip and wrist
  Learning algorithm(s): Random forest and hidden Markov model
  Findings: In free living, their model outperformed traditional cut-points. Hip accelerometer (88%) provided higher accuracy than wrist (83%).

Ellis, et al. [99] (42 adults)
  Application(s): Physical activity recognition and energy expenditure estimation
  Sensor data and location: Accelerometers on wrist and hip, and heart rate
  Learning algorithm(s): Random forest
  Findings: Heart rate improved energy expenditure prediction but did not significantly improve physical activity recognition. Hip sensor gave better accuracy than wrist.

Freedson, et al. [100] (277 subjects)
  Application(s): Activity type and energy expenditure (MET) prediction
  Sensor data and location: Single hip-mounted accelerometer
  Learning algorithm(s): Artificial neural network (ANN)
  Findings: ANN improved energy expenditure prediction compared to other regression models. Household and locomotion activities had high (98%, 89% respectively) and sports had low (23.7%) classification rate.

Fullerton, et al. [101] (10 subjects)
  Application(s): Physical activity recognition
  Sensor data and location: Nine accelerometers on left and right ankle, left and right hip, left and right wrist, left and right upper arm, and spine
  Learning algorithm(s): Decision tree, SVM, kNN, bagged trees
  Findings: A fine kNN classification method that used mean and standard deviation features provided best prediction accuracy in free living.

Gyllensten and Bonomi [102] (20 subjects)
  Application(s): Activity type recognition
  Sensor data and location: Waist-worn accelerometer
  Learning algorithm(s): Decision trees, feed-forward neural networks (NN), support vector machines (SVM), and decision tree
  Findings: In daily life settings, laboratory-trained model provided around 20% lower performance. Among the classifiers, SVM provided higher classification than other algorithms.

Kate, et al. [56] (146 subjects)
  Application(s): Physical activity recognition and energy expenditure estimation
  Sensor data and location: Hip-mounted accelerometer
  Learning algorithm(s): SVM, ANN, logistic regression, decision trees, bagged decision trees, random forest, naïve Bayes, kNN, and linear regression
  Findings: Physical activity recognition improved with the increase of features. All learning algorithms gave competitive performance. Activity-specific energy expenditure model showed better results than single model.

Lin, et al. [91] (26 subjects)
  Application(s): Energy expenditure estimation
  Sensor data and location: Wrist-, waist- and ankle-worn accelerometers and chest-strapped heart rate monitor
  Learning algorithm(s): Activity classification: decision tree; energy expenditure estimation: Radial Basis Function Network, Generalised Regression Neural Network
  Findings: Generalised Regression Neural Network provided higher R² for energy expenditure prediction than Radial Basis Function Network.

Luštrek, et al. [78]
  Application(s): Energy expenditure estimation
  Sensor data and location: Accelerometers on chest and wrist
  Learning algorithm(s): Energy expenditure estimation: linear regression, multilayer perceptron artificial neural network, support vector regression (SVR), M5P model trees, M5Rules and REPTree regression trees
  Findings: The composite classifier, consisting of two activity-specific classifiers and a general classifier, provided substantial improvement over the single-classifier approach.

Mackintosh, et al. [103] (27 children)
  Application(s): Energy expenditure estimation
  Sensor data and location: Nine accelerometers on chest, left and right wrists, hips, knees, and ankles
  Learning algorithm(s): Artificial neural networks
  Findings: All single, 2, 3, and 4 accelerometer combinations had similar performance, which was better than the combination of 9 accelerometers.

Montoye, et al. [48] (44 adults)
  Application(s): Energy expenditure estimation
  Sensor data and location: Four accelerometers on hip, right thigh, and both wrists
  Learning algorithm(s): Linear regression, linear mixed, and ANN models
  Findings: ANN showed significant improvement over linear models for wrist accelerometer. However, for hip and thigh locations, linear models provided similar performance to ANN.

Montoye, et al. [120] (44 adults)
  Application(s): Activity type recognition
  Sensor data and location: Accelerometers on hip, wrists, and thigh
  Learning algorithm(s): Artificial neural networks
  Findings: Wrist accelerometer provided highest and hip provided lowest accuracy.

Montoye, et al. [45] (34 adults)
  Application(s): Energy expenditure estimation
  Sensor data and location: Wireless network of three accelerometers worn on the right wrist, thigh, and ankle, and a hip-mounted accelerometer
  Learning algorithm(s): Artificial neural networks
  Findings: Wireless network only marginally improved energy expenditure estimation compared with a hip accelerometer.

Pavey, et al. [104] (21 subjects)
  Application(s): Physical activity recognition
  Sensor data and location: Accelerometer on non-dominant wrist
  Learning algorithm(s): Random forest
  Findings: In the lab, high physical activity classification accuracy was obtained; in free living, the method was less accurate for identifying stepping and non-stepping activities.

Tapia et al. [106] (21 subjects)
  Application(s): Physical activity and absolute intensity
  Sensor data and location: Five tri-axial wireless wearable accelerometers and a chest-strapped heart-rate monitor
  Learning algorithm(s): C4.5 decision tree
  Findings: Subject-dependent training provided 94.6% and subject-independent training provided only 56.3% accuracy. Heart rate with accelerometer did not improve much.

Trost, et al. [30] (52 children)
  Application(s): Physical activity recognition
  Sensor data and location: Accelerometer on wrist and hip
  Learning algorithm(s): Regularised logistic regression
  Findings: Both hip and wrist provided comparable performance.

Trost, et al. [47] (100 youth participants)
  Application(s): Physical activity recognition and energy expenditure estimation
  Sensor data and location: Accelerometers on the right hip
  Learning algorithm(s): Artificial neural networks
  Findings: ANN improved performance over conventional regression-based approaches.
2.3.6 Effect of Sensor Number, Positioning, and Combination on the Performance
Accelerometer sensors are commonly placed on the participant’s chest, hip,
thigh, or wrist. The hip is closest to the centre of mass of a human body; as a result, an
accelerometer placed on the hip location can capture most of human motion. A range
of basic daily activities, including walking, postures and activity transitions can be
classified according to the accelerations measured from a hip-worn accelerometer
[121]. In some studies, the hip location provided better performance than other
locations (e.g., wrist) for physical activity type and energy expenditure prediction [24].
On the other hand, wrist placement is convenient for everyday use
and has become a popular placement for commercial activity trackers. In a recent
study, Montoye, et al. [120] compared the hip, thigh and wrist locations for physical
activity recognition. They reported the wrist to be the best location and the hip the
worst, whereas Trost, et al. [30] found comparable performance for the hip and wrist locations.
A number of studies have shown that the effective combination/fusion of outputs
from multiple accelerometers placed at different body locations can improve both
physical activity and energy expenditure prediction accuracy compared to the use of a
single accelerometer [28]. Altini, et al. [92] compared the sensor numbers and
locations for both activity recognition and energy expenditure estimation. There were
no statistical differences in energy expenditure estimation error between the single
accelerometer and a combination of five accelerometers. Bao and Intille [90]
developed a model for accelerometers placed on the upper-arm, lower-arm, hip, thigh,
and ankle. When the outputs from all sensors were fused using feature fusion, the
physical activity classification accuracy was 84%, which was 3% higher than the
accuracy achieved by the combination of thigh and wrist accelerometers. Cleland, et
al. [28] conducted a comprehensive study on accelerometer placements where they
combined accelerometer data from six body locations (lower back, wrist, foot, chest,
hip, and thigh) using simple feature fusion. Among the single locations, they found the
hip to be the best location for physical activity recognition. When they combined the
outputs from any two accelerometer locations, the performance improved significantly
compared to the single accelerometer. However, the combination of three or more
accelerometers did not improve performance.
The results of previous studies indicate that sensor location and number are
important methodological considerations in physical activity research. Although a
number of studies have been conducted on this topic, the findings are inconsistent. A
single accelerometer location does not perform equally for all activities, and the overall
performance depends on the target activity [26]. When combining sensor data from
multiple locations, either feature-level or decision-level fusion can be applied [31].
However, most studies to date have adopted a simple feature-level fusion approach.
Advanced decision-level fusion algorithms have yet to be explored in the physical
activity domain.
2.4 SUMMARY OF CURRENT GAPS
The key gaps identified in this chapter are as follows:
1. Although most wearable sensor studies have used a diverse range of learning
algorithms, their results are not consistent. An ensemble of several learning
algorithms may effectively leverage the advantages of each learning algorithm
and provide better overall classification accuracy. However, studies
implementing and comparing the performance of an advanced ensemble of
learning algorithms are lacking.
2. Combining data from multiple accelerometers placed at different body
locations can improve physical activity recognition. To date, most studies
are based on feature fusion. Advanced decision fusion methods may combine
sensors more effectively and warrant investigation.
3. Relative physical activity intensity may manifest through a person’s
physiological responses such as heart-rate, RR interval, and electrodermal
activity. However, methods to combine multimodal sensor data and predict
relative intensity are lacking.
4. Advanced learning algorithms such as deep learning can automatically extract
hidden patterns from the data, which may provide improved energy expenditure
prediction performance. This needs to be investigated and compared with current
state-of-the-art methods.
In the following chapters, novel methods to address these problems are
presented. Each chapter includes a more comprehensive background related to the
specific problem.
PART I - Classification of Physical Activities
Chapter 3: Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry
Alok Kumar Chowdhury¹
Dian Tjondronegoro²
Vinod Chandran¹
Stewart G. Trost³
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.
2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.
3. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Australia.
Corresponding Author:
Professor Stewart G. Trost, PhD
Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research
Level 6, 62 Graham Street
South Brisbane, QLD 4101
Australia
Phone: +61 7 3069 7301
Fax: + 61 7 3138 3980
Email: [email protected]
3.1 ABSTRACT
Purpose: To investigate whether the use of ensemble learning algorithms
improves physical activity recognition accuracy compared with single-classifier
algorithms, and to compare the classification accuracy achieved by three conventional
ensemble machine learning methods (bagging, boosting, random forest) and a custom
ensemble model comprising four algorithms commonly used for activity recognition
(binary decision tree, k nearest neighbour, support vector machine, and neural
network). Methods: The study utilised three independent datasets that included wrist-
worn accelerometer data. For each dataset, a four-step classification framework
consisting of data pre-processing, feature extraction, normalisation and feature
selection, and classifier training and testing was implemented. For the custom
ensemble, decisions from the single classifiers were aggregated using three decision
fusion methods: weighted majority vote, naïve Bayes combination, and behaviour
knowledge space combination. Classifiers were evaluated using leave-one-subject-out
cross-validation and compared on the basis of average F1 scores. Results:
In all three datasets, ensemble learning methods consistently outperformed the
individual classifiers. Among the conventional ensemble methods, random forest
models provided consistently high activity recognition; however, the custom ensemble
model using weighted majority voting demonstrated the highest classification
accuracy in two of the three datasets. Conclusion: Combining multiple individual
classifiers using conventional or custom ensemble learning methods can improve
activity recognition accuracy from wrist-worn accelerometer data.
Keywords: Motion sensors, machine learning, pattern recognition, random
forest, bagging, boosted decision trees.
3.2 INTRODUCTION
Physical inactivity is recognised as a critical population health risk factor, and a
significant contributor to the direct and indirect health care costs associated with
management of a wide range of chronic health conditions [122]. In addition, there is a
growing body of evidence to suggest that sedentary behaviour, characterised by
prolonged bouts of sitting, is associated with serious health conditions, independent of
the effects of physical activity [123, 124]. Hence, valid and reliable measures of
physical activity and sedentary behaviour are a necessity in studies designed to: 1)
document the frequency and distribution of physical activity and sedentary behaviour
in defined population groups; 2) identify the psychosocial and environmental factors
that influence physical activity and sedentary behaviour; and 3) evaluate the efficacy
or effectiveness of programs and policies to increase habitual physical activity and
reduce sedentary behaviour [114].
Accelerometer-based motion sensors have become the method of choice for
measuring physical activity and sedentary time in free-living contexts, because they
are small, robust and low cost [81, 125]. However, differences in accelerometer data
processing methods have hindered research efforts to quantify, understand and
intervene on physical activity and sedentary behaviour. Existing approaches can be
categorised into two groups: 1) threshold-based and 2) machine learning approaches.
Threshold or “cut-point” methods use regression methods to map accelerometer
outputs to energy expenditure [81, 113, 125]. Machine learning approaches extract
features or patterns from the acceleration data and use supervised or unsupervised
learning algorithms to predict physical activity type and/or energy expenditure [25, 44,
120, 126]. Relative to cut-point methods, machine learning approaches can provide a
greater variety of physical activity metrics (e.g., activity type, walking speed) and more
accurate predictions of energy cost [24, 25, 30]. Nevertheless, the adoption of machine
learning methods in physical activity studies has been low because they are not as
easily implemented as threshold based methods.
To date, a range of machine learning algorithms has been used in the physical
activity (PA) classification and measurement domain, including k-nearest neighbour
(kNN), artificial neural networks (ANN), support vector machines (SVM), Markov
models, and decision trees. Preece et al. [23] summarised the relative strengths,
weaknesses, and performance characteristics of 11 different machine learning
approaches. The authors concluded that it was impossible to declare one particular
machine learning technique as universally better than others for any given PA
recognition problem. Most recently, Kate and colleagues [56] compared the accuracy
of eight different machine learning techniques for activity recognition and energy cost
estimation from accelerometer data. Their results indicated that no single machine
learning technique works best in all testing situations.
Because there is no single, optimal machine learning algorithm for any given
classification or estimation problem, ensemble learning approaches are gaining
popularity [112]. An ensemble of classifiers is a set of base level classifiers (known as
weak learners) whose individual decisions are combined to improve overall decision
accuracy. In this respect, an ensemble of machine learning models can be
conceptualised as a committee of experts brought together to make a final decision. If
the weak learners are combined appropriately, the fusion of their outputs is constructive,
leading to better overall decisions and generalisation.
Three commonly used ensemble learning schemes are bagging, boosting, and
random forests. Bagging stands for bootstrap aggregation. It involves taking multiple
random samples of training instances (with replacement) and applying a weak learning
algorithm (typically a decision tree) to the data. The decisions of each classifier are
combined to make a final class prediction using the majority vote rule [127]. Boosting
also applies a voting procedure to combine the decisions of multiple weak learners.
However, boosting adopts an iterative approach in which each new model is influenced
by the performance of previously built models. The boosting algorithm begins by
assigning equal weights to all instances in the training data. It then builds a classifier
(typically a decision tree) and reweights the instances based on the classifier's
performance on the training data. The weights of correctly classified instances are
decreased, while the weights of misclassified instances are increased. This weighting
scheme allows subsequent classifiers to be more proficient at classifying instances
misclassified by earlier models. The final class prediction is based on the weighted
majority vote of each model, where the weights are determined by the accuracy of the
model [128]. Random Forests are another widely used ensemble learning method. The
random forests algorithm is similar to bagging in that multiple weak learners (decision
trees) are trained on randomly sampled instances from the training data. However,
unlike bagging, where all features in the training data are considered for splitting a
node, the random forest algorithm selects the best among a random sample of features.
The decisions generated by each tree are recorded and the final class prediction is
based on majority vote [129].
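The instance reweighting used by boosting can be illustrated with a short sketch. It is written in Python purely for illustration and assumes the standard binary AdaBoost update, which is not necessarily the exact variant used in the cited work; the function name `reweight` is hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round: raise the weights of misclassified instances.

    weights: current instance weights (summing to 1)
    correct: booleans, True where the latest classifier was right
    Returns the renormalised weights and the classifier's vote weight.
    """
    # Weighted error of the latest classifier, clipped away from 0 and 1
    # so that the logarithm below stays finite.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)
    # Vote weight: larger for more accurate classifiers.
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink correctly classified instances, grow misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

# Four equally weighted instances; only the last one was misclassified.
weights, alpha = reweight([0.25] * 4, [True, True, True, False])
```

After this round the single misclassified instance carries half of the total weight, so the next weak learner is pushed to classify it correctly.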
While bagging and boosting combine base learners of the same type, it is
possible to construct custom ensembles featuring learning algorithms of different
types. The decisions of the classifiers are subsequently combined using an established
decision fusion method such as weighted majority voting, naïve Bayes, or behaviour
knowledge space [31]. Unlike conventional ensembles, custom ensemble methods
achieve diversity by using heterogeneous classification algorithms as base classifiers,
which may lead to better generalised performance [130].
Although ensemble learning methods are starting to emerge in physical activity
research, no previous studies in the exercise and movement sciences have compared
the performance of different ensemble methods and decision fusion rules. Therefore,
the purpose of this study was to systematically compare the classification accuracy
achieved by conventional ensemble methods (bagged decision tree, boosted decision
tree, and random forest) and a custom multi-classifier ensemble combining four
machine learning algorithms (binary decision tree, kNN, SVM and neural network)
using three decision fusion rules (weighted majority voting, naïve Bayes, and
behaviour knowledge space). Performance was evaluated in three independent
physical activity recognition datasets.
3.3 METHODS
3.3.1 Datasets
This study used three independent accelerometer datasets collected from
different participant groups (adults and children) performing different activities in
different contexts (laboratory and outdoor). A brief description of each dataset is provided in
Table 3.1.
Dataset #1. The PAMAP2 dataset is a fully annotated, publicly available
physical activity monitoring dataset. The data were downloaded from the UCI Machine
Learning Repository
(https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring).
Detailed information about the study can be found elsewhere [87, 131]. Nine
participants (1 female, 8 male, age: 27.2 ± 3.3 years, and BMI: 25.1 ± 2.6 kg/m2)
performed twelve different types of physical activities. Participants wore three Colibri
wireless IMUs (Inertial Measurement Units) on their dominant-arm wrist,
dominant-side ankle and chest. Each IMU contained two three-dimensional (3D) acceleration
sensors (scale: ±6 g and ±16 g) with a resolution of 13 bits, a 3D gyroscope sensor, a
3D magnetometer sensor, and temperature, orientation and heart-rate monitor sensors. The
sampling rates of the accelerometers and the heart-rate sensor were 100 Hz and ~9 Hz,
respectively. Only the 3D accelerometer (±16 g) data from the wrist IMU were used in
this study. The physical activities included in the dataset were: lying down, sitting,
standing, walking, running, cycling, ascending stairs, descending stairs, Nordic
walking, vacuum cleaning, ironing clothes and jumping rope. For the purposes of this
study, the first eight basic activity classes were selected for evaluation because these
activity classes were widely used in past studies.
Dataset #2. The second dataset comprised wrist accelerometer data collected on
eight individuals (mean age = 29.9 ± 4.2 y, 50% male, mean BMI = 22.8 ± 1.9 kg/m2)
during an outdoor physical activity session in a park. The data collection protocol
included the activities in the following order: stationary activity (sit or stand still) for
5 min, self-paced comfortable walk for 5 min, self-paced brisk walk for 5 min, jogging
for 5 min, and fast-run for 2 min. In between each activity, participants rested for 5 to
15 min. During each trial, motion and heart rate were recorded using an Empatica E4
monitor. The Empatica E4 (Empatica Inc., Boston, USA), a lightweight (25 g)
wristwatch, was placed on the participant’s non-dominant wrist to record 3D acceleration
(±2 g), heart rate, electrodermal activity, and temperature. This study utilised only the
3D acceleration data. The sampling rate of the acceleration data was 32 Hz.
Dataset #3. The third accelerometer dataset was collected from 17 children (9
boys, 8 girls, age: 14.6 ± 2.4 years, BMI percentile: 66.8 ± 25.9) [30, 81]. A total of 12
activity trials were performed over two laboratory visits. On visit 1, participants
completed the following 6 trials: lying down, handwriting, laundry task, throw and
catch, comfortable over-ground walk, and aerobic dance. On the 2nd visit, the following
6 trials were completed: seated computer game, floor sweeping, brisk over-ground
walk, basketball, over-ground run/jog, and brisk treadmill walk. The duration of each
trial was 5 minutes. Based on movement pattern, activities were grouped into
seven categories: lying down, sitting (handwriting, computer game), standing with
upper body movements (throw and catch, laundry task, floor sweeping), walking
(comfortable over-ground walk, brisk over-ground walk, brisk treadmill walk),
running, basketball, and dance. Further details of the activity trials can be found in
[81]. During the trials, participants wore an ActiGraph GT3X+ tri-axial accelerometer
(ActiGraph Corporation, Pensacola, FL) on the right hip and non-dominant wrist. The
sampling rate was set to 30 Hz. In this study, only wrist-worn acceleration data were
used.
Table 3.1 Comparison across three datasets

Data collection environment
  Dataset #1: PAMAP2, a public dataset collected in a lab
  Dataset #2: Private dataset collected outdoors
  Dataset #3: Private dataset collected in a lab

Participants
  Dataset #1: 9 adult participants (1 female, 8 male); age: 27.2 ± 3.3 years; BMI: 25.1 ± 2.6 kg/m²
  Dataset #2: 8 adult participants (4 female, 4 male); age: 29.9 ± 4.2 years; BMI: 22.8 ± 1.9 kg/m²
  Dataset #3: 17 children (9 boys, 8 girls); age: 14.6 ± 2.4 years; BMI percentile: 66.8 ± 25.9

Sensors and placements
  Dataset #1: Colibri wireless IMU containing a 3D accelerometer, 3D gyroscope sensor, 3D magnetometer sensor, and temperature, orientation and heart-rate monitor sensors. Placements: dominant-arm wrist, dominant-side ankle and chest
  Dataset #2: Empatica E4 containing a 3D accelerometer and electrodermal activity, heart-rate and temperature sensors. Placement: non-dominant wrist
  Dataset #3: ActiGraph GT3X+ tri-axial accelerometer. Placements: right hip and non-dominant wrist

Accelerometer sensor specifics
  Dataset #1: Scale: ±6 g and ±16 g; sampling rate: 100 Hz
  Dataset #2: Scale: ±2 g; sampling rate: 32 Hz
  Dataset #3: Scale: ±6 g; sampling rate: 30 Hz

Physical activities performed
  Dataset #1: Lying down, sitting, standing, walking, running, cycling, ascending stairs, descending stairs, Nordic walking, vacuum cleaning, ironing clothes and jumping rope
  Dataset #2: Sit or stand still, self-paced comfortable walk, self-paced brisk walk, jogging, and fast run
  Dataset #3: Lying down, sitting (handwriting, computer game), standing with upper body movements (throw and catch, laundry task, floor sweeping), walking (comfortable over-ground walk, brisk over-ground walk, brisk treadmill walk), running, basketball, and dance
3.3.2 Classification Framework
The four steps of the classification framework, shown in Figure 3.1, were data
pre-processing, feature extraction, normalisation and feature selection, and activity
classification. In the classification step, both conventional and custom ensemble
methods were implemented. In the custom ensemble method, decisions from four
“state-of-the-art” single classifiers, including binary decision tree (BDT), k nearest
neighbour (kNN), support vector machine (SVM) and artificial neural network (ANN),
were aggregated together using three decision fusion techniques, namely weighted
majority voting (WMV), naïve Bayes combiner (NB), and behaviour knowledge space
combiner (BKS). Among the conventional ensemble methods, boosted decision trees,
bagged decision trees and random forest were investigated. All steps of the framework
were implemented using Matlab (The MathWorks Inc., USA). Example features and
code for using the proposed framework can be found at the following link:
https://github.com/alokchy04/Decision-Fused-Ensembles-for-PA-Classification-from-Wrist-Worn-Accelerometer.
Figure 3.1 Flow diagram of the proposed framework
Pre-processing. In the pre-processing step, the accelerometer data were annotated
with activity labels and converted to a time-series data structure. If the dataset contained
missing accelerometer data, linear interpolation was used to estimate the intermediate
missing values. Missing values at the end of each labelled activity were
replaced by the previous value. In addition, 10 s of data at the beginning and end of each
labelled activity were discarded from analysis to remove non-steady-state data.
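The gap-filling and trimming rules described above can be sketched as follows. This is a minimal Python illustration rather than the Matlab code used in the study; the function names `fill_missing` and `trim_edges`, and the use of `None` to mark a missing sample, are assumptions made for the example.

```python
def fill_missing(samples):
    """Fill None gaps: linear interpolation inside, previous value at the end."""
    filled = list(samples)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled
    # Interior gaps: interpolate linearly between the neighbouring samples.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Trailing gap: repeat the last recorded value.
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Leading gaps are left untouched; the text does not say how to treat them.
    return filled

def trim_edges(samples, rate_hz, seconds=10):
    """Drop `seconds` of data from both ends of one labelled activity bout."""
    n = int(rate_hz * seconds)
    return samples[n:len(samples) - n]

filled = fill_missing([1.0, None, None, 4.0, None])
steady = trim_edges(list(range(100)), rate_hz=2)
```

Here `filled` becomes `[1.0, 2.0, 3.0, 4.0, 4.0]`, and `steady` keeps the 60 samples between index 20 and index 79.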
Feature Extraction. A range of time- and frequency-domain features were
extracted from 10 s sliding windows with 50% overlap. A 10 s window was chosen
as this period is sufficient to capture multiple periodic movements for all activities
[30]. In total, 45 features were extracted from each accelerometer. Consistent with
previous studies, mean, standard deviation, minimum, maximum, variance, median,
skewness, 25th and 75th percentile, and kurtosis were extracted from each axis of a 3-
axis accelerometer [118, 132]. In addition to these simple time-domain features,
frequency domain features including spectral energy, dominant frequency, dominant
frequency magnitude, zero crossings, and cross-axis correlations were calculated.
Spectral energy was calculated by summing the squared discrete FFT component
magnitudes of the signal [90]. Spectral energy was normalised by dividing it by
window length. The frequency with the highest FFT magnitude was taken as the
dominant (principal) frequency [51]. The zero-crossing count for each accelerometer axis was the
number of times the signal changed sign. Cross-correlations between accelerometer axes
(corr_xy, corr_xz, corr_yz) [44, 90] were also calculated and included in the feature list. A
detailed description of the features is provided in Appendix A.
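A few of the features described above can be sketched in Python for illustration. The study's own implementation was in Matlab, so all function names here are hypothetical. The spectral energy uses a naive discrete Fourier transform and includes the DC component, which is an assumption; the exact definitions are those of Appendix A, not of this sketch.

```python
import cmath
import statistics

def sliding_windows(signal, rate_hz, seconds=10, overlap=0.5):
    """Yield fixed-length windows with the given fractional overlap."""
    size = int(rate_hz * seconds)
    step = max(1, int(size * (1 - overlap)))
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def zero_crossings(window):
    """Count sign changes between consecutive samples (exact zeros ignored)."""
    return sum(1 for a, b in zip(window, window[1:]) if a * b < 0)

def axis_features(window):
    """Some of the per-axis time-domain features named in the text."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.stdev(window),
        "min": min(window),
        "max": max(window),
        "median": statistics.median(window),
        "zero_crossings": zero_crossings(window),
    }

def spectral_energy(window):
    """Sum of squared DFT magnitudes divided by the window length."""
    n = len(window)
    total = 0.0
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window))
        total += abs(coeff) ** 2
    return total / n

def correlation(x, y):
    """Pearson correlation between two axes of the same window."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# A 32 Hz signal gives 320-sample windows that advance by 160 samples.
windows = list(sliding_windows([0.0] * 960, rate_hz=32))
```

The naive transform is quadratic in the window length and is used only to keep the sketch dependency-free; an FFT routine would be the practical choice.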
Normalisation and Feature Selection. Normalisation of the features before
classification is useful when the feature values vary in different dynamic ranges. In
this study, the training features were normalised to a zero mean and unit variance by
subtracting the corresponding mean and dividing by the standard deviation. Features
in the testing data were normalised using the same approach using the training data
means and standard deviations. Feature selection is another important step necessary
to improve time and space complexity of the classification algorithms. A correlation-
based feature selection method [133] was applied on the training data to select features
for classification. Features with a correlation coefficient ≥ 0.25 with the activity
classes were selected as inputs to the classifiers. A list of features selected for inclusion
in each training dataset is provided in Appendix B.
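The normalisation and thresholding steps can be sketched as below, in Python for illustration only. The helper names are hypothetical, features are stored as one list per feature column, and the correlation scores passed to `select_features` are assumed to have been computed beforehand by the correlation-based method of [133], which is not reproduced here.

```python
import statistics

def fit_scaler(train_columns):
    """Per-feature mean and standard deviation, from the training data only."""
    return [(statistics.fmean(col), statistics.stdev(col))
            for col in train_columns]

def apply_scaler(columns, params):
    """Z-score each feature column using the training-set parameters.

    Assumes no training feature is constant (non-zero standard deviation).
    """
    return [[(v - mean) / std for v in col]
            for col, (mean, std) in zip(columns, params)]

def select_features(scores, threshold=0.25):
    """Indices of features whose class-correlation score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

params = fit_scaler([[1.0, 2.0, 3.0]])        # training data: mean 2, std 1
test_scaled = apply_scaler([[4.0]], params)   # test data reuses training stats
kept = select_features([0.10, 0.25, 0.60])
```

Fitting the scaler on the training fold alone keeps information from the held-out subject out of the model, which matters under leave-one-subject-out evaluation.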
3.3.3 Conventional Ensemble Methods
The performance of three standard ensemble methods was evaluated: bagged
decision trees, random forests and boosted decision trees. The ‘TreeBagger’ classification
class of Matlab was used to implement both the bagged decision trees and the random
forest. The number of decision trees in the ensemble was empirically set to
20, as it provided optimum performance. While the bagged decision trees considered
all features for splitting a node, the random forest implementation sampled a number of
features equal to the square root of the total number of features available.
For the boosted decision tree implementation, the AdaBoost.M2 multi-class classification
method with 100 learning cycles and ‘Discriminant’ weak learners was chosen.
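The difference between bagging and random forest sampling can be sketched as follows. This is a Python illustration and not the Matlab implementation used in the study; the function names are hypothetical, and rounding the square root up to an integer is an assumption.

```python
import math
import random
from collections import Counter

def bootstrap_indices(n_instances, rng):
    """Sample instance indices with replacement (one bag per tree)."""
    return [rng.randrange(n_instances) for _ in range(n_instances)]

def candidate_features(n_features, rng, random_forest=True):
    """Features considered at a split: all for bagging, about sqrt(n) for a forest."""
    if not random_forest:
        return list(range(n_features))
    k = math.ceil(math.sqrt(n_features))   # rounding up is an assumption
    return sorted(rng.sample(range(n_features), k))

def majority_vote(tree_predictions):
    """Final label: the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
bag = bootstrap_indices(50, rng)
forest_split = candidate_features(45, rng)
bagging_split = candidate_features(45, rng, random_forest=False)
label = majority_vote(["walk", "run", "walk"])
```

With 45 candidate features, a random forest split in this sketch considers 7 of them, whereas a bagged tree considers all 45.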
3.3.4 Custom Ensemble Methods
Heterogeneity in the decisions of multiple single classification algorithms (base
classifiers) on the same dataset can be utilised to improve classification performance in
physical activity recognition problems [31, 112]. When each base classifier has good
individual performance and the classifiers are sufficiently diverse (because they use different
algorithms), fusion can be expected to improve performance. This study employed four
well-known, widely-used supervised learning algorithms of different complexity
(binary decision tree, k nearest neighbour, support vector machine, and artificial neural
network) as base classifiers, which were fused together using three decision-fusion
techniques (weighted majority voting, naïve Bayes, and behaviour knowledge space).
Detailed information related to the implementation of the single classifier models can
be found in Appendix C.
3.3.5 Decision Fusion Techniques
In an N-classifier ensemble, let the set of classifiers and the set of classes be E = {E1, E2,
…, EN} and C = {c1, c2, …, cm}, respectively. Each classifier Ei produces a class label li
∈ C, i = 1, …, N, without any further information. When classifying an object x, the N
classifiers output a vector L = [l1, l2, …, lN]. Decision fusion techniques then combine
the classifiers’ outputs and provide a single class label. The current study evaluated the
performance of three different decision fusion techniques: weighted majority
voting, naïve Bayes combination, and behaviour knowledge space combination [31,
134, 135].
Weighted Majority Voting (WMV). The weighted majority vote is one of the most
widely used decision fusion combiners, and is often useful when the classifiers in the
ensemble do not all perform equally well. This approach measures the individual
accuracy of each classifier on the training data and uses these accuracies as weights,
W = {w1, w2, …, wN}, to give the more competent classifiers more authority in making the
final decision. Then, when predicting for an object x, it calculates a score for each
predicted class label using Equation 1:
\[ \mathrm{score}(k) = \sum_{i \,:\, l_i = c_k} w_i, \qquad k = 1, 2, \ldots, m \tag{1} \]
Finally, it selects the class label with the maximum score:
\[ \text{final label} = \arg\max_{k = 1, \ldots, m} \mathrm{score}(k) \tag{2} \]
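Equations 1 and 2 can be illustrated with a short Python sketch. The weights and labels below are hypothetical values chosen for illustration, and the function name is an assumption; ties are resolved in favour of the class encountered first, which the text does not specify.

```python
def weighted_majority_vote(labels, weights):
    """Sum the weights of the classifiers voting for each class (Equation 1)
    and return the class with the largest total (Equation 2)."""
    scores = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get), scores

# Hypothetical training accuracies of four base classifiers.
weights = [0.72, 0.79, 0.83, 0.81]
# Labels predicted by the four classifiers for one window of data.
labels = ["walking", "running", "walking", "running"]

final_label, scores = weighted_majority_vote(labels, weights)
```

With these values the vote is split two against two, and the weights decide the outcome in favour of the pair of classifiers with the larger combined training accuracy.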
Naïve Bayes (NB) Combination. This fusion method assumes the classifiers are
mutually independent. For each classifier Ei, an m×m confusion matrix CMi is
calculated by applying it to the training data set. Let T be the total number of objects
in the training data, where the number of objects in each class is denoted by T1, T2, …, Tm.
During testing, for an object x, this method calculates the posterior probability for all
predicted class labels using the following formula.
\[ \mathrm{score}(k) = \frac{T_k}{T} \prod_{i=1}^{N} cm_i(k, l_i), \qquad k = 1, 2, \ldots, m \tag{3} \]
where $cm_i(k, l)$ is the number of elements of the training dataset whose true
class label was ck and which were assigned by Ei to class cl. Finally, the naïve Bayes combiner
assigns the class label with the maximum score to object x using Equation 2.
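A minimal Python sketch of Equation 3 follows, reading the combining operator as a product over classifiers, which is the usual form for a naïve Bayes combiner. The confusion matrices, class counts and function name are hypothetical. The sketch uses raw counts as in the definition above; dividing each count by T_k would give the conventional class-conditional probabilities.

```python
def naive_bayes_combine(labels, confusion, class_counts):
    """Score each class k as (T_k / T) * prod_i cm_i(k, l_i).

    labels       -- class index predicted by each classifier, l_1..l_N
    confusion    -- confusion[i][k][l]: training objects of true class k
                    that classifier i assigned to class l
    class_counts -- T_1..T_m, the number of training objects per class
    """
    total = sum(class_counts)
    scores = []
    for k, t_k in enumerate(class_counts):
        score = t_k / total
        for i, l_i in enumerate(labels):
            score *= confusion[i][k][l_i]
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical example: two classes, two classifiers, 10 objects per class.
cm_1 = [[8, 2], [3, 7]]
cm_2 = [[9, 1], [4, 6]]
# Classifier 1 predicts class 0, classifier 2 predicts class 1.
best, scores = naive_bayes_combine([0, 1], [cm_1, cm_2], [10, 10])
```

In this example the second classifier rarely outputs class 1 for objects of class 0, so its vote outweighs the first classifier's vote for class 0.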
Behaviour Knowledge Space (BKS) Combination. Unlike most fusion
methods, behaviour knowledge space (BKS) [136] does not require an assumption of
independence of the decisions of individual classifiers. The accuracy of the BKS
combiner is very high when the dataset is large, but on small datasets BKS often
overtrains. It creates a knowledge space using a look-up table based on the classification of
the training data. The look-up table provides information on how often each labelling
combination is produced by the classifiers. When testing for an object x (a window of
test data), it looks for the combination of predicted class labels in the look-up table and
selects the most frequent true label corresponding to that combination as the final result.
The challenge with this fusion technique comes when the testing data evoke label
combinations that do not appear in the look-up table. To address this problem, weighted
majority voting was used for combinations of labels not in the look-up table.
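The look-up procedure and its fallback can be sketched in Python as follows. The function names, labels and weights are hypothetical, and the sketch is an illustration of the idea rather than the implementation used in the study.

```python
from collections import Counter, defaultdict

def build_bks_table(train_predictions, true_labels):
    """Map each tuple of classifier outputs to a Counter of true labels."""
    table = defaultdict(Counter)
    for combo, truth in zip(train_predictions, true_labels):
        table[tuple(combo)][truth] += 1
    return table

def bks_predict(combo, table, weights):
    """Return the most frequent true label for this combination, or fall
    back to weighted majority voting if the combination was never seen."""
    cell = table.get(tuple(combo))
    if cell:
        return cell.most_common(1)[0][0]
    scores = Counter()
    for label, weight in zip(combo, weights):
        scores[label] += weight
    return scores.most_common(1)[0][0]

# Hypothetical outputs of three classifiers on four training windows.
train_predictions = [
    ("sit", "sit", "stand"),
    ("sit", "sit", "stand"),
    ("sit", "sit", "stand"),
    ("walk", "walk", "walk"),
]
true_labels = ["stand", "stand", "sit", "walk"]
weights = [0.70, 0.80, 0.75]  # hypothetical training accuracies

table = build_bks_table(train_predictions, true_labels)
seen = bks_predict(("sit", "sit", "stand"), table, weights)
unseen = bks_predict(("walk", "sit", "walk"), table, weights)
```

The first query hits a cell in which the true label was "stand" twice and "sit" once, so BKS overrides the majority of the classifiers. The second query is not in the table and is resolved by the weighted vote.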
3.3.6 Performance Evaluation
The performance was evaluated using leave-one-subject-out (LOSO) cross-
validation [137]. In LOSO, data from one user are used for testing; the other users’
samples are used for training. In this way, samples of each subject are used exactly
once for testing. This study used F1-score [138] to measure the performance of the
ensemble learning methods. The study favoured F1-score over classification accuracy
because, unlike accuracy or percentage of agreement, it is less influenced by the class
distribution. The F1-score is the harmonic mean of precision and recall and therefore
balances the two.
\[ \text{F1 Score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \times 100\% \tag{4} \]
where precision describes the exactness of a classifier; a low precision
indicates a large number of false positives. Recall, or sensitivity, measures
the completeness of a classifier; a low recall indicates a large number of false negatives.
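The evaluation protocol can be illustrated with the following Python sketch, which generates leave-one-subject-out folds and computes the per-class F1-score of Equation 4. The subject identifiers, labels and function names are hypothetical.

```python
def loso_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one per subject."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test

def f1_percent(y_true, y_pred, positive):
    """Per-class F1-score (Equation 4), expressed as a percentage."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall) * 100

# Hypothetical subject identifier for each window.
subjects = ["s1", "s1", "s2", "s2", "s3"]
folds = list(loso_folds(subjects))

# Hypothetical true and predicted labels for five windows.
y_true = ["walk", "walk", "run", "run", "walk"]
y_pred = ["walk", "run", "run", "run", "walk"]
f1_walk = f1_percent(y_true, y_pred, "walk")
```

For the "walk" class, precision is 1 and recall is 2/3, giving an F1-score of 80%.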
3.4 RESULTS
3.4.1 Dataset #1 Results
Table 3.2 reports F1-Scores of the four single classifiers, the custom ensembles,
and the three conventional ensemble methods for all activities in dataset #1. In this
dataset, the custom ensembles using WMV and NB fusion were effective and
outperformed all of the single classifiers, whereas none of the conventional ensemble
methods outperformed the best single classifier. Among the four single classifiers, the
performance of SVM was best.
Table 3.2 Classification results (F1-Score) using wrist acceleration sensor of dataset #1
Conventional Ensembles: Random Forest, Bagging Decision Tree, Boosted Decision Tree. Individual Classifiers: BDT, KNN, SVM, ANN. Custom Ensembles: WMV Fusion, NB Fusion, BKS Fusion.

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Lying | 80.18 | 72.76 | 89.31 | 73.28 | 87.36 | 92.78 | 91.22 | 92.69 | 91.55 | 86.4 |
| Sitting | 76.92 | 74.34 | 79.57 | 70.94 | 78.97 | 85.71 | 82.39 | 85.5 | 85.8 | 79.4 |
| Standing | 87.65 | 82.08 | 81.6 | 76.02 | 84.91 | 86.04 | 85.16 | 88.89 | 88.7 | 87.65 |
| Walking | 84.5 | 86.55 | 88.96 | 70.07 | 76.99 | 87.45 | 84.34 | 87.4 | 84.38 | 82.02 |
| Running | 100 | 99.12 | 99.71 | 85.25 | 99.71 | 96.02 | 94.15 | 99.12 | 99.12 | 99.12 |
| Cycling | 95.68 | 92.93 | 93.67 | 92.89 | 95.62 | 95.96 | 86.4 | 96.61 | 96.8 | 96.12 |
| Ascending Stairs | 59.89 | 58.15 | 53.61 | 44.44 | 39.26 | 48.87 | 58.51 | 58.46 | 57.39 | 50.87 |
| Descending Stairs | 66.67 | 75.7 | 56.2 | 64.26 | 68.46 | 72.88 | 69.6 | 76.42 | 79.84 | 74.49 |
| Average | 81.44 | 80.2 | 80.33 | 72.14 | 78.91 | 83.22 | 81.47 | 85.64 | 85.45 | 82.01 |
| Std Dev. | 13.63 | 12.87 | 16.93 | 14.41 | 18.91 | 15.75 | 11.78 | 13.03 | 13.00 | 15.00 |
3.4.2 Dataset #2 Results
Table 3.3 reports F1-Scores of the four single classifiers, the custom ensemble,
and the three conventional ensemble methods in dataset #2. In this dataset, the random
forest model was the best ensemble classifier overall, with the custom ensemble with
WMV fusion also providing better recognition accuracy than the four single classifiers.
The random forest model provided better recognition accuracy for all physical
activities, with the exception of stationary activities, for which the bagged decision
tree was best. The custom ensemble with WMV fusion performed well for sitting and
standing, comfortable walking, and fast walking, but failed to outperform BDT for
jogging and running. Of the four single classifiers, BDT was the best performer.
Table 3.3 Classification results (F1-Score) using wrist acceleration sensor of dataset #2
Conventional Ensembles: Random Forest, Bagging Decision Tree, Boosted Decision Tree. Individual Classifiers: BDT, KNN, SVM, ANN. Custom Ensembles: WMV Fusion, NB Fusion, BKS Fusion.

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Stationary (sit and stand) | 94.52 | 95.83 | 93.41 | 91.35 | 90.5 | 92.24 | 89.55 | 94.07 | 92.58 | 92.37 |
| Comfortable Walking | 70.73 | 67.17 | 62.78 | 65.17 | 57.08 | 64.52 | 58.91 | 68.63 | 66.15 | 63.02 |
| Fast Walking | 75.64 | 67.94 | 70.75 | 67.75 | 64.7 | 74.36 | 54.2 | 74.56 | 72.96 | 65.77 |
| Jogging | 83.52 | 82.09 | 74.03 | 79.21 | 72.98 | 69.94 | 66.92 | 75.94 | 75.1 | 76.4 |
| Running | 73.82 | 72.9 | 66.07 | 73.56 | 67.47 | 66.67 | 55.83 | 71.22 | 71.14 | 70.69 |
| Average | 79.65 | 77.18 | 73.41 | 75.41 | 70.54 | 73.54 | 65.08 | 76.88 | 75.58 | 73.65 |
| Std Dev. | 9.56 | 12.00 | 11.98 | 10.43 | 12.54 | 11.09 | 14.53 | 10.02 | 10.06 | 11.64 |
3.4.3 Dataset #3 Results
Table 3.4 reports F1-Scores of the four single classifiers, the custom ensembles,
and the three conventional ensemble methods in dataset #3. In this dataset, the custom
ensemble with WMV fusion provided the highest performance, while the random
forest and the custom ensemble with NB fusion also performed better than the single
classifiers. Compared to the other classifiers, the random forest model exhibited the
best recognition accuracy for lying down, sitting, and walking activities. Of the four
single classifiers, SVM exhibited the highest classification accuracy.
Table 3.4 Classification results (F1-Score) using wrist acceleration sensor of dataset #3
Conventional Ensembles: Random Forest, Bagging Decision Tree, Boosted Decision Tree. Individual Classifiers: BDT, KNN, SVM, ANN. Custom Ensembles: WMV Fusion, NB Fusion, BKS Fusion.

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Lying down | 79.46 | 78.18 | 74.45 | 68.17 | 66.94 | 76.66 | 70.97 | 78.26 | 75.98 | 72.13 |
| Sitting+ | 92.13 | 90.04 | 89.97 | 85.52 | 87.09 | 91.19 | 89.87 | 90.63 | 90.39 | 86.74 |
| Standing+ | 86.69 | 85.12 | 82.68 | 79.34 | 84.3 | 86.19 | 85 | 87.67 | 87.83 | 85.32 |
| Walking | 95.39 | 93.14 | 93.41 | 91.95 | 94.9 | 94.88 | 93.14 | 95.34 | 95.31 | 93.5 |
| Running | 71.35 | 64.12 | 71.48 | 57.45 | 66.54 | 67.16 | 64.41 | 73.18 | 69.93 | 64.41 |
| Basketball | 85.63 | 84.31 | 89.38 | 76.52 | 87.22 | 89 | 89.54 | 91.16 | 91.14 | 86.72 |
| Dance | 84.47 | 79.59 | 84.58 | 74.45 | 78.88 | 80.9 | 82.05 | 85.71 | 81.74 | 81.2 |
| Average | 85.02 | 82.07 | 83.71 | 76.2 | 80.84 | 83.71 | 82.14 | 85.99 | 84.62 | 81.43 |
| Std Dev. | 7.95 | 9.52 | 8.19 | 11.28 | 10.73 | 9.54 | 10.67 | 7.77 | 9.12 | 9.94 |
Classification accuracies and confusion matrices for the three datasets can be found
in Appendix D.
3.4.4 Statistical Comparison
The comparative performance of the different ensemble models and the single
classifiers across different folds/subjects was tested for statistical significance using
one-way repeated measures ANOVA. To increase statistical power and enhance the
generalisability of the findings, F1-scores for each held-out subject/fold from all three
datasets were pooled.
Overall, mean F1-scores differed significantly between the ensemble and single
classifier models (Wilks’ Lambda = 0.270, F (9, 24) = 7.204, p < .0001). LSD post hoc
comparisons revealed that the custom ensemble with WMV or NB provided
statistically significant improvements in performance relative to the single classifiers.
The custom ensemble with WMV significantly outperformed the conventional
ensemble models with the exception of the random forest classifier. NB fusion
significantly outperformed Adaboost, but not random forest or bagged decision trees.
The custom ensemble with BKS fusion offered no significant improvements in
performance relative to the conventional ensemble models and, with the exception of
BDT, failed to outperform the single classifiers. Among the conventional ensembles,
the random forest ensemble significantly outperformed the custom ensemble with
BKS and the single classifiers, with the exception of SVM. Bagged decision tree and
Adaboost significantly outperformed BDT, but not SVM, KNN, or ANN.
3.5 DISCUSSION
This study systematically examined the classification performance achieved by
several conventional ensembles and a custom ensemble method in three datasets
featuring wrist-worn accelerometer data. Across the three datasets, random forest
ensembles and the custom ensemble with weighted majority vote provided
consistently higher classification accuracy than bagged and boosted decision trees, and
with the exception of SVM in dataset #1, significantly outperformed the four single
classifiers. Of the three decision fusion techniques examined, weighted majority vote
provided marginally better performance than Naïve Bayes fusion. However, both
weighted majority vote and Naïve Bayes fusion significantly outperformed behaviour
knowledge space fusion.
Our results are consistent with previous studies demonstrating that combining
multiple classifiers with different inductive biases provides better PA recognition than
conventional ensemble methods and single model classifiers. Ruch et al. [139] used
majority voting (MV) to combine the decisions of k-nearest neighbour (kNN), normal
density discriminant function (NDDf), and custom decision tree (CDT) classifiers. In
free living conditions, MV provided a maximum 67% classification accuracy when
employing both hip and wrist accelerometers. Most recently, Catal et al. (5) reported
that the combination of three physical activity classifiers (logistic regression, decision
tree, and multi-layer perceptron) using voting provided better performance than a
single model approach.
The poorest performing custom ensemble in our experiment was BKS. The
limitations of BKS are well-documented [140]. It frequently suffers from
generalisation error if the training dataset is not sufficiently large and/or representative.
For example, when the number of classes (m) and classifiers (N) are large, the
number of combinations of the classifiers’ outputs for all classes in the look-up table becomes very
large ($m^N \times m$). In this case, if the training dataset is not representative and sufficiently
large to estimate all or most of the combinations of the classifiers’ outputs, BKS fusion can
provide poor performance. BKS fusion also does not perform well when the
combinations of the classifiers’ outputs are ambiguous, i.e., when multiple occurrences of the
same combination of classifier outputs correspond to different true labels in the look-up
table, so that the most representative true class has low confidence/probability.
Considering the number of classes (8 in dataset #1, 5 in dataset #2, and 7 in dataset #3)
and the four classifiers used in this study, the datasets were too small for reliable BKS fusion;
for dataset #1 alone, eight classes and four classifiers give 8^4 = 4,096 possible label combinations.
Also, in our experiment BKS fusion occasionally misclassified test instances due to
the ambiguous cells in the look-up table. To avoid the problems related to ambiguous
cells, some existing papers propose using a local classifier in the original feature space
associated with the ambiguous cells [140], which has not been investigated in this paper.
Among the conventional ensembles, the random forest algorithm provided
strong classification performance, with F1 Scores ranging from 79.6% to 85% across
the three datasets. This finding is consistent with results of recent studies developing and
testing random forest classifiers for use in the exercise and movement sciences. Ellis
and colleagues [99] developed a random forest classifier for recognition of four broad
classes of physical activities (household duties, stair climbing, walking, running) in
healthy adults. Separate classifiers were trained using frequency and time domain
features in accelerometer data collected on the hip and wrist. Using leave-one-subject-
out cross-validation, the average overall accuracy for the hip and wrist classifier was
92.7% and 87.5%, respectively. In a follow-up investigation [24], a 2-step activity
recognition model comprising a random forest classifier and a hidden Markov model
provided a balanced accuracy of 88.1% and 83.6% for the hip and wrist, respectively.
Most recently, Pavey et al. [104] developed a random forest activity classifier for
recognition of four activity classes from accelerometer data collected on the wrist.
Recognition accuracy for sedentary, stationary plus, walking, and running was 80.1%,
95.7%, 91.7%, and 93.7%, respectively. When evaluated on 24-hour free-living data,
recognition of stepping events (walking and running) exceeded 90%.
However, it is important to note that random forest classifiers do not perform
well in all testing scenarios. Sasaki et al. [141] used time and frequency domain
features in accelerometer signals collected on the dominant hip, wrist, and ankle to train
random forest physical activity classifiers for older adults (65-85 years). In the leave-
one-subject-out cross-validation of the laboratory-based activity trials, recognition
accuracy for five activity classes (sedentary, standing, household chores, locomotion,
recreational activities) was 87%, 84%, and 89% for the hip, wrist, and ankle models,
respectively. However, when the models were deployed in free-living conditions, the
overall classification accuracy declined significantly to just over 50%.
Although the focus of this study was ensemble learning methods, the strong
performance (F1-score) of the base classifiers is worth noting. Of the four single
classifiers examined, SVM provided the highest averaged F1-score for dataset #1
(83%) and dataset #3 (84%), which is consistent with the results of previous
investigations comparing the performance of different supervised learning algorithms
[28]. BDT, on the other hand, performed best (75%) for dataset #2, but exhibited the worst
performance of all the base classifiers in datasets #1 and #3. The superior performance
of BDT in dataset #2 may be explained, at least in part, by the relatively homogeneous
nature of the activities represented in the training data (rest versus walking and running
at different speeds). It may be that ensemble methods are more suitable for more
complex activity recognition problems requiring the detection of more fine-grained
activities. Future research should explore this hypothesis.
Although the ensemble methods consistently achieved better classification
performance, the magnitude of improvement over the single model classifiers was
relatively small. This is because the single classifiers were trained with sufficient data
and exhibited relatively high recognition accuracy in their own right. Nevertheless,
when investigating performance on a class-wise basis, notable performance
differences were observed for several activity classes. In dataset #1, the custom
ensemble with WMV fusion improved the recognition of stair climbing to 61%
compared to the best single classifier (ANN 57%). Similarly, for dataset #3,
recognition accuracy for running increased to 73%, where the best single classifier (SVM)
provided only 68% accuracy. While the increment in performance afforded by
ensemble methods varied by dataset and activity class, the results confirm the general
principle that ensemble methods work best when the decisions from individual
classifiers are complementary.
A strength of the current study was the use of three diverse physical activity
datasets, collected on different participant groups (adults and children) performing
different physical activities in different contexts (laboratory-based vs. outdoors). The
examination of three different decision fusion methods to combine four widely used
“state-of-the-art” classification algorithms was an additional strength. There were,
however, some limitations that warrant consideration. First, although the study was
conducted using training data collected under different conditions, all three datasets
comprised activities that were completed in predetermined sequences. Thus, additional
work is required to evaluate the relative performance of ensemble methods in true free-
living contexts. Second, the activities in the selected datasets were primarily
ambulatory in nature. Only dataset #3 included non-ambulatory lifestyle activities such
as basketball and dance. Future studies should evaluate ensemble methods on a more
diverse set of physical activities. Third, a simple correlation-based
feature selection method was used. The use of a more sophisticated feature selection
algorithm would likely have improved performance. Fourth and finally, our
experiments focused on activity recognition or classification. It should be noted that
ensemble methods can also be used for numerical prediction problems such as
estimating energy expenditure or physical activity intensity.
In summary, the results demonstrate that activity recognition accuracy can be
improved through the implementation of ensemble learning methods. Conventional
ensemble methods such as bagging, boosting, and random forests improve activity
recognition in most, but not all, situations. However, a custom ensemble using weighted
majority voting to fuse the decisions of four widely used “state-of-the-art”
classification algorithms consistently outperformed the constituent base classifiers and
most conventional ensemble models. Decision fused ensemble methods thus have
strong potential to improve physical activity recognition from wearable sensors.
3.6 ACKNOWLEDGEMENTS
No funding was received for completion of this project. Trost is a member of the
ActiGraph Scientific Advisory Board. Chowdhury, Tjondronegoro, and Chandran
declare no conflict of interest. The results from the present study do not constitute
endorsement by the American College of Sports Medicine.
Chapter 4: Physical Activity Recognition using Posterior-adapted Class-based Fusion of Multi-Accelerometers Data
Alok Kumar Chowdhury1
Dian Tjondronegoro2
Vinod Chandran1
Stewart G. Trost3
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.
2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.
3. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University
of Technology, Brisbane, Australia.
Corresponding Author:
Professor Dian Tjondronegoro
School of Business and Tourism,
Southern Cross University,
Gold Coast, Australia
Phone: +61 415 558 420
Email: [email protected]
4.1 ABSTRACT
This paper proposes the use of posterior-adapted class-based weighted decision
fusion to effectively combine data from multiple accelerometers for improving physical
activity recognition. The performance of this method is benchmarked
against model-based weighted fusion and class-based weighted fusion without
posterior adaptation, based on two publicly available datasets, namely PAMAP2 and
MHEALTH. Experimental results show that: (a) posterior-adapted class-based
weighted fusion outperformed model-based and class-based weighted fusion; (b)
decision fusion with two accelerometers showed statistically significant improvement
in average performance compared to the use of a single accelerometer; (c) generally,
decision fusion from three accelerometers did not show further improvement over the best
combination of two accelerometers; and (d) a combination of ankle- and wrist-located
accelerometers showed the best overall performance compared to any other combination of
two or three accelerometers.
Index Terms—Activity Recognition, Accelerometer, Decision Fusion, Class-
Based Weighted Fusion
4.2 INTRODUCTION
Physical inactivity is a critical health risk factor [122], which triggers the need
for real-time physical activity (PA) recognition and quantification of the frequency and
intensity of each PA instance using accelerometer-based motion sensors [26, 131]. A
range of approaches including rule-based (such as threshold/hierarchical), supervised
and unsupervised classification algorithms have been proposed for PA recognition [23,
53, 56, 111, 112]. The choice between a machine learning and a rule-based approach
is often determined by the availability of a suitable training set. When data are
scarce, rule-based systems built on domain knowledge are usually used. Most
recent papers in the PA domain suggest the use of supervised machine learning
algorithms [24, 120], as there are usually enough labelled data to train a reliable
machine-learning model. However, previous studies generally used their own datasets,
with no validation of results across variations of datasets, size of datasets and activity
type selections. Therefore, the performance of existing algorithms has been found to
be inconsistent and dependent on the sample used to generate the training data and the
activity targets under investigation [56].
Multiple accelerometers placed at different body locations have been found to be
effective in improving the accuracy of PA recognition, and the performance depends
on activity type [26, 27, 30]. Acceleration data from multiple locations can be
combined using a feature-level or a decision-level fusion approach [31]. Decision-level
fusion has been found to be more accurate than feature fusion in other domains [32];
however, it has not been systematically investigated for PA recognition.
This paper’s key contribution is to propose the use of posterior-adapted class-
based weighted decision fusion. It is novel, as class-based decision fusion has not been
used for PA recognition, although it has been found to perform better than model-based
decision fusion [142]. Moreover, using the posterior probability of the test data can further
improve the performance, and this has also not been utilised in the PA domain. In model-
based fusion, a model is developed for each accelerometer location, then the fusion
assigns a weight to each model according to its overall performance on the training
data. Such an approach is theoretically less robust than class-based fusion, which
focuses on the class-wise (i.e., activity-wise) performance of the models. Posterior
adaptation means that the class-based weights are dynamically adjusted using the
confidence scores from each classification model based on real observations (i.e., test
data).
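The difference between the three weighting schemes can be illustrated with a short Python sketch. The exact adaptation rule is not given in this passage, so the product of class weight and posterior probability used in the third function is an assumption made purely for illustration; all weights, posteriors and function names are hypothetical.

```python
def model_based_fusion(predictions, model_weights):
    """One weight per model (e.g. its overall training performance)."""
    scores = {}
    for label, weight in zip(predictions, model_weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

def class_based_fusion(predictions, class_weights):
    """One weight per model and class (e.g. class-wise training performance)."""
    scores = {}
    for label, weights in zip(predictions, class_weights):
        scores[label] = scores.get(label, 0.0) + weights[label]
    return max(scores, key=scores.get)

def posterior_adapted_fusion(posteriors, class_weights):
    """ASSUMED rule: scale each class weight by the model's posterior
    probability for that class on the test window, then sum over models."""
    scores = {}
    for probs, weights in zip(posteriors, class_weights):
        for label, p in probs.items():
            scores[label] = scores.get(label, 0.0) + weights[label] * p
    return max(scores, key=scores.get)

# Hypothetical ankle and wrist models, in that order.
model_weights = [0.80, 0.75]
class_weights = [{"walk": 0.9, "sit": 0.6}, {"walk": 0.7, "sit": 0.8}]
predictions = ["walk", "sit"]
posteriors = [{"walk": 0.55, "sit": 0.45}, {"walk": 0.10, "sit": 0.90}]

by_model = model_based_fusion(predictions, model_weights)
by_class = class_based_fusion(predictions, class_weights)
by_posterior = posterior_adapted_fusion(posteriors, class_weights)
```

In this toy case the first two schemes follow the ankle model, whereas the posterior-weighted score follows the wrist model because the ankle model is barely more confident in "walk" than in "sit" for this window.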
Aside from finding the most effective fusion technique, another challenge is to
determine the best combination of sensor placements for optimal PA recognition.
Therefore, our experiments have investigated how decision-level fusion can optimally
combine multiple classification models, where each model is trained using the
accelerometer data obtained from ankle, chest and wrist respectively. The robustness
of our proposed method has been tested against two publicly available datasets and
benchmarked with model-based and class-based weighted decision fusion techniques.
To sum up, the novelty of this paper is proposing the use of posterior-adapted class-
based weighted decision fusion to effectively combine data from multiple accelerometers
for improving physical activity recognition.
4.3 RELATED WORK
PA recognition accuracy has been found to be dependent on the accelerometer
locations and types of PAs. For example, Atallah, et al. [143] used k nearest neighbour
(KNN) classifier and Bayesian classifier and found that the wrist location was good
for recognising very low-intensity-level and medium-intensity-level activities. For
low-intensity-level and transition activities, the waist location was the best. However,
the authors did not combine data from the accelerometers to find the optimal
combinations.
Some studies compared the performance of classifiers trained on data from the
combination of different accelerometer locations. Bao et al. [90] used feature fusion
on the accelerometer data collected from the upper-arm, lower-arm, hip, thigh, and
ankle and then applied several learning algorithms. They found a decision tree to be
the best performer (84%) when all sensors were fused, while the combination of thigh
and wrist accelerometer provided 3% less accuracy. However, the authors did not
investigate all possible accelerometer location combinations or the effect of
accelerometer location on recognising different PA types. A comprehensive study by
Cleland et al. [28] used feature fusion and compared the performance of support vector
machine classifiers trained on accelerometer data from six body locations (lower back,
wrist, foot, chest, hip, and thigh) and their combinations. Compared to a single
accelerometer, combining data from any two locations resulted in a significant
improvement in performance. However, combining data from three or more
accelerometers provided no further improvements in performance. Kern, et al. [144],
[145] and [146] also reported significant improvements in recognition performance
when combining two or more accelerometer locations. Notably, all of the
aforementioned studies used feature fusion, which is more prone to noisy and
redundant data than the decision fusion approach [31]. These studies neither used
multiple public datasets nor considered the varying performance of single accelerometers
for different activities when combining different accelerometer positions.
There are existing studies in pattern recognition that investigated the best
classifier combination for decision fusion, such as using a diversity measure analysis
[147]. In activity recognition, the commonly used decision fusion rules include
majority voting, summation, hierarchical fusion and Bayesian fusion [148]. Banos, et
al. [149] proposed hierarchical-weighted decision fusion by combining the advantages
of the hierarchical decision and majority voting models, which utilised class-level
and sensor-level classifiers for making decisions. While several weighted
fusion techniques (classifier-, class- and sample-based) were compared empirically in
[142], class-based fusion seems to be more suitable for accelerometer fusion in PA
recognition due to the variation in the class-wise performance of different accelerometer
placements. However, class-based fusion is yet to be fully investigated in the
PA domain. Moreover, the approach for calculating the weights in class-based fusion
needs improvement as it uses training errors to evaluate testing reliability. Adaptation
of class-based weights using the posterior probability of the test instances should
further improve the fusion techniques. Zhang and Zhang [150] showed that adjusting
the probabilities derived from the training output confusion matrix using the decision
reliability can improve the decision-making accuracy. However, they did not use class-
based weights, and did not apply their fusion algorithm to PA recognition.
4.4 METHODS
The framework comprises pre-processing, feature extraction, normalisation,
feature selection, and classification. These steps were simultaneously applied to data
from each accelerometer location (e.g., ankle, chest, and wrist), resulting in activity
candidates. The final decision (i.e., which activity is the most likely) was achieved by
applying a posterior-adapted class-based weighted decision fusion. Each step will be
described in this section.
4.4.1 Pre-processing
The data from each 3-axis (x, y, z) accelerometer were converted to a time-series
data structure. A linear interpolation method was used to impute missing data in the
middle of a labelled activity sequence. Missing values at the end of each labelled
activity sequence were replaced by the previous value.
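The imputation rule can be sketched in Python as follows. This is a minimal illustration that assumes uniformly spaced samples and marks missing values with None; the function name and the toy signal are hypothetical. Gaps at the start of a sequence are left untouched because the text does not say how they were handled.

```python
def impute_gaps(values):
    """Fill None entries: linear interpolation inside the sequence and
    previous-value carry-forward for a trailing gap."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled
    # Interior gaps: interpolate linearly between neighbouring known samples.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Trailing gap: repeat the last observed value.
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    return filled

# Hypothetical x-axis samples from one labelled activity sequence.
x_axis = [0.0, None, None, 3.0, 4.0, None, None]
clean = impute_gaps(x_axis)
```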
4.4.2 Feature Extraction
For each 3-axis accelerometer, a set of 45 features (in the time and
frequency domains) was extracted from a 2-second sliding window without
overlap. Short windows (1 to 2 seconds) were used, as they have been shown to
offer the best trade-off between accuracy and speed in PA recognition [89].
Specifically, a 2-second window was chosen empirically, as it was capable of capturing the
periodic movements of the selected PA classes.
Table 4.1 lists the extracted features from each window. These features were
combined from the features extracted in previous PA recognition studies [44, 51, 90,
118, 132].
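As a rough illustration of the windowing step, the sketch below splits one axis into non-overlapping 2-second windows and computes a handful of the time-domain features listed in Table 4.1. The function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and the frequency-domain features are omitted.

```python
import statistics

def sliding_windows(signal, sampling_rate_hz, window_seconds=2):
    """Split a signal into non-overlapping windows; an incomplete tail is dropped."""
    size = int(sampling_rate_hz * window_seconds)
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, size)]

def median_crossings(window):
    """Count how often consecutive samples fall on opposite sides of the median."""
    med = statistics.median(window)
    above = [value > med for value in window]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)

def axis_features(window):
    """A subset of the per-axis time-domain features from Table 4.1."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),  # population estimator: an assumption
        "min": min(window),
        "max": max(window),
        "median": statistics.median(window),
        "median_crossings": median_crossings(window),
    }

# Toy signal sampled at 2 Hz, so a 2-second window holds 4 samples.
toy_signal = [0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0, 5.0]
windows = sliding_windows(toy_signal, sampling_rate_hz=2)
features = [axis_features(w) for w in windows]
```

At the 100 Hz sampling rate of PAMAP2, the same call would produce windows of 200 samples per axis.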
4.4.3 Normalisation & Feature Selection
Normalisation is required to bring feature values onto a common scale. In this case,
each feature was linearly rescaled to zero mean and unit variance. For example, a
feature x can be normalised using the following formula.
$$\hat{x}_i = \frac{x_i - \bar{x}}{\sigma} \qquad (1)$$
where $\bar{x}$ and $\sigma$ are the mean and standard deviation of the feature, respectively.
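A minimal sketch of equation (1) is given below. It assumes that the mean and standard deviation are estimated on the training data and then reused for the test data, which is common practice but is not stated in the text. The function names are hypothetical.

```python
import statistics

def fit_standardiser(train_values):
    """Estimate the mean and standard deviation of one feature from training data."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def standardise(values, mean, std):
    """Apply equation (1); a constant feature (std == 0) is mapped to zeros."""
    if std == 0:
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]

train_feature = [1.0, 3.0]
test_feature = [5.0]
mean, std = fit_standardiser(train_feature)
train_scaled = standardise(train_feature, mean, std)
test_scaled = standardise(test_feature, mean, std)  # reuses training statistics
```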
The use of unnecessary features may lead to over-fitting, lower performance and
higher computational load [151]. Therefore, instead of using all 45 features for
classification, a correlation-based feature selection method was adopted to select the
most useful features. This feature selection method is fast and simple, and was found to be
useful in previous studies [133]. In this study, the training data were used to compute
the correlation between each labelled activity and each feature. Features with a
correlation of 0.25 or greater (the threshold suggested in [152]) were selected for
training and testing the classifiers.
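One possible reading of this selection rule is sketched below. The text does not specify how the correlation between a categorical activity label and a numeric feature was computed, so the sketch makes two assumptions: each activity is encoded as a one-vs-rest binary indicator, and a feature is kept when the absolute Pearson correlation with at least one activity reaches the threshold. All names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(feature_columns, labels, threshold=0.25):
    """Keep features whose strongest one-vs-rest correlation meets the threshold."""
    classes = sorted(set(labels))
    selected = []
    for name, column in feature_columns.items():
        strongest = max(
            abs(pearson(column, [1.0 if label == c else 0.0 for label in labels]))
            for c in classes
        )
        if strongest >= threshold:
            selected.append(name)
    return selected

# Toy training data: four windows, two activities, three candidate features.
train_labels = ["sitting", "sitting", "walking", "walking"]
train_features = {
    "std_x": [0.1, 0.2, 1.0, 1.1],    # separates the two activities
    "noise": [1.0, 2.0, 2.0, 1.0],    # uncorrelated with either activity
    "constant": [3.0, 3.0, 3.0, 3.0],
}
kept = select_features(train_features, train_labels)
```

Only the training windows should be passed to `select_features`, so that the held-out subject does not influence which features are kept.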
Table 4.1 List of features extracted from each window of an accelerometer
No Features Feature Count
1 Mean for each axis of a 3-axis accelerometer 3
2 Standard deviation for each axis of accelerometer 3
3 Minimum value for each axis 3
4 Maximum value for each axis 3
5 Variance for each axis 3
6 Median value for each axis 3
7 Skewness for each axis 3
8 Kurtosis for each axis 3
9 Energy for each axis 3
10 Cross-correlation of accelerometer axis 3
11 Principal frequency for each axis 3
12 Magnitude of principal frequency for each axis 3
13 Median crossing for each axis 3
14 25th percentile for each axis 3
15 75th percentile for each axis 3
Total number of features extracted 45
4.4.4 Classification Algorithms
To find the best classification approach, several state-of-the-art machine
learning methods were initially benchmarked, including binary decision tree (BDT),
support vector machine (SVM), deep neural network (DNN), random forest (RF)
and Adaboost.
In our implementation, the maximum number of decision splits for BDT was set to 20. The
DNN used two auto-encoders to convert inputs into 35 and 20 deep features
respectively. A softmax layer was trained on the 20 deep features for activity
classification. To reduce overfitting, L2 weight regularisation (value set to 0.001) was
added to the training criterion. In RF, the size of the random subset of predictors for each
decision split was equal to the square root of the total number of available features. For
Adaboost, the multi-class Adaboost.M2 algorithm was implemented with 100 learning
cycles.
Based on the experimental results (see section 4.6.1 for details), SVM was
selected as the best classification algorithm, as it showed the highest classification
accuracy compared to other classification algorithms, although the difference was
marginal.
4.4.5 Decision Fusion Techniques
Consider the fusion of decisions from n models for an m-class problem. The
sets of models and classes can be presented as M = {M1, M2 … Mn} and C = {C1, C2 …
Cm}. When classifying a test instance (x), each model provides a predicted class label
along with a posterior probability of the predicted label, which is a measure of the
confidence of the decision from that model for that test instance. Let the predicted
vector for that instance be V(x) = {V1(x), V2(x) … Vn(x)}, where each Vi(x) ∈ C, and the
posterior probabilities be W2(x) = {W21(x), W22(x) … W2n(x)}. A decision fusion
technique provides a final prediction for x by combining the individual predictions {V(x)}.
A model-based weighted voting assigns a weight to each model/classifier based
on the overall performance of that model on the training data irrespective of classes
[31, 142]. This weight is independent of its predicted class. In the fusion step, a
weighted majority is used to decide the final predicted class. In contrast, a class-based
weighted decision fusion assigns weights to all classes based on the prior knowledge
of the model’s prediction performance for the different classes [142]. In the fusion
step, a weighted majority is again used to decide the final predicted class but the
weights are now different and the majority class may be different. This study proposes
posterior-adapted class-based fusion, which adjusts the class-based weights for each
test instance using the posterior probability of the model on the prediction. The steps
to achieve these weighted decision fusion schemes are described below.
Weight calculation – Using 10-fold cross-validation on the training data, the
predicted and true training classes are compared and the F1-scores for
all classes are computed. The F1-scores indicate the model's confidence for each class
based on the training data, and are used as class-based weights. The 10-fold
validation allows reliable estimation of the expected class-wise performance on unseen
data and avoids overfitting.
Let the class-based weights for models be W1 = {W11, W12 … W1n}, where W1i is
a collection of weights for all (m) classes for the ith model, i.e., {w1i1, w1i2 … w1im}. For
model-based fusion, a weight for each model is calculated by taking the average of its
class-based weights W1i.
$$W_{avg\,i} = \overline{W1_i}, \qquad 1 \le i \le n \qquad (2)$$
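The weight calculation can be sketched as follows, assuming the out-of-fold predictions from the 10-fold cross-validation have already been collected for one model. The function names are hypothetical, and the weights are kept on a 0 to 1 scale rather than as percentages. The F1-score is computed as 2TP / (2TP + FP + FN), which is algebraically the same as the harmonic mean of precision and recall.

```python
def class_f1_weights(true_labels, predicted_labels, classes):
    """Class-based weights for one model: the per-class F1-score on a 0-1 scale."""
    weights = {}
    for c in classes:
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denominator = 2 * tp + fp + fn
        weights[c] = 2 * tp / denominator if denominator else 0.0
    return weights

def model_weight(class_weights):
    """Equation (2): the model-based weight is the mean of the class-based weights."""
    return sum(class_weights.values()) / len(class_weights)

# Hypothetical out-of-fold predictions for one sensor-location model.
cv_true = ["sit", "sit", "stand", "stand", "walk", "walk"]
cv_pred = ["sit", "stand", "stand", "stand", "walk", "walk"]
w1 = class_f1_weights(cv_true, cv_pred, classes=["sit", "stand", "walk"])
w_avg = model_weight(w1)
```

In this toy case the model is most reliable for "walk" and least reliable for "sit", which is the kind of class-wise difference that class-based fusion is meant to exploit.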
Weight adjustment – For each test instance, the class-based weights are
adjusted using the posterior probability of the predicted label. Let the adjusted
class-based weights for the given test instance be Wi(x) = {wi1, wi2 … wim}, 1 ≤ i ≤ n. At first,
the adjusted class-based weights are initialised to the class-based weights.
$$W_i(x) = W1_i, \qquad 1 \le i \le n \qquad (3)$$
Then, the weight of the predicted class is adjusted by the posterior probability using the
following equation.
$$w_{ik} = \alpha \cdot w_{ik} + (1 - \alpha) \cdot W2_i(x) \quad \text{if } V_i(x) = C_k, \qquad 1 \le k \le m,\ 1 \le i \le n \qquad (4)$$
Here, α is a weight adjustment parameter, between 0 and 1, that requires tuning.
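Equations (3) and (4) can be illustrated for a single model with the short sketch below. The function name and the numeric values are hypothetical. Only the weight of the class the model actually predicted is blended with the posterior; the other class weights keep their training-based values.

```python
def adjust_class_weights(class_weights, predicted_class, posterior, alpha=0.5):
    """Adjust one model's class-based weights for a single test instance."""
    adjusted = dict(class_weights)  # equation (3): start from the class-based weights
    adjusted[predicted_class] = (
        alpha * class_weights[predicted_class] + (1 - alpha) * posterior
    )  # equation (4): blend with the posterior of the predicted label
    return adjusted

weights = {"sit": 0.6, "stand": 0.8}
balanced = adjust_class_weights(weights, "sit", posterior=0.9, alpha=0.5)
training_only = adjust_class_weights(weights, "sit", posterior=0.9, alpha=1.0)
posterior_only = adjust_class_weights(weights, "sit", posterior=0.9, alpha=0.0)
```

With α = 1 the adjusted weights reduce to plain class-based weights, and with α = 0 the weight of the predicted class is replaced by the posterior probability.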
Model-based Fusion – This fusion scheme takes a weight (Wavg i) for each
model and the current prediction vector {V(x)} to make a final prediction for a test instance
(x). It computes the score for each predicted label by summing the weights of the
models that predicted it, using equation (5).
$$\mathit{Score}_{V_i(x)} = \sum W_{avg\,i}, \qquad 1 \le i \le n \qquad (5)$$
Then it selects the predicted label {Vi(x)} with the highest score as the final
decision.
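A small sketch of model-based weighted voting follows. The weights and labels are hypothetical, and ties are resolved in favour of the label seen first, which is an arbitrary choice not described in the text.

```python
def model_based_fusion(predictions, model_weights):
    """Sum each model's weight onto the label it predicted and pick the top score."""
    scores = {}
    for label, weight in zip(predictions, model_weights):
        scores[label] = scores.get(label, 0.0) + weight
    final_label = max(scores, key=scores.get)
    return final_label, scores

# Three models: one strong model disagrees with two weak models.
predictions = ["walking", "asc_stairs", "asc_stairs"]
model_weights = [0.85, 0.40, 0.40]
final_label, scores = model_based_fusion(predictions, model_weights)
```

Here an unweighted majority vote would choose "asc_stairs", whereas the weighted vote follows the single, more reliable model.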
Class-based Fusion – This fusion scheme considers the class-based weights
W1 and the current prediction vector {V(x)} to make a final prediction for a test instance
(x). It calculates a score for each class using the following formula.
$$\mathit{Score}_k = \sum_{V_i(x) = C_k} w1_{ik}, \qquad 1 \le k \le m,\ 1 \le i \le n \qquad (6)$$
Finally, it selects the class label with the maximum score as the final prediction
using equation (7).
$$\mathit{final}_{label} = C_{\arg\max_{k=1}^{m} \mathit{Score}_k} \qquad (7)$$
Posterior-adapted Class-based Fusion – This fusion scheme is similar to
class-based fusion, but it uses the adjusted class-based weights Wi(x). It calculates a score for each
class using the following formula.
$$\mathit{Score}_k = \sum_{V_i(x) = C_k} w_{ik}, \qquad 1 \le k \le m,\ 1 \le i \le n \qquad (8)$$
Finally, it selects the class label with the maximum score as the final prediction
using equation (7).
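The difference between class-based and posterior-adapted class-based fusion can be seen in the worked sketch below. It is an illustration under assumptions: the function name `fuse` is hypothetical, the weights and posteriors are made-up values rather than results from this study, and ties go to the label seen first. Because equations (6) and (8) only sum the weight of the class each model predicted, adjusting that single weight per model is sufficient.

```python
def fuse(predictions, class_weights, posteriors=None, alpha=0.5):
    """Class-based fusion, or posterior-adapted fusion when posteriors are given.

    predictions[i]   : label predicted by model i
    class_weights[i] : dict mapping label -> class-based weight (F1) for model i
    posteriors[i]    : posterior probability of model i's predicted label
    """
    scores = {}
    for i, label in enumerate(predictions):
        weight = class_weights[i][label]
        if posteriors is not None:
            # Equation (4): blend the class weight with the posterior.
            weight = alpha * weight + (1 - alpha) * posteriors[i]
        scores[label] = scores.get(label, 0.0) + weight  # equations (6) and (8)
    return max(scores, key=scores.get), scores          # equation (7)

# Hypothetical ankle and wrist models that disagree on one test window.
predictions = ["sitting", "standing"]
class_weights = [
    {"sitting": 0.60, "standing": 0.70},  # ankle model
    {"sitting": 0.80, "standing": 0.85},  # wrist model
]
posteriors = [0.95, 0.40]  # the ankle model is confident, the wrist model is not

class_based_label, _ = fuse(predictions, class_weights)
adapted_label, adapted_scores = fuse(predictions, class_weights, posteriors, alpha=0.5)
```

Class-based fusion follows the wrist model because its training-based weight for "standing" is higher, while the posterior-adapted variant follows the ankle model because it is far more confident about this particular instance.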
4.5 EXPERIMENT
4.5.1 Datasets
Two publicly available PA monitoring datasets were chosen for the study, as
they both contain accelerometer data from three body positions (ankle, chest and
wrist). Previous studies [87, 131, 153, 154] have shown these data to be
effective for machine learning purposes, which indicates that there was enough data to
train the machine learning models.
The PAMAP2 Dataset includes data from nine participants (1 female, 8 male),
with age and body mass index (BMI) of 27.2 ± 3.3 years and 25.1 ± 2.6 kg/m²
respectively. Participants wore three Colibri wireless IMUs on their dominant-side
wrist, ankle, and chest while performing physical activities including lying down,
sitting, standing, walking, running, cycling, Nordic walking, ascending stairs,
descending stairs, vacuum cleaning, ironing clothes and jumping rope. Each sensor
contains two three-dimensional (3D) acceleration sensors (scale: ±6g and ±16g) with a
resolution of 13 bits, a gyroscope sensor, a magnetometer sensor, and temperature,
orientation and heart rate monitor sensors. The sampling rate of the recorded acceleration
data is 100 Hz. Further details of the study protocol can be found in [87, 131].
The MHEALTH Dataset includes data from ten participants, collected in an out-of-lab
environment while they performed twelve physical activities. The physical activities
include: standing still (1 min), sitting and relaxing (1 min), lying down (1 min),
walking (1 min), climbing stairs (1 min), waist bends forward (20x), frontal elevation
of arms (20x), knees bending (crouching) (20x), cycling (1 min), jogging (1 min),
running (1 min), and jumping front & back (20x). During the data collection,
Shimmer2 (Shimmer 2R, Real-time Technologies, Dublin, Ireland) wearable sensors
were attached to the subject’s chest, right wrist and left ankle. These sensors record
3D acceleration data (±6g) from the chest, ankle and wrist, an electrocardiography (ECG)
signal, 3D gyroscope data from the ankle and wrist, and 3D magnetometer data from the
ankle and wrist. The sampling rate of the recorded data is 50 Hz. Further details on the data
collection can be found in [153, 154].
Both datasets are fully labelled, with each raw acceleration signal annotated
with the performed activity. For the purpose of this study, a subset of the data was
extracted from both datasets. For PAMAP2, the selected activity classes were lying
down, sitting, standing, walking, running, cycling, ascending stairs, and descending
stairs. For the MHEALTH dataset, lying down, sitting and relaxing, standing still,
walking, running, cycling, climbing stairs, and jogging were chosen for
analysis.
4.5.2 Implementation of the Framework
Figure 4.1 shows how the framework was implemented. For the data from each
accelerometer location (ankle, chest, and wrist), an SVM classifier was applied in
a four-phase process: (1) training phase, where the classifier was trained using the
training data; (2) weight calculation phase, where the classifier was evaluated using
10-fold cross-validation of the training data and the resultant class-based weights and
average weights were assigned; (3) individual model decision phase, where the
classification model (trained in phase 1) was applied to new/testing data to produce a
predicted label and its posterior probability; and (4) class-based weight adjustment
phase, where the class-based weights from the training data (output of phase 2) were adjusted
using the posterior probability of the predicted label (output of phase 3), yielding the
adjusted class-based weights.
Finally, a decision fusion phase (figure 4.2) combined the decisions from each
individual sensor location using posterior-adapted class-based weighted decision
fusion, and also using model-based and class-based decision fusion techniques for
benchmarking purposes.
Figure 4.1 Overview of the system developed for implementing the framework
Figure 4.2 For a given test instance (x), predicting the final label by fusing the decisions from accelerometer sensors using weights
4.5.3 Evaluation Approach and Metrics
Leave-one-subject-out cross-validation was used to evaluate and compare the
classification models. This evaluation uses one subject’s data for testing and the remaining
subjects’ data for training to conduct a subject-independent evaluation. Thus, each
subject’s data are used exactly once for testing (as suggested in [137]). In a real-world
context, it is desirable for an activity recognition system to perform well for a new
subject.
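The splitting scheme can be sketched as a small generator. The function name and the subject identifiers are hypothetical; each window is assumed to carry the identifier of the subject it came from.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# One subject identifier per extracted window.
window_subjects = [1, 1, 2, 3, 3, 2]
folds = list(leave_one_subject_out(window_subjects))
```

Because no window from the held-out subject appears in the training indices, steps such as normalisation, feature selection and weight calculation can be fitted on the training indices only.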
The performance of each classifier was evaluated by calculating precision, recall
and F1-score. For each class, predictions were compared to ground truth labels and the
numbers of true-positives (TP), true-negatives (TN), false-positives (FP), and
false-negatives (FN) were calculated. Precision measures the exactness of a classifier, while
recall measures its completeness. These can be calculated for a
particular class using the following equations.
$$\mathit{Precision} = \frac{TP}{TP + FP} \qquad (9)$$
$$\mathit{Recall} = \frac{TP}{TP + FN} \qquad (10)$$
The F1-score is a balanced combination of precision and recall, and can be
measured using the following formula.
$$\mathit{F1\text{-}Score} = 2 \times \frac{\mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}} \times 100\,\% \qquad (11)$$
The predicted classes for each subject were combined and a confusion matrix
was derived from the complete set. Then, using the confusion matrix, F1-scores for all
activity classes were computed to gain insight into the model’s performance for each
class. Let the number of subjects and classes be n and m respectively. The classes can
be presented as C = {C1, C2 … Cm}. Let the true and predicted classes be {T1, T2
… Tn} and {P1, P2 … Pn} respectively, where Ti and Pi are the true and predicted classes
for the ith subject and Ti ∈ C, Pi ∈ C. The F1-Scores were calculated using the following
steps.
Step 1: $P = \bigcup_{i=1}^{n} P_i; \quad T = \bigcup_{i=1}^{n} T_i$
Step 2: $\mathit{F1Score}_{CLASS\_WISE}(T, P) = \{FS_1, FS_2, \ldots, FS_m\}$
Here, $FS_k$ is the overall F1-Score of the kth class across subjects.
4.6 RESULTS AND DISCUSSION
4.6.1 Evaluation of Classification Algorithms
Table 4.2 shows the F1-scores of machine learning algorithms using both
PAMAP2 and MHEALTH datasets. The results were not conclusive in terms of
deciding the best classification approach. Both RF and SVM consistently showed
better performance for all three accelerometer locations across both datasets. However,
for the remaining analyses, this study adopted SVM, as it gave the highest F1-score
(82.32%) when averaged over all placement locations and both datasets, which is
consistent with previous work [28].
Table 4.2 Average F1-scores (%) for each classification model across both datasets
Classifier PAMAP2-Ankle PAMAP2-Chest PAMAP2-Wrist MHEALTH-Ankle MHEALTH-Chest MHEALTH-Wrist Average (both datasets)
SVM 84.72 81.00 80.86 83.64 79.62 84.10 82.32
RF 84.88 77.57 77.91 83.55 83.16 86.35 82.24
BDT 77.56 71.95 74.42 79.36 80.49 87.22 78.50
DNN 78.17 79.38 76.12 87.30 77.87 86.88 80.95
Adaboost 79.68 79.02 76.79 86.08 82.99 86.68 81.87
4.6.2 Evaluation of Different Fusion Techniques
Figures 4.3 and 4.4 show the average classification performance of model-based,
class-based and posterior-adapted class-based decision fusion across different
accelerometer location combinations for the PAMAP2 and MHEALTH datasets
respectively. Error bars in both figures represent 95% confidence intervals (CI). The
weight adjustment parameter (α) in posterior-adapted class-based weighted fusion was
set to 0, 0.25, 0.5, 0.75 and 1. An α of 0.5, which adjusts the weights by averaging
the class-based weights and posterior probabilities, provided the best performance
for the optimal accelerometer combination (Ankle + Wrist). Hence, the results reported
in this paper used α = 0.5.
Figure 4.3 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the PAMAP2 dataset
Figure 4.4 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the MHEALTH dataset
In both datasets, the posterior-adapted class-based weighted fusion consistently
provided better average F1-Scores for all accelerometer combinations than
either model-based or class-based weighted fusion. While the
performance of model-based fusion was poor for most two-accelerometer
combinations, class-based fusion performed well in most situations (F1-Scores
were higher than model-based but lower than posterior-adapted class-based).
With PAMAP2, the posterior-adapted class-based weighted fusion provided a
statistically significant improvement in performance compared to model-based fusion
for all two-accelerometer combinations, but not for A+C+W. With MHEALTH, the
posterior-adapted class-based weighted fusion provided statistically significant
improvements in performance compared to model-based fusion for A+W and C+W,
but not for A+C or A+C+W. Across all accelerometer configurations, posterior-adapted
class-based weighted fusion consistently provided higher average F1-Scores than
class-based weighted decision fusion; however, the differences were not statistically
significant.
4.6.3 Activity-Wise Classification Performance
Tables 4.3 and 4.4 report the class/activity-wise and average F1-scores for single
location classifiers and all possible combinations of accelerometer locations (using
posterior-adapted class-based weighted fusion) for PAMAP2 and MHEALTH,
respectively.
Table 4.3 F1-scores (%) for single and all possible combinations of accelerometer sensors in PAMAP2 dataset (combination columns use posterior-adapted class-based weighted fusion)
Activity Ankle Chest Wrist A+C A+W C+W A+C+W
Lying 96.88 98.36 90.81 97.25 95.31 95.41 97.06
Sitting 61.65 70.02 82.56 70.00 85.64 85.08 84.24
Standing 72.31 70.55 86.31 72.83 88.98 87.68 87.32
Walking 97.12 82.80 85.70 96.68 96.41 88.65 96.71
Running 89.61 93.41 98.14 98.49 99.19 99.19 99.31
Cycling 94.93 86.27 95.15 96.47 97.58 91.41 98.36
Asc. Stairs 85.00 68.64 44.22 88.01 84.44 71.23 85.88
Desc. Stairs 80.25 77.96 64.01 89.26 88.92 80.06 89.67
Mean 84.72 81.00 80.86 88.62 92.06 87.34 92.32
Of the single location models, classifiers trained on ankle data performed best
on average across both datasets, although the wrist was marginally better in MHEALTH
(84.10 vs 83.64). However, classification accuracy for each PA class varied with
accelerometer location. In PAMAP2, the ankle was the best location for walking,
ascending stairs, and descending stairs, while the wrist location was best for sitting,
standing, running and cycling. The chest location was only best for lying down. In
MHEALTH, the ankle location was best for lying down, walking, cycling, and
climbing stairs, while the wrist location was best for sitting and standing. The chest
location was best for running, and jogging.
Table 4.4 F1-scores (%) for single and all possible combinations of accelerometer sensors in MHEALTH dataset (combination columns use posterior-adapted class-based weighted fusion)
Activity Ankle Chest Wrist A+C A+W C+W A+C+W
Lying 94.74 89.09 90.85 100.0 96.49 100.0 100.0
Sitting 45.00 44.41 79.23 26.12 81.65 84.93 67.67
Standing 61.02 60.78 86.59 56.54 85.26 86.59 75.95
Walking 95.40 85.94 83.59 97.58 98.70 89.11 94.08
Running 88.62 88.70 76.14 88.78 87.35 86.90 87.71
Cycling 99.36 93.98 94.92 97.32 96.56 99.84 100.0
C. Stairs 98.24 85.58 84.92 98.08 98.72 89.45 94.47
Jogging 86.78 88.47 76.56 89.20 88.41 87.60 88.82
Mean 83.64 79.62 84.10 81.70 91.64 90.55 88.59
Fusion of multiple accelerometer locations using the posterior-adapted
class-based decision fusion showed notable improvements in performance compared to the
single location models. In PAMAP2, classification performances for the fusion of
ankle and wrist accelerometers (A+W) and all three accelerometers (A+C+W) were
similar and the best among all the combinations. Chest with wrist (C+W) and ankle with
chest (A+C) accelerometer locations also exhibited superior performance to that
observed for any single location model. In MHEALTH, the best fusion performance
was obtained for A+W (91.6%) and C+W (90.6%), with A+C+W also providing
strong classification performance (88.6%). All combinations except A+C
exceeded the performance of any single location model.
4.6.4 Subject-Wise Classification Performance
Performance differences across different subjects were tested for statistical
significance using one-way repeated measures ANOVA. To achieve better statistical
confidence, F1-scores for each hold out subject in both datasets were pooled. The
results are shown in figure 4.5.
Figure 4.5 Average F1-Scores of all single and possible accelerometer combinations across different subjects. Error bars represent 95% confidence intervals. (*) indicates statistical significance (p < 0.05)
Overall, mean F1-scores differed significantly between the combinations of
accelerometer locations (Wilks’ Lambda = 0.140, F(6, 12) = 12.269, p < 0.001). Least
significant difference (LSD) post hoc tests revealed a significant improvement in
performance when fusing the predictions of two or three accelerometers. All
accelerometer combinations except the combination of ankle and chest (A+C)
significantly outperformed all single sensor locations. A+W and A+C+W provided the
highest average F1-scores across different subjects, but there were no statistically
significant differences between A+W, C+W, and A+C+W.
4.6.5 Confusion Matrices
Tables 4.5 and 4.6 show the confusion matrices of the best-performing
accelerometer combination, i.e., the combination of ankle and wrist (A+W), using
posterior-adapted class-based weighted fusion for the PAMAP2 and MHEALTH datasets,
respectively.
Table 4.5 Confusion matrix for ankle and wrist combination (A+W) in PAMAP2 dataset
1 2 3 4 5 6 7 8
1 Lying 884 0 1 1 0 0 0 1
2 Sitting 63 701 76 0 0 7 1 2
3 Standing 21 81 763 2 0 3 0 2
4 Walking 0 0 0 1101 0 0 12 5
5 Running 0 0 0 2 430 0 0 3
6 Cycling 0 4 3 1 0 746 1 1
7 Asc. Stairs 0 1 0 49 1 11 342 29
8 Desc. Stairs 0 0 0 10 1 6 21 325
Table 4.6 Confusion matrix for ankle and wrist combination (A+W) in MHEALTH dataset
1 2 3 4 5 6 7 8
1 Lying 289 0 0 0 0 21 0 0
2 Sitting 0 227 83 0 0 0 0 0
3 Standing 0 18 292 0 0 0 0 0
4 Walking 0 0 0 303 0 0 7 0
5 Running 0 0 0 0 259 0 0 51
6 Cycling 0 1 0 0 0 309 0 0
7 C. Stairs 0 0 0 1 0 0 309 0
8 Jogging 0 0 0 0 24 0 0 286
In both datasets, most misclassifications occurred between similar activities,
such as sitting and standing, or walking and
ascending stairs. Most running instances were correctly classified in the PAMAP2 dataset,
whereas in MHEALTH running was frequently confused with jogging. Descending stairs was
misclassified mostly as ascending stairs.
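As a consistency check, the class-wise F1-scores can be recomputed directly from the pooled confusion matrix in Table 4.5. The sketch below is illustrative and the function name is hypothetical. It uses F1 = 2TP / (row total + column total), which equals the harmonic mean of precision and recall in equation (11); since this expression is symmetric in rows and columns, the result does not depend on whether rows hold the true or the predicted classes.

```python
# Table 4.5: confusion matrix for A+W in PAMAP2, in the order
# lying, sitting, standing, walking, running, cycling, asc. stairs, desc. stairs.
PAMAP2_AW = [
    [884, 0, 1, 1, 0, 0, 0, 1],
    [63, 701, 76, 0, 0, 7, 1, 2],
    [21, 81, 763, 2, 0, 3, 0, 2],
    [0, 0, 0, 1101, 0, 0, 12, 5],
    [0, 0, 0, 2, 430, 0, 0, 3],
    [0, 4, 3, 1, 0, 746, 1, 1],
    [0, 1, 0, 49, 1, 11, 342, 29],
    [0, 0, 0, 10, 1, 6, 21, 325],
]

def class_f1_percent(matrix):
    """Per-class F1-score (%) from a square confusion matrix."""
    size = len(matrix)
    scores = []
    for k in range(size):
        tp = matrix[k][k]
        row_total = sum(matrix[k])                        # TP plus one error type
        col_total = sum(matrix[r][k] for r in range(size))  # TP plus the other
        scores.append(100.0 * 2 * tp / (row_total + col_total))
    return scores

f1_scores = class_f1_percent(PAMAP2_AW)
rounded = [round(score, 2) for score in f1_scores]
mean_f1 = round(sum(f1_scores) / len(f1_scores), 2)
```

The rounded values reproduce the A+W column of Table 4.3 (95.31, 85.64, 88.98, 96.41, 99.19, 97.58, 84.44, 88.92) and its mean of 92.06.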
4.7 CONCLUSION
This paper presents a study that investigates the use of multiple accelerometers,
placed at three body locations (ankle, chest, and wrist), to effectively identify physical
activities. Evaluation was based on two publicly available datasets, namely PAMAP2
and MHEALTH. The SVM was selected for further analysis as it gave the highest
average performance across both datasets. Classification performance depended on
both the accelerometer location and the activity type. Classifiers trained on ankle data
provided the best average performance over all activities. Combinations of classifiers
trained on accelerometer data from different locations may improve performance, and
this was investigated further with model-based, class-based and our proposed
posterior-adapted class-based weighted decision fusion.
PA recognition using posterior-adapted class-based weighted fusion of multiple
accelerometers provided significant improvements in performance in both datasets. Its
performance was also better than that observed for model-based and
class-based fusion for all accelerometer combinations. The results are consistent with the
notion that the combination of ankle and wrist (A+W) accelerometers captures both upper
and lower body movements and can therefore yield higher performance than other
combinations. Relative to the two-accelerometer combinations, the addition of the
chest location (A+C+W) did not improve PA recognition. Thus, more sensor data does
not always result in performance improvements for PA recognition. Considering that
chest-mounted accelerometers can be uncomfortable for everyday use, this finding
motivates the future use of ankle and wrist accelerometers for longer-term
monitoring of PAs.
A limitation of this paper is that it uses datasets with only ankle-, wrist- and
chest-positioned accelerometers in a controlled setup, and hence overlooks other
accelerometer locations such as the thigh and hip. In future studies, the proposed
framework should be tested by investigating more accelerometer locations, adding
more PA classes (especially those that are harder to distinguish), and increasing the
number of participants to ensure that the findings are generalisable to a wide range of
end users. The proposed method should also be tested on more PA datasets
acquired in different environments to fully study its limitations. This paper has
contributed to a better understanding of performance improvement with decision fusion
in physical activity recognition using multiple accelerometers.
PART II - Estimation of Impacts of Physical Activities
Chapter 5: Towards Non-Laboratory Prediction of Relative Physical Activity Intensities from Multimodal Wearable Sensor Data
Alok Kumar Chowdhury1
Dian Tjondronegoro2
Jinglan Zhang1
Puspa Setia Pratiwi1
Stewart G. Trost3
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.
2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.
3. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University
of Technology, Brisbane, Australia.
Corresponding Author:
Alok Kumar Chowdhury
Science and Engineering Faculty,
Queensland University of Technology,
Brisbane, Australia
Phone: +61 420 467 077
Email: [email protected]
5.1 ABSTRACT
This paper explored a non-laboratory approach to effectively predict relative
physical activity intensities using regression algorithms on multimodal physiological
data. Twenty-two participants completed 5 to 7 physical activity sessions, where each session
consisted of 5 activity trials ranging from sedentary to vigorous. During the trials,
participants’ heart rate (HR), R-R interval (RR), electrodermal activity (Eda), and body
temperature (Temp) were recorded using wearable sensors. Immediately after each
trial, participants provided their rating of perceived exertion (RPE) using the 6-20 Borg
scale. This work used both person-level features and features extracted from each
sensor modality, followed by a feature selection step. Then, using
leave-one-subject-out cross-validation, two regression algorithms (linear regression
and support vector machine regression) were applied separately to the features of each
modality and to all possible combinations of modality features. The results showed that both
regression algorithms produced similar accuracy. In terms of the usefulness of a single
modality, features extracted from RR provided the highest prediction performance
of any single modality. However, the combination of Eda and Temp
features fused with RR features produced the best overall performance, confirming the
benefits of using multi-modal data.
5.2 INTRODUCTION
With the advent of mobile and wearable sensors, an opportunity has emerged for
healthcare providers and researchers to empower people to take care of their wellness
by providing them with timely and personalised support. Sensors are increasingly
being linked to computing technologies, such as websites and smartphones, to process
the data and provide unique opportunities for the delivery of personalised and adaptive
interventions to physical activities (PA) [155-158]. These technologies can be used to
collect real-time response of users efficiently and unobtrusively by tracking the
frequency, intensity, and duration of physical activity. Such features enable users to
record, view and share PA status with their health practitioners.
Extensive PA intervention research in [159] demonstrated an opportunity to
provide personalised behaviour treatment using adaptive goals and feedback to
increase an individual’s level of PA performance. However, there is a risk of causing
individuals to exercise at a level that is neither safe nor effective, because
adapting a PA intervention to the individual’s PA performance capacity remains a
challenging task: individual fitness level affects the biomechanical,
physiological, and psychological responses associated with PA. Therefore, a system
needs to consider the individual’s aerobic fitness, age or health status to produce a more
accurate PA level recommendation.
PA intensity is one of the crucial PA measurement parameters and can be
defined in either relative or absolute terms. Absolute intensity considers the external
workload of a particular PA and usually refers to the energy cost of a specific activity
expressed as multiples of resting metabolism, or Metabolic Equivalents (METs).
Relative PA intensity, on the other hand, personalises PA intensity based on the
person’s fitness or capacity. In relative terms, moderate intensity physical activity is
typically defined as 40% to 60% of VO2 reserve or an RPE of 12 to 14 [37, 160]. This
means that, to achieve moderate intensity physical activity as defined in absolute terms,
individuals with a lower aerobic capacity are required to work at a significantly higher
relative intensity [70]. Thus, a significant proportion of individuals with limited
aerobic capacity are misclassified as not meeting PA guidelines.
To date, research efforts to quantify PA and its intensity have mostly been based on
accelerometers [161-163]. Accelerometers are only able to capture external
workloads, and can therefore be used for calculating absolute intensity [70, 158, 163]. To
determine relative PA intensity, operationalising intensity as a percentage of
maximal oxygen uptake is considered the gold standard, but this is not feasible in most
situations because its measurement requires sophisticated instruments and lab-based
individual calibration. Self-rated perceived exertion scales, e.g., Borg’s rating of
perceived exertion (RPE) for adults [35] and the OMNI perceived exertion scale for
children [164], are widely used and valid indicators of relative physical activity
intensity. However, they are not usable in automated scenarios because they require
manual data entry.
Using sensors, manual entry can be avoided or reduced. Relative intensity can
be measured using heart rate, for example as a percentage of HR reserve (% HRR) or
a percentage of HR max (% HR max) [71, 72]. While heart rate-based methods are more objective
and suitable for predicting moderate to vigorous relative intensities, they are not
effective for low relative intensities [39]. Moreover, these approaches require
knowledge of HR max, for which commonly used age-related prediction equations are
subject to considerable measurement error [40, 41]. In addition to heart rate, other
modalities of physiological data, including electrodermal activity (Eda) and body
temperature (Temp), can be easily obtained using wearable sensors. These
physiological indicators can provide valuable information about the metabolic demand
of exercise, and can also be used to predict relative PA intensity. However, to the best
of our knowledge, the use of multiple modalities of physiological data for relative
intensity prediction has not been previously investigated.
This paper presents a study that predicts relative intensity from
multimodal physiological sensor data (heart rate, RR interval, Eda and Temp). Two
regression algorithms (linear regression and support vector machine
regression) were applied to all combinations of the sensor data. Our experiments were based
on a real-world (non-laboratory) dataset, collected from 22 people, where Borg’s RPE
scale was used as a measure of relative intensity. The key contribution of this paper is
to identify: 1) the best single-modality feature set and 2) the best combination of modality
features for predicting relative PA intensity.
5.3 DATASET COLLECTION
This study recruited 22 adults (mean age = 29.8 ± 3.2 yrs; BMI = 25.3 ± 2.6;
male = 77.3%) to perform 5 to 7 sessions of PA trials. To be eligible for the study,
participants needed to be between 18 and 40 years of age and sufficiently healthy to
perform PA, as assessed by the Physical Activity Readiness Questionnaire for Everyone
(PAR-Q+).
Each session was performed in the park and consisted of five structured PA
trials ranging from sedentary to vigorous intensity: quiet sitting and standing (5 mins),
comfortable walk (5 mins), brisk walk (5 mins), jogging (3 mins), and running (2
mins). Sufficient recovery time was provided between each activity trial.
Before the first session, participants provided basic profile information such as
age, sex, height, and weight. PA status (sedentary, insufficiently active, sufficiently
active) was measured using the Active Australia Survey [97].
During each session, participants wore an Empatica E4 smart watch on the non-dominant
wrist and a Polar H7 chest strap HR monitor. The Empatica E4 captured
electrodermal activity (Eda) and body temperature (Temp). The sampling rate for Eda
and Temp data was 4 Hz. The Polar HR monitor recorded HR at a sampling rate of 1
Hz, together with the RR-interval data.
Relative intensity was measured using the Borg RPE scale [35, 36]. Borg’s
6–20 scale, shown in Figure 5.1, reflects how heavy and strenuous the PA feels to
the participant, combining all sensations and feelings of physical stress, effort, and fatigue.
A rating of 6 represents “no exertion at all” and 20 represents “maximal exertion”. Each
number describes a different level of exertion.
The scale was presented and explained to the participants before performing the
session. Immediately after each trial, participants rated their perceived exertion using
the scale.
Figure 5.1 Borg’s Rating of Perceived Exertion (6-20) scale
5.4 METHODS
5.4.1 Pre-processing
A moving average filter with a span of 5 was applied to the HR, RR, Eda, and
Temp data to remove motion artefacts. To exclude non-steady-state data, 10 s of
data were removed from the beginning and end of each activity trial. Missing values
were replaced by linear interpolation.
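The pre-processing steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Matlab code used in the study: the function names are hypothetical, the order of operations (interpolate, then smooth, then trim) is an assumption made so that the filter never sees missing values, and the shrinking window at the signal edges is likewise an assumption.

```python
from statistics import mean


def interpolate_missing(samples):
    """Fill None gaps linearly between the nearest valid neighbours.

    Gaps at the very start or end are filled with the nearest valid
    value (an assumption; the text only mentions linear interpolation).
    """
    valid = [i for i, v in enumerate(samples) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    filled = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = samples[right]
        elif right is None:
            filled[i] = samples[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = samples[left] + frac * (samples[right] - samples[left])
    return filled


def moving_average(samples, span=5):
    """Centred moving average; the window is truncated near the edges."""
    half = span // 2
    return [mean(samples[max(0, i - half): i + half + 1])
            for i in range(len(samples))]


def trim_edges(samples, rate_hz, seconds=10):
    """Drop `seconds` of data from the beginning and end of a trial."""
    n = int(round(rate_hz * seconds))
    return samples[n:len(samples) - n]


def preprocess(samples, rate_hz):
    """Interpolate gaps, smooth with a span of 5, then trim 10 s per side."""
    return trim_edges(moving_average(interpolate_missing(samples)), rate_hz)
```

For a 5-minute trial sampled at 1 Hz (300 samples), `preprocess(hr, rate_hz=1)` would return 280 samples; for the 4 Hz Eda and Temp signals, 40 samples are dropped from each end.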
5.4.2 Feature Extraction and Selection
From each sensor modality, a few time and frequency domain features were
extracted.
HR feature set: The time domain features included mean, variance, standard
deviation, skewness, kurtosis, median, numerical gradient, on and off response.
Additionally, the number of times HR increased and decreased were computed,
normalised for window size.
R-R interval feature set: The time domain features extracted from the RR
interval data included mean, variance, standard deviation, skewness, kurtosis, median,
standard deviation of successive differences between adjoining normal cycles (SDSD),
square root of the mean squared difference of successive RR-intervals (rMSSD),
number of pairs of successive RR-intervals that differ by more than 20 ms, normalised
by length (pNN20), and number of pairs of successive RR-intervals that differ by more
than 50 ms, normalised by length (pNN50). Frequency features included spectral energy density (aVLF, aLF,
aHF), relative power (pVLF, pLF, pHF), and normalised power (nLF, nHF) of very
low frequency (0 - 0.04 Hz), low frequency (0.04 – 0.15 Hz), and high frequency (0.15
– 0.40 Hz) components. Total spectral energy density (aTotal), and ratio between LF
and HF band energy (LF/HF) were also extracted.
Eda feature set: The time domain features included mean, variance, standard
deviation, skewness, kurtosis, and median.
Temp feature set: The time domain features included mean, variance, standard
deviation, skewness, kurtosis, and median.
Person-level features: These features were always used with the sensor features
in the regression models. Person level features included height, weight, age, BMI,
gender, total PA time, weekly PA sessions, and PA status.
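As an illustration of the RR-interval time domain features listed above, the following Python sketch computes SDSD, rMSSD and pNNx from a list of RR intervals in milliseconds. It is a hedged sketch rather than the study's implementation: the function names are hypothetical, SDSD uses the sample standard deviation, and pNNx is normalised by the number of successive differences, both of which are assumptions because the text does not specify them.

```python
from math import sqrt
from statistics import stdev


def successive_differences(rr_ms):
    """Differences between adjacent RR intervals (ms)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def sdsd(rr_ms):
    """Standard deviation of successive differences (sample SD, assumed)."""
    return stdev(successive_differences(rr_ms))


def rmssd(rr_ms):
    """Square root of the mean squared successive difference."""
    diffs = successive_differences(rr_ms)
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn(rr_ms, threshold_ms):
    """Fraction of successive differences larger than threshold_ms.

    Normalising by the number of differences is an assumption.
    """
    diffs = successive_differences(rr_ms)
    return sum(1 for d in diffs if abs(d) > threshold_ms) / len(diffs)


rr = [800, 830, 790, 860, 850]   # made-up RR intervals in ms
features = {
    "SDSD": sdsd(rr),
    "rMSSD": rmssd(rr),
    "pNN20": pnn(rr, 20),
    "pNN50": pnn(rr, 50),
}
```

In the study, the window for these features was the whole activity trial, so each trial would contribute one value per feature.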
Before the regression, each feature was normalised to zero mean and unit
variance. Then, the size of the feature vector was reduced by selecting only the best 10
features using a minimum-redundancy–maximum-relevance (MRMR) feature selection
method [165]. For example, the best 10 features selected from the fused
feature set of RR, Eda, and Temp were: 1) median(RR), 2) mean(Temp), 3) aTotal(RR),
4) aHF(RR), 5) mean(Eda), 6) pNN20(RR), 7) aVLF(RR), 8) mean(RR), 9)
skewness(RR), 10) aLF(RR).
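The normalisation step can be illustrated with a minimal Python sketch. The function name is hypothetical, and the use of the population standard deviation and the handling of constant features are assumptions, since the text only states that each feature was scaled to zero mean and unit variance.

```python
from statistics import mean, pstdev


def zscore(column):
    """Scale one feature column to zero mean and unit variance.

    A constant column has no spread, so it is mapped to zeros
    (an assumption; the text does not discuss this case).
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


mean_hr = [60.0, 70.0, 80.0]        # made-up feature values
normalised = zscore(mean_hr)        # approximately [-1.22, 0.0, 1.22]
```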
5.4.3 Regression Algorithms
Previous studies report a linear relationship between the sensor data and relative intensity [166,
167]. For example, relative intensity usually increases with heart rate. The
Eda value reflects sweating, which also usually increases with relative intensity
[18]. Given these relationships, this study selected linear regression and SVM regression
with a linear kernel for predicting the relative intensities. Both regression algorithms
were implemented in Matlab (version 2017a).
First, the regression algorithms were applied separately to the features
extracted from each individual modality. Then the features of multiple modalities were
merged to form all possible feature combinations. The regression algorithms were
applied to all feature combinations to investigate whether using multiple modalities can
improve prediction performance.
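The enumeration of modality combinations described above can be sketched as follows. The names `MODALITIES`, `modality_combinations` and `fuse_features` are hypothetical and chosen for illustration; feature fusion is shown as simple concatenation of per-modality feature lists, which is an assumption about how the merging was done.

```python
from itertools import combinations

MODALITIES = ("HR", "RR", "Eda", "Temp")


def modality_combinations(modalities=MODALITIES, min_size=1):
    """All subsets of the modalities with at least `min_size` members."""
    return [combo
            for k in range(min_size, len(modalities) + 1)
            for combo in combinations(modalities, k)]


def fuse_features(feature_sets, combo):
    """Concatenate the feature lists of the modalities in `combo`."""
    fused = []
    for name in combo:
        fused.extend(feature_sets[name])
    return fused


single = modality_combinations(min_size=1)[:4]   # the four single modalities
multi = modality_combinations(min_size=2)        # the multi-modality fusions
```

With four modalities there are 15 non-empty subsets: 4 single-modality models and 11 fused models, which matches the number of rows in Table 5.1.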
5.5 PERFORMANCE EVALUATION
Performance was evaluated using the root-mean-square error (RMSE). For n
different predictions, if $y_t'$ are the values predicted by the model and $y_t$ are the original
values, the RMSE can be calculated using the following equation:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y_t' - y_t)^2}{n}} \qquad (1)$$
To account for the large inter-individual variability, this study used
leave-one-subject-out cross-validation, where data from one subject are used for testing and the
other subjects’ samples are used for training. In this way, the samples of each subject are
used exactly once for testing. The predicted relative intensities for all subjects were
combined and the RMSE was derived from the complete set.
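The evaluation procedure can be sketched in Python as below. This is a simplified illustration, not the study's Matlab code: it uses a single-feature ordinary least squares line in place of the multi-feature linear and SVM regressors, and the function names and the sample layout (subject, feature, RPE) are assumptions.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error, as in Equation (1)."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def leave_one_subject_out_rmse(samples):
    """samples: list of (subject_id, feature_value, rpe) tuples.

    Each subject is held out once; predictions for all held-out
    subjects are pooled and a single RMSE is computed at the end.
    """
    subjects = sorted({s for s, _, _ in samples})
    predicted, actual = [], []
    for held_out in subjects:
        train = [(x, y) for s, x, y in samples if s != held_out]
        test = [(x, y) for s, x, y in samples if s == held_out]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        predicted += [slope * x + intercept for x, _ in test]
        actual += [y for _, y in test]
    return rmse(predicted, actual)
```

Pooling the predictions before computing the RMSE, rather than averaging per-subject RMSEs, follows the description that the RMSE was derived from the complete set.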
5.6 EXPERIMENTAL RESULTS AND DISCUSSION
5.6.1 Performance from Using a Single Modality
Figure 5.2 shows the regression performance of the single modality models. For
both linear regression and SVM regression, models based on RR features provided the
best performance. The RMSEs of the RR models were 1.98 and 1.99 for linear and SVM
regression, respectively. HR models also performed well compared to Eda and Temp,
with RMSEs of 2.07 and 2.11 for linear and SVM regression, respectively. Temp
features provided the worst performance, with substantially higher RMSE values. Eda
provided the second worst performance.
Figure 5.2 Prediction performances of single modality models
5.6.2 Performance from Using Multiple Modalities
Table 5.1 lists the RMSE values for models developed by fusing the features of
multiple modalities. For both regression algorithms, the combination of RR, Eda, and
Temp yielded the best performance, outperforming the best single modality (RR)
models. HR was not able to add further information beyond that provided by RR. For
example, adding HR to RR (HR+RR) yielded essentially the same RMSE as RR
alone.
Table 5.1. Prediction performances (RMSE) of models developed from the combination of modalities

Modality combination    Linear Regression    SVM Regression
Eda+Temp                3.39                 3.43
HR+Eda                  2.10                 2.12
HR+Temp                 2.05                 2.05
HR+RR                   1.98                 1.98
HR+Eda+Temp             1.94                 1.95
HR+RR+Eda               1.87                 1.88
HR+RR+Eda+Temp          1.87                 1.88
RR+Temp                 1.87                 1.87
HR+RR+Temp              1.86                 1.87
RR+Eda                  1.85                 1.85
RR+Eda+Temp             1.85                 1.84
5.7 CONCLUSION
This study developed regression models from multimodal physiological data to
predict relative PA intensity. It used physiological sensor data collected from 22
individuals while they performed physical activities ranging from sedentary to
vigorous intensity. Borg’s RPE scale served as a ground truth measure of relative
intensity requiring no laboratory testing or predictions of HR max. Two regression
algorithms were applied on the features extracted from the physiological data to
identify the best single modality for relative intensity prediction. Then, we fused the
features and applied regression algorithms on all combinations of the features to
identify the best combination of modalities for relative PA intensity prediction. The
leave-one-subject-out cross-validation results showed that RR features provided the
best prediction performance among the single modalities. The best-performing
combination of modalities was RR, Eda, and Temp. Both regression algorithms
performed similarly in all cases. The study identified that, for the prediction of relative
PA intensity, Eda and Temp are not good features by themselves, but they can provide
additional information and improve prediction performance when combined with RR
or HR.
(Additional paragraph – not included in the published paper)
The strength of the current study was the use of two regression
algorithms on multimodal physiological data, collected in a non-laboratory
environment, to predict the relative intensity of the participants. To the best of our
knowledge, this is the first study to use multiple modalities of physiological data
for relative intensity prediction using machine learning algorithms. However, the
study had some limitations that warrant further investigation. For example,
the study predicted raw RPE values as the measure of relative intensity. In
real-world applications, it would be more beneficial if relative intensity could be
categorised into low, moderate, and high intensities based on the RPE. Also, the study did
not analyse the RPE ranges in which most prediction errors occur.
The next chapter (Chapter 6) presents further investigation to address the
above-mentioned limitations.
Chapter 6: Prediction of Relative Physical Activity Intensity Using Multimodal Sensing of Physiological Data
Alok Kumar Chowdhury1
Dian Tjondronegoro2
Vinod Chandran1
Jinglan Zhang1
Stewart G. Trost3
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.
2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.
3. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University
of Technology, Brisbane, Australia.
Corresponding Author:
Professor Stewart G. Trost, PhD
Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research
Level 6, 62 Graham Street
South Brisbane, QLD 4101
Australia
Phone: +61 7 3069 7301
Fax: + 61 7 3138 3980
Email: [email protected]
6.1 ABSTRACT
Purpose: To investigate the feasibility of a non-laboratory approach that uses
machine learning on multimodal sensor data to effectively predict relative physical
activity (PA) intensity. Methods: A total of 22 participants completed up to 7 physical
activity sessions consisting of sitting and standing, comfortable walk, brisk walk,
jogging, and fast running activities. During each session, participants wore an Empatica
E4 wrist-watch and a Polar chest-strapped heart-rate monitor, which recorded heart-rate
(HR), r-r interval (RR), electrodermal activity (Eda) and body temperature
(Temp). After each activity, participants provided ratings of perceived exertion (RPE)
using Borg’s 6-20 scale. Along with the attribute data of the participant, a set of
features was extracted from each of the modalities (HR, RR, Eda and Temp). Using
leave-one-subject-out cross validation, three classifiers, namely random forest (RF),
neural network (NN) and support vector machine (SVM), were applied independently
to each of the feature sets to predict 3-class relative PA intensities: low (RPE ≤ 11),
moderate (RPE 12-14), and high (RPE ≥ 15). Then, both feature fusion and
decision fusion (posterior-adapted class-based decision fusion) of all combinations of
sensor modalities were carried out to identify the best combination. Results:
Among the single feature sets, RR provided the best overall performance for all three
classifiers. Decision fusion did not outperform RR features alone for any
combination. However, with feature fusion, SVM showed its best performance
for RR+Eda, a 3.4% improvement compared to RR only. Using NN and RF with
feature fusion, the best combinations were RR+Temp and RR+Eda+Temp, respectively.
Conclusion: Use of multiple modalities can enhance relative intensity prediction
performance.
Keywords: Motion sensors, machine learning, pattern recognition, random
forest, bagging, boosted decision trees.
6.2 INTRODUCTION
Regular participation in physical activity (PA) is recognised as one of the most
important steps that people can take to improve their health [2]. Physical inactivity
significantly increases the risk of numerous chronic health conditions, including
cardiovascular disease, type-2 diabetes, cancers of the breast and colon, and depression
[5, 61, 122]. This extensive scientific evidence for the health benefits of PA has
prompted numerous medical and public health organisations to issue recommendations
or guidelines for participation in physical activity. For example, the World Health
Organisation recommends at least 150 min of moderate-intensity PA or at least 75 min
of vigorous intensity PA per week, accumulated in bouts of at least 10 min in duration
[15].
In recent years, due to the increasing use of wearable sensor technology,
accelerometer and heart rate-based objective PA monitoring has become popular
among researchers and consumers [155-157]. In contrast to self-report methods,
sensor-based approaches can be used to collect real-time responses from users
efficiently and unobtrusively, and can track the frequency, intensity, and duration of
physical activity [30, 93, 161-163]. Such features enable users to record, view and
share PA status with their health practitioners and peers. Because current PA
guidelines call for participation in moderate- and vigorous-intensity PA, it is important
that wearable sensor systems for monitoring PA behaviour provide accurate
determinations of PA intensity.
PA intensity can be defined in relative or absolute terms. Relative intensity is
generally expressed as a percentage of an individual’s maximal aerobic capacity (%
VO2 max, % HR reserve) or based on ratings of perceived exertion (RPE) [33, 39]. In
relative terms, moderate intensity physical activity is typically defined as 40% to 60%
of VO2 reserve or an RPE of 12 – 14 [37, 160]. Absolute intensity, on the other hand,
refers to the energy cost of a specific activity expressed as multiples of resting
metabolism or Metabolic Equivalents (METs), where 1 MET is assumed to be 3.5
ml·kg⁻¹·min⁻¹. In absolute terms, moderate physical activity is defined as 3–6 METs,
regardless of an individual’s aerobic capacity. Thus, in order to achieve moderate
intensity physical activity based on absolute intensity, individuals with a lower aerobic
capacity are required to work at a significantly higher relative intensity [39, 70]. For
example, “brisk” walking on level ground has an absolute intensity of 4 METs. For a
young healthy person with a maximal aerobic capacity of 10 METs, the relative
intensity is 40% of maximal capacity; whereas for an individual with a chronic health
condition with a maximal aerobic capacity of 6 METs, the relative intensity is 67%.
Conversely, in relative intensity terms, low fit individuals working at an absolute
intensity of < 3 METs may be judged as participating in moderate intensity PA, if the
work rate exceeds 40% of maximal aerobic capacity.
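The worked example above can be reproduced with a short Python sketch. The function names are hypothetical; relative intensity is expressed here as a percentage of maximal aerobic capacity in METs, following the example in the text, and the 3–6 MET band for absolute moderate intensity is taken from the definition given earlier.

```python
def relative_intensity_percent(activity_mets, max_capacity_mets):
    """Relative intensity as a percentage of maximal aerobic capacity."""
    if max_capacity_mets <= 0:
        raise ValueError("maximal capacity must be positive")
    return 100.0 * activity_mets / max_capacity_mets


def is_moderate_absolute(activity_mets):
    """Moderate intensity in absolute terms: 3 to 6 METs."""
    return 3 <= activity_mets <= 6


brisk_walk = 4  # METs, as in the example above
fit_person = relative_intensity_percent(brisk_walk, 10)    # 40.0
low_fit_person = relative_intensity_percent(brisk_walk, 6)  # about 66.7
```

The same functions illustrate the converse case: an activity of 2.5 METs is below the absolute moderate threshold, yet for a person with a maximal capacity of 6 METs it corresponds to roughly 42% of capacity, which exceeds the 40% relative threshold.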
To date, research efforts to quantify PA intensity from wearable sensors have
predominantly been based on absolute intensity [39, 70, 158, 163]. Because such
estimates do not consider an individual’s aerobic fitness, age or health status, the
intensity of PA could be above accepted relative intensity thresholds for moderate-to-
vigorous PA (MVPA), but below the established 3 MET absolute intensity threshold
for MVPA [13, 69]. As such, a significant proportion of individuals with limited
aerobic capacity are misclassified as not meeting PA guidelines.
Moreover, m-Health platforms using wearable sensor systems to monitor the absolute
intensity of PA could be encouraging individuals to exercise at relative intensities that
are neither safe nor effective [12]. Thus, the development of validated algorithms to
predict relative PA intensity from wearable sensor data constitutes an important
research priority.
Because of the linear relationship between HR and work rate during steady state
exercise, HR based indices such as percentage of maximal HR (%HR max) and
percentage of HR reserve (%HRR) are widely used metrics for quantifying the relative
intensity of PA [39, 71, 72]. However, there are a number of drawbacks to using these
metrics. First, the standard error of estimate associated with commonly used age-based
prediction equations for HRmax (220 – age; 208 – (0.7 × age)) ranges from 10 to 12 bpm,
and these equations therefore do not provide accurate predictions of individual HRmax [40, 41]. Second,
to account for inter-individual differences in aerobic capacity and HRmax, it is
necessary to personalise the relationship between HR and work rate via individual
calibration in the laboratory, which is time intensive, requires expensive
instrumentation, and is not feasible in large field-based studies [22, 72].
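To make the HR-based indices concrete, the sketch below implements the two age-based HRmax equations quoted above together with %HRmax and %HRR. The function names are hypothetical, and the %HRR formula (exercise HR minus resting HR, divided by HRmax minus resting HR) is the standard heart rate reserve definition rather than something stated in this chapter; the resting heart rate value is an illustrative assumption.

```python
def hrmax_220_minus_age(age):
    """Age-predicted maximal heart rate: 220 - age (bpm)."""
    return 220 - age


def hrmax_208_formula(age):
    """Age-predicted maximal heart rate: 208 - 0.7 * age (bpm)."""
    return 208 - 0.7 * age


def percent_hr_max(hr, hr_max):
    """Exercise heart rate as a percentage of maximal heart rate."""
    return 100 * hr / hr_max


def percent_hr_reserve(hr, hr_rest, hr_max):
    """Exercise heart rate as a percentage of heart rate reserve."""
    return 100 * (hr - hr_rest) / (hr_max - hr_rest)


age, hr_rest, hr_exercise = 30, 60, 125          # made-up example values
hr_max = hrmax_220_minus_age(age)                 # 190 bpm
hrr = percent_hr_reserve(hr_exercise, hr_rest, hr_max)  # 50.0
```

The sensitivity to HRmax error can be seen directly: with the same resting and exercise heart rates, an HRmax of 180 bpm gives about 54% HRR, whereas 200 bpm gives about 46% HRR, so an error of 10 bpm in either direction can move an individual across an intensity threshold.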
Along with HR, other modalities of physiological data, including electrodermal
activity (Eda) and body temperature (Temp) can be easily measured via wearable
sensors. These physiological indicators can provide valuable information about the
metabolic demand of exercise, and can also be used to predict relative PA intensity.
However, to the best of our knowledge, the use of multiple modalities of physiological
data for relative intensity prediction has not been previously investigated.
An alternative approach to measuring relative intensity that does not require
instrumentation or individual calibration in the laboratory, is the use of effort
perception or ratings of perceived exertion (RPE). Effort perception scales such as the
Borg alpha-numeric RPE Category Scale are commonly used in exercise testing and
prescription contexts and have been shown to be valid and reliable indicators of relative
PA intensity [35-38]. For example, in a meta-analysis, Chen, et al. [37] reported
weighted mean validity coefficients of 0.62 between RPE and HR, and 0.64 between
RPE and %VO2max across different studies. In a more recent study, Scherr, et al. [36]
conducted incremental exercise tests on treadmills or cycle ergometers in a very large
population of 2560 Caucasian men and women. They found a strong correlation of RPE
with heart rate (r = 0.74, p < 0.001), which was not significantly affected by
participants’ gender, age, coronary artery disease, physical activity status and exercise
testing modality (all p < 0.05). Yet, despite the widespread use of RPE for effort
estimation, the utility of algorithms to predict relative PA intensity based on RPE has
not been explored. If models can be trained on features in the signals from multiple
physiological sensors to predict relative PA intensity based on effort perception, then PA intensity
predictions can be more personalised, and users can track and view their PA sessions and
exercise at an intensity that is safe and effective.
This study investigated the feasibility of a non-laboratory approach that uses
machine learning on features in multimodal sensor data to effectively predict relative
PA intensity. A non-laboratory dataset, collected from 22 people was utilised, where
Borg’s RPE scale was used as a ground truth measure of relative intensity. The features
extracted from multimodal physiological data were applied to three state-of-the-art
machine learning algorithms (including support vector machine, random forest, and
neural network) to predict 3 classes of relative PA intensity. The study 1) identified
the best set of features from a single modality for predicting relative PA intensity, 2)
explored both feature fusion and decision fusion to combine different sensor
modalities to improve relative PA intensity prediction performance, and 3) identified
the best combination of modality features for predicting relative PA intensity.
6.3 METHODS
6.3.1 Participants
Twenty-two adults (mean age = 29.8 ± 3.2 yrs; BMI = 25.3 ± 2.6; male = 77.3%)
participated in this study. Inclusion criteria were: 1) age between 18 and 40 years,
2) completing and passing the Physical Activity Readiness Questionnaire (PAR-Q+), and 3) no
hospitalisations within the last six months. Prior to participating in the study, each
participant provided written informed consent. The data collection protocol was
approved by the Office of Research Ethics and Integrity of the Queensland University
of Technology (Ethics number: 1500000962).
6.3.2 Protocol
Participants completed up to 7 weekly physical activity sessions in the park (91%
of participants completed five or more sessions). Each session comprised five structured
physical activity trials ranging from sedentary to high intensity. The activity trials were
quiet sitting and standing (5 mins), comfortable walk (5 mins), brisk walk (5 mins),
jogging (3 mins), and fast running (2 mins). The intensity of walking, jogging and fast
running trials was self-selected. Sufficient recovery time was provided between
activity trials: the rest periods after the five trials were 5 mins, 15 mins, 15
mins, 17 mins, and 18 mins, respectively. Thus, each session usually lasted
approximately 100 mins. To ensure a common outdoor environment, all sessions were
performed in the afternoon. During the trials, a researcher accompanied the
participants and provided verbal feedback (if required) to assist participants with
motivation and to ensure even pacing during the trials.
6.3.3 Data Acquisition
Participant attributes. Before each session, participants provided basic profile
information such as age, sex, height, and weight. Habitual physical activity level was
measured using the Active Australia Survey [168]. Responses to the survey were used
to estimate total activity time, total number of activity sessions, and physical activity
status (sedentary/ insufficiently active/ sufficiently active).
Sensor data. During each session, participants wore an Empatica E4 wrist-watch
(Boston, US) and a Polar H7 chest strap heart-rate monitor, and carried a mobile phone. The
Empatica E4 was worn on the non-dominant wrist, and captured electrodermal activity
(Eda) and body temperature (Temp). The sampling rate for Eda and Temp data was 4
Hz. The Polar heart-rate monitor recorded heart-rate (HR) at a sampling rate of 1 Hz,
together with the RR-interval data.
Annotation. Relative intensity was measured using the Borg Rating of Perceived
Exertion (RPE) scale [35, 36, 74]. The scale was presented and explained to the
participants before performing the session. Immediately after each trial, participants
rated their perceived exertion by pointing to the printed Borg’s RPE scale. A high
degree of reliability was found between RPE measurements over the 7-week study
period. The single measure ICC was 0.92 with a 95% confidence interval from 0.89 to
0.95. Borg RPE values were categorised into 3 relative intensity classes corresponding
to low (6-11), moderate (12-14) and high (15-20).
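The mapping from Borg RPE ratings to the three relative intensity classes can be written as a small Python function. The function name and the choice to reject ratings outside the 6 to 20 range are illustrative assumptions; the class boundaries are those stated above.

```python
def rpe_to_class(rpe):
    """Map a Borg 6-20 RPE rating to a relative intensity class."""
    if not 6 <= rpe <= 20:
        raise ValueError("Borg RPE ratings range from 6 to 20")
    if rpe <= 11:
        return "low"
    if rpe <= 14:
        return "moderate"
    return "high"


labels = [rpe_to_class(r) for r in (7, 11, 12, 14, 15, 19)]
```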
6.3.4 Relative Intensity Prediction System
A relative PA intensity prediction system was designed by applying machine
learning algorithms to predict low, moderate, and high relative intensity from features
in the raw sensor data. The overall framework of the system consists of six steps:
pre-processing, feature extraction, normalisation and feature selection, classification, fusion,
and evaluation.
Pre-processing. HR, Eda, and Temp data were annotated with the relative
intensity classes and transformed into a time-series data structure. To remove motion
artefacts in the physiological data, a moving average filter with a span of 5 was applied
to the HR, Eda and Temp data. In addition, this study empirically discarded 10 s of data
at the beginning and end of each activity trial to remove non-steady-state data. Any
intermediate missing values were replaced by linear interpolation, and missing values
at the end of each activity were replaced by the previous value.
Feature Extraction. A number of time and frequency domain features were
extracted from each sensor modality. Because participants reported RPE at the end of
each activity trial, the window size for feature extraction was set equal to the duration
of the activity trial. The extracted features from the sensor modalities are given in
Table 6.1.
Table 6.1 Feature set extracted from each sensor modality

1. HR feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis,
   median, numerical gradient, on and off response, the number of times HR increased
   normalised for window size, and the number of times HR decreased normalised for
   window size.

2. R-R interval feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis,
   median, standard deviation of successive differences between adjoining normal
   cycles (SDSD), square root of the mean squared difference of successive
   RR-intervals (rMSSD), number of pairs of successive RR-intervals that differ by
   more than 20 ms/length (pNN20), number of pairs of successive RR-intervals that
   differ by more than 50 ms/length (pNN50).
   Frequency features: spectral energy density (aVLF, aLF, aHF), relative power
   (pVLF, pLF, pHF), and normalised power (nLF, nHF) of very low frequency
   (0 – 0.04 Hz), low frequency (0.04 – 0.15 Hz), and high frequency (0.15 – 0.40 Hz)
   components, total spectral energy density (aTotal), and ratio between LF and HF
   band energy (LF/HF).

3. Eda feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis,
   and median.

4. Temp feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis,
   and median.
Each of these feature sets was combined with the participant attribute data
(person-level features). Person-level features included height, weight, age, BMI, gender,
total PA time, weekly PA sessions, and PA status.
Normalisation and Feature Selection. In order to limit features to a common
range, linear methods were used to normalise each feature to a zero mean and unit
variance. Because some features can be redundant and provide irrelevant information
which can undesirably affect performance, a minimum-redundancy–maximum-relevance
(MRMR) feature selection method [165] was applied to the complete
dataset. As the minimum redundancy criterion, this method used the minimum mutual
information between features; as the maximum relevance criterion, it used the maximal
mutual information between the classes and features. This approach resulted in the
selection of only the best 10 features as inputs to the classifiers.
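The two mutual information criteria can be illustrated with a small greedy selection sketch in Python. This is a simplified sketch under stated assumptions, not the implementation used in the study: features are assumed to be discretised already, mutual information is estimated from simple counts, and candidates are ranked by relevance minus mean redundancy (the difference form of the criterion), because the text does not say which variant was applied. All function names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def discretise(column, bins=3):
    """Equal-width binning so that MI can be estimated from counts."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mrmr_select(features, labels, k):
    """Greedy MRMR: features is a dict of name -> discrete column."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []

    def score(name):
        # Relevance to the class minus mean redundancy with chosen features.
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In this sketch, a feature that duplicates an already selected feature receives a score near zero even if it is highly relevant, which is the behaviour the minimum redundancy criterion is intended to produce.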
Classification Algorithms. Three state-of-the-art machine learning algorithms
that are prominently used in the physical activity domain were utilised: support vector
machine (SVM), random forest (RF), and neural network (NN). In our
implementation, the two-class SVM was adapted so that it first classified one
class against all other classes, then classified another class versus the remaining
classes, and so on [169]. A radial basis function kernel was chosen for use in
the SVM classifier. RF was implemented using the “TreeBagger” classification tool
within Matlab (2017a, The MathWorks Inc., USA). The number of decision trees in
the RF classifier was empirically set to 100 because it provided optimum performance
compared to 50 and 150. For the NN, the numbers of input, hidden and output neurons were
10, 7, and 3, respectively. The maximum number of epochs and the learning rate were set to 250 and
0.001, respectively.
Fusion to Combine Multiple Modalities. To find the best feature combination
for relative intensity prediction, both feature-level and decision-level fusion were
carried out. In feature-level fusion, the feature sets in each combination of the four sensors
were merged, and the feature selection and classification algorithms were then
applied to the merged feature set. In decision-level fusion, each of the sensor
feature sets produced an independent decision, and these decisions were then combined using a fusion
algorithm. As the decision-level fusion algorithm, our previously reported posterior-adapted
class-based decision fusion [161] was used. This algorithm applies a classifier
to the training data of each modality separately to assess the model’s
prediction performance across classes. Then, for each modality, it assigns weights to
the classes (class-based weights) based on the performance on the training data. When
testing a new instance, a decision with a posterior probability is made independently
from each feature set using the classifier. The class-based weights for each test
instance are then adjusted using the posterior probability (posterior-adapted class-based
weights). Finally, in the fusion step, the class with the highest posterior-adapted
class-based weight is selected as the final predicted class.
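The fusion steps described above can be illustrated with a small sketch. It is not the published algorithm of [161]: the use of per-class precision on the training data as the class-based weight, the multiplication of weight by posterior probability, and every name and number below are assumptions made only for illustration.

```python
def class_based_weights(y_true, y_pred, classes):
    """Per-class weight for one modality from its training-set predictions.

    Assumption: the weight is the precision of each predicted class.
    """
    weights = {}
    for c in classes:
        truths = [t for t, p in zip(y_true, y_pred) if p == c]
        weights[c] = sum(t == c for t in truths) / len(truths) if truths else 0.0
    return weights


def fuse(decisions, weights):
    """Pick the class with the highest posterior-adapted class-based weight.

    decisions: modality -> (predicted class, posterior probability)
    weights:   modality -> {class: class-based weight}
    Assumption: adapted weight = class-based weight * posterior probability.
    """
    best_class, best_score = None, -1.0
    for modality, (label, posterior) in decisions.items():
        score = weights[modality][label] * posterior
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Hypothetical weights and decisions, not values from the study.
example_weights = {
    "RR": {"low": 0.90, "moderate": 0.60, "high": 0.85},
    "Eda": {"low": 0.70, "moderate": 0.30, "high": 0.50},
}
example_decisions = {"RR": ("moderate", 0.55), "Eda": ("low", 0.60)}
fused = fuse(example_decisions, example_weights)
```

With these made-up numbers the Eda decision (0.70 x 0.60 = 0.42) outranks the RR decision (0.60 x 0.55 = 0.33), which shows how a confident but weaker modality can override a stronger one under this kind of rule.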
Performance Evaluation. Performance was evaluated using leave-one-subject-out
(LOSO) cross-validation [137, 169]. In LOSO, data from one user are used for
testing and the other users’ samples are used for training. In this way, the samples of each
subject are used exactly once for testing. This study used the F1 score [138] to measure
the performance of the ensemble learning methods. The F1 score was favoured over
classification accuracy because, unlike accuracy or percentage of agreement, it is not
influenced by class distribution. The F1 score was computed from precision and recall
by keeping a balance between them.
$$F_1\ \text{Score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \times 100\% \qquad (1)$$
where precision describes the exactness of a classifier. A low value of
precision indicates a high false-positive rate.
$$\text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \times 100\% \qquad (2)$$
Recall or sensitivity is useful to measure the completeness of classifiers. Low
recall indicates a high false-negative rate.
$$\text{recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \times 100\% \qquad (3)$$
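The evaluation procedure can be sketched as a LOSO splitter plus a one-class precision, recall, and F1 computation. This is a minimal illustration with hypothetical names and toy labels; how the per-class scores were averaged across the three intensity classes is not stated here, so the sketch scores a single class only.

```python
def loso_splits(subject_ids):
    """Yield (held-out subject, train indices, test indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 (all in percent) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return 100 * precision, 100 * recall, 100 * f1


# Toy example: four samples from three hypothetical subjects.
splits = list(loso_splits(["a", "a", "b", "c"]))
scores = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0], positive=1)
```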
6.4 RESULTS
6.4.1 Relative Intensity Classification from a Single Modality
Table 6.2 reports F1 scores for the four sensor modalities. All three classifiers
showed a similar pattern of results. RR features provided the highest F1 score
across all three classifiers, with the NN classifier achieving the highest
score (86.7%). Performance was consistently low for the classifiers trained on Eda
or Temp features. HR features also provided good F1 scores and outperformed
classifiers trained on Temp and Eda.
Table 6.2 F1 scores (%) of the four modalities using three classifiers
Feature(s)    SVM     RF      NN
Eda           59.5    58.0    60.3
Temp          61.6    59.0    61.6
HR            78.7    80.8    79.2
RR            83.4    85.2    86.7
6.4.2 Feature Fusion Results
Figure 6.1 reports feature fusion results (F1 scores) for all possible combinations
of the four modalities using the three classifiers. Eda and Temp features were
effective when combined with other modalities. The best performance observed
for SVM was with RR+Eda (86.8%), a 3.4 percentage-point improvement over RR only.
With the RF and NN classifiers, the best combinations were RR+Eda+Temp (86.3%) and
RR+Temp (87.2%) respectively. In most cases, adding Eda and/or Temp to the RR
features improved the classification performance.
Figure 6.1 F1 Scores for all combinations of modalities using feature fusion;
Note: results of single modalities are also given as base-lines
The confusion matrices for the best-performing combinations are shown in Figure
6.2. In all cases, the classifiers correctly classified most of the low and high relative
intensity activity trials and a good number of the activity trials with moderate relative
intensity. For example, when the fusion of RR and Temp features served as input to
the NN classifier, 96% (362 out of 379) of the low-intensity trials, 62% (74 out of 121)
of the moderate-intensity trials, and 87% (100 out of 115) of the high-intensity
trials were classified correctly. Of the moderate-intensity trials, 23% were
misclassified as low intensity and 15% as high intensity.
Figure 6.2 The confusion matrix for the best combinations in each classifier
6.4.3 Decision Fusion Results
Figure 6.3 shows the decision fusion results (F1 scores) for all combinations of
modalities using the three classifiers. None of the combinations in decision
fusion exceeded the performance of RR alone. Adding Eda and Temp to
HR or RR reduced performance. For example, with SVM, HR features alone
provided an F1 score of 78.7%, but when fused with the decisions from the Eda and
Temp feature models, performance fell to 62.9%.
Figure 6.3 F1 Scores for all combinations of modalities using decision fusion;
Note: results of single modalities are also given as base-lines
6.4.4 Statistical Comparison
The performance of the different modalities and their combinations was tested
for statistical significance using one-way repeated measures ANOVA. The F1 scores
(using feature fusion) for all folds/users and all classifiers were merged to
increase the statistical power and enhance the generalisability of the findings.
Overall, mean F1 scores differed significantly (Wilks’ Lambda = 0.255, F(13,
50) = 11.264, p < .0001) between the different modalities
(single and combined). LSD post hoc comparisons revealed that both the HR and
RR modalities provided statistically better performance than Eda and Temp.
RR showed significantly better performance than HR. Among the combined
modalities, only RR+Eda provided statistically improved performance over RR alone.
RR+Temp and RR+Eda+Temp showed similar performance to RR.
6.5 DISCUSSION
This study systematically investigated the use of machine learning algorithms
trained on features from multimodal physiological data to classify relative PA intensity.
Across all classifiers (SVM, RF and NN), the features extracted from RR interval data
provided statistically the best performance compared to features extracted from
heart rate, Eda, and Temp. HR features demonstrated the second highest classification
performance, while Eda performed worst. Two fusion techniques (feature and
decision fusion) were examined to identify effective combinations of the
physiological data. Of these, feature fusion improved performance
when Eda and/or Temp feature sets were added to the RR feature set. SVM performed best
with the fusion of RR and Eda features, a 3.4 percentage-point improvement
over RR only. With NN and RF in feature fusion, the best combinations were
RR+Temp and RR+Eda+Temp respectively.
Our results are consistent with previous studies demonstrating that features
derived from heart-rate data provide better prediction of relative PA intensity than
other modalities (Eda and Temp). Most previous studies have used heart-rate data
to assess relative PA intensity [39, 71, 72]. RR represents the beat-to-beat
fluctuations of the heart rate, which can reveal the state of the user’s autonomic nervous
system. The complex time- and frequency-domain features extracted from RR data
showed better performance than HR. Eda and body temperature are also linked to PA
intensity; for example, Eda is affected by sweating due to physical exertion and by
psychological stress [18]. However, our results show that these modalities (Eda and
Temp) alone cannot provide satisfactory relative PA intensity classification accuracy.
Although some studies have addressed the measurement of relative PA intensity,
no previous studies have investigated the use of machine learning
algorithms for automated recognition of relative PA intensity based on RPE. Among
the three state-of-the-art machine learning algorithms (SVM, RF, and NN), NN
provided slightly better accuracy on single modalities than the others, except for HR, where
RF did best. However, there was no significant variability among the classifiers’
results, i.e., they yielded equivalent results on the single modalities.
The combinations of sensor modalities were effective in some cases
when using feature fusion. Feature fusion effectively used the best
features from all included modalities during classification, which led to improved
performance. For example, MRMR feature selection selected the top 10 features from
the fused feature set, and the classifier was then applied to these features. Our results
showed that adding Eda and/or Temp features to the RR features can
increase classification performance. This indicates that features from these modalities
can provide additional information to that provided by RR features.
When the combinations of sensor modalities were investigated using
decision fusion, the combinations showed considerably inferior performance. Unlike
feature fusion, in decision fusion each modality made a separate decision on the relative
intensity, and a posterior-adapted class-based weighted algorithm then combined those
decisions. In our earlier research, the posterior-adapted class-based algorithm showed
improved performance for PA recognition when combining accelerometer data
obtained from multiple body locations (ankle, chest, and wrist) [161]. Because, for
PA recognition, each of these locations had comparable but complementary
performance, the individual decisions from each model performed better when
combined using the posterior-adapted class-based weighted algorithm. In this study, however,
decision fusion showed poor performance because Eda and Temp did not provide
satisfactory performance in their own right. Thus, combining the decisions of two
relatively poorly performing prediction models led to further decrements in overall
performance.
The confusion matrices show that most of the low and high relative
intensity samples were classified correctly. However, the classification models showed
relatively higher misclassification for the moderate relative intensity category. As this
study used leave-one-subject-out cross-validation, inter-person differences in the
moderate relative intensity zone played an important role in these misclassifications. From
an application point of view, it may be of less concern to misclassify moderate
intensity as vigorous (and vice versa), as both count toward meeting PA
guidelines. However, misclassification between moderate and low intensity is
problematic, as it can provide incorrect information on whether a person meets PA
guidelines. Further research could aim to improve the model’s performance or
reduce misclassification between low and moderate intensity, for example by normalising the
physiological data before feeding them into the classifier.
A strength of the current study was the use of machine learning on
multimodal physiological data, collected in a non-laboratory environment, to predict
the relative intensity of the participants. The examination of each of these modalities,
independently and combined (using fusion), was among the main strengths. The use of
three different “state-of-the-art” classification algorithms was an additional strength.
There were, however, some limitations that warrant consideration. First, although
activity trials were self-paced and completed at a user-specified intensity, the data were
collected in predetermined sequences of known duration. Thus, additional work is
required to evaluate the performance of the proposed methods in true free-living
contexts. Second, only a few lifestyle activities were included in the training data. Future
studies should include a more diverse set of lifestyle activities ranging from sedentary
to vigorous. Third, although some works have found that normalising physiological data can
reduce inter-individual differences and improve the accuracy of EE or PA recognition
[20], our experiment was unable to normalise the physiological features due to a lack of
resting data. Fourth, our study did not utilise accelerometer data and focused only
on the use of multimodal physiological data for relative intensity prediction. The
accelerometer can add information on external workload to the model. In future, a
study could investigate the utility of combining accelerometer
and physiological data for relative intensity prediction. Fifth and finally, in this study
the feature selection algorithm used only the top 10 features for the classification task. In
future, the number of selected features could be varied and the models compared across
different numbers of selected features.
In summary, the results demonstrate that relative PA intensity prediction can
be performed using machine learning on multimodal physiological data. Of the
different modes of physiological data examined, features extracted from RR data
provided the best performance. Although the non-heart-rate features, including Eda and
Temp, cannot provide satisfactory results on their own, they can improve
performance when combined with the RR features. Thus, this research identifies the best
single modality and features, and the best combination of modalities, for predicting relative
PA intensity from wearable sensors using machine learning.
6.6 ACKNOWLEDGEMENTS
No funding was received for completion of this project. Trost is a member of the
ActiGraph Scientific Advisory Board. Chowdhury, Tjondronegoro, Chandran, and
Zhang declare no conflict of interest. The results from the present study do not
constitute endorsement by the American College of Sports Medicine.
Chapter 7: Deep Learning for Energy Expenditure Estimation in Pre-School Children
Alok Kumar Chowdhury1
Dian Tjondronegoro2
Jinglan Zhang1
Markus Hagenbuchner3
Dylan Cliff3
Stewart G. Trost4
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.
2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.
3. University of Wollongong, Australia.
4. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Australia.
Corresponding Author:
Alok Kumar Chowdhury
Science and Engineering Faculty,
Queensland University of Technology,
Brisbane, Australia
Phone: +61 420 467 077
Email: [email protected]
7.1 ABSTRACT
Accurate monitoring of physical activity and its respective energy expenditure
is necessary in studies that aim to quantify, understand, and promote physical activity
in preschool children. This paper proposes the use of deep learning to effectively
predict energy expenditure from body worn accelerometers. During the data collection,
eight participants performed ten simulated free-living activities ranging in intensity
from sedentary to vigorous. Participants wore accelerometers on both wrists and the
right hip, along with a portable metabolic system for direct measurement of energy
expenditure. The analysis uses Convolutional Neural Networks to perform deep
learning regression on each accelerometer configuration, single and combined. The
performance is benchmarked against a set of conventional supervised machine
learning and simplified regression models. Based on leave-one-subject-out
cross-validation and one-way repeated measures ANOVA, the results show that deep
learning can achieve performance comparable to the best conventional supervised
learning algorithms and significantly outperforms the simplified regression
approaches.
7.2 INTRODUCTION
Physical activity (PA) during the early childhood years has an influential role on
current and future development [170]. Regular PA provides children with a range of
important health benefits, including healthy weight, improved bone health,
cardiovascular fitness, and enhanced cognitive, emotional and psychosocial
development [170, 171]. Based on this evidence, government agencies and global
health organizations have issued PA guidelines recommending that preschoolers be
physically active for at least 180 min per day [64, 172, 173]. Accurate sensor-based
measurement of PA and energy expenditure (EE) is therefore needed to monitor
compliance with this guideline and develop e-Health monitoring and intervention
applications to increase PA and reduce sedentary behaviour in low-active preschool
children.
Due to their small size and low cost, accelerometer-based wearable sensors have
emerged as a popular method for measuring PA in preschoolers [114, 174]. However,
in most applications, the wealth of data generated by these devices has not been
thoroughly utilized, with EE predicted using simple linear regression. Such
approaches have shown poor performance for sedentary and non-ambulatory activities
[43].
In recent years, the conventional use of supervised machine learning for EE
prediction has emerged as a viable and more accurate alternative to simple linear
regression [44, 45]. The conventional machine learning approach involves manual
extraction of features, feature selection, and the application of regression algorithms such as
artificial neural networks (ANN) [46-48], ensemble decision trees [49], and support vector
machines [50]. The performance of regression depends on the quality and number of
features, which requires domain knowledge for feature extraction and sophisticated
feature selection algorithms. Often, the same extracted features do not perform equally
well in different studies [51].
Deep learning models are now gaining popularity and are widely used in other
domains such as computer vision and image processing [52]. A deep learning
framework is usually a collection of multiple neural network layers, where each layer
automatically extracts a hidden representation, i.e. features, from the input [175].
Deep learning eliminates the need for manual feature extraction and selection steps and
can be applied to the raw data directly. In general, deep learning models are
computationally expensive to train; however, the testing time is small, giving them the
potential for use in real-time, field-based applications [176]. A long training time can
be justified as the models are usually trained offline and can be trained using resources
in the cloud [42].
In a recent study, Zhu, et al. [42] showed that deep learning/CNN models
improved EE prediction performance compared to activity specific models. However,
they did not compare performance across different accelerometer configurations.
Furthermore, their work was carried out only in adults. Due to developmental,
biomechanical, and behavioral factors, such as differences in motor proficiency,
resting metabolic rate, energy cost of locomotion, and PA types and patterns, models
developed in adults are not generalizable to preschool-aged children [53-55].
To our knowledge, this is the first study to employ deep learning algorithms to
predict EE from accelerometer data in pre-school children. Tri-axial accelerometer
signals from the left wrist (LW), right wrist (RW), and right hip (RH) were collected
from eight preschool-aged children performing a range of developmentally appropriate
activities. EE was measured using a portable indirect calorimeter. Then, three EE
modelling approaches (deep learning, conventional supervised machine
learning, and simple regression) were applied to each accelerometer location and
to the combination of the right wrist and right hip.
7.3 DATA COLLECTION AND PRE-PROCESSING
7.3.1 Data Collection
Eight preschool children (age: 5.2 ± 0.8 yrs, weight: 20.9 ± 1.2 kg, height: 117.8
± 5.5 cm, male: 25%) performed a series of 10 simulated free-living activity trials. The
activities and their descriptions are listed in Table 7.1.
Table 7.1 List of Activities and Their Descriptions
Activity Trial                            Description
Lying down                                Lying comfortably on the floor with a pillow
Story time                                Sit on the floor on a cushion and listen to a story-book
Watching movie                            Sit on the floor on a cushion and watch a video on a tablet
Table game (free play)                    Sit in a chair at a table completing a developmentally appropriate puzzle activity
Whiteboard                                Draw on a whiteboard to create a picture while standing
Treasure hunt                             Walk through the activity room (20m x 10m) and search for and collect hidden toys
Pack away                                 Collect toys and equipment and return them to the appropriate boxes or location in the activity room
Dance                                     Watch a dance video and mirror the movements of the characters on the video
Clean up your backyard (bean bag game)    Keep playing area (4m x 3m) “clean” by throwing all bean-bags onto the instructor’s playing area. The instructor will do the same. Game ends when playing area is clean. Instructor will increase/decrease difficulty based on child’s ability
Captain’s coming                          Child stands in the centre of the activity room. Instructor calls out commands involving running, jumping, hopping, crawling
Each trial was 5 mins in length, and all trials were conducted in the research
facilities available at the University of Wollongong. Further details of the data
collection can be found in [55, 177].
During each trial, children wore ActiGraph tri-axial accelerometers (ActiGraph
Corporation, FL, USA) on three body locations (left wrist, right wrist, and right hip)
and a Metamax 3B portable calorimetry system to measure energy expenditure. The
portable calorimetry system was calibrated according to the manufacturer’s instructions. The
sampling rate and dynamic range of the accelerometer data were 100 Hz and ±8 g
respectively. The portable calorimetry system recorded breath-by-breath oxygen uptake (VO2).
7.3.2 Pre-Processing
Both accelerometer and reference VO2 data were converted into a time-series
data structure. Using the timestamps, the data for each trial were separated.
In addition, 60 s of data at the beginning and end of each trial were discarded from the
analysis to remove non-steady-state data [18, 161, 162].
To ensure the validity of the portable calorimetry data, a screening algorithm
was applied to identify biologically implausible VO2 values (i.e. below the typical
resting value) and replace those entries with the nearest valid entries. Then, the
breath-by-breath VO2 data were resampled to 100 Hz using 1D spline
interpolation [18]. Finally, VO2 was converted into units of EE (kcal/min) using the
constant 1 L O2 = 4.825 kcal [178].
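The screening and unit-conversion steps can be sketched as follows. This is an illustration under stated assumptions: VO2 is assumed to be expressed in L/min, "nearest" is taken to mean nearest in sample order (ties resolved to the earlier sample), and the threshold and example values are hypothetical because the resting value used is not given. The spline resampling step is omitted.

```python
KCAL_PER_LITRE_O2 = 4.825  # conversion constant quoted in the text


def replace_implausible(vo2, min_valid):
    """Replace values below min_valid with the nearest valid value in the series."""
    valid = [i for i, v in enumerate(vo2) if v >= min_valid]
    if not valid:
        raise ValueError("no valid VO2 samples")
    cleaned = []
    for i, v in enumerate(vo2):
        if v >= min_valid:
            cleaned.append(v)
        else:
            # min() keeps the first (earlier) index when two are equally close.
            nearest = min(valid, key=lambda j: abs(j - i))
            cleaned.append(vo2[nearest])
    return cleaned


def vo2_to_ee(vo2_l_per_min):
    """Convert oxygen uptake (L/min, assumed unit) to energy expenditure (kcal/min)."""
    return vo2_l_per_min * KCAL_PER_LITRE_O2


# Hypothetical breath-by-breath values and threshold, not study data.
raw_vo2 = [0.30, 0.05, 0.32, 0.02, 0.02, 0.40]
clean_vo2 = replace_implausible(raw_vo2, min_valid=0.10)
ee = [vo2_to_ee(v) for v in clean_vo2]
```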
To obtain a reasonable window size with 1024 accelerometer samples
per window, this work applied a sliding window of 10.24 seconds with 50% overlap.
The selected window size is similar to those used in previous studies [47, 103, 179]. This
approach ensured that a sufficient number of samples was available for training the
algorithms. The average energy expenditure was calculated for each window as the
corresponding ground-truth energy expenditure. In our dataset, the average energy
expenditure ranged from 0.65 to 5.78 kcal/min.
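The trimming and windowing can be sketched as below, assuming a 100 Hz signal stored as a plain list; the function names are hypothetical. At 100 Hz, a 10.24 second window holds 1024 samples and a 50% overlap gives a step of 512 samples.

```python
def trim_steady_state(samples, rate_hz=100, trim_s=60):
    """Drop the first and last trim_s seconds of a trial."""
    cut = rate_hz * trim_s
    return samples[cut:len(samples) - cut]


def sliding_windows(samples, size=1024, overlap=0.5):
    """Split a signal into fixed-size windows with fractional overlap."""
    step = int(size * (1 - overlap))
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]


def window_mean(window):
    """Mean of a window, e.g. the ground-truth EE for that window."""
    return sum(window) / len(window)


# A hypothetical 5-minute trial at 100 Hz, using sample indices as values.
trial = list(range(5 * 60 * 100))
steady = trim_steady_state(trial)
windows = sliding_windows(steady)
```

For one complete 5-minute trial this yields 18,000 steady-state samples and 34 windows; the per-child totals reported in the text depend on the data actually retained.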
After the pre-processing step, a total of 2472 windows were available for
training. The average number of windows per child was 309 ± 22.7. As each window
consists of 1024 samples for each of the three raw accelerometer axes, the total size of the
training data was 2472 x 1024 x 3 = 7,593,984 values.
7.4 METHODS
EE was predicted using three different modelling approaches. The first approach
used deep learning on the raw accelerometer signal. The second approach used
conventional machine learning methods, including feature extraction, feature
selection, and regression analysis using established supervised learning algorithms.
The third method represented a simplified approach in which measured EE was
regressed on the mean acceleration signal of each 10.24 second window using least
squares regression. The three approaches were applied to four accelerometer
configurations (LW, RH, RW, and RW+RH).
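The simplified approach amounts to fitting a straight line by ordinary least squares. The sketch below assumes a single predictor (one mean-acceleration value per window); the function name and the numbers are hypothetical and are not study data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical window-level mean acceleration and measured EE (kcal/min).
mean_acc = [0.0, 1.0, 2.0, 3.0]
measured_ee = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(mean_acc, measured_ee)
predicted = [slope * a + intercept for a in mean_acc]
```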
7.4.1 Deep Learning Approach
The deep learning approach used in this study was the Convolutional Neural
Network (CNN), as its architecture is suited to capturing the dimensions of
accelerometer data. To format the data to suit the CNN, a data
transformation process was applied prior to training the CNN architecture, which
automatically extracts features and then applies regression to estimate energy
expenditure.
1) Data Transformation: In this step, each window of the raw accelerometer
data was converted to a 3-dimensional matrix to make it suitable as input to the CNN.
The data transformation technique differed slightly depending on the number of
accelerometers used.
Single accelerometer – For a window (consisting of 1024 3-axis accelerometer
samples), the data for each axis (x, y, or z) were rearranged into a 32 x 32
matrix. The 32 x 32 matrices for x, y, and z were then combined to form a 32 x
32 x 3 matrix, shown in Figure 7.1 (a).
Combination of two accelerometers – The two corresponding windows of the two
accelerometers were merged vertically (2048 3-axis accelerometer samples). Then,
2025 samples were selected for the transformation by removing 12 samples from the
beginning and 11 samples from the end. Finally, the samples were rearranged into a 45 x
45 x 3 matrix. The transformation of combined windows is shown in Figure 7.1 (b).
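The two transformations can be sketched with plain lists. The row-by-row fill order and the channel-first layout (three 32 x 32 or 45 x 45 grids, one per axis) are assumptions, since the text does not state how samples were ordered within each matrix; the function names are hypothetical.

```python
def to_grid(values, side):
    """Rearrange side*side values into a side x side grid, row by row (assumed order)."""
    if len(values) != side * side:
        raise ValueError("unexpected number of samples")
    return [values[r * side:(r + 1) * side] for r in range(side)]


def transform_single(window):
    """1024 (x, y, z) samples -> three 32 x 32 grids, one per axis."""
    return [to_grid([s[axis] for s in window], 32) for axis in range(3)]


def transform_pair(window_a, window_b):
    """Two 1024-sample windows -> three 45 x 45 grids.

    The windows are stacked (2048 samples), then 12 samples are dropped from
    the beginning and 11 from the end, leaving 2025 = 45 * 45 samples.
    """
    merged = window_a + window_b
    kept = merged[12:len(merged) - 11]
    return [to_grid([s[axis] for s in kept], 45) for axis in range(3)]


# Synthetic window whose x value equals the sample index.
demo_window = [(i, -i, 2 * i) for i in range(1024)]
single = transform_single(demo_window)
pair = transform_pair(demo_window, demo_window)
```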
Figure 7.1 The transformed representation of a) a single 3-axis accelerometer window, b) a combination of two accelerometer’s windows
2) CNN Architecture: A CNN involves multiple processing layers comprising
linear and non-linear transformations. Several CNN architectures were implemented to
experimentally select the best-performing one. Based on the results, a 3-stage CNN
architecture modified from [180] was used in this study. The complete CNN
architecture is shown in Figure 7.2. It has an input layer, 3 convolution layers, each
followed by a rectified linear unit layer (ReluLayer) and a max-pooling layer, a fully
connected hidden layer, and a regression layer. For the experiments, the number of epochs
and the initial learning rate were
set to 100 and 0.001 respectively to avoid overfitting of the model. A stochastic
gradient descent with momentum solver was used to train the model.
Layer Description                                                 Feature Map Size
Input                                                             [32 x 32 x 3]
Conv Layer: 16 5x5 filters at stride 1, pad 2 + ReluLayer         [32 x 32 x 16]
MaxPooling Layer: 2x2 at stride 2                                 [16 x 16 x 16]
Conv Layer: 20 5x5 filters at stride 1, pad 2 + ReluLayer         [16 x 16 x 20]
MaxPooling Layer: 2x2 at stride 2                                 [8 x 8 x 20]
Conv Layer: 20 5x5 filters at stride 1, pad 2 + ReluLayer         [8 x 8 x 20]
MaxPooling Layer: 2x2 at stride 2                                 [4 x 4 x 20]
Full connection                                                   1
Regression (prediction of EE, between 0.65 and 5.78 kcal/min)     1
Figure 7.2 The CNN architecture used in this study.
The figure depicts the use of single accelerometer i.e., input vector 32 x 32 x 3
Convolution Layer – To identify the internal temporal patterns of the input
matrix, a number of linear filters are applied. Filters slide over the input spatially and
extract local correlations using the convolution operation with the input matrix. A
filter is simply a matrix of weights with a bias, and is usually spatially small. The size of
the feature map (width = w’, height = h’) after applying the convolutional layer
depends on the size of the input (width = w, height = h, depth = d), the size of the filter
(width = f, height = f, depth = d), the amount of zero-padding (p), and the stride (s).
The formulas are given below:
$$w' = (w - f + 2p)/s + 1 \qquad (2)$$
$$h' = (h - f + 2p)/s + 1 \qquad (3)$$
For example, for our first convolutional layer, the input size was 32 x 32 x 3 (or 45 x
45 x 3 for the combination), and the filter size was set to 5 x 5 x 3. The zero
padding and stride were 2 and 1 respectively. As a result, the feature map’s width and
height remained the same as the input, 32 x 32 (or 45 x 45 for the combination of two
accelerometers).
Rectified Linear Units Layer (ReluLayer) – After each convolutional layer, a
ReluLayer is used to introduce nonlinearity on top of the linear convolution
operation. ReluLayers work better than other nonlinear functions, including tanh and
sigmoid, due to their computational efficiency [181]. A ReluLayer does not change the
size of the input; it simply applies a threshold operation to each element, where any
negative input value is set to zero.
$$f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (4)$$
Max Pooling Layer – The max pooling layer runs on every depth of the input
data and down-samples the input using a max operation. For an input of size (width =
w, height = h, depth = d), the pooling layer (width = f, height = f, and stride = s) reduces
its size to w’ x h’ x d, where
$w' = \frac{w - f}{s} + 1$ (5)
$h' = \frac{h - f}{s} + 1$ (6)
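Equations (5) and (6) and the max operation can be sketched for a single depth slice as follows. The pooling size f = 2 and stride s = 2 used in the example are illustrative assumptions, not values taken from this study, and the function names are hypothetical.

```python
def pool_output_size(w: int, f: int, s: int) -> int:
    """Pooled size along one dimension: (w - f) / s + 1."""
    return (w - f) // s + 1


def max_pool_2d(channel, f: int, s: int):
    """Max pooling over one depth slice given as a list of rows."""
    h, w = len(channel), len(channel[0])
    out_h, out_w = pool_output_size(h, f, s), pool_output_size(w, f, s)
    return [
        [
            max(channel[i * s + di][j * s + dj] for di in range(f) for dj in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


channel = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 8],
]
pooled = max_pool_2d(channel, f=2, s=2)  # illustrative 2 x 2 pooling with stride 2
print(pooled)
```

Applying the same function to every depth slice keeps the depth d unchanged while reducing the width and height.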
Fully Connected Hidden Layer and Regression Layer – Finally, an MLP-based
fully connected layer followed by a regression layer is applied to the extracted features.
In our case, the output size of the fully connected layer was set equal to the number of
response variables, which was 1.
7.4.2 Conventional Supervised Learning Approach
Unlike the CNN learning technique, the conventional machine learning approach
consists of feature extraction, feature selection and regression analysis using
established supervised learning algorithms. To compare with the deep learning
approach, a total of 27 conventional models (shown in Figure 7.3) were developed for
each accelerometer configuration by varying the feature selection algorithm, the
number of selected features, and the supervised learning algorithm. The best model
for each accelerometer configuration was then selected for the comparison.
Figure 7.3. Conventional approach design for each accelerometer location and a combination
1) Feature Extraction: Feature extraction utilised domain knowledge and
extracted 46 time and frequency domain features from each window of accelerometer
data. Most of the extracted features were adopted from our previous studies [161, 162],
and are listed in Table 7.2.
Table 7.2 List of features extracted from each window of an accelerometer
No   Features                                              Feature Count
1    Vector magnitude of the accelerometer                 1
2    Mean for each axis of a 3-axis accelerometer          3
3    Standard deviation for each axis of accelerometer     3
4    Minimum value for each axis                           3
5    Maximum value for each axis                           3
6    Variance for each axis                                3
7    Median value for each axis                            3
8    Skewness for each axis                                3
9    Kurtosis for each axis                                3
10   Energy for each axis                                  3
11   Cross-correlation of accelerometer axes               3
12   Principal frequency for each axis                     3
13   Magnitude of principal frequency for each axis        3
14   Median crossing for each axis                         3
15   25th percentile for each axis                         3
16   75th percentile for each axis                         3
     Total number of features extracted                    46
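To make the per-axis features in Table 7.2 concrete, the sketch below computes eight of the sixteen feature groups (mean, standard deviation, minimum, maximum, variance, median, 25th and 75th percentiles) for one window. It is a partial illustration only: the remaining features are omitted because their exact definitions are not given in this section, and the function name, the (samples x 3) window layout, and the use of population (not sample) standard deviation are assumptions.

```python
import numpy as np


def basic_axis_features(window):
    """Return 24 values: 8 statistics for each of the 3 axes of one window.

    window: array-like of shape (n_samples, 3), columns assumed to be x, y, z.
    """
    window = np.asarray(window, dtype=float)
    stats = [
        window.mean(axis=0),
        window.std(axis=0),               # population standard deviation (assumption)
        window.min(axis=0),
        window.max(axis=0),
        window.var(axis=0),
        np.median(window, axis=0),
        np.percentile(window, 25, axis=0),
        np.percentile(window, 75, axis=0),
    ]
    return np.concatenate(stats)


window = [[0, 1, 2], [2, 1, 0], [4, 1, 2], [2, 1, 4]]  # toy window, 4 samples
features = basic_axis_features(window)
print(features.shape)
```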
2) Feature Selection: Three feature selection algorithms, namely minimum-
redundancy–maximum-relevance feature selection (MRMR) [165], correlation-based
feature selection (CFS) [133], and ReliefF feature selection [182, 183], were applied
to the extracted features. When two accelerometers were fused, the features of each
accelerometer were concatenated horizontally for each window, and the feature
selection algorithms were then applied. The number of selected features was also
varied among 10, 15, and 20 during the experiment. Before the selected features were
passed to the regression, all features were normalised to zero mean and unit variance
using a linear method.
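The horizontal fusion and the zero-mean, unit-variance normalisation described above can be sketched as follows. The function names are hypothetical, the random feature matrices are placeholders for real extracted features, and estimating the mean and standard deviation on the training portion only is an assumption; the text does not state how the statistics were estimated.

```python
import numpy as np


def fuse_horizontally(features_a, features_b):
    """Concatenate per-window feature rows from two accelerometers."""
    return np.hstack([features_a, features_b])


def zscore(train, test):
    """Linear scaling to zero mean and unit variance using training statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
wrist = rng.normal(size=(6, 46))  # placeholder: 6 windows x 46 features
hip = rng.normal(size=(6, 46))
fused = fuse_horizontally(wrist, hip)
train_z, test_z = zscore(fused[:4], fused[4:])
print(fused.shape, train_z.shape, test_z.shape)
```

In the pipeline described above, feature selection would be applied to the fused matrix before normalisation; that step is omitted from this sketch.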
3) Supervised Learning Algorithms: The supervised learning algorithms used
in this study included multiple linear regression (MLR), support vector machine
regression (SVMR), and neural network regression (NNR). In our implementation,
an RBF kernel function was used for SVMR, and one hidden layer was used for NNR.
The number of nodes in the hidden layer was half the number of input nodes. The
maximum number of iterations/epochs and the learning rate were set to 250 and 0.001,
respectively.
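The 27 conventional models per accelerometer configuration follow from combining the three feature selection algorithms, the three feature counts, and the three learning algorithms. A minimal sketch of that enumeration is given below; the variable names are illustrative, and the strings are labels only, not calls into any particular library.

```python
from itertools import product

selectors = ["MRMR", "CFS", "ReliefF"]
feature_counts = [10, 15, 20]
regressors = ["MLR", "SVMR", "NNR"]

# Every (selector, number of features, regressor) combination is one model.
configs = list(product(selectors, feature_counts, regressors))
print(len(configs))
for selector, k, regressor in configs[:3]:
    print(selector, k, regressor)
```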
7.4.3 Simplified Approach
In the simplified approach, only three features, namely mean(x), mean(y), and
mean(z), were extracted from each accelerometer window. Then, a least squares
regression was applied to estimate EE. For the two-accelerometer combination, the
features of the two accelerometer windows were merged horizontally, giving 6
features. The fused features were then entered into the least squares regression.
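The simplified approach can be sketched as an ordinary least squares fit on the per-axis window means. The sketch below uses synthetic data, hypothetical function names, and includes an intercept term, which is an assumption since the text does not say whether one was used.

```python
import numpy as np


def mean_features(windows):
    """windows: (n_windows, n_samples, 3) -> (n_windows, 3) of mean(x), mean(y), mean(z)."""
    return np.asarray(windows, dtype=float).mean(axis=1)


def fit_least_squares(features, ee):
    """Ordinary least squares with an intercept column (assumption)."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, ee, rcond=None)
    return coef


def predict(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


rng = np.random.default_rng(1)
windows = rng.normal(size=(20, 50, 3))        # synthetic accelerometer windows
X = mean_features(windows)
true_coef = np.array([1.0, 2.0, -1.0, 0.5])   # synthetic ground truth
ee = true_coef[0] + X @ true_coef[1:]          # noise-free synthetic EE values
coef = fit_least_squares(X, ee)
print(np.round(coef, 3))
```

For the two-accelerometer combination, the two (n_windows, 3) feature matrices would be stacked horizontally into an (n_windows, 6) matrix before fitting.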
7.5 PERFORMANCE EVALUATION
To evaluate the regression task, both root-mean-square error (RMSE) and the
coefficient of determination (R2) were used as performance metrics. For n predictions,
if $y'_t$ are the values predicted by the model and $y_t$ are the original values, the
RMSE and R2 can be calculated using the following equations:
$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y'_t - y_t)^2}{n}}$ (7)
$R^2 = 1 - \frac{\sum_{t=1}^{n} (y'_t - y_t)^2}{\sum_{t=1}^{n} (y_t - \mathrm{mean}(y_t))^2}$ (8)
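Equations (7) and (8) translate directly into code. The following is a minimal sketch with illustrative function names and made-up example values.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over n predictions, as in Equation (7)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def r_squared(predicted, actual):
    """Coefficient of determination, as in Equation (8)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [2.0, 3.0, 4.0, 5.0]       # made-up reference values
predicted = [2.5, 3.0, 3.5, 5.0]    # made-up model outputs
print(rmse(predicted, actual), r_squared(predicted, actual))
```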
The subject/fold-wise performances of each accelerometer configuration under the
three approaches were tested for statistical significance using one-way repeated
measures ANOVA. In addition, LSD post hoc tests were carried out to identify the
differences between approaches.
To understand the models’ performance on new data, this study used leave-one-
subject-out cross validation, where data from one subject are used for testing and the
other subjects’ samples are used for training. In this way, the samples of each subject
are used exactly once for testing. The averaged performance metrics and their standard
deviations are used as the final performance metrics.
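The leave-one-subject-out procedure can be sketched as a generator of train/test index splits, followed by averaging the per-fold metrics. The function names and subject labels are hypothetical, the fold scores are made-up numbers, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def summarise(fold_scores):
    """Mean and sample standard deviation of the per-fold metric."""
    return mean(fold_scores), stdev(fold_scores)


subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]  # one label per window
folds = list(leave_one_subject_out(subject_ids))
for held_out, train, test in folds:
    print(held_out, train, test)

avg, sd = summarise([0.5, 0.6, 0.7])  # made-up per-fold RMSE values
print(avg, sd)
```

Because the split is by subject rather than by window, no window from the test subject is ever seen during training.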
7.6 RESULTS AND DISCUSSION
Figure 7.4 shows the results (RMSE and R2) of CNN, conventional supervised
learning and simplified regression approaches for each accelerometer configuration.
For the conventional supervised learning approaches, the best model for each
accelerometer configuration is presented.
Figure 7.4 Results of the approaches for each accelerometer location and a combination using (a) RMSE and (b) R2
7.6.1 Evaluation of Deep Learning Approach
The best performance was obtained at the RH location (RMSE: 0.54 kcals/min,
R2: 0.71). Among the wrist locations, RW provided marginally better performance
than LW.
When the two best single accelerometer locations were combined (RW+RH), the
performance did not improve relative to the performance at the hip. This finding is
consistent with previous works, which found that combining outputs from multiple
accelerometer locations did not improve EE prediction [103, 184].
7.6.2 Evaluation of Conventional Approach
The best conventional regression model for LW used the MRMR feature selection
algorithm to select 10 features and SVMR for EE prediction (RMSE: 0.63 kcals/min
and R2: 0.62). For the RW location, the CFS feature selection algorithm with 10
features and the NNR learning algorithm provided the best conventional model
(RMSE: 0.58 kcals/min and R2: 0.66). The best conventional model for RH utilized
the CFS algorithm to select 20 features and NNR for EE prediction (RMSE: 0.55
kcals/min and R2: 0.71). The best model for RW+RH used the MRMR feature
selection algorithm to select 10 features and NNR for EE prediction (RMSE: 0.53
kcals/min and R2: 0.71).
The pattern of results is similar to that of the CNN approach. The best single
location for energy expenditure estimation was RH, and the RW location provided
better performance than LW. Combining RW+RH only marginally improved the
regression performance.
7.6.3 Evaluation of Simplified Regression Approach
Consistent with other approaches, the RH provided the best performance. RW
only gave marginally better performance than LW. However, combining the two
accelerometers (RW+RH) resulted in substantial performance improvement.
7.6.4 Comparison of Approaches
The best models (lowest RMSE) for the wrist locations and RW+RH
combination were obtained using the conventional supervised learning approach.
However, for the RH location, CNN provided the lowest RMSE of 0.54 kcals/min.
The simplified regression approach showed poor performance compared to the others.
Using one-way repeated measures ANOVA, statistically significant differences
were observed between the approaches for each accelerometer configuration. For
example, at the hip location, mean RMSE differed significantly between the three
approaches (Wilks’ lambda = 0.160, F(2,6) = 15.798, P = 0.04). Least significant
difference (LSD) post hoc comparisons revealed the same result for all accelerometer
configurations. In all locations, both the CNN and conventional approaches provided
significantly better performance than the simplified approach, while there were no
statistically significant differences between the CNN and conventional machine
learning approaches.
Although the CNN models did not exceed the best performing conventional
supervised learning models in most situations, CNN is advantageous as it eliminates
the complex feature extraction and selection steps, which require extensive domain
knowledge. In addition, the best features for one model often do not perform well in
another similar model. CNN performance is also generally expected to improve as the
amount of available training data increases.
7.7 CONCLUSION
This study compared deep-learning/CNN models with the best conventional
supervised learning models and simple least squares regression for different
accelerometer configurations in pre-school children. Evaluation was performed using
a dataset collected from eight preschool-aged children performing 10 simulated free-
living activities.
For all accelerometer configurations, both deep learning and conventional
supervised learning algorithms provided comparable performance with no statistically
significant differences. However, the simplified approach showed significantly poorer
performance than the others. This finding suggests that deep learning can be used as
an alternative tool for energy expenditure prediction, one that eliminates the need for
feature engineering (i.e. feature extraction and selection).
Among the accelerometer configurations, the right hip placement provided
consistently better performance than the wrist placements. The combination of right
hip and right wrist did not exceed the performance of the right hip alone. Thus,
adding more sensors did not result in improved energy expenditure prediction. As the
use of multiple accelerometers reduces user compliance, this finding supports the
use of a single hip accelerometer for predicting energy expenditure.
Although the study had an adequate amount of accelerometer data to train
machine learning algorithms, the dataset involved only a small number of participants.
The dataset was collected as part of a larger study investigating sensor-enabled
prediction of PA and EE in preschool children. Based on these encouraging results,
this work will be extended and applied to larger datasets in the future. Future work
should also investigate other data transformation strategies for accelerometer data
before they are fed to a CNN, and use completely free-living datasets to compare the
approaches.
7.8 ACKNOWLEDGMENT
This research was supported by an Australian Research Council (ARC) Discovery
grant for the project “Modelling active play in preschool children using machine
learning” at the University of Wollongong (DP150100116).
Chapter 8: Conclusion and Future Work
This thesis contributed to the development of advanced learning models and a
fusion algorithm to improve the prediction of physical activity and its personal impacts
(including relative physical activity intensity, and energy expenditure) from wearable
sensor data. In addition, it identified the optimal sensor positioning and optimal
combination of multimodal data for assessing physical activity and predicting its
impacts. The major achievements are summarised as follows.
8.1 SUMMARY OF ACHIEVEMENTS
The first contribution, presented in Chapter 3, is showing that activity
recognition accuracy can be improved through the implementation of ensemble
learning methods. A custom ensemble using weighted majority voting to fuse the
decisions of four widely used “state-of-the-art” classification algorithms consistently
outperformed the constituent base classifiers and most conventional ensemble models.
Of the three decision fusion techniques examined in the custom ensemble, weighted
majority voting provided marginally better performance than NB fusion and
significantly outperformed BKS fusion. The conventional ensemble methods, such as
bagging, boosting, and random forests, improved activity recognition in most, but not
all, situations.
The second contribution, presented in Chapter 4, is proposing the use of a novel
posterior-adapted class-based weighted decision fusion for physical activity
recognition based on data from multiple accelerometers. This method provided
significant improvements in performance, and outperformed other fusions such as
model-based and class-based weighted fusion. It showed that decision fusion with two
accelerometers, especially ankle and wrist, can significantly improve the average
performance compared to the use of a single accelerometer. The decision fusion of 3
accelerometers did not show further improvement from the best combination of 2
accelerometers.
The third contribution, presented in Chapters 5 and 6, is demonstrating the use of
multi-modal physiological data with machine learning methods for the prediction of
relative physical activity intensity. The results showed that the features extracted from
the RR-interval provided the highest relative physical activity intensity prediction
performance among the single modalities (including heart-rate, electrodermal
activity, and temperature). It also identified that the features extracted from
electrodermal activity and temperature are not good predictors by themselves, but they
can provide additional information and improve prediction performance when
combined with RR or heart-rate using feature fusion.
The fourth contribution, presented in Chapter 7, is proposing the use of deep
learning for predicting energy expenditure using accelerometer sensor data. This
method can eliminate the need to manually design the feature extraction and selection
process. Based on the results, deep learning achieved performance similar to that of
conventional supervised learning models, and significantly outperformed simple
regression models. In addition, the right hip was found to be a
better location for accelerometer placement for energy expenditure prediction
compared to the wrist locations (left and right wrist). The use of multiple
accelerometers did not necessarily improve the energy expenditure prediction
performance.
8.2 LIMITATIONS
The following research limitations warrant consideration.
Although this thesis utilised a diverse set of activity datasets to develop and test
the prediction models, none of these datasets are truly free-living datasets. All of the
datasets were either collected in a laboratory or simulated free-living environments
using predetermined activity sequences.
Our goal was to provide accurate estimation of physical activity classes, relative
physical activity intensity and energy expenditure for healthy young adults and
children. Additional investigation is required to check the suitability of the proposed
models for other groups such as the elderly, the obese and persons affected by chronic
diseases.
This work mainly utilised the movement, physiological, and person-level profile
data as the inputs of the model. The context during the data collection (e.g.,
environment, location, and time) was either controlled or efforts were made to keep it
consistent. Therefore, this study did not consider the contextual features in the model.
However, in the future, in a varying context, these contextual features should be
considered.
The proposed methods, such as ensembles and deep learning, are computationally
complex. Our methods are proposed only for offline data analysis, where accuracy
is more important than computational cost. It is also worth mentioning that,
in the age of cloud computing, computing resources are more available than ever.
Although most ensembles can take longer to train, the test time is still small or
reasonable. Therefore, in the future it should be possible to improve the algorithms for
faster response time and deploy them in the cloud.
The sample sizes of the datasets used in this study were adequate (Appendix F).
However, the study could be repeated on a much larger dataset involving a larger
number of participants and activity trials.
8.3 FUTURE WORK
This study can inform future work on mobile-based personal exercise
coaching and the development of safer exercise or working strategies. This study can
be expanded in the following ways.
The proposed algorithms could be evaluated in the true free-living context and
using a wide range of activities. The approaches undertaken in this research could be
applied and evaluated in other population groups such as the elderly and individuals
with chronic diseases.
In this study, all the experiments utilised a diverse range of physical activity
datasets that included a number of everyday, ambulatory and non-ambulatory
activities. In the future, a study can be conducted to investigate the utility of the
proposed algorithms on collapsed activities or broader groups of activities (e.g.
sedentary, walking, transport, chores, sport).
The algorithms can be evaluated over a long period of time (longitudinal study)
at home to investigate the suitability of the proposed sensor-based methods in a day-to-
day environment. This can support future work on personal exercise coaching.
The algorithms can be incorporated into a mobile application to accurately
predict, show, and track physical activity and its impacts in real time. In this regard,
the algorithms should be improved for faster response time and efficient output, and
deployed in the cloud.
The algorithms can be evaluated in experiments that explore additional multimodal
sensors that have emerged since this study was completed.
The algorithms can eventually be trialled in partnership with wellness/coaching
programs to see how our system can assist in refining the programs over a period of
time by iteratively designing the physical activity.
Bibliography
[1] World Health Organization. (8 Nov 2017). Global Strategy on Diet, Physical
Activity and Health. Available: http://www.who.int/dietphysicalactivity/pa/en/
[2] I.-M. Lee, E. J. Shiroma, F. Lobelo, P. Puska, S. N. Blair, P. T. Katzmarzyk, et
al., "Effect of physical inactivity on major non-communicable diseases
worldwide: an analysis of burden of disease and life expectancy," The lancet,
vol. 380, pp. 219-229, 2012.
[3] A. E. Field, E. H. Coakley, A. Must, J. L. Spadano, N. Laird, W. H. Dietz, et
al., "Impact of overweight on the risk of developing common chronic diseases
during a 10-year period," Archives of internal medicine, vol. 161, pp. 1581-
1586, 2001.
[4] S. B. Eaton and S. B. Eaton, "Physical Inactivity, Obesity, and Type 2
Diabetes: An Evolutionary Perspective," Research Quarterly for Exercise and
Sport, vol. 88, pp. 1-8, 2017.
[5] F. W. Booth, M. V. Chakravarthy, S. E. Gordon, and E. E. Spangenburg,
"Waging war on physical inactivity: using modern molecular ammunition
against an ancient enemy," Journal of Applied Physiology, vol. 93, pp. 3-30,
2002.
[6] S. Arent, M. Landers, and J. Etnier, "The effects of exercise on mood in older
adults: a meta-analytic," J. Ageing Phys. Act, vol. 8, pp. 407-430, 2000.
[7] M. Teychenne, K. Ball, and J. Salmon, "Physical activity and likelihood of
depression in adults: a review," Preventive medicine, vol. 46, pp. 397-411,
2008.
[8] M. Reiner, C. Niermann, D. Jekauc, and A. Woll, "Long-term health benefits
of physical activity–a systematic review of longitudinal studies," BMC public
health, vol. 13, p. 813, 2013.
[9] F. Gómez-Gallego, J. R. Ruiz, A. Buxens, S. Altmäe, M. Artieda, C. Santiago,
et al., "Are elite endurance athletes genetically predisposed to lower disease
risk?," Physiological genomics, vol. 41, pp. 82-90, 2010.
[10] I. Janssen and A. G. LeBlanc, "Systematic review of the health benefits of
physical activity and fitness in school-aged children and youth," International
journal of behavioral nutrition and physical activity, vol. 7, p. 40, 2010.
[11] World Health Organization. (8 Nov 2017). Physical inactivity a leading cause
of disease and disability, warns WHO. Available:
http://www.who.int/mediacentre/news/releases/release23/en/
[12] W. Whang, J. E. Manson, F. B. Hu, et al., "Physical exertion, exercise,
and sudden cardiac death in women," JAMA, vol. 295, pp. 1399-1403, 2006.
[13] T. Mann, R. P. Lamberts, and M. I. Lambert, "Methods of prescribing relative
exercise intensity: physiological and practical considerations," Sports
medicine, vol. 43, pp. 613-625, 2013.
[14] A. E. Bauman, R. S. Reis, J. F. Sallis, J. C. Wells, R. J. Loos, B. W. Martin, et
al., "Correlates of physical activity: why are some people physically active and
others not?," The lancet, vol. 380, pp. 258-271, 2012.
[15] World Health Organization. (2011, 3 Sep 2015). Information sheet: global
recommendations on physical activity for health 18 - 64 years old. Available:
http://www.who.int/dietphysicalactivity/publications/recommendations18_64
yearsold/en/
[16] J. Parkka, M. Ermes, K. Antila, M. van Gils, A. Manttari, and H. Nieminen,
"Estimating intensity of physical activity: a comparison of wearable
accelerometer and gyro sensors and 3 sensor locations," in Engineering in
Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International
Conference of the IEEE, 2007, pp. 1511-1514.
[17] N. Vyas, J. Farringdon, D. Andre, and J. I. Stivoric, "Machine learning and
sensor fusion for estimating continuous energy expenditure," AI Magazine, vol.
33, p. 55, 2012.
[18] M. Altini, J. Penders, R. Vullers, and O. Amft, "Combining wearable
accelerometer and physiological data for activity and energy expenditure
estimation," in Proceedings of the 4th Conference on Wireless Health, 2013,
p. 1.
[19] S. G. Trost, B. S. Fees, S. J. Haar, A. D. Murray, and L. K. Crowe,
"Identification and validity of accelerometer cut‐points for toddlers," Obesity,
vol. 20, pp. 2317-2319, 2012.
[20] M. Altini, J. Penders, R. Vullers, and O. Amft, "Automatic Heart Rate
Normalization for Accurate Energy Expenditure Estimation," Methods Inf
Med, vol. 53, pp. 382-388, 2014.
[21] M. Altini, J. Penders, and O. Amft, "Energy expenditure estimation using
wearable sensors: a new methodology for activity-specific models," in
Proceedings of the conference on Wireless Health, 2012, p. 1.
[22] S. Brage, U. Ekelund, N. Brage, M. A. Hennings, K. Froberg, P. W. Franks, et
al., "Hierarchy of individual calibration levels for heart rate and accelerometry
to measure physical activity," Journal of Applied Physiology, vol. 103, pp. 682-
692, 2007.
[23] S. J. Preece, J. Y. Goulermas, L. P. Kenney, D. Howard, K. Meijer, and R.
Crompton, "Activity identification using body-mounted sensors—a review of
classification techniques," Physiological measurement, vol. 30, p. R1, 2009.
[24] K. Ellis, J. Kerr, S. Godbole, J. Staudenmayer, and G. Lanckriet, "Hip and
Wrist Accelerometer Algorithms for Free-Living Behavior Classification,"
Medicine and science in sports and exercise, vol. 48, pp. 933-940, 2016.
[25] J. Staudenmayer, S. He, A. Hickey, J. Sasaki, and P. Freedson, "Methods to
estimate aspects of physical activity and sedentary behavior from high-
frequency wrist accelerometer measurements," Journal of Applied Physiology,
vol. 119, pp. 396-403, 2015.
[26] A. Mannini, S. S. Intille, M. Rosenberger, A. M. Sabatini, and W. Haskell,
"Activity recognition using a single accelerometer placed at the wrist or ankle,"
Medicine and science in sports and exercise, vol. 45, p. 2193, 2013.
[27] M. J. Mathie, A. C. Coster, N. H. Lovell, and B. G. Celler, "Accelerometry:
providing an integrated, practical method for long-term, ambulatory
monitoring of human movement," Physiological measurement, vol. 25, p. R1,
2004.
[28] I. Cleland, B. Kikhia, C. Nugent, A. Boytsov, J. Hallberg, K. Synnes, et al.,
"Optimal placement of accelerometers for the detection of everyday activities,"
Sensors, vol. 13, pp. 9183-9200, 2013.
[29] S. Chernbumroong, A. S. Atkins, and H. Yu, "Activity classification using a
single wrist-worn accelerometer," in Software, Knowledge Information,
Industrial Management and Applications (SKIMA), 2011 5th International
Conference on, 2011, pp. 1-6.
[30] S. G. Trost, Y. Zheng, and W.-K. Wong, "Machine learning for activity
recognition: hip versus wrist data," Physiological measurement, vol. 35, p.
2183, 2014.
[31] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision
fusion and feature fusion strategies for pattern classification," IETE Technical
review, vol. 27, pp. 293-307, 2010.
[32] M. Soleymani, M. Pantic, and T. Pun, "Multimodal emotion recognition in
response to videos," Affective Computing, IEEE Transactions on, vol. 3, pp.
211-223, 2012.
[33] N. E. Miller, S. J. Strath, A. M. Swartz, and S. E. Cashin, "Estimating absolute
and relative physical activity intensity across age via accelerometry in adults,"
Journal of aging and physical activity, vol. 18, p. 158, 2010.
[34] C. Ozemek, H. L. Cochran, S. J. Strath, W. Byun, and L. A. Kaminsky,
"Estimating relative intensity using individualized accelerometer cutpoints: the
importance of fitness level," BMC medical research methodology, vol. 13, p.
53, 2013.
[35] G. Borg, Borg's perceived exertion and pain scales vol. viii. Champaign, IL,
US: Human Kinetics, 1998.
[36] J. Scherr, B. Wolfarth, J. W. Christle, A. Pressler, S. Wagenpfeil, and M. Halle,
"Associations between Borg’s rating of perceived exertion and physiological
measures of exercise intensity," European journal of applied physiology, vol.
113, pp. 147-155, 2013.
[37] M. J. Chen, X. Fan, and S. T. Moe, "Criterion-related validity of the Borg
ratings of perceived exertion scale in healthy individuals: a meta-analysis,"
Journal of sports sciences, vol. 20, pp. 873-899, 2002.
[38] Y.-L. Chen, C.-C. Chen, P.-Y. Hsia, and S.-K. Lin, "Relationships of Borg's
RPE 6-20 scale and heart rate in dynamic and static exercises among a sample
of young Taiwanese men.," Perceptual & Motor Skills, vol. 117, pp. 971-982,
2013.
[39] U. M. Kujala, J. Pietilä, T. Myllymäki, S. Mutikainen, T. Föhr, I. Korhonen, et
al., "Physical Activity: Absolute Intensity versus Relative-to-Fitness-Level
Volumes," Medicine and science in sports and exercise, vol. 49, pp. 474-481,
2017.
[40] M. S. Fairbarn, S. P. Blackie, N. G. McElvaney, B. R. Wiggs, P. D. Pare, and
R. L. Pardy, "Prediction of heart rate and oxygen uptake during incremental
and maximal exercise in healthy adults," Chest, vol. 105, pp. 1365-1369, 1994.
[41] H. Tanaka, K. D. Monahan, and D. R. Seals, "Age-predicted maximal heart
rate revisited," Journal of the American College of Cardiology, vol. 37, pp.
153-156, 2001.
[42] J. Zhu, A. Pande, P. Mohapatra, and J. J. Han, "Using deep learning for energy
expenditure estimation with wearable sensors," in E-health Networking,
Application & Services (HealthCom), 2015 17th International Conference on,
2015, pp. 501-506.
[43] P. S. Freedson, E. Melanson, and J. Sirard, "Calibration of the Computer
Science and Applications, Inc. accelerometer," Medicine and science in sports
and exercise, vol. 30, pp. 777-781, 1998.
[44] J. Staudenmayer, D. Pober, S. Crouter, D. Bassett, and P. Freedson, "An
artificial neural network to estimate physical activity energy expenditure and
identify physical activity type from an accelerometer," Journal of Applied
Physiology, vol. 107, pp. 1300-1307, 2009.
[45] A. H. Montoye, B. Dong, S. Biswas, and K. A. Pfeiffer, "Validation of a
wireless accelerometer network for energy expenditure measurement," Journal
of sports sciences, vol. 34, pp. 2130-2139, 2016.
[46] A. Pande, Y. Zeng, A. K. Das, P. Mohapatra, S. Miyamoto, E. Seto, et al.,
"Energy expenditure estimation with smartphone body sensors," in
Proceedings of the 8th International Conference on Body Area Networks,
2013, pp. 8-14.
[47] S. G. Trost, W.-K. Wong, K. A. Pfeiffer, and Y. Zheng, "Artificial neural
networks to predict activity type and energy expenditure in youth," Medicine
and science in sports and exercise, vol. 44, p. 1801, 2012.
[48] A. H. Montoye, M. Begum, Z. Henning, and K. A. Pfeiffer, "Comparison of
linear and non-linear models for predicting energy expenditure from raw
accelerometer data," Physiological measurement, vol. 38, p. 343, 2017.
[49] A. Pande, G. Casazza, A. Nicorici, E. Seto, S. Miyamoto, M. Lange, et al.,
"Energy expenditure estimation in boys with duchene muscular dystrophy
using accelerometer and heart rate sensors," in Healthcare Innovation
Conference (HIC), 2014 IEEE, 2014, pp. 26-29.
[50] S. Liu, R. X. Gao, D. John, J. Staudenmayer, and P. S. Freedson, "SVM-based
multi-sensor fusion for free-living physical activity assessment," in
Engineering in Medicine and Biology Society, EMBC, 2011 Annual
International Conference of the IEEE, 2011, pp. 3188-3191.
[51] S. J. Preece, J. Y. Goulermas, L. P. Kenney, and D. Howard, "A comparison
of feature extraction methods for the classification of dynamic activities from
accelerometer data," Biomedical Engineering, IEEE Transactions on, vol. 56,
pp. 871-879, 2009.
[52] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time
object detection with region proposal networks," in Advances in neural
information processing systems, 2015, pp. 91-99.
[53] M. Hagenbuchner, D. P. Cliff, S. G. Trost, N. Van Tuc, and G. E. Peoples,
"Prediction of activity type in preschool children using machine learning
techniques," Journal of Science and Medicine in Sport, vol. 18, pp. 426-431,
2015.
[54] M. Brandes, B. Steenbock, and N. Wirsik, "Energy Cost of Common Physical
Activities in Preschoolers," Journal of Physical Activity and Health, vol. 20,
pp. 1-6, 2017.
[55] S. G. Trost, D. Cliff, M. Ahmadi, N. Van Tuc, and M. Hagenbuchner, "Sensor-
enabled activity class recognition in preschoolers: Hip versus wrist data,"
Medicine and science in sports and exercise, 2017.
[56] R. J. Kate, A. M. Swartz, W. A. Welch, and S. J. Strath, "Comparative
evaluation of features and techniques for identifying activity type and
estimating energy cost from accelerometer data," Physiological measurement,
vol. 37, p. 360, 2016.
[57] N. F. Butte, U. Ekelund, and K. R. Westerterp, "Assessing physical activity
using wearable monitors: measures of physical activity," Medicine and science
in sports and exercise, vol. 44, pp. S5-12, 2012.
[58] L. Kallings, M. Leijon, M. L. Hellénius, and A. Ståhle, "Physical activity on
prescription in primary health care: a follow‐up of physical activity level and
quality of life," Scandinavian journal of medicine & science in sports, vol. 18,
pp. 154-161, 2008.
[59] E. A. Awick, D. K. Ehlers, S. Aguiñaga, A. M. Daugherty, A. F. Kramer, and
E. McAuley, "Effects of a randomized exercise trial on physical activity,
psychological distress and quality of life in older adults," General Hospital
Psychiatry, 2017.
[60] J. Erlichman, A. Kerbey, and W. James, "Physical activity and its impact on
health outcomes. Paper 2: Prevention of unhealthy weight gain and obesity by
physical activity: an analysis of the evidence," Obesity reviews, vol. 3, pp. 273-
287, 2002.
[61] G. Erikssen, K. Liestøl, J. Bjørnholt, E. Thaulow, L. Sandvik, and J. Erikssen,
"Changes in physical fitness and changes in mortality," The Lancet, vol. 352,
pp. 759-762, 1998.
[62] K. E. Powell, A. E. Paluch, and S. N. Blair, "Physical activity for health: What
kind? How much? How intense? On top of what?," Public Health, vol. 32, p.
349, 2011.
[63] The Department of Health Australia. (2017, 2 Jan 2018). Australia's Physical
Activity and Sedentary Behaviour Guidelines. Available:
http://www.health.gov.au/internet/main/publishing.nsf/content/health-
pubhlth-strateg-phys-act-guidelines#apaadult
[64] A. D. Okely, D. Ghersi, K. D. Hesketh, R. Santos, S. P. Loughran, D. P. Cliff,
et al., "A collaborative approach to adopting/adapting guidelines-The
Australian 24-Hour Movement Guidelines for the early years (Birth to 5 years):
an integration of physical activity, sedentary behavior, and sleep," BMC public
health, vol. 17, p. 869, 2017.
[65] J. Parkkari, P. Kannus, A. Natri, I. Lapinleimu, M. Palvanen, M. Heiskanen, et
al., "Active Living and Injury Risk," Int J Sports Med, vol. 25, pp. 209-216,
2004.
[66] D. S. Siscovick, N. S. Weiss, R. H. Fletcher, and T. Lasky, "The Incidence of
Primary Cardiac Arrest during Vigorous Exercise," New England Journal of
Medicine, vol. 311, pp. 874-877, 1984.
[67] C. M. Albert, M. A. Mittleman, C. U. Chae, I.-M. Lee, C. H. Hennekens, and
J. E. Manson, "Triggering of sudden death from cardiac causes by vigorous
exertion," New England Journal of Medicine, vol. 343, pp. 1355-1361, 2000.
[68] P. D. Thompson, "Exercise prescription and proscription for patients with
coronary artery disease," Circulation, vol. 112, pp. 2354-2363, 2005.
[69] I.-M. Lee, H. D. Sesso, Y. Oguma, and R. S. Paffenbarger, "Relative intensity
of physical activity and risk of coronary heart disease," Circulation, vol. 107,
pp. 1110-1116, 2003.
[70] P. Freedson, H. R. Bowles, R. Troiano, and W. Haskell, "Assessment of
physical activity using wearable monitors: recommendations for monitor
calibration and use in the field," Medicine and science in sports and exercise,
vol. 44, p. S1, 2012.
[71] S. J. Strath, L. A. Kaminsky, B. E. Ainsworth, U. Ekelund, P. S. Freedson, R.
A. Gary, et al., "Guide to the assessment of physical activity: clinical and
research applications," Circulation, vol. 128, pp. 2259-2279, 2013.
[72] S. J. Strath, A. M. Swartz, D. R. Bassett Jr, W. L. O'Brien, G. A. King, and B.
E. Ainsworth, "Evaluation of heart rate as a method for assessing moderate
intensity physical activity," Medicine and Science in Sports and Exercise, vol.
32, pp. S465-70, 2000.
[73] R. J. Robertson, Perceived exertion for practitioners: rating effort with the
OMNI picture system: Human Kinetics, 2004.
[74] K. Rice, C. Gammon, K. Pfeiffer, and S. G. Trost, "Age related differences in
the validity of the OMNI perceived exertion scale during lifestyle activities,"
Pediatric exercise science, vol. 27, pp. 95-101, 2015.
[75] A. C. Utter, R. J. Robertson, J. M. Green, R. R. Suminski, S. R. McAnulty, and
D. C. Nieman, "Validation of the Adult OMNI Scale of perceived exertion for
walking/running exercise," Medicine and science in sports and exercise, vol.
36, pp. 1776-1780, 2004.
[76] H. K. Neilson, P. J. Robson, C. M. Friedenreich, and I. Csizmadi, "Estimating
activity energy expenditure: how valid are physical activity questionnaires?,"
The American journal of clinical nutrition, vol. 87, pp. 279-291, 2008.
[77] K. Ohkawara, Y. Hikihara, T. Matsuo, E. L. Melanson, and M. Hibi, "Variable
factors of total daily energy expenditure in humans," The Journal of Physical
Fitness and Sports Medicine, vol. 1, pp. 389-399, 2012.
[78] M. Luštrek, B. Cvetković, and S. Kozina, "Energy expenditure estimation with
wearable accelerometers," in Circuits and Systems (ISCAS), 2012 IEEE
International Symposium on, 2012, pp. 5-8.
[79] E. Jequier and Y. Schutz, "Long-term measurements of energy expenditure in
humans using a respiration chamber," The American journal of clinical
nutrition, vol. 38, pp. 989-998, 1983.
[80] J. McLaughlin, G. King, E. Howley, D. Bassett Jr, and B. Ainsworth,
"Validation of the COSMED K4 b2 portable metabolic system," International
journal of sports medicine, vol. 22, pp. 280-284, 2001.
[81] S. G. Trost, P. D. Loprinzi, R. Moore, and K. A. Pfeiffer, "Comparison of
accelerometer cut points for predicting activity intensity in youth," Med Sci
Sports Exerc, vol. 43, pp. 1360-1368, 2011.
[82] H. J. Montoye, R. Washburn, S. Servais, A. Ertl, J. G. Webster, and F. J. Nagle,
"Estimation of energy expenditure by a portable accelerometer," Medicine and
Science in Sports and Exercise, vol. 15, pp. 403-407, 1982.
[83] J. Bussmann, W. Martens, J. Tulen, F. Schasfoort, H. Van Den Berg-Emons,
and H. Stam, "Measuring daily behavior using ambulatory accelerometry: the
Activity Monitor," Behavior Research Methods, Instruments, & Computers,
vol. 33, pp. 349-356, 2001.
[84] A. S. Jackson, S. N. Blair, M. T. Mahar, L. T. Wier, R. M. Ross, and J. E.
Stuteville, "Prediction of functional aerobic capacity without exercise testing,"
Medicine and science in sports and exercise, vol. 22, pp. 863-870, 1990.
[85] F. K. Assah, U. Ekelund, S. Brage, A. Wright, J. C. Mbanya, and N. J.
Wareham, "Accuracy and validity of a combined heart rate and motion sensor
for the measurement of free-living physical activity energy expenditure in
adults in Cameroon," International journal of epidemiology, p. dyq098, 2010.
[86] J. Smolander, T. Juuti, M.-L. Kinnunen, K. Laine, V. Louhevaara, K.
Männikkö, et al., "A new heart rate variability-based method for the estimation
of oxygen consumption without individual laboratory calibration: application
example on postal workers," Applied ergonomics, vol. 39, pp. 325-331, 2008.
[87] A. Reiss and D. Stricker, "Creating and benchmarking a new dataset for
physical activity monitoring," in Proceedings of the 5th International
Conference on PErvasive Technologies Related to Assistive Environments,
2012, p. 40.
[88] M. Saar-Tsechansky and F. Provost, "Handling missing values when applying
classification models," Journal of machine learning research, vol. 8, pp. 1623-
1657, 2007.
[89] O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas, "Window size
impact in human activity recognition," Sensors, vol. 14, pp. 6474-6499, 2014.
[90] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration
data," in Pervasive computing, ed: Springer, 2004, pp. 1-17.
[91] C.-W. Lin, Y.-T. Yang, J.-S. Wang, and Y.-C. Yang, "A wearable sensor
module with a neural-network-based activity classification algorithm for daily
energy expenditure estimation," Information Technology in Biomedicine, IEEE
Transactions on, vol. 16, pp. 991-998, 2012.
[92] M. Altini, J. Penders, R. Vullers, and O. Amft, "Estimating energy expenditure
using body-worn accelerometers: a comparison of methods, sensors number
and positioning," Biomedical and Health Informatics, IEEE Journal of, vol.
19, pp. 219-226, 2015.
[93] J. Pärkkä, M. Ermes, P. Korpipää, J. Mäntyjärvi, J. Peltola, and I. Korhonen,
"Activity classification using realistic data from wearable sensors,"
Information Technology in Biomedicine, IEEE Transactions on, vol. 10, pp.
119-128, 2006.
[94] K. Aminian, P. Robert, E. Jéquier, and Y. Schutz, "Incline, speed, and distance
assessment during unconstrained walking," Medicine and science in sports and
exercise, vol. 27, pp. 226-234, 1995.
[95] M. Nyan, F. Tay, K. Seah, and Y. Sitoh, "Classification of gait patterns in the
time–frequency domain," Journal of biomechanics, vol. 39, pp. 2647-2656,
2006.
[96] N. Wang, E. Ambikairajah, N. H. Lovell, and B. G. Celler, "Accelerometry
based classification of walking patterns using time-frequency analysis," in
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual
International Conference of the IEEE, 2007, pp. 4899-4902.
[97] F. Albinali, S. Intille, W. Haskell, and M. Rosenberger, "Using wearable
activity type detection to improve physical activity energy expenditure
estimation," in Proceedings of the 12th ACM international conference on
Ubiquitous computing, 2010, pp. 311-320.
[98] M. Berchtold, M. Budde, D. Gordon, H. R. Schmidtke, and M. Beigl,
"Actiserv: Activity recognition service for mobile phones," in Wearable
Computers (ISWC), 2010 International Symposium on, 2010, pp. 1-8.
[99] K. Ellis, J. Kerr, S. Godbole, G. Lanckriet, D. Wing, and S. Marshall, "A
random forest classifier for the prediction of energy expenditure and type of
physical activity from wrist and hip accelerometers," Physiological
measurement, vol. 35, p. 2191, 2014.
[100] P. S. Freedson, K. Lyden, S. Kozey-Keadle, and J. Staudenmayer, "Evaluation
of artificial neural network algorithms for predicting METs and activity type
from accelerometer data: validation on an independent sample," Journal of
Applied Physiology, vol. 111, pp. 1804-1812, 2011.
[101] E. Fullerton, B. Heller, and M. Munoz-Organero, "Recognizing Human
Activity in Free-Living Using Multiple Body-Worn Accelerometers," IEEE
Sensors Journal, vol. 17, pp. 5290-5297, 2017.
[102] I. C. Gyllensten and A. G. Bonomi, "Identifying types of physical activity with
a single accelerometer: evaluating laboratory-trained algorithms in daily life,"
Biomedical Engineering, IEEE Transactions on, vol. 58, pp. 2656-2663, 2011.
[103] K. Mackintosh, A. Montoye, K. Pfeiffer, and M. McNarry, "Investigating
optimal accelerometer placement for energy expenditure prediction in children
using a machine learning approach," Physiological measurement, vol. 37, p.
1728, 2016.
[104] T. G. Pavey, N. D. Gilson, S. R. Gomersall, B. Clark, and S. G. Trost, "Field
evaluation of a random forest activity classifier for wrist-worn accelerometer
data," Journal of Science and Medicine in Sport, 2016.
[105] T. Tamura, M. Sekine, M. Ogawa, T. Togawa, and Y. Fukui, "Classification of
acceleration waveforms during walking by wavelet transform," Methods of
information in medicine, vol. 36, pp. 356-359, 1997.
[106] E. M. Tapia, S. S. Intille, W. Haskell, K. Larson, J. Wright, A. King, et al.,
"Real-time recognition of physical activities and their intensities using wireless
accelerometers and a heart rate monitor," in Wearable Computers, 2007 11th
IEEE International Symposium on, 2007, pp. 37-40.
[107] U. Maurer, A. Rowe, A. Smailagic, and D. Siewiorek, "Location and activity
recognition using eWatch: A wearable sensor platform," Ambient Intelligence
in Everyday Life, pp. 86-102, 2006.
[108] G. Tulum, N. T. Artuğ, and B. Bolat, "Performance evaluation of feature
selection algorithms on human activity classification," in Innovations in
Intelligent Systems and Applications (INISTA), 2013 IEEE International
Symposium on, 2013, pp. 1-4.
[109] B. Fish, A. Khan, N. H. Chehade, C. Chien, and G. Pottie, "Feature selection
based on mutual information for human activity recognition," in Acoustics,
Speech and Signal Processing (ICASSP), 2012 IEEE International Conference
on, 2012, pp. 1729-1732.
[110] M. Zhang and A. A. Sawchuk, "A feature selection-based framework for
human activity recognition using wearable multimodal sensors," in
Proceedings of the 6th International Conference on Body Area Networks,
2011, pp. 92-98.
[111] N. Bicocchi, M. Mamei, and F. Zambonelli, "Detecting activities from body-
worn accelerometers via instance-based algorithms," Pervasive and Mobile
Computing, vol. 6, pp. 482-495, 2010.
[112] C. Catal, S. Tufekci, E. Pirmit, and G. Kocabag, "On the use of ensemble of
classifiers for accelerometer-based activity recognition," Applied Soft
Computing, vol. 37, pp. 1018-1022, 2015.
[113] D. R. Bassett Jr, A. V. Rowlands, and S. G. Trost, "Calibration and validation
of wearable monitors," Medicine and science in sports and exercise, vol. 44, p.
S32, 2012.
[114] S. G. Trost, "State of the Art Reviews: Measurement of Physical Activity in
Children and Adolescents," American Journal of Lifestyle Medicine, vol. 1, pp.
299-314, 2007.
[115] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity
monitoring," in Wearable Computers (ISWC), 2012 16th International
Symposium on, 2012, pp. 108-109.
[116] A. Reiss and D. Stricker, "Introducing a modular activity monitoring system,"
in Engineering in Medicine and Biology Society, EMBC, 2011 Annual
International Conference of the IEEE, 2011, pp. 5621-5624.
[117] U. Maurer, A. Rowe, A. Smailagic, and D. Siewiorek, "Location and activity
recognition using eWatch: A wearable sensor platform," in Ambient
Intelligence in Everyday Life, ed: Springer, 2006, pp. 86-102.
[118] M. Ermes, J. Parkka, J. Mantyjarvi, and I. Korhonen, "Detection of daily
activities and sports with wearable sensors in controlled and uncontrolled
conditions," Information Technology in Biomedicine, IEEE Transactions on,
vol. 12, pp. 20-26, 2008.
[119] K. Y. Chen and D. R. Bassett, "The technology of accelerometry-based activity
monitors: current and future," Medicine and science in sports and exercise, vol.
37, p. S490, 2005.
[120] A. H. Montoye, J. M. Pivarnik, L. M. Mudd, S. Biswas, and K. A. Pfeiffer,
"Comparison of activity type classification accuracy from accelerometers worn
on the hip, wrists, and thigh in young, apparently healthy adults," Measurement
in Physical Education and Exercise Science, vol. 20, pp. 173-183, 2016.
[121] D. M. Karantonis, M. R. Narayanan, M. Mathie, N. H. Lovell, and B. G. Celler,
"Implementation of a real-time human movement classifier using a triaxial
accelerometer for ambulatory monitoring," IEEE transactions on information
technology in biomedicine, vol. 10, pp. 156-167, 2006.
[122] B. J. Jefferis, P. H. Whincup, L. Lennon, and S. G. Wannamethee,
"Longitudinal Associations Between Changes in Physical Activity and Onset
of Type 2 Diabetes in Older British Men: The influence of adiposity," Diabetes
care, vol. 35, pp. 1876-1883, 2012.
[123] M. S. Tremblay, A. G. LeBlanc, M. E. Kho, T. J. Saunders, R. Larouche, R. C.
Colley, et al., "Systematic review of sedentary behaviour and health indicators
in school-aged children and youth," International Journal of Behavioral
Nutrition and Physical Activity, vol. 8, p. 1, 2011.
[124] N. Owen, G. N. Healy, C. E. Matthews, and D. W. Dunstan, "Too much sitting:
the population-health science of sedentary behavior," Exercise and sport
sciences reviews, vol. 38, p. 105, 2010.
[125] A. H. Montoye, R. W. Moore, H. R. Bowles, R. Korycinski, and K. A. Pfeiffer,
"Reporting accelerometer methods in physical activity intervention studies: a
systematic review and recommendations for authors," British journal of sports
medicine, pp. bjsports-2015-095947, 2016.
[126] J. Skotte, M. Korshøj, J. Kristiansen, C. Hanisch, and A. Holtermann,
"Detection of physical activity types using triaxial accelerometers," J Phys Act
Health, vol. 11, pp. 76-84, 2014.
[127] L. Breiman, "Arcing classifier (with discussion and a rejoinder by the author),"
The annals of statistics, vol. 26, pp. 801-849, 1998.
[128] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: A
new explanation for the effectiveness of voting methods," Annals of statistics,
pp. 1651-1686, 1998.
[129] L. Breiman, "Random forests," Machine learning, vol. 45, pp. 5-32, 2001.
[130] P. Yang, Y. Hwa Yang, B. B. Zhou, and A. Y. Zomaya, "A review of ensemble
methods in bioinformatics," Current Bioinformatics, vol. 5, pp. 296-308, 2010.
[131] M. Arif and A. Kattan, "Physical Activities Monitoring Using Wearable
Acceleration Sensors Attached to the Body," PloS one, vol. 10, p. e0130851,
2015.
[132] S. Pirttikangas, K. Fujinami, and T. Nakajima, "Feature selection and activity
recognition from wearable sensors," in Ubiquitous Computing Systems, ed:
Springer, 2006, pp. 516-527.
[133] M. A. Hall and L. A. Smith, "Feature Selection for Machine Learning:
Comparing a Correlation-Based Filter Approach to the Wrapper," in FLAIRS
conference, 1999, pp. 235-239.
[134] D. Ruta and B. Gabrys, "An overview of classifier fusion methods," Computing
and Information systems, vol. 7, pp. 1-10, 2000.
[135] L. I. Kuncheva, J. C. Bezdek, and R. P. Duin, "Decision templates for multiple
classifier fusion: an experimental comparison," Pattern recognition, vol. 34,
pp. 299-314, 2001.
[136] Y. S. Huang and C. Y. Suen, "A method of combining multiple experts for the
recognition of unconstrained handwritten numerals," Pattern Analysis and
Machine Intelligence, IEEE Transactions on, vol. 17, pp. 90-94, 1995.
[137] A. Reiss, M. Weber, and D. Stricker, "Exploring and extending the boundaries
of physical activity recognition," in Systems, Man, and Cybernetics (SMC),
2011 IEEE International Conference on, 2011, pp. 46-50.
[138] G. Forman and M. Scholz, "Apples-to-apples in cross-validation studies:
pitfalls in classifier performance measurement," ACM SIGKDD Explorations
Newsletter, vol. 12, pp. 49-57, 2010.
[139] N. Ruch, M. Rumo, and U. Mäder, "Recognition of activities in children by
two uniaxial accelerometers in free-living conditions," European journal of
applied physiology, vol. 111, pp. 1917-1927, 2011.
[140] Š. Raudys and F. Roli, "The behavior knowledge space fusion method:
Analysis of generalization error and strategies for performance improvement,"
in International Workshop on Multiple Classifier Systems, 2003, pp. 55-64.
[141] J. E. Sasaki, A. Hickey, J. Staudenmayer, D. John, J. A. Kent, and P. S.
Freedson, "Performance of Activity Classification Algorithms in Free-living
Older Adults," Medicine and science in sports and exercise, 2015.
[142] Y. Sun, M. S. Kamel, and A. K. Wong, "Empirical study on weighted voting
multiple classifiers," in Pattern Recognition and Data Mining, ed: Springer,
2005, pp. 335-344.
[143] L. Atallah, B. Lo, R. King, and G.-Z. Yang, "Sensor positioning for activity
recognition using wearable accelerometers," IEEE transactions on biomedical
circuits and systems, vol. 5, pp. 320-329, 2011.
[144] N. Kern, B. Schiele, and A. Schmidt, "Multi-sensor activity context detection
for wearable computing," in European Symposium on Ambient Intelligence,
2003, pp. 220-232.
[145] H. Gjoreski, M. Luštrek, and M. Gams, "Accelerometer placement for posture
recognition and fall detection," in Intelligent environments (IE), 2011 7th
international conference on, 2011, pp. 47-54.
[146] D. O. Olguín and A. S. Pentland, "Human activity recognition: Accuracy
across common locations for wearable sensors," in Proceedings of 2006 10th
IEEE International Symposium on Wearable Computers, Montreux,
Switzerland, 2006, pp. 11-14.
[147] F. A. Faria, J. A. Dos Santos, A. Rocha, and R. d. S. Torres, "A framework for
selection and fusion of pattern classifiers in multimedia recognition," Pattern
Recognition Letters, vol. 39, pp. 52-64, 2014.
[148] A. Bulling, U. Blanke, and B. Schiele, "A tutorial on human activity
recognition using body-worn inertial sensors," ACM Computing Surveys
(CSUR), vol. 46, p. 33, 2014.
[149] O. Banos, M. Damas, H. Pomares, F. Rojas, B. Delgado-Marquez, and O.
Valenzuela, "Human activity recognition based on a sensor weighting
hierarchical classifier," Soft Computing, vol. 17, pp. 333-343, 2013.
[150] W. Zhang and Z. Zhang, "Belief function based decision fusion for
decentralized target classification in wireless sensor networks," Sensors, vol.
15, pp. 20524-20540, 2015.
[151] Y. Lin, Q. Hu, J. Liu, J. Chen, and J. Duan, "Multi-label feature selection based
on neighborhood mutual information," Applied Soft Computing, vol. 38, pp.
244-256, 2016.
[152] M. Soleymani, G. Chanel, J. J. Kierkels, and T. Pun, "Affective
characterization of movie scenes based on content analysis and physiological
changes," International Journal of Semantic Computing, vol. 3, pp. 235-254,
2009.
[153] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas,
et al., "mHealthDroid: a novel framework for agile development of mobile
health applications," in Ambient Assisted Living and Daily Activities, ed:
Springer, 2014, pp. 91-98.
[154] O. Banos, C. Villalonga, R. Garcia, A. Saez, M. Damas, J. A. Holgado-Terriza,
et al., "Design, implementation and validation of a novel open framework for
agile development of mobile health applications," Biomedical engineering
online, vol. 14, pp. 1-20, 2015.
[155] S. Ahangama, Y. S. Lim, S. Y. Koh, and D. C. C. Poo, "Revolutionizing
Mobile Healthcare Monitoring Technology: Analysis of Features through Task
Model," in International Conference on Social Computing and Social Media,
2014, pp. 298-305.
[156] M. Kirwan, M. J. Duncan, C. Vandelanotte, and W. K. Mummery, "Using
smartphone technology to monitor physical activity in the 10,000 Steps
program: a matched case–control trial," Journal of medical Internet research,
vol. 14, 2012.
[157] S. Bauer, J. de Niet, R. Timman, and H. Kordy, "Enhancement of care through
self-monitoring and tailored feedback via text messaging and their use in the
treatment of childhood overweight," Patient education and counseling, vol. 79,
pp. 315-319, 2010.
[158] K. Y. Chen, K. F. Janz, W. Zhu, and R. J. Brychta, "Re-defining the roles of
sensors in objective physical activity monitoring," Medicine and science in
sports and exercise, vol. 44, p. S13, 2012.
[159] M. A. Adams, J. F. Sallis, G. J. Norman, M. F. Hovell, E. B. Hekler, and E.
Perata, "An adaptive physical activity intervention for overweight adults: a
randomized controlled trial," PloS one, vol. 8, p. e82901, 2013.
[160] R. R. Pate, M. Pratt, S. N. Blair, W. L. Haskell, C. A. Macera, C. Bouchard, et
al., "Physical activity and public health: a recommendation from the Centers
for Disease Control and Prevention and the American College of Sports
Medicine," Jama, vol. 273, pp. 402-407, 1995.
[161] A. Chowdhury, D. Tjondronegoro, V. Chandran, and S. Trost, "Physical
activity recognition using posterior-adapted class-based fusion of multi-
accelerometers data," IEEE Journal of Biomedical and Health Informatics,
2017.
[162] A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble
Methods for Classification of Physical Activities from Wrist Accelerometry,"
Medicine and science in sports and exercise, vol. 49, p. 1965, 2017.
[163] M. Altini, R. Vullers, C. Van Hoof, M. van Dort, and O. Amft, "Self-calibration
of walking speed estimations using smartphone sensors," in Pervasive
Computing and Communications Workshops (PERCOM Workshops), 2014
IEEE International Conference on, 2014, pp. 10-18.
[164] K. R. Rice, C. Gammon, K. Pfeiffer, and S. Trost, "Age related differences in
the validity of the OMNI perceived exertion scale during lifestyle activities,"
Pediatric exercise science, vol. 27, pp. 95-101, 2015.
[165] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information
criteria of max-dependency, max-relevance, and min-redundancy," IEEE
Transactions on pattern analysis and machine intelligence, vol. 27, pp. 1226-
1238, 2005.
[166] G. Borg, G. Ljunggren, and R. Ceci, "The increase of perceived exertion, aches
and pain in the legs, heart rate and blood lactate during exercise on a bicycle
ergometer," European journal of applied physiology and occupational
physiology, vol. 54, pp. 343-349, 1985.
[167] H. N. Dawes, K. L. Barker, J. Cockburn, N. Roach, O. Scott, and D. Wade,
"Borg’s rating of perceived exertion scales: do the verbal anchors mean the
same for different clinical groups?," Archives of physical medicine and
rehabilitation, vol. 86, pp. 912-916, 2005.
[168] Australian Institute of Health and Welfare, The Active Australia Survey: A guide
and manual for implementation, analysis and reporting: Australian Institute of
Health and Welfare, 2003.
[169] A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble
Methods for Classification of Physical Activities from Wrist Accelerometry,"
Medicine and science in sports and exercise, 2017.
[170] B. W. Timmons, A. G. LeBlanc, V. Carson, S. Connor Gorber, C. Dillman, I.
Janssen, et al., "Systematic review of physical activity and health in the early
years (aged 0–4 years)," Applied Physiology, Nutrition, and Metabolism, vol.
37, pp. 773-792, 2012.
[171] A. G. LeBlanc, J. C. Spence, V. Carson, S. Connor Gorber, C. Dillman, I.
Janssen, et al., "Systematic review of sedentary behaviour and health indicators
in the early years (aged 0–4 years)," Applied Physiology, Nutrition, and
Metabolism, vol. 37, pp. 753-772, 2012.
[172] "Department of Health | Brochure - National Physical Activity
Recommendations for Children 0-5 Years." [Online]. Available:
http://www.health.gov.au/internet/main/publishing.nsf/Content/npra-0-5yrs-
brochure
[173] M. S. Tremblay, J.-P. Chaput, K. B. Adamo, S. Aubert, J. D. Barnes, L.
Choquette, et al., "Canadian 24-Hour Movement Guidelines for the Early
Years (0–4 years): An Integration of Physical Activity, Sedentary Behaviour,
and Sleep," BMC public health, vol. 17, p. 874, 2017.
[174] D. P. Cliff, J. J. Reilly, and A. D. Okely, "Methodological considerations in
using accelerometers to assess habitual physical activity in children aged 0–5
years," Journal of Science and Medicine in Sport, vol. 12, pp. 557-567, 2009.
[175] L. Deng and D. Yu, "Deep learning: methods and applications," Foundations
and Trends® in Signal Processing, vol. 7, pp. 197-387, 2014.
[176] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J.
Garcia-Rodriguez, "A Review on Deep Learning Techniques Applied to
Semantic Segmentation," arXiv preprint arXiv:1704.06857, 2017.
[177] A. Groβek, C. van Loo, G. E. Peoples, M. Hagenbuchner, R. Jones, and D. P.
Cliff, "Energy cost of physical activities and sedentary behaviors in young
children," Journal of Physical Activity and Health, vol. 13, pp. S7-S10, 2016.
[178] W. D. McArdle, F. I. Katch, and V. L. Katch, Exercise physiology: nutrition,
energy, and human performance. Philadelphia: Lea and Febiger, 1991.
[179] S. E. Crouter, M. Horton, and D. R. Bassett Jr, "Use of a 2-regression model
for estimating energy expenditure in children," Medicine and science in sports
and exercise, vol. 44, pp. 1177-1185, 2012.
[180] K. Miura and T. Harada, "Implementation of a practical distributed calculation
system with browsers and javascript, and application to distributed deep
learning," arXiv preprint arXiv:1503.05743, 2015.
[181] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann
machines," in Proceedings of the 27th international conference on machine
learning (ICML-10), 2010, pp. 807-814.
[182] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of
ReliefF and RReliefF," Machine learning, vol. 53, pp. 23-69, 2003.
[183] G. Roffo and S. Melzi, "Ranking to Learn: Feature Ranking and Selection via
Eigenvector Centrality," arXiv preprint arXiv:1704.05409, 2017.
[184] S. G. Trost, K. L. McIver, and R. R. Pate, "Conducting accelerometer-based
activity assessments in field-based research," Medicine & Science in Sports &
Exercise, vol. 37, pp. S531-S543, 2005.
Appendices
Appendix A
Supplementary Digital Content 1 (SDC 1)
Features were extracted from 10 s sliding windows with 50% overlap.
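The windowing step can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function name `sliding_windows` and the 30 Hz sampling rate are assumptions introduced here, since the appendix gives only the window length (10 s) and the overlap (50%).

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], window_size: int,
                    overlap: float = 0.5) -> List[List[float]]:
    """Split a 1-D signal into fixed-length windows with fractional overlap.

    Trailing samples that do not fill a complete window are dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # With 50% overlap the window advances by half its length each step.
    step = max(1, int(round(window_size * (1.0 - overlap))))
    last_start = len(signal) - window_size
    return [list(signal[start:start + window_size])
            for start in range(0, last_start + 1, step)]


# Hypothetical sampling rate (not stated in this appendix): 30 Hz,
# so a 10 s window holds 300 samples and advances by 150 samples.
SAMPLING_RATE_HZ = 30
WINDOW_SIZE = 10 * SAMPLING_RATE_HZ

if __name__ == "__main__":
    x_axis = [0.0] * (60 * SAMPLING_RATE_HZ)  # one minute of dummy data
    print(len(sliding_windows(x_axis, WINDOW_SIZE)))
```

Each returned window would then be passed, axis by axis, to the feature calculations listed below.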
1. Simple time-domain features (mean, standard deviation, minimum, maximum,
variance, median, skewness, kurtosis, and the 25th and 75th percentiles) were
extracted from each axis of a 3-axis accelerometer.
2. In addition to these simple time-domain features, energy features were
calculated as the sum of the squared discrete FFT component magnitudes of
the signal. Each energy feature was normalised by dividing it by the window length.
$$\mathrm{Energy}(X) = \frac{\sum \left|\mathrm{FFT}(X)\right|^{2}}{\mathrm{window\_size}}$$
$$\mathrm{Energy}(Y) = \frac{\sum \left|\mathrm{FFT}(Y)\right|^{2}}{\mathrm{window\_size}}$$
$$\mathrm{Energy}(Z) = \frac{\sum \left|\mathrm{FFT}(Z)\right|^{2}}{\mathrm{window\_size}}$$
where X, Y and Z are the accelerometer X-axis, Y-axis and Z-axis data in a window.
3. The principal frequency and its magnitude were also extracted. The frequency
with the highest FFT magnitude was taken as the principal frequency.
4. The zero-crossing count of each accelerometer axis (the number of times the
signal changes sign) was extracted.
Pseudo-code:
Subtract the median from the original data
X=X-median(X)
Y=Y-median(Y)
Z=Z-median(Z)
ZeroCrossing(X) = the number of times X changes sign.
ZeroCrossing(Y) = the number of times Y changes sign.
ZeroCrossing(Z) = the number of times Z changes sign.
5. Accelerometer axis cross-correlations (corrxy, corrxz, corryz) were calculated.
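$$\mathrm{corr}_{xy} = \frac{\sum_{i=1}^{\mathrm{window\_size}} \left(X(i) - \mathrm{mean}(X)\right)\left(Y(i) - \mathrm{mean}(Y)\right)}{\sqrt{\sum_{i=1}^{\mathrm{window\_size}} \left(X(i) - \mathrm{mean}(X)\right)^{2} \, \sum_{i=1}^{\mathrm{window\_size}} \left(Y(i) - \mathrm{mean}(Y)\right)^{2}}}$$
$$\mathrm{corr}_{xz} = \frac{\sum_{i=1}^{\mathrm{window\_size}} \left(X(i) - \mathrm{mean}(X)\right)\left(Z(i) - \mathrm{mean}(Z)\right)}{\sqrt{\sum_{i=1}^{\mathrm{window\_size}} \left(X(i) - \mathrm{mean}(X)\right)^{2} \, \sum_{i=1}^{\mathrm{window\_size}} \left(Z(i) - \mathrm{mean}(Z)\right)^{2}}}$$
$$\mathrm{corr}_{yz} = \frac{\sum_{i=1}^{\mathrm{window\_size}} \left(Y(i) - \mathrm{mean}(Y)\right)\left(Z(i) - \mathrm{mean}(Z)\right)}{\sqrt{\sum_{i=1}^{\mathrm{window\_size}} \left(Y(i) - \mathrm{mean}(Y)\right)^{2} \, \sum_{i=1}^{\mathrm{window\_size}} \left(Z(i) - \mathrm{mean}(Z)\right)^{2}}}$$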
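The frequency-domain, zero-crossing and cross-correlation features listed above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the code used in the thesis: all function names are introduced here, a naive DFT stands in for an FFT routine (it gives the same magnitudes), the DC component is excluded when searching for the principal frequency, and samples that equal the median exactly are skipped when counting sign changes. None of these choices is specified in the appendix.

```python
import cmath
import math
from statistics import mean, median
from typing import List, Sequence, Tuple


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


def energy(x: Sequence[float]) -> float:
    """Sum of squared DFT magnitudes divided by the window length."""
    return sum(m * m for m in dft_magnitudes(x)) / len(x)


def dominant_frequency(x: Sequence[float],
                       sampling_rate_hz: float) -> Tuple[float, float]:
    """Return (frequency in Hz, magnitude) of the largest spectral component.

    Assumption: bin 0 (DC) is skipped and only bins up to Nyquist are searched.
    """
    n = len(x)
    mags = dft_magnitudes(x)
    k = max(range(1, n // 2 + 1), key=lambda i: mags[i])
    return k * sampling_rate_hz / n, mags[k]


def zero_crossings(x: Sequence[float]) -> int:
    """Number of sign changes after subtracting the median."""
    med = median(x)
    # Assumption: samples exactly equal to the median carry no sign.
    signs = [v > med for v in x if v != med]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two axes of the same window."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0  # constant axis: define as 0
```

By Parseval's theorem the normalised energy defined here equals the sum of the squared samples in the window, which gives a quick sanity check on any FFT-based implementation.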
Appendix B
Supplementary Digital Content 2 (SDC 2)
The selected features for each dataset (those common to all folds) are listed in the
following table. A tick (√) indicates that the corresponding feature was selected.
Extracted Features from a 10 s window    Dataset #1    Dataset #2    Dataset #3
Minimum value of X axis data - min(X) √ √ √
Minimum value of Y axis data - min(Y) √ √ √
Minimum value of Z axis data - min(Z) √ √ √
Maximum value of X axis data - max(X) √ √
Maximum value of Y axis data - max(Y) √ √ √
Maximum value of Z axis data - max(Z) √ √ √
Mean of X axis data - mean(X) √
Mean of Y axis data - mean(Y)
Mean of Z axis data - mean(Z) √
Variance of X axis data - variance(X) √ √
Variance of Y axis data - variance(Y) √ √
Variance of Z axis data - variance(Z) √ √ √
Standard Deviation of X axis data - std(X) √ √ √
Standard Deviation of Y axis data - std(Y) √ √ √
Standard Deviation of Z axis data - std(Z) √ √ √
Skewness of X axis data - skewness(X) √
Skewness of Y axis data - skewness(Y)
Skewness of Z axis data - skewness(Z)
Kurtosis of X axis data - kurtosis(X) √
Kurtosis of Y axis data - kurtosis(Y) √
Kurtosis of Z axis data - kurtosis(Z) √
Median of X axis data - median(X) √
Median of Y axis data - median(Y)
Median of Z axis data - median(Z) √
25th Percentile of X axis data - percentile25(X) √
75th Percentile of X axis data - percentile75(X) √ √ √
25th Percentile of Y axis data - percentile25(Y) √ √
75th Percentile of Y axis data - percentile75(Y) √
25th Percentile of Z axis data - percentile25(Z) √ √ √
75th Percentile of Z axis data - percentile75(Z) √
Correlation of X and Y axis data - corr_axis(XY)
Correlation of X and Z axis data - corr_axis(XZ)
Correlation of Y and Z axis data - corr_axis(YZ)
Zero-crossing in X axis data - zerocross(X) √ √
Zero-crossing in Y axis data - zerocross(Y) √
Zero-crossing in Z axis data - zerocross(Z) √ √
Energy of X axis data - energy(X) √ √ √
Energy of Y axis data - energy(Y) √ √
Energy of Z axis data - energy(Z)
Dominant Frequency of X axis data - dominantFr(X) √ √
Dominant Frequency of Y axis data - dominantFr(Y) √
Dominant Frequency of Z axis data - dominantFr(Z) √
Magnitude of Dominant Frequency in X axis data - dominantFrMag(X) √
Magnitude of Dominant Frequency in Y axis data - dominantFrMag(Y) √ √
Magnitude of Dominant Frequency in Z axis data - dominantFrMag(Z) √ √
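The rule that only features "common to all folds" are reported amounts to a set intersection over the per-fold selections. The sketch below illustrates that step only; the function name `common_features` and the three example folds are hypothetical, and the feature selection method that produces each fold's set is not shown.

```python
from typing import Iterable, List, Set


def common_features(selected_per_fold: Iterable[Set[str]]) -> List[str]:
    """Return the features selected in every cross-validation fold."""
    folds = [set(fold) for fold in selected_per_fold]
    if not folds:
        return []
    return sorted(set.intersection(*folds))


if __name__ == "__main__":
    # Hypothetical per-fold selections, for illustration only.
    folds = [
        {"min(X)", "std(Y)", "energy(X)", "mean(Z)"},
        {"min(X)", "std(Y)", "energy(X)"},
        {"min(X)", "std(Y)", "energy(X)", "kurtosis(X)"},
    ]
    print(common_features(folds))
```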
Appendix C
Supplementary Digital Content 3 (SDC 3)
Multiple classification algorithms.
Binary Decision Tree (BDT) uses a hierarchical approach to develop an
optimum decision tree from the training dataset. It starts with a single node (the root)
and grows the tree by splitting the data on a feature at the cut point that maximises
impurity reduction. Each resulting group is further divided into smaller groups using the
same splitting criterion. Splitting continues until a stop condition is reached. During
testing, the resulting decision tree is used to classify new samples. In the current study,
the maximum number of decision splits was empirically set to 20.
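The choice of a cut point that maximises impurity reduction can be sketched for a single feature as follows. The appendix does not name the impurity measure, so Gini impurity is assumed here; the function names are also introduced for illustration. A full tree would repeat this search over all features and recurse on each resulting group until the split limit is reached.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a group of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_cut_point(values: Sequence[float],
                   labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Find the threshold on one feature with the largest impurity reduction.

    Returns (cut, gain); cut is None when no split improves on the parent.
    """
    n = len(labels)
    parent = gini(labels)
    best_cut, best_gain = None, 0.0
    candidates = sorted(set(values))
    # Candidate cuts are the midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        cut = (low + high) / 2.0
        left = [lab for v, lab in zip(values, labels) if v <= cut]
        right = [lab for v, lab in zip(values, labels) if v > cut]
        gain = (parent
                - len(left) / n * gini(left)
                - len(right) / n * gini(right))
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain
```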
k Nearest Neighbours (kNN) is one of the simplest machine learning algorithms and
is widely used as a benchmark for learning rules. To classify an object (a window of test
data), kNN first identifies the k nearest neighbours of that object, regardless of class.
It then returns the most common class among those k neighbours as the predicted
class. kNN is often computationally slow during testing because it must search for the
nearest neighbours of each sample. In the present study, the value of k was set to 7.
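A minimal sketch of the kNN decision rule with k = 7 is given below. The distance metric is not stated in the appendix, so Euclidean distance is assumed, and the function name `knn_predict` is introduced here for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                query: Sequence[float],
                k: int = 7) -> Hashable:
    """Predict the class of `query` by majority vote among its k nearest
    training samples (Euclidean distance is an assumption)."""
    # Every prediction scans the whole training set, which is why kNN
    # is slow at test time.
    distances = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

For example, with four "lying" windows near one point in feature space and five "walking" windows near another, a query close to the "lying" cluster collects four "lying" votes and three "walking" votes, so it is labelled "lying".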
Support Vector Machine (SVM) classifies data by finding the best separator
between two classes. In this study, to enable multi-class classification, the two-class
SVM was adapted so that it first classifies one class against all other classes, then
classifies another class versus the remaining classes, and so on. A linear function was
chosen as the kernel function. We set the box constraint parameter to 1. The box
constraint parameter helps to prevent overfitting (regularization) by controlling the
maximum penalty imposed on margin-violating observations.
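One way to read the one-against-the-rest scheme is sketched below: one linear two-class SVM per activity, with the class whose model gives the largest decision value taken as the prediction. This is an illustrative sketch only. The full-batch subgradient solver, its step size and epoch count, the argmax combination rule and the toy data are all assumptions; only the linear kernel and the box constraint C = 1 come from the text.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + C * sum(hinge loss), with y in {-1, +1}."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser is w itself
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # margin-violating observation, penalised in proportion to C
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels, C=1.0):
    """Fit one two-class model per class: that class (+1) against all others (-1)."""
    return {
        c: train_linear_svm(X, [1 if label == c else -1 for label in labels], C)
        for c in sorted(set(labels))
    }


def decision_value(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(models, x):
    """Pick the class whose one-vs-rest model gives the largest decision value."""
    return max(models, key=lambda c: decision_value(models[c], x))


# Hypothetical, well-separated two-feature clusters for three activities.
X = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.5),
     (4.0, 0.0), (4.5, 0.3), (3.8, 0.4),
     (0.0, 4.0), (0.3, 4.5), (0.4, 3.8)]
labels = ["lying"] * 3 + ["walking"] * 3 + ["running"] * 3

models = train_one_vs_rest(X, labels, C=1.0)
print(predict(models, (4.2, 0.2)))
```

A larger C penalises margin violations more heavily, while a smaller C tolerates them in exchange for a wider margin.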
Artificial Neural Network (ANN) is a widely used machine learning algorithm for
modelling non-linear relationships between a set of inputs and an output. The network
consists of inter-connected artificial neurons arranged in layers: an input layer, one or
more hidden layers and an output layer. Each neuron applies an activation function
(logistic or linear) to the weighted sum of its inputs to produce an output. In the first
iteration, the weights are random, and a cost function such as root mean squared error
measures the error between the network's output and the target output. Then, a
backpropagation algorithm optimises the weights based on the training dataset until
the network learns to correctly map arbitrary inputs to outputs (optimum root mean
squared error or maximum iteration) (27, 35). In this work, the number of input and
output neurons varied depending on the dataset, with the number of input neurons and
output neurons equalling the number of selected features and the number of activity
classes, respectively. Fifty neurons were used in the hidden layer. The maximum
number of iterations (epochs) and the learning rate were set to 250 and 0.001, respectively.
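The forward pass and cost computation can be sketched as below. This is a toy illustration rather than the network used in the study: it assumes a logistic hidden layer feeding a linear output layer, uses two hidden neurons instead of fifty, and uses hand-picked rather than random weights so that the result is easy to check. Backpropagation is omitted.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W_hidden, b_hidden, W_out, b_out):
    """One hidden layer with logistic activation followed by a linear output layer."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(W_out, b_out)
    ]


def rmse(output, target):
    """Root mean squared error between the network output and the target."""
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(output, target)) / len(target))


# Hypothetical tiny network: 2 inputs (features), 2 hidden neurons, 2 outputs (classes).
W_hidden = [[0.0, 0.0], [0.0, 0.0]]
b_hidden = [0.0, 0.0]
W_out = [[1.0, 1.0], [0.0, 0.0]]
b_out = [0.0, 0.0]

output = forward([0.3, -0.7], W_hidden, b_hidden, W_out, b_out)
error = rmse(output, [1.0, 0.0])
print(output, error)
```

With all hidden weights at zero, each hidden neuron outputs logistic(0) = 0.5, so the first output neuron sums to 1.0 and the error against the target [1, 0] is zero. Training would adjust the weights to reduce this error on real data.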
Appendix D
SUPPLEMENTARY DIGITAL CONTENT 4 (SDC 4)
Confusion Matrix
Table 1 (SDC4). Confusion matrix of the ensemble methods in the dataset #1
Method       1    2    3    4    5    6    7    8
1 Lying
RF         271   73    4    0    0    1    0    1
Bagging    231   86   31    0    0    1    0    1
Adaboost   305   29   10    0    0    3    0    3
WMV        317   31    1    0    0    0    0    1
NB         314   33    1    0    0    0    0    2
BKS        305   42    2    0    0    0    0    1
2 Sitting
RF          41  275   15    0    0    0    0    2
Bagging     40  268   19    0    0    2    0    4
Adaboost    26  261   36    0    0    8    0    2
WMV         12  289   25    0    2    4    0    1
NB          12  290   24    0    2    4    0    1
BKS         41  264   21    0    2    4    0    1
3 Standing
RF          11   28  298    0    0    2    3    3
Bagging      9   29  300    0    0    2    4    1
Adaboost     1   33  286    6    0    1    6   12
WMV          5   19  308    4    0    1    4    4
NB          10   17  306    4    0    1    4    3
BKS         10   22  298    4    0    3    4    4
4 Walking
RF           0    0    4  368    0    0   66    5
Bagging      0    0   13  370    0    0   57    3
Adaboost     0    0    1  407    0    0   30    5
WMV          0    0    0  399    0    0   40    4
NB           0    0    1  370    0    0   67    5
BKS          0    0    0  365    0    0   73    5
5 Running
RF           0    0    0    0  170    0    0    0
Bagging      0    0    0    0  169    0    1    0
Adaboost     0    0    0    0  169    0    1    0
WMV          0    0    0    0  169    0    0    1
NB           0    0    0    0  169    0    0    1
BKS          0    0    0    0  169    0    0    1
6 Cycling
RF           0    1    3    8    0  277    2    7
Bagging      0    1    5    0    0  263   22    7
Adaboost     0    0    6    1    0  281    0   10
WMV          0    1    7    0    0  285    1    4
NB           0    0    6    0    0  287    2    3
BKS          0    1    7    0    0  285    1    4
7 Ascending Stairs
RF           0    0   11   31    0    0  109   13
Bagging      0    0   18   28    2    0  107    9
Adaboost     1    0    9   54    0    0   89   11
WMV          0    0    5   58    0    0   95    6
NB           0    0    5   52    0    0  101    6
BKS          0    0    4   64    0    0   88    8
8 Descending Stairs
RF           3    5    0   21    0    1   20   81
Bagging      5    4    0   14    0    0   13   95
Adaboost     0    0    8    4    0    9   42   68
WMV          0    3    2    9    0    2   21   94
NB           0    3    2    8    0    3   14  101
BKS          0    3    3   14    0    3   16   92
Table 2 (SDC4). Confusion matrix of the ensemble methods in the dataset #2
Method       1    2    3    4    5
1 Stationary (sit and stand)
RF         431   18    4    1    0
Bagging    437   13    4    0    0
Adaboost   418   29    6    1    0
WMV        428   25    1    0    0
NB         418   34    2    0    0
BKS        424   21    9    0    0
2 Comfortable Walking
RF          25  313  118    0    0
Bagging     17  314  125    0    0
Adaboost    22  269  164    1    0
WMV         19  303  134    0    0
NB          23  297  136    0    0
BKS         30  294  132    0    0
3 Fast Walking
RF           2   98  354    2    0
Bagging      3  149  303    1    0
Adaboost     1  102  352    1    0
WMV          4   92  359    1    0
NB           3  104  348    1    0
BKS          5  155  295    1    0
4 Jogging
RF           0    0    4  228   32
Bagging      1    3    4  220   36
Adaboost     0    1   17  191   55
WMV          5    7   12  191   49
NB           5    7   12  187   53
BKS          5    7    4  191   57
5 Running
RF           0    0    0   51  117
Bagging      0    0    0   51  117
Adaboost     0    0    0   58  110
WMV          0    0    1   47  120
NB           0    0    0   46  122
BKS          0    0    1   44  123
Table 3 (SDC4). Confusion matrix of the ensemble methods in the dataset #3
Method       1    2    3    4    5    6    7
1 Lying down
RF         207   44   13    2    1    0    0
Bagging    206   47    9    3    1    0    1
Adaboost   185   66   11    2    2    0    1
WMV        198   48   19    1    1    0    0
NB         204   44   17    1    1    0    0
BKS        207   49    9    1    1    0    0
2 Sitting+
RF          33  544   11    1    0    0    0
Bagging     45  529   12    2    0    0    1
Adaboost    32  547    9    1    0    0    0
WMV         39  537   12    1    0    0    0
NB          56  522   10    1    0    0    0
BKS         86  494    8    1    0    0    0
3 Standing+
RF          10    4  772   18   15   37    1
Bagging      5    5  755   19   26   42    5
Adaboost     8   14  709   56   44   26    0
WMV          0    5  793   21   16   22    0
NB           8    0  783   27   17   21    1
BKS         11    7  747   28   39   25    0
4 Walking
RF           0    0   45  858    4    0    5
Bagging      0    5   60  828    4    0   15
Adaboost     0    0   31  858    5    0   18
WMV          0    0   49  859    4    0    0
NB           0    0   41  864    6    0    1
BKS          1    0   42  849    5    0   15
5 Running
RF           0    0   43    1  188   13   21
Bagging      0    0   48   10  176    8   24
Adaboost     1    0   51    1  198    5   10
WMV          0    0   48    1  191    6   20
NB           0    0   45    1  193    6   21
BKS          0    0   42   18  181    6   19
6 Basketball
RF           0    0   37    0    6  280    0
Bagging      0    0   26    0   22  274    1
Adaboost     0    0   31    0    6  286    0
WMV          0    0   25    0    4  294    0
NB           0    0   25    0    5  293    0
BKS          0    0   41    0   11  271    0
7 Dance
RF           4    0    3    7   47    1  242
Bagging      4    0    7    4   54    3  232
Adaboost     4    0   16    7   33    0  244
WMV          2    6    6    7   40    0  243
NB           2    0    5    7   64    0  226
BKS          2    0    5    7   59    0  231
Classification Accuracy
The classification accuracy was calculated using the following equation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP and FN are the numbers of true-positive, true-negative, false-positive
and false-negative classifications, respectively.
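As a worked check of this formula, the sketch below treats each class in turn as the positive class (one-vs-rest) and computes its accuracy from a confusion matrix. The matrix is the Random Forest (RF) block of Table 2 (SDC4) for dataset #2; the function name is illustrative, and the one-vs-rest reading is an assumption that appears consistent with the Random Forest column of Table 5 (SDC4).

```python
def per_class_accuracy(confusion):
    """One-vs-rest accuracy (TP + TN) / (TP + TN + FP + FN) per class, in percent."""
    total = sum(sum(row) for row in confusion)
    results = []
    for c, row in enumerate(confusion):
        tp = row[c]
        fn = sum(row) - tp                      # other entries in the class's row
        fp = sum(r[c] for r in confusion) - tp  # other entries in the class's column
        tn = total - tp - fn - fp
        results.append(round(100 * (tp + tn) / total, 2))
    return results


# RF rows of Table 2 (SDC4), dataset #2.
rf_dataset2 = [
    [431, 18, 4, 1, 0],    # Stationary (sit and stand)
    [25, 313, 118, 0, 0],  # Comfortable Walking
    [2, 98, 354, 2, 0],    # Fast Walking
    [0, 0, 4, 228, 32],    # Jogging
    [0, 0, 0, 51, 117],    # Running
]
print(per_class_accuracy(rf_dataset2))
# -> [97.22, 85.6, 87.32, 94.99, 95.38]
```

The result does not depend on whether rows hold actual or predicted classes, because swapping them only exchanges FP and FN, which enter the formula symmetrically.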
Table 4 (SDC4). Classification results (Accuracy, %) using wrist acceleration sensor of dataset #1
Conventional Ensembles: RF (Random Forest), Bagging (Bagging Decision Tree), Boosted (Boosted Decision Tree)
Individual Classifiers: BDT, KNN, SVM, ANN
Custom Ensembles: WMV (WMV Fusion), NB (NB Fusion), BKS (BKS Fusion)
Activity                 RF Bagging Boosted     BDT     KNN     SVM     ANN     WMV      NB     BKS
Lying                    94   92.26   96.73   92.52   96.06   97.81   97.22   97.76    97.4    95.7
Sitting               92.61   91.72      94   90.47   93.78    95.7   94.58   95.61    95.7   93.87
Standing              96.24   94.14   94.23   91.81   95.43   95.57   95.52   96.55   96.51   96.24
Walking               93.96   94.85   95.48   89.26   91.09   94.72    93.6   94.85   93.87   92.84
Running                 100   99.87   99.96   97.99   99.96   99.42   99.06   99.87   99.87   99.87
Cycling               98.88   98.21    98.3   98.16   98.84   98.93   96.69    99.1   99.15   98.97
Ascending Stairs      93.46   93.11   93.11    89.7   89.75   92.88   93.78   93.96   93.29   92.39
Descending Stairs     96.37   97.27   95.26   96.02   96.33   97.14    96.6    97.4   97.72   97.18
Average               95.69   95.18   95.88   93.24   95.15   96.52   95.88   96.89   96.69   95.88
Table 5 (SDC4). Classification results (Accuracy, %) using wrist acceleration sensor of dataset #2
Conventional Ensembles: RF (Random Forest), Bagging (Bagging Decision Tree), Boosted (Boosted Decision Tree)
Individual Classifiers: BDT, KNN, SVM, ANN
Custom Ensembles: WMV (WMV Fusion), NB (NB Fusion), BKS (BKS Fusion)
Activity                          RF Bagging Boosted     BDT     KNN     SVM     ANN     WMV      NB     BKS
Stationary (sit and stand)     97.22   97.89   96.72   95.44   95.33   96.16   94.72      97   96.27   96.11
Comfortable Walking             85.6   82.93   82.26   81.87   78.75    82.2   77.42   84.59   83.09   80.81
Fast Walking                   87.32   84.09   83.82   84.59    80.7   86.15   77.25   86.37   85.65   82.93
Jogging                        94.99   94.66   92.55   94.16   92.38   91.82   90.38   93.27    93.1   93.44
Running                        95.38   95.16   93.72   94.88   93.99    93.6   93.05   94.61   94.49   94.33
Average                         92.1   90.95   89.81   90.19   88.23   89.99   86.56   91.17   90.52   89.52
Table 6 (SDC4). Classification results (Accuracy, %) using wrist acceleration sensor of dataset #3
Conventional Ensembles: RF (Random Forest), Bagging (Bagging Decision Tree), Boosted (Boosted Decision Tree)
Individual Classifiers: BDT, KNN, SVM, ANN
Custom Ensembles: WMV (WMV Fusion), NB (NB Fusion), BKS (BKS Fusion)
Activity          RF Bagging Boosted     BDT     KNN     SVM     ANN     WMV      NB     BKS
Lying down     96.96   96.73   96.39    94.8    95.4    96.5   95.91   96.87   96.33   95.45
Sitting+       97.36   96.67   96.53   95.25   95.65   97.07   96.62   96.84   96.84   95.71
Standing+      93.26    92.5   91.56   89.74   91.81   92.95   92.33   93.66   93.83   92.69
Walking        97.64   96.53   96.56   95.94   97.36   97.38   96.42   97.61   97.58   96.65
Running        95.71    94.4   95.51   93.35   94.86   94.91   94.63   96.02   95.28   94.31
Basketball     97.33    97.1   98.07   95.71   97.73   98.04   98.07   98.38   98.38   97.64
Dance          97.47   96.62   97.47   95.71   96.79   96.87   97.21    97.7   97.13   96.96
Average        96.53   95.79   96.01   94.36   95.65   96.25   95.88   96.73   96.48   95.63
Appendix E
SOURCE CODE REPOSITORIES
We have made our code and features available on GitHub so that researchers can
use our algorithms and models for benchmarking or evaluate them using their own data.
The links to the repositories are given in the table below:
Table 1. Links of source code repositories
Chapter: Chapter 3: Decision Fused Ensembles for Effective Classification of Physical Activities from Wrist-Worn Accelerometer Data
Link of repository: https://github.com/alokchy04/Decision-Fused-Ensembles-for-PA-Classification-from-Wrist-Worn-Accelerometer
Chapter: Chapter 4: Physical Activity Recognition using Posterior-adapted Class-based Fusion of Multi-Accelerometers Data
Link of repository: https://github.com/alokchy04/Physical-Activity-Recognition-using-Posterior-adapted-Class-based-Fusion-of-Multi-Accelerometer-Data
Appendix F
NUMBER OF DATA POINTS/ SAMPLES OF EXTRACTED FEATURES
In this appendix, we provide the number of data points per user for all of our datasets.
Because we used leave-one-subject-out cross-validation, the per-user counts indicate
the size of the training data and the testing data in each fold.
Table 1: Number of data points for Dataset #1 used in the study of Chapter 3
User    # of data points
1       295
2       297
3       212
4       277
5       318
6       284
7       262
8       289
Total   2234
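To illustrate how the per-user counts translate into fold sizes under leave-one-subject-out cross-validation, the sketch below uses the Dataset #1 counts from the table above. The function and variable names are illustrative only.

```python
def loso_fold_sizes(points_per_user):
    """Yield (held-out user, training size, testing size) for each leave-one-subject-out fold."""
    total = sum(points_per_user.values())
    for user, n_test in points_per_user.items():
        # The held-out user's windows form the test set; everyone else's form the training set.
        yield user, total - n_test, n_test


# Per-user data points for Dataset #1 (Chapter 3 study).
dataset1 = {1: 295, 2: 297, 3: 212, 4: 277, 5: 318, 6: 284, 7: 262, 8: 289}

folds = list(loso_fold_sizes(dataset1))
for user, n_train, n_test in folds:
    print(f"held-out user {user}: train={n_train}, test={n_test}")
```

For example, holding out user 1 leaves 2234 - 295 = 1939 data points for training and 295 for testing.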
Table 2: Number of data points for Dataset #2 used in the study of Chapter 3
User    # of data points
1       225
2       225
3       225
4       225
5       225
6       225
7       225
8       225
Total   1800
Table 3: Number of data points for Dataset #3 used in the study of Chapter 3
User    # of data points
1       228
2       228
3       228
4       114
5       114
6       114
7       228
8       230
9       228
10      209
11      228
12      228
13      229
14      228
15      228
16      228
17      228
Total   3518
Table 4: Number of data points for PAMAP2 dataset used in the study of Chapter 4
User    # of data points
1       756
2       759
3       545
4       708
5       809
6       727
7       670
8       740
Total   5714
Table 5: Number of data points for MHEALTH dataset used in the study of Chapter 4
User    # of data points
1       248
2       248
3       248
4       248
5       248
6       248
7       248
8       248
9       248
10      248
Total   2480
Table 6: Number of data points for the dataset used in the Chapter 5 and 6 studies
User    # of data points
1       24
2       35
3       28
4       34
5       30
6       33
7       31
8       30
9       35
10      26
11      35
12      35
13      27
14      35
15      33
16      35
17      29
18      5
19      35
20      10
21      10
Total   615
Table 7: Number of data points for the dataset used in the study of Chapter 7
User    # of data points
1       284
2       276
3       312
4       351
5       306
6       313
7       313
8       317
Total   2472