SENSOR-BASED PREDICTION OF PHYSICAL ACTIVITY AND ITS IMPACTS

USING MACHINE LEARNING

Alok Kumar Chowdhury Master of Science in Computer Science & Engineering

A Thesis by Publication submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy (PhD)

School of Electrical Engineering and Computer Science

Science and Engineering Faculty

Queensland University of Technology

May 2018

Alok Kumar Chowdhury (2018) PhD Thesis - Sensor-Based Prediction of Physical Activity and Its Impacts using Machine Learning iii

Keywords

Physical activity recognition

Machine learning

Wearable sensors

Relative physical activity intensity prediction

Energy expenditure prediction

Rate of perceived exertion

Deep learning

Ensemble learning

Decision fusion

Feature fusion

Posterior-adapted class-based fusion

Abstract

According to the World Health Organisation (WHO), around 60-85% of the world's

population is physically inactive. Lack of physical activity (PA) is a key contributor

to the global overweight and obesity epidemic, and in turn, raises the risks of numerous

health issues including heart disease, type 2 diabetes, high cholesterol, and some

cancers. With obesity prevention and promotion of physical activity being ongoing

public health priorities, there is an urgent need for accurate yet practical measures of

physical activity. PA is defined as bodily movement

produced by the contraction of skeletal muscle that substantially increases energy

expenditure above resting levels. Hence, valid and reliable measures of physical activity and

its personal impacts, including relative physical activity intensity and energy

expenditure, are important to track, view, and share with health practitioners.

Moreover, physical activity measurement tools that provide information and feedback

to end users in real-time provide the opportunity for designing personalised and

adaptive interventions to increase physical activity such as wellness e-coaching.

In recent years, with the rapid development of ubiquitous sensor technologies,

wearable sensors can provide accurate measurement of important movement and

physiological cues related to PA. Wearable sensors are well-received and widely used

by researchers and the general population, which creates the prospect for objective

measurement of PA in studies examining the impacts of physical activity on health.

However, most studies in the PA domain have relied on self-report methods or direct

measures of energy expenditure, which require expensive and bulky equipment,

mostly in lab-based contexts. On the other hand, wearable sensor-based methods are

inexpensive but, by and large, use simple methods such as thresholding or simple

regression, which often perform poorly.

A multitude of machine learning-based PA recognition systems have been developed in

recent years. Most of these studies applied a single classifier to the features extracted

from a single accelerometer location. The use of advanced ensemble machine learning

techniques, which have the potential to improve PA recognition performance, has

not been thoroughly investigated in this domain. Also, there is a lack of methods to

effectively fuse multi-accelerometer data to improve PA recognition

performance from varying sensor positions. Among sensor-based approaches to

determine relative intensity, none has utilised multiple important modalities of

physiological data (including heart rate, electrodermal activity, and temperature) for

relative intensity prediction using machine learning. Advanced machine learning such

as deep learning has not been properly explored in the previous literature on energy

expenditure prediction.

The aim of this thesis is to introduce new methods and models to accurately predict 1)

PA type, and 2) its impacts such as relative activity intensity and energy expenditure

in simulated free-living contexts.

The specific contributions of this thesis in the context of “PA type prediction” are:

1. This research systematically compared the PA classification accuracy achieved

by conventional ensemble methods (bagged decision tree, boosted decision

tree, and random forest) and a custom multi-classifier ensemble combining

four machine learning algorithms (binary decision tree, k-nearest neighbour,

support vector machine, and neural network) using three decision fusion rules

(weighted majority voting, Naïve Bayes, and behaviour knowledge space).

Performance was evaluated in three independent PA recognition datasets. The

results revealed that combining multiple individual classifiers using ensemble

learning methods can improve activity recognition accuracy from wrist-worn

accelerometer data.

2. A novel posterior-adapted class-based weighted decision fusion was proposed

to effectively combine data from multiple accelerometers for improving physical

activity recognition. The fusion was applied to two- and three-accelerometer

location combinations. The results showed that the proposed decision fusion

was superior to the other state-of-the-art fusion algorithms, and that a two-

accelerometer combination (wrist and ankle) provided the best PA recognition

performance.
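
As a rough illustration of the decision fusion idea named in the two contributions above, the following minimal Python sketch fuses one predicted label per source (a classifier or an accelerometer location). It shows plain weighted majority voting and a simple variant with one weight per source and class. All names, labels, and weight values are hypothetical and are not taken from the thesis; in particular, this sketch is not the posterior-adapted class-based fusion proposed here, only a simplified relative of it.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Fuse one label per source using a single weight per source.

    predictions: dict mapping source name -> predicted label
    weights: dict mapping source name -> non-negative weight
             (hypothetical values, e.g. validation F1-scores)
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    scores = defaultdict(float)
    for source, label in predictions.items():
        scores[label] += weights[source]
    # Iterate over sorted labels so that ties are broken alphabetically.
    return max(sorted(scores), key=lambda label: scores[label])


def class_based_vote(predictions, class_weights):
    """Fuse one label per source using a weight per (source, class) pair.

    class_weights: dict mapping source name -> {label: weight}
                   (hypothetical values, e.g. per-class validation scores)
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    scores = defaultdict(float)
    for source, label in predictions.items():
        scores[label] += class_weights[source].get(label, 0.0)
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical predictions for one window of data.
predictions = {"wrist": "walking", "ankle": "running", "chest": "running"}

# One weight per source: the wrist alone outweighs ankle plus chest.
source_weights = {"wrist": 0.9, "ankle": 0.5, "chest": 0.3}
fused_by_source = weighted_majority_vote(predictions, source_weights)

# One weight per source and class: the ankle is trusted for "running".
class_weights = {
    "wrist": {"walking": 0.6, "running": 0.7},
    "ankle": {"walking": 0.9, "running": 0.95},
    "chest": {"walking": 0.5, "running": 0.4},
}
fused_by_class = class_based_vote(predictions, class_weights)
```

With these made-up weights the two rules disagree, which illustrates why the granularity of the weights (per source versus per source and class) can matter.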

The specific contributions of this thesis in the context of “estimation of impacts of PA”

are:

3. This research developed both regression and classification models to

effectively predict the relative PA intensity using multimodal physiological

sensor data (heart rate, RR interval, electrodermal activity (Eda), and body temperature (Temp)). The experiments were

based on a real-world (non-laboratory) and longitudinal dataset, collected from

22 people, where Borg’s RPE scale was used as a ground truth measure of

relative intensity. The results showed that features extracted from RR-interval

provided the highest prediction performance compared to any other single

modality. However, the combination of Eda and Temp features fused with RR

features produced the best overall performance, confirming the benefits of

using multi-modal data.

4. This research is the first study that proposes the use of deep learning to

effectively predict energy expenditure from body worn accelerometers in pre-

school-aged children. It also systematically compares the deep learning

approach to conventional supervised machine learning and simplified

regression approaches in different accelerometer location configurations

including wrist and hip. The results show that deep learning achieves

performance comparable to conventional supervised learning and

significantly outperforms the simplified regression approaches.
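
The feature fusion mentioned in contribution 3 can be pictured with the following minimal Python sketch, which concatenates the per-modality feature vectors of one time window into a single vector before it is passed to a model. The function name, modality keys, and feature values are illustrative assumptions; the actual features and fusion procedure are those described in the corresponding chapters, not this sketch.

```python
def fuse_features(window_features, modalities):
    """Concatenate per-modality feature vectors for one time window.

    window_features: dict mapping modality name -> list of feature values
    modalities: ordered list of modality names to include in the fusion
    """
    fused = []
    for name in modalities:
        # A missing modality raises KeyError rather than being skipped,
        # so every fused vector has the same layout.
        fused.extend(window_features[name])
    return fused


# Hypothetical feature values for a single window.
window = {
    "RR": [0.82, 0.05],
    "Eda": [1.3, 0.2, 0.01],
    "Temp": [33.1],
}

rr_only = fuse_features(window, ["RR"])
rr_eda_temp = fuse_features(window, ["RR", "Eda", "Temp"])
```

Keeping the modality order fixed across windows means that each position in the fused vector always refers to the same feature.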

All of the proposed methods were validated using a diverse range of datasets collected

from different participant groups (adults and children) performing different physical

activities in different contexts (laboratory-based vs outdoors).

These methods collectively deliver better algorithms and maximise the use of available

sensor information to provide accurate measurement of PA type, relative PA intensity,

and energy expenditure. This will provide higher-quality information to users and

health practitioners, which is useful for providing personalised and adaptive PA

recommendations.

Table of Contents

Keywords ................................................................................................................................ iii

Abstract .....................................................................................................................................v

Table of Contents .................................................................................................................... ix

List of Figures ....................................................................................................................... xiii

List of Tables ..........................................................................................................................xv

List of Abbreviations ........................................................................................................... xvii

List of Publications ............................................................................................................... xix

Statement of Original Authorship ......................................................................................... xxi

Acknowledgements ............................................................................................................. xxiii

Chapter 1: Introduction ...................................................................................... 1

1.1 Background and Motivation ...........................................................................................1

1.2 Research Problem ...........................................................................................................3

1.2.1 Lack of Methods That Use and Compare Advanced Ensemble Learning

Algorithms ............................................................................................................3

1.2.2 Lack of Decision Fusion Methods to Combine Multi-Accelerometer Data .........3

1.2.3 Lack of Methods That Use Multi-Modal Data for Relative Intensity

Prediction..............................................................................................................4

1.2.4 Lack of Methods That Use Deep Learning for Energy Expenditure

Prediction..............................................................................................................5

1.3 Research Aims and Objectives .......................................................................................6

1.4 Research Framework ......................................................................................................7

1.5 Contributions of the Thesis .............................................................................................8

1.6 Significance ..................................................................................................................11

1.7 Thesis Outline ...............................................................................................................12

Chapter 2: Literature Review ........................................................................... 13

2.1 Physical Activity and Health ........................................................................................13

2.1.1 Health Benefits of Physical Activity ..................................................................13

2.1.2 Guidelines of Physical Activity ..........................................................................13

2.1.3 Risks Associated with Physical Activity ............................................................14

2.2 Personal Impacts from Physical Activity .....................................................................16

2.2.1 Relative Intensity of Physical Activity ................................................................16

2.2.2 Energy Expenditure ............................................................................................19

2.3 Sensor-Based Methods to Measure Physical Activity and Its Impacts ..........................21

2.3.1 Wearable Sensors ............................................................................................... 21

2.3.2 Data Pre-processing ........................................................................................... 22

2.3.3 Feature Extraction .............................................................................................. 22

2.3.4 Feature Selection................................................................................................ 27

2.3.5 Learning Algorithms .......................................................................................... 28

2.3.6 Effect of Sensor Number, Positioning, and Combination on the

Performance ....................................................................................................... 35

2.4 Summary of Current Gaps ........................................................................................... 37

PART I - Classification of Physical Activities ....................................................... 39

Chapter 3: Ensemble Methods for Classification of Physical Activities from

Wrist Accelerometry ................................................................................................ 41

3.1 ABSTRACT ................................................................................................................. 44

3.2 Introduction .................................................................................................................. 45

3.3 Methods ........................................................................................................................ 48

3.3.1 Datasets .............................................................................................................. 48

3.3.2 Classification Framework .................................................................................. 50

3.3.3 Conventional Ensemble Methods ...................................................................... 52

3.3.4 Custom Ensemble Methods ............................................................................... 53

3.3.5 Decision Fusion Techniques .............................................................................. 53

3.3.6 Performance Evaluation ..................................................................................... 54

3.4 Results .......................................................................................................................... 56

3.4.1 Dataset #1 Results .............................................................................................. 56

3.4.2 Dataset #2 Results .............................................................................................. 56

3.4.3 Dataset #3 Results .............................................................................................. 57

3.4.4 Statistical Comparison ....................................................................................... 58

3.5 Discussion .................................................................................................................... 60

3.6 Acknowledgements ...................................................................................................... 64

Chapter 4: Physical Activity Recognition using Posterior-adapted Class-

based Fusion of Multi-Accelerometers Data .......................................................... 65

4.1 ABSTRACT ................................................................................................................. 68

4.2 Introduction .................................................................................................................. 69

4.3 Related Work ............................................................................................................... 71

4.4 Methods ........................................................................................................................ 73

4.4.1 Pre-processing .................................................................................................... 73

4.4.2 Feature Extraction .............................................................................................. 73

4.4.3 Normalisation & Feature Selection .................................................................... 73

4.4.4 Classification Algorithms ...................................................................................74

4.4.5 Decision Fusion Techniques...............................................................................75

4.5 Experiment ....................................................................................................................78

4.5.1 Datasets ..............................................................................................................78

4.5.2 Implementation of the Framework .....................................................................79

4.5.3 Evaluation Approach and Metrics ......................................................................81

4.6 Results and Discussion .................................................................................................82

4.6.1 Evaluation of Classification Algorithms ............................................................82

4.6.2 Evaluation of Different Fusion Techniques .......................................................82

4.6.3 Activity-Wise Classification Performance .........................................................84

4.6.4 Subject-Wise Classification Performance ..........................................................86

4.6.5 Confusion Matrices ............................................................................................87

4.7 Conclusion ....................................................................................................................89

PART II - Estimation of Impacts of Physical Activities ....................................... 91

Chapter 5: Towards Non-Laboratory Prediction of Relative Physical

Activity Intensities from Multimodal Wearable Sensor Data.............................. 93

5.1 ABSTRACT .................................................................................................................96

5.2 Introduction ..................................................................................................................97

5.3 Dataset Collection .........................................................................................................99

5.4 Methods ......................................................................................................................101

5.4.1 Pre-processing ..................................................................................................101

5.4.2 Feature Extraction and Selection ......................................................................101

5.4.3 Regression Algorithms .....................................................................................102

5.5 Performance Evaluation ..............................................................................................103

5.6 Experimental Results and Discussion .........................................................................104

5.6.1 Performance from Using a Single Modality .....................................................104

5.6.2 Performance from Using Multiple Modality ....................................................104

5.7 Conclusion ..................................................................................................................106

Chapter 6: Prediction of Relative Physical Activity Intensity Using

Multimodal Sensing of Physiological Data .......................................................... 107

6.1 ABSTRACT ...............................................................................................................110

6.2 Introduction ................................................................................................................111

6.3 Methods ......................................................................................................................114

6.3.1 Participants .......................................................................................................114

6.3.2 Protocol ............................................................................................................114

6.3.3 Data Acquisition ...............................................................................................114

6.3.4 Relative Intensity Prediction System ............................................................... 115

6.4 Results ........................................................................................................................ 119

6.4.1 Relative Intensity Classification from a Single Modality ................................ 119

6.4.2 Feature Fusion Results ..................................................................................... 119

6.4.3 Decision Fusion Results ................................................................................... 121

6.4.4 Statistical Comparison ..................................................................................... 123

6.5 Discussion .................................................................................................................. 124

6.6 Acknowledgements .................................................................................................... 127

Chapter 7: Deep Learning for Energy Expenditure Estimation in Pre-School

Children ........................................................................................................ 129

7.1 ABSTRACT ............................................................................................................... 132

7.2 Introduction ................................................................................................................ 133

7.3 Data Collection and Pre-Processing ........................................................................... 135

7.3.1 Data Collection ................................................................................................ 135

7.3.2 Pre-Processing ................................................................................................. 136

7.4 Methods ...................................................................................................................... 136

7.4.1 Deep Learning Approach ................................................................................. 137

7.4.2 Conventional Supervised Learning Approach ................................................. 140

7.4.3 Simplified Approach ........................................................................................ 143

7.5 Performance Evaluation ............................................................................................. 143

7.6 Results and Discussion ............................................................................................... 144

7.6.1 Evaluation of Deep Learning Approach .......................................................... 144

7.6.2 Evaluation of Conventional Approach ............................................................. 145

7.6.3 Evaluation of Simplified Regression Approach .............................................. 145

7.6.4 Comparison of Approaches ............................................................................. 145

7.7 Conclusion ................................................................................................................. 147

7.8 Acknowledgment ....................................................................................................... 148

Chapter 8: Conclusion and Future Work ...................................................... 149

8.1 Summary of Achievements ........................................................................................ 149

8.2 Limitations ................................................................................................................. 151

8.3 Future Work ............................................................................................................... 152

Bibliography ........................................................................................................... 153

Appendices .............................................................................................................. 173

List of Figures

Figure 1.1 The framework of this research .................................................................. 7

Figure 2.1 Borg’s RPE scale ...................................................................................... 18

Figure 2.2 OMNI scale of perceived exertion (adult) for cycling.............................. 18

Figure 3.1 Flow diagram of the proposed framework................................................ 51

Figure 4.1 Overview of the system developed for implementing the framework ..... 80

Figure 4.2 For a given test instance (x), predicting the final label by fusing the

decisions from accelerometer sensors using weights................................... 80

Figure 4.3 Average F1-Score comparison for model-based, class-based and

posterior-adapted class-based decision fusion with the PAMAP2

dataset .......................................................................................................... 83

Figure 4.4 Average F1-Score comparison for model-based, class-based and

posterior-adapted class-based decision fusion with the MHEALTH

dataset .......................................................................................................... 83

Figure 4.5 Average F1-Scores of all single and possible accelerometer

combinations across different subjects. Error bars represent 95%

confidence intervals. (*) indicates statistical significance (p < 0.05) .......... 86

Figure 5.1 Borg’s Rating of Perceived Exertion (6-20) scale .................................. 100

Figure 5.2 Prediction performances of single modality models .............................. 104

Figure 6.1 F1 Scores for all combinations of modalities using feature fusion; ....... 120

Figure 6.2 The confusion matrix for the best combinations in each classifier ........ 121

Figure 6.3 Scores for all combinations of modalities using decision fusion; .......... 122

Figure 7.1 The transformed representation of a) a single 3-axis accelerometer

window, b) a combination of two accelerometer’s windows .................... 138

Figure 7.2 The CNN architecture used in this study. ............................................... 139

Figure 7.3 Conventional approach design for each accelerometer location and

a combination ............................................................................................. 141

Figure 7.4 Results of the approaches for each accelerometer location and a

combination using a) RMSE, and b) R² ................................................... 144

List of Tables

Table 2.1 Overview of the extracted features (ordered by author name) ................... 24

Table 2.2 Overview of some wearable sensor-based works that used machine

learning algorithms (ordered by author name) ............................................. 31

Table 3.1 Comparison across three datasets .............................................................. 50

Table 3.2 Classification results (F1-Score) using wrist acceleration sensor of

dataset #1 ..................................................................................................... 56


Table 3.3 Classification results (F1-Score) using wrist acceleration sensor of

dataset #2 ..................................................................................................... 57


Table 3.4 Classification results (F1-Score) using wrist acceleration sensor of

dataset #3 ..................................................................................................... 58

Table 4.1 List of features extracted from each window of an accelerometer ............ 74

Table 4.2 Average F1-scores for each classification model across both datasets ...... 82

Table 4.3 F1-scores for single and all possible combinations of accelerometer

sensors in PAMAP2 dataset ......................................................................... 84

Table 4.4 F1-scores for single and all possible combinations of accelerometer

sensors in MHEALTH dataset ..................................................................... 85

Table 4.5 Confusion matrix for ankle and wrist combination (A+W) in

PAMAP2 dataset .......................................................................................... 87

Table 4.6 Confusion matrix for ankle and wrist combination (A+W) in

MHEALTH dataset ...................................................................................... 88

Table 5.1 Prediction performances of models developed from the combination

of modalities............................................................................................... 105

Table 6.1 Feature set extracted from each sensor modality ..................................... 116

Table 6.2 F1-scores of five different modalities using three classifiers ................. 119

Table 7.1 List of Activities and Their Description .................................................. 135

Table 7.2 List of features extracted from each window of an accelerometer .......... 142

List of Abbreviations

Abbreviation (Alphabetically Sorted)

ACC Accelerometer Data

ANN Artificial Neural Network Classification

BDT Binary Decision Tree Classification

BKS Behaviour Knowledge Space Combiner

CFS Correlation-based Feature Selection

CNNR Convolutional Neural Network Regression

DNN Deep Neural Network Classification

Eda Electrodermal activity

EE Energy Expenditure

HR Heart-Rate

kNN k Nearest Neighbour Classification

MLR Multiple Linear Regression

MRMR Minimal Redundancy Maximum Relevance

NB Naïve Bayes Combiner

NNR Neural Network Regression

PA Physical Activity

RF Random Forest Classification

RPE Rate of Perceived Exertion

RR R-R Interval

SVM Support Vector Machine Classification

SVMR Support Vector Machine Regression

Temp Body Temperature

WMV Weighted Majority Voting

List of Publications

List of Q1 Journal Papers

1) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble

Methods for Classification of Physical Activities from Wrist Accelerometry,"

Medicine and Science in Sports and Exercise, vol. 49, no. 9, p. 1965, 2017. (Accepted)

2) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Physical

activity recognition using posterior-adapted class-based fusion of multi-

accelerometers data," IEEE Journal of Biomedical and Health Informatics, 2017.

(Accepted)

3) A. K. Chowdhury, D. Tjondronegoro, V. Chandran, J. Zhang, and S. G. Trost,

"Prediction of Relative Physical Activity Intensity Using Multimodal Sensing of

Physiological Data," PLOS ONE, 2018. (To be Submitted Soon)

4) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, C. Dylan, and S.

G. Trost, “Deep Learning for Energy Expenditure Prediction in Preschool Children,”

IEEE Journal of Biomedical and Health Informatics, 2018. (Under Revision)

List of Conference Papers

5) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, P. S. Pratiwi, and S. G. Trost,

"Towards Non-Laboratory Prediction of Relative Physical Activity Intensities from

Multimodal Wearable Sensor Data," in Proceedings of the 1st IEEE Life Science

Conference, 2017: IEEE. (Accepted)

6) A. K. Chowdhury, A. Farseev, P. R. Chakraborty, and V. Chandran, “Automatic

Classification of Physical Exercises from Wearable Sensors using Small Dataset from

Non-Laboratory Settings,” in Proceedings of the 1st IEEE Life Science Conference,

2017: IEEE. (Accepted)

7) A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff, and S.

G. Trost, “Deep Learning for Energy Expenditure Prediction in Preschool Children,”

in International Conference on Biomedical and Health Informatics (BHI), 2018:

IEEE-EMBS. (Accepted as 1-page Abstract for Poster Presentation)

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet

requirements for an award at this or any other higher education institution. To the best

of my knowledge and belief, the thesis contains no material previously published or

written by another person except where due reference is made.

Signature:

Date: ____01/05/2018____________

QUT Verified Signature

Acknowledgements

This PhD is attributed to the immense support received from numerous QUT

members. I would like to thank my supervision team – including Dr Jinglan Zhang

(Principal Supervisor), Professor Stewart Trost (Associate Supervisor), Professor Dian

Tjondronegoro (Associate Supervisor), and Professor Vinod Chandran (Associate

Supervisor). I am tremendously grateful to all of my supervisors for their endless

support throughout my PhD study.

I would like to express my gratitude to Dr Jinglan Zhang for accepting me as her

student from the third year of my PhD. She helped me a lot regarding my technical

knowledge, writing and presenting the results. I am so fortunate to have Professor

Stewart Trost as my supervisor. He actively helped me a lot with designing the

methods, finding gaps, and writing up the work. He participated in every weekly meeting,

closely monitored my progress, and provided in-depth feedback to improve my work.

Professor Dian Tjondronegoro has been a great mentor towards my PhD journey and

beyond. He was my principal supervisor until he left QUT. I have learnt a lot from him

about writing, research, and professional knowledge. He always brought helpful

connections with the right people during my candidature. Professor Vinod Chandran

always provided on-time and valuable feedback on my work. He helped me a lot with

the technical issues, methods and writing.

I acknowledge the financial support received from QUT, via QUT’s

postgraduate research scholarships (QUT-PRA), and excellence top up scholarships. I

am thankful to Institute of Health and Biomedical Innovation (IHBI), Information

Systems School and School of Electrical Engineering and Computer Science for all

their support throughout my journey. A sincere thank you to Tim Mcsweeney for his

diligent proofreading of this thesis.

My heartfelt thanks go to my parents and my wife for their steady mental

support, consideration, patience, help, and encouragement. This work couldn’t have

been completed without their pure devotion, sacrifice and continued prayers towards

my success. Their encouragement kept me going when there seemed to be no way

forward.

Finally, I want to thank all my colleagues at QUT. I express my gratitude to all

the anonymous students and staff of QUT who helped in my research with their

participation in my studies.

Chapter 1: Introduction

1.1 BACKGROUND AND MOTIVATION

Physical inactivity has been identified as the fourth leading risk factor for global

mortality [1]. It is one of the key contributors to the overweight and obesity epidemic

in the world’s population, and is considered one of the major contributors to chronic

diseases such as heart-disease, type-2 diabetes, depression, and some cancers [2-5].

According to the Australian Institute of Health and Welfare (AIHW), overweight and

obesity is also continuing to rise, and is considered to be the second highest contributor

to disease likelihood in Australia. The rapid development of technologies in both

developed and developing countries has made people’s lives easier and more enjoyable, but

unfortunately it has removed opportunities for physical activity during work and

leisure. Physical activity (PA) is good for people’s health and well-being as it helps

one to manage weight, control blood pressure, improve cardiovascular fitness, and

manage stress. There is strong evidence to show that regular physical activity

dramatically improves quality of life as it contributes to physical health and is effective

for the improvement of the mental state as well, such as decreasing depression or

anxiety [6-8]. Physical activity is associated with positive health outcomes among

multiple age groups including children, adults, and the elderly regardless of their

physical status/fitness [9, 10].

Despite the obvious benefits of PA, around 60 to 85% of the world’s population

does not meet the recommended minimum PA level [11]. Uncertainty with respect to the

correct frequency, intensity, and duration may be one important factor that contributes

to poor exercise adherence. Performing PA without knowing the correct dose can lead

to over- or under-exercising, which reduces the safety and effectiveness of the PA [12].

Due to different levels of physical fitness, the impacts from a given physical activity

may vary among people [13]. Other factors, such as a busy lifestyle and a lack of

social support, knowledge, facilities, or motivation, also discourage people from

following policies and community-level initiatives [14]. For example,

the World Health Organization [15] recommends that adults, 18-64 years, perform at

least 150 minutes of moderate-intensity or 75 minutes of vigorous-intensity physical

exercise weekly. Unless a person realises the benefits of PA, and learns the contexts

(e.g., type, amount, intensity, calorie, place, etc.), it remains challenging to increase

people’s adherence to PA.

A self-monitoring tool that can objectively and accurately measure PA and its

personal impacts (relative intensity, and energy expenditure) is needed to monitor

compliance with PA guidelines and help people to perform PA at the correct intensity.

Nowadays, wearable and ambient sensors have rapidly been adopted by the general

population due to the development of integrated sensor technologies such as

miniaturisation, improved user experience design, and low battery consumption. These

modern sensors can be used to collect real-time movement/acceleration and

physiological data from the users efficiently and unobtrusively. Using this modern

technology, an automated system (learning model) could be developed that can

accurately track the physical activity type, intensity, and energy expenditure and

display it to users as a daily summary [16]. Such objective tracking could increase PA

awareness, management, and promote long-term life-style changes [17], and in turn,

reduce illness and health management cost.

Physical activity usually manifests through users’ movement and physiological

responses such as heart-rate, respiration rate, skin conductance, volume of oxygen

consumption, body temperature, etc. [18, 19]. For example, accelerometer sensors can

record the acceleration or deceleration of the body which can be utilised in a learning

model for objective and direct prediction of PA. Heart-rate, skin-conductance, and

body temperature also have a strong relationship with PA type and intensity, which

can be used to personalise PA estimation [18]. A multitude of studies have been carried

out by utilising these data, captured by wearable sensors, to estimate PA classes and

energy expenditure. However, most of the approaches are either lab-based or use

simple learning algorithms which often perform poorly out of the lab. Moreover, in

general, physiological sensor data requires person-level calibration which negatively

impacts the applicability in real-world contexts [20-22].

This thesis focuses on the use of advanced machine learning and novel fusion

algorithms to improve the measurement of physical activity and its impacts on a

person such as relative intensity and energy expenditure.


1.2 RESEARCH PROBLEM

Few studies in the PA domain have explored methods to maximise the use of

multi-modal and multi-body-positioned sensor data to improve the prediction of PA and its

personal impacts. Most previous methods are lab-based and use simple

learning algorithms for PA recognition [17]. The physiological modalities such as

electrodermal activity, body temperature etc., and multi-accelerometer combinations

warrant further investigations.

1.2.1 Lack of Methods That Use and Compare Advanced Ensemble Learning Algorithms

The use of machine learning for the recognition of physical activity and energy

expenditure has gained considerable research attention in the past few years [23].

Machine learning methods usually extract features in the data and then use supervised

or unsupervised learning algorithms to predict physical activity type and/or energy

expenditure. This approach usually involves training a single learning algorithm such

as support vector machine, neural networks, or random forest. Machine learning

approaches have been shown to provide better prediction for a greater variety of

physical activity metrics (e.g., activity type, walking speed) and energy cost, compared

to a non-machine learning approach such as thresholding (also known as cut-point)

[24, 25]. The use of machine learning is diverse, and studies have not consistently

identified a single learning algorithm which performs well across datasets. The

differences in the data processing methods and the lack of

generalisation have hindered research efforts to quantify, understand and intervene on

physical activity and sedentary behaviour. In order to increase generalisation and

overall performance, ensemble learning algorithms which use multiple learning

models and appropriately combine them to get the most out of each model are gaining

popularity. However, ensemble learning algorithms have not yet been properly applied or

systematically compared in the PA domain.
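To make the idea of combining multiple learning models concrete, the sketch below shows weighted majority voting, one simple combination rule. It is a minimal illustration only: the function name, the activity labels, and the use of validation accuracies as weights are assumptions made for this example, not the configuration used in this thesis.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: list of labels, one per base classifier
    weights: list of non-negative weights, one per base classifier
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight;
    # ties resolve to the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical outputs of four base classifiers for one window of data
votes = ["walking", "running", "walking", "running"]
# Hypothetical weights, e.g. each classifier's validation accuracy
accuracies = [0.70, 0.90, 0.60, 0.65]
fused = weighted_majority_vote(votes, accuracies)
```

Here the two "running" votes carry a total weight of 1.55 against 1.30 for "walking", so the fused decision is "running" even though the raw vote count is tied.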

1.2.2 Lack of Decision Fusion Methods to Combine Multi-Accelerometer Data

Combining data from multiple accelerometers, placed at different body

locations, has been found to be effective in improving the accuracy of PA recognition

[26, 27]. Cleland, et al. [28] showed that the placement of accelerometer sensors on

different body locations affects the PA prediction performance. For example, while

the wrist placement is a comfortable position for the users and preferable for


recognition of simple activities (e.g., running, standing, etc.) [29], other locations such

as the hip, ankle, and chest demonstrate good performance for certain groups of

activities (e.g., lying, running, standing, etc.) [26, 28, 30]. Acceleration data from

multiple locations can be combined using a feature- or decision-level fusion approach

[31]. Decision-level fusion has been found to be more accurate than feature fusion in

other domains [32]; however, it has not been systematically investigated for PA

recognition. It is also important to determine the best combination of sensor

placements for optimal PA recognition.
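The difference between the two fusion levels can be sketched in a few lines. Feature-level fusion joins the per-sensor feature vectors before a single classifier is trained, whereas decision-level fusion combines the outputs of one classifier per sensor. The example below uses a plain weighted average of class posteriors as the decision rule; this is a generic illustration, not the fusion algorithm proposed in this thesis, and all feature values, posteriors, and weights are hypothetical.

```python
import numpy as np


def feature_fusion(features_by_sensor):
    """Feature-level fusion: concatenate per-sensor feature vectors."""
    return np.concatenate(features_by_sensor)


def decision_fusion(posteriors_by_sensor, sensor_weights):
    """Decision-level fusion: weighted average of per-sensor class posteriors."""
    posteriors = np.asarray(posteriors_by_sensor, dtype=float)
    weights = np.asarray(sensor_weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    fused = weights @ posteriors       # shape: (n_classes,)
    return int(np.argmax(fused)), fused


# Hypothetical feature vectors from wrist and ankle accelerometers
wrist_features = np.array([0.12, 1.05, 0.33])
ankle_features = np.array([0.40, 0.98])
combined = feature_fusion([wrist_features, ankle_features])  # length 5

# Hypothetical class posteriors from one classifier per sensor
# (classes: 0 = lying, 1 = walking, 2 = running)
wrist_posterior = [0.2, 0.5, 0.3]
ankle_posterior = [0.1, 0.3, 0.6]
label, fused = decision_fusion([wrist_posterior, ankle_posterior], [0.4, 0.6])
```

With these numbers the wrist classifier alone would choose "walking", but the fused posterior favours "running" because the ankle classifier is given the larger weight.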

1.2.3 Lack of Methods That Use Multi-Modal Data for Relative Intensity Prediction

To date, research efforts to quantify PA intensity from wearable sensors have

predominantly been based on absolute intensity [32-34]. Because such estimates do

not consider an individual’s aerobic fitness, age or health status, the predicted intensity

of PA could be above accepted thresholds for moderate-to-vigorous PA (MVPA). As

such, m-Health platforms using wearable sensor systems to monitor the intensity of

PA could be encouraging individuals to exercise at relative intensities that are neither

safe nor effective. Thus, the development of validated algorithms to predict relative

PA intensity from wearable sensor data constitutes an important research priority.

An alternative approach to measuring relative intensity that does not require

instrumentation or individual calibration in the laboratory is the use of effort

perception or ratings of perceived exertion (RPE). Effort perception scales such as the

Borg alpha-numeric RPE Category Scale are commonly used in exercise testing and

prescription contexts and have been shown to be a valid and reliable indicator of

relative PA intensity [35-38]. Yet, despite the widespread use of RPE for effort

estimation, the utility of algorithms to predict relative PA intensity based on RPE has

not been explored.

Because of the linear relationship between HR and work rate during steady-state

exercise, heart-rate-based methods are most commonly adopted for quantifying the relative

intensity of PA, but they are not effective for low relative intensities [39]. Moreover,

these approaches require knowledge of HR max for which commonly used age-related

prediction equations are subject to considerable measurement error [40, 41]. In

addition to heart-rate, some other modalities of physiological data, including

electrodermal activity (Eda) and body temperature (Temp), can be easily obtained


using wearable sensors. These physiological indicators can provide valuable

information about the metabolic demand of exercise and can also be used to predict

relative PA intensity. However, to the best of my knowledge, the use of multiple

modalities of physiological data for relative intensity prediction has not been

previously investigated.
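As a rough illustration of the heart-rate-based approach discussed above, the sketch below computes relative intensity as a percentage of heart-rate reserve. It assumes the widely quoted "220 minus age" rule as one example of an age-related HR max equation; the function names and the example participant are hypothetical, and the rule is shown only to indicate where the measurement error mentioned above enters the calculation.

```python
def age_predicted_hr_max(age_years):
    """Age-predicted maximal heart rate using the 220 - age rule (an assumption)."""
    return 220.0 - age_years


def percent_hr_reserve(hr, hr_rest, hr_max):
    """Relative intensity as a percentage of heart-rate reserve."""
    reserve = hr_max - hr_rest
    if reserve <= 0:
        raise ValueError("hr_max must exceed hr_rest")
    return 100.0 * (hr - hr_rest) / reserve


# Hypothetical 40-year-old with a resting HR of 60 bpm exercising at 120 bpm
hr_max = age_predicted_hr_max(40)                 # 180.0
intensity = percent_hr_reserve(120, 60, hr_max)   # 50.0
```

Any error in the predicted HR max changes the denominator, so the same measured heart rate can map to noticeably different relative intensities.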

1.2.4 Lack of Methods That Use Deep Learning for Energy Expenditure Prediction

To date, few studies in the PA domain have used advanced machine learning

algorithms such as deep learning to predict physical activity outcomes such as energy

expenditure (EE) [42]. In most energy expenditure prediction applications, the wealth

of data generated from the sensors has not been thoroughly utilised and predictions are

based on simple linear regression [43]. Such usages have shown poor performance

with large prediction errors [43]. The use of supervised machine learning for EE

prediction has emerged as a viable and more accurate alternative to simple linear

regression [44, 45]. Conventional machine learning involves manual extraction of

features, feature selection, and applying regression algorithms such as artificial neural

networks (ANN) [46-48], ensemble decision trees [49], and support vector machine

[50]. The performance of regression depends on the quality and number of features,

which requires domain knowledge for feature extraction and sophisticated feature

selection algorithms. Often, the same extracted features do not perform equally well

in different studies [51]. To eliminate the need for manual feature selection, deep

learning is gaining popularity and has demonstrated superior performance in other

domains [52]. Preliminary work has established deep learning as a viable strategy to

predict EE from accelerometer output in adults [42]; however, no previous study has

used deep learning to predict EE in preschool-aged children. Child specific models

are needed given pre-schoolers’ unique movement patterns and greater energy cost of

locomotion [53-55].
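The manual feature extraction step of a conventional pipeline can be sketched as follows. The example splits a one-dimensional accelerometer signal into fixed, non-overlapping windows and computes four simple time-domain features per window. The window length, the choice of features, and the synthetic signal are assumptions made for illustration; they are not the features used in the studies cited above.

```python
import numpy as np


def window_features(signal, window_size):
    """Split a 1-D signal into non-overlapping windows and compute simple features.

    Returns an array of shape (n_windows, 4) with columns: mean, std, min, max.
    Trailing samples that do not fill a whole window are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // window_size
    windows = signal[: n_windows * window_size].reshape(n_windows, window_size)
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
    ])


# Synthetic stand-in for one axis of accelerometer data (1000 samples)
rng = np.random.default_rng(0)
acc = rng.normal(size=1000)
features = window_features(acc, window_size=100)  # shape (10, 4)
```

Each row of `features` would then be passed to a regression algorithm. A deep learning model instead takes the raw windows as input and learns its own representation, which removes this hand-crafted step.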

Most previous studies generally used their own datasets, with no validation of

results across variations of datasets, size of datasets and activity type selections.

Therefore, the performances of existing algorithms have been found to be inconsistent

and dependent on the sample used to generate the training data and the activity targets

under investigation [56].


1.3 RESEARCH AIMS AND OBJECTIVES

The overarching aim of this research is: To develop learning models and fusion

algorithms that can maximise the use of multi-modal and/or multi-positioned

wearable sensor data to accurately predict 1) classes of physical activity and 2) its

personal impacts (including relative physical activity intensity and energy

expenditure).

This aim gives rise to the following research objectives:

1) To develop and evaluate an advanced custom ensemble learning algorithm

for PA recognition and compare with conventional ensemble methods and

standalone state-of-the-art supervised learning algorithms.

2) To design a new decision fusion algorithm that can effectively combine

multiple accelerometers placed in different body locations to improve

PA recognition, and to identify the optimal sensor positioning (single location

or combination of multiple locations) for recognising a range of PAs.

3) To explore the use of multimodal sensor data for the prediction of relative

PA intensity using both feature and decision fusion.

4) To use deep learning for energy expenditure prediction and compare it with

conventional machine learning algorithms and simple regression.

All the proposed methods need to be evaluated using a diverse range of datasets

collected from different participant groups (adults and children) performing different

physical activities in different contexts (indoor vs outdoor).


1.4 RESEARCH FRAMEWORK

The key components of this research are shown in Figure 1.1.

Figure 1.1 The framework of this research

The research involved data collection, modelling and validation of the models.

In the data collection phase, a real-world dataset was collected using the wearable

sensors in the outdoor context. The collected data included acceleration, heart-rate,

electrodermal activity, temperature, and profile data. Apart from the collected dataset,

several datasets which were either public or collected by other parties were utilised in

the research. By leveraging advanced machine learning and fusion algorithms, a set of

learning models (both classification and regression) were developed to improve the

recognition of PA and the prediction of its impacts on persons, namely relative

intensity and energy expenditure. All of the models were validated using a diverse

range of datasets and compared with state-of-the-art methods.


1.5 CONTRIBUTIONS OF THE THESIS

The contributions of this research help to extend knowledge, methods and

techniques, in the field of physical activity recognition, and the prediction of personal

impacts of PA (e.g., relative intensity, and energy expenditure prediction). Each

contribution and the related publications are listed in the following section.

The specific contributions of this thesis in the context of “PA prediction” are:

1) Recontextualisation and systematic comparison of multi-classifier

ensembles: This research systematically compared the PA classification

accuracy achieved by conventional ensemble methods (bagged decision tree,

boosted decision tree, and random forest) and a custom multi-classifier ensemble

combining four machine learning algorithms (binary decision tree, k-nearest

neighbour, support vector machine, and neural network) using three decision

fusion rules (weighted majority voting, Naïve Bayes, and behaviour knowledge

space). Performance was evaluated in three independent PA recognition datasets.

The results revealed that combining multiple individual classifiers using ensemble

learning methods can improve activity recognition accuracy from wrist-worn

accelerometer data.

Related Publication:

1. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost,

"Ensemble Methods for Classification of Physical Activities from Wrist

Accelerometry," Medicine and Science in Sports and Exercise, vol. 49, no.

9, p. 1965, 2017. (Q1 Journal, Accepted)

2) Novel decision fusion algorithm and sensor positioning: A novel posterior-

adapted class-based weighted decision fusion was proposed to effectively

combine data from multiple accelerometers to improve physical activity recognition.

The fusion was applied in two and three accelerometer location combinations. The

results identified that the proposed decision fusion was superior to the other state-

of-the-art fusion algorithms, and that a two-accelerometer combination (wrist and

ankle) provided the best PA recognition performance.


Related Publication:

2. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost,

"Physical activity recognition using posterior-adapted class-based fusion of

multi-accelerometers data," IEEE Journal of Biomedical and Health

Informatics, 2017. (Q1 Journal, Accepted)

The specific contributions in the context of “PA impacts estimation” are:

3) Investigation of multimodal physiological data for relative intensity

prediction: Both regression and classification models were developed to

effectively predict the relative PA intensity using multimodal physiological sensor

data (heart-rate, RR-interval, Eda and Temp). The experiments were based on a

real-world (non-laboratory) and longitudinal dataset, collected from 22 people,

where Borg’s RPE scale was used as a ground truth measure of relative intensity.

The results showed that features extracted from RR-interval provided the highest

prediction performance compared to any other single modality. However, the

combination of Eda and Temp features fused with RR features produced the best

overall performance, confirming the benefits of using multi-modal data.

Related Publications:

3. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, P. S. Pratiwi, and S. G.

Trost, "Towards Non-Laboratory Prediction of Relative Physical Activity

Intensities from Multimodal Wearable Sensor Data," in Proceedings of the

1st IEEE Life Science Conference, 2017: IEEE. (Accepted)

4. A. K. Chowdhury, D. Tjondronegoro, V. Chandran, J. Zhang, and S. G.

Trost, "Prediction of Relative Physical Activity Intensity Using Multimodal

Sensing of Physiological Data," Plos One, 2018. (Q1 Journal, to be

submitted soon)

4) Use of Deep learning for energy expenditure prediction: This research is the

first study that proposes the use of deep learning to effectively predict energy

expenditure from body worn accelerometers in pre-school-aged children. It also

systematically compares the deep learning approach to conventional supervised

machine learning and simplified regression approaches in different accelerometer


location configurations including wrist and hip. The results show that deep

learning can achieve performance comparable to conventional supervised

learning, and significantly outperforms the simplified regression approaches.

Related Publications:

5. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff,

and S. G. Trost, “Deep Learning for Energy Expenditure Prediction in

Preschool Children,” in International Conference on Biomedical and Health

Informatics (BHI), 2018: IEEE-EMBS. (Accepted as 1-page abstract for

poster presentation)

6. A. K. Chowdhury, D. Tjondronegoro, J. Zhang, M. Hagenbuchner, D. Cliff,

and S. G. Trost, "Deep Learning for Energy Expenditure Estimation in

Preschool Children," IEEE Journal of Biomedical and Health Informatics,

2018. (Q1 Journal, to be submitted as full-paper)


1.6 SIGNIFICANCE

The methods developed in this thesis collectively deliver better algorithms and

maximise the use of available sensor information to provide accurate measurement of

PA type, relative PA intensity, and energy expenditure. This research is significant in

a number of ways including:

1. Enhancing techniques for classifying physical activity, introducing the use of

multi-classifier ensembles, proposing a novel multi-sensor decision fusion

algorithm, and comparing sensor positioning to recommend the best use of

multiple sensors placed in different body locations for improving PA estimation.

2. Extending the knowledge on quantifying impacts of PA using multimodal

physiological data for relative intensity and a deep learning model for

energy expenditure, thereby helping end users to exercise at an intensity that is

both safe and effective.

Overall, this research provides new, more accurate methods for sensor-enabled

monitoring of physical activities. The impact of this work on public health is

significant. Clinicians, researchers and public health agencies could significantly

improve physical activity surveillance and more effectively identify individuals at risk.

The work also has important implications for the design of accurate and effective

technology-based physical activity monitoring and intervention applications that could

be delivered through e-Health initiatives.


1.7 THESIS OUTLINE

This section provides an overview of how the thesis contributions are organised

in different chapters.

Introduction: Introduction of the thesis is presented in Chapter 1. It briefly

describes the background, problems, aims, objectives and significance of this research.

Literature Review: Chapter 2 presents the literature review. It reviews existing

work (both traditional and wearable-sensor-based) on physical activity. Literature on

methods and techniques for feature extraction, feature selection, learning models, and

fusion is discussed. This chapter also summarises the research gaps in the current

literature.

Part 1 - Classification of Physical Activities: This part presents our two

contributions/publications in two chapters (Chapters 3 and 4). Chapter 3 presents our

work on the recontextualisation of multi-classifier ensembles and systematically

compares them with other approaches (contribution 1). Chapter 4 presents our work

on novel fusion and sensor positioning (contribution 2).

Part 2 - Estimation of Impacts of Physical Activities: This part consists of

three chapters. Chapters 5 and 6 present our work on multimodal analysis for relative

intensity prediction (contribution 3). In Chapter 7, the use of deep learning for energy

expenditure prediction is presented (contribution 4).

Conclusion: Chapter 8 concludes the dissertation with a summary of the

research and provides future directions.

Chapter 2: Literature Review

This chapter provides an overview of the background to the research aim,

which is to develop techniques to improve the measurement of physical activity and

its associated personal impacts – in terms of relative intensity and energy

expenditure – using wearable sensor data. Section 2.1 provides a background on the

importance of physical activity for people’s health and wellbeing and the current

guidelines to achieve the right amount of exercise. Section 2.2 presents the current

methods to assess personal impacts from physical activities. Section 2.3 reviews

sensor-based learning methods for physical activity type, intensity, and energy

expenditure prediction and presents the current gaps in knowledge to establish the

motivation for this research. At the end of this chapter, we will outline the key

limitations that will be addressed, and each of the subsequent chapters will provide

further insights into the current literature of the specific problems addressed.

2.1 PHYSICAL ACTIVITY AND HEALTH

2.1.1 Health Benefits of Physical Activity

Physical activity is defined as “any bodily movement produced by the

contraction of skeletal muscle that increases energy expenditure above a basal level”

[57]. Performing adequate amounts of physical activities can improve physical and

psychological health, and one’s quality of life [58, 59]. It can also improve

cardiorespiratory fitness [60]. Physical activity provides a wide spectrum of health

benefits including reduction in the risk of early death, cardiovascular disease, type 2

diabetes, colon and breast cancer, depression and loss of cognitive function [2, 61, 62].

Further, there is moderate to strong evidence that physical activity helps to maintain

weight loss, preserve physical ability in older adults, and improve sleep quality [59,

62].

2.1.2 Guidelines of Physical Activity

On the basis of the evidence linking physical activity and health, eminent

scientific and public health organisations such as the World Health Organisation

(WHO) have issued guidelines for participation in physical activity. The guidelines are

different for different age-groups (e.g., infants, pre-schoolers, school-aged children,

adults, and older adults). This thesis focuses on data collected from the adult and

pre-schooler age-groups. The guidelines for adults and pre-schoolers are provided below.

WHO recommends adults aged 18–64 should participate in at least 150 minutes

of moderate-intensity aerobic physical activity throughout the week or do at least 75

minutes of vigorous-intensity aerobic physical activity throughout the week or an

equivalent combination of moderate- and vigorous-intensity activity [15]. In order to

achieve additional health benefits, it recommends that adults increase their

moderate-intensity aerobic physical activity to 300 minutes per week, which is equivalent to 150

minutes of vigorous-intensity aerobic physical activity per week. However, the two

intensities can be mixed in any ratio. According to The Department of Health Australia

[63], the current recommendation for Australian adults (18-64 years) is to accumulate

150 to 300 minutes (2 ½ to 5 hours) of moderate intensity physical activity or 75 to

150 minutes (1 ¼ to 2 ½ hours) of vigorous intensity physical activity, or an equivalent

combination of both moderate and vigorous activities, each week. Both guidelines

recommend muscle-strengthening activities on at least 2 days each week.
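The "equivalent combination" rule in the adult guidelines can be written as a simple check. The sketch below treats one minute of vigorous-intensity activity as two minutes of moderate-intensity activity, which follows from the 150/75 and 300/150 minute pairs quoted above. The function name and default target are hypothetical, and the muscle-strengthening recommendation is not modelled.

```python
def meets_adult_guideline(moderate_min, vigorous_min, target_moderate_equiv=150):
    """Check weekly aerobic activity against a moderate-equivalent minute target.

    One vigorous-intensity minute is counted as two moderate-intensity minutes.
    """
    moderate_equivalent = moderate_min + 2 * vigorous_min
    return moderate_equivalent >= target_moderate_equiv


meets_adult_guideline(150, 0)   # True: moderate activity only
meets_adult_guideline(0, 75)    # True: vigorous activity only
meets_adult_guideline(90, 30)   # True: 90 + 2 * 30 = 150
meets_adult_guideline(60, 30)   # False: 60 + 2 * 30 = 120
```

Passing `target_moderate_equiv=300` gives the corresponding check for the higher target associated with additional health benefits.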

Australia’s 24-hour movement guideline for the early years (0-5 years)

recommends pre-schoolers aged 3-5 years should perform at least 180 minutes of

physical activities per day, of which 60 minutes of total physical activity should be

energetic play such as running, jumping, kicking and throwing, and spread throughout

the day [63, 64]. For infants aged 0-1-year, interactive floor-based play in a supervised

and safe environment is recommended. For infants who are not mobile yet, 30 minutes

tummy time is encouraged, which includes activities of reaching and grasping, pushing

and pulling, and crawling. For toddlers (1 to 2 years), The Department of Health

Australia [63] recommends at least 180 minutes of a variety of physical activities

including energetic play such as running, jumping and twirling spread throughout the

day.

2.1.3 Risks Associated with Physical Activity

There is a risk of injury with any type of physical activity. Risks of physical

injuries are higher in contact activities (soccer, basketball) than in the non-contact

activities like walking, swimming, and gardening [65]. Though it is rare, there are also

well-documented associations between acute episodes of exercise and sudden cardiac

death [12]. It has been estimated that between 6% and 17% of men who experience sudden

cardiac death do so during acute exercise [66, 67].


Despite the risks of physical activity, the benefits of being physically active far

outweigh the risks [68]. The risk can be minimised by avoiding over-exertion, i.e.

performing physical activity at an appropriate frequency, intensity, and duration.

Taking proper safety measures, such as performing physical activity in safe places,

avoiding exercise in extreme cold or heat, and the use of proper equipment can mitigate

the risks associated with physical activity participation.


2.2 PERSONAL IMPACTS FROM PHYSICAL ACTIVITY

Physical activity results in energy expenditure and the effort required to perform

physical activity is perceived by the individual. The work rate or intensity of physical

activity can be measured in either absolute or relative terms. The absolute impact of

physical activity is the external workload or energy cost for a particular physical

activity, typically expressed as multiples of resting metabolism or Metabolic

Equivalents (METs) [33]. The relative impact of physical activity can be the perceived

intensity or relative physical activity intensity. This research focuses on both ‘relative

physical activity intensity’, and ‘energy expenditure’ as the personal impacts from

physical activity. These terms are described below.

2.2.1 Relative Intensity of Physical Activity

Relative intensity is generally expressed as a percentage of an individual’s

maximal aerobic capacity (% VO2 max, % HR reserve) or based on ratings of perceived

exertion (RPE).

For any given physical activity, relative intensity varies between persons due to

between-person differences in aerobic capacity and effort perception [13, 69]. For

example, the absolute intensity of brisk walking is 4 METs. For a young healthy person

whose maximal aerobic capacity is 10 METs, it will be equivalent to a relative intensity

of 40% of maximal capacity. But, for a person with chronic disease whose maximal

aerobic capacity is 6 METs, the relative intensity will be ~67% of the maximal

capacity. Thus, to achieve a certain absolute intensity, individuals with a lower aerobic

capacity are required to work significantly harder (in relative terms) than a person with

higher aerobic capacity and vice versa [70].
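The arithmetic in the brisk-walking example above can be reproduced with a short sketch. The function name is illustrative only; the MET values are those quoted in the text.

```python
def relative_intensity_percent(activity_mets: float, max_capacity_mets: float) -> float:
    """Relative intensity as a percentage of a person's maximal aerobic capacity."""
    if max_capacity_mets <= 0:
        raise ValueError("max_capacity_mets must be positive")
    return 100.0 * activity_mets / max_capacity_mets


BRISK_WALKING_METS = 4.0  # absolute intensity quoted in the text

# Young healthy person with a maximal capacity of 10 METs
print(round(relative_intensity_percent(BRISK_WALKING_METS, 10.0)))
# Person with chronic disease and a maximal capacity of 6 METs
print(round(relative_intensity_percent(BRISK_WALKING_METS, 6.0)))
```

The same absolute workload therefore maps to different relative intensities, which is why absolute measures alone cannot describe how hard an activity is for a given person.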

Measurement of Relative Intensity

Relative physical activity intensity can be expressed as a percentage of a

person’s maximal oxygen uptake (VO2max) or oxygen uptake reserve (VO2R), which

is measured using an incremental exercise test [62, 71]. VO2max is calculated using a

submaximal exercise test, usually done on a treadmill under clinical supervision.

VO2R is calculated by subtracting resting VO2 from VO2max. Although these measures are

considered the gold standard, they are not feasible in real-world settings because exercise

tests require sophisticated instruments, lab-based individual calibration, and multiple

visits for the participants.


Percentage of maximal heart rate (%HRmax), or heart rate reserve (%HRR) are

also popular methods for measuring relative physical activity intensity due to the linear

relationship between heart-rate and work rate during steady state exercise [71, 72].

Heart rate based methods are valid for moderate-to-vigorous physical activities, but

they are not accurate for activities of low intensity [39]. Moreover, these approaches

require knowledge of HRmax, which can be measured with a maximal exercise test or

estimated from the person’s age. The common formula for estimating HRmax from age is

HRmax = 220 – age. The maximal exercise test requires a lab-based set-up, and age-related prediction

equations are subject to considerable measurement error [40, 41].
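As a minimal sketch of the heart-rate based measures described above, the following code applies the age-based estimate of HRmax and computes %HRmax and %HRR. The text names %HRR but does not write out its formula; the definition used here, (HR − HRrest) / (HRmax − HRrest), is the conventional heart rate reserve definition and is stated as an assumption. The function names and the example values are illustrative.

```python
def estimate_hr_max(age_years: float) -> float:
    """Age-based estimate of maximal heart rate: 220 minus age."""
    return 220.0 - age_years


def percent_hr_max(hr: float, hr_max: float) -> float:
    """Heart rate as a percentage of maximal heart rate."""
    return 100.0 * hr / hr_max


def percent_hr_reserve(hr: float, hr_rest: float, hr_max: float) -> float:
    """Heart rate as a percentage of heart rate reserve (assumed conventional definition)."""
    reserve = hr_max - hr_rest
    if reserve <= 0:
        raise ValueError("hr_max must exceed hr_rest")
    return 100.0 * (hr - hr_rest) / reserve


# Illustrative values: a 40-year-old with a resting HR of 60 bpm exercising at 130 bpm
hr_max = estimate_hr_max(40)
print(hr_max)
print(round(percent_hr_max(130, hr_max), 1))
print(round(percent_hr_reserve(130, 60, hr_max), 1))
```

Because the age-based estimate carries considerable error, any intensity derived from it inherits that error.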

Self-Rated Scales for Measurement of Relative Intensity

There are also widely used self-rated scales for the measurement of perceived

exertion or relative intensity, which are accessible to everyone and usable in real

contexts. These scales describe how hard an individual perceives an activity to be.

Borg’s rating of perceived exertion (RPE) scale and the OMNI perceived exertion scale

are two commonly used, validated measures of physical activity intensity.

Borg’s Rate of Perceived Exertion (RPE) Scale – Borg’s RPE scale reflects

how heavy and strenuous physical activity feels to someone, linking all sensations and

feelings of physical stress, effort, and fatigue [35]. It does not consider factors such as

muscle pain or lack of breath, but focuses on total feelings of exertion. This rating

scale ranges from 6 to 20, where 6 means “no exertion at all” and 20 means “maximal

exertion”. Borg’s RPE scale is shown in Figure 2.1.

Effort perceptions measured using Borg’s RPE scale are strongly and positively

correlated with physiological indicators of exercise intensity such as oxygen

consumption, heart-rate and power output. A study on Taiwanese men [38] showed a

linear relationship between heart rate and Borg’s RPE. Scherr, et al. [36] also performed

maximal treadmill tests on 2,560 participants and simultaneously collected their heart-

rate, blood lactate and RPE. Ratings of perceived effort on the Borg scale were strongly

and positively correlated with heart-rate (r=0.74, p<0.001) and blood lactate (r=0.83,

p<0.001).


Figure 2.1 Borg’s RPE scale

OMNI Perceived Exertion Scale – The first version of the OMNI scale was

constructed for children and adolescents [73, 74]. The OMNI scale uses a pictorial

scale which is helpful for users to report their perceived intensity. More recently, an

adult version of the OMNI scale was developed which is shown in Figure 2.2. The

OMNI scale is a 10-point scale where 0 represents “extremely easy” or “not tired at

all” and 10 represents “extremely hard” or “very, very tired” for adults and children,

respectively.

Figure 2.2 OMNI scale of perceived exertion (adult) for cycling

Like Borg’s scale, a number of studies have established the concurrent validity

of the OMNI scale. The most common criterion measures have been oxygen

consumption, heart-rate and power output [73]. Utter, et al. [75] validated an OMNI


scale for young adults by correlating the RPE-OMNI with oxygen uptake (VO2),

respiratory rate (RR) and heart-rate (HR). Their analysis indicated that RPE-OMNI is

positively correlated with the physiological parameters, with r ranging from 0.67 to 0.88

(P < 0.05).

Although RPE scales are valid and useful for people to report their relative

activity intensity, these scales are difficult to implement in automated scenarios where

people might want to automatically track and record their everyday physical activity.

2.2.2 Energy Expenditure

Total energy expenditure (TEE) is the sum of the basal metabolic rate (the

amount of energy expended while at complete rest), the thermic effect of food (the

energy required to digest and absorb food) and the energy expended in physical

activity (PAEE) [76]. Basal metabolic rate is the largest component of TEE (~60%),

and the thermic effect of food is the smallest component, accounting for about 10% of TEE.

PAEE is the most variable component of TEE, which accounts for approximately 30%

of the TEE [77].

Measurement of Energy Expenditure

There are several methods available to estimate the human energy expenditure.

These techniques and their advantages and disadvantages are briefly described below.

Direct calorimetry measures the energy expenditure over longer periods of time

(e.g. 24 hours). It is the most accurate technique for measuring energy expenditure. In

this approach, an individual is placed in a room calorimeter to measure the body heat.

Energy expenditure is calculated by the direct measurement of heat loss from the body.

As a result, direct calorimetry techniques are only lab based, and not a practical way

to measure energy expenditure [78, 79].

Indirect calorimetry measures energy expenditure through determining the rates

of O2 consumption and CO2 production [80]. In recent years, indirect calorimeters

have become portable which enables one to measure energy expenditure in outdoor

and real contexts reliably. An indirect calorimeter typically uses a mouth piece /

breathing mask, turbine flow meter, and O2 and CO2 analysers. Indirect calorimetry

can also be used for short periods of time (e.g., minutes to a few hours), which enables

researchers to measure the energy expenditure of participants in the most feasible way.


The doubly labelled water method was developed to measure energy expenditure. This

method is based on the kinetics of two stable isotopes of water, ²H₂O (deuterium-labelled

water) and H₂¹⁸O (oxygen-18-labelled water). These stable isotopes are naturally

occurring compounds without known toxicity at the low doses used. Deuterium-

labelled water is lost from the body through the usual routes of water loss (urine, sweat,

evaporative losses). Oxygen-18-labelled water is lost from the body at a slightly faster

rate because this isotope is also lost via carbon dioxide production in addition to all

routes of water loss. The difference in the rate of loss between the 2 isotopes is

therefore a function of the rate of carbon dioxide production - a reflection of the rate

of energy production over time. The doubly labelled water method is reliable (3-10%

error) and can be used in free-living samples [78]. However, the oxygen-18 water and the instruments

to measure traces of the isotopes are quite expensive. Also, it does not provide

information on daily variations of energy expenditure.


2.3 SENSOR-BASED METHODS TO MEASURE PHYSICAL ACTIVITY AND ITS IMPACTS

Given the limitations and participant burden associated with the above-

mentioned laboratory-based methods, wearable sensor-based approaches have become

the method of choice for measuring physical activity and energy expenditure [21].

In this section, the wearable sensor-based methods are briefly reviewed. Section

2.3.1 presents an overview of wearable sensors and their data. The subsequent sections

summarise the steps of a learning method, i.e. data pre-processing, feature

extraction, feature selection, and learning algorithms. Section 2.3.6 presents the effects

of sensor number and positioning on the performance of learning algorithms.

2.3.1 Wearable Sensors

Accelerometers measure body movement in terms of acceleration which is

useful for determining the type of physical activity, intensity of physical activity, and

energy expenditure in real contexts [16, 33, 34, 47, 81]. Montoye, et al. [82] were

among the first to recognise the potential of accelerometers to assess the intensity of

physical activity objectively. The early accelerometer-based studies used single-axis

accelerometers [83], whereas recent studies use 3-dimensional acceleration signals

[18, 30, 48].

Physiological signals such as heart-rate, respiration rate, and electrodermal

activity are also widely used in physical activity research. These physiological data are

linearly related to the intensity of physical activity, and can be considered important

cues for assessing physical activity [18, 84]. Previous researchers have utilised heart-

rate with accelerometer data for measurement of physical activity type and energy

expenditure and reported consistent improvement over the approaches that use

accelerometers alone [22, 85]. Smolander, et al. [86] improved energy expenditure

estimation using respiration with the heart-rate compared to heart-rate alone.

Electrodermal responses and skin temperature provide information about sweating caused by

physical exertion. Altini, et al. [18] showed improved energy expenditure prediction

when physiological signals (e.g. heart rate (HR), respiration rate, galvanic skin

response, skin humidity) were combined with accelerometer data. However,

physiological responses, such as heart-rate and galvanic skin response, are often

affected by a person’s emotional stress, which leads to poor performance in low-

intensity physical activities [18]. In addition, because they are affected by gender, body


size, and fitness level, most physiological signals also require person-level

normalisation or calibration before being used in machine learning systems [20].

2.3.2 Data Pre-processing

In the data pre-processing step, the raw data are synchronised, then cleaned and

prepared for the modelling [87]. The raw sensor data may contain out-of-range values

and/or missing values. Improper screening of this data prior to the analysis phase can

produce misleading results. Thus, ensuring quality of data before running an analysis

is important. The collected sensor data are verified against the sensor’s standard

distribution or compared with gold-standard equipment, and checked to ensure that the

sensor was properly calibrated as per the instructions. Any outliers, i.e. invalid or

transient data, are usually discarded from the analysis [17].

To deal with missing values, the affected records can

either be discarded or the missing values can be replaced using linear interpolation, a unique

value, the nearest value, etc. [88]. Physiological signals such as heart-rate, electrodermal

activity, and body temperature are usually pre-processed for drift and noise reduction.

Smoothing of the raw data can be done by using a moving average filter, or low-pass

or band-pass filter in sliding windows.
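The two pre-processing operations mentioned above, replacing missing values by linear interpolation and smoothing with a moving average, can be sketched as follows. This is a minimal illustration using only the Python standard library. Representing missing samples as `None`, filling edge gaps with the nearest valid value, and shrinking the averaging window at the signal edges are assumptions made for the example, not choices reported in the cited studies.

```python
def interpolate_missing(samples):
    """Fill None gaps by linear interpolation; gaps at the edges take the nearest valid value."""
    valid = [i for i, v in enumerate(samples) if v is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    filled = list(samples)
    for i, value in enumerate(samples):
        if value is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = samples[right]
        elif right is None:
            filled[i] = samples[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = samples[left] + weight * (samples[right] - samples[left])
    return filled


def moving_average(samples, width=3):
    """Centred moving average; the window is truncated at the start and end of the signal."""
    half = width // 2
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


heart_rate = [60.0, None, None, 66.0, None]  # illustrative samples with gaps
print(interpolate_missing(heart_rate))
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], width=3))
```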

To minimise the inter-individual differences in the physiological data, individual

calibration is usually performed which involves several lab visits and tests. As such

calibration is not practical in real contexts, Altini, et al. [20] proposed automatic

normalisation of physiological data. They determined the baseline ($X_{phy_{base}}$) of the

physiological signal while the subject was lying; and highest value ($X_{phy_{high}}$) of the

physiological signal was predicted using multiple linear regression of the

physiological signal from daily living activity such as walking. The signal was then

normalised ($X_{phy_{n}}$) using the following equation.

$$X_{phy_{n}} = \frac{X_{phy} - X_{phy_{base}}}{X_{phy_{high}} - X_{phy_{base}}}$$
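The normalisation equation above rescales a physiological signal so that the baseline maps to 0 and the predicted highest value maps to 1. A minimal sketch is given below; the function and argument names are illustrative, and the baseline and highest values are passed in directly rather than estimated by regression as in the cited work.

```python
def normalise_physiological(x_phy, x_base, x_high):
    """Apply (x - baseline) / (highest - baseline) to every sample of a signal."""
    span = x_high - x_base
    if span <= 0:
        raise ValueError("x_high must be greater than x_base")
    return [(x - x_base) / span for x in x_phy]


# Illustrative heart-rate samples with a baseline of 60 bpm and a highest value of 180 bpm
print(normalise_physiological([60.0, 120.0, 180.0], x_base=60.0, x_high=180.0))
```

Values below the baseline become negative and values above the predicted highest value exceed 1; whether to clip them is a design decision left open here.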

2.3.3 Feature Extraction

In previous physical activity studies, a wide range of features were extracted

from the raw sensor data which were used as input to the classification algorithms. At

first, the raw signal is usually segmented into sequences of consecutive windows, then

a set of features is extracted from each window [23]. The size of the window is an


important consideration and has varied from study to study, for example 1-2 seconds

[89], 4 seconds [21], 6.7 seconds [90], 10 seconds [30], and 60 seconds [91]. Most

studies in physical activity classification used non-overlapping sliding windows to

extract features [30, 92]. Some previous studies have shown sliding windows with

50% overlap to be effective [90]. Overlapping sliding windows can provide

more data points/windows to train a model when the dataset is small.
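Window segmentation with and without overlap can be sketched as follows. The function name and the choice to drop a trailing incomplete window are assumptions made for this illustration.

```python
def sliding_windows(signal, window_size, overlap=0.0):
    """Split a signal into fixed-size windows; overlap is a fraction in [0, 1)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Step between window starts; at least one sample so the loop always advances
    step = max(1, int(round(window_size * (1.0 - overlap))))
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


samples = list(range(10))  # stand-in for ten raw sensor samples
print(len(sliding_windows(samples, window_size=4)))               # non-overlapping
print(len(sliding_windows(samples, window_size=4, overlap=0.5)))  # 50% overlap
```

On this toy signal the 50% overlap yields twice as many windows as the non-overlapping case, which illustrates why overlap is attractive for small datasets.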

The features extracted from accelerometer data can be divided into three types:

time domain, frequency domain, and time-frequency domain. Most works utilised

time-domain features or a combination of time-domain and frequency domain features.

Some studies used physiological features along with the accelerometer features, which

are mainly statistical measures extracted from heart-rate, RR interval, skin response

etc. An overview of the extracted features from both accelerometer and physiological

sensor data is given in Table 2.1.

The time domain features are the simplest features extracted from raw

accelerometer data. These features are usually statistical measures such as mean,

median, variance, skewness, kurtosis, etc., extracted from a window of raw data, or

low pass or high pass filtered data [23, 91, 93]. Correlations between accelerometer

axis data are also used, and have been shown to improve recognition [90, 94].

Frequency domain features are extracted from the Fast-Fourier-Transformed (FFT)

window. FFT of time domain signals provides amplitude of the frequency components

and distribution of the signal energy. Some widely used frequency domain features

include entropy, signal energy, principal frequency, and magnitude of principal

frequency [51]. A few researchers have investigated both time and frequency

characteristics of the sensor data using wavelet analysis [23]. Wavelet analysis such as

discrete wavelet transforms (DWT) decomposes sensor data into a number of

coefficients based on frequency bands while temporal information is preserved. In

general, statistical measures such as standard deviation or root mean square of specific

wavelet coefficients are used for physical activity recognition [95, 96].
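A minimal sketch of time-domain and frequency-domain feature extraction from a single window is given below. It uses only the Python standard library, so the Fourier transform is a naive discrete Fourier transform rather than an optimised FFT. Excluding the DC term and defining signal energy as the sum of squared magnitudes divided by the window length are one of several conventions and are assumptions here, as are all names.

```python
import cmath
import math
import statistics


def time_domain_features(window):
    """Simple statistical measures of one window of samples."""
    return {
        "mean": statistics.fmean(window),
        "median": statistics.median(window),
        "variance": statistics.pvariance(window),
    }


def dft_magnitudes(window):
    """Magnitudes of the one-sided discrete Fourier transform (naive O(n^2) version)."""
    n = len(window)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        magnitudes.append(abs(coeff))
    return magnitudes


def frequency_domain_features(window, sampling_rate_hz):
    """Principal frequency, its magnitude, and signal energy, all excluding the DC term."""
    n = len(window)
    mags = dft_magnitudes(window)
    # Skip k = 0 so that a constant offset such as gravity does not dominate
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return {
        "principal_frequency_hz": k_peak * sampling_rate_hz / n,
        "principal_magnitude": mags[k_peak],
        "signal_energy": sum(m * m for m in mags[1:]) / n,
    }


# Illustrative 2-second window sampled at 32 Hz: a 2 Hz oscillation on a constant offset
fs = 32.0
window = [1.0 + math.sin(2 * math.pi * 2.0 * t / fs) for t in range(64)]
print(time_domain_features(window))
print(frequency_domain_features(window, fs))
```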


Table 2.1 Overview of the extracted features (ordered by author name)

| Reference | Sensor data and location | Accelerometer features: time-domain | Accelerometer features: frequency-domain | Accelerometer features: time-frequency | Physiological/other features |
|---|---|---|---|---|---|
| Albinali, et al. [97] | Three miniature wireless accelerometers on dominant hip, thigh and upper arm | Distances between the means of the axes, variance, correlation coefficients | Entropy, FFT peaks and frequencies | - | - |
| Altini, et al. [18] | Accelerometer and ECG data using an ECG necklace at chest. Accelerometer, GSR and skin humidity using the wristband sensor. | Mean of the absolute signal, magnitude, mean distance between axes, variance, standard deviation, inter-quartiles range, median | Spectral energy, entropy, low frequency band signal power (0.1 − 0.75 Hz), high frequency band signal power (0.75 − 10 Hz), frequency and amplitude of the FFT coefficients | - | RR features: mean, variance and standard deviation. GSR features: mean, signal power, response rate and mean Ohmic Perturbation Duration. Skin humidity feature: mean. |
| Berchtold, et al. [98] | Mobile phone 3-axis accelerometer | Mean, variance for all 3 axes | - | - | - |
| Cleland, et al. [28] | Accelerometers on lower back, wrist, foot, chest, hip, thigh | Mean value for each axis, average mean over 3 axes, standard deviation value for each axis, average standard deviation over 3 axes, skewness value for each axis, average skewness over 3 axes, kurtosis value for each axis, average kurtosis over 3 axes, correlations of axes | Energy value for each axis (x, y, and z), average energy over 3 axes | - | - |
| Ellis, et al. [24] | Accelerometers on hip and wrist | Vector magnitude (VM), mean, standard deviation, coefficient of variation, minimum, maximum, 25th, 50th and 75th percentile, 1-s lag autocorrelation, the correlation between each axis | Dominant frequency, entropy | - | - |
| Ellis, et al. [99] | Accelerometers on wrist and hip, and heart-rate | Average, standard deviation, coefficient of variation, minimum and maximum, 25th and 75th percentiles, lag 1-second autocorrelation, third and fourth moments, skewness and kurtosis, correlation between axes, average and standard deviation of roll, pitch and yaw, principal direction of motion | Dominant frequency and power at dominant frequency, total energy and entropy, FFT coefficients | - | - |
| Freedson, et al. [100] | Single accelerometer on hip | 10th, 25th, 50th, 75th, and 90th percentiles of the second-by-second accelerometer counts, and the temporal dynamics (lag-1 autocorrelation) | - | - | - |
| Fullerton, et al. [101] | Nine accelerometers on left and right ankle, left and right hip, left and right wrist, left and right upper arm and spine | Mean, standard deviation, root mean square, peak count, peak amplitude | Spectral energy, spectral power, and signal magnitude area | - | - |
| Gyllensten and Bonomi [102] | Accelerometer on waist | Mean, standard deviation, kurtosis, skewness, range, cross-axis correlation, accelerometer angle | Spectral energy in sub-bands (0-10 Hz in bands of 1.25 Hz), spectral entropy, peak frequencies and cross-spectral densities in sub-bands | - | - |
| Kujala, et al. [39] | Heart rate monitor | - | - | - | Heart rate and its variability, respiration rate, and on/off response information |
| Lin, et al. [91] | Accelerometer on wrist, waist and ankle, and chest strapped heart rate monitor | Count, mean signal magnitude area (SMA), standard deviation of SMA and median of SMA | - | - | Heart rate features: mean, standard deviation, variance, interquartile range, skew, kurtosis, mean of HR difference series, standard deviation of HR difference series, and variance of HR difference series |
| Mackintosh, et al. [103] | Nine accelerometers on chest, left and right wrists, hips, knees, and ankles | Mean and variance of the three accelerometer axes | - | - | - |
| Montoye, et al. [48] | Four accelerometers on hip, right thigh, and both wrists | Mean, standard deviation, minimum, maximum, covariance of adjacent windows of data, and the 10th, 25th, 50th, 75th, and 90th percentiles of the raw acceleration signal on each axis | - | - | - |
| Montoye, et al. [45] | Wireless network of three accelerometers on the right wrist, thigh, and ankle, and a hip-mounted accelerometer | Mean, standard deviation | - | - | Demographic features: weight and height |
| Nyan, et al. [95] | Accelerometer on shoulder | - | - | Addition of 'Sum of the squared detail coefficients at levels 4' and 'Sum of the squared detail coefficients at levels 5' for each accelerometer component | - |
| Pavey, et al. [104] | Accelerometer on non-dominant wrist | Mean, SD, 10th, 25th, 50th, 75th, 90th percentile, MAD, lag one autocorrelation | Signal power, dominant frequency 0.25–3.0 Hz, dominant frequency magnitude 0.25–3.0 Hz, entropy 0.25–3.0 Hz | - | - |
| Tamura, et al. [105] | Accelerometer on waist | - | - | Sum of the squared detail coefficients at levels 4, and level 5 for each accelerometer component | - |
| Tapia, et al. [106] | Five tri-axial wireless wearable accelerometers and a chest strapped heart-rate monitor | Area under curve (AUC), variance, mean, mean distances between axes, correlation coefficients | Entropy, FFT peaks and energy | - | Heart rate feature: number of heart beats above the resting HR value (BPM-RHR) |
| Trost, et al. [30] | Accelerometer on wrist and hip | Mean, standard deviation, coefficient of variation, percentiles (10th, 25th, 50th, 75th, 90th), lag one autocorrelation, skewness, kurtosis, log energy, peak intensity, zero crossings, and cross-axis correlation | Signal power | - | - |
| Trost, et al. [47] | Accelerometers on right hip | 10th, 25th, 50th, 75th, and 90th percentiles and the lag one autocorrelation | - | - | - |

2.3.4 Feature Selection

Feature selection techniques attempt to find a subset of n features out of m

features (m > n) so that the prediction performance is enhanced.

Filter-based feature selection methods are fast and simple because the features are

evaluated independently. In general, they calculate a goodness measure (e.g. correlation,

t-score, etc.) for each of the features and rank them. They then select the best n features,

or the features whose goodness measure exceeds a threshold. Popular

filter-based feature selection algorithms used in physical activity research include

correlation-based feature selection, T-score based feature selection, ReliefF, and

Minimum Redundancy Maximum Relevance (MRMR) [21, 107-109].
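The rank-then-select procedure of a filter method can be sketched as follows, using the absolute Pearson correlation with the target as the goodness measure. This is an illustration of the general idea only, not an implementation of ReliefF or MRMR; the function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def filter_select(features, target, n_keep=None, threshold=None):
    """Score each feature independently, rank, then keep the top n or those above a threshold."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [name for name in ranked if scores[name] >= threshold]
    if n_keep is not None:
        ranked = ranked[:n_keep]
    return ranked, scores


# Toy data: three candidate features and a target such as energy expenditure
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the target
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],   # strongly negatively correlated
    "c": [1.0, 1.0, 1.0, 1.0, 1.0],   # constant, carries no information
}
print(filter_select(features, target, n_keep=2)[0])
print(filter_select(features, target, threshold=0.9)[0])
```

Because each feature is scored in isolation, two highly correlated features would both rank well, which is the redundancy problem noted later in this section.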

Unlike the filter-based methods, wrapper-based feature selection methods

evaluate subsets of features to detect the interaction between the features. Lin, et al.

[91] employed two wrapper-based feature selection methods: sequential forward

selection (SFS) and sequential backward selection (SBS) in their study. SFS starts with

an empty set and adds features one at a time, sending each candidate subset to the criterion function

to examine its effectiveness. At each step, it adds a feature if its inclusion in the

already selected feature subset results in improved performance. SBS has the opposite

search direction to that of SFS. SBS starts with the complete set of features, and then

iteratively deletes features based on the combined performance.
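The SFS procedure described above can be sketched as a greedy loop around a criterion function. In practice the criterion would typically be a cross-validated performance estimate of the learning algorithm; the toy criterion below, with its made-up gains and redundancy penalty, is purely hypothetical and exists only to make the example runnable.

```python
def sequential_forward_selection(feature_names, criterion):
    """Greedily add the feature that most improves the criterion; stop when nothing improves it."""
    selected = []
    best_score = float("-inf")
    remaining = list(feature_names)
    while remaining:
        scored = [(criterion(selected + [name]), name) for name in remaining]
        score, name = max(scored)
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected, best_score


def toy_criterion(subset):
    """Hypothetical score: individual gains minus a penalty for a redundant pair."""
    gain = {"mean": 0.60, "variance": 0.25, "median": 0.58}
    score = sum(gain[name] for name in subset)
    if "mean" in subset and "median" in subset:
        score -= 0.60  # mean and median are treated as redundant in this toy example
    return score


print(sequential_forward_selection(["mean", "variance", "median"], toy_criterion))
```

A filter method would rank `median` second on its individual score, whereas the wrapper skips it because it adds nothing once `mean` has been selected. SBS would run the same loop in reverse, starting from the full set and removing features.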

Some filter-based feature selection methods select redundant features which

cannot provide additional information to the learning algorithm. Wrapper-based


methods, on the other hand, are complex and take significant amounts of time when

the number of features is large. A few studies have compared feature selection methods

in the physical activity domain. For example, Tulum, et al. [108] compared the

performance of two filter-based (ReliefF and T-score) feature selection methods on the

physical activity data. They reported the best test accuracy (97.6%) using features

selected by ReliefF, which was around 2% higher than T-score. Conversely, Zhang

and Sawchuk [110] found that a wrapper feature selection method (SFS) provided

higher physical activity classification accuracy when compared to a filter-based

method (ReliefF).

2.3.5 Learning Algorithms

The derived features are used as inputs to a learning algorithm. In previous physical activity

research, the algorithms employed range from simple threshold-based methods to complex

machine learning methods such as artificial neural networks, support vector machines,

and hidden Markov models [53, 56, 111, 112].

Threshold Based Methods

Threshold based methods use predefined thresholds or cut-points to classify

sensor output into classes of physical activity intensity. The most common approach

thus far has been to develop a simple regression equation that defines the relationship

between sensor output and energy expenditure. Once the regression equation has been

developed, thresholds or “cut-points” denoting the dividing line between sedentary-

and-light (1.5 METs), light-and-moderate (3 METs), and moderate-and-vigorous

physical activity (6 METs) are identified. Receiver operating characteristic (ROC)

curve analysis is another widely used technique, which determines cut-point thresholds by

assessing levels of sensitivity (true positive rate) and specificity (true negative rate) for

intensity categories. These cut-points are then used to estimate the amount of time

spent in sedentary, light, moderate, and vigorous activity [113, 114].
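The regression-based cut-point approach can be sketched as follows. A linear equation relating accelerometer counts to METs is inverted at the 1.5, 3 and 6 MET thresholds to obtain count cut-points, which are then used to label each minute and total the time per category. The slope and intercept below are invented for illustration and do not come from any published equation; assigning a count exactly on a cut-point to the higher category is likewise an assumption.

```python
from bisect import bisect_right

MET_THRESHOLDS = (1.5, 3.0, 6.0)  # sedentary/light, light/moderate, moderate/vigorous
LABELS = ("sedentary", "light", "moderate", "vigorous")


def count_cut_points(slope, intercept):
    """Invert METs = intercept + slope * counts at each MET threshold."""
    return [(threshold - intercept) / slope for threshold in MET_THRESHOLDS]


def classify_counts(counts_per_minute, cut_points):
    """Label each minute by locating its count among the cut-points."""
    return [LABELS[bisect_right(cut_points, count)] for count in counts_per_minute]


def minutes_per_category(labels):
    """Total number of one-minute epochs assigned to each intensity category."""
    return {label: labels.count(label) for label in LABELS}


# Hypothetical regression: METs = 1.0 + 0.001 * counts per minute
cut_points = count_cut_points(slope=0.001, intercept=1.0)
labels = classify_counts([100, 800, 2500, 6000, 300], cut_points)
print([round(c) for c in cut_points])
print(labels)
print(minutes_per_category(labels))
```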

Although the application of cut-points continues to be standard research practice

in the movement sciences, there is growing recognition that the relationship between

accelerometer output and energy expenditure is highly activity dependent, and that a

single regression equation cannot accurately determine energy expenditure across a

wide range of activities. Validation studies involving independent samples indicate


that regression-based cut-point approaches misclassify the true intensity of physical

activity 35% to 45% of the time [47].

Machine Learning Methods

Machine learning based classification or regression is a viable and more accurate

alternative to threshold-based methods [24, 45]. To date, a range of learning

algorithms, including supervised, unsupervised and a combination of learning

algorithms have been employed in the physical activity domain [23]. Table 2.2

summarises studies in exercise and movement science that implement machine

learning methods for classification and regression.

Among the supervised algorithms, artificial neural networks [47], SVM [28, 50],

random forest [104], k-nearest neighbour [115], and decision tree [90] are widely used

and have been reported to achieve high physical activity recognition performance using a single

accelerometer. For example, Trost, et al. [47] developed an artificial neural network

model for predicting physical activity type and energy expenditure. They found high

(88.4%) classification accuracy for physical activity type prediction. Ellis, et al. [99]

developed a random forest classifier for physical activity recognition and achieved

92.7% and 87.5% average overall accuracy for the hip and wrist accelerometer,

respectively. In their later study [24], they developed a 2-step activity recognition

model by including a hidden Markov model with random forest. They reported a

balanced accuracy of 88.1% and 83.6% for the hip and wrist, respectively.

There are some studies that compared the performance of different classification

algorithms for physical activity recognition. For example, Reiss and Stricker [116]

found that decision tree classifiers worked best among several base- and meta-level

classification techniques. Bao and Intille [90] applied decision tree, k-nearest

neighbour and naïve Bayes classification to identify 20 physical activities. They

reported high accuracy for both decision tree (84%) and k-nearest neighbour (83%).

Maurer, et al. [117] reported similar performances for decision tree, naïve Bayes, and

KNN in wrist accelerometer data for 6 activities including sitting, standing, walking,

ascending stairs, descending stairs, and running. Gyllensten and Bonomi

[102] reported higher accuracy for an SVM model (lab – 95.1%, daily living –

75.6%) compared to neural network (lab – 91.4%, daily living – 74.8%), and decision

tree (lab – 92.2%, daily living – 72.2%) models using waist accelerometer data in both

laboratory and daily living settings. Ermes, et al. [118] used hip and wrist


accelerometers and reported a classification rate at least 4 percentage points higher using a neural

network (87%) than using hierarchical (83%) and decision tree (60%) models.

To date, researchers have developed several regression equations that relate

accelerometer counts to measured physical activity in order to estimate physical activity

related energy expenditure and absolute intensity [33]. Most studies were conducted

in laboratory settings where correlation values between activity count and energy

expenditure ranged from 0.58 to 0.92 during various activities [119]. Apart from linear

regression, a few neural network based regression methods [44, 47, 100], such as the Radial

Basis Function Network (RBFN) and the Generalised Regression Neural Network

(GRNN), have also been used and performed better than linear

regression [91]. Trost, et al. [47] reported 30-40% lower RMSE for energy expenditure

prediction using an artificial neural network model compared to conventional

regression-based models. In a recent study, Zhu, et al. [42] successfully adopted deep

learning for energy expenditure prediction for adults. They reported 30-35% lower

RMSE using deep learning compared to an existing activity-specific linear regression

model.

Preece, et al. [23] completed a comprehensive review of learning algorithms

used for classification or regression problems in the physical activity domain. They

were unable to declare one particular machine learning technique as universally better

than others. In a recent study, Kate, et al. [56] compared the performance of eight

learning algorithms for both physical activity recognition and energy expenditure

prediction. In their results, they were unable to find a machine learning technique that

worked best in all testing situations. Interestingly, while no single algorithm works

best in isolation, Catal, et al. [112] showed that the combination or fusion of multiple

classification algorithms using majority vote can yield better performance than a single

classifier. An ensemble of multiple classification models is advantageous as it

usually reduces the chance of overfitting and improves the generalisation of the

classification task. However, to date, relatively few studies in the physical activity

domain have evaluated the utility of decision fusion or ensemble methods

incorporating different classification/regression algorithms.
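Decision fusion by majority vote, as referred to above, can be sketched in a few lines. Each classifier contributes one predicted label per window and the most frequent label wins. Breaking ties in favour of the earliest-listed classifier is an assumption made for this example, and the predictions are invented.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Fuse equal-length label lists (one per classifier) by majority vote per sample."""
    fused = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        # On a tie, keep the label proposed by the earliest-listed classifier
        fused.append(next(label for label in votes if counts[label] == top))
    return fused


# Hypothetical predictions from three classifiers for four windows
tree_pred = ["walk", "run", "sit", "walk"]
logreg_pred = ["walk", "walk", "sit", "run"]
mlp_pred = ["run", "run", "sit", "sit"]
print(majority_vote([tree_pred, logreg_pred, mlp_pred]))
```

More elaborate fusion schemes weight each vote, for instance by classifier confidence; the unweighted vote shown here is the simplest case.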


Table 2.2 Overview of some wearable sensor-based works that used machine learning algorithms (ordered by author name)

| Reference (number of subjects) | Application(s) | Sensor data and location | Learning algorithm(s) | Findings |
| --- | --- | --- | --- | --- |
| Albinali, et al. [97] (24 subjects) | Physical activity recognition and energy expenditure estimation | Three miniature wireless accelerometers on dominant hip, thigh and upper arm | Activity detection: C4.5 classifier. Energy expenditure estimation: several regression models | Combining physical activity recognition with individually calibrated regression models can improve energy expenditure estimation (by 15%) compared to the best estimate from the other methods. |
| Altini, et al. [18] (16 subjects) | Physical activity recognition and energy expenditure estimation | Accelerometer and ECG data from an ECG necklace at the chest. Accelerometer, GSR and skin humidity from a wristband sensor. | Activity detection: support vector machine. Energy expenditure estimation: activity-specific multiple linear regression models | The combination of accelerometer and physiological signals improves performance for activity recognition and energy expenditure. |
| Berchtold, et al. [98] (20 subjects) | Physical activity recognition | Mobile phone 3-axis accelerometer | Fuzzy classification | Physical activity recognition accuracy improved up to 97%. |
| Catal, et al. [112] (36 users) | | Accelerometer on wrist | J48, logistic regression, multi-layer perceptron, and voting | Ensemble of all classifiers using voting improved physical activity recognition. |
| Cleland, et al. [28] | Physical activity recognition | Accelerometer on lower back, wrist, foot, chest, hip, thigh | J48, naïve Bayes, neural network, SVM | SVM provided the highest accuracy. Among single locations, the hip gave the best performance. Compared to a single accelerometer, combining data from any two locations resulted in a significant improvement in performance. However, combining data from three or more accelerometers provided no further improvement. |
| Ellis, et al. [24] (40 overweight women) | | Accelerometer on hip and wrist | Random forest and hidden Markov model | In free living, their model outperformed traditional cut-points. The hip accelerometer (88%) provided higher accuracy than the wrist (83%). |
| Ellis, et al. [99] (42 adults) | | Accelerometers on wrist and hip, and heart rate | Random forest | Heart rate improved energy expenditure prediction but did not significantly improve physical activity recognition. The hip sensor gave better accuracy than the wrist. |
| Freedson, et al. [100] (277 subjects) | Activity type and energy expenditure (MET) prediction | Single hip-mounted accelerometer | Artificial neural network (ANN) | ANN improved energy expenditure prediction compared to other regression models. Household and locomotion activities had high classification rates (98% and 89%, respectively) and sports had a low classification rate (23.7%). |
| Fullerton, et al. [101] (10 subjects) | | Nine accelerometers on left and right ankle, left and right hip, left and right wrist, left and right upper arm, and spine | Decision tree, SVM, kNN, bagged trees | A fine kNN classification method that used mean and standard deviation features provided the best prediction accuracy in free living. |
| Gyllensten and Bonomi [102] (20 subjects) | Activity type recognition | Waist-worn accelerometer | Decision trees, feed-forward neural networks (NN), support vector machines (SVM), and decision tree | In daily life settings, the laboratory-trained model provided around 20% lower performance. Among the classifiers, SVM provided higher classification accuracy than the other algorithms. |
| Kate, et al. [56] (146 subjects) | | Hip-mounted accelerometer | SVM, ANN, logistic regression, decision trees, bagged decision trees, random forest, naïve Bayes, kNN, and linear regression | Physical activity recognition improved as the number of features increased. All learning algorithms gave competitive performance. Activity-specific energy expenditure models showed better results than a single model. |
| Lin, et al. [91] (26 subjects) | Energy expenditure estimation | Wrist-, waist- and ankle-worn accelerometers and chest-strapped heart rate monitor | Activity classification: decision tree. Energy expenditure estimation: Radial Basis Function Network, Generalised Regression Neural Network | The Generalised Regression Neural Network provided higher R2 for energy expenditure prediction than the Radial Basis Function Network. |
| Luštrek, et al. [78] | Energy expenditure estimation | Accelerometers on chest and wrist | Energy expenditure estimation: linear regression, multilayer perceptron artificial neural network, support vector regression (SVR), M5P model trees, M5Rules and REPTree regression trees | The composite classifier, consisting of two activity-specific classifiers and a general classifier, provided substantial improvement over the single-classifier approach. |
| Mackintosh, et al. [103] (27 children) | | Nine accelerometers on chest, left and right wrists, hips, knees, and ankles | Artificial neural networks | All single-, 2-, 3-, and 4-accelerometer combinations had similar performance, which was better than the combination of 9 accelerometers. |
| Montoye, et al. [48] (44 adults) | | Four accelerometers on hip, right thigh, and both wrists | Linear regression, linear mixed, and ANN models | ANN showed significant improvement over linear models for the wrist accelerometer. However, for hip and thigh locations, linear models provided similar performance to ANN. |
| | Activity type recognition | Accelerometers on hip, wrists, and thigh | Artificial neural networks | The wrist accelerometer provided the highest and the hip the lowest accuracy. |
| | | Wireless network of three accelerometers worn on the right wrist, thigh, and ankle, and a hip-mounted accelerometer | Artificial neural networks | The wireless network only marginally improved energy expenditure estimation compared with a hip accelerometer. |
| Pavey, et al. [104] (21 subjects) | | Accelerometer on non-dominant wrist | Random forest | In the lab, high physical activity classification accuracy was obtained; in free living, the method was less accurate for identifying stepping and non-stepping activities. |
| Tapia et al. [106] (21 subjects) | Physical activity and absolute intensity | Five tri-axial wireless wearable accelerometers and a chest-strapped heart-rate monitor | C4.5 decision tree | Subject-dependent training provided 94.6% accuracy and subject-independent training provided only 56.3% accuracy. Adding heart rate to the accelerometer data did not improve performance much. |
| Trost, et al. [30] (52 children) | | Accelerometer on wrist and hip | Regularised logistic regression | Hip and wrist provided comparable performance. |
| Trost, et al. [47] (100 youth participants) | | Accelerometers on the right hip | Artificial neural networks | ANN improved performance compared with conventional regression-based approaches. |


2.3.6 Effect of Sensor Number, Positioning, and Combination on the Performance

Accelerometer sensors are commonly placed on the participant’s chest, hip,

thigh, or wrist. The hip is closest to the centre of mass of a human body; as a result, an

accelerometer placed on the hip can capture most human motion. A range

of basic daily activities, including walking, postures and activity transitions can be

classified according to the accelerations measured from a hip-worn accelerometer

[121]. In some studies, the hip location provided better performance than other

locations (e.g., wrist) for physical activity type and energy expenditure prediction [24].

On the other hand, wrist placement is convenient for everyday use

and has become a popular placement for commercial activity trackers. In a recent

study, Montoye, et al. [120] compared the hip, thigh and wrist locations for physical

activity recognition. They reported the wrist to be the best location with the hip as the

worst. Trost, et al. [30] found comparable performance for both hip and wrist locations.

A number of studies have shown that the effective combination/fusion of outputs

from multiple accelerometers placed at different body locations can improve both

physical activity and energy expenditure prediction accuracy compared to the use of a

single accelerometer [28]. Altini, et al. [92] compared the sensor numbers and

locations for both activity recognition and energy expenditure estimation. There were

no statistically significant differences in energy expenditure estimation error between a single

accelerometer and a combination of five accelerometers. Bao and Intille [90]

developed a model for accelerometers placed on the upper-arm, lower-arm, hip, thigh,

and ankle. When the outputs from all sensors were fused using feature fusion, the

physical activity classification accuracy was 84%, which was 3% higher than the

accuracy achieved by the combination of thigh and wrist accelerometers. Cleland, et

al. [28] conducted a comprehensive study on accelerometer placements where they

combined accelerometer data from six body locations (lower back, wrist, foot, chest,

hip, and thigh) using simple feature fusion. Among the single locations, they found the

hip to be the best location for physical activity recognition. When they combined the

outputs from any two accelerometer locations, the performance improved significantly

compared to the single accelerometer. However, the combination of three or more

accelerometers did not improve performance.


The results of previous studies indicate that sensor location and number are

important methodological considerations in physical activity research. Although a

number of studies have been conducted on this topic, the findings are inconsistent. A

single accelerometer location does not perform equally for all activities, and the overall

performance depends on the target activity [26]. When combining sensor data from

multiple locations, either feature-level or decision-level fusion can be applied [31].

However, so far, most studies have adopted a simple feature-level fusion approach. An

advanced fusion algorithm using decision fusion has yet to be explored in the physical

activity domain.


2.4 SUMMARY OF CURRENT GAPS

The key gaps identified in this chapter are summarised below:

1. Although most wearable sensor studies have used a diverse range of learning

algorithms, their results are not consistent. An ensemble of several learning

algorithms may effectively leverage the advantages of each learning algorithm

and provide better overall classification accuracy. However, studies

implementing and comparing the performance of an advanced ensemble of

learning algorithms are lacking.

2. Combining data from multiple accelerometers, placed at different body

locations, can improve physical activity recognition. To date, most studies

are based on feature fusion. Advanced decision fusion methods may combine

sensors more effectively, which warrants investigation.

3. Relative physical activity intensity may manifest through a person’s

physiological responses such as heart-rate, RR interval, and electrodermal

activity. However, methods to combine multimodal sensor data and predict

relative intensity are lacking.

4. Advanced learning algorithms such as deep learning can automatically extract

hidden patterns from the data, which may provide improved energy expenditure

prediction performance. This needs to be investigated and compared with current

state-of-the-art methods.

In the following chapters, novel methods to address these problems are

presented. Each chapter includes a more comprehensive background related to the

specific problem.

PART I - Classification of Physical Activities


Chapter 3: Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry

Alok Kumar Chowdhury1

Dian Tjondronegoro2

Vinod Chandran1

Stewart G. Trost3

1. Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia.

2. School of Business and Tourism, Southern Cross University, Gold Coast, Australia.

3. Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research, School of Exercise and Nutrition Sciences, Queensland University

of Technology, Brisbane, Australia.

Corresponding Author:

Professor Stewart G. Trost, PhD

Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research

Level 6, 62 Graham Street

South Brisbane, QLD 4101

Australia

Phone: +61 7 3069 7301

Fax: + 61 7 3138 3980

Email: [email protected]






3.1 ABSTRACT

Purpose: To investigate whether the use of ensemble learning algorithms

improves physical activity recognition accuracy compared to single classifier

algorithms, and to compare the classification accuracy achieved by three conventional

ensemble machine learning methods (bagging, boosting, random forest) and a custom

ensemble model comprising four algorithms commonly used for activity recognition

(binary decision tree, k nearest neighbour, support vector machine, and neural

network). Methods: The study utilised three independent datasets that included wrist-

worn accelerometer data. For each dataset, a four-step classification framework

consisting of data pre-processing, feature extraction, normalisation and feature

selection, and classifier training and testing was implemented. For the custom

ensemble, decisions from the single classifiers were aggregated using three decision

fusion methods: weighted majority vote, naïve Bayes combination, and behaviour

knowledge space combination. Classifiers were evaluated using leave-one-subject-out

cross-validation and compared on the basis of average F1 scores. Results:

In all three datasets, ensemble learning methods consistently outperformed the

individual classifiers. Among the conventional ensemble methods, random forest

models provided consistently high activity recognition; however, the custom ensemble

model using weighted majority voting demonstrated the highest classification

accuracy in two of the three datasets. Conclusion: Combining multiple individual

classifiers using conventional or custom ensemble learning methods can improve

activity recognition accuracy from wrist-worn accelerometer data.

Keywords: Motion sensors, machine learning, pattern recognition, random

forest, bagging, boosted decision trees.


3.2 INTRODUCTION

Physical inactivity is recognised as a critical population health risk factor, and a

significant contributor to the direct and indirect health care costs associated with

management of a wide range of chronic health conditions [122]. In addition, there is a

growing body of evidence to suggest that sedentary behaviour, characterised by

prolonged bouts of sitting, is associated with serious health conditions, independent of

the effects of physical activity [123, 124]. Hence, valid and reliable measures of

physical activity and sedentary behaviour are a necessity in studies designed to: 1)

document the frequency and distribution of physical activity and sedentary behaviour

in defined population groups; 2) identify the psychosocial and environmental factors

that influence physical activity and sedentary behaviour; and 3) evaluate the efficacy

or effectiveness of programs and policies to increase habitual physical activity and

reduce sedentary behaviour [114].

Accelerometer-based motion sensors have become the method of choice for

measuring physical activity and sedentary time in free-living contexts, because they

are small, robust and low cost [81, 125]. However, differences in accelerometer data

processing methods have hindered research efforts to quantify, understand and

intervene on physical activity and sedentary behaviour. Existing approaches can be

categorised into two groups: 1) threshold-based and 2) machine learning approaches.

Threshold or “cut-point” methods use regression methods to map accelerometer

outputs to energy expenditure [81, 113, 125]. Machine learning approaches extract

features or patterns from the acceleration data and use supervised or unsupervised

learning algorithms to predict physical activity type and/or energy expenditure [25, 44,

120, 126]. Relative to cut-point methods, machine learning approaches can provide a

greater variety of physical activity metrics (e.g., activity type, walking speed) and more

accurate predictions of energy cost [24, 25, 30]. Nevertheless, the adoption of machine

learning methods in physical activity studies has been low because they are not as

easily implemented as threshold-based methods.

To date, a range of machine learning algorithms have been used in the physical

activity (PA) classification and measurement domain. They include k-nearest neighbour

(kNN), artificial neural networks (ANN), support vector machines (SVM), Markov

models, decision trees etc. Preece et al. [23] summarised the relative strengths,

weaknesses, and performance characteristics of 11 different machine learning


approaches. The authors concluded that it was impossible to declare one particular

machine learning technique as universally better than others for any given PA

recognition problem. Most recently, Kate and colleagues [56] compared the accuracy

of eight different machine learning techniques for activity recognition and energy cost

estimation from accelerometer data. Their results indicated that no single machine

learning technique works best in all testing situations.

Because there is no single, optimal machine learning algorithm for any given

classification or estimation problem, ensemble learning approaches are gaining

popularity [112]. An ensemble of classifiers is a set of base level classifiers (known as

weak learners) whose individual decisions are combined to improve overall decision

accuracy. In this respect, an ensemble of machine learning models can be

conceptualised as a committee of experts brought together to make a final decision. If

the weak learners are combined appropriately, the fusion of outputs is constructive,

leading to better overall decisions and generalisation.

Three commonly used ensemble learning schemes are bagging, boosting, and

random forests. Bagging stands for bootstrap aggregation. It involves taking multiple

random samples of training instances (with replacement) and applying a weak learning

algorithm (typically a decision tree) to each sample. The decisions of the resulting classifiers are

combined to make a final class prediction using the majority vote rule [127]. Boosting

also applies a voting procedure to combine the decisions of multiple weak learners.

However, boosting adopts an iterative approach in which each new model is influenced

by the performance of previously built models. The boosting algorithm begins by

assigning equal weights to all instances in the training data. It then builds a classifier

(typically a decision tree) and instances are reweighted based on the classifier’s

performance on the training data. The weights of correctly classified instances are

decreased, while the weights of misclassified instances are increased. This weighting

scheme allows subsequent classifiers to be more proficient at classifying instances

misclassified by earlier models. The final class prediction is based on the weighted

majority vote of each model, where the weights are determined by the accuracy of the

model [128]. Random Forests are another widely used ensemble learning method. The

random forests algorithm is similar to bagging in that multiple weak learners (decision

trees) are trained on randomly sampled instances from the training data. However,

unlike bagging, where all features in the training data are considered for splitting a


node, the random forest algorithm selects the best among a random sample of features.

The decisions generated by each tree are recorded and the final class prediction is

based on majority vote [129].
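The reweighting step of boosting described above can be illustrated with a minimal Python sketch. It assumes the AdaBoost.M1 update rule, which is only one possible variant and is not necessarily the one used later in this chapter; the function name `reweight` and the toy inputs are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost.M1-style update (an illustrative assumption).

    weights: current instance weights, summing to 1.
    correct: booleans, True where the current weak learner was right.
    Returns (new_weights, model_weight).
    """
    # Weighted error of the current weak learner on the training data.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        raise ValueError("weighted error must lie in (0, 0.5)")
    beta = error / (1.0 - error)
    # Shrink the weights of correctly classified instances, then renormalise,
    # which raises the relative weight of the misclassified ones.
    scaled = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(scaled)
    new_weights = [w / total for w in scaled]
    # Vote weight of this weak learner in the final weighted majority vote.
    model_weight = math.log(1.0 / beta)
    return new_weights, model_weight

weights = [0.25, 0.25, 0.25, 0.25]   # equal starting weights
correct = [True, True, True, False]  # one instance misclassified
new_weights, model_weight = reweight(weights, correct)
```

In this toy case the single misclassified instance ends up carrying half of the total weight, so the next weak learner is pushed to classify it correctly.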

While bagging and boosting combine base learners of the same type, it is

possible to construct custom ensembles featuring learning algorithms of different

types. The decisions of each classifier are subsequently combined using an established

decision fusion method such as weighted majority voting, naïve Bayes, or behaviour

knowledge space [31]. Unlike conventional ensembles, custom ensemble methods

achieve diversity by using heterogeneous classification algorithms as base classifiers,

which may lead to better generalised performance [130].

Although ensemble learning methods are starting to emerge in physical activity

research, no previous studies in the exercise and movement sciences have compared

the performance of different ensemble methods and decision fusion rules. Therefore,

the purpose of this study was to systematically compare the classification accuracy

achieved by conventional ensemble methods (bagged decision tree, boosted decision

tree, and random forest) and a custom multi-classifier ensemble combining four

machine learning algorithms (binary decision tree, kNN, SVM and neural network)

using three decision fusion rules (weighted majority voting, naïve Bayes, and

behaviour knowledge space). Performance was evaluated in three independent

physical activity recognition datasets.


3.3 METHODS

3.3.1 Datasets

This study used three independent accelerometer datasets collected from

different participant groups (adults and children), performing different activities in

different contexts (lab and outdoor). A brief description of each dataset is provided in

Table 3.1.

Dataset #1. The PAMAP2 dataset is a fully annotated, publicly available

physical activity monitoring dataset. The data was downloaded from the UCI machine

learning repository

https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.

Detailed information about the study can be found elsewhere [87, 131]. Nine

participants (1 female, 8 male, age: 27.2 ± 3.3 years, and BMI: 25.1 ± 2.6 kg/m2)

performed twelve different types of physical activities. Participants wore three Colibri

wireless IMUs (Inertial Measurement Units) on their dominant-arm wrist, dominant-

side ankle and chest. Each IMU contained two three-dimensional (3D) acceleration

sensors (scale: ±6 g and ±16 g) with a resolution of 13 bits, a 3D gyroscope sensor, a

3D magnetometer sensor, temperature, orientation and heart-rate monitor sensors. The

sampling rates of the accelerometer and heart-rate sensor were 100 Hz and approximately 9 Hz,

respectively. Only the 3D accelerometer (±16 g) data from the wrist IMU were used in

this study. The physical activities included in the dataset were: lying down, sitting,

standing, walking, running, cycling, ascending stairs, descending stairs, Nordic

walking, vacuum cleaning, ironing clothes and jumping rope. For the purposes of this

study, the first eight basic activity classes were selected for evaluation because these

activity classes were widely used in past studies.

Dataset #2. The second dataset comprised wrist accelerometer data collected on

eight individuals (mean age = 29.9 ± 4.2 y, 50% male, mean BMI = 22.8 ± 1.9 kg/m2)

during an outdoor physical activity session in a park. The data collection protocol

included the activities in the following order: stationary activity (sit or stand still) for

5 min, self-paced comfortable walk for 5 min, self-paced brisk walk for 5 min, jogging

for 5 min, and fast-run for 2 min. In between each activity, participants rested for 5 to

15 min. During each trial, motion and heart rate were recorded using an Empatica E4

monitor. The Empatica E4 (Empatica Inc., Boston, USA), a light-weight (25 grams)

wristwatch, was placed on the participant’s non-dominant wrist to record 3D acceleration



(±2 g), heart rate, electrodermal activity, and temperature. This study utilised only the

3D acceleration data. The sampling rate of the acceleration data was 32 Hz.

Dataset #3. The third accelerometer dataset was collected from 17 children (9

boys, 8 girls, age: 14.6 ± 2.4 years, BMI percentile: 66.8 ± 25.9) [30, 81]. A total of 12

activity trials were performed over two laboratory visits. On visit 1, participants

completed the following 6 trials: lying down, handwriting, laundry task, throw and

catch, comfortable over-ground walk, and aerobic dance. On the 2nd visit, the following

6 trials were completed: seated computer game, floor sweeping, brisk over-ground

walk, basketball, over-ground run/jog, and brisk treadmill walk. The duration of each

trial was 5 minutes. Based on the movement pattern, activities were categorised into

seven categories: lying down, sitting (handwriting, computer game), standing with

upper body movements (throw and catch, laundry task, floor sweeping), walking

(comfortable over-ground walk, brisk over-ground walk, brisk treadmill walk),

running, basketball, and dance. Further details of the activity trials can be found in

[81]. During the trials, participants wore an ActiGraph GT3X+ tri-axial accelerometer

(ActiGraph Corporation, Pensacola, FL) on the right hip and non-dominant wrist. The

sampling rate was set to 30 Hz. In this study, only wrist-worn acceleration data were

used.


Table 3.1 Comparison across three datasets

| | Dataset #1 | Dataset #2 | Dataset #3 |
| --- | --- | --- | --- |
| Data collection environment | PAMAP2, a public dataset collected in the lab | Private dataset collected outdoors | Private dataset collected in the lab |
| Participants | 9 adult participants (1 female, 8 male), Age: 27.2 ± 3.3 years, BMI: 25.1 ± 2.6 kg/m2 | 8 adult participants (4 female and 4 male), Age: 29.9 ± 4.2 years, BMI: 22.8 ± 1.9 kg/m2 | 17 children (9 boys, 8 girls), Age: 14.6 ± 2.4 years, BMI percentile: 66.8 ± 25.9 |
| Sensors and placements | Colibri wireless IMU containing 3D accelerometer, 3D gyroscope sensor, 3D magnetometer sensor, temperature, orientation and heart-rate monitor sensors. Placements: dominant-arm wrist, dominant-side ankle and chest | Empatica E4 containing 3D accelerometer, electrodermal activity, heart-rate and temperature sensors. Placements: non-dominant wrist | ActiGraph GT3X+ tri-axial accelerometer. Placements: right hip and non-dominant wrist |
| Accelerometer sensor specifics | Scale: ±6 g and ±16 g. Sampling rate: 100 Hz | Scale: ±2 g. Sampling rate: 32 Hz | Scale: ±6 g. Sampling rate: 30 Hz |
| Physical activities performed | Lying down, sitting, standing, walking, running, cycling, ascending stairs, descending stairs, Nordic walking, vacuum cleaning, ironing clothes and jumping rope | Sit or stand still, self-paced comfortable walk, self-paced brisk walk, jogging, and fast run | Lying down, sitting (handwriting, computer game), standing with upper body movements (throw and catch, laundry task, floor sweeping), walking (comfortable over-ground walk, brisk over-ground walk, brisk treadmill walk), running, basketball, and dance |

3.3.2 Classification Framework

The four steps of the classification framework, shown in Figure 3.1, were data

pre-processing, feature extraction, normalisation and feature selection, and activity

classification. In the classification step, both conventional and custom ensemble

methods were implemented. In the custom ensemble method, decisions from four

“state-of-the-art” single classifiers, including binary decision tree (BDT), k nearest


neighbour (kNN), support vector machine (SVM) and artificial neural network (ANN),

were aggregated together using three decision fusion techniques, namely weighted

majority voting (WMV), naïve Bayes combiner (NB), and behaviour knowledge space

combiner (BKS). Among the conventional ensemble methods, boosted decision trees,

bagged decision trees and random forest were investigated. All steps of the framework

were implemented using Matlab (The MathWorks Inc., USA). Example features and

code for using the proposed framework can be found in the following link:

https://github.com/alokchy04/Decision-Fused-Ensembles-for-PA-Classification-from-Wrist-Worn-Accelerometer.

Figure 3.1 Flow diagram of the proposed framework

Pre-processing. In the pre-processing step, the accelerometer data was annotated

with activity labels and converted to a time-series data structure. If the dataset contained

missing accelerometer data, linear interpolation was used to find the intermediate

missing values in the data. Missing values at the end of each labelled activity were

replaced by the previous value. In addition, 10 s of data at the beginning and end of each

labelled activity was discarded from analysis to remove non-steady-state data.
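The gap-filling and trimming steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the Matlab code used in the study: the function names `fill_missing` and `trim_edges`, the use of `None` to mark missing samples, and the back-filling of leading gaps are all assumptions.

```python
def fill_missing(samples):
    """Fill None entries in one axis of one labelled activity bout.

    Interior gaps are linearly interpolated between the known neighbours;
    gaps at the end are replaced by the previous known value.
    """
    filled = list(samples)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("segment contains no valid samples")
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Trailing gap: carry the last known value forward.
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Leading gap: not described in the text; back-filled here as an assumption.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    return filled

def trim_edges(samples, rate_hz, seconds=10):
    """Drop `seconds` of data from both ends of a labelled activity bout."""
    n = int(rate_hz * seconds)
    return samples[n:len(samples) - n]

example = fill_missing([1.0, None, None, 4.0, None])
trimmed = trim_edges(list(range(10)), rate_hz=1, seconds=2)
```

Here `example` illustrates both rules on a five-sample toy signal, and `trimmed` drops two seconds from each end of a ten-sample bout recorded at a nominal 1 Hz.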

Feature Extraction. A range of time- and frequency- domain features were

extracted from 10 s sliding windows with 50% overlap. A 10 s window was chosen

as this period is sufficient to capture multiple periodic movements for all activities




[30]. In total, 45 features were extracted from each accelerometer. Consistent with

previous studies, mean, standard deviation, minimum, maximum, variance, median,

skewness, 25th and 75th percentile, and kurtosis were extracted from each axis of a 3-

axis accelerometer [118, 132]. In addition to these simple time-domain features,

frequency domain features including spectral energy, dominant frequency, dominant

frequency magnitude, zero crossings, and cross-axis correlations were calculated.

Spectral energy was calculated by summing the squared discrete FFT component

magnitudes of the signal [90]. Spectral energy was normalised by dividing it by

the window length. The frequency with the highest FFT magnitude was taken as the

principal (dominant) frequency [51]. Zero-crossing for each accelerometer axis represented the

number of times the signal changed sign, and accelerometer axis cross-correlations

(corrxy, corrxz, corryz) [44, 90] were calculated and included in the feature list. The

detailed description of the features is provided in Appendix A.
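A few of the features described above can be sketched in plain Python. This is a hedged illustration, not the study's Matlab implementation: the function names are hypothetical, the spectral energy is computed with a naive discrete Fourier transform that includes the DC component (an assumption), and zero-valued samples are not counted as sign changes.

```python
import cmath
import math

def sliding_windows(signal, rate_hz, seconds=10, overlap=0.5):
    """Split one axis into fixed-length windows with fractional overlap."""
    size = int(rate_hz * seconds)
    step = max(1, int(size * (1.0 - overlap)))
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]

def zero_crossings(window):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(window, window[1:]) if a * b < 0)

def spectral_energy(window):
    """Sum of squared DFT magnitudes, divided by the window length."""
    n = len(window)
    total = 0.0
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window))
        total += abs(coeff) ** 2
    return total / n

def pearson(a, b):
    """Pearson correlation between two axes of the same window."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a constant axis is treated as uncorrelated
    return cov / math.sqrt(va * vb)

windows = sliding_windows(list(range(10)), rate_hz=1, seconds=4)
crossings = zero_crossings([1, -1, 1, -1])
energy = spectral_energy([1, -1, 1, -1])
corr_xy = pearson([1, 2, 3], [2, 4, 6])
```

The naive transform is quadratic in the window length, which is acceptable for 10 s windows at the sampling rates used here; a fast Fourier transform routine would normally replace it in practice.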

Normalisation and Feature Selection. Normalisation of the features before

classification is useful when the feature values vary in different dynamic ranges. In

this study, the training features were normalised to a zero mean and unit variance by

subtracting the corresponding mean and dividing by the standard deviation. Features

in the testing data were normalised in the same way, using the training data

means and standard deviations. Feature selection is another important step necessary

to improve time and space complexity of the classification algorithms. A correlation-

based feature selection method [133] was applied on the training data to select features

for classification. Features with a correlation coefficient ≥ 0.25 with the activity

classes were selected as inputs to the classifiers. A list of features selected for inclusion

in each training dataset is provided in Appendix B.
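The normalisation step can be illustrated with a short Python sketch in which the means and standard deviations are estimated on the training features only and then reused for the test features. The function names `fit_scaler` and `apply_scaler` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def fit_scaler(train_rows):
    """Per-feature mean and standard deviation from the training data only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]  # population SD (assumption)
    return means, stds

def apply_scaler(rows, means, stds):
    """Z-score each feature using previously fitted training statistics."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

train = [[1.0, 10.0], [3.0, 30.0]]   # two windows, two features (toy values)
test = [[2.0, 40.0]]
means, stds = fit_scaler(train)
train_z = apply_scaler(train, means, stds)
test_z = apply_scaler(test, means, stds)
```

Fitting the statistics on the training subjects alone keeps information from the held-out subject out of the model, which matters under leave-one-subject-out cross-validation.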

3.3.3 Conventional Ensemble Methods

The performance of three standard ensemble methods was evaluated: bagged

decision trees, random forests and boosted decision trees. The ‘TreeBagger’

class in Matlab was used to implement both the bagged decision trees and the random

forest. The number of decision trees in each ensemble was empirically set to

20, as this provided the best performance. While the bagged decision trees considered

all features when splitting a node, the random forest sampled a number of

features equal to the square root of the total number of features available.


For the boosted decision tree implementation, the AdaBoost.M2 multi-class classification

method with 100 learning cycles and ‘Discriminant’ weak learners was chosen.

3.3.4 Custom Ensemble Methods

Heterogeneity in the decisions of multiple single classification algorithms (base-

classifier) on the same dataset can be utilised to improve classification performance in

physical activity recognition problems [31, 112]. When each base classifier has good

individual performance and also sufficient diversity (due to having different

algorithms), fusion will significantly improve performance. This study employed four

well-known, widely-used supervised learning algorithms of different complexity

(binary decision tree, k nearest neighbour, support vector machine, and artificial neural

network) as base classifiers, which were fused together using three decision-fusion

techniques (weighted majority voting, naïve Bayes, and behaviour knowledge space).

Detailed information related to the implementation of the single classifier models can

be found in Appendix C.

3.3.5 Decision Fusion Techniques

In an N-classifier ensemble, let the set of classifiers and the set of classes be E = {E1, E2,

…, EN} and C = {c1, c2, …, cm}, respectively. Each classifier Ei produces a class label li

∈ C, i = 1, …, N, without any further information. When classifying an object x, the N

classifiers output a vector L = [l1, l2, …, lN]. Decision fusion techniques then combine

the classifiers’ outputs and provide a single class label. The current study evaluated the

performance of three different decision fusion techniques including weighted majority

voting, naïve Bayes combination, and behaviour knowledge space combination [31,

134, 135].

Weighted Majority Voting (WMV). The weighted majority vote is one of the most

widely used decision fusion combiners, often useful when all classifiers in the

ensemble do not have equal performance. This approach measures the individual

accuracy of each classifier on the training data and uses these as weights,

W = {w1, w2, …, wN}, to give the more competent classifiers more authority in making the final decision. Then, when predicting for an object x, it calculates a score for each class label using equation 1:

$$\mathrm{score}(k) = \sum_{i:\, l_i = c_k} w_i, \qquad k = 1, 2, \ldots, m \tag{1}$$

Finally, it selects the class label that has the maximum score:

$$\text{final label} = \arg\max_{k = 1, \ldots, m} \mathrm{score}(k) \tag{2}$$
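Equations 1 and 2 can be illustrated with a short sketch. This is a minimal Python illustration rather than the Matlab implementation used in the study; the function name, the activity labels, and the weight values are hypothetical and chosen only for the example.

```python
from collections import defaultdict

def weighted_majority_vote(labels, weights):
    """Fuse one predicted label per classifier into a single label.

    labels  -- label l_i predicted by each of the N classifiers
    weights -- weight w_i of each classifier (its training accuracy), same order
    """
    if len(labels) != len(weights):
        raise ValueError("one weight per classifier is required")
    score = defaultdict(float)
    for label, weight in zip(labels, weights):
        # Equation 1: add w_i to the score of the class that classifier i voted for.
        score[label] += weight
    # Equation 2: return the class with the maximum score.
    return max(score, key=score.get)

# Hypothetical example with four base classifiers and illustrative weights.
labels = ["walking", "running", "walking", "running"]
weights = [0.72, 0.79, 0.83, 0.81]
print(weighted_majority_vote(labels, weights))  # running (1.60 against 1.55)
```

With equal weights the rule reduces to plain majority voting; the weights only matter when the votes are split.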

Naïve Bayes (NB) Combination. This fusion method assumes the classifiers are

mutually independent. For each classifier Ei, an m×m confusion matrix CMi is calculated by applying the classifier to the training data set. Let T be the total number of objects in the training data, where the number of objects in each class is denoted by T1, T2, …, Tm. During testing of an object x, this method calculates the posterior probability for all predicted class labels using the following formula.

$$\mathrm{score}(k) = \frac{T_k}{T} \prod_{i=1}^{N} cm_i(k, l_i), \qquad k = 1, 2, \ldots, m \tag{3}$$

where cm_i(k, l) is the number of elements of the training dataset whose true class label was ck and which were assigned by Ei to class cl. Finally, the naïve Bayes combiner assigns the class label with the maximum score to object x using equation 2.

Behaviour Knowledge Space (BKS) Combination. Unlike most fusion

methods, behaviour knowledge space (BKS) [136] does not require an assumption of

independence of the decisions of individual classifiers. The accuracy of the BKS

combiner can be very high when the dataset is large, but on small datasets BKS often over-trains. It creates a knowledge space using a look-up table based on the classification of the training data. The look-up table records how often each labelling combination is produced by the classifiers, together with the true labels of the objects that produced it. When testing an object x (a window of test data), it looks up the combination of predicted class labels in the table and selects the most frequent true label corresponding to that combination as the final result. The challenge with this fusion technique arises when the test data evoke label combinations that do not appear in the look-up table. To address this problem, weighted majority voting was used for combinations of labels not in the look-up table.
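The look-up table and the fallback to weighted majority voting can be sketched as below. This is an illustrative Python sketch, not the study's implementation; the function names, labels, and weights are hypothetical.

```python
from collections import Counter, defaultdict

def build_bks_table(train_predictions, train_labels):
    """Map each label combination seen in training to a count of its true labels."""
    table = defaultdict(Counter)
    for combo, true_label in zip(train_predictions, train_labels):
        table[tuple(combo)][true_label] += 1
    return dict(table)

def bks_predict(table, combo, weights):
    """Return the most frequent true label for `combo`, or fall back to WMV."""
    cell = table.get(tuple(combo))
    if cell:
        return cell.most_common(1)[0][0]
    # Combination never seen in training: weighted majority voting.
    score = defaultdict(float)
    for label, weight in zip(combo, weights):
        score[label] += weight
    return max(score, key=score.get)

# Hypothetical training output of three classifiers on four windows.
train_predictions = [
    ("sit", "sit", "stand"),
    ("sit", "sit", "stand"),
    ("sit", "sit", "stand"),
    ("walk", "walk", "walk"),
]
train_labels = ["sit", "stand", "stand", "walk"]
table = build_bks_table(train_predictions, train_labels)

weights = [0.9, 0.4, 0.4]  # illustrative classifier weights for the fallback
print(bks_predict(table, ("sit", "sit", "stand"), weights))  # stand
print(bks_predict(table, ("walk", "sit", "sit"), weights))   # walk (fallback)
```

In the first query the table overrides the two "sit" votes because that combination most often corresponded to "stand" in training, which is the behaviour that distinguishes BKS from voting.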

3.3.6 Performance Evaluation

The performance was evaluated using leave-one-subject-out (LOSO) cross-

validation [137]. In LOSO, data from one user are used for testing; the other users’

samples are used for training. In this way, samples of each subject are used exactly

once for testing. This study used F1-score [138] to measure the performance of the

ensemble learning methods. The study favoured F1-score over classification accuracy


because unlike accuracy or percentage of agreement, it is not influenced by class

distribution. The F1-score is the harmonic mean of precision and recall, and therefore keeps a balance between them.

$$F_1\text{-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \times 100\% \tag{4}$$

where precision describes the exactness of a classifier: a low precision indicates that a large share of the positive predictions are false positives. Recall, or sensitivity, measures the completeness of a classifier: a low recall indicates that a large share of the true instances are missed (false negatives).
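The evaluation procedure can be sketched in a few lines. The following is a minimal Python illustration of leave-one-subject-out splitting and the per-class F1-score of equation 4; the function names, subject identifiers, and labels are hypothetical.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test

def f1_score_percent(y_true, y_pred, positive):
    """F1-score (in %) for one activity class, following equation 4."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall) * 100

# Hypothetical windows from three subjects: each subject is held out exactly once.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
for subject, train, test in loso_splits(subject_ids):
    print(subject, train, test)

y_true = ["walk", "walk", "walk", "run", "run"]
y_pred = ["walk", "walk", "run", "run", "walk"]
print(round(f1_score_percent(y_true, y_pred, "walk"), 2))  # 66.67
```

In a full experiment the classifier would be retrained on the training indices of every fold, and the per-class F1-scores would be computed on the held-out subject's windows.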


3.4 RESULTS

3.4.1 Dataset #1 Results

Table 3.2 reports F1-Scores of the four single classifiers, the custom ensemble,

and the three conventional ensemble methods for all activities in dataset #1. In this

dataset, the custom ensembles using WMV and NB fusion were effective and

outperformed all of the single classifiers, but the conventional ensemble methods

failed to exceed all of the single classifiers. Among the four single classifiers, the

performance of SVM was best.

Table 3.2 Classification results (F1-Score) using wrist acceleration sensor of dataset #1

Column groups: Conventional Ensembles (Random Forest, Bagging Decision Tree, Boosted Decision Tree); Individual Classifiers (BDT, KNN, SVM, ANN); Custom Ensembles (WMV Fusion, NB Fusion, BKS Fusion).

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Lying | 80.18 | 72.76 | 89.31 | 73.28 | 87.36 | 92.78 | 91.22 | 92.69 | 91.55 | 86.4 |
| Sitting | 76.92 | 74.34 | 79.57 | 70.94 | 78.97 | 85.71 | 82.39 | 85.5 | 85.8 | 79.4 |
| Standing | 87.65 | 82.08 | 81.6 | 76.02 | 84.91 | 86.04 | 85.16 | 88.89 | 88.7 | 87.65 |
| Walking | 84.5 | 86.55 | 88.96 | 70.07 | 76.99 | 87.45 | 84.34 | 87.4 | 84.38 | 82.02 |
| Running | 100 | 99.12 | 99.71 | 85.25 | 99.71 | 96.02 | 94.15 | 99.12 | 99.12 | 99.12 |
| Cycling | 95.68 | 92.93 | 93.67 | 92.89 | 95.62 | 95.96 | 86.4 | 96.61 | 96.8 | 96.12 |
| Ascending Stairs | 59.89 | 58.15 | 53.61 | 44.44 | 39.26 | 48.87 | 58.51 | 58.46 | 57.39 | 50.87 |
| Descending Stairs | 66.67 | 75.7 | 56.2 | 64.26 | 68.46 | 72.88 | 69.6 | 76.42 | 79.84 | 74.49 |
| Average | 81.44 | 80.2 | 80.33 | 72.14 | 78.91 | 83.22 | 81.47 | 85.64 | 85.45 | 82.01 |
| Std Dev. | 13.63 | 12.87 | 16.93 | 14.41 | 18.91 | 15.75 | 11.78 | 13.03 | 13.00 | 15.00 |



3.4.2 Dataset #2 Results

Table 3.3 reports F1-Scores of the four single classifiers, the custom ensemble, and the three conventional ensemble methods in dataset #2. In this dataset, the random forest model was the best ensemble classifier overall, with the custom ensemble with WMV fusion also providing better recognition accuracy than the four single classifiers.


The random forest model provided better recognition accuracy for all physical

activities, with the exception of stationary activities, for which the bagged decision

tree was best. The custom ensemble with WMV fusion performed well for sitting and

standing, comfortable walking, and fast walking, but failed to outperform BDT for

jogging and running. Of the four single classifiers, BDT was the best performer.



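Table 3.3 Classification results (F1-Score) of dataset #2

Column groups: Conventional Ensembles (Random Forest, Bagging Decision Tree, Boosted Decision Tree); Individual Classifiers (BDT, KNN, SVM, ANN); Custom Ensembles (WMV Fusion, NB Fusion, BKS Fusion).

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Stationary (sit and stand) | 94.52 | 95.83 | 93.41 | 91.35 | 90.5 | 92.24 | 89.55 | 94.07 | 92.58 | 92.37 |
| Comfortable Walking | 70.73 | 67.17 | 62.78 | 65.17 | 57.08 | 64.52 | 58.91 | 68.63 | 66.15 | 63.02 |
| Fast Walking | 75.64 | 67.94 | 70.75 | 67.75 | 64.7 | 74.36 | 54.2 | 74.56 | 72.96 | 65.77 |
| Jogging | 83.52 | 82.09 | 74.03 | 79.21 | 72.98 | 69.94 | 66.92 | 75.94 | 75.1 | 76.4 |
| Running | 73.82 | 72.9 | 66.07 | 73.56 | 67.47 | 66.67 | 55.83 | 71.22 | 71.14 | 70.69 |
| Average | 79.65 | 77.18 | 73.41 | 75.41 | 70.54 | 73.54 | 65.08 | 76.88 | 75.58 | 73.65 |
| Std Dev. | 9.56 | 12.00 | 11.98 | 10.43 | 12.54 | 11.09 | 14.53 | 10.02 | 10.06 | 11.64 |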



3.4.3 Dataset #3 Results

Table 3.4 reports F1-Scores of the four single classifiers, the custom ensemble, and the three conventional ensemble methods in dataset #3. In this dataset, the custom ensemble with WMV fusion provided the highest performance, while the random forest and the custom ensemble with NB fusion also performed better than the single

classifiers. Compared to the other classifiers, the random forest model exhibited the

best recognition accuracy for lying down, sitting, and walking activities. Of the four

single classifiers, SVM exhibited the highest classification accuracy.




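Table 3.4 Classification results (F1-Score) of dataset #3

Column groups: Conventional Ensembles (Random Forest, Bagging Decision Tree, Boosted Decision Tree); Individual Classifiers (BDT, KNN, SVM, ANN); Custom Ensembles (WMV Fusion, NB Fusion, BKS Fusion).

| Activity | Random Forest | Bagging Decision Tree | Boosted Decision Tree | BDT | KNN | SVM | ANN | WMV Fusion | NB Fusion | BKS Fusion |
|---|---|---|---|---|---|---|---|---|---|---|
| Lying down | 79.46 | 78.18 | 74.45 | 68.17 | 66.94 | 76.66 | 70.97 | 78.26 | 75.98 | 72.13 |
| Sitting+ | 92.13 | 90.04 | 89.97 | 85.52 | 87.09 | 91.19 | 89.87 | 90.63 | 90.39 | 86.74 |
| Standing+ | 86.69 | 85.12 | 82.68 | 79.34 | 84.3 | 86.19 | 85 | 87.67 | 87.83 | 85.32 |
| Walking | 95.39 | 93.14 | 93.41 | 91.95 | 94.9 | 94.88 | 93.14 | 95.34 | 95.31 | 93.5 |
| Running | 71.35 | 64.12 | 71.48 | 57.45 | 66.54 | 67.16 | 64.41 | 73.18 | 69.93 | 64.41 |
| Basketball | 85.63 | 84.31 | 89.38 | 76.52 | 87.22 | 89 | 89.54 | 91.16 | 91.14 | 86.72 |
| Dance | 84.47 | 79.59 | 84.58 | 74.45 | 78.88 | 80.9 | 82.05 | 85.71 | 81.74 | 81.2 |
| Average | 85.02 | 82.07 | 83.71 | 76.2 | 80.84 | 83.71 | 82.14 | 85.99 | 84.62 | 81.43 |
| Std Dev | 7.95 | 9.52 | 8.19 | 11.28 | 10.73 | 9.54 | 10.67 | 7.77 | 9.12 | 9.94 |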

Classification accuracies and confusion matrices for the three datasets can be found in Appendix D.

3.4.4 Statistical Comparison

The comparative performance of the different ensemble models and the single classifiers across folds/subjects was tested for statistical significance using one-way repeated measures ANOVA. To increase statistical power and enhance the generalisability of the findings, F1-scores for each held-out subject/fold from all three datasets were pooled.

Overall, mean F1-scores differed significantly between the ensemble and single

classifier models (Wilks’ Lambda = 0.270, F (9, 24) = 7.204, p < .0001). LSD post hoc

comparisons revealed that the custom ensemble with WMV or NB provided

statistically significant improvements in performance relative to the single classifiers.

The custom ensemble with WMV significantly outperformed the conventional

ensemble models with the exception of the random forest classifier. NB fusion

significantly outperformed Adaboost, but not random forest or bagged decision trees.

The custom ensemble with BKS fusion offered no significant improvements in


performance relative to the conventional ensemble models and, with the exception of

BDT, failed to outperform the single classifiers. Among the conventional ensembles,

the random forest ensemble significantly outperformed the custom ensemble with

BKS and the single classifiers, with the exception of SVM. Bagged decision tree and

Adaboost significantly outperformed BDT, but not SVM, KNN, or ANN.


3.5 DISCUSSION

This study systematically examined the classification performance achieved by several conventional ensembles and a custom ensemble method in three datasets featuring wrist-worn accelerometer data. Across the three datasets, random forest

ensembles and the custom ensemble with weighted majority vote provided

consistently higher classification accuracy than bagged and boosted decision trees, and

with the exception of SVM in dataset #1, significantly outperformed the four single

classifiers. Of the three decision fusion techniques examined, weighted majority vote provided marginally better performance than naïve Bayes fusion. However, both weighted majority vote and naïve Bayes fusion significantly outperformed behaviour knowledge space fusion.

Our results are consistent with previous studies demonstrating that combining

multiple classifiers with different inductive biases provides better PA recognition than

conventional ensemble methods and single model classifiers. Ruch et al. [139] used

majority voting (MV) to combine the decisions of k-nearest neighbour (kNN), normal

density discriminant function (NDDf), and custom decision tree (CDT) classifiers. In

free living conditions, MV provided a maximum 67% classification accuracy when

employing both hip and wrist accelerometers. Most recently, Catal et al. (5) reported

that the combination of three physical activity classifiers (logistic regression, decision

tree, and multi-layer perceptron) using voting provided better performance than a

single model approach.

The poorest performing custom ensemble in our experiment was BKS. The

limitations of BKS are well-documented [140]. It frequently suffers from

generalisation error if the training dataset is not sufficiently large and/or representative.

For example, when the number of classes (m) and classifiers (N) is large, the number of combinations of classifier outputs for all classes in the look-up table becomes very large (m^N × m). In this case, if the training dataset is not representative and sufficiently large to estimate all or most of the combinations of classifier outputs, BKS fusion can provide poor performance. BKS fusion also does not perform well when the combinations of classifier outputs are ambiguous, i.e., when multiple occurrences of the same combination of classifier outputs correspond to different true labels in the look-up table, so that the cell has low confidence/probability for its most representative true class.

Considering the number of classes (8 in dataset #1, 5 in dataset #2, and 7 in dataset #3)


and four classifiers used in this study, the datasets were small for a reliable BKS fusion.

Also, in our experiment BKS fusion occasionally misclassified test instances due to

the ambiguous cells in the look-up table. To avoid the problems related to ambiguous

cells, some existing papers propose to use a local classifier in the original feature space

associated with ambiguous cells [140]; this approach was not investigated in this study.
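The scale of the problem can be made concrete with a short calculation. The sketch below reads the look-up table size as m^N label combinations, each with m possible true labels, and applies it to the class counts of the three datasets with the four base classifiers; the function name is hypothetical.

```python
def bks_table_size(num_classes, num_classifiers):
    """Return (label combinations, combination-by-true-label entries)."""
    combinations = num_classes ** num_classifiers  # m^N
    return combinations, combinations * num_classes  # m^N x m

# Class counts as stated for datasets #1 to #3, with N = 4 base classifiers.
for name, m in [("dataset #1", 8), ("dataset #2", 5), ("dataset #3", 7)]:
    combinations, entries = bks_table_size(m, 4)
    print(f"{name}: {combinations} combinations, {entries} entries")
# dataset #1: 4096 combinations, 32768 entries
# dataset #2: 625 combinations, 3125 entries
# dataset #3: 2401 combinations, 16807 entries
```

Most of these combinations never occur in practice because the base classifiers tend to agree, but the counts show how quickly the table can outgrow a modest training set.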

Among the conventional ensembles, the random forest algorithm provided

strong classification performance, with F1 Scores ranging from 79.6% to 85% across

three datasets. This finding is consistent with results of recent studies developing and

testing random forest classifiers for use in the exercise and movement sciences. Ellis

and colleagues [99] developed a random forest classifier for recognition of four broad

classes of physical activities (household duties, stair climbing, walking, running) in

healthy adults. Separate classifiers were trained using frequency and time domain

features in accelerometer data collected on the hip and wrist. Using leave-one-subject-

out cross-validation, the average overall accuracy for the hip and wrist classifier was

92.7% and 87.5%, respectively. In a follow-up investigation [24], a 2-step activity

recognition model comprising a random forest classifier and a hidden Markov model

provided a balanced accuracy of 88.1% and 83.6% for the hip and wrist, respectively.

Most recently, Pavey et al. [104] developed a random forest activity classifier for the recognition of four activity classes from accelerometer data collected on the wrist.

Recognition accuracy for sedentary, stationary plus, walking, and running was 80.1%,

95.7%, 91.7%, and 93.7%, respectively. When evaluated on 24-hour free-living data,

recognition of stepping events (walking and running) exceeded 90%.

However, it is important to note that random forest classifiers do not perform

well in all testing scenarios. Sasaki et al. [141] used time and frequency domain

features in accelerometer signals collected on the dominant hip, wrist, and ankle to train

random forest physical activity classifiers for older adults (65-85 years). In the leave-

one-subject-out cross-validation of the laboratory-based activity trials, recognition

accuracy for five activity classes (sedentary, standing, household chores, locomotion,

recreational activities) was 87%, 84%, and 89% for the hip, wrist, and ankle models,

respectively. However, when the models were deployed in free-living conditions, the

overall classification accuracy declined significantly to just over 50%.

Although the focus of this study was ensemble learning methods, the strong

performance (F1-score) of the base classifiers is worth noting. Of the four single


classifiers examined, SVM provided the highest averaged F1-score for dataset #1

(83%) and dataset #3 (84%), which is consistent with the results of previous

investigations comparing the performance of different supervised learning algorithms

[28]. BDT, on the other hand, performed best (75%) for dataset #2, but exhibited the worst performance of all the base classifiers in datasets #1 and #3. The superior performance of BDT in dataset #2 may be explained, at least in part, by the relatively homogeneous

nature of the activities represented in the training data (rest versus walking and running

at different speeds). It may be that ensemble methods are more suitable for more

complex activity recognition problems requiring the detection of more fine-grained

activities. Future research should explore this hypothesis.

Although the ensemble methods consistently achieved better performance

accuracy, the magnitude of improvement over the single model classifiers was

relatively small. This is because the single classifiers were trained with sufficient data

and exhibited relatively high recognition accuracy in their own right. Nevertheless,

when investigating performance on a class-wise basis, notable performance

differences were observed for several activity classes. In dataset #1, the custom

ensemble with WMV fusion improved the recognition of stair climbing to 61%

compared to the best single classifier (ANN 57%). Similarly, for dataset #3,

recognition accuracy for running increased to 73%, where best single classifier (SVM)

provided only 68% accuracy. While the increment in performance afforded by

ensemble methods varied by dataset and activity class, the results confirm the general

principle that ensemble methods work best when the decisions from individual

classifiers are complementary.

A strength of the current study was the use of three diverse physical activity

datasets, collected on different participant groups (adults and children) performing

different physical activities in different contexts (laboratory-based vs. outdoors). The

examination of three different decision fusion methods to combine four widely used

“state-of-the-art” classification algorithms was an additional strength. There were,

however, some limitations that warrant consideration. First, although the study was

conducted using training data collected under different conditions, all three datasets

comprised activities that were completed in predetermined sequences. Thus, additional

work is required to evaluate the relative performance of ensemble methods in true free-

living contexts. Second, the activities in the selected datasets were primarily


ambulatory in nature. Only dataset #3 included non-ambulatory lifestyle activities such

as basketball and dance. Future studies should evaluate ensemble methods on a more diverse set of physical activities. Third, a simple correlation-based

feature selection method was used. The use of a more sophisticated feature selection

algorithm would likely have improved performance. Fourth and finally, our

experiments focused on activity recognition or classification. It should be noted that

ensemble methods can also be used for numerical prediction problems such as

estimating energy expenditure or physical activity intensity.

In summary, the results demonstrate that activity recognition accuracy can be

improved through the implementation of ensemble learning methods. Conventional

ensemble methods such as bagging, boosting, and random forests improve activity

recognition in most, but not all, situations. However, a custom ensemble using weighted

majority voting to fuse the decisions of four widely used “state-of-the-art”

classification algorithms consistently outperformed the constituent base classifiers and

most conventional ensemble models. Decision fused ensemble methods thus have

strong potential to improve physical activity recognition from wearable sensors.


3.6 ACKNOWLEDGEMENTS

No funding was received for completion of this project. Trost is a member of the

ActiGraph Scientific Advisory Board. Chowdhury, Tjondronegoro, and Chandran

declare no conflict of interest. The results from the present study do not constitute

endorsement by the American College of Sports Medicine.


Chapter 4: Physical Activity Recognition using Posterior-adapted Class-based Fusion of Multi-Accelerometers Data


Dian Tjondronegoro2

Vinod Chandran1

Stewart G. Trost3






Professor Dian Tjondronegoro

School of Business and Tourism,

Southern Cross University,

Gold Coast, Australia

Phone: +61 415 558 420





4.1 ABSTRACT

This paper proposes the use of posterior-adapted class-based weighted decision fusion to effectively combine data from multiple accelerometers for improving physical activity recognition. The performance of this method is benchmarked against model-based weighted fusion and class-based weighted fusion without posterior adaptation on two publicly available datasets, namely PAMAP2 and MHEALTH. Experimental results show that: (a) posterior-adapted class-based weighted fusion outperformed model-based and class-based weighted fusion; (b) decision fusion with two accelerometers showed statistically significant improvement in average performance compared to the use of a single accelerometer; (c) generally, decision fusion from three accelerometers did not show further improvement over the best combination of two accelerometers; and (d) a combination of ankle- and wrist-located accelerometers showed the best overall performance compared to any other combination of two or three accelerometers.

Index Terms—Activity Recognition, Accelerometer, Decision Fusion, Class-

Based Weighted Fusion


4.2 INTRODUCTION

Physical inactivity is a critical health risk factor [122], which creates a need for real-time physical activity (PA) recognition and quantification of the frequency and intensity of each PA instance using accelerometer-based motion sensors [26, 131]. A range of approaches, including rule-based (such as threshold/hierarchical), supervised, and unsupervised classification algorithms, have been proposed for PA recognition [23, 53, 56, 111, 112]. The choice between a machine learning and a rule-based approach is often determined by the availability of a suitable training set. When data are scarce, rule-based systems built on domain knowledge are usually used. Most recent papers in the PA domain have suggested the use of supervised machine learning algorithms [24, 120], as there are usually enough labelled data to train a reliable machine-learning model. However, previous studies generally used their own datasets, with no validation of results across different datasets, dataset sizes, and activity type selections. Consequently, the performance of existing algorithms has been found to be inconsistent and dependent on the sample used to generate the training data and the activity targets under investigation [56].

Multiple accelerometers placed at different body locations have been found to be effective in improving the accuracy of PA recognition, although the performance depends on activity type [26, 27, 30]. Acceleration data from multiple locations can be combined using a feature-level or a decision-level fusion approach [31]. Decision-level fusion has been found to be more accurate than feature fusion in other domains [32]; however, it has not been systematically investigated for PA recognition.

This paper’s key contribution is to propose the use of posterior-adapted class-based weighted decision fusion. The approach is novel because class-based decision fusion has not been used for PA recognition, although it has been found to perform better than model-based decision fusion [142]. Moreover, using the posterior probability of the test data can further improve performance, and this has also not been utilised in the PA domain. In model-based fusion, a model is developed for each accelerometer location, and the fusion then assigns a weight to each model according to its overall performance on the training data. Such an approach is theoretically less robust than class-based fusion, which focuses on the class-wise (i.e., activity-wise) performance of the models. Posterior adaptation means that the class-based weights are dynamically adjusted using the confidence scores from each classification model based on real observations (i.e., test data).

Aside from finding the most effective fusion technique, another challenge is to determine the best combination of sensor placements for optimal PA recognition. Therefore, our experiments investigated how decision-level fusion can optimally combine multiple classification models, where each model is trained using the accelerometer data obtained from the ankle, chest, and wrist, respectively. The robustness of the proposed method was tested on two publicly available datasets and benchmarked against model-based and class-based weighted decision fusion techniques. To sum up, the novelty of this paper lies in proposing posterior-adapted class-based weighted decision fusion to effectively combine data from multiple accelerometers for improving physical activity recognition.


4.3 RELATED WORK

PA recognition accuracy has been found to be dependent on the accelerometer

locations and types of PAs. For example, Atallah et al. [143] used a k-nearest neighbour (KNN) classifier and a Bayesian classifier, and found that the wrist location was good

for recognising very low-intensity-level and medium-intensity-level activities. For

low-intensity-level and transition activities, the waist location was the best. However,

the authors did not combine data from the accelerometers to find the optimal

combinations.

Some studies compared the performance of classifiers trained on data from the

combination of different accelerometer locations. Bao et al. [90] used feature fusion

on the accelerometer data collected from the upper-arm, lower-arm, hip, thigh, and

ankle and then applied several learning algorithms. They found a decision tree to be

the best performer (84%) when all sensors were fused, while the combination of thigh

and wrist accelerometer provided 3% less accuracy. However, the authors did not

investigate all possible accelerometer location combinations or the effect of

accelerometer location on recognising different PA types. A comprehensive study by

Cleland et al. [28] used feature fusion and compared the performance of support vector

machine classifiers trained on accelerometer data from six body locations (lower back,

wrist, foot, chest, hip, and thigh) and their combinations. Compared to a single

accelerometer, combining data from any two locations resulted in a significant

improvement in performance. However, combining data from three or more

accelerometers provided no further improvements in performance. Kern, et al. [144],

[145] and [146] also reported significant improvements in recognition performance

when combining two or more accelerometer locations. Notably, all of the aforementioned studies used feature fusion, which is more prone to noisy and redundant data than the decision fusion approach [31]. Moreover, these studies did not use multiple public datasets or consider the varying performance of single accelerometers for different activities when combining different accelerometer positions.

There are existing studies in pattern recognition that investigated the best

classifier combination for decision fusion, such as using a diversity measure analysis

[147]. In activity recognition, the commonly used decision fusion rules include

majority voting, summation, hierarchical fusion and Bayesian fusion [148]. Banos, et

al. [149] proposed hierarchical-weighted decision fusion by combining the advantages


of the hierarchical decision and majority voting models, utilising class-level and sensor-level classifiers for making decisions. While several weighted fusion techniques (classifier-, class-, and sample-based) were compared empirically in [142], class-based fusion seems to be more suitable for accelerometer fusion in PA recognition because class-wise performance varies with accelerometer placement. However, class-based fusion is yet to be fully investigated in the PA domain. Moreover, the approach for calculating the weights in class-based fusion needs improvement, as it uses training errors to estimate testing reliability. Adaptation of class-based weights using the posterior probability of the test instances should further improve the fusion technique. Zhang and Zhang [150] showed that adjusting the probabilities derived from the training output confusion matrix using the decision reliability can improve decision-making accuracy. However, they did not use class-based weights and did not apply their fusion algorithm to PA recognition.


4.4 METHODS

The framework comprises pre-processing, feature extraction, normalisation,

feature selection, and classification. These steps were simultaneously applied to data

from each accelerometer location (e.g., ankle, chest, and wrist), resulting in activity

candidates. The final decision (i.e., which activity is the most likely) was achieved by

applying a posterior-adapted class-based weighted decision fusion. Each step will be

described in this section.

4.4.1 Pre-processing

The data from each 3-axis (x, y, z) accelerometer were converted to a time-series data structure. Linear interpolation was used to impute missing data in the middle of a labelled activity sequence. Missing values at the end of each labelled activity sequence were replaced by the previous value.
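The two imputation rules can be sketched for a single axis as follows. This is a minimal Python illustration that assumes uniformly sampled data, so that interpolating over the sample index is equivalent to interpolating over time; the function name is hypothetical, and leading gaps are left untouched because the text does not describe how they were handled.

```python
def impute_axis(samples):
    """Fill missing values (None) in one axis of one labelled activity sequence.

    Interior gaps are filled by linear interpolation between the neighbouring
    known samples; trailing gaps are filled with the last known value.
    """
    filled = list(samples)
    known = [i for i, value in enumerate(filled) if value is not None]
    if not known:
        return filled
    # Interior gaps: interpolate between consecutive known samples.
    for left, right in zip(known, known[1:]):
        span = right - left
        for j in range(left + 1, right):
            fraction = (j - left) / span
            filled[j] = filled[left] + fraction * (filled[right] - filled[left])
    # Trailing gap: repeat the previous (last known) value.
    for j in range(known[-1] + 1, len(filled)):
        filled[j] = filled[known[-1]]
    return filled

print(impute_axis([1.0, None, None, 4.0, None]))
```

Applying the function separately to the x, y, and z axes of each labelled sequence keeps interpolation from bridging two different activities.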

4.4.2 Feature Extraction

For each 3-axis accelerometer, a set of 45 features (in the time and frequency domains) was extracted from non-overlapping 2-second sliding windows. Short windows (1–2 seconds) were used, as they have been shown to provide the best trade-off between accuracy and speed in PA recognition [89]. Specifically, a 2-second window was chosen empirically, as it was capable of capturing the periodic movements of the selected PA classes.

Table 4.1 lists the extracted features from each window. These features were

combined from the features extracted in previous PA recognition studies [44, 51, 90,

118, 132].
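The windowing step and a handful of the time-domain features can be sketched as below. This is an illustrative Python sketch for one axis only; the function names and the sampling rate used in the example are assumptions, the population standard deviation is an arbitrary choice, and the frequency-domain features are omitted.

```python
import statistics

def sliding_windows(signal, sampling_rate_hz, window_seconds=2.0):
    """Yield non-overlapping windows; an incomplete final window is dropped."""
    size = int(round(sampling_rate_hz * window_seconds))
    for start in range(0, len(signal) - size + 1, size):
        yield signal[start:start + size]

def time_domain_features(window):
    """Compute a few of the per-axis time-domain features for one window."""
    return {
        "mean": statistics.mean(window),
        "std": statistics.pstdev(window),  # population form, an assumption
        "min": min(window),
        "max": max(window),
        "median": statistics.median(window),
    }

# Hypothetical axis sampled at 2 Hz, so each 2-second window holds 4 samples.
axis = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for window in sliding_windows(axis, sampling_rate_hz=2):
    print(window, time_domain_features(window))
```

Repeating the feature computation for each of the three axes, and adding the remaining statistics, gives the per-window feature vector that is passed on to normalisation and feature selection.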

4.4.3 Normalisation & Feature Selection

Normalisation is required to bring feature values onto a common scale; in this case, each feature was linearly rescaled to zero mean and unit variance. For example, a feature x can be normalised using the following formula:

$$\hat{x}_i = \frac{x_i - \bar{x}}{\sigma} \tag{1}$$

where $\bar{x}$ and $\sigma$ are the mean and standard deviation of the feature, respectively.
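The normalisation can be sketched in Python as follows. The sketch assumes that the mean and standard deviation are estimated on the training data and then reused for the test data, which the text does not state explicitly; the function names and values are hypothetical.

```python
import statistics

def fit_zscore(train_values):
    """Estimate the mean and (population) standard deviation of one feature."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def apply_zscore(values, mean, std):
    """Rescale values to zero mean and unit variance using the given statistics."""
    if std == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(value - mean) / std for value in values]

# Hypothetical feature values from training and test windows.
train = [2.0, 4.0, 6.0]
test = [5.0]
mean, std = fit_zscore(train)
print(apply_zscore(train, mean, std))
print(apply_zscore(test, mean, std))
```

Reusing the training statistics keeps information from the held-out subject out of the model, which matters under leave-one-subject-out evaluation.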

The use of unnecessary features may lead to over-fitting, poor performance, and increased computational load [151]. Therefore, instead of adopting all 45 features for classification, a correlation-based feature selection method was adopted to select the most useful features. This feature selection method is fast and simple, and was found to be useful in previous studies [133]. In this study, the training data were used to compute the correlations between each labelled activity and each feature. Features with a correlation of 0.25 or greater (the threshold suggested in [152]) were selected for training and testing the classifiers.
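One way to read this selection rule is sketched below. The sketch makes several assumptions that are not stated in the text: each activity is encoded as a one-vs-rest indicator, the Pearson correlation is used, its absolute value is compared with the 0.25 threshold, and a feature is kept if any activity reaches the threshold. All names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(features, labels, threshold=0.25):
    """Keep features whose |correlation| with any activity indicator >= threshold.

    features -- dict mapping feature name to its value in every training window
    labels   -- activity label of every training window
    """
    selected = []
    for name, values in features.items():
        for activity in sorted(set(labels)):
            indicator = [1.0 if label == activity else 0.0 for label in labels]
            if abs(pearson(values, indicator)) >= threshold:
                selected.append(name)
                break
    return selected

# Hypothetical training windows: "mean_x" separates the activities, "noise" does not.
labels = ["sit", "sit", "run", "run"]
features = {"mean_x": [0.1, 0.2, 0.9, 1.0], "noise": [1.0, -1.0, 1.0, -1.0]}
print(select_features(features, labels))  # ['mean_x']
```

Because the correlations are computed on the training data only, the selected feature set can differ between cross-validation folds.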

Table 4.1 List of features extracted from each window of an accelerometer

| No | Features | Feature Count |
|---|---|---|
| 1 | Mean for each axis of a 3-axis accelerometer | 3 |
| 2 | Standard deviation for each axis of accelerometer | 3 |
| 3 | Minimum value for each axis | 3 |
| 4 | Maximum value for each axis | 3 |
| 5 | Variance for each axis | 3 |
| 6 | Median value for each axis | 3 |
| 7 | Skewness for each axis | 3 |
| 8 | Kurtosis for each axis | 3 |
| 9 | Energy for each axis | 3 |
| 10 | Cross-correlation of accelerometer axis | 3 |
| 11 | Principal frequency for each axis | 3 |
| 12 | Magnitude of principal frequency for each axis | 3 |
| 13 | Median crossing for each axis | 3 |
| 14 | 25th percentile for each axis | 3 |
| 15 | 75th percentile for each axis | 3 |
| | Total number of features extracted | 45 |

4.4.4 Classification Algorithms

In order to find the best classification approach, several state-of-the-art machine learning methods were initially benchmarked, including binary decision tree (BDT), support vector machine (SVM), deep neural network (DNN), random forest (RF), and Adaboost.

In our implementation, the maximum number of decision splits for BDT was set to 20. The DNN used two auto-encoders to convert the inputs into 35 and 20 deep features, respectively. A softmax layer was trained on the 20 deep features for activity classification. To reduce overfitting, L2 weight regularisation (value set to 0.001) was added to the training criterion. In RF, the size of the random subset of predictors for each decision split was equal to the square root of the total number of available features. For Adaboost, the Adaboost.M2 multi-class classification method with 100 learning cycles was implemented.

Based on the experimental results (see section 4.6.1 for details), SVM was

selected as the best classification algorithm, as it showed the highest classification

accuracy compared to other classification algorithms, although the difference was

marginal.

4.4.5 Decision Fusion Techniques

Consider the fusion of decisions from n models for an m-class problem. The sets of models and classes can be represented as M = {M_1, M_2, …, M_n} and C = {C_1, C_2, …, C_m}. When classifying a test instance (x), each model provides a predicted class label along with a posterior probability of the predicted label, which is a measure of the confidence of the decision from that model for that test instance. Let the predicted vector for that instance be V(x) = {V_1(x), V_2(x), …, V_n(x)}, where each V_i(x) ∈ C, and the posterior probabilities be W^2(x) = {W^2_1(x), W^2_2(x), …, W^2_n(x)}. A decision fusion technique provides a final prediction for x by combining the individual predictions V(x).

A model-based weighted voting assigns a weight to each model/classifier based

on the overall performance of that model on the training data irrespective of classes

[31, 142]. This weight is independent of its predicted class. In the fusion step, a

weighted majority is used to decide the final predicted class. In contrast, a class-based

weighted decision fusion assigns weights to all classes based on the prior knowledge

of the model’s prediction performance for the different classes [142]. In the fusion

step, a weighted majority is again used to decide the final predicted class but the

weights are now different and the majority class may be different. This study proposes

posterior-adapted class-based fusion, which adjusts the class-based weights for each

test instance using the posterior probability of the model on the prediction. The steps

to achieve these weighted decision fusion schemes are described below.

Weight calculation – Using 10-fold cross validation on the training data, both

predicted training classes and true training classes are compared and the F1-scores for

all classes are computed. F1-scores indicate the model’s confidence for each class

based on the training data, which are used as class-based weights. The 10-fold

cross-validation allows reliable estimation of the expected class-wise performance on unseen

data and avoids overfitting.

Let the class-based weights for the models be W^1 = {W^1_1, W^1_2, …, W^1_n}, where W^1_i is the collection of weights for all m classes for the ith model, i.e., {w^1_i1, w^1_i2, …, w^1_im}. For model-based fusion, a weight for each model is calculated by taking the average of its class-based weights W^1_i.

\[ W_{\mathrm{avg},i} = \overline{W^{1}_{i}} = \frac{1}{m} \sum_{k=1}^{m} w^{1}_{ik}, \qquad 1 \le i \le n \tag{2} \]

Weight adjustment – For each test instance, the class-based weights are adjusted using the posterior probability of the predicted label. Let the adjusted class-based weights for the given test instance be W_i(x) = {w_i1, w_i2, …, w_im}, 1 ≤ i ≤ n. At first, the adjusted class-based weights are initialised to the class-based weights.

\[ W_i(x) = W^{1}_{i}, \qquad 1 \le i \le n \tag{3} \]

Then, the weights are adjusted by the posterior probability using the following equation.

\[ w_{ik} = \left( \alpha \, w_{ik} + (1 - \alpha) \, W^{2}_{i}(x) \right)_{V_i(x) = C_k}, \qquad 1 \le k \le m, \; 1 \le i \le n \tag{4} \]

Here, α is a weight adjustment parameter between 0 and 1 that requires tuning. The adjustment applies only to the weight of the class that model i predicted, i.e., the class C_k for which V_i(x) = C_k.
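As a worked illustration of equation (4), consider hypothetical values that are not taken from the experiments: a model whose cross-validated F1 weight for its predicted class is 0.80 and whose posterior probability for that prediction is 0.60. The three lines below show how the adjusted weight moves between the two as α changes.

```latex
% Hypothetical values for illustration only: w_{ik} = 0.80, W^{2}_{i}(x) = 0.60
\begin{align*}
\alpha = 1:   \quad & w_{ik} = 1 \times 0.80 + 0 \times 0.60 = 0.80 && \text{(class-based weight unchanged)} \\
\alpha = 0.5: \quad & w_{ik} = 0.5 \times 0.80 + 0.5 \times 0.60 = 0.70 && \text{(average of the two)} \\
\alpha = 0:   \quad & w_{ik} = 0 \times 0.80 + 1 \times 0.60 = 0.60 && \text{(posterior probability only)}
\end{align*}
```

With α = 1 the scheme therefore reduces to plain class-based fusion, while smaller values of α let a low-confidence prediction pull the weight of an otherwise reliable model down.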

Model-based Fusion – This fusion scheme takes a weight (W_avg,i) for each model and the current prediction vector V(x) to make a final prediction for a test instance (x). It computes the score for each predicted label by summing the weights of the models that predicted it, using equation (5).

\[ \mathrm{Score}_{V_i(x)} = \sum W_{\mathrm{avg},i}, \qquad 1 \le i \le n \tag{5} \]

It then selects the predicted label V_i(x) with the highest score as the final decision.

Class-based Fusion – This fusion scheme considers the class-based weights W^1 and the current prediction vector V(x) to make a final prediction for a test instance (x). It calculates the score for each class using the following formula.

\[ \mathrm{Score}_{k} = \sum_{V_i(x) = C_k} w^{1}_{ik}, \qquad 1 \le k \le m, \; 1 \le i \le n \tag{6} \]

Finally, it selects the class label with the maximum score as the final prediction, using equation (7).

\[ \mathrm{final}_{\mathrm{label}} = C_{\arg\max_{k=1}^{m} \mathrm{Score}_{k}} \tag{7} \]

Posterior-adapted Class-based Fusion – This fusion scheme is similar to class-based fusion, but it uses the adjusted class-based weights W_i(x). It calculates the score for each class using the following formula.

\[ \mathrm{Score}_{k} = \sum_{V_i(x) = C_k} w_{ik}, \qquad 1 \le k \le m, \; 1 \le i \le n \tag{8} \]

Finally, it selects the class label with the maximum score as the final prediction, using equation (7).
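The three fusion schemes can be compared side by side in a short Python sketch. This is an illustrative reading of equations (4) to (8), not the code used in the study: the function names, the dictionary-based data layout and the example weights are all hypothetical, and ties are resolved in favour of the label scored first, which the text does not specify.

```python
def model_based_fusion(predictions, class_weights):
    """Equation (5): each model votes for its label with its average class weight."""
    scores = {}
    for label, weights in zip(predictions, class_weights):
        w_avg = sum(weights.values()) / len(weights)
        scores[label] = scores.get(label, 0.0) + w_avg
    return max(scores, key=scores.get)


def class_based_fusion(predictions, class_weights):
    """Equation (6): each model votes with its weight for the class it predicted."""
    scores = {}
    for label, weights in zip(predictions, class_weights):
        scores[label] = scores.get(label, 0.0) + weights[label]
    return max(scores, key=scores.get)


def posterior_adapted_fusion(predictions, posteriors, class_weights, alpha=0.5):
    """Equations (4) and (8): blend the class weight with the posterior, then vote."""
    scores = {}
    for label, posterior, weights in zip(predictions, posteriors, class_weights):
        adjusted = alpha * weights[label] + (1 - alpha) * posterior
        scores[label] = scores.get(label, 0.0) + adjusted
    return max(scores, key=scores.get)


# Hypothetical example: ankle, chest and wrist models on one test instance.
predictions = ["standing", "standing", "sitting"]
posteriors = [0.9, 0.9, 0.5]
class_weights = [
    {"sitting": 0.6, "standing": 0.4},  # ankle F1 weights (made up)
    {"sitting": 0.5, "standing": 0.4},  # chest F1 weights (made up)
    {"sitting": 0.9, "standing": 0.9},  # wrist F1 weights (made up)
]
model_label = model_based_fusion(predictions, class_weights)
class_label = class_based_fusion(predictions, class_weights)
adapted_label = posterior_adapted_fusion(predictions, posteriors, class_weights)
```

In this made-up case the class-based scheme follows the single wrist model because of its high F1 weight, whereas the posterior-adapted scheme sides with the two confident ankle and chest predictions. Setting `alpha=1.0` makes the posterior-adapted scheme return the same label as class-based fusion.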


4.5 EXPERIMENT

4.5.1 Datasets

Two publicly available PA monitoring datasets were chosen for the study, as they both contain accelerometer data from three body positions (ankle, chest and wrist). Previous studies [87, 131, 153, 154] have shown these data to be effective for machine learning purposes, which indicates that there was enough data to train the machine learning models.

The PAMAP2 Dataset includes data from nine participants (1 female, 8 male),

with age and body mass index (BMI) of 27.2 ± 3.3 years and 25.1 ± 2.6 kg/m2

respectively. Participants wore three Colibri wireless IMUs on their dominant-side

wrist, ankle, and chest, when performing physical activities including lying down,

sitting, standing, walking, running, cycling, Nordic walking, ascending stairs,

descending stairs, vacuum cleaning, ironing clothes and jumping rope. Each sensor unit contains two three-dimensional (3D) acceleration sensors (scales: ±6g and ±16g) with a resolution of 13 bits, a gyroscope sensor, a magnetometer sensor, and temperature, orientation and heart rate monitor sensors. The sampling rate of the recorded acceleration data is 100 Hz. Further details of the study protocol can be found in [87, 131].

The MHEALTH Dataset includes data from ten participants, in an out-of-lab

environment, while performing twelve physical activities. The physical activities

include: standing still (1 min), sitting and relaxing (1 min), lying down (1 min),

walking (1 min), climbing stairs (1 min), waist bends forward (20x), frontal elevation

of arms (20x), knees bending (crouching) (20x), cycling (1 min), jogging (1 min),

running (1 min), and jumping front & back (20x). During the data collection,

Shimmer2 (Shimmer 2R, Real-time Technologies, Dublin, Ireland) wearable sensors

were attached to the subject's chest, right wrist and left ankle. These sensors record 3D acceleration data (±6g) from the chest, ankle and wrist, an electrocardiography (ECG) signal, 3D gyroscope data from the ankle and wrist, and 3D magnetometer data from the ankle and wrist. The sampling rate of the recorded data is 50 Hz. Further details on the data collection can be found in [153, 154].

Both datasets are fully labelled with each raw acceleration signal annotated

based on the performed activity. For the purpose of this study, a subset of the data was

extracted from both datasets. For PAMAP2, the selected activity classes were lying


down, sitting, standing, walking, running, cycling, ascending stairs, and descending

stairs. For MHEALTH dataset, lying down, sitting and relaxing, standing still,

walking, running, cycling, climbing stairs, and jogging activities were chosen for

analysis.

4.5.2 Implementation of the Framework

Figure 4.1 shows how the framework was implemented. For the data from each accelerometer location (ankle, chest, and wrist), an SVM classifier was applied in a four-phase process: (1) training phase, where the classifier was trained using the training data; (2) weight calculation phase, where the classifier was evaluated using 10-fold cross-validation of the training data and the resultant class-based weights and average weights were assigned; (3) individual model decision phase, where the classification model (trained in phase 1) was applied to new/testing data to obtain a predicted label and its posterior probability; and (4) class-based weight adjustment phase, where the class-based weights from the training data (output of phase 2) were adjusted using the posterior probability of the predicted label (output of phase 3) to give the adjusted class-based weights.

Finally, a decision fusion phase (Figure 4.2) combined the decisions from each individual sensor location using posterior-adapted class-based weighted decision fusion, and also using model-based and class-based decision fusion techniques for benchmarking purposes.


Figure 4.1 Overview of the system developed for implementing the framework

Figure 4.2 For a given test instance (x), predicting the final label by fusing the decisions from accelerometer sensors using weights


4.5.3 Evaluation Approach and Metrics

Leave-one-subject-out cross-validation was used to evaluate and compare the classification models. This evaluation uses one subject's data for testing and the remaining subjects' data for training, giving a subject-independent evaluation. Each subject's data are thus used exactly once for testing (as suggested in [137]). In a real-world context, it is desirable for an activity recognition system to perform well for a new subject.
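The leave-one-subject-out splitting described above can be sketched in a few lines of Python. This is only an illustration: the function name `leave_one_subject_out` and the representation of the data as one subject identifier per window are assumptions made for the example.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject.

    subject_ids holds one subject identifier per sample (window).
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical example: five windows recorded from three subjects.
folds = list(leave_one_subject_out(["s1", "s1", "s2", "s3", "s3"]))
```

Because no window from the held-out subject appears in the training indices, each fold measures how a model trained on other people performs on an unseen person.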

The performance of each classifier was evaluated by calculating precision, recall

and F1-score. For each class, predictions were compared to ground truth labels and the

number of true-positives (TP), true-negatives (TN), false-positives (FP), and false-negatives (FN) were calculated. Precision measures the exactness of a classifier, while recall measures its completeness. These can be calculated for a particular class using the following equations.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{9} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{10} \]

The F1-score, a balanced combination of precision and recall, can be measured using the following formula.

\[ \mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\,\% \tag{11} \]

The predicted classes for each subject were combined and a confusion matrix

was derived from the complete set. Then using the confusion matrix, F1-scores for all

activity classes were computed to get an insight into the model’s performance for each

class. Let the number of subjects and classes be n and m respectively. The classes can be represented as C = {C_1, C_2, …, C_m}. Let the true and predicted classes be {T_1, T_2, …, T_n} and {P_1, P_2, …, P_n} respectively, where T_i and P_i are the true and predicted classes for the ith subject and every element of T_i and P_i belongs to C. The F1-Scores were calculated using the following steps.

Step 1: \( P = \bigcup_{i=1}^{n} P_i \); \( T = \bigcup_{i=1}^{n} T_i \)

Step 2: \( \mathrm{F1Score}_{\mathrm{Class\_Wise}}(T, P) = \{FS_1, FS_2, \ldots, FS_m\} \)

Here, FS_k is the overall F1-Score of the kth class across subjects.
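The two steps above can be sketched in Python as follows. This is a minimal illustration and not the original implementation: the function name `class_wise_f1` and the example labels are hypothetical, and the union in Step 1 is read here as concatenating the per-subject label sequences so that every test window is counted once.

```python
def class_wise_f1(true_by_subject, pred_by_subject, classes):
    """Pool labels across subjects, then compute one F1-score (in %) per class."""
    # Step 1: pool the per-subject true and predicted labels.
    pooled_true = [t for labels in true_by_subject for t in labels]
    pooled_pred = [p for labels in pred_by_subject for p in labels]
    pairs = list(zip(pooled_true, pooled_pred))

    # Step 2: per-class precision, recall and F1 as in equations (9) to (11).
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[c] = 100.0 * 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical example with two held-out subjects and two classes.
true_by_subject = [["walk", "walk", "sit"], ["sit", "walk"]]
pred_by_subject = [["walk", "sit", "sit"], ["sit", "walk"]]
f1_scores = class_wise_f1(true_by_subject, pred_by_subject, ["walk", "sit"])
```

Pooling before scoring means that subjects who contributed more windows carry more weight in each class-wise F1-score than they would if per-subject scores were averaged.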


4.6 RESULTS AND DISCUSSION

4.6.1 Evaluation of Classification Algorithms

Table 4.2 shows the F1-scores of machine learning algorithms using both

PAMAP2 and MHEALTH datasets. The results were not conclusive in terms of

deciding the best classification approach. Both RF and SVM consistently showed

better performance for all three accelerometer locations across both datasets. However,

for the remaining analyses, this study adopted SVM, as it gave the highest F1-score

(82.32%) when averaged over all placement locations and both datasets, which is

consistent with previous work [28].

Table 4.2 Average F1-scores for each classification model across both datasets

Classifier  PAMAP2 Ankle  PAMAP2 Chest  PAMAP2 Wrist  MHEALTH Ankle  MHEALTH Chest  MHEALTH Wrist  Average across both datasets

SVM 84.72 81.00 80.86 83.64 79.62 84.10 82.32

RF 84.88 77.57 77.91 83.55 83.16 86.35 82.24

BDT 77.56 71.95 74.42 79.36 80.49 87.22 78.50

DNN 78.17 79.38 76.12 87.30 77.87 86.88 80.95

Adaboost 79.68 79.02 76.79 86.08 82.99 86.68 81.87

4.6.2 Evaluation of Different Fusion Techniques

Figures 4.3 and 4.4 show the average classification performances of model-

based, class-based and posterior-adapted class-based decision fusion across different

accelerometer location combinations for the PAMAP2 and MHEALTH datasets

respectively. Error bars in both figures present 95% confidence interval (CI). The

weight adjustment parameter (α) in posterior-adapted class-based weighted fusion was

set to 0, 0.25, 0.5, 0.75 and 1. Setting α = 0.5 adjusts the weights by taking the average of the class-based weights and posterior probabilities, and this value provided the best performance for the optimal accelerometer combination (Ankle + Wrist). Hence, the results reported in this paper use α = 0.5.


Figure 4.3 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the PAMAP2 dataset

Figure 4.4 Average F1-Score comparison for model-based, class-based and posterior-adapted class-based decision fusion with the MHEALTH dataset

In both datasets, the posterior-adapted class-based weighted fusion consistently

provided the best average F1-Scores for all accelerometer combinations compared to

those obtained using either model-based or class-based weighted fusion. While the performance of model-based fusion was poor for most two-accelerometer combinations, class-based fusion performed well in most situations (its F1-Scores were higher than those of model-based fusion but lower than those of posterior-adapted class-based fusion).


With PAMAP2, the posterior-adapted class-based weighted fusion provided a statistically significant improvement in performance compared to model-based fusion for all two-accelerometer combinations, but not for A+C+W. With MHEALTH, the posterior-adapted class-based weighted fusion provided statistically significant improvements in performance compared to model-based fusion for A+W and C+W, but not for A+C or A+C+W. Across all accelerometer configurations, posterior-adapted class-based weighted fusion consistently provided higher average F1-Scores than class-based weighted decision fusion; however, these differences were not statistically significant.

4.6.3 Activity-Wise Classification Performance

Tables 4.3 and 4.4 report the class/activity-wise and average F1-scores for single

location classifiers and all possible combinations of accelerometer locations (using

posterior-adapted class-based weighted fusion) for PAMAP2 and MHEALTH,

respectively.

Table 4.3 F1-scores for single and all possible combinations of accelerometer sensors in PAMAP2 dataset (the combination columns A+C, A+W, C+W and A+C+W use posterior-adapted class-based weighted fusion)

Activity Ankle Chest Wrist A+C A+W C+W A+C+W

Lying 96.88 98.36 90.81 97.25 95.31 95.41 97.06

Sitting 61.65 70.02 82.56 70.00 85.64 85.08 84.24

Standing 72.31 70.55 86.31 72.83 88.98 87.68 87.32

Walking 97.12 82.80 85.70 96.68 96.41 88.65 96.71

Running 89.61 93.41 98.14 98.49 99.19 99.19 99.31

Cycling 94.93 86.27 95.15 96.47 97.58 91.41 98.36

Asc. Stairs 85.00 68.64 44.22 88.01 84.44 71.23 85.88

Desc. Stairs 80.25 77.96 64.01 89.26 88.92 80.06 89.67

Mean 84.72 81.00 80.86 88.62 92.06 87.34 92.32


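Of the single location models, classifiers trained on ankle data performed best when averaged across both datasets (in MHEALTH alone, the wrist model's mean F1-score of 84.10% was marginally higher than the ankle model's 83.64%). However, classification accuracy for each PA class varied with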

accelerometer location. In PAMAP2, the ankle was the best location for walking,

ascending stairs, and descending stairs, while the wrist location was best for sitting,

standing, running and cycling. The chest location was only best for lying down. In

MHEALTH, the ankle location was best for lying down, walking, cycling, and

climbing stairs, while the wrist location was best for sitting and standing. The chest

location was best for running, and jogging.

Table 4.4 F1-scores for single and all possible combinations of accelerometer sensors in MHEALTH dataset (the combination columns A+C, A+W, C+W and A+C+W use posterior-adapted class-based weighted fusion)

Activity Ankle Chest Wrist A+C A+W C+W A+C+W

Lying 94.74 89.09 90.85 100.0 96.49 100.0 100.0

Sitting 45.00 44.41 79.23 26.12 81.65 84.93 67.67

Standing 61.02 60.78 86.59 56.54 85.26 86.59 75.95

Walking 95.40 85.94 83.59 97.58 98.70 89.11 94.08

Running 88.62 88.70 76.14 88.78 87.35 86.90 87.71

Cycling 99.36 93.98 94.92 97.32 96.56 99.84 100.0

C. Stairs 98.24 85.58 84.92 98.08 98.72 89.45 94.47

Jogging 86.78 88.47 76.56 89.20 88.41 87.60 88.82

Mean 83.64 79.62 84.10 81.70 91.64 90.55 88.59

Fusion of multiple accelerometer locations using the posterior-adapted class-

based decision fusion showed notable improvements in performance compared to the

single location models. In PAMAP2, classification performances for the fusion of

ankle and wrist accelerometers (A+W) and all three accelerometers (A+C+W) were

similar and best among all the combinations. Chest with wrist (C+W) and ankle with

chest (A+C) accelerometer locations also exhibited superior performance to that


observed for any single location model. In MHEALTH, the best fusion performance was obtained for A+W (91.6%) and C+W (90.6%), with A+C+W also providing strong classification performance (88.6%). All combinations except A+C exceeded the performance of any single location model.

4.6.4 Subject-Wise Classification Performance

Performance differences across different subjects were tested for statistical

significance using one-way repeated measures ANOVA. To achieve better statistical

confidence, F1-scores for each held-out subject in both datasets were pooled. The

results are shown in Figure 4.5.

Figure 4.5 Average F1-Scores of all single and possible accelerometer combinations across different subjects. Error bars represent 95% confidence intervals. (*) indicates statistical significance (p < 0.05)


Overall, mean F1-scores differed significantly between the combinations of

accelerometer locations (Wilks’ Lambda = 0.140, F (6, 12) = 12.269, p < 0.001). Least

significant difference (LSD) post hoc tests revealed a significant improvement in

performance when fusing the predictions of two or three accelerometers. All

accelerometer combinations except the combination of ankle and chest (A+C)

significantly outperformed all single sensor locations. A+W and A+C+W provided the highest average F1-scores across different subjects, but there were no statistically significant differences between A+W, C+W, and A+C+W.

4.6.5 Confusion Matrices

Tables 4.5 and 4.6 show the confusion matrices of the best-performing accelerometer combination, i.e., the combination of ankle and wrist (A+W), using posterior-adapted class-based weighted fusion for the PAMAP2 and MHEALTH datasets, respectively.

Table 4.5 Confusion matrix for ankle and wrist combination (A+W) in PAMAP2 dataset

1 2 3 4 5 6 7 8

1 Lying 884 0 1 1 0 0 0 1

2 Sitting 63 701 76 0 0 7 1 2

3 Standing 21 81 763 2 0 3 0 2

4 Walking 0 0 0 1101 0 0 12 5

5 Running 0 0 0 2 430 0 0 3

6 Cycling 0 4 3 1 0 746 1 1

7 Asc. Stairs 0 1 0 49 1 11 342 29

8 Desc. Stairs 0 0 0 10 1 6 21 325


Table 4.6 Confusion matrix for ankle and wrist combination (A+W) in MHEALTH dataset

1 2 3 4 5 6 7 8

1 Lying 289 0 0 0 0 21 0 0

2 Sitting 0 227 83 0 0 0 0 0

3 Standing 0 18 292 0 0 0 0 0

4 Walking 0 0 0 303 0 0 7 0

5 Running 0 0 0 0 259 0 0 51

6 Cycling 0 1 0 0 0 309 0 0

7 C. Stairs 0 0 0 1 0 0 309 0

8 Jogging 0 0 0 0 24 0 0 286

In both datasets, most misclassifications occurred between similar activities, such as sitting and standing. In PAMAP2, ascending stairs was most often confused with walking, and descending stairs was misclassified mostly as ascending stairs. Most running instances were correctly classified in the PAMAP2 dataset, whereas in MHEALTH a notable share of running instances (51 of 310) were misclassified as jogging.
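The confusion matrix in Table 4.5 can be linked back to the class-wise F1-scores with a short Python check. The sketch below is illustrative only, and the function name `f1_from_confusion` is hypothetical. It uses the identity F1 = 2·TP / (2·TP + FP + FN), which follows from equations (9) to (11); since 2·TP + FP + FN equals the row total plus the column total of a class, the result does not depend on whether rows hold the true or the predicted labels.

```python
LABELS = ["Lying", "Sitting", "Standing", "Walking",
          "Running", "Cycling", "Asc. Stairs", "Desc. Stairs"]

# Table 4.5: A+W confusion matrix for the PAMAP2 dataset.
CONFUSION = [
    [884, 0, 1, 1, 0, 0, 0, 1],
    [63, 701, 76, 0, 0, 7, 1, 2],
    [21, 81, 763, 2, 0, 3, 0, 2],
    [0, 0, 0, 1101, 0, 0, 12, 5],
    [0, 0, 0, 2, 430, 0, 0, 3],
    [0, 4, 3, 1, 0, 746, 1, 1],
    [0, 1, 0, 49, 1, 11, 342, 29],
    [0, 0, 0, 10, 1, 6, 21, 325],
]


def f1_from_confusion(matrix):
    """Return the per-class F1-scores (in %) of a square confusion matrix."""
    size = len(matrix)
    scores = []
    for k in range(size):
        tp = matrix[k][k]
        row_total = sum(matrix[k])                           # TP + errors in row k
        col_total = sum(matrix[r][k] for r in range(size))   # TP + errors in column k
        denominator = row_total + col_total                  # 2*TP + FP + FN
        scores.append(100.0 * 2 * tp / denominator if denominator else 0.0)
    return scores


f1_by_class = dict(zip(LABELS, f1_from_confusion(CONFUSION)))
mean_f1 = sum(f1_by_class.values()) / len(f1_by_class)
```

Rounded to two decimals, these values match the A+W column of Table 4.3 (for example 99.19 for running and a mean of 92.06), which confirms that the reported F1-scores were derived from the pooled confusion matrix.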


4.7 CONCLUSION

This paper presents a study that investigates the use of multiple accelerometers,

placed at three body locations (ankle, chest, and wrist), to effectively identify physical

activities. Evaluation was based on two publicly available datasets, namely, PAMAP2

and MHEALTH. The SVM was selected for further analysis as it gave the highest

average performance across both datasets. Classification performance depended on

both the accelerometer location and activity type. Classifiers trained on ankle data

provided the best average performance over all activities. Combinations of classifiers trained on accelerometer data from different locations may improve performance, and this was investigated further with model-based, class-based and our proposed posterior-adapted class-based weighted decision fusion.

PA recognition using posterior-adapted class-based weighted fusion of multiple

accelerometers provided significant improvements in performance in both datasets. Its

performance was also found to be better than that observed for model-based and class-based fusion for all accelerometer combinations. This is consistent with the notion that the combination of ankle and wrist (A+W) accelerometers captures both upper and lower body movements, and can therefore yield higher performance than the other two-accelerometer combinations. Relative to the two-accelerometer combinations, the addition of the chest location (A+C+W) did not improve PA recognition. Thus, more sensor data does not always result in performance improvements for PA recognition. Considering that chest-mounted accelerometers can be uncomfortable for everyday use, this finding is valuable to motivate future use of ankle and wrist accelerometers for longer-term monitoring of PAs.

A limitation of this paper is that it uses datasets with only ankle-, wrist- and chest-positioned accelerometers in a controlled setup, and hence overlooks other accelerometer locations such as the thigh and hip. In future studies, the proposed framework should be tested by investigating more accelerometer locations, adding more PA classes (especially those that are harder to distinguish), and increasing the number of participants to ensure that the findings are generalisable to a wide range of end users. Further testing of the proposed method should be done using more PA datasets acquired in different environments to fully study its limitations. This paper has

contributed to better understanding of performance improvement with decision fusion

in physical activity recognition using multiple accelerometers.

PART II - Estimation of Impacts of Physical Activities

Chapter 5: Towards Non-Laboratory Prediction of Relative Physical Activity Intensities from Multimodal Wearable Sensor Data


Dian Tjondronegoro2

Jinglan Zhang1

Puspa Setia Pratiwi1

Stewart G. Trost3






Alok Kumar Chowdhury

Science and Engineering Faculty,

Queensland University of Technology,

Brisbane, Australia

Phone: +61 420 467 077





5.1 ABSTRACT

This paper explored a non-laboratory approach to effectively predict relative physical activity intensities using regression algorithms on multimodal physiological data. Twenty-two participants completed 5 to 7 physical activity sessions, where each session consisted of 5 activity trials ranging from sedentary to vigorous. During the trials, participants' heart rate (HR), R-R interval (RR), electrodermal activity (Eda), and body temperature (Temp) were recorded using wearable sensors. Immediately after each trial, participants provided their rating of perceived exertion (RPE) using the 6-20 Borg scale. This work used both person-level features and features extracted from each sensor modality, followed by a feature selection step. Then, using leave-one-subject-out cross-validation, two regression algorithms, linear regression and support vector machine regression, were applied separately to the features of each modality and to all possible combinations of modality features. The results showed that both regression algorithms produced similar accuracy. In terms of the usefulness of a single modality, features extracted from RR provided the highest prediction performance compared to any other single modality. However, the combination of Eda and Temp features fused with RR features produced the best overall performance, confirming the benefits of using multimodal data.


5.2 INTRODUCTION

With the advent of mobile and wearable sensors, an opportunity has emerged for

healthcare providers and researchers to empower people to take care of their wellness

by providing them with timely and personalised support. Sensors are increasingly

being linked to computing technologies, such as websites and smartphones, to process

the data and provide unique opportunities for the delivery of personalised and adaptive

interventions to physical activities (PA) [155-158]. These technologies can be used to

collect real-time response of users efficiently and unobtrusively by tracking the

frequency, intensity, and duration of physical activity. Such features enable users to

record, view and share PA status with their health practitioners.

Extensive PA intervention research in [159] demonstrated an opportunity to provide personalised behaviour treatment using adaptive goals and feedback to increase an individual's level of PA performance. However, there is a risk of causing individuals to exercise at a level that is neither safe nor effective, because adapting a PA intervention to the individual's PA performance capacity is still a challenging task: the individual's fitness level affects the biomechanical, physiological, and psychological responses associated with PA. Therefore, a system needs to consider an individual's aerobic fitness, age or health status to produce a more accurate PA level recommendation.

PA intensity is one of the crucial PA measurement parameters and can be defined in either relative or absolute terms. Absolute intensity considers the external workload for a particular PA and usually refers to the energy cost of a specific activity expressed as multiples of resting metabolism, or Metabolic Equivalents (METs). Relative PA intensity, on the other hand, personalises PA intensity based on the person's fitness or capacity. In relative terms, moderate intensity physical activity is typically defined as 40% to 60% of VO2 reserve or an RPE of 12–14 [37, 160]. This means that, to achieve moderate intensity physical activity as defined in absolute terms, individuals with a lower aerobic capacity are required to work at a significantly higher relative intensity [70]. Thus, a significant proportion of individuals with limited aerobic capacity are misclassified as not meeting PA guidelines.

To date, research efforts to quantify PA and its intensity have mostly been based on accelerometer sensors [161-163]. Accelerometers are only able to capture external workloads and can therefore be used for calculating absolute intensity [70, 158, 163]. To determine relative PA intensity, operationalising intensity as a percentage of maximal oxygen uptake is considered the gold standard, but this is not feasible in most situations because its measurement needs sophisticated instruments and lab-based individual calibration. Self-rated perceived exertion scales, e.g., Borg's rating of perceived exertion (RPE) for adults [35] and the OMNI perceived exertion scale for children [164], are widely used and valid indicators of relative physical activity intensity. However, they are not usable in automated scenarios as they require manual data entry.

Using sensors, manual entries can be avoided or reduced. Relative intensity can be measured using heart rate, for example as a percentage of HR reserve (% HRR) or a percentage of HR max (% HR max) [71, 72]. While heart rate based methods are more objective and suitable for predicting moderate to vigorous relative intensities, they are not effective for low relative intensities [39]. Moreover, these approaches require knowledge of HR max, for which commonly used age-related prediction equations are subject to considerable measurement error [40, 41]. In addition to heart rate, some other

modalities of physiological data, including electrodermal activity (Eda) and body

temperature (Temp), can be easily obtained using wearable sensors. These

physiological indicators can provide valuable information about the metabolic demand

of exercise, and can also be used to predict relative PA intensity. However, to the best

of our knowledge, the use of multiple modalities of physiological data for relative

intensity prediction has not been previously investigated.

This paper presents a study that predicts relative intensity using multimodal physiological sensor data (heart rate, RR interval, Eda and Temp), applying two regression algorithms (linear regression and support vector machine regression) to explore all combinations of the sensor data. Our experiments were based on a real-world (non-laboratory) dataset, collected from 22 people, where Borg's RPE scale was used as a measure of relative intensity. The key contribution of this paper is to identify: 1) the best single modality feature and 2) the best combination of modality features for predicting relative PA intensity.


5.3 DATASET COLLECTION

This study recruited 22 adults (mean age = 29.8 ± 3.2 yrs; BMI = 25.3 ± 2.6; male = 77.3%) to perform 5 to 7 sessions of PA trials. To be eligible for the study, participants needed to be between 18 and 40 years of age and sufficiently healthy to perform PA, as assessed by completing the Physical Activity Readiness Questionnaire for Everyone (PAR-Q+).

Each session was performed in the park and consisted of five structured PA

trials ranging from sedentary to vigorous intensity: quiet sitting and standing (5 mins),

comfortable walk (5 mins), brisk walk (5 mins), jogging (3 mins), and running (2

mins). Sufficient recovery time was provided between each activity trial.

Before the first session, participants provided basic profile information such as

age, sex, height, and weight. PA status (sedentary, insufficiently active, sufficiently

active) was measured using the Active Australia Survey [97].

During each session, participants wore an Empatica E4 smart watch on the non-dominant wrist and a Polar H7 chest strap HR monitor. The Empatica E4 captured electrodermal activity (Eda) and body temperature (Temp). The sampling rate for the Eda and Temp data was 4 Hz. The Polar HR monitor recorded HR at a sampling rate of 1 Hz, together with the RR-interval data.

Relative intensity was measured using the Borg RPE scale [35, 36]. The Borg 6–20 scale, shown in Figure 5.1, reflects how heavy and strenuous the PA feels to someone, linking all sensations and feelings of physical stress, effort, and fatigue.

Rating 6 represents “no exertion at all” and 20 represents “maximal exertion”. Each

number describes a different level of exertion.

The scale was presented and explained to the participants before performing the

session. Immediately after each trial, participants rated their perceived exertion using

the scale.


Figure 5.1 Borg’s Rating of Perceived Exertion (6-20) scale


5.4 METHODS

5.4.1 Pre-processing

A moving average filter with a span of 5 was applied to the HR, RR, Eda, and Temp data to remove any motion artefacts. To exclude non-steady-state data, 10 s of data were removed from the beginning and end of each activity trial. Missing values were replaced by linear interpolation.
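The three pre-processing steps can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the Matlab code used in the study: the function names are hypothetical, missing samples are represented as `None`, the moving-average window is simply truncated at the signal edges, gaps at the very start or end are filled with the nearest valid value, and the order in which the steps are chained is not specified in the text.

```python
def interpolate_missing(values):
    """Replace None entries by linear interpolation between valid neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("signal contains no valid samples")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # leading gap: copy nearest valid value (assumption)
            filled[i] = values[right]
        elif right is None:       # trailing gap: copy nearest valid value (assumption)
            filled[i] = values[left]
        else:
            fraction = (i - left) / (right - left)
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled


def moving_average(values, span=5):
    """Centred moving average; the window is truncated at the edges (assumption)."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def trim_edges(values, sampling_rate_hz, seconds=10):
    """Drop the first and last `seconds` of a trial to keep steady-state data."""
    k = int(round(seconds * sampling_rate_hz))
    return values[k:len(values) - k] if len(values) > 2 * k else []
```

For a 4 Hz Eda or Temp signal, for instance, `trim_edges(signal, sampling_rate_hz=4)` removes 40 samples from each end of the trial.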

5.4.2 Feature Extraction and Selection

From each sensor modality, a few time and frequency domain features were

extracted.

HR feature set: The time domain features included mean, variance, standard

deviation, skewness, kurtosis, median, numerical gradient, on and off response.

Additionally, the number of times HR increased and decreased were computed,

normalised for window size.

R-R interval feature set: The time domain features extracted from the RR

interval data included mean, variance, standard deviation, skewness, kurtosis, median,

standard deviation of successive differences between adjoining normal cycles (SDSD),

Square root of the mean squared difference of successive RR-intervals (rMSSD),

Number of pairs of successive RR-intervals that differ by more than 20 ms/length

(pNN20), Number of pairs of successive RR-intervals that differ by more than 50

ms/length (pNN50). Frequency features included spectral energy density (aVLF, aLF,

aHF), relative power (pVLF, pLF, pHF), and normalised power (nLF, nHF) of very

low frequency (0 - 0.04 Hz), low frequency (0.04 – 0.15 Hz), and high frequency (0.15

– 0.40 Hz) components. Total spectral energy density (aTotal), and ratio between LF

and HF band energy (LF/HF) were also extracted.
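As an illustration of the RR time-domain features listed above, the following Python sketch computes SDSD, rMSSD, pNN20 and pNN50 from a series of RR-intervals. It is a sketch under stated assumptions rather than the study's implementation: the function names are hypothetical, RR-intervals are assumed to be in milliseconds, SDSD uses the sample standard deviation, and the pNN features are divided by the number of successive differences (the text only specifies "length").

```python
import math


def successive_differences(rr_ms):
    """Differences between adjacent RR-intervals (ms)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def rmssd(rr_ms):
    """Square root of the mean squared successive difference."""
    diffs = successive_differences(rr_ms)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdsd(rr_ms):
    """Sample standard deviation of the successive differences."""
    diffs = successive_differences(rr_ms)
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))


def pnn(rr_ms, threshold_ms):
    """Fraction of successive differences larger than threshold_ms (pNN20, pNN50)."""
    diffs = successive_differences(rr_ms)
    return sum(abs(d) > threshold_ms for d in diffs) / len(diffs)


# Hypothetical RR series (ms); successive differences are 10, -20, 60, -5.
rr = [800, 810, 790, 850, 845]
features = {
    "rMSSD": rmssd(rr),
    "SDSD": sdsd(rr),
    "pNN20": pnn(rr, 20),
    "pNN50": pnn(rr, 50),
}
```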

Eda feature set: The time domain features included mean, variance, standard

deviation, skewness, kurtosis, and median.

Temp feature set: The time domain features included mean, variance, standard

deviation, skewness, kurtosis, and median.

Person-level features: These features were always used with the sensor features

in the regression models. Person level features included height, weight, age, BMI,

gender, total PA time, weekly PA sessions, and PA status.


Before the regression, each feature was normalised to zero mean and unit

variance. Then, the size of the feature vector was reduced by selecting only the best 10

features using a minimum-redundancy–maximum-relevance feature selection

(MRMR) method [165]. For example, the best 10 features selected from the fused

feature set of RR, Eda, and Temp were: 1) median(RR), 2) mean(Temp), 3) aTotal(RR),

4) aHF(RR), 5) mean(Eda), 6) pNN20(RR), 7) aVLF(RR), 8) mean(RR), 9)

skewness(RR), 10) aLF(RR).
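The normalisation step can be sketched as a column-wise z-score. This is a minimal illustration, assuming the population standard deviation is used and that constant features are mapped to zero. The function name is hypothetical and neither choice is stated in the study.

```python
import statistics


def zscore_columns(rows):
    """Scale each column (feature) of a sample-by-feature table to zero mean, unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical table: three samples, two features (the second is constant).
normalised = zscore_columns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
```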

5.4.3 Regression Algorithms

There is a linear relationship between the sensor data and relative intensity [166,

167]. For example, relative intensity usually increases with the increase of heart rate.

Eda value represents sweating which also usually increases with the relative intensity

[18]. Considering these relationships, this study selected linear regression and an SVM regression

with a linear kernel for predicting the relative intensities. Both regression algorithms

were implemented in Matlab (version 2017a).

At first, the regression algorithms were applied separately on the features

extracted from each individual modality. Then the features of multiple modalities were

merged together to form all possible feature combinations. Regression algorithms were

applied to all feature combinations to investigate if using multiple modalities can

improve prediction performance.
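The enumeration of modality combinations can be sketched as follows. With four modalities there are 11 combinations of two or more, which matches the 11 rows of Table 5.1. The function names and the dictionary layout are hypothetical, and feature-level fusion is assumed to be a simple concatenation of the per-modality feature vectors.

```python
from itertools import combinations

MODALITIES = ["HR", "RR", "Eda", "Temp"]


def modality_combinations(modalities, min_size=2):
    """All combinations containing at least `min_size` modalities."""
    combos = []
    for size in range(min_size, len(modalities) + 1):
        combos.extend(combinations(modalities, size))
    return combos


def fuse_features(feature_sets, combo):
    """Concatenate the feature vectors of the chosen modalities for one sample."""
    fused = []
    for name in combo:
        fused.extend(feature_sets[name])
    return fused


combos = modality_combinations(MODALITIES)
```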


5.5 PERFORMANCE EVALUATION

Performance was evaluated using the root-mean-square error (RMSE). For n

different predictions, if $y'_t$ are the values predicted by the model and $y_t$ are the original

values, the RMSE can be calculated using the following equation:

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y'_t - y_t)^2}{n}} \qquad (1) \]

To account for the large inter-individual variability, this study used

leave-one-subject-out cross-validation, where data from one subject are used for testing

and the other subjects’ samples are used for training. In this way, samples of each subject

are used exactly once for testing. The predicted relative intensities for all subjects were

combined and the RMSE was derived from the complete set.
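The evaluation procedure can be sketched as below: predictions from all held-out subjects are pooled and a single RMSE is computed. The names `loso_splits`, `loso_rmse` and `fit_predict` are hypothetical, and the mean-value baseline with made-up RPE values only stands in for the linear and SVM regression models used in the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for subject in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        yield subject, train, test


def loso_rmse(subject_ids, targets, fit_predict):
    """Pool the held-out predictions of all folds, then compute one RMSE."""
    pooled_pred, pooled_true = [], []
    for _, train, test in loso_splits(subject_ids):
        pooled_pred.extend(fit_predict(train, test))
        pooled_true.extend(targets[i] for i in test)
    return rmse(pooled_pred, pooled_true)


# Made-up data: five trials from three subjects with their reported RPE.
subjects = ["s1", "s1", "s2", "s2", "s3"]
rpe = [7, 13, 9, 15, 17]


def mean_baseline(train, test):
    """Placeholder model: predict the mean RPE of the training subjects."""
    mean = sum(rpe[i] for i in train) / len(train)
    return [mean] * len(test)


pooled_error = loso_rmse(subjects, rpe, mean_baseline)
```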


5.6 EXPERIMENTAL RESULTS AND DISCUSSION

5.6.1 Performance from Using a Single Modality

Figure 5.2 shows the regression performances for the single modality models. In

both linear regression and SVM regression, models based on RR features provided the

best performance. The RMSEs of the RR models were 1.98 and 1.99 in linear and SVM

regression, respectively. HR models also performed well compared to Eda and Temp

with RMSEs of 2.07 and 2.11 for linear and SVM regression respectively. Temp

features provided the worst performance with significantly higher RMSE values. Eda

provided the second worst performance.

Figure 5.2 Prediction performances of single modality models

5.6.2 Performance from Using Multiple Modalities

Table 5.1 lists the RMSE values for models developed by fusing the features of

multiple modalities. For both regression algorithms, the combination of RR, Eda, and

Temp yielded the best performance, outperforming the best single modality (RR)

models. HR did not add information beyond that provided by RR. For

example, adding HR to RR (HR+RR) did not improve on the performance of RR

alone.


Table 5.1. Prediction performances of models developed from the combination of modalities

Modalities          Linear Regression   SVM Regression
Eda+Temp            3.39                3.43
HR+Eda              2.10                2.12
HR+Temp             2.05                2.05
HR+RR               1.98                1.98
HR+Eda+Temp         1.94                1.95
HR+RR+Eda           1.87                1.88
HR+RR+Eda+Temp      1.87                1.88
RR+Temp             1.87                1.87
HR+RR+Temp          1.86                1.87
RR+Eda              1.85                1.85
RR+Eda+Temp         1.85                1.84


5.7 CONCLUSION

This study developed regression models from multimodal physiological data to

predict relative PA intensity. It used physiological sensor data collected from 22

individuals while they performed physical activities ranging from sedentary to

vigorous intensity. Borg’s RPE scale served as a ground truth measure of relative

intensity requiring no laboratory testing or predictions of HR max. Two regression

algorithms were applied on the features extracted from the physiological data to

identify the best single modality for relative intensity prediction. Then, we fused the

features and applied regression algorithms on all combinations of the features to

identify the best combination of modalities for relative PA intensity prediction. The

leave one subject out cross-validation results showed that RR features provided the

best prediction performance compared to other single modalities. The best prediction

combination of modalities was RR, Eda, and Temp. Both regression algorithms

performed similarly in all cases. The study identified that, for the prediction of relative

PA intensity, Eda and Temp are not good features by themselves, but they can provide

additional information and improve prediction performance when combined with RR

or HR.

(Additional paragraph – not included in the published paper)

The strength of the current study was the use of a number of regression

algorithms on the multimodal physiological data, collected in a non-laboratory

environment, to predict the relative intensity of the participants. To the best of our

knowledge, this is the first study that used multiple modalities of physiological data

for relative intensity prediction using machine learning algorithms. However, there

were some limitations of this study that warrant further investigation. For example,

the study predicted raw RPE values as the measure of relative intensity. In

real-world applications, however, it would be more beneficial if relative intensity could be

categorised into low, moderate, and high intensity based on the RPE. Also, the study did

not analyse the RPE ranges in which the largest prediction errors occur.

In the next chapter (Chapter 6), we investigate further to address the

above-mentioned limitations.

Chapter 6: Prediction of Relative Physical Activity Intensity Using Multimodal Sensing of Physiological Data


Dian Tjondronegoro2

Vinod Chandran1

Jinglan Zhang1

Stewart G. Trost3






Professor Stewart G. Trost, PhD

Institute of Health and Biomedical Innovation at QLD Centre for Children’s Health Research

Level 6, 62 Graham Street

South Brisbane, QLD 4101

Australia

Phone: +61 7 3069 7301

Fax: + 61 7 3138 3980





6.1 ABSTRACT

Purpose: To investigate the feasibility of a non-laboratory approach that uses

machine learning on the multimodal sensor data to effectively predict relative physical

activity (PA) intensity. Methods: A total of 22 participants completed up to 7 physical

activity sessions consisting of sitting and standing, comfortable walk, brisk walk,

jogging, and fast running activities. During each session, participants wore an Empatica

E4 wrist-watch and a Polar chest-strapped heart-rate monitor, which recorded heart-rate

(HR), R-R interval (RR), electrodermal activity (Eda) and body temperature

(Temp). After each activity participants provided ratings of perceived exertion (RPE)

using the 6-20 Borg’s scale. Along with the attribute data of the participant, a set of

features were extracted from each of the modalities (HR, RR, Eda and Temp). Using

leave-one-subject-out cross validation, three classifiers including random forest (RF),

neural network (NN) and support vector machine (SVM) were applied independently

to each feature set to predict three classes of relative PA intensity: low (RPE ≤ 11),

moderate (RPE 12–14), and high (RPE ≥ 15). Then, both feature fusion and

decision fusion (posterior-adapted class-based decision fusion) of all combinations of

sensor modalities were carried out to identify the best combination. Results:

Among the single feature sets, RR provided the best overall performance for all three

classifiers. Decision fusion did not outperform the RR features alone for any

combination. With feature fusion, however, SVM showed the best performance

for RR+Eda, a 3.4 percentage-point improvement in F1 score over RR only. Using NN and RF in

feature fusion, the best combinations were RR+Temp and RR+Eda+Temp, respectively.

Conclusion: Use of multiple modalities can enhance the relative intensity prediction

performance.

Keywords: Motion sensors, machine learning, pattern recognition, random

forest, bagging, boosted decision trees.


6.2 INTRODUCTION

Regular participation in physical activity (PA) is recognised as one of the most

important steps that people can take to improve their health [2]. Physical inactivity

significantly increases the risk of numerous chronic health conditions, including

cardiovascular disease, type-2 diabetes, cancers of the breast and colon, and depression

[5, 61, 122]. This extensive scientific evidence for the health benefits of PA has

prompted numerous medical and public health organisations to issue recommendations

or guidelines for participation in physical activity. For example, the World Health

Organisation recommends at least 150 min of moderate-intensity PA or at least 75 min

of vigorous intensity PA per week, accumulated in bouts of at least 10 min in duration

[15].

In recent years, due to the increasing use of wearable sensor technology,

accelerometer and heart rate-based objective PA monitoring has become popular

among researchers and consumers [155-157]. In contrast to self-report methods,

sensor-based approaches can be used to collect real-time responses from users

efficiently and unobtrusively, and can track the frequency, intensity, and duration of

physical activity [30, 93, 161-163]. Such features enable users to record, view and

share PA status with their health practitioners and peers. Because current PA

guidelines call for participation in moderate- and vigorous-intensity PA, it is important

that wearable sensor systems for monitoring PA behaviour provide accurate

determinations of PA intensity.

PA intensity can be defined in relative or absolute terms. Relative intensity is

generally expressed as a percentage of an individual’s maximal aerobic capacity (%

VO2 max, % HR reserve) or based on ratings of perceived exertion (RPE) [33, 39]. In

relative terms, moderate intensity physical activity is typically defined as 40% to 60%

of VO2 reserve or an RPE of 12 – 14 [37, 160]. Absolute intensity, on the other hand,

refers to the energy cost of a specific activity expressed as multiples of resting

metabolism or Metabolic Equivalents (METs), where 1 MET is assumed to be 3.5

ml·kg⁻¹·min⁻¹. In absolute terms, moderate physical activity is defined as 3–6 METs,

regardless of an individual’s aerobic capacity. Thus, in order to achieve moderate

intensity physical activity based on absolute intensity, individuals with a lower aerobic

capacity are required to work at a significantly higher relative intensity [39, 70]. For

example, “brisk” walking on level ground has an absolute intensity of 4 METs. For a


young healthy person with a maximal aerobic capacity of 10 METs, the relative

intensity is 40% of maximal capacity; whereas for an individual with a chronic health

condition with a maximal aerobic capacity of 6 METs, the relative intensity is 67%.

Conversely, in relative intensity terms, low fit individuals working at an absolute

intensity of < 3 METs may be judged as participating in moderate intensity PA, if the

work rate exceeds 40% of maximal aerobic capacity.
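The arithmetic behind these examples can be sketched in a few lines of Python. The function names are hypothetical, relative intensity is taken as the simple ratio of the activity's MET value to the individual's maximal capacity in METs (as in the examples above), and the 40% threshold is the lower bound for moderate intensity given in the text.

```python
def relative_intensity_percent(activity_mets, max_capacity_mets):
    """Relative intensity as a percentage of maximal aerobic capacity."""
    return 100.0 * activity_mets / max_capacity_mets


def is_moderate_or_above(activity_mets, max_capacity_mets, threshold_percent=40.0):
    """True if the activity reaches the relative threshold for moderate intensity."""
    return relative_intensity_percent(activity_mets, max_capacity_mets) >= threshold_percent


# Brisk walking at 4 METs for the two individuals described in the text.
fit_person = relative_intensity_percent(4, 10)     # 40% of a 10-MET capacity
unfit_person = relative_intensity_percent(4, 6)    # about 67% of a 6-MET capacity

# A 2.5-MET activity is below the 3-MET absolute threshold, yet for a person
# with a 6-MET capacity it exceeds 40% of maximal capacity.
light_activity_counts = is_moderate_or_above(2.5, 6)
```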

To date, research efforts to quantify PA intensity from wearable sensors have

predominantly been based on absolute intensity [39, 70, 158, 163]. Because such

estimates do not consider an individual’s aerobic fitness, age or health status, the

intensity of PA could be above accepted relative intensity thresholds for moderate-to-

vigorous PA (MVPA), but below the established 3 MET absolute intensity threshold

for MVPA [13, 69]. As such, a significant proportion of individuals with limited

aerobic capacity are misclassified as not meeting PA guidelines.

Moreover, m-Health platforms using wearable sensor systems to monitor the absolute

intensity of PA could be encouraging individuals to exercise at relative intensities that

are neither safe nor effective [12]. Thus, the development of validated algorithms to

predict relative PA intensity from wearable sensor data constitutes an important

research priority.

Because of the linear relationship between HR and work rate during steady state

exercise, HR based indices such as percentage of maximal HR (%HR max) and

percentage of HR reserve (%HRR) are widely used metrics for quantifying the relative

intensity of PA [39, 71, 72]. However, there are a number of drawbacks to using these

metrics. First, the standard error of estimate associated with commonly used age-based

prediction equations for HRmax (220 – age, 208 – (0.7 × age)) ranges from 10 to 12 bpm,

so these equations do not provide accurate predictions of individual HRmax [40, 41]. Second,

to account for inter-individual differences in aerobic capacity and HRmax, it is

necessary to personalise the relationship between HR and work rate via individual

calibration in the laboratory, which is time intensive, requires expensive

instrumentation, and is not feasible in large field-based studies [22, 72].
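For reference, the HR-based indices discussed above can be written out as a short Python sketch. The two HRmax equations are those quoted in the text. The %HRR formula is the standard heart-rate-reserve definition, (HR − HRrest) / (HRmax − HRrest), which is assumed here because the text does not spell it out. All function names are hypothetical.

```python
def hr_max_220(age_years):
    """Age-based HRmax estimate: 220 - age."""
    return 220 - age_years


def hr_max_208(age_years):
    """Age-based HRmax estimate: 208 - 0.7 x age."""
    return 208 - 0.7 * age_years


def percent_hr_max(hr, hr_max):
    """Heart rate as a percentage of maximal heart rate (%HR max)."""
    return 100.0 * hr / hr_max


def percent_hr_reserve(hr, hr_rest, hr_max):
    """Heart rate as a percentage of heart-rate reserve (%HRR)."""
    return 100.0 * (hr - hr_rest) / (hr_max - hr_rest)


# Hypothetical 30-year-old with a resting HR of 60 bpm exercising at 125 bpm.
estimated_max = hr_max_220(30)
intensity = percent_hr_reserve(125, 60, estimated_max)
```

Because the predicted HRmax enters the denominator, an error of 10 to 12 bpm in that estimate shifts the computed relative intensity for the same observed heart rate.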

Along with HR, other modalities of physiological data, including electrodermal

activity (Eda) and body temperature (Temp) can be easily measured via wearable

sensors. These physiological indicators can provide valuable information about the

metabolic demand of exercise, and can also be used to predict relative PA intensity.


However, to the best of our knowledge, the use of multiple modalities of physiological

data for relative intensity prediction has not been previously investigated.

An alternative approach to measuring relative intensity that does not require

instrumentation or individual calibration in the laboratory, is the use of effort

perception or ratings of perceived exertion (RPE). Effort perception scales such as the

Borg alpha-numeric RPE Category Scale are commonly used in exercise testing and

prescription contexts and have been shown to be valid and reliable indicators of relative

PA intensity [35-38]. For example, in a meta-analysis, Chen, et al. [37] reported

weighted mean validity coefficients of 0.62 between RPE and HR, and 0.64 between

RPE and %VO2max across different studies. In a more recent study, Scherr, et al. [36]

conducted incremental exercise tests on treadmills or cycle ergometers in a very large

population of 2560 Caucasian men and women. They found a strong correlation of RPE

with heart rate (r = 0.74, p < 0.001) which was not significantly affected by

participant’s gender, age, coronary artery disease, physical activity status and exercise

testing modality (all p < 0.05). Yet, despite the widespread use of RPE for effort

estimation, the utility of algorithms to predict relative PA intensity based on RPE has

not been explored. If features in the signals from multiple physiological sensors can

be trained to predict relative PA intensity based on effort perception, then PA intensity

predictions can be more personalised, and users can track/view their PA sessions and

exercise at an intensity that is safe and effective.

This study investigated the feasibility of a non-laboratory approach that uses

machine learning on features in multimodal sensor data to effectively predict relative

PA intensity. A non-laboratory dataset, collected from 22 people was utilised, where

Borg’s RPE scale was used as a ground truth measure of relative intensity. The features

extracted from multimodal physiological data were applied to three state-of-the-art

machine learning algorithms (including support vector machine, random forest, and

neural network) to predict 3 classes of relative PA intensity. The study 1) identified

the best set of features from a single modality for predicting relative PA intensity, 2)

explored both feature fusion and decision fusion to combine different sensor

modalities to improve relative PA intensity prediction performance, and 3) identified

the best combination of modality features for predicting relative PA intensity.


6.3 METHODS

6.3.1 Participants

Twenty-two adults (mean age = 29.8 ± 3.2 yrs; BMI = 25.3 ± 2.6; male = 77.3%)

participated in this study. Inclusion criteria were: 1) age between 18 and 40 years,

2) completing and passing the Physical Activity Readiness Questionnaire (PAR-Q+), and 3) no

hospitalisations within the last six months. Prior to participating in the study, each

participant provided written informed consent. The data collection protocol was

approved by the Office of Research Ethics and Integrity of the Queensland University

of Technology (Ethics number: 1500000962).

6.3.2 Protocol

Participants completed up to 7 weekly physical activity sessions in the park (91%

of all participants completed five or more sessions). Each session comprised five structured

physical activity trials ranging from sedentary to high intensity. The activity trials were

quiet sitting and standing (5 mins), comfortable walk (5 mins), brisk walk (5 mins),

jogging (3 mins), and fast running (2 mins). The intensity of walking, jogging and fast

running trials was self-selected. Sufficient recovery time was provided between each

activity trial. The resting times after the five activity trials were 5 mins, 15 mins, 15

mins, 17 mins, and 18 mins, respectively. Thus, each session usually lasted

approximately 100 mins. To ensure a common outdoor environment, all sessions were

performed in the afternoon. During the trials, a researcher accompanied the

participants and provided verbal feedback (if required) to assist participants with

motivation and to ensure even pacing during the trials.

6.3.3 Data Acquisition

Participant attributes. Before each session, participants provided basic profile

information such as age, sex, height, and weight. Habitual physical activity level was

measured using the Active Australia Survey [168]. Responses to the survey were used

to estimate total activity time, total number of activity sessions, and physical activity

status (sedentary/ insufficiently active/ sufficiently active).

Sensor data. During each session, participants wore an Empatica E4 wrist-watch

(Boston, US), a Polar H7 chest strap heart-rate monitor, and a mobile phone. The

Empatica E4 was worn on the non-dominant wrist, and captured electrodermal activity

(Eda) and body temperature (Temp). The sampling rate for Eda and Temp data was 4


Hz. The Polar heart-rate monitor recorded heart-rate (HR) at a sampling rate of 1 Hz

and the RR-interval data.

Annotation. Relative intensity was measured using the Borg Rating of Perceived

Exertion (RPE) scale [35, 36, 74]. The scale was presented and explained to the

participants before they performed the session. Immediately after each trial, participants

rated their perceived exertion by pointing to the printed Borg RPE scale. A high

degree of reliability was found between RPE measurements over the 7-week study

period. The single-measure ICC was 0.92, with a 95% confidence interval from 0.89 to

0.95. Borg RPE values were categorised into three relative intensity classes corresponding

to low (6–11), moderate (12–14) and high (15–20).
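The mapping from a Borg rating to the three relative intensity classes can be expressed directly, using the cut-points given above. The function name is hypothetical.

```python
def rpe_to_intensity_class(rpe):
    """Map a Borg 6-20 rating to low (6-11), moderate (12-14) or high (15-20)."""
    if not 6 <= rpe <= 20:
        raise ValueError("Borg RPE must lie between 6 and 20")
    if rpe <= 11:
        return "low"
    if rpe <= 14:
        return "moderate"
    return "high"


labels = [rpe_to_intensity_class(r) for r in (6, 11, 12, 14, 15, 20)]
```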

6.3.4 Relative Intensity Prediction System

A relative PA intensity prediction system was designed by applying machine

learning algorithms to predict low, moderate, and high relative intensity from features

in the raw sensor data. The overall framework of the system consists of six steps:

pre-processing, feature extraction, normalisation and feature selection, classification, fusion,

and evaluation.

Pre-processing. HR, Eda, and Temp data were annotated with the relative

intensity classes and transformed into time-series data structure. To remove motion

artefacts in the physiological data, a moving average filter with a span of 5 was applied

on HR, Eda and Temp data. In addition, this study empirically discarded 10 s of data

at the beginning and end of each activity trial to remove non-steady-state data. Any

intermediate missing values were replaced by linear interpolation; and missing values

at the end of each activity were replaced by the previous value.

Feature Extraction. A number of time and frequency domain features were

extracted from each sensor modality. Because participants reported RPE at the end of

each activity trial, the window size for feature extraction was set equal to the duration

of the activity trial. The extracted features from the sensor modalities are given in

Table 6.1.


Table 6.1 Feature set extracted from each sensor modality

1. HR feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis, median, numerical gradient, on and off response, the number of times HR increased normalised for window size, and the number of times HR decreased normalised for window size.

2. R-R interval feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis, median, standard deviation of successive differences between adjoining normal cycles (SDSD), square root of the mean squared difference of successive RR-intervals (rMSSD), number of pairs of successive RR-intervals that differ by more than 20 ms/length (pNN20), number of pairs of successive RR-intervals that differ by more than 50 ms/length (pNN50).
   Frequency features: spectral energy density (aVLF, aLF, aHF), relative power (pVLF, pLF, pHF), and normalised power (nLF, nHF) of very low frequency (0 – 0.04 Hz), low frequency (0.04 – 0.15 Hz), and high frequency (0.15 – 0.40 Hz) components, total spectral energy density (aTotal), and ratio between LF and HF band energy (LF/HF).

3. Eda feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis, and median.

4. Temp feature set
   Time domain features: mean, variance, standard deviation, skewness, kurtosis, and median.

Each of these feature sets was combined with the participant attribute data

(person-level features). Person-level features included height, weight, age, BMI, gender,

total PA time, weekly PA sessions, and PA status.

Normalisation and Feature Selection. In order to limit features to a common

range, linear methods were used to normalise each feature to a zero mean and unit

variance. Because some features can be redundant and provide irrelevant information


which can undesirably affect performance, a minimum-redundancy–maximum-

relevance feature selection (MRMR) method was applied [165] on the complete

dataset. As minimum redundancy criteria, this method used minimum mutual

information between features; and for maximum relevance criteria, it used the maximal

mutual information between the classes and feature. This approach resulted in the

selection of only the best 10 features as inputs to the classifiers.
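A greedy form of this selection criterion can be sketched as follows. This is not the implementation of [165] used in the study. It assumes that the features have already been discretised, that relevance and redundancy are combined by subtraction (relevance minus mean redundancy), and that ties are broken alphabetically. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k features maximising relevance minus mean redundancy.

    `features` maps a feature name to its discretised value for every sample.
    """
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a_copy" duplicates "a", while "b" is weaker but less redundant.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 0],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 0],
    "b": [0, 0, 1, 1, 0, 1, 1, 1],
}
chosen = mrmr_select(toy_features, toy_labels, k=2)
```

In the toy data, the duplicated feature is passed over in the second step because its redundancy with the already selected feature outweighs its relevance, which is the behaviour the minimum-redundancy criterion is designed to produce.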

Classification Algorithms. Three state-of-the-art machine learning algorithms

that are widely used in the physical activity domain were utilised: support vector

machine (SVM), random forest (RF), and neural network (NN). In our

implementation, the two-class SVM was adapted to the multi-class problem by first classifying one

class against all other classes, then classifying another class against the remaining

classes, and so on [169]. A radial basis function kernel was chosen for

the SVM classifier. RF was implemented using the “TreeBagger” classification tool

within Matlab (2017a, The MathWorks Inc., USA). The number of decision trees in

the RF classifier was empirically set to 100 because this provided better performance

than 50 or 150 trees. For the NN, the numbers of input, hidden and output neurons were

10, 7, and 3, respectively. The maximum number of epochs and the learning rate were set to 250 and

0.001, respectively.

Fusion to Combine Multiple Modalities. To find the best feature combination

for relative intensity prediction, both feature-level and decision-level fusion were

carried out. In feature-level fusion, the feature sets of each combination of the four sensor

modalities were merged, and feature selection and classification algorithms were then

applied to the merged feature set. In decision-level fusion, each sensor

feature set produced an independent decision, and these decisions were then combined using a fusion

algorithm. Our previously reported posterior-adapted class-based decision fusion [161]

was used as the decision-level fusion algorithm. This algorithm applies a classifier

to the training data of each modality separately to characterise the model’s

prediction performance across classes. Then, for each modality, it assigns weights to

the classes (class-based weights) based on the performance on the training data. When

a new instance is tested, a decision with a posterior probability is made independently

from each feature set using the classifier. The class-based weights for each test

instance are then adjusted using the posterior probability (posterior-adapted class-based

weights). Finally, in the fusion step, the class with the highest posterior-adapted

class-based weight is selected as the final predicted class.
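The fusion step described above can be illustrated with a small Python sketch. It is an interpretation of the description, not the algorithm of [161] itself: the adjustment is assumed to be a multiplication of the class-based weight by the posterior probability, the weights are assumed to be per-class scores obtained on the training data, and the single highest adapted weight is assumed to decide the class. The function name and all numbers are made up.

```python
def posterior_adapted_fusion(class_weights, decisions):
    """Fuse per-modality decisions.

    class_weights: {modality: {class_label: weight learned on training data}}
    decisions:     {modality: (predicted_class, posterior_probability)}
    Returns the class with the highest posterior-adapted class-based weight.
    """
    best_class, best_weight = None, float("-inf")
    for modality, (predicted, posterior) in decisions.items():
        adapted = class_weights[modality][predicted] * posterior
        if adapted > best_weight:
            best_class, best_weight = predicted, adapted
    return best_class


# Made-up class-based weights for two modalities.
weights = {
    "RR": {"low": 0.95, "moderate": 0.70, "high": 0.90},
    "Eda": {"low": 0.80, "moderate": 0.30, "high": 0.40},
}

# RR votes "moderate" with little confidence, Eda votes "low":
# 0.70 * 0.55 = 0.385 versus 0.80 * 0.60 = 0.48.
fused = posterior_adapted_fusion(
    weights, {"RR": ("moderate", 0.55), "Eda": ("low", 0.60)}
)
```

Under these assumptions, a modality with a modest posterior on a poorly recognised class can be overruled by another modality, even one that is weaker overall.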

Performance Evaluation. Performance was evaluated using leave-one-subject-out

(LOSO) cross-validation [137, 169]. In LOSO, data from one user are used for

testing and the other users’ samples are used for training. In this way, samples of each

subject are used exactly once for testing. This study used the F1 score [138] to measure

the performance of the classifiers. The study favoured the F1 score over

classification accuracy because, unlike accuracy or percentage of agreement, it is not

influenced by class distribution. The F1 score is the harmonic mean of precision and

recall and therefore balances the two:

\[ \text{F1 score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \times 100\% \qquad (1) \]

where precision describes the exactness of a classifier. A low

precision indicates a large number of false positives.

\[ \text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \times 100\% \qquad (2) \]

Recall, or sensitivity, measures the completeness of a classifier. Low

recall indicates a large number of false negatives.

\[ \text{recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \times 100\% \qquad (3) \]
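As a worked illustration of these three measures, the following Python sketch computes precision, recall and F1 for one class from a confusion matrix. The function name and the counts are made up, the values are returned as fractions (multiply by 100 for percentages), and how the per-class scores were averaged into a single F1 score is not stated in the text, so no averaging is shown.

```python
def per_class_scores(confusion, label):
    """Precision, recall and F1 (as fractions) for one class.

    confusion[true_class][predicted_class] holds trial counts.
    """
    tp = confusion[label][label]
    fp = sum(confusion[t][label] for t in confusion if t != label)
    fn = sum(confusion[label][p] for p in confusion[label] if p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up confusion matrix for the three relative intensity classes.
confusion = {
    "low": {"low": 8, "moderate": 2, "high": 0},
    "moderate": {"low": 1, "moderate": 6, "high": 3},
    "high": {"low": 0, "moderate": 2, "high": 8},
}
precision, recall, f1 = per_class_scores(confusion, "moderate")
```

For the moderate class in this made-up matrix there are 6 true positives, 4 false positives and 4 false negatives.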


6.4 RESULTS

6.4.1 Relative Intensity Classification from a Single Modality

Table 6.2 reports F1 scores for the four sensor modalities. All three classifiers

showed a similar pattern of results. RR features provided the highest F1 score

across all three classifiers, with the NN classifier achieving the highest

F1 score (86.7%). Performance was consistently low for the classifiers trained on Eda

or Temp features. HR features also provided good F1 scores and outperformed

classifiers trained on Temp and Eda.

Table 6.2 F1 scores (%) of the four modalities using three classifiers

Feature(s)   SVM    RF     NN
Eda          59.5   58.0   60.3
Temp         61.6   59.0   61.6
HR           78.7   80.8   79.2
RR           83.4   85.2   86.7

6.4.2 Feature Fusion Results

Figure 6.1 reports feature fusion results (F1 scores) for all possible combinations

of the four different modalities using three classifiers. Eda and Temp features were

effective when combined with other modalities. The best performance observed

for SVM was RR+Eda (86.8%), a 3.4 percentage-point improvement over RR only (83.4%).

Using the RF and NN classifiers, the best combinations were RR+Eda+Temp (86.3%) and

RR+Temp (87.2%), respectively. In most cases, adding Eda and/or Temp to the RR

features improved the classification performance.


Figure 6.1 F1 Scores for all combinations of modalities using feature fusion;

Note: results of single modalities are also given as base-lines

The confusion matrices for the best-performing combinations are shown in Figure

6.2. In all cases, classifiers correctly classified most of the low and high relative

intensity activity trials and a good number of the activity trials with moderate relative

intensity. For example, when the fusion of RR and Temp features served as inputs to

the NN classifier, 96% (362 out of 379) of the low-intensity trials, 62% (74 out of 121)

of the moderate-intensity trials, and 87% (100 out of 115) of the high-intensity

trials were classified correctly. Of the moderate-intensity trials, 23% were

misclassified as low intensity and 15% as high intensity.

Figure 6.2 The confusion matrix for the best combinations in each classifier

6.4.3 Decision Fusion Results

Figure 6.3 shows the decision fusion results (F1 scores) of all combinations of

modalities using three classifiers. None of the combinations in decision

fusion exceeded the performance of RR alone. Adding Eda and Temp to

HR or RR reduced the performance. For example, with SVM, HR features alone

provided an F1 score of 78.7%, but when fused with the decisions from the Eda and

Temp feature models, performance was reduced to 62.9%.


Figure 6.3 F1 Scores for all combinations of modalities using decision fusion;

Note: results of single modalities are also given as base-lines


6.4.4 Statistical Comparison

The performance of the different modalities and their combinations was tested

for statistical significance using one-way repeated measures ANOVA. The F1-scores

(using feature fusion) for all folds/users for all classifiers were merged together to

increase the statistical power and enhance the generalisability of the findings.

Overall, mean F1-scores differed significantly between the modalities (single and

combined) (Wilks’ Lambda = 0.255, F(13, 50) = 11.264, p < .0001). LSD post hoc

comparisons revealed that both the HR and RR features provided significantly better

performance than Eda and Temp. RR performed significantly better than HR. Among the

combined modalities, only RR+Eda provided a statistically significant improvement over RR

alone. RR+Temp and RR+Eda+Temp performed similarly to RR alone.


6.5 DISCUSSION

This study systematically investigated the use of machine learning algorithms

trained on features in multimodal physiological data to classify relative PA intensity.

Across all classifiers (SVM, RF and NN), the features extracted from RR interval data

provided significantly better performance than features extracted from

heart-rate, Eda, and Temp. HR features demonstrated the second highest classification

performance, while Eda features performed worst. Two fusion techniques (feature and

decision fusion) were examined to identify effective combinations of the

physiological data. Of these, feature fusion improved performance

when Eda and/or Temp feature sets were added to the RR feature set. SVM showed the best

performance for the fusion of RR and Eda features, with a 3.4 percentage-point improvement

compared to RR only. Using NN and RF in feature fusion, the best combinations were

RR+Temp and RR+Eda+Temp, respectively.

Our results are consistent with previous studies demonstrating that the features

derived from heart-rate data provide better prediction of relative PA intensity than

other modalities (Eda and Temp). Most previous studies have used heart-rate data as a

method for assessing relative PA intensity [39, 71, 72]. RR represents the beat-to-beat

fluctuations of the heart rate, which can reveal the state of the user’s autonomic nervous

system. The complex time- and frequency-domain features extracted from RR data

showed better performance than HR. Eda and body temperature are also linked to PA

intensity; for example, Eda is affected by sweating due to physical exertion and by

psychological stress [18]. However, our results show that these modalities (Eda and

Temp) alone cannot provide satisfactory relative PA intensity classification accuracy.

Although some studies were conducted on the relative PA intensity

measurement, no previous studies have investigated the use of machine-learning

algorithms for automated recognition of relative PA intensity based on RPE. Among

the three state-of-the-art machine learning algorithms (SVM, RF, and NN), NN

provided slightly better accuracy on single modalities than the others, except for HR, where

RF did best. However, there was no significant variability among the classifiers’

results, i.e., they yielded equivalent results on the single modalities.

The combinations of sensor modalities were effective in some cases

when feature fusion was used. Feature fusion effectively used the best

features from all included modalities during the classification, which led to improved


performance. For example, MRMR feature selection selected the top 10 features from

the fused feature set, and then the classifier was applied on the best features. Our results

showed that the addition of Eda and/or Temp features with the RR features can provide

increased classification performance. This indicates that features from these modalities

can provide additional information to that provided by RR features.

When the combinations of sensor modalities were investigated using

decision fusion, the combinations showed considerably inferior performance. Unlike

feature fusion, in decision fusion each modality made a separate decision on the relative

intensity and then a posterior-adapted class-based weighted algorithm combined those

decisions. In our earlier research, the posterior-adapted class-based algorithm showed

improved performance for PA recognition when combining the accelerometer data

obtained from multiple body locations (ankle, chest, and wrist) [161]. For

PA recognition, each of these locations had comparable but complementary

performance, so the individual decisions from each model performed better when

combined using the posterior-adapted class-based weighted algorithm. In this study, however,

decision fusion showed poor performance because Eda and Temp did not provide

satisfactory performance in their own right. Thus, combining the decision of two

relatively poorly performing prediction models led to further decrements in overall

performance.

In the confusion matrix, it can be seen that most of the low and high relative

intensity samples were classified correctly. However, classification models showed

relatively higher misclassification for the moderate relative intensity category. As this

study used leave-one-subject-out cross-validation, the inter-person differences in the

moderate relative intensity zone played a vital role in these misclassifications. From

an application point of view, it may be of less concern to misclassify moderate

intensity as vigorous (and vice versa), as both count toward meeting PA

guidelines. However, misclassification between moderate and low intensity is

problematic, as it can provide incorrect information on whether a person meets PA

guidelines. Further research could improve the model’s performance or

reduce misclassification between low and moderate intensity, for example by normalising the

physiological data before feeding it into the classifier.

A strength of the current study was the use of machine learning on the

multimodal physiological data, collected in a non-laboratory environment, to predict


the relative intensity of the participants. The examination of each of these modalities,

independently and in combination (using fusion), was among the main strengths. The use of

three different “state-of-the-art” classification algorithms was an additional strength.

There were, however, some limitations that warrant consideration. First, although

activity trials were self-paced and completed at a user-specified intensity, the data were

collected in predetermined sequences with known duration. Thus, additional work is

required to evaluate the performance of the proposed methods in true free-living

contexts. Second, only a few lifestyle activities were included in the training data. Future

studies should include a more diverse set of lifestyle activities ranging from sedentary

to vigorous. Third, although some studies have found that normalising physiological data can

reduce inter-individual differences and improve the accuracy of EE or PA recognition

[20], our experiment was unable to normalise the physiological features due to a lack of

resting data. Fourth, our study did not utilise accelerometer data and focused only

on the use of multimodal physiological data for relative intensity prediction. The

accelerometer can add information on external workload to the model. In future, a

study could investigate the utility of combining accelerometer

and physiological data for relative intensity prediction. Fifth and finally, in this study,

the feature selection algorithm used only the top 10 features for the classification task. In

future, the number of selected features could be varied and the models compared across

the different numbers of selected features.

In summary, the results demonstrate that relative PA intensity prediction can

be performed by using machine learning on multimodal physiological data. Of the

different modes of physiological data examined, features extracted from RR data

provided the best performance. Although the non-heart-rate features, including Eda and

Temp, cannot provide satisfactory results on their own, they can improve

performance when combined with the RR features. Thus, this research identifies the best

single modality and features, and the best combination of modalities, for predicting relative

PA intensity from wearable sensors using machine learning.


6.6 ACKNOWLEDGEMENTS

No funding was received for completion of this project. Trost is a member of the

ActiGraph Scientific Advisory Board. Chowdhury, Tjondronegoro, Chandran, and

Zhang declare no conflict of interest. The results from the present study do not

constitute endorsement by the American College of Sports Medicine.


Chapter 7: Deep Learning for Energy Expenditure Estimation in Pre-School Children


Dian Tjondronegoro2

Jinglan Zhang1

Markus Hagenbuchner3

Dylan Cliff3

Stewart G. Trost4



3. University of Wollongong, Australia.




Alok Kumar Chowdhury

Science and Engineering Faculty,

Queensland University of Technology,

Brisbane, Australia

Phone: +61 420 467 077





7.1 ABSTRACT

Accurate monitoring of physical activity and its respective energy expenditure

is necessary in studies that aim to quantify, understand, and promote physical activity

in preschool children. This paper proposes the use of deep learning to effectively

predict energy expenditure from body-worn accelerometers. During data collection,

eight participants performed ten simulated free-living activities ranging in intensity

from sedentary to vigorous. Participants wore accelerometers on both wrists and the

right hip, along with a portable metabolic system for direct measurement of energy

expenditure. The analysis uses Convolutional Neural Networks to perform deep

learning regression on each accelerometer configuration, single and combined. The

performance is benchmarked against a set of conventional supervised machine

learning and simplified regression models. Based on leave-one-subject-out cross-

validation and one-way repeated measures ANOVA, the results show that deep

learning can achieve performance comparable to the best conventional supervised

learning algorithms, and significantly outperforms the simplified regression

approaches.


7.2 INTRODUCTION

Physical activity (PA) during the early childhood years has an influential role on

current and future development [170]. Regular PA provides children with a range of

important health benefits, including healthy weight, improved bone health,

cardiovascular fitness, and enhanced cognitive, emotional and psychosocial

development [170, 171]. Based on this evidence, government agencies and global

health organizations have issued PA guidelines recommending that preschoolers be

physically active for at least 180 min per day [64, 172, 173]. Accurate sensor-based

measurement of PA and energy expenditure (EE) is therefore needed to monitor

compliance with this guideline and develop e-Health monitoring and intervention

applications to increase PA and reduce sedentary behaviour in low-active preschool

children.

Due to their small size and low cost, accelerometer-based wearable sensors have

emerged as a popular method for measuring PA in preschoolers [114, 174]. However,

in most applications, the wealth of data generated by these devices has not been

thoroughly utilized, because EE is predicted using simple linear regression. Such

models have shown poor performance for sedentary and non-ambulatory activities

[43].

In recent years, the conventional use of supervised machine learning for EE

prediction has emerged as a viable and more accurate alternative to simple linear

regression [44, 45]. The conventional machine learning approach involves manual

extraction of features, feature selection, and applying regression algorithms such as

artificial neural networks (ANN) [46-48], ensemble decision trees [49], and support vector

machines [50]. The performance of regression depends on the quality and number of

features, which requires domain knowledge for feature extraction and sophisticated

feature selection algorithms. Often, the same extracted features do not perform equally

well in different studies [51].

Deep learning models are now gaining popularity and are widely used in other

domains such as computer vision and image processing [52]. A deep learning

framework is usually a collection of multiple neural network layers, where each layer

automatically extracts hidden representations (i.e. features) from the input [175].

Deep learning eliminates the need for manual feature extraction and selection steps and

can be applied to the raw data directly. In general, deep learning models are


computationally expensive to train; however, the testing time is small and has the

potential for use in real-time, field-based applications [176]. A long training time can

be justified as the models are usually trained offline and can be trained using resources

in the cloud [42].

In a recent study, Zhu, et al. [42] showed that deep learning/CNN models

improved EE prediction performance compared to activity specific models. However,

they did not compare performance across different accelerometer configurations.

Furthermore, their work was carried out only in adults. Due to developmental,

biomechanical, and behavioral factors, such as differences in motor proficiency,

resting metabolic rate, energy cost of locomotion, and PA types and patterns, models

developed in adults are not generalizable to preschool-aged children [53-55].

To our knowledge, this is the first study to employ deep learning algorithms to

predict EE from accelerometer data in pre-school children. Tri-axial accelerometer

signals from the left wrist (LW), right wrist (RW), and right hip (RH) were collected

from eight preschool-aged children performing a range of developmentally appropriate

activities. EE was measured using a portable indirect calorimeter. Then, three EE

modelling approaches (deep learning, conventional supervised machine

learning, and simple regression) were applied to each accelerometer location and to

the combination of the right wrist and right hip.


7.3 DATA COLLECTION AND PRE-PROCESSING

7.3.1 Data Collection

Eight preschool children (age: 5.2 ± 0.8 yrs, weight: 20.9 ± 1.2 kg, height: 117.8

± 5.5 cm, male: 25%) performed a series of 10 simulated free-living activity trials. The

list of activities and description is given in Table 7.1.

Table 7.1 List of Activities and Their Description

Activity Trial | Description
Lying down | Lying comfortably on the floor with a pillow
Story time | Sit on the floor on a cushion and listen to a story-book
Watching movie | Sit on the floor on a cushion and watch a video on a tablet
Table game (free play) | Sit in a chair at a table completing a developmentally appropriate puzzle activity
Whiteboard | Draw on a whiteboard to create a picture while standing
Treasure hunt | Walk through the activity room (20 m x 10 m) and search for and collect hidden toys
Pack away | Collect toys and equipment and return them to the appropriate boxes or location in the activity room
Dance | Watch a dance video and mirror the movements of the characters on the video
Clean up your backyard (bean bag game) | Keep playing area (4 m x 3 m) “clean” by throwing all bean-bags onto the instructor’s playing area. The instructor will do the same. Game ends when playing area is clean. Instructor will increase/decrease difficulty based on child’s ability
Captain’s coming | Child stands in the centre of the activity room. Instructor calls out commands involving running, jumping, hopping, crawling

Each trial was 5 mins in length, and all trials were conducted in the research

facilities available at the University of Wollongong. Further details of the data

collection can be found in [55, 177].

During each trial, children wore ActiGraph tri-axial accelerometers (ActiGraph

Corporation, FL, USA) on three body locations (left wrist, right wrist, and right hip)

and a Metamax 3B portable calorimetry system to measure energy expenditure. The

portable calorimetry system was calibrated according to the manufacturer’s instructions. The

sampling rate and dynamic range of the accelerometer data were 100 Hz and ±8 g

respectively. The portable calorimetry system recorded breath-by-breath oxygen uptake (VO2).


7.3.2 Pre-Processing

Both accelerometer and reference VO2 data were converted into a time-series

data structure. Using the timestamps, data for each trial were separated from each other.

In addition, 60 s of data at the beginning and end of each trial were discarded from

analysis to remove non-steady-state data [18, 161, 162].

To ensure the validity of the portable calorimetry data, a screening algorithm

was applied to identify biologically implausible VO2 values (i.e. below the typical

resting value) and replace those entries with the nearest valid entries. Then, the

breath-by-breath VO2 data were resampled to 100 Hz using one-dimensional spline

interpolation [18]. Finally, VO2 was converted into units of EE (kcal/min) using the

constant 1 L O2 = 4.825 kcal [178].
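The screening and unit-conversion steps described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the `resting_vo2` threshold value, and the assumption that VO2 is expressed in L/min are hypothetical. Only the constant 4.825 kcal per litre of O2 and the nearest-valid-entry replacement come from the text, and the spline resampling to 100 Hz is omitted.

```python
KCAL_PER_LITRE_O2 = 4.825  # conversion constant stated in the text [178]


def replace_implausible(vo2, resting_vo2):
    """Replace values below resting_vo2 with the nearest valid entry.

    resting_vo2 is a hypothetical threshold; the text does not give its value.
    Ties between equally distant neighbours go to the earlier entry.
    """
    valid = [i for i, v in enumerate(vo2) if v >= resting_vo2]
    if not valid:
        raise ValueError("no valid VO2 entries to copy from")
    cleaned = []
    for i, v in enumerate(vo2):
        if v >= resting_vo2:
            cleaned.append(v)
        else:
            nearest = min(valid, key=lambda j: (abs(j - i), j))
            cleaned.append(vo2[nearest])
    return cleaned


def vo2_to_ee(vo2_l_per_min):
    """Convert VO2 (assumed to be in L/min) to EE in kcal/min."""
    return [v * KCAL_PER_LITRE_O2 for v in vo2_l_per_min]


breaths = [0.30, 0.05, 0.32, 0.40]   # made-up breath-by-breath values, L/min
cleaned = replace_implausible(breaths, resting_vo2=0.10)
ee = vo2_to_ee(cleaned)
```

In this toy input the second breath falls below the assumed threshold and is replaced by its nearest valid neighbour before conversion.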

To obtain a reasonable window size with 1024 accelerometer samples

per window, this work applied a sliding window of 10.24 seconds with 50% overlap.

The selected window size is similar to that used in previous studies [47, 103, 179]. This

approach ensured that a sufficient number of samples were available for training the

algorithms. The average energy expenditure was calculated for each window as the

corresponding ground-truth energy expenditure. In our dataset, the average energy

expenditure ranged from 0.65 to 5.78 kcal/min.
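The windowing described above can be sketched as follows. The function names and the placeholder EE trace are hypothetical; the window length (1024 samples at 100 Hz), the 50% overlap, and the per-window averaging of EE are taken from the text. The trial length assumes a 5 min trial with 60 s removed from each end.

```python
def sliding_windows(n_samples, window=1024, overlap=0.5):
    """Return (start, end) index pairs for windows with fractional overlap."""
    step = int(window * (1 - overlap))
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]


def window_mean(values, bounds):
    """Average a per-sample signal (e.g. EE resampled to 100 Hz) over one window."""
    start, end = bounds
    return sum(values[start:end]) / (end - start)


FS_HZ = 100                        # sampling rate from the text
trial_samples = 180 * FS_HZ        # 5 min trial minus 60 s at each end
bounds = sliding_windows(trial_samples)
ee = [1.0] * trial_samples         # placeholder EE trace, kcal/min
targets = [window_mean(ee, b) for b in bounds]
```

Each entry of `targets` plays the role of the ground-truth EE for the accelerometer window with the same bounds.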

After the pre-processing step, a total of 2472 windows were available for

training. The average number of windows per child was 309 ± 22.7. As each window

consists of 1024 samples for each axis of the raw 3-axis accelerometer data, the training

data comprised 2472 x 1024 x 3 = 7,593,984 values.

7.4 METHODS

EE was predicted using three different modelling approaches. The first approach

used deep learning on the raw accelerometer signal. The second approach used

conventional machine learning methods, including feature extraction, feature

selection, and regression analysis using established supervised learning algorithms.

The third method represented a simplified approach in which the mean acceleration

signal for each 10.24 second window was regressed on measured EE using least

squares regression. The three approaches were applied to four accelerometer

configurations (LW, RH, RW, and RW+RH).


7.4.1 Deep Learning Approach

The deep learning approach used in this study was Convolutional Neural

Networks (CNN), as the architecture is suitable to capture the dimensions of

accelerometer data. To format the data for the CNN, a data

transformation step was applied before training. The CNN architecture then

extracted features automatically, and a regression layer estimated energy

expenditure.

1) Data Transformation: In this step, each window of the raw accelerometer

data was converted to a 3-dimensional matrix to make it suitable as input to the CNN.

The data transformation differed slightly depending on the number of accelerometers

used.

Single accelerometer – For a window (consisting of 1024 3-axis accelerometer

samples), the data for each axis (x, y, or z) were rearranged into a 32 x 32

matrix. The 32 x 32 matrices for x, y, and z were then combined to form a 32 x

32 x 3 array, shown in Figure 7.1(a).

Combination of two accelerometers – The two corresponding windows from the two

accelerometers were merged vertically (2048 3-axis accelerometer samples). Then,

2025 samples were selected for the transformation by removing 12 samples from the

beginning and 11 samples from the end. Finally, the samples were rearranged into a 45 x

45 x 3 array. The transformation of combined windows is shown in Figure 7.1(b).
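The two transformations described above can be sketched with plain nested lists. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, row-major filling of each square matrix is assumed because the text does not state the ordering, and "merged vertically" is interpreted as appending the second window's samples after the first.

```python
def to_square_image(window, side):
    """Rearrange side*side samples of (x, y, z) into a side x side x 3 nested list.

    Row-major filling is an assumption; the text does not state the order.
    """
    if len(window) != side * side:
        raise ValueError("window length must equal side * side")
    return [[list(window[r * side + c]) for c in range(side)] for r in range(side)]


def combine_windows(win_a, win_b, drop_head=12, drop_tail=11):
    """Stack two 1024-sample windows and trim to 2025 = 45 * 45 samples."""
    merged = list(win_a) + list(win_b)
    return merged[drop_head:len(merged) - drop_tail]


# Synthetic windows: each sample is an (x, y, z) tuple.
rw = [(i, i + 0.1, i + 0.2) for i in range(1024)]
rh = [(-i, -i - 0.1, -i - 0.2) for i in range(1024)]

single = to_square_image(rw, 32)                          # 32 x 32 x 3
combined = to_square_image(combine_windows(rw, rh), 45)   # 45 x 45 x 3
```

Trimming 12 + 11 = 23 samples from the 2048 merged samples leaves 2025, which is the nearest perfect square below 2048 and therefore fits a 45 x 45 layout.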


Figure 7.1 The transformed representation of (a) a single 3-axis accelerometer window, and (b) a combination of two accelerometers’ windows

2) CNN Architecture: A CNN involves multiple processing layers comprising

linear and non-linear transformations. Several CNN architectures were implemented to

experimentally select the best-performing one. Based on the results, a 3-stage CNN

architecture modified from [180] was used in this study. The complete CNN

architecture is shown in Figure 7.2. It has an input layer, 3 convolution layers each followed by a rectified

linear unit layer (ReluLayer), max-pooling layers, a fully connected hidden layer, and a

regression layer. For the experiments, the number of epochs and the initial learning rate were

set to 100 and 0.001 respectively to avoid overfitting of the model. A stochastic

gradient descent with momentum solver was used to train the model.


[Figure 7.2 diagram: Input, followed by three stages of Conv Layer + ReluLayer and MaxPooling Layer (2x2 at stride 2), then Full connection and Regression. One convolution stage is labelled "20 5x5 filters at stride 1, pad 2". Feature map sizes: [32 x 32 x 3], [32 x 32 x 16], [16 x 16 x 16], [16 x 16 x 20], [8 x 8 x 20], [8 x 8 x 20], [4 x 4 x 20], 1, 1. Output: prediction of EE between 0.65 and 5.78 kcal/min.]

Figure 7.2 The CNN architecture used in this study.

The figure depicts the use of a single accelerometer, i.e., an input of size 32 x 32 x 3.

Convolution Layer – In order to identify the temporal internal pattern of the input

matrix, a number of linear filters are applied. Filters slide over the input spatially and

extract the local correlation using the convolution operation with the input matrix. A

filter is just a matrix of weights and bias, which is usually spatially small. The size of

the feature map (width = w’, height = h’), after applying the convolutional layer,

depends on the size of the input vector (width = w, height = h, depth = d), size of filter

(width = f, height = f, depth = d), number of zero-padding (p), and size of stride (s).

The formula is given below:

$$w' = \frac{w - f + 2p}{s} + 1 \tag{2}$$

$$h' = \frac{h - f + 2p}{s} + 1 \tag{3}$$

For example, for our first convolutional layer, the input size was 32 x 32 x 3 (or 45 x

45 x 3 for the combination), and the filter size was set to 5 x 5 x 3. The zero

padding and stride were 2 and 1 respectively. As a result, the feature map’s width and

height remained the same as the input, 32 x 32 (or 45 x 45 for the combination of two

accelerometers).
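Substituting the stated values (f = 5, p = 2, s = 1) into the feature-map size formula confirms that the first convolutional layer preserves the input width and height for both input sizes:

$$w' = h' = \frac{32 - 5 + 2 \cdot 2}{1} + 1 = 32, \qquad w' = h' = \frac{45 - 5 + 2 \cdot 2}{1} + 1 = 45$$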

Rectified Linear Units Layer (ReluLayer) – After each convolutional layer, a

ReluLayer is used to introduce nonlinearity on top of the linear/convolutional

operation. ReluLayers work better than other nonlinear functions, including tanh and

sigmoid, due to their computational efficiency [181]. A ReluLayer does not change the

size of the input; it simply applies a threshold operation to each element, where any

negative input value is set to zero.

$$f(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{4}$$


Max Pooling Layer – The max pooling layer runs on every depth of the input

data and down-samples the input using a max operation. For an input of size (width =

w, height = h, depth = d), the pooling layer (width = f, height = f, and stride = s) reduces

its size to w’ x h’ x d, where

$$w' = \frac{w - f}{s} + 1 \tag{5}$$

$$h' = \frac{h - f}{s} + 1 \tag{6}$$
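The convolution and pooling size formulas can be chained to trace how the feature map shrinks through the network. The sketch below is illustrative: the function names are hypothetical, it assumes that all three convolution layers use 5 x 5 filters with padding 2 and stride 1 (the text states this explicitly only for the first layer), and it uses floor division, which is an assumption that only matters when the division is not exact (as for the 45 x 45 input).

```python
def conv_out(size, f, p, s):
    """Output width/height of a convolution: (size - f + 2p) / s + 1."""
    return (size - f + 2 * p) // s + 1


def pool_out(size, f, s):
    """Output width/height of a pooling layer: (size - f) / s + 1."""
    return (size - f) // s + 1


def trace(size, stages=3):
    """Sizes after each conv and pool stage for a square input."""
    sizes = [size]
    for _ in range(stages):
        size = conv_out(size, f=5, p=2, s=1)   # 5x5 filters, pad 2, stride 1
        sizes.append(size)
        size = pool_out(size, f=2, s=2)        # 2x2 pooling, stride 2
        sizes.append(size)
    return sizes


single_sizes = trace(32)
```

For the 32 x 32 input, the traced widths agree with the feature map sizes reported for Figure 7.2 (32, 16, 8, and finally 4).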

Fully Connected Hidden Layer and Regression Layer – Finally, an MLP-based

fully connected layer followed by a regression layer is applied to the extracted features.

In our case, the output size of the fully connected layer was set equal to the number of

response variables, which was 1.

7.4.2 Conventional Supervised Learning Approach

Unlike the CNN learning technique, the conventional machine learning approach

consists of feature extraction, feature selection and regression analysis using

established supervised learning algorithms. To compare with the deep learning

approach, a total of 27 conventional models (shown in Figure 7.3) were developed for each

accelerometer configuration by varying the feature selection algorithm, the number of

selected features, and the supervised learning algorithm. The best model for each

accelerometer configuration was then selected for the comparison.


Figure 7.3. Conventional approach design for each accelerometer location and a combination

1) Feature Extraction: Feature extraction utilised domain knowledge and

extracted 46 time- and frequency-domain features from each window of accelerometer

data. Most of the extracted features were adopted from our previous studies [161, 162] and

are listed in Table 7.2.


Table 7.2 List of features extracted from each window of an accelerometer

No | Features | Feature Count
1 | Vector magnitude of the accelerometer | 1
2 | Mean for each axis of a 3-axis accelerometer | 3
3 | Standard deviation for each axis of accelerometer | 3
4 | Minimum value for each axis | 3
5 | Maximum value for each axis | 3
6 | Variance for each axis | 3
7 | Median value for each axis | 3
8 | Skewness for each axis | 3
9 | Kurtosis for each axis | 3
10 | Energy for each axis | 3
11 | Cross-correlation of accelerometer axis | 3
12 | Principal frequency for each axis | 3
13 | Magnitude of principal frequency for each axis | 3
14 | Median crossing for each axis | 3
15 | 25th percentile for each axis | 3
16 | 75th percentile for each axis | 3
Total number of features extracted | | 46
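As an illustration of how per-axis features of the kind listed in Table 7.2 can be computed, the sketch below derives seven of the time-domain features for each axis of a window. It is not the feature extraction code used in the study: the function names are hypothetical, the population standard deviation and the default quartile method of Python's `statistics` module are assumptions, and the frequency-domain and higher-order features are omitted.

```python
import statistics


def axis_features(values):
    """A few of the per-axis time-domain features listed in Table 7.2."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile method is an assumption
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),          # population SD is an assumption
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "p25": q1,
        "p75": q3,
    }


def window_features(window):
    """Concatenate per-axis features for a list of (x, y, z) samples."""
    features = {}
    for name, axis in zip("xyz", zip(*window)):
        for key, value in axis_features(list(axis)).items():
            features[f"{key}_{name}"] = value
    return features


window = [(0.0, 1.0, 2.0), (1.0, 1.0, 4.0), (2.0, 1.0, 6.0), (3.0, 1.0, 8.0)]
feats = window_features(window)
```

The same per-axis pattern accounts for the total in the table: fifteen feature types computed on three axes, plus the single vector magnitude feature, give 46 features.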

2) Feature Selection: Three feature selection algorithms, including minimum-

redundancy–maximum-relevance feature selection (MRMR) [165], correlation-based

feature selection (CFS) [133], and ReliefF feature selection [182, 183], were carried

out on the extracted features. When two accelerometers were fused, for each window,

the features for each accelerometer were fused horizontally. Then the feature selection

algorithms were applied. The number of selected features was also varied among 10,

15, and 20 during the experiment. Before the selected features were passed to the regression,

all features were normalised to zero mean and unit variance using a linear transformation.
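Normalisation to zero mean and unit variance can be sketched as follows. The function names are hypothetical, and two details are assumptions that the text does not specify: the scaling parameters are estimated from the training data and then reused, and the population standard deviation is used.

```python
import statistics


def fit_zscore(columns):
    """Return (mean, std) per feature column, computed on training data."""
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]


def apply_zscore(columns, params):
    """Scale each column to zero mean and unit variance using fitted params."""
    scaled = []
    for column, (mean, std) in zip(columns, params):
        std = std or 1.0  # guard against constant features
        scaled.append([(v - mean) / std for v in column])
    return scaled


train = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]   # two features, three windows
params = fit_zscore(train)
scaled = apply_zscore(train, params)
```

In a leave-one-subject-out setting, applying `params` fitted on the training subjects to the held-out subject would avoid using test data when estimating the scaling.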

3) Supervised Learning Algorithms: The supervised learning algorithms used

in this study included multiple linear regression (MLR), support vector machine

regression (SVMR), and neural network regression (NNR). In our implementation, an

RBF kernel function was used for SVMR, and 1 hidden layer was used for NNR. The

number of nodes in the hidden layer was half the number of input nodes. The maximum number of iterations/epochs

and the learning rate were set to 250 and 0.001 respectively.
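The 27 conventional models per accelerometer configuration follow from crossing the three feature selection algorithms, the three feature counts, and the three learning algorithms. A short sketch (variable names are hypothetical) makes the grid explicit:

```python
from itertools import product

selectors = ["MRMR", "CFS", "ReliefF"]   # feature selection algorithms
n_features = [10, 15, 20]                # number of selected features
learners = ["MLR", "SVMR", "NNR"]        # supervised learning algorithms

# Every (selector, feature count, learner) combination is one candidate model.
grid = list(product(selectors, n_features, learners))
```

Each tuple in `grid` identifies one candidate model, and the best of the 27 is retained for each accelerometer configuration.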

7.4.3 Simplified Approach

In the simplified approach only three features, including mean(x), mean(y),

mean(z), were extracted from each accelerometer window. Then, a least squares

regression was applied to estimate EE. For the two-accelerometer combination, the

features of the two accelerometer windows were merged horizontally, giving 6

features. The fused features were then entered into the least squares regression.
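The simplified approach can be sketched end to end with a small ordinary least squares solver. This is an illustrative implementation, not the one used in the study: the function names are hypothetical, an intercept term is assumed, the normal equations are solved by Gauss-Jordan elimination without any handling of singular systems, and the data are synthetic.

```python
def mean_features(window):
    """mean(x), mean(y), mean(z) for a list of (x, y, z) samples."""
    n = len(window)
    return [sum(axis) / n for axis in zip(*window)]


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, solved via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coeffs, row):
    """Apply the fitted intercept and weights to one feature row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Synthetic mean-acceleration features and EE values (kcal/min) generated
# from EE = 1 + 2*mean_x - 1*mean_y + 0.5*mean_z.
rows = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
targets = [1.0, 3.0, 0.0, 1.5, 2.5]
coeffs = fit_least_squares(rows, targets)
```

For the two-accelerometer case, the same solver applies unchanged to rows of six features formed by concatenating the mean features of both windows.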

7.5 PERFORMANCE EVALUATION

Regression performance was measured using both root-mean-square

error (RMSE) and coefficient of determination (R2). For n predictions, if $y'_t$

are the values predicted by the model and $y_t$ are the original values, the RMSE and R2

can be calculated using the following equations:

$$RMSE = \sqrt{\frac{\sum_{t=1}^{n} (y'_t - y_t)^2}{n}} \tag{7}$$

$$R^2 = 1 - \frac{\sum_{t=1}^{n} (y'_t - y_t)^2}{\sum_{t=1}^{n} (y_t - \mathrm{mean}(y_t))^2} \tag{8}$$
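The two metrics translate directly into code. The sketch below uses hypothetical function names and made-up EE values purely to illustrate the calculations in equations (7) and (8).

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def r_squared(predicted, measured):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0, 4.0]     # made-up EE values, kcal/min
predicted = [1.5, 2.0, 2.5, 4.0]
error = rmse(predicted, measured)
r2 = r_squared(predicted, measured)
```

RMSE is expressed in the units of the target (kcal/min), whereas R2 is unitless, which is why both are reported together.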

The subject-wise (fold-wise) performance of each accelerometer configuration under the three

different approaches was tested for statistical significance using one-way repeated

measures ANOVA. In addition, LSD post hoc tests were carried out to identify the

differences between approaches.

To understand the model’s performance on new data, this study used leave-one-

subject-out cross-validation, where data from one subject are used for testing and the

other subjects’ samples are used for training. In this way, samples of each subject are

used exactly once for testing. The averaged performance metrics and their standard

deviations are used as final performance metrics.
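Leave-one-subject-out splitting can be sketched as follows. The function names and the toy subject labels are hypothetical, and the use of the sample standard deviation across folds is an assumption, since the text does not state which form was used.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) with each subject tested once."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def mean_and_sd(scores):
    """Mean and sample standard deviation of per-fold scores."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return mean, var ** 0.5


subjects = [1, 1, 2, 2, 2, 3]        # subject label per window (toy data)
folds = list(leave_one_subject_out(subjects))
summary = mean_and_sd([0.5, 0.6, 0.7])   # e.g. per-fold RMSE values
```

Because windows are grouped by subject before splitting, no window from the held-out child can appear in the training set, which is what makes the estimate relevant to unseen users.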


7.6 RESULTS AND DISCUSSION

Figure 7.4 shows the results (RMSE and R2) of CNN, conventional supervised

learning and simplified regression approaches for each accelerometer configuration.

For the conventional supervised learning approaches, the best model for each

accelerometer configuration is presented.

Figure 7.4 Results of the approaches for each accelerometer location and a combination using (a) RMSE, and (b) R2

7.6.1 Evaluation of Deep Learning Approach

The best performance was obtained at the RH location (RMSE: 0.54 kcal/min,

R2: 0.71). Among the wrist locations, RW provided marginally better performance

than LW.

When the best two single accelerometer locations were combined (RW+RH), the

performance did not improve relative to performance at the hip. This finding is


consistent with previous works, which found that combining outputs from multiple

accelerometer locations did not improve EE prediction [103, 184].

7.6.2 Evaluation of Conventional Approach

The best conventional regression model for LW used the MRMR feature selection

algorithm to select 10 features and SVMR for EE prediction (RMSE: 0.63 kcal/min

and R2: 0.62). For the RW location, the CFS feature selection algorithm, 10 features, and the NNR

learning algorithm provided the best conventional model (RMSE: 0.58 kcal/min and R2:

0.66). The best conventional model for RH utilized the CFS algorithm to select 20

features and NNR for EE prediction (RMSE: 0.55 kcal/min and R2: 0.71). The best model for

RW+RH used the MRMR feature selection algorithm to select 10 features and NNR

for EE prediction (RMSE: 0.53 kcal/min and R2: 0.71).

The patterns of the results are similar to those of the CNN approach. The best single

location for energy expenditure estimation was RH, and the RW location provided better

performance than LW. RW+RH only marginally improved the regression

performance.

7.6.3 Evaluation of Simplified Regression Approach

Consistent with other approaches, the RH provided the best performance. RW

only gave marginally better performance than LW. However, combining the two

accelerometers (RW+RH) resulted in substantial performance improvement.

7.6.4 Comparison of Approaches

The best models (lowest RMSE) for the wrist locations and RW+RH

combination were obtained using the conventional supervised learning approach.

However, for the RH location, CNN provided the lowest RMSE of 0.54 kcal/min.

The simplified regression approach showed poor performance compared to the others.

Using one-way repeated measures ANOVA, statistically significant differences

were observed between the approaches for each accelerometer configuration. For

example, at the hip location, mean RMSE differed significantly among the three

approaches (Wilks’ Lambda = 0.160, F(2, 6) = 15.798, p = 0.04). Least significant

difference (LSD) post hoc comparisons revealed the same result for all accelerometer

configurations. In all locations, both CNN and conventional approaches provided


significantly better performance than the simplified approach, while there were no

statistical differences between CNN and conventional machine learning approaches.

Although CNN models did not exceed the best performing conventional

supervised learning models in most situations, CNN is advantageous as it eliminates

the complex feature extraction and selection step, which requires extensive domain

knowledge. In addition, the best features for one model often do not perform well in another,

similar model. CNN performance can also be expected to improve as the

amount of available training data increases.


7.7 CONCLUSION

This study compared deep-learning/CNN models with the best conventional

supervised learning models and simple least squares regression for different

accelerometer configurations in pre-school children. Evaluation was performed using

a dataset collected from eight preschool-aged children performing 10 simulated free-

living activities.

For all accelerometer configurations, both deep learning and conventional

supervised learning algorithms provided comparable performance with no statistical

differences. However, the simplified approach showed significantly poorer performance

compared to the others. This finding indicates that deep learning can be used as an

alternative tool for energy expenditure prediction that eliminates the need for

feature engineering (i.e. feature extraction and selection).

Among the accelerometer configurations, right hip accelerometer placement

provided consistently better performance than the wrist placements. The combination

of right hip and right wrist did not exceed the performance of right hip only. Thus,

adding a second sensor did not improve energy expenditure prediction. As the

use of multiple accelerometers can reduce user compliance, this finding supports the

use of a single hip accelerometer for predicting energy expenditure.

Although the study had an adequate amount of accelerometer data to train

machine learning algorithms, the dataset involved only a small number of participants.

The dataset was collected as part of a larger study investigating sensor-enabled

prediction of PA and EE in preschool children. Based on these encouraging results,

this work will be extended and applied to larger datasets in the future. Future work

should also investigate other data

transformation strategies for accelerometer data before feeding it to a CNN, and use

completely free-living datasets to compare the approaches.


7.8 ACKNOWLEDGMENT

This research was supported by an Australian Research Council (ARC) Discovery

grant, “Modelling active play in preschool children using machine learning”, at

the University of Wollongong (DP150100116).

Chapter 8: Conclusion and Future Work

This thesis contributed to the development of advanced learning models and a

fusion algorithm to improve the prediction of physical activity and its personal impacts

(including relative physical activity intensity, and energy expenditure) from wearable

sensor data. In addition, it identified the optimal sensor positioning and optimal

combination of multimodal data for assessing physical activity and predicting its

impacts. The major achievements are summarised as follows.

8.1 SUMMARY OF ACHIEVEMENTS

The first contribution, presented in Chapter 3, is showing that activity

recognition accuracy can be improved through the implementation of ensemble

learning methods. A custom ensemble using weighted majority voting to fuse the

decisions of four widely used state-of-the-art classification algorithms consistently

outperformed the constituent base classifiers and most conventional ensemble models.

Of the three decision fusion techniques examined in the custom ensemble, weighted

majority vote provided marginally better performance than NB fusion and significantly

outperformed BKS fusion. Conventional ensemble methods, such as bagging,

boosting, and random forests, improved activity recognition in most, but not all,

situations.
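As a rough illustration of the weighted majority voting idea summarised above, the following Python sketch fuses the labels predicted by several base classifiers for one window. It is a minimal sketch, not the implementation from Chapter 3: the function name, the example labels, and the use of validation accuracies as weights are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Fuse one predicted label per base classifier into a single label.

    predictions: list of labels, one per base classifier.
    weights: list of non-negative weights, one per base classifier
             (assumed here to be validation accuracies).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The label with the highest total weight wins; iterating over the
    # sorted labels makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical example: four base classifiers vote on one window.
votes = ["walking", "running", "running", "walking"]
accuracies = [0.92, 0.80, 0.85, 0.78]  # assumed validation accuracies
fused = weighted_majority_vote(votes, accuracies)
```

In this example the raw vote is tied at two each, but the classifiers voting for "walking" carry a combined weight of 1.70 against 1.65 for "running", so the fused label is "walking".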

The second contribution, presented in Chapter 4, is proposing the use of a novel

posterior-adapted class-based weighted decision fusion for physical activity

recognition based on data from multiple accelerometers. This method provided

significant improvements in performance and outperformed other fusion methods such as

model-based and class-based weighted fusion. It showed that decision fusion with two

accelerometers, especially ankle and wrist, can significantly improve the average

performance compared to the use of a single accelerometer. Decision fusion of three

accelerometers did not improve further on the best combination of two

accelerometers.
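The general idea behind class-based weighted decision fusion can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the posterior-adapted method defined in Chapter 4: it simply scales each sensor's class posteriors by a per-class weight and sums them. The function name, the sensor names, the posterior values, and the weights (assumed to be per-class reliability scores estimated on validation data) are all hypothetical.

```python
def class_weighted_fusion(posteriors, class_weights):
    """Fuse class posteriors from several sensors using per-class weights.

    posteriors: {sensor: {label: posterior probability}}
    class_weights: {sensor: {label: weight}}, assumed here to reflect how
                   reliable each sensor's classifier is for each class.
    Returns (best_label, fused_scores).
    """
    labels = sorted({label for probs in posteriors.values() for label in probs})
    fused = {}
    for label in labels:
        # Missing weights or posteriors contribute nothing to the score.
        fused[label] = sum(
            class_weights[sensor].get(label, 0.0) * probs.get(label, 0.0)
            for sensor, probs in posteriors.items()
        )
    best = max(labels, key=lambda label: fused[label])
    return best, fused


# Hypothetical posteriors for one window from two single-sensor classifiers.
posteriors = {
    "ankle": {"sitting": 0.6, "cycling": 0.4},
    "wrist": {"sitting": 0.3, "cycling": 0.7},
}
# Hypothetical per-class weights for each sensor.
weights = {
    "ankle": {"sitting": 0.70, "cycling": 0.95},
    "wrist": {"sitting": 0.90, "cycling": 0.60},
}
best_label, fused_scores = class_weighted_fusion(posteriors, weights)
```

With these made-up numbers the fused scores are 0.69 for "sitting" and 0.80 for "cycling", so the fused decision is "cycling". A model-based weighted fusion would instead use a single weight per sensor, applied to all of its classes.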

The third contribution, presented in Chapters 5 and 6, is demonstrating the use of

multi-modal physiological data using machine learning methods for prediction of

relative physical activity intensity. The results showed that the features extracted from

the RR interval provided the highest relative physical activity intensity prediction

performance among the single modalities (the others being heart rate, electrodermal

activity, and temperature). It also identified that the features extracted from

electrodermal activity and temperature perform poorly by themselves, but they can

provide additional information and improve prediction performance when combined

with RR-interval or heart-rate features using feature fusion.
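Feature fusion in its simplest form concatenates the feature vectors of the individual modalities for each time window before a single model is trained on the result. The sketch below shows that step only. It is an assumption-laden illustration rather than the pipeline of Chapters 5 and 6: the function name, the modality keys, and all feature values are hypothetical.

```python
def fuse_features(modalities):
    """Concatenate per-window feature vectors from several modalities.

    modalities: dict mapping a modality name to a list of feature vectors,
                one vector per time window. All modalities must cover the
                same number of windows.
    Returns (modality_order, fused_rows), where each fused row holds the
    features of every modality for one window, in modality_order.
    """
    names = sorted(modalities)
    window_counts = {len(modalities[name]) for name in names}
    if len(window_counts) != 1:
        raise ValueError("all modalities need the same number of windows")
    fused_rows = []
    for window in zip(*(modalities[name] for name in names)):
        row = []
        for vector in window:
            row.extend(vector)
        fused_rows.append(row)
    return names, fused_rows


# Hypothetical feature values for two windows.
features = {
    "rr": [[0.82, 0.05], [0.61, 0.03]],  # e.g. two RR-interval features
    "eda": [[2.1], [3.4]],               # e.g. one electrodermal feature
    "temp": [[33.4], [33.9]],            # e.g. one temperature feature
}
order, fused = fuse_features(features)
```

Each fused row can then be passed to one classifier or regressor, in contrast to decision fusion, where a separate model is trained per modality and only the outputs are combined.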

The fourth contribution, presented in Chapter 7, is proposing the use of deep

learning for predicting energy expenditure using accelerometer sensor data. This

method can eliminate the manual process of designing feature extraction and

selection. Based on the results, deep learning achieved performance similar to that of

conventional supervised learning models and significantly outperformed simple

regression models. In addition, the right hip was found to be a

better location for accelerometer placement for energy expenditure prediction

than the wrist locations (left and right wrist). The use of multiple accelerometers did

not necessarily improve the energy expenditure prediction performance.

8.2 LIMITATIONS

The following research limitations warrant consideration.

Although this thesis utilised a diverse set of activity datasets to develop and test

the prediction models, none of these datasets are truly free-living datasets. All of the

datasets were collected either in a laboratory or in simulated free-living environments

using predetermined activity sequences.

Our goal was to provide accurate estimation of physical activity classes, relative

physical activity intensity and energy expenditure for healthy young adults and

children. Additional investigation is required to check the suitability of the proposed

models for other groups such as the elderly, the obese and persons affected by chronic

diseases.

This work mainly utilised movement, physiological, and person-level profile

data as the inputs of the models. The context during data collection (e.g.,

environment, location, and time) was either controlled or efforts were made to keep it

consistent. Therefore, this study did not include contextual features in the models.

Future work in settings where the context varies should take these contextual

features into account.

The proposed methods, such as ensembles and deep learning, are computationally

complex. They are proposed only for offline data analysis, where accuracy

is more important than computational cost. It is also worth mentioning that,

in the age of cloud computing, computing resources are more available than ever.

Although most ensembles take longer to train, their test time is still small or

reasonable. Therefore, in future it should be possible to improve the algorithms for

faster response times and deploy them in the cloud.

The sample sizes of the datasets used in this study were adequate (Appendix F).

However, the study could be repeated on a much larger dataset involving a larger number

of participants and activity trials.

8.3 FUTURE WORK

This study is expected to inform future work on mobile-based personal exercise

coaching and help develop safer exercise or working strategies. It can be expanded

in the following ways.

The proposed algorithms could be evaluated in a true free-living context and

using a wide range of activities. The approaches undertaken in this research could be

applied and evaluated in other population groups such as the elderly and individuals

with chronic diseases.

In this study, all the experiments utilised a diverse range of physical activity

datasets that covered a number of everyday, ambulatory, and non-ambulatory activities.

In future, a study could investigate the utility of the proposed algorithms

on collapsed activity classes, i.e. broader groups of activities (e.g. sedentary, walking,

transport, chores, sport).

The algorithms can be evaluated over a long period of time (longitudinal study)

at home to investigate the suitability of the proposed sensor-based methods in day-to-day

environments. This would support future work on personal exercise coaching.

The algorithms can be incorporated into a mobile application to accurately

predict, show, and track physical activity and its impacts in real time. For this purpose,

the algorithms should be improved for faster response time and efficient output, and

deployed in the cloud.

The algorithms can be tested in experiments with additional multimodal

sensors that have emerged since this study was completed.

The algorithms could eventually be trialled in partnership with wellness/coaching

programs to see how the system can help refine those programs over a period of

time by iteratively designing the physical activity.

Bibliography

[1] World Health Organization. (8 Nov 2017). Global Strategy on Diet, Physical

Activity and Health. Available: http://www.who.int/dietphysicalactivity/pa/en/

[2] I.-M. Lee, E. J. Shiroma, F. Lobelo, P. Puska, S. N. Blair, P. T. Katzmarzyk, et

al., "Effect of physical inactivity on major non-communicable diseases

worldwide: an analysis of burden of disease and life expectancy," The lancet,

vol. 380, pp. 219-229, 2012.

[3] A. E. Field, E. H. Coakley, A. Must, J. L. Spadano, N. Laird, W. H. Dietz, et

al., "Impact of overweight on the risk of developing common chronic diseases

during a 10-year period," Archives of internal medicine, vol. 161, pp. 1581-

1586, 2001.

[4] S. B. Eaton and S. B. Eaton, "Physical Inactivity, Obesity, and Type 2

Diabetes: An Evolutionary Perspective," Research Quarterly for Exercise and

Sport, vol. 88, pp. 1-8, 2017.

[5] F. W. Booth, M. V. Chakravarthy, S. E. Gordon, and E. E. Spangenburg,

"Waging war on physical inactivity: using modern molecular ammunition

against an ancient enemy," Journal of Applied Physiology, vol. 93, pp. 3-30,

2002.

[6] S. Arent, M. Landers, and J. Etnier, "The effects of exercise on mood in older

adults: a meta-analytic review," J. Aging Phys. Act., vol. 8, pp. 407-430, 2000.

[7] M. Teychenne, K. Ball, and J. Salmon, "Physical activity and likelihood of

depression in adults: a review," Preventive medicine, vol. 46, pp. 397-411,

2008.

[8] M. Reiner, C. Niermann, D. Jekauc, and A. Woll, "Long-term health benefits

of physical activity–a systematic review of longitudinal studies," BMC public

health, vol. 13, p. 813, 2013.

[9] F. Gómez-Gallego, J. R. Ruiz, A. Buxens, S. Altmäe, M. Artieda, C. Santiago,

et al., "Are elite endurance athletes genetically predisposed to lower disease

risk?," Physiological genomics, vol. 41, pp. 82-90, 2010.

[10] I. Janssen and A. G. LeBlanc, "Systematic review of the health benefits of

physical activity and fitness in school-aged children and youth," International

journal of behavioral nutrition and physical activity, vol. 7, p. 40, 2010.

[11] World Health Organization. (8 Nov 2017). Physical inactivity a leading cause

of disease and disability, warns WHO. Available:

http://www.who.int/mediacentre/news/releases/release23/en/

[12] W. Whang, J. E. Manson, F. B. Hu, et al., "Physical exertion, exercise,

and sudden cardiac death in women," JAMA, vol. 295, pp. 1399-1403, 2006.

[13] T. Mann, R. P. Lamberts, and M. I. Lambert, "Methods of prescribing relative

exercise intensity: physiological and practical considerations," Sports

medicine, vol. 43, pp. 613-625, 2013.

[14] A. E. Bauman, R. S. Reis, J. F. Sallis, J. C. Wells, R. J. Loos, B. W. Martin, et

al., "Correlates of physical activity: why are some people physically active and

others not?," The lancet, vol. 380, pp. 258-271, 2012.

[15] World Health Organization. (2011, 3 Sep 2015). Information sheet: global

recommendations on physical activity for health 18 - 64 years old. Available:

http://www.who.int/dietphysicalactivity/publications/recommendations18_64yearsold/en/

[16] J. Parkka, M. Ermes, K. Antila, M. van Gils, A. Manttari, and H. Nieminen,

"Estimating intensity of physical activity: a comparison of wearable

accelerometer and gyro sensors and 3 sensor locations," in Engineering in

Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International

Conference of the IEEE, 2007, pp. 1511-1514.

[17] N. Vyas, J. Farringdon, D. Andre, and J. I. Stivoric, "Machine learning and

sensor fusion for estimating continuous energy expenditure," AI Magazine, vol.

33, p. 55, 2012.

[18] M. Altini, J. Penders, R. Vullers, and O. Amft, "Combining wearable

accelerometer and physiological data for activity and energy expenditure

estimation," in Proceedings of the 4th Conference on Wireless Health, 2013,

p. 1.

[19] S. G. Trost, B. S. Fees, S. J. Haar, A. D. Murray, and L. K. Crowe,

"Identification and validity of accelerometer cut‐points for toddlers," Obesity,

vol. 20, pp. 2317-2319, 2012.

[20] M. Altini, J. Penders, R. Vullers, and O. Amft, "Automatic Heart Rate

Normalization for Accurate Energy Expenditure Estimation," Methods Inf

Med, vol. 53, pp. 382-388, 2014.

[21] M. Altini, J. Penders, and O. Amft, "Energy expenditure estimation using

wearable sensors: a new methodology for activity-specific models," in

Proceedings of the conference on Wireless Health, 2012, p. 1.

[22] S. Brage, U. Ekelund, N. Brage, M. A. Hennings, K. Froberg, P. W. Franks, et

al., "Hierarchy of individual calibration levels for heart rate and accelerometry

to measure physical activity," Journal of Applied Physiology, vol. 103, pp. 682-

692, 2007.

[23] S. J. Preece, J. Y. Goulermas, L. P. Kenney, D. Howard, K. Meijer, and R.

Crompton, "Activity identification using body-mounted sensors—a review of

classification techniques," Physiological measurement, vol. 30, p. R1, 2009.

[24] K. Ellis, J. Kerr, S. Godbole, J. Staudenmayer, and G. Lanckriet, "Hip and

Wrist Accelerometer Algorithms for Free-Living Behavior Classification,"

Medicine and science in sports and exercise, vol. 48, pp. 933-940, 2016.

[25] J. Staudenmayer, S. He, A. Hickey, J. Sasaki, and P. Freedson, "Methods to

estimate aspects of physical activity and sedentary behavior from high-

frequency wrist accelerometer measurements," Journal of Applied Physiology,

vol. 119, pp. 396-403, 2015.

[26] A. Mannini, S. S. Intille, M. Rosenberger, A. M. Sabatini, and W. Haskell,

"Activity recognition using a single accelerometer placed at the wrist or ankle,"

Medicine and science in sports and exercise, vol. 45, p. 2193, 2013.

[27] M. J. Mathie, A. C. Coster, N. H. Lovell, and B. G. Celler, "Accelerometry:

providing an integrated, practical method for long-term, ambulatory

monitoring of human movement," Physiological measurement, vol. 25, p. R1,

2004.

[28] I. Cleland, B. Kikhia, C. Nugent, A. Boytsov, J. Hallberg, K. Synnes, et al.,

"Optimal placement of accelerometers for the detection of everyday activities,"

Sensors, vol. 13, pp. 9183-9200, 2013.

[29] S. Chernbumroong, A. S. Atkins, and H. Yu, "Activity classification using a

single wrist-worn accelerometer," in Software, Knowledge Information,

Industrial Management and Applications (SKIMA), 2011 5th International

Conference on, 2011, pp. 1-6.

[30] S. G. Trost, Y. Zheng, and W.-K. Wong, "Machine learning for activity

recognition: hip versus wrist data," Physiological measurement, vol. 35, p.

2183, 2014.

[31] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision

fusion and feature fusion strategies for pattern classification," IETE Technical

review, vol. 27, pp. 293-307, 2010.

[32] M. Soleymani, M. Pantic, and T. Pun, "Multimodal emotion recognition in

response to videos," Affective Computing, IEEE Transactions on, vol. 3, pp.

211-223, 2012.

[33] N. E. Miller, S. J. Strath, A. M. Swartz, and S. E. Cashin, "Estimating absolute

and relative physical activity intensity across age via accelerometry in adults,"

Journal of aging and physical activity, vol. 18, p. 158, 2010.

[34] C. Ozemek, H. L. Cochran, S. J. Strath, W. Byun, and L. A. Kaminsky,

"Estimating relative intensity using individualized accelerometer cutpoints: the

importance of fitness level," BMC medical research methodology, vol. 13, p.

53, 2013.

[35] G. Borg, Borg's perceived exertion and pain scales vol. viii. Champaign, IL,

US: Human Kinetics, 1998.

[36] J. Scherr, B. Wolfarth, J. W. Christle, A. Pressler, S. Wagenpfeil, and M. Halle,

"Associations between Borg’s rating of perceived exertion and physiological

measures of exercise intensity," European journal of applied physiology, vol.

113, pp. 147-155, 2013.

[37] M. J. Chen, X. Fan, and S. T. Moe, "Criterion-related validity of the Borg

ratings of perceived exertion scale in healthy individuals: a meta-analysis,"

Journal of sports sciences, vol. 20, pp. 873-899, 2002.

[38] Y.-L. Chen, C.-C. Chen, P.-Y. Hsia, and S.-K. Lin, "Relationships of Borg's

RPE 6-20 scale and heart rate in dynamic and static exercises among a sample

of young Taiwanese men," Perceptual & Motor Skills, vol. 117, pp. 971-982,

2013.

[39] U. M. Kujala, J. Pietilä, T. Myllymäki, S. Mutikainen, T. Föhr, I. Korhonen, et

al., "Physical Activity: Absolute Intensity versus Relative-to-Fitness-Level

Volumes," Medicine and science in sports and exercise, vol. 49, pp. 474-481,

2017.

[40] M. S. Fairbarn, S. P. Blackie, N. G. McElvaney, B. R. Wiggs, P. D. Pare, and

R. L. Pardy, "Prediction of heart rate and oxygen uptake during incremental

and maximal exercise in healthy adults," Chest, vol. 105, pp. 1365-1369, 1994.

[41] H. Tanaka, K. D. Monahan, and D. R. Seals, "Age-predicted maximal heart

rate revisited," Journal of the American College of Cardiology, vol. 37, pp.

153-156, 2001.

[42] J. Zhu, A. Pande, P. Mohapatra, and J. J. Han, "Using deep learning for energy

expenditure estimation with wearable sensors," in E-health Networking,

Application & Services (HealthCom), 2015 17th International Conference on,

2015, pp. 501-506.

[43] P. S. Freedson, E. Melanson, and J. Sirard, "Calibration of the Computer

Science and Applications, Inc. accelerometer," Medicine and science in sports

and exercise, vol. 30, pp. 777-781, 1998.

[44] J. Staudenmayer, D. Pober, S. Crouter, D. Bassett, and P. Freedson, "An

artificial neural network to estimate physical activity energy expenditure and

identify physical activity type from an accelerometer," Journal of Applied

Physiology, vol. 107, pp. 1300-1307, 2009.

[45] A. H. Montoye, B. Dong, S. Biswas, and K. A. Pfeiffer, "Validation of a

wireless accelerometer network for energy expenditure measurement," Journal

of sports sciences, vol. 34, pp. 2130-2139, 2016.

[46] A. Pande, Y. Zeng, A. K. Das, P. Mohapatra, S. Miyamoto, E. Seto, et al.,

"Energy expenditure estimation with smartphone body sensors," in

Proceedings of the 8th International Conference on Body Area Networks,

2013, pp. 8-14.

[47] S. G. Trost, W.-K. Wong, K. A. Pfeiffer, and Y. Zheng, "Artificial neural

networks to predict activity type and energy expenditure in youth," Medicine

and science in sports and exercise, vol. 44, p. 1801, 2012.

[48] A. H. Montoye, M. Begum, Z. Henning, and K. A. Pfeiffer, "Comparison of

linear and non-linear models for predicting energy expenditure from raw

accelerometer data," Physiological measurement, vol. 38, p. 343, 2017.

[49] A. Pande, G. Casazza, A. Nicorici, E. Seto, S. Miyamoto, M. Lange, et al.,

"Energy expenditure estimation in boys with duchene muscular dystrophy

using accelerometer and heart rate sensors," in Healthcare Innovation

Conference (HIC), 2014 IEEE, 2014, pp. 26-29.

[50] S. Liu, R. X. Gao, D. John, J. Staudenmayer, and P. S. Freedson, "SVM-based

multi-sensor fusion for free-living physical activity assessment," in

Engineering in Medicine and Biology Society, EMBC, 2011 Annual

International Conference of the IEEE, 2011, pp. 3188-3191.

[51] S. J. Preece, J. Y. Goulermas, L. P. Kenney, and D. Howard, "A comparison

of feature extraction methods for the classification of dynamic activities from

accelerometer data," Biomedical Engineering, IEEE Transactions on, vol. 56,

pp. 871-879, 2009.

[52] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time

object detection with region proposal networks," in Advances in neural

information processing systems, 2015, pp. 91-99.

[53] M. Hagenbuchner, D. P. Cliff, S. G. Trost, N. Van Tuc, and G. E. Peoples,

"Prediction of activity type in preschool children using machine learning

techniques," Journal of Science and Medicine in Sport, vol. 18, pp. 426-431,

2015.

[54] M. Brandes, B. Steenbock, and N. Wirsik, "Energy Cost of Common Physical

Activities in Preschoolers," Journal of Physical Activity and Health, vol. 20,

pp. 1-6, 2017.

[55] S. G. Trost, D. Cliff, M. Ahmadi, N. Van Tuc, and M. Hagenbuchner, "Sensor-

enabled activity class recognition in preschoolers: Hip versus wrist data,"

Medicine and science in sports and exercise, 2017.

[56] R. J. Kate, A. M. Swartz, W. A. Welch, and S. J. Strath, "Comparative

evaluation of features and techniques for identifying activity type and

estimating energy cost from accelerometer data," Physiological measurement,

vol. 37, p. 360, 2016.

[57] N. F. Butte, U. Ekelund, and K. R. Westerterp, "Assessing physical activity

using wearable monitors: measures of physical activity," Medicine and science

in sports and exercise, vol. 44, pp. S5-12, 2012.

[58] L. Kallings, M. Leijon, M. L. Hellénius, and A. Ståhle, "Physical activity on

prescription in primary health care: a follow‐up of physical activity level and

quality of life," Scandinavian journal of medicine & science in sports, vol. 18,

pp. 154-161, 2008.

[59] E. A. Awick, D. K. Ehlers, S. Aguiñaga, A. M. Daugherty, A. F. Kramer, and

E. McAuley, "Effects of a randomized exercise trial on physical activity,

psychological distress and quality of life in older adults," General Hospital

Psychiatry, 2017.

[60] J. Erlichman, A. Kerbey, and W. James, "Physical activity and its impact on

health outcomes. Paper 2: Prevention of unhealthy weight gain and obesity by

physical activity: an analysis of the evidence," Obesity reviews, vol. 3, pp. 273-

287, 2002.

[61] G. Erikssen, K. Liestøl, J. Bjørnholt, E. Thaulow, L. Sandvik, and J. Erikssen,

"Changes in physical fitness and changes in mortality," The Lancet, vol. 352,

pp. 759-762, 1998.

[62] K. E. Powell, A. E. Paluch, and S. N. Blair, "Physical activity for health: What

kind? How much? How intense? On top of what?," Public Health, vol. 32, p.

349, 2011.

[63] The Department of Health Australia. (2017, 2 Jan 2018). Australia's Physical

Activity and Sedentary Behaviour Guidelines. Available:

http://www.health.gov.au/internet/main/publishing.nsf/content/health-pubhlth-strateg-phys-act-guidelines#apaadult

[64] A. D. Okely, D. Ghersi, K. D. Hesketh, R. Santos, S. P. Loughran, D. P. Cliff,

et al., "A collaborative approach to adopting/adapting guidelines-The

Australian 24-Hour Movement Guidelines for the early years (Birth to 5 years):

an integration of physical activity, sedentary behavior, and sleep," BMC public

health, vol. 17, p. 869, 2017.

[65] J. Parkkari, P. Kannus, A. Natri, I. Lapinleimu, M. Palvanen, M. Heiskanen, et

al., "Active Living and Injury Risk," Int J Sports Med, vol. 25, pp. 209-216, 2004.

[66] D. S. Siscovick, N. S. Weiss, R. H. Fletcher, and T. Lasky, "The Incidence of

Primary Cardiac Arrest during Vigorous Exercise," New England Journal of

Medicine, vol. 311, pp. 874-877, 1984.

[67] C. M. Albert, M. A. Mittleman, C. U. Chae, I.-M. Lee, C. H. Hennekens, and

J. E. Manson, "Triggering of sudden death from cardiac causes by vigorous

exertion," New England Journal of Medicine, vol. 343, pp. 1355-1361, 2000.

[68] P. D. Thompson, "Exercise prescription and proscription for patients with

coronary artery disease," Circulation, vol. 112, pp. 2354-2363, 2005.

[69] I.-M. Lee, H. D. Sesso, Y. Oguma, and R. S. Paffenbarger, "Relative intensity

of physical activity and risk of coronary heart disease," Circulation, vol. 107,

pp. 1110-1116, 2003.

[70] P. Freedson, H. R. Bowles, R. Troiano, and W. Haskell, "Assessment of

physical activity using wearable monitors: recommendations for monitor

calibration and use in the field," Medicine and science in sports and exercise,

vol. 44, p. S1, 2012.

[71] S. J. Strath, L. A. Kaminsky, B. E. Ainsworth, U. Ekelund, P. S. Freedson, R.

A. Gary, et al., "Guide to the assessment of physical activity: clinical and

research applications," Circulation, vol. 128, pp. 2259-2279, 2013.

[72] S. J. Strath, A. M. Swartz, D. R. Bassett Jr, W. L. O'Brien, G. A. King, and B.

E. Ainsworth, "Evaluation of heart rate as a method for assessing moderate

intensity physical activity," Medicine and Science in Sports and Exercise, vol.

32, pp. S465-70, 2000.

[73] R. J. Robertson, Perceived exertion for practitioners: rating effort with the

OMNI picture system: Human Kinetics, 2004.

[74] K. Rice, C. Gammon, K. Pfeiffer, and S. G. Trost, "Age related differences in

the validity of the OMNI perceived exertion scale during lifestyle activities,"

Pediatric exercise science, vol. 27, pp. 95-101, 2015.

[75] A. C. Utter, R. J. Robertson, J. M. Green, R. R. Suminski, S. R. McAnulty, and

D. C. Nieman, "Validation of the Adult OMNI Scale of perceived exertion for

walking/running exercise," Medicine and science in sports and exercise, vol.

36, pp. 1776-1780, 2004.

[76] H. K. Neilson, P. J. Robson, C. M. Friedenreich, and I. Csizmadi, "Estimating

activity energy expenditure: how valid are physical activity questionnaires?,"

The American journal of clinical nutrition, vol. 87, pp. 279-291, 2008.

[77] K. Ohkawara, Y. Hikihara, T. Matsuo, E. L. Melanson, and M. Hibi, "Variable

factors of total daily energy expenditure in humans," The Journal of Physical

Fitness and Sports Medicine, vol. 1, pp. 389-399, 2012.

[78] M. Luštrek, B. Cvetković, and S. Kozina, "Energy expenditure estimation with

wearable accelerometers," in Circuits and Systems (ISCAS), 2012 IEEE

International Symposium on, 2012, pp. 5-8.

[79] E. Jequier and Y. Schutz, "Long-term measurements of energy expenditure in

humans using a respiration chamber," The American journal of clinical

nutrition, vol. 38, pp. 989-998, 1983.

[80] J. McLaughlin, G. King, E. Howley, D. Bassett Jr, and B. Ainsworth,

"Validation of the COSMED K4 b2 portable metabolic system," International

journal of sports medicine, vol. 22, pp. 280-284, 2001.

[81] S. G. Trost, P. D. Loprinzi, R. Moore, and K. A. Pfeiffer, "Comparison of

accelerometer cut points for predicting activity intensity in youth," Med Sci

Sports Exerc, vol. 43, pp. 1360-1368, 2011.

[82] H. J. Montoye, R. Washburn, S. Servais, A. Ertl, J. G. Webster, and F. J. Nagle,

"Estimation of energy expenditure by a portable accelerometer," Medicine and

Science in Sports and Exercise, vol. 15, pp. 403-407, 1982.

[83] J. Bussmann, W. Martens, J. Tulen, F. Schasfoort, H. Van Den Berg-Emons,

and H. Stam, "Measuring daily behavior using ambulatory accelerometry: the

Activity Monitor," Behavior Research Methods, Instruments, & Computers,

vol. 33, pp. 349-356, 2001.

[84] A. S. Jackson, S. N. Blair, M. T. Mahar, L. T. Wier, R. M. Ross, and J. E.

Stuteville, "Prediction of functional aerobic capacity without exercise testing,"

Medicine and science in sports and exercise, vol. 22, pp. 863-870, 1990.

[85] F. K. Assah, U. Ekelund, S. Brage, A. Wright, J. C. Mbanya, and N. J.

Wareham, "Accuracy and validity of a combined heart rate and motion sensor

for the measurement of free-living physical activity energy expenditure in

adults in Cameroon," International journal of epidemiology, p. dyq098, 2010.

[86] J. Smolander, T. Juuti, M.-L. Kinnunen, K. Laine, V. Louhevaara, K.

Männikkö, et al., "A new heart rate variability-based method for the estimation

of oxygen consumption without individual laboratory calibration: application

example on postal workers," Applied ergonomics, vol. 39, pp. 325-331, 2008.

[87] A. Reiss and D. Stricker, "Creating and benchmarking a new dataset for

physical activity monitoring," in Proceedings of the 5th International

Conference on PErvasive Technologies Related to Assistive Environments,

2012, p. 40.

[88] M. Saar-Tsechansky and F. Provost, "Handling missing values when applying

classification models," Journal of machine learning research, vol. 8, pp. 1623-

1657, 2007.

[89] O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas, "Window size

impact in human activity recognition," Sensors, vol. 14, pp. 6474-6499, 2014.

[90] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration

data," in Pervasive computing, ed: Springer, 2004, pp. 1-17.

[91] C.-W. Lin, Y.-T. Yang, J.-S. Wang, and Y.-C. Yang, "A wearable sensor

module with a neural-network-based activity classification algorithm for daily

energy expenditure estimation," Information Technology in Biomedicine, IEEE

Transactions on, vol. 16, pp. 991-998, 2012.

[92] M. Altini, J. Penders, R. Vullers, and O. Amft, "Estimating energy expenditure

using body-worn accelerometers: a comparison of methods, sensors number

and positioning," Biomedical and Health Informatics, IEEE Journal of, vol.

19, pp. 219-226, 2015.

[93] J. Pärkkä, M. Ermes, P. Korpipää, J. Mäntyjärvi, J. Peltola, and I. Korhonen,

"Activity classification using realistic data from wearable sensors,"

Information Technology in Biomedicine, IEEE Transactions on, vol. 10, pp.

119-128, 2006.

[94] K. Aminian, P. Robert, E. Jéquier, and Y. Schutz, "Incline, speed, and distance

assessment during unconstrained walking," Medicine and science in sports and

exercise, vol. 27, pp. 226-234, 1995.

[95] M. Nyan, F. Tay, K. Seah, and Y. Sitoh, "Classification of gait patterns in the

time–frequency domain," Journal of biomechanics, vol. 39, pp. 2647-2656,

2006.

[96] N. Wang, E. Ambikairajah, N. H. Lovell, and B. G. Celler, "Accelerometry

based classification of walking patterns using time-frequency analysis," in

Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual


[97] F. Albinali, S. Intille, W. Haskell, and M. Rosenberger, "Using wearable

activity type detection to improve physical activity energy expenditure

estimation," in Proceedings of the 12th ACM international conference on

Ubiquitous computing, 2010, pp. 311-320.

[98] M. Berchtold, M. Budde, D. Gordon, H. R. Schmidtke, and M. Beigl,

"Actiserv: Activity recognition service for mobile phones," in Wearable

Computers (ISWC), 2010 International Symposium on, 2010, pp. 1-8.

[99] K. Ellis, J. Kerr, S. Godbole, G. Lanckriet, D. Wing, and S. Marshall, "A

random forest classifier for the prediction of energy expenditure and type of

physical activity from wrist and hip accelerometers," Physiological

measurement, vol. 35, p. 2191, 2014.

[100] P. S. Freedson, K. Lyden, S. Kozey-Keadle, and J. Staudenmayer, "Evaluation

of artificial neural network algorithms for predicting METs and activity type

from accelerometer data: validation on an independent sample," Journal of

Applied Physiology, vol. 111, pp. 1804-1812, 2011.

[101] E. Fullerton, B. Heller, and M. Munoz-Organero, "Recognizing Human

Activity in Free-Living Using Multiple Body-Worn Accelerometers," IEEE

Sensors Journal, vol. 17, pp. 5290-5297, 2017.

[102] I. C. Gyllensten and A. G. Bonomi, "Identifying types of physical activity with

a single accelerometer: evaluating laboratory-trained algorithms in daily life,"

Biomedical Engineering, IEEE Transactions on, vol. 58, pp. 2656-2663, 2011.

[103] K. Mackintosh, A. Montoye, K. Pfeiffer, and M. McNarry, "Investigating

optimal accelerometer placement for energy expenditure prediction in children

using a machine learning approach," Physiological measurement, vol. 37, p.

1728, 2016.

[104] T. G. Pavey, N. D. Gilson, S. R. Gomersall, B. Clark, and S. G. Trost, "Field

evaluation of a random forest activity classifier for wrist-worn accelerometer

data," Journal of Science and Medicine in Sport, 2016.

[105] T. Tamura, M. Sekine, M. Ogawa, T. Togawa, and Y. Fukui, "Classification of

acceleration waveforms during walking by wavelet transform," Methods of

information in medicine, vol. 36, pp. 356-359, 1997.

[106] E. M. Tapia, S. S. Intille, W. Haskell, K. Larson, J. Wright, A. King, et al.,

"Real-time recognition of physical activities and their intensities using wireless

accelerometers and a heart rate monitor," in Wearable Computers, 2007 11th

IEEE International Symposium on, 2007, pp. 37-40.

[107] U. Maurer, A. Rowe, A. Smailagic, and D. Siewiorek, "Location and activity

recognition using eWatch: A wearable sensor platform," Ambient Intelligence

in Everyday Life, pp. 86-102, 2006.

[108] G. Tulum, N. T. Artuğ, and B. Bolat, "Performance evaluation of feature

selection algorithms on human activity classification," in Innovations in

Intelligent Systems and Applications (INISTA), 2013 IEEE International

Symposium on, 2013, pp. 1-4.

[109] B. Fish, A. Khan, N. H. Chehade, C. Chien, and G. Pottie, "Feature selection

based on mutual information for human activity recognition," in Acoustics,

Speech and Signal Processing (ICASSP), 2012 IEEE International Conference

on, 2012, pp. 1729-1732.

[110] M. Zhang and A. A. Sawchuk, "A feature selection-based framework for

human activity recognition using wearable multimodal sensors," in

Proceedings of the 6th International Conference on Body Area Networks,

2011, pp. 92-98.

[111] N. Bicocchi, M. Mamei, and F. Zambonelli, "Detecting activities from body-

worn accelerometers via instance-based algorithms," Pervasive and Mobile

Computing, vol. 6, pp. 482-495, 2010.

[112] C. Catal, S. Tufekci, E. Pirmit, and G. Kocabag, "On the use of ensemble of

classifiers for accelerometer-based activity recognition," Applied Soft

Computing, vol. 37, pp. 1018-1022, 2015.

[113] D. R. Bassett Jr, A. V. Rowlands, and S. G. Trost, "Calibration and validation

of wearable monitors," Medicine and science in sports and exercise, vol. 44, p.

S32, 2012.

[114] S. G. Trost, "State of the Art Reviews: Measurement of Physical Activity in

Children and Adolescents," American Journal of Lifestyle Medicine, vol. 1, pp.

299-314, 2007.

[115] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity

monitoring," in Wearable Computers (ISWC), 2012 16th International

Symposium on, 2012, pp. 108-109.

[116] A. Reiss and D. Stricker, "Introducing a modular activity monitoring system,"

in Engineering in Medicine and Biology Society, EMBC, 2011 Annual


[117] U. Maurer, A. Rowe, A. Smailagic, and D. Siewiorek, "Location and activity

recognition using eWatch: A wearable sensor platform," in Ambient

Intelligence in Everyday Life, ed: Springer, 2006, pp. 86-102.

[118] M. Ermes, J. Parkka, J. Mantyjarvi, and I. Korhonen, "Detection of daily

activities and sports with wearable sensors in controlled and uncontrolled

conditions," Information Technology in Biomedicine, IEEE Transactions on,

vol. 12, pp. 20-26, 2008.

[119] K. Y. Chen and D. R. Bassett, "The technology of accelerometry-based activity

monitors: current and future," Medicine and science in sports and exercise, vol.

37, p. S490, 2005.

[120] A. H. Montoye, J. M. Pivarnik, L. M. Mudd, S. Biswas, and K. A. Pfeiffer,

"Comparison of activity type classification accuracy from accelerometers worn

on the hip, wrists, and thigh in young, apparently healthy adults," Measurement

in Physical Education and Exercise Science, vol. 20, pp. 173-183, 2016.

[121] D. M. Karantonis, M. R. Narayanan, M. Mathie, N. H. Lovell, and B. G. Celler,

"Implementation of a real-time human movement classifier using a triaxial

accelerometer for ambulatory monitoring," IEEE transactions on information

technology in biomedicine, vol. 10, pp. 156-167, 2006.

[122] B. J. Jefferis, P. H. Whincup, L. Lennon, and S. G. Wannamethee,

"Longitudinal Associations Between Changes in Physical Activity and Onset

of Type 2 Diabetes in Older British Men: The influence of adiposity," Diabetes

care, vol. 35, pp. 1876-1883, 2012.

[123] M. S. Tremblay, A. G. LeBlanc, M. E. Kho, T. J. Saunders, R. Larouche, R. C.

Colley, et al., "Systematic review of sedentary behaviour and health indicators

in school-aged children and youth," International Journal of Behavioral

Nutrition and Physical Activity, vol. 8, p. 1, 2011.

[124] N. Owen, G. N. Healy, C. E. Matthews, and D. W. Dunstan, "Too much sitting:

the population-health science of sedentary behavior," Exercise and sport

sciences reviews, vol. 38, p. 105, 2010.

[125] A. H. Montoye, R. W. Moore, H. R. Bowles, R. Korycinski, and K. A. Pfeiffer,

"Reporting accelerometer methods in physical activity intervention studies: a

systematic review and recommendations for authors," British journal of sports

medicine, pp. bjsports-2015-095947, 2016.

[126] J. Skotte, M. Korshøj, J. Kristiansen, C. Hanisch, and A. Holtermann,

"Detection of physical activity types using triaxial accelerometers," J Phys Act

Health, vol. 11, pp. 76-84, 2014.

[127] L. Breiman, "Arcing classifier (with discussion and a rejoinder by the author),"

The annals of statistics, vol. 26, pp. 801-849, 1998.

[128] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: A

new explanation for the effectiveness of voting methods," Annals of statistics,

pp. 1651-1686, 1998.

[129] L. Breiman, "Random forests," Machine learning, vol. 45, pp. 5-32, 2001.

[130] P. Yang, Y. H. Yang, B. B. Zhou, and A. Y. Zomaya, "A review of ensemble

methods in bioinformatics," Current Bioinformatics, vol. 5, pp. 296-308, 2010.

[131] M. Arif and A. Kattan, "Physical Activities Monitoring Using Wearable

Acceleration Sensors Attached to the Body," PloS one, vol. 10, p. e0130851,

2015.

[132] S. Pirttikangas, K. Fujinami, and T. Nakajima, "Feature selection and activity

recognition from wearable sensors," in Ubiquitous Computing Systems, ed:

Springer, 2006, pp. 516-527.

[133] M. A. Hall and L. A. Smith, "Feature Selection for Machine Learning:

Comparing a Correlation-Based Filter Approach to the Wrapper," in FLAIRS

conference, 1999, pp. 235-239.

[134] D. Ruta and B. Gabrys, "An overview of classifier fusion methods," Computing

and Information systems, vol. 7, pp. 1-10, 2000.

[135] L. I. Kuncheva, J. C. Bezdek, and R. P. Duin, "Decision templates for multiple

classifier fusion: an experimental comparison," Pattern recognition, vol. 34,

pp. 299-314, 2001.

[136] Y. S. Huang and C. Y. Suen, "A method of combining multiple experts for the

recognition of unconstrained handwritten numerals," Pattern Analysis and

Machine Intelligence, IEEE Transactions on, vol. 17, pp. 90-94, 1995.

[137] A. Reiss, M. Weber, and D. Stricker, "Exploring and extending the boundaries

of physical activity recognition," in Systems, Man, and Cybernetics (SMC),

2011 IEEE International Conference on, 2011, pp. 46-50.

[138] G. Forman and M. Scholz, "Apples-to-apples in cross-validation studies:

pitfalls in classifier performance measurement," ACM SIGKDD Explorations

Newsletter, vol. 12, pp. 49-57, 2010.

[139] N. Ruch, M. Rumo, and U. Mäder, "Recognition of activities in children by

two uniaxial accelerometers in free-living conditions," European journal of

applied physiology, vol. 111, pp. 1917-1927, 2011.

[140] Š. Raudys and F. Roli, "The behavior knowledge space fusion method:

Analysis of generalization error and strategies for performance improvement,"

in International Workshop on Multiple Classifier Systems, 2003, pp. 55-64.

[141] J. E. Sasaki, A. Hickey, J. Staudenmayer, D. John, J. A. Kent, and P. S.

Freedson, "Performance of Activity Classification Algorithms in Free-living

Older Adults," Medicine and science in sports and exercise, 2015.

[142] Y. Sun, M. S. Kamel, and A. K. Wong, "Empirical study on weighted voting

multiple classifiers," in Pattern Recognition and Data Mining, ed: Springer,

2005, pp. 335-344.

[143] L. Atallah, B. Lo, R. King, and G.-Z. Yang, "Sensor positioning for activity

recognition using wearable accelerometers," IEEE transactions on biomedical

circuits and systems, vol. 5, pp. 320-329, 2011.

[144] N. Kern, B. Schiele, and A. Schmidt, "Multi-sensor activity context detection

for wearable computing," in European Symposium on Ambient Intelligence,

2003, pp. 220-232.

[145] H. Gjoreski, M. Luštrek, and M. Gams, "Accelerometer placement for posture

recognition and fall detection," in Intelligent environments (IE), 2011 7th

international conference on, 2011, pp. 47-54.

[146] D. O. Olguín and A. S. Pentland, "Human activity recognition: Accuracy

across common locations for wearable sensors," in Proceedings of 2006 10th

IEEE International Symposium on Wearable Computers, Montreux,

Switzerland, 2006, pp. 11-14.

[147] F. A. Faria, J. A. Dos Santos, A. Rocha, and R. d. S. Torres, "A framework for

selection and fusion of pattern classifiers in multimedia recognition," Pattern

Recognition Letters, vol. 39, pp. 52-64, 2014.

[148] A. Bulling, U. Blanke, and B. Schiele, "A tutorial on human activity

recognition using body-worn inertial sensors," ACM Computing Surveys

(CSUR), vol. 46, p. 33, 2014.

[149] O. Banos, M. Damas, H. Pomares, F. Rojas, B. Delgado-Marquez, and O.

Valenzuela, "Human activity recognition based on a sensor weighting

hierarchical classifier," Soft Computing, vol. 17, pp. 333-343, 2013.

[150] W. Zhang and Z. Zhang, "Belief function based decision fusion for

decentralized target classification in wireless sensor networks," Sensors, vol.

15, pp. 20524-20540, 2015.

[151] Y. Lin, Q. Hu, J. Liu, J. Chen, and J. Duan, "Multi-label feature selection based

on neighborhood mutual information," Applied Soft Computing, vol. 38, pp.

244-256, 2016.

[152] M. Soleymani, G. Chanel, J. J. Kierkels, and T. Pun, "Affective

characterization of movie scenes based on content analysis and physiological

changes," International Journal of Semantic Computing, vol. 3, pp. 235-254,

2009.

[153] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas,

et al., "mHealthDroid: a novel framework for agile development of mobile

health applications," in Ambient Assisted Living and Daily Activities, ed:

Springer, 2014, pp. 91-98.

[154] O. Banos, C. Villalonga, R. Garcia, A. Saez, M. Damas, J. A. Holgado-Terriza,

et al., "Design, implementation and validation of a novel open framework for

agile development of mobile health applications," Biomedical engineering

online, vol. 14, pp. 1-20, 2015.

[155] S. Ahangama, Y. S. Lim, S. Y. Koh, and D. C. C. Poo, "Revolutionizing

Mobile Healthcare Monitoring Technology: Analysis of Features through Task

Model," in International Conference on Social Computing and Social Media,

2014, pp. 298-305.

[156] M. Kirwan, M. J. Duncan, C. Vandelanotte, and W. K. Mummery, "Using

smartphone technology to monitor physical activity in the 10,000 Steps

program: a matched case–control trial," Journal of medical Internet research,

vol. 14, 2012.

[157] S. Bauer, J. de Niet, R. Timman, and H. Kordy, "Enhancement of care through

self-monitoring and tailored feedback via text messaging and their use in the

treatment of childhood overweight," Patient education and counseling, vol. 79,

pp. 315-319, 2010.

[158] K. Y. Chen, K. F. Janz, W. Zhu, and R. J. Brychta, "Re-defining the roles of

sensors in objective physical activity monitoring," Medicine and science in

sports and exercise, vol. 44, p. S13, 2012.

[159] M. A. Adams, J. F. Sallis, G. J. Norman, M. F. Hovell, E. B. Hekler, and E.

Perata, "An adaptive physical activity intervention for overweight adults: a

randomized controlled trial," PloS one, vol. 8, p. e82901, 2013.

[160] R. R. Pate, M. Pratt, S. N. Blair, W. L. Haskell, C. A. Macera, C. Bouchard, et

al., "Physical activity and public health: a recommendation from the Centers

for Disease Control and Prevention and the American College of Sports

Medicine," JAMA, vol. 273, pp. 402-407, 1995.

[161] A. Chowdhury, D. Tjondronegoro, V. Chandran, and S. Trost, "Physical

activity recognition using posterior-adapted class-based fusion of multi-

accelerometers data," IEEE Journal of Biomedical and Health Informatics,

2017.

[162] A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble


Medicine and science in sports and exercise, vol. 49, p. 1965, 2017.

[163] M. Altini, R. Vullers, C. Van Hoof, M. van Dort, and O. Amft, "Self-calibration

of walking speed estimations using smartphone sensors," in Pervasive

Computing and Communications Workshops (PERCOM Workshops), 2014

IEEE International Conference on, 2014, pp. 10-18.

[164] K. R. Rice, C. Gammon, K. Pfeiffer, and S. Trost, "Age related differences in

the validity of the OMNI perceived exertion scale during lifestyle activities,"

Pediatric exercise science, vol. 27, pp. 95-101, 2015.

[165] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information

criteria of max-dependency, max-relevance, and min-redundancy," IEEE

Transactions on pattern analysis and machine intelligence, vol. 27, pp. 1226-

1238, 2005.

[166] G. Borg, G. Ljunggren, and R. Ceci, "The increase of perceived exertion, aches

and pain in the legs, heart rate and blood lactate during exercise on a bicycle

ergometer," European journal of applied physiology and occupational

physiology, vol. 54, pp. 343-349, 1985.

[167] H. N. Dawes, K. L. Barker, J. Cockburn, N. Roach, O. Scott, and D. Wade,

"Borg’s rating of perceived exertion scales: do the verbal anchors mean the

same for different clinical groups?," Archives of physical medicine and

rehabilitation, vol. 86, pp. 912-916, 2005.

[168] Australian Institute of Health and Welfare, The Active Australia Survey: A
guide and manual for implementation, analysis and reporting. Australian
Institute of Health and Welfare, 2003.

[169] A. K. Chowdhury, D. Tjondronegoro, V. Chandran, and S. G. Trost, "Ensemble


Medicine and science in sports and exercise, 2017.

[170] B. W. Timmons, A. G. LeBlanc, V. Carson, S. Connor Gorber, C. Dillman, I.

Janssen, et al., "Systematic review of physical activity and health in the early

years (aged 0–4 years)," Applied Physiology, Nutrition, and Metabolism, vol.

37, pp. 773-792, 2012.

[171] A. G. LeBlanc, J. C. Spence, V. Carson, S. Connor Gorber, C. Dillman, I.

Janssen, et al., "Systematic review of sedentary behaviour and health indicators

in the early years (aged 0–4 years)," Applied Physiology, Nutrition, and

Metabolism, vol. 37, pp. 753-772, 2012.

[172] "Department of Health | Brochure - National Physical Activity
Recommendations for Children 0-5 Years." [Online]. Available:
http://www.health.gov.au/internet/main/publishing.nsf/Content/npra-0-5yrs-brochure

[173] M. S. Tremblay, J.-P. Chaput, K. B. Adamo, S. Aubert, J. D. Barnes, L.

Choquette, et al., "Canadian 24-Hour Movement Guidelines for the Early

Years (0–4 years): An Integration of Physical Activity, Sedentary Behaviour,

and Sleep," BMC public health, vol. 17, p. 874, 2017.

[174] D. P. Cliff, J. J. Reilly, and A. D. Okely, "Methodological considerations in

using accelerometers to assess habitual physical activity in children aged 0–5

years," Journal of Science and Medicine in Sport, vol. 12, pp. 557-567, 2009.

[175] L. Deng and D. Yu, "Deep learning: methods and applications," Foundations

and Trends® in Signal Processing, vol. 7, pp. 197-387, 2014.

[176] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J.

Garcia-Rodriguez, "A Review on Deep Learning Techniques Applied to

Semantic Segmentation," arXiv preprint arXiv:1704.06857, 2017.

[177] A. Großek, C. van Loo, G. E. Peoples, M. Hagenbuchner, R. Jones, and D. P.

Cliff, "Energy cost of physical activities and sedentary behaviors in young

children," Journal of Physical Activity and Health, vol. 13, pp. S7-S10, 2016.

[178] W. D. McArdle, F. I. Katch, and V. L. Katch, Exercise physiology: nutrition,

energy, and human performance. Philadelphia: Lea and Febiger, 1991.

[179] S. E. Crouter, M. Horton, and D. R. Bassett Jr, "Use of a 2-regression model

for estimating energy expenditure in children," Medicine and science in sports

and exercise, vol. 44, pp. 1177-1185, 2012.

[180] K. Miura and T. Harada, "Implementation of a practical distributed calculation

system with browsers and javascript, and application to distributed deep

learning," arXiv preprint arXiv:1503.05743, 2015.

[181] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann

machines," in Proceedings of the 27th international conference on machine

learning (ICML-10), 2010, pp. 807-814.

[182] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of

ReliefF and RReliefF," Machine learning, vol. 53, pp. 23-69, 2003.

[183] G. Roffo and S. Melzi, "Ranking to Learn: Feature Ranking and Selection via

Eigenvector Centrality," arXiv preprint arXiv:1704.05409, 2017.

[184] S. G. Trost, K. L. McIver, and R. R. Pate, "Conducting accelerometer-based

activity assessments in field-based research," Medicine & Science in Sports &

Exercise, vol. 37, pp. S531-S543, 2005.

Appendices

Appendix A

Supplementary Digital Content 1 (SDC 1)

Features were extracted from 10 s sliding windows with 50% overlap.
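
The windowing step can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `sliding_windows` and the 30 Hz sampling rate in the example are assumptions, since the sampling rate is not stated in this appendix.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], sample_rate_hz: int,
                    window_s: float = 10.0, overlap: float = 0.5) -> List[List[float]]:
    """Split one accelerometer axis into fixed-length, overlapping windows."""
    window_size = int(round(window_s * sample_rate_hz))        # samples per window
    step = max(1, int(round(window_size * (1.0 - overlap))))   # hop between window starts
    windows = []
    # Only complete windows are kept; a trailing partial window is dropped.
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(list(signal[start:start + window_size]))
    return windows


# Example: 60 s of one axis at a hypothetical 30 Hz sampling rate.
axis_x = [0.0] * (60 * 30)
windows = sliding_windows(axis_x, sample_rate_hz=30)
print(len(windows), len(windows[0]))
```

With 50% overlap, each new window starts half a window (5 s) after the previous one, so every sample away from the recording edges contributes to two feature vectors.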

1. Mean, standard deviation, minimum, maximum, variance, median, skewness,

25th and 75th percentile, and kurtosis were simple time-domain features

extracted from each axis of a 3-axis accelerometer.

2. In addition to these simple time-domain features, energy features were
calculated as the sum of the squared discrete FFT component magnitudes of
the signal, normalised by dividing by the window length:

\[ \mathrm{Energy}(X) = \frac{\sum \left|\mathrm{FFT}(X)\right|^{2}}{\mathrm{window\_size}} \]

\[ \mathrm{Energy}(Y) = \frac{\sum \left|\mathrm{FFT}(Y)\right|^{2}}{\mathrm{window\_size}} \]

\[ \mathrm{Energy}(Z) = \frac{\sum \left|\mathrm{FFT}(Z)\right|^{2}}{\mathrm{window\_size}} \]

where X, Y and Z are the accelerometer X-, Y- and Z-axis data in a window.

3. The principal frequency and its magnitude were also extracted. The frequency
with the highest FFT magnitude was taken as the principal frequency.

4. Zero-crossings of each accelerometer axis (the number of times the signal
changes sign) were extracted.

Pseudo-code:

Subtract the median from the original data:

X = X - median(X)

Y = Y - median(Y)

Z = Z - median(Z)

ZeroCrossing(X) = the number of times X changes sign.

ZeroCrossing(Y) = the number of times Y changes sign.

ZeroCrossing(Z) = the number of times Z changes sign.

5. Accelerometer axis cross-correlations (corrxy, corrxz, corryz) were calculated.

corrxy =∑ �X(i)− mean(X)� ∗ �Y(i) − mean(Y)�window_sizei=1

�∑ �X(i)− mean(X)�2window_sizei=1 ∗ ∑ �Y(i) −mean(Y)�2window_size

i=1

corrxz =∑ �X(i) − mean(X)� ∗ �Z(i) −mean(Z)�window_sizei=1

�∑ �X(i) − mean(X)�2window_sizei=1 ∗ ∑ �Z(i) − mean(Z)�2window_size

i=1

corryz =∑ �Y(i) −mean(Y)� ∗ �Z(i) − mean(Z)�window_sizei=1

�∑ �Y(i)− mean(Y)�2window_sizei=1 ∗ ∑ �Z(i) − mean(Z)�2window_size

i=1
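
The energy, principal-frequency, zero-crossing and cross-correlation features in items 2 to 5 can be sketched for a single window as follows. This is an illustrative sketch using only the Python standard library, not the code used in the study. The function names are hypothetical, the naive DFT stands in for an FFT routine, and two details are assumptions that the text does not settle: the DC bin is skipped when searching for the principal frequency, and samples exactly equal to the median are ignored when counting sign changes.

```python
import cmath
import math
from statistics import mean, median
from typing import List, Sequence, Tuple


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def energy(x: Sequence[float]) -> float:
    """Sum of squared DFT magnitudes divided by the window length."""
    return sum(m * m for m in dft_magnitudes(x)) / len(x)


def principal_frequency(x: Sequence[float], sample_rate_hz: float) -> Tuple[float, float]:
    """Frequency (Hz) with the largest DFT magnitude, and that magnitude.
    Assumption: the DC bin (k = 0) is excluded from the search."""
    n = len(x)
    mags = dft_magnitudes(x)
    k_best = max(range(1, n // 2 + 1), key=lambda k: mags[k])
    return k_best * sample_rate_hz / n, mags[k_best]


def zero_crossings(x: Sequence[float]) -> int:
    """Number of sign changes after subtracting the median.
    Assumption: samples exactly at the median are ignored."""
    med = median(x)
    signs = [v - med > 0 for v in x if v != med]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two axes of the same window."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


# Toy window: a 2 Hz sine sampled for 1 s at a hypothetical 32 Hz.
fs = 32
x = [math.sin(2 * math.pi * 2 * (t + 0.5) / fs) for t in range(fs)]
y = [-v for v in x]
print(energy(x), principal_frequency(x, fs), zero_crossings(x), correlation(x, y))
```

For real 10 s windows an FFT library would replace `dft_magnitudes`; the feature definitions themselves would stay the same.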

Appendix B


The selected features for each dataset (those common to all folds) are provided in
the following table. (√) indicates that the corresponding feature was selected.

Extracted features from a 10 s window    Dataset #1    Dataset #2    Dataset #3

Minimum value of X axis data - min(X) √ √ √

Minimum value of Y axis data - min(Y) √ √ √

Minimum value of Z axis data - min(Z) √ √ √

Maximum value of X axis data - max(X) √ √

Maximum value of Y axis data - max(Y) √ √ √

Maximum value of Z axis data - max(Z) √ √ √

Mean of X axis data - mean(X) √

Mean of Y axis data - mean(Y)

Mean of Z axis data - mean(Z) √

Variance of X axis data - variance(X) √ √

Variance of Y axis data - variance(Y) √ √

Variance of Z axis data - variance(Z) √ √ √

Standard Deviation of X axis data - std(X) √ √ √

Standard Deviation of Y axis data - std(Y) √ √ √

Standard Deviation of Z axis data - std(Z) √ √ √

Skewness of X axis data - skewness(X) √

Skewness of Y axis data - skewness(Y)

Skewness of Z axis data - skewness(Z)

Kurtosis of X axis data - kurtosis(X) √

Kurtosis of Y axis data - kurtosis(Y) √

Kurtosis of Z axis data - kurtosis(Z) √

Median of X axis data - median(X) √

Median of Y axis data - median(Y)

Median of Z axis data - median(Z) √

25th Percentile of X axis data - percentile25(X) √

75th Percentile of X axis data - percentile75(X) √ √ √

25th Percentile of Y axis data - percentile25(Y) √ √

75th Percentile of Y axis data - percentile75(Y) √

25th Percentile of Z axis data - percentile25(Z) √ √ √

75th Percentile of Z axis data - percentile75(Z) √

Correlation of X and Y axis data - corr_axis(XY)

Correlation of X and Z axis data - corr_axis(XZ)

Correlation of Y and Z axis data - corr_axis(YZ)

Zero-crossing in X axis data - zerocross(X) √ √

Zero-crossing in Y axis data - zerocross(Y) √

Zero-crossing in Z axis data - zerocross(Z) √ √

Energy of X axis data - energy(X) √ √ √

Energy of Y axis data - energy(Y) √ √

Energy of Z axis data - energy(Z)

Dominant Frequency of X axis data - dominantFr(X) √ √

Dominant Frequency of Y axis data - dominantFr(Y) √

Dominant Frequency of Z axis data - dominantFr(Z) √

Magnitude of Dominant Frequency in X axis data - dominantFrMag(X) √

Magnitude of Dominant Frequency in Y axis data - dominantFrMag(Y) √ √

Magnitude of Dominant Frequency in Z axis data - dominantFrMag(Z) √ √

Appendix C


Multiple classification algorithms.

Binary Decision Tree (BDT) uses a hierarchical approach to build a decision
tree from the training dataset. It starts with a single node (the root) and
grows the tree by splitting the data on a feature at the cut point that
maximises impurity reduction. Each resulting group is further divided into
smaller groups using the same splitting criterion. Splitting continues until
a stop condition is reached. During testing, the resulting decision tree is
used for classification. In the current study, the maximum number of decision
splits was empirically set to 20.

k Nearest Neighbours (kNN) is one of the simplest machine learning algorithms
and is widely used as a benchmark for learning rules. To classify an object (a
window of test data), kNN first identifies the k nearest neighbours of that
object, regardless of class. It then returns the most common class among those
k nearest neighbours as the predicted class. kNN is often computationally slow
during testing because it must search for the nearest neighbours of each
sample. In the present study, the value of k was set to 7.
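
The kNN decision rule described above can be sketched as follows. This is a minimal illustration with k = 7, not the implementation used in the study. The Euclidean distance, the function name `knn_predict` and the two-dimensional toy feature vectors are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, activity label)


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int = 7) -> str:
    """Return the majority class among the k training windows closest to `query`."""
    # Sort all training samples by Euclidean distance to the query (assumed metric).
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two activity classes.
train: List[Sample] = [
    ([0.0, 0.0], "sitting"), ([0.1, 0.2], "sitting"), ([0.2, 0.1], "sitting"),
    ([0.3, 0.0], "sitting"), ([0.0, 0.3], "sitting"),
    ([5.0, 5.0], "walking"), ([5.1, 4.9], "walking"), ([4.9, 5.2], "walking"),
    ([5.2, 5.1], "walking"), ([4.8, 5.0], "walking"),
]
print(knn_predict(train, [0.2, 0.2]))
print(knn_predict(train, [5.0, 5.1]))
```

The full sort over the training set for every query is what makes kNN slow at test time; spatial indexes such as k-d trees are a common remedy.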

Support Vector Machine (SVM) classifies data by finding the best separator
between two classes. In this study, to enable multi-class classification, the
two-class SVM was adapted so that it first classifies one class against all
other classes, then another class versus the remaining classes, and so on. A
linear function was chosen as the kernel function. The box constraint
parameter was set to 1. The box constraint helps to prevent overfitting
(regularisation) by controlling the maximum penalty imposed on
margin-violating observations.
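
The multi-class adaptation described above reads as a one-versus-rest scheme. The prediction step of such a scheme can be sketched as follows, under stated assumptions: each class has its own linear decision function, and the class with the largest decision value wins. The arg-max combination rule, the function names and the hand-picked weights are all hypothetical; in practice each (weights, bias) pair would come from training a binary linear SVM on one class versus the rest.

```python
from typing import Dict, Sequence, Tuple

LinearModel = Tuple[Sequence[float], float]  # (weights, bias) of one binary classifier


def decision_value(model: LinearModel, x: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b."""
    weights, bias = model
    return sum(w * v for w, v in zip(weights, x)) + bias


def one_vs_rest_predict(models: Dict[str, LinearModel], x: Sequence[float]) -> str:
    """Pick the class whose 'class versus rest' classifier gives the largest score."""
    return max(models, key=lambda label: decision_value(models[label], x))


# Hypothetical decision functions for three activity classes (not trained values).
models: Dict[str, LinearModel] = {
    "lying":   ([-1.0, 0.0], 1.0),
    "walking": ([1.0, -1.0], 0.0),
    "running": ([0.0, 1.0], -1.0),
}
print(one_vs_rest_predict(models, [0.0, 0.0]))
print(one_vs_rest_predict(models, [3.0, 0.5]))
print(one_vs_rest_predict(models, [1.0, 4.0]))
```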

Artificial Neural Network (ANN) is a widely used machine learning algorithm
for modelling non-linear relationships between a set of inputs and an output.
The network consists of inter-connected artificial neurons arranged in layers:
an input layer, one or more hidden layers and an output layer. Each neuron
applies an activation function (logistic or linear) to the weighted sum of its
inputs to produce an output. At the first iteration, the weights are random,
and a cost function such as the root mean squared error measures the error
between the network's output and the target output. A backpropagation
algorithm then optimises the weights on the training dataset until the network
learns to map inputs to outputs (optimum root mean squared error or maximum
number of iterations reached) (27, 35). In this work, the number of input and
output neurons varied depending on the dataset, with the number of input
neurons and output neurons equalling the number of selected features and the
number of activity classes, respectively. Fifty neurons were used in the
hidden layer. The maximum number of iterations (epochs) and the learning rate
were set to 250 and 0.001, respectively.
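
The forward pass of such a network, with randomly initialised weights as at the first iteration, can be sketched as follows. This is an illustration only: backpropagation training is not shown, the logistic activation is used in both layers, and the number of input features (12), the weight range and the function names are assumptions. The hidden-layer size of 50 follows the text, and the eight output neurons correspond to the eight activity classes of dataset #1.

```python
import math
import random
from typing import List, Sequence


def logistic(z: float) -> float:
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """Random initial weights; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(weights: List[List[float]], inputs: Sequence[float]) -> List[float]:
    """Each neuron applies the activation to the weighted sum of its inputs."""
    extended = list(inputs) + [1.0]  # constant input that multiplies the bias weight
    return [logistic(sum(w * v for w, v in zip(row, extended))) for row in weights]


def forward(hidden_w: List[List[float]], output_w: List[List[float]],
            features: Sequence[float]) -> List[float]:
    """Input layer -> one hidden layer -> output layer."""
    return layer_forward(output_w, layer_forward(hidden_w, features))


rng = random.Random(0)
n_features, n_hidden, n_classes = 12, 50, 8  # 12 input features is a hypothetical count
hidden_w = init_layer(n_features, n_hidden, rng)
output_w = init_layer(n_hidden, n_classes, rng)
outputs = forward(hidden_w, output_w, [0.1] * n_features)
predicted_class = max(range(n_classes), key=lambda c: outputs[c])
print(predicted_class, outputs)
```

Training would repeatedly compare `outputs` with the target vector and adjust both weight matrices by backpropagation until the error stops improving or the epoch limit is reached.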

Appendix D

SUPPLEMENTARY DIGITAL CONTENT 4 (SDC 4)

Confusion Matrix

Table 1 (SDC4). Confusion matrix of the ensemble methods in the dataset #1

Method      1    2    3    4    5    6    7    8

1 Lying

RF          271  73   4    0    0    1    0    1
Bagging     231  86   31   0    0    1    0    1
Adaboost    305  29   10   0    0    3    0    3
WMV         317  31   1    0    0    0    0    1
NB          314  33   1    0    0    0    0    2
BKS         305  42   2    0    0    0    0    1

2 Sitting


3 Standing


4 Walking


5 Running


6 Cycling


7 Ascending Stairs


8 Descending Stairs



Method      1    2    3    4    5

1 Stationary (sit and stand)

RF          431  18   4    1    0
Bagging     437  13   4    0    0
Adaboost    418  29   6    1    0
WMV         428  25   1    0    0
NB          418  34   2    0    0
BKS         424  21   9    0    0

2 Comfortable Walking


3 Fast Walking


4 Jogging


5 Running



Method      1    2    3    4    5    6    7

1 Lying down

RF          207  44   13   2    1    0    0
Bagging     206  47   9    3    1    0    1
Adaboost    185  66   11   2    2    0    1
WMV         198  48   19   1    1    0    0
NB          204  44   17   1    1    0    0
BKS         207  49   9    1    1    0    0

2 Sitting+


3 Standing+


4 Walking


5 Running


6 Basketball


7 Dance

RF          4    0    3    7    47   1    242
Bagging     4    0    7    4    54   3    232
Adaboost    4    0    16   7    33   0    244
WMV         2    6    6    7    40   0    243
NB          2    0    5    7    64   0    226
BKS         2    0    5    7    59   0    231

Classification Accuracy

The classification accuracy was calculated using the following equation:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

where TP, TN, FP and FN are the numbers of true positives, true negatives,
false positives and false negatives, respectively.
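
As a worked illustration of this equation, the sketch below derives TP, TN, FP and FN for one class from a multi-class confusion matrix by treating that class against all others. It assumes that rows hold the true class and columns the predicted class. The matrix values and the function name `per_class_accuracy` are hypothetical and are not taken from the tables in this appendix.

```python
from typing import List


def per_class_accuracy(confusion: List[List[int]], cls: int) -> float:
    """Accuracy of class `cls` treated one-vs-rest: (TP + TN) / (TP + TN + FP + FN)."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]                        # true cls, predicted cls
    fn = sum(confusion[cls]) - tp                   # true cls, predicted another class
    fp = sum(row[cls] for row in confusion) - tp    # true other class, predicted cls
    tn = total - tp - fn - fp                       # everything not involving cls
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
accuracies = [per_class_accuracy(confusion, c) for c in range(3)]
print(accuracies)
```

For class 0 in this toy matrix, TP = 8, FN = 2, FP = 1 and TN = 19, giving an accuracy of 27/30 = 0.9.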

Table 4 (SDC4). Classification results (Accuracy) using wrist acceleration sensor of dataset #1


Columns: Random Forest; Bagging Decision Tree; Boosted Decision Tree; BDT; KNN;
SVM; ANN; WMV Fusion; NB Fusion; BKS Fusion

Lying 94 92.26 96.73 92.52 96.06 97.81 97.22 97.76 97.4 95.7

Sitting 92.61 91.72 94 90.47 93.78 95.7 94.58 95.61 95.7 93.87

Standing 96.24 94.14 94.23 91.81 95.43 95.57 95.52 96.55 96.51 96.24

Walking 93.96 94.85 95.48 89.26 91.09 94.72 93.6 94.85 93.87 92.84

Running 100 99.87 99.96 97.99 99.96 99.42 99.06 99.87 99.87 99.87

Cycling 98.88 98.21 98.3 98.16 98.84 98.93 96.69 99.1 99.15 98.97

Ascending Stairs 93.46 93.11 93.11 89.7 89.75 92.88 93.78 93.96 93.29 92.39

Descending Stairs 96.37 97.27 95.26 96.02 96.33 97.14 96.6 97.4 97.72 97.18

Average 95.69 95.18 95.88 93.24 95.15 96.52 95.88 96.89 96.69 95.88



Columns: Random Forest; Bagging Decision Tree; Boosted Decision Tree; BDT; KNN;
SVM; ANN; WMV Fusion; NB Fusion; BKS Fusion

Stationary (sit and stand) 97.22 97.89 96.72 95.44 95.33 96.16 94.72 97 96.27 96.11

Comfortable Walking 85.6 82.93 82.26 81.87 78.75 82.2 77.42 84.59 83.09 80.81

Fast Walking 87.32 84.09 83.82 84.59 80.7 86.15 77.25 86.37 85.65 82.93

Jogging 94.99 94.66 92.55 94.16 92.38 91.82 90.38 93.27 93.1 93.44

Running 95.38 95.16 93.72 94.88 93.99 93.6 93.05 94.61 94.49 94.33

Average 92.1 90.95 89.81 90.19 88.23 89.99 86.56 91.17 90.52 89.52



Columns: Random Forest; Bagging Decision Tree; Boosted Decision Tree; BDT; KNN;
SVM; ANN; WMV Fusion; NB Fusion; BKS Fusion

Lying down 96.96 96.73 96.39 94.8 95.4 96.5 95.91 96.87 96.33 95.45

Sitting+ 97.36 96.67 96.53 95.25 95.65 97.07 96.62 96.84 96.84 95.71

Standing+ 93.26 92.5 91.56 89.74 91.81 92.95 92.33 93.66 93.83 92.69

Walking 97.64 96.53 96.56 95.94 97.36 97.38 96.42 97.61 97.58 96.65

Running 95.71 94.4 95.51 93.35 94.86 94.91 94.63 96.02 95.28 94.31

Basketball 97.33 97.1 98.07 95.71 97.73 98.04 98.07 98.38 98.38 97.64

Dance 97.47 96.62 97.47 95.71 96.79 96.87 97.21 97.7 97.13 96.96

Average 96.53 95.79 96.01 94.36 95.65 96.25 95.88 96.73 96.48 95.63

Appendix E

SOURCE CODE REPOSITORIES

We have made our code and features available on GitHub so that researchers can
use our algorithms and models for benchmarking or evaluate them on their own
data. The links to the repositories are given in the table below:

Table 1. Links of source code repositories

Chapter    Link of repository

Chapter 3: Decision Fused Ensembles for Effective Classification of Physical
Activities from Wrist-Worn Accelerometer Data
https://github.com/alokchy04/Decision-Fused-Ensembles-for-PA-Classification-from-Wrist-Worn-Accelerometer

Chapter 4: Physical Activity Recognition using Posterior-adapted Class-based
Fusion of Multi-Accelerometers Data
https://github.com/alokchy04/Physical-Activity-Recognition-using-Posterior-adapted-Class-based-Fusion-of-Multi-Accelerometer-Data



Appendix F

NUMBER OF DATA POINTS/ SAMPLES OF EXTRACTED FEATURES

In this appendix, we provide the number of data points per user for all of our
datasets. Because we used leave-one-subject-out cross-validation, the
user-wise counts indicate the size of the training and testing data in each
fold.
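
The link between user-wise counts and fold sizes can be sketched as follows. This is a minimal illustration of leave-one-subject-out splitting, not the code used in the study: the function name `leave_one_subject_out` is hypothetical, and the counts are those listed for Dataset #1 in Table 1 of this appendix.

```python
from typing import Iterator, List, Tuple


def leave_one_subject_out(subject_ids: List[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# User-wise data-point counts for Dataset #1 (Table 1).
counts = {1: 295, 2: 297, 3: 212, 4: 277, 5: 318, 6: 284, 7: 262, 8: 289}
# One subject label per feature window.
subject_ids = [user for user, n in counts.items() for _ in range(n)]

fold_sizes = {user: (len(train), len(test))
              for user, train, test in leave_one_subject_out(subject_ids)}
print(fold_sizes)
```

In each fold, all windows from one user form the test set and the windows from the remaining users form the training set, so no subject appears on both sides of a split.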

Table 1: Number of data points for Dataset #1 used in the study of Chapter 3

User    # of data points
1       295
2       297
3       212
4       277
5       318
6       284
7       262
8       289
Total   2234



User    # of data points
1       225
2       225
3       225
4       225
5       225
6       225
7       225
8       225
Total   1800



User    # of data points
1       228
2       228
3       228
4       114
5       114
6       114
7       228
8       230
9       228
10      209
11      228
12      228
13      229
14      228
15      228
16      228
17      228
Total   3518

Table 4: Number of data points for PAMAP2 dataset used in the study of Chapter 4


User    # of data points
1       756
2       759
3       545
4       708
5       809
6       727
7       670
8       740
Total   5714

Table 5: Number of data points for MHEALTH dataset used in the study of Chapter 4


User    # of data points
1       248
2       248
3       248
4       248
5       248
6       248
7       248
8       248
9       248
10      248
Total   2480

Table 6: Number of data points for the dataset used in the Chapter 5 and 6 studies


User    # of data points
1       24
2       35
3       28
4       34
5       30
6       33
7       31
8       30
9       35
10      26
11      35
12      35
13      27
14      35
15      33
16      35
17      29
18      5
19      35
20      10
21      10
Total   615

Table 7: Number of data points for the dataset used in the study of Chapter 7


User    # of data points
1       284
2       276
3       312
4       351
5       306
6       313
7       313
8       317
Total   2472
