FAULT DETECTION IN MOBILE ROBOTICS USING AUTOENCODER …

Malardalen University, School of Innovation Design and Engineering

Vasteras, Sweden

Thesis for the Degree of Master of Science - Computer Science with Specialization in Embedded Systems, 30.0 credits

FAULT DETECTION IN MOBILE ROBOTICS USING AUTOENCODER AND MAHALANOBIS DISTANCE

Christian Mortensen

Examiner: Thomas Nolte, Malardalen University, Vasteras, Sweden

Supervisor: Nandinbaatar Tsog, Malardalen University, Vasteras, Sweden

Company Supervisor: Jonatan Blom, ABB Corporate Research, Vasteras, Sweden

June 10, 2021

Christian Mortensen Fault Detection in Mobile Robotics

Abstract

Intelligent fault detection systems using machine learning can learn to spot anomalies in signals sampled directly from machinery. As a result, expensive repairs due to mechanical breakdowns and potential harm to humans due to malfunctioning equipment can be prevented. In recent years, Autoencoders have been applied for fault detection in areas such as industrial manufacturing. They have been shown to be well suited for the purpose, as such models can learn to recognize healthy signals, which facilitates the detection of anomalies. This thesis investigates the applicability of Autoencoders for fault detection in mobile robotics by assigning anomaly scores to sampled torque signals based on the Autoencoder reconstruction errors and the Mahalanobis distance to a known distribution of healthy errors. An experiment was carried out by training a model with signals recorded from a four-wheeled mobile robot executing a pre-defined diagnostics routine to stress the motors, and datasets of healthy samples along with three different injected faults were created. The model produced overall greater anomaly scores for one of the fault cases in comparison to the healthy data. However, the two other cases did not yield any difference in anomaly scores, because the faults did not impact the pattern of the signals. Additionally, the Autoencoder's ability to isolate a fault to a location was studied by examining the reconstruction errors of faulty samples to determine whether the errors of signals originating from the faulty component could be used for this purpose. Although we could not confirm this based on the results, fault isolation with Autoencoders could still be possible given more representative signals.


Table of Contents

1 Introduction
    1.1 Problem Formulation
    1.2 Research Contributions
    1.3 Thesis Outline
2 Background
    2.1 Model-based Fault Detection
    2.2 Data-driven Fault Detection
    2.3 Autoencoders
    2.4 Anomaly Scoring
        2.4.1 Distance Metrics
3 Related Work
4 Research Methodology
5 System Model
6 Autoencoder-Based Fault Detection
    6.1 Fault Detection
        6.1.1 Autoencoder
        6.1.2 Anomaly Scoring
        6.1.3 Fault Threshold
    6.2 Fault Isolation
7 Experiments
    7.1 Industrial Use Case
        7.1.1 Diagnostics Routine
        7.1.2 Fault Injections
        7.1.3 Data Collection
    7.2 Synthetic Data
    7.3 Data Pre-Processing
    7.4 Model Training
8 Results
    8.1 Fault Detection
    8.2 Fault Isolation
9 Discussion
10 Conclusions
    10.1 Future Work
References
Appendix A Industrial Use Case Results
Appendix B Synthetic Data Results


List of Acronyms

ANN Artificial Neural Network
CNN Convolutional Neural Network
IMU Inertial Measurement Unit
k-NN k-Nearest Neighbor
LSR Latent Space Representation
LSTM Long Short-Term Memory
MCD Minimum Covariance Determinant
MLP Multilayer Perceptron
PCA Principal Component Analysis
ROS Robot Operating System
SGD Stochastic Gradient Descent
SVM Support Vector Machine
UAV Unmanned Aerial Vehicle


1. Introduction

As the complexity of modern technologies such as robots and automotive vehicles increases, so does the need for robust and intelligent monitoring strategies to automatically detect faults during operation. Due to the large amount of sensor data constantly flowing through such systems, it is not feasible for a human diagnostician to analyze these signals and detect anomalies manually. This is problematic, as undetected mechanical faults can lead to damage to the equipment or harm to human operators, among other things [1, 2]. Model-based methods are widely used in industry to detect faults in mechanical components such as motors, by mathematically modelling the expected response to a given input [3]. These methods rely on a priori knowledge of the system to correctly model the expected behaviour, and have difficulty accounting for noise caused by external factors such as changes in the operating environment [3].

Fault detection using machine learning techniques can alleviate the issues that come with model-based approaches: instead of relying on hand-crafted mathematical models, a machine learning model can be trained to detect faults from recorded sensor data [4, 5]. The traditional way of applying machine learning in the fault detection domain is to extract descriptive features in either the time, frequency, or time-frequency domain using signal processing techniques [3, 6]. Still, the data may contain useless information which will affect the stability and accuracy of the fault detection. To improve the quality of the extracted data, further techniques are commonly applied to select the most relevant features; a common approach is to use principal component analysis (PCA) to reduce the dimensionality of the data [7]. Fault detection can then be done by training a machine learning model, e.g. a Support Vector Machine (SVM) or k-Nearest Neighbor (k-NN), on the pre-processed data [6]. While these approaches have proved to yield very good results when used for fault detection [7, 8], they require expert knowledge about signal processing to design a feature extraction and selection pipeline that will produce descriptive features [6, 9, 10]. In addition, traditional models do not generalize well over multiple applications, often requiring a redesign of the pre-processing pipeline to migrate a model to another use case [6].

Due to the recent advances in machine learning using artificial neural networks (ANNs), a large part of the current research in the domain is focused on the application of such techniques [2, 3, 4, 10, 11, 12]. In particular, a special type of neural network known as the Autoencoder has attracted interest for fault detection applications. Autoencoders can extract relevant features directly from raw sensor signals without complex pre-processing, leading to a greater capability for a single model to generalize over multiple different faults [2, 4, 12, 13]. They are also trained in an unsupervised fashion, where they learn to reduce training samples into intermediate representations which facilitate the reconstruction back to signals that are close to the original samples [13]. Autoencoders are well suited for anomaly detection applications, as a representation of fault-free sensor data can be learned; the model will consequently yield poor reconstructions of faulty data, provided that the signals differ enough from the healthy ones.

Previous research on fault detection techniques using Autoencoders has mainly targeted industrial uses in areas such as manufacturing equipment or wind turbines, but applications for these techniques can also be found in other areas such as robotics. Mobile robots operate in dynamic environments and can as such benefit greatly from these techniques, as the model can be trained to extract valuable features from many different situations. Additionally, sensor data can be gathered over longer periods of time and be used to update the weights of the model for better predictive power.


The goal of this thesis is to evaluate the usage of Autoencoders and the Mahalanobis distance metric for fault detection in a four-wheeled mobile robot. The work is carried out in collaboration with ABB Corporate Research in Vasteras, Sweden. Additionally, the ability of the Autoencoder to distinguish between healthy and faulty variables in a sample is investigated for fault isolation purposes.

1.1. Problem Formulation

Intelligent fault monitoring systems can be used to analyze sensor signals from machinery for early detection of faults, and as a result can prevent costly repairs. Data-driven approaches, which use data recorded from the machinery to teach a model to distinguish between healthy and faulty operating conditions, have previously been used successfully for this purpose. Traditional data-driven approaches typically require manual feature engineering using e.g. signal processing techniques to convert raw signals into a format that contains relevant information for anomaly detection, a process which can be very work intensive. In recent years, Autoencoders have been successfully applied in models for anomaly detection in various forms of machinery, where they can act as automatic feature extractors from raw signals and remove the need for manual signal processing steps. The majority of research on Autoencoder-based methods for fault detection has focused on industrial machinery, with a minority of research being conducted on robotic systems such as UAVs. This thesis aims to investigate how Autoencoders can be applied for fault detection and isolation in the mobile robotics domain, and with this in mind the following research questions have been formulated:

RQ1: How can we apply Autoencoders to detect faults in signals originating from the drive system of a wheeled mobile robot?

RQ2: Can we use Autoencoders to isolate faults to a location based on signals originating from the drive system of a wheeled mobile robot?

1.2. Research Contributions

The main research contribution of this work is a study on the usage of Autoencoders for fault detection in the domain of mobile robotics, which so far has not been widely researched. An experiment is carried out by training an Autoencoder to recognize healthy signals recorded from a mobile robot, which is then evaluated by inputting both healthy and faulty sensor signals to compare the results.

1.3. Thesis Outline

Listed below is an outline of the rest of the thesis, together with brief descriptions of the content of each section:

• Section 2 provides the theoretical background to the work. It provides an overview of intelligent fault detection techniques, and defines the methods used in this thesis.

• Section 3 presents other works that are related to this thesis.

• Section 4 describes the research methodology that was used, from the initial problem description, to experimental design and execution, and finally to result analysis.

• Section 5 describes the assumptions regarding the system, explaining the expected inputs and outputs.

• Section 6 documents the proposed fault detection method, where each individual software component is described.

• Section 7 contains information regarding how the experiments were carried out, including data collection, pre-processing, and experimental parameters.

• Section 8 presents the results of the experiments in table and graph form.

• Section 9 provides an analysis and discussion of the results from Section 8, and tries to answer the research questions.

• Section 10 concludes the report and summarizes the results. Possible future works are also presented here.


2. Background

A number of different strategies can be applied for fault detection in robotics, where the correct choice of method for a given situation depends on many different factors. Faults may be software or hardware related, or even related to how the robot interacts with its environment and other robots [5]. An additional contributing factor to the choice of method is the level of autonomy of the robotic system itself. Mobile robots are complex systems with several different sub-systems that work together to achieve autonomy. Higher level behaviours such as localisation, navigation, and even collaboration between robots may all experience faults that can be diagnosed. The goal of this research is to detect mechanical anomalies, and as such the rest of this section focuses on strategies that can be applied for this purpose.

2.1. Model-based Fault Detection

Model-based methods rely on mathematical models of the system under normal operation, or on models of known faulty behaviours. They depend on a priori knowledge of the system, and discrepancies between the modeled and the actual behaviour are used to detect faults [5]. Analytical models can be applied to individual sub-systems of the robot to estimate their expected output using observers, and to generate a residual as the difference between this value and the actual output [5, 14]. In a fault-free system the residual should be close to zero; if it is not, some error can be considered to have occurred [5]. The expected outputs of sub-systems of the mobile robot, such as the revolutions per minute (RPM) of the motors that drive the wheels, can be easily modeled as functions of time [5]. However, a major flaw with analytical models is the difficulty in accounting for the uncertainties and disturbances which are unavoidable when the model is applied in practice [3, 14].
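The residual check described above can be illustrated with a minimal sketch. The linear RPM model, the injected deviation, and the tolerance of 5 RPM are hypothetical values chosen purely for illustration; they are not taken from the robot studied in this thesis.

```python
import numpy as np

def residual_fault_flags(expected, measured, tolerance):
    """Return the residual and a flag per time step where |residual| > tolerance."""
    residual = np.asarray(measured, dtype=float) - np.asarray(expected, dtype=float)
    return residual, np.abs(residual) > tolerance

t = np.linspace(0.0, 1.0, 5)
expected_rpm = 100.0 * t  # hypothetical analytical model of the motor speed
# Hypothetical measurements: small noise plus one large deviation at the fourth step.
measured_rpm = expected_rpm + np.array([0.5, -0.3, 0.2, 12.0, 0.1])

residual, flags = residual_fault_flags(expected_rpm, measured_rpm, tolerance=5.0)
print(flags)
```

In practice the tolerance has to absorb the disturbances mentioned above, which is exactly where analytical models tend to struggle.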

2.2. Data-driven Fault Detection

Data-driven approaches extract relevant information directly from sampled data, and include machine learning approaches and statistical filtering techniques [5]. A major disadvantage of relying on model-based fault detection is the difficulty in correctly modeling the complex behaviours of a robot, and as such data-driven methods could be more suitable, since the sampled data is used to drive the construction of the model. Data-driven methods typically generalize well over multiple problem areas, and as such the same model could possibly be used to detect faults on many different robots [5]. These methods learn from previous examples of observed normal or faulty behaviours, and can then be applied during run-time to detect or predict faulty behaviours from the data produced online by the robot. Training can be supervised, using labeled data samples where the labels correspond to either normal behaviour or a known fault, or unsupervised, where the model distinguishes between faulty and normal behaviour solely based on the data itself [3]. The majority of recent research on intelligent fault detection concerns unsupervised or semi-supervised methods (where only a small amount of labeled data is provided). This may be due to the difficulty in capturing and labeling all possible fault cases in the recorded data [5, 15] and in acquiring enough training data to train an accurate model [5, 15], in addition to the influence that the knowledge and work of the human expert labeling the data has on the accuracy [12, 15, 16]. Unsupervised fault detection approaches include those that use clustering techniques to separate samples into categories, or anomaly detection to find outliers which can be considered as faults [5].


[Figure 1 diagram: Input, Encoder, Code (LSR), Decoder, Output]

Figure 1: The components of an Autoencoder. Some input is coded by the encoder into a format of a lower dimensionality, which is then reconstructed by the decoder into its original form.

Machine learning models based on Artificial Neural Networks (ANNs) have gained a lot of attention in recent years. An ANN can be considered as a set of processing units referred to as neurons, each with its own set of input and output connections [17, 18]. Neural networks are divided into input and output layers, with one or more hidden layers of neurons in between them. Each neuron is activated by a non-linear activation function of the values produced by the weighted connections from the previous layer, and the resulting value is passed on to the connected neurons in the next layer [18]. The weights of the input connections are updated during training based on how well the model performed, which is expressed with a loss function [18]. Two categories of ANNs exist, referred to as feed-forward and recurrent networks. Feed-forward networks follow the structure described previously and can be seen as directed acyclic graphs, whereas each neuron in a recurrent network has an additional connection to itself [18].

One of the main challenges regarding machine learning techniques is to develop a suitable feature engineering strategy to convert the signals into a format which is workable by a classifier. For this purpose, various signal processing methods are applied to transform the raw signals into the frequency or time-frequency domain. Other statistical techniques such as Principal Component Analysis (PCA) may be applied to extract the most important information from the data while at the same time reducing the dimensionality of the feature vector used by a classifier [19, 20]. An Autoencoder is a type of ANN that can be seen as a generalization of PCA: it first reduces the data to a format of lower dimension, and then reconstructs the intermediate format into a representation that is close to the original [21]. The intermediate data of the Autoencoder can then be used in a similar way as the principal components in PCA, but has some interesting characteristics which may give it an edge for fault detection in multivariate data.

2.3. Autoencoders

An Autoencoder is an ANN consisting of two main components: an encoder that converts the input data into a format with lower dimensionality by learning its most distinguishing features, and a decoder which reconstructs the encoded data with as small a difference from the original representation as possible [13]. The code produced by the encoder is referred to as the Latent Space Representation (LSR), and due to its reduced number of neurons in comparison to the input layer, it should contain only the most representative features of the input [12, 13]. Figure 1 shows the structure of an Autoencoder on a high level.

Figure 2: An Autoencoder consisting of a single hidden layer with the neurons $h_1, h_2, \dots, h_k$. Each neuron is fully connected to all neurons in its preceding layer, as well as to an additional bias neuron. The influence of the connections is determined by the weight matrices $W_1$ and $W_2$, as well as the bias vectors $b_1$ and $b_2$.

Autoencoders are trained in an unsupervised fashion, i.e. with unlabeled data. To get a separation between anomalies and nominal data, we can train an Autoencoder to reconstruct healthy signals, and then consider an input as faulty when the difference between the reconstruction and the input is above a certain threshold [3].

For a formal definition of an Autoencoder, we can consider the function $f_\theta(x) = h$ as the encoder mapping the input $x = \{x_1, x_2, \dots, x_n\}$ into a feature vector $h = \{h_1, h_2, \dots, h_k\}$, i.e. the LSR [12, 22]. We then define the decoder as a function of the LSR, $g_\theta(h) = \hat{x}$, mapping from the feature space back to the input space and producing a reconstruction $\hat{x} = \{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n\}$ of the input such that $\hat{x} \approx x$ [22].

$$h = f_\theta(x) = f(W_1 x + b_1) \tag{1}$$

$$\hat{x} = g_\theta(h) = g(W_2 h + b_2) \tag{2}$$

Equations 1 and 2 show the mathematical definitions of the encoder and decoder respectively. The functions $f(\cdot)$ and $g(\cdot)$ are typically non-linear, such as the sigmoid or hyperbolic tangent functions. Both functions are parameterized by $\theta = \{W_1, W_2, b_1, b_2\}$, containing the respective weights and biases of the links between the layers. The weight matrices $W_1$ and $W_2$ determine the contribution of the outputs of the previous layer in activating the neurons, whereas the bias vectors $b_1$ and $b_2$ determine contributions from an additional dummy neuron which is always fully activated (in other words, it is always equal to 1) [18]. These parameters can be seen in Figure 2 as the weights of the links between the layers.
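As an illustration of Equations 1 and 2, the following minimal NumPy sketch evaluates a single-hidden-layer Autoencoder. The layer sizes (n = 6, k = 2), the random initialization, and the choice of the sigmoid for both f and g are assumptions made for the example only, not the configuration used in this thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    """Equation 1: h = f(W1 x + b1), with f chosen as the sigmoid."""
    return sigmoid(W1 @ x + b1)

def decode(h, W2, b2):
    """Equation 2: x_hat = g(W2 h + b2), with g chosen as the sigmoid."""
    return sigmoid(W2 @ h + b2)

rng = np.random.default_rng(0)
n, k = 6, 2  # hypothetical input and latent dimensions
W1, b1 = rng.normal(scale=0.1, size=(k, n)), np.zeros(k)
W2, b2 = rng.normal(scale=0.1, size=(n, k)), np.zeros(n)

x = rng.uniform(size=n)   # stand-in for one input sample
h = encode(x, W1, b1)     # latent space representation (LSR)
x_hat = decode(h, W2, b2) # reconstruction of the input
print(h.shape, x_hat.shape)
```

With untrained weights the reconstruction is of course poor; the parameters only become useful once they have been fitted to data.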

In the training phase, the parameters are initially randomized and then updated simultaneously through several iterations to minimize some loss (cost) function. In the most basic Autoencoder, the loss for a training sample, $L(x, \hat{x})$, is equal to the squared reconstruction error [22], as shown in Equation 3.

$$L(x, \hat{x}) = \lVert x - \hat{x} \rVert^2 \tag{3}$$


For a training dataset with N examples, the goal is to find an optimal set of parameters that minimizes the mean loss over all samples, see Equation 4. A common optimization strategy is Stochastic Gradient Descent (SGD) [22].

$$\theta' = \underset{\theta}{\arg\min} \; \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_\theta(f_\theta(x_i))\big) \tag{4}$$
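To make the optimization in Equation 4 concrete, the sketch below trains a small Autoencoder with plain per-sample SGD on the squared reconstruction error of Equation 3. Everything specific here is an assumption for illustration: the synthetic rank-2 data standing in for healthy signals, the tanh encoder, the identity (linear) decoder, the learning rate, and the number of epochs.

```python
import numpy as np

def forward(x, params):
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)  # Equation 1 with f = tanh
    x_hat = W2 @ h + b2       # Equation 2 with g = identity (assumption)
    return h, x_hat

def mean_loss(X, params):
    """Objective of Equation 4: mean squared reconstruction error."""
    return float(np.mean([np.sum((x - forward(x, params)[1]) ** 2) for x in X]))

def sgd_epoch(X, params, lr, rng):
    """One pass over the data, updating the parameters after every sample."""
    W1, b1, W2, b2 = params
    for i in rng.permutation(len(X)):
        x = X[i]
        h, x_hat = forward(x, (W1, b1, W2, b2))
        d_out = 2.0 * (x_hat - x)                # gradient of Equation 3 w.r.t. x_hat
        d_hid = (W2.T @ d_out) * (1.0 - h ** 2)  # back-propagated through tanh
        W2 -= lr * np.outer(d_out, h)
        b2 -= lr * d_out
        W1 -= lr * np.outer(d_hid, x)
        b1 -= lr * d_hid
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
n, k, N = 4, 2, 200
latent = rng.uniform(-1.0, 1.0, size=(N, k))
X = latent @ rng.normal(scale=0.5, size=(k, n))  # synthetic stand-in for healthy signals

params = (rng.normal(scale=0.1, size=(k, n)), np.zeros(k),
          rng.normal(scale=0.1, size=(n, k)), np.zeros(n))
loss_before = mean_loss(X, params)
for _ in range(50):
    params = sgd_epoch(X, params, lr=0.01, rng=rng)
loss_after = mean_loss(X, params)
print(loss_before, loss_after)
```

Deep learning frameworks compute these gradients automatically; they are written out by hand here only to show what a single SGD update does.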

When designing an Autoencoder, it is important to consider some strategy for preventing overfitting, as a generalizable model needs to produce a low reconstruction error on both the training and testing samples [22]. Reducing the size of the latent space sufficiently is one way to do this, as the encoder will be forced to discard information and keep only coarse-grained features. However, reducing it too much will likely negatively impact the accuracy of the model, and in these cases a better option may be to use regularized Autoencoders, which add mechanisms to the training process to increase the generalizability of the model [22]. A common regularization technique is to enforce a sparsity constraint by adding a term to the optimization objective that penalizes the weights of the hidden neurons, which prevents certain neurons from firing [22]. Such techniques make it possible to increase the size of the bottleneck between the encoder and the decoder, and may even allow for Autoencoders with an LSR of higher dimensionality than the input data, as the constraint prevents the model from simply duplicating the inputs [22]. L1 regularization can be used for this purpose, where the penalty term is equal to the sum of the absolute values of the weights multiplied by some regularization factor λ [22, 23].
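A minimal sketch of an L1-penalized objective is given below. It adds λ times the sum of absolute weight values to the squared reconstruction error of Equation 3. The function name, the example weight matrix, and λ = 0.1 are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def l1_penalized_loss(x, x_hat, weight_matrices, lam):
    """Squared reconstruction error plus an L1 penalty on the given weights."""
    reconstruction = np.sum((np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2)
    penalty = lam * sum(np.sum(np.abs(W)) for W in weight_matrices)
    return float(reconstruction + penalty)

x = np.array([1.0, 2.0])
x_hat = np.array([1.0, 1.0])               # reconstruction error: 1.0
W = np.array([[1.0, -2.0], [0.0, 3.0]])    # sum of absolute values: 6.0
loss = l1_penalized_loss(x, x_hat, [W], lam=0.1)
print(loss)
```

A larger λ pushes more weights towards zero, trading reconstruction accuracy for sparsity.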

2.4. Anomaly Scoring

Fault detection entails separating faulty from healthy samples, and there exist several methods for achieving this by having an Autoencoder act as a feature extractor. Clustering approaches such as k-means or k-NN can be applied on the LSR to group samples based on some similarity metric [12]. In this work, the reconstruction errors from an Autoencoder were used, and similar approaches have previously been applied by several authors with very promising results [10, 20, 24]. The intuition is to train an Autoencoder to model the normal operating behaviour of the robot, so that erroneous signals will be reconstructed with large errors more frequently than healthy signals. Consequently, a sample can be marked as faulty if its distance to some distribution of healthy samples is greater than a set threshold, and the distance can be used as a metric of how anomalous the sample is, i.e. an anomaly score.

2.4.1. Distance Metrics

To find anomalies, a reliable distance metric is needed to score the samples based on their positions in relation to a known distribution of healthy samples. We can consider a sample to be scored as a vector $e = \{e_1, e_2, \dots, e_p\}$ with $p$ elements, each being the reconstruction error for some observation (i.e., sensor measurement), computed from the observations $o$ and their reconstructions $\hat{o}$ as in Equation 5. For one-dimensional non-time-series inputs, $o$ and $\hat{o}$ are typically equal to the full inputs and outputs of the Autoencoder, but in the case of two-dimensional time-series data the size is equal to the number of different signals.

$$e = o - \hat{o} \tag{5}$$

[Figure 3 panels: (a) Euclidean distance. (b) Mahalanobis distance.]

Figure 3: A comparison between an anomaly score based on the Euclidean distance (left) and the Mahalanobis distance (right). With the Euclidean distance, the two purple samples are given the same score despite one of them being closer to the cluster, whereas the score varies for the Mahalanobis distance.

The simplest metric that we can use to score the sample is its Euclidean distance to the mean of a known distribution of reconstruction errors from healthy samples. This metric can be seen in Equation 6, denoted as $ED(\cdot)$, where $\mu \in \mathbb{R}^p$ is the center of the distribution, in other words the vector of mean values from the samples.

$$ED(e) = \lVert e - \mu \rVert \tag{6}$$

In cases where the variables are uncorrelated, the Euclidean distance may be a useful metric to find outliers. However, in cases where some correlation between the variables can be observed, it will not provide useful information in terms of the distance to the cluster, as can be seen in Figure 3a. The Mahalanobis distance, defined in Equation 7, is a measure better suited for multivariate data with correlations, as it considers the distribution of the points by incorporating the covariance matrix Σ [25]. Figure 3b shows how the Mahalanobis distance provides a better estimate of the anomalousness of a sample, as points which are further away from the cluster are scored higher than those closer to it.

$$MD(e) = \sqrt{(e - \mu)\, \Sigma^{-1} (e - \mu)^{T}} \tag{7}$$
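The difference between Equations 6 and 7 can be checked numerically with a small sketch. The mean, the covariance matrix with correlation 0.9, and the two test points are hypothetical values chosen so that both points have the same Euclidean distance to the center; in practice µ and Σ would be estimated from healthy reconstruction errors, e.g. with `np.mean(errors, axis=0)` and `np.cov(errors, rowvar=False)`.

```python
import numpy as np

def euclidean_score(e, mu):
    """Equation 6: Euclidean distance to the center of the distribution."""
    return float(np.linalg.norm(np.asarray(e, dtype=float) - mu))

def mahalanobis_score(e, mu, cov):
    """Equation 7: distance scaled by the inverse covariance matrix."""
    diff = np.asarray(e, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])  # hypothetical strongly correlated error distribution

a = [1.0, 1.0]    # lies along the direction of correlation
b = [1.0, -1.0]   # lies across the direction of correlation

ed_a, ed_b = euclidean_score(a, mu), euclidean_score(b, mu)
md_a, md_b = mahalanobis_score(a, mu, cov), mahalanobis_score(b, mu, cov)
print(ed_a, ed_b)  # identical Euclidean scores
print(md_a, md_b)  # b is scored as far more anomalous
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of Σ, which is a common numerical preference rather than a requirement of the metric.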

A further extension is the robust Mahalanobis distance, which reduces the influence of anomalies in the training data by computing the covariance matrix through the Minimum Covariance Determinant (MCD) estimator. The MCD estimator finds a subset of samples of a given size from the original dataset which minimizes the determinant of the covariance matrix [26]. Equation 8 provides the definition of the robust Mahalanobis distance, where µ is the robust location of the distribution (the mean of the samples selected by MCD) and MCD denotes the corresponding covariance matrix estimate. See the paper by Rousseeuw and Driessen [27] as well as the paper by Hubert and Debruyne [26] for further information regarding MCD and its uses in anomaly detection.

$$RD(e) = \sqrt{(e - \mu)\, \mathrm{MCD}^{-1} (e - \mu)^{T}} \tag{8}$$
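The idea behind the robust distance can be sketched with a heavily simplified MCD approximation. The code below runs concentration steps from a single deterministic start (the samples closest to the coordinate-wise median). This is not the full FAST-MCD algorithm of [27]: it uses no multiple random starts and no consistency correction, and the helper names, subset size, and synthetic contaminated data are all assumptions for illustration.

```python
import numpy as np

def mcd_c_steps(errors, subset_size, n_steps=20):
    """Approximate the MCD location and covariance with concentration steps."""
    errors = np.asarray(errors, dtype=float)
    # Deterministic start (assumption): samples closest to the coordinate-wise median.
    start_dist = np.linalg.norm(errors - np.median(errors, axis=0), axis=1)
    subset = np.sort(np.argsort(start_dist)[:subset_size])
    for _ in range(n_steps):
        mu = errors[subset].mean(axis=0)
        cov = np.cov(errors[subset], rowvar=False)
        diff = errors - mu
        # Squared Mahalanobis distance of every sample to the current estimate.
        d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        new_subset = np.sort(np.argsort(d2)[:subset_size])
        if np.array_equal(new_subset, subset):
            break  # the subset no longer changes
        subset = new_subset
    return errors[subset].mean(axis=0), np.cov(errors[subset], rowvar=False)

def robust_distance(e, mu, cov):
    """Equation 8 evaluated with the robust location and covariance."""
    diff = np.asarray(e, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
healthy = rng.normal(size=(95, 2))                          # synthetic healthy errors
contaminated = rng.normal(loc=10.0, scale=0.5, size=(10, 2))  # anomalies in the training data
errors = np.vstack([healthy, contaminated])

mu_plain = errors.mean(axis=0)
mu_robust, cov_robust = mcd_c_steps(errors, subset_size=80)
print(mu_plain, mu_robust)
print(robust_distance([10.0, 10.0], mu_robust, cov_robust))
```

The plain mean is pulled towards the contaminated samples, while the robust location stays close to the healthy cluster. A maintained implementation such as `sklearn.covariance.MinCovDet` would normally be preferred over this hand-written approximation.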


3. Related Work

Historically, the problem of detecting faults in machinery has been approached in several different ways. Model-based approaches have found a lot of success, but in recent years research into data-driven methods has picked up speed. An early example is the work conducted by Goel et al., where an intelligent monitoring system for a mobile robot was developed by analyzing sensory signals from wheel encoders and gyroscopes [28]. A bank of Kalman filters tuned to known faults generated residuals, and an ANN trained under supervision was applied to provide a final classification. For each filter, an average Mahalanobis distance was calculated over multiple time-steps from its output residuals and covariance matrix. The average distances of all filters were then fed to the ANN to produce a probability distribution, and the most likely candidate from the distribution was selected as the final fault class.

A more recent example is the work of Fu et al., where deep learning with a hybrid CNN and LSTM network was applied to detect and isolate actuator faults in UAVs [29]. The training dataset contained signals such as power outputs and attitude data from six different actuators. Faults were pre-recorded and labeled to correspond to the faulty actuator to facilitate fault classification by the network.

Unsupervised methods have also been explored to both detect and isolate faults. The main advantage of these methods is that the training data can be reduced to contain only nominal data. The resulting model detects anomalies in online signals by comparison to known healthy patterns, and can as a result cover a wider span of possible faults with a smaller dataset. Khalastchi et al. explored a fault detection method based on the Mahalanobis distance metric to find outliers in recent sensor readings by comparing them to previously sampled data [25]. The model is trained during run-time by utilizing a sliding window to select a fixed number of recent readings, excluding the most recent ones. From the samples in a window, sets of correlated attributes are found by computing the Pearson correlation coefficient of each attribute pair, and discarding those that are below a set threshold. The Pearson correlation coefficient was also used by Zhao et al. to capture correlations between signals for fault detection in electric generators [30].
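The window-based selection of correlated attributes can be sketched as follows. This is an illustrative reading of the description above and not the implementation of Khalastchi et al.: the helper names, the window parameters, the use of the absolute correlation value, and the synthetic readings are all assumptions.

```python
import numpy as np

def sliding_window(readings, size, skip):
    """Select `size` recent rows while excluding the `skip` most recent ones."""
    end = len(readings) - skip
    return readings[max(0, end - size):end]

def correlated_pairs(window, threshold):
    """Return attribute index pairs whose absolute Pearson correlation >= threshold."""
    corr = np.corrcoef(window, rowvar=False)
    p = corr.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(corr[i, j]) >= threshold]

t = np.arange(12, dtype=float)
readings = np.column_stack([
    t,                 # attribute 0
    2.0 * t + 1.0,     # attribute 1: linearly dependent on attribute 0
    (-1.0) ** t,       # attribute 2: alternating signal, weakly correlated with the others
])

window = sliding_window(readings, size=10, skip=2)
pairs = correlated_pairs(window, threshold=0.9)
print(pairs)
```

Only the pair of linearly dependent attributes survives the threshold; the alternating attribute is discarded from the correlated sets.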

In recent years, Autoencoders have been applied as a replacement for manual feature engineering techniques, and have been shown to be capable of capturing non-linear relationships between variables without supervision [10]. Wu et al. employed a multi-level denoising Autoencoder to detect faults in wind turbines based on multivariate data sampled from a SCADA system [10]. The Autoencoder was trained to reconstruct healthy signals, and the same signals were then passed through the trained network once more to output a set of healthy reconstruction errors. From the errors an MCD covariance matrix was computed, which was then used to derive a distribution of anomaly scores with the robust Mahalanobis distance metric. A fault detection threshold was then set from the distribution by integrating a Gaussian kernel density function up to some confidence level.
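One way to read the threshold step is sketched below: a Gaussian kernel density estimate is placed over the healthy anomaly scores, and the threshold is the score at which the integrated density reaches the chosen confidence level. This is not the implementation of Wu et al.; the normal-reference bandwidth rule, the bisection search, the 99% confidence level, and the synthetic scores are all assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def kde_cdf(t, scores, bandwidth):
    """Integral of a Gaussian KDE over (-inf, t]: the mean of the kernel CDFs."""
    z = (t - scores) / (bandwidth * math.sqrt(2.0))
    return float(np.mean(0.5 * (1.0 + _erf(z))))

def kde_threshold(scores, confidence=0.99, bandwidth=None, tol=1e-6):
    """Find the score at which the KDE integrates to `confidence` (bisection)."""
    scores = np.asarray(scores, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumption; the bandwidth is not given in the text).
        bandwidth = 1.06 * scores.std(ddof=1) * len(scores) ** (-0.2)
    lo = scores.min() - 10.0 * bandwidth
    hi = scores.max() + 10.0 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, scores, bandwidth) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
scores = np.sqrt(rng.chisquare(df=3, size=500))  # synthetic stand-in for healthy anomaly scores
threshold = kde_threshold(scores, confidence=0.99)
print(threshold)
```

Samples scoring above `threshold` would then be flagged as faulty. `scipy.stats.gaussian_kde` offers the same integration through `integrate_box_1d` if SciPy is available.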

Liang et al. used a similar method for fault detection in pumps [24]. Reconstruction errors from a sparse Autoencoder were computed during run-time, and the Mahalanobis distance with respect to a distribution of healthy reconstruction errors was used to determine anomaly scores. A dynamic fault threshold was also computed ahead of time by integrating the density estimate with a Gaussian kernel from the healthy distribution to some confidence level. When the anomaly score of a sample was greater than the threshold a fault was assumed, and the fault was then isolated to the subset of input variables with the greatest squared prediction error between the inputs and reconstructions.
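The isolation step, ranking input variables by their squared prediction error, can be sketched in a few lines. The function name and the example input and reconstruction values are hypothetical; this is not the code of Liang et al.

```python
import numpy as np

def isolate_fault(x, x_hat, top_k=1):
    """Rank input variables by squared prediction error, largest first."""
    spe = (np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2
    ranked = np.argsort(spe)[::-1]
    return ranked[:top_k].tolist(), spe

x = np.array([0.50, 1.20, -0.30, 0.80])      # hypothetical input sample (four signals)
x_hat = np.array([0.45, 1.25, 0.60, 0.82])   # hypothetical reconstruction
suspects, spe = isolate_fault(x, x_hat, top_k=1)
print(suspects, spe)
```

Here the third signal (index 2) has by far the largest error and would be reported as the most likely fault location.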

Park et al. used a variational Autoencoder to detect faults in raw sensor readings from a manipulator robot used for assisted feeding tasks [31]. The encoder and decoder were combined with LSTM units to better account for the temporal features of the signals. Additionally, Gaussian noise was added to the signals to introduce denoising capabilities to the model. Fault thresholds were set online by applying support vector regression with a radial basis function kernel to the LSR of the variational Autoencoder, while anomaly scores were computed as the negative log-likelihood of an input.

Sadhu et al. investigated a combined fault detection and diagnosis framework based on deep learning for real-time usage in UAVs [20]. The framework consisted of an initial Autoencoder based on 1D CNNs and bi-directional LSTM units to act as the fault detector, and a CNN and LSTM classifier to perform the diagnosis in case a fault was detected. Raw IMU data sampled during non-faulty operation of the UAV was used to train the Autoencoder, and once the network was trained the samples were input once again to compute the reconstruction errors and fit them to a Gaussian distribution. The Mahalanobis distance of the errors for each datapoint with respect to the fitted distribution was used as the anomaly score, and a fault threshold was set based on the top 0.01% scoring samples.

As can be seen from the previously described works, feature extraction techniques based on Autoencoders have been applied to heavy static machinery such as pumps and wind turbines. For autonomous systems we can refer to the works of Park et al. and Sadhu et al., where Autoencoders have successfully been used to extract representative features for robot manipulators and UAVs. However, the data collected by these authors may not be representative of similar applications in the domain of mobile robotics, and to the best of our knowledge no previous works have been published regarding fault detection using Autoencoders in this area.



4. Research Methodology

A framework for research proposed by Holz et al. was used as a base to formulate the research methodology more concretely. It consists of four questions, as listed below [32].

1. What do we want to achieve? - The goal of the research is to evaluate if Autoencoders are applicable for the purpose of detecting mechanical faults in mobile robotics.

2. Where does the data come from? - An experiment is conducted to produce the data. Sensor signals recorded from the drive system of a mobile robot are used to train a machine learning model, and the outputs of the model are then used to evaluate its performance. Both healthy signals and signals recorded when faults have been injected into the drive system are recorded.

3. What do we do with the data? - With the data we try to identify patterns in the model's output when different types of signals are provided, i.e. if we can see any differences between healthy and faulty signals.

4. Have we achieved our goal? - By inspecting the data we can draw conclusions as to whether or not the model that was developed performed well for the purpose.

Based on the points above, the research methods used in this thesis are literature reviews, empirical experiments, synthetic experiments, and statistical analysis. The research process followed a structure as described by Robson and McCartan [33], where an initial literature study into relevant works was used to guide the formulation of research questions and the design of an experiment. An experiment consisting of a base-line and various treatments was used, where the base-line consisted of healthy samples and the treatments were samples recorded during various fault cases. Figure 4 shows the main steps of the research process, which is based on the work by Nunamaker and Chen [34], and each step is detailed below:

• An initial problem description was provided by the industrial partner regarding whether artificial intelligence techniques can be used to detect faults in their mobile platform.

• A literature study was carried out to get a better understanding of the problem, in addition to mapping out the state-of-the-art in the major problem areas, which are fault detection, isolation, and diagnostics. From this phase information was gathered about methods that could be applied to the given problem, in addition to how the problem could be used to gain interesting insights from a research perspective. A large part of the thesis was spent on this step. The literature study was conducted in several iterations, as knowledge gaps that were found in some of the other steps would require further study of the state-of-the-art.

[Figure 4 diagram; boxes: Initial Problem Description, Literature Study, Problem Formulation, Experiment Design, Experiment Execution, Result Analysis]

Figure 4: A diagram outlining the major steps of the research process as boxes in chronological order from left to right, where the arrows show the transitions between steps. In certain cases revisions to the previous steps had to be made as new knowledge was gained further down the process, and as such arrows can point both forward and backwards.

• From the information gained by the literature study, a more concrete problem formulation was made. The problem formulation consists of a selection of research questions that would later guide the design and execution of the experiment, but changes to the questions could be made at a later stage based on additional knowledge gained from the experiments.

• In the experiment design phase the core of the work was formulated and the design of the model was made, which was then implemented and used during the experiment execution stage. Two separate experiments were created and executed, to (1) investigate how the model performed when changing various parameters such as the Autoencoder's number of neurons and the size of the input samples, and (2) evaluate the capability of the best performing model from the first experiment to detect faults. The second experiment was conducted to answer the research questions and was designed as a single-case experiment [33], with a base-line generated by inputting healthy observations into the model, and several treatments in the form of sensor readings recorded after various types of faults had been injected into the drive system. To answer RQ1 the anomaly scores produced by the model were observed, whereas the prediction errors were used to answer RQ2.

• The outputs of the experiments were finally studied in the result analysis stage using statistical methods, and conclusions were made based on visual observations of how the anomaly scores and prediction errors differed from the base-line after various treatments had been applied.

As can be seen in Figure 4 the research was carried out as an iterative process, where it was possible to return to previous steps after additional knowledge had been attained in the later stages. This was done several times during the work, for example by conducting additional literature studies after a problem that was too big for the scope of the thesis had been formulated, or by returning to the experiment design step to revise the model after it was found not to perform well enough on the test data.



5. System Model

In this section the assumptions made regarding the system are described, including those regarding its inputs and outputs. The data processed by the system consists of sensory signals sampled from a robot as an unbounded time-series, and we define the number of signals as $p$. The system processes one sample at a time, and to also capture temporal information each sample spans several time-points. We define a single sample as a two-dimensional matrix $X \in \mathbb{R}^{p \times w}$, where $w$ is the number of time-points. The output is a binary value designating whether the sample is nominal or anomalous, and as such we define the entire system as a function $F(X) = y$, where $y \in \{\neg\text{Anomalous}, \text{Anomalous}\}$. Prior to online fault detection, the model must be trained on a dataset of healthy samples, which we define as $D = \{X_1, X_2, \ldots, X_n\}$.



6. Autoencoder-Based Fault Detection

In this section a concrete formulation of the proposed method is made, and the different components making up the system are explained. The method consists of an initial model training step where patterns in healthy signals are learned using a pre-recorded dataset, and an online fault detector where the model produces predictions regarding signals sampled in real-time. A diagram showing the offline training steps can be seen in Figure 5, and a similar diagram for how anomalies are detected in real-time is shown in Figure 6.

6.1. Fault Detection

[Figure 5 diagram, "Offline Training"; blocks: Training Data, Train, Weights and Biases, Autoencoder, Reconstruction Errors, MCD, Covariance Matrix and Location, Mahalanobis Distance, Anomaly Scores, Determine Threshold, Fault Threshold]

Figure 5: A diagram showing the offline training process of the fault detector. The training data containing healthy samples is first used to train the weights of the neural network, and is then once again passed through the Autoencoder to produce a set of reconstruction errors to be used for determining the online fault detection threshold. Anomaly scores are assigned to the errors based on the covariance and central tendency of the error distribution from an MCD estimator, and these scores are then used to set the threshold to correspond with a top percentage of the most anomalous samples.

The model relies on an initial feature extraction step using an Autoencoder, which is followed by an anomaly detection scheme based on the Mahalanobis distance metric. We first train the model offline using a dataset of recorded healthy signals, where the weights and biases of the Autoencoder are set and a fault threshold is determined. A diagram detailing the training process of the fault detector can be seen in Figure 5. The trained weights and the fault threshold are later passed on to the model used for anomaly detection in online data.

6.1.1. Autoencoder

The Autoencoder used in this work consists of a single hidden layer. For both the encoder and decoder networks the sigmoid activation function in Equation 9 is used, which maps its input to the range (0, 1).

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (9)$$

We train the network using the squared reconstruction error as the cost function, which was previously defined in Equation 4, and we also add an L1 regularization term for all weights in the network with the regularization parameter $\lambda = 10^{-5}$.



[Figure 6 diagram, "Online Fault Detection"; blocks: Unknown Sample, Autoencoder, Weights and Biases, Reconstruction Errors, Mahalanobis Distance, Covariance Matrix and Location, Anomaly Scores, Threshold Comparison, Fault Threshold, Faulty? (Yes), Fault Isolation]

Figure 6: A diagram of the online fault detector, which utilizes the pre-computed Autoencoder weights, covariance matrix, and robust location. A single sample is processed at a time, where a sample error is computed based on the reconstructed signal from the Autoencoder. Following this, an anomaly score is assigned by computing the Mahalanobis distance from the previously computed distribution of healthy training samples. Finally, the score is compared with the fault threshold. The fault is isolated only if the sample is determined to be faulty.

6.1.2. Anomaly Scoring

The mean squared error over the time points in a window is used as the reconstruction error measure for a sample, which produces a unified error for the entire sample such that $e \in \mathbb{R}^p$. In other words, we derive a sample error by first computing the squared error between the original $p$ signals and their reconstructions, and then averaging these values over all time-points in the sample window. During the training phase a distribution of sample errors is computed from the same healthy signals that were used to train the Autoencoder in the previous step, and based on this distribution a covariance matrix and a robust estimate of central tendency are computed using the MCD estimator. The covariance matrix and robust location are then used to compute anomaly scores using the robust Mahalanobis distance, which can be seen in Equation 8.
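
As an illustration of the scoring steps described above, the following is a minimal NumPy sketch and not the implementation used in the experiments. It assumes samples stored with shape (n, p, w), and it replaces the MCD estimator with the ordinary sample mean and covariance to stay self-contained, so the resulting distances are not robust. The function names and the random stand-in data are hypothetical, and the sketch returns the distance with the square root applied.

```python
import numpy as np

def sample_errors(X, X_hat):
    """Mean squared error over the time axis; X and X_hat have shape (n, p, w)."""
    return np.mean((X - X_hat) ** 2, axis=2)  # one error vector e in R^p per sample

def fit_error_distribution(errors):
    """Location and inverse covariance of healthy errors.

    Assumption: plain sample estimates stand in for the MCD estimator.
    """
    location = errors.mean(axis=0)
    covariance = np.cov(errors, rowvar=False)
    return location, np.linalg.pinv(covariance)

def anomaly_scores(errors, location, inv_cov):
    """Mahalanobis distance of each error vector to the healthy distribution."""
    diff = errors - location
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))

rng = np.random.default_rng(0)
p, w = 8, 16

# Made-up "healthy" samples with small reconstruction noise.
X_healthy = rng.normal(size=(500, p, w))
X_hat_healthy = X_healthy + rng.normal(scale=0.1, size=X_healthy.shape)
healthy_errors = sample_errors(X_healthy, X_hat_healthy)
location, inv_cov = fit_error_distribution(healthy_errors)
healthy_scores = anomaly_scores(healthy_errors, location, inv_cov)

# A made-up sample that is reconstructed poorly.
X_bad = rng.normal(size=(1, p, w))
X_hat_bad = X_bad + rng.normal(scale=0.5, size=X_bad.shape)
bad_score = anomaly_scores(sample_errors(X_bad, X_hat_bad), location, inv_cov)
```

In this toy setting the poorly reconstructed sample receives a score far above every healthy score, which is the behaviour the anomaly scoring relies on.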

6.1.3. Fault Threshold

The fault threshold is an important part of the fault detector as it has a large effect on the model's ability to detect true faults from faulty signals, in addition to the number of false positives produced from healthy ones. During training, the threshold is set to correspond to a certain percentage of the highest anomaly scores produced from the healthy training data. Any sample that during testing or online use has an anomaly score exceeding this threshold will be predicted as anomalous by the model.
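
One way to realise such a threshold is as an upper percentile of the healthy anomaly scores. The sketch below is a minimal example under that assumption; the function names are hypothetical, the scores are made up, and `np.percentile` interpolates linearly between neighbouring scores, which may differ from how the threshold was computed in the experiments.

```python
import numpy as np

def fault_threshold(healthy_scores, top_percent):
    """Score above which roughly `top_percent` percent of the healthy scores fall."""
    return np.percentile(healthy_scores, 100.0 - top_percent)

def is_anomalous(scores, threshold):
    """Flag every score that exceeds the threshold."""
    return np.asarray(scores) > threshold

# Made-up healthy scores 1, 2, ..., 1000 so that the effect is easy to follow.
healthy_scores = np.arange(1.0, 1001.0)
threshold = fault_threshold(healthy_scores, top_percent=0.1)
flags = is_anomalous(healthy_scores, threshold)
```

With 1000 healthy scores and a top percentage of 0.1%, only the single largest score exceeds the threshold, so the expected false positive rate on the training data is built into the choice of percentage.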

6.2. Fault Isolation

The previously described fault detection module only signals if some fault has happened, but does not provide any information about where it originates. Inspired by a previous approach taken by Liang et al. [24], we compute a two-dimensional map using the Q-statistic (the squared prediction error) as in Equation 10 from the elements of the input and reconstruction matrices.



$$Q_{i,j} = (x_{i,j} - \hat{x}_{i,j})^2, \quad i = 1, \ldots, p, \quad j = 1, \ldots, w \quad (10)$$

In the error map the signals are separated row-wise, and as such we can isolate the faulty variables to the rows with the largest contribution in the map. Furthermore, we isolate the fault to a given component by selecting the component whose signals have the largest contribution.
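
The isolation step can be sketched as follows. This is a minimal NumPy example under stated assumptions: the contribution of a signal is taken to be the sum of its row in the error map, the contribution of a component is the sum over its rows, and the row-to-component mapping and the toy data are hypothetical.

```python
import numpy as np

def q_map(X, X_hat):
    """Element-wise squared prediction error (Equation 10), shape (p, w)."""
    return (X - X_hat) ** 2

def rank_signals(Q):
    """Signal (row) indices sorted by total contribution, largest first."""
    contribution = Q.sum(axis=1)
    return np.argsort(contribution)[::-1], contribution

def isolate_component(Q, component_rows):
    """Return the component whose rows carry the largest summed error."""
    contribution = Q.sum(axis=1)
    totals = {name: float(contribution[rows].sum())
              for name, rows in component_rows.items()}
    return max(totals, key=totals.get), totals

p, w = 8, 16
X = np.zeros((p, w))
X_hat = np.zeros((p, w))
X_hat[3, :] = 0.5  # pretend that signal 3 is reconstructed badly

# Hypothetical mapping from drive units to rows of the error map.
component_rows = {
    "front_left": [0, 1],
    "front_right": [2, 3],
    "rear_left": [4, 5],
    "rear_right": [6, 7],
}

Q = q_map(X, X_hat)
ranking, contribution = rank_signals(Q)
component, totals = isolate_component(Q, component_rows)
```

Here the badly reconstructed row ranks first and the component that owns it is selected. With real data the contributions are usually spread over several rows, so the ranking is only an indication.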



7. Experiments

This section details how the experiments were carried out, including information on how data was collected and processed, the architecture of the model, and various experimental parameters. The experiments were carried out on two different sets of data, where the first dataset contained sensor measurements recorded from a mobile robot, and the second was synthetically created.

7.1. Industrial Use Case

To evaluate the model for real world applications a dataset was created from signals recorded from a four-wheel omni-directional mobile platform. A model of the target platform can be seen in Figure 7. At the highest level an Intel NUC mini-PC running Robot Operating System (ROS) [35] can be seen, which communicates with an STM32 F7 ARM Cortex microcontroller through a network switch. The STM32 controller coordinates and communicates with drive units 1 to 4 over CAN. Each drive unit actuates separate brushless DC steering and wheel drive motors placed at the four corners of the mobile base, i.e. at the front left, front right, rear left, and rear right positions.

[Figure 7 diagram; blocks: Intel NUC (ROS), Network Switch, STM32 F7, Drive Units 1 to 4; links: Ethernet, CAN, CAN 1, CAN 2]

Figure 7: A diagram showing the components of interest and their connections in the distributed architecture of the robot.

7.1.1. Diagnostics Routine

In order to simplify the data collection process a diagnostics routine was created, which was intended to stress the motors such that faulty signals could be discerned from healthy ones. One full execution cycle of the diagnostics routine is described below:

1. Each wheel is rotated in turn by actuating the steering motors at different velocities.

2. The wheels are aligned in a circle and the entire base is rotated in-place back and forth four times with increasing velocity.

3. The base is rotated with a constant velocity, and one steering motor at a time is actuated to rotate the wheel to -4 degrees, then to 4 degrees, then back to 0 degrees relative to the starting rotation.

4. The base is rotated with a constant velocity, and for one wheel motor at a time a switch from velocity control to torque control is made with two different reference values (2 Nm and −2 Nm).

7.1.2. Fault Injections

To provide a better understanding of the model's capability to detect faults, two different types of artificial sensor faults were injected into the drive units. A description of each of these faults is given below:

• Encoder alignment fault (1 degree) - An error of 1 degree is introduced into the alignment of the rotor magnetic field relative to the stator, which is used by the motor encoder to produce an offset.

• Steering motor absolute angle error (3 degrees) - An error of 3 degrees is introduced into the angle of the steering motor encoder. This causes the wheel to always be slightly misaligned with respect to its expected rotation.

7.1.3. Data Collection

A dataset was created by recording signals directly from the robot in a lab environment. The signals were recorded into a ROS bag format at 50 Hz from a remote laptop connected to the on-board network switch over Wi-Fi, and the bag files were then converted into a CSV format to be processed by the model. Torque signals from the drive and steering motors of all wheels were selected for training the final model as they provided promising results in preliminary tests, resulting in a total of 8 different signals, as can be seen in Table 1.

Origin            Signal    Unit
Drive Motor       Torque    Nm
Steering Motor    Torque    Nm

Number of collected signals: 8

Table 1: Actuator signals collected from each of the four drive units.

A description of the complete dataset is shown in Table 2. One complete diagnostics cycle was recorded for each of the injected faults, and 12 cycles for the healthy data. Before each diagnostics cycle the robot was moved to a different location, and was also rotated to have varying starting orientations. In Figure 8 the eight torque signals recorded during one of the healthy cycles are shown.



Type                                               Collected cycles
Healthy                                            12 cycles
Front right encoder alignment fault                1 cycle
Rear right encoder alignment fault                 1 cycle
Front right steering motor absolute angle error    1 cycle

Table 2: Number of diagnostics cycles collected for each fault class.

[Figure 8 plot, "Healthy Signal": torque (Nm) against time (s); legend: Front Left Drive Wheel, Front Left Steering, Front Right Drive Wheel, Front Right Steering, Rear Left Drive Wheel, Rear Left Steering, Rear Right Drive Wheel, Rear Right Steering]

Figure 8: Healthy torque signals recorded during one of the diagnostics cycles from the mobile robot in the industrial use case.

7.2. Synthetic Data

A synthetic dataset was generated to tweak the model and produce additional results. It was used for the initial design and testing of the model before the real world data for the industrial use case could be recorded. The complete dataset contained four different waves that were generated with varying amplitudes and frequencies, and Gaussian noise was also added to further diversify the data. Data simulating healthy signals was generated with fixed values for waveform amplitude and frequency, and the noise was generated with a fixed standard deviation. As for the industrial datasets, the signals were sampled at 50 Hz. A sequence of faulty data was generated by in turn increasing the amplitude, frequency, and noise standard deviation, with the goal of investigating how the model reacts to changes in these values.
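
A generator of this kind could look like the following minimal sketch. The waveform type (sine), the amplitudes, frequencies, noise levels, and the function name are all assumptions made for illustration and are not the values used in the experiments. Only the sampling rate of 50 Hz and the number of waves are taken from the description above.

```python
import numpy as np

def make_signals(n_samples, amplitudes, frequencies, noise_std, fs=50.0, seed=0):
    """Generate one noisy sine wave per (amplitude, frequency) pair, sampled at fs Hz."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    clean = np.stack([a * np.sin(2.0 * np.pi * f * t)
                      for a, f in zip(amplitudes, frequencies)])
    return t, clean + rng.normal(scale=noise_std, size=clean.shape)

# Hypothetical "healthy" settings: fixed amplitude, frequency, and noise level.
amplitudes = [1.0, 2.0, 3.0, 4.0]
frequencies = [0.5, 1.0, 1.5, 2.0]
t, healthy = make_signals(500, amplitudes, frequencies, noise_std=0.1)

# Hypothetical "faulty" settings: the amplitudes are doubled, the rest is unchanged.
_, faulty = make_signals(500, [2.0 * a for a in amplitudes], frequencies, noise_std=0.1)
```

Increasing the frequency or the noise standard deviation instead of the amplitude follows the same pattern by changing the corresponding argument.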

7.3. Data Pre-Processing

In the experiments the data was pre-processed by standardization followed by normalization, which maps the elements of the datasets to the interval [0, 1]. After this the signals are divided into samples of size $w \times p$ such that they can be fed to the model. Standardization is done as shown in Equation 11, where $x$ is the raw data prior to standardization, and $\mu \in \mathbb{R}^p$ and $s \in \mathbb{R}^p$ are the means and standard deviations of the signals computed from the healthy training data. After standardization, the data is normalized to lie within the range [0, 1] to speed up the learning process.

$$x_i' = \frac{x_i - \mu}{s} \quad (11)$$



The datasets are then divided into equally sized blocks using a sliding window approach. The width of the window is set in accordance with the expected length of each sample ($w$), and this window is then moved across the data sequence with a set step size of 1 to capture as much information as possible from the signals. At each step across the sequence, a sample is created by extracting the sensor values within the window.
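
The pre-processing chain can be sketched in NumPy as follows. This is a minimal example under assumptions that the text does not specify: the normalization is taken to be a per-signal min-max scaling whose bounds come from the healthy training data, sequences are stored with shape (p, T), and each sample is returned with shape (p, w). The function names and the random stand-in data are hypothetical.

```python
import numpy as np

def fit_scaler(train):
    """Per-signal statistics from healthy training data of shape (p, T)."""
    mu = train.mean(axis=1, keepdims=True)
    s = train.std(axis=1, keepdims=True)
    z = (train - mu) / s
    return mu, s, z.min(axis=1, keepdims=True), z.max(axis=1, keepdims=True)

def transform(signals, mu, s, lo, hi):
    """Standardize (Equation 11), then min-max scale using the training bounds."""
    z = (signals - mu) / s
    return (z - lo) / (hi - lo)

def sliding_windows(signals, w, step=1):
    """Cut a (p, T) sequence into overlapping samples of shape (p, w)."""
    T = signals.shape[1]
    return np.stack([signals[:, i:i + w] for i in range(0, T - w + 1, step)])

rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, scale=3.0, size=(8, 100))  # made-up stand-in for torque data
scaled = transform(train, *fit_scaler(train))
windows = sliding_windows(scaled, w=16)
```

A sequence of T time-points yields T − w + 1 samples with a step size of 1, so the 100 time-points above give 85 samples. Data that was not used to fit the scaler can fall slightly outside [0, 1] after the transform.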

7.4. Model Training

The model was trained using the healthy data, which was split into subsets by keeping two of the recorded cycles for validation and testing, and the rest for training. The validation data is used to provide a metric of how well the Autoencoder performs after each training epoch, while the testing set is used to assess the predictive skill of the full model. The data is split up into smaller samples using the sliding window technique separately for each cycle. The samples from the 10 different diagnostic cycles in the training set are then merged into one unified segment of samples, and 101965 training samples were extracted in this way. No additional pre-processing was applied to the signals with the exception of standardization and normalization as previously discussed, and as such the signals can be considered to be raw.

Table 3 shows the parameters of the experiment. A step size of 1 is used to capture as many features as possible from the signals. Several models are created to assess the impact of the number of hidden neurons on the predictive performance of the model, controlled by the parameter $k$, and the window size is fixed at 16 while $k$ is varied.

Parameter                       Value
Number of hidden neurons (k)    [8, 16, 32, 64]
Window size (w)                 [8, 16, 32]
Window step size                1
Threshold top %                 0.1%

Table 3: Experimental parameters.

In Table 4, the structure of the Autoencoder can be seen. We utilize a Multilayer Perceptron (MLP) structure for the network using fully connected layers, where each neuron in a layer has connections to all neurons in the neighboring layers (excluding the bias neurons). As can be seen, the size of the first fully connected layer (i.e. the hidden layer) is controlled by the parameter $k$. The size of the input layer and the second fully connected layer (the output layer) is controlled by $w$ and $p$, which define the shape of a sample window.

Layer                    Number of neurons
Input layer              w × p
Fully connected layer    k
Fully connected layer    w × p

Table 4: Network structure of the Autoencoder.
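
To make the structure in Table 4 concrete, the following NumPy sketch implements only the forward pass and the regularized cost of such a network. It is not the TensorFlow model used in the experiments: the weight initialization, the function names, and the random input are assumptions, and no training is performed.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation (Equation 9)."""
    return 1.0 / (1.0 + np.exp(-x))

def init_params(p, w, k, seed=0):
    """Randomly initialized weights and zero biases (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    d = p * w  # size of the flattened input and output layers
    return {
        "W1": rng.normal(scale=0.1, size=(d, k)), "b1": np.zeros(k),
        "W2": rng.normal(scale=0.1, size=(k, d)), "b2": np.zeros(d),
    }

def reconstruct(params, X):
    """Map samples of shape (n, p, w) to reconstructions of the same shape."""
    n, p, w = X.shape
    flat = X.reshape(n, p * w)
    hidden = sigmoid(flat @ params["W1"] + params["b1"])   # encoder, k neurons
    out = sigmoid(hidden @ params["W2"] + params["b2"])    # decoder, w * p neurons
    return out.reshape(n, p, w)

def cost(params, X, lam=1e-5):
    """Mean squared reconstruction error plus an L1 penalty on the weights."""
    mse = np.mean((X - reconstruct(params, X)) ** 2)
    l1 = lam * (np.abs(params["W1"]).sum() + np.abs(params["W2"]).sum())
    return mse + l1

p, w, k = 8, 16, 64
params = init_params(p, w, k)
X = np.random.default_rng(1).uniform(size=(4, p, w))  # inputs already scaled to [0, 1]
X_hat = reconstruct(params, X)
n_params = sum(v.size for v in params.values())
```

Because the output layer also uses the sigmoid, reconstructions always lie strictly between 0 and 1, which is why the inputs are normalized to [0, 1] beforehand. For p = 8, w = 16, and k = 64 this structure has 16576 trainable parameters.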

Training was carried out on an external laptop running Ubuntu 20.04, with a 2.8 GHz Intel i5-6500U CPU with 2 physical and 4 logical cores. The model was implemented in Python 3.8.5, where the neural network was created using TensorFlow [36] version 2.2.0 with Keras [37]. Scikit-learn [38] 0.24.1 was used for fast MCD estimations, as well as for Mahalanobis distance calculations. Additionally, NumPy [39] version 1.18.4 was used.

In Table 5 the hyperparameters that were used to train the Autoencoder can be seen. The loss function used is the mean squared error between the input and output layers, as defined in Equation 3. The Adam optimizer was used to update the weights in the network with the learning parameters that are listed in Table 6. For more information on the Adam optimization algorithm, see the paper by Kingma and Ba [40]. The model was trained with an early stopping patience of 20 epochs (i.e., the training stops after 20 complete iterations through all samples without a reduction in validation loss), or until a maximum of 400 epochs. Additionally, a batch size of 128 samples per iteration was used.

Hyperparameter             Value
Loss function              MSE
Optimizer                  Adam
Learning rate              0.001
Max epochs                 400
Early stopping patience    20
Batch size                 128

Table 5: TensorFlow hyperparameters for the Autoencoder network.

Parameter     Value
$\alpha$      0.001
$\beta_1$     0.9
$\beta_2$     0.999
$\epsilon$    $10^{-7}$

Table 6: Learning parameters for the Adam optimizer used in the experiments.
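
The early stopping rule can be illustrated independently of any framework. The sketch below is a plain Python example that walks through a list of per-epoch validation losses and reports when training would stop under a patience of 20 and a maximum of 400 epochs. The function name and the loss values are made up, and the exact behaviour of the callback used in the experiments may differ in details.

```python
def epochs_run(val_losses, patience=20, max_epochs=400):
    """Return (number of epochs run, best validation loss) under early stopping."""
    best = float("inf")
    since_best = 0
    epoch = 0
    for epoch, val_loss in enumerate(val_losses[:max_epochs], start=1):
        if val_loss < best:
            # The validation loss improved, so the patience counter is reset.
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return epoch, best

# Made-up validation curve: improves for 30 epochs, then stays flat at a worse value.
val_losses = [1.0 / e for e in range(1, 31)] + [0.5] * 100
stopped_at, best_loss = epochs_run(val_losses)
```

In this example the last improvement happens in epoch 30, so training stops after epoch 50, which is 20 epochs without a reduction in validation loss.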



8. Results

In this section the results of the experiments are presented in the form of graphs and tables. To investigate the impact of the number of hidden neurons and the size of the sample window on the predictive accuracy of the model, an initial experiment was conducted where changes were made to the parameters $k$ and $w$ as can be seen in Table 3. The parameters were investigated individually by first fixing the window size $w$ to 16 and tweaking the number of hidden neurons $k$, after which $k$ was fixed to the value that performed the best while changes were made to $w$. In Tables 7 and 8 the results of changing the parameters can be seen for the synthetic dataset and the data from the mobile robot in the industrial use case respectively. The measures of performance in these tables are the percentage of true positives (i.e. the percentage of anomalous samples over all samples in the faulty datasets), and the percentage of false positives (i.e. the percentage of anomalous samples from the healthy test datasets). A sample is reported as anomalous if its anomaly score exceeds the fault threshold set during training, which corresponds to the top 0.1% of anomaly scores from the healthy training data. Figures showing the impact the parameters have on varying top percentages for determining the fault threshold (from 0% to 100%) can be seen in Appendices A and B for further reference.
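
The two performance measures reduce to the same computation applied to different datasets, as the following minimal sketch shows. The function name and the anomaly scores are made up for illustration, while the threshold of 5.70 is the value reported for the synthetic dataset in Table 9.

```python
import numpy as np

def percent_flagged(scores, threshold):
    """Percentage of samples whose anomaly score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    return 100.0 * np.mean(scores > threshold)

threshold = 5.70

# Made-up scores: 3 of the 5 faulty and 1 of the 10 healthy samples exceed the threshold.
faulty_scores = [2.0, 6.1, 8.3, 4.9, 12.0]
healthy_scores = [1.2, 2.5, 0.8, 6.0, 3.1, 1.9, 2.2, 0.5, 1.1, 2.8]

true_positives = percent_flagged(faulty_scores, threshold)
false_positives = percent_flagged(healthy_scores, threshold)
```

Note that under this definition every sample in a faulty dataset counts towards the true positive percentage, including time-points where the injected fault has no visible effect on the signals.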

k                  8        16       32       64       16       16       16
w                  16       16       16       16       8        16       32
True Positives     60.29%   55.47%   52.78%   49.16%   25.99%   55.47%   60.08%
False Positives    0.55%    0.18%    0.15%    0.04%    0.04%    0.18%    0.41%

Table 7: Reported percentage of true and false positives with varying parameter values from the synthetic dataset.

The results from the synthetic dataset in Table 7 show that fewer hidden neurons yield higher percentages of both true and false positives, and that both percentages fall as the number of neurons increases. Selecting the best parameter here depends on whether a large number of true positives or a low number of false positives is desired, and is as such dependent on the situation. With this in mind $k = 16$ was selected, as it provides a good middle point between both variables. A similar pattern can be seen for the window size, as a larger window size increases both the percentage of true positives and false positives. The smallest tested window size (8) performs quite poorly however, producing a percentage of true positives that is roughly half of that obtained with $w = 16$. Because of this, $w = 16$ was chosen for further experiments as it performed the best overall.

k                  8        16       32       64       64       64       64
w                  16       16       16       16       8        16       32
True Positives     0.09%    0.11%    0.61%    1.14%    0.96%    1.14%    1.22%
False Positives    0.11%    0.11%    0.11%    0.14%    0.11%    0.14%    0.16%

Table 8: Reported percentage of true and false positives with varying parameter values from the dataset recorded from the mobile robot in the industrial use case.

We can see some differences between the synthetic dataset and the industrial use case by comparing the previous table to Table 8, which contains the results of changing the same parameters for the torque signals recorded directly from the mobile robot. The most prominent feature of the results is the overall low percentages of both true and false positives. Secondly, we can see that the highest number of hidden neurons gives the highest percentage of true positives, while the lower numbers perform better in terms of lowering the amount of false positives. It is worth noting here that the size of the input is larger for these models (8 input signals as opposed to 4 for the synthetic data), which may have an impact on these results. A hidden layer size of $k = 64$ was fixed when tweaking the size of the sliding window, and the value $w = 16$ was chosen for further experiments. In Table 9 the final parameters of both models can be seen, in addition to the fault thresholds.

Dataset       k     w     Fault threshold
Synthetic     16    16    5.70
Industrial    64    16    336.46

Table 9: The final parameters chosen for both datasets and used for further experimentation.

8.1. Fault Detection

In Figure 9 a box plot showing the distribution of anomaly scores for the synthetic datasets can be seen. A noticeable difference between the anomaly scores for both classes can be seen, with a median score of around 2 and 8 for the healthy and faulty datasets respectively. Figure 23 in Appendix B provides a different view of these distributions in the form of histograms.

[Figure 9 box plot: anomaly score on a logarithmic axis for the Faulty and Healthy classes]

Figure 9: A logarithmic scale box plot showing the spread of anomaly scores for the synthetic test data.

Figures 10 and 11 show a comparison between the original signals and the anomaly scores assigned by the model for the synthetic healthy and faulty test sets. The fault threshold fits the healthy data well in this case, with only occasional samples exceeding it. For the faulty data we can see an increase in anomaly scores, with several large areas that exceed the fault threshold. Increasing the amplitudes, as is the case in the interval [0, 1000], does not seem to produce large anomaly scores, although some increases can be seen at the end of this interval. The model seems more sensitive to changes in the frequency of the waveforms, as large scores can be seen in the interval [1000, 2250] where this parameter was increased when the signals were generated. It also seems to be able to detect noise in the signals, in this case Gaussian noise, which is evident in the interval [2500, 3500] where increases in the standard deviation of the noise generator also produced larger anomaly scores.



[Figure 10 plot, "Healthy": signal value against time (s) in the top graph, anomaly score against time (s) in the bottom graph]

Figure 10: "Healthy" signals from the test sets from the synthetic data, and the predicted anomaly scores. The fault threshold with value 5.70 is denoted by the red line in the bottom graph.

[Figure 11 plot, "Faulty": signal value against time (s) in the top graph, anomaly score against time (s) in the bottom graph]

Figure 11: Differing signals (i.e. "faulty" signals) from the synthetic dataset, and their anomaly scores. The fault threshold with value 5.70 is denoted by the red line in the bottom graph.

For the industrial use case we can see a breakdown of the distribution of anomaly scores for all the recorded fault classes and the healthy test case in Figure 12. The scores for both encoder alignment fault cases follow the same distribution as the healthy test data, and as a result the model will not be able to detect these faults from the provided signals. On the other hand, we can see that the absolute angle fault injected in the front right steering motor has a different distribution of scores, signifying that the model has been able to detect some anomalies in these samples. Refer to Figure 20 in Appendix A for the histograms detailing these results.

[Figure 12 box plot: anomaly score on a logarithmic axis for the cases Steer motor absolute angle error front right, Encoder alignment fault rear right, Encoder alignment fault front right, and Healthy]

Figure 12: A logarithmic scale box plot showing the spread of anomaly scores for all test cases from the signals recorded from the mobile robot in the industrial use case.

Figure 13 shows a plot of the healthy input signals from the industrial test samples and the anomaly scores produced by the method. As can be seen, a majority of the scores are low but with occasional large spikes that exceed the threshold, which is in line with the threshold being set to correspond to the top 0.1% of anomaly scores. In comparison to the synthetic data, the threshold does not fit the majority of the data well due to these spikes in anomaly scores.

[Figure 13 plot, "Healthy": torque (Nm) against time (s) in the top graph, anomaly score against time (s) in the bottom graph]

Figure 13: Healthy signals produced by the mobile robot during a diagnostics cycle, and the predicted anomaly scores. The fault threshold with value 336.46 is denoted by the red line in the bottom graph.

Similar results can be seen from the two encoder faults in Figure 14, which is as expected considering that their respective anomaly score distributions in Figure 12 are close to that of the healthy data. It seems that there are no noticeable differences in the input signals in this case, which may be due to the injected error not being severe enough to cause a difference in the output torque of any motor.

In contrast, Figure 15 shows the anomaly scores for the absolute angle fault of the front right steering motor. Here we can see large differences in anomaly scores in comparison to the previous examples. Similarly to the other samples we can observe the same kind of spikes at the start of the diagnostics routine, followed by several large areas exceeding the threshold with increasing severity as the velocity increases during the second step of the routine. We can also see some larger areas during the second and third step where the anomaly scores are larger than those of the nominal case, but not severe enough to exceed the threshold and be flagged as anomalous.



[Figure 14 plots, including the panel "Encoder alignment fault front right": for each fault case, torque (Nm) against time (s) in the top graph and anomaly score against time (s) in the bottom graph]

Figure 14: Faulty signals from an injected alignment fault into the front and rear right wheel encoders of the mobile robot, and the anomaly scores produced by the model. The fault threshold with value 336.46 is denoted by the red line in the bottom-most graphs.

[Figure 15 plot: torque (Nm) against time (s) in the top graph, anomaly score against time (s) in the bottom graph]

Figure 15: Faulty signals from a diagnostics cycle with an absolute angle error fault injected into the front right steering motor of the mobile robot, and its resulting anomaly scores. The fault threshold with value 336.46 is denoted by the red line in the bottom graph.

8.2. Fault Isolation

The fault isolation strategy builds upon the premise of being able to identify the failing components from sensor readings. We will only examine the data from the industrial use case for this, as it contains known faulty components. Moreover, we will only use the data from the front right steering motor angle error fault, since it was the only fault case that showed any differences from the samples of the nominal test case.

In Figure 16 the mean Q-statistic (Equation 10) of each cell (sensor and time pair) computed over several fixed intervals can be seen. The largest differences in anomaly scores between this case and the nominal data can be observed in the time interval [70s, 200s] (see Figure 15), and as such the prediction errors have been calculated separately in four different slices of this interval to investigate which signals have the largest reconstruction errors. Additionally, the values have been normalized to lie within the interval [0, 1] for visualization purposes.
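The per-cell averaging and normalization described above can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes that the Q-statistic of a cell is the squared difference between the input and its reconstruction, and that samples are stored as arrays of shape (samples, signals, window). The function names, the synthetic data and the four interval boundaries are hypothetical.

```python
import numpy as np


def mean_spe_map(x, x_hat, t, t_start, t_end):
    """Mean squared prediction error per (signal, time-in-window) cell.

    x, x_hat: arrays of shape (n_samples, n_signals, window).
    t: start time in seconds of each sample, shape (n_samples,).
    Only samples with t_start <= t < t_end are averaged.
    """
    mask = (t >= t_start) & (t < t_end)
    spe = (x[mask] - x_hat[mask]) ** 2
    return spe.mean(axis=0)


def min_max_normalize(a):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    lo, hi = a.min(), a.max()
    if hi == lo:
        return np.zeros_like(a, dtype=float)
    return (a - lo) / (hi - lo)


# Synthetic stand-in data (assumption): 8 signals, window of 16 points,
# one sample per second over a 200 s cycle.
rng = np.random.default_rng(0)
n_samples, n_signals, window = 200, 8, 16
t = np.arange(n_samples, dtype=float)
x = rng.standard_normal((n_samples, n_signals, window))
x_hat = x + 0.01 * rng.standard_normal(x.shape)
x_hat[:, 6, :] += 0.5  # make one signal (index 6) poorly reconstructed

# Hypothetical slices of the interval [70s, 200s].
intervals = [(70, 100), (100, 130), (130, 160), (160, 200)]
maps = [min_max_normalize(mean_spe_map(x, x_hat, t, a, b)) for a, b in intervals]

# Signal with the largest mean error in each slice.
worst_signal = [int(m.mean(axis=1).argmax()) for m in maps]
```

Each entry of `maps` has one row per signal and one column per time-point in the sample window, which matches the layout described for Figure 16. Ranking the row means, as in `worst_signal`, is one way to read off which signal is reconstructed worst in a given slice.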

[Figure 16 plots: four heat maps, one per time interval (one panel is labelled "Steer motor absolute angle error front right, Time Interval [80s, 120s]"). Rows are the motor signals (e.g. Front Left Drive Wheel, Front Left Steering), columns are the time in sample (1 to 16), and the colour scale runs from 0.0 to 1.0.]

Figure 16: Map of the squared prediction errors averaged over four different intervals for the absolute angle error fault in the front right steering motor. The different signals are organized row-wise whereas the time-points in the sample window are stacked column-wise.

From the images in Figure 16 it can be observed that the largest errors in all cases are found in the signals originating from the rear right wheel drive motor, despite the fault being injected in the front right steering motor. In Figure 17 the averaged errors over the full samples for each signal have been plotted for the same test case. We can see that the largest errors are not produced for the signals originating from the front right motors, which is where the fault was injected.

[Figure 17 plots: torque (Nm) and SPE over time (s) for each of the eight signals (Front Left Drive Wheel, Front Left Steering, Front Right Drive Wheel, Front Right Steering, Rear Left Drive Wheel, Rear Left Steering, Rear Right Drive Wheel, Rear Right Steering); title "Steer motor absolute angle error front right".]

Figure 17: A comparison between the original signals from the absolute angle fault in the front right steering motor, and the average signal errors for each sample.



9. Discussion

The results provide some insights into answering the research questions, but to be able to draw more concrete conclusions a larger dataset that includes more fault cases is necessary. Signals from both encoder alignment fault cases closely follow those of the nominal case, and as such the pattern of anomaly scores is almost identical to that produced from the healthy test data. However, the approach shows some potential when considering the results from feeding the model with data from the absolute angle error fault case as shown in Figure 15, where major differences can be seen in the time interval of the second and third steps of the diagnostics routine.

In all the test cases, we see large spikes in anomaly scores during the first and third steps of the routine, where the signals are quite noisy. This has major implications for the accuracy of the model, and also negatively impacts the selection of a good fault threshold. In the experiments the threshold was selected from the top 0.1% scoring sample from the training data, which resulted in a threshold at around the value of these outliers. As a consequence, several areas which are visibly different do not exceed this threshold, and are therefore not interpreted as anomalous by the model. For instance, consider the time interval [70s, 160s] in Figure 15, where multiple large areas of successively increasing anomaly scores can be seen. As a result of the fault threshold being set too high, most of these noticeably anomalous samples are not classified as such. The spikes in anomaly score indicate that the Autoencoder has issues in learning a good enough representation of the noisy data. Hence, future improvements to the model may need to be made either by denoising the data as an initial pre-processing step, or by investigating how a denoising Autoencoder would perform with the same data.
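The effect of outliers on the threshold can be illustrated with a small sketch. It assumes that "top 0.1%" means the threshold is placed at the 99.9th percentile of the training anomaly scores. The function name and the synthetic score distributions are hypothetical and only chosen to show how a handful of spikes in otherwise healthy training data can lift the threshold above the scores of moderately anomalous samples.

```python
import numpy as np


def top_percent_threshold(train_scores, top_percent=0.1):
    """Score above which only the top `top_percent` % of training scores lie."""
    return float(np.percentile(train_scores, 100.0 - top_percent))


rng = np.random.default_rng(0)

# Synthetic healthy training scores (assumption, not thesis data).
clean_scores = rng.chisquare(df=8, size=10_000)

# Same scores, but 0.3 % of the samples are replaced by large spikes,
# mimicking the noisy parts of the diagnostics routine.
spiky_scores = clean_scores.copy()
spiky_scores[:30] = 400.0

thr_clean = top_percent_threshold(clean_scores)
thr_spiky = top_percent_threshold(spiky_scores)

# A moderately anomalous score is flagged only with the clean threshold.
candidate = 100.0
flagged_clean = candidate > thr_clean
flagged_spiky = candidate > thr_spiky
```

Because the spikes make up more than 0.1% of the synthetic training scores, the percentile lands on the spike value itself, so the candidate score is no longer flagged.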

The final experiment was to investigate whether the signals reconstructed by the Autoencoder provided information regarding which signals were faulty, thereby pinpointing the faulty component for isolation purposes. For this purpose the squared prediction error between the original and reconstructed signals was used, with the hypothesis that the errors of the faulty signals would be greater in a given sample than those of the healthy ones. As can be seen in Figures 16 and 17, the reconstruction errors for the absolute angle fault in the front right steering motor cannot be used to confirm this hypothesis, as most of the large errors seem to point towards the rear right wheel motor being faulty. By inspecting the input signals we can however see differences between the front left and rear right wheel drive motor signals and their healthy counterparts, and as such a spread in reconstruction errors as can be seen in the figure is expected. Hence, torque signals may not provide valuable enough information as is, and an additional layer might need to be added to the model to map such patterns to their actual causes. However, further experimentation with a larger dataset of faults is needed to draw any real conclusions, as the signals from the other fault cases do not show any indication of being faulty and therefore cannot be used.

The results can be partly used to answer the research questions of the thesis. As was shown in the experiments carried out on the industrial dataset, the Autoencoder is capable of learning valuable features in order to differentiate between healthy and faulty behaviour (RQ1). The experiments did not confirm whether the Autoencoder can be used to isolate faults to the failing component by examining the reconstructed signals, as interference due to anomalous behaviours in the torque outputs from other motors could be observed, which affected the reconstruction errors (RQ2).



10. Conclusions

This thesis has investigated the use of an intelligent fault detection and isolation strategy for mobile robots based on a model using a sparse Autoencoder to extract features, and the Mahalanobis distance metric to assign anomaly scores to samples. A dataset of torque signals from eight different motors was recorded during several iterations of a pre-defined diagnostics routine designed to stress the motors, and contained samples of healthy behaviours as well as three different types of injected faults. It was shown that the Autoencoder was able to learn a representation of the healthy signals in order to produce a noticeably different reconstruction when unknown signals were input to the model. Three different fault cases recorded from a mobile robot were tested, but the model was only able to detect one of these faults, as the signals produced by the others did not have any noticeable difference from the healthy case. A fault threshold was set based on the samples which were deemed the most anomalous by the Autoencoder from the healthy training data, but due to outliers in the anomaly scores of the training data the threshold was not sensitive enough to predict a majority of the faulty samples correctly. Additionally, the model's capability to isolate faults was examined by considering the signal with the largest Autoencoder reconstruction error in a sample as the cause of the fault. The results did not directly confirm whether the model could be used for fault isolation due to anomalies also being present in the other signals, and as such it was not possible to isolate the faults to a motor based on the reconstruction errors alone.
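As a rough illustration of the scoring step summarized above, the sketch below fits the mean and covariance of healthy reconstruction-error vectors and scores new error vectors by their squared Mahalanobis distance to that distribution. It is not the thesis implementation: the class name, the use of the plain sample covariance (rather than a robust estimator), the squared form of the distance and the synthetic error vectors are all assumptions.

```python
import numpy as np


class MahalanobisScorer:
    """Squared Mahalanobis distance to a distribution of healthy errors."""

    def fit(self, healthy_errors):
        # healthy_errors: shape (n_samples, n_features)
        self.mean_ = healthy_errors.mean(axis=0)
        cov = np.cov(healthy_errors, rowvar=False)
        # Pseudo-inverse guards against a singular covariance matrix.
        self.precision_ = np.linalg.pinv(cov)
        return self

    def score(self, errors):
        d = errors - self.mean_
        # Row-wise quadratic form d_i^T * precision * d_i.
        return np.einsum("ij,jk,ik->i", d, self.precision_, d)


rng = np.random.default_rng(0)
n_features = 8

# Synthetic error vectors (assumption, not thesis data).
healthy_train = 0.3 * rng.standard_normal((5000, n_features))
healthy_test = 0.3 * rng.standard_normal((1000, n_features))
faulty_test = 0.3 * rng.standard_normal((1000, n_features))
faulty_test[:, 3] += 1.0  # one error component drifts away from healthy

scorer = MahalanobisScorer().fit(healthy_train)
healthy_scores = scorer.score(healthy_test)
faulty_scores = scorer.score(faulty_test)
```

In this synthetic setting the faulty scores are clearly larger on average than the healthy ones, which is the property the anomaly score relies on. It says nothing about how well real torque signals separate.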

10.1. Future Work

For practical applications the model needs to be improved to better handle inputs with noise, as noisy signals can produce anomaly scores that may be interpreted as anomalous despite not containing any faults. Alternatively, some strategy for discarding outliers could be implemented to completely remove the impact of the large spikes in anomaly scores.
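One simple way to reduce the impact of isolated spikes, given here only as a sketch of the kind of strategy meant above and not evaluated in this work, is to raise a fault only when the anomaly score stays above the threshold for several consecutive samples. The function name, the choice of k = 5 and the synthetic scores are hypothetical; the threshold 336.46 is the one used in the experiments.

```python
import numpy as np


def persistent_exceedance(scores, threshold, k=5):
    """Flag sample i only if it and the k - 1 samples before it exceed the threshold."""
    above = (scores > threshold).astype(int)
    # Trailing count of exceedances over the last k samples.
    run = np.convolve(above, np.ones(k, dtype=int), mode="full")[: len(scores)]
    return run == k


# Synthetic anomaly scores (assumption): one isolated spike and one
# sustained region above the threshold.
scores = np.zeros(100)
scores[10] = 500.0
scores[50:70] = 400.0

flags = persistent_exceedance(scores, threshold=336.46, k=5)
```

The isolated spike at index 10 is ignored, while the sustained region is flagged from its fifth sample onward. The cost of this choice is a detection delay of k - 1 samples.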

While the results in this study show that Autoencoders can be applied for fault diagnostics in mobile robots, they do not show how the approach compares to other data-driven methods. As such, a comparison study between other state-of-the-art fault detection strategies and the Autoencoder-based method should be carried out using the same dataset. Preferably the dataset should also contain additional fault cases to properly verify the model's ability to detect faults, as the dataset that was used to conduct the experiments in this work only contained one fault case that could be used for this purpose.



References[1] G. Qiao, “Advanced Sensor and Target Development to Support Robot Accuracy

Degradation Assessment,” in 2019 IEEE 15th International Conference on Automa-tion Science and Engineering (CASE), vol. 2019-August. IEEE, aug 2019, pp. 54–59.

[2] S. Tang, S. Yuan, and Y. Zhu, “Convolutional Neural Network in Intelligent FaultDiagnosis Toward Rotatory Machinery,” IEEE Access, vol. 8, pp. 86 510–86 519, 2020.

[3] E. Principi, D. Rossetti, S. Squartini, and F. Piazza, “Unsupervised electric mo-tor fault detection by using deep autoencoders,” IEEE/CAA Journal of AutomaticaSinica, vol. 6, no. 2, pp. 441–451, 2019.

[4] G. Zhao, G. Zhang, Q. Ge, and X. Liu, “Research advances in fault diagnosis and prog-nostic based on deep learning,” Proceedings of 2016 Prognostics and System HealthManagement Conference, PHM-Chengdu 2016, pp. 1–6, 2017.

[5] E. Khalastchi and M. Kalech, “On fault detection and diagnosis in robotic systems,”ACM Computing Surveys, vol. 51, no. 1, pp. 1–24, 2018.

[6] Y. Lei, F. Jia, J. Lin, S. Xing, and S. X. Ding, “An Intelligent Fault DiagnosisMethod Using Unsupervised Feature Learning Towards Mechanical Big Data,” IEEETransactions on Industrial Electronics, vol. 63, no. 5, pp. 3137–3147, 2016.

[7] A. A. Silva, A. M. Bazzi, and S. Gupta, “Fault diagnosis in electric drives usingmachine learning approaches,” in Proceedings of the 2013 IEEE International ElectricMachines and Drives Conference, IEMDC 2013, vol. 2020, no. Q2. IEEE, may 2013,pp. 722–726.

[8] I. Andrijauskas and R. Adaskevicius, “SVM Based Bearing Fault Diagnosis in Induc-tion Motors Using Frequency Spectrum Features of Stator Current,” 2018 23rd In-ternational Conference on Methods and Models in Automation and Robotics, MMAR2018, pp. 826–831, 2018.

[9] Y. Huang, C. H. Chen, and C. J. Huang, “Motor fault detection and feature extractionusing rnn-based variational autoencoder,” IEEE Access, vol. 7, pp. 139 086–139 096,2019.

[10] X. Wu, G. Jiang, X. Wang, P. Xie, and X. Li, “A Multi-Level-Denoising AutoencoderApproach for Wind Turbine Fault Detection,” IEEE Access, vol. 7, pp. 59 376–59 387,2019.

[11] P. Wolf, A. Mrowca, T. T. Nguyen, B. Baker, and S. Gunnemann, “Pre-ignitionDetection Using Deep Neural Networks: A Step Towards Data-driven AutomotiveDiagnostics,” IEEE Conference on Intelligent Transportation Systems, Proceedings,ITSC, vol. 2018-November, no. Dl, pp. 176–183, 2018.

[12] P. Wolf, A. Chin, and B. Baker, “Unsupervised data-driven automotive diagnosticswith improved deep temporal clustering,” IEEE Vehicular Technology Conference,vol. 2019-September, 2019.

[13] L. Zhang, J. Lin, B. Liu, Z. Zhang, X. Yan, and M. Wei, “A Review on Deep Learn-ing Applications in Prognostics and Health Management,” IEEE Access, vol. 7, pp.162 415–162 438, 2019.

31


[14] Y. Cheng, R. Wang, and M. Xu, “A Combined Model-Based and Intelligent Methodfor Small Fault Detection and Isolation of Actuators,” IEEE Transactions on Indus-trial Electronics, vol. 63, no. 4, pp. 2403–2413, 2016.

[15] F. Cheng, A. Raghavan, D. Jung, Y. Sasaki, and Y. Tajika, “High-accuracy unsuper-vised fault detection of industrial robots using current signal analysis,” 2019 IEEE In-ternational Conference on Prognostics and Health Management, ICPHM 2019, 2019.

[16] P. Liang, C. Deng, J. Wu, G. Li, Z. Yang, and Y. Wang, “Intelligent Fault Diagno-sis via Semisupervised Generative Adversarial Nets and Wavelet Transform,” IEEETransactions on Instrumentation and Measurement, vol. 69, no. 7, pp. 4659–4671,2020.

[17] J. Schmidhuber, “Deep Learning in neural networks: An overview,” Neural Networks,vol. 61, pp. 85–117, 2015.

[18] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. PrenticeHall, 2010.

[19] D. L. Ossig, K. Kurzenberger, S. A. Speidel, K. U. Henning, and O. Sawodny, “SensorFault Detection Using an Extended Kalman Filter and Machine Learning for a VehicleDynamics Controller,” IECON Proceedings (Industrial Electronics Conference), vol.2020-Octob, pp. 361–366, 2020.

[20] V. Sadhu, S. Zonouz, and D. Pompili, “On-board Deep-learning-based UnmannedAerial Vehicle Fault Cause Detection and Identification,” arXiv, no. c, pp. 5255–5261,2020.

[21] W. Yu and C. Zhao, “Robust monitoring and fault isolation of nonlinear industrialprocesses using denoising autoencoder and elastic net,” IEEE Transactions on ControlSystems Technology, vol. 28, no. 3, pp. 1083–1091, 2020.

[22] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review andnew perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence,vol. 35, no. 8, pp. 1798–1828, 2013.

[23] R. E. Neapolitan and R. E. Neapolitan, “Neural Networks and Deep Learning,” Ar-tificial Intelligence, pp. 389–411, 2018.

[24] X. Liang, F. Duan, I. Bennett, and D. Mba, “A sparse autoencoder-based unsuper-vised scheme for pump fault detection and isolation,” Applied Sciences (Switzerland),vol. 10, no. 19, 2020.

[25] E. Khalastchi, G. A. Kaminka, M. Kalech, and R. Lin, “Online anomaly detectionin unmanned vehicles,” 10th International Conference on Autonomous Agents andMultiagent Systems 2011, AAMAS 2011, vol. 1, no. June 2014, pp. 105–112, 2011.

[26] M. Hubert and M. Debruyne, “Minimum covariance determinant,” Wiley Interdisci-plinary Reviews: Computational Statistics, vol. 2, no. 1, pp. 36–43, 2010.

[27] P. J. Rousseeuw and K. V. Driessen, “A Fast Algorithm for the Minimum CovarianceDeterminant Estimator,” Technometrics, vol. 41, no. 3, pp. 212–223, aug 1999.

32


[28] P. Goel, G. Dedeoglu, S. I. Roumeliotis, and G. S. Sukhatme, “Fault detection andidentification in a mobile robot using multiple model estimation and neural network,”Proceedings - IEEE International Conference on Robotics and Automation, vol. 3, no.April, pp. 2302–2309, 2000.

[29] J. Fu, C. Sun, Z. Yu, and L. Liu, “A hybrid CNN-LSTM model based actuator faultdiagnosis for six-rotor UAVs,” Proceedings of the 31st Chinese Control and DecisionConference, CCDC 2019, pp. 410–414, 2019.

[30] P. Zhao, M. Kurihara, J. Tanaka, T. Noda, S. Chikuma, and T. Suzuki, “Advancedcorrelation-based anomaly detection method for predictive maintenance,” 2017 IEEEInternational Conference on Prognostics and Health Management, ICPHM 2017, pp.78–83, 2017.

[31] D. Park, Y. Hoshi, and C. C. Kemp, “A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder,” arXiv, vol. 3, no. 3,pp. 1544–1551, 2017.

[32] H. Holz, A. Applin, B. Haberman et al., “Research methods in computing: what arethey, and how should we teach them?” ACM SIGCSE Bull., vol. 38, pp. 96–114,2006.

[33] C. Robson and K. McCartan, Real World Research, 4th Edition, 12 2017.

[34] J. Nunamaker and M. Chen, “Systems development in information systems research,”in Twenty-Third Annual Hawaii International Conference on System Sciences, vol. 3,1990, pp. 631–640 vol.3.

[35] M. Quigley, K. Conley, B. Gerkey et al., “Ros: an open-source robot operating sys-tem,” in ICRA workshop on open source software, vol. 3, no. 3.2. Kobe, Japan, 2009,p. 5.

[36] M. Abadi, A. Agarwal, P. Barham et al., “TensorFlow: Large-scale machine learningon heterogeneous systems,” 2015, software available from tensorflow.org. [Online].Available: https://www.tensorflow.org/

[37] F. Chollet et al., “Keras,” https://keras.io, 2015.

[38] F. Pedregosa, G. Varoquaux, Gramfort et al., “Scikit-learn: Machine learning inPython,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[39] C. R. Harris, K. J. Millman, S. J. van der Walt et al., “Array programming withNumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.

[40] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” InternationalConference on Learning Representations, 12 2014.

33

https://www.tensorflow.org/

https://keras.io


A. Industrial Use Case Results

[Figure 18 plots: four panels, (a) k = 8, w = 16; (b) k = 16, w = 16; (c) k = 32, w = 16; (d) k = 64, w = 16. Each shows % Faults (0.0 to 1.0) against Top % on a logarithmic axis from 10^-4 to 10^2, with curves for True Positives (TP) and False Positives (FP).]

Figure 18: The impact the number of hidden neurons has on the percentage of true and false positives for the industrial data with a fixed window size.

[Figure 19 plots: three panels with 64 hidden neurons, (a) k = 64, w = 8; (b) k = 64, w = 16; (c) k = 64, w = 32. Each shows % Faults (0.0 to 1.0) against Top % on a logarithmic axis from 10^-4 to 10^2, with curves for True Positives (TP) and False Positives (FP).]

Figure 19: The impact the window size has on the percentage of true and false positives for the industrial data with a fixed number of hidden neurons.

[Figure 20 plots: four density histograms (x-axis: Anomaly Score, y-axis: Density); the panel titles recovered from the extraction are "Healthy" and "Encoder alignment fault front right".]

Figure 20: Distribution of anomaly scores in the industrial use case from the healthy test set and the fault datasets. The fault threshold with the value 336.46 is denoted by the vertical red line.



B. Synthetic Data Results

[Figure 21 plots: four panels, (a) k = 8, w = 16; (b) k = 16, w = 16; (c) k = 32, w = 16; (d) k = 64, w = 16. Each shows % Faults (0.0 to 1.0) against Top % on a logarithmic axis from 10^-4 to 10^2.]

Figure 21: The impact of the number of hidden neurons on the percentage of true and false positives on the synthetic data. The window size is fixed to 16 in all cases.

[Figure 22 plots: three panels with 16 hidden neurons, (a) k = 16, w = 8; (b) k = 16, w = 16; (c) k = 16, w = 32. Each shows % Faults (0.0 to 1.0) against Top % on a logarithmic axis from 10^-4 to 10^2, with curves for True Positives (TP) and False Positives (FP).]

Figure 22: The impact that the window size has on the number of true and false positives on the synthetic data when the number of hidden neurons is fixed to 16.

[Figure 23 plots: two density histograms titled "Healthy" and "Faulty" (x-axis: Anomaly Score, 0 to 80; y-axis: Density).]

Figure 23: The distribution of anomaly scores for the synthetic datasets. The fault threshold (5.70) is denoted by the red vertical line.
