ECG-Based Subject Identification Using Statistical ...


Research Article
ECG-Based Subject Identification Using Statistical Features and Random Forest

Turky N. Alotaiby,1 Saud Rashid Alrshoud,1 Saleh A. Alshebeili,2,3 and Latifah M. Aljafar1

1 KACST, Saudi Arabia
2 KACST-TIC in Radio Frequency and Photonics for the e-Society (RFTONICS), Saudi Arabia
3 Department of Elect. Engineering, King Saud University, Saudi Arabia

Correspondence should be addressed to Turky N. Alotaiby; [email protected]

Received 9 September 2019; Revised 5 November 2019; Accepted 13 November 2019; Published 16 December 2019

Academic Editor: Alberto J. Palma

Copyright © 2019 Turky N. Alotaiby et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this work, a nonfiducial electrocardiogram (ECG) identification algorithm based on statistical features and a random forest classifier is presented. Two feature extraction approaches are investigated: direct and band-based. In the former, eleven simple statistical features are directly extracted from a single-lead ECG signal segment. In the latter, the single-lead ECG signal is first decomposed into bands, and the statistical features are extracted from each segment of a given band and concatenated to form the feature vector. Nonoverlapping segments of different lengths (i.e., 1, 3, 5, 7, 10, or 15 sec) are examined. The extracted feature vectors are applied to a random forest classifier for the purpose of identification. This study considers 290 reference subjects from the ECG database of the Physikalisch-Technische Bundesanstalt (PTB). The proposed identification algorithm achieved an accuracy of 99.61% using a single limb lead (I) with the band-based approach. A single chest lead (V1), augmented limb lead (aVF), and Frank’s lead (Vx) achieved accuracies of 99.76%, 99.37%, and 99.76%, respectively, using the same approach (see Table 2).

1. Introduction

The aim of a biometric system is to uniquely identify or authenticate persons based on one or more behavioral and/or physiological characteristics, including the retina, fingerprint, or gait [1, 2]. Subject recognition is essential for many modern applications, which touch different aspects of our daily lives such as financial transactions, data protection, access control, entertainment, cars, and smartphones [3–5]. However, the current biometric traits used have different operational trade-offs in terms of performance, robustness, measurability, and detection of liveness [6–10]. Around three decades ago, Forsen et al. suggested the use of the electrocardiogram (ECG) as a biometric trait [11]. Biel et al.’s [12, 13] works are considered the first attempt to use ECGs for biometric purposes, considering the biometric characteristics of measurability (ease with which the characteristic is obtained), permanence (no change over time), universality (possession of the characteristic by the individual), and uniqueness (no two individuals share the same characteristic) [14–17]. Since then, many researchers have proposed various ECG-based identification approaches [1, 4, 18–27] using private and/or public databases [28, 29].

A biometric identification system involves three main phases: signal denoising, feature extraction, and classification. Signal denoising [30–34] is an important task, which is required due to the susceptibility of the ECG signal to noise from many sources such as power-line interference and electrode movement [35, 36]. Feature extraction is needed to provide unique biomarkers for a given ECG signal. Feature extraction methods can be grouped into three main categories: fiducial-based approaches, which extract features while preserving the characteristics of the ECG signal, e.g., the amplitudes and intervals of heartbeats [20, 31, 37–43]; nonfiducial-based approaches, which do not require such precise knowledge of ECG characteristics [44–53]; and hybrid-based approaches [54, 55].

Hindawi, Journal of Sensors, Volume 2019, Article ID 6751932, 13 pages, https://doi.org/10.1155/2019/6751932

The classifier is the last stage of a biometric identification system. Different classifiers have been used in the literature, such as the neural network (NN), k-nearest neighbors algorithm (k-NN), support vector machine (SVM), and random forest [30, 31, 33, 49, 54–56]. Recently, deep learning has also been proposed for ECG biometric identification systems [57, 58].

In this study, we propose a new nonfiducial method for subject identification based on statistical features and a random forest classifier. For feature extraction, we propose two approaches: direct and band-based. In the direct approach, eleven statistical features are extracted directly from the single-lead ECG signal and fed to a random forest classifier. In the band-based approach, the single-lead ECG signal is first decomposed into bands, and the statistical features are extracted from each band and concatenated to form the feature vector, which is then fed to the random forest classifier.

This study uses the Physikalisch-Technische Bundesanstalt (PTB) dataset, which is a publicly available database compiled by the National Metrology Institute of Germany. It contains digitized ECG recordings of both normal and abnormal subjects, which are provided for research via the link https://PhysioNet.org [29]. Fifteen concurrently measured signals are included in each record: three limb leads (I, II, and III), three augmented limb leads (aVR, aVL, and aVF), six chest leads (V1, V2, V3, V4, V5, and V6), and three Frank leads (Vx, Vy, and Vz).

The present study offers several advantages over other existing methods due to the following:

(1) It uses simple statistics for feature extraction, including the mean, standard deviation, median, maximum value, minimum value, range, interquartile range, first quartile (Q1), third quartile (Q3), kurtosis, and skewness of the ECG signal. We show by the t-distributed stochastic neighbor embedding (t-SNE) algorithm that subjects’ features based on these statistics are separable, which leads to a high subject identification rate. The t-SNE is a nonlinear dimensionality reduction technique, which is utilized to visualize an N-dimensional feature space in two dimensions [59]

(2) It provides extensive investigations using a reference population of 290 subjects (238 nonhealthy subjects and 52 healthy subjects) from the PTB ECG database. To the best of our knowledge, this is the largest number of subjects considered in the literature to produce results in the context of subject identification using ECG signals. Further, this study is the first to show identification results using 290 subjects from the signals of each of the 15 previously mentioned leads; see Tables 1 and 2

(3) It reports high identification accuracy results for 290 (healthy and nonhealthy) subjects using features extracted from simple statistics. Specifically, it has been found that a data segment length of 7 seconds from a single limb lead (I) gives an average accuracy of 99.61% using the band-based approach, while a single chest lead (V1), augmented limb lead (aVF), and Frank’s lead (Vx) give average accuracies of 99.76%, 99.37%, and 99.76%, respectively, using the same approach (see Table 2)

The rest of the paper is organized as follows. Section 2 describes the proposed identification method. Section 3 presents the performance evaluation results for the proposed approaches and compares them to state-of-the-art identification systems. Finally, Section 4 gives concluding remarks.

2. Method

The proposed method comprises two phases: enrollment and identification. Each phase consists of ECG signal acquisition, preprocessing, and feature extraction. After enrolling all the subjects, the registered ECG signals are used to train the random forest classifier. In the identification phase, the trained model is used to identify the subjects. Figure 1 shows the process of the proposed method. The details of each stage are presented in the following subsections.

2.1. Data Acquisition and Preprocessing. The PTB database is constructed utilizing 15 leads, each of which measures a specific electrical potential difference. Each signal is sampled at 1000 samples/sec with 16-bit resolution. The length of the recording session for each subject was between 31 and 120 sec. In this work, the PTB signals undergo two main preprocessing operations: detrending and inverting. The first operation is required due to the presence of some linear trend in the database signals, possibly originating from different sources (e.g., voltage fluctuations in the recording device and the subject’s muscle movements), which can potentially hinder the data analysis and thus requires removal before further processing. Detrending is achieved by subtracting from each lead the least-squares-fit straight line of the data. ECG signals are upside down in some cases, thus requiring inversion. Figures 2 and 3 show the time domain of processed 5 sec I, aVR, V1, and Vx lead signals and the frequency domain for the same leads of a healthy subject (S104). The Frank lead Vx signal has the highest amplitude, as shown in the time domain, while in the frequency domain we notice that most of the energy is concentrated below 35 Hz in all leads.
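The detrending and inversion steps can be illustrated with a minimal sketch. It assumes NumPy and a synthetic test signal; the function names and the `inverted` flag are hypothetical, and the text does not say how upside-down recordings are detected, so that decision is left to the caller here.

```python
import numpy as np

def detrend_linear(x):
    """Subtract the least-squares straight-line fit from a 1-D signal."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, deg=1)
    return x - (slope * t + intercept)

def preprocess(x, inverted=False):
    """Detrend, then flip the sign if the recording is flagged as upside down.

    The `inverted` flag is an assumption: the detection rule is not specified.
    """
    y = detrend_linear(x)
    return -y if inverted else y

# Synthetic stand-in: a 5 Hz sine riding on a linear drift, 1000 samples/sec.
fs = 1000
t = np.arange(5 * fs) / fs
raw = np.sin(2 * np.pi * 5 * t) + 0.3 * t + 2.0
clean = preprocess(raw)
```

After this step the residual has zero mean and no remaining linear slope, which is the property the least-squares fit guarantees.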

2.2. Feature Extraction. We propose two approaches to extract features from the ECG signal: direct and band-based. In the first approach, the preprocessed ECG signal is segmented, and statistical features are extracted from each segment to form the feature vector. In the second approach, the preprocessed ECG signal is decomposed into bands, and each band of the signal is segmented. The statistical features are then extracted from each segment. The feature vector is formed by concatenating the statistical features of each segment from all bands. Figure 4 presents the two approaches.

Table 1: Direct approach classification results using 290 subjects of the PTB dataset with segment lengths of 1, 3, 5, 7, 10, and 15 sec. Each cell lists Avg. Acc. / Avg. Sen. / Avg. Spe.

Lead | 1 second | 3 seconds | 5 seconds | 7 seconds | 10 seconds | 15 seconds
I | 78.22 / 76.42 / 99.92 | 85.52 / 83.51 / 99.95 | 88.7 / 86.96 / 99.96 | 90.36 / 89.14 / 99.97 | 90.15 / 86.67 / 99.97 | 92.59 / 90.23 / 99.97
II | 75.47 / 73.27 / 99.92 | 83.82 / 81.08 / 99.94 | 84.38 / 81.66 / 99.95 | 87.2 / 85.26 / 99.96 | 86.37 / 84.11 / 99.95 | 86.42 / 83.68 / 99.95
III | 73.67 / 72.39 / 99.91 | 82.32 / 80.3 / 99.94 | 84.6 / 82.6 / 99.95 | 87.2 / 84.64 / 99.96 | 84.88 / 81.93 / 99.95 | 86.42 / 83.28 / 99.95
aVR | 78.81 / 76.23 / 99.93 | 84.36 / 81.26 / 99.95 | 86.09 / 83.94 / 99.95 | 87.84 / 86.07 / 99.96 | 89.81 / 87.21 / 99.96 | 87.3 / 83.05 / 99.96
aVL | 76.72 / 75.11 / 99.92 | 84.59 / 83.43 / 99.95 | 88.09 / 86.56 / 99.96 | 90.52 / 89.91 / 99.97 | 89.35 / 86.84 / 99.96 | 89.77 / 87.07 / 99.96
aVF | 75.3 / 73.14 / 99.91 | 82.15 / 79.91 / 99.94 | 83.16 / 80.36 / 99.94 | 85.39 / 82.69 / 99.95 | 85.91 / 83.19 / 99.95 | 85.19 / 82.47 / 99.95
V1 | 84.53 / 82.59 / 99.95 | 91.13 / 89.58 / 99.97 | 92.96 / 91.17 / 99.98 | 93.44 / 90.86 / 99.98 | 94.5 / 91.81 / 99.98 | 92.77 / 89.54 / 99.97
V2 | 87.3 / 86.07 / 99.96 | 92.49 / 91.35 / 99.97 | 95.24 / 94.3 / 99.98 | 95.42 / 93.69 / 99.98 | 94.16 / 91.44 / 99.98 | 95.41 / 93.45 / 99.98
V3 | 89.19 / 87.98 / 99.96 | 94.56 / 93.65 / 99.98 | 95.68 / 94.34 / 99.99 | 96.29 / 96.1 / 99.99 | 95.88 / 94.48 / 99.99 | 95.59 / 94.02 / 99.98
V4 | 89.61 / 88.52 / 99.96 | 93.73 / 93.13 / 99.98 | 94.74 / 94.22 / 99.98 | 96.21 / 95.48 / 99.99 | 95.42 / 93.42 / 99.98 | 95.94 / 94.08 / 99.99
V5 | 87.27 / 85.84 / 99.96 | 92.23 / 91.38 / 99.97 | 92.69 / 92.39 / 99.97 | 94.31 / 93.55 / 99.98 | 94.04 / 93.05 / 99.98 | 93.3 / 91.38 / 99.98
V6 | 84.69 / 83.57 / 99.95 | 89.46 / 88.04 / 99.96 | 90.86 / 89.86 / 99.97 | 92.58 / 91.76 / 99.97 | 92.67 / 90.52 / 99.97 | 92.59 / 90.52 / 99.97
Vx | 88.17 / 86.84 / 99.96 | 92.03 / 90.94 / 99.97 | 93.8 / 92.64 / 99.98 | 93.29 / 91.45 / 99.98 | 93.59 / 91.26 / 99.98 | 93.12 / 91.03 / 99.98
Vy | 80.75 / 79.13 / 99.93 | 86.62 / 85.58 / 99.95 | 87.76 / 86.29 / 99.96 | 90.36 / 89.24 / 99.97 | 89.46 / 87.24 / 99.96 | 86.42 / 83.16 / 99.95
Vz | 87.81 / 86.28 / 99.96 | 92.73 / 91.72 / 99.97 | 93.85 / 93.43 / 99.98 | 94.63 / 94.1 / 99.98 | 94.04 / 92.18 / 99.98 | 94.18 / 92.7 / 99.98

Table 2: Band-based approach classification results using 290 subjects of the PTB dataset with segment lengths of 1, 3, 5, 7, 10, and 15 sec. Each cell lists Avg. Acc. / Avg. Sen. / Avg. Spe.

Lead | 1 second | 3 seconds | 5 seconds | 7 seconds | 10 seconds | 15 seconds
I | 98.15 / 97.84 / 99.99 | 99.17 / 98.76 / 100 | 99.34 / 98.92 / 100 | 99.61 / 99.66 / 100 | 98.97 / 98.39 / 100 | 98.42 / 97.59 / 99.99
II | 97.79 / 97.47 / 99.99 | 99.04 / 98.76 / 100 | 99.17 / 98.77 / 100 | 99.06 / 98.34 / 100 | 98.97 / 98.1 / 100 | 97.89 / 96.95 / 99.99
III | 97.44 / 96.98 / 99.99 | 99.17 / 98.77 / 100 | 99.17 / 98.9 / 100 | 99.21 / 98.76 / 100 | 98.75 / 97.73 / 100 | 98.07 / 97.13 / 99.99
aVR | 98.02 / 97.57 / 99.99 | 99.53 / 99.48 / 100 | 99.5 / 99.19 / 100 | 99.37 / 98.9 / 100 | 99.77 / 99.54 / 100 | 98.42 / 97.53 / 99.99
aVL | 97.46 / 97.18 / 99.99 | 98.9 / 98.75 / 100 | 99.06 / 98.68 / 100 | 98.82 / 98.41 / 100 | 98.86 / 98.62 / 100 | 97.72 / 96.67 / 99.99
aVF | 97.58 / 97.11 / 99.99 | 99.17 / 98.87 / 100 | 99.12 / 98.6 / 100 | 99.37 / 99.17 / 100 | 98.63 / 97.84 / 100 | 98.07 / 96.72 / 99.99
V1 | 98.38 / 98.09 / 99.99 | 99.53 / 99.36 / 100 | 99.45 / 98.77 / 100 | 99.76 / 99.52 / 100 | 99.54 / 99.37 / 100 | 98.6 / 97.87 / 100
V2 | 98.4 / 98.19 / 99.99 | 99.24 / 99.21 / 100 | 99.61 / 99.53 / 100 | 99.21 / 99.03 / 100 | 98.86 / 98.79 / 100 | 98.25 / 97.18 / 99.99
V3 | 98.89 / 98.77 / 100 | 99.37 / 99.27 / 100 | 99.61 / 99.17 / 100 | 99.45 / 98.69 / 100 | 98.63 / 97.87 / 100 | 98.25 / 97.18 / 99.99
V4 | 98.75 / 98.65 / 100 | 99.5 / 99.38 / 100 | 99.61 / 99.53 / 100 | 99.53 / 99.03 / 100 | 99.09 / 98.71 / 100 | 98.6 / 97.87 / 100
V5 | 98.41 / 98.32 / 99.99 | 99.5 / 99.43 / 100 | 99.28 / 98.99 / 100 | 99.61 / 99.1 / 100 | 98.86 / 98.28 / 100 | 98.25 / 97.36 / 99.99
V6 | 98.35 / 98.21 / 99.99 | 99.47 / 99.26 / 100 | 99.34 / 98.79 / 100 | 99.61 / 99.38 / 100 | 99.09 / 97.96 / 100 | 98.6 / 97.41 / 100
Vx | 98.84 / 98.75 / 100 | 99.63 / 99.56 / 100 | 99.5 / 99.43 / 100 | 99.76 / 99.79 / 100 | 98.97 / 98.07 / 100 | 98.95 / 98.62 / 100
Vy | 98.23 / 97.85 / 99.99 | 99.34 / 99.11 / 100 | 99.56 / 99.36 / 100 | 99.53 / 99.03 / 100 | 99.32 / 98.94 / 100 | 98.25 / 96.55 / 99.99
Vz | 98.56 / 98.36 / 100 | 99.63 / 99.45 / 100 | 99.56 / 99.48 / 100 | 99.69 / 99.45 / 100 | 99.54 / 99.08 / 100 | 98.95 / 98.33 / 100

The normal ECG signal’s frequency spectrum ranges from 0.01 to 100 Hz, where 90% of the energy lies in the range of 0.25 Hz to 35 Hz [60]. Therefore, direct single-lead identification accuracy can be improved by considering multiple spectral components. Here, the single-lead ECG signal is decomposed into seven subbands by employing a filter bank of seven finite impulse response band-pass filters. Each filter has a bandwidth of 5 Hz, as follows: 0.1-5, 5-10, …, 30-35 Hz. Figure 5 shows the frequency responses of the filters employed to perform signal decomposition.
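A seven-band FIR decomposition of this kind might be sketched as follows. The filter length, the window, and the design method are not given in the text, so the 1001-tap Hamming-windowed sinc design below is purely an assumption for illustration, as are the names `bandpass_fir` and `decompose`.

```python
import numpy as np

FS = 1000  # sampling rate of the PTB recordings (samples/sec)
# Seven 5 Hz bands: 0.1-5, 5-10, ..., 30-35 Hz.
BANDS = [(0.1, 5)] + [(lo, lo + 5) for lo in range(5, 35, 5)]

def bandpass_fir(lo_hz, hi_hz, fs=FS, numtaps=1001):
    """Windowed-sinc band-pass FIR filter (Hamming window).

    numtaps=1001 is an assumed value, not taken from the paper.
    """
    m = np.arange(numtaps) - (numtaps - 1) / 2
    lo, hi = lo_hz / fs, hi_hz / fs  # cutoffs in cycles per sample
    # Ideal band-pass = difference of two ideal low-pass responses.
    ideal = 2 * hi * np.sinc(2 * hi * m) - 2 * lo * np.sinc(2 * lo * m)
    return ideal * np.hamming(numtaps)

def decompose(x, fs=FS):
    """Return an array of shape (7, len(x)), one row per sub-band."""
    # mode="same" keeps the output aligned with the input for these
    # symmetric, odd-length filters.
    return np.stack([np.convolve(x, bandpass_fir(lo, hi, fs), mode="same")
                     for lo, hi in BANDS])

# Synthetic check signal: a 7.5 Hz tone should land in the 5-10 Hz band.
t = np.arange(10 * FS) / FS
x = np.sin(2 * np.pi * 7.5 * t)
bands = decompose(x)
```

With a filter this short, the 0.1 Hz lower edge of the first band cannot be resolved sharply; a longer filter or a different design would be needed if that edge matters.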

A nonoverlapping sliding window (1, 3, 5, 7, 10, or 15 sec) is applied to partition the ECG data into segments. Different window sizes are used to examine the effect of segment length on the identification system, irrespective of the individual heartbeats or specific characteristics of ECG waves.
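The windowing step can be sketched in a few lines, assuming NumPy. The helper name `segment` is hypothetical, and discarding the trailing partial window is an assumption, since the text does not say how leftover samples are handled.

```python
import numpy as np

def segment(x, seg_sec, fs=1000):
    """Split x into nonoverlapping windows of seg_sec seconds.

    Samples that do not fill a whole window are dropped (an assumption).
    """
    x = np.asarray(x)
    n = int(seg_sec * fs)   # samples per window
    k = x.size // n         # number of complete windows
    return x[:k * n].reshape(k, n)

record = np.arange(31_000)  # stand-in for a 31 sec recording at 1000 samples/sec
for seg_sec in (1, 3, 5, 7, 10, 15):
    print(seg_sec, segment(record, seg_sec).shape)
```

For the shortest recordings (31 sec), a 15 sec window yields only two segments per subject, which limits how many feature vectors are available for training and testing at that length.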

Eleven statistical features are extracted from each segment, as listed in Section 1. These features are selected to measure certain ECG signal characteristics. We estimate the mean and median to measure the central tendency of the ECG signal, and we use the standard deviation, range, and interquartile range to measure the statistical dispersion. The kurtosis and skewness are used to measure the sharpness of the peak and the asymmetry of the ECG signal distribution, respectively. The other statistics (the minimum value, maximum value, first quartile, and third quartile) are self-explanatory. The definitions of these statistics and their estimation from a data record of length N samples are well known and can be found in [61]. Figure 6 shows their histograms for a data segment of length 7 sec.
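As an illustration, the eleven statistics could be computed per segment as below. This is a sketch under stated assumptions: the population standard deviation, the non-excess kurtosis (fourth standardized moment), and NumPy's default linear-interpolation quartiles are choices made here, and [61] may define some of these estimators differently. The function names are hypothetical.

```python
import numpy as np

def stat_features(seg):
    """Eleven statistics of one segment, in the order listed in Section 1."""
    seg = np.asarray(seg, dtype=float)
    q1, med, q3 = np.percentile(seg, [25, 50, 75])
    mu, sd = seg.mean(), seg.std()   # population standard deviation (assumed)
    z = (seg - mu) / sd              # standardized samples; sd must be nonzero
    return np.array([
        mu, sd, med,
        seg.max(), seg.min(), seg.max() - seg.min(),
        q3 - q1, q1, q3,
        np.mean(z ** 4),             # kurtosis (non-excess, assumed)
        np.mean(z ** 3),             # skewness
    ])

def band_feature_vector(band_segments):
    """Concatenate the per-band statistics of one time window.

    band_segments has one row per sub-band, so 7 bands give 77 features.
    """
    return np.concatenate([stat_features(s) for s in band_segments])

f = stat_features([1, 2, 3, 4, 5])
v = band_feature_vector(np.random.default_rng(0).normal(size=(7, 7000)))
```

A perfectly flat segment would give a zero standard deviation and break the standardization; the sketch does not guard against that case.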

Figure 1: Subject identification ECG-based approach using statistical features and random forest. (Flowchart. Enrollment phase: preprocessing; feature extraction by (i) signal decomposition, (ii) segmentation, and (iii) statistical feature calculation; the registered ECG features form the training data for random forest classifier training, which yields the trained model. Identification phase: preprocessing; the same feature extraction steps; subject identification using the trained model.)

Figure 2: Time domain of four processed signals. (Panels: limb lead I, augmented limb lead aVR, chest lead V1, and Frank lead Vx; x-axis: time (sec); y-axis: amplitude.)

Figure 3: Frequency spectrum of four leads. (Panels: limb lead I, augmented limb lead aVR, chest lead V1, and Frank lead Vx; x-axis: frequency (Hz), 0 to 100; y-axis: magnitude.)

2.3. The Random Forest Classifier. The random forest (RF) is an ensemble learning method developed by Breiman [62] and used for classification and regression. It includes a large number of decision tree classifiers. The classification process in the decision tree can be thought of as asking a series of questions about the available data until reaching a decision. Each tree in the forest is constructed with a subset of the training dataset randomly selected with replacement and grows without pruning. A tree consists of nodes which are either branches (have children nodes) or leaves (terminal nodes). The best split on each node in a tree is found by employing feature random selection methods [51–53]. Figure 7 presents an illustrative example of splitting a node. The node has balanced samples, 20 red and 20 blue. The aim is to find the best split that generates child nodes with the least diversity, which leads to a more certain decision. The figure shows three suggested splits A, B, and C that are generated by randomly selecting a set of features and a threshold value. We can see that split C is the best, with feature number 3 and a threshold value of 0.23, since it produced branches with the highest certainty. The first branch has 0.77 (17 over 22) probability of the red class. The second branch has 0.83 (15 over 18) probability of the blue class. The next step of the decision tree creation process is to find the best split on both child nodes. The random forest makes decisions based on the average of the probabilities predicted by the trees. The major advantages of random forest are that it does not suffer from the overfitting problem [62], produces high classification accuracy, and provides feature importance analysis [63].
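The branch probabilities quoted for split C can be reproduced with a short calculation. The weighted Gini impurity added at the end is an assumption: it is one common way to score a split, but the text does not name the criterion actually used.

```python
def branch_probabilities(red, blue):
    """Class probabilities (red, blue) of one child node."""
    total = red + blue
    return red / total, blue / total

def gini(red, blue):
    """Gini impurity of a two-class node (assumed scoring criterion)."""
    p_red, p_blue = branch_probabilities(red, blue)
    return 1.0 - (p_red ** 2 + p_blue ** 2)

# Split C (Feature_3 < 0.23), using the counts quoted in the text.
yes_branch = (17, 5)   # 17 red and 5 blue out of 22 samples
no_branch = (3, 15)    # 3 red and 15 blue out of 18 samples

p_red_yes = branch_probabilities(*yes_branch)[0]
p_blue_no = branch_probabilities(*no_branch)[1]
print(round(p_red_yes, 2), round(p_blue_no, 2))  # 0.77 0.83

# Impurity of the split, weighting each child by its share of the 40 samples.
n_yes, n_no = sum(yes_branch), sum(no_branch)
weighted_gini = (n_yes * gini(*yes_branch) + n_no * gini(*no_branch)) / (n_yes + n_no)
```

The parent node (20 red, 20 blue) has a Gini impurity of 0.5, so a lower weighted value after the split indicates more certain child nodes.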

The classifier undergoes two stages: training and testing. In the training phase, each tree is constructed using a sample drawn with replacement from the training dataset. In the testing phase, each tree classifies the testing instance, and a majority voting technique is used to classify the instance. Random forest has been used in various domains such as astronomy [64] and medicine [65–68]. In this work, 100 decision tree classifiers are employed.
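A forest of 100 trees can be set up as in the sketch below, which assumes scikit-learn is available; the text does not state which implementation was used. The data are synthetic Gaussian clusters standing in for per-subject feature vectors, and every hyperparameter other than the number of trees is left at the library default.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, segs_per_subject, n_features = 5, 20, 11

# Synthetic stand-in: each "subject" is a Gaussian cluster in feature space.
centers = rng.normal(scale=5.0, size=(n_subjects, n_features))
X = np.vstack([c + rng.normal(size=(segs_per_subject, n_features))
               for c in centers])
y = np.repeat(np.arange(n_subjects), segs_per_subject)

# 100 trees, each fitted on a bootstrap sample (bootstrap=True is the default).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
pred = clf.predict(X[:3])
```

Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than by a hard majority vote, so results from this sketch may differ slightly from a strict voting scheme.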

3. Results and Discussion

In this section, performance evaluation results of the proposed approaches are presented. We also compare the performance of the proposed approaches with the state-of-the-art PTB-based identification systems. The results are obtained using the PTB dataset, which includes 290 subjects. Six segment lengths (1, 3, 5, 7, 10, or 15 sec) were considered to study the effect of segment length on the identification process. For each subject, the feature vectors are extracted and split into two sets: training and testing. The first set consists of 70% of the feature vectors and is used to train a random forest model, and the remaining 30% are used in the testing step. We used three widely used metrics to evaluate the performance of the proposed approach: accuracy, sensitivity, and specificity [69], denoted by Avg. Acc., Avg. Sen., and Avg. Spe., respectively.
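The evaluation protocol can be sketched as follows, assuming NumPy. Two points are assumptions rather than details from the text: the 70/30 split is drawn at random within each subject, and sensitivity and specificity are computed one-vs-rest per subject and then averaged. The function names are hypothetical.

```python
import numpy as np

def split_per_subject(y, train_frac=0.7, seed=0):
    """Boolean training mask selecting about train_frac of each subject's vectors."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(y.size, dtype=bool)
    for s in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == s))
        mask[idx[:int(round(train_frac * idx.size))]] = True
    return mask

def macro_metrics(y_true, y_pred):
    """Overall accuracy, plus one-vs-rest sensitivity and specificity averaged over subjects."""
    sens, spec = [], []
    for s in np.unique(y_true):
        pos = y_true == s
        tp = np.sum(pos & (y_pred == s))
        fn = np.sum(pos & (y_pred != s))
        tn = np.sum(~pos & (y_pred != s))
        fp = np.sum(~pos & (y_pred == s))
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return np.mean(y_true == y_pred), np.mean(sens), np.mean(spec)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 0, 2, 2])
acc, sen, spe = macro_metrics(y_true, y_pred)

labels = np.repeat(np.arange(3), 10)   # 3 subjects, 10 feature vectors each
train_mask = split_per_subject(labels)
```

With 290 classes, each one-vs-rest problem has far more negatives than positives, which is consistent with the specificity values in Tables 1 and 2 staying close to 100 even where sensitivity is lower.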

Figure 4: Feature extraction approaches: (a) direct-based approach and (b) band-based approach. (Block diagrams. (a) Preprocessing, signal segmentation, statistical feature extraction. (b) Preprocessing, signal decomposition into bands B1 (0.01-5 Hz), B2 (5-10 Hz), …, B7 (30-35 Hz), then signal segmentation and statistical feature extraction for each band, followed by concatenation of the features.)

Figure 5: Frequency response of the filter bank. (Curves for Band 1 to Band 7; x-axis: frequency (Hz), 0 to 100; y-axis: magnitude.)

Table 1 presents the identification performance of the direct feature extraction approach using different segment lengths, averaged over all 290 subjects. For each segment length, 15 models were created, one for each ECG lead. From Table 1, we observe that lead I achieved the best accuracy of 92.59% using a 15-second segment length. Lead II and lead III both achieved a best accuracy of 87.2% using a 7-second segment length. Augmented limb lead aVL achieved the best accuracy of 90.52% using a 7-second segment length. Chest leads V1 to V6 achieved an average accuracy of more than 90% when the segment length is greater than 3 seconds. Lead V3 achieved the best accuracy of 96.29% using a 7-second segment length. Frank’s leads Vx and Vz achieved an accuracy of more than 92% using a segment length greater than one second. Figure 8 presents the average accuracy of the direct feature extraction approach using different segment lengths. It is worth noting that the training phase using the 7-second segment length took 24.7 sec on a machine equipped with a 3.3 GHz Intel Core i7 processor, while the identification process of 290 subjects took 3.2 sec on the same machine.

Figure 6: Histograms of statistical features for a data segment of 7 sec length. (One histogram per statistical feature; y-axis: number of segments.)

Figure 7: Example of node splitting in a decision tree. (Three candidate splits of a node with 20 red and 20 blue samples: (a) Feature_1 < 0.4, (b) Feature_2 < 15, and (c) Feature_3 < 0.23.)

Figure 8: The accuracy of direct approach classification using 290 subjects of the PTB dataset with segment lengths of 1, 3, 5, 7, 10, and 15 sec. (Axes: leads, segment length (sec), accuracy (%).)

Figure 9: The accuracy of band-based approach classification using 290 subjects of the PTB dataset with segment lengths of 1, 3, 5, 7, 10, and 15 sec. (Axes: leads, segment length (sec), accuracy (%).)

Figure 10: The confusion matrix of 290 subjects using the band-based approach with limb lead I and 7 sec segment length. (3D plot; axes: subject ID, subject ID, sensitivity.)

Figure 11: Separability investigation of band-based feature extraction approach using limb lead (I) signals of ten subjects. (Two-dimensional scatter plot; legend: subjects 35, 50, 97, 100, 109, 119, 141, 184, 219, and 262.)

Table 2 presents the identification performance of the band-based feature extraction approach. All limb leads achieved an accuracy of at least 97.44% using a 1-second segment length and an accuracy greater than 99% using the 3- to 7-second segment lengths. The augmented limb leads achieved an accuracy of at least 97.46% using a 1-second segment length and an accuracy > 98% using a segment length greater than three seconds. Among the augmented limb leads, lead aVR achieved the best accuracy of 99.77% using a 10-second segment length. The chest leads achieved an accuracy of at least 98.35% using a 1-second segment length. Lead V1 achieved the best accuracy, which is 99.76% using a 7-second segment length. Leads V2 to V6 each achieved a best accuracy of 99.61% using the 5- or 7-second segment length. Frank’s leads achieved an accuracy > 98.23% using a segment length greater than 1 second. Lead Vx achieved an accuracy of 99.76% using a 7-second segment length. Figure 9 presents the average accuracy of the band-based feature extraction approach using different segment lengths. The training phase in this approach using the 7-second segment length took 88.1 sec on a machine equipped with a 3.3 GHz Intel Core i7 processor, while the identification process of 290 subjects took 4.5 sec on the same machine.

Figure 10 shows the confusion matrix of 290 subjectsusing the band-based approach with limb lead I signal oflength 7 sec. We plot the confusion matrix in the form of a

Table 3: The proposed approaches in comparison with PTB-based subject identification methods.

| References | Method | Year | NS | DU (s) | Sen |
|---|---|---|---|---|---|
| Agrafioti and Hatzinakos [45] | Autocorrelation and discrete cosine transform (DCT) | 2006 | 14 | 10 | 100 |
| Wübbeler et al. [70] | Fiducial features and simple distance measure | 2007 | 74 | 10 | 99 |
| Agrafioti and Hatzinakos [71] | Normalized autocorrelation coefficients and K-nearest neighbors (K-NN) as a classifier | 2008 | 13 | 10 | 96.2 |
| Agrafioti and Hatzinakos [72] | Same approach as in [70] with feature level and decision level fusions | 2008 | 14 | 5 | 100 |
| Wang et al. [73] | Autocorrelation (AC) in conjunction with a discrete cosine transform (DCT) and K-NN as classifier | 2008 | 13 | NA | 84.61 |
| Fatemian and Hatzinakos [36] | Template-based | 2009 | 13 | NA | 99.62 |
| Safie et al. [74] | Pulse active ratio (PAR) technique for feature extraction and Euclidean distance | 2011 | 112 | 30 | 93.60 |
| Zhao et al. [49] | Ensemble empirical mode decomposition, Welch spectral analysis to extract significant features, principal component analysis (PCA) for dimensionality reduction, and K-NN as classifier | 2013 | 25 | NA | 96.00 |
| Tantawi et al. [75], test set 1 | Fiducial feature set (with 28 features) and four feature reduction methods (PCA, linear discriminant analysis (LDA), information gain ratio (IGR), and rough sets) and neural network as a classifier | 2013 | 14 | ~8 | 100 |
| Tantawi et al. [75], test set 2 | Same as test set 1 | 2013 | 6 | ~8 | 83.3 |
| Wang et al. [76] | Sparse representation and K-NN as classifier | 2013 | 100 | 2-4 | 99.5 |
| Jekova and Bortolan [77] | Correlation coefficient assessment, along with assessment of their linear and nonlinear combinations | 2015 | 14 | 10 | 92.9 |
| Brás and Pinho [78] | Information-theoretic data models for data compression and on similarity metrics related to the approximation of the Kolmogorov complexity | 2015 | 52 | 20 | 99.9 |
| Waili et al. [79] | Q-R-S feature points and multilayer perceptron neural network as a classifier | 2016 | 14 | 121.02 heartbeats | 96 |
| Paiva et al. [25] | Three features ST, RT, and QT and support vector machines as a classifier | 2017 | 10 | 30 | 97.5 |
| Dong et al. [80] | Deterministic learning | 2018 | 113 | NA | 92.8 |
| Labati et al. [58] | Deep convolutional neural networks used to extract the features from QRS complexes and soft-max as a classifier | 2018 | 52 | 10 | 100 |
| Alotaiby et al. [81] | Common spatial pattern and support vector machine as a classifier | 2019 | 200 | 7 | Single-lead (I): 95.15 |
| | | | | | Single-lead (V3): 98.92 |
| Proposed method | 11 statistical features, DWT, and random forest as a classifier | | 290 | 7 | Single-lead (I): 99.66 |
| | | | | | Single-lead (aVF): 99.17 |
| | | | | | Single-lead (V1): 99.52 |
| | | | | | Single-lead (Vx): 99.79 |

DU: duration; NA: the information is not available or computable; NS: number of subjects; Sen: sensitivity.


3D figure to make it easier to visualize where the confusion and correct identification appear. Specifically, we observe from Figure 6 that, using the band-based approach with limb lead I, all subjects achieved 100% sensitivity except four: S109 (Sen = 80%), S141 (Sen = 80%), S184 (Sen = 60%), and S262 (Sen = 80%). Twenty percent of the testing segments of subjects S109, S141, and S262 were misclassified as subjects S35, S97, and S219, respectively, while twenty percent of the testing segments of subject S184 were misclassified as subject S103 and another twenty percent as subject S119.
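The per-subject sensitivities quoted above follow directly from the rows of a confusion matrix. The sketch below illustrates the arithmetic only; the counts are hypothetical and assume five test segments per subject, which is consistent with the 20% steps reported but is not stated in the text. The function name `per_subject_sensitivity` is likewise an illustrative choice.

```python
def per_subject_sensitivity(confusion):
    """Return {subject: correct / total} from a nested-dict confusion matrix.

    confusion[true_id][predicted_id] holds the number of test segments of
    subject `true_id` that the classifier assigned to `predicted_id`.
    """
    sensitivity = {}
    for true_id, row in confusion.items():
        total = sum(row.values())
        correct = row.get(true_id, 0)
        sensitivity[true_id] = correct / total if total else float("nan")
    return sensitivity


# Hypothetical counts (assumption: five test segments per subject).
confusion = {
    "S109": {"S109": 4, "S35": 1},              # one segment confused with S35
    "S184": {"S184": 3, "S103": 1, "S119": 1},  # two segments confused
    "S50": {"S50": 5},                          # all segments correct
}

sensitivity = per_subject_sensitivity(confusion)
for subject, value in sensitivity.items():
    print(f"{subject}: {value:.0%}")
```

Under these assumed counts, one misclassified segment out of five gives 80% and two out of five give 60%.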

The results of Figure 10 can be confirmed by investigating the separability of subjects using the t-SNE algorithm. Figure 11 shows the results of the t-SNE algorithm when it is applied to the dataset of the following ten subjects with 16 segments each: S35, S50, S97, S100, S109, S119, S141, S184, S219, and S262. The t-SNE algorithm visualizes the 77-dimensional feature space of the band-based approach in a two-dimensional (2D) space; that is, the algorithm represents the feature vector of each segment by a single point in a 2D space.
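A projection of this kind can be sketched with scikit-learn's `TSNE`. The example below uses synthetic Gaussian clusters in place of the real feature vectors, and the perplexity, initialization, and random seed are assumptions, since the text does not report the t-SNE settings used.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_subjects, n_segments, n_features = 10, 16, 77

# Synthetic stand-in for the band-based features: one Gaussian cluster
# per subject. Real feature vectors would replace X.
centers = rng.normal(scale=5.0, size=(n_subjects, n_features))
X = np.vstack([c + rng.normal(size=(n_segments, n_features)) for c in centers])
labels = np.repeat(np.arange(n_subjects), n_segments)

# Map each 77-dimensional feature vector to a single 2D point.
# perplexity=15 is an assumed value; it must be below the sample count.
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(X)

print(embedding.shape)
```

Plotting `embedding` colored by `labels` would give a scatter plot analogous to Figure 11, where overlapping clusters indicate subjects that are likely to be confused.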

Figure 11 shows the clusters of subjects’ segments. Note that the clusters of subjects S50 and S100 are well separated from other subjects’ clusters. However, the cluster of S35 overlaps with that of S109. Similar observations can be made for the subjects S97, S109, S119, S141, S184, S219, and S262, which explains the misclassification revealed previously by the confusion matrix.

Table 3 shows the performance of the proposed approaches in comparison to the results of state-of-the-art subject identification methods that are available in the literature and utilize the PTB dataset. In the table, we list the reference, year of publication, number of subjects considered for identification, the segment’s length (if available), the sensitivity, and the method of identification used. Referring to Table 3, it is worth noting that the proposed approaches have been evaluated using 290 subjects, which is the largest number considered in the literature to date. Further, the band-based approach, which is evaluated using such a large number of subjects and utilizes simple statistical features, has demonstrated performance greater than 99%, which makes it very attractive for practical applications. Note that the method of Wang et al. [76] is the closest in performance to our proposed method but considered only 100 subjects for identification. Further, it adopts sparse coding, which requires optimization involving the l0 norm, which is an NP-hard problem.
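For context, a generic sparse coding problem of the kind referred to above can be written as follows. The symbols are illustrative and are not taken from [76]: x is a signal segment, D a dictionary, α the coefficient vector, and ε an error tolerance.

```latex
\min_{\alpha} \; \lVert \alpha \rVert_{0}
\quad \text{subject to} \quad
\lVert x - D\alpha \rVert_{2} \le \varepsilon
```

Here the l0 norm counts the nonzero entries of α. Exact minimization is NP-hard in general, so practical solvers typically rely on greedy pursuit or an l1 relaxation.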

4. Conclusion

This paper presents an ECG-based identification system that relies on statistical features and a random forest classifier. Two feature extraction approaches are investigated: direct and band-based approaches. In the direct approach, the ECG signal is segmented and eleven statistical features are extracted from each segment to form the feature vector. In the second approach, the ECG signal is decomposed into seven bands, where the feature vector is formed by concatenating the statistical features extracted from each band’s segment. Six segment lengths are examined: 1, 3, 5, 7, 10, and 15 sec. The data is split into training and testing datasets. The feature vectors of the former are used to train the classifier (random forest); during the identification stage, the trained classifier is tasked with identifying the subject using the testing data. The proposed method was evaluated using 290 reference subjects in the PTB database. Using the band-based feature extraction approach, the identification system achieved an accuracy rate of 99.61% utilizing a single limb lead (I), while a single chest lead (V1), augmented limb lead (aVF), and Frank’s lead (Vx) achieved accuracy rates of 99.37%, 99.76%, and 99.76%, respectively. It is known that variation in physical, mental, or emotional stimulation levels affects heart rate. Unfortunately, the ECG signals in the PTB dataset are recorded under the same conditions. Therefore, evaluating the proposed identification system under the effect of these stimulations will be the topic of our future work.
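The two feature extraction approaches summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the five features are stand-ins for the eleven used in the paper, the sampling rate is assumed, the signal is synthetic, and the seven "bands" are scaled copies of the signal standing in for a real band decomposition.

```python
import statistics


def segment(signal, fs, seconds):
    """Split a signal into nonoverlapping segments; drop any remainder."""
    n = int(fs * seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def stat_features(x):
    """Illustrative statistical features (stand-ins, not the paper's eleven)."""
    return [
        statistics.fmean(x),
        statistics.pstdev(x),
        min(x),
        max(x),
        statistics.median(x),
    ]


def band_feature_vector(band_segments):
    """Concatenate the features of one time segment taken from each band."""
    vector = []
    for seg in band_segments:
        vector.extend(stat_features(seg))
    return vector


fs = 1000  # assumed sampling rate in Hz
signal = [((i * 37) % 101) / 100.0 for i in range(15 * fs)]  # synthetic data

# Direct approach: features come straight from each signal segment.
direct_vector = stat_features(segment(signal, fs, 7)[0])

# Band-based approach: placeholder bands, each segmented the same way.
bands = [[v * (b + 1) for v in signal] for b in range(7)]
band_segments = [segment(band, fs, 7) for band in bands]
first_vector = band_feature_vector([segs[0] for segs in band_segments])

print(len(direct_vector), len(first_vector))
```

With five stand-in features, the band-based vector has 7 × 5 = 35 entries; with the paper's eleven features per band, the same concatenation gives the 77-dimensional vector. The resulting vectors would then be passed to a classifier such as a random forest.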

Data Availability

The data used to support the findings of this study are available on PhysioNet.org [29].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by King Saud University through the Researchers Supporting Project number RSP-2019/46.

References

[1] A. Fratini, M. Sansone, P. Bifulco, and M. Cesarelli, “Individual identification via electrocardiogram analysis,” BioMedical Engineering OnLine, vol. 14, no. 1, p. 78, 2015.

[2] A. K. Jain, A. Ross, and S. Prabhakar, “An introduction to biometric recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 4–20, 2004.

[3] M. Nawal and G. Purohit, “ECG based human authentication: a review,” International Journal of Emerging Engineering Research and Technology, vol. 2, no. 3, pp. 178–185, 2014.

[4] J. Pinto, J. Cardoso, A. Lourenço, and C. Carreiras, “Towards a continuous biometric system based on ECG signals acquired on the steering wheel,” Sensors, vol. 17, no. 10, p. 2228, 2017.

[5] M. Vaidya, “A study of biometrics technology methods and their applications-a review,” International Journal of Innovations in Engineering and Technology, vol. 5, no. 2, p. 235, 2015.

[6] L. Ballard, D. Lopresti, and F. Monrose, “Forgery quality and its implications for behavioral biometric security,” IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol. 37, no. 5, pp. 1107–1118, 2007.

[7] Y. N. Singh and S. K. Singh, “Vitality detection from biometrics: state-of-the-art,” in 2011 World Congress on Information and Communication Technologies, pp. 106–111, Mumbai, India, 2011.


[8] T. van der Putte and J. Keuning, “Biometrical fingerprint recognition: don’t get your fingers burned,” in Smart Card Research and Advanced Applications, vol. 52 of IFIP - The International Federation for Information Processing, pp. 289–303, Springer, Boston, MA, USA, 2000.

[9] J. L. Wayman, “Fundamentals of biometric authentication technologies,” International Journal of Image and Graphics, vol. 1, no. 1, pp. 93–113, 2001.

[10] A. R. M. Bolle, J. H. Connell, S. Pankanti, N. K. Ratha, and A. W. Senior, Guide to Biometrics, Springer-Verlag, New York, NY, USA, 2003.

[11] G. E. Forsen, M. R. Nelson, and R. J. Staron Jr., Personal Attributes Authentication Techniques; Technical Report, Pattern Analysis and Recognition Corporation, Rome Air Development Center, Rome, NY, USA, 1977.

[12] L. Biel, O. Pettersson, L. Philipson, and P. Wide, “ECG analysis: a new approach in human identification,” in IMTC/99. Proceedings of the 16th IEEE Instrumentation and Measurement Technology Conference (Cat. No. 99CH36309), pp. 557–561, Venice, Italy, 1999.

[13] L. Biel, O. Pettersson, L. Philipson, and P. Wide, “ECG analysis: a new approach in human identification,” IEEE Transactions on Instrumentation and Measurement, vol. 50, no. 3, pp. 808–812, 2001.

[14] S. R. M. Prasanna, S. K. Sahoo, and T. Choubisa, “Multimodal biometric person authentication: a review,” IETE Technical Review, vol. 29, no. 1, pp. 54–75, 2012.

[15] A. K. Jain, R. M. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked Society, Springer, 2005.

[16] F. Agrafioti, F. M. Bui, and D. Hatzinakos, “Secure telemedicine: biometrics for remote and continuous patient verification,” Journal of Computer Networks and Communications, vol. 2012, Article ID 924791, 11 pages, 2012.

[17] M. Li and S. Narayanan, “Robust ECG biometrics by fusing temporal and cepstral information,” in 2010 20th International Conference on Pattern Recognition, pp. 1326–1329, Istanbul, Turkey, 2010.

[18] G.-H. Choi, E.-S. Bak, and S.-B. Pan, “User identification system using 2D resized spectrogram features of ECG,” IEEE Access, vol. 7, pp. 34862–34873, 2019.

[19] S. S. Abdeldayem and T. Bourlai, “ECG-based human authentication using high-level spectro-temporal signal features,” in 2018 IEEE International Conference on Big Data (Big Data), pp. 4984–4993, Seattle, WA, USA, 2018.

[20] P. Hong, J. Hsiao, C. Chung, Y. Feng, and S. Wu, “ECG biometric recognition: template-free approaches based on deep learning,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2633–2636, Berlin, Germany, 2019.

[21] X. Zhang, Y. Zhang, L. Zhang, H. Wang, and J. Tang, “Ballistocardiogram based person identification and authentication using recurrent neural networks,” in 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–5, Beijing, China, 2018.

[22] Y. Chen and W. Chen, “Finger ECG based two-phase authentication using 1D convolutional neural networks,” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 336–339, Honolulu, HI, USA, 2018.

[23] Z. Zhao, Y. Zhang, Y. Deng, and X. Zhang, “ECG authentication system design incorporating a convolutional neural network and generalized S-transformation,” Computers in Biology and Medicine, vol. 102, pp. 168–179, 2018.

[24] I. Odinaka, P. H. Lai, A. D. Kaplan, J. A. O'Sullivan, E. J. Sirevaag, and J. W. Rohrbaugh, “ECG biometric recognition: a comparative analysis,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1812–1824, 2012.

[25] J. S. Paiva, D. Dias, and J. P. S. Cunha, “Beat-ID: towards a computationally low-cost single heartbeat biometric identity check system based on electrocardiogram wave morphology,” PLoS One, vol. 12, no. 7, article e0180942, 2017.

[26] I. Jekova, V. Krasteva, and R. Schmid, “Human identification by cross-correlation and pattern matching of personalized heartbeat: influence of ECG leads and reference database size,” Sensors, vol. 18, no. 2, p. 372, 2018.

[27] W. Lee, S. Kim, and D. Kim, “Individual biometric identification using multi-cycle electrocardiographic waveform patterns,” Sensors, vol. 18, no. 4, p. 1005, 2018.

[28] M. Merone, P. Soda, M. Sansone, and C. Sansone, “ECG databases for biometric systems: a systematic review,” Expert Systems with Applications, vol. 67, pp. 189–202, 2017.

[29] https://physionet.org/physiobank/database/.

[30] C. Ye, M. T. Coimbra, and B. V. K. V. Kumar, “Investigation of human identification using two-lead electrocardiogram (ECG) signals,” in 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–8, Washington, DC, USA, 2010.

[31] T. W. Shen, W. J. Tompkins, and Y. H. Hu, “Implementation of a one lead ECG human identification system on a normal population,” Journal of Engineering and Computer Innovations, vol. 2, no. 1, pp. 12–21, 2011.

[32] S. Poornachandra, “Wavelet-based denoising using subband dependent threshold for ECG signals,” Digital Signal Processing, vol. 18, no. 1, pp. 49–55, 2008.

[33] N. Belgacem, A. Nait-Ali, R. Fournier, and F. Bereksi-Reguig, “ECG based human authentication using wavelets and random forests,” International Journal on Cryptography and Information Security, vol. 2, no. 2, pp. 1–11, 2012.

[34] F. Porée, G. Kervio, and G. Carrault, “ECG biometric analysis in different physiological recording conditions,” Signal, Image and Video Processing, vol. 10, no. 2, pp. 267–276, 2016.

[35] B. Singh, P. Singh, and S. Budhiraja, “Various approaches to minimise noises in ECG signal: a survey,” in 2015 Fifth International Conference on Advanced Computing & Communication Technologies, pp. 131–137, Haryana, India, 2015.

[36] S. Z. Fatemian and D. Hatzinakos, “A new ECG feature extractor for biometric recognition,” in 2009 16th International Conference on Digital Signal Processing, pp. 1–6, Santorini-Hellas, Greece, 2009.

[37] S. A. Israel, W. T. Scruggs, W. J. Worek, and J. M. Irvine, “Fusing face and ECG for personal identification,” in 32nd Applied Imagery Pattern Recognition Workshop, 2003. Proceedings, pp. 226–231, Washington, DC, USA, 2003.

[38] M. Kyoso and A. Uchiyama, “Development of an ECG identification system,” in 2001 Conference Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 721–3733, Istanbul, Turkey, 2001.

[39] R. Palaniappan and S. M. Krishnan, “Identifying individuals using ECG beats,” in 2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04, pp. 569–572, Bangalore, India, 2004.


[40] R. Hoekema, G. J. H. Uijen, and A. van Oosterom, “Geometrical aspects of the interindividual variability of multilead ECG recordings,” IEEE Transactions on Biomedical Engineering, vol. 48, no. 5, pp. 551–559, 2001.

[41] S. A. Israel, J. M. Irvine, A. Cheng, M. D. Wiederhold, and B. K. Wiederhold, “ECG to identify individuals,” Pattern Recognition, vol. 38, no. 1, pp. 133–142, 2005.

[42] M. Kyoso, “A technique for avoiding false acceptance in ECG identification,” in IEEE EMBS Asian-Pacific Conference on Biomedical Engineering, pp. 190–191, Kyoto, Japan, 2003.

[43] A. Fratini, M. Sansone, P. Bifulco et al., “Individual identification using electrocardiogram morphology,” in 2013 IEEE International Symposium on Medical Measurements and Applications (MeMeA), pp. 107–110, Gatineau, QC, Canada, 2013.

[44] K. N. Plataniotis, D. Hatzinakos, and J. K. M. Lee, “ECG biometric recognition without fiducial detection,” in 2006 Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pp. 1–6, Baltimore, MD, USA, 2006.

[45] F. Agrafioti and D. Hatzinakos, “ECG based recognition using second order statistics,” in 6th Annual Communication Networks and Services Research Conference (cnsr 2008), pp. 82–87, Halifax, NS, Canada, 2008.

[46] S. C. Fang and H. L. Chan, “QRS detection-free electrocardiogram biometrics in the reconstructed phase space,” Pattern Recognition Letters, vol. 34, no. 5, pp. 595–602, 2013.

[47] S. Kouchaki, A. Dehghani, S. Omranian, and R. Boostani, “ECG-based personal identification using empirical mode decomposition and Hilbert transform,” in The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), pp. 569–573, Shiraz, Fars, Iran, 2012.

[48] J. L. C. Loong, K. S. Subari, R. Besar, and M. K. Abdullah, “A new approach to ECG biometric systems: a comparative study between LPC and WPD systems,” World Academy of Science, Engineering, and Technology International Journal of Biomedical and Biological Engineering, vol. 4, no. 8, pp. 430–445, 2010.

[49] Z. Zhao, L. Yang, D. Chen, and Y. Luo, “A human ECG identification system based on ensemble empirical mode decomposition,” Sensors, vol. 13, no. 5, pp. 6832–6864, 2013.

[50] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[51] T. K. Ho, “Random decision forests,” in Proceedings of 3rd International Conference on Document Analysis and Recognition, pp. 278–282, Montreal, QC, Canada, 1995.

[52] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832–844, 1998.

[53] Y. Amit and D. Geman, “Shape quantization and recognition with randomized trees,” Neural Computation, vol. 9, no. 7, pp. 1545–1588, 1997.

[54] M. N. Dar, M. U. Akram, A. Usman, and S. A. Khan, “ECG biometric identification for general population using multiresolution analysis of DWT based features,” in 2015 Second International Conference on Information Security and Cyber Forensics (InfoSec), pp. 5–10, Cape Town, South Africa, 2015.

[55] S. Ergin, A. K. Uysal, E. S. Gunal, S. Gunal, and M. B. Gulmezoglu, “ECG based biometric authentication using ensemble of features,” in 2014 9th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–6, Barcelona, Spain, 2014.

[56] Y. Wan and J. Yao, “A neural network to identify human subjects with electrocardiogram signals,” in Proceedings of the World Congress on Engineering and Computer Science 2008 (WCECS 2008), pp. 1–4, San Francisco, CA, USA, 2008.

[57] E. J. da Silva Luz, G. J. P. Moreira, L. S. Oliveira, W. R. Schwartz, and D. Menotti, “Learning deep off-the-person heart biometrics representations,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 5, pp. 1258–1270, 2018.

[58] R. Donida Labati, E. Muñoz, V. Piuri, R. Sassi, and F. Scotti, “Deep-ECG: convolutional neural networks for ECG biometric recognition,” Pattern Recognition Letters, vol. 126, pp. 78–85, 2019.

[59] L. V. D. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.

[60] J. Li, G. Deng, W. Wei, H. Wang, and Z. Ming, “Design of a real-time ECG filter for portable mobile medical systems,” IEEE Access, vol. 5, pp. 696–704, 2017.

[61] NIST/SEMATECH, “e-Handbook of statistical methods,” March 2016, https://www.itl.nist.gov/div898/handbook/.

[62] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[63] D. R. Cutler, T. C. Edwards Jr., K. H. Beard et al., “Random forests for classification in ecology,” Ecology, vol. 88, no. 11, pp. 2783–2792, 2007.

[64] D. Gao, Y.-X. Zhang, and Y.-H. Zhao, “Random forest algorithm for classification of multiwavelength data,” Research in Astronomy and Astrophysics, vol. 9, no. 2, pp. 14–39, 2009.

[65] W. Hu, “Identifying predictive markers of chemosensitivity of breast cancer with random forests,” Journal of Biomedical Science and Engineering, vol. 3, no. 1, pp. 59–64, 2010.

[66] A. R. Chowdhury, T. Chatterjee, and S. Banerjee, “A Random Forest classifier-based approach in the detection of abnormalities in the retina,” Medical & Biological Engineering & Computing, vol. 57, no. 1, pp. 193–203, 2019.

[67] R. Casanova, S. Saldana, E. Y. Chew, R. P. Danis, C. M. Greven, and W. T. Ambrosius, “Application of random forests methods to diabetic retinopathy classification analyses,” PLoS One, vol. 9, no. 6, article e98587, 2014.

[68] M. N. M. García, J. C. B. Herráez, M. S. Barba, and F. S. Hernández, “Random forest based ensemble classifiers for predicting healthcare-associated infections in intensive care units,” in Distributed Computing and Artificial Intelligence, 13th International Conference, S. Omatu, A. Semalat, G. Bocewicz, P. Sitek, I. E. Nielsen, J. A. García, and J. Bajo, Eds., vol. 474 of Advances in Intelligent Systems and Computing, Springer, Cham, Switzerland, 2016.

[69] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2nd edition, 2009.

[70] G. Wübbeler, M. Stavridis, D. Kreiseler, R. D. Bousseljot, and C. Elster, “Verification of humans using the electrocardiogram,” Pattern Recognition Letters, vol. 28, no. 10, pp. 1172–1175, 2007.

[71] F. Agrafioti and D. Hatzinakos, “ECG biometric analysis in cardiac irregularity conditions,” Signal, Image and Video Processing, vol. 3, no. 4, pp. 329–343, 2009.


[72] F. Agrafioti and D. Hatzinakos, “Fusion of ECG sources for human identification,” in 2008 3rd International Symposium on Communications, Control and Signal Processing, pp. 1542–1547, St Julians, Malta, 2008.

[73] Y. Wang, F. Agrafioti, D. Hatzinakos, and K. N. Plataniotis, “Analysis of human electrocardiogram for biometric recognition,” EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, Article ID 148658, 2007.

[74] S. I. Safie, J. J. Soraghan, and L. Petropoulakis, “Electrocardiogram (ECG) biometric authentication using pulse active ratio (PAR),” IEEE Transactions on Information Forensics and Security, vol. 6, no. 4, pp. 1315–1322, 2011.

[75] M. M. Tantawi, K. Revett, A. Salem, and M. F. Tolba, “Fiducial feature reduction analysis for electrocardiogram (ECG) based biometric recognition,” Journal of Intelligent Information Systems, vol. 40, no. 1, pp. 17–39, 2013.

[76] J. Wang, M. She, S. Nahavandi, and A. Kouzani, “Human identification from ECG signals via sparse representation of local segments,” IEEE Signal Processing Letters, vol. 20, no. 10, pp. 937–940, 2013.

[77] I. Jekova and G. Bortolan, “Personal verification/identification via analysis of the peripheral ECG leads: influence of the personal health status on the accuracy,” BioMed Research International, vol. 2015, Article ID 135676, 13 pages, 2015.

[78] S. Brás and A. J. Pinho, “ECG biometric identification: a compression based approach,” in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5838–5841, Milan, Italy, 2015.

[79] T. Waili, R. M. Nor, A. W. B. A. Rahman, K. A. Sidek, and A. A. Ibrahim, “Electrocardiogram identification: use a simple set of features in QRS complex to identify individuals,” in Recent Advances in Information and Communication Technology 2016, P. Meesad, S. Boonkrong, and H. Unger, Eds., pp. 139–148, Springer, Cham, Switzerland, 2016.

[80] X. Dong, W. Si, and W. Huang, “ECG-based identity recognition via deterministic learning,” Biotechnology & Biotechnological Equipment, vol. 32, no. 3, pp. 769–777, 2018.

[81] T. N. Alotaiby, S. A. Alshebeili, L. M. Aljafar, and W. M. Alsabhan, “ECG-based subject identification using common spatial pattern and SVM,” Journal of Sensors, vol. 2019, Article ID 8934905, 9 pages, 2019.
