ECG Biometrics - ULisboa
ECG Biometrics: A Dissimilarity Representation Approach
Francisco Armando David da Silva Marques
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisor: Ana Luísa Nobre Fred
Examination Committee
Chairperson: Professor João Fernando Cardoso Silva Sequeira
Supervisor: Ana Luísa Nobre Fred
Member of the Committee: Professor Luís Eduardo de Pinho Ducla Soares
October 2014
To my Mother and Father who wished for this day as much as I did.
Acknowledgments
My first thanks go to Instituto Superior Técnico, whose strict standards of excellence throughout these 5 years of coursework enabled me to develop this work. Even so, none of it would have been possible without Professora Ana Fred, who always found time to support me through my setbacks, despite her great workload always pressing her.
A big thank you to my colleagues José Guerreiro, Priscilla Gonçalves, André Lourenço and Carlos Carreiras at Instituto de Telecomunicações, who were always ready for a good lunch and to demonstrate their interest in my work.
I am also grateful to Joana Santos, whose dedication to her study of the ECG and the heart spread to me and the other members, especially as she easily pinpointed conditions present in ECGs that my colleagues and I were unable to see.
Diana Batista also spent a lot of her time aiding me with the beat validation algorithm for the MIT-BIH
arrhythmia database. For that, she has my huge gratitude.
Some very special thanks go to Carlos Carreiras, who never hesitated to brief me on the tiniest details of the existing software and gladly put up with my requests to set up experiments.
My gratitude to André Lourenço is enormous; his constant monitoring, advice and motivation are immensely responsible for the successful conclusion of this thesis. It feels as if this thesis was partly accomplished by him – thank you!
Lastly, my Mother seemed to possess an infinite source of motivation and not a single day passed
without my Father asking me how the thesis was progressing. To both of you goes my deepest gratitude!
Resumo
Electrocardiogram (ECG) biometrics are a relatively recent trend, with 13 years of existence in the scientific community. The present thesis studies in detail all the processes inherent to an ECG biometric system: pre-processing, segmentation, feature extraction and classification. In the pre-processing phase, two filters were developed for effective noise removal, and their effect on biometric classification was tested. For segmentation, three existing segmentation algorithms were adjusted to the working databases and tested on the MIT-BIH arrhythmia database; the most effective was selected for further use. Subsequently, this thesis introduced a novel representation space for ECGs, based on dissimilarity spaces. Such spaces are obtained by computing pairwise dissimilarities between templates and the system's input signals. This representation can additionally exploit signals from different ECG leads. The system proposed here was compared to an existing single-lead fiducial method, using ECGs from a 12-lead database. Both techniques are also contrasted with another existing non-fiducial method, over a single-lead database of ECGs measured at the fingers. Using an identical k-NN classifier, results for the 12-lead database favour a dissimilarity representation; notably, the use of more than one ECG lead offered no substantial advantage. However, for the noisier finger-acquired ECG signals, the fiducial technique presents better results.
Keywords: Electrocardiogram, ECG, Biometric System, Authentication, Identification, Fiducial, Non-fiducial, Heartbeat, R-peak, Segmentation, Autocorrelation, Linear Discriminant Analysis, Multi-lead, Dissimilarity Representation, Baseline Noise, Power-line Noise, EMG Noise.
Abstract
Electrocardiogram (ECG) biometrics are a relatively novel trend in the field of biometric recognition, comprising 13 years of development in peer-reviewed literature. This thesis studies all the processes inherent to an ECG biometric system: pre-processing, heartbeat segmentation, feature extraction and classification. For the pre-processing phase, two filters were developed for effective de-noising, their behaviour substantiated by illustrative figures; the effect of both filters was then tested on biometric classification. For segmentation, three state-of-the-art segmentation algorithms are adjusted to the working ECG databases and tested over the MIT-BIH arrhythmia database, and the most effective is picked for further use. This thesis subsequently introduces a novel ECG representation, based on a dissimilarity space formed by taking pairwise dissimilarities between templates and input subjects' signals. This representation can additionally exploit the potential extra information sourced from multiple ECG leads. Over a 12-lead ECG database, the dissimilarity-based method proposed here is compared to a single-lead state-of-the-art fiducial technique. Both methods are, moreover, contrasted with another published non-fiducial method over a single-lead, finger-acquired ECG database. Using the same k-NN classifier, results over the 12-lead database favour a dissimilarity-based representation; however, a multi-lead configuration did not prove advantageous. Contrastingly, over the noisier finger-acquired ECGs, the fiducial technique presented better results.
Keywords: Electrocardiogram, ECG, Biometric System, Authentication, Identification, Fiducial,
Non-fiducial, Heartbeat, R-peak, Segmentation, Autocorrelation, Linear-Discriminant Analysis, Multi-
lead, Dissimilarity Representation, Baseline Noise, Power-line Noise, EMG Noise.
Contents
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Resumo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
1 List of Acronyms 1
2 Introduction 2
2.1 Biometric ID systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Electrocardiogram Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 State-of-the-art 9
3.1 AC/LDA: A non-fiducial method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.1 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Simple Heartbeat Comparison: A fiducial method . . . . . . . . . . . . . . . . . . . . . . . 14
4 De-Noising 16
4.1 Method A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Method B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5 Segmentation 21
5.1 Method developed by Christov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.1.1 Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.1.2 Adaptive Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.2 Method developed by Hamilton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.3 Method developed by Manikandan & Soman . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.4 Segmentation Post-Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.5 Algorithm Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.6 Elimination of Segments without ECG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6 AC-LDA: Revisited 33
6.1 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2.1 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7 Dissimilarity Based Biometrics 36
7.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2 Dissimilarity Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Template Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.4 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.5 Dissimilarity Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.6 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
8 Results 44
8.1 Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.2 HSM Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8.3 CVP Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
9 Conclusions 51
9.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
References 53
List of Tables
5.1 Table comparing all algorithms’ QRS sensitivities and positive predictivities. Label ‘Extra’
refers to the extra step developed for the method in [33], explained in Section 5.3. . . . . 30
5.2 Table comparing all algorithms’ average execution times in seconds. . . . . . . . . . . . . 31
8.1 All experiments’ EER & EID rates over the HSM database. . . . . . . . . . . . . . . . . . . 46
8.2 All experiments’ EER & EID rates over the CVP database. Between-session results show
zero standard deviation, as only one run was executed. . . . . . . . . . . . . . . . . . . 49
List of Figures
2.1 A labelled prototypical ECG segment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Figure representing Einthoven’s Triangle and respective leads. The lead voltages are
taken from the plus to the minus signs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Diagram of an ECG biometric system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.1 Two valid windows from different subjects and those same subjects’ AC coefficients for all
valid windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Flowchart outlining the biometric system developed by Carreiras et al. [7]. This figure
was kindly provided by the authors of [7]. . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1 Illustration of various noise sources affecting ECG signals. . . . . . . . . . . . . . . . . . . 17
4.2 On the left: median filter applied to a baseline-corrupted ECG. On the right: corre-
sponding FFTs. Notice the presence of power-line noise in the FFTs. . . . . . . . . . . . 18
4.3 Figure illustrating the effect of the filter for EMG removal. . . . . . . . . . . . . . . . . . . . 19
4.4 Figure demonstrating the behaviour of filters in both Method A and B. . . . . . . . . . . . 19
5.1 Execution example of the algorithm developed by Christov. . . . . . . . . . . . . . . . . . 23
5.2 Transformation process applied to the ECG signal so as to perform beat detection. . . . . 24
5.3 Block diagram of the utilized R-peak detection technique. . . . . . . . . . . . . . . . . . . 26
5.4 Beat finding process of the Algorithm developed by Manikandan & Soman. . . . . . . . . 27
5.5 Execution example for the added step to the method developed by Manikandan & Soman. 28
5.6 Example of the allocation process. Traditional assignment to the highest peak would
not work in this case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.7 Flowchart outlining the developed heuristic proof-reading filter. If the conditions in
black are not met, the heartbeat is discarded as an outlier. . . . . . . . . . . . . . . . . . 31
5.8 Two execution examples of the designed proof-reading algorithm over ECG signals filtered
as in Method A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.1 Flowchart of the AC/LDA algorithm utilized in this study. . . . . . . . . . . . . . . . . . . . 33
6.2 Outlier Removal performed for the windows of 4 subjects. Green for accepted and red for
rejected windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.1 Block diagram of the proposed ECG biometric system. Note that lead C simply refers to
an arbitrary lead. Also, the suspension points underline the possibility of using additional
leads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Outlier Removal performed for segments belonging to 4 subjects. Green for accepted and
red for rejected windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.3 Enrolment for both approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.4 Block diagram illustrating the identification procedure for the Subject based dissimilarity
approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.5 Block diagram illustrating the identification procedure for the Inter-subject based dissimi-
larity approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.1 ROC curves outlining obtained results for authentication over all the used metric combi-
nations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.2 ROC curves outlining obtained results for authentication over both dissimilarity approaches. 47
8.3 ROC curves outlining obtained results for authentication over all methods for filter in
Method A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.4 Plots demonstrating intra-subject temporal variation. Each plot corresponds to one
subject and shows 20 heartbeats: 10 from T1 & Sitting and 10 from T2 & Sitting. . . . . 50
Chapter 1
List of Acronyms
ECG - Electrocardiogram, the electrical signal produced by the heart.
I, II, III - The three limb leads of an ECG recording.
EMG - Electromyogram, the electrical activity produced by skeletal muscles. In this work, it is a source of noise interference.
k-NN - k-Nearest Neighbour classifier. This classifier measures distances between input and template data, then uses the k smallest distances to classify the input data.
PT - Periodicity Transform method proposed by Sethares and Staley [39]. It is a technique for estimating
the period of quasi-periodic signals.
AC/LDA - Autocorrelation / Linear Discriminant Analysis algorithm developed by Agrafioti [1]. It generates non-overlapping ECG windows, computes their autocorrelation, and applies Linear Discriminant Analysis for dimensionality reduction; a k-NN classifier is then used over this feature set.
SHC - Simple Heartbeat Comparison algorithm designed by Carreiras et al. [7]. It segments the ECG signal into heartbeats and applies a k-NN classifier over the beats as feature arrays.
MIT-BIH - MIT’s ECG database, whose records are sourced from Beth Israel Hospital (now the Beth
Israel Deaconess Medical Center).
HSM - One of the databases employed by Instituto de Telecomunicações, sourced from Hospital de Santa Marta.
CVP - Another database employed by Instituto de Telecomunicações, sourced from Cruz Vermelha Portuguesa.
Chapter 2
Introduction
2.1 Biometric ID systems
Since their earliest existence, humans have used body characteristics such as the face, eye colour and voice to identify their peers. Indeed, animals have long employed their species' inherent physical features to recognize their partners and/or offspring and parents. Thousands of years later, automatic identity validation has become relevant to our everyday lives, in activities such as financial transactions, restricted access control and health care, among many others. It is extremely important to perform this automatic recognition accurately – so as to protect the goods being accessed – and safely – in order to avoid identity theft [5].
The most widely employed automatic identity recognition procedures rely on items such as PINs, passwords and ID cards. These methods are highly susceptible to the above-mentioned problem – identity theft. For instance, in 2012 approximately 16.6 million people, or 7% of all U.S. residents aged 16 or older, reported being victims of one or more incidents of identity theft [23]. Moreover, these methods' means of authentication are either token-based or knowledge-based, raising the issues of item loss and forgetfulness, which further contribute to the problem of identity theft [25].
It is on these grounds that biometric recognition gains relevance – a user needs to carry nothing other than himself, nor is he required to remember anything.
A biometric system is a pattern recognition system that operates by acquiring biometric data from
an individual, extracting a feature set from the acquired data, and comparing this feature set against the
template set in the database.
There exist several biometric modalities each with their own pros and cons suiting them to particular
applications. No specific biometric is expected to outperform the others over all the range of applications.
Examples of biometric techniques include the fingerprints, – an already widespread modality – the iris
or the voice [25].
Unfortunately, biometric techniques also present some flaws. For the same individual, no two biometric readings of the same modality are identical. For instance, in a fingerprint scanner, the way and position in which a finger is placed are never identical, leading to different minutiae being highlighted. As a result of this variability in readings, biometric systems are susceptible to false rejection, a problem which traditional ID systems do not present.
These systems are also sensitive to false matching – brute-force attacks (without ever being in possession of the correct user's biometric data) are capable of breaching biometric systems [25]. Credential falsification is yet another issue: an iris can be falsified with contact lenses, and a fingerprint might be forged with a gummy gel finger [24].
However, multimodal biometric systems can potentially solve or reduce these problems by combining more than one biometric feature [25]. Furthermore, the continued study and improvement of these methods will no doubt lessen these drawbacks, especially considering how recent these ID systems are compared to traditional methods.
2.2 Electrocardiogram Biometrics
Electrocardiogram (ECG) signals are a recent trend in biometrics, dating back to 2001 and the pioneering works of Biel et al. [4] and Kyoso and Uchiyama [27]. These and other studies assert and demonstrate that ECG signals carry very detailed information about heart activity, which happens to be highly personalized.
An ECG is a recording of the heart's electrical activity. It consists of three main components: the P wave, the QRS complex, and the T wave. The P wave occurs due to atrial depolarization, the QRS complex due to ventricular depolarization, and the T wave due to ventricular repolarization. See Figure 2.1.
Figure 2.1: A labelled prototypical ECG segment.
An ECG measurement is obtained via the correct placement of a number of electrodes on specific parts of the body and by tracing voltages between them – a lead. Specific linear combinations of electrode potentials result in different leads. For instance, lead I corresponds to the voltage between the electrodes on the right and left arms, as stated by Einthoven's Triangle, which defines the voltage relationships between the limb leads I, II and III [11]; it is shown in Figure 2.2. A typical chest ECG is taken via 10 electrodes and uses 12 leads. Nowadays, many other electrode and lead configurations exist, using one, two or three leads [35].
One major advantage these signals present over other biometric modalities is that they are extremely hard to replicate, though not impossible [45]. Other highlights are intrinsic liveness detection – an ECG is only present in living individuals – and the possibility of use in continuous authentication systems: depending on the hardware employed, readings can be obtained every few seconds, in contrast with the static images of fingerprints or irises.
Figure 2.2: Figure representing Einthoven’s Triangle
and respective leads. The lead voltages are taken
from the plus to the minus signs.
However, some notable challenges also arise with the employment of this technique. ECG signals are subject to high intra-personal variability: different measurements taken from the same individual over different time periods might be distinct. These differences can be caused by alterations in the body's physiology and psychology over time, on one hand, and by the emergence of cardiac conditions which alter the ECG morphology, on the other [6].
Furthermore, ECG signals are prone to several sources of noise, whether caused by the recording equipment (power-line noise or loss of electrode contact) or by other signals emanating from the subject's body (muscle contractions and baseline drift caused by respiration) [21].
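To make one of these de-noising problems concrete, baseline drift can be suppressed by estimating the slowly varying baseline and subtracting it from the recording. The sketch below does so with cascaded median filters, in the spirit of the median filtering shown in Figure 4.2; the 200 ms and 600 ms window lengths are common rules of thumb and are assumptions here, not the parameters of this thesis. Power-line noise would additionally call for a notch filter.

```python
import numpy as np

def median_filter(x, k):
    """1-D median filter with edge padding (k must be odd)."""
    pad = k // 2
    xp = np.pad(x, pad, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(xp, k)
    return np.median(windows, axis=1)

def remove_baseline(ecg, fs):
    """Estimate baseline wander with cascaded median filters and subtract it.

    The 200 ms and 600 ms window lengths are illustrative assumptions,
    not the exact parameters used in this thesis.
    """
    w1 = int(0.2 * fs) | 1  # ~200 ms window, forced to an odd length
    w2 = int(0.6 * fs) | 1  # ~600 ms window, forced to an odd length
    baseline = median_filter(median_filter(ecg, w1), w2)
    return ecg - baseline
```

The short window removes QRS-scale detail from the baseline estimate, while the long window smooths over whole beats, so subtracting the estimate leaves the heartbeat morphology intact.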
One last issue with ECG biometrics is the signal acquisition itself. Given that each heartbeat forms approximately 1 s after the previous one, obtaining heartbeats for feature extraction is a considerably more time-consuming task than in other biometric modalities. It is therefore of great importance for ECG biometric algorithms to minimize the number of heartbeats required for safe identification and thus reduce acquisition and processing times.
The continuing studies in the field of Electrocardiogram Biometrics will doubtlessly reduce the set-
backs behind this technique and thus reinforce its ability and applicability.
Figure 2.3: Diagram of an ECG biometric system.
An ECG biometric system works just like any other biometric system or supervised learning algorithm (see Figure 2.3). Its functioning can be divided into two modes of operation – enrolment and classification.
During enrolment, the system is given a user's ID and extracts his/her information – the training data. From this data, the classifier generates the templates which best represent the user; these build the system's database. More enrolled subjects imply greater difficulty for identification, as the number of possible choices increases (and with it the probability of another individual having similar data). This effect can be countered by storing more templates per subject.
The second mode, classification, extracts a current user’s data and compares it to the registered
training data. It then outputs the best match, i.e. the correct user in the case of a biometric system.
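A minimal sketch of the enrolment mode just described, assuming each user's training data has already been reduced to fixed-length feature vectors; the template-selection rule used here (keep the vectors closest to the subject's mean) is purely illustrative, not the procedure this thesis adopts:

```python
import numpy as np

def enrol(database, subject_id, feature_vectors, n_templates=5):
    """Store up to n_templates representative feature vectors for a subject.

    The 'templates which best represent the user' are picked naively here:
    the vectors closest to the subject's mean vector (an illustrative rule).
    """
    X = np.asarray(feature_vectors, dtype=float)
    dists = np.linalg.norm(X - X.mean(axis=0), axis=1)
    database[subject_id] = X[np.argsort(dists)[:n_templates]]
    return database
```

Storing more templates per subject (a larger n_templates) corresponds to the countermeasure mentioned above for larger enrolled populations.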
Classification can be divided into two procedures, authentication and identification.
Authentication
The system is provided an ID and will confirm or deny that a given registered user is who he/she
claims to be. For that purpose it records the user’s current data and compares it solely to that
same user’s stored templates. Authentication is typically used for positive recognition, where the
aim is to prevent multiple people from using the same identity.
Identification
In this case, the classifier is not given any ID by the user. As a result, it must search its entire
database for the subject whose templates best match the input data (or fail if the subject is not
registered in the system database). Identification is a critical component of negative recognition,
where the system establishes whether the user is who he/she denies being. Identification may also
be used in positive recognition for convenience, as the subject does not need to claim an identity.
This mode of operation is more susceptible to errors, as it demands comparison against all
subjects in the database (authentication only verifies one subject). Moreover, non-registered users
might attempt to access the system; two approaches to identification can be taken in these cases.
Closed-world approach
In this case, the biometric system only considers the existence of previously enrolled subjects.
As a result, any user seeking to access the system, whether registered or not, will be provided
a match.
Open-world approach
Here, the biometric system assumes the existence of more subjects than those currently
enrolled. Whenever a subject's identity is searched for, the score of the closest match is
compared to a pre-set threshold. If the score does not surpass this threshold, identification
fails and the user is considered not enrolled. The current thesis adopts this approach.
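The open-world decision rule above amounts to thresholding the best-match score. A minimal sketch, assuming the system database maps subject IDs to arrays of stored template vectors and that dissimilarity is plain Euclidean distance (both illustrative assumptions):

```python
import numpy as np

def identify_open_world(database, probe, threshold):
    """Return the best-matching subject ID, or None when even the closest
    match is farther than the pre-set threshold (user considered not enrolled).
    """
    best_id, best_dist = None, np.inf
    for subject_id, templates in database.items():
        # Score a subject by the probe's distance to its nearest template.
        d = np.min(np.linalg.norm(templates - probe, axis=1))
        if d < best_dist:
            best_id, best_dist = subject_id, d
    return best_id if best_dist <= threshold else None
```

Dropping the threshold test recovers the closed-world behaviour, where every probe is matched to some enrolled subject.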
2.3 Objectives
The main goal of this thesis is to study, and to develop or employ, efficient mechanisms for all the intrinsic steps of an ECG biometric system. These are described in the list below.
De-noising
As mentioned, an ECG is highly prone to noise from several sources [21]. The first step an ECG
biometric system must execute is the effective removal of the noise introduced into the ECG record-
ing.
Segmentation and/or Transformation
The typical procedures that convert an ECG signal into several templates are Segmentation and Windowing. Segmentation consists of finding the heartbeats present in an ECG recording and separating them from one another. Windowing simply cuts the ECG into various pieces according to certain rules (different from those for segmentation). In either case, the segment or the window then represents the basic ECG unit. Both can be followed by one of many possible transformations into another domain. Whatever the choice, it will clearly determine the type and quality of the extracted features and, with it, the effectiveness of the biometric system.
Feature Extraction
Infinite sets of features can be devised once an ECG unit is at hand. Many possibilities and combinations exist, and choosing a set which maximizes the ratio of information to feature-set complexity is of great importance.
Outlier Removal
Some feature vectors extracted from a subject's ECG might be more correlated with one another than others. This is especially true for noisier ECG signals, where some vectors will be too corrupted to appropriately represent their source subject. It is vital for an ID system to keep one individual's templates as similar to each other, and as different from other subjects' templates, as possible. As such, the most dissimilar feature arrays – the outliers – must be detected and removed.
Classifier
This is the step which evaluates the extracted features and decides to whom they belong. A good classifier must appropriately compare features within and between subjects, relating the former and separating the latter as much as possible.
Multi-lead Approach
As mentioned, an ECG recording might include signals from more than one lead, up to twelve. Some authors have employed more than one lead in the hope of perfecting their biometric systems. A multi-lead approach is investigated in this work, along with its effect on performance.
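Two of the steps listed above, unit forming and outlier removal, can be sketched compactly as follows. R-peak detection itself is left to the segmentation algorithms of Chapter 5, and the window length, segment margins and correlation threshold are illustrative assumptions rather than this thesis's parameters:

```python
import numpy as np

def window(ecg, fs, win_s=5.0):
    """Windowing: cut the ECG into non-overlapping windows of win_s seconds."""
    n = int(win_s * fs)
    return ecg[:(len(ecg) // n) * n].reshape(-1, n)

def segment(ecg, r_peaks, fs, before=0.2, after=0.4):
    """Segmentation: extract one heartbeat around each detected R-peak index."""
    a, b = int(before * fs), int(after * fs)
    return np.array([ecg[r - a:r + b] for r in r_peaks
                     if r - a >= 0 and r + b <= len(ecg)])

def remove_outliers(beats, min_corr=0.9):
    """Outlier removal: discard beats poorly correlated with the mean beat."""
    beats = np.asarray(beats, dtype=float)
    mean_beat = beats.mean(axis=0)
    keep = [b for b in beats if np.corrcoef(b, mean_beat)[0, 1] >= min_corr]
    return np.array(keep)
```

A typical pipeline would apply segment (or window) and then remove_outliers, producing the cleaned per-subject unit set over which features are extracted.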
2.4 Thesis Contributions
This thesis begins by combining already established filtering techniques [7, 14] and proposing two different filters for ECG de-noising.
Afterwards, minor adaptations are carried out on published segmentation algorithms [9, 22, 33], in the hope of improving this biometric step on more noise-prone databases, such as the ones employed here. Additionally, a heuristic proof-reading mechanism is designed to detect and discard contact-loss noise.
Several feature spaces can be constructed from ECG signals, the vast majority based on ECG heartbeats [35]. This thesis adopts a dissimilarity space for ECG biometrics for the first time. Such a space is constructed by taking pairwise comparisons between feature arrays taken from ECG beats. Furthermore, this method can easily encompass signals from more than one lead. Two different dissimilarity spaces have been designed, and these are in turn tested in single- and multi-lead configurations. These methods are contrasted with an existing single-lead fiducial system [7] over a 12-lead ECG database. An article outlining this newly developed technique was submitted to the BIOSIGNALS 2015 conference and awaits approval.
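The core of the proposed representation can be sketched in a few lines: each input beat is mapped to the vector of its dissimilarities to a fixed set of template beats, and the per-lead vectors can simply be concatenated in a multi-lead configuration. Euclidean distance on raw beats is used here purely for illustration; the actual dissimilarity measures and template-generation procedure are those described in Chapter 7:

```python
import numpy as np

def dissimilarity_representation(beats, templates):
    """Map each input beat to its vector of pairwise dissimilarities
    (here, Euclidean distances) to a fixed set of template beats.

    beats:     (n_beats, beat_len) array for one lead
    templates: (n_templates, beat_len) array for the same lead
    returns:   (n_beats, n_templates) dissimilarity-space representation
    """
    return np.linalg.norm(beats[:, None, :] - templates[None, :, :], axis=2)

def multi_lead_representation(beats_per_lead, templates_per_lead):
    """Concatenate per-lead dissimilarity vectors into one feature vector."""
    return np.hstack([dissimilarity_representation(b, t)
                      for b, t in zip(beats_per_lead, templates_per_lead)])
```

Note that the dimensionality of the resulting feature space is set by the number of templates (per lead), not by the beat length.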
Lastly, there is a trend towards simpler ECG acquisition systems with improved usability. This thesis therefore also tests the above-mentioned techniques, as well as a non-fiducial technique [1, 20], on a database of single-lead ECGs acquired from the fingers.
2.5 Thesis Outline
The current thesis follows the same structure as a functioning ECG biometric system. The exception to this rule is Chapter 3, which surveys the state of the art of this field and outlines two previously developed techniques – the methods of Agrafioti [1] and Carreiras et al. [7] – which are used as references here.
Chapter 4 studies the types of noise affecting ECG signals and proposes two techniques to remove
them. Results of both de-noising procedures are also shown there. The clean ECG is then segmented
into its composing heartbeats – Chapter 5 addresses three existing methods for this process, analyses
their performance with respect to a reference database, the MIT-BIH arrhythmia database, and chooses
the most adequate for this work.
From this point on, the thesis fuses feature extraction and classification into a single chapter for each biometric system in use. Thus, Chapter 6 describes the alterations applied to the methodology proposed by Agrafioti [1] and by Agrafioti and Hatzinakos [20]. The biometric system developed for this thesis is outlined in Chapter 7.
The results obtained for the developed biometric systems are analysed in Chapter 8. Lastly, conclusions are drawn in Chapter 9, along with an outline of possible future work.
Chapter 3
State-of-the-art
As referred to in the Introduction, ECG biometric systems were first proposed in 2001 by Biel et al. [4] and Kyoso and Uchiyama [27]. These works stated that the signal from a single ECG lead contains enough information to support a biometric system. In line with this assertion, authors in this field generally develop biometric systems which use the information from only one lead [35]. Exceptions include the works of [48, 2, 49, 18], which utilized two, three or twelve channels. The algorithms in [2, 49, 18] were developed so as to compare single-lead and multi-lead versions; in all these works, it was concluded that fusing features from various leads boosts the performance of the described technique. Agrafioti and Hatzinakos [2] also study lead-by-lead fusion at the decision level. Influenced by these studies, the present thesis proposes a method able to utilize signals from more than one ECG channel – here, comparisons between single-, triple- and twelve-lead configurations are carried out.
Regardless of the number of leads ECG biometric systems use, they are generally divided into three types, determined by the type of feature extraction employed, i.e. which parts of the ECG signal are used for the identification procedure.
Fiducial
These methods use characteristic points of an ECG heartbeat and/or relationships between them
as features – the fiducial features. Characteristic points include the peak of the R-wave, while a
relationship between points can be the temporal duration of a QRS impulse. Several combinations
of the four types [44] of fiducial features have been used in the literature – temporal, amplitude,
angle and dynamic (R-R intervals) [35].
Non-Fiducial
Techniques based on non-fiducial features do not use characteristic points as features. Despite that, most of them rely on some characteristic points for heartbeat segmentation [8, 19], while others simply create windows from the ECG recording [37, 2, 28]. Afterwards, these segments/windows are transformed into another domain so as to extract features from the resulting signals. Heartbeat characteristic points might also be used to aid in outlier removal; however, they are never used as features or during classification.
Hybrid
Algorithms in this group resort to both fiducial and non-fiducial features. Some use a combination of the two as features [47, 42]. Others design two classifiers: the first uses non-fiducial features to reduce the match set, and the second, fed with the fiducial features, outputs the classification [41, 40].
Often, fiducial techniques are simpler, more interpretable and generally effective. Nonetheless, they are highly sensitive to intra-personal variability and to the varying heart rate [12]. The impact of this problem is accentuated by beat variation over time, i.e. a subject's beat might be substantially different after some months, especially if the individual alters his/her lifestyle [12].
For that reason, non-fiducial methods have gained relevance thanks to their greater independence from these factors, in spite of their extra complexity. Notwithstanding, few non-fiducial algorithms completely discard R-peak detection, as most still require heartbeat segmentation and alignment like the fiducial techniques [35]. Only a few go as far as discarding these steps altogether: they simply perform their respective transformation over ECG windows, which might [28] or might not [47, 2, 37, 1] overlap.
One such method is the one described by Agrafioti [1]. This technique segments the ECG record into non-overlapping windows and calculates their auto-correlation. Afterwards, Linear Discriminant Analysis reduces the dimension of the feature space and the windows are classified by a k-NN classifier using the Euclidean distance. The current thesis developed a version of this algorithm for testing and comparison with two other methods; Section 3.1 describes this technique in detail. It is also worth noting that Agrafioti and Hatzinakos [3] earlier published a similar iteration of this algorithm with a form of template matching, which is not tested here.
ECG biometric systems can also be categorized according to the classifier they utilize. The list in this field is extremely varied, ranging from k-Nearest Neighbours and Nearest Centre classifiers, through Support Vector Machines and Artificial Neural Networks, to mention just a few. The diversity of classifiers in this field is further explored in [35].
Lastly, the measurement of ECG signals has evolved considerably from the standard 12-lead chest recording. Nowadays there are single-channel recording systems, ranging from the placement of two electrodes on the individual's thumbs [43, 8] or palms [41] to the positioning of electrodes bilaterally across the lower ribcage [34].
All the methods tested in this study will be compared with the technique developed by Carreiras et al.
[7] whose methodology is described in Section 3.2.
3.1 AC/LDA: A non-fiducial method
This section describes the methodology behind the Auto-Correlation/Linear Discriminant Analysis (AC/LDA) biometric technique, proposed by Agrafioti [1] and Agrafioti and Hatzinakos [20]. It is a non-fiducial biometric system that aims to circumvent dependency on the heart rate. It follows the steps described next.
3.1.1 Windowing
This algorithm is one of the few that does not use individual heartbeats for the identification procedure (as mentioned in Chapter 3). Thus, it requires neither segmentation nor R-peak detection. Instead, it employs non-overlapping windows, whose only prerequisite is the presence of more than one heartbeat in the ECG signal to be analysed.
In the work of Agrafioti [1], non-overlapping windows of 5 s have been used. This duration is considered short enough for practical acquisition while ensuring the presence of multiple heartbeats – improving the quality of the features.
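As a rough illustration, the windowing step described above can be sketched as follows in Python (the function name and interface are illustrative assumptions, not part of the original work):

```python
import numpy as np

def non_overlapping_windows(ecg, fs, win_s=5.0):
    """Split an ECG record into non-overlapping windows of win_s seconds.

    Trailing samples that do not fill a complete window are discarded.
    Illustrative sketch; not the original implementation.
    """
    win_len = int(win_s * fs)
    n_win = len(ecg) // win_len
    return np.asarray(ecg[:n_win * win_len]).reshape(n_win, win_len)

# Example: a 12 s record at 1000 Hz yields two full 5 s windows.
sig = np.random.randn(12_000)
windows = non_overlapping_windows(sig, fs=1000)
```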
Outlier Detection
The work of Agrafioti and Hatzinakos [20] estimates a quality measure for a given window. To that end, it utilizes the periodicity transform (PT) developed in [39].
This mechanism takes into consideration the quasi-periodic properties of the ECG waveform and attempts to estimate the best period for the input windows. The windows are then projected onto this period, and a quality factor is calculated from the original and projected windows.
The higher the quality estimate, the greater the confidence in the estimated window. Through this rationale, windows with low quality are discarded.
3.1.2 Feature Extraction
The process of feature extraction consists of two steps: auto-correlation over a window, and reduction of the feature-space dimension via Linear Discriminant Analysis.
Auto-Correlation
The reason to use the auto-correlation (AC) as a feature extractor is that it captures the repetitive properties of the ECG signal, such that only significant components contribute to the waveform – mainly the QRS complex, but also the T and P waves.
Its main advantage is that it does not require fiducial point detection, since some ECGs can be relatively noisy and some of their fiducial points might be corrupted. With this said, the AC is computed as:
R_{xx}[m] = \sum_{i=0}^{N-|m|-1} x[i]\, x[i+m]   (3.1)

where x[i] is a valid ECG window for i = 0, 1, \ldots, (N - |m| - 1), and x[i+m] is the time-shifted version of the windowed ECG with a time lag of m = 0, 1, \ldots, (M - 1), with M \ll N.
Agrafioti and Hatzinakos [20] utilize M = 0.1 \times sampling rate. This lag range corresponds approximately to the duration of a QRS pulse, where most of a segment's information is contained. It is also the QRS pulse that suffers the least intra-subject variability.
Lastly, the AC signal varies considerably with amplitude variations in the ECG waves. As a result, it is necessary to normalize the AC coefficients with respect to the zero-lag coefficient, i.e. R_{xx}[0].
The AC procedure then results in an array of M coefficients, starting at the normalized R_{xx}[1] and ending at R_{xx}[M]. Note that the normalized R_{xx}[0] is always 1 and is thus of no use in the subsequent biometric procedures. See Figure 3.1 for an example.
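The AC computation and normalization just described can be sketched as follows; the function name is an assumption of this illustration:

```python
import numpy as np

def normalized_ac(window, fs):
    """Normalized auto-correlation coefficients Rxx[1..M] of one ECG
    window (Eq. 3.1), with M = 0.1 * fs and normalization by the
    zero-lag coefficient Rxx[0]. Illustrative sketch only.
    """
    x = np.asarray(window, dtype=float)
    M = int(0.1 * fs)
    N = len(x)
    # Rxx[m] = sum over i of x[i] * x[i+m], for m = 0..M
    rxx = np.array([np.dot(x[:N - m], x[m:]) for m in range(M + 1)])
    rxx = rxx / rxx[0]      # normalize by Rxx[0]
    return rxx[1:]          # drop Rxx[0], which is always 1

# Example: AC features of a noisy quasi-periodic window at 1000 Hz.
fs = 1000
t = np.arange(5 * fs) / fs
window = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(len(t))
features = normalized_ac(window, fs)
```

By the Cauchy-Schwarz inequality, the normalized coefficients are bounded by 1 in absolute value.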
Figure 3.1: Two valid windows from different subjects, and those same subjects' AC coefficients for all valid windows.
The resulting AC array is further divided into n (with n set to 4 as in [1]) equal segments, and Linear Discriminant Analysis is performed on each of them independently. During biometric matching, different weights are applied to each of these segments, with those closer to the main AC peak weighted higher; the weights decrease linearly towards the segment furthest from the main AC peak.
The reason for this division is that the closer a segment is to the main AC peak, the greater the contribution of the QRS complex, which exhibits the least variability under varying heart rates. In other words, the intra-subject variability increases when moving away from the main AC peak.
Linear-Discriminant Analysis
An AC array can be used directly for classification. However, it is important to further reduce the intra-subject variability in order to control false rejection.
As such, Linear Discriminant Analysis (LDA) is employed for dimensionality reduction. Supervised learning is performed in a transformed domain so that the AC array's dimensionality is reduced and the classes become better separable. The remaining discussion is based on the following definitions:
• Let U be the number of classes, i.e. the number of subjects registered in the system.
• Let U_i be the number of AC windows for a subject (class) i, where i = 1, \ldots, U.
• Denote by z_{ij} an AC window j of subject i, where i = 1, \ldots, U and j = 1, \ldots, U_i.
• Let Z_i be the set of AC windows for a subject (class) i, defined as Z_i = \{z_{ij}\}_{j=1}^{U_i}.
• Let Z be a training set consisting of all AC windows of all subjects, i.e. Z = \{Z_i\}_{i=1}^{U}.
Then (U−1)-dimensional feature vectors may be obtained upon LDA execution. Note that when U > M/n, LDA is not capable of dimension reduction and is thus not performed; in these cases the resulting AC arrays are used directly in biometric matching.
The aim of LDA is to estimate a set of (U − 1) feature basis vectors \{\psi_m\}_{m=1}^{U-1} by maximizing Fisher's ratio. This is equivalent to solving the following eigenvalue problem:

\psi = \arg\max_{\psi} \frac{|\psi^T S_b \psi|}{|\psi^T S_w \psi|}   (3.2)

where \psi = [\psi_1, \ldots, \psi_{U-1}], and S_b and S_w are the inter-class and intra-class scatter matrices respectively, computed as follows:
S_b = \sum_{i=1}^{U} U_i (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})^T   (3.3)

S_w = \sum_{i=1}^{U} \mathrm{Cov}(Z_i)   (3.4)

where \bar{z}_i = \frac{1}{U_i} \sum_{j=1}^{U_i} z_{ij} is the mean of class Z_i, \bar{z} = \frac{1}{N} \sum_{i=1}^{U} \sum_{j=1}^{U_i} z_{ij} is the mean window over all classes, N = \sum_{i=1}^{U} U_i is the total number of windows over all subjects, and \mathrm{Cov}(\cdot) is the covariance matrix of a given input class.
The maximization of Fisher's ratio is equivalent to forcing a large separation between projected ECG windows of different subjects and a small variance between windows of the same subject. LDA finds \psi as the (U − 1) most significant eigenvectors of S_w^{-1} S_b, i.e. those corresponding to the (U − 1) largest eigenvalues.
When enrolling a subject, the execution ends here and the LDA-transformed vectors are stored. In the case of authentication or identification, the subject's windows are transformed into the LDA space and matching is performed.
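The scatter-matrix construction and eigenproblem of Eqs. 3.2-3.4 can be sketched as follows. The small ridge term added for invertibility of S_w is an assumption of this sketch, not part of the original method:

```python
import numpy as np

def lda_basis(Z):
    """Estimate the (U-1) LDA basis vectors from a list Z of per-class
    window matrices, each of shape (Ui, dim). Illustrative sketch.
    """
    U = len(Z)
    dim = Z[0].shape[1]
    z_bar = np.vstack(Z).mean(axis=0)          # grand mean window
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for Zi in Z:
        diff = (Zi.mean(axis=0) - z_bar)[:, None]
        Sb += Zi.shape[0] * (diff @ diff.T)    # between-class scatter (Eq. 3.3)
        Sw += np.cov(Zi, rowvar=False)         # within-class scatter (Eq. 3.4)
    Sw += 1e-6 * np.eye(dim)                   # ridge so Sw is invertible
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:U - 1]].real        # top (U-1) eigenvectors

# Enrollment would store psi.T @ z for each AC window z;
# identification projects probe windows the same way.
```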
3.1.3 Classification
Both identification and authentication modes follow the same matching principles. Input and template AC windows are compared using the weighted sum of the Euclidean distances over the n intervals, and the final decision is made by voting among the k-Nearest Neighbours (k-NN), with k set to 3.
The normalized Euclidean distance is computed as follows:

d_i(y_1, y_2) = \frac{1}{K} \sqrt{(y_1 - y_2)^T (y_1 - y_2)}   (3.5)

where y_1 and y_2 are two different feature sets and K is the number of features in y_1 and y_2. The subscript i in d_i identifies the interval being compared (of the n existing ones).
The full distance d between a pair of windows is then given by:

d = \sum_{i=1}^{n} \frac{d_i\, w_i}{n}   (3.6)
where w_i is the weight attributed to interval i. As previously explained, these weights decrease linearly the further interval i's samples are from the main AC peak.
From all the calculated distances, the 3 shortest are taken. When working in authentication mode, these distances are compared with a threshold t_{auth}:

d_k \leq t_{auth}   (3.7)

where d_k is one of the k resulting distances. If more than 2 of these satisfy the inequality in 3.7, the input user is confirmed. A closed-world identification approach was taken for this work; consequently, in identification mode, the system simply outputs the most voted user, or fails if no consensus is reached.
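The matching rules above can be sketched as follows. The concrete weight values and all function names are illustrative assumptions, not those of [1]:

```python
import numpy as np
from collections import Counter

def interval_distance(y1, y2):
    # Normalized Euclidean distance of Eq. 3.5.
    K = len(y1)
    return np.sqrt(np.dot(y1 - y2, y1 - y2)) / K

def full_distance(a, b, weights):
    # Weighted sum over the n AC intervals (Eq. 3.6);
    # a and b are lists of per-interval feature vectors.
    n = len(weights)
    return sum(interval_distance(a[i], b[i]) * weights[i] for i in range(n)) / n

def identify(probe, templates, k=3):
    """Closed-world identification by majority vote of the k nearest
    templates; 'templates' is a list of (subject_id, features) pairs.
    Illustrative sketch only.
    """
    n = len(probe)
    weights = np.linspace(1.0, 0.25, n)   # linearly decreasing (assumed values)
    dists = sorted((full_distance(probe, t, weights), sid)
                   for sid, t in templates)
    votes = Counter(sid for _, sid in dists[:k])
    sid, count = votes.most_common(1)[0]
    return sid if count >= 2 else None    # fail without consensus
```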
3.2 Simple Heartbeat Comparison: A fiducial method
This biometric system is an algorithm developed for single-lead ECG biometrics. It will be taken as a reference when comparing all the other mentioned algorithms, on all the testing databases. This technique takes ECG heartbeats as its basic elements, using them as templates. Figure 3.2 outlines its execution process.
Figure 3.2: Flowchart outlining the biometric system developed by Carreiras et al. [7]. This figure was kindly provided by the authors of [7].
Initially, the raw ECG signals from lead I are submitted to a data pre-processing block, which performs a digital filtering step (band-pass FIR filter of order 150, with cut-off frequencies [5, 20] Hz).
R-peak detection is performed over these filtered signals as in [22]. The outputs of this block are segmented into individual heartbeats with a duration of 600 ms, with the R-peak situated at 200 ms. Varying heart rates did not cause overlap between heartbeats, and this time span was considered sufficient to contain the QRS waveform as well as the T-wave – the most important parts of the heartbeat as far as biometric information is concerned.
Some individual heartbeats deviate from a given subject's typical beat. In biometric terms it is important to focus on the most representative beats for each user. As a result, an outlier detection block performing detection and removal of anomalous ECG heartbeats is employed. It follows the DMEAN approach described in [31], which computes the distance of every template in an ECG record to the mean template for that record; templates are considered outliers if the computed distance is higher than an adaptive threshold.
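The general idea can be sketched as follows; the particular threshold rule used here (the median distance scaled by a constant) is an assumption of this sketch and not the exact adaptive rule of [31]:

```python
import numpy as np

def dmean_outlier_removal(beats, alpha=1.5):
    """Discard heartbeat templates whose distance to the record's
    mean template exceeds a threshold. Illustrative sketch of the
    DMEAN idea; 'alpha' and the threshold rule are assumptions.
    """
    beats = np.asarray(beats, dtype=float)
    mean_t = beats.mean(axis=0)                    # mean template
    d = np.linalg.norm(beats - mean_t, axis=1)     # distance to the mean
    thr = alpha * np.median(d)                     # simple adaptive threshold
    return beats[d <= thr]
```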
The pattern extraction block takes the input heartbeats and computes their features. This method
considers the features to be all the amplitudes within a segmented heartbeat.
Finally, a k-NN classifier (with k set to 3) takes the extracted heartbeats of a given input subject and compares them with the stored templates by measuring distances. These distances are computed via the cosine distance metric, producing a decision on the recognition of the individual (either in authentication or identification).
Altogether, this biometric system is fairly simple, being computationally light and opening the possibility of integrating it into embedded systems, which have limited processing power.
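As an illustration of the segmentation convention described above (600 ms beats with the R-peak at 200 ms), a minimal sketch with an assumed interface could be:

```python
import numpy as np

def segment_heartbeats(ecg, rpeaks, fs):
    """Cut 600 ms heartbeat templates around each detected R-peak,
    with the R-peak placed at 200 ms. Beats too close to the record
    edges are skipped. Illustrative sketch only.
    """
    before = int(0.200 * fs)   # 200 ms before the R-peak
    after = int(0.400 * fs)    # 400 ms after the R-peak
    beats = [ecg[r - before:r + after]
             for r in rpeaks
             if r - before >= 0 and r + after <= len(ecg)]
    return np.asarray(beats)
```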
Chapter 4
De-Noising
De-noising comprises the first phase of an ECG biometric system. Although it represents a small percentage of the work carried out by such a system, it is of great importance, as it improves the quality of the input signals; this, in turn, yields more and better information and facilitates the classification procedure. During measurement, the ECG signal can be corrupted by various sources, of which the main ones are outlined below [21] and illustrated in Figure 4.1.
Power Line Interference – The recording process might pick up power-line influence in the 50 Hz region (60 Hz in the USA) and its harmonics. This noise component can be modelled as a sinusoid or a combination of sinusoids and can reach up to 50% of the peak-to-peak ECG amplitude.
Loss of Electrode Contact – This signal corruption is caused by loss of contact between an electrode
and the skin which disconnects the measurement system from the subject. Since the ECG signal
is usually capacitively coupled to the system, this noise source generally causes large artefacts
which might reach the maximum recording amplitude.
Motion Artefacts – These are transient baseline changes caused by vibrations and/or movement of the subject. These movements cause electrode motion, which alters the electrode-skin impedance. As this impedance changes, the ECG amplifier sees a different source impedance, which forms a voltage divider with the amplifier input impedance and thus modifies the measured signal. Their effect lies mostly in the low-frequency band.
Electromyographic (EMG) Noise – This type of noise originates from muscle contractions, which generate artefactual potentials at the millivolt or microvolt level. It is usually negligible, though it can sometimes reach 10% of the peak-to-peak ECG amplitude. EMG signals can be assumed to be transient bursts of zero-mean, band-limited Gaussian noise covering a large frequency spectrum (0–10 000 Hz). For the scope of this work, it will be considered white noise.
Baseline Drift with Respiration – Respiration induced body motion can cause baseline drift to an ECG
signal. This drift can be represented as a sinusoidal component at the frequency of respiration
(0.15-0.3 Hz) added to the ECG.
(a) ECG recording corrupted by power-line noise, visible as the small-amplitude sinusoid present in the signal.
(b) ECG suffering from contact loss at the end of the signal. It lacks the ECG beats and shows high-amplitude saturations.
(c) ECG affected by a motion artefact, the transitory signal visible around the second heartbeat.
(d) ECG corrupted by EMG noise, a form of Gaussian white noise added to the record.
(e) ECG affected by baseline drift induced by respiration, the low-frequency oscillation visible in the ECG record.
Figure 4.1: Illustration of various noise sources affecting ECG signals.
Other noise sources, such as electro-surgical noise or noise caused by instrumentation devices, rarely appear, and when they do, the whole ECG measurement is irreversibly corrupted [21].
As such, this chapter studies the effects of the above-mentioned noise sources and employs methods that effectively remove or reduce them. Electrode contact loss cannot be corrected via signal processing – even if the visual effects were removed, no ECG would be present – and will be addressed in Chapter 5, Section 5.6. As a result, the focus shall be directed to the other four.
Several methods exist for dealing with baseline wander [38, 26]. The most widely used comprise
Spline Curve Fitting, Median Filtering, Adaptive Filtering, FIR or IIR High-pass Filtering and Wavelet
Filtering.
The present pre-processing technique employs a form of median filtering to remove baseline wander, following the work in [14]. It adds virtually no distortion to the ECG signal and is also extremely computationally efficient.
Figure 4.2: On the left: median filter application on a baseline-corrupted ECG. On the right: corresponding FFTs. Notice the presence of power-line noise in the FFTs.
It consists of the application of two median filters (of 200 ms and 600 ms respectively) to the raw ECG signal. The first removes the effect of the QRS waveforms, while the second deletes the T-waves from the signal. Without these waveforms, the resulting signal represents an estimate of the baseline of the raw ECG, which is subtracted from it. See Figure 4.2 for an example of its application, on the same ECG as in Figure 4.1(e). Figure 4.2 also shows the FFTs of the original and post-processed ECGs, where one can observe the reduction of low-frequency content and the near absence of distortion in the other frequency bands.
Other methods, with the exception of wavelet filtering, either introduce distortions or discontinuities into the ECG signal (spline curve fitting and adaptive filtering) or are computationally expensive or unsuitable for real-time applications (high-pass filtering, for example, demands a high order to be effective). Future work should consider and test the employment of a wavelet filter as in [36].
Now that the ECG has been cleaned of baseline wander, it is necessary to focus on the other noise forms – power-line noise, EMG noise and motion artefacts. The first is very easy to remove, but EMG corruption and motion artefacts demand a more complex filtering procedure. For that reason, two techniques addressing all three have been developed.
Both methods employ a FIR filter in their design. The filter's low cut-off frequency largely determines the effect of motion artefacts on the output signal: a higher low cut-off means a cleaner signal with respect to noise, but also that more ECG information is lost in the filtering process. To rid the ECG of power-line noise, the high cut-off frequency need only be below 50 Hz; taking into account the white-noise behaviour of EMG noise, this value can be reduced for greater effect.
Note that both filtering techniques will be tested on the hereby developed biometric algorithms. The
designed filters are as follows.
Figure 4.3: Illustration of the effect of the filter for EMG removal.
4.1 Method A
Method A employs a FIR band-pass filter of order 300 for a sampling rate of 1000 Hz and a bandwidth of 2–40 Hz. Other sampling rates will be used for other ECG databases, with the filter order changing accordingly. Despite the presence of ECG information below 2 Hz, the lower cut-off frequency could not be reduced, as baseline drift and motion artefacts would compromise the alignment between heartbeats – an alignment that is important for biometric classification with the employed algorithms.
After FIR filtering the ECG, the effect of EMG noise is still clearly visible and often heavily distorts the signal. As such, a simple moving-average filter proposed in [9] is applied to the signal. It acts over 28 ms intervals, with a first zero at about 35 Hz. This filter is highly efficient at removing EMG noise but causes some unwanted distortion, such as R-peak height reduction, as a side effect (see Figure 4.3).
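The Method A chain can be sketched as follows; the filter parameters follow the text, while zero-phase filtering via `filtfilt` and the function name are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import firwin, filtfilt

def method_a_filter(ecg, fs=1000):
    """Method A as described: order-300 FIR band-pass, 2-40 Hz,
    followed by a 28 ms moving-average filter to suppress residual
    EMG noise. Illustrative sketch only.
    """
    taps = firwin(301, [2.0, 40.0], pass_zero=False, fs=fs)  # order 300 -> 301 taps
    band = filtfilt(taps, [1.0], ecg)                        # zero-phase band-pass
    ma_len = int(0.028 * fs)                                 # 28 ms window
    return np.convolve(band, np.ones(ma_len) / ma_len, mode='same')
```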
The aim of this method is to preserve more ECG detail so as to better distinguish between subjects. Refer to Figure 4.4 for a direct comparison of Methods A and B.
Figure 4.4: Behaviour of the filters in Methods A and B.
4.2 Method B
The FIR band-pass filter in Method B has the same order as the filter in Method A, but its bandwidth is 5–20 Hz. Such a restricted bandwidth inevitably loses some valuable ECG information; however, this choice stems from its relatively good ability to reduce both EMG and motion artefact influence.
Here, it is hoped that the clarity (noise-wise) of the post-processed ECG helps to counter the loss of information as far as subject identification is concerned. See Figure 4.4 for the contrast between applications of Methods A and B.
Chapter 5
Segmentation
Segmentation is the process in which a heartbeat is detected and isolated from the remainder of the ECG signal. It is employed in the vast majority of ECG biometric systems [35], whether fiducial or not.
Several algorithms exist to perform ECG segmentation, their main aim being to detect the ECG signal's R-peaks (see Figure 2.1). The current study considers three of them: the algorithms developed by Christov [9], Hamilton [22], and Manikandan and Soman [33]. These techniques were deliberately not described in Chapter 3, for the sake of a clearer understanding of the segmentation analysis carried out here. They are thus outlined in Sections 5.1, 5.2 and 5.3 respectively. Section 5.4 details a post-processing phase designed to fit all these techniques – the effective segmentation procedure once the heartbeats are at hand. All the analysed methods are compared in terms of performance in Section 5.5. Lastly, Section 5.6 explains a heartbeat-dependent heuristic filter whose purpose is to eliminate contact-loss noise.
5.1 Method developed by Christov
The algorithm in [9] consists of a transformation phase (see 5.1.1) – where the input ECG is transformed
so as to emphasize signal peaks – and an R-peak detection phase (see 5.1.2) – conditioned by an
adaptive threshold. Both steps are presented in further detail below.
5.1.1 Transformation
• Generate a multi-lead signal with L leads according to

Y(i) = \frac{1}{L} \sum_{j=1}^{L} |X_j(i+1) - X_j(i-1)|   (5.1)

where X_j(i) is the amplitude of sample i in lead j, and Y(i) is the current complex lead. The present study employs single-lead segmentation over lead I, and therefore equation 5.1 reduces to a differentiation:

Y(i) = |X(i+1) - X(i-1)|   (5.2)

• Moving averaging of the differentiated signal Y over 40 ms intervals – a filter with its first zero at about 25 Hz. It suppresses the noise magnified by the differentiation procedure used in the synthesis of Y.
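The single-lead transformation above can be sketched as follows (the function name is an assumption):

```python
import numpy as np

def christov_transform(x, fs=1000):
    """Single-lead version of Christov's transformation: absolute
    two-sample difference (Eq. 5.2) followed by a 40 ms moving
    average. Illustrative sketch only.
    """
    x = np.asarray(x, dtype=float)
    y = np.abs(x[2:] - x[:-2])            # |X(i+1) - X(i-1)|
    n = int(0.040 * fs)                   # 40 ms window, first zero near 25 Hz
    return np.convolve(y, np.ones(n) / n, mode='same')
```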
5.1.2 Adaptive Threshold
The above-generated signal is then iterated sample by sample and its values are compared to an adaptive threshold, MFR. This variable consists of three parts: a steep-slope threshold M, an integrating threshold for high-frequency signal components F, and a beat expectation threshold R, with MFR = M + F + R. All three components are updated sample by sample according to the rules below. An R-peak is detected if Y(i) \geq MFR.
Adaptive steep-slope threshold – M
• The initialization in [9] was modified so as to include two modules – a standard one, identical to that in [9], and one fitted for hand-recorded signals. ECGs recorded from finger electrodes are extremely susceptible to contact-loss artefacts (see Chapter 8, Section 8.1 for information on the employed databases). These artefacts are at times difficult to remove automatically and, if present at the start of the signal, completely ruin the work of this algorithm. With this said, M starts as M = M_{th} \times Y_{max} for the standard method and M = M_{th} \times Y_{med}^{buffer} for the hand-fitted method, where Y_{max} is signal Y's maximum peak value in its first 5 s interval and Y_{med}^{buffer} is the median of a buffer of second-by-second peaks over that same interval. M_{th} is a constant whose value was fine-tuned to 0.58 from the original work's 0.6. A buffer of 5 steep-slope threshold values is pre-set as

MM = [M_1\; M_2\; M_3\; M_4\; M_5]

where M_i = M for i = 1, 2, \ldots, 5.
• No new detection is allowed in the 200 ms after the current one. In the interval [0, 200] ms after the previous QRS detection, a new value M_{new} is calculated, every sample, as

M_{new} = M_{th} \times \max Y(0:200)

where 0:200 represents the above-mentioned interval. The estimated M_{new} value can become quite high if a steep-slope premature ventricular contraction or an artefact appears; it is therefore limited to M_{new} = 1.1 M_k if M_{new} > 1.5 M_k, where k indexes the oldest component of MM. The MM buffer is updated by discarding the former M_k and setting M_k = M_{new}. M is calculated as the average value of MM. Note that experiments show this mechanism is still not enough to prevent corruption of the algorithm by artefacts.
• M is linearly decreased in an interval of 200 to 1200 ms following the last QRS detection, reaching
60% of its last updated value at 1200 ms;
• After 1200 ms M remains unchanged.
Adaptive integrating threshold – F
The integrating threshold F is intended to raise the combined threshold when electromyographic noise accompanies the ECG, thus protecting the algorithm against erroneous beat detection.
• F is initially given by F = \frac{1}{350} \sum_{i=0}^{349} Y(i).
• Every sample refreshes F according to

F = F + \frac{1}{F_K} (\max Y_{late} - \max Y_{early})   (5.3)

where Y_{late} = Y(0:50) ms within the last 350 ms interval and Y_{early} = Y(300:350) ms of that same interval. F_K is a constant whose value, in accordance with [9], equals 150.
Figure 5.1: Execution example of the algorithm developed by Christov.
Adaptive beat expectation threshold – R
The beat expectation threshold R is intended to deal with heartbeats of normal amplitude followed by a beat of very small amplitude, as can be observed, for example, in cases of electrode artefacts. Contrary to the integrating threshold, which protects against erroneous QRS detection, R finds odd but true R-peaks, reinforcing the detection mechanism.
A buffer with the durations (in samples) of the 5 last RR intervals is updated at every new QRS detection. R_m is the mean value of that buffer.
• R = 0 V in the interval from the last detected QRS to QRS + \frac{2}{3} R_m;
• In the interval QRS + \frac{2}{3} R_m to QRS + R_m, R decreases 1.4 times more slowly than M decreases in the 200–1200 ms interval;
• After QRS + R_m, the decrease of R is stopped.
See Figure 5.1 for an execution example highlighting the evolution of the three adaptive thresholds
over the transformed signal, as well as the end result over the original ECG.
5.2 Method developed by Hamilton
In this technique, beat detection is also carried out over a transformed signal, which is obtained accord-
ing to the diagram in Figure 5.2. Additionally, its basic beat detection rules (represented by the last block
in Figure 5.2) are enumerated below.
Figure 5.2: Transformation process applied to the ECG signal so as to perform beat detection.
1. Ignore all peaks that precede or follow larger peaks by less than 200 ms.
2. If the peak occurred within 360 ms of a previous detection and had a maximum slope less than 0.7 times the maximum slope of the previous detection, assume it is a T-wave.
3. If the peak is larger than the detection threshold DT, call it a QRS complex; otherwise call it noise.
4. If an interval equal to 1.5 times the average R-to-R interval has elapsed since the most recent detection, check for a peak larger than DT/2 within that interval. If that peak followed the preceding detection by at least 360 ms, classify it as a QRS complex.
The detection threshold DT mentioned in rules 3 and 4 relies on the following data structures:
QRS-peak buffer
Stores the 8 most recent R-peak values; its entries are used in the calculation of DT. It is initialized with the highest-valued peaks found in one-second intervals over the first 8 seconds.
Noise-peak buffer
Behaves like the previous structure but stores the 8 most recent noise-peak values instead. It is, however, initialized at 0.
RR-interval buffer
Stores the 8 most recent intervals between R-peaks. Initialized at 1 s.
With this said:

DT = \text{NoisePeakBuff}_{med} + TH \times (\text{QRSPeakBuff}_{med} - \text{NoisePeakBuff}_{med})   (5.4)

where \text{NoisePeakBuff}_{med} and \text{QRSPeakBuff}_{med} are the medians of the noise-peak and QRS-peak buffers respectively. TH = 0.45 was empirically found to be the most suitable value for the database used in testing.
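Equation 5.4 translates directly into code (the function name is an assumption of this sketch):

```python
import numpy as np

def detection_threshold(qrs_buf, noise_buf, th=0.45):
    """Detection threshold of Eq. 5.4: the noise level plus a fraction
    TH of the gap between the QRS and noise peak medians. The buffers
    hold the 8 most recent peak values. Illustrative sketch only.
    """
    qrs_med = np.median(qrs_buf)
    noise_med = np.median(noise_buf)
    return noise_med + th * (qrs_med - noise_med)
```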
5.3 Method developed by Manikandan & Soman
Just like the previous techniques, beat detection is carried out over a transformed signal, obtained according to the diagram in Figure 5.3, taken from [33].
Stage 1 starts by band-pass filtering the signal in both directions, in order to reduce the effects of noise and of the P and T waves while enhancing the QRS pulse. A 4th-order Chebyshev filter with a 6–18 Hz passband was utilized. Note that the present work uses a type II filter rather than the cited work's type I, as the former yielded better results on the databases employed for testing. Afterwards, the signal is differentiated as in 5.5, so as to provide information about the slope of the signal and further reduce P and T wave influence:

d[n] = f[n+1] - f[n]   (5.5)

The differentiated signal is then normalized with respect to the maximum derivative found in the signal, as in 5.6:

d[n] = \frac{d[n]}{\max |d[n]|}   (5.6)
In Stage 2, the signal in 5.6 is transformed according to 5.7 to obtain its Shannon energy envelope:

s[n] = -d^2[n] \log (d^2[n])   (5.7)

Its output is bidirectionally smoothed with a rectangular window of 150 ms, the maximum expected length of a QRS pulse. The Shannon energy envelope is a very effective peak detection mechanism: it shows small deviations between successive peaks and produces sharp local maxima which indicate the instants of R-peaks.

Figure 5.3: Block diagram of the utilized R-peak detection technique.
Nonetheless, the signal is further processed in Stage 3 by Hilbert-transforming it as in 5.8.

x̂(t) = IFT[X̂(f)],   where   X̂(f) = jX(f) for f < 0,  and  X̂(f) = −jX(f) for f > 0    (5.8)
where IFT refers to the Inverse Fourier Transform and X(f) is the Fourier Transform of a real signal
x(t). The resulting output must also be rid of low-frequency drift: a moving average (MA) is computed and subtracted from the signal obtained from 5.8. The MA filter length is 2.5 s, which was empirically found to be suitable. This final signal is called Yecg.
With Yecg, this technique proceeds to detect heart beats by finding the positive zero-crossings over it.
These points can then be directly mapped to found beats over the original ECG waveform. See Figure
5.4 for the whole execution process and beat finding procedure.
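Stage 3 and the beat-finding step can be sketched as follows; using scipy.signal.hilbert (whose analytic signal carries the Hilbert transform in its imaginary part) and a centred moving average are implementation choices of this sketch:

```python
import numpy as np
from scipy.signal import hilbert

def detect_beats(envelope, fs=500, ma_len=2.5):
    """Hilbert-transform the smoothed Shannon envelope (Eq. 5.8),
    subtract a 2.5 s moving average to remove drift, and return the
    positive zero-crossings of the resulting Yecg."""
    # Imaginary part of the analytic signal = Hilbert transform.
    h = np.imag(hilbert(envelope))
    # Centred moving average removes the low-frequency drift.
    n = int(ma_len * fs)
    yecg = h - np.convolve(h, np.ones(n) / n, mode='same')
    # Positive zero-crossings: sign changes from negative to positive.
    return np.where((yecg[:-1] < 0) & (yecg[1:] >= 0))[0] + 1
```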
One additional step has been developed for the current study's version of this technique. Some ECG signals employed during testing were relatively corrupted by EMG noise, such that some parts of the signal resembled small QRS waveforms when seen in isolation (in comparison to the real ECG they were clearly noise). This technique often detected them erroneously – although the slope of the zero-crossing corresponding to such a detection was noticeably smaller than the others. As a result, a proof-reading step was developed and is outlined below. An example of its use is shown in Figure 5.5.
Figure 5.4: Beat finding process of the Algorithm developed by Manikandan & Soman.
1. A buffer A with 5 entries was created so as to store the slopes of the detected zero-crossings. Its
entries are initialized at zero and updated as valid zero-crossings are detected.
2. If a detected zero-crossing has a slope smaller than threshold × Amed, then it is discarded. Note that Amed represents the median of the buffer A. The threshold was fine-tuned to 0.243.
3. In theory this procedure can destroy the effectiveness of the algorithm, e.g. if the first detected beats represent artefacts or very large QRS waveforms. In practice this did not occur. Nonetheless, as a safety mechanism, this step can only discard 3 crossings in a row – should that happen, the buffer A is reset.
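The steps above can be sketched as follows (a hypothetical helper, not the thesis code):

```python
import numpy as np
from collections import deque

def proofread(crossings, slopes, threshold=0.243, max_discards=3):
    """Discard zero-crossings whose slope falls below
    threshold x median(buffer A); buffer A holds the 5 most recent
    validated slopes, starts at zero (so the first crossings are
    always accepted) and is reset after 3 discards in a row."""
    buf = deque([0.0] * 5, maxlen=5)
    kept, run = [], 0
    for c, s in zip(crossings, slopes):
        if s < threshold * np.median(buf):
            run += 1
            if run >= max_discards:          # safety mechanism
                buf = deque([0.0] * 5, maxlen=5)
                run = 0
            continue
        kept.append(c)
        buf.append(s)
        run = 0
    return kept
```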
5.4 Segmentation Post-Processing
The peaks detected on the transformed signals appropriately represent the found heartbeats. However, they do not map directly to the R-peaks in the ECG signal. A post-processing phase is necessary in order to perform adequate R-peak detection.
Generally, these found peaks are simply mapped to the nearest highest-amplitude peak. In some instances, this peak will not correspond to the R-peak (as evidenced in Figure 5.6). As such, a different method was utilized.
The points found are here mapped to the exact R-peak location by searching within ±150 ms and choosing the highest-sloped of the two largest peaks found in that interval. This process is executed only for lead I – the beats are then matched on any other leads in use by once again looking for the highest-sloped of the two largest peaks, but this time within a smaller interval of ±60 ms.
Figure 5.5: Execution example for the added step to the method developed by Manikandan & Soman.
Note that for some subjects the R-peak will appear as an inverted peak, i.e. with negative sign. For that reason, the above-mentioned process is executed twice, the first pass looking for positive peaks and the second for negative ones. Afterwards, two medians over the resulting peaks are calculated – MedU and MedD. MedU represents the median value of all R-peaks when they are assumed positive; MedD is the median value of all R-peaks when they are assumed negative. If |MedU| ≥ |MedD|, detected peaks are assigned to positive peaks; otherwise negative peaks are used.
Note, however, that some records possess irregular heartbeats whose R-peak sign is opposite to that of the regular beats. The current R-peak mapping strategy only considers allocating all found beats either to positive or to negative peaks. This strategy was adopted because in some ECGs the R-peak and S-peak share similar amplitudes, which causes confusion during mapping. Furthermore, focusing on regular beats facilitates the biometric procedure.
Figure 5.6: Example of the allocation process. Traditional methods of assignment to the highest peak would not work for this case.
Having the location of the R-peaks in hand for all leads, the segments are constructed simply by
taking the ECG window from −200 ms to 400 ms, where 0 corresponds to the R-peak. As a result, all
segments have a fixed length of 600 ms. See Figure 5.6 for an example of the allocation process here
described.
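Segment construction then reduces to fixed-offset slicing (a minimal sketch; skipping edge peaks without a full window is an assumption of this sketch):

```python
import numpy as np

def extract_segments(ecg, r_peaks, fs=500):
    """600 ms windows from -200 ms to +400 ms around each R-peak."""
    pre, post = int(0.200 * fs), int(0.400 * fs)
    return np.array([ecg[r - pre:r + post] for r in r_peaks
                     if r - pre >= 0 and r + post <= len(ecg)])
```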
5.5 Algorithm Analysis
The described algorithms were tested on the MIT-BIH Arrhythmia database. It is the same database
employed by the authors of the analysed methods. Also, its heartbeats are appropriately labelled and
there already exists software (WFDB software package) which analyses found beats in relation to these
labels. This database will not be further used for the testing of hereby employed biometric systems due
to the presence of several pathological ECG records.
The MIT-BIH Arrhythmia database contains 48 half-hour, two-lead ECG recordings sampled at
360 Hz with 11-bit resolution over a 10 mV range. The ECG records from this database include signals
with acceptable quality, sharp and tall P and T waves, negative QRS complex, small QRS complex, wider
QRS complex, muscle noise, baseline drift, sudden changes in QRS amplitudes, sudden changes in
QRS morphology, multiform premature ventricular contractions, long pauses and irregular heart rhythms.
This analysis is performed over lead-II signals and as such discards two records for which lead II was
not available, records 102 and 104. All the algorithms were implemented on a 2.4 GHz Intel core 2 Quad
CPU using Python version 2.7.6.
The analysis of the segmentation algorithms takes into account two factors. Consider a true-positive
(TP) as a correct R-peak detection by the algorithm, a false-negative (FN) when an R-peak is missed
and a false-positive (FP) when a noise spike is detected as an R-peak. To evaluate R-peak detection
per se, this study resorts to the QRS Sensitivity (SQRS) and QRS Positive Predictivity (P+QRS) factors.
They are respectively defined by Equations 5.9 and 5.10.
SQRS = TP / (TP + FN)    (5.9)

P+QRS = TP / (TP + FP)    (5.10)
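Both factors are straightforward to compute; a minimal sketch:

```python
def qrs_metrics(tp, fn, fp):
    """QRS sensitivity (Eq. 5.9) and positive predictivity
    (Eq. 5.10), returned as percentages."""
    return 100.0 * tp / (tp + fn), 100.0 * tp / (tp + fp)
```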
The detected R-peaks were automatically compared, with respect to these factors, against the original MIT-BIH annotated beats by employing specially designed software. Detected peaks deviating at most 30 ms from the annotated beats are considered valid. Also, the validation process starts only after the first 3 s of the ECG signal. This thesis implemented its own versions of the above-described segmentation algorithms; the results are shown in Table 5.1. Note that these algorithms were evaluated for signals pre-processed by the filters from both Method A and Method B (see Chapter 4).
The results in Table 5.1 are slightly worse (∼ 1.4 %) than those obtained by the authors of these algorithms. Potential explanations for this are presented below.
                           Method in      Method in      Method in Manikandan
                           Christov [9]   Hamilton [22]  and Soman [33]
                                                         Extra      No Extra
Filter from   SQRS  [%]    98.35          98.35          98.48      98.82
Method A      P+QRS [%]    99.95          99.93          99.95      99.83
Filter from   SQRS  [%]    98.02          98.34          98.48      98.82
Method B      P+QRS [%]    99.96          99.93          99.95      99.83

Table 5.1: Table comparing all algorithms' QRS sensitivities and positive predictivities. Label 'Extra' refers to the extra step developed for the method in [33], explained in Section 5.3.
Algorithm in Christov [9]
Factors of SQRS = 99.69 % and P+QRS = 99.66 % were reported in [9]. This algorithm is based on thresholds, which were slightly changed for this report so as to better adjust to other databases. As a result, performance on the MIT-BIH arrhythmia database might be degraded. Also, the pre-processed input signal in [9] is different: it is obtained from ECG signals originating from both available leads, whereas here a single-lead study focused on lead II was performed.
Algorithm in Hamilton [22]
Factors of SQRS = 99.74 % and P+QRS = 99.81 % were obtained in [22]. This algorithm is also based on thresholds, and its tuning to the other databases employed here might have degraded its excellent behaviour on the MIT-BIH arrhythmia database. The conditions under which the results in [22] were obtained from this database are not explicit, but it is assumed all records were considered for testing. Furthermore, the pre-processed input signal employed in this thesis is also different from the one in [22], again due to better adjustments over different databases.
Algorithm in Manikandan and Soman [33]
Factors of SQRS = 99.93 % and P+QRS = 99.86 % were reported in [33]. A different initial filter, a type II Chebyshev filter, was utilized in the current study, due to better performance over other databases. Also, the study in [33] discards beats originating from ventricular flutter in record 207.
Despite the better results reported in the original studies, the ones obtained here are still considered relatively good, especially in view of their use in an ECG biometric system. Such a system favours the detection of more representative heartbeats while ignoring the outliers. Note that the MIT-BIH arrhythmia database contains several irregular heartbeats originating from ill patients, whose biometric use is generally avoided. This effect is especially evident in the lower SQRS of the method in [33] when the extra step developed here is employed: odd but valid beats are discarded, and these beats are not valuable for biometrics. Moreover, note the improved P+QRS.
After analysing Table 5.1, it is evident that these algorithms, as implemented here, are very close in terms of performance. Performing segmentation over signals filtered as in Method A proved slightly more advantageous, especially so for the algorithm in Christov [9]. Its effect over the technique in Manikandan and Soman [33] is null, due to an additional pre-processing phase demanded by this algorithm which is, in terms of frequency band, more restrictive than both filtering methods.
Consider now the computational demand of these algorithms. Average Execution Times over all
records were measured for all the algorithms and presented in Table 5.2.
                              Method in      Method in      Method in Manikandan
                              Christov [9]   Hamilton [22]  and Soman [33]
Average Execution Time [s]    20.56          9.33           186.98

Table 5.2: Table comparing all algorithms' average execution times in seconds.
The decision over which algorithm best suits an ECG biometric system was based on the algorithms' average execution times, as shown in Table 5.2. There are distinct differences between the algorithms in this respect, with the one in Hamilton [22] being the fastest and Manikandan and Soman [33] the slowest. Note also that the method in Hamilton [22] appears to be the easiest to implement on a semi-real-time basis, as it focuses on time-spans of up to 360 ms. Contrastingly, the algorithm from Christov [9] requires larger time-spans, and the method in Manikandan and Soman [33] presents steps whose adaptation into real-time would present a challenge (namely the computation of the Shannon energy envelope as well as the required median filter).
With all these factors in mind, the algorithm in Hamilton [22] was deemed the most suitable for the
biometric systems developed for this study and is, from now on, employed in segmentation processes
executed throughout this report.
5.6 Elimination of Segments without ECG
When ECG signals are measured via finger electrodes, the user is not immobilized and not necessarily in contact with the electrodes (refer to Chapter 8, Section 8.1 for information on the employed databases).
As a result, records of this type are highly subject to contact loss noise. Parts of the signal contaminated
by this noise type possess no ECG at all, thus it is important to remove them. Contact loss noise results
in very high amplitude peaks, often reaching the maximum possible recording value – see Chapter 4.
Figure 5.7: Flowchart outlining the developed heuristic proof-reading filter. In case the conditions in black are not verified, the heartbeat is discarded as an outlier.
Standard methods of outlier detection will not effectively remove these segments: not only do some of them resemble high-amplitude ECG heartbeats after filtering, but also, if present in considerable numbers, they will no longer represent outliers.
Note that the segmentation process detects these artefacts. For this reason, a heuristic proof-reading filter based on the detected heartbeats was implemented. Additionally, this filter was designed to remove some of the offshoot segments as well, further facilitating outlier removal. Lastly, note that its employment is restricted to finger-recorded ECGs. The heuristic filter follows the listed rules – see also the flowchart in Figure 5.7.
• The algorithm looks at the R-peak's right and left neighbourhoods independently. If more than 1 peak is found on either side whose amplitude rises above a threshold (tuned to 0.45× the R-peak), then the segment is discarded.
• The filter will then search for the second largest peak within the heartbeat. Second peaks above
a threshold (tuned to 0.7× the R-peak) and before the R-peak cause the disposal of the analysed
segment. On the other hand, if the peak appears after the R-peak it has to be distinguished from
high amplitude T-waves. As such, disposal only occurs when second peaks have a maximum
derivative larger than a threshold (set to 0.85× the R-peak’s maximum derivative).
• Lastly, a buffer BR storing the 5 most recently validated R-peaks is used in order to detect signal
artefacts. Segments whose R-peak has an amplitude larger than a threshold (fixed at 2.2×BRmed)
are disposed of.
When a segment "passes" these tests it is considered valid (see Figure 5.8 for 2 execution examples). This technique is not perfect, as some relatively noisy heartbeats are not caught by it. Nonetheless, it efficiently removes contact-loss noise.
Figure 5.8: Two execution examples of the designed proof-reading algorithm over ECG signals filtered as in Method A.
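The three rules can be sketched as below; the simple local-maximum peak finder and the 5-sample neighbourhood used for slope estimation are assumptions of this sketch:

```python
import numpy as np

def _peaks(x, height=0.0):
    """Indices of strict local maxima of x exceeding `height` (assumed helper)."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > x[i - 1] and x[i] > x[i + 1] and x[i] > height]

def valid_segment(seg, r_idx, recent_r,
                  side_th=0.45, second_th=0.7, slope_th=0.85, art_th=2.2):
    """Apply the three heuristic rules; `recent_r` is the buffer BR of
    the 5 most recently validated R-peak amplitudes."""
    seg = np.asarray(seg, dtype=float)
    r_amp = seg[r_idx]
    # Rule 1: more than one large peak on either side of the R-peak.
    for side in (seg[:r_idx], seg[r_idx + 1:]):
        if len(_peaks(side, side_th * r_amp)) > 1:
            return False
    # Rule 2: a large second peak before the R-peak, or one after it
    # whose maximum slope approaches the R-peak's (so not a T-wave).
    others = [p for p in _peaks(seg) if p != r_idx]
    if others:
        second = max(others, key=lambda p: seg[p])
        if seg[second] > second_th * r_amp:
            slope = lambda p: np.max(np.abs(np.diff(seg[max(p - 5, 0):p + 6])))
            if second < r_idx or slope(second) > slope_th * slope(r_idx):
                return False
    # Rule 3: R-peak amplitude far above the recent R-peak median.
    if len(recent_r) and r_amp > art_th * np.median(recent_r):
        return False
    return True
```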
Chapter 6
AC-LDA: Revisited
The current chapter outlines the changes made to the technique described in Chapter 3, Section 3.1: a modified version of the Auto-Correlation/Linear-Discriminant Analysis (AC/LDA) developed by Agrafioti [1], Foteini Agrafioti and Hatzinakos [20]. A flowchart outlining the entire AC/LDA procedure as employed here is shown in Figure 6.1.
Figure 6.1: Flowchart of the AC/LDA algorithm utilized in this study.
6.1 Windowing
The windowing method carried out in this report is similar to the one developed in [1, 20], for ECG signals
normally acquired during chest measurements. However, some of the ECG signals currently employed
were acquired through finger electrode recordings. These signals are highly susceptible to contact loss
and demand an additional pre-processing step.
The heuristic noise filter developed in Chapter 5, Section 5.6 was employed to remove this noise.
This filter requires heartbeat segmentation which is performed as in the Method described in Chapter 5
Section 5.2. No feature extraction is then performed over the segments, as they are only employed to
aid in the windowing process.
Segments validated by the heuristic filter are then sequentially grouped into windows whose duration
is greater than 4.5 s. Since the segments may or may not overlap, the length of the resulting windows
varies. Note that the resulting windows will never overlap with one another.
Despite the varying length, they follow the basic pre-requisite for the proposed algorithm – presence
of multiple heartbeats – and are valid for feature extraction.
6.2 Feature Extraction
Feature extraction consists of two steps: Auto-Correlation of an ECG window and feature-space dimension reduction through Linear-Discriminant Analysis. These steps are identical to the ones outlined in Chapter 3, Section 3.1.
The current work employs a distinct method for removing outliers. As mentioned above, the database employed for testing is very noisy. PT was not very effective at finding outliers and, as a result, another procedure was developed instead. Also, outlier detection is here performed after Auto-Correlation and applied to ECG windows.
6.2.1 Outlier Detection
A Mean Filter is hereby proposed as an outlier detector. It receives as input the AC windows for a given
subject and does the following (see Figure 6.2 for some execution examples):
Figure 6.2: Outlier Removal performed for the windows of 4 subjects. Green for accepted and red for rejected windows.
• Calculate the average window of the input subject.
• Measure the Euclidean distances of all subject windows to the mean.
• Find the distance dmax, below which at least a given percentage of windows (set to 90%) is present.
• Any AC windows whose distance to the mean window is larger than or equal to dmax are discarded and removed from the set of valid windows.
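The mean filter above can be sketched as (returning a boolean mask is a convention of this sketch):

```python
import numpy as np

def mean_filter_outliers(windows, keep_fraction=0.90):
    """Reject AC windows whose euclidean distance to the subject's
    mean window reaches the `keep_fraction` distance quantile (dmax)."""
    windows = np.asarray(windows, dtype=float)
    dists = np.linalg.norm(windows - windows.mean(axis=0), axis=1)
    dmax = np.quantile(dists, keep_fraction)
    return dists < dmax
```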
6.3 Classification
The employed classifier is a k-Nearest Neighbours (k-NN) whose working principles, as concerns authentication, follow those defined in Chapter 3, Section 3.1. Since this thesis takes an open-world approach to identification, the dk distances resulting from the k-NN classifier in that mode are compared to a threshold tid. If these distances are smaller than tid, they are validated for majority voting; otherwise, identification fails. In case of a voting tie amongst the dk, the resulting ID is picked randomly. Also, in the current work, the weight vector wn is defined with w0 = 1 and w3 = 0.6. Be reminded that n = 4 and the weights wi decrease linearly from 0 to n.
Chapter 7
Dissimilarity Based Biometrics
Like any other biometric system, an ECG biometric system extracts features which will form the templates constituting its database, as well as the current subject's usable information. As mentioned in the State-of-the-Art in Chapter 3, these features can be fiducial, hybrid or non-fiducial. Regardless, the possible representations are endless and the most suitable is yet to be found – all the existing ones present pros and cons in relation to one another.
It is on these grounds that the current study introduces the concept of a feature space based on a
Dissimilarity Representation for ECG biometrics by resorting to comparisons between signals as inspired
by the study in [16]. The current study explores this possibility by designing a multi-lead configuration
and comparing it with the single-lead version. Based on this representation, a new biometric system has
been designed whose flow of execution can be seen in Figure 7.1.
Figure 7.1: Block diagram of the proposed ECG biometric system. Note that lead C simply refers to an arbitrary lead. Also, the suspension points underline the possibility of using additional leads.
7.1 Notation
In order to better understand the methodology hereby proposed as well as the process of building the
dissimilarity representation, the notation that will be employed throughout the report is here presented.
The basic element behind the proposed method's biometric system is the ECG heartbeat. Heartbeats will be employed as templates so as to create the dissimilarity representations. With this in mind we shall consider:
• A population of S existing subjects;
• A percentage p which is considered sufficient to represent the whole population variation. From
this percentage, a number of subjects Sp will be randomly chosen from S. For this work p is set to
15%, which was considered sufficient to represent the ECG variation within a large population.
• A set C of leads. For the scope of this work, C ’s elements can be either the single lead I, the 3
limb-leads I, II and III or all the 12 leads. The variable L is defined as L = |C|.
• Ni as the number of extracted heartbeats from subject i’s ECG, i = 1, . . . , S.
• N = ∑_{i=1}^{S} Ni as the number of all extracted heartbeats over all subjects.
• A heartbeat belonging to subject i is denoted by hlij, with j = 1, . . . , Ni and l ∈ C. The variable hij, with no lead index, denotes a heartbeat originating from lead I. Beats are represented by a 600 ms window which is built having the R-peak as a reference at position 200 ms.
• A reference lead, R, which for the scope of this work will be lead I. Note that hRij will correspond
to signals taken from lead R.
• A Feature Space F, represented by an N×M matrix. For example, M consists of the 300 samples
present in a heartbeat, if the used sampling rate is 500 Hz.
Two metrics have been employed in various steps of this methodology – the euclidean distance and
the cosine similarity. For the sake of clarity they are respectively defined here in Equations 7.1 and 7.2.
D(hij, hik) = √((hij − hik)T (hij − hik))    (7.1)

D(hij, hik) = 1 − (hij · hik) / (‖hij‖ ‖hik‖)    (7.2)

where ‖·‖ represents the euclidean norm.
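Both metrics, as applied to heartbeat arrays throughout this chapter, can be sketched as:

```python
import numpy as np

def euclidean_dist(h1, h2):
    """Equation 7.1: euclidean distance between two beats."""
    d = np.asarray(h1, float) - np.asarray(h2, float)
    return float(np.sqrt(d @ d))

def cosine_dist(h1, h2):
    """Equation 7.2: one minus the cosine of the angle between beats."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return float(1.0 - (h1 @ h2) / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```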
7.2 Dissimilarity Representation
A Dissimilarity Representation is built on the fact that similarity between objects plays a crucial role
in class formation, i.e. a class is a set of similar objects [17]. A universal object similarity, however,
does not exist and always depends on the classification context, procedure and/or the domain of study.
Moreover, the presence of other classes will influence the degree to which an object should or should
not be assigned to a particular class. Humans intuitively perform this type of reasoning. For example, a
tall bush and a young tree may be directly linked to an identical class if observed isolated. However in
the presence of a fully grown tree of the same species, the young one will be immediately differentiated
from the bush and associated to its adult relative instead.
This thesis puts forward the notion of dissimilarity between ECG elements. Calculating a dissimilarity simply means comparing elements, pairwise, according to certain pre-defined rules [17]. Metrics, for instance, fit this criterion. The present study will explore two different ones – euclidean distance and cosine similarity.
A dissimilarity-based representation can be constructed from elements of any type, represented by any kind of feature array. Also, it can be built from as many comparisons between elements as one wishes. Consequently, note that this process is easily extended to more than one lead, by simply comparing one lead's elements with another lead's elements.
A dissimilarity space intends to take the original feature space F and output another one, FD, by
taking pairwise distances between ECG elements i, where i = 1, . . . , N . This study proposes two
different approaches for defining the dissimilarity space. Subsection 7.5 details their development.
The proposed biometric technique defines an ECG element as an ECG heartbeat. As such, it demands heartbeat segmentation prior to the computation of the dissimilarity representations. The system designed here employs the method developed by Hamilton [22] – see Chapter 5, Section 5.2 for its detailed explanation. Once again, refer to Figure 7.1 for the outline of the current technique.
7.3 Template Generation
The problem of template selection may be posed as follows: given a set of N heartbeats, select K heartbeat templates that "best" represent the variability as well as the typically observed patterns, according to a given similarity criterion [30].
Clustering methods are especially adequate for this task, and have already been used for template
selection in other modalities [46, 10, 29, 32]. In this thesis, the K-means algorithm was used, with K empirically set to 5, the clusters' centroids being used as templates [30]. Dissimilarity representations are calculated by comparing input beats against these reference templates.
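Template generation can be sketched with a minimal K-means; the Lloyd iterations and random initial centroids below are assumptions of this sketch, not a statement about the thesis implementation:

```python
import numpy as np

def generate_templates(beats, k=5, iters=50, seed=0):
    """Cluster one subject's heartbeats with K-means (K = 5) and
    return the cluster centroids as templates."""
    beats = np.asarray(beats, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = beats[rng.choice(len(beats), k, replace=False)]
    for _ in range(iters):
        # Assign each beat to its nearest centroid.
        d = np.linalg.norm(beats[:, None, :] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; empty clusters keep their old centroid.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = beats[labels == c].mean(axis=0)
    return centroids
```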
7.4 Outlier Detection
Outlier removal is performed as in [31]. It will only be executed for lead I – the beats here discarded will
be discarded for the other employed leads as well.
This algorithm receives as input the Ni heartbeats for subject i and begins by calculating the average
beat via:
h_i^av = (1/Ni) ∑_{j=1}^{Ni} hij    (7.3)
Heartbeats which stray away from the average beat are discarded according to the following procedure:
1. For each hij, compute its distance D(hij, h_i^av) to the mean waveform h_i^av.
2. Compute the 1st and 2nd order statistical moments of the distances D(·, h_i^av): µ corresponds to the mean value and σ to the standard deviation.
3. Compute the medians of the minimum and maximum values over all beats hij, denoted hmed_min and hmed_max respectively.
4. Verify the conditions below for every hij. If any is confirmed, then hij is discarded as an outlier:
(a) hij_min < 1.5 × hmed_min, where hij_min is the minimum value of beat hij;
(b) hij_max > 1.5 × hmed_max, where hij_max is the maximum value of beat hij;
(c) D(hij, h_i^av) > µ + 0.5 × σ.
Note that D(hij, h_i^av) can refer to any distance metric. The present work utilizes the cosine distance in 7.2. In Figure 7.2 some execution examples are presented.
Figure 7.2: Outlier Removal performed for segments belonging to 4 subjects. Green for accepted and red for rejected segments.
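The whole procedure can be sketched as below, using the cosine distance of Equation 7.2 (returning a mask rather than the filtered beats is a choice of this sketch):

```python
import numpy as np

def remove_outliers(beats):
    """Boolean mask of retained beats after the rules of Section 7.4."""
    beats = np.asarray(beats, dtype=float)
    h_av = beats.mean(axis=0)                       # Eq. 7.3
    # Cosine distances to the mean waveform and their moments.
    norms = np.linalg.norm(beats, axis=1) * np.linalg.norm(h_av)
    d = 1.0 - (beats @ h_av) / norms
    mu, sigma = d.mean(), d.std()
    # Medians of the per-beat minima and maxima.
    mins, maxs = beats.min(axis=1), beats.max(axis=1)
    outlier = ((mins < 1.5 * np.median(mins)) |
               (maxs > 1.5 * np.median(maxs)) |
               (d > mu + 0.5 * sigma))
    return ~outlier
```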
7.5 Dissimilarity Computation
Dissimilarities can be calculated by measuring a distance between beats according to a metric. Two such
metrics have been explored in this study – euclidean distance and cosine similarity referred respectively
in Equations 7.1 and 7.2. In Chapter 8 their effects can be contrasted.
Two different dissimilarity extraction techniques are hereby proposed.
Subject based
This first and simplest approach computes the distance D(hRij , hlit) between each segment hRij and
the set of hlit template beats for each lead in set C. The process is repeated for all subjects S. It
is presented in pseudo-code below.
for each subject i in S do
    for each beat j in Ni do
        dsij = [ ]
        for each lead l in C do
            for each template t do
                dsij.append(D(hRij, hlit))
Considering T templates per subject, the resulting dissimilarity representation is then an N × (T · L) matrix composed of N dissimilarity arrays with T · L components.
Inter-subject based
A second strategy computes the distance D(hRij, hlst) between each segment hRij and the sets of hlst template beats of the randomly chosen Sp subjects, for each lead in set C. The following pseudo-code outlines these steps.
for each subject i in S do
    for each beat j in Ni do
        dsij = [ ]
        for each lead l in C do
            for each subject s in Sp do
                for each template t do
                    dsij.append(D(hRij, hlst))
Once again, considering T templates per subject, the obtained dissimilarity representation is an N × (T · Sp · L) matrix consisting of N dissimilarity arrays composed of T · Sp · L elements.
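The Subject based variant, for instance, can be sketched as below (the Inter-subject variant only enlarges the template set); the euclidean metric is used here:

```python
import numpy as np

def subject_dissimilarity(beats_ref, templates):
    """`beats_ref`: a subject's N reference-lead beats;
    `templates`: per-lead arrays of T template beats for that subject.
    Returns the N x (T*L) matrix of euclidean dissimilarities."""
    return np.array([[np.linalg.norm(beat - t)      # D(hRij, hlit)
                      for lead in templates for t in lead]
                     for beat in beats_ref])
```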
7.6 Classification
In classification, both authentication and identification follow the same principles for matching. A feature space comprising dissimilarities supports a large variety of classifiers [15]. The current work employs a k-Nearest Neighbours (k-NN) model applied to dissimilarity arrays, with k set to 3. This classifier is identical to the one used for the algorithms described in Chapter 3; only the feature sets used are different.
The steps taken during the classification process depend on the dissimilarity representation approach
taken in Sub-section 7.5. During enrolment, for a Subject based approach, the classifier will store all the
S users’ template heartbeats hlit for all C leads.
(a) Enrolment for a Subject based Dissimilarity Representation. (b) Enrolment for an Inter-subject based Dissimilarity Representation.
Figure 7.3: Enrolment for both approaches.
As for the Inter-subject method the algorithm saves only the template beats belonging to the Sp
subjects (which have been recorded prior to enrolment) over the C used leads.
Both methods store their N respective template dissimilarity arrays calculated for all the extracted
heartbeats hRij over S. Consider vt as one such template. See Figure 7.3 for an illustration of this mode.
As for authentication and identification, for each dissimilarity representation approach:
Subject based
Authentication and identification determine the set of retrieved hlit templates – in the former they originate only from the requested subject, while in the latter they are obtained from all S subjects.
Inter-subject based
In this case, the classifier calculates one single dissimilarity representation with respect to the
templates originating from the Sp determined subjects – regardless of the procedure taken.
The obtained dissimilarity arrays are called vid. Then, the template dissimilarity arrays vt chosen for
the determination of the 3-NN are either sourced from the input subject – for authentication – or obtained
from all subjects – for identification.
Distances between dissimilarity vectors D(vid, vt) are once again measured according to both euclidean distance and cosine similarity, as defined in Equations 7.1 and 7.2. In Section 8 their effect on classification is compared as well. See Figures 7.4 and 7.5 for the classification procedure for the Subject based and Inter-subject based dissimilarity approaches respectively.
After calculating all the D(vid, vt), the 3 smallest distances are taken as the 3-Nearest Neighbours.
They will then be compared with a threshold, thauth for authentication or thid for identification. This
threshold will validate the distances’ votes according to:
Figure 7.4: Block diagram illustrating the identification procedure for the Subject based dissimilarity approach.
Figure 7.5: Block diagram illustrating the identification procedure for the Inter-subject based dissimilarity approach.
42
dk ≤ th    (7.4)

where dk is one of the 3 resulting distances and th is either thauth or thid according to the chosen mode. Distances dk not satisfying Inequation 7.4 are not considered in the final classification. If at least 2 dk distances have been validated, then for authentication the input user is confirmed as valid. In identification, the most voted user corresponding to those dk is provided as output. In case of a voting tie amongst the dk, the resulting ID is picked randomly.
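The thresholded 3-NN vote can be sketched as follows (returning None on failed identification is a convention of this sketch):

```python
import numpy as np

def knn_decide(dists, labels, th, k=3, rng=None):
    """Keep the k smallest distances satisfying d <= th (Eq. 7.4);
    with at least 2 surviving votes, return the majority label,
    breaking ties randomly; otherwise identification fails (None)."""
    rng = rng or np.random.default_rng()
    order = np.argsort(dists)[:k]
    votes = [labels[i] for i in order if dists[i] <= th]
    if len(votes) < 2:
        return None
    ids, counts = np.unique(votes, return_counts=True)
    winners = ids[counts == counts.max()]
    return winners[0] if len(winners) == 1 else rng.choice(winners)
```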
Chapter 8
Results
8.1 Databases
Two ECG databases were employed for testing the developed ECG biometric systems. They are described below and are from here on referred to by the labels assigned here.
Hospital de Santa Marta (HSM)
The HSM database consists of records from a local hospital, Hospital de Santa Marta, specialized
in cardiac issues. They were acquired during normal hospital operation, encompassing scheduled
appointments, emergency cases, and bedridden patients. This study focuses on signals originating from healthy individuals. All signals were acquired using Philips PageWriter Trim III devices, following the standard 12-lead placement, with a sampling rate of 500 Hz and 16-bit resolution.
Each record comprises ECGs from all 12 leads, with a duration of little more than 10 s. In total, 832 records belonging to 618 subjects were employed.
Cruz Vermelha Portuguesa (CVP) [13]
CVP's ECG data was acquired at the fingers with dry Ag/AgCl electrodes. ECG acquisition was performed using a custom two-lead differential sensor design with virtual ground, described in [43]. Raw ECGs were recorded via a bioPLUX research Bluetooth wireless biosignal acquisition unit; the device was configured with 12-bit resolution and a 1 kHz sampling frequency. Signal acquisition took place in two sessions separated by a 3-month interval, comprising a total of 63 subjects. This population is composed of 14 males and 49 females, with ages ranging between 18 and 50 years (20.68 ± 2.83). The collected data is considered to represent normal healthy individuals, as none of the users reported health issues. In each session the subjects sat in a resting position for 2 min, with two fingers, one from each hand, placed on the electrodes. The first session will be called T1 & Sitting and the second T2 & Sitting.
Over both databases, the various results are compared through Receiver Operating Characteristic (ROC) curves and the corresponding Equal Error Rates (EER) for the authentication mode of operation, while the identification mode is characterized by its Identification Error (EID), as in:
EID = FID / (TID + FID)    (8.1)

where FID and TID correspond to the number of incorrect and correct identifications, respectively.
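The two error measures can be computed as in the following sketch. `identification_error` implements Eq. (8.1) directly; `equal_error_rate` uses a simple threshold sweep over distance scores, which is an assumption made for illustration (the thesis presumably derives the EER from the full ROC curve).

```python
def identification_error(t_id, f_id):
    """EID = FID / (TID + FID), Eq. (8.1)."""
    return f_id / (t_id + f_id)

def equal_error_rate(genuine, impostor):
    """Rough EER sketch: sweep thresholds over the observed scores
    (distances: lower = more similar) and return the operating point
    where false-accept and false-reject rates are closest."""
    best = None
    for t in sorted(genuine + impostor):
        far = sum(d <= t for d in impostor) / len(impostor)
        frr = sum(d > t for d in genuine) / len(genuine)
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2
```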
8.2 HSM Analysis
The HSM database was only employed for testing the algorithms outlined in Chapter 3 Section 3.2 and in Chapter 7. The technique proposed in [1] and outlined in Chapter 3 Section 3.1 was not considered because HSM's signals last only 10 s; recall that this algorithm demands windows of at least 5 s.
All of the designed experiments were executed according to a simple random approach. The existing valid heartbeats were randomly partitioned into two sets: a training set containing 75% of those beats and a test set composed of the remaining 25%, whose beats are individually evaluated. This procedure was repeated 10 times and the average EER and EID were calculated over the 10 runs. The minimum number of training heartbeats is 5, an amount demanded by the template generation block. Subjects that did not satisfy this requirement were excluded from testing, leaving 503 subjects.
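The random partitioning protocol can be sketched as follows. This is an illustrative reconstruction, not the thesis code; the name `random_split_runs` and the per-subject dictionary layout are assumptions.

```python
import random

def random_split_runs(beats_by_subject, runs=10, train_frac=0.75, min_train=5):
    """Sketch of the evaluation protocol: per run, each subject's valid
    heartbeats are shuffled and split 75%/25% into train/test sets;
    subjects whose training share falls below min_train beats are
    dropped, as in the thesis."""
    splits = []
    for _ in range(runs):
        train, test = {}, {}
        for subj, beats in beats_by_subject.items():
            shuffled = beats[:]
            random.shuffle(shuffled)
            n_train = int(len(shuffled) * train_frac)
            if n_train < min_train:
                continue  # subject excluded from testing
            train[subj] = shuffled[:n_train]
            test[subj] = shuffled[n_train:]
        splits.append((train, test))
    return splits
```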
The experiment carried out over the technique in [7] and outlined in Chapter 3 Section 3.2 is tagged
with the expression OI .
For the dissimilarity-based method in Chapter 7, several experimental set-ups were designed. The effect of the metric choice on both the dissimilarity computation (Chapter 7 Section 7.5) and the classification (Chapter 7 Section 7.6) was tested. The chosen metrics for both steps were the cosine similarity and the Euclidean distance. Testing every metric combination yields 4 different scenarios. Given the existence of two methods for dissimilarity computation, the number of experiments rises to 8 for each lead configuration C. Experiments over all metrics were only carried out with C = [I, II, III]. Due to time constraints, the set of metrics corresponding to the best results, for each dissimilarity method, was employed for C = [I] and for C consisting of all 12 leads. The total number of dissimilarity-based experiments is then 12. Tags are given to these experiments so as to facilitate the presentation of results.
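The two candidate metrics can be expressed as follows; this is a generic illustration of the cosine and Euclidean measures, not the thesis code. Note that a vector and a scaled copy of it have zero cosine distance but a non-zero Euclidean distance, which is why the cosine metric emphasises shape over magnitude.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity: sensitive to orientation/shape,
    invariant to overall scaling of either vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)
```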
Subject based (U )
This approach is given the tag U. The metric scenario for a given experiment is described by two letters, C or E, depending on whether the employed metric is the cosine similarity or the Euclidean distance, respectively. The first letter corresponds to the metric used in the dissimilarity computation process, while the second refers to the metric used in the classification procedure. The utilized set C is shown as a subscript to the letter U. As a result, a tag of UI,II,III − CE denotes the use of the cosine similarity for dissimilarity calculation and of the Euclidean distance to obtain the distances between dissimilarity arrays, for a set C = [I, II, III]. For approach U there are:
• UI,II,III − CC
• UI,II,III − CE
• UI,II,III − EC
• UI,II,III − EE
• UI − CE
• UAll − CE
Inter-subject based (I)
The tag I was attributed to experiments following this approach. Their notation follows the same scheme as in the previous method. As such:
• II,II,III − CC
• II,II,III − CE
• II,II,III − EC
• II,II,III − EE
• II − CC
• IAll − CC
Note that all these experiments utilized the filter in Method B for pre-processing. Table 8.1 sum-
marizes results obtained for all experiments in the form of averaged EERs and EIDs and respective
standard deviations.
                  EER [%]         EID [%]
OI                8.51 ± 0.30     12.04 ± 0.68
UI,II,III   CC    21.55 ± 0.36    87.97 ± 2.21
            CE    4.45 ± 0.13     8.89 ± 1.11
            EC    30.25 ± 0.34    92.59 ± 1.78
            EE    1.53 ± 0.09     23.46 ± 1.96
UI          CE    4.45 ± 0.09     9.92 ± 0.76
UAll        CE    4.26 ± 0.16     9.22 ± 1.29
II,II,III   CC    2.46 ± 0.09     5.48 ± 0.33
            CE    4.76 ± 0.24     6.90 ± 0.38
            EC    10.75 ± 0.06    13.02 ± 0.68
            EE    3.83 ± 0.11     19.74 ± 0.50
II          CC    2.46 ± 0.09     5.37 ± 0.48
IAll        CC    2.50 ± 0.06     5.32 ± 0.57

Table 8.1: All experiments' EER & EID rates over the HSM database.
The ROC curve for the experiment UI,II,III − EE is shown in Figure 8.1(a). A ROC curve comparing all the metrics over both approaches, for the authentication mode, is shown in Figure 8.1(b); this plot only shows experiments for the set C = [I, II, III]. Additionally, all experiments for the subject and inter-subject based approaches are illustrated in the respective plots of Figures 8.2(a) and 8.2(b).
The results in Table 8.1 suggest the following observations.
(a) ROC curve and obtained EER for the experiment UI,II,III − EE.
(b) ROC curves comparing the various employed metrics.
Figure 8.1: ROC curves outlining obtained results for authentication over all the used metric combinations.
(a) ROC curves outlining obtained results for authentication, over the subject based dissimilarity approach.
(b) ROC curves outlining obtained results for authentication, over the inter-subject based dissimilarity approach.
Figure 8.2: ROC curves outlining obtained results for authentication over both dissimilarity approaches.
• Both single-lead approaches UI and II present better results than those obtained from OI. From this, it is possible to conclude that a dissimilarity-based representation can originate a more effective biometric system.
• Dissimilarity-based classifiers whose training computes dissimilarities based on the cosine metric proved much superior to their Euclidean distance equivalents. This was expected due to the large initial feature size of M = 300: the cosine metric better captures shape and signal orientation, whereas the Euclidean metric focuses on magnitude, which is less discriminative for arrays of 300 samples. The exception concerns the authentication results for CE vs EE, in which the latter is better.
• For the subject-based dissimilarity approach, experiments employing the cosine similarity metric during classification present very high EER and EID indices. This was expected, as the very small dissimilarity array size (T ∗ L = 15) renders the cosine similarity metric unable to extract viable information. Precisely the opposite holds for the inter-subject approach, whose dissimilarity array size is much larger (T ∗ Sp ∗ L = 1125).
• Authentication EERs did not vary much when changing the lead-set C over the dissimilarity-based classifiers. EIDs changed slightly, but neither consistently nor significantly. These deviations can be attributed to the randomness behind the template selection process, outlined in Chapter 7 Section 7.3. Also, in the case of identification, potential ties result in random ID choices, further introducing variability.
• Lastly, the lowest authentication EER originated from UI,II,III − EE; however, it also presents a high EID. The lowest EID is shown by II,II,III − CC, which also achieved the second-lowest EER.
8.3 CVP Analysis
Given the longer ECG signal duration of 2 min, all three techniques explained in this report are analysed here. Two experimental settings were carried out – within and between sessions. Both follow a simple random approach and are outlined below.
Within Session
In this experiment, testing is only performed over signals from session T1 & Sitting. Valid heartbeats or AC segments (depending on the method) are partitioned into two sets, training and testing, each containing a random half of the ECG elements. This procedure was repeated 10 times, so as to account for the variability in set formation. The average EER and EID were calculated over the 10 runs.
Between Session
For this set-up, training heartbeats or AC segments are taken from records in T1 & Sitting, while testing is performed over T2 & Sitting's elements. Here, only 1 run is performed, as no variability exists between the training and testing sets.
Furthermore, two filters were developed in Chapter 4, Methods A and B. The classification outcomes these two filters originate will be contrasted in the tests over this database.
With this in mind, four experiments were run for each method, combining both sessions and both
filters.
Tags were given to all experiments. The Simple Heartbeat Comparison method is referred to as O and the AC/LDA as AC. For the dissimilarity-based classifiers, only the approach returning the best results was employed – I. As for the metric set, the reduction in the number of subjects in the database suggested the change from metric set CC to CE. Furthermore, due to the greater number of heartbeats per subject, 10 templates were used rather than 5 for the template generation block outlined in Chapter 7 Section 7.3. Thus, this method is referred to as I − CE. Table 8.2 summarizes the obtained results. Figures 8.3(a) and 8.3(b) show ROC curves for all authentication experiments carried out with the filter in Method A.
                     Within Session                 Between Session
                     EER [%]        EID [%]         EER [%]   EID [%]
Filtering  O         3.97 ± 0.22    3.07 ± 0.53     14.31     39.27
Method A   AC        9.65 ± 0.54    34.96 ± 1.57    17.43     63.77
           I − CE    4.65 ± 0.34    15.61 ± 1.14    19.61     64.63
Filtering  O         5.41 ± 0.16    5.11 ± 0.67     16.03     40.99
Method B   AC        8.90 ± 0.62    36.56 ± 1.50    17.83     64.58
           I − CE    5.56 ± 0.26    13.84 ± 1.63    18.76     58.58

Table 8.2: All experiments' EER & EID rates over the CVP database. Between Session results show no standard deviation, as only 1 run was executed.
(a) ROC curves outlining all within-session authentication results over the filter in Method A.
(b) ROC curves outlining all between-session authentication results over the filter in Method A. The curve for I − CE is incomplete due to a smaller number of points used during runs.
Figure 8.3: ROC curves outlining obtained results for authentication over all methods for the filter in Method A.
Obtained results suggest the following observations:
• Choosing the filter in Method A for pre-processing substantially improves results among identical experiments over the technique O. For the other methods, AC and I − CE, this distinction is not noticeable – note the high standard deviations for these modes.
• The SHC technique presents the lowest EER and EID rates, underlining its better robustness to both noise and intra-subject variability.
• The AC/LDA method presents the highest EER and EID rates. This can be explained by its windowing process coupled with the characteristics of this database. The technique demands windows of 5 s, and since CVP's records present a lot of noise, it is difficult to obtain entire 5 s windows devoid of noise. This not only leaves some ECG records with very few valid AC segments, but also hinders the quality of the extracted features.
• The dissimilarity-based approach also outputs high EER and EID rates. Once again, CVP's signals are very noisy, which results in high intra-subject beat variability. Subject-representative templates will try to account for this variability, causing potential heartbeat overlap with other subjects. The distinction between subjects thus becomes blurred, which raises the EER and EID rates.
• Between Session set-ups returned substantially higher EER and EID than Within Session experi-
ments. This underlines one of the biggest problems of ECG biometrics – intra-individual variation
over time. See Figure 8.4 for illustrations of beats from the same subject over both sessions.
• Lastly, these signals originate much larger EER and EID than the HSM database. They were acquired from a finger-electrode set-up, which introduces much more noise and severely hinders classification. Note also the much smaller number of subjects for this database in relation to HSM (62 vs 503).
Figure 8.4: Plots demonstrating intra-subject temporal variation. Each plot corresponds to a subject, where 20 heartbeats are plotted: 10 correspond to T1 & Sitting and the other 10 to T2 & Sitting.
Chapter 9
Conclusions
In the present thesis, the field of ECG biometrics was studied. Chapters 2 and 3 introduced a brief motivation and history of this science; Chapter 3 also introduced the two reference methods for this thesis – the algorithms in [7] and [1]. Chapter 4 dealt with the issue of noise in the ECG, and two filters were developed to remove it.
Heartbeat segmentation was addressed in Chapter 5. The present thesis implemented versions
of three state-of-the-art segmentation techniques, analysed them according to a reference database,
and chose the most adequate for biometry. Furthermore, a heuristic filter was designed to eliminate
electrode contact loss.
Chapter 6 introduced modifications to the algorithm proposed by Agrafioti [1], outlined in Chapter 3 Section 3.1. In Chapter 7 a new biometric system based on dissimilarities was introduced. This and the reference techniques were tested in Chapter 8 over two databases – one formed by ECGs acquired via the standard 12-lead set-up and the other composed of finger-recorded ECGs.
9.1 Achievements
The present work began by introducing two filters for ECG noise removal. The filter from Method A was shown to be slightly better suited to biometric identification, at least for one of the utilized methods.
A detailed study was performed over three state-of-the-art segmentation algorithms. Performance
was analysed over the MIT-BIH arrhythmia database and a more suitable algorithm for ECG biometrics
was adopted for the rest of this thesis. A new heuristic filter was also designed, which is effective at
eliminating contact loss noise from ECG signals.
A novel ECG biometric system was developed, based on dissimilarities between ECG heartbeats.
Two such systems were constructed, each based on a different dissimilarity calculation assumption.
Classification results were compared between these new systems and two reference biometric techniques published in [7] and [1]. The obtained results establish this technique as a promising ECG biometric system, with a lowest authentication EER of 1.53% for a database with 503 subjects.
Furthermore, these dissimilarity-based techniques can easily use information stemming from more than one ECG lead. Thus, tests were executed with single, triple and all-twelve-lead configurations.
Results gleaned from these experiments show that, for this technique, a multi-lead strategy does not aid
in biometric classification.
Lastly, tests over these techniques were also carried out over ECGs obtained from finger recordings.
These present substantially more noise, when compared to records measured from the standard 12-lead
set-up. Results over these signals favour the use of the fiducial method in [7]. Nonetheless, as of now, CVP's recording strategy does not offer good potential for accurate biometrics – note the low subject count (62) and the high EER and EID rates. Moreover, the experiments performed using inter-session ECGs highlight one of the major problems of ECG biometrics: intra-individual heartbeat variation over time.
9.2 Future Work
There is potential to improve over most of the analysed areas.
For noise removal, the literature [36, 38] shows that a wavelet filter is the most suitable strategy for removing baseline noise; such a filter should be studied and tested. It was also noted that the chosen technique for EMG removal introduces some distortion, especially over the R-peaks, so different techniques for EMG de-noising ought to be addressed as well. Finally, the distinction between the filters in Methods A and B should be investigated by performing tests over other databases (such as HSM).
Regarding segmentation, focus should be given to R-peak mapping strategies. The strategy developed in Chapter 5 Section 5.4 finds either positive peaks or negative peaks, exclusively. A more suitable and flexible allocation technique, able to correctly detect both positive and negative peaks, should be developed.
Experiments with the proposed dissimilarity-based biometric systems should be carried out over other databases, so as to validate this technique. Also, the results shown here are obtained by looking at ID outputs for every test heartbeat individually. Establishing a majority vote between a specific number of heartbeats would no doubt improve results; such a testing configuration should be carried out in the future as well.
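Such a majority-voting test configuration could be sketched as follows. This is purely hypothetical future work; the function name `vote_over_beats` and the window size are illustrative assumptions.

```python
from collections import Counter

def vote_over_beats(per_beat_ids, window=5):
    """Hypothetical extension: instead of scoring each test
    heartbeat's ID alone, group consecutive per-beat decisions into
    fixed-size windows and output the majority ID per window."""
    out = []
    for i in range(0, len(per_beat_ids), window):
        chunk = per_beat_ids[i:i + window]
        out.append(Counter(chunk).most_common(1)[0][0])
    return out
```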
On another note, the convenience offered by measuring ECGs through fingers is immense. Improv-
ing the ECG acquisition set-up in relation to the one employed for CVP’s records is of high importance.
Also, testing should be performed over other finger-obtained ECG databases in order to find out if the
trend here shown for CVP is maintained.
Lastly, temporal variation among ECGs from the same subject remains a big hindrance, if not the
biggest, for ECG biometrics. This problem underlines the need to perform regular (probably monthly)
re-enrolment of every subject’s ECGs, in order to maintain biometric efficiency. Future work should focus
on this problem and look for the potential heartbeat features offering more stability over time.
References
[1] F. Agrafioti. ECG in Biometric Recognition: Time Dependency and Application Challenges. PhD thesis, Toronto, Ont., Canada, 2011. AAINR93098.
[2] F. Agrafioti and D. Hatzinakos. Fusion of ECG sources for human identification. In Communications,
Control and Signal Processing, 2008. ISCCSP 2008. 3rd International Symposium on, pages 1542–
1547, March 2008. doi: 10.1109/ISCCSP.2008.4537472.
[3] F. Agrafioti and D. Hatzinakos. ECG based recognition using second order statistics. In Communi-
cation Networks and Services Research Conference, 2008. CNSR 2008. 6th Annual, pages 82–87,
May 2008. doi: 10.1109/CNSR.2008.38.
[4] L. Biel, O. Pettersson, L. Philipson, and P. Wide. ECG analysis: a new approach in human identifi-
cation. Instrumentation and Measurement, IEEE Transactions on, 50(3):808–812, Jun 2001. ISSN
0018-9456. doi: 10.1109/19.930458.
[5] R. Bolle and S. Pankanti. Biometrics, Personal Identification in Networked Society: Personal Iden-
tification in Networked Society. Kluwer Academic Publishers, Norwell, MA, USA, 1998. ISBN
0792383451.
[6] F. Canento, A. Fred, H. Silva, H. Gamboa, and A. Lourenco. Multimodal biosignal sensor data
handling for emotion recognition. In Sensors, 2011 IEEE, pages 647–650. IEEE, 2011.
[7] C. Carreiras, A. Lourenco, A. Fred, and R. Ferreira. ECG signals for biometric applications - are we
there yet? In 12th Int. Conf. on Informatics in Control, Automation and Robotics. Scitepress, 2014.
[8] A. D. C. Chan, M. Hamdy, A. Badre, and V. Badee. Wavelet distance measure for person identifi-
cation using electrocardiograms. Instrumentation and Measurement, IEEE Transactions on, 57(2):
248–253, Feb 2008. ISSN 0018-9456. doi: 10.1109/TIM.2007.909996.
[9] I. Christov. Real time electrocardiogram QRS detection using combined adaptive threshold. BioMed-
ical Engineering OnLine, 3(1):28, 2004. ISSN 1475-925X. doi: 10.1186/1475-925X-3-28. URL
http://www.biomedical-engineering-online.com/content/3/1/28.
[10] S. D. Connell and A. K. Jain. Template-based online character recognition. Pattern Recognition,
34:1–14, 1999.
[11] M. Conover. Understanding Electrocardiography. Mosby, 2003. ISBN 9780323019057. URL
http://books.google.pt/books?id=pcPekl1Q1cAC.
[12] H. P. da Silva, A. Fred, A. Lourenco, and A. K. Jain. Finger ECG signal for user authentication:
Usability and performance. In Biometrics: Theory, Applications and Systems (BTAS), 2013 IEEE
Sixth International Conference on, pages 1–8. IEEE, 2013.
[13] H. P. Da Silva, A. Lourenco, A. Fred, N. Raposo, and M. Aires-de Sousa. Check your biosig-
nals here: A new dataset for off-the-person ECG biometrics. Computer methods and programs in
biomedicine, 113(2):503–514, 2014.
[14] P. de Chazal, M. O’Dwyer, and R. Reilly. Automatic classification of heartbeats using ECG mor-
phology and heartbeat interval features. Biomedical Engineering, IEEE Transactions on, 51(7):
1196–1206, July 2004. ISSN 0018-9294. doi: 10.1109/TBME.2004.827359.
[15] R. Duin, M. Loog, E. Pekalska, and D. Tax. Feature-based dissimilarity space classification. In
D. Unay, Z. Ctaltepe, and S. Aksoy, editors, Recognizing Patterns in Signals, Speech, Images
and Videos, volume 6388 of Lecture Notes in Computer Science, pages 46–55. Springer Berlin
Heidelberg, 2010. ISBN 978-3-642-17710-1. doi: 10.1007/978-3-642-17711-8\ 5.
[16] R. P. Duin and E. Pekalska. The dissimilarity representation for structural pattern recognition. In
C. San Martin and S.-W. Kim, editors, Progress in Pattern Recognition, Image Analysis, Com-
puter Vision, and Applications, volume 7042 of Lecture Notes in Computer Science, pages 1–24.
Springer Berlin Heidelberg, 2011. ISBN 978-3-642-25084-2. doi: 10.1007/978-3-642-25085-9\ 1.
URL http://dx.doi.org/10.1007/978-3-642-25085-9_1.
[17] R. P. Duin and E. Pekalska. The dissimilarity space: Bridging structural and statistical pattern
recognition. Pattern Recognition Letters, 33(7):826 – 832, 2012. ISSN 0167-8655. doi: http://dx.
doi.org/10.1016/j.patrec.2011.04.019. URL http://www.sciencedirect.com/science/article/
pii/S0167865511001322.
[18] S.-C. Fang and H.-L. Chan. Human identification by quantifying similarity and dissimilarity in elec-
trocardiogram phase space. Pattern Recognition, 42(9):1824 – 1831, 2009. ISSN 0031-3203. doi:
http://dx.doi.org/10.1016/j.patcog.2008.11.020. URL http://www.sciencedirect.com/science/
article/pii/S0031320308004913.
[19] S. Fatemian and D. Hatzinakos. A new ECG feature extractor for biometric recognition. In Digital
Signal Processing, 2009 16th International Conference on, pages 1–6, July 2009. doi: 10.1109/
ICDSP.2009.5201143.
[20] F. Agrafioti, J. Gao, and D. Hatzinakos. Heart biometrics: Theory, methods and applications. In D. J. Yang, editor, Biometrics. InTech, 2011. ISBN 978-953-307-618-8. doi: 10.5772/18113. URL http://www.intechopen.com/books/biometrics/heart-biometrics-theory-methods-and-applications.
[21] G. Friesen, T. Jannett, M. Jadallah, S. Yates, S. Quint, and H. Nagle. A comparison of the noise
sensitivity of nine QRS detection algorithms. Biomedical Engineering, IEEE Transactions on, 37(1):
85–98, Jan 1990. ISSN 0018-9294. doi: 10.1109/10.43620.
[22] P. Hamilton. Open source ECG analysis. In Computers in Cardiology, 2002, pages 101–104, Sept
2002.
[23] E. Harrell and L. Langton. Victims of identity theft, 2012. Dec 2013. URL http://www.bjs.gov/
index.cfm?ty=pbdetail&iid=4821.
[24] A. Jain, P. Flynn, and A. Ross. Handbook of Biometrics. Springer, 2007. ISBN 9780387710419.
URL http://books.google.pt/books?id=WfCowMOvpioC.
[25] A. K. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Trans. on
Circuits and Systems for Video Technology, 14:4–20, 2004.
[26] M. Kaur, B. Singh, and Seema. Comparisons of different approaches for removal of baseline wan-
der from ECG signal. IJCA Proceedings on International Conference and workshop on Emerging
Trends in Technology (ICWET), (5):30–34, 2011. Full text available.
[27] M. Kyoso and A. Uchiyama. Development of an ECG identification system. In Engineering in
Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International Conference of
the IEEE, volume 4, pages 3721–3723 vol.4, 2001. doi: 10.1109/IEMBS.2001.1019645.
[28] M. Li and S. Narayanan. Robust ECG biometrics by fusing temporal and cepstral information. In
Pattern Recognition (ICPR), 2010 20th International Conference on, pages 1326–1329, Aug 2010.
doi: 10.1109/ICPR.2010.330.
[29] N. Liu and Y. Wang. Template selection for on-line signature verification. In Proc. of the 19th Int.
Conf. on Pattern Recognition (ICPR), pages 1–4, dec. 2008. doi: 10.1109/ICPR.2008.4761537.
[30] A. Lourenco, C. Carreiras, H. Silva, and A. Fred. ECG biometrics: A template selection approach. In
Medical Measurements and Applications (MeMeA), 2014 IEEE International Symposium on, pages
1–6, June 2014. doi: 10.1109/MeMeA.2014.6860081.
[31] A. Lourenco, H. Silva, C. Carreiras, et al. Outlier detection in non-intrusive ECG biometric system.
In Image Analysis and Recognition, pages 43–52. Springer Berlin Heidelberg, 2013.
[32] A. Lumini and L. Nanni. A clustering method for automatic biometric template selection. Pattern
Recognition, 39(3):495–497, 2006. ISSN 0031-3203. doi: 10.1016/j.patcog.2005.11.004.
[33] M. Manikandan and K. Soman. A novel method for detecting R-peaks in electrocardiogram
(ECG) signal. Biomedical Signal Processing and Control, 7(2):118 – 128, 2012. ISSN 1746-
8094. doi: http://dx.doi.org/10.1016/j.bspc.2011.03.004. URL http://www.sciencedirect.com/
science/article/pii/S1746809411000292.
[34] I. Odinaka, P.-H. Lai, A. Kaplan, J. O’Sullivan, E. Sirevaag, S. Kristjansson, A. Sheffield, and J. W.
Rohrbaugh. ECG biometrics: A robust short-time frequency analysis. In Information Forensics and
Security (WIFS), 2010 IEEE International Workshop on, pages 1–6, Dec 2010. doi: 10.1109/WIFS.
2010.5711466.
[35] I. Odinaka, P.-H. Lai, A. Kaplan, J. O’Sullivan, E. Sirevaag, and J. Rohrbaugh. ECG biometric
recognition: A comparative analysis. Information Forensics and Security, IEEE Transactions on, 7
(6):1812–1824, Dec 2012. ISSN 1556-6013. doi: 10.1109/TIFS.2012.2215324.
[36] K. Park, K. Lee, and H. Yoon. Application of a wavelet adaptive filter to minimise distortion of
the ST-segment. Medical and Biological Engineering and Computing, 36(5):581–586, 1998. ISSN
0140-0118. doi: 10.1007/BF02524427. URL http://dx.doi.org/10.1007/BF02524427.
[37] K. Plataniotis, D. Hatzinakos, and J. Lee. ECG biometric recognition without fiducial detection. In
Biometric Consortium Conference, 2006 Biometrics Symposium: Special Session on Research at
the, pages 1–6, Sept 2006. doi: 10.1109/BCC.2006.4341628.
[38] D. Rahul. Different techniques to remove baseline wander from ECG signal : A review. In VSRD-
IJEECE, 2012.
[39] W. Sethares and T. Staley. Periodicity transforms. Signal Processing, IEEE Transactions on, 47
(11):2953–2964, Nov 1999. ISSN 1053-587X. doi: 10.1109/78.796431.
[40] T. W. Shen, W. Tompkins, and Y. H. Hu. One-lead ECG for identity verification. In Engineering in
Medicine and Biology, 2002. 24th Annual Conference and the Annual Fall Meeting of the Biomedical
Engineering Society EMBS/BMES Conference, 2002. Proceedings of the Second Joint, volume 1,
pages 62–63 vol.1, 2002. doi: 10.1109/IEMBS.2002.1134388.
[41] T. W. J. Shen, Tsu-Wang and Y. H. Hu. Implementation of a one-lead ECG human identification
system on a normal population. Journal of Engineering and Computer Innovations, 2(1):12–21,
January 2011. ISSN 2141-6508.
[42] H. Silva, H. Gamboa, and A. Fred. One lead ECG based personal identification with feature sub-
space ensembles. In P. Perner, editor, Machine Learning and Data Mining in Pattern Recogni-
tion, volume 4571 of Lecture Notes in Computer Science, pages 770–783. Springer Berlin Hei-
delberg, 2007. ISBN 978-3-540-73498-7. doi: 10.1007/978-3-540-73499-4\ 58. URL http:
//dx.doi.org/10.1007/978-3-540-73499-4_58.
[43] H. Silva, A. Lourenco, R. Lourenco, P. Leite, D. Coutinho, and A. Fred. Study and evaluation of a
single differential sensor design based on electro-textile electrodes for ECG biometrics applications.
In Proc. IEEE Sensors Conf., 2011.
[44] S. Singla and A. Sharma. ECG as biometric in the automated world. International Journal of
Computer Science & Communication, 1:281–283, 2010.
[45] Y.-T. Tsao, T.-W. Shen, T.-F. Ko, and T.-H. Lin. The morphology of the electrocardiogram for evaluating ECG biometrics. In e-Health Networking, Application and Services, 2007 9th International
Conference on, pages 233–235, June 2007. doi: 10.1109/HEALTH.2007.381637.
[46] U. Uludag, A. Ross, and A. Jain. Biometric template selection and update: a case study in finger-
prints. Pattern Recognition, 37(7):1533–1542, 2004. ISSN 0031-3203. doi: 10.1016/j.patcog.2003.
11.012.
[47] Y. Wang, F. Agrafioti, D. Hatzinakos, and K. Plataniotis. Analysis of human electrocardiogram for
biometric recognition. EURASIP Journal on Advances in Signal Processing, 2008(1):148658, 2008.
ISSN 1687-6180. doi: 10.1155/2008/148658. URL http://asp.eurasipjournals.com/content/
2008/1/148658.
[48] G. Wubbeler, M. Stavridis, D. Kreiseler, R.-D. Bousseljot, and C. Elster. Verification of humans
using the electrocardiogram. Pattern Recognition Letters, 28(10):1172 – 1175, 2007. ISSN 0167-
8655. doi: http://dx.doi.org/10.1016/j.patrec.2007.01.014. URL http://www.sciencedirect.com/
science/article/pii/S0167865507000463.
[49] C. Ye, M. Coimbra, and B. Kumar. Investigation of human identification using two-lead electro-
cardiogram (ECG) signals. In Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth
IEEE International Conference on, pages 1–8, Sept 2010. doi: 10.1109/BTAS.2010.5634478.