
Characterization and Segmentation of Acoustic Swallowing Signals Collected Concurrently with Dual-axis Accelerometry

by

Navid Zohouri Haghian

A thesis submitted in conformity with the requirements for the degree of Master of Health Science

Graduate Department of Institute of Biomaterials and Biomedical Engineering

University of Toronto

© Copyright 2014 by Navid Zohouri Haghian

Abstract

Characterization and Segmentation of Acoustic Swallowing Signals Collected Concurrently with Dual-axis Accelerometry

Navid Zohouri Haghian

Master of Health Science

Graduate Department of Institute of Biomaterials and Biomedical Engineering

University of Toronto

2014

Dysphagia, a physiological impairment that involves difficulty swallowing, can arise from a variety of different disease and injury processes. The current gold standard for dysphagia assessment is videofluoroscopy (VFS). Because VFS requires radiation exposure, the development of valid, noninvasive screening technologies is desirable. In this thesis, acoustics were explored as a potential means of screening for dysphagia. The foundation of this work requires pinpointing swallow segments in acoustic signals, which was done through the implementation of a novel acoustic swallow segmentation algorithm. Applying this algorithm to data from 44 healthy participants swallowing water samples resulted in a sensitivity of 86%, 94% and 92% for detecting 10 mL, 5 mL and saliva swallows, respectively. Moreover, this work analyzed the suitability of different signal features for characterizing acoustic swallows and artifacts.


Acknowledgements

I would like to dedicate this work to my mother, father and sister who provided the best possible support

for the continuation of my education. I would also like to thank my supervisor, Dr. Catriona M. Steele,

without whom this work would not be possible. One could not ask for better guidance and supervision.

Lastly, I would like to express my gratitude to all of my colleagues at the Swallowing Rehabilitation

Research Lab who supported my research towards the completion of my thesis.


Contents

1 Introduction

1.1 Rationale

1.2 Deglutition Overview

1.2.1 Oral Phase

1.2.2 Pharyngeal Phase

1.2.3 Esophageal Phase

1.3 Dysphagia

1.4 Swallowing Assessment

1.4.1 Videofluoroscopy

1.4.2 Cervical Auscultation

1.4.3 Standardized Swallow Screening

1.4.4 Accelerometry

1.5 Acoustic Swallowing Signals

2 Automatic Swallow Segmentation

2.1 Introduction

2.2 Methods

2.2.1 Participants

2.2.2 Signal Acquisition

2.2.3 Pre-processing

2.2.4 Stationary Wavelet Transform Decomposition (SWT)

2.2.5 Signal Envelope Extraction

2.2.6 Signal Segmentation

2.3 Validation

2.4 Results

2.5 Discussion

2.6 Conclusion

3 Swallow Signal Characterization

3.1 Introduction

3.1.1 Overview of Algorithm

3.2 Methods

3.2.1 Signal Segment Extraction

3.2.2 Feature Extraction

3.2.3 Fisher Feature Projection

3.2.4 Logistic Regression - Binary Classification

3.3 Results

3.4 Discussion

3.5 Conclusion

4 Acoustics and Swallow Screening

4.1 Acoustics

4.1.1 Acoustic Features

4.2 Machine Learning

4.3 Future Work

4.4 Conclusion


List of Tables

2.1 Statistics comparing acoustic and nasal cannula segmentation

2.2 Statistics comparing acoustic and Accel-1 segmentation

2.3 Statistics comparing acoustic and Accel-2 segmentation

2.4 Performance comparisons between accelerometry and acoustic automatic segmentation

3.1 Automatic Segmentation Output

3.2 Descriptive statistics of features for 5 mL, 10 mL and saliva samples

3.3 Fisher’s projection and feature weights

3.4 Sensitivity and Specificity of binary classification

3.5 Algorithm Performance

4.1 Descriptive statistics of features from 5 mL and 10 mL swallow samples

4.2 Descriptive statistics of features from 5 mL and 10 mL artifact samples


List of Figures

1.1 A) A liquid bolus is taken into the mouth and held between the tongue and hard palate. B) The bolus is squeezed backwards in the mouth. C) The bolus enters the upper pharynx. D) The bolus travels down through the pharynx and the upper esophageal sphincter. The opening to the airway is closed during this phase. E) The bolus has entered the esophagus and the upper esophageal sphincter closes behind the bolus tail. F) The pharynx returns to a rest position and the airway opens to allow breathing to resume.

1.2 Accelerometry screening setup [1]

2.1 Participant task flowchart

2.2 Stationary Wavelet Transform Flowchart

2.3 3-level approximation of Stationary Wavelet Decomposition

2.4 3-level details of Stationary Wavelet Decomposition

2.5 Original (top) and modified (bottom) signal envelope

2.6 Envelope signal (blue) and Binary signal (red) superimposed

2.7 Segmentation signal superimposed on acoustic signal (top) and nasal cannula signal (bottom)

3.1 Signal Characterization Flow Chart

3.2 Sigmoid Function

3.3 Amplitude Mean

3.4 Amplitude Variance

3.5 Amplitude Skewness

3.6 Amplitude Kurtosis

3.7 Dominant Frequency (Hz)

3.8 Entropy

3.9 Centroid Frequency (Hz)

3.10 Average Wavelet Energy

3.11 DWT Energy Ratio

3.12 Fractal Dimension

3.13 Wavelet Filtered Entropy

3.14 Dominant Frequency - PSD (Hz)

3.15 Signal Energy - 75 to 100 (Hz)

3.16 Amplitude Distribution Width

3.17 Wavelet Energy Ratio

3.18 Variance - Signal Squared

3.19 Feature weights of Fisher projections

3.20 Binary classification using logistic regression

4.1 AKG C411 PP transducer frequency response and polarity


Chapter 1

Introduction

1.1 Rationale

The work presented in this thesis builds on previous research in the field of automated detection and classification of swallowing events. It explores the realm of acoustics through analysis of acoustic swallowing signals, which were collected concurrently with swallowing accelerometry signals. This chapter presents the background knowledge required to understand the anatomy, physiology and pathology of swallowing. It further explains dysphagia (swallowing impairment) and the incentive to develop non-invasive screening tools to detect dysphagia.

1.2 Deglutition Overview

Ingestion of food or liquid is typically done through the oral cavity via the act of mastication and deglutition (swallowing). Mastication consists of the mechanical breakdown of food via the use of the teeth and the tongue. Once this process is complete, a bolus is formed through the mixture of the fragmented food and saliva. This is followed by transfer of the bolus to the pharynx and swallowing. The entire act requires the coordination of many different neuromuscular components, and involves both voluntary and involuntary responses. Swallowing involves three main phases [2]:

• Oral phase

• Pharyngeal phase

• Esophageal phase


Figure 1.1: A) A liquid bolus is taken into the mouth and held between the tongue and hard palate. B) The bolus is squeezed backwards in the mouth. C) The bolus enters the upper pharynx. D) The bolus travels down through the pharynx and the upper esophageal sphincter. The opening to the airway is closed during this phase. E) The bolus has entered the esophagus and the upper esophageal sphincter closes behind the bolus tail. F) The pharynx returns to a rest position and the airway opens to allow breathing to resume.

1.2.1 Oral Phase

The oral phase involves mastication (chewing) of food in order to mechanically break it down. Furthermore, it involves the addition of saliva in order to produce a bolus with a consistency suitable for swallowing. Concurrently, the lips are sealed and the soft palate is lowered to prevent spillage of bolus material into the pharynx (see Figures 1.1A and B). The tongue aids this action by forming a cup shape against the hard palate. This stage occurs while the vocal folds remain open to allow breathing via the nasal passages. Subsequent to mastication, the tongue pushes the bolus to the back of the mouth via a squeezing movement [2].


1.2.2 Pharyngeal Phase

During the pharyngeal phase, the bolus enters the pharynx (throat) as the soft palate moves up to seal

the entrance to the nasal cavity [2]. The contraction of many of the pharyngeal and suprahyoid muscles

causes the distance between the thyroid cartilage and the hyoid (hyolaryngeal complex) to shorten. This

phenomenon is referred to as the hyolaryngeal excursion or the hyoid burst. As the bolus enters the

pharynx and is carried down towards the esophagus, the epiglottis deflects to cover the entrance to the

airway and the larynx. Moreover, both the true and false vocal folds close to protect the airway from

aspiration (entry of foreign material into the airway). This involuntary process is followed by the opening

of the Upper Esophageal Sphincter (UES) to allow the entrance of the bolus into the esophagus. These

components are illustrated in Figures 1.1C and D.

1.2.3 Esophageal Phase

As the bolus enters the esophagus, the nasal and oral cavities open to allow respiration to resume. As the vocal folds open, a brief period of exhalation occurs. Moreover, the larynx and the epiglottis return to their original resting positions. This process is followed by peristalsis, a wave of sequential contraction of the esophageal muscles that propels the bolus down to the stomach.

1.3 Dysphagia

Dysphagia is a term that broadly describes difficulty swallowing; this disorder can occur as part of many different disease and injury processes. A patient with dysphagia has difficulty completing one or more of the phases described above. The main concerns in dysphagia are aspiration and penetration. Aspiration describes the entry of foreign matter (food/liquids) into the airway below the level of the true vocal folds. Penetration is a less severe event, in which foreign matter enters the supraglottic space but does not pass below the true vocal folds [3],[2]. Aspiration can contribute to negative outcomes including pneumonia. Dysphagia may also lead to other negative sequelae including malnutrition, weight loss, immune system deficiencies and psychological burden. Etiologies of dysphagia include, but are not limited to, stroke, neurological disorders such as Multiple Sclerosis, trauma, head/neck surgeries, and head/neck cancer [2]. Different clinical signs point to the specific component or process at fault during a swallow. In addition to penetration and aspiration, residue, which occurs when material is left behind in the oral or pharyngeal cavities after a swallow, is a serious concern in dysphagia.


1.4 Swallowing Assessment

1.4.1 Videofluoroscopy

The gold standard diagnostic tool for the assessment of swallowing is videofluoroscopy (VFS) [4]. During this x-ray procedure, a participant who is suspected of dysphagia is given a variety of food and liquid samples of specific consistencies to eat or drink. The samples are typically mixed with barium contrast, allowing visualization on the x-ray video. The speech-language pathologist (SLP) and/or other clinicians continuously monitor multiple indicators on the x-ray, including signs of aspiration, penetration or residue.

VFS is of great value because it provides detailed, high-resolution imaging of the anatomical structures while allowing continuous monitoring of anatomical landmarks and functions [4]. However, VFS has some disadvantages. The equipment is bulky and requires a large allocation of hospital or clinical space. It cannot be used for regular or repeated bedside assessment, and it requires appointments that are often subject to long waiting lists. Furthermore, although the x-ray exposure falls within a safe range, patients still receive a radiation dose.

1.4.2 Cervical Auscultation

Cervical auscultation (CA) is the practice of listening to swallows via a device such as a stethoscope in order to screen for signs of dysphagia. The main phase that is analyzed is the pharyngeal phase [5]. The device is typically placed laterally on the neck, over the cricoid cartilage [6]. Features that may aid in identifying dysphagia include 1) the duration of deglutition, obtained from the onset and offset of the swallow; 2) the delay from the initiation of deglutition apnea to the first acoustic burst; and 3) post-swallow changes in breath sounds [5]. Moreover, dysphagic individuals tend to have a higher number of swallows per bolus. Cervical auscultation is considered a subjective method of swallowing assessment and currently lacks validation. Sources of possible error and artifact include variability between dysphagia patients, inconsistent methods of sound amplification and silent aspiration.

1.4.3 Standardized Swallow Screening

Many best practice guidelines recommend the use of screening tests to identify signs of dysphagia early

in a patient’s healthcare episode [7]. The standardized protocols described for swallow screening include:


V-VST (volume-viscosity swallowing test)

During this examination, the participant is given samples of three different viscosities in the following sequence: nectar, water, and pudding. Bolus volumes of 5 mL, 10 mL and 20 mL are used. The samples are presented to the participant using a syringe. In addition to observing for signs of aspiration, the examiner measures peripheral blood oxygen saturation using pulse oximetry. A baseline value is obtained prior to the examination and is then compared to the values obtained after swallowing. Drops in oxygen saturation are interpreted as a sign of possible aspiration [8],[9],[10].

TOR-BSST (Toronto Bedside Swallowing Screening Test)

Designed specifically for stroke patients, this screening test starts by evaluating the cognitive state of the participant and the performance of basic oral motor function and vocalization tasks. This is followed by ten 1-teaspoon sips of water [9]. The test yields a binary pass or fail outcome and stops at the first failed item. A failed response flags the patient for further assessment.

Yale Swallow Protocol/3-oz Water Swallow Test

The main principle in this approach is the presentation of a large bolus to provoke aspiration or penetration in participants suspected of dysphagia [11]. Potential signs of difficulty include a wet voice, choking or coughing [12],[9].

1.4.4 Accelerometry

The use of accelerometry for swallow screening is based on cervical auscultation, which assumes that healthy and unhealthy swallows can be distinguished on the basis of the sounds and vibrations produced during a swallow. Dual-axis accelerometry provides a systematic, non-invasive screening tool that can be used at the bedside with minimal training. Figure 1.2 demonstrates the general setup for using this technology.

Using this tool requires attaching the accelerometer to the person’s neck in midline over the cricoid cartilage. The signals are then collected while the individual performs different tasks, such as drinking or eating different samples. As the method is still under development, many papers have been published on the use of different signal processing, pattern classification and machine learning techniques to improve its performance [13].


Figure 1.2: Accelerometry screening setup [1]

1.5 Acoustic Swallowing Signals

The practice of cervical auscultation revolves around the analysis of swallowing acoustics to screen patients for dysphagia. This concept has been under study and is the principal idea behind the use of accelerometry and acoustics. Work by Morinière has suggested that swallow sounds comprise three main components: 1) the ascension of the larynx, 2) the opening of the upper esophageal sphincter, and 3) the laryngeal release [14].

The laryngeal ascension causes sound production as the hyoid bone moves up, while the bolus is in

the oropharynx. The opening of the upper esophageal sphincter and the passage of the bolus through

the sphincter then produce a second sound component. Lastly, opening of the larynx and the pharynx

produces sound while the bolus is passing through the esophagus [14].

This prior work lays the foundation for evaluating acoustics as a means of non-invasive swallow screening. Prior to the analysis of swallow acoustics, the localization of acoustic swallow segments in time (the temporal domain) is required. The work in Chapter 2 demonstrates a signal processing technique that segments swallow signals as a basis for further analysis of swallow acoustics.

Chapter 2

Automatic Swallow Segmentation

2.1 Introduction

Dysphagia is a term that describes discomfort and difficulty with the act of swallowing. Dysphagia encompasses a large spectrum of patient populations and can arise due to factors such as age, trauma, cancer, surgery, psychological and/or neurological conditions. The act of swallowing itself involves the contraction of a series of both voluntary and involuntary muscles. Thus, damage to either the somatic or the autonomic nervous system can result in dysphagia [3].

One of the primary concerns for people with dysphagia is aspiration (the entry of foreign matter into the airway). This may happen while swallowing solid or liquid food or saliva. Aspiration is a serious concern, and may lead to severe coughing, choking, airway obstruction, or lower respiratory infections (e.g., aspiration pneumonia) [3]. A second primary concern for those living with dysphagia is swallowing efficiency; many people with dysphagia collect food or liquid residues in the pockets of the pharynx or require multiple swallows to move a single food bolus from the mouth to the esophagus. Malnutrition as a result of swallowing inefficiency can contribute to constant fatigue and other medical problems, such as weakening of the immune system. In addition to these concerns, dysphagia can negatively impact quality of life by restricting a person’s ability to engage in the many social rituals that involve eating and drinking [3].

The current gold standard for evaluating swallowing is the videofluoroscopic swallowing study (VFS). During this process, the patient is placed within an x-ray imaging system. However, rather than taking a single x-ray image, a dynamic x-ray video is obtained while the participant is given different samples to swallow. This provides a high-resolution display of the anatomy and physiology of the oro-pharyngeal region. All samples are mixed with barium to provide high contrast on the x-ray. The test itself requires large fluoroscopy equipment and exposure to radiation, and it is not available at the point of care. Due to these constraints, the development of valid and reliable non-invasive approaches to swallowing assessment is considered desirable. Methods that have been explored include the analysis of parameters such as peripheral blood oxygen saturation (using pulse oximetry) [15], nasal airflow (using a nasal cannula), and sounds or vibrations monitored from the neck (using microphones or accelerometers). Several recent studies explore the use of dual-axis accelerometers to measure neck vibrations during different tasks such as coughing and swallowing [16]. Advancements in the field of digital signal processing and pattern recognition have permitted the extraction of valuable information from these biomedical signals, towards segmenting swallow portions and eventually distinguishing individuals with healthy or impaired swallowing.

The analysis of swallowing signals first requires the temporal localization of swallow segments (i.e., segmentation). The goal of the present study was to explore the utility of acoustic swallowing signals for segmenting swallows, in comparison to dual-axis accelerometry signals. As the acoustic signal is obtained in the time domain, proper segmentation is an important first step that ensures the signal is characterized over the correct time segments. An automatic segmentation algorithm was developed to locate these segments on the basis of the signal energy distribution in acoustic swallowing signals. Nasal airflow signals contain information identifying the periods of swallowing apnea (SA) that occur during swallowing. This feature was used as the reference standard for identifying the temporal location of swallows.

2.2 Methods

2.2.1 Participants

The study involved 44 participants (22 males and 22 females) with a mean age of 35 years (standard deviation of 13 years). Two participants were excluded as they did not fully complete the required tasks. All participants were healthy individuals who reported no symptoms of dysphagia or prior neurological dysfunction. The local institutional research ethics committee approved the study, and each participant provided informed consent prior to data collection.


2.2.2 Signal Acquisition

The data collection protocol required the attachment of 4 sensors:

• A dual-axis accelerometer, attached to the neck in midline, just below the thyroid cartilage, by a licensed speech-language pathologist.

• A contact microphone (model AKG C411 PP), attached to the neck with double-sided tape, 1-2 cm laterally to the right of the accelerometer.

• A nasal cannula placed at the nares to measure the participant’s breathing during each task.

• A headset microphone to record ambient sound in the room. This was used to validate sources of artifacts picked up by the contact microphone and to confirm any protocol deviations.

Figure 2.1: Participant task flowchart (Task i: Repetition 1, Repetition 2, Repetition 3; then i = i + 1)

The protocol comprised different tasks, including: counting out loud from 1 to 5; coughing; throat clearing; humming; breathing quietly; swallowing saliva; drinking 5 mL or 10 mL of water by cup; and drinking water from a straw. For the purpose of developing the segmentation algorithm, the following three tasks were chosen: 1) swallowing 10 mL of water by cup; 2) swallowing 5 mL of water by cup; 3) swallowing saliva. The water samples were prepared in appropriate portions before the data collection session and placed in front of the participant during the session.

The tasks were performed in a randomized sequence, with each task performed twice, for a total of 20 tasks. During each task, three repetitions of that particular task were obtained following a visual cue. A sample protocol schematic is presented in Figure 2.1.

2.2.3 Pre-processing

Prior to the decomposition of the acoustic signal, the raw data were preprocessed to remove unwanted artifacts and noise. The data collection sessions required an isolation transformer in the equipment setup as a safety measure. As a result, it was necessary to remove a dominant 60 Hz component from the raw data, which was accomplished using a notch filter with a center frequency of 60 Hz.
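The notch stage can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the second-order `iirnotch` design, the quality factor of 30 (a notch roughly 2 Hz wide at 60 Hz) and zero-phase filtering with `filtfilt` are assumptions, because the text specifies only the 60 Hz notch frequency. The function name `remove_mains_hum` is hypothetical.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt


def remove_mains_hum(x, fs, f0=60.0, quality=30.0):
    """Attenuate a narrow band around f0 (Hz) with a second-order IIR notch.

    The quality factor (notch width = f0 / quality) is an assumed value;
    the notch design parameters are not reported in the text.
    """
    b, a = iirnotch(f0, quality, fs=fs)
    # filtfilt runs the filter forward and then backward, so the output has zero phase
    return filtfilt(b, a, np.asarray(x, dtype=float))
```

For a recording sampled at 22.5 kHz the call would be `remove_mains_hum(x, fs=22500)`. A larger quality factor removes less of the surrounding spectrum, but the filter then takes longer to settle near the signal edges.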

The acoustic signals were subsequently low-pass filtered using an IIR filter. The filter's corner frequency was set to remove components above the energy threshold $\tau$, i.e., the high frequencies contributing the last 5% of the signal energy. Consequently, each signal was filtered with its own corner frequency, $F_c$:

$$E_s(j\omega) = \frac{1}{N}\sum_{i=0}^{N-1} \lvert X(j\omega_i) \rvert^2 \tag{2.1}$$

$$\tau = 0.95\,E_s(j\omega) = E_s(j\omega_c) \tag{2.2}$$

$$F_c = \frac{\omega_c}{2\pi} \tag{2.3}$$

Here $E_s(j\omega)$ is the total signal energy, $E_s(j\omega_c)$ is the energy accumulated from the lowest frequency up to $\omega_c$ (the same sum truncated at $\omega_c$), and $N$ is the length of the Fourier spectrum $X(j\omega)$. The corner frequency $F_c$ at which the accumulated energy reaches $\tau$ was then used to design an IIR low-pass filter. After low-pass filtering, the signals were downsampled by a factor of 75, reducing the sampling frequency from the original 22.5 kHz to 300 Hz, which is well above the minimum Nyquist rate of approximately 200 Hz. Low-pass filtering before downsampling mitigates potential aliasing, while downsampling reduces processing complexity. Lastly, each sample was normalized by dividing the signal by the maximum of its absolute value, giving all samples an amplitude range of -1 to 1.
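The low-pass, downsampling and normalization steps above can be sketched in Python with NumPy and SciPy. The fourth-order Butterworth design, zero-phase filtering with `sosfiltfilt` and decimation by simple slicing are assumptions; the text specifies only an IIR low-pass at the 95% energy frequency, downsampling by 75 and peak normalization. The function names are hypothetical. The $1/N$ factor in Equation 2.1 cancels when the accumulated energy is expressed as a fraction of the total, so it is omitted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def energy_corner_frequency(x, fs, fraction=0.95):
    """Frequency (Hz) below which `fraction` of the spectral energy lies (Equations 2.1 to 2.3)."""
    energy = np.abs(np.fft.rfft(x)) ** 2              # one-sided |X(jw)|^2
    cumulative = np.cumsum(energy) / np.sum(energy)   # accumulated energy, from 0 to 1
    k_c = int(np.searchsorted(cumulative, fraction))  # first bin reaching tau
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(freqs[k_c])


def lowpass_downsample_normalize(x, fs, factor=75, order=4):
    """Low-pass at the 95% energy frequency, keep every `factor`-th sample, scale to [-1, 1]."""
    fc = energy_corner_frequency(x, fs)
    # Safeguard not described in the text: butter() needs 0 < fc < fs / 2.
    fc = float(np.clip(fc, 1.0, 0.49 * fs))
    sos = butter(order, fc, btype="low", fs=fs, output="sos")  # assumed Butterworth IIR
    y = sosfiltfilt(sos, x)                           # zero-phase filtering
    y = y[::factor]                                   # e.g. 22.5 kHz / 75 = 300 Hz
    return y / np.max(np.abs(y)), fc
```

Note that the low-pass stage only fully prevents aliasing when $F_c$ falls below 150 Hz, half of the 300 Hz output rate.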


2.2.4 Stationary Wavelet Transform Decomposition (SWT)

The first step after signal pre-processing was the decomposition of the acoustic signals into multiple decomposition levels. As the acoustic biomedical signals in this study were non-stationary, the frequency spectrum alone is not adequate. The Discrete Wavelet Transform (DWT) allows for multi-resolution analysis in both the time and frequency domains. However, the DWT is shift-variant, which causes the loss of valuable time information, as each decomposition level reduces the time resolution by a factor of two through downsampling. Thus, the Stationary Wavelet Transform (SWT) was used instead, in which the high-pass filter, H, and the low-pass filter, G, are upsampled by zero insertion at each decomposition level while the signal itself is not downsampled, so every level retains the original number of samples.

Figure 2.2: Stationary Wavelet Transform Flowchart


$$y_{\text{approximations}}[k] = \sum_{n=-\infty}^{\infty} x[n]\,G[k-n] \tag{2.4}$$

$$y_{\text{details}}[k] = \sum_{n=-\infty}^{\infty} x[n]\,H[k-n] \tag{2.5}$$

Each signal was decomposed using a 3-level SWT, resulting in a decomposition matrix containing 6 rows of N coefficients. These comprised three approximations, output by the low-pass filter, and three details, output by the high-pass filter. Figure 2.2 provides a schematic of the SWT workflow. The mother wavelet used for the decomposition was the Daubechies 6 wavelet. This kernel was chosen on the basis of its morphological properties, which allowed the capture of desirable components such as the transient aspects of the swallows. The 3 levels were chosen on the basis of the energy distribution of the signals. The second decomposition level proved to contain the most useful information and was used for further analysis. Figures 2.3 and 2.4 present a sample 3-level approximation and detail decomposition, respectively.

Figure 2.3: 3 level approximation of Stationary Wavelet Decomposition


Figure 2.4: 3 level details of Stationary Wavelet Decomposition
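The decomposition can be sketched with the à trous algorithm below, using NumPy only. This is an illustrative assumption rather than the thesis code: it uses circular (periodic) filtering so each level keeps exactly N samples, and Haar filters stand in for Daubechies 6 so the example stays self-contained. The Daubechies 6 decomposition filters could be substituted, for example from PyWavelets (`pywt.Wavelet("db6").dec_lo` and `.dec_hi`), although coefficient alignment may differ from library routines such as `pywt.swt`. The function names are hypothetical.

```python
import numpy as np

# Haar filters stand in for Daubechies 6 (assumption) to keep the sketch self-contained.
HAAR_LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass G
HAAR_HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass H


def circular_filter(x, taps):
    """Circular convolution y[m] = sum_k taps[k] * x[(m - k) mod N]; output length N."""
    y = np.zeros(len(x))
    for k, h in enumerate(taps):
        if h != 0.0:                      # skip the zeros inserted by upsampling
            y += h * np.roll(x, k)
    return y


def upsample_filter(taps, step):
    """Insert (step - 1) zeros between consecutive taps (the 'a trous' dilation)."""
    up = np.zeros((len(taps) - 1) * step + 1)
    up[::step] = taps
    return up


def swt(x, lo=HAAR_LO, hi=HAAR_HI, levels=3):
    """Return a (2 * levels, N) matrix: approximations A1..AL, then details D1..DL."""
    approx = np.asarray(x, dtype=float)
    approximations, details = [], []
    for j in range(levels):
        step = 2 ** j                     # dilate the filters; never decimate the signal
        details.append(circular_filter(approx, upsample_filter(hi, step)))
        approx = circular_filter(approx, upsample_filter(lo, step))
        approximations.append(approx)
    return np.vstack(approximations + details)
```

With `levels=3`, rows 0 to 2 of the returned matrix hold the level 1 to 3 approximations and rows 3 to 5 the details, so the level-2 pair is `C[1]` and `C[4]`. Because nothing is decimated, shifting the input simply shifts every row, which is the shift invariance that motivates using the SWT instead of the DWT.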

2.2.5 Signal Envelope Extraction

A time-domain signal envelope is a useful representation of how signal energy is distributed over time. The envelope of the acoustic signal was determined as the product (acting as a soft logical AND) of the envelopes obtained from the approximation and detail coefficients of the 2nd decomposition level described above. This was done to preserve features common to both envelopes and to diminish the magnitude of features that differed between them.

The two envelopes, one from the approximation and one from the detail coefficients, were obtained stepwise, beginning with filtering each signal using the Savitzky-Golay filter, S. This filter fits a polynomial of order p, by least squares, to successive windows of k samples and replaces each sample with the value of the fitted polynomial. The higher the polynomial order, the more high-frequency content the filter passes. Accordingly, a 3rd-order polynomial was used to follow the overall shape of the signal, limiting higher frequency components while retaining lower frequency components in the obtained envelope. The window length was k = 401 samples, chosen to obtain a smooth envelope. An advantage of this operator is its real-valued (zero-phase) frequency response with a flat passband. The absolute values of the approximations and details were therefore smoothed using the filter, normalized and multiplied together. The closed forms of these steps are given in equations 2.6, 2.7 and 2.8.


The resulting envelope was then further processed using a Non-linear Energy Operator (NEO). The main goal of this step was to further accentuate potential swallows within the signal, as this operator is very sensitive to transients [17]. Equation 2.9 presents the NEO in discrete form. Lastly, to obtain the final signal envelope used for segmentation, the absolute value of the NEO output was smoothed with the Savitzky-Golay filter (equation 2.10), and negative values were set to zero (equation 2.11). Figure 2.5 compares the original and final envelope signals.

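$$ y_{\ell}[n] = \left(|x_{\ell}[n]| \otimes S\right)[n], \qquad \ell \in \{\text{approximations}, \text{details}\} \tag{2.6} $$

$$ \tilde{y}_{\ell}[n] = y_{\ell}[n] - \frac{1}{N}\sum_{n=0}^{N-1} y_{\ell}[n] \tag{2.7} $$

$$ \phi[n] = \tilde{y}_{\text{approximations}}[n]\cdot\tilde{y}_{\text{details}}[n] \tag{2.8} $$

$$ \Phi[n] = \phi^2[n] - \phi[n+1]\,\phi[n-1] \tag{2.9} $$

$$ \phi_{\text{final}}[n] = \left(|\Phi[n]| \otimes S\right)[n] \tag{2.10} $$

$$ \phi_{\text{final}}[n] = \begin{cases} 0, & \phi_{\text{final}}[n] < 0 \\ \phi_{\text{final}}[n], & \phi_{\text{final}}[n] \ge 0 \end{cases} \tag{2.11} $$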

Figure 2.5: original (top) and modified (bottom) signal envelope
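Equations 2.6 to 2.11 can be sketched in a few lines of NumPy, assuming the level-2 approximation and detail coefficients are already available (for example from an SWT as in section 2.2.4). The Savitzky-Golay smoother here is built directly from a least-squares polynomial fit so the example needs only NumPy; `scipy.signal.savgol_filter` provides an equivalent filter with different edge handling. Edge samples are zero-padded in this sketch, and all function names are hypothetical.

```python
import numpy as np


def savgol_smooth(x, window=401, order=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window."""
    half = window // 2
    t = np.arange(-half, half + 1) / half           # scaled offsets improve conditioning
    vander = np.vander(t, order + 1, increasing=True)
    coeffs = np.linalg.pinv(vander)[0]              # weights giving the fitted value at t = 0
    return np.convolve(x, coeffs, mode="same")      # symmetric kernel; zero-padded edges


def teager_kaiser(x):
    """Non-linear energy operator, eq. 2.9: x[n]^2 - x[n+1] x[n-1]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    out[1:-1] = x[1:-1] ** 2 - x[2:] * x[:-2]
    return out


def acoustic_envelope(approx2, detail2, window=401, order=3):
    """Final envelope phi_final[n] from level-2 SWT coefficients (eqs. 2.6-2.11)."""
    smoothed = []
    for coeffs in (approx2, detail2):
        y = savgol_smooth(np.abs(coeffs), window, order)   # eq. 2.6
        smoothed.append(y - y.mean())                      # eq. 2.7
    phi = smoothed[0] * smoothed[1]                        # eq. 2.8
    final = savgol_smooth(np.abs(teager_kaiser(phi)), window, order)  # eqs. 2.9-2.10
    return np.maximum(final, 0.0)                          # eq. 2.11
```

Building the smoother from `np.linalg.pinv` makes the "fit a polynomial in each window" definition explicit: the first row of the pseudo-inverse gives the weights that return the fitted polynomial's value at the window centre.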


2.2.6 Signal Segmentation

Segmentation of the acoustic signal requires a time-domain binary signal β[n], where a value of 1 corresponds to a potential swallow portion and a value of 0 corresponds to non-swallow portions. Using the signal envelope, the binary segmentation signal was generated in the following manner:

a) A global threshold α was obtained as the mean value of the envelope signal:

$$ \alpha = \frac{1}{N}\sum_{n=0}^{N-1}\phi_{\text{final}}[n] \tag{2.12} $$

b) The location m of the maximum value of φ_final[n] was obtained. The past and future values of φ_final around m were then recursively compared to α to obtain the updated envelope φ'_final[n]:

$$ m = \arg\max_n \phi_{\text{final}}[n] = \{\, n \mid \phi_{\text{final}}[n] = \max_n \phi_{\text{final}}[n] \,\} \tag{2.13} $$

$$ C = \phi_{\text{final}}[m \pm k], \qquad k = 1, 2, \ldots, (N-1) - m \tag{2.14} $$

$$ \phi'_{\text{final}}[n] = \begin{cases} 0, & C > \alpha \\ \phi_{\text{final}}[n], & C \le \alpha \end{cases} \tag{2.15} $$

In this recursive process, the contiguous samples on either side of m that exceed the threshold are set to zero, and the first samples before and after m that no longer exceed the threshold give the time indices of the onset (pre-swallow) and offset (post-swallow) of the potential swallow. These indices are placed in matrix I, where row 1 contains the pre-swallow indices and row 2 the post-swallow indices of the N potential swallows:

$$ I = \begin{bmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,N} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,N} \end{bmatrix} \tag{2.16} $$

c) Step b) was repeated on the updated envelope until max_n φ'_final[n] < α. Figure 2.6 shows the binary signal superimposed on the envelope signal, and Figure 2.7 shows the binary segmentation signal (red) superimposed on the acoustic signal (blue). For the envelope signal φ_final[n], the binary signal β[n] is obtained from matrix I:

Figure 2.6: Envelope signal (blue) and Binary signal (red) superimposed

[Figure 2.7 panels: top panel titled "Participant number: 1, T02"; both panels plot amplitude against time (s) from 0 to 30 s]

Figure 2.7: Segmentation signal superimposed on acoustic signal (top) and nasal cannula signal (bottom)


$$ \beta[n] = \begin{cases} 1, & I_{1,j} \le n \le I_{2,j} \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \ldots, N \tag{2.17} $$

As is apparent in the top signal of Figure 2.7, some transient portions were not segmented by the algorithm. This is due to two processing steps: the application of the Savitzky-Golay filter and the Non-linear Energy Operator. The former diminishes the magnitude of transients of very short duration, while the latter emphasizes transients in proportion to their energy.
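Steps a) to c) can be sketched as a threshold-and-grow loop. This is an illustrative NumPy reading of equations 2.12 to 2.17, not the original implementation: consumed samples are marked with −∞ rather than zero (equivalent for the non-negative envelope of equation 2.11, and it guarantees that the loop terminates), and the onset and offset are taken as the outermost samples that still exceed α. The function name is hypothetical.

```python
import numpy as np


def segment_envelope(envelope):
    """Threshold-and-grow segmentation of an envelope (steps a to c)."""
    env = np.asarray(envelope, dtype=float)
    alpha = env.mean()                      # a) global threshold, eq. 2.12
    work = env.copy()                       # running copy of phi'_final, eq. 2.15
    n_samples = len(work)
    onsets, offsets = [], []
    while work.max() > alpha:               # c) stop once the maximum falls below alpha
        m = int(np.argmax(work))            # b) location of the current maximum, eq. 2.13
        left = m
        while left > 0 and work[left - 1] > alpha:
            left -= 1                       # walk back through past samples
        right = m
        while right < n_samples - 1 and work[right + 1] > alpha:
            right += 1                      # walk forward through future samples
        onsets.append(left)
        offsets.append(right)
        work[left:right + 1] = -np.inf      # mark as consumed (eq. 2.15 zeroes them)
    index_matrix = np.array([onsets, offsets], dtype=int)   # matrix I, eq. 2.16
    binary = np.zeros(n_samples, dtype=int)
    for start, stop in zip(onsets, offsets):
        binary[start:stop + 1] = 1          # eq. 2.17
    return index_matrix, binary
```

Segments are found in order of decreasing peak height, so the columns of the returned index matrix are not sorted in time; sorting them by onset would be a natural post-processing step.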

2.3 Validation

The results of the segmentation algorithm were compared with the nasal airflow reference signals, in which the temporal locations of swallowing apnea were identified by two trained SLPs. Each nasal airflow signal was used to obtain pre-apnea and post-apnea time indices, which together delimited a potential swallow apnea event. In addition, the results of the acoustic segmentation algorithm were compared to those of two automatic segmentation algorithms, accel-1 and accel-2, designed to segment swallows from the concurrently collected accelerometry signals. The four segmentation methods were compared by identifying the midpoint, M, of each segmented portion and computing the time differences between these midpoints:

$$ M = \frac{\text{Swallow Onset} + \text{Swallow Offset}}{2} \tag{2.18} $$

Furthermore, if a segment identified by the algorithm did not overlap by at least 50% with the reference swallow apnea event, segmentation was considered to have failed. The performance of the acoustic segmentation algorithm was quantified through a "sensitivity" parameter, obtained with reference to all the available signal sources (accelerometry, acoustics and nasal cannula) to validate swallow presence. The number of true swallows detected (true positives, A) and the number of falsely detected artifacts (false positives, B) were used to compute this parameter:

$$ \text{Sensitivity} = \frac{A}{A + B} \times 100 \tag{2.19} $$

Because B counts false detections rather than missed swallows, this ratio corresponds to what is conventionally termed the positive predictive value.


Specificity, by contrast, quantifies the ability of the algorithm to correctly identify non-swallow events. However, as true-negative events cannot be distinguished from other non-swallowing artifacts in our data, this parameter was not used as a validation tool.
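The validation quantities (equation 2.18, the 50% overlap rule and equation 2.19) reduce to a few lines of arithmetic. The sketch below assumes that the overlap is measured as a fraction of the reference apnea event's duration; the text does not state which interval the 50% refers to. The function names are hypothetical.

```python
def midpoint(onset, offset):
    """Segment midpoint M, eq. 2.18."""
    return (onset + offset) / 2.0


def overlap_fraction(segment, reference):
    """Fraction of the reference event (onset, offset) covered by the segment."""
    start = max(segment[0], reference[0])
    stop = min(segment[1], reference[1])
    return max(0.0, stop - start) / (reference[1] - reference[0])


def detection_succeeded(segment, reference, min_overlap=0.5):
    """Segmentation fails unless at least min_overlap of the reference is covered."""
    return overlap_fraction(segment, reference) >= min_overlap


def sensitivity(true_swallows, false_detections):
    """Eq. 2.19: A / (A + B) * 100."""
    return 100.0 * true_swallows / (true_swallows + false_detections)
```

The midpoint difference for one swallow is then, for example, `midpoint(*airflow_event) - midpoint(*acoustic_segment)`, where both tuples are hypothetical (onset, offset) pairs in seconds.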

2.4 Results

Descriptive statistics (mean ± SD and 95% confidence interval) for the differences in segment midpoints between the acoustic segmentation and the nasal cannula reference signal (Table 2.1), and between the accelerometry-based segmentations and the acoustic segmentation (Tables 2.2 and 2.3), are shown by task.

Table 2.1: Statistics comparing acoustic and nasal cannula segmentation

Sample Type | M_Airflow − M_Acoustic (s), mean ± SD | 95% Confidence Interval (s)
10 mL | 0.05 ± 0.375 | 0.004 to 0.104
5 mL | 0.07 ± 0.381 | 0.019 to 0.127
Saliva | −0.09 ± 0.327 | −0.139 to −0.051

Table 2.2: Statistics comparing acoustic and Accel-1 segmentation

Sample Type | M_Accel-1 − M_Acoustic (s), mean ± SD | 95% Confidence Interval (s)
10 mL | −0.22 ± 0.382 | −0.274 to −0.167
5 mL | −0.24 ± 0.391 | −0.296 to −0.185
Saliva | −0.27 ± 0.343 | −0.331 to −0.221

Table 2.3: Statistics comparing acoustic and Accel-2 segmentation

Sample Type | M_Accel-2 − M_Acoustic (s), mean ± SD | 95% Confidence Interval (s)
10 mL | −0.21 ± 0.354 | −0.265 to −0.165
5 mL | −0.255 ± 0.385 | −0.310 to −0.201
Saliva | −0.27 ± 0.329 | −0.332 to −0.226
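Each row of Tables 2.1 to 2.3 summarizes a set of per-swallow midpoint differences. The text does not state how the confidence intervals were computed; the sketch below assumes the common normal approximation, mean ± 1.96·SD/√n with the sample standard deviation. The function name is hypothetical.

```python
import numpy as np


def midpoint_difference_stats(ref_midpoints, test_midpoints, z=1.96):
    """Mean, sample SD and normal-approximation CI of midpoint differences."""
    diffs = np.asarray(ref_midpoints, dtype=float) - np.asarray(test_midpoints, dtype=float)
    mean = diffs.mean()
    sd = diffs.std(ddof=1)                        # sample standard deviation
    half_width = z * sd / np.sqrt(len(diffs))     # standard error times z
    return mean, sd, (mean - half_width, mean + half_width)
```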

The performance of the acoustic segmentation algorithm was quantified through the sensitivity parameter. True swallows and acoustic artifacts were first distinguished manually using all four signals (contact acoustics, accelerometry, nasal cannula and ambient sound): each segment was evaluated against all four modalities and classified as a true swallow or an artifact. Where the modalities contradicted each other, the signals were analyzed individually and a decision was made on the basis of majority agreement. These data were used as the reference for evaluating the performance of the algorithm. Table 2.4 shows the sensitivity values obtained for accelerometry (accel-1) and acoustic segmentation.


Table 2.4: Performance comparisons between accelerometry and acoustic automatic segmentation

Sample Type | Accel-1 Sensitivity | Acoustic Sensitivity
10 mL | 88% | 86%
5 mL | 92% | 94%
Saliva | 71% | 92%

2.5 Discussion

The results demonstrate similar performance for the acoustic segmentation and the accel-1/2 algorithms. The sensitivity values obtained for acoustics are very similar to those of accel-1 for the 5 mL and 10 mL samples. However, acoustic segmentation outperformed the accel-1 algorithm by 21 percentage points for the saliva swallows. This suggests a potential difference in the source of the signal, as saliva swallows may produce a more prominent acoustic signal.

Based on the average segment midpoints, M, the differences between the accel-1 and acoustic segmentations indicate that the accel-1 segment centers occur earlier than the acoustic segment midpoints, giving negative values of M_Accel-1 − M_Acoustic (Table 2.2); a similar pattern holds for accel-2 (Table 2.3). In contrast, the midpoint differences between the nasal cannula and acoustic segmentations were much smaller in magnitude for all three tasks, as shown by the close M values in Table 2.1.

Given the difference between transducing vibrations through an accelerometer and through a microphone, acoustic signals are expected to contain lower-frequency components than accelerometry signals. This follows from the relationship between acceleration and displacement: acceleration is the second derivative of displacement, and differentiation accentuates higher frequency components.
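The frequency weighting introduced by differentiation can be made explicit with a single sinusoidal component, assuming for illustration that the microphone output is proportional to displacement s(t):

$$ s(t) = A\sin(2\pi f t) \quad\Longrightarrow\quad a(t) = \frac{d^2 s(t)}{dt^2} = -(2\pi f)^2 A\sin(2\pi f t) $$

Each component's amplitude is therefore scaled by (2πf)², so, relative to the displacement signal, a component at 100 Hz is weighted (100/25)² = 16 times more heavily in the acceleration signal than a component at 25 Hz.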

2.6 Conclusion

One of the crucial steps towards automatic dysphagia diagnosis is the temporal localization of different swallowing phenomena in swallowing signals. This work demonstrates a new method of acoustic signal segmentation based on the distribution of signal energy within transient components. Acoustics may provide benefits and novel information that is not redundant with accelerometry signals, as suggested by the comparison of the accelerometry-based and acoustic-based segmentation algorithms.


The segmentation of acoustic swallowing signals was shown to be comparable to that of accelerometry signals. Furthermore, the automatic acoustic segmentation outperformed the accel-1 algorithm for saliva swallow detection. The method was effective in automatically segmenting acoustic signals of 5 mL, 10 mL and saliva swallows. This paves the way for further analysis of acoustic signals to evaluate their feasibility for use in classification algorithms that discriminate impaired from healthy swallows.

Chapter 3

Swallow Signal Characterization

3.1 Introduction

The current practice of dysphagia screening has limitations that motivate the development of more intelligent, non-invasive tools for bedside screening [18], [19]. Current efforts to develop such technologies include the use of accelerometry [20], [21]. Like accelerometry, acoustics has demonstrated potential for use in swallow screening. In either case, the acquired signal must be properly segmented; a viable segmentation technique was described in chapter 2.

To improve the sensitivity of the segmentation algorithm beyond its current performance, a characterization module was developed to make the segmentation more intelligent. This module evaluates each segmented portion to ensure that only true swallow portions are retained, so that artifacts such as coughs, speech, heavy breathing sounds and similar non-swallowing events are not incorrectly identified as swallows. This chapter describes the characterization technique used to differentiate real swallows from artifacts. The outcome was compared to results that were manually categorized as "swallow" or "non-swallow" on the basis of all four collected signals: acoustics, accelerometry, nasal cannula and ambient sound.

3.1.1 Overview of Algorithm

The decision of whether a signal segment is an actual swallow or an artifact is made using the following machine learning algorithm. The segmented portions output by the algorithm in chapter 2 are first fed into a feature extraction subroutine, where different signal features are extracted. These features are then collapsed into a smaller feature set using Fisher discriminant analysis. This is followed by a logistic regression model, which learns from a 40-sample training set. The output is a binary classification into "Swallows" and "Non-swallows". Figure 3.1 presents the flow of the algorithm and its individual steps.

Figure 3.1: Signal Characterization Flow Chart (segmented acoustic signal → feature extraction → Fisher discriminant analysis → logistic regression modeling → binary classification into true swallow or artifact)

3.2 Methods

3.2.1 Signal Segment Extraction

Using the algorithm explained in chapter 2, the signal segmentation was implemented to obtain the time indices of potential swallow onsets and offsets. All of the segmented portions were extracted and compared against the signal sources: the acoustic signals, accelerometry, the nasal cannula signal and the ambient sound. This was done manually to obtain a gold standard categorizing true swallows and artifacts, against which the performance of the binary classifier was validated. Table 3.1 shows the number of true swallows and falsely detected artifacts for all three sample types. These numbers do not represent the entire number of swallows, as the algorithm may not detect every single swallow; consequently, the totals of true swallows and artifacts differ across the three sample types.

Table 3.1: Automatic Segmentation Output

Sample Type | Swallows | Artifacts
5 mL | 249 | 23
10 mL | 256 | 29
Saliva | 233 | 11

3.2.2 Feature Extraction

Due to large variability within the signals, representative features were extracted to aid di↵erentiation

between real swallows and artifacts. Features chosen were on the basis of the time, frequency and

time-frequency parameters that best emphasized distinct attributes within signal samples.

• The mean amplitude is the average of the sample amplitudes:

$$ \mu = \frac{1}{N}\sum_{n=1}^{N} x[n] \tag{3.1} $$

• The variance of the signal quantifies the average squared deviation of the samples from the mean:

$$ \sigma^2 = \frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^2 \tag{3.2} $$

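• Skewness measures the asymmetry of the amplitude distribution about its mean; its sign indicates whether the longer tail of the distribution extends towards higher (positive) or lower (negative) amplitudes:

$$ \text{Skewness} = \frac{\frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^3}{\left\{\frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^2\right\}^{3/2}} \tag{3.3} $$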

• Kurtosis quantifies the peakedness of the amplitude distribution:

$$ \text{Kurtosis} = \frac{\frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^4}{\left\{\frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^2\right\}^{2}} \tag{3.4} $$

• Dominant frequency corresponds to the frequency component with the largest magnitude in the Fourier transform:

$$ X(j\omega) = \sum_{n=0}^{N-1} x[n]\,e^{-j\omega n} \tag{3.5} $$

$$ \omega_{\max} = \arg\max_{\omega}\,|X(j\omega)|^2, \qquad F_{\text{dominant}} = \frac{\omega_{\max}}{2\pi} \tag{3.6} $$

• Entropy is a measure of the signal's randomness; it is quantified in the following manner [22]:

$$ H(x[n]) = -\sum_{-\infty}^{+\infty} \mathcal{F}\{x[n]\}\,\log\left(\mathcal{F}\{x[n]\}\right) \tag{3.7} $$

• Centroid frequency quantifies the power-weighted mean frequency of the spectrum:

$$ \bar{f} = \frac{\sum_{f=0}^{F_{\max}} f\,|X(j2\pi f)|^2}{\sum_{f=0}^{F_{\max}} |X(j2\pi f)|^2} \tag{3.8} $$

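• The average wavelet energy represents the energy distribution in the scale domain. The signal is decomposed into three levels with a continuous wavelet transform using the db6 mother wavelet ψ (equation 3.9), the three levels are averaged (equation 3.10), and the energy of the average is computed (equation 3.11):

$$ y_{1,2,3}(\tau, s) = \frac{1}{\sqrt{|s|}}\int_{0}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)dt \tag{3.9} $$

$$ \bar{y} = \frac{y_1 + y_2 + y_3}{3} \tag{3.10} $$

$$ E = \sum_{n=1}^{\infty}\left|\mathcal{F}\{\bar{y}\}\right|^2 \tag{3.11} $$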

• The wavelet energy ratio builds on the previous feature: it is the ratio between the energy of the 2nd decomposition level and that of the original signal. The signal is first decomposed into two levels using a discrete wavelet decomposition with the db6 mother wavelet, chosen because of its filter characteristics and its ability to capture useful signal morphologies:

$$ y_{1,2}[n] = (x \otimes H)[n] = \sum_{k=-\infty}^{\infty} x[k]\,H[n-k] $$

$$ X_1(j\omega) = \sum_{n=-\infty}^{\infty} y_2[n]\,e^{-j\omega n}, \qquad E_1 = \sum_{\omega}|X_1(j\omega)|^2 $$

$$ X_2(j\omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \qquad E_2 = \sum_{\omega}|X_2(j\omega)|^2 $$

$$ \text{Ratio} = \frac{E_1}{E_2} \tag{3.12} $$


• The wavelet-filtered fractal dimension quantifies the complexity of a signal after it has been filtered through a wavelet decomposition. The parameter is useful because the filtering removes noisy components before the complexity of the signal is evaluated. The filtering is done through a 3-level stationary wavelet decomposition using the db6 mother wavelet, and the 2nd decomposition level is used to obtain the fractal dimension:

$$ FD = \frac{\log(L/a)}{\log(d/a)} \tag{3.13} $$

Here, L is the total length of the waveform, i.e. the sum of the distances between successive samples, and d is the distance between the first sample (n = 1) and the sample farthest from it [23]. Furthermore, as different time series exist on a variety of time scales, the basic time unit a is used to normalize the series [23].

• Wavelet-filtered entropy quantifies the randomness of the signal after it has been filtered using a continuous wavelet transform. The decomposition uses the db6 mother wavelet and breaks the signal down into 3 levels; the second level is used to obtain the signal entropy:

$$ y(\tau, s) = \frac{1}{\sqrt{|s|}}\int_{0}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)dt \tag{3.14} $$

$$ H(y_2) = -\sum_{-\infty}^{+\infty} y_2(t)\,\log\left(y_2(t)\right) \tag{3.15} $$

• Dominant frequency from the Power Spectral Density (PSD). The PSD estimate is obtained by taking the Fourier transform of the autocorrelation of the signal, which places more emphasis on the dominant components of the signal:

$$ R_{xx}[n] = \sum_{\tau=-\infty}^{\infty} x[\tau]\,x[\tau-n] \tag{3.16} $$

$$ S_{xx}(j\omega) = \sum_{n=-\infty}^{\infty} R_{xx}[n]\,e^{-j\omega n} \tag{3.17} $$

$$ \omega_{\max} = \arg\max_{\omega}\,S_{xx}(j\omega)^2, \qquad F'_{\text{dominant}} = \frac{\omega_{\max}}{2\pi} \tag{3.18} $$

• Signal energy in the top 25% of the frequency range (75 to 100 Hz), which quantifies the contribution of higher-frequency components. It is obtained through the Fourier transform of the signal:

$$ X(j\omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \qquad E = \sum_{\omega_s}|X(j\omega_s)|^2, \qquad \omega_s = 2\pi f,\; f \in [75, 100]\ \text{Hz} \tag{3.19} $$

• Amplitude distribution width, W, quantifies the range of amplitudes present within the signal. It is obtained by taking the absolute value of the signal, filtering it using an 8th-order Savitzky-Golay filter S, and measuring the width of the resulting amplitude distribution at half its maximum. The smoothing by S is assumed to make the amplitude distribution approximately normal. This measure is another way of quantifying the organization or disorganization of the signal [24]. The points c1 and c2 lie on either side of the peak of the amplitude distribution, where the distribution falls to half its maximum value; the difference between them gives the width at half maximum:

$$ y = |x[n]| \otimes S \tag{3.20} $$

$$ Y \sim \mathcal{N}(\mu, \sigma^2), \qquad p_Y(c_1) = p_Y(c_2) = \frac{\max p_Y}{2} \tag{3.21} $$

$$ W = c_2 - c_1 \tag{3.22} $$

• The continuous wavelet energy ratio takes advantage of the filtering property of wavelet decomposition. The signal is decomposed using a continuous stationary wavelet transform with the db6 mother wavelet, and the second decomposition level, y2, is chosen for analysis. y2 is then taken into the frequency domain using the Fourier transform, and the ratio of its energy between 50 and 100 Hz to its energy between 0 and 49 Hz is taken as the feature (with f_max = 100 Hz, the split falls at f_max/2 = 50 Hz). This represents the relative high-frequency content in the transient portions of the sample signals. The Stationary Wavelet Transform is explained in detail in subsection 2.2.4 of chapter 2.

$$ y_{1,2,3}(\tau, s) = \frac{1}{\sqrt{|s|}}\int_{0}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)dt \tag{3.23} $$

$$ X(j\omega) = \int_{-\infty}^{\infty} y_2(t)\,e^{-j\omega t}\,dt \tag{3.24} $$

$$ E_1 = \int_{0}^{f_{\max}/2}|X(j2\pi f)|^2\,df \tag{3.25} $$

$$ E_2 = \int_{f_{\max}/2}^{f_{\max}}|X(j2\pi f)|^2\,df \tag{3.26} $$

$$ \text{Ratio} = \frac{E_2}{E_1} \tag{3.27} $$

• The variance of the squared signal is obtained by first multiplying the signal by itself in the time domain, which emphasizes large transient portions while diminishing low-amplitude portions of the signal. The variance of this new signal is then computed:

$$ y[n] = x[n]\cdot x[n] \tag{3.28} $$

$$ \mu = \frac{1}{N}\sum_{n=0}^{N-1} y[n], \qquad \sigma^2 = \frac{1}{N}\sum_{n=0}^{N-1}\left(y[n]-\mu\right)^2 \tag{3.29} $$
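Several of the time- and frequency-domain features above translate directly into NumPy. The sketch below covers the moment, spectral, PSD, fractal, amplitude-distribution and squared-signal features; the wavelet-based features are omitted because they depend on the decomposition settings. The sampling rate `fs`, the histogram bin count and the amplitude-only distance used in the fractal dimension are assumptions, and all function names are hypothetical.

```python
import numpy as np


def moment_features(x):
    """Mean, variance, skewness and (non-excess) kurtosis, eqs. 3.1-3.4."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = np.mean((x - mu) ** 2)
    skew = np.mean((x - mu) ** 3) / var ** 1.5
    kurt = np.mean((x - mu) ** 4) / var ** 2
    return mu, var, skew, kurt


def spectral_features(x, fs):
    """Dominant frequency, centroid frequency and 75-100 Hz energy (eqs. 3.5-3.8, 3.19)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2          # |X(jw)|^2 on the one-sided grid
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin frequencies in Hz
    dominant = freqs[np.argmax(power)]
    centroid = np.sum(freqs * power) / np.sum(power)
    band = (freqs >= 75.0) & (freqs <= 100.0)
    return dominant, centroid, np.sum(power[band])


def psd_dominant_frequency(x, fs):
    """Dominant frequency of the autocorrelation-based PSD, eqs. 3.16-3.18."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")          # R_xx at lags -(N-1)..(N-1)
    s = np.abs(np.fft.rfft(r))                   # magnitude removes the lag-offset phase
    freqs = np.fft.rfftfreq(len(r), d=1.0 / fs)
    return freqs[np.argmax(s)]


def katz_fd(x):
    """Fractal dimension of eq. 3.13, using amplitude-only distances (assumption)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    total_length = steps.sum()                   # L: sum of successive distances
    a = steps.mean()                             # normalizing unit a (average step)
    d = np.max(np.abs(x - x[0]))                 # farthest distance from the first sample
    return np.log(total_length / a) / np.log(d / a)


def amplitude_distribution_width(y, bins=50):
    """Width of the amplitude histogram at half its maximum, eqs. 3.21-3.22.

    `y` is assumed to be the already smoothed |x[n]| of eq. 3.20.
    """
    counts, edges = np.histogram(np.asarray(y, dtype=float), bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    above_half = centres[counts >= counts.max() / 2.0]
    return above_half.max() - above_half.min()


def variance_of_squared(x):
    """Variance of x[n] * x[n], eqs. 3.28-3.29."""
    y = np.asarray(x, dtype=float) ** 2
    return np.mean((y - y.mean()) ** 2)
```

Because the autocorrelation's Fourier transform equals the squared magnitude spectrum, `psd_dominant_frequency` should agree closely with the FFT-based dominant frequency, which is consistent with the near-identical rows for the two features in Table 3.2.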

3.2.3 Fisher Feature Projection

Fisher's projection collapses all the extracted features into a single quantity (or a few), formed as a weighted projection onto a line chosen to best separate the two classes. This provides dimensionality reduction [25]. When many features are available, using more of them does not necessarily benefit the decision-making process; Fisher's projection aims to reduce the dimensionality without losing discriminatory information. The purpose of this step is to prepare for binary logistic classification using a representation with high discriminatory power.

The direction of the projection line is obtained by optimizing the criterion function J(v), which is estimated from a set of training data. In our case, the criterion function was optimized to obtain the weights, v, for a set of 16 different signal features:

$$ J(\mathbf{v}) = \frac{\mathbf{v}^T S_B\,\mathbf{v}}{\mathbf{v}^T S_W\,\mathbf{v}} \tag{3.30} $$

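This function is built from the within-class scatter matrix of each class (equation 3.31), which are summed to give the total within-class scatter matrix S_W (equation 3.32), and from the between-class scatter matrix S_B [25]. These matrices are obtained from the two class means µ1 and µ2, where D_i denotes the set of training samples in class i:

$$ S_i = \sum_{\mathbf{x}\in D_i}(\mathbf{x}-\boldsymbol{\mu}_i)(\mathbf{x}-\boldsymbol{\mu}_i)^T, \qquad i = 1, 2 \tag{3.31} $$

$$ S_W = S_1 + S_2 \tag{3.32} $$

$$ S_B = (\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T $$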


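The vector v that maximizes the criterion function J(v) satisfies the generalized eigenvalue problem in equation 3.33, where λ is an eigenvalue; for two classes, its solution is given by equation 3.34. Projecting the d-dimensional samples onto v reduces them to a single dimension.

$$ S_W^{-1}S_B\,\mathbf{v} = \lambda\mathbf{v} \tag{3.33} $$

$$ \mathbf{v} = S_W^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) \tag{3.34} $$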

Only the direction of v matters in Fisher's projection; its magnitude is arbitrary. The direction provides the optimal weight for each feature, chosen to maximize the separation between the projected class means (µ1 and µ2) relative to the within-class scatter [25]. The projection is the dot product of the weight vector and the d-dimensional feature vector, shown in equation 3.35 [26]:

$$ y = \mathbf{v}^T\mathbf{x} \tag{3.35} $$
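A minimal NumPy sketch of the two-class Fisher direction (equation 3.34) and projection (equation 3.35) follows. `np.linalg.solve` is used instead of explicitly inverting S_W; with strongly correlated features S_W may be near-singular, in which case a pseudo-inverse or a small ridge term would be a reasonable substitute. The function names are hypothetical.

```python
import numpy as np


def fisher_direction(X1, X2):
    """Fisher discriminant direction v = S_W^{-1}(mu1 - mu2), eqs. 3.31-3.34."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)          # within-class scatter of class 1
    S2 = (X2 - mu2).T @ (X2 - mu2)          # within-class scatter of class 2
    Sw = S1 + S2                            # eq. 3.32
    return np.linalg.solve(Sw, mu1 - mu2)   # solve instead of forming S_W^{-1}


def fisher_project(v, X):
    """y = v^T x for every row x of X, eq. 3.35."""
    return np.asarray(X, dtype=float) @ v
```

Rows of X1 and X2 are feature vectors (16 features per segment in this work); because only the direction of v matters, v could also be normalized to unit length without changing the classification.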

3.2.4 Logistic Regression - Binary Classification

The logistic regression model was applied as a binary classifier to categorize sample signals as real swallows or falsely detected artifacts. Logistic regression models the probability that a sample belongs to a class as a function of the independent variables, assuming a binomial distribution for the dependent variable. The classifier outputs a value of 0, corresponding to an artifact, or a value of 1, corresponding to an actual swallow.

As the ratio of real swallow samples to artifacts is very large, simple bootstrapping was applied to the original dataset to mitigate this imbalance. Samples were drawn with replacement to obtain n = 10000 samples for each class. This large number of bootstrapped samples was used to mimic the distribution of the original features closely, which is desirable to maintain consistency across runs of the algorithm. This step was performed prior to Fisher's feature projection. Execution of the classifier requires a training set, T, which consisted of 20000 samples evenly divided between the classes (i.e. 10000 swallows and 10000 artifacts) obtained through the aforementioned bootstrapping method. This training set was used to obtain the classification boundaries.
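The class-balancing step can be sketched as resampling each class with replacement to 10000 rows. The random seed and the function name are hypothetical; the original resampling code is not given.

```python
import numpy as np


def balanced_bootstrap(swallows, artifacts, n_per_class=10000, seed=0):
    """Resample each class with replacement to n_per_class rows; label swallows 1."""
    rng = np.random.default_rng(seed)
    swallows = np.asarray(swallows, dtype=float)
    artifacts = np.asarray(artifacts, dtype=float)
    idx_s = rng.integers(0, len(swallows), size=n_per_class)   # draw with replacement
    idx_a = rng.integers(0, len(artifacts), size=n_per_class)
    X = np.vstack([swallows[idx_s], artifacts[idx_a]])
    y = np.concatenate([np.ones(n_per_class, dtype=int), np.zeros(n_per_class, dtype=int)])
    return X, y
```

Each bootstrapped artifact is a copy of one of the few original artifact feature vectors, so balancing in this way reweights the classes but does not add new artifact information.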

Three main parameters were used for logistic regression classification: the Fisher projection y (equation 3.36); M1, the class whose projected mean lies closest to the projection (equation 3.37); and M2, a variance-based term computed from the variance of the projection within each class (equations 3.38 and 3.39). Here μ̃_i denotes the projected mean of class i.

$$ y = \mathbf{v}^T T \tag{3.36} $$

$$ M_1 = \arg\min_i\,(y - \tilde{\mu}_i)^2, \qquad i = 1, 2 \tag{3.37} $$

$$ \sigma_i^2 = \frac{1}{N}\sum_{y\in\text{Class}_i}(y - \tilde{\mu}_i)^2, \qquad i = 1, 2 \tag{3.38} $$

$$ M_2 = \arg\min_i\,(\sigma_i^2 - \tilde{\mu}_i) \tag{3.39} $$

The three parameters y, M1 and M2 were subsequently used to obtain the coefficients of the θ matrix. Obtaining θ is an optimization problem over the "cost" function in equation 3.41: the average cost J(θ) (equations 3.40 and 3.43) is minimized to obtain the coefficients.

$$ J(\theta) = \frac{1}{m}\sum_{n=1}^{m}\text{Cost}\left(h_\theta(T_n), y'_n\right), \qquad y' \in \{0, 1\} \tag{3.40} $$

$$ \text{Cost}\left(h_\theta(T_n), y'_n\right) = \begin{cases} -\log\left(h_\theta(T_n)\right), & y'_n = 1 \\ -\log\left(1 - h_\theta(T_n)\right), & y'_n = 0 \end{cases} \tag{3.41} $$

$$ \text{Cost}\left(h_\theta(T_n), y'_n\right) = -y'_n\log\left(h_\theta(T_n)\right) - (1 - y'_n)\log\left(1 - h_\theta(T_n)\right) \tag{3.42} $$

$$ J(\theta) = -\frac{1}{m}\sum_{n=1}^{m}\left[y'_n\log\left(h_\theta(T_n)\right) + (1 - y'_n)\log\left(1 - h_\theta(T_n)\right)\right] \tag{3.43} $$

The logistic regression model uses the sigmoid function to obtain a value between 0 and 1. The hypothesis h_θ(T) applies the sigmoid function to a linear combination of the inputs (equations 3.44 and 3.45):

$$ h_\theta(T) = g(\theta^T T) \tag{3.44} $$

$$ g(z) = \frac{1}{1 + e^{-z}}, \qquad g(\theta^T T) = \frac{1}{1 + e^{-\theta^T T}} \tag{3.45} $$


The cost function is the penalty for misclassifying a sample, so it is desirable to minimize it; this minimization was performed using gradient descent. The obtained coefficients yield a probability value that can be used to categorize the samples into the two sets. If the probability from the sigmoid function is greater than or equal to 0.5, a value of 1 is predicted, corresponding to a true swallow. Similarly, a probability smaller than 0.5 is predicted as a falsely detected artifact. This converts a spectrum of probabilities into a binary output. Equations 3.46 and 3.47 express this binary conversion, which is illustrated in Figure 3.2.

Figure 3.2: Sigmoid Function (g(z) plotted for z from −5 to 5; output from 0 to 1)

$$ z = \theta^T T \tag{3.46} $$

$$ \text{predict } 1 \iff \theta^T T \ge 0 \iff h_\theta(T) \ge 0.5 \tag{3.47} $$
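A minimal NumPy sketch of logistic regression trained by batch gradient descent on the cost in equation 3.43, with the 0.5 decision threshold of equation 3.47, is shown below. The learning rate, the iteration count and the small ε inside the logarithms are hypothetical choices; in this work, the inputs would be the projection-based parameters described above.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(len(X)), X])


def sigmoid(z):
    """g(z), eq. 3.45."""
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, Xb, y):
    """Cross-entropy cost J(theta), eq. 3.43 (eps guards against log(0))."""
    h = sigmoid(Xb @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Batch gradient descent on J(theta); gradient is X^T (h - y) / m."""
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
        theta -= learning_rate * grad
    return theta


def predict(theta, X):
    """1 (true swallow) when theta^T T >= 0, i.e. h_theta(T) >= 0.5 (eq. 3.47)."""
    return (add_intercept(X) @ theta >= 0).astype(int)
```

Thresholding θᵀT at zero rather than evaluating the sigmoid and comparing with 0.5 gives the same decisions, because g is monotonic and g(0) = 0.5.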

3.3 Results

The characterization of acoustic signals was implemented through a feature extraction protocol. Table 3.2 presents the descriptive statistics of the obtained features. The features were then passed through the Fisher projection subroutine; the obtained weights are shown in Figure 3.19 and the corresponding Table 3.3. Figures 3.3 to 3.18 are forest plots of the features, showing the mean feature value with a 95% confidence interval for the three sample types and for non-swallows.


Table 3.2: Descriptive statistics of features for the 5 mL, 10 mL and saliva sample types and for non-swallows

Feature | 5 mL: Mean ± SD | 5 mL: ρ95% | 10 mL: Mean ± SD | 10 mL: ρ95% | Saliva: Mean ± SD | Saliva: ρ95% | Non-swallows: Mean ± SD | Non-swallows: ρ95%
Mean | −0.006 ± 0.08 | 0.002 | −0.001 ± 0.021 | 0.002 | −0.001 ± 0.003 | 0.002 | 0.0001 ± 0.003 | 0.002
Variance | 0.09 ± 0.53 | 0.008 | 0.06 ± 0.059 | 0.002 | 0.06 ± 0.058 | 0.002 | 0.029 ± 0.021 | 0.01
Skewness | 0.123 ± 0.873 | 0.014 | 0.152 ± 0.91 | 0.026 | −0.008 ± 0.63 | 0.02 | 0.136 ± 0.819 | 0.404
Kurtosis | 13.53 ± 13.12 | 0.22 | 13.69 ± 14.04 | 0.422 | 10.17 ± 10.34 | 0.342 | 10.14 ± 9.65 | 4.76
Dominant Frequency | 27.32 ± 16.79 | 0.28 | 26.41 ± 15.82 | 0.48 | 22.94 ± 11.00 | 0.362 | 21.91 ± 12.13 | 5.98
Entropy | 0.819 ± 0.082 | 0.002 | 0.817 ± 0.062 | 0.002 | 0.817 ± 0.058 | 0.002 | 0.783 ± 0.078 | 0.038
Centroid Frequency | 52.51 ± 18.49 | 0.308 | 51.40 ± 17.71 | 0.532 | 53.04 ± 16.65 | 0.548 | 44.29 ± 21.07 | 10.404
Average Wavelet Energy | 34.95 ± 41.04 | 0.684 | 31.92 ± 39.11 | 1.178 | 39.71 ± 41.96 | 1.384 | 41.88 ± 40.79 | 20.14
Wavelet Energy Ratio | 0.052 ± 0.104 | 0.002 | 0.05 ± 0.043 | 0.002 | 0.041 ± 0.032 | 0.002 | 0.033 ± 0.035 | 0.016
Fractal Dimensions | 2.007 ± 0.321 | 0.006 | 1.98 ± 0.299 | 0.01 | 2.075 ± 0.315 | 0.01 | 2.072 ± 0.382 | 0.188
Wavelet Filtered Entropy | 0.767 ± 0.111 | 0.002 | 0.773 ± 0.104 | 0.004 | 0.791 ± 0.071 | 0.002 | 0.795 ± 0.065 | 0.032
Dominant Frequency (PSD) | 27.02 ± 16.86 | 0.282 | 26.39 ± 15.40 | 0.464 | 22.66 ± 10.98 | 0.362 | 21.31 ± 12.24 | 6.048
75-100 Hz Signal Energy | 2.518 ± 7.160 | 0.118 | 1.581 ± 2.122 | 0.064 | 1.04 ± 1.587 | 0.052 | 5.029 ± 15.48 | 7.646
Amplitude-Distribution Width | 0.103 ± 0.095 | 0.002 | 0.101 ± 0.051 | 0.002 | 0.111 ± 0.058 | 0.002 | 0.0849 ± 0.042 | 0.022
CWT Energy Ratio | 0.995 ± 0.810 | 0.012 | 0.913 ± 0.862 | 0.024 | 0.996 ± 0.611 | 0.02 | 1.502 ± 1.674 | 0.826
Variance of Signal Squared | 0.006 ± 0.005 | 0.012 | 0.006 ± 0.006 | 0.0002 | 0.007 ± 0.006 | 0.002 | 0.006 ± 0.006 | 0.002


(Each forest plot shows the mean and 95% confidence interval for 10 mL, 5 mL, saliva and non-swallow samples.)

Figure 3.3: Amplitude Mean

Figure 3.4: Amplitude Variance

Figure 3.5: Amplitude Skewness

Figure 3.6: Amplitude Kurtosis

Figure 3.7: Dominant Frequency (Hz)

Figure 3.8: Entropy

Figure 3.9: Centroid Frequency (Hz)

Figure 3.10: Average Wavelet Energy

Figure 3.11: DWT Energy Ratio

Figure 3.12: Fractal Dimension

Figure 3.13: Wavelet Filtered Entropy

Figure 3.14: Dominant Frequency, PSD (Hz)

Figure 3.15: Signal Energy, 75 to 100 Hz

Figure 3.16: Amplitude Distribution Width

Figure 3.17: Wavelet Energy Ratio

Figure 3.18: Variance of Signal Squared


Table 3.3: Fisher's projection and feature weights

Feature ID | Feature | Weight
A | Mean | −129.57
B | Variance | 36.7
C | Skewness | 0.0125
D | Kurtosis | 1.03
E | Dominant Frequency | 0.002
F | Entropy | 0.0136
G | Centroid Frequency | −0.59
H | Average Wavelet Energy | −0.028
I | Wavelet Energy Ratio | −17.56
J | Fractal Dimensions | 1.702
K | Wavelet Filtered Entropy | −1.285
L | Dominant Frequency from PSD | −0.016
M | 75-100 Hz Signal Energy | −0.035
N | Amplitude-Distribution Width | 31.04
O | Continuous Wavelet Energy Ratio | −1.07
P | Variance of squared signal | −10.577

Figure 3.19: Feature weights of Fisher projections (weight plotted against feature ID)

The binary classifier achieved a sensitivity of 96.77% and a specificity of 14.51%. Table 3.4 shows the results of the binary classification. The algorithm is clearly biased towards classifying samples in the "True Swallow" category. This is further quantified by the positive and negative predictive values (equations 3.50 and 3.51), which indicate how reliably a sample classified as a true swallow or as an artifact actually belongs to that class.

Figure 3.20 shows the binary classification of a set of 124 signal samples evenly divided between swallows (n = 1:62) and artifacts (n = 63:124). A value of 1 indicates classification as a "true swallow" and a value of 0 indicates classification as an "artifact".

[Plot: classifier output (0 or 1) against sample number]

Figure 3.20: Binary classification using logistic regression

Table 3.4: Sensitivity and specificity of binary classification

                     Real Swallow           Artifacts              Total
Swallow Detected     True Positive = 60     False Positive = 53    113
Artifact Detected    False Negative = 2     True Negative = 9      11

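$$\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \times 100 = \frac{60}{60 + 2} \times 100 = 96.77\% \tag{3.48}$$

$$\text{Specificity} = \frac{\text{True Negative}}{\text{False Positive} + \text{True Negative}} \times 100 = \frac{9}{53 + 9} \times 100 = 14.51\% \tag{3.49}$$

$$\text{PPV} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \times 100 = \frac{60}{60 + 53} \times 100 = 53.1\% \tag{3.50}$$

$$\text{NPV} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Negative}} \times 100 = \frac{9}{9 + 2} \times 100 \approx 82\% \tag{3.51}$$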
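The four measures in equations 3.48 to 3.51 follow directly from the counts in Table 3.4. The short Python sketch below recomputes them; `binary_metrics` is a hypothetical helper written for illustration, not part of the original analysis code.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # eq. 3.48
        "specificity": 100.0 * tn / (fp + tn),  # eq. 3.49
        "ppv": 100.0 * tp / (tp + fp),          # eq. 3.50
        "npv": 100.0 * tn / (tn + fn),          # eq. 3.51
    }


# Counts from Table 3.4 (segmentation followed by the logistic regression classifier).
metrics = binary_metrics(tp=60, fp=53, fn=2, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Printed to two decimals, specificity comes out as 14.52% and NPV as 81.82%; the values reported above (14.51% and 82%) differ from these only in rounding.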

Table 3.5: Algorithm performance

                   Acoustic Segmentation    Acoustic Segmentation + Characterization
Sensitivity (%)    90.49                    96.77
Specificity (%)    66                       14.51

3.4 Discussion

While a large array of features was extracted from each sample, Fisher's projection clearly suggests that most of the features are not highly discriminatory between the two classes. This is attributed to one class (true swallows) having distinct attributes, while the other (artifacts) exhibits an enormous amount of variability. Youmans and Stierwalt [27] report dominant frequencies of 2 to 3 kHz for acoustic swallowing signals in healthy individuals, whereas our findings in Section 3.3 show a much lower dominant frequency, ranging from 22 to 27 Hz. The discrepancy could be explained by the different sensor modalities used for data collection: while Youmans and Stierwalt [27] used an accelerometry sensor, our data were transduced through a true acoustic transducer (contact microphone model: AKG C411). Because the contact microphone is not prone to picking up motion, it is less sensitive to artifacts produced by non-swallowing movements. Since artifacts can arise from almost any source, such as breathing, movement, speech or other physiological sounds, it is not feasible to find a single discriminatory attribute that characterizes all artifacts.
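The dominant and centroid frequencies discussed above, and the band-limited signal energy listed in Table 3.3, can all be estimated from a power spectrum. The NumPy sketch below illustrates the idea under stated assumptions. It uses a plain FFT periodogram, while the work itself may have used a different PSD estimator or normalization. `spectral_features` is a hypothetical name, and the test tone is synthetic. The default 75 to 100 Hz band follows Table 3.3.

```python
import numpy as np


def spectral_features(x, fs, band=(75.0, 100.0)):
    """Dominant frequency, spectral centroid and band energy of a 1-D signal.

    Assumes a plain FFT periodogram; other PSD estimators (e.g. Welch)
    would give somewhat different values.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset so 0 Hz cannot dominate
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant = freqs[np.argmax(psd)]              # frequency of the PSD peak
    centroid = np.sum(freqs * psd) / np.sum(psd)  # power-weighted mean frequency
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = np.sum(psd[in_band]) * (freqs[1] - freqs[0])
    return dominant, centroid, band_energy


# A 1 s, 25 Hz test tone sampled at 1 kHz (illustrative values only).
fs = 1000.0
t = np.arange(1000) / fs
dominant, centroid, band_energy = spectral_features(np.sin(2 * np.pi * 25 * t), fs)
print(dominant, round(centroid, 3))
```

For a pure tone, the dominant and centroid frequencies coincide. For real swallow or artifact segments, a broadband spectrum pulls the centroid above the dominant frequency. This is consistent with the centroid values (about 37 to 53) in Tables 4.1 and 4.2 being roughly double the dominant frequencies.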

Forest plots of all extracted features, shown in Figures 3.3 to 3.18, were also used to assess the suitability of each feature for classification. These show the mean value of each feature and its 95% confidence interval, grouped into real 5 mL, 10 mL and saliva swallows and non-swallow artifacts. Based on the results in Figure 3.19, the discriminatory features that maximize the difference between the projection means, $\tilde{\mu}$, are amplitude variance and amplitude distribution width. This was further evaluated by running the algorithm with different feature combinations to compare their discriminatory power.
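Fisher's projection, which underlies Table 3.3 and Figure 3.19, can be sketched as the classical two-class linear discriminant. The weight vector is proportional to the inverse within-class scatter matrix applied to the difference of the class means, and the projection means are the class means projected onto that vector. This NumPy sketch is illustrative only. The work may have used a different variant; reference [25], for instance, describes an L1-norm formulation. The data below are synthetic. Unstandardized weights also depend on each feature's scale, so weight magnitudes are only directly comparable after the features have been standardized.

```python
import numpy as np


def fisher_projection(X_swallow, X_artifact):
    """Classical two-class Fisher linear discriminant.

    X_swallow, X_artifact: arrays of shape (n_samples, n_features).
    Returns the weight vector w and the two projected class means.
    """
    mu_s = X_swallow.mean(axis=0)
    mu_a = X_artifact.mean(axis=0)
    # Within-class scatter matrix S_W: sum of the per-class scatter matrices.
    d_s = X_swallow - mu_s
    d_a = X_artifact - mu_a
    s_w = d_s.T @ d_s + d_a.T @ d_a
    # w is proportional to S_W^{-1} (mu_s - mu_a); solve rather than invert.
    w = np.linalg.solve(s_w, mu_s - mu_a)
    return w, w @ mu_s, w @ mu_a


# Synthetic example: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
swallows = rng.normal([1.0, 0.0], [0.5, 1.0], size=(200, 2))
artifacts = rng.normal([0.0, 0.0], [0.5, 1.0], size=(200, 2))
w, proj_mean_s, proj_mean_a = fisher_projection(swallows, artifacts)
print(w, proj_mean_s - proj_mean_a)
```

The informative feature receives by far the larger weight. Because S_W is positive definite, the projected swallow mean always exceeds the projected artifact mean for this choice of sign.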

Due to the large variability of artifacts and the overlap of their features with those of real swallows, the binary classifier tends to over-detect true swallows; hence, there are more false positives than false negatives. Both the segmentation alone and the segmentation with the classifier were evaluated against the manually segmented samples. Adding the binary classifier improved the overall sensitivity: as seen in Table 3.5, sensitivity was 90.49% for segmentation alone and 96.77% for segmentation with binary classification. Similarly, the negative predictive value (NPV) improved from 70.76% to 82%, meaning that a sample classified as an artifact is more likely to actually be an artifact. In contrast, the specificity of the algorithm dropped from 66% to 14.51%, which indicates a process skewed towards identifying samples as true swallows. Moreover, the positive predictive value (PPV), the probability that a sample classified as a swallow is actually a true swallow, decreased from 88.54% to 53.1%.

3.5 Conclusion

The addition of the logistic regression binary classifier to the previously designed segmentation subroutine improved the sensitivity of true swallow detection. The segmentation algorithm operates on the energy distributions within the acoustic signals, and many of the candidate features used for the classifier were likewise based on signal energy. Moreover, due to the large variability of the samples, different filtering methods, such as discrete wavelet, continuous wavelet and Savitzky-Golay filters, were applied to remove noisy, undesirable components.
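The Savitzky-Golay filter mentioned above smooths a signal by least-squares fitting a low-order polynomial in a sliding window and keeping the fitted value at the window centre. A minimal NumPy sketch follows. The window length and polynomial order used in this work are not stated in this section, so the values used here are arbitrary. For simplicity, the edge samples are left unfiltered.

```python
import numpy as np


def savgol_coefficients(window_length, polyorder):
    """Smoothing weights of a Savitzky-Golay filter.

    A polynomial of degree `polyorder` is least-squares fitted to each
    window of `window_length` samples and evaluated at the window centre.
    """
    if window_length < 3 or window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd, >= 3 and > polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix: column j holds offsets**j.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window samples to the fitted
    # constant term, i.e. the smoothed value at offset 0.
    return np.linalg.pinv(A)[0]


def savgol_smooth(x, window_length, polyorder):
    """Smooth x; the first and last window_length // 2 samples are left as-is."""
    x = np.asarray(x, dtype=float)
    c = savgol_coefficients(window_length, polyorder)
    half = window_length // 2
    y = x.copy()
    # Reversing c makes np.convolve compute a sliding weighted sum (correlation).
    y[half:-half] = np.convolve(x, c[::-1], mode="valid")
    return y


print(np.round(savgol_coefficients(5, 2) * 35, 6))  # classic 5-point quadratic weights
```

Because the fit is exact for polynomials up to `polyorder`, such signals pass through unchanged while higher-frequency noise is attenuated. SciPy's `scipy.signal.savgol_filter(x, window_length, polyorder)` implements the same filter with proper edge handling.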

Furthermore, the 16 chosen features provided a large array of descriptive information about the training data. However, Fisher's projection showed that only a few features were discriminatory enough to aid in classification. The features with relatively high projection weights quantify signal complexity. The largest positive weights in Table 3.3 belong to variance (36.7), amplitude distribution width (31.04) and fractal dimension (1.702), each of which quantifies signal complexity in a different way.
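The complexity-related features named above can be approximated as follows. This is a sketch under explicit assumptions. The section does not state which fractal-dimension estimator was used, so the Katz estimator is substituted purely as one common example. Kurtosis is computed in its non-excess form, which equals 3 for a Gaussian. Amplitude distribution width is omitted because its definition is not given in this section.

```python
import numpy as np


def katz_fd(x):
    """Katz fractal dimension using amplitude differences between samples."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    length = steps.sum()                # total curve length L
    n = length / steps.mean()           # number of steps, L / mean step
    d = np.max(np.abs(x[1:] - x[0]))    # largest excursion from the first sample
    return np.log10(n) / (np.log10(n) + np.log10(d / length))


def complexity_features(x):
    """Amplitude variance, (non-excess) kurtosis and Katz fractal dimension."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    variance = np.mean(centred ** 2)
    kurtosis = np.mean(centred ** 4) / variance ** 2  # equals 3 for a Gaussian
    return {"variance": variance, "kurtosis": kurtosis, "fractal_dim": katz_fd(x)}


rng = np.random.default_rng(0)
print(complexity_features(rng.normal(0.0, 2.0, size=10_000)))
```

A straight ramp gives a Katz dimension of exactly 1. More irregular signals give larger values, which is why such measures are read as indicators of complexity.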

The sensitivity of the segmentation algorithm improved by 6.28 percentage points, from 90.49% to 96.77%. Improving the ability to pick out true swallows was the motivation for developing the binary classifier. However, this benefit comes at the cost of a reduced ability to correctly identify artifacts, as shown by the dramatic drop in specificity from 66% to 14.51%. These figures describe a system that is prone to falsely classifying artifacts as true swallows. This work demonstrates both the benefits and the pitfalls of using machine learning for signal segmentation. Moreover, it has demonstrated the discriminatory power of different signal features obtained in the time, frequency and time-frequency domains.


Future work could improve the classification of these two classes by developing a better training set for the logistic regression step. This could be done by including intentional artifact recordings, such as coughs, throat clears, vocalization, quiet breathing and other potential artifacts. A better training set would allow a more efficacious model, and hence a better classifier, to be developed.

Chapter 4

Acoustics and Swallow Screening

4.1 Acoustics

The current use of accelerometry, which shares many attributes with our sensor, has demonstrated the feasibility of monitoring surface vibrations for screening. This work has evaluated the suitability of acoustic signals for the development of non-invasive tools, and our analysis has shown that acoustics has the potential to provide novel information. Moreover, using a contact microphone as the transducer offers added benefits, as it overcomes some of the deficiencies of the accelerometer: it has a flat frequency response and is insensitive to participant movement.

Similarly, the acoustic signals have been shown to be useful for temporally localizing swallow segments. Our acoustic swallow segmentation performed similarly to segmentation of accelerometry signals for the 5 mL and 10 mL sample types. Surprisingly, the acoustic segmentation algorithm outperformed the accelerometry segmentation on saliva samples, with a 21% difference in sensitivity.
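The segmentation algorithm is described in this work as operating on the energy distributions within the acoustic signal. As a generic illustration only, and not the algorithm used in this thesis, the sketch below marks frames whose short-time energy exceeds a fraction of the maximum frame energy and merges adjacent active frames into segments. The function name `energy_segments` and its frame-length and threshold parameters are hypothetical.

```python
import numpy as np


def energy_segments(x, fs, frame_ms=50.0, threshold_ratio=0.1):
    """Return (start, end) sample indices of high-energy regions.

    Frames whose mean-square energy exceeds threshold_ratio times the
    maximum frame energy are marked active; adjacent active frames are
    merged into one segment. Trailing samples that do not fill a whole
    frame are ignored.
    """
    x = np.asarray(x, dtype=float)
    frame = max(1, int(round(frame_ms * fs / 1000.0)))
    n_frames = x.size // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    energy = np.mean(frames ** 2, axis=1)
    active = energy > threshold_ratio * energy.max()
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * frame, i * frame))
            start = None
    if start is not None:
        segments.append((start * frame, n_frames * frame))
    return segments


# Synthetic example: 3 s of silence with a 0.5 s, 30 Hz burst starting at 1 s.
fs = 1000.0
x = np.zeros(3000)
x[1000:1500] = np.sin(2 * np.pi * 30 * np.arange(500) / fs)
print(energy_segments(x, fs))
```

A fixed relative threshold like this would also flag energetic artifacts, which is the kind of false detection the logistic regression classifier in Chapter 3 was added to reject.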

Since the long-term goal is a device for bedside use, it is important to minimize the recording of unwanted information, such as other sounds in the environment. Accelerometry does not record sound, so it does not have this problem. On the acoustic side, the chosen transducer has a figure-8 polarity, which makes it bidirectionally sensitive; Figure 4.1 shows the frequency response and polarity of the transducer. Moreover, only sound that is transmitted through the participant's tissue is picked up, a property that permits the use of a contact microphone in a noisy environment. Given these attributes, acoustics presents great potential for designing a standalone or hybrid accelerometry-acoustic screening tool.

Figure 4.1: AKG C411 PP transducer frequency response and polarity

4.1.1 Acoustic Features

The features used for this work were chosen on the basis of the characteristics of the acoustic signals. In-depth analysis of different samples in the time and frequency domains led to the list of features in Table 3.3. Our analysis identified a short list of features that successfully contributed to discriminating swallows from artifacts. The remaining features may still be useful for classification by other means. Because the non-swallow class has few distinctive characteristics and a large variability, the discriminatory power of these features was evaluated only for the task of discriminating artifacts from true swallows.

The analysis of the features through the descriptive statistics shown in Table 3.2 demonstrates that the top four features contributing to discriminating the two classes are 1) amplitude variance, 2) amplitude distribution width, 3) fractal dimension and 4) amplitude kurtosis, with Fisher's projection weights of 36.7, 31.04, 1.7 and 1.03, respectively. What these features have in common is that each quantifies signal complexity in a different way. Contrary to the original hypothesis, frequency and time-frequency features such as dominant frequency, centroid frequency and wavelet-based features did not outperform these features. Tables 4.1 and 4.2 provide detailed feature statistics for each sample type and each class.
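Each entry in Tables 4.1 and 4.2 reports a mean ± SD and a 95% confidence interval. The section does not state how the interval was computed. The sketch below assumes the common normal approximation, mean ± 1.96·SD/√n, so a t-based interval would be slightly wider for small samples. `describe` is a hypothetical helper, and the input values are illustrative only.

```python
import math
import statistics


def describe(values, z=1.96):
    """Mean, sample SD and normal-approximation 95% CI of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample SD (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)  # z times the standard error of the mean
    return mean, sd, (mean - half_width, mean + half_width)


mean, sd, ci = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values
print(f"{mean:.3f} ± {sd:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because the interval narrows with √n, a feature with a large SD can still have a tight confidence interval when many samples are available. This explains why, in Table 4.1, the swallow intervals are much narrower than the corresponding SDs.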


Table 4.1: Descriptive statistics of features from 5 mL, 10 mL and saliva swallow samples

                          5 mL                                10 mL                               Saliva
Feature                   Mean ± SD         95% CI            Mean ± SD         95% CI            Mean ± SD         95% CI
Mean                      -0.006 ± 0.080    [-0.016, 0.004]   -0.001 ± 0.021    [-0.003, 0.002]   -0.001 ± 0.022    [-0.004, 0.001]
Variance                  0.091 ± 0.525     [0.026, 0.156]    0.062 ± 0.059     [0.054, 0.069]    0.066 ± 0.058     [0.058, 0.073]
Skewness                  0.123 ± 0.873     [0.014, 0.232]    0.152 ± 0.906     [0.041, 0.263]    -0.008 ± 0.634    [-0.090, 0.073]
Kurtosis                  13.5393 ± 13.123  [11.909, 15.169]  13.695 ± 14.047   [11.971, 15.419]  10.171 ± 10.337   [8.843, 11.498]
Dominant Frequency        27.311 ± 16.790   [25.235, 29.406]  26.409 ± 15.821   [24.467, 28.352]  22.935 ± 11.005   [21.522, 24.348]
Entropy                   0.818983 ± 0.082  [0.808, 0.829]    0.817 ± 0.062     [0.811, 0.826]    0.816 ± 0.058     [0.809, 0.824]
Centroid Frequency        52.508 ± 18.491   [50.212, 54.806]  51.403 ± 17.707   [49.23, 53.576]   53.045 ± 16.658   [50.906, 55.185]
Average Wavelet Energy    34.95 ± 41.045    [29.852, 40.048]  31.924 ± 39.109   [27.123, 36.724]  39.709 ± 41.966   [34.321, 45.098]
Wavelet Energy Ratio      0.052 ± 0.104     [0.038, 0.064]    0.051 ± 0.043     [0.045, 0.055]    0.041 ± 0.032     [0.036, 0.045]
Fractal Dimensions        2.007 ± 0.321     [1.967, 2.047]    1.982 ± 0.299     [1.946, 2.019]    2.075 ± 0.315     [2.035, 2.116]
Wavelet Filtered Entropy  0.767 ± 0.111     [0.753, 0.781]    0.773 ± 0.104     [0.76, 0.786]     0.791 ± 0.071     [0.781, 0.799]
Dominant Frequency (PSD)  27.021 ± 16.861   [24.927, 29.116]  26.391 ± 15.401   [24.501, 28.282]  22.655 ± 10.978   [21.246, 24.065]
Signal Energy 70-100 Hz   2.518 ± 7.160     [1.628, 3.407]    1.581 ± 2.122     [1.321, 1.842]    1.043 ± 1.587     [0.839, 1.247]
Half Distribution Width   0.103 ± 0.095     [0.092, 0.115]    0.102 ± 0.051     [0.095, 0.108]    0.111 ± 0.058     [0.103, 0.119]
DWT Energy Ratio          0.995 ± 0.811     [0.894, 1.096]    0.913 ± 0.862     [0.807, 1.019]    0.997 ± 0.611     [0.919, 1.075]
Signal Squared Variance   0.006 ± 0.005     [0.005, 0.006]    0.006 ± 0.006     [0.005, 0.006]    0.007 ± 0.006     [0.006, 0.007]


Table 4.2: Descriptive statistics of features from 5 mL, 10 mL and saliva artifact samples

                          5 mL                                    10 mL                                   Saliva
Feature                   Mean ± SD             95% CI            Mean ± SD             95% CI            Mean ± SD             95% CI
Mean                      0.0002 ± 0.001        [-0.0002, 0.0007] -8.437×10^-5 ± 0.004  [-0.002, 0.001]   0.0003 ± 0.002        [-0.001, 0.002]
Variance                  0.027 ± 0.021         [0.019, 0.035]    0.027 ± 0.021         [0.019, 0.036]    0.035 ± 0.021         [0.023, 0.049]
Skewness                  0.391 ± 1.063         [-0.043, 0.825]   0.046 ± 0.479         [-0.136, 0.228]   -0.159 ± 0.797        [-0.654, 0.334]
Kurtosis                  12.413 ± 12.294       [7.388, 17.438]   7.622 ± 5.898         [5.378, 9.865]    12.022 ± 9.716        [6.001, 18.044]
Dominant Frequency        20.425 ± 7.362        [17.416, 23.434]  22.681 ± 15.574       [16.758, 28.604]  22.962 ± 9.139        [17.297, 28.627]
Entropy                   0.796 ± 0.072         [0.766, 0.825]    0.773 ± 0.084         [0.742, 0.806]    0.782 ± 0.073         [0.736, 0.828]
Centroid Frequency        52.478 ± 15.479       [46.152, 58.804]  37.451 ± 19.511       [30.031, 44.871]  45.217 ± 27.8         [27.986, 62.448]
Average Wavelet Energy    40.856 ± 42.196       [23.611, 58.102]  38.425 ± 36.184       [24.664, 52.186]  53.155 ± 46.860       [24.111, 82.199]
Wavelet Energy Ratio      0.032 ± 0.023         [0.023, 0.042]    0.032 ± 0.043         [0.016, 0.049]    0.035 ± 0.027         [0.018, 0.053]
Fractal Dimensions        2.039 ± 0.419         [1.868, 2.211]    2.118 ± 0.323         [1.995, 2.242]    2.017 ± 0.427         [1.752, 2.282]
Wavelet Filtered Entropy  0.804 ± 0.062         [0.778, 0.829]    0.794 ± 0.062         [0.771, 0.818]    0.778 ± 0.072         [0.734, 0.823]
Dominant Frequency (PSD)  20.267 ± 7.548        [17.182, 23.352]  21.765 ± 15.741       [15.779, 27.752]  22.3 ± 9.114          [16.651, 27.949]
Signal Energy 70-100 Hz   12.458 ± 23.837       [2.716, 22.201]   0.677 ± 0.922         [0.326, 1.028]    0.972 ± 1.067         [0.309, 1.634]
Half Distribution Width   0.102 ± 0.032         [0.088, 0.114]    0.077 ± 0.046         [0.059, 0.095]    0.071 ± 0.037         [0.047, 0.094]
DWT Energy Ratio          1.787 ± 2.374         [0.817, 2.758]    1.281 ± 1.042         [0.885, 1.678]    1.486 ± 1.054         [0.832, 2.139]
Signal Squared Variance   0.006 ± 0.006         [0.004, 0.009]    0.006 ± 0.006         [0.003, 0.008]    0.009 ± 0.005         [0.005, 0.012]


As seen by comparing the sample types in Table 4.1, the feature values for the 5 mL and 10 mL tasks are much closer to each other than to those of the saliva task. This raises the possibility that a portion of the liquid swallow sound arises from the liquid bolus itself and is missing in dry (saliva) swallows. Moreover, it is hypothesized that the presence of a bolus contributes to additional artifacts, as dry swallows had the fewest artifacts and the lowest number of falsely detected swallows. These results therefore indicate that saliva swallows differ from liquid swallows.

4.2 Machine Learning

The use of a machine learning subroutine has improved the sensitivity of our segmentation algorithm. Classifying signals as "true swallow" or "artifact" was needed to add intelligence to the segmentation algorithm, because the acoustic signals exhibit high levels of variability. The machine learning algorithm, which used logistic regression classification complemented with Fisher's discriminant analysis, performed well at detecting true swallows. However, most of the chosen features proved insignificant in discriminating the two classes. Because the characteristics of artifacts are practically unlimited in their variability, the algorithm was not highly effective at distinguishing artifacts from true swallows.
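The logistic regression step can be illustrated with a short NumPy sketch. It is not the implementation used in this work. The optimizer (plain batch gradient descent), learning rate, iteration count and 0.5 decision threshold are all assumptions. In practice, the inputs would be the selected features or their Fisher projection rather than the single synthetic feature used here.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X: (n_samples, n_features), ideally standardized; y: 0/1 labels.
    Returns weights w and bias b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(true swallow)
        grad = p - y                             # d(log-loss)/d(logit) per sample
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def classify(X, w, b, threshold=0.5):
    """Label 1 = true swallow, 0 = artifact, as in Figure 3.20."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))
    return (p >= threshold).astype(int)


# Synthetic, well-separated classes (illustrative only).
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(1.5, 1.0, (100, 1)), rng.normal(-1.5, 1.0, (100, 1))])
y = np.r_[np.ones(100), np.zeros(100)]
w, b = train_logistic(X, y)
labels = classify(X, w, b)
print("training accuracy:", (labels == y).mean())
```

Moving `threshold` trades sensitivity against specificity. This is the same trade-off that appears in Table 3.5 when the classifier is added to the segmentation.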

4.3 Future Work

The developed method of Fisher's projection and logistic regression classification is a good candidate for classifying healthy and unhealthy participants. Future work could evaluate the designed algorithm fully by applying it to two classes that each have distinct characteristics. Because only the "true swallow" class possesses distinct characteristics, while the artifact class shows much greater variability in its attributes, the algorithm's full potential has not yet been evaluated. Future work could use this protocol to classify healthy and unhealthy subjects, groups of different sexes and ages, and participants with different etiologies.

Moreover, analysis of intentional artifacts such as coughs, throat clears and speech/vocalizations could contribute to a better classification of real swallows and artifacts. Such analysis could also quantitatively answer whether an intentional artifact, such as a voluntary cough, differs from a naturally induced one. Lastly, further analysis of saliva swallows would be beneficial, as current results show saliva swallows to differ from liquid bolus swallows (i.e., 5 mL and 10 mL).


4.4 Conclusion

In this work we have evaluated the benefits and drawbacks of an acoustic transducer for non-invasive swallow screening. The selection of features analyzed in Chapter 3 has been justified, and future work is strongly encouraged to identify further features that quantify signal complexity. The analysis of individual sample types also demonstrated that saliva swallows differ from liquid bolus swallows.

Bibliography

[1] E. Sejdic, T. H. Falk, C. M. Steele, and T. Chau, "Vocalization removal for improved automatic segmentation of dual-axis swallowing accelerometry signals," Medical Engineering & Physics, vol. 32, no. 6, pp. 668–672, 2010.

[2] D. C. Gleeson, "Oropharyngeal swallowing and aging: A review," Journal of Communication Disorders, vol. 32, no. 6, pp. 373–396, 1999.

[3] G. Malandraki and J. Robbins, "Chapter 21 - dysphagia," in Neurological Rehabilitation, ser. Handbook of Clinical Neurology, M. P. Barnes and D. C. Good, Eds. Elsevier, 2013, vol. 110, pp. 255–271.

[4] R. Bulat and R. Orlando, "Oropharyngeal dysphagia," Current Treatment Options in Gastroenterology, vol. 8, no. 4, pp. 269–274, 2005.

[5] C. Borr, M. Hielscher-Fastabend, and A. Lücking, "Reliability and validity of cervical auscultation," Dysphagia, vol. 22, no. 3, pp. 225–234, 2007. [Online]. Available: http://dx.doi.org/10.1007/s00455-007-9078-3

[6] K. Takahashi, M. Groher, and K.-i. Michi, "Methodology for detecting swallowing sounds," Dysphagia, vol. 9, no. 1, pp. 54–62, 1994.

[7] C. M. Steele, S. M. Molfenter, G. L. Bailey, R. C. Polacco, A. A. Waito, D. C. B. H. Zoratto, and T. Chau, "Exploration of the utility of a brief swallow screening protocol with comparison to concurrent videofluoroscopy," Canadian Journal of Speech-Language Pathology and Audiology, vol. 35, 2011.

[8] F. E. Ester Marco, Esther Duarte, “Usefulness of the volume-viscosity swallow test for screening

dysphagia in subacute stroke patients in rehabilitation,” NeuroRehabilitation, pp. 631–638.


[9] B. Kertscher, R. Speyer, M. Palmieri, and C. Plant, “Bedside screening to detect oropharyngeal

dysphagia in patients with neurological disorders: An updated systematic review,” Dysphagia,

vol. 29, no. 2, pp. 204–212, 2014. [Online]. Available: http://dx.doi.org/10.1007/s00455-013-9490-9

[10] P. Clavé, V. Arreola, M. Romea, L. Medina, E. Palomera, and M. Serra-Prat, "Accuracy of the volume-viscosity swallow test for clinical screening of oropharyngeal dysphagia and aspiration," Clinical Nutrition, vol. 27, no. 6, pp. 806–815, 2008.

[11] D. Suiter, J. Sloggy, and S. Leder, "Validation of the Yale swallow protocol: A prospective double-blinded videofluoroscopic study," Dysphagia, vol. 29, no. 2, pp. 199–203, 2014.

[12] A. Osawa, S. Maeshima, and N. Tanahashi, “Water-swallowing test: Screening for aspiration in

stroke patients,” Cerebrovascular Diseases, vol. 35, no. 3, pp. 276–81, 04 2013.

[13] C. Steele, E. Sejdić, and T. Chau, "Noninvasive detection of thin-liquid aspiration using dual-axis swallowing accelerometry," Dysphagia, vol. 28, no. 1, pp. 105–112, 2013.

[14] S. Morinière, M. Boiron, D. Alison, P. Makris, and P. Beutter, "Origin of the sound components during pharyngeal swallowing in normal subjects," Dysphagia, vol. 23, no. 3, pp. 267–273, 2008.

[15] B. Sherman, J. M. Nisenboum, B. L. Jesberger, C. A. Morrow, and J. A. Jesberger, “Assessment

of dysphagia with the use of pulse oximetry,” Dysphagia, vol. 14, no. 3, pp. 152–156, 1999.

[16] J. Lee, C. M. Steele, and T. Chau, “Classification of healthy and abnormal swallows based on

accelerometry and nasal airflow signals,” Artif Intell Med, vol. 52, no. 1, pp. 17–25, May 2011.

[17] S. Mukhopadhyay and G. C. Ray, "A new interpretation of nonlinear energy operator and its efficacy in spike detection," Biomedical Engineering, IEEE Transactions on, vol. 45, no. 2, pp. 180–187, Feb 1998.

[18] C. Steele, E. Sejdić, and T. Chau, "Noninvasive detection of thin-liquid aspiration using dual-axis swallowing accelerometry," Dysphagia, vol. 28, no. 1, pp. 105–112, 2013.

[19] J. Lee, E. Sejdić, C. Steele, and T. Chau, "Effects of liquid stimuli on dual-axis swallowing accelerometry signals in a healthy population," BioMedical Engineering OnLine, vol. 9, no. 1, 2010.

[20] I. Orovic, S. Stankovic, T. Chau, C. M. Steele, and E. Sejdic, "Time-frequency analysis and Hermite projection method applied to swallowing accelerometry signals," EURASIP J. Adv. Sig. Proc., 2010.


[21] E. Sejdic, T. H. Falk, C. M. Steele, and T. Chau, "Vocalization removal for improved automatic segmentation of dual-axis swallowing accelerometry signals," Medical Engineering and Physics, vol. 32, pp. 668–672, 2010.

[22] J. F. Bercher and C. Vignat, "Estimating the entropy of a signal with applications," Signal Processing, IEEE Transactions on, vol. 48, no. 6, pp. 1687–1694, Jun 2000.

[23] S. Anisheh and H. Hassanpour, “Designing an adaptive approach for segmenting non-stationary

signals,” International Journal of Electronics, vol. 98, no. 8, pp. 1091–1102, 2011.

[24] K. Umapathy, F. H. Foomany, P. Dorian, T. Farid, G. Sivagangabalan, K. Nair, S. Masse, S. Krishnan, and K. Nanthakumar, "Real-time electrogram analysis for monitoring coronary blood flow during human ventricular fibrillation: Implications for CPR," Heart Rhythm, vol. 8, no. 5, pp. 740–749, 2011.

[25] H. Wang, X. Lu, Z. Hu, and W. Zheng, “Fisher discriminant analysis with l1-norm,” Cybernetics,

IEEE Transactions on, vol. 44, no. 6, pp. 828–842, June 2014.

[26] A. Yadollahi and Z. Moussavi, "Feature selection for swallowing sounds classification," in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, Aug 2007, pp. 3172–3175.

[27] S. Youmans and J. Stierwalt, “Normal swallowing acoustics across age, gender, bolus viscosity, and

bolus volume,” Dysphagia, vol. 26, no. 4, pp. 374–384, 2011.
