Embedded Deep Learning for Sleep StagingEmbedded Deep Learning for Sleep Staging Engin Turetken¨...

2
Embedded Deep Learning for Sleep Staging Engin T¨ uretken Embedded Vision System Group, CSEM Neuchˆ atel, Switzerland [email protected] erˆ ome Van Zaen Signal Processing Group, CSEM Neuchˆ atel, Switzerland [email protected] Ricard Delgado-Gonzalo Embedded Software Group, CSEM Neuchˆ atel, Switzerland [email protected] Abstract—The rapidly-advancing technology of deep learning (DL) into the world of the Internet of Things (IoT) has not fully entered in the fields of m-Health yet. Among the main reasons are the high computational demands of DL algorithms and the inherent resource-limitation of wearable devices. In this paper, we present initial results for two deep learning architectures used to diagnose and analyze sleep patterns, and we compare them with a previously presented hand-crafted algorithm. The algorithms are designed to be reliable for consumer healthcare applications and to be integrated into low-power wearables with limited computational resources. Index Terms—CNN, RNN, deep learning, embedded, SoC, sleep, polysomnography, e-health, m-health I. I NTRODUCTION Deep learning (DL) is a branch of machine learning based on artificial neural networks that hierarchically model high- level abstractions in data [1]. Thanks to the recent availability of large-scale labeled datasets and powerful hardware to process them, the field has been successful in many fields including computer vision [2], natural language processing [3], and speech analysis [4]. The technology is likely to be disruptive in many application areas and expected to render conventional machine learning techniques obsolete. Conse- quently, substantial research is being performed to adopt it in the rapidly-growing wearables and IoT markets [5], [6]. According to CCS Insight, the wearable market is expected to grow from over $10 billion in 2017 to $17 billion by 2021. In this race of building DL-powered wearables, multiple efforts have been made to streamline and simplify current DL frameworks to enable them to be used in the edge. Since wear- able devices are at the extreme edge, they need to minimize their computational footprint in order to maximize battery life time for a proper life-style assessment, only the most low-level frameworks, such as CMSIS-NN [7], are simple enough to be considered given the current state of System-On-Chip (SoC) technology. This paper presents a proof of concept (PoC) using deep learning architectures for sleep staging and evaluates two implementations fully embeddable in low-power SoC. We compare the performance of these architectures with a hand- crafted algorithm designed for low-power variables. II. DATA The data for the PoC was taken from the Phys- ioNet/Computing in Cardiology Challenge 2018 (CinC18) 1 . 1 https://www.physionet.org/challenge/2018/ The dataset was contributed by the Massachusetts General Hospital’s (MGH) Computational Clinical Neurophysiology Laboratory, and the Clinical Data Animation Laboratory. The whole dataset includes 1985 subjects who were monitored at an MGH sleep laboratory for the diagnosis of sleep disorders. However, only the training data (994 subjects) was used in this research since it was the only part that was open to the public. The sleep stages of the subjects were annotated by the MGH according to the American Academy of Sleep Medicine [8]. Specifically, every 30-second segment was an- notated with one of the following labels: wakefulness, rapid eye movement (REM), NREM stage 1, NREM stage 2, and NREM stage 3, where NREM stands for non-REM. For our analysis, we merged NREM stage 1, NREM stage 2, and NREM stage 3. The dataset contained the following physio- logical signals at a high sampling rate: electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiology (ECG), and oxygen saturation (SaO2). We split the dataset into training and test subsets with 90% for training and 10% for testing. It is worth noting that this dataset contains a high number of pathological cases since it is a very specific cohort. Therefore, it is not representative of the general population but constitutes a challenging medical-grade dataset. III. METHODS A. Hand-designed In order to establish a baseline, we used our previously- descried algorithm based on physiological cardiorespiratory cues [9] extracted from photoplethysmographic sensors [10]. For a healthy population, this algorithm proved to achieve a sensitivity and specificity for REM of 89.2% and 77.9% respectively; and for NREM 83.4% and 84.9% respectively. The algorithm was exclusively based on the analysis of the heart rate variability (HRV) and movement (Fig. 1). B. Deep learning The input to the algorithms is a short temporal sequence (around 8.5 minutes) of HRV values at 4Hz and a binary value that denotes whether the subject moved. We developed several deep learning architectures for sleep staging and evaluated two embedded implementations: Multi-Layer Recurrent Neural Networks (ML-RNNs): RNNs are powerful architectures designed to model long- term temporal relations in the data. They have been shown to work very well in natural language processing. arXiv:1906.09905v1 [cs.LG] 18 Jun 2019

Transcript of Embedded Deep Learning for Sleep StagingEmbedded Deep Learning for Sleep Staging Engin Turetken¨...

  • Embedded Deep Learning for Sleep StagingEngin Türetken

    Embedded Vision System Group, CSEMNeuchâtel, Switzerland

    [email protected]

    Jérôme Van ZaenSignal Processing Group, CSEM

    Neuchâtel, [email protected]

    Ricard Delgado-GonzaloEmbedded Software Group, CSEM

    Neuchâtel, [email protected]

    Abstract—The rapidly-advancing technology of deep learning(DL) into the world of the Internet of Things (IoT) has not fullyentered in the fields of m-Health yet. Among the main reasonsare the high computational demands of DL algorithms and theinherent resource-limitation of wearable devices. In this paper,we present initial results for two deep learning architecturesused to diagnose and analyze sleep patterns, and we comparethem with a previously presented hand-crafted algorithm. Thealgorithms are designed to be reliable for consumer healthcareapplications and to be integrated into low-power wearables withlimited computational resources.

    Index Terms—CNN, RNN, deep learning, embedded, SoC,sleep, polysomnography, e-health, m-health

    I. INTRODUCTIONDeep learning (DL) is a branch of machine learning based

    on artificial neural networks that hierarchically model high-level abstractions in data [1]. Thanks to the recent availabilityof large-scale labeled datasets and powerful hardware toprocess them, the field has been successful in many fieldsincluding computer vision [2], natural language processing [3],and speech analysis [4]. The technology is likely to bedisruptive in many application areas and expected to renderconventional machine learning techniques obsolete. Conse-quently, substantial research is being performed to adopt itin the rapidly-growing wearables and IoT markets [5], [6].According to CCS Insight, the wearable market is expected togrow from over $10 billion in 2017 to $17 billion by 2021.

    In this race of building DL-powered wearables, multipleefforts have been made to streamline and simplify current DLframeworks to enable them to be used in the edge. Since wear-able devices are at the extreme edge, they need to minimizetheir computational footprint in order to maximize battery lifetime for a proper life-style assessment, only the most low-levelframeworks, such as CMSIS-NN [7], are simple enough to beconsidered given the current state of System-On-Chip (SoC)technology.

    This paper presents a proof of concept (PoC) using deeplearning architectures for sleep staging and evaluates twoimplementations fully embeddable in low-power SoC. Wecompare the performance of these architectures with a hand-crafted algorithm designed for low-power variables.

    II. DATAThe data for the PoC was taken from the Phys-

    ioNet/Computing in Cardiology Challenge 2018 (CinC18)1.

    1https://www.physionet.org/challenge/2018/

    The dataset was contributed by the Massachusetts GeneralHospital’s (MGH) Computational Clinical NeurophysiologyLaboratory, and the Clinical Data Animation Laboratory. Thewhole dataset includes 1985 subjects who were monitored atan MGH sleep laboratory for the diagnosis of sleep disorders.However, only the training data (994 subjects) was used inthis research since it was the only part that was open tothe public. The sleep stages of the subjects were annotatedby the MGH according to the American Academy of SleepMedicine [8]. Specifically, every 30-second segment was an-notated with one of the following labels: wakefulness, rapideye movement (REM), NREM stage 1, NREM stage 2, andNREM stage 3, where NREM stands for non-REM. For ouranalysis, we merged NREM stage 1, NREM stage 2, andNREM stage 3. The dataset contained the following physio-logical signals at a high sampling rate: electroencephalography(EEG), electrooculography (EOG), electromyography (EMG),electrocardiology (ECG), and oxygen saturation (SaO2). Wesplit the dataset into training and test subsets with 90% fortraining and 10% for testing. It is worth noting that this datasetcontains a high number of pathological cases since it is a veryspecific cohort. Therefore, it is not representative of the generalpopulation but constitutes a challenging medical-grade dataset.

    III. METHODS

    A. Hand-designed

    In order to establish a baseline, we used our previously-descried algorithm based on physiological cardiorespiratorycues [9] extracted from photoplethysmographic sensors [10].For a healthy population, this algorithm proved to achievea sensitivity and specificity for REM of 89.2% and 77.9%respectively; and for NREM 83.4% and 84.9% respectively.The algorithm was exclusively based on the analysis of theheart rate variability (HRV) and movement (Fig. 1).

    B. Deep learning

    The input to the algorithms is a short temporal sequence(around 8.5 minutes) of HRV values at 4Hz and a binary valuethat denotes whether the subject moved. We developed severaldeep learning architectures for sleep staging and evaluated twoembedded implementations:

    • Multi-Layer Recurrent Neural Networks (ML-RNNs):RNNs are powerful architectures designed to model long-term temporal relations in the data. They have been shownto work very well in natural language processing.

    arX

    iv:1

    906.

    0990

    5v1

    [cs

    .LG

    ] 1

    8 Ju

    n 20

    19

    https://www.physionet.org/challenge/2018/

  • Freq

    uenc

    y (H

    z)Sl

    eep

    clas

    s

    Time (h)

    0.0

    0.1

    0.4

    0.5

    0.2

    0.3

    WAKE

    REM

    NREM

    0 1 2 3 4 5 6

    HypnogramBaseline algorithm

    HRV spectrum

    Fig. 1. Time frequency representation of the HRV spectrum (top) andestimated hypnogram versus reference (bottom).

    TABLE ILAYERS OF THE CNN ARCHITECTURE.

    Layer type Stride Filter shape Input sizeConv 1 3 × 3 × 32 2048 × 3

    Max Pool 1 2 × 1 2048 × 32Conv 2 3 × 32 × 32 1024 × 32Conv 2 3 × 32 × 48 512 × 32Conv 2 3 × 48 × 64 256 × 48Conv 2 3 × 64 × 64 128 × 64Conv 2 3 × 64 × 64 64 × 64Conv 2 3 × 64 × 64 32 × 64Conv 2 3 × 64 × 64 16 × 64Conv 2 3 × 64 × 64 8 × 64Conv 1 3 × 64 × 64 4 × 64Conv 1 1 × 64 × 64 4 × 64FC 1 256 × 256 1 × 256FC 1 256 × 21 1 × 256

    • Convolutional Neural Networks (CNNs): CNNs are neu-ral networks which can capture local patterns with in-creasing semantic complexity in spatiotemporal data.

    In our implementation, the CNN and the ML-RNN archi-tectures employ fully connected and normalization layers, andthe latter is based on an efficient version of LSTM withforget gates. More specifically, the ML-RNN architecture iscomprised of two LSTM layers respectively with 128 and 32hidden units, which are followed by two fully connected layerswith bias. The CNN architecture contains eleven convolutionallayers with a temporal size of 3, a single max-pooling layer,and two fully connected layers (see Table I). The convolutionallayers are followed by batch normalization and ReLU units.

    IV. RESULTS

    The RNN-based network brings approximately 20% im-provement in mean accuracy over the baseline method andthe CNN approximately 35% (see Table II). Higher accuracy

    TABLE IIACCURACY AND COMPLEXITY OF THE PROPOSED ALGORITHMS.

    Algorithm Accuracy MACs ParametersHand-designed (baseline) 47% — —

    ML-RNN 68% 1.2M 1.2MCNN 81% 6.2M 166K

    obtained by the CNN network suggests that high frequencypatterns in the data, which RNNs are ineffective at modeling,is informative for the sleep staging task. As shown in Table II,both architectures are small in size and require limited pro-cessing resources to run. Their particular execution time willdepend on the SoC that is selected.

    V. CONCLUSION

    We presented results for two prominent DL architecturesused to diagnose and analyze sleep patterns on the CinC2018dataset, and compared them to a hand-crafted algorithm. TheDL architectures outperform by more than 20% the hard-designed baseline which was developed on a database onhealthy subjects. Inter-observer agreement of expert sleepscorers on more reliable and multi-modal sensory data such asEEG and EOG is less than 85% [11], [12]. Considering thatwe used only HRV as input and the pathological nature ofthe dataset used to train the efficient CNN model, an averageaccuracy of 81% on 3-class sleep staging is very promising.Future work includes embedding the networks on a resource-limited processor from the ARM Cortex-M family or a low-power neural network hardware accelerator, such as the GAP8.

    REFERENCES[1] Y. LeCun et al., “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–

    444, 2015.[2] K. He et al., “Deep residual learning for image recognition,” in

    CVPR’16, 2016, pp. 770–778.[3] R. Socher et al., “Deep learning for NLP (without magic),” in Tutorial

    Abstracts of ACL 2012, 2012, p. 5.[4] L. Deng et al., “New types of deep neural network learning for speech

    recognition and related applications: An overview,” in ICASSP’13, 2013,pp. 8599–8603.

    [5] R. Miotto et al., “Deep learning for healthcare: Review, opportunitiesand challenges,” Briefings in bioinformatics, vol. 19, no. 6, pp. 1236–1246, 2017.

    [6] J. Van Zaen et al., “Classification of cardiac arrhythmias from singlelead ECG with a convolutional recurrent neural network,” in BIOSIG-NALS’19, 2019.

    [7] L. Lai et al., “CMSIS-NN: Efficient neural network kernels for ArmCortex-M CPUs,” arXiv preprint arXiv:1801.06601, 2018.

    [8] R. B. Berry et al., “The AASM manual for the scoring of sleep andassociated events,” Rules, Terminology and Technical Specifications,Darien, Illinois, American Academy of Sleep Medicine, 2012.

    [9] P. Renevey et al., “Optical wrist-worn device for sleep monitoring,” inEMBEC’17 & NBC’2017, 2017, pp. 615–618.

    [10] ——, “Respiratory and cardiac monitoring at night using a wristwearable optical system,” in EMBC’18, 2018, pp. 2861–2864.

    [11] R. G. Norman et al., “Interobserver agreement among sleep scorers fromdifferent centers in a large dataset.” Sleep, vol. 23, no. 7, pp. 901–908,2000.

    [12] T. Penzel et al., “Inter-scorer reliability between sleep centers can teachus what to improve in the scoring rules,” Journal of Clinical SleepMedicine, vol. 9, no. 1, pp. 89–91, 2013.

    I IntroductionII DataIII MethodsIII-A Hand-designedIII-B Deep learning

    IV ResultsV ConclusionReferences