
Deep learning in bioinformatics

Seonwoo Min, Byunghan Lee and Sungroh Yoon

Corresponding author: Sungroh Yoon, Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Korea. Tel.: +82-2-880-1401; Fax: +82-2-871-5974; E-mail: sryoon@snu.ac.kr

Abstract

In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.

Key words: deep learning; neural network; machine learning; bioinformatics; omics; biomedical imaging; biomedical signal processing.

Introduction

In the era of 'big data', transformation of large quantities of data into valuable knowledge has become increasingly important in various domains [1], and bioinformatics is no exception. Significant amounts of biomedical data, including omics, image and signal data, have been accumulated, and the resulting potential for applications in biological and healthcare research has caught the attention of both industry and academia. For instance, IBM developed Watson for Oncology, a platform analyzing patients' medical information and assisting clinicians with treatment options [2, 3]. In addition, Google DeepMind, having achieved great success with AlphaGo in the game of Go, recently launched DeepMind Health to develop effective healthcare technologies [4, 5].

To extract knowledge from big data in bioinformatics, machine learning has been a widely used and successful methodology. Machine learning algorithms use training data to uncover underlying patterns, build models, and make predictions based on the best-fit model. Indeed, some well-known algorithms (i.e. support vector machines, random forests, hidden Markov models, Bayesian networks, Gaussian networks) have been applied in genomics, proteomics, systems biology and numerous other domains [6].

The proper performance of conventional machine learning algorithms relies heavily on data representations called features [7]. However, features are typically designed by human engineers with extensive domain expertise, and identifying which features are more appropriate for the given task remains difficult. Deep learning, a branch of machine learning, has recently emerged based on big data, the power of parallel and distributed computing, and sophisticated algorithms. Deep learning has overcome previous limitations, and academic interest has increased rapidly since the early 2000s (Figure 1). Furthermore, deep learning is responsible for major advances in diverse fields

Seonwoo Min is an M.S./Ph.D. candidate at the Department of Electrical and Computer Engineering, Seoul National University, Korea. His research areas include high-performance bioinformatics, machine learning for biomedical big data, and deep learning.

Byunghan Lee is a Ph.D. candidate at the Department of Electrical and Computer Engineering, Seoul National University, Korea. His research areas include high-performance bioinformatics, machine learning for biomedical big data, and data mining.

Sungroh Yoon is an associate professor at the Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea. He received his Ph.D. and postdoctoral training from Stanford University, Stanford, USA. His research interests include machine learning and deep learning for bioinformatics, and high-performance bioinformatics.

Submitted: 20 March 2016; Received (in revised form): 16 June 2016

© The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com


Briefings in Bioinformatics, 2016, 1–19

doi: 10.1093/bib/bbw068

Briefings in Bioinformatics Advance Access published July 29, 2016. Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016.

where the artificial intelligence (AI) community has struggled for many years [8]. One of the most important advancements thus far has been in image and speech recognition [9–15], although promising results have been disseminated in natural language processing [16, 17] and language translation [18, 19]. Certainly, bioinformatics can also benefit from deep learning (Figure 2): splice junctions can be discovered from DNA sequences, finger joints can be recognized from X-ray images, lapses can be detected from electroencephalography (EEG) signals, and so on.

Previous reviews have addressed machine learning in bioinformatics [6, 20] and the fundamentals of deep learning [7, 8, 21]. In addition, although recently published reviews by Leung et al. [22], Mamoshina et al. [23] and Greenspan et al. [24] discussed deep learning applications in bioinformatics research, the former two are limited to applications in genomic medicine, and the latter to medical imaging. In this article, we provide a more comprehensive review of deep learning for bioinformatics, with research examples categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures). The goal of this article is to provide valuable insight and to serve as a starting point to facilitate the application of deep learning in bioinformatics studies. To the best of our knowledge, we are one of the first groups to review deep learning applications in bioinformatics.

Deep learning: a brief overview

Efforts to create AI systems have a long history. Figure 3 illustrates the relationships and high-level schematics of the different disciplines. Early approaches attempted to explicitly program the required knowledge for given tasks; however, these faced difficulties in dealing with complex real-world problems, because designing all the detail required for an AI system to accomplish satisfactory results by hand is such a demanding job [7]. Machine learning provided more viable solutions with the capability to improve through experience and data. Although machine learning can extract patterns from data, there are limitations in raw data processing, which is highly dependent on hand-designed features. To advance from hand-designed to data-driven features, representation learning, particularly deep learning, has shown great promise. Representation learning can discover effective features as well as their mappings from data for given tasks. Furthermore, deep learning can learn complex features by combining simpler features learned from data. In other words, with artificial neural networks of multiple non-linear layers, referred to as deep learning architectures, hierarchical representations of data can be discovered with increasing levels of abstraction [25].

Key elements of deep learning

The successes of deep learning are built on a foundation of significant algorithmic details and generally can be understood in two parts: construction and training of deep learning architectures. Deep learning architectures are basically artificial neural networks of multiple non-linear layers, and several types have been proposed according to input data characteristics and research objectives (Table 1). Here, we categorized deep learning architectures into four groups (i.e. deep neural networks (DNNs) [26–30], convolutional neural networks (CNNs) [31–33], recurrent neural networks (RNNs) [34–37], emergent architectures [38–41]) and explained each group in detail (Table 2). Some papers have used 'DNNs' to encompass all deep learning architectures [7, 8]; however, in this review we use 'DNNs' to refer specifically to the multilayer perceptron (MLP) [26], stacked auto-encoder (SAE) [27, 28] and deep belief networks (DBNs) [29, 30], which use perceptrons [42], auto-encoders (AEs) [43] and restricted Boltzmann machines (RBMs) [44, 45] as the building blocks of neural networks, respectively. CNNs are architectures that have succeeded particularly in image recognition and consist of convolution layers, non-linear layers and pooling layers. RNNs are designed to utilize sequential information of input data, with cyclic connections among building blocks such as perceptrons, long short-term memory units (LSTMs) [36, 37] or gated recurrent units (GRUs) [19]. In addition, many other emergent deep learning architectures have been suggested, such as deep spatio-temporal neural networks (DST-NNs) [38], multi-dimensional recurrent neural networks (MD-RNNs) [39] and convolutional auto-encoders (CAEs) [40, 41].

Figure 1 Approximate number of published deep learning articles by year. The number of articles is based on the search results on http://www.scopus.com with the two queries: 'Deep learning' and 'Deep learning' AND 'bio'.

Figure 2 Application of deep learning in bioinformatics research. (A) Overview diagram with input data and research objectives. (B) A research example in the omics domain: prediction of splice junctions in DNA sequence data with a deep neural network [94]. (C) A research example in biomedical imaging: finger joint detection from X-ray images with a convolutional neural network [145]. (D) A research example in biomedical signal processing: lapse detection from EEG signals with a recurrent neural network [178].

Figure 3 Relationships and high-level schematics of artificial intelligence, machine learning, representation learning and deep learning [7].

The goal of training deep learning architectures is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data. A single cycle of the optimization process is organized as follows [8]. First, given a training dataset, the forward pass sequentially computes the output in each layer and propagates the function signals forward through the network. In the final output layer, an objective loss function measures the error between the inferenced outputs and the given labels. To minimize the training error, the backward pass uses the chain rule to backpropagate error signals and compute gradients with respect to all weights throughout the neural network [46]. Finally, the weight parameters are updated using optimization algorithms based on stochastic gradient descent (SGD) [47]. Whereas batch gradient descent performs parameter updates over each complete dataset, SGD provides stochastic approximations by performing the updates for each small set of data examples. Several optimization algorithms stem from SGD. For example, Adagrad [48] and Adam [49] perform SGD while adaptively modifying learning rates based on the update frequency and moments of the gradients for each parameter, respectively.
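As a concrete illustration, the single optimization cycle described above (forward pass, loss, backward pass, weight update) can be sketched in NumPy for a one-layer logistic-regression network; the Adagrad and Adam updates follow their standard formulas, shown for a single step. All sizes, rates and the toy dataset here are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 8 examples, 4 features, binary labels (arbitrary sizes).
X = rng.normal(size=(8, 4))
y = rng.integers(0, 2, size=(8, 1)).astype(float)

W = rng.normal(scale=0.1, size=(4, 1))  # weight parameters to optimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: propagate function signals to the output layer.
out = sigmoid(X @ W)

# Objective loss function: cross-entropy between inferenced outputs and labels.
loss = -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))

# Backward pass: the chain rule gives the gradient w.r.t. the weights.
grad = X.T @ (out - y) / len(X)

# Plain SGD update.
lr = 0.1
W_sgd = W - lr * grad

# Adagrad: per-parameter rates shrink with accumulated squared gradients.
G = grad ** 2                              # accumulator (one step shown)
W_adagrad = W - lr * grad / (np.sqrt(G) + 1e-8)

# Adam: bias-corrected first and second moment estimates (one step shown).
m = 0.9 * 0 + 0.1 * grad
v = 0.999 * 0 + 0.001 * grad ** 2
m_hat, v_hat = m / (1 - 0.9), v / (1 - 0.999)
W_adam = W - lr * m_hat / (np.sqrt(v_hat) + 1e-8)
```

After the SGD step, the loss recomputed with the updated weights should be lower than before, which is the sense in which each cycle makes the model fit the training data slightly better.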

Another core element in the training of deep learning architectures is regularization, which refers to strategies intended to avoid overfitting and thus achieve good generalization performance. For example, weight decay [50], a well-known conventional approach, adds a penalty term to the objective loss function so that weight parameters converge to smaller absolute values. Currently, the most widely used regularization approach is dropout [51]. Dropout randomly removes hidden units from neural networks during training and can be considered an ensemble of possible subnetworks [52]. To enhance the capabilities of dropout, a new activation function, maxout [53], and a variant of dropout for RNNs called rnnDrop [54] have been proposed. Furthermore, the recently proposed batch normalization [55] provides a new regularization method through normalization of scalar features for each activation within a mini-batch and learning each mean and variance as parameters.
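The two regularizers discussed above can be sketched in a few lines of NumPy. This shows the common "inverted dropout" formulation, in which surviving units are rescaled at training time so that no rescaling is needed at test time; the layer sizes, keep probability and decay coefficient are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)

h = rng.normal(size=(32, 100))   # a batch of hidden-unit activations (toy sizes)
W = rng.normal(size=(100, 10))   # weights of the next layer

# Weight decay: a penalty term added to the objective loss function,
# pushing weight parameters toward smaller absolute values.
lam = 1e-4
l2_penalty = lam * np.sum(W ** 2)

# Inverted dropout: randomly remove hidden units during training,
# rescaling survivors so expected activations match test time.
keep_prob = 0.5
mask = rng.random(h.shape) < keep_prob
h_train = h * mask / keep_prob

# At test time the full network is used, with no mask.
h_test = h
```

Each mini-batch draws a fresh mask, so training effectively averages over an ensemble of subnetworks, as the text notes.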

Deep learning libraries

To actually implement deep learning algorithms, a great deal of attention to algorithmic details is required. Fortunately, many open source deep learning libraries are available online (Table 3). There are still no clear front-runners, and each library has its own strengths [56]. According to benchmark test results of CNNs (specifically, an AlexNet [33] implementation) in Bahrampour et al. [57], the Python-based Neon [58] shows a great advantage in processing speed. The C++-based Caffe [59] and Lua-based Torch [60] offer great advantages in terms of pre-trained models and functional extensionality, respectively. The Python-based Theano [61, 62] provides a low-level library to define and optimize mathematical expressions; moreover, numerous higher-level wrappers, such as Keras [63], Lasagne [64] and Blocks [65], have been developed on top of Theano to provide more intuitive interfaces. Google recently released the C++-based TensorFlow [66] with a Python interface. This library currently shows limited performance but is undergoing continuous improvement, as heterogeneous distributed computing is now supported. In addition, TensorFlow can also take advantage of Keras, which provides an additional model-level interface.

Deep neural networks

The basic structure of DNNs consists of an input layer, multiple hidden layers and an output layer (Figure 4). Once input data are given to the DNN, output values are computed sequentially along the layers of the network. At each layer, the input vector, comprising the output values of each unit in the layer below, is multiplied by the weight vector for each unit in the current layer to produce the weighted sum. Then, a non-linear function, such as a sigmoid, hyperbolic tangent or rectified linear unit (ReLU) [67], is applied to the weighted sum to compute the output values of the layer. The computation in each layer transforms the representations in the layer below into slightly more abstract representations [8]. Based on the types of layers used in DNNs and the corresponding learning method, DNNs can be classified as MLP, SAE or DBN.
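The layer-wise computation just described (weighted sum followed by a non-linearity) can be sketched as follows. The layer widths, the ReLU hidden layers and the sigmoid output are illustrative assumptions, not specifics from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy DNN: 8 input units, two hidden layers of 16 units, 3 output units.
sizes = [8, 16, 16, 3]
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Compute output values sequentially along the layers."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        s = h @ W + b  # weighted sum over the layer below
        # Non-linear function: ReLU for hidden layers, sigmoid at the output.
        h = sigmoid(s) if i == len(weights) - 1 else relu(s)
    return h

y = forward(rng.normal(size=(5, 8)))  # a batch of 5 examples
```

Each pass through a layer transforms the batch into a slightly more abstract representation, ending with sigmoid outputs in (0, 1).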

The MLP has a similar structure to the usual neural networks but includes more stacked layers. It is trained in a purely supervised manner that uses only labeled data. Since the training method is a process of optimization in a high-dimensional parameter space, the MLP is typically used when a large number of labeled data are available [25].

SAE and DBN use AEs and RBMs as building blocks of the architectures, respectively. The main difference between these and the MLP is that training is executed in two phases: unsupervised pre-training and supervised fine-tuning. First, in unsupervised pre-training (Figure 5), the layers are stacked sequentially and trained in a layer-wise manner as an AE or RBM using unlabeled data. Afterwards, in supervised fine-tuning, an output classifier layer is stacked, and the whole neural network is optimized by retraining with labeled data. Since both SAE and DBN exploit unlabeled data and can help avoid overfitting, researchers are able to obtain fairly regularized results even when labeled data are insufficient, as is common in the real world [68].
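The greedy layer-wise pre-training phase can be sketched under illustrative assumptions (sigmoid auto-encoder blocks, squared-error reconstruction, arbitrary layer sizes and learning rate): each layer is trained as an AE on the representations produced by the layers already trained, after which an output classifier would be stacked for supervised fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ae_layer(data, n_hidden, lr=0.5, steps=200):
    """Train one auto-encoder layer by gradient descent; return encoder weights."""
    n_vis = data.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_vis, n_hidden))   # encoder
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_vis))   # decoder
    for _ in range(steps):
        h = sigmoid(data @ W1)      # encode
        r = sigmoid(h @ W2)         # decode (reconstruction)
        err = r - data              # gradient of squared loss w.r.t. r
        # Backpropagate through decoder and encoder (chain rule).
        d_r = err * r * (1 - r)
        d_h = (d_r @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_r / len(data)
        W1 -= lr * data.T @ d_h / len(data)
    return W1

X = rng.random((64, 20))  # unlabeled data (toy)

# Unsupervised pre-training: stack layers, training each one on the
# representations produced by the layer below.
encoders = []
rep = X
for n_hidden in (16, 8):
    W = train_ae_layer(rep, n_hidden)
    encoders.append(W)
    rep = sigmoid(rep @ W)
# `rep` is now the pre-trained representation; supervised fine-tuning would
# stack an output classifier layer and retrain the whole network with labels.
```

A DBN would follow the same greedy schedule with RBMs trained by contrastive divergence in place of the AE gradient step shown here.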

DNNs are renowned for their suitability in analyzing high-dimensional data. Given that bioinformatics data are typically complex and high-dimensional, DNNs have great promise for bioinformatics research. We believe that DNNs, as hierarchical representation learning methods, can discover previously unknown, highly abstract patterns and correlations that provide insight to better understand the nature of the data. However, it has occurred to us that the capabilities of DNNs have not yet been fully exploited. Although the key characteristic of DNNs is that hierarchical features are learned solely from data, human-designed features have often been given as inputs instead of raw data forms. We expect that the future progress of DNNs in bioinformatics will come from investigations into proper ways to encode raw data and learn suitable features from them.

Table 1. Abbreviations in alphabetical order

AE: Auto-encoder
AI: Artificial intelligence
AUC: Area under the receiver operating characteristic curve
AUC-PR: Area under the precision–recall curve
BRNN: Bidirectional recurrent neural network
CAE: Convolutional auto-encoder
CNN: Convolutional neural network
DBN: Deep belief network
DNN: Deep neural network
DST-NN: Deep spatio-temporal neural network
ECG: Electrocardiography
ECoG: Electrocorticography
EEG: Electroencephalography
EMG: Electromyography
EOG: Electrooculography
GRU: Gated recurrent unit
LSTM: Long short-term memory
MD-RNN: Multi-dimensional recurrent neural network
MLP: Multilayer perceptron
MRI: Magnetic resonance imaging
PCA: Principal component analysis
PET: Positron emission tomography
PSSM: Position-specific scoring matrix
RBM: Restricted Boltzmann machine
ReLU: Rectified linear unit
RNN: Recurrent neural network
SAE: Stacked auto-encoder
SGD: Stochastic gradient descent


Convolutional neural networks

CNNs are designed to process multiple data types, especially two-dimensional images, and are directly inspired by the visual cortex of the brain. In the visual cortex, there is a hierarchy of two basic cell types: simple cells and complex cells [69]. Simple cells react to primitive patterns in sub-regions of visual stimuli, and complex cells synthesize the information from simple cells to identify more intricate forms. Since the visual cortex is such a powerful and natural visual processing system, CNNs are applied to imitate three of its key ideas: local connectivity, invariance to location and invariance to local transition [8].

The basic structure of CNNs consists of convolution layers, non-linear layers and pooling layers (Figure 6). To use highly correlated sub-regions of data, groups of local weighted sums, called feature maps, are obtained at each convolution layer by computing convolutions between local patches and weight vectors called filters. Furthermore, since identical patterns can appear regardless of location in the data, filters are applied repeatedly across the entire dataset, which also improves training efficiency by reducing the number of parameters to learn. Then, non-linear layers increase the non-linear properties of the feature maps. At each pooling layer, maximum or average subsampling of non-overlapping regions in the feature maps is performed. This non-overlapping subsampling enables CNNs to handle somewhat different but semantically similar features and thus aggregate local features to identify more complex features.
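The convolution and pooling operations described above can be sketched directly in NumPy for a single 2D feature map; the input size, filter size and pooling window are arbitrary toy choices, and the "convolution" is the cross-correlation conventionally used in deep learning.

```python
import numpy as np

rng = np.random.default_rng(4)

image = rng.normal(size=(8, 8))   # toy single-channel input
filt = rng.normal(size=(3, 3))    # one learned filter (shared weights)

def conv2d(x, w):
    """Valid convolution: a local weighted sum over every patch."""
    fh, fw = w.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same filter is applied at every location.
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w)
    return out

def max_pool(x, size=2):
    """Maximum subsampling of non-overlapping size-by-size regions."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * size:(i + 1) * size,
                          j * size:(j + 1) * size].max()
    return out

feature_map = np.maximum(0.0, conv2d(image, filt))  # convolution + ReLU
pooled = max_pool(feature_map)                       # 6x6 map -> 3x3 map
```

Sharing one 3x3 filter across the whole 8x8 input uses 9 parameters instead of one weight per location, which is the parameter-reduction benefit the text mentions.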

Currently, CNNs are one of the most successful deep learning architectures, owing to their outstanding capacity to analyze spatial information. Thanks to their developments in the field of object recognition, we believe the primary research achievements in bioinformatics will come from the biomedical imaging domain. Despite the different data characteristics between normal and biomedical imaging, CNNs will nonetheless offer straightforward applications compared to other domains. Indeed, CNNs also have great potential in omics and biomedical signal processing. The three key ideas of CNNs can be applied not only in a one-dimensional grid, to discover meaningful recurring patterns with small variance such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Table 2. Categorization of deep learning applied research in bioinformatics

Deep neural networks
Omics: Protein structure [84–87]; Gene expression regulation [93–98]; Protein classification [108]; Anomaly classification [111]
Biomedical imaging: Anomaly classification [122–124]; Segmentation [133]; Recognition [142, 143]; Brain decoding [149, 150]
Biomedical signal processing: Brain decoding [158–163]; Anomaly classification [171–175]

Convolutional neural networks
Omics: Gene expression regulation [99–104]
Biomedical imaging: Anomaly classification [125–132]; Segmentation [134–140]; Recognition [144–147]
Biomedical signal processing: Brain decoding [164–167]; Anomaly classification [176]

Recurrent neural networks
Omics: Protein structure [88–90]; Gene expression regulation [105–107]; Protein classification [109, 110]
Biomedical signal processing: Brain decoding [168]; Anomaly classification [177, 178]

Emergent architectures
Omics: Protein structure [91, 92]
Biomedical imaging: Segmentation [141]
Biomedical signal processing: Brain decoding [169, 170]

Table 3. Comparison of deep learning libraries (core language; speed for batch, ms; multi-GPU; distributed; strengths [56, 57])

Caffe: C++; 6516; multi-GPU: yes; distributed: no; strength: pre-trained models supported
Neon: Python; 3868; multi-GPU: yes; distributed: no; strength: speed
TensorFlow: C++; 9620; multi-GPU: yes; distributed: yes; strength: heterogeneous distributed computing
Theano: Python; 7335; multi-GPU: no; distributed: no; strength: ease of use with higher-level wrappers
Torch: Lua; 5066; multi-GPU: yes; distributed: no; strength: functional extensionality

Notes: Speed for batch is based on the averaged processing times for AlexNet [33] with a batch size of 256 on a single GPU [57]. Caffe, Neon, Theano and Torch were utilized with cuDNN v3, while TensorFlow was utilized with cuDNN v2.

Figure 4 Basic structure of DNNs, with input units x, three hidden units h1, h2 and h3 in each layer, and output units y [26]. At each layer, the weighted sum and non-linear function of its inputs are computed so that hierarchical representations can be obtained.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and the output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are used widely (Figure 8).
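A minimal NumPy sketch of the recurrent computation just described: the state vector h implicitly stores all past inputs, and each output is computed from the current state. The dimensions and the tanh non-linearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

n_in, n_hidden, n_out = 4, 8, 2  # toy dimensions
W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # cyclic connection
W_hy = rng.normal(scale=0.1, size=(n_hidden, n_out))     # hidden -> output

def rnn_forward(xs):
    """Process a sequence; the state vector carries past information."""
    h = np.zeros(n_hidden)
    ys = []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)  # recurrent computation
        ys.append(h @ W_hy)               # output for the current input
    return np.array(ys), h

sequence = rng.normal(size=(10, n_in))    # a length-10 input sequence
outputs, state = rnn_forward(sequence)
```

Unrolling this loop over the 10 time steps yields a 10-layer feed-forward network with tied weights, which is the sense in which RNNs are "even deeper" than they look; a BRNN would run a second pass over the reversed sequence and combine both states at each step.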

Although RNNs do not seem as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure if unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependency among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that the dissemination of dynamic CT and MRI [71, 72] would lead to the incorporation of RNNs with CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73], and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via the input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.

Figure 5 Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6 Basic structure of CNNs, consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple feature maps detecting low-level features, and then the pooling layer combines them into higher-level features.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In the training of AEs, the reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, the convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.
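To make the encoder–decoder idea concrete, below is a minimal NumPy sketch of a plain dense, tied-weight autoencoder trained by minimizing reconstruction error; a CAE would replace the dense encoder with convolution/pooling layers and the decoder with deconvolution/unpooling layers. All dimensions, data and learning-rate values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64 samples of 16-dimensional input (illustrative only)
X = rng.standard_normal((64, 16))

# Tied-weight AE: encoder h = sigmoid(xW + b), decoder x' = hW^T + c
W = 0.1 * rng.standard_normal((16, 8))
b, c = np.zeros(8), np.zeros(16)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 0.05, []
for _ in range(200):
    H = sigmoid(X @ W + b)        # encoder: extract feature vectors
    R = H @ W.T + c               # decoder: recreate the data
    err = R - X                   # reconstruction error
    losses.append(float(np.mean(err ** 2)))
    dZ = (err @ W) * H * (1 - H)  # backprop through the encoder nonlinearity
    gW = (err.T @ H + X.T @ dZ) / len(X)  # tied weights accumulate both gradients
    W -= lr * gW
    b -= lr * dZ.mean(axis=0)
    c -= lr * err.mean(axis=0)

# Reconstruction error should fall over training
print(round(losses[0], 4), round(losses[-1], 4))
```

The same training loop carries over to a CAE once the matrix products are swapped for convolutions and their transposes.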

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures are being proposed but await wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position specific scoring matrices (PSSMs) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties from complex biological data and improve results. Protein contact maps, which present the distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time. The index of each symbol represents the time step. In this way, h_t receives input from x_t and h_(t-1) and then propagates the computed results to y_t and h_(t+1).
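The recurrent computation just described, where h_t is computed from x_t and h_(t-1) and then propagated to y_t, can be sketched by unrolling the loop in NumPy. The unit counts and weight scales below are illustrative, not taken from any cited model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 4 input, 8 hidden, 3 output units
Wxh = 0.1 * rng.standard_normal((4, 8))   # input-to-hidden weights
Whh = 0.1 * rng.standard_normal((8, 8))   # recurrent (cyclic) connection
Why = 0.1 * rng.standard_normal((8, 3))   # hidden-to-output weights

def rnn_forward(xs):
    """Unroll in time: h_t = tanh(x_t Wxh + h_(t-1) Whh); y_t = h_t Why."""
    h, hs, ys = np.zeros(8), [], []
    for x in xs:                           # one iteration per time step t
        h = np.tanh(x @ Wxh + h @ Whh)     # h_t depends on x_t and h_(t-1)
        hs.append(h)
        ys.append(h @ Why)                 # h_t propagates to y_t
    return hs, ys

xs = rng.standard_normal((5, 4))           # a length-5 input sequence
hs, ys = rnn_forward(xs)
print(len(ys), ys[0].shape)                # one output per time step
```

Because the same weights are reused at every step, the network handles sequences of any length, which is what makes RNNs attractive for variable-length biological sequences.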

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit h_t and a backward unit h'_t. The forward unit h_t receives input from x_t and h_(t-1) to reflect past information; the backward unit h'_t receives input from x_t and h'_(t+1) to reflect future information. The information from both hidden units is propagated to y_t.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.


[108–110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21,000 target genes from only 1,000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting different contexts. For example, the (i, j) hidden unit in context 1 receives input from the (i-1, j) and (i, j-1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, these studies have demonstrated the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are learned solely from data instead of hard-coded. Second, the depth of CNNs enables learning more complex patterns: deeper layers can capture longer motifs, integrate the cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
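To illustrate the "motif detector" view of a first convolution layer, the sketch below scans a one-hot-encoded DNA sequence with a single hand-set, PSSM-like filter; in a real CNN many such filters would be learned from data rather than written by hand. The motif, sequence and weight values here are invented for illustration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length x 4) one-hot matrix."""
    m = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        m[i, BASES.index(b)] = 1.0
    return m

def scan(seq, pwm):
    """Slide a filter along the sequence: each window score is a dot
    product, i.e. a one-dimensional convolution, one value per position."""
    x = one_hot(seq)
    w = pwm.shape[0]
    return np.array([np.sum(x[i:i + w] * pwm) for i in range(len(seq) - w + 1)])

# Hypothetical filter that responds to the motif 'TATA':
# matching bases score +1.5, mismatches -0.5
pwm = one_hot("TATA") * 2.0 - 0.5

scores = scan("GGTATACG", pwm)
print(int(np.argmax(scores)))  # the window starting at index 2 ('TATA') scores highest
```

A learned filter plays the same role: positions where its weights are large act like the informative columns of a PSSM, and high activations mark putative motif occurrences.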

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
Input data:
- Sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq)
- Features from genomic sequences: position specific scoring matrix (PSSM), physicochemical properties (e.g. steric parameter, volume), Atchley factors (FAC), 1-dimensional structural properties
- Contact map (distances of amino acid pairs in 3D structure)
- Microarray gene expression
Research avenues:
- Protein structure prediction [84–92]: 1-dimensional structural properties, contact map, structure model quality assessment
- Gene expression regulation [93–107]: splice junction, genetic variants affecting splicing, sequence specificity
- Protein classification [108–110]: super family, subcellular localization
- Anomaly classification [111]: cancer

Biomedical imaging
Input data:
- Magnetic resonance image (MRI), radiographic image, positron emission tomography (PET), histopathology image, volumetric electron microscopy image, retinal image, in situ hybridization (ISH) image
Research avenues:
- Anomaly classification [122–132]: gene expression pattern, cancer, Alzheimer's disease, schizophrenia
- Segmentation [133–141]: cell structure, neuronal structure, vessel map, brain tumor
- Recognition [142–147]: cell nuclei, finger joint, anatomical structure
- Brain decoding [149, 150]: behavior

Biomedical signal processing
Input data:
- ECoG, ECG, EMG, EOG
- EEG (raw, wavelet, frequency, differential entropy)
- Extracted features from EEG: normalized decay, peak variation
Research avenues:
- Brain decoding [158–170]: behavior, emotion
- Anomaly classification [171–178]: Alzheimer's disease, seizure, sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain, with wide application of deep learning to general image-related tasks. Most biomedical images used for the clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high-content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei in histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing the functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition in CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies so far focusing on EEG activity. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input for deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze the data and are expected to produce promising results. Among the studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAEs have been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to using raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than the number of instances from other classes [179]. For example, in clinical or disease-related cases, there is inevitably less data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if the classes are negatively skewed [182].
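The contrast between accuracy and the F-measure can be seen on a toy imbalanced dataset (all numbers invented for illustration):

```python
# 95 negatives, 5 positives: a skewed class distribution
y_true = [0] * 95 + [1] * 5
y_naive = [0] * 100               # a classifier that ignores the minority class

accuracy = sum(t == p for t, p in zip(y_true, y_naive)) / len(y_true)

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(accuracy)                   # 0.95, despite missing every positive
print(f_measure(y_true, y_naive)) # 0.0: the F-measure exposes the failure
```

The majority-class predictor looks excellent under accuracy while having no predictive value for the minority class, which is exactly the situation common in disease-related bioinformatics datasets.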

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSMs from genomic sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in the objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly modify the objective loss function to reflect class imbalance, or implicitly adjust the learning rates according to data instance classes during training.
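One explicit form of such a modified objective is a class-weighted cross-entropy, sketched below in NumPy; the probabilities, labels and weight values are invented for illustration.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each example is weighted by the cost assigned
    to its class, so minority-class errors can be made more expensive."""
    p = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.mean(-class_weights[labels] * np.log(p)))

probs = np.array([[0.9, 0.1],     # predicted class probabilities per example
                  [0.6, 0.4]])    # second example (minority class 1) is misread
labels = np.array([0, 1])

uniform = weighted_cross_entropy(probs, labels, np.array([1.0, 1.0]))
skewed = weighted_cross_entropy(probs, labels, np.array([1.0, 5.0]))
print(skewed > uniform)           # up-weighting class 1 raises its penalty
```

Training against the weighted loss pushes the network to reduce minority-class errors first, which is the intended cost-sensitive behavior.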

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method that exploits unsupervised pre-training with an RNN-based AE and achieved a >25% increase in F-measure compared to the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training combining the ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we


know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class-representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting the nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.
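To illustrate the backpropagation-to-input idea, the sketch below runs gradient ascent in input space for a toy linear stand-in for a trained network, where the input gradient of a class score is analytically just a weight column; a real application would backpropagate the score through a trained CNN instead. All quantities here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 3))          # stand-in for trained network weights

def class_score(x, c):
    """Score of class c for input x; for this linear model the gradient
    of the score with respect to the input is simply W[:, c]."""
    return float(x @ W[:, c])

# Gradient ascent in input space (backpropagation-to-input): start from a
# neutral input and move it toward a class-0-representative input
x = np.zeros(8)
for _ in range(50):
    x += 0.1 * W[:, 0]                   # step along d(score_0)/dx

print(class_score(x, 0) > class_score(np.zeros(8), 0))
```

The optimized x is an input the model finds maximally class-0-like; for an image model the analogous optimization yields a generalized class-representative image, and for a sequence model the input gradients can be rendered as motif-like importance maps.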

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, the mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, is essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters, including the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate, for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
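A random search over hyperparameters, in the spirit of the random search approaches cited above, can be sketched as follows; the search space and the stand-in validation function are invented for illustration, and a real run would train and validate a model at each sampled configuration.

```python
import random

random.seed(0)

def validation_score(lr, n_hidden):
    """Stand-in for train-then-validate with a given configuration;
    a made-up function peaking near lr=0.01, n_hidden=128."""
    return -1e4 * (lr - 0.01) ** 2 - ((n_hidden - 128) / 64) ** 2

def sample_config():
    # Sample the learning rate on a log scale and pick a layer width
    return {"lr": 10 ** random.uniform(-4, -1),
            "n_hidden": random.choice([32, 64, 128, 256])}

trials = [sample_config() for _ in range(50)]
best = max(trials, key=lambda c: validation_score(c["lr"], c["n_hidden"]))
print(best)
```

Unlike a grid, random sampling spends no budget revisiting values of unimportant hyperparameters, which is the main argument made for random search over grid search.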

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but a single image may also come in X-ray, CT, MRI and PET forms.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity of accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. Some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly reduce the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
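As a concrete example of one such optimizer, below is a minimal NumPy sketch of the Adam update rule applied to a toy quadratic. The decay constants follow the commonly cited defaults, while the step size is enlarged for this toy problem; the objective itself is invented for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes from exponential moving
    averages of the gradient (m) and its elementwise square (v)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)            # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(theta) = theta^2, whose gradient is 2*theta
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)

print(abs(theta[0]) < 0.5)               # converges near the minimum at 0
```

Because the effective step for each parameter is normalized by the running gradient magnitude, Adam typically needs far less learning-rate tuning than plain SGD, which is one reason it is so widely employed.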

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this article, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.
11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst, 2015, 577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.
20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.
26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.
28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.
34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.
39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.
40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.
42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.
43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.
47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.
50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.
53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.
54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.
61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.
62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.
63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.
64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.
65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.
66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.
67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.
68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.
75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.
76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.
78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.
83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12(1):103–12.
87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.
88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.
89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.
90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.
91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.
92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.
94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.
95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.
96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.
98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.
99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.
100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.
101. Lanchantin J, Singh R, Lin Z, et al. Deep Motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.
102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.
103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.
104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.
105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.
106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.
107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.
108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.
109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.
110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.
111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.
112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.
113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.
114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.
115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.
116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.
117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.
118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. SPIE, Bellingham, WA, 2009.
119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.
120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.
121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.
122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.
124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.
125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.
126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.
127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.
128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.
129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.
130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.
131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.
132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.
133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.
134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.
135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.
136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.
137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.
140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.
142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.
143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.
144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.
145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.
146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.
147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.
148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.
149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.
150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.
151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.
152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.
153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.
154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.
155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.
156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.
157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.
158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.
161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.
162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.
163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.
164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.
165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.
166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.
167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.
168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.
169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.
170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.
171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.
172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.
173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.
174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.
182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183. López V, Fernández A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.
187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.
188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision–ECCV 2014. Springer, 2014, 818–33.
191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.
192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.
193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.
194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206. Bengio Y, Schwenk H, Senécal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.
210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
211. Simonite T. Thinking in Silicon. MIT Technology Review, 2013.
212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.
213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214. Hof RD. Neuromorphic Chips. MIT Technology Review, 2014.
215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.
217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.
218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.

18 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.
220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.
221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.
223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.

Deep learning in bioinformatics | 19



where the artificial intelligence (AI) community has struggled for many years [8]. One of the most important advancements thus far has been in image and speech recognition [9–15], although promising results have been disseminated in natural language processing [16, 17] and language translation [18, 19]. Certainly, bioinformatics can also benefit from deep learning (Figure 2): splice junctions can be discovered from DNA sequences, finger joints can be recognized from X-ray images, lapses can be detected from electroencephalography (EEG) signals, and so on.

Previous reviews have addressed machine learning in bioinformatics [6, 20] and the fundamentals of deep learning [7, 8, 21]. In addition, although recently published reviews by Leung et al. [22], Mamoshina et al. [23], and Greenspan et al. [24] discussed deep learning applications in bioinformatics research, the former two are limited to applications in genomic medicine, and the latter to medical imaging. In this article, we provide a more comprehensive review of deep learning for bioinformatics and research examples categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures). The goal of this article is to provide valuable insight and to serve as a starting point to facilitate the application of deep learning in bioinformatics studies. To the best of our knowledge, we are one of the first groups to review deep learning applications in bioinformatics.

Deep learning: a brief overview

Efforts to create AI systems have a long history. Figure 3 illustrates the relationships and high-level schematics of the different disciplines. Early approaches attempted to explicitly program the required knowledge for given tasks; however, these faced difficulties in dealing with complex real-world problems, because designing all the detail required for an AI system to accomplish satisfactory results by hand is such a demanding job [7]. Machine learning provided more viable solutions with the capability to improve through experience and data. Although machine learning can extract patterns from data, there are limitations in raw data processing, which is highly dependent on hand-designed features. To advance from hand-designed to data-driven features, representation learning, particularly deep learning, has shown great promise. Representation learning can discover effective features as well as their mappings from data for given tasks. Furthermore, deep learning can learn complex features by combining simpler features learned from data. In other words, with artificial neural networks of multiple non-linear layers, referred to as deep learning architectures, hierarchical representations of data can be discovered with increasing levels of abstraction [25].

Key elements of deep learning

The successes of deep learning are built on a foundation of significant algorithmic details and generally can be understood in two parts: construction and training of deep learning architectures. Deep learning architectures are basically artificial neural networks of multiple non-linear layers, and several types have been proposed according to input data characteristics and research objectives (Table 1). Here, we categorize deep learning architectures into four groups (i.e. deep neural networks (DNNs) [26–30], convolutional neural networks (CNNs) [31–33], recurrent neural networks (RNNs) [34–37], and emergent architectures [38–41]) and explain each group in detail (Table 2). Some papers have used 'DNNs' to encompass all deep learning architectures [7, 8]; however, in this review, we use 'DNNs' to refer specifically to multilayer perceptrons (MLPs) [26], stacked auto-encoders (SAEs) [27, 28], and deep belief networks (DBNs) [29, 30], which use perceptrons [42], auto-encoders (AEs) [43], and restricted Boltzmann machines (RBMs) [44, 45], respectively, as the building blocks of their neural networks. CNNs are architectures that have succeeded particularly in image recognition and consist of convolution layers, non-linear layers, and pooling layers. RNNs are designed to utilize sequential information of input data with cyclic connections among building blocks such as perceptrons, long short-term memory units (LSTMs) [36, 37], or gated recurrent units (GRUs) [19]. In addition, many other emergent deep learning architectures have been suggested, such as deep spatio-temporal neural networks (DST-NNs) [38], multi-dimensional recurrent neural networks (MD-RNNs) [39], and convolutional auto-encoders (CAEs) [40, 41].

The goal of training deep learning architectures is optimization of the weight parameters in each layer, which gradually

Figure 1. Approximate number of published deep learning articles by year. The number of articles is based on the search results on http://www.scopus.com with the two queries: 'Deep learning'; 'Deep learning' AND 'bio'.


combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data. A single cycle of the optimization process is organized as follows [8]. First, given a training dataset, the forward pass sequentially computes the output in each layer and propagates the function signals forward through the network. In the final output layer, an objective loss function measures the error between the inferred outputs and the given labels. To minimize the training error, the backward pass uses the chain rule to backpropagate error signals and compute gradients with respect to all weights throughout the neural network [46]. Finally, the weight parameters are updated using optimization algorithms based on stochastic gradient descent (SGD) [47]. Whereas batch gradient descent performs parameter updates for each

Figure 2. Application of deep learning in bioinformatics research. (A) Overview diagram with input data and research objectives. (B) A research example in the omics domain: prediction of splice junctions in DNA sequence data with a deep neural network [94]. (C) A research example in biomedical imaging: finger joint detection from X-ray images with a convolutional neural network [145]. (D) A research example in biomedical signal processing: lapse detection from EEG signals with a recurrent neural network [178].

Figure 3. Relationships and high-level schematics of artificial intelligence, machine learning, representation learning, and deep learning [7].


complete dataset, SGD provides stochastic approximations by performing the updates for each small set of data examples. Several optimization algorithms stem from SGD. For example, Adagrad [48] and Adam [49] perform SGD while adaptively modifying learning rates based on update frequency and moments of the gradients for each parameter, respectively.
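As a minimal sketch of this training cycle on a single-layer logistic model (the toy dataset, model, and learning rate are all hypothetical, chosen purely for illustration), each mini-batch triggers a forward pass, a backward pass via the chain rule, and an SGD weight update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 64 examples, 4 features, linearly separable labels.
X = rng.normal(size=(64, 4))
true_w = np.array([[1.0], [-2.0], [0.5], [0.0]])
y = (X @ true_w > 0).astype(float)

W = np.zeros((4, 1))  # weight parameters to be optimized
lr = 0.5              # learning rate (illustrative value)
losses = []

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(50):
    for i in range(0, len(X), 16):            # SGD: update per small mini-batch
        xb, yb = X[i:i + 16], y[i:i + 16]
        p = sigmoid(xb @ W)                   # forward pass: propagate signals
        # Objective loss: cross-entropy between inferred outputs and labels.
        loss = -np.mean(yb * np.log(p + 1e-9) + (1 - yb) * np.log(1 - p + 1e-9))
        grad_W = xb.T @ (p - yb) / len(xb)    # backward pass via the chain rule
        W -= lr * grad_W                      # SGD parameter update
        losses.append(loss)
```

With full-batch gradient descent, the inner loop would instead process the complete dataset in a single update per epoch; adaptive variants such as Adagrad or Adam would additionally rescale `grad_W` per parameter.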

Another core element in the training of deep learning architectures is regularization, which refers to strategies intended to avoid overfitting and thus achieve good generalization performance. For example, weight decay [50], a well-known conventional approach, adds a penalty term to the objective loss function so that weight parameters converge to smaller absolute values. Currently, the most widely used regularization approach is dropout [51]. Dropout randomly removes hidden units from neural networks during training and can thus be considered an ensemble of possible subnetworks [52]. To enhance the capabilities of dropout, a new activation function, maxout [53], and a variant of dropout for RNNs, called rnnDrop [54], have been proposed. Furthermore, recently proposed batch normalization [55] provides a new regularization method through normalization of scalar features for each activation within a mini-batch and learning of each mean and variance as parameters.
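The random removal of hidden units can be sketched as follows; this is the common "inverted dropout" formulation with a hypothetical retention probability, a generic illustration rather than the exact scheme of [51]:

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.5  # hypothetical probability of retaining each hidden unit

def dropout_forward(h, train=True):
    """Inverted dropout applied to a matrix of hidden activations h."""
    if not train:
        return h  # at test time the full network is used, no mask applied
    mask = rng.random(h.shape) < keep_prob  # randomly remove hidden units
    return h * mask / keep_prob  # rescale so expected activation is unchanged

h = np.ones((2, 6))                         # toy hidden activations
h_train = dropout_forward(h, train=True)    # entries are 0 (dropped) or 2.0
h_test = dropout_forward(h, train=False)    # unchanged at test time
```

Each training step samples a different mask, so training effectively averages over an ensemble of subnetworks, which is the intuition given in [52].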

Deep learning libraries

To actually implement deep learning algorithms, a great deal of attention to algorithmic details is required. Fortunately, many open source deep learning libraries are available online (Table 3). There are still no clear front-runners, and each library has its own strengths [56]. According to the CNN benchmark results (specifically, for an AlexNet [33] implementation) in Bahrampour et al. [57], the Python-based Neon [58] shows a great advantage in processing speed. The C++-based Caffe [59] and Lua-based Torch [60] offer great advantages in terms of pre-trained models and functional extensionality, respectively. The Python-based Theano [61, 62] provides a low-level library to define and optimize mathematical expressions; moreover, numerous higher-level wrappers, such as Keras [63], Lasagne [64], and Blocks [65], have been developed on top of Theano to provide more intuitive interfaces. Google recently released the C++-based TensorFlow [66] with a Python interface. This library currently shows limited performance but is undergoing continuous improvement, as heterogeneous distributed computing is now supported. In addition, TensorFlow can also take advantage of Keras, which provides an additional model-level interface.

Deep neural networks

The basic structure of DNNs consists of an input layer, multiple hidden layers, and an output layer (Figure 4). Once input data are given to the DNNs, output values are computed sequentially along the layers of the network. At each layer, the input vector, comprising the output values of each unit in the layer below, is multiplied by the weight vector for each unit in the current layer to produce the weighted sum. Then, a non-linear function, such as a sigmoid, hyperbolic tangent, or rectified linear unit (ReLU) [67], is applied to the weighted sum to compute the output values of the layer. The computation in each layer transforms the representations in the layer below into slightly more abstract representations [8]. Based on the types of layers used in DNNs and the corresponding learning method, DNNs can be classified as MLP, SAE, or DBN.
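The layer-wise computation (weighted sum followed by a non-linearity) can be sketched as follows; the layer sizes, random weights, and the choice of ReLU are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectified linear unit, one common choice of non-linear function
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Sequentially compute each layer's output along the network."""
    h = x
    for W, b in zip(weights, biases):
        h = relu(h @ W + b)  # weighted sum of the layer below, then non-linearity
    return h

# Hypothetical sizes: 10 input units, two hidden layers of 5 and 3 units.
sizes = [10, 5, 3]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

out = forward(rng.normal(size=(4, 10)), weights, biases)  # 4 input vectors
```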

MLP has a similar structure to the usual neural networks but includes more stacked layers. It is trained in a purely supervised manner that uses only labeled data. Since the training method is a process of optimization in a high-dimensional parameter space, MLP is typically used when a large number of labeled data are available [25].

SAE and DBN use AEs and RBMs as building blocks of the architectures, respectively. The main difference between these and MLP is that training is executed in two phases: unsupervised pre-training and supervised fine-tuning. First, in unsupervised pre-training (Figure 5), the layers are stacked sequentially and trained in a layer-wise manner as an AE or RBM using unlabeled data. Afterwards, in supervised fine-tuning, an output classifier layer is stacked, and the whole neural network is optimized by retraining with labeled data. Since both SAE and DBN exploit unlabeled data and can help avoid overfitting, researchers are able to obtain fairly regularized results even when labeled data are insufficient, as is common in the real world [68].
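A simplified sketch of the greedy layer-wise pre-training idea, using tied-weight auto-encoders trained by gradient descent on reconstruction error (a minimal stand-in for full RBM/AE training; all sizes, rates, and step counts are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(H, n_hidden, lr=0.1, steps=200):
    """Train one tied-weight auto-encoder layer on representations H."""
    W = rng.normal(scale=0.1, size=(H.shape[1], n_hidden))
    errs = []
    for _ in range(steps):
        Z = sigmoid(H @ W)            # encode: feature vectors from input
        R = Z @ W.T                   # decode: recreate data (linear, tied W)
        err = R - H                   # reconstruction error to minimize
        errs.append(np.mean(err ** 2))
        dZ = (err @ W) * Z * (1 - Z)  # chain rule back through the encoder
        gW = (H.T @ dZ + err.T @ Z) / len(H)
        W -= lr * gW
    return W, errs

X = rng.normal(size=(32, 8))        # unlabeled data (hypothetical)
W1, errs1 = pretrain_layer(X, 6)    # train the first hidden layer
H1 = sigmoid(X @ W1)                # obtained representations in h1
W2, errs2 = pretrain_layer(H1, 4)   # stack and train the second layer on h1
```

After stacking the desired number of layers this way, supervised fine-tuning would add an output classifier layer and retrain the whole network with labeled data.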

DNNs are renowned for their suitability in analyzing high-dimensional data. Given that bioinformatics data are typically complex and high-dimensional, DNNs have great promise for bioinformatics research. We believe that DNNs, as hierarchical representation learning methods, can discover previously unknown, highly abstract patterns and correlations that provide insight into the nature of the data. However, the capabilities of DNNs have not yet been fully exploited. Although the key characteristic of DNNs is that hierarchical features are learned solely from data, human-designed features have often been given as inputs instead of raw data forms. We expect that future progress of DNNs in bioinformatics will come from investigations into proper ways to encode raw data and learn suitable features from them.

Table 1. Abbreviations in alphabetical order

AE: Auto-encoder
AI: Artificial intelligence
AUC: Area under the receiver operating characteristic curve
AUC-PR: Area under the precision–recall curve
BRNN: Bidirectional recurrent neural network
CAE: Convolutional auto-encoder
CNN: Convolutional neural network
DBN: Deep belief network
DNN: Deep neural network
DST-NN: Deep spatio-temporal neural network
ECG: Electrocardiography
ECoG: Electrocorticography
EEG: Electroencephalography
EMG: Electromyography
EOG: Electrooculography
GRU: Gated recurrent unit
LSTM: Long short-term memory
MD-RNN: Multi-dimensional recurrent neural network
MLP: Multilayer perceptron
MRI: Magnetic resonance image
PCA: Principal component analysis
PET: Positron emission tomography
PSSM: Position-specific scoring matrix
RBM: Restricted Boltzmann machine
ReLU: Rectified linear unit
RNN: Recurrent neural network
SAE: Stacked auto-encoder
SGD: Stochastic gradient descent


Convolutional neural networks

CNNs are designed to process multiple data types, especially two-dimensional images, and are directly inspired by the visual cortex of the brain. In the visual cortex, there is a hierarchy of two basic cell types: simple cells and complex cells [69]. Simple cells react to primitive patterns in sub-regions of visual stimuli, and complex cells synthesize the information from simple cells to identify more intricate forms. Since the visual cortex is such a powerful and natural visual processing system, CNNs imitate three of its key ideas: local connectivity, invariance to location, and invariance to local transition [8].

The basic structure of CNNs consists of convolution layers, non-linear layers, and pooling layers (Figure 6). To exploit highly correlated sub-regions of data, groups of local weighted sums, called feature maps, are obtained at each convolution layer by computing convolutions between local patches and weight vectors called filters. Furthermore, since identical patterns can appear regardless of location in the data, filters are applied repeatedly across the entire dataset, which also improves training efficiency by reducing the number of parameters to learn. Then, non-linear layers increase the non-linear properties of the feature maps. At each pooling layer, maximum or average subsampling of non-overlapping regions in feature maps is performed. This non-overlapping subsampling enables CNNs to handle slightly different but semantically similar features and thus aggregate local features to identify more complex features.
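The convolution, non-linearity, and pooling pipeline can be sketched on a one-dimensional input as follows; the toy input, the single hand-fixed edge-detecting filter, and the pooling size are purely illustrative (in a real CNN the filters are learned):

```python
import numpy as np

def conv1d_valid(x, f):
    """Feature map: local weighted sums between patches of x and filter f."""
    n = len(x) - len(f) + 1
    return np.array([np.dot(x[i:i + len(f)], f) for i in range(n)])

def relu(z):
    return np.maximum(z, 0.0)  # non-linear layer

def max_pool(m, size=2):
    """Maximum subsampling over non-overlapping regions of a feature map."""
    m = m[: len(m) // size * size]
    return m.reshape(-1, size).max(axis=1)

x = np.array([0., 1., 0., 0., 1., 1., 0., 0., 1.])  # toy 1D input
f = np.array([1., -1.])  # one filter (fixed by hand here for illustration)

# The same filter slides across the whole input (shared parameters),
# then ReLU and non-overlapping max pooling aggregate local features.
fmap = max_pool(relu(conv1d_valid(x, f)))  # -> [1. 0. 1. 0.]
```

The same three ideas extend directly to one-dimensional genomic sequences (motif detection) and to two-dimensional grids such as images or time–frequency matrices.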

Currently, CNNs are one of the most successful deep learning architectures, owing to their outstanding capacity to analyze spatial information. Thanks to their development in the field of object recognition, we believe the primary research achievements in bioinformatics will come from the biomedical imaging domain. Despite the differing data characteristics between natural and biomedical imaging, CNNs nonetheless offer more straightforward applications there than in other domains. Indeed, CNNs also have great potential in omics and biomedical

Table 2. Categorization of deep learning applied research in bioinformatics

Deep neural networks
  Omics: protein structure [84–87]; gene expression regulation [93–98]; protein classification [108]; anomaly classification [111]
  Biomedical imaging: anomaly classification [122–124]; segmentation [133]; recognition [142, 143]; brain decoding [149, 150]
  Biomedical signal processing: brain decoding [158–163]; anomaly classification [171–175]

Convolutional neural networks
  Omics: gene expression regulation [99–104]
  Biomedical imaging: anomaly classification [125–132]; segmentation [134–140]; recognition [144–147]
  Biomedical signal processing: brain decoding [164–167]; anomaly classification [176]

Recurrent neural networks
  Omics: protein structure [88–90]; gene expression regulation [105–107]; protein classification [109, 110]
  Biomedical signal processing: brain decoding [168]; anomaly classification [177, 178]

Emergent architectures
  Omics: protein structure [91, 92]
  Biomedical imaging: segmentation [141]
  Biomedical signal processing: brain decoding [169, 170]

Table 3. Comparison of deep learning libraries

Library     Core    Speed for batch (ms)   Multi-GPU   Distributed   Strengths [56, 57]
Caffe       C++     651.6                  O           X             Pre-trained models supported
Neon        Python  386.8                  O           X             Speed
TensorFlow  C++     962.0                  O           O             Heterogeneous distributed computing
Theano      Python  733.5                  X           X             Ease of use with higher-level wrappers
Torch       Lua     506.6                  O           X             Functional extensionality

Notes: Speed for batch is based on the averaged processing times for AlexNet [33] with a batch size of 256 on a single GPU [57]. Caffe, Neon, Theano, and Torch were utilized with cuDNN v3, while TensorFlow was utilized with cuDNN v2.

Figure 4. Basic structure of DNNs with input units x, hidden units h1, h2, and h3 in each of three layers, and output units y [26]. At each layer, the weighted sum and non-linear function of its inputs are computed so that hierarchical representations can be obtained.


signal processing. The three key ideas of CNNs can be applied not only on a one-dimensional grid, to discover meaningful recurring patterns with small variance, such as genomic sequence motifs, but also on two-dimensional grids, such as interactions within omics data and time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units, where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and the output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are widely used (Figure 8).
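A minimal sketch of this recurrent computation (all weight matrices, sizes, and the tanh non-linearity are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features, 5 state-vector units, 2 outputs.
Wxh = rng.normal(scale=0.5, size=(3, 5))  # input -> hidden
Whh = rng.normal(scale=0.5, size=(5, 5))  # cyclic (recurrent) connection
Why = rng.normal(scale=0.5, size=(5, 2))  # hidden -> output

def rnn_forward(xs):
    """Process inputs sequentially; the state vector h stores past information."""
    h = np.zeros(5)
    ys = []
    for x in xs:  # the network unrolled in time
        h = np.tanh(x @ Wxh + h @ Whh)  # h_t depends on x_t and h_(t-1)
        ys.append(h @ Why)              # output reflects all previous inputs
    return np.array(ys), h

xs = rng.normal(size=(7, 3))  # a length-7 input sequence (toy data)
ys, h_last = rnn_forward(xs)
```

A BRNN would run a second pass over the sequence in reverse and combine both state vectors at each time step; LSTM or GRU units would replace the simple tanh update with gated memory cells.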

Although RNNs may not seem as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure when unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependencies among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential, and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or to a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that the dissemination of dynamic CT and MRI [71, 72] will lead to the integration of RNNs with CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73], and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs, and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs, and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature

Figure 5. Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6. Basic structure of CNNs consisting of a convolution layer, a non-linear layer, and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple feature maps detecting low-level features, and then the pooling layer combines them into higher-level features.


compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered as they progress to the upper layers. Except for the first layer, only the adjacent hidden units of the same coordinate in the layer below are used to compute each hidden unit in the current layer, so that local correlations are reflected progressively.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions of one-dimensional data, MD-RNNs use contexts in all possible directions of the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In the training of AEs, reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures has been proposed, many of which await wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of both local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome, and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA, and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position-specific scoring matrices (PSSMs) [78], physicochemical properties [79, 80], Atchley factors [81], and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate the difficulties arising from complex biological data and to improve results. Furthermore, protein contact maps, which present the distances between amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h, and an output unit y [8]. A cyclic connection exists so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNN is unrolled in time. The index of each symbol represents the time step. In this way, ht receives input from xt and ht−1, and then propagates the computed results to yt and ht+1.

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit, which receives input from xt and the forward unit at time t−1 to reflect past information, and a backward unit, which receives input from xt and the backward unit at time t+1 to reflect future information. The information from both hidden units is propagated to yt.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.

Deep learning in bioinformatics | 7

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

[108–110], including super-family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angles and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene

expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data, along with a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21,000 target genes from only 1,000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting a different context. For example, the (i, j) hidden unit in context 1 receives input from the (i−1, j) and (i, j−1) hidden units in context 1 and from the (i, j) unit of the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, these studies have demonstrated the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which the PSSMs are learned solely from data instead of being hard-coded. Second, the depth of CNNs enables learning of more complex patterns: deeper layers can capture longer motifs, integrate the cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are well suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164-cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profiles) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.
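The "convolution layer as motif detector" idea can be sketched in a few lines of NumPy. The `one_hot` encoding and the hand-set TATA filter below are illustrative assumptions (in a real CNN, such as DeepBind, the filter weights are learned from data):

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (4, len) one-hot matrix."""
    x = np.zeros((4, len(seq)))
    for i, ch in enumerate(seq):
        x[ALPHABET.index(ch), i] = 1.0
    return x

def motif_scan(x, motif_filter):
    """Valid 1D convolution of a (4, L) one-hot sequence with a (4, w)
    filter: the core operation of a CNN's first layer."""
    w = motif_filter.shape[1]
    return np.array([np.sum(x[:, i:i + w] * motif_filter)
                     for i in range(x.shape[1] - w + 1)])

# A filter that (by assumption) has already learned the motif "TATA":
filt = one_hot("TATA")
activations = motif_scan(one_hot("GGTATAGG"), filt)
print(int(activations.argmax()))  # strongest activation at offset 2
```

Each filter produces an activation track over the sequence; high activations mark candidate motif occurrences, which is why a trained first-layer filter can be read back as a PSSM.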

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data:
    Sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq)
    Features from genomic sequences: position-specific scoring matrix (PSSM), physicochemical properties (steric parameter, volume), Atchley factors (FAC), 1-dimensional structural properties
    Contact map (distance of amino acid pairs in 3D structure)
    Microarray gene expression
  Research avenues:
    Protein structure prediction [84–92]: 1-dimensional structural properties, contact map, structure model quality assessment
    Gene expression regulation [93–107]: splice junction, genetic variants affecting splicing, sequence specificity
    Protein classification [108–110]: super family, subcellular localization
    Anomaly classification [111]: cancer

Biomedical imaging
  Input data:
    Magnetic resonance image (MRI), radiographic image, positron emission tomography (PET), histopathology image, volumetric electron microscopy image, retinal image, in situ hybridization (ISH) image
  Research avenues:
    Anomaly classification [122–132]: gene expression pattern, cancer, Alzheimer's disease, schizophrenia
    Segmentation [133–141]: cell structure, neuronal structure, vessel map, brain tumor
    Recognition [142–147]: cell nuclei, finger joint, anatomical structure
    Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data:
    ECoG, ECG, EMG, EOG
    EEG (raw, wavelet, frequency, differential entropy)
    Extracted features from EEG: normalized decay, peak variation
  Research avenues:
    Brain decoding [158–170]: behavior, emotion
    Anomaly classification [171–178]: Alzheimer's disease, seizure, sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units for protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.
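The recurrence that lets RNNs consume variable-length sequences (cf. Figure 7) can be sketched as a plain forward pass. The weights below are random placeholders, not a trained model; the dimensions are arbitrary assumptions:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Unrolled vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}),
    y_t = W_hy h_t, matching the recurrence of Figure 7."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        ys.append(W_hy @ h)
    return ys, h

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 2   # e.g. one-hot nucleotides in, 2 classes out
xs = [rng.standard_normal(d_in) for _ in range(10)]
ys, h_final = rnn_forward(xs,
                          rng.standard_normal((d_h, d_in)),
                          rng.standard_normal((d_h, d_h)) * 0.1,
                          rng.standard_normal((d_out, d_h)))
print(len(ys), h_final.shape)  # one output per time step; fixed-size state
```

The same loop handles a sequence of any length, and the final hidden state summarizes the whole input, which is what makes RNNs attractive for biological sequences; LSTM units replace the tanh update with gated memory cells but keep this overall structure.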

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain, with wide application of deep learning to general image-related tasks. Most biomedical images used for the clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of the popular high-content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei in histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition among CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies so far focusing on EEG activity. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before being used as input to deep learning algorithms. In addition, human-designed features such as normalized decay and peak variation are used in some studies to improve results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals, and anomaly classification [171–178] to diagnose diseases.
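The frequency decomposition mentioned above can be illustrated with a simple FFT band-power feature, a common generic preprocessing step. The 10 Hz synthetic signal and the band boundaries below are assumptions for demonstration:

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean spectral power of `signal` within a frequency band (Hz), a
    typical feature computed before feeding EEG to deep models."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return power[mask].mean()

fs = 256                              # sampling rate in Hz
t = np.arange(fs * 4) / fs            # 4 s of synthetic "EEG"
eeg = np.sin(2 * np.pi * 10 * t)      # a pure 10 Hz alpha-band rhythm
alpha = band_power(eeg, fs, (8, 13))
delta = band_power(eeg, fs, (0.5, 4))
print(alpha > delta)  # True: the energy concentrates in the alpha band
```

Stacking such band powers across channels yields the kind of fixed-length frequency-component vector that several of the cited DBN and SAE studies use as input.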

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze them and are expected to produce promising results. To present some of the studies in brain decoding [168] and anomaly classification [177, 178]: Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures, and Davidson et al. [178] used LSTM RNNs on EEG log-power spectral features to detect lapses.

Emergent architectures

CAEs have been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to using raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than the number of instances from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if the classes are negatively skewed [182].
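A small example makes the accuracy-versus-F-measure contrast concrete. With a 95:5 class split (an assumed toy dataset), a degenerate classifier that always predicts the majority class looks excellent by accuracy but scores zero by F-measure:

```python
import numpy as np

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# 95:5 imbalance; a classifier that always predicts the majority class:
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)
print((y_true == y_pred).mean())   # accuracy 0.95 looks excellent
print(f_measure(y_true, y_pred))   # F-measure 0.0 reveals the failure
```

This is exactly the failure mode that AUC-PR, unlike accuracy, is designed to expose on skewed classes.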

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] enriched CT image datasets through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSMs from genomic sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
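The simplest member of the sampling family is random oversampling of minority classes. The sketch below (a generic illustration, not any of the cited methods) duplicates minority examples until all classes match the majority count:

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Randomly duplicate examples of each class until all classes are
    as large as the majority class (random oversampling)."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = oversample_minority(X, y, rng)
print(np.bincount(yb))  # [90 90]: both classes now equally represented
```

SMOTE [185] refines this by interpolating between minority neighbors instead of duplicating them, which reduces overfitting to repeated examples.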

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to address the limited and imbalanced data problem. Cost sensitivity can be applied to the objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance, or implicitly modify the learning rates according to the classes of data instances during training.
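An explicit form of cost sensitivity is a class-weighted cross-entropy loss. In the sketch below, the inverse-frequency weighting is a common heuristic assumed for illustration, not a prescription from the cited works:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each example's loss is scaled by the weight
    of its class, so misclassifying the rare class costs more."""
    per_example = -np.log(probs[np.arange(len(labels)), labels])
    return np.mean(class_weights[labels] * per_example)

# Weight classes inversely to their frequency:
labels = np.array([0, 0, 0, 1])
freq = np.bincount(labels) / len(labels)
weights = 1.0 / freq                  # the rare class is weighted 4x
probs = np.array([[0.9, 0.1]] * 3 + [[0.6, 0.4]])
print(round(weighted_cross_entropy(probs, labels, weights), 3))
```

Because the weights multiply the per-example losses, the gradient for rare-class mistakes is amplified by the same factor, which is the explicit counterpart of modifying per-class learning rates.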

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from a similar but different domain and fine-tuning with the real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method that exploits unsupervised pre-training with an RNN-based AE and achieved a >25% increase in F-measure compared to existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we


know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in its early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. For image input, a deconvolutional network has been proposed to reconstruct and visualize the hierarchical representations for a specific input of a CNN [190]. In addition, to visualize a generalized class-representative image rather than depending on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of a DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by, respectively, counting nucleotide frequencies of positive input subsequences with high activation values, and applying backpropagation-to-input to each feature map.
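The essence of backpropagation-to-input is the gradient of a model's output score with respect to its input. The toy sketch below approximates that gradient by finite differences on an assumed hand-built scorer (a real implementation would backpropagate through a trained network instead):

```python
import numpy as np

def saliency(score_fn, x, eps=1e-4):
    """Backpropagation-to-input approximated by central finite
    differences: how much does each input coordinate move the score?"""
    grad = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[i] += eps
        xm.flat[i] -= eps
        grad.flat[i] = (score_fn(xp) - score_fn(xm)) / (2 * eps)
    return grad

# A toy "trained model" (assumption) whose score depends only on x[1]:
w = np.array([0.0, 2.0, 0.0])
score = lambda x: np.tanh(w @ x)
sal = saliency(score, np.array([0.3, 0.3, 0.3]))
print(int(np.abs(sal).argmax()))  # 1: the input the model attends to
```

Rendering |gradient| over an image or sequence produces the saliency-style visualizations described above; iterating gradient *ascent* on the input instead yields a class-representative input.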

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, the mutation map, for illustrating the effects of genetic variants on the binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease in binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot showing the maximum increases as well as the maximum decreases in prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
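The heat-map component of a mutation map can be computed by exhaustively rescoring every single-nucleotide substitution. In this sketch, the scorer is a toy motif counter standing in for a trained CNN (an assumption for illustration):

```python
import numpy as np

ALPHABET = "ACGT"

def mutation_map(score_fn, seq):
    """For each position and each possible substitution, record the
    change in the model's score: the matrix behind a mutation map."""
    base = score_fn(seq)
    heat = np.zeros((4, len(seq)))
    for pos in range(len(seq)):
        for k, ch in enumerate(ALPHABET):
            mutant = seq[:pos] + ch + seq[pos + 1:]
            heat[k, pos] = score_fn(mutant) - base
    return heat

# A stand-in scorer (assumption): counts matches to the motif "TATA".
score = lambda s: sum(s[i:i + 4] == "TATA" for i in range(len(s) - 3))
heat = mutation_map(score, "GGTATAGG")
print(heat.min())  # -1.0: some mutations abolish the single TATA site
```

Plotting `heat` with bases as rows and positions as columns gives the heat map; taking the column-wise maximum decrease gives the scaled sequence logo described above.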

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to the characteristics of the input data and the research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters, such as the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate, for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of

machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
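A random search of the kind proposed in [199] takes only a few lines: sample each hyperparameter independently and keep the best-scoring configuration. The search ranges and the toy objective below are illustrative assumptions; in practice the objective would be validation performance of a trained model:

```python
import numpy as np

def random_search(objective, n_trials, rng):
    """Random hyperparameter search: sample configurations
    independently and keep the best-scoring one."""
    best = None
    for _ in range(n_trials):
        cfg = {
            "lr": 10 ** rng.uniform(-5, -1),        # log-uniform rate
            "n_hidden": int(rng.integers(32, 513)),
            "n_layers": int(rng.integers(1, 5)),
        }
        score = objective(cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best

# Toy objective (assumption): prefers lr near 1e-3 and deeper networks.
obj = lambda c: -abs(np.log10(c["lr"]) + 3) + 0.1 * c["n_layers"]
score, cfg = random_search(obj, 50, np.random.default_rng(0))
print(1e-5 <= cfg["lr"] <= 1e-1)  # True: sampled within the search range
```

Sampling the learning rate log-uniformly matters: performance is usually sensitive to its order of magnitude rather than its absolute value, and random search covers each axis far more densely than a grid of the same budget.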

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but X-ray, CT, MRI and PET forms are also available for a single image.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signals and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
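The scale-out idea can be illustrated by simulating synchronous data-parallel SGD, in which each worker computes a gradient on its own data shard and the averaged gradient updates a single shared copy of the weights. This is a simplified sketch on a least-squares problem, not any particular framework's implementation:

```python
import numpy as np

def data_parallel_sgd(w, X_shards, y_shards, lr, steps):
    """Synchronous data-parallel SGD for least squares: per-shard
    gradients are averaged into one update of the shared weights."""
    for _ in range(steps):
        grads = [2 * Xs.T @ (Xs @ w - ys) / len(ys)
                 for Xs, ys in zip(X_shards, y_shards)]
        w = w - lr * np.mean(grads, axis=0)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
# Split the data across 4 simulated workers:
w = data_parallel_sgd(np.zeros(3), np.split(X, 4), np.split(y, 4),
                      lr=0.1, steps=200)
print(np.allclose(w, w_true, atol=1e-2))  # True: recovers w_true
```

With equal shard sizes, the averaged per-shard gradient equals the full-batch gradient, so the parallel update follows the same trajectory as single-machine SGD while the gradient computation is distributed; in real systems the averaging step is the communication bottleneck that asynchronous variants try to hide.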

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results on tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving adversarial generative networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive survey of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions for future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparation regarding the issues discussed herein is key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for the application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.

4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.

6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

Deep learning in bioinformatics | 13

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst 2015;577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks, 1989. IJCNN, 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.

55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based Deep Learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to protein structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.

97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PloS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal component analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. In: SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PloS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V-V-7.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.

138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE, 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition, 2008. ICPR 2008, 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.

175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009. IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal, 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org, 2015.

211. Simonite T. Thinking in Silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic Chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.

18 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016




combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data. A single cycle of the optimization process is organized as follows [8]. First, given a training dataset, the forward pass sequentially computes the output in each layer and propagates the function signals forward through the network. In the final output layer, an objective loss function measures the error between the inferred outputs and the given labels. To minimize the training error, the backward pass uses the chain rule to backpropagate error signals and compute gradients with respect to all weights throughout the neural network [46]. Finally, the weight parameters are updated using optimization algorithms based on stochastic gradient descent (SGD) [47]. Whereas batch gradient descent performs parameter updates for each

Figure 2 Application of deep learning in bioinformatics research. (A) Overview diagram with input data and research objectives. (B) A research example in the omics domain: prediction of splice junctions in DNA sequence data with a deep neural network [94]. (C) A research example in biomedical imaging: finger joint detection from X-ray images with a convolutional neural network [145]. (D) A research example in biomedical signal processing: lapse detection from EEG signals with a recurrent neural network [178].

Figure 3 Relationships and high-level schematics of artificial intelligence, machine learning, representation learning and deep learning [7].


complete dataset, SGD provides stochastic approximations by performing the updates for each small set of data examples. Several optimization algorithms stem from SGD. For example, Adagrad [48] and Adam [49] perform SGD while adaptively modifying learning rates based on update frequency and moments of the gradients for each parameter, respectively.
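For illustration, the two update rules mentioned above can be sketched in NumPy as follows (a minimal sketch with our own function names; hyperparameter defaults follow common conventions, not any specific study):

```python
import numpy as np

def sgd_update(w, grad, lr=0.01):
    """Vanilla SGD: move the parameters a small step against the gradient."""
    return w - lr * grad

def adam_update(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: adapt each parameter's step size using running estimates of the
    first moment (m) and second moment (v) of its gradients."""
    m = b1 * m + (1 - b1) * grad           # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)              # correct bias from zero initialization
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Applied repeatedly to the gradient of a loss, either rule drives the parameters toward a minimum; Adam differs only in maintaining the per-parameter moment estimates between calls.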

Another core element in the training of deep learning architectures is regularization, which refers to strategies intended to avoid overfitting and thus achieve good generalization performance. For example, weight decay [50], a well-known conventional approach, adds a penalty term to the objective loss function so that weight parameters converge to smaller absolute values. Currently, the most widely used regularization approach is dropout [51]. Dropout randomly removes hidden units from neural networks during training and can be considered an ensemble of possible subnetworks [52]. To enhance the capabilities of dropout, a new activation function, maxout [53], and a variant of dropout for RNNs called rnnDrop [54] have been proposed. Furthermore, recently proposed batch normalization [55] provides a new regularization method through normalization of scalar features for each activation within a mini-batch and learning each mean and variance as parameters.
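As a sketch of the two ideas above, weight decay and dropout can be written in a few lines of NumPy (illustrative only; the "inverted dropout" scaling shown here is one common formulation, and the function names are ours):

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """Weight decay: a penalty added to the loss so weights shrink toward zero."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(h, p_drop=0.5, rng=None):
    """Inverted dropout: randomly zero hidden units during training and scale
    the survivors, so no rescaling is needed at test time."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(h.shape) >= p_drop   # keep each unit with prob 1 - p_drop
    return h * mask / (1.0 - p_drop)
```

Each forward pass during training draws a fresh mask, which is what makes the procedure behave like an ensemble of subnetworks.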

Deep learning libraries

To actually implement deep learning algorithms, a great deal of attention to algorithmic details is required. Fortunately, many open source deep learning libraries are available online (Table 3). There are still no clear front-runners, and each library has its own strengths [56]. According to benchmark test results of CNNs, specifically the AlexNet [33] implementation in Bahrampour et al. [57], Python-based Neon [58] shows a great advantage in processing speed. C++-based Caffe [59] and Lua-based Torch [60] offer great advantages in terms of pre-trained models and functional extensionality, respectively. Python-based Theano [61, 62] provides a low-level library to define and optimize mathematical expressions; moreover, numerous higher-level wrappers, such as Keras [63], Lasagne [64] and Blocks [65], have been developed on top of Theano to provide more intuitive interfaces. Google recently released the C++-based TensorFlow [66] with a Python interface. This library currently shows limited performance but is undergoing continuous improvement, as heterogeneous distributed computing is now supported. In addition, TensorFlow can also take advantage of Keras, which provides an additional model-level interface.

Deep neural networks

The basic structure of DNNs consists of an input layer, multiple hidden layers and an output layer (Figure 4). Once input data are given to the DNNs, output values are computed sequentially along the layers of the network. At each layer, the input vector comprising the output values of each unit in the layer below is multiplied by the weight vector for each unit in the current layer to produce the weighted sum. Then, a non-linear function, such as a sigmoid, hyperbolic tangent or rectified linear unit (ReLU) [67], is applied to the weighted sum to compute the output values of the layer. The computation in each layer transforms the representations in the layer below into slightly more abstract representations [8]. Based on the types of layers used in DNNs and the corresponding learning method, DNNs can be classified as MLP, SAE or DBN.
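The layer-wise computation described above reduces to a chain of weighted sums and non-linearities; a minimal NumPy sketch (shapes and names are ours, for illustration):

```python
import numpy as np

def relu(z):
    """Rectified linear unit non-linearity [67]."""
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    """Compute DNN outputs layer by layer: each layer takes the weighted sum
    of the outputs of the layer below, then applies a non-linearity."""
    h = x
    for W, b in layers:
        h = relu(W @ h + b)   # weighted sum, then non-linear function
    return h

# toy network: 4 inputs -> 3 hidden units -> 2 outputs
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]
y = mlp_forward(rng.standard_normal(4), layers)
```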

MLP has a structure similar to usual neural networks but includes more stacked layers. It is trained in a purely supervised manner that uses only labeled data. Since the training method is a process of optimization in high-dimensional parameter space, MLP is typically used when a large number of labeled data are available [25].

SAE and DBN use AEs and RBMs as building blocks of the architectures, respectively. The main difference between these and MLP is that training is executed in two phases: unsupervised pre-training and supervised fine-tuning. First, in unsupervised pre-training (Figure 5), the layers are stacked sequentially and trained in a layer-wise manner as an AE or RBM using unlabeled data. Afterwards, in supervised fine-tuning, an output classifier layer is stacked, and the whole neural network is optimized by retraining with labeled data. Since both SAE and DBN exploit unlabeled data and can help avoid overfitting, researchers are able to obtain fairly regularized results, even when labeled data are insufficient, as is common in the real world [68].
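The unsupervised phase can be sketched as follows; the snippet implements only the greedy layer-wise pre-training, using a tied-weight auto-encoder trained by plain gradient descent (an illustrative simplification of the AE/RBM building blocks; all names and hyperparameters are ours):

```python
import numpy as np

def train_ae(X, n_hidden, steps=300, lr=0.05, seed=0):
    """Train one auto-encoder layer (tied weights) by gradient descent on
    the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden)) * 0.1
    for _ in range(steps):
        H = np.tanh(X @ W)                  # encoder
        err = H @ W.T - X                   # decoder output minus input
        dH = (err @ W) * (1.0 - H ** 2)     # backprop through tanh
        W -= lr * (X.T @ dH + err.T @ H) / len(X)
    return W

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise pre-training: train each layer as an AE on the
    representations produced by the layers below (cf. Figure 5)."""
    weights, H = [], X
    for n in layer_sizes:
        W = train_ae(H, n)
        weights.append(W)
        H = np.tanh(H @ W)                  # feed representations upward
    return weights
```

In the supervised fine-tuning phase, these pre-trained weights would initialize the hidden layers before the whole network is retrained with labeled data.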

DNNs are renowned for their suitability in analyzing high-dimensional data. Given that bioinformatics data are typically complex and high-dimensional, DNNs have great promise for bioinformatics research. We believe that DNNs, as hierarchical representation learning methods, can discover previously unknown, highly abstract patterns and correlations to provide insight to better understand the nature of the data. However, it has occurred to us that the capabilities of DNNs have not yet been fully exploited. Although the key characteristic of DNNs is that hierarchical features are learned solely from data, human-designed features have often been given as inputs instead of raw data forms. We expect that the future progress of DNNs in bioinformatics will come from investigations into proper ways to encode raw data and learn suitable features from them.

Table 1 Abbreviations in alphabetical order

Abbreviation Full word

AE Auto-Encoder

AI Artificial intelligence

AUC Area under the receiver operating characteristic curve

AUC-PR Area under the precision–recall curve

BRNN Bidirectional recurrent neural network

CAE Convolutional auto-encoder

CNN Convolutional neural network

DBN Deep belief network

DNN Deep neural network

DST-NN Deep spatio-temporal neural network

ECG Electrocardiography

ECoG Electrocorticography

EEG Electroencephalography

EMG Electromyography

EOG Electrooculography

GRU Gated recurrent unit

LSTM Long short-term memory

MD-RNN Multi-dimensional recurrent neural network

MLP Multilayer perceptron

MRI Magnetic resonance image

PCA Principal component analysis

PET Positron emission tomography

PSSM Position specific scoring matrix

RBM Restricted Boltzmann machine

ReLU Rectified linear unit

RNN Recurrent neural network

SAE Stacked auto-encoder

SGD Stochastic gradient descent


Convolutional neural networks

CNNs are designed to process multiple data types, especially two-dimensional images, and are directly inspired by the visual cortex of the brain. In the visual cortex, there is a hierarchy of two basic cell types: simple cells and complex cells [69]. Simple cells react to primitive patterns in sub-regions of visual stimuli, and complex cells synthesize the information from simple cells to identify more intricate forms. Since the visual cortex is such a powerful and natural visual processing system, CNNs are applied to imitate three key ideas: local connectivity, invariance to location and invariance to local transition [8].

The basic structure of CNNs consists of convolution layers, non-linear layers and pooling layers (Figure 6). To use highly correlated sub-regions of data, groups of local weighted sums, called feature maps, are obtained at each convolution layer by computing convolutions between local patches and weight vectors called filters. Furthermore, since identical patterns can appear regardless of the location in the data, filters are applied repeatedly across the entire dataset, which also improves training efficiency by reducing the number of parameters to learn. Then, non-linear layers increase the non-linear properties of feature maps. At each pooling layer, maximum or average subsampling of non-overlapping regions in feature maps is performed. This non-overlapping subsampling enables CNNs to handle somewhat different but semantically similar features and thus aggregate local features to identify more complex features.
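A minimal NumPy sketch of the convolution and pooling computations described above (our own function names; the single-channel, one-dimensional case for brevity):

```python
import numpy as np

def conv1d(x, filters):
    """Convolution layer: slide each filter across the input and record the
    local weighted sum at every position (one feature map per filter)."""
    k = filters.shape[1]
    patches = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return patches @ filters.T            # shape: (positions, n_filters)

def max_pool1d(fmap, size=2):
    """Pooling layer: maximum over non-overlapping windows of each feature map."""
    n = (fmap.shape[0] // size) * size
    return fmap[:n].reshape(-1, size, fmap.shape[1]).max(axis=1)

# one filter that responds to two consecutive ones
fmap = conv1d(np.array([0., 0, 1, 1, 0, 0, 1, 1]), np.array([[1., 1.]]))
pooled = max_pool1d(fmap)
```

Because the same filter weights are reused at every position, the number of parameters is independent of the input length, which is the parameter-sharing benefit mentioned above.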

Currently, CNNs are one of the most successful deep learning architectures owing to their outstanding capacity to analyze spatial information. Thanks to their development in the field of object recognition, we believe the primary research achievements in bioinformatics will come from the biomedical imaging domain. Despite the different data characteristics between normal and biomedical imaging, CNNs will nonetheless offer straightforward applications compared to other domains. Indeed, CNNs also have great potential in omics and biomedical

Table 2 Categorization of deep learning applied research in bioinformatics

Deep neural networks
  Omics: protein structure [84–87]; gene expression regulation [93–98]; protein classification [108]; anomaly classification [111]
  Biomedical imaging: anomaly classification [122–124]; segmentation [133]; recognition [142, 143]; brain decoding [149, 150]
  Biomedical signal processing: brain decoding [158–163]; anomaly classification [171–175]

Convolutional neural networks
  Omics: gene expression regulation [99–104]
  Biomedical imaging: anomaly classification [125–132]; segmentation [134–140]; recognition [144–147]
  Biomedical signal processing: brain decoding [164–167]; anomaly classification [176]

Recurrent neural networks
  Omics: protein structure [88–90]; gene expression regulation [105–107]; protein classification [109, 110]
  Biomedical signal processing: brain decoding [168]; anomaly classification [177, 178]

Emergent architectures
  Omics: protein structure [91, 92]
  Biomedical imaging: segmentation [141]
  Biomedical signal processing: brain decoding [169, 170]

Table 3 Comparison of deep learning libraries

Library      Core    Speed for batch (ms)  Multi-GPU  Distributed  Strengths [56, 57]
Caffe        C++     651.6                 O          X            Pre-trained models supported
Neon         Python  386.8                 O          X            Speed
TensorFlow   C++     962.0                 O          O            Heterogeneous distributed computing
Theano       Python  733.5                 X          X            Ease of use with higher-level wrappers
Torch        Lua     506.6                 O          X            Functional extensionality

Notes: Speed for batch is based on the averaged processing times for AlexNet [33] with a batch size of 256 on a single GPU [57]. Caffe, Neon, Theano and Torch were utilized with cuDNN v3, while TensorFlow was utilized with cuDNN v2.

Figure 4 Basic structure of DNNs with input units x, three hidden units h1, h2 and h3 in each layer, and output units y [26]. At each layer, the weighted sum and non-linear function of its inputs are computed so that the hierarchical representations can be obtained.


signal processing. The three key ideas of CNNs can be applied not only in a one-dimensional grid to discover meaningful recurring patterns with small variance, such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and in time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and the output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are used widely (Figure 8).
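The recurrent computation described above can be sketched in NumPy as follows (a plain-perceptron RNN with our own names; LSTM or GRU cells would replace the tanh state update):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Unrolled RNN: the state vector h implicitly stores all past inputs;
    each output is computed from the current input and the previous state."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:                              # one time step per input
        h = np.tanh(W_xh @ x + W_hh @ h)      # new state from input + old state
        ys.append(W_hy @ h)
    return np.array(ys), h

# toy example: 5 time steps of 3-dimensional input, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = (rng.standard_normal((4, 3)),
                    rng.standard_normal((4, 4)),
                    rng.standard_normal((2, 4)))
xs = [rng.standard_normal(3) for _ in range(5)]
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy)
```

Note that the same three weight matrices are reused at every time step, so the model accepts sequences of any length.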

Although RNNs do not seem to be as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure if unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependencies among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that dissemination of dynamic CT and MRI [71, 72] would lead to the incorporation of RNNs and CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73] and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature

Figure 5 Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6 Basic structure of CNNs consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple feature maps detecting low-level features, and then the pooling layer combines them into higher-level features.


compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In training of AEs, reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE and are trained in the same manner as in AEs.

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed but awaits wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position specific scoring matrices (PSSM) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties from complex biological data and improve results. In addition, protein contact maps, which present distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification

Figure 7 Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists, so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time. The index of each symbol represents the time step. In this way, ht receives input from xt and ht-1 and then propagates the computed results to yt and ht+1.

Figure 8 Basic structure of BRNNs unrolled in time [70]. There are two hidden units, a forward unit and a backward unit, for each time step. The forward unit at time t receives input from xt and the forward unit at time t-1 to reflect past information; the backward unit at time t receives input from xt and the backward unit at time t+1 to reflect future information. The information from both hidden units is propagated to yt.

Figure 9 Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.


[108–110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10 Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting different contexts. For example, the (i, j) hidden unit in context 1 receives input from the (i-1, j) and (i, j-1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11 Basic structure of CAEs consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns: it can capture longer motifs, integrate cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
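To illustrate the motif-detector view of the first convolution layer, the sketch below scans a one-hot-encoded DNA sequence with a single hand-set, TATA-like filter (illustrative only: in an actual CNN the filter weights would be learned from data, and all names here are ours):

```python
import numpy as np

BASES = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        x[i, BASES[b]] = 1.0
    return x

def motif_scan(x, pssm):
    """First CNN layer as a motif detector: convolve a (k, 4) filter,
    which plays the role of a learned PSSM, across the sequence."""
    k = pssm.shape[0]
    return np.array([np.sum(x[i:i + k] * pssm) for i in range(len(x) - k + 1)])

# hand-set "TATA" filter: weight 1 on the matching base at each position
tata = np.zeros((4, 4))
for i, b in enumerate('TATA'):
    tata[i, BASES[b]] = 1.0

scores = motif_scan(one_hot('GGTATACC'), tata)   # peaks where TATA occurs
```

The score profile peaks exactly where the motif occurs; a trained network would discover such filters, and deeper layers would combine their responses into longer regulatory patterns.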

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4 Deep learning applied bioinformatics research avenues and input data

Omics
  Input data: sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq); features from genomic sequence: position specific scoring matrix (PSSM), physicochemical properties (steric parameter, volume), Atchley factors (FAC), 1-dimensional structural properties; contact map (distance of amino acid pairs in 3D structure); microarray gene expression
  Research avenues: protein structure prediction [84–92]: 1-dimensional structural properties, contact map, structure model quality assessment. Gene expression regulation [93–107]: splice junction, genetic variants affecting splicing, sequence specificity. Protein classification [108–110]: super family, subcellular localization. Anomaly classification [111]: cancer

Biomedical imaging
  Input data: magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues: anomaly classification [122–132]: gene expression pattern, cancer, Alzheimer's disease, schizophrenia. Segmentation [133–141]: cell structure, neuronal structure, vessel map, brain tumor. Recognition [142–147]: cell nuclei, finger joint, anatomical structure. Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data: ECoG, ECG, EMG, EOG; EEG (raw, wavelet, frequency, differential entropy); extracted features from EEG: normalized decay, peak variation
  Research avenues: brain decoding [158–170]: behavior, emotion. Anomaly classification [171–178]: Alzheimer's disease, seizure, sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain with wide application of deep learning in general image-related tasks. Most biomedical images used for the clinical treatment of patients—magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121]—have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high-content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei from histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition in CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.
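The frequency decomposition mentioned above can be sketched in a few lines. This is a minimal illustration, not any cited study's pipeline: a synthetic one-second "EEG-like" signal (a 10 Hz alpha-band sine plus noise) is reduced to per-band spectral powers, which could then serve as input features for a deep model. The band names and edges follow standard EEG conventions.

```python
import numpy as np

fs = 256                                   # sampling rate (Hz), assumed
t = np.arange(fs) / fs                     # 1 second of samples
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(fs)

spectrum = np.abs(np.fft.rfft(signal)) ** 2     # power spectrum
freqs = np.fft.rfftfreq(fs, d=1 / fs)           # frequency axis (Hz)

# Standard EEG frequency bands (Hz)
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
features = {name: spectrum[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in bands.items()}

# The 10 Hz component dominates the alpha band.
print(max(features, key=features.get))  # -> alpha
```

In practice, wavelet decompositions serve the same purpose when time localization of the components matters.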

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs like those used in biomedical imaging were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze the data and are expected to produce promising results. Among studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than the number from other classes [179]. For example, in clinical or disease-related cases, there is inevitably less data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
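Why accuracy misleads on imbalanced data can be seen with a toy example (synthetic labels, not from any cited study): on a 95:5 split, a degenerate classifier that always predicts the majority class still scores 95% accuracy, while its F-measure collapses to zero.

```python
# Synthetic 95:5 imbalanced labels and an "always negative" classifier.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Precision, recall and F-measure computed from the confusion counts.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if precision + recall else 0.0)

print(accuracy, f1)  # -> 0.95 0.0
```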

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
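As a minimal sketch of the sampling idea, the simplest remedy is random oversampling: duplicating minority-class examples until the class counts match. The `oversample` helper below is illustrative, not from any cited work; the synthetic minority oversampling technique [185] goes further by interpolating new synthetic examples instead of duplicating existing ones.

```python
import random

def oversample(data, labels, seed=0):
    """Randomly duplicate minority-class examples until classes balance."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out = []
    for y, xs in by_class.items():
        # Pad each class with random duplicates up to the majority count.
        xs = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs)
    rng.shuffle(out)
    return out

# Three negatives, one positive -> balanced 3:3 after oversampling.
balanced = oversample([[0.1], [0.2], [0.3], [0.9]], [0, 0, 0, 1])
print(sum(1 for _, y in balanced if y == 1))  # -> 3
```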

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in an objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance or implicitly modify the learning rates according to data instance classes during training.
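The explicit variant can be sketched as a class-weighted cross-entropy, where misclassifying a rare class costs more than misclassifying a common one. The weights and probabilities below are illustrative (in practice one might use, e.g., inverse class frequencies):

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean of -w[y] * log p(y) over a batch of predicted distributions."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weights[y] * math.log(p[y])
    return total / len(labels)

probs = [[0.9, 0.1], [0.6, 0.4]]      # predicted class probabilities
labels = [0, 1]                       # class 1 is the rare class
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, [1.0, 10.0])

# The weighted loss penalizes the low-confidence rare-class example harder.
print(weighted > unweighted)  # -> True
```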

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method which exploits unsupervised pre-training with RNN-based AE and achieved a >25% increase in F-measure compared to the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training combining ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class-representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
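The mutation-map idea can be sketched independently of any trained network: score every single-base substitution at every position and record the change relative to the original sequence. The `toy_score` function below is a hypothetical stand-in; in DeepBind [100] the score comes from a trained CNN.

```python
BASES = "ACGT"

def toy_score(seq):
    # Hypothetical stand-in for a trained model: counts the motif "CA".
    return sum(seq[i:i + 2] == "CA" for i in range(len(seq) - 1))

def mutation_map(seq, score=toy_score):
    """rows[b][i] = change in score if position i is mutated to base b."""
    base_score = score(seq)
    rows = {}
    for b in BASES:
        rows[b] = [score(seq[:i] + b + seq[i + 1:]) - base_score
                   if seq[i] != b else 0.0
                   for i in range(len(seq))]
    return rows

m = mutation_map("ACAT")
# Mutating position 2 ('A') destroys the "CA" motif, lowering the score.
print(m["G"][2])  # -> -1
```

Rendering `rows` as a heat map, with the sequence logo scaled by the per-position maximum decrease, reproduces the visualization described above.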

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters—the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate—for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left to human machine learning experts. Nevertheless, automated machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
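Of these, random search [199] is the simplest to sketch: sample configurations from the search space and keep the best-scoring one. The search space and the `validation_score` objective below are hypothetical; in real use the objective would train and evaluate a network.

```python
import random

space = {
    "n_layers": [2, 3, 4, 5],
    "n_hidden": [64, 128, 256, 512],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
}

def validation_score(cfg):
    # Hypothetical stand-in objective: pretend 3 layers, lr=1e-2 is optimal.
    return -abs(cfg["n_layers"] - 3) - abs(cfg["learning_rate"] - 1e-2)

rng = random.Random(0)
best_cfg, best = None, float("-inf")
for _ in range(20):                       # 20 random trials
    cfg = {k: rng.choice(v) for k, v in space.items()}
    s = validation_score(cfg)
    if s > best:
        best_cfg, best = cfg, s

print(best_cfg)
```

Despite its simplicity, random search is a strong baseline: unlike grid search, it does not waste trials on exhaustively varying hyperparameters that barely affect the objective.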

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but imaging data alone come in multiple forms, including X-ray, CT, MRI and PET.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. Some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing capabilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
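To make the "advanced optimization" group concrete, here is a minimal sketch of the Adagrad update [48]: each parameter's learning rate is scaled by the inverse square root of its accumulated squared gradients, so rarely updated parameters keep taking larger steps. The quadratic objective is a toy example, not from the cited work.

```python
import math

def adagrad_step(params, grads, cache, lr=0.1, eps=1e-8):
    """One Adagrad update: theta -= lr * g / (sqrt(sum g^2) + eps)."""
    for i, g in enumerate(grads):
        cache[i] += g * g                          # accumulate squared grads
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params, cache

# Minimize f(x) = x0^2 + x1^2 from (1, 1); gradient is (2*x0, 2*x1).
params, cache = [1.0, 1.0], [0.0, 0.0]
for _ in range(200):
    grads = [2 * p for p in params]
    params, cache = adagrad_step(params, grads, cache)

print(all(abs(p) < 0.1 for p in params))  # -> True
```

Adam [49] extends this idea with exponentially decayed moment estimates of both the gradient and its square.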

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for the application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1 Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2 Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3 IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4 Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5 DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6 Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7 Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9 Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10 Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
11 Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.
12 Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13 Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14 Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15 Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Advances in Neural Information Processing Systems, 2015, 577–85.
16 Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17 Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057, 2015.
18 Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
19 Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
20 Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21 Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22 Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23 Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24 Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25 LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.
26 Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27 Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.
28 Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29 Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30 Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31 LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.
32 Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33 Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.
34 Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35 Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36 Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37 Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38 Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.
39 Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.
40 Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41 Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.
42 Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969.
43 Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44 Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45 Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46 Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.
47 Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48 Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49 Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
50 Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51 Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52 Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.
53 Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
54 Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.
55 Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
56 Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57 Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435, 2015.
58 Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59 Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60 Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.
61 Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.
62 Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590, 2012.
63 Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.
64 Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.
65 van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv preprint arXiv:1506.00619, 2015.
66 Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
67 Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.
68 Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69 Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70 Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71 Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72 Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73 Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
75 Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
76 Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77 Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.
77 Mnih V Heess N Graves A Recurrent models of visual at-tention In Advances in Neural Information Processing Systems2014 p 2204ndash12

78 Jones DT Protein secondary structure prediction based onposition-specific scoring matrices J Mol Biol 1999292(2)195ndash202

79 Ponomarenko JV Ponomarenko MP Frolov AS et alConformational and physicochemical DNA features specificfor transcription factor binding sites Bioinformatics199915(7)654ndash68

80 Cai Y-D Lin SL Support vector machines for predictingrRNA- RNA- and DNA-binding proteins from amino acidsequence Biochim Biophys Acta (BBA) ndash Proteins Proteomics20031648(1)127ndash33

81 Atchley WR Zhao J Fernandes AD et al Solving the proteinsequence metric problem Proc Natl Acad Sci USA2005102(18)6395ndash400

82 Branden CI Introduction to protein structure Garland ScienceNew York 1999

83 Richardson JS The anatomy and taxonomy of protein struc-ture Adv Protein Chem 198134167ndash339

84 Lyons J Dehzangi A Heffernan R et al Predicting backboneCa angles and dihedrals from protein sequences by stackedsparse auto-encoder deep neural network J Comput Chem201435(28)2040ndash6

85 Heffernan R Paliwal K Lyons J et al Improving prediction ofsecondary structure local backbone angles and solvent ac-cessible surface area of proteins by iterative deep learningSci Rep 2015511476

86 Spencer M Eickholt J Cheng J A deep learning network ap-proach to ab initio protein secondary structure predictionIEEEACM Trans Comput Biol Bioinformat 201512(1)103ndash12

87 Nguyen SP Shang Y Xu D DL-PRO A novel deep learningmethod for protein model quality assessment In 2014International Joint Conference on Neural Networks (IJCNN) 2014p 2071ndash8 IEEE New York

88 Baldi P Brunak S Frasconi P et al Exploiting the past andthe future in protein secondary structure predictionBioinformatics 199915(11)937ndash46

89 Baldi P Pollastri G Andersen CA et al Matching proteinbeta-sheet partners by feedforward and recurrent neuralnetworks In Proceedings of the 2000 Conference on IntelligentSystems for Molecular Biology (ISMB00) La Jolla CA 2000 p25ndash36

90 Soslashnderby SK Winther O Protein secondary structure pre-diction with long short term memory networks arXivPreprint arXiv14127828 2014

91 Lena PD Nagata K Baldi P Deep architectures for proteincontact map prediction Bioinformatics 201228(19)2449ndash57

92 Baldi P Pollastri G The principled design of large-scale re-cursive neural network architectures ndash dag-rnns and theprotein structure prediction problem J Mach Learn Res20034575ndash602

93 Leung MK Xiong HY Lee LJ et al Deep learning of thetissue-regulated splicing code Bioinformatics201430(12)i121ndash9

94 Lee T Yoon S Boosted categorical restricted boltzmann ma-chine for computational prediction of splice junctions InInternational Conference on Machine Learning Lille France2015 p 2483ndash92

95 Zhang S Zhou J Hu H et al A deep learning framework formodeling structural features of RNA-binding protein targetsNucleic Acids Res 2015gkv1025

96 Chen Y Li Y Narayan R et al Gene expression inferencewith deep learning Bioinformatics 2016btw074


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, GA, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. In: SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV) 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM with application to fast biomedical volumetric image segmentation. arXiv preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE, 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE) 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER) 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14) 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008) 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME) 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE) 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal, 2009, 1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11) 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11) 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10) 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) 2015, p. 2605–12. IEEE, New York.



complete dataset, SGD provides stochastic approximations by performing the updates for each small set of data examples. Several optimization algorithms stem from SGD. For example, Adagrad [48] and Adam [49] perform SGD while adaptively modifying learning rates based on update frequency and moments of the gradients for each parameter, respectively.
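The contrast between a fixed learning rate and a per-parameter adaptive one can be made concrete with a toy example. The sketch below (plain NumPy, minimizing f(w) = ||w||², whose gradient is 2w; none of it is from the paper) implements one vanilla SGD step and one Adagrad-style step:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Vanilla SGD: move against the gradient at a fixed learning rate.
    return w - lr * grad

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    # Adagrad: accumulate squared gradients so frequently updated
    # parameters receive progressively smaller effective learning rates.
    cache = cache + grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

# Toy objective: f(w) = ||w||^2, gradient 2w; both methods head toward 0.
w_sgd = np.array([1.0, -2.0])
w_ada = np.array([1.0, -2.0])
cache = np.zeros(2)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_ada, cache = adagrad_step(w_ada, 2 * w_ada, cache)
```

Adam would additionally keep an exponential moving average of the (first and second) gradient moments rather than a raw sum; the structure of the update is otherwise similar.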

Another core element in the training of deep learning architectures is regularization, which refers to strategies intended to avoid overfitting and thus achieve good generalization performance. For example, weight decay [50], a well-known conventional approach, adds a penalty term to the objective loss function so that weight parameters converge to smaller absolute values. Currently, the most widely used regularization approach is dropout [51]. Dropout randomly removes hidden units from neural networks during training and can be considered an ensemble of possible subnetworks [52]. To enhance the capabilities of dropout, a new activation function, maxout [53], and a variant of dropout for RNNs called rnnDrop [54] have been proposed. Furthermore, the recently proposed batch normalization [55] provides a new regularization method through normalization of scalar features for each activation within a mini-batch and learning each mean and variance as parameters.
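Dropout itself is only a few lines of code. The sketch below implements the common "inverted" variant (the rescaling by 1/(1 − p) is an implementation convention, not something prescribed by the review): units are zeroed at random during training, and nothing needs to change at test time:

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=None):
    # Inverted dropout: zero each hidden unit with probability p_drop during
    # training and rescale the survivors, so the expected activation is
    # unchanged and the test-time forward pass needs no modification.
    if not train:
        return x
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

h = np.ones((4, 10))                              # a batch of hidden activations
h_train = dropout_forward(h, p_drop=0.5, train=True)   # surviving units scaled to 2.0
h_test = dropout_forward(h, train=False)               # identity at test time
```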

Deep learning libraries

To actually implement deep learning algorithms, a great deal of attention to algorithmic details is required. Fortunately, many open source deep learning libraries are available online (Table 3). There are still no clear front-runners, and each library has its own strengths [56]. According to the benchmark test results of CNNs (specifically, the AlexNet [33] implementation) in Bahrampour et al. [57], Python-based Neon [58] shows a great advantage in processing speed. C++-based Caffe [59] and Lua-based Torch [60] offer great advantages in terms of pre-trained models and functional extensionality, respectively. Python-based Theano [61, 62] provides a low-level library to define and optimize mathematical expressions; moreover, numerous higher-level wrappers such as Keras [63], Lasagne [64] and Blocks [65] have been developed on top of Theano to provide more intuitive interfaces. Google recently released the C++-based TensorFlow [66] with a Python interface. This library currently shows limited performance but is undergoing continuous improvement, as heterogeneous distributed computing is now supported. In addition, TensorFlow can also take advantage of Keras, which provides an additional model-level interface.

Deep neural networks

The basic structure of DNNs consists of an input layer, multiple hidden layers and an output layer (Figure 4). Once input data are given to the DNNs, output values are computed sequentially along the layers of the network. At each layer, the input vector, comprising the output values of each unit in the layer below, is multiplied by the weight vector for each unit in the current layer to produce the weighted sum. Then, a non-linear function, such as a sigmoid, hyperbolic tangent or rectified linear unit (ReLU) [67], is applied to the weighted sum to compute the output values of the layer. The computation in each layer transforms the representations in the layer below into slightly more abstract representations [8]. Based on the types of layers used in DNNs and the corresponding learning method, DNNs can be classified as MLP, SAE or DBN.
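The layer-by-layer computation described here is easy to make concrete. The following NumPy sketch (layer sizes and the ReLU/sigmoid choices are illustrative, not from the paper) propagates a batch of inputs through two hidden layers and a sigmoid output:

```python
import numpy as np

def relu(x):
    # ReLU non-linearity applied to the weighted sum.
    return np.maximum(0.0, x)

def forward(x, layers):
    # Each layer multiplies the outputs of the layer below by its weights,
    # adds a bias, and applies a non-linearity; the output layer uses a
    # sigmoid so the result can be read as a probability.
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))

rng = np.random.default_rng(0)
dims = [8, 16, 16, 1]  # input, two hidden layers, one output unit
layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(dims[:-1], dims[1:])]
y = forward(rng.standard_normal((5, 8)), layers)  # batch of 5 examples
```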

MLP has a similar structure to the usual neural networks but includes more stacked layers. It is trained in a purely supervised manner that uses only labeled data. Since the training method is a process of optimization in high-dimensional parameter space, MLP is typically used when a large number of labeled data are available [25].

SAE and DBN use AEs and RBMs as building blocks of the architectures, respectively. The main difference between these and MLP is that training is executed in two phases: unsupervised pre-training and supervised fine-tuning. First, in unsupervised pre-training (Figure 5), the layers are stacked sequentially and trained in a layer-wise manner as an AE or RBM using unlabeled data. Afterwards, in supervised fine-tuning, an output classifier layer is stacked, and the whole neural network is optimized by retraining with labeled data. Since both SAE and DBN exploit unlabeled data and can help avoid overfitting, researchers are able to obtain fairly regularized results, even when labeled data are insufficient, as is common in the real world [68].
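A minimal sketch of the unsupervised pre-training phase, assuming tied-weight auto-encoders trained by plain gradient descent (all hyperparameters are illustrative, and the supervised fine-tuning step is omitted): each layer is trained to reconstruct its own input, and its codes become the training input of the next layer.

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.05, epochs=100, seed=0):
    # One tied-weight auto-encoder layer: encode with W, decode with W.T,
    # trained by gradient descent on the squared reconstruction error.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden)) * 0.1
    for _ in range(epochs):
        H = np.tanh(X @ W)                       # encode
        E = H @ W.T - X                          # reconstruction error
        dH = E @ W                               # backprop through the decoder
        dW = X.T @ (dH * (1 - H ** 2)) + E.T @ H # tied-weight gradient
        W -= lr * dW / len(X)
    return W

# Greedy layer-wise pre-training on unlabeled data.
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 20))
weights, inp = [], X
for n_hidden in (10, 5):
    W = train_autoencoder(inp, n_hidden)
    weights.append(W)
    inp = np.tanh(inp @ W)  # this layer's codes feed the next layer
# `weights` would now initialize an MLP, to which an output classifier
# layer is stacked for supervised fine-tuning with labeled data.
```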

DNNs are renowned for their suitability in analyzing high-dimensional data. Given that bioinformatics data are typically complex and high-dimensional, DNNs have great promise for bioinformatics research. We believe that DNNs, as hierarchical representation learning methods, can discover previously unknown highly abstract patterns and correlations to provide insight to better understand the nature of the data. However, it has occurred to us that the capabilities of DNNs have not yet been fully exploited. Although the key characteristic of DNNs is that hierarchical features are learned solely from data, human-designed features have often been given as inputs instead of raw data forms. We expect that the future progress of DNNs in bioinformatics will come from investigations into proper ways to encode raw data and learn suitable features from them.

Table 1. Abbreviations in alphabetical order

Abbreviation   Full word
AE             Auto-encoder
AI             Artificial intelligence
AUC            Area under the receiver operating characteristic curve
AUC-PR         Area under the precision–recall curve
BRNN           Bidirectional recurrent neural network
CAE            Convolutional auto-encoder
CNN            Convolutional neural network
DBN            Deep belief network
DNN            Deep neural network
DST-NN         Deep spatio-temporal neural network
ECG            Electrocardiography
ECoG           Electrocorticography
EEG            Electroencephalography
EMG            Electromyography
EOG            Electrooculography
GRU            Gated recurrent unit
LSTM           Long short-term memory
MD-RNN         Multi-dimensional recurrent neural network
MLP            Multilayer perceptron
MRI            Magnetic resonance imaging
PCA            Principal component analysis
PET            Positron emission tomography
PSSM           Position-specific scoring matrix
RBM            Restricted Boltzmann machine
ReLU           Rectified linear unit
RNN            Recurrent neural network
SAE            Stacked auto-encoder
SGD            Stochastic gradient descent

4 | Min et al
Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

Convolutional neural networks

CNNs are designed to process multiple data types, especially two-dimensional images, and are directly inspired by the visual cortex of the brain. In the visual cortex, there is a hierarchy of two basic cell types: simple cells and complex cells [69]. Simple cells react to primitive patterns in sub-regions of visual stimuli, and complex cells synthesize the information from simple cells to identify more intricate forms. Since the visual cortex is such a powerful and natural visual processing system, CNNs are designed to imitate three of its key ideas: local connectivity, invariance to location and invariance to local transition [8].

The basic structure of CNNs consists of convolution layers, non-linear layers and pooling layers (Figure 6). To exploit highly correlated sub-regions of data, groups of local weighted sums, called feature maps, are obtained at each convolution layer by computing convolutions between local patches and weight vectors called filters. Furthermore, since identical patterns can appear regardless of location in the data, filters are applied repeatedly across the entire dataset, which also improves training efficiency by reducing the number of parameters to learn. Non-linear layers then increase the non-linear properties of the feature maps. At each pooling layer, maximum or average subsampling of non-overlapping regions in the feature maps is performed. This non-overlapping subsampling enables CNNs to handle slightly different but semantically similar features, and thus to aggregate local features into more complex ones.
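One convolution, non-linearity, pooling pass over a one-dimensional signal can be sketched in a few lines of numpy; the signal and filter values below are arbitrary illustrations, and a real CNN would learn the filter from data.

```python
# Illustrative single pass: convolution -> ReLU -> non-overlapping max pooling.
import numpy as np

def conv1d(x, f):
    """Valid 1-D convolution (cross-correlation, as in most CNN libraries)."""
    n = len(x) - len(f) + 1
    return np.array([np.dot(x[i:i + len(f)], f) for i in range(n)])

def relu(x):
    """Non-linear layer."""
    return np.maximum(x, 0.0)

def max_pool(x, size):
    """Maximum subsampling of non-overlapping regions."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.array([0., 1., 0., 0., 1., 1., 0., 1., 0.])
f = np.array([1., -1., 1.])               # stands in for a learned filter
feature_map = relu(conv1d(x, f))
pooled = max_pool(feature_map, 2)
print(pooled)                             # [1. 1. 2.]
```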

Currently, CNNs are one of the most successful deep learning architectures, owing to their outstanding capacity to analyze spatial information. Given their development in the field of object recognition, we believe the primary research achievements in bioinformatics will come from the biomedical imaging domain. Despite the differences in data characteristics between natural and biomedical imaging, CNNs will nonetheless offer more straightforward applications there than in other domains. Indeed, CNNs also have great potential in omics and biomedical

Table 2. Categorization of deep learning applied research in bioinformatics

Deep neural networks
  Omics: protein structure [84–87]; gene expression regulation [93–98]; protein classification [108]; anomaly classification [111]
  Biomedical imaging: anomaly classification [122–124]; segmentation [133]; recognition [142, 143]; brain decoding [149, 150]
  Biomedical signal processing: brain decoding [158–163]; anomaly classification [171–175]

Convolutional neural networks
  Omics: gene expression regulation [99–104]
  Biomedical imaging: anomaly classification [125–132]; segmentation [134–140]; recognition [144–147]
  Biomedical signal processing: brain decoding [164–167]; anomaly classification [176]

Recurrent neural networks
  Omics: protein structure [88–90]; gene expression regulation [105–107]; protein classification [109, 110]
  Biomedical signal processing: brain decoding [168]; anomaly classification [177, 178]

Emergent architectures
  Omics: protein structure [91, 92]
  Biomedical imaging: segmentation [141]
  Biomedical signal processing: brain decoding [169, 170]

Table 3. Comparison of deep learning libraries

Library      Core    Speed for batch (ms)  Multi-GPU  Distributed  Strengths [56, 57]
Caffe        C++     651.6                 O          X            Pre-trained models supported
Neon         Python  386.8                 O          X            Speed
TensorFlow   C++     962.0                 O          O            Heterogeneous distributed computing
Theano       Python  733.5                 X          X            Ease of use with higher-level wrappers
Torch        Lua     506.6                 O          X            Functional extensionality

Notes: Speed for batch is based on the averaged processing time for AlexNet [33] with a batch size of 256 on a single GPU [57]. Caffe, Neon, Theano and Torch were utilized with cuDNN v3, while TensorFlow was utilized with cuDNN v2.

Figure 4. Basic structure of DNNs with input units x, three layers of hidden units h1, h2 and h3, and output units y [26]. At each layer, the weighted sum and a non-linear function of its inputs are computed, so that hierarchical representations can be obtained.


signal processing. The three key ideas of CNNs can be applied not only in a one-dimensional grid to discover meaningful recurring patterns with small variance, such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and the time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and the output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are widely used (Figure 8).
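The recurrent computation described above can be sketched in a few lines of numpy; the weights here are random placeholders rather than trained parameters, and the sizes are arbitrary.

```python
# Minimal RNN forward pass: the state vector h implicitly stores past inputs.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 3, 4
W_xh = rng.normal(scale=0.5, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # cyclic connection
W_hy = rng.normal(scale=0.5, size=(n_hidden, 1))         # hidden -> output

def rnn_forward(xs):
    h = np.zeros(n_hidden)                   # initial state vector
    ys = []
    for x_t in xs:                           # process the sequence in order
        h = np.tanh(x_t @ W_xh + h @ W_hh)   # h_t from x_t and h_(t-1)
        ys.append(h @ W_hy)                  # output reflects all past inputs
    return np.array(ys).ravel()

xs = rng.random((5, n_in))                   # a length-5 input sequence
print(rnn_forward(xs).shape)                 # (5,)
```

A BRNN would run a second pass over the reversed sequence with its own weights and combine both state vectors at each step.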

Although RNNs do not seem as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure when unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependencies among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].
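As a sketch of how such memory-cell units work, here is a single LSTM cell step in numpy; the gate structure follows the standard formulation, while the weights and sizes are arbitrary placeholders and biases are omitted for brevity.

```python
# One LSTM cell step: gated, additive updates to a memory cell c ease the
# vanishing-gradient problem of plain perceptron hidden units.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_h = 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate, acting on the concatenated [x_t, h_(t-1)]
W_f, W_i, W_o, W_c = (rng.normal(scale=0.3, size=(n_in + n_h, n_h))
                      for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(z @ W_f)                   # forget gate: what to erase
    i = sigmoid(z @ W_i)                   # input gate: what to write
    o = sigmoid(z @ W_o)                   # output gate: what to expose
    c = f * c_prev + i * np.tanh(z @ W_c)  # additive memory-cell update
    h = o * np.tanh(c)                     # new hidden state
    return h, c

h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.random((6, n_in)):          # run over a length-6 sequence
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)                    # (4,) (4,)
```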

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful methods for analyzing sequential information. Since omics data and biomedical signals are typically sequential, and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or to a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that the dissemination of dynamic CT and MRI [71, 72] will lead to the incorporation of RNNs with CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73], and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature

Figure 5. Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6. Basic structure of CNNs consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple feature maps detecting low-level features, and then the pooling layer combines them into higher-level features.


compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. In contrast, temporal features are gradually altered as they progress to the upper layers. To compute each hidden unit in the current layer (except for the first layer), only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions of one-dimensional data, MD-RNNs use contexts in all possible directions of multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts, which vary with the order of data processing, are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In the training of AEs, reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed that awaits wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of both local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position specific scoring matrices (PSSMs) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties arising from complex biological data and to improve results. Protein contact maps, which present the distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions and RNA-binding proteins, and protein classification

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists, so the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time; the index of each symbol represents the time step. In this way, ht receives input from xt and h(t−1), and then propagates the computed results to yt and h(t+1).

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit ht and a backward unit h't. The forward unit ht receives input from xt and h(t−1) to reflect past information; the backward unit h't receives input from xt and h'(t+1) to reflect future information. The information from both hidden units is propagated to yt.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.


[108–110], including super family and subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.
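As an illustration of one such extracted feature, the following sketch builds a toy PSSM from a handful of aligned DNA sites; the Laplace pseudocount and uniform background used here are common conventions, not choices prescribed by the text, and the sites themselves are invented examples.

```python
# Toy position specific scoring matrix (PSSM): per-position log-odds scores
# built from counts over aligned sites.
import numpy as np

sites = ["ACGT", "ACGA", "ACGT", "TCGT"]    # toy aligned binding sites
alphabet = "ACGT"
counts = np.zeros((len(alphabet), len(sites[0])))
for s in sites:
    for pos, base in enumerate(s):
        counts[alphabet.index(base), pos] += 1

pseudo = 1.0                                 # Laplace pseudocount
probs = (counts + pseudo) / (len(sites) + pseudo * len(alphabet))
pssm = np.log2(probs / 0.25)                 # log-odds vs uniform background

def score(seq):
    """Sum the per-position log-odds scores for a candidate sequence."""
    return sum(pssm[alphabet.index(b), i] for i, b in enumerate(seq))

print(score("ACGT") > score("GGGG"))         # consensus scores higher: True
```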

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87]. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angles and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data along with a new regularization term for the sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences, with general use for many omics applications including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting a different context. For example, the (i, j) hidden unit in context 1 receives input from the (i−1, j) and (i, j−1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by incorporating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those studies have demonstrated the strong advantages of CNNs, showing great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector whose PSSMs are learned solely from data instead of being hard-coded. The depth of CNNs enables learning more complex patterns: deeper layers can capture longer motifs, integrate the cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are well suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
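The "convolution filter as motif detector" idea can be sketched as follows. One-hot encoding of DNA is standard practice in this line of work, but the filter below is hand-set to respond to the motif "TAT" purely for illustration; in the cited studies the filter weights are learned from data.

```python
# A one-hot encoded DNA sequence scanned by a PSSM-shaped convolution filter.
import numpy as np

def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(alphabet)))
    for i, base in enumerate(seq):
        x[i, alphabet.index(base)] = 1.0
    return x

# a width-3 filter that scores +1 for each matching base of "TAT", -1 otherwise
motif_filter = one_hot("TAT") * 2.0 - 1.0

def scan(seq, f):
    """Valid 1-D convolution of the one-hot sequence with the filter."""
    x = one_hot(seq)
    w = len(f)
    return np.array([np.sum(x[i:i + w] * f) for i in range(len(seq) - w + 1)])

scores = scan("GGTATCC", motif_filter)
print(int(scores.argmax()))   # 2, the position where "TAT" starts
```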

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for identifying disease-associated genetic variants. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profiles) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data:
    Sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq)
    Features from genomic sequences: position specific scoring matrix (PSSM); physicochemical properties (steric parameter, volume); Atchley factors (FAC); 1-dimensional structural properties
    Contact map (distance of amino acid pairs in 3D structure)
    Microarray gene expression
  Research avenues:
    Protein structure prediction [84–92]: 1-dimensional structural properties; contact map; structure model quality assessment
    Gene expression regulation [93–107]: splice junction; genetic variants affecting splicing; sequence specificity
    Protein classification [108–110]: super family; subcellular localization
    Anomaly classification [111]: cancer

Biomedical imaging
  Input data:
    Magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues:
    Anomaly classification [122–132]: gene expression pattern; cancer; Alzheimer's disease; schizophrenia
    Segmentation [133–141]: cell structure; neuronal structure; vessel map; brain tumor
    Recognition [142–147]: cell nuclei; finger joint; anatomical structure
    Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data:
    ECoG, ECG, EMG, EOG
    EEG (raw, wavelet, frequency, differential entropy)
    Extracted features from EEG: normalized decay; peak variation
  Research avenues:
    Brain decoding [158–170]: behavior; emotion
    Anomaly classification [171–178]: Alzheimer's disease; seizure; sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture for this domain, because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units for protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs for analyzing biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain, with wide application of deep learning to general image-related tasks. Most biomedical images used for the clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high-content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei in histopathology images. Interestingly, in a task similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing the functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted with CNNs in biomedical imaging, since these research avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features to perform brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition in CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies so far focusing on EEG activity. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input for deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are used more frequently than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals, classifying each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs that perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Ordinary two-dimensional CNNs, like those used in biomedical imaging, were then used to predict seizures.

Recurrent neural networks

Since biomedical signals are naturally sequential data, RNNs are an appropriate deep learning architecture for their analysis and are expected to produce promising results. Among the studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAEs have been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flexion and extension classification using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to using raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than the number from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present an overly optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives when the classes are negatively skewed [182].
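The contrast between accuracy and the F-measure on skewed data can be made concrete with a small sketch; the 95:5 class split and the degenerate always-negative classifier are invented for illustration.

```python
# On imbalanced data, accuracy can look excellent while the F-measure
# exposes a classifier that never finds the minority class.
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)      # 95:5 imbalanced labels
y_pred = np.zeros(100, dtype=int)          # classifier that always predicts 0

accuracy = (y_true == y_pred).mean()       # misleadingly high

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f_measure = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)   # harmonic mean of P and R

print(accuracy, f_measure)                 # 0.95 0.0
```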

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSMs from genomic sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
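A minimal SMOTE-style sketch of minority oversampling follows: synthetic examples are created by interpolating between pairs of minority-class points. This is a simplification of the cited technique [185], which interpolates toward k-nearest minority neighbours rather than random pairs; the data and counts are arbitrary.

```python
# SMOTE-style oversampling sketch: synthesize minority samples by linear
# interpolation between two randomly chosen minority points.
import numpy as np

rng = np.random.default_rng(3)

def smote_like(X_min, n_new):
    """Create n_new synthetic samples from the minority class X_min."""
    synth = []
    for _ in range(n_new):
        i, j = rng.choice(len(X_min), size=2, replace=False)
        lam = rng.random()                       # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

X_min = rng.random((10, 4))                      # 10 minority examples
X_new = smote_like(X_min, 30)                    # 30 synthetic examples
print(X_new.shape)                               # (30, 4)
```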

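A minimal SMOTE-style oversampling sketch (a simplified stand-in for the algorithm of [185], on assumed toy data): synthetic minority samples are generated by interpolating between pairs of existing minority-class points.

```python
import numpy as np

def smote_like(X_min, n_new, rng):
    """Interpolate between randomly chosen pairs of minority samples."""
    idx = rng.integers(0, len(X_min), size=(n_new, 2))
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[idx[:, 0]] + lam * (X_min[idx[:, 1]] - X_min[idx[:, 0]])

rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, size=(5, 3))   # 5 minority samples, 3 features
X_new = smote_like(X_min, 20, rng)         # 20 synthetic minority samples
print(X_new.shape)  # (20, 3)
```

The full SMOTE algorithm interpolates toward k-nearest neighbors rather than arbitrary minority points; this sketch keeps only the interpolation idea.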
Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to address the limited and imbalanced data problem. Cost sensitivity can be applied in the objective loss function of a neural network either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance, or implicitly modify the learning rates according to data instance classes during training.

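The explicit variant can be sketched as a class-weighted cross-entropy (our own toy illustration, not a specific method from the cited studies): misclassifying a rare positive costs more than misclassifying an abundant negative.

```python
import numpy as np

def weighted_bce(y_true, p_pred, w_pos, w_neg):
    """Binary cross-entropy with per-class misclassification costs."""
    p_pred = np.clip(p_pred, 1e-7, 1 - 1e-7)       # numerical safety
    w = np.where(y_true == 1, w_pos, w_neg)         # cost per example
    return -np.mean(w * (y_true * np.log(p_pred)
                         + (1 - y_true) * np.log(1 - p_pred)))

y = np.array([1, 0, 0, 0])          # one rare positive among negatives
p = np.array([0.3, 0.2, 0.1, 0.2])  # model badly underestimates the positive
print(weighted_bce(y, p, w_pos=4.0, w_neg=1.0))
# larger than the unweighted loss: the missed positive dominates the objective
```

Setting `w_pos = w_neg = 1` recovers the standard unweighted loss; raising `w_pos` pushes the optimizer to prioritize minority-class recall.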
Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and in producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, offers great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method that exploits unsupervised pre-training with an RNN-based AE and achieved a >25% increase in F-measure compared with existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been employed: Lee et al. [94] suggested a DBN with a boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining the ideas of undersampling and pre-training.

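The pre-train/fine-tune recipe can be sketched with a deliberately tiny model (a toy logistic regression standing in for a deep network; the data and setup are our own assumptions, not any cited study's): train on plentiful "source" data, then fine-tune briefly on scarce "target" data.

```python
import numpy as np

def sgd_logreg(X, y, w, lr=0.1, epochs=50):
    """Full-batch gradient descent on logistic loss; returns updated weights."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(1)
w_true = np.array([1.5, -2.0])                       # hypothetical true separator
X_src = rng.normal(size=(500, 2))                    # abundant source domain
y_src = (X_src @ w_true > 0).astype(float)
X_tgt = rng.normal(size=(10, 2))                     # scarce target domain
y_tgt = (X_tgt @ w_true > 0).astype(float)

w_pre = sgd_logreg(X_src, y_src, np.zeros(2))        # pre-train on source
w_fin = sgd_logreg(X_tgt, y_tgt, w_pre.copy(), epochs=5)  # brief fine-tune
cos = w_fin @ w_true / (np.linalg.norm(w_fin) * np.linalg.norm(w_true))
print(cos)  # positive: fine-tuned weights still align with the true direction
```

Ten target examples alone would barely constrain the model; starting from the pre-trained weights lets a few fine-tuning steps suffice.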
Changing the black-box into the white-box

Downloaded from http://bib.oxfordjournals.org at Cornell University Library on July 29, 2016

A main criticism of deep learning is that it is used as a black box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black box into the white box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black box into the white box is still in its early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. For image input, a deconvolutional network has been proposed to reconstruct and visualize the hierarchical representations of a specific input to a CNN [190]. In addition, to visualize a generalized class-representative image rather than one dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of a DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting the nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

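The backpropagation-to-input idea can be sketched on a tiny fixed network (random weights for illustration only; no claim about any cited study's model): differentiate the output score with respect to the input, not the weights, and read the gradient magnitudes as a saliency map.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # first-layer weights (pretend these are trained)
w2 = rng.normal(size=4)        # output weights
x = rng.normal(size=8)         # the input we want to explain

h = np.maximum(0.0, W1 @ x)    # forward pass through a ReLU layer
score = w2 @ h                 # scalar prediction score

# Backpropagate the score to the *input*: d(score)/dx = W1^T (w2 * 1[h > 0])
grad_x = W1.T @ (w2 * (h > 0))
saliency = np.abs(grad_x)      # per-position importance
print(saliency.argmax())       # the most influential input position
```

For genomic sequences, the same gradient computed over one-hot base encodings is what gets rendered as a heat map or sequence logo.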
Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, the mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled by the maximum decrease of the binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.

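The mutation-map construction itself is model-agnostic and easy to sketch. Here a toy additive position-weight scorer stands in for the trained CNN of [100] (the matrix is random, purely for illustration): score every possible single-base substitution and record the change.

```python
import numpy as np

BASES = "ACGT"
rng = np.random.default_rng(0)
pwm = rng.normal(size=(4, 8))   # toy stand-in for a trained scoring model

def score(seq):
    """Additive 'binding score' of an 8-mer under the toy model."""
    return sum(pwm[BASES.index(b), i] for i, b in enumerate(seq))

seq = "ACGTACGT"
mut_map = np.zeros((4, len(seq)))   # rows: substituted base, cols: position
for i in range(len(seq)):
    for j, b in enumerate(BASES):
        mutant = seq[:i] + b + seq[i + 1:]
        mut_map[j, i] = score(mutant) - score(seq)   # effect of this mutation
print(mut_map.min())  # the most deleterious single substitution
```

Rendered as a heat map, `mut_map` is exactly the first component of Alipanahi et al.'s visualization; the per-column maximum decrease gives the logo scaling.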
Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for the proper application of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, is essential. However, to date, the advantages of each architecture are only roughly understood: for example, DNNs are suitable for analyzing internal correlations in high-dimensional data, CNNs for analyzing spatial information, and RNNs for analyzing sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best-fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations, and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left to human machine learning experts. Nevertheless, automated machine learning research, which aims to automatically optimize hyperparameters, is growing steadily [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198], and random search approaches [199].

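Random search, the simplest of these, fits in a few lines. In this sketch the objective is a stand-in for "train a network and return its validation loss", and the hyperparameter ranges are hypothetical:

```python
import random

def validation_loss(lr, n_hidden):
    """Toy surrogate for training a model and evaluating it on validation data."""
    return (lr - 0.01) ** 2 + (n_hidden - 128) ** 2 / 1e5

random.seed(0)
best = min(
    # sample 50 random configurations: log-uniform learning rate, uniform width
    ({"lr": 10 ** random.uniform(-4, -1), "n_hidden": random.randrange(16, 512)}
     for _ in range(50)),
    key=lambda c: validation_loss(c["lr"], c["n_hidden"]),
)
print(best)
```

Sampling the learning rate log-uniformly is the important detail: its useful values span orders of magnitude, which grid search covers poorly at the same budget, as argued in [199].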
Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses, and electronic medical records available as input data, but imaging itself spans multiple modalities, including X-ray, CT, MRI, and PET.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the form of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signals and face image data.

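A common multimodal pattern is late fusion: each modality gets its own feature extractor, and the learned representations are concatenated before a shared head. The sketch below is our own toy construction (random data and weights, hypothetical dimensions), not the architecture of any cited study:

```python
import numpy as np

rng = np.random.default_rng(0)
mri = rng.normal(size=(16, 64))   # 16 patients, 64 imaging features
eeg = rng.normal(size=(16, 32))   # same 16 patients, 32 signal features

# One projection per modality, then concatenate the learned representations.
W_mri = rng.normal(size=(64, 8))
W_eeg = rng.normal(size=(32, 8))
h = np.concatenate([np.tanh(mri @ W_mri), np.tanh(eeg @ W_eeg)], axis=1)
print(h.shape)  # (16, 16): one fused representation per patient
```

A shared classifier trained on `h` can then exploit cross-modality correlations that neither input alone exposes.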
Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. At the same time, however, this inevitably leads to a drastic increase in training time, emphasizing the necessity of accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. Widely employed examples include Adagrad [48], Adam [49], batch normalization [55], and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing capabilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].

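Adam [49] illustrates how these optimizers reshape the raw gradient. The update rule below follows the paper's formulation and default moment coefficients; the 1-D quadratic it minimizes is our own toy example:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):                 # minimize f(w) = w^2, so grad = 2w
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(float(w[0]))  # approaches the minimum at 0
```

Dividing the bias-corrected mean gradient by the bias-corrected root-mean-square gradient gives per-parameter step sizes that adapt automatically, one reason such methods converge in far fewer iterations than plain SGD.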
Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215], and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results on tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can act as a regularizer and provide higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we surveyed bioinformatics research applying deep learning in terms of input data, research objectives, and the characteristics of established deep learning architectures. We further discussed the limitations of the approach and promising directions for future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. Many potential challenges remain, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparation regarding the issues discussed herein is key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for the application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition, and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the application of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning, and training acceleration.

• As a comprehensive review of existing work, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.

4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.

6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst 2015, 577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks, 1989. IJCNN, 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V-V-7.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.

16 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv preprint arXiv:1505.03540, 2015.
140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv preprint arXiv:1506.07452, 2015.
142. Xu J, Xiang L, Liu Q, et al. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.
143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.
144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv preprint arXiv:1511.06348, 2015.
145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.
146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv preprint arXiv:1504.04003, 2015.
147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.
148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.
149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.
150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093, 2015.
151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.
152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.
153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.
154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.
155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.
156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.
157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.
161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.
162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.
163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.
164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.
165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.
166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.
167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.
168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.
169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.
170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv preprint arXiv:1511.04306, 2015.
171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.
172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.
173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.
174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.

175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng 2009;21(9):1263–84.
182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183. Lopez V, Fernandez A, Garcia S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.
187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9.
188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.
191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal, 2009;1341.
192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.
194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191, 2016.
210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.
212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper 2015;2.
213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.
215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv preprint arXiv:1511.05756, 2015.
217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
218. Weston J, Chopra S, Bordes A. Memory networks. arXiv preprint arXiv:1410.3916, 2014.

219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv preprint arXiv:1511.06381, 2015.
223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.


Convolutional neural networks

CNNs are designed to process multiple data types, especially two-dimensional images, and are directly inspired by the visual cortex of the brain. In the visual cortex, there is a hierarchy of two basic cell types: simple cells and complex cells [69]. Simple cells react to primitive patterns in sub-regions of visual stimuli, and complex cells synthesize the information from simple cells to identify more intricate forms. Since the visual cortex is such a powerful and natural visual processing system, CNNs are designed to imitate three of its key ideas: local connectivity, invariance to location and invariance to local translation [8].

The basic structure of CNNs consists of convolution layers, non-linear layers and pooling layers (Figure 6). To exploit highly correlated sub-regions of data, groups of local weighted sums, called feature maps, are obtained at each convolution layer by computing convolutions between local patches and weight vectors called filters. Furthermore, since identical patterns can appear regardless of their location in the data, filters are applied repeatedly across the entire dataset, which also improves training efficiency by reducing the number of parameters to learn. Non-linear layers then increase the non-linear properties of the feature maps. At each pooling layer, maximum or average subsampling of non-overlapping regions in the feature maps is performed. This non-overlapping subsampling enables CNNs to handle somewhat different but semantically similar features and thus to aggregate local features into more complex features.
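As an illustration only (not code from the reviewed studies; the signal and filter values are made up), the convolution, non-linear and pooling stages described above can be sketched in a few lines of NumPy:

```python
import numpy as np

def conv1d(x, filters):
    """Valid 1-D convolution: one feature map per filter (weight vector)."""
    n, k = len(x), filters.shape[1]
    return np.array([[np.dot(x[i:i + k], f) for i in range(n - k + 1)]
                     for f in filters])

def relu(z):
    """Non-linear layer."""
    return np.maximum(z, 0)

def max_pool(z, width):
    """Maximum subsampling of non-overlapping regions of each feature map."""
    trimmed = z[:, :z.shape[1] // width * width]
    return trimmed.reshape(z.shape[0], -1, width).max(axis=2)

x = np.array([0., 1., 0., 0., 1., 1., 0., 0.])
filters = np.array([[1., -1.],   # toy "edge" detector
                    [1.,  1.]])  # toy "blob" detector
maps = relu(conv1d(x, filters))  # feature maps, shape (2, 7)
pooled = max_pool(maps, 2)       # aggregated local features, shape (2, 3)
```

Sharing each filter across every position of `x` is exactly the parameter-reduction property mentioned above: two filters of width 2 cover the whole input.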

Currently, CNNs are one of the most successful deep learning architectures owing to their outstanding capacity to analyze spatial information. Given their rapid development in the field of object recognition, we believe the primary research achievements in bioinformatics will come from the biomedical imaging domain. Despite the differences in data characteristics between natural and biomedical images, CNNs will nonetheless offer more straightforward applications than in other domains. Indeed, CNNs also have great potential in omics and biomedical

Table 2. Categorization of deep learning applied research in bioinformatics

Architecture                    Omics                                  Biomedical imaging                  Biomedical signal processing
Deep neural networks            Protein structure [84-87]              Anomaly classification [122-124]    Brain decoding [158-163]
                                Gene expression regulation [93-98]     Segmentation [133]                  Anomaly classification [171-175]
                                Protein classification [108]           Recognition [142, 143]
                                Anomaly classification [111]           Brain decoding [149, 150]
Convolutional neural networks   Gene expression regulation [99-104]    Anomaly classification [125-132]    Brain decoding [164-167]
                                                                       Segmentation [134-140]              Anomaly classification [176]
                                                                       Recognition [144-147]
Recurrent neural networks       Protein structure [88-90]                                                  Brain decoding [168]
                                Gene expression regulation [105-107]                                       Anomaly classification [177, 178]
                                Protein classification [109, 110]
Emergent architectures          Protein structure [91, 92]             Segmentation [141]                  Brain decoding [169, 170]

Table 3. Comparison of deep learning libraries

            Core     Speed for batch (ms)   Multi-GPU   Distributed   Strengths [56, 57]
Caffe       C++      651.6                  O           X             Pre-trained models supported
Neon        Python   386.8                  O           X             Speed
TensorFlow  C++      962.0                  O           O             Heterogeneous distributed computing
Theano      Python   733.5                  X           X             Ease of use with higher-level wrappers
Torch       Lua      506.6                  O           X             Functional extensibility

Notes: Speed for batch is the averaged processing time for AlexNet [33] with a batch size of 256 on a single GPU [57]. Caffe, Neon, Theano and Torch were used with cuDNN v3, while TensorFlow was used with cuDNN v2.

Figure 4. Basic structure of DNNs with input units x, three hidden units h1, h2 and h3 in each layer, and output units y [26]. At each layer, the weighted sum and non-linear function of its inputs are computed so that hierarchical representations can be obtained.

signal processing. The three key ideas of CNNs can be applied not only in one-dimensional grids, to discover meaningful recurring patterns with small variance such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and time-frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are widely used (Figure 8).
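A minimal sketch of the recurrence just described (illustrative only; the dimensions and random weights are made up, and no training is shown): the state vector h_t is updated from the current input and the previous state, so it implicitly summarizes all inputs seen so far.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))   # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden-to-hidden (cyclic) weights

def rnn_states(xs):
    """Process inputs sequentially; return the state vector after each step."""
    h = np.zeros(n_hid)
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)  # h_t depends on x_t and h_{t-1}
        states.append(h)
    return states

seq = [rng.normal(size=n_in) for _ in range(5)]
states = rnn_states(seq)  # five state vectors, each of length n_hid
```

A BRNN would simply run a second copy of this loop over the reversed sequence and concatenate the two state vectors at each time step.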

Although RNNs do not seem as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure when unrolled in time (Figure 7). Therefore, for a long time, researchers struggled with vanishing gradient problems while training RNNs, and learning long-term dependencies among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or to a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that the dissemination of dynamic CT and MRI [71, 72] will lead to the incorporation of RNNs with CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73], and that employing an attention mechanism [74-77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature

Figure 5. Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6. Basic structure of CNNs, consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple feature maps detecting low-level features, and the pooling layer then combines them into higher-level features.

compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. Temporal features, however, are gradually altered as they progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating the data as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts, which vary with the order of data processing, are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to combine the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In training AEs, reconstruction error is minimized using an encoder and

decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, a CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.
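The encoder/decoder symmetry at the heart of a CAE can be illustrated with a pooling/unpooling pair (a toy sketch with made-up values; real CAEs additionally learn convolution and deconvolution weights):

```python
import numpy as np

def max_pool_1d(x, w=2):
    """Encoder step: keep the max of each non-overlapping window
    and remember where it came from (the 'switch' locations)."""
    x = x[:len(x) // w * w].reshape(-1, w)
    idx = x.argmax(axis=1)
    return x.max(axis=1), idx

def unpool_1d(p, idx, w=2):
    """Decoder step: place each pooled value back at its remembered
    location, zero elsewhere, approximately inverting the encoder."""
    out = np.zeros((len(p), w))
    out[np.arange(len(p)), idx] = p
    return out.ravel()

x = np.array([1., 3., 2., 0., 5., 4.])
p, idx = max_pool_1d(x)   # pooled values [3., 2., 5.], switches [1, 0, 0]
xr = unpool_1d(p, idx)    # sparse reconstruction of x
```

Training then minimizes the reconstruction error between `x` and the decoder output, exactly as in an ordinary AE.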

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed but awaits wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of both local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position-specific scoring matrices (PSSMs) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties arising from complex biological data and to improve results. Protein contact maps, which present the distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84-92]. Gene expression regulation [93-107], including splice junctions and RNA-binding proteins, and protein classification

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists, so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNN is unrolled in time. The index of each symbol represents the time step: ht receives input from xt and ht-1, and then propagates the computed results to yt and ht+1.

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit, which receives input from xt and the forward unit at the previous step (t-1) to reflect past information, and a backward unit, which receives input from xt and the backward unit at the following step (t+1) to reflect future information. The information from both hidden units is propagated to yt.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.

[108-110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.
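As a toy illustration (not from the reviewed studies; the A/C/G/T row order is one common convention), raw DNA sequences are typically one-hot encoded before being fed to the models above:

```python
import numpy as np

BASES = "ACGT"  # assumed row ordering of the encoding

def one_hot(seq):
    """Encode a DNA sequence as a 4 x L binary matrix (rows: A, C, G, T)."""
    idx = {b: i for i, b in enumerate(BASES)}
    mat = np.zeros((4, len(seq)))
    for j, b in enumerate(seq.upper()):
        mat[idx[b], j] = 1.0
    return mat

x = one_hot("GATTACA")  # shape (4, 7); column j marks the base at position j
```

Each column sums to one, so the matrix behaves like a per-position probability distribution, which is also the form taken by the PSSM features mentioned above.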

Deep neural networks

DNNs have been widely applied in protein structure prediction [84-87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene

expression regulation [93-98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method, called boosted contrastive divergence, for imbalanced data, along with a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting a different context. For example, the (i, j) hidden unit in context 1 receives input from the (i-1, j) and (i, j-1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.

8 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those studies have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns: they can capture longer motifs, integrate cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
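The idea that a first convolution layer acts as a learned motif detector can be sketched in a few lines. The filter below is a hand-made stand-in for a PSSM-like weight matrix that would normally be learned from data, and the sequence is a toy example; both are illustrative assumptions, not values from any cited model.

```python
# Toy illustration: a 1D convolutional filter over one-hot DNA acts like a
# PSSM-based motif scanner whose weights would normally be learned from data.
# The filter values below are illustrative, not from any published model.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_scores(seq, filt):
    """Slide a (k x 4) filter across the sequence; return per-position scores."""
    x = one_hot(seq)
    k = len(filt)
    scores = []
    for i in range(len(x) - k + 1):
        s = sum(filt[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        scores.append(s)
    return scores

# A hand-made filter that rewards the motif "TATA" (rows: positions; cols: A, C, G, T).
tata_filter = [
    [-1.0, -1.0, -1.0,  1.0],   # T
    [ 1.0, -1.0, -1.0, -1.0],   # A
    [-1.0, -1.0, -1.0,  1.0],   # T
    [ 1.0, -1.0, -1.0, -1.0],   # A
]

seq = "GGCTATACCG"
scores = conv1d_scores(seq, tata_filter)
best = max(range(len(scores)), key=lambda i: scores[i])  # global max-pooling
print(best, scores[best])  # strongest match at position 3 ("TATA"), score 4.0
```

Stacking further convolution layers on such per-position score maps is what lets deeper CNNs combine several detected motifs into longer regulatory patterns.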

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data: sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq); features from genomic sequence; position specific scoring matrix (PSSM); physicochemical properties (steric parameter, volume); Atchley factors (FAC); 1-dimensional structural properties; contact map (distance of amino acid pairs in 3D structure); microarray gene expression
  Research avenues:
    - Protein structure prediction [84–92]: 1-dimensional structural properties; contact map; structure model quality assessment
    - Gene expression regulation [93–107]: splice junction; genetic variants affecting splicing; sequence specificity
    - Protein classification [108–110]: super family; subcellular localization
    - Anomaly classification [111]: cancer

Biomedical imaging
  Input data: magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues:
    - Anomaly classification [122–132]: gene expression pattern; cancer; Alzheimer's disease; schizophrenia
    - Segmentation [133–141]: cell structure; neuronal structure; vessel map; brain tumor
    - Recognition [142–147]: cell nuclei; finger joint; anatomical structure
    - Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data: ECoG, ECG, EMG, EOG; EEG (raw, wavelet, frequency, differential entropy); extracted features from EEG (normalized decay, peak variation)
  Research avenues:
    - Brain decoding [158–170]: behavior; emotion
    - Anomaly classification [171–178]: Alzheimer's disease; seizure; sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.
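The core property that makes RNNs attractive here (variable-length input mapped to a fixed-size state) can be illustrated with a single-unit vanilla RNN; the fixed weights and scalar base encoding below are toy choices for illustration, not a trained model.

```python
# Sketch of why RNNs fit variable-length biological sequences: the same step
# function is applied per residue, so inputs of any length yield a fixed-size
# hidden state. The weights here are fixed toy values, not learned ones.
import math

BASES = "ACGT"

def rnn_encode(seq, w_in=0.5, w_rec=0.3):
    """Single-unit vanilla RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    for base in seq:
        x = BASES.index(base) / 3.0   # crude scalar encoding of the residue
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Sequences of different lengths all yield a fixed-size (here scalar) state:
print(rnn_encode("ACGT"), rnn_encode("ACGTACGTTT"))
```

LSTM units replace the plain `tanh` step with gated updates so that signals from residues far back in the sequence are not washed out, which is why the studies above favor them.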

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain with wide application of deep learning in general image-related tasks. Most biomedical images used for clinical treatment of patients, such as magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei from histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition among CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.
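The decomposition step described above can be sketched with a plain discrete Fourier transform extracting band power from a synthetic signal; the 10 Hz sine plus noise below is a made-up stand-in for a real EEG recording, and the band limits are the conventional alpha/delta ranges.

```python
# Sketch of the preprocessing step described above: decompose a noisy raw
# signal into frequency components before feeding a model. A plain DFT
# extracts band power from a synthetic 10 Hz "EEG-like" sine for illustration.
import cmath
import math
import random

def band_power(signal, fs, lo, hi):
    """Sum of squared DFT magnitudes for frequencies in [lo, hi] Hz."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2):
        freq = k * fs / n
        if lo <= freq <= hi:
            coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            power += abs(coeff) ** 2
    return power

fs = 128                                   # sampling rate (Hz)
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 10 * t / fs) + 0.2 * rng.gauss(0, 1)
          for t in range(fs)]              # one second: 10 Hz sine + noise

alpha = band_power(signal, fs, 8, 13)      # alpha band contains the 10 Hz tone
delta = band_power(signal, fs, 1, 4)
print(alpha > delta)  # True
```

In practice, wavelet transforms are often preferred over the plain DFT because they localize the components in time as well as in frequency.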

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze the data and are expected to produce promising results. To present some of the studies in brain decoding [168] and anomaly classification [177, 178]: Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often show significantly unequal class distributions, where the number of instances from one class is significantly higher than the number of instances from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
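To make the accuracy-versus-F-measure contrast concrete, here is a minimal self-contained sketch; the labels and predictions are fabricated for illustration.

```python
# Minimal sketch: why accuracy misleads on imbalanced data while the
# F-measure (harmonic mean of precision and recall) does not.
# Labels and predictions below are toy values for illustration.

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f_measure(y_true, y_pred):
    tp, fp, fn, _ = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 95 negatives, 5 positives; a classifier that always predicts "negative":
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                    # 0.95 -- looks good, but the model is useless
print(f_measure(y_true, y_pred))   # 0.0 -- the F-measure exposes the failure
```

The same confusion-matrix counts, computed across a sweep of decision thresholds, are what underlie the AUC and AUC-PR curves discussed above.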

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
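As a concrete illustration of the sampling idea, the following sketch performs plain random oversampling of the minority class; it is a simplified stand-in for the cited techniques such as SMOTE, with toy data.

```python
# Sketch of a data-preprocessing remedy: randomly oversample the minority
# class until the class distribution is balanced (a simpler relative of the
# synthetic minority oversampling technique cited in the text).
import random

def oversample(data, labels, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # duplicate random minority examples until each class reaches `target`
        resampled = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(resampled)
        out_y.extend([y] * target)
    return out_x, out_y

data   = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
bal_x, bal_y = oversample(data, labels)
print(bal_y.count(0), bal_y.count(1))  # 4 4 -- balanced
```

SMOTE differs in that it interpolates new synthetic minority examples between neighbors rather than duplicating existing ones, which reduces the risk of overfitting to repeated samples.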

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in an objective loss function of neural networks, either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance or implicitly modify the learning rates according to data instance classes during training.
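A minimal sketch of the explicit variant, assuming a simple inverse-class-frequency weighting scheme; the probabilities and labels are toy values, and real implementations would apply the same weighting inside a framework's loss function.

```python
# Sketch of explicit cost-sensitive learning: weight each example's
# cross-entropy term by the inverse frequency of its class, so errors on the
# rare class cost more. The weighting scheme and data are illustrative.
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights, normalized so the majority class has weight 1."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {c: max_count / n for c, n in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """probs[i] = predicted probability of the true class of example i."""
    return sum(-weights[y] * math.log(p) for p, y in zip(probs, labels)) / len(labels)

labels = [0, 0, 0, 0, 1]             # imbalanced: four majority, one minority
probs  = [0.9, 0.9, 0.9, 0.9, 0.6]   # model is less confident on the rare class
w = class_weights(labels)            # {0: 1.0, 1: 4.0}
print(weighted_cross_entropy(probs, labels, w))
```

The implicit alternative mentioned above keeps the loss unchanged but scales the gradient step (the learning rate) by the same per-class weights during training.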

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is adoption of pre-training. Unsupervised pre-training can be a great help to learn representations for each class and to produce more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method, which exploits unsupervised pre-training with RNN-based AE, and achieved a >25% increase in F-measure compared to the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we


know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
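The heat-map half of a mutation map can be sketched generically: score every single-base substitution and record the change in the model's output. The scorer below is a toy dinucleotide counter standing in for a trained network such as the cited CNNs, and the sequence is illustrative.

```python
# Sketch of an in-silico mutation map: score every single-base substitution
# with some predictor and record how much it changes the output. The scorer
# here is a toy motif counter standing in for a trained network.

BASES = "ACGT"

def toy_score(seq):
    """Illustrative stand-in for a trained model: count 'TA' dinucleotides."""
    return sum(1.0 for i in range(len(seq) - 1) if seq[i:i+2] == "TA")

def mutation_map(seq, score_fn):
    """Return a {position: {base: score_change}} heat-map-ready dict."""
    base_score = score_fn(seq)
    effects = {}
    for i, ref in enumerate(seq):
        effects[i] = {}
        for alt in BASES:
            if alt == ref:
                continue
            mutated = seq[:i] + alt + seq[i+1:]
            effects[i][alt] = score_fn(mutated) - base_score
    return effects

mm = mutation_map("GTATAG", toy_score)
print(mm[1])  # mutating the first T destroys one 'TA' occurrence
```

Plotted as a position-by-base grid, such score changes form exactly the heat map described above, with the per-position maximum decrease scaling the sequence logo.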

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial to proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture and selection according to those capabilities, in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
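Of these, random search is simple enough to sketch in full; the "validation score" below is a made-up surrogate standing in for an actual train-and-evaluate run, and the search ranges are illustrative.

```python
# Minimal sketch of random search over hyperparameters (one of the automated
# strategies cited above). The "validation score" is a fabricated function of
# the hyperparameters standing in for an actual training-plus-evaluation run.
import math
import random

def validation_score(lr, n_hidden):
    """Toy surrogate: peaks near lr = 1e-2 and moderately sized layers."""
    return -abs(math.log10(lr) + 2) - abs(n_hidden - 128) / 256

def random_search(trials=50, seed=0):
    rng = random.Random(seed)
    best = (None, float("-inf"))
    for _ in range(trials):
        lr = 10 ** rng.uniform(-5, -1)      # sample the learning rate log-uniformly
        n_hidden = rng.randrange(16, 513)   # sample the hidden-layer width
        score = validation_score(lr, n_hidden)
        if score > best[1]:
            best = ((lr, n_hidden), score)
    return best

best_params, best_score = random_search()
print(best_params, best_score)
```

Sampling the learning rate on a log scale reflects common practice, since useful learning rates span several orders of magnitude; Bayesian optimization differs by fitting a model to past trials to choose where to sample next.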

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but X-ray, CT, MRI and PET forms are also available for a single image.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performances can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use a graphics processing unit, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
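As a concrete example of such an optimizer, here is a minimal Adagrad sketch on a one-parameter quadratic; the learning rate, objective and step count are illustrative choices, not values from any cited work.

```python
# Sketch of one of the advanced optimizers named above (Adagrad): each
# parameter's step size is scaled by the inverse square root of its
# accumulated squared gradients. A toy quadratic is minimized for illustration.
import math

def adagrad_minimize(grad_fn, x0, lr=1.0, steps=100, eps=1e-8):
    x, accum = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        accum += g * g                           # accumulate squared gradients
        x -= lr * g / (math.sqrt(accum) + eps)   # adaptive, shrinking step size
    return x

# f(x) = (x - 3)^2, gradient 2(x - 3); the minimum is at x = 3
x_min = adagrad_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Adam extends this idea with exponentially decaying averages of both the gradients and their squares, which keeps the effective step size from shrinking as aggressively as Adagrad's does.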

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization to provide higher performance, we expect additional studies in this area, including those involving adversarial generative networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLP or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains, such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References
1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. Adv Neural Inf Process Syst 2015;577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20 Libbrecht MW Noble WS Machine learning applications ingenetics and genomics Nat Rev Genet 201516(6)321ndash32

21 Schmidhuber J Deep learning in neural networks an over-view Neural Networks 20156185ndash117

22 Leung MK Delong A Alipanahi B et al machine learning ingenomic medicine a review of computational problems anddata sets Proc IEEE 2016104176ndash97

23 Mamoshina P Vieira A Putin E et al Applications of deeplearning in biomedicine Mol Pharm 2016131445ndash54

24 Greenspan H van Ginneken B Summers RM Guest editorialdeep learning in medical imaging overview and futurepromise of an exciting new technique IEEE Trans MedImaging 201635(5)1153ndash9

25 LeCun Y Ranzato M Deep learning tutorial In Tutorials inInternational Conference on Machine Learning (ICMLrsquo13) 2013Citeseer

26 Svozil D Kvasnicka V Pospichal J Introduction to multi-layer feed-forward neural networks Chemometr Intell LabSyst 199739(1)43ndash62

27 Vincent P Larochelle H Bengio Y et al Extracting and com-posing robust features with denoising autoencoders InProceedings of the 25th International Conference on MachineLearning 2008 p 1096ndash103 ACM New York

28 Vincent P Larochelle H Lajoie I et al Stacked denoisingautoencoders learning useful representations in a deep net-work with a local denoising criterion J Mach Learn Res2010113371ndash408

29 Hinton G Osindero S Teh Y-W A fast learning algorithm fordeep belief nets Neural Comput 200618(7)1527ndash54

30 Hinton G Salakhutdinov RR Reducing the dimensionality ofdata with neural networks Science 2006313(5786)504ndash7

31 LeCun Y Boser B Denker JS et al Handwritten digit recogni-tion with a back-propagation network In Advances in NeuralInformation Processing Systems 1990 Citeseer

32 Lawrence S Giles CL Tsoi AC et al Face recognition a con-volutional neural-network approach IEEE Trans Neural Netw19978(1)98ndash113

33 Krizhevsky A Sutskever I Hinton G Imagenet classificationwith deep convolutional neural networks In Advances inNeural Information Processing Systems 2012 p 1097ndash105

34 Williams RJ Zipser D A learning algorithm for continuallyrunning fully recurrent neural networks Neural Comput19891(2)270ndash80

35 Bengio Y Simard P Frasconi P Learning long-term depend-encies with gradient descent is difficult IEEE Trans NeuralNetw 19945(2)157ndash66

36 Hochreiter S Schmidhuber J Long short-term memoryNeural Comput 19979(8)1735ndash80

37 Gers FA Schmidhuber J Cummins F Learning to forget con-tinual prediction with LSTM Neural Comput200012(10)2451ndash71

38 Lena PD Nagata K Baldi PF Deep spatio-temporal architec-tures and learning for protein structure prediction In Advancesin Neural Information Processing Systems 2012 p 512ndash20

39 Graves A Schmidhuber J Offline handwriting recognitionwith multidimensional recurrent neural networks InAdvances in Neural Information Processing Systems 2009 p545ndash52

40 Hadsell R Sermanet P Ben J et al Learning long-range visionfor autonomous off-road driving J Field Robot200926(2)120ndash44

41 Masci J Meier U Ciresan D et al Stacked convolutionalauto-encoders for hierarchical feature extraction InArtificial Neural Networks and Machine Learning ndash ICANN 2011Springer Berlin Heidelberg 2011 52ndash9

42 Minsky M Papert S Perceptron an introduction to computa-tional geometry MIT Press Cambridge Expanded Edition196919(88)2

43 Fukushima K Cognitron a self-organizing multilayeredneural network Biol Cybern 197520(3ndash4)121ndash36

44 Hinton G Sejnowski TJ Learning and releaming inBoltzmann machines Parallel Distrib Process ExplorMicrostruct Cogn 19861282ndash317

45 Hinton G A practical guide to training restricted Boltzmannmachines Momentum 20109(1)926

46 Hecht-Nielsen R Theory of the backpropagation neural net-work In International Joint Conference on Neural Networks1989 IJCNN 1989 p 593ndash605 IEEE Washington DC

47 Bottou L Stochastic gradient learning in neural networksProc Neuro-Nımes 199191(8)

48 Duchi J Hazan E Singer Y Adaptive subgradient methodsfor online learning and stochastic optimization J Mach LearnRes 2011122121ndash59

49 Kingma D Ba J Adam a method for stochastic optimizationarXiv preprint arXiv14126980 2014

50 Moody J Hanson S Krogh A et al A simple weight decay canimprove generalization Adv Neural Inf Process Syst19954950ndash7

51 Srivastava N Hinton G Krizhevsky A et al Dropout a simpleway to prevent neural networks from overfitting J MachLearn Res 201415(1)1929ndash58

52 Baldi P Sadowski PJ Understanding dropout In Advances inNeural Information Processing Systems 2013 2814ndash22

53 Goodfellow IJ Warde-Farley D Mirza M et al Maxout net-works arXiv Preprint arXiv13024389 2013

54 Moon T Choi H Lee H et al RnnDrop a novel dropout forRNNs in ASR In Automatic Speech Recognition andUnderstanding (ASRU) Scottsdale AZ 2015


55 Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.

56 Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57 Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.

58 Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59 Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60 Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.

61 Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy) 2010, p. 3. Austin, TX.

62 Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.

63 Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64 Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65 van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.

66 Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67 Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10) 2010, p. 807–14.

68 Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69 Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70 Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71 Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72 Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73 Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.

75 Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.

76 Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77 Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems 2014, p. 2204–12.

78 Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79 Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80 Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81 Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82 Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.

83 Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84 Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85 Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86 Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.

87 Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN) 2014, p. 2071–8. IEEE, New York.

88 Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89 Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90 Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.

91 Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92 Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93 Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94 Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95 Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96 Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97 Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98 Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99 Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100 Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101 Lanchantin J, Singh R, Lin Z, et al. Deep Motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102 Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103 Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104 Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105 Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106 Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107 Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108 Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109 Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110 Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111 Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112 Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113 Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114 Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115 Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116 Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117 Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118 Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. In: SPIE, Bellingham, WA, 2009.

119 Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120 Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121 Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122 Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123 Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124 Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125 Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126 Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127 Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV) 2014, p. 844–8. IEEE, Singapore.

128 Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129 Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130 Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131 Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132 Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133 Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134 Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135 Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136 Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137 Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems 2012, 2843–51.


138 Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139 Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140 Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141 Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142 Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143 Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144 Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145 Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE, p. 1–4. IEEE, New York.

146 Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147 Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148 Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149 Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150 Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151 Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152 Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153 Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154 Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155 De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156 Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157 Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158 Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159 An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160 Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2013, p. 305–10. IEEE, New York.

161 Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE) 2014, p. 30–7. IEEE, New York.

162 Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER) 2015, p. 154–7. IEEE, New York.

163 Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164 Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14) 2014, p. 649–54.

165 Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems 2014, 1449–57.

166 Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition, 2008. ICPR 2008, p. 1–4. IEEE, New York.

167 Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168 Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME) 2014, p. 1–6. IEEE, New York.

169 Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence 2013, p. 1785–91. AAAI Press, Palo Alto.

170 Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171 Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE) 2014, p. 7–12. IEEE, New York.

172 Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173 Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174 Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175 Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176 Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177 Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178 Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179 Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180 Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181 He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182 Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning 2006, p. 233–40. ACM, New York.

183 Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184 Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185 Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;321–57.

186 Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.

187 Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI 1998, 445–9. Citeseer.

188 Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189 Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009. IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190 Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191 Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192 Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193 Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194 Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems 2014, 2933–41.

195 Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196 Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems 2011, 2546–54.

197 Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198 Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems 2012, 2951–9.

199 Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200 Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11) 2011, p. 689–96.

201 Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202 Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11) 2011, p. 265–72.

203 Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10) 2010, p. 735–42.

204 Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning 2009, p. 873–80. ACM.

205 Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems 2013, p. 1223–31.

206 Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207 Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) 2014, p. 583–98.

208 Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems 2012, 1223–31.

209 Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210 Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211 Simonite T. Thinking in silicon. MIT Technology Review 2013.

212 Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213 Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214 Hof RD. Neuromorphic chips. MIT Technology Review 2014.

215 Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision 2015, p. 4507–15.

216 Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217 Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218 Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219 Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220 Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221 Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems 2014, 2672–80.

222 Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223 Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems 2015, 3532–40.

224 Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225 Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA) 2015, p. 2605–12. IEEE, New York.



signal processing. The three key ideas of CNNs can be applied not only in a one-dimensional grid to discover meaningful recurring patterns with small variance, such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.
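As an illustration of the one-dimensional case mentioned above, the following sketch (not from any of the reviewed studies; the sequence, filter, and function names are made up) slides a one-hot motif filter along a DNA sequence, which is the core operation behind CNN-based motif discovery:

```python
import numpy as np

# One-hot encode a DNA sequence (rows: A, C, G, T) -- a common input
# representation for sequence-based CNNs.
def one_hot(seq):
    alphabet = "ACGT"
    mat = np.zeros((4, len(seq)))
    for i, base in enumerate(seq):
        mat[alphabet.index(base), i] = 1.0
    return mat

# Slide a 4 x k motif filter along the sequence (a "valid" 1D convolution):
# each output position scores how well the local window matches the motif.
def conv1d(x, filt):
    k = filt.shape[1]
    return np.array([np.sum(x[:, i:i + k] * filt)
                     for i in range(x.shape[1] - k + 1)])

x = one_hot("ACGTACGTTACG")
motif = one_hot("TACG")   # a perfect-match filter for the motif TACG
scores = conv1d(x, motif) # high scores mark candidate motif occurrences
```

In a trained CNN the filter weights are learned rather than fixed, and many filters are applied in parallel, but the sliding dot-product shown here is the same.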

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and the output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and are widely used (Figure 8).
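As a minimal sketch of this recurrent computation (pure Python; the toy weight matrices, state size and input sequence are illustrative only, not from any cited model), the state update mixes the current input with the previous state, so past inputs are implicitly carried forward:

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh):
    """One recurrent step: the new state combines the current input
    with the previous state, so past information is implicitly stored."""
    return [
        math.tanh(
            sum(W_xh[i][j] * x_t[j] for j in range(len(x_t))) +
            sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        )
        for i in range(len(h_prev))
    ]

# Toy 2-unit state and 2-dimensional inputs; weights are illustrative.
W_xh = [[0.5, -0.2], [0.1, 0.4]]
W_hh = [[0.3, 0.0], [-0.1, 0.2]]
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:  # a length-3 input sequence
    h = rnn_step(x, h, W_xh, W_hh)
print(h)  # final state vector summarizes the whole sequence
```

Unrolling the loop over time steps gives exactly the "deep in time" structure described for Figure 7.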

Although RNNs do not seem to be as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure if unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependency among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units, such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas, including natural language processing [16, 17] and language translation [18, 19].
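The vanishing-gradient problem mentioned above comes down to repeated multiplication: backpropagating through T unrolled time steps multiplies roughly T per-step derivatives, and when each factor is below 1 the product decays exponentially. A toy illustration (pure Python; the factor 0.5 is arbitrary, standing in for the derivative through a saturated unit):

```python
# Each backward step through time scales the gradient by the local
# derivative; for a saturated tanh unit this factor is well below 1.
per_step_factor = 0.5
gradient = 1.0
for _ in range(50):  # backpropagate through 50 unrolled time steps
    gradient *= per_step_factor

print(gradient)  # ~8.9e-16: the learning signal from 50 steps back is lost
```

LSTM and GRU cells mitigate this by carrying state through an additive path whose gating keeps the effective factor close to 1.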

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that dissemination of dynamic CT and MRI [71, 72] would lead to the incorporation of RNNs and CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73] and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.

Figure 5. Unsupervised layer-wise pre-training process in SAE and DBN [29]. First, weight vector W1 is trained between input units x and hidden units h1 in the first hidden layer as an RBM or AE. After W1 is trained, another hidden layer is stacked, and the obtained representations in h1 are used to train W2 between hidden units h1 and h2 as another RBM or AE. The process is repeated for the desired number of layers.

Figure 6. Basic structure of CNNs, consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple filter maps detecting low-level features, and then the pooling layer combines them into higher-level features.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In training of AEs, reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed but awaits wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of both local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA and amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position-specific scoring matrices (PSSMs) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties from complex biological data and improve results. In addition, protein contact maps, which present distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification [108–110], including super-family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists, so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time. The index of each symbol represents the time step. In this way, h_t receives input from x_t and h_(t-1) and then propagates the computed results to y_t and h_(t+1).

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit, which receives input from x_t and the forward unit at time t-1 to reflect past information, and a backward unit, which receives input from x_t and the backward unit at time t+1 to reflect future information. The information from both hidden units is propagated to y_t.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_(i,j) represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_(i,j) and the input units x are used in the computation of h^(k+1)_(i,j).
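Several of the omics studies reviewed here feed PSSM features rather than raw sequences to the network. As a minimal sketch of how such a matrix is derived (pure Python; the toy alignment, uniform background and pseudocount of 1 are illustrative, whereas real pipelines typically derive PSSMs from PSI-BLAST profiles):

```python
from math import log2

def pssm(aligned_seqs, alphabet="ACGT", pseudocount=1.0):
    """Position-specific scoring matrix: log2 of each residue's
    per-column frequency over a uniform background frequency."""
    length = len(aligned_seqs[0])
    background = 1.0 / len(alphabet)
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in aligned_seqs]
        scores = {}
        for a in alphabet:
            freq = (column.count(a) + pseudocount) / \
                   (len(column) + pseudocount * len(alphabet))
            scores[a] = log2(freq / background)  # positive = enriched
        matrix.append(scores)
    return matrix

m = pssm(["ACGT", "ACGA", "ACCT"])  # toy alignment of three sequences
print(round(m[0]["A"], 2))  # 'A' is strongly preferred at position 0
```

Each row of the matrix then serves as a fixed-size numeric feature vector for one sequence position.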

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting different contexts. For example, the (i, j) hidden unit in context 1 receives input from the (i-1, j) and (i, j-1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns; it can capture longer motifs, integrate cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
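The "motif detector" view of a first convolution layer can be sketched in a few lines: a filter slides over a one-hot-encoded DNA sequence and scores every window, peaking where the motif occurs (pure Python; the filter weights and the sequence are illustrative only, not taken from any cited model):

```python
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def conv1d_scan(seq, filt):
    """Slide a motif filter (width x 4 weights) over a one-hot DNA
    sequence and return one activation per window position."""
    width = len(filt)
    encoded = [ONE_HOT[base] for base in seq]
    return [
        sum(filt[w][c] * encoded[pos + w][c]
            for w in range(width) for c in range(4))
        for pos in range(len(seq) - width + 1)
    ]

# A filter that rewards the motif 'TATA': positive weight on the
# matching base at each of its four positions, zero elsewhere.
tata_filter = [ONE_HOT["T"], ONE_HOT["A"], ONE_HOT["T"], ONE_HOT["A"]]
activations = conv1d_scan("GCGTATAGCG", tata_filter)
print(activations.index(max(activations)))  # 3: where 'TATA' starts
```

In a trained CNN the filter weights are learned from data, which is exactly the sense in which the first layer learns a PSSM instead of hard-coding one.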

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data: sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq); features from genomic sequences: position-specific scoring matrix (PSSM), physicochemical properties (e.g. steric parameter, volume), Atchley factors (FAC), one-dimensional structural properties; contact map (distances of amino acid pairs in 3D structure); microarray gene expression
  Research avenues: protein structure prediction [84–92] (one-dimensional structural properties, contact map, structure model quality assessment); gene expression regulation [93–107] (splice junction, genetic variants affecting splicing, sequence specificity); protein classification [108–110] (super family, subcellular localization); anomaly classification [111] (cancer)

Biomedical imaging
  Input data: magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues: anomaly classification [122–132] (gene expression pattern, cancer, Alzheimer's disease, schizophrenia); segmentation [133–141] (cell structure, neuronal structure, vessel map, brain tumor); recognition [142–147] (cell nuclei, finger joint, anatomical structure); brain decoding [149, 150] (behavior)

Biomedical signal processing
  Input data: ECoG, ECG, EMG, EOG; EEG (raw, wavelet, frequency, differential entropy); features extracted from EEG (normalized decay, peak variation)
  Research avenues: brain decoding [158–170] (behavior, emotion); anomaly classification [171–178] (Alzheimer's disease, seizure, sleep stage)


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain with wide application of deep learning in general image-related tasks. Most biomedical images used for clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high-content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei in histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition in CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze the data and are expected to produce promising results. To present some of the studies in brain decoding [168] and anomaly classification [177, 178]: Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAEs have been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to using raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often show significantly unequal class distributions, where the number of instances from one class is significantly higher than the number from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
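A small worked example shows why accuracy misleads on imbalanced data while the F-measure does not (pure Python; the confusion-matrix counts are a made-up toy test set):

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy imbalanced test set: 10 positives vs. 990 negatives.
# A classifier that finds 8 positives but raises 20 false alarms:
tp, fp, fn, tn = 8, 20, 2, 970
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 3))               # 0.978: looks deceptively good
print(round(f_measure(tp, fp, fn), 3))  # 0.421: exposes the weak precision
```

The same asymmetry is what makes AUC-PR more sensitive than AUC when negatives dominate.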

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSMs from genomic sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
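A minimal illustration of the sampling idea (pure Python; plain random oversampling with replacement, deliberately far simpler than SMOTE [185], with toy data):

```python
import random

def oversample(data, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        xs = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs)
    return out

# 6 control samples vs. only 2 treatment samples:
balanced = oversample([1, 2, 3, 4, 5, 6, 7, 8],
                      ["ctl"] * 6 + ["trt"] * 2)
print(sum(1 for _, y in balanced if y == "trt"))  # 6: classes now balanced
```

Informed undersampling works in the opposite direction, discarding carefully chosen majority-class examples instead of duplicating minority ones.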

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to address the limited and imbalanced data problems. Cost sensitivity can be applied in the objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance or implicitly modify the learning rates according to data instance classes during training.
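A class-weighted cross-entropy is one explicit form of such an objective (pure Python sketch; the probabilities, labels and the 5x weight on the rare class are illustrative, with weights often set in practice to inverse class frequency):

```python
from math import log

def weighted_cross_entropy(probs, labels, class_weight):
    """Cross-entropy in which the error on each example is scaled by a
    per-class cost, so mistakes on the rare class hurt more."""
    total = 0.0
    for p, y in zip(probs, labels):      # p = predicted P(class 1)
        p_true = p if y == 1 else 1.0 - p
        total += -class_weight[y] * log(p_true)
    return total / len(probs)

probs = [0.9, 0.8, 0.3]   # model's P(positive) for three examples
labels = [1, 0, 1]        # the third example, a rare positive, is missed
plain = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
costed = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 5.0})
print(costed > plain)  # True: the missed positive is penalized harder
```

The implicit variant mentioned above achieves a similar effect by scaling the learning rate per example instead of the loss itself.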

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be of great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with the real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method, which exploits unsupervised pre-training with an RNN-based AE, and achieved a >25% increase in F-measure compared with the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining the ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. For image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class-representative image rather than one dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN; DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically, for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
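The heat-map half of a mutation map reduces to a simple loop: re-score the sequence under every possible single-base substitution and record the change. A sketch (pure Python; the `score` function here is a hypothetical stand-in for the trained model's binding prediction, just counting G/C content):

```python
def score(seq):
    """Hypothetical stand-in for a trained model's binding score."""
    return sum(1 for base in seq if base in "GC") / len(seq)

def mutation_map(seq, alphabet="ACGT"):
    """For each position and substituted base, record how much the
    mutation changes the predicted score relative to the original."""
    base_score = score(seq)
    return [
        {b: score(seq[:i] + b + seq[i + 1:]) - base_score for b in alphabet}
        for i in range(len(seq))
    ]

deltas = mutation_map("GCAT")
print(deltas[2]["G"])  # mutating position 2 (A -> G) raises the score by 0.25
```

Scaling each logo letter by the largest decrease in its column of `deltas` reproduces the sequence-logo half of the visualization.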

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper application of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate, or 'best fit,' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left to human machine learning experts. Nevertheless, research on the automation of machine learning, which aims to optimize hyperparameters automatically, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
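The simplest of these, random search, can be sketched as follows. The search space and the analytic "validation score" are toy assumptions standing in for an actual train-and-evaluate run.

```python
import random

def validation_score(lr, n_hidden):
    # stand-in for "train a model with (lr, n_hidden), return validation accuracy"
    return -((lr - 0.01) ** 2) - ((n_hidden - 128) ** 2) / 1e4

random.seed(0)
best_cfg, best_score = None, float("-inf")
for _ in range(50):
    cfg = {
        "lr": 10 ** random.uniform(-4, -1),            # log-uniform learning rate
        "n_hidden": random.choice([32, 64, 128, 256]),  # discrete layer width
    }
    s = validation_score(**cfg)
    if s > best_score:                                  # keep the best trial
        best_cfg, best_score = cfg, s

assert best_score > -1.0
```

Sampling the learning rate log-uniformly reflects the common observation that its useful values span orders of magnitude; Bayesian optimization replaces the blind sampling with a surrogate model that proposes promising configurations.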

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field in which various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but image data alone come in multiple forms, including X-ray, CT, MRI and PET.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid measurements and brain images in the form of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.
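A common multimodal pattern is to encode each modality separately and fuse the hidden representations before a joint classifier. The following minimal sketch assumes two modalities (say, an image feature vector and a signal feature vector); all shapes and random weights are illustrative, not taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(2)
W_img = rng.normal(size=(16, 64))       # image-branch encoder weights
W_sig = rng.normal(size=(16, 32))       # signal-branch encoder weights
W_joint = rng.normal(size=(2, 32))      # joint classifier over 16 + 16 dims

def relu(z):
    return np.maximum(z, 0.0)

def forward(image_vec, signal_vec):
    # encode each modality, concatenate, then classify jointly
    h = np.concatenate([relu(W_img @ image_vec), relu(W_sig @ signal_vec)])
    return W_joint @ h                  # logits over two classes

logits = forward(rng.normal(size=64), rng.normal(size=32))
assert logits.shape == (2,)
```

The design choice is where to fuse: concatenating hidden layers (as here) lets each branch learn modality-specific features before the shared layers model cross-modal correlations.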

Accelerating deep learning

As more deep learning model parameters and more training data become available, better learning performance can be achieved. At the same time, however, this inevitably leads to a drastic increase in training time, emphasizing the need to accelerate deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. Widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly shorten the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing capabilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
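To make the first group concrete, here is the Adagrad update [48] in isolation: each parameter's learning rate is scaled down by its accumulated squared gradients. The quadratic objective is a toy assumption used only to show convergence.

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.5, eps=1e-8):
    cache = cache + grad ** 2                    # accumulate squared gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)   # per-parameter adaptive step
    return w, cache

w = np.array([5.0, -3.0])
cache = np.zeros_like(w)
for _ in range(200):
    grad = 2.0 * w                               # gradient of f(w) = ||w||^2
    w, cache = adagrad_step(w, grad, cache)

assert np.linalg.norm(w) < 1.0                   # approached the minimum at 0
```

Frequently updated parameters thus take smaller and smaller steps, which is why Adagrad suits sparse problems; Adam [49] adds momentum and an exponentially decaying (rather than ever-growing) gradient accumulator.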

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well: Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results on tasks requiring intricate inference, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can act as regularization and thereby improve performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].
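The fast gradient-sign method is the canonical way to construct such a perturbation: nudge the input by a small epsilon in the direction that increases the loss. The logistic "trained classifier" and all numbers below are toy assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0, 0.5]), 0.1     # toy trained linear classifier
x, y = np.array([0.3, 0.2, 0.4]), 1        # input correctly scored as class 1

def loss(x):                                # cross-entropy for label y = 1
    return -np.log(sigmoid(w @ x + b))

grad_x = (sigmoid(w @ x + b) - y) * w      # d(loss)/dx for the logistic model
x_adv = x + 0.25 * np.sign(grad_x)         # epsilon = 0.25 sign perturbation

assert loss(x_adv) > loss(x)               # the loss increases...
assert sigmoid(w @ x_adv + b) < 0.5        # ...and the decision flips
```

Adversarial training folds such perturbed inputs back into the training set, which is the regularization effect referred to above.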

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions so as to denoise representations at every level of the model. Reinforcement learning leverages reward signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].
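The combined objective of a ladder-style network can be sketched as a supervised cost plus a weighted unsupervised denoising cost. This is a simplified assumption: real ladder networks sum per-layer reconstruction costs, and the data and weight `lam` here are toy values.

```python
import numpy as np

def combined_loss(y_true, y_pred, clean, denoised, lam=0.5):
    # supervised term uses only labeled examples; unsupervised term uses all data
    supervised = float(np.mean((y_true - y_pred) ** 2))
    unsupervised = float(np.mean((clean - denoised) ** 2))
    return supervised + lam * unsupervised

total = combined_loss(np.array([1.0]), np.array([0.8]),
                      np.ones(4), np.full(4, 0.9))
assert abs(total - 0.045) < 1e-9
```

Because the unsupervised term needs no labels, every unlabeled example still contributes a training signal, which is the point of the semi-supervised setup.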

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have already been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive survey of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed the limitations of the approach and promising directions for future research.

Although deep learning holds promise, it is not a silver bullet and cannot be expected to provide great results in ad hoc bioinformatics applications. Many potential challenges remain, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparation regarding the issues discussed herein is key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for the application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the application of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing work, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.

4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.

6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst, 2015, 577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM with application to fast biomedical volumetric image segmentation. arXiv preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170 Stober S Sternin A Owen AM et al Deep feature learning forEEG Recordings arXiv Preprint arXiv151104306 2015

171 Huanhuan M Yue Z Classification of electrocardiogram sig-nals with deep belief networks In 2014 IEEE 17thInternational Conference on Computational Science andEngineering (CSE) 2014 p 7ndash12 IEEE New York

172 Wulsin D Gupta J Mani R et al Modeling electroencephalog-raphy waveforms with semi-supervised deep belief netsfast classification and anomaly measurement J Neural Eng20118(3)036015

173 Turner J Page A Mohsenin T et al Deep belief networksused on high resolution multichannel electroencephalog-raphy data for seizure detection In 2014 AAAI SpringSymposium Series 2014

174 Zhao Y He L Deep learning in the EEG diagnosis ofAlzheimerrsquos disease In Computer Vision-ACCV 2014Workshops Springer New York 2014 340ndash53

Deep learning in bioinformatics | 17

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.
182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;321–57.
186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.
187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.
188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009. IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.
191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.
192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.
193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.
194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209. Kin H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.
210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
211. Simonite T. Thinking in silicon. MIT Technology Review 2013.
212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.
213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214. Hof RD. Neuromorphic chips. MIT Technology Review 2014.
215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.
217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.
218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.
220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.
221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.
223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.


compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used, so that local correlations are reflected progressively.
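To make the progressive-refinement idea concrete, here is a toy NumPy sketch of a single DST-NN layer update. The weight shapes, sigmoid activation, 3x3 neighborhood, and zero-padding are illustrative simplifications of ours, not details taken from [38]:

```python
import numpy as np

def dst_nn_layer(h_prev, x_spatial, w_temporal, w_spatial, bias):
    """One DST-NN refinement step (toy sketch).

    h_prev:    (L, L) temporal features from the layer below
    x_spatial: (L, L) spatial features (the original input, reused in every layer)
    Each unit (i, j) sees only the 3x3 neighborhood of (i, j) in h_prev,
    so local correlations are refined progressively layer by layer.
    """
    L = h_prev.shape[0]
    padded = np.pad(h_prev, 1)                    # zero-pad the border
    h_next = np.empty_like(h_prev)
    for i in range(L):
        for j in range(L):
            neigh = padded[i:i + 3, j:j + 3].ravel()            # 9 temporal inputs
            z = neigh @ w_temporal + x_spatial[i, j] * w_spatial + bias
            h_next[i, j] = 1.0 / (1.0 + np.exp(-z))             # sigmoid unit
    return h_next

rng = np.random.default_rng(0)
h0 = rng.random((6, 6))                           # initial temporal features
h1 = dst_nn_layer(h0, h0, rng.normal(size=9), 0.5, 0.0)
print(h1.shape)                                   # (6, 6): same grid, refined
```

Stacking such layers, each reusing the same spatial input x, is the "progressive refinement" the text describes.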

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential, multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.
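As an illustration of how a single one of the four contexts is computed, the following sketch scans a 2D input from the top-left; the scalar hidden units and tanh activation are our own simplifying assumptions, not the formulation of [39]:

```python
import numpy as np

def md_rnn_context(x, w_in, w_up, w_left):
    """One of the four MD-RNN contexts for a 2D input (toy, scalar units).

    Scans top-left to bottom-right: h[i, j] depends on x[i, j] plus the
    already-computed h[i-1, j] and h[i, j-1], so each hidden unit summarizes
    the whole upper-left region. The other three contexts reverse the scan
    order; a full MD-RNN feeds all four into a shared output layer.
    """
    H, W = x.shape
    h = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            up = h[i - 1, j] if i > 0 else 0.0       # context from above
            left = h[i, j - 1] if j > 0 else 0.0     # context from the left
            h[i, j] = np.tanh(w_in * x[i, j] + w_up * up + w_left * left)
    return h

h = md_rnn_context(np.arange(9.0).reshape(3, 3), 0.3, 0.2, 0.2)
print(h.shape)   # (3, 3): one hidden state per position, for this context
```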

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs, so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In training of AEs, reconstruction error is minimized using an encoder and decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE and are trained in the same manner as in an AE.
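The encoder/decoder symmetry can be sketched without any learned weights. Below, average pooling stands in for the convolution-pooling encoder and nearest-neighbour unpooling for the deconvolution-unpooling decoder; this is a deliberate simplification of ours to show the data flow, not the trainable architecture of [40, 41]:

```python
import numpy as np

def cae_forward(x):
    """Minimal CAE-style forward pass (no learned filters, illustration only).

    Encoder: 2x2 average pooling (stand-in for conv + pool).
    Decoder: 2x2 nearest-neighbour unpooling (stand-in for unpool + deconv).
    A real CAE minimizes this reconstruction error over learned
    convolution/deconvolution weights, exactly as an AE does.
    """
    H, W = x.shape
    # encoder: downsample by averaging each non-overlapping 2x2 block
    code = x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
    # decoder: upsample by repeating each code unit over a 2x2 block
    recon = code.repeat(2, axis=0).repeat(2, axis=1)
    loss = np.mean((x - recon) ** 2)   # reconstruction error to minimize
    return code, recon, loss

code, recon, loss = cae_forward(np.arange(16.0).reshape(4, 4))
print(code.shape, recon.shape)   # (2, 2) (4, 4)
```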

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed, but these still await wide application in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of local and global contexts; and the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA, amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position specific scoring matrices (PSSM) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties from complex biological data and improve results. In addition, protein contact maps, which present distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification

Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists, so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time. The index of each symbol represents the time step. In this way, ht receives input from xt and ht−1 and then propagates the computed results to yt and ht+1.

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units for each time step: a forward unit, which receives input from xt and the forward unit at time t−1 to reflect past information, and a backward unit, which receives input from xt and the backward unit at time t+1 to reflect future information. The information from both hidden units is propagated to yt.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_ij represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_ij and the input units x are used in the computation of h^(k+1)_ij.


[108–110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.
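For readers unfamiliar with PSSM inputs, a minimal sketch of how such a matrix can be derived from aligned sequences follows. The DNA alphabet, uniform background and unit pseudocounts are our simplifying assumptions; protein PSSMs as in [78] are built analogously with substitution-matrix-weighted counts:

```python
import numpy as np

BASES = "ACGT"

def pssm(aligned_seqs, pseudocount=1.0):
    """Position specific scoring matrix (PSSM) from aligned DNA sequences:
    per-position base frequencies converted to log-odds scores against a
    uniform background. Pseudocounts avoid taking log of zero."""
    L = len(aligned_seqs[0])
    counts = np.full((4, L), pseudocount)
    for seq in aligned_seqs:
        for pos, base in enumerate(seq):
            counts[BASES.index(base), pos] += 1.0
    freqs = counts / counts.sum(axis=0)   # per-position probabilities
    return np.log2(freqs / 0.25)          # log-odds vs uniform background

m = pssm(["ACGT", "ACGA", "ACTT"])
print(m.shape)   # (4, 4): one score per base per position
```

Positive entries mark bases enriched at a position; such a matrix, flattened or windowed, is a typical fixed-length feature for the algorithms discussed above.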

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting a different context. For example, the (i, j) hidden unit in context 1 receives input from the (i−1, j) and (i, j−1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, these have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns: they can capture longer motifs, integrate cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
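The "convolution filter as learned PSSM" intuition can be demonstrated with a hand-crafted filter on a one-hot encoded sequence. The sequence, the 'TATA' motif and the unit filter weights below are illustrative choices of ours; in a trained CNN the filter weights would be learned from data:

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (4, len) one-hot matrix."""
    x = np.zeros((4, len(seq)))
    for i, base in enumerate(seq):
        x[BASES.index(base), i] = 1.0
    return x

def motif_scan(x, motif_filter):
    """First-layer 1D convolution: slide a (4, k) filter along the sequence.

    Each filter acts as a PSSM-like motif detector; the convolution output
    peaks where the motif occurs.
    """
    k = motif_filter.shape[1]
    L = x.shape[1]
    return np.array([np.sum(x[:, i:i + k] * motif_filter)
                     for i in range(L - k + 1)])

seq = "GGCTATAGGC"
filt = one_hot("TATA")              # scores 4.0 on an exact match
scores = motif_scan(one_hot(seq), filt)
print(int(np.argmax(scores)))       # 3: 'TATA' starts at position 3
```

A CNN stacks many such filters, applies a nonlinearity and pooling, and lets deeper layers combine the detected motifs.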

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data: sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq); features from genomic sequence, such as position specific scoring matrix (PSSM), physicochemical properties (steric parameter, volume), Atchley factors (FAC) and 1-dimensional structural properties; contact map (distance of amino acid pairs in 3D structure); microarray gene expression
  Research avenues:
    - Protein structure prediction [84–92]: 1-dimensional structural properties, contact map, structure model quality assessment
    - Gene expression regulation [93–107]: splice junction, genetic variants affecting splicing, sequence specificity
    - Protein classification [108–110]: super family, subcellular localization
    - Anomaly classification [111]: cancer

Biomedical imaging
  Input data: magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues:
    - Anomaly classification [122–132]: gene expression pattern, cancer, Alzheimer's disease, schizophrenia
    - Segmentation [133–141]: cell structure, neuronal structure, vessel map, brain tumor
    - Recognition [142–147]: cell nuclei, finger joint, anatomical structure
    - Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data: ECoG, ECG, EMG, EOG; EEG (raw, wavelet, frequency, differential entropy); extracted features from EEG (normalized decay, peak variation)
  Research avenues:
    - Brain decoding [158–170]: behavior, emotion
    - Anomaly classification [171–178]: Alzheimer's disease, seizure, sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain, with wide application of deep learning to general image-related tasks. Most biomedical images used for clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei in histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition in CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies so far focusing on EEG activity. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze them and are expected to produce promising results. To present some of the studies in brain decoding [168] and anomaly classification [177, 178]: Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAEs have been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flexion and extension classification using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to, using raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than that from other classes [179]. For example, in clinical or disease-related cases, there is inevitably less data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
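The pitfall is easy to reproduce. In the hypothetical example below, with 95 negatives and 5 positives, a classifier that predicts all-negative scores 95% accuracy but zero F-measure, while a classifier that actually finds positives scores lower accuracy yet a far higher F-measure:

```python
import numpy as np

# 95 negatives, 5 positives: a classifier that predicts all-negative ...
y_true = np.array([0] * 95 + [1] * 5)
y_allneg = np.zeros(100, dtype=int)
# ... versus one that finds 4 of 5 positives at the cost of 6 false positives
y_model = y_true.copy()
y_model[0:6] = 1
y_model[95] = 0

def f1(y, p):
    """F-measure: harmonic mean of precision and recall."""
    tp = np.sum((p == 1) & (y == 1))
    fp = np.sum((p == 1) & (y == 0))
    fn = np.sum((p == 0) & (y == 1))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

print((y_allneg == y_true).mean(), f1(y_true, y_allneg))  # 0.95 0.0
print((y_model == y_true).mean(), f1(y_true, y_model))    # 0.93 0.533...
```

The second classifier is clearly more useful despite its lower accuracy, which is exactly what the F-measure (and AUC-PR) captures.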

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
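As a sketch of the simplest sampling strategy, the function below balances a dataset by randomly duplicating minority-class examples until class counts match (SMOTE, by contrast, synthesizes new points by interpolating between minority-class neighbors); the data and class ratio are made up for illustration:

```python
import numpy as np

def oversample(X, y, rng):
    """Naive random oversampling: duplicate minority-class samples
    until every class reaches the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xb, yb = [X], [y]
    for c, n in zip(classes, counts):
        if n < n_max:
            idx = rng.choice(np.where(y == c)[0], size=n_max - n, replace=True)
            Xb.append(X[idx])
            yb.append(y[idx])
    return np.concatenate(Xb), np.concatenate(yb)

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 16))       # toy feature matrix
y = np.array([0] * 90 + [1] * 10)        # 9:1 class imbalance
Xb, yb = oversample(X, y, rng)
print(np.bincount(yb))  # [90 90]
```

Plain duplication risks overfitting to the repeated minority examples, which is why interpolation-based methods such as SMOTE and the spatial deformations mentioned above are often preferred.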

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in an objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance or implicitly modify the learning rates according to data instance classes during training.
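For example, an explicitly cost-sensitive binary cross-entropy can scale each example's loss by a class weight (for instance, the inverse class frequency), so that misclassifying the rare class costs more. A minimal numpy sketch with illustrative weights and probabilities:

```python
import numpy as np

def weighted_cross_entropy(p, y, class_weights):
    """Binary cross-entropy where each example's loss is scaled by its
    class weight, so errors on the rare class are penalized more heavily."""
    w = np.where(y == 1, class_weights[1], class_weights[0])
    eps = 1e-12  # numerical guard for log(0)
    return np.mean(-w * (y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

y = np.array([0, 0, 0, 0, 1])              # imbalanced toy batch
p = np.array([0.1, 0.2, 0.1, 0.1, 0.3])    # the model badly misses the positive
plain = weighted_cross_entropy(p, y, {0: 1.0, 1: 1.0})
costly = weighted_cross_entropy(p, y, {0: 1.0, 1: 4.0})  # inverse-frequency weight
print(plain < costly)  # True: the miss on the rare class now costs more
```

Minimizing the weighted loss pushes the optimizer to correct errors on the minority class first, which is the implicit effect of per-class learning rates as well.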

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and in producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method that exploits unsupervised pre-training with RNN-based AE and achieved a greater than 25% increase in F-measure compared to existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed. Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training combining ideas of undersampling and pre-training.
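The transfer-learning recipe of pre-training then fine-tuning can be sketched as freezing an early feature extractor and training only the final layer on the small target dataset. In the numpy toy below, random "pre-trained" weights stand in for a network trained on a source domain, and the target task is synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# "Pre-trained" feature extractor: in real transfer learning these weights
# come from training on a large source domain; here they are random stand-ins.
W_frozen = rng.standard_normal((8, 16)) * 0.2

def features(x):
    """Frozen encoder layer: not updated during fine-tuning."""
    return np.maximum(W_frozen @ x, 0.0)

# Fine-tune only the output layer (logistic regression) on the small target set.
X = rng.standard_normal((40, 16))
y = (X[:, 0] > 0).astype(float)          # synthetic target labels
w, b = np.zeros(8), 0.0
for _ in range(300):
    grad_w, grad_b = np.zeros(8), 0.0
    for xi, yi in zip(X, y):
        h = features(xi)
        p = 1.0 / (1.0 + np.exp(-(w @ h + b)))
        grad_w += (p - yi) * h
        grad_b += (p - yi)
    w -= 0.05 * grad_w / len(X)          # only w, b are updated
    b -= 0.05 * grad_b / len(X)

preds = [1.0 / (1.0 + np.exp(-(w @ features(xi)) - b)) > 0.5 for xi in X]
acc = np.mean(np.array(preds, dtype=float) == y)
print(acc)
```

With far fewer trainable parameters than a full network, the small target dataset is much less likely to be overfit, which is the practical appeal of fine-tuning.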

Changing the black-box into the white-box

Deep learning in bioinformatics | 11
Downloaded from http://bib.oxfordjournals.org at Cornell University Library on July 29, 2016

A main criticism of deep learning is that it is used as a black-box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class-representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.
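A minimal illustration of backpropagation-to-input: for the toy linear scorer below, the input gradient is just the weight matrix, and gradient ascent in input space, keeping the input a per-position base distribution, drifts toward a class-representative motif. The linear model is an assumption for brevity; the cited work applies the same idea to deep CNNs, where the gradient is obtained by backpropagation.

```python
import numpy as np

# A "trained" linear scorer over one-hot encoded DNA (toy stand-in for a CNN).
rng = np.random.default_rng(5)
L = 10                                  # sequence length
W = rng.standard_normal((4, L)) * 0.5   # per-base, per-position weights

def score(x_soft):                      # x_soft: (4, L) base distribution
    return float(np.sum(W * x_soft))

# Gradient ascent in input space: for a linear model, d(score)/dx = W.
x = np.full((4, L), 0.25)               # start from a uniform "soft" sequence
for _ in range(50):
    x += 0.1 * W                        # ascend the input gradient
    x = np.clip(x, 0.0, None)
    x /= x.sum(axis=0, keepdims=True)   # renormalize each position

motif = x.argmax(axis=0)                # most favoured base per position
print(motif.shape, score(x) > score(np.full((4, L), 0.25)))
```

Visualizing the optimized `x` as a sequence logo yields the kind of class-representative motif the text describes.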

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
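Computing such a mutation map reduces to scoring every possible single-base substitution and recording the change from the wild-type score. A toy numpy version, with a linear scorer standing in for the trained CNN of the cited work:

```python
import numpy as np

rng = np.random.default_rng(6)
L = 8
W = rng.standard_normal((4, L))         # toy "trained" scorer (CNN stand-in)

def score(seq):
    """Score a sequence given as base indices 0..3 (A, C, G, T)."""
    return float(sum(W[b, i] for i, b in enumerate(seq)))

seq = rng.integers(0, 4, size=L)        # wild-type sequence
s0 = score(seq)

# Mutation map: delta score for every (substituted base, position) pair.
mmap = np.zeros((4, L))
for i in range(L):
    for b in range(4):
        mut = seq.copy()
        mut[i] = b
        mmap[b, i] = score(mut) - s0    # 0 where b is the original base

max_drop = mmap.min(axis=0)             # per-position maximum score decrease
print(mmap.shape)  # (4, 8)
```

Plotting `mmap` as a heat map, and scaling each base of the input logo by `-max_drop`, reproduces the two panels of a mutation map described above.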

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial to proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood: for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of

machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
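Of these, random search is the simplest to sketch: sample hyperparameter settings at random (learning rates log-uniformly, as is conventional) and keep the setting with the best validation result. The objective below is a made-up stand-in for an actual train-and-validate run:

```python
import random

def objective(lr, n_hidden):
    """Stand-in validation loss; in practice this would train a model
    with the given hyperparameters and return its validation error."""
    return (lr - 0.01) ** 2 * 1e4 + (n_hidden - 64) ** 2 * 1e-3

random.seed(7)
best = None
for _ in range(50):                       # 50 random trials
    lr = 10 ** random.uniform(-4, -1)     # log-uniform learning rate
    n_hidden = random.choice([16, 32, 64, 128, 256])
    loss = objective(lr, n_hidden)
    if best is None or loss < best[0]:
        best = (loss, lr, n_hidden)
print(best)
```

Bayesian optimization improves on this by fitting a surrogate model to past trials and sampling the next setting where the expected improvement is largest, instead of sampling blindly.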

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but X-ray, CT, MRI and PET forms are also available from a single image.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.
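Architecturally, a common multimodal pattern is one encoder branch per modality whose outputs are fused, for example by concatenation, before a shared prediction layer. A minimal numpy sketch with invented dimensions for an image modality and a signal modality:

```python
import numpy as np

rng = np.random.default_rng(8)

def branch(x, W):
    """One modality-specific encoder (a single ReLU layer in this toy)."""
    return np.maximum(W @ x, 0.0)

# Separate encoders per modality; their outputs are concatenated and a
# shared layer maps the joint representation to a prediction.
W_img = rng.standard_normal((16, 256)) * 0.1   # e.g. a flattened MRI patch
W_sig = rng.standard_normal((16, 64)) * 0.1    # e.g. an EEG feature vector
W_fuse = rng.standard_normal((1, 32)) * 0.1    # shared fusion layer

img = rng.standard_normal(256)
sig = rng.standard_normal(64)
joint = np.concatenate([branch(img, W_img), branch(sig, W_sig)])
pred = 1.0 / (1.0 + np.exp(-(W_fuse @ joint)))  # e.g. disease probability
print(joint.shape, pred.shape)  # (32,) (1,)
```

Training end-to-end lets each branch learn modality-specific features while the fusion layer learns how the modalities complement each other.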

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performances can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and have enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use a graphics processing unit, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization to provide higher performance, we expect additional studies in this area, including those involving adversarial generative networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions, rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html 2016.

4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health 2016.

6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose


estimation. In Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. Adv Neural Inf Process Syst 2015;577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In International Joint Conference on Neural Networks, 1989 (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based Deep Learning library. Code: https://github.com/fchollet. Documentation: http://keras.io 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to protein structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – dag-rnns and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PloS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal component analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PloS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131 Cruz-Roa AA Ovalle JEA Madabhushi A et al A deep learn-ing architecture for image representation visual interpret-ability and automated basal-cell carcinoma cancerdetection In Medical Image Computing and Computer-AssistedIntervention ndash MICCAI 2013 Springer Heidelberg 2013403ndash10

132 Bar Y Diamant I Wolf L et al Deep learning with non-medical training used for chest pathology identification InSPIE Medical Imaging International Society for Optics andPhotonics 2015 94140V-V-7

133 Li Q Feng B Xie L et al A cross-modality learning approachfor vessel segmentation in retinal images IEEE Trans MedImaging 201535(1)109ndash8

134 Ning F Delhomme D LeCun Y et al Toward automatic phe-notyping of developing embryos from videos IEEE TransImage Process 200514(9)1360ndash71

135 Turaga SC Murray JF Jain V et al Convolutional networkscan learn to generate affinity graphs for image segmenta-tion Neural Comput 201022(2)511ndash38

136 Helmstaedter M Briggman KL Turaga SC et alConnectomic reconstruction of the inner plexiform layer inthe mouse retina Nature 2013500(7461)168ndash74

137 Ciresan D Giusti A Gambardella LM et al Deep neural net-works segment neuronal membranes in electron micros-copy images In Advances in Neural Information ProcessingSystems 2012 2843ndash51


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.
140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.
142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.
143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.
144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.
145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: 2015 Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.
146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.
147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.
148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.
149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.
150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.
151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.
152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.
153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.
154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.
155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.
156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.
157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.
158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.
161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.
162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.
163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.
164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.
165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.
166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.
167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.
168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.
169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.
170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.
171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.
172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.
173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.
174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.
182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.
187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.
188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.
191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.
192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.
193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.
194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.
210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.
212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.
213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.
215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.
217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.
218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.
220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.
221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.
223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.


[108–110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.

Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle, and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer the expression of up to 21,000 target genes from only 1,000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used

Figure 10. Basic structure of MD-RNNs for two-dimensional data [39]. There are four groups of two-dimensional hidden units, each reflecting different contexts. For example, the (i, j) hidden unit in context 1 receives input from the (i-1, j) and (i, j-1) hidden units in context 1 and the (i, j) unit from the input layer, so that the upper-left information is reflected. The hidden units from all four contexts are propagated to compute the (i, j) unit in the output layer.

Figure 11. Basic structure of CAEs, consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.


principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer, and ovarian cancer.
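The PCA step in such a pipeline can be illustrated in a few lines. The following is a minimal sketch via SVD on a small random matrix standing in for expression data (the matrix, seed, and component count are hypothetical, not values from [111]):

```python
import numpy as np

# Toy stand-in for a samples x genes expression matrix (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))

# Center each gene (column) and take the SVD; the right singular vectors
# are the principal axes, and projecting onto the first k of them gives
# the reduced representation that would be fed to a downstream model.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                              # samples in top-2 PC space

# Fraction of total variance retained by the k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(Z.shape, float(explained))
```

In practice the number of retained components is chosen from this explained-variance curve rather than fixed in advance.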

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns; they can capture longer motifs, integrate cumulative effects of observed motifs, and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.
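The "convolution filter as motif detector" idea can be made concrete with a short sketch. Here one filter is hand-set to prefer the arbitrary motif TGCA; in the models above, such filter weights are learned from data rather than specified:

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq):
    # Standard input encoding for sequence CNNs: rows A, C, G, T.
    x = np.zeros((4, len(seq)))
    for j, base in enumerate(seq):
        x[BASES[base], j] = 1.0
    return x

# A width-4 filter acting like a PSSM: +1 for the preferred base at each
# position, -1 otherwise. Hand-set to "TGCA" purely for illustration.
motif = np.full((4, 4), -1.0)
for j, base in enumerate("TGCA"):
    motif[BASES[base], j] = 1.0

# Sliding the filter along the one-hot sequence gives a per-position
# motif score -- a 1D convolution with a single filter.
x = one_hot("ACGTGCATTT")
scores = np.array([np.sum(motif * x[:, i:i + 4])
                   for i in range(x.shape[1] - 3)])
best = int(scores.argmax())   # position of the strongest motif match
print(best, scores[best])     # the motif TGCA starts at position 3
```

A trained network stacks many such filters and learns their weights jointly with the layers above them.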

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data: sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq); features from genomic sequences: position specific scoring matrix (PSSM), physicochemical properties (steric parameter, volume), Atchley factors (FAC), 1-dimensional structural properties, contact map (distance of amino acid pairs in 3D structure); microarray gene expression
  Research avenues: protein structure prediction [84–92] (1-dimensional structural properties, contact map, structure model quality assessment); gene expression regulation [93–107] (splice junction, genetic variants affecting splicing, sequence specificity); protein classification [108–110] (super family, subcellular localization); anomaly classification [111] (cancer)

Biomedical imaging
  Input data: magnetic resonance image (MRI); radiographic image; positron emission tomography (PET); histopathology image; volumetric electron microscopy image; retinal image; in situ hybridization (ISH) image
  Research avenues: anomaly classification [122–132] (gene expression pattern, cancer, Alzheimer's disease, schizophrenia); segmentation [133–141] (cell structure, neuronal structure, vessel map, brain tumor); recognition [142–147] (cell nuclei, finger joint, anatomical structure); brain decoding [149, 150] (behavior)

Biomedical signal processing
  Input data: ECoG, ECG, EMG, EOG; EEG (raw, wavelet, frequency, differential entropy); extracted features from EEG (normalized decay, peak variation)
  Research avenues: brain decoding [158–170] (behavior, emotion); anomaly classification [171–178] (Alzheimer's disease, seizure, sleep stage)


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107], and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.
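The appeal of LSTM units for variable-length sequences is that any sequence is folded into a fixed-size vector. The following minimal forward-pass sketch uses randomly initialized, untrained weights and is not any specific published model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single LSTM cell run over a sequence of one-hot nucleotides.
rng = np.random.default_rng(1)
d_in, d_h = 4, 8                                 # input and hidden sizes
W = rng.normal(0, 0.1, (4 * d_h, d_in + d_h))    # gates i, f, o and candidate g, stacked
b = np.zeros(4 * d_h)

def lstm_encode(x_seq):
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for x in x_seq:                              # one step per base
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[:d_h])                     # input gate
        f = sigmoid(z[d_h:2 * d_h])              # forget gate
        o = sigmoid(z[2 * d_h:3 * d_h])          # output gate
        g = np.tanh(z[3 * d_h:])                 # candidate update
        c = f * c + i * g                        # memory cell
        h = o * np.tanh(c)                       # hidden state
    return h                                     # fixed-size summary

# Sequences of different lengths map to same-sized representations.
seq1 = np.eye(4)[[0, 1, 2, 3, 2, 1]]             # length 6
seq2 = np.eye(4)[[3, 0, 1]]                      # length 3
h1, h2 = lstm_encode(seq1), lstm_encode(seq2)
print(h1.shape, h2.shape)
```

A classifier (e.g. for splice junctions or microRNA targets) would then operate on this final hidden state, with all weights trained end to end.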

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability, and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles, and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain, with wide application of deep learning to general image-related tasks. Most biomedical images used for the clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120], and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143], and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei from histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes, and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus, and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition among CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders, and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155], and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.
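The decomposition into frequency components can be illustrated with a simple band-power feature extractor on a simulated one-channel epoch. The sampling rate, band edges, and signal composition here are assumptions for the sketch, not values from any cited study:

```python
import numpy as np

# Simulate 1 s of a single EEG channel: a strong 10 Hz (alpha) component,
# a weaker 20 Hz (beta) component, and additive noise.
fs = 256                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
x = (2.0 * np.sin(2 * np.pi * 10 * t)
     + 0.5 * np.sin(2 * np.pi * 20 * t)
     + 0.1 * rng.normal(size=fs))

# Power spectrum via the real FFT; with a 1 s window, bins are 1 Hz apart.
spectrum = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(fs, d=1 / fs)

def band_power(lo, hi):
    mask = (freqs >= lo) & (freqs < hi)
    return float(spectrum[mask].sum())

# Classic EEG bands; these summed powers would form the input feature vector.
features = {name: band_power(lo, hi)
            for name, (lo, hi) in
            {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}.items()}
dominant = max(features, key=features.get)
print(dominant)                            # the alpha band dominates here
```

Wavelet decompositions play the same role when time-localized frequency content matters, but the hand-off to the network is identical: a fixed-length feature vector per epoch.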

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat into either a normal or abnormal beat. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like the ones used in biomedical imaging, were used to predict seizures.
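A one-dimensional convolutional layer of the kind these studies apply to raw signals can be sketched in a few lines; the filter values below are illustrative stand-ins, not learned weights, and `conv1d_relu` is a hypothetical helper.

```python
import numpy as np

def conv1d_relu(signal, filters, stride=1):
    """One 1-D convolutional layer with ReLU: each filter slides along
    the raw signal and produces one feature channel."""
    k = filters.shape[1]
    n_out = (len(signal) - k) // stride + 1
    windows = np.stack([signal[i * stride:i * stride + k]
                        for i in range(n_out)])        # (n_out, k)
    return np.maximum(windows @ filters.T, 0.0)        # (n_out, n_filters)

signal = np.sin(np.linspace(0, 8 * np.pi, 128))  # stand-in for a raw EEG trace
filters = np.array([[1.0, -1.0, 0.0],   # edge-like detector
                    [0.3, 0.4, 0.3]])   # smoothing filter
feature_map = conv1d_relu(signal, filters, stride=2)
```

In a trained CNN, several such layers would be stacked and the filters learned by backpropagation; the two-dimensional variant of Mirowski et al. simply slides filters over both axes of the pixel-coded feature image.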

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture for analyzing them and are expected to produce promising results. Among the studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.
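The recurrent computation underlying these studies reduces to carrying a hidden state along the time axis. Below is a minimal vanilla-RNN forward pass over a signal, with randomly initialized weights standing in for trained ones; the LSTM variants cited add gating on top of this same pattern.

```python
import numpy as np

def rnn_forward(x, W_in, W_rec, b):
    """Run a vanilla RNN over a sequence x of shape (T, n_in) and
    return the hidden state at every time step."""
    hidden = np.zeros(W_rec.shape[0])
    states = []
    for x_t in x:  # the same weights are reused at every time step
        hidden = np.tanh(W_in @ x_t + W_rec @ hidden + b)
        states.append(hidden)
    return np.stack(states)  # (T, n_hidden)

rng = np.random.default_rng(1)
T, n_in, n_hidden = 50, 1, 8
x = rng.standard_normal((T, n_in))            # stand-in for an EEG trace
W_in = rng.standard_normal((n_hidden, n_in)) * 0.5
W_rec = rng.standard_normal((n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)
states = rnn_forward(x, W_in, W_rec, b)
# A classifier head (e.g. for seizure vs. non-seizure) would typically
# read the final hidden state, states[-1].
```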

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, where the number of instances from one class is significantly higher than that from other classes [179]. For example, in clinical or disease-related cases, there is inevitably less data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
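The gap between accuracy and the F-measure is easy to demonstrate on a skewed label distribution; the toy predictions below are fabricated purely for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and their harmonic mean (the F-measure) for a
    binary task with 1 as the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 95 controls and 5 cases, and a degenerate classifier that always
# predicts the majority (control) class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
# Accuracy is a flattering 0.95 even though every case is missed,
# while the F-measure is 0.
```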

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in the objective loss function of neural networks, either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance, or implicitly modify the learning rates according to data instance classes during training.
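An explicit form of cost sensitivity can be sketched as a class-weighted cross-entropy. The inverse-class-frequency weights used here are one common heuristic among several, and the probabilities are fabricated for illustration.

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, class_weights):
    """Binary cross-entropy where each example's loss is scaled by a
    per-class cost, so mistakes on the rare class count for more."""
    eps = 1e-12
    w = np.where(y_true == 1, class_weights[1], class_weights[0])
    losses = -(y_true * np.log(p_pred + eps)
               + (1 - y_true) * np.log(1 - p_pred + eps))
    return float(np.mean(w * losses))

y_true = np.array([0] * 9 + [1])           # 9:1 class imbalance
p_pred = np.full(10, 0.1)                  # model leans toward the majority
weights = {0: 10 / 9, 1: 10 / 1}           # inverse class frequencies

plain = weighted_cross_entropy(y_true, p_pred, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(y_true, p_pred, weights)
# The weighted loss penalizes the confident miss on the minority
# example far more than the unweighted loss does.
```

The implicit alternative mentioned in the text would leave the loss unchanged and instead scale each example's learning rate by the same class weight during training.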

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and in producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method which exploits unsupervised pre-training with RNN-based AE and achieved a >25% increase in F-measure compared to the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
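The mutation-map computation itself is straightforward to sketch. Here a simple position weight matrix stands in for the trained CNN binding-score predictor, so `score` and `mutation_map` are hypothetical helpers rather than DeepBind's actual code.

```python
# Sketch of a mutation map under a toy scoring model.
BASES = "ACGT"

def score(seq, pwm):
    """Binding score of a sequence under a position weight matrix."""
    return sum(pwm[i][BASES.index(b)] for i, b in enumerate(seq))

def mutation_map(seq, pwm):
    """For every position and every alternative base, record how much
    that single-nucleotide mutation changes the predicted score."""
    wild = score(seq, pwm)
    deltas = {}
    for i in range(len(seq)):
        for b in BASES:
            if b == seq[i]:
                continue
            mutant = seq[:i] + b + seq[i + 1:]
            deltas[(i, b)] = score(mutant, pwm) - wild
    return deltas

# Toy 4-position PWM (rows: positions; columns: A, C, G, T).
pwm = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [1, 1, 1, 1]]
deltas = mutation_map("ACGT", pwm)
# The heat map would display `deltas`; the logo scales each base by the
# largest score decrease among its three possible mutations.
```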

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial to proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automated machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
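Of the cited approaches, random search [199] is the simplest to illustrate. In this sketch, `validation_error` is a hypothetical stand-in for training a model with the sampled settings and returning its validation loss.

```python
import random

def validation_error(learning_rate, n_hidden):
    """Placeholder objective: a real run would train the network with
    these hyperparameters and return its validation loss."""
    return (learning_rate - 0.01) ** 2 + (n_hidden - 64) ** 2 / 1e4

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample the learning rate on a log scale and the number of
        # hidden units uniformly over a plausible range.
        lr = 10 ** rng.uniform(-4, -1)
        hidden = rng.randrange(16, 257)
        err = validation_error(lr, hidden)
        if best is None or err < best[0]:
            best = (err, lr, hidden)
    return best

best_err, best_lr, best_hidden = random_search(200)
```

Sequential model-based and Bayesian methods replace the blind sampling loop with a surrogate model that steers new trials toward promising regions.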

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but X-ray, CT, MRI and PET forms are also available for imaging alone.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the form of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use a graphics processing unit, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although the development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
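To make the optimizer contrast concrete, the following is a minimal sketch of the Adam update rule [49] applied to a toy quadratic objective; it is an illustration of the algorithm, not the implementation used in any of the cited frameworks.

```python
import numpy as np

def adam_minimize(grad, w, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient using Adam: exponentially
    decayed first and second moment estimates of the gradient, with
    bias correction, set the per-parameter step size."""
    m = np.zeros_like(w)   # first moment (mean of gradients)
    v = np.zeros_like(w)   # second moment (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = ||w - 3||^2 with gradient 2(w - 3).
w0 = np.zeros(4)
w_opt = adam_minimize(lambda w: 2 * (w - 3.0), w0)
```

Plain SGD would take the raw step `w - lr * g` instead; Adam's moment estimates adapt the effective step size per parameter, which is what shortens training in practice.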

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well: neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions, rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for the application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.
11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst 2015;577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.
20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.
26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.
28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.
34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.
39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.
40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.
42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.
43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.
47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.
50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.
53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.
54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.
61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.
62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.
63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.
64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: first release, 2015.
65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.
66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.
67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.
68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.
75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.
76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.
78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
82. Branden CI. Introduction to protein structure. Garland Science, New York, 1999.
83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.
87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.
88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.
89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.
90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.
91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.
92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.
94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.
95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.
96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. Deeporgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, Garcia S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: spark-based deep learning supporting asynchronous updates and caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in Silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. Cnp: an FPGA-based processor for convolutional networks. In: 2009 International Conference on Field Programmable Logic and Applications (FPL), 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic Chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.



principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.
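The preprocessing step described above, PCA to compress high-dimensional expression profiles before a downstream classifier, can be sketched generically. The power-iteration implementation and the tiny synthetic "expression" matrix below are illustrative assumptions, not the pipeline of the cited study.

```python
import math

def first_pc(data, iters=200):
    """First principal component of row-sample data via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):                       # repeated multiplication by the
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))  # covariance matrix converges to
        v = [x / norm for x in w]                # the dominant eigenvector
    return means, v

def project(data, means, v):
    """Reduce each sample to its coordinate along the first component."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in data]

# Toy matrix: 6 samples x 3 "genes", where gene 0 carries nearly all the variance.
data = [[10.0, 1.0, 1.1], [12.0, 1.1, 0.9], [14.0, 0.9, 1.0],
        [16.0, 1.0, 1.0], [18.0, 1.1, 1.1], [20.0, 0.9, 0.9]]
means, v = first_pc(data)
reduced = project(data, means, v)
print(len(reduced))  # each 3-dim sample is compressed to a single coordinate
```

In practice several components are kept, but the principle is the same: the classifier sees a handful of variance-ordered coordinates instead of thousands of raw probe intensities.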

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns; it can capture longer motifs, integrate cumulative effects of observed motifs and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks.
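The motif-detector view of a first convolution layer can be made concrete: a filter sliding over one-hot-encoded DNA computes exactly the score a PSSM would, except that in a CNN the filter weights are learned. The sequence and hand-set filter values below are illustrative assumptions, not taken from any cited model.

```python
# A convolutional filter scanning one-hot DNA is equivalent to scoring the
# sequence with a PSSM; in a trained CNN the weights are learned, not fixed.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dim one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_scan(seq, filt):
    """Slide a (length x 4) filter along the sequence; one score per window."""
    x = one_hot(seq)
    k = len(filt)
    return [sum(filt[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            for i in range(len(x) - k + 1)]

# Hypothetical filter rewarding the motif "TATA" (rows: positions, cols: ACGT).
tata_filter = [
    [-1.0, -1.0, -1.0,  1.0],  # T
    [ 1.0, -1.0, -1.0, -1.0],  # A
    [-1.0, -1.0, -1.0,  1.0],  # T
    [ 1.0, -1.0, -1.0, -1.0],  # A
]

scores = conv1d_scan("GCGTATAGCG", tata_filter)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])  # the window starting at index 3 ("TATA") scores highest
```

Stacking further convolution and pooling layers on top of such filters is what lets deeper networks combine several detected motifs into longer regulatory patterns.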

For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix, with the rows as transcription factor activity profiles for each gene, and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164-cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor-binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.

Table 4. Deep learning applied bioinformatics research avenues and input data

Omics
  Input data:
    Sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq)
    Features from genomic sequence:
      Position specific scoring matrix (PSSM)
      Physicochemical properties (steric parameter, volume)
      Atchley factors (FAC)
      1-Dimensional structural properties
    Contact map (distance of amino acid pairs in 3D structure)
    Microarray gene expression
  Research avenues:
    Protein structure prediction [84–92]: 1-dimensional structural properties; contact map; structure model quality assessment
    Gene expression regulation [93–107]: splice junction; genetic variants affecting splicing; sequence specificity
    Protein classification [108–110]: super family; subcellular localization
    Anomaly classification [111]: cancer

Biomedical imaging
  Input data:
    Magnetic resonance image (MRI)
    Radiographic image
    Positron emission tomography (PET)
    Histopathology image
    Volumetric electron microscopy image
    Retinal image
    In situ hybridization (ISH) image
  Research avenues:
    Anomaly classification [122–132]: gene expression pattern; cancer; Alzheimer's disease; schizophrenia
    Segmentation [133–141]: cell structure; neuronal structure; vessel map; brain tumor
    Recognition [142–147]: cell nuclei; finger joint; anatomical structure
    Brain decoding [149, 150]: behavior

Biomedical signal processing
  Input data:
    ECoG, ECG, EMG, EOG
    EEG (raw, wavelet, frequency, differential entropy)
    Extracted features from EEG: normalized decay; peak variation
  Research avenues:
    Brain decoding [158–170]: behavior; emotion
    Anomaly classification [171–178]: Alzheimer's disease; seizure; sleep stage


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.
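The reason RNNs suit variable-length sequences is structural: the same cell weights are reused at every position, so sequences of any length map to a fixed-size representation. The tiny Elman-style cell below, with arbitrary hand-set weights rather than trained ones, is only a sketch of that mechanism (the cited studies use LSTM or bidirectional variants).

```python
import math

BASES = "ACGT"

def step(h, x, W_h, W_x):
    """One Elman RNN step: h' = tanh(W_h*h + W_x*x), with a 2-dim hidden state."""
    return [math.tanh(sum(W_h[i][j] * h[j] for j in range(2)) +
                      sum(W_x[i][c] * x[c] for c in range(4)))
            for i in range(2)]

def encode(seq, W_h, W_x):
    """Run the same cell over a sequence of any length; return final hidden state."""
    h = [0.0, 0.0]
    for b in seq:
        x = [1.0 if b == base else 0.0 for base in BASES]
        h = step(h, x, W_h, W_x)
    return h

# Arbitrary illustrative weights; in practice they are learned by backpropagation.
W_h = [[0.5, -0.3], [0.2, 0.4]]
W_x = [[0.9, -0.9, 0.1, -0.1], [0.1, 0.1, -0.8, 0.8]]

short = encode("ACG", W_h, W_x)        # length-3 sequence
long_ = encode("ACGTACGTA", W_h, W_x)  # length-9 sequence
print(len(short), len(long_))  # both yield the same fixed-size representation
```

An LSTM replaces the single tanh update with gated memory cells, which is what lets the cited models retain information across much longer sequences.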

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain with a wide application of deep learning in general image-related tasks. Most biomedical images used for clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas, including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei from histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since these avenues are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition among CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.
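The frequency-decomposition preprocessing described above can be sketched as a discrete Fourier transform followed by band-power summation. The synthetic 10 Hz "signal", sampling rate and band boundaries below are assumptions chosen only to mimic an alpha-band EEG component; real pipelines would also window and average over epochs.

```python
import math

def band_power(signal, fs, bands):
    """Total spectral power of `signal` within each (lo, hi) frequency band in Hz."""
    n = len(signal)
    powers = {}
    for name, (lo, hi) in bands.items():
        total = 0.0
        for k in range(n // 2):              # one-sided DFT spectrum
            freq = k * fs / n
            if lo <= freq < hi:
                re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
                im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
                total += (re * re + im * im) / n
        powers[name] = total
    return powers

fs = 128                                      # assumed sampling rate (Hz)
t = [i / fs for i in range(fs)]               # one second of samples
signal = [math.sin(2 * math.pi * 10 * ti) for ti in t]  # synthetic 10 Hz oscillation

# Conventional EEG bands; a deep network would take these powers as input features.
bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
powers = band_power(signal, fs, bands)
dominant = max(powers, key=powers.get)
print(dominant)  # the 10 Hz component falls in the alpha band
```

A vector of such band powers per channel is far lower-dimensional and less noisy than the raw samples, which is why it is a common input representation in the studies below.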

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze them and are expected to produce promising results. Among the studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectral features to detect lapses.
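Why RNNs suit sequential signals can be seen from a single recurrent unit: its hidden state carries information from earlier samples, so the output depends on the whole history rather than only the current sample. The one-unit network and hand-picked weights below are a minimal sketch, not any of the cited models.

```python
import math

def rnn_forward(xs, w_in, w_rec, w_out):
    """Unroll a one-unit recurrent network over a sequence and read out a score."""
    h = 0.0
    for x in xs:
        # The new state depends on the current input AND the previous state,
        # so information from earlier samples persists.
        h = math.tanh(w_in * x + w_rec * h)
    return w_out * h

# The same final sample yields different outputs depending on history:
a = rnn_forward([0.0, 0.0, 1.0], w_in=1.0, w_rec=0.5, w_out=2.0)
b = rnn_forward([1.0, 1.0, 1.0], w_in=1.0, w_rec=0.5, w_out=2.0)
```

LSTM units replace the plain `tanh` state update with gated memory cells, which is what lets them retain information over much longer signal stretches.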

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flexion and extension classification using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, in which instances from one class greatly outnumber instances from the other classes [179]. For example, in clinical or disease-related cases, there is inevitably less data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives when classes are negatively skewed [182].
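A minimal example of why accuracy misleads on imbalanced data while the F-measure does not: with one positive among ten examples, the trivial "always negative" classifier reaches 90% accuracy yet an F-measure of zero.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean (the F-measure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One positive in ten: predicting "negative" everywhere looks good on accuracy.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
_, _, f1 = precision_recall_f1(y_true, y_pred)  # 0.0
```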

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
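Random oversampling, the simplest relative of the sampling methods cited above, can be sketched as follows; this toy stand-in duplicates minority examples rather than synthesizing new ones as SMOTE or cluster-based sampling do.

```python
import random

def oversample_minority(data, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until classes balance."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(data, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out

# Five "control" examples and one "disease" example become five of each.
balanced = oversample_minority([1, 2, 3, 4, 5, 6], [0, 0, 0, 0, 0, 1])
counts = {y: sum(1 for _, lab in balanced if lab == y) for y in (0, 1)}
```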

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in an objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance, or implicitly modify the learning rates according to data instance classes during training.
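The explicit variant, replacing the loss so that misclassifying the rare class costs more, can be written as a class-weighted binary cross-entropy; the weight value and toy predictions below are illustrative.

```python
import math

def weighted_cross_entropy(y_true, p_pred, pos_weight):
    """Binary cross-entropy in which errors on the positive class cost
    pos_weight times more: the explicit form of cost-sensitive learning."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -pos_weight * math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(y_true)

# The same mediocre predictions are penalized more once the rare
# positive class is up-weighted.
y = [1, 0, 0, 0]
p = [0.3, 0.1, 0.1, 0.1]
plain = weighted_cross_entropy(y, p, pos_weight=1.0)
weighted = weighted_cross_entropy(y, p, pos_weight=5.0)
```

With a gradient-based optimizer, the larger loss on positives translates directly into larger weight updates for minority-class errors.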

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and in producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method which exploits unsupervised pre-training with RNN-based AE and achieved a >25% increase in F-measure compared to the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been employed: Lee et al. [94] suggested a DBN with a boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training, combining the ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we


know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class-representative image rather than one dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of a DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
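The heat-map half of a mutation map reduces to a simple loop: score every single-base substitution and record the change relative to the reference sequence. Here a toy motif counter stands in for a trained CNN scorer, so the numbers are purely illustrative.

```python
def mutation_map(seq, score):
    """For every position and alternative base, record how a single substitution
    changes the model's score: the data behind a mutation-map heat map.
    `score` stands in for a trained network's prediction function."""
    base = score(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                mutant = seq[:i] + alt + seq[i + 1:]
                effects[(i, ref, alt)] = score(mutant) - base
    return effects

def toy_score(seq):
    """Hypothetical scorer: counts occurrences of the motif 'TATA'."""
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA")

effects = mutation_map("GTATAG", toy_score)
# Disrupting the motif lowers the score: effects[(1, "T", "C")] == -1
```

Plotting `effects` as positions versus alternative bases yields the heat map, and taking the maximum decrease per position gives the scaled sequence logo described above.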

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper applications of deep learning. To obtain robust and reliable results, researchers must be aware of the capabilities of each architecture and select one according to those capabilities, the characteristics of the input data and the research objectives. However, to date, the advantages of each architecture are only roughly understood: for example, DNNs are suitable for analyzing internal correlations in high-dimensional data, CNNs for spatial information and RNNs for sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge for future study.

Even once a deep learning architecture is selected, there are many hyperparameters for researchers to set (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate), all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left to human machine learning experts. Nevertheless, automation of

machine learning research, which aims to optimize hyperparameters automatically, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
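A random search in the spirit of [199] fits in a dozen lines; the search space and the stand-in validation score below are hypothetical, since in practice each evaluation would involve training and validating a network.

```python
import random

def random_search(space, evaluate, n_trials=50, seed=0):
    """Sample hyperparameter configurations independently and keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        s = evaluate(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Hypothetical search space and a stand-in for a validation score.
space = {"layers": [2, 3, 4], "lr": [1e-1, 1e-2, 1e-3], "hidden": [32, 64, 128]}
def fake_validation_score(cfg):  # peaks at 3 layers with lr = 1e-2 in this toy
    return -abs(cfg["layers"] - 3) - 10 * abs(cfg["lr"] - 1e-2)

best, score = random_search(space, fake_validation_score)
```

Bayesian optimization improves on this by using past evaluations to decide which configuration to try next, rather than sampling independently.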

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but multiple forms of a single image, such as X-ray, CT, MRI and PET, are available as well.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performance can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity of accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results on tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial


examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving adversarial generative networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive survey of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing work, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose


estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.
12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Advances in Neural Information Processing Systems, 2015, 577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.
26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, 1096–103. ACM, New York.
28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, 1097–105.
34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, 512–20.
39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, 545–52.
40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.
42. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.
43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks (IJCNN), 1989, 593–605. IEEE, Washington, DC.
47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.
53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.
61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, 3. Austin, TX.
62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590, 2012.
63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.
64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.
65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv preprint arXiv:1506.00619, 2015.
66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, 807–14.
68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, 2204–12.
78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA- and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.
83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.
87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, 2071–8. IEEE, New York.
88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.
89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, 25–36.
90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv preprint arXiv:1412.7828, 2014.
91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.
92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.
94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, 2483–92.
95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.
96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):831–8.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkiewicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. López V, Fernández A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newsl 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009: IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal, 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senécal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009: International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.


Recurrent neural networks

RNNs are expected to be an appropriate deep learning architecture because biological sequences have variable lengths and their sequential information has great importance. Several studies have applied RNNs to protein structure prediction [88–90], gene expression regulation [105–107] and protein classification [109, 110]. In early studies, Baldi et al. [88] used BRNNs with perceptron hidden units in protein secondary structure prediction. Thereafter, the improved performance of LSTM hidden units became widely recognized, so Sønderby et al. [110] applied BRNNs with LSTM hidden units and a one-dimensional convolution layer to learn representations from amino acid sequences and classify the subcellular locations of proteins. Furthermore, Park et al. [105] and Lee et al. [107] exploited RNNs with LSTM hidden units in microRNA identification and target prediction and obtained significantly improved accuracy relative to state-of-the-art approaches, demonstrating the high capacity of RNNs to analyze biological sequences.

Emergent architectures

Emergent architectures have been used in protein structure prediction research [91, 92], specifically in contact map prediction. Di Lena et al. [91] applied DST-NNs using spatial features including protein secondary structure, orientation probability and alignment probability. Additionally, Baldi et al. [92] applied MD-RNNs to amino acid sequences, correlated profiles and protein secondary structures.

Biomedical imaging

Biomedical imaging [115] is another actively researched domain with a wide application of deep learning from general image-related tasks. Most biomedical images used for clinical treatment of patients, including magnetic resonance imaging (MRI) [116, 117], radiographic imaging [118, 119], positron emission tomography (PET) [120] and histopathology imaging [121], have been used as input data for deep learning algorithms. We categorized the research avenues in biomedical imaging into four groups (Table 4). One of the most researched problems is anomaly classification [122–132] to diagnose diseases such as cancer or schizophrenia. As in general image-related tasks, segmentation [133–141] (i.e. partitioning specific structures, such as cellular structures or a brain tumor) and recognition [142–147] (i.e. detection of cell nuclei or a finger joint) are studied frequently in biomedical imaging. Studies of popular high content screening [148], which involves quantifying microscopic images for cell biology, are covered in the former groups [128, 134, 137]. Additionally, cranial MRIs have been used in brain decoding [149, 150] to interpret human behavior or emotion.

Deep neural networks

In terms of biomedical imaging, DNNs have been applied in several research areas including anomaly classification [122–124], segmentation [133], recognition [142, 143] and brain decoding [149, 150]. Plis et al. [122] classified schizophrenia patients from brain MRIs using DBN, and Xu et al. [142] used SAE to detect cell nuclei from histopathology images. Interestingly, similar to handwritten digit image recognition, Van Gerven et al. [149] classified handwritten digit images with DBN, not by analyzing the images themselves but by indirectly analyzing functional MRIs of participants who were looking at the digit images.

Convolutional neural networks

The largest number of studies has been conducted in biomedical imaging, since its tasks are similar to general image-related tasks. In anomaly classification [125–132], Roth et al. [125] applied CNNs to three different CT image datasets to classify sclerotic metastases, lymph nodes and colonic polyps. Additionally, Ciresan et al. [128] used CNNs to detect mitosis in breast cancer histopathology images, a crucial approach for cancer diagnosis and assessment. PET images of esophageal cancer were used by Ypsilantis et al. [129] to predict responses to neoadjuvant chemotherapy. Other applications of CNNs can be found in segmentation [134–140] and recognition [144–147]. For example, Ning et al. [134] studied pixel-wise segmentation patterns of the cell wall, cytoplasm, nuclear membrane, nucleus and outside media using microscopic images, and Havaei et al. [139] proposed a cascaded CNN architecture exploiting both local and global contextual features and performed brain tumor segmentation from MRIs. For recognition, Cho et al. [144] researched anatomical structure recognition among CT images, and Lee et al. [145] proposed a CNN-based finger joint detection system, FingerNet, which is a crucial step for medical examinations of bone age, growth disorders and rheumatoid arthritis [151].

Recurrent neural networks

Traditionally, images are considered data that involve internal correlations or spatial information rather than sequential information. Treating biomedical images as non-sequential data, most studies in biomedical imaging have chosen approaches involving DNNs or CNNs instead of RNNs.

Emergent architectures

Attempts to apply the unique capabilities of RNNs to image data using augmented RNN structures have continued. MD-RNNs [39] have been applied beyond two-dimensional images to three-dimensional images. For example, Stollenga et al. [141] applied MD-RNNs to three-dimensional electron microscopy images and MRIs to segment neuronal structures.

Biomedical signal processing

Biomedical signal processing [115] is a domain where researchers use recorded electrical activity from the human body to solve problems in bioinformatics. Various data from EEG [152], electrocorticography (ECoG) [153], electrocardiography (ECG) [154], electromyography (EMG) [155] and electrooculography (EOG) [156, 157] have been used, with most studies focusing on EEG activity so far. Because recorded signals are usually noisy and include many artifacts, raw signals are often decomposed into wavelet or frequency components before they are used as input in deep learning algorithms. In addition, human-designed features like normalized decay and peak variation are used in some studies to improve the results. We categorized the research avenues in biomedical signal processing into two groups (Table 4): brain decoding [158–170] using EEG signals and anomaly classification [171–178] to diagnose diseases.
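As a minimal illustration of such a decomposition (a sketch, not taken from any of the cited studies; the sampling rate and the conventional delta/theta/alpha/beta band edges are assumptions), per-band spectral power can be extracted from a raw one-dimensional signal with a discrete Fourier transform:

```python
import numpy as np

def band_power_features(signal, fs, bands=((1, 4), (4, 8), (8, 13), (13, 30))):
    """Decompose a raw 1-D signal into per-band spectral power, the kind of
    frequency-component input often fed to deep models instead of raw EEG."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])

fs = 128                               # sampling rate in Hz (assumed)
t = np.arange(0, 2, 1.0 / fs)          # two seconds of signal
signal = np.sin(2 * np.pi * 10 * t)    # a pure 10 Hz (alpha-band) oscillation

features = band_power_features(signal, fs)
# nearly all power falls in the 8-13 Hz band, i.e. features.argmax() == 2
```

The resulting fixed-length feature vector can then serve as input to a DBN or SAE in place of the raw waveform.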

Deep neural networks

Since biomedical signals usually contain noise and artifacts, decomposed features are more frequently used than raw signals. In brain decoding [158–163], An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either normal or abnormal. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.
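The one-dimensional convolution underlying such models can be sketched as follows (a toy example, not any cited architecture; as is standard in deep learning, "convolution" is implemented as cross-correlation, and the kernel here is a hypothetical hand-set edge detector rather than a learned filter):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """One 1-D convolution (deep-learning style cross-correlation) along the
    time axis of a signal, with 'valid' padding."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

signal = np.array([0., 0., 1., 1., 1., 0., 0.])   # a step-like event
edge_kernel = np.array([-1., 1.])                  # responds to onsets/offsets
response = conv1d_valid(signal, edge_kernel)
# the response peaks at the rising edge and dips at the falling edge
```

A trained CNN simply learns many such kernels from data and stacks the resulting feature maps through further layers.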

Recurrent neural networks

Since biomedical signals represent naturally sequential data, RNNs are an appropriate deep learning architecture to analyze the data and are expected to produce promising results. To present some of the studies in brain decoding [168] and anomaly classification [177, 178]: Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms have assumed sufficient and balanced data. Unfortunately, however, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often show significantly unequal class distributions, in which the number of instances from one class is significantly higher than the number of instances from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
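A small sketch of why the F-measure is preferred over accuracy under class imbalance (illustrative only; the 95:5 class split and the always-negative classifier are assumptions, not data from any cited study):

```python
import numpy as np

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# a 95:5 imbalance and a degenerate classifier that always predicts "normal"
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)

accuracy = np.mean(y_true == y_pred)   # 0.95, looks deceptively good
f1 = f_measure(y_true, y_pred)          # 0.0, exposes the failure
```

Accuracy rewards the classifier for ignoring the minority class entirely, while the F-measure collapses to zero because recall on the positive class is zero.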

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSM from genomics sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
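A SMOTE-style oversampling step [185] can be sketched as follows (a simplified illustration, not the reference implementation; the neighbourhood size and the toy minority set are assumptions):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Generate synthetic minority samples by interpolating each sample
    toward one of its k nearest minority neighbours (SMOTE-style)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to other samples
        neighbours = np.argsort(d)[1:k + 1]            # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                              # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# four minority samples in 2-D; synthesise six more on connecting segments
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_oversample(X_min, n_new=6, rng=0)
```

Each synthetic point lies on a segment between two real minority points, so the oversampled class occupies the same region of feature space rather than merely duplicating examples.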

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in an objective loss function of neural networks either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance or implicitly modify the learning rates according to data instance classes during training.
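For instance, explicit cost sensitivity can take the form of a class-weighted objective loss (a sketch with hypothetical weights and predictions, not a method from the cited studies):

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, class_weight):
    """Binary cross-entropy in which each class contributes with its own cost,
    e.g. penalising errors on the rare positive class more heavily."""
    eps = 1e-12                                   # numerical guard for log(0)
    w = np.where(y_true == 1, class_weight[1], class_weight[0])
    losses = -(y_true * np.log(p_pred + eps)
               + (1 - y_true) * np.log(1 - p_pred + eps))
    return np.mean(w * losses)

y_true = np.array([1, 0, 0, 0])
p_pred = np.array([0.3, 0.1, 0.1, 0.1])           # the lone positive is poorly predicted

plain    = weighted_cross_entropy(y_true, p_pred, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(y_true, p_pred, {0: 1.0, 1: 3.0})  # minority costs 3x
# weighted > plain: the objective now emphasises the rare class
```

Gradients of this loss scale with the same weights, which is equivalent to the implicit variant of modifying per-class learning rates during training.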

Algorithmic modification methods accommodate learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method which exploits unsupervised pre-training with RNN-based AE and achieved a greater than 25% increase in F-measure compared with the existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and to classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training combining ideas of undersampling and pre-training.
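The pre-train-then-fine-tune pattern can be sketched as follows (a minimal illustration on synthetic data, not the architecture of any cited study; the layer sizes, learning rate and iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                 # plentiful unlabeled (or source-domain) data

# --- unsupervised pre-training: a one-hidden-layer autoencoder ---
W_enc = rng.normal(scale=0.1, size=(16, 8))
W_dec = rng.normal(scale=0.1, size=(8, 16))

def recon_loss(X, W_enc, W_dec):
    return np.mean((np.tanh(X @ W_enc) @ W_dec - X) ** 2)

loss_before = recon_loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(300):                           # plain gradient descent on reconstruction error
    H = np.tanh(X @ W_enc)
    err = H @ W_dec - X
    dW_dec = H.T @ err / len(X)
    dH = (err @ W_dec.T) * (1 - H ** 2)        # backprop through tanh
    W_dec -= lr * dW_dec
    W_enc -= lr * (X.T @ dH / len(X))
loss_after = recon_loss(X, W_enc, W_dec)       # lower than loss_before

# --- fine-tuning: transfer the pre-trained encoder into a classifier ---
W_hidden = W_enc.copy()                        # hidden layer initialised with learned weights
W_out = rng.normal(scale=0.1, size=(8, 1))     # fresh task-specific output layer
# supervised fine-tuning on the small labelled target dataset would follow
```

The key point is that only the small output layer starts from scratch; the representation layer has already been shaped by data that is easier to obtain than labelled biomedical examples.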

Changing the black-box into the white-box

Deep learning in bioinformatics | 11
Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

A main criticism against deep learning is that it is used as a black box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black box into the white box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black box into the white box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
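The heat-map half of a mutation map can be sketched by scoring every single-base substitution against the wild-type score; the toy scoring function below merely stands in for a trained CNN's predicted binding score:

```python
def mutation_map(seq, score_fn, alphabet="ACGT"):
    """For every position and alternative base, record the change in score."""
    base_score = score_fn(seq)
    effects = {}
    for i, orig in enumerate(seq):
        for b in alphabet:
            if b == orig:
                continue
            mutated = seq[:i] + b + seq[i + 1:]
            effects[(i, b)] = score_fn(mutated) - base_score
    return effects

# Toy score: count of 'GC' dinucleotides stands in for a model's output.
score = lambda s: sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "GC")
fx = mutation_map("AGCA", score)
# Mutating position 2 (the C) to any other base destroys the GC site: effect -1
```

Plotting `effects` as a positions-by-bases grid gives the heat map, and the per-position maximum decrease gives the scaled sequence logo heights described above.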

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial to proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
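Random search [199] can be sketched in a few lines; the search space and toy validation loss below are hypothetical stand-ins for training-and-validating a real model:

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameter settings at random and keep the best one."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_trials):
        trial = {name: rng.choice(values) for name, values in space.items()}
        score = objective(trial)
        if score < best_score:
            best, best_score = trial, score
    return best, best_score

# Toy objective: pretend validation loss is minimised at lr=0.1, two layers.
space = {"lr": [0.001, 0.01, 0.1, 1.0], "layers": [1, 2, 3, 4]}
loss = lambda h: abs(h["lr"] - 0.1) + abs(h["layers"] - 2)
best, score = random_search(loss, space)
```

Bayesian optimization improves on this by using earlier trials to decide where to sample next, at the cost of a more involved implementation.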

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but for a single imaging subject, multiple forms such as X-ray, CT, MRI and PET may also be available.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.
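A minimal sketch of one common fusion strategy, assuming hypothetical per-modality feature vectors (not the actual pipelines of the studies above): modality-specific representations are concatenated into a single joint input for a shared classifier.

```python
def fuse_modalities(representations):
    """Late fusion: concatenate per-modality feature vectors into one joint vector."""
    joint = []
    for name, vec in sorted(representations.items()):  # fixed modality order
        joint.extend(vec)
    return joint

# Hypothetical learned features from three modalities (2 + 3 + 1 dimensions)
feats = fuse_modalities({"mri": [0.2, 0.7], "pet": [0.5], "csf": [0.1, 0.9, 0.3]})
# -> a single 6-dimensional input for a shared classifier
```

More elaborate multimodal architectures instead learn a shared hidden representation across modalities, but concatenation of upstream features remains a common baseline.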

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performances can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and have enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use a graphics processing unit, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
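As one concrete example of such optimizers, a single Adagrad [48] update scales each parameter's step by its accumulated squared gradients (a self-contained sketch, not any library's API):

```python
def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """Adagrad: per-parameter steps shrink with accumulated squared gradients."""
    new_cache = [c + g * g for c, g in zip(cache, grad)]
    new_w = [wi - lr * g / ((c ** 0.5) + eps)
             for wi, g, c in zip(w, grad, new_cache)]
    return new_w, new_cache

w, cache = [1.0, 1.0], [0.0, 0.0]
# A coordinate with 40x larger gradients gets its effective step shrunk,
# so both coordinates end up moving by almost the same total amount.
for g in ([4.0, 0.1], [4.0, 0.1]):
    w, cache = adagrad_step(w, g, cache)
```

This per-parameter adaptation is what lets Adagrad (and its descendants such as Adam) converge with less manual learning-rate tuning than plain SGD.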

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization to provide higher performance, we expect additional studies in this area, including those involving adversarial generative networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References
1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In Adv Neural Inf Process Syst 2015;577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25. LeCun Y, Ranzato M. Deep learning tutorial. In Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.
28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 1990. Citeseer.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In Advances in Neural Information Processing Systems, 2012, p. 512–20.
39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, 2009, p. 545–52.
40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition 1969;19(88):2.
43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46. Hecht-Nielsen R. Theory of the backpropagation neural network. In International Joint Conference on Neural Networks, 1989. IJCNN, 1989, p. 593–605. IEEE, Washington, DC.
47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.
50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52. Baldi P, Sadowski PJ. Understanding dropout. In Advances in Neural Information Processing Systems, 2013, 2814–22.
53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.
54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.
62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.
63. Chollet F. Keras: Theano-based Deep Learning library. Code: https://github.com/fchollet. Documentation: http://keras.io 2015.
64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.
65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.
66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.
68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.
75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.
76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.
83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.
87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.
88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.
89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.
90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.
91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.
92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – dag-rnns and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.
94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.
95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.
96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.
98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.
99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.
100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.
101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.
102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.
103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.
104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.
105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.
106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.
107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.
108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PloS One 2015;10(11):e0141287.
109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.
111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In Proceedings of the International Conference on Machine Learning, 2013.
112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.
113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.
114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.
115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.
116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.
117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.
118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. In SPIE, Bellingham, WA, 2009.
119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.
120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.
121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.
122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.
124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.
125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.
126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.
127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.
128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.
129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PloS One 2015;10(9):e0137036.
130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.
131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.
132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V-V-7.
133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.
134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.
135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.
136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.
137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems, 2012, 2843–51.

16 | Min et al

Downloaded from http://bib.oxfordjournals.org at Cornell University Library on July 29, 2016

138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in silicon. MIT Technology Review 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic chips. MIT Technology Review 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.


the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat as either a normal or abnormal beat. A few studies have used raw EEG signals: Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

Convolutional neural networks

Raw EEG signals have been analyzed in brain decoding [164–167] and anomaly classification [176] via CNNs, which perform one-dimensional convolutions. For instance, Stober et al. [165] classified the rhythm type and genre of music that participants listened to, and Cecotti et al. [167] classified characters that the participants viewed. Another approach to applying CNNs to biomedical signal processing was reported by Mirowski et al. [176], who extracted features such as phase-locking synchrony and wavelet coherence and coded them as pixel colors to formulate two-dimensional patterns. Then, ordinary two-dimensional CNNs, like those used in biomedical imaging, were used to predict seizures.

Recurrent neural networks

Since biomedical signals are naturally sequential data, RNNs are an appropriate deep learning architecture for their analysis and are expected to produce promising results. Among the studies in brain decoding [168] and anomaly classification [177, 178], Petrosian et al. [177] applied perceptron RNNs to raw EEG signals and the corresponding wavelet-decomposed features to predict seizures. In addition, Davidson et al. [178] used LSTM RNNs on EEG log-power spectra features to detect lapses.

Emergent architectures

CAE has been applied in a few brain decoding studies [169, 170]. Wang et al. [169] performed finger flex and extend classifications using raw ECoG signals. In addition, Stober et al. [170] classified musical rhythms that participants listened to with raw EEG signals.

Discussion

Limited and imbalanced data

Considering the necessity of optimizing a tremendous number of weight parameters in neural networks, most deep learning algorithms assume sufficient and balanced data. Unfortunately, this is usually not true for problems in bioinformatics. Complex and expensive data acquisition processes limit the size of bioinformatics datasets. In addition, such processes often yield significantly unequal class distributions, in which the number of instances from one class is significantly higher than the number from other classes [179]. For example, in clinical or disease-related cases, there are inevitably fewer data from treatment groups than from the normal (control) group. The former are also rarely disclosed to the public due to privacy restrictions and ethical requirements, creating a further imbalance in available data [180].

A few assessment metrics have been used to observe clearly how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are negatively skewed [182].
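As a toy illustration of why accuracy misleads while the F-measure does not, consider a degenerate classifier that labels everything as the majority class (the class counts below are invented for the example):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and their harmonic mean (F-measure)
    from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical screen: 990 control samples, 10 disease samples.
# A model that predicts "control" for every instance:
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.99, looks excellent
_, _, f1 = precision_recall_f1(tp, fp, fn)  # 0.0, exposes the failure
```

Accuracy rewards the degenerate model with 99%, whereas its F-measure is zero because it never detects the minority class.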

Solutions to limited and imbalanced data can be divided into three major groups [181, 183]: data preprocessing, cost-sensitive learning and algorithmic modification. Data preprocessing typically provides a better dataset through sampling or basic feature extraction. Sampling methods balance the distribution of imbalanced data, and several approaches have been proposed, including informed undersampling [184], the synthetic minority oversampling technique [185] and cluster-based sampling [186]. For example, Li et al. [127] and Roth et al. [146] performed enrichment analyses of CT images through spatial deformations such as random shifting and rotation. Although basic feature extraction methods deviate from the concept of deep learning, they are occasionally used to lessen the difficulties of learning from limited and imbalanced data. Research in bioinformatics using human-designed features as input data, such as PSSMs from genomic sequences or wavelet energy from EEG signals, can be understood in the same context [86, 92, 172, 176].
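The oversampling idea behind the synthetic minority oversampling technique [185] can be sketched in a few lines: synthetic minority examples are created by interpolating between pairs of minority-class instances. This is a simplified sketch (it pairs random minority points rather than true k-nearest neighbors, and the data are invented):

```python
import random

def smote_like(minority, n_new, rng=random.Random(0)):
    """Generate n_new synthetic points by linear interpolation between
    randomly paired minority-class examples (simplified SMOTE idea)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)       # pick two minority points
        lam = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(a, b)))
    return synthetic

# Toy minority class with three 2-D feature vectors:
minority = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4)]
new_points = smote_like(minority, n_new=5)
```

Each synthetic point lies on the segment between two real minority examples, so the augmented minority class stays inside its original feature-space region.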

Cost-sensitive learning methods define different costs for misclassifying data examples from individual classes to solve the limited and imbalanced data problems. Cost sensitivity can be applied in the objective loss function of a neural network either explicitly or implicitly [187]. For example, we can explicitly replace the objective loss function to reflect class imbalance, or implicitly modify the learning rates according to data instance classes during training.
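A minimal sketch of the explicit variant, with an illustrative weight of 10 (not taken from any cited study): a binary cross-entropy loss in which errors on the rare positive class cost more than errors on the abundant negative class:

```python
import math

def weighted_bce(y_true, y_prob, pos_weight=10.0):
    """Binary cross-entropy where misclassifying the minority (positive)
    class is penalized pos_weight times more heavily; pos_weight=1.0
    recovers the standard loss."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip for numerical safety
        total += -(pos_weight * y * math.log(p)
                   + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confident miss on a positive example now costs 10x the
# symmetric miss on a negative example:
loss_pos_miss = weighted_bce([1], [0.1])   # true positive predicted 0.1
loss_neg_miss = weighted_bce([0], [0.9])   # true negative predicted 0.9
```

During training, gradients of this loss push the model harder to fix minority-class errors, which is the intended cost-sensitive effect.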

Algorithmic modification methods adapt learning algorithms to increase their suitability for limited and imbalanced data. A simple and effective approach is the adoption of pre-training. Unsupervised pre-training can be a great help in learning representations for each class and producing more regularized results [68]. In addition, transfer learning, which consists of pre-training with sufficient data from similar but different domains and fine-tuning with real data, has great advantages [24, 188]. For instance, Lee et al. [107] proposed a microRNA target prediction method that exploits unsupervised pre-training with an RNN-based AE and achieved a greater than 25% increase in F-measure compared to existing alternatives. Bar et al. [132] performed transfer learning using natural images from the ImageNet database [189] as pre-training data and fine-tuned with chest X-ray images to identify chest pathologies and classify healthy and abnormal images. In addition to pre-training, sophisticated training methods have also been executed: Lee et al. [94] suggested DBN with boosted categorical RBM, and Havaei et al. [139] suggested CNNs with two-phase training combining ideas of undersampling and pre-training.

Changing the black-box into the white-box

A main criticism against deep learning is that it is used as a black-box: even though it produces outstanding results, we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in its early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. For image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of a CNN [190]. In addition, to visualize a generalized class-representative image rather than one dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of a DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, the mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, in which the height of each base is scaled by the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
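The mutation-map computation itself can be sketched independently of any particular network: given a trained scoring function, score every possible single-nucleotide substitution and record how much it changes the prediction. The `gc_score` function below is a stand-in for illustration, not the CNN from [100]:

```python
def mutation_map(seq, score, alphabet="ACGT"):
    """For each position and each possible substitution, record the
    change in the model's score relative to the original sequence.
    Returns {(position, original_base, substituted_base): delta}."""
    base_score = score(seq)
    deltas = {}
    for i, orig in enumerate(seq):
        for sub in alphabet:
            if sub == orig:
                continue
            mutant = seq[:i] + sub + seq[i + 1:]
            deltas[(i, orig, sub)] = score(mutant) - base_score
    return deltas

# Stand-in scorer: GC content (a real use would call the trained model).
gc_score = lambda s: sum(b in "GC" for b in s) / len(s)
deltas = mutation_map("ATGC", gc_score)
```

Rendering `deltas` as a positions-by-bases grid gives exactly the heat-map component of a mutation map.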

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial for proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture, and selection according to those capabilities in addition to input data characteristics and research objectives, is essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters (the number of layers, the number of hidden units, weight initialization values, learning iterations and even the learning rate) for researchers to set, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and was left up to human machine learning experts. Nevertheless, automation of machine learning research, which aims to optimize hyperparameters automatically, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].
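Of these, random search [199] is the simplest to sketch: draw hyperparameter settings uniformly from specified ranges, evaluate each, and keep the best. The objective below is a dummy stand-in for a real validation-set evaluation, and the search space is invented for the example:

```python
import random

def random_search(objective, space, n_trials=50, rng=random.Random(42)):
    """Sample hyperparameters uniformly from `space` (name -> (lo, hi)),
    evaluate each configuration, and return the best one found."""
    best_cfg, best_val = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Dummy validation score peaking at log10(lr) = -3 and dropout = 0.5:
def dummy_score(cfg):
    return -((cfg["log_lr"] + 3) ** 2) - (cfg["dropout"] - 0.5) ** 2

space = {"log_lr": (-6, -1), "dropout": (0.0, 0.9)}
best_cfg, best_val = random_search(dummy_score, space)
```

Despite its simplicity, random search often matches grid search at a fraction of the cost, because in practice only a few hyperparameters matter and random sampling covers each one's range more densely.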

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses and electronic medical records available as input data, but X-ray, CT, MRI and PET forms are also available for a single image.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signal and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performances can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity of accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. Some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55] and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and has enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use graphics processing units, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field-programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
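Among the listed optimization algorithms, Adagrad [48] is compact enough to sketch in full: each parameter accumulates the sum of its squared gradients and scales its own step size down by the square root of that sum, so frequently updated parameters get smaller steps. This toy run minimizes f(x) = x^2 rather than a real network loss:

```python
import math

def adagrad_step(params, grads, cache, lr=0.1, eps=1e-8):
    """One Adagrad update: accumulate squared gradients per parameter
    and divide the step by the root of the accumulator."""
    for i, g in enumerate(grads):
        cache[i] += g * g
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params, cache

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0:
params, cache = [1.0], [0.0]
for _ in range(100):
    grads = [2 * params[0]]
    params, cache = adagrad_step(params, grads, cache)
```

The per-parameter adaptive step is what removes most of the manual learning-rate scheduling that plain SGD requires.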

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215] and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well: neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization that provides higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions, rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

bull As a great deal of biomedical data has been accumu-lated various machine algorithms are now beingwidely applied in bioinformatics to extract knowledgefrom big data

bull Deep learning which has evolved from the acquisitionof big data the power of parallel and distributedcomputing and sophisticated training algorithms hasfacilitated major advances in numerous domains suchas image recognition speech recognition and naturallanguage processing

bull We review deep learning for bioinformatics and pre-sent research categorized by bioinformatics domain(ie omics biomedical imaging biomedical signal pro-cessing) and deep learning architecture (ie deep

neural networks convolutional neural networks re-current neural networks emergent architectures)

bull Furthermore we discuss the theoretical and practicalissues plaguing the applications of deep learningin bioinformatics including imbalanced data inter-pretation hyperparameter optimization multimodaldeep learning and training acceleration

• As a comprehensive review of existing work, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.
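On the hyperparameter issue in particular, random search [199] is a simple yet strong baseline for architecture and hyperparameter selection. The sketch below is illustrative only: the quadratic objective stands in for a real validation loss, and all names and ranges are hypothetical assumptions of ours.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations at random and keep the best; random search
    is often more efficient than grid search when only a few
    hyperparameters actually matter (Bergstra and Bengio, 2012)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Stand-in for a validation loss, minimized near lr = 1e-3, dropout = 0.5.
objective = lambda c: (c["log10_lr"] + 3) ** 2 + (c["dropout"] - 0.5) ** 2
space = {
    "log10_lr": lambda r: r.uniform(-5, -1),  # learning rate on a log scale
    "dropout": lambda r: r.uniform(0.0, 0.8),
}
best, loss = random_search(objective, space, n_trials=200)
```

In practice the sampled configuration would be used to train a model and `objective` would return its validation loss; sequential model-based methods [197, 198] can further reduce the number of trials.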

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.
2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.
3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.
4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.
5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.
6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.
7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.
10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

Deep learning in bioinformatics | 13
Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.
15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Advances in Neural Information Processing Systems, 2015, 577–85.
16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.
17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.
18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.
19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.
20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.
21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.
23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.
24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.
26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.
27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.
28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.
29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.
34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.
38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.
39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.
40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.
41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, 2011, 52–9.
42. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.
43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.
44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.
45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.
46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks, 1989. IJCNN, 1989, p. 593–605. IEEE, Washington, DC.
47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).
48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.
50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.
51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.
53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.
54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.
60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.
61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.
62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.
63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.
64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.
65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.
66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.
67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.
68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.
69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.
70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.
72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.
73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.
74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.
75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.
76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.
78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.
83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.
86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12(1):103–12.
87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.
88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.
89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.
90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.
91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.
92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.
93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.
94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.
95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.
96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.
98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.
99. Denas O, Taylor J. Deep modeling of gene expression regulation in an erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, GA, USA, 2013.
100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.
101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.
102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.
103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.
104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.
105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.
106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.
107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.
108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PloS One 2015;10(11):e0141287.
109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.
110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.
111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.
112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.
113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.
114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.
115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.
116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.
117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.
118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.
119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.
120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.
121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.
122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.
124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.
125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.
126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.
127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control, Automation, Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.
128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.
129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PloS One 2015;10(9):e0137036.
130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.
131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.
132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V-V-7.
133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.
134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.
135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.
136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.
137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.
140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.
142. Xu J, Xiang L, Liu Q, et al. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.
143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.
144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.
145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE, 2015, p. 1–4. IEEE, New York.
146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.
147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.
148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.
149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.
150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.
151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.
152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.
153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.
154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.
155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.
156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.
157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.
158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.
161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.
162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.
163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.
164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.
165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.
166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition, 2008. ICPR 2008, 2008, p. 1–4. IEEE, New York.
167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.
168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.
169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.
170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.
171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.
172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.
173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.
174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.
182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.
187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.
188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009. IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.
191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.
192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.
193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.
194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.
210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org, 2015.
211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.
212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.
213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.
215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.
217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.
218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.

18 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.



know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box, providing logical reasoning just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class representative image rather than being dependent on a particular input, gradient ascent optimization in input space through backpropagation-to-input (cf. backpropagation-to-weights) has provided another effective methodology [191, 192]. Regarding genomic sequence input, several approaches have been proposed to infer PSSMs from a trained model and to visualize the corresponding motifs with heat maps or sequence logos. For example, Lee et al. [94] extracted motifs by choosing the most class-discriminative weight vector among those in the first layer of a DBN. DeepBind [100] and DeMo [101] extracted motifs from trained CNNs by counting nucleotide frequencies of positive input subsequences with high activation values and by backpropagation-to-input for each feature map, respectively.
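The backpropagation-to-input idea can be sketched with a toy, fully linear "model" (the weight vector `w` is an assumption for illustration, not a trained network): compute the gradient of the class score with respect to the input, visualize it as a saliency map, and optionally walk the input uphill to obtain a class representative input, as in [191, 192].

```python
# Toy differentiable "network" score for a 1-D input: s(x) = w.x.
# A linear stand-in for a trained model; w is an illustrative assumption.
w = [0.5, -1.0, 2.0, 0.1]

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

def saliency(x, eps=1e-5):
    """Finite-difference gradient of the score w.r.t. the input -
    the quantity visualized as a saliency map."""
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grads.append((score(xp) - score(xm)) / (2 * eps))
    return grads

def class_representative(steps=100, lr=0.1):
    """Gradient ascent in input space (backpropagation-to-input):
    start from zeros and repeatedly move the input uphill in score."""
    x = [0.0] * len(w)
    for _ in range(steps):
        g = saliency(x)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x

x_star = class_representative()
```

For a real CNN the gradient would come from automatic differentiation rather than finite differences, but the visualization recipe is the same.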

Specifically for transcription factor binding site prediction, Alipanahi et al. [100] developed a visualization method, a mutation map, for illustrating the effects of genetic variants on binding scores predicted by CNNs. A mutation map consists of a heat map, which shows how much each mutation alters the binding score, and the input sequence logo, where the height of each base is scaled as the maximum decrease of binding score among all possible mutations. Moreover, Kelley et al. [103] further complemented the mutation map with a line plot to show the maximum increases as well as the maximum decreases of prediction scores. In addition to interpretation through visualization, attention mechanisms [74–77], designed to focus explicitly on salient points, and the mathematical rationale behind deep learning [193, 194] are being studied.
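A minimal sketch of the mutation-map construction, assuming a toy stand-in scoring function in place of a trained CNN (`binding_score` here simply counts "CG" dimers and is purely illustrative): score every possible single-base substitution, record the score change for the heat map, and scale each logo letter by the largest score decrease at its position.

```python
BASES = "ACGT"

def binding_score(seq):
    # Toy stand-in for a trained CNN's binding score: count of "CG" dimers.
    return sum(1.0 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")

def mutation_map(seq):
    """For every position and every alternative base, record how much the
    mutation changes the predicted score (the heat-map part of the map)."""
    ref = binding_score(seq)
    delta = {}
    for i in range(len(seq)):
        for b in BASES:
            if b != seq[i]:
                mut = seq[:i] + b + seq[i + 1:]
                delta[(i, b)] = binding_score(mut) - ref
    return delta

def letter_heights(seq):
    """Sequence-logo heights: each base scaled by the maximum score
    decrease among all possible mutations at that position."""
    delta = mutation_map(seq)
    return [max(0.0, -min(delta[(i, b)] for b in BASES if b != seq[i]))
            for i in range(len(seq))]
```

With a real model the same loop applies; only `binding_score` changes, and the line plot of [103] additionally tracks the maximum score increase per position.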

Selection of an appropriate deep learning architecture and hyperparameters

Choosing the appropriate deep learning architecture is crucial to proper applications of deep learning. To obtain robust and reliable results, awareness of the capabilities of each deep learning architecture and selection according to those capabilities, in addition to input data characteristics and research objectives, are essential. However, to date, the advantages of each architecture are only roughly understood; for example, DNNs are suitable for analysis of internal correlations in high-dimensional data, CNNs are suitable for analysis of spatial information, and RNNs are suitable for analysis of sequential information [7]. Indeed, a detailed methodology for selecting the most appropriate or 'best fit' deep learning architecture remains a challenge to be studied in the future.

Even once a deep learning architecture is selected, there are many hyperparameters for researchers to set: the number of layers, the number of hidden units, weight initialization values, learning iterations, and even the learning rate, all of which can influence the results remarkably [195]. For many years, hyperparameter tuning was rarely systematic and left up to human machine learning experts. Nevertheless, automation of machine learning research, which aims to automatically optimize hyperparameters, is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198], and random search approaches [199].
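The simplest of these, random search [199], can be sketched in a few lines; the validation loss below is a hypothetical quadratic surrogate (its optimum and the search space are assumptions for illustration), standing in for an actual train-and-validate run.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Random search over hyperparameters: sample each setting
    independently from its range and keep the best-scoring trial."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Hypothetical validation loss, minimized near log_lr = -1, log_decay = -2.
def val_loss(cfg):
    return (cfg["log_lr"] + 1.0) ** 2 + (cfg["log_decay"] + 2.0) ** 2

space = {"log_lr": (-4.0, 0.0), "log_decay": (-4.0, 0.0)}
best, loss = random_search(val_loss, space)
```

Sampling in log-space, as sketched here, is the usual choice for scale-sensitive hyperparameters such as the learning rate.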

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, not only are omics data, images, signals, drug responses, and electronic medical records available as input data, but imaging data themselves come in several modalities, including X-ray, CT, MRI, and PET.

A few bioinformatics studies have already begun to use multimodal deep learning. For example, Suk et al. [124] studied Alzheimer's disease classification using cerebrospinal fluid and brain images in the forms of MRI and PET scans, and Soleymani et al. [168] conducted an emotion detection study with both EEG signals and face image data.

Accelerating deep learning

As more deep learning model parameters and training data become available, better learning performances can be achieved. However, at the same time, this inevitably leads to a drastic increase in training time, emphasizing the necessity for accelerated deep learning [7, 25].

Approaches to accelerating deep learning can be divided into three groups: advanced optimization algorithms, parallel and distributed computing, and specialized hardware. Since the main reason for long training times is that parameter optimization through plain SGD takes too long, several studies have focused on advanced optimization algorithms [202]. To this end, some widely employed algorithms include Adagrad [48], Adam [49], batch normalization [55], and Hessian-free optimization [203]. Parallel and distributed computing can significantly accelerate the time to completion and have enabled many deep learning studies [204–208]. These approaches exploit both scale-up methods, which use a graphics processing unit, and scale-out methods, which use large-scale clusters of machines in a distributed environment. A few deep learning frameworks, including the recently released DeepSpark [209] and TensorFlow [210], provide parallel and distributed computing abilities. Although development of specialized hardware for deep learning is still in its infancy, it will provide major accelerations and become far more important in the long term [211]. Currently, field programmable gate array-based processors are under development, and neuromorphic chips modeled on the brain are greatly anticipated as promising technologies [212–214].
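To make concrete why optimizers such as Adam [49] accelerate training over plain SGD, here is a minimal single-parameter sketch of the Adam update (the learning rate, step count, and toy objective f(x) = x² are assumptions for illustration): per-parameter moving averages of the gradient (m) and its square (v), with bias correction, yield adaptive step sizes.

```python
import math

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), bias-corrected, give an adaptive step size."""
    state["t"] += 1
    t = state["t"]
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = state["m"][i] / (1 - b1 ** t)   # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# Minimize the toy objective f(x) = x^2 (gradient 2x), starting from x = 1.
state = {"t": 0, "m": [0.0], "v": [0.0]}
x = [1.0]
for _ in range(3000):
    x = adam_step(x, [2 * x[0]], state, lr=0.01)
```

Because the effective step is roughly lr times the sign of the smoothed gradient, progress is steady even where raw gradient magnitudes vary wildly, which is one reason such methods shorten training.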

Future trends of deep learning

Incorporation of traditional deep learning architectures is a promising future trend. For instance, joint networks of CNNs and RNNs integrated with attention models have been applied in image captioning [75], video summarization [215], and image question answering [216]. A few studies toward augmenting the structures of RNNs have been conducted as well. Neural Turing machines [217] and memory networks [218] have adopted addressable external memory in RNNs and shown great results for tasks requiring intricate inferences, such as algorithm learning and complex question answering. Recently, adversarial examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization to provide higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].
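The fast gradient sign method of [220] illustrates how small such perturbations can be; the sketch below applies it to a toy linear classifier (the weights, input, and epsilon are assumptions for illustration, and for a linear model with logistic loss on the true class the loss gradient with respect to the input is proportional to the negated weights).

```python
def fgsm_perturb(x, grad_loss, eps=0.1):
    """Fast gradient sign method: add an eps-sized perturbation in the
    sign of the loss gradient w.r.t. the input."""
    sign = [1.0 if g > 0 else (-1.0 if g < 0 else 0.0) for g in grad_loss]
    return [xi + eps * s for xi, s in zip(x, sign)]

# Toy linear classifier: predict class 1 if w.x > 0 (illustrative weights).
w = [0.3, -0.4, 0.2, 0.1, -0.2]

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Gradient of the loss on true class 1 w.r.t. x is proportional to -w,
# so the adversarial direction is sign(-w).
x = [0.05, -0.02, 0.0, 0.1, 0.01]
x_adv = fgsm_perturb(x, [-wi for wi in w], eps=0.2)
```

Every coordinate moves by at most eps, yet the perturbation is perfectly aligned against the decision boundary, which is why predictions flip so easily; adversarial training feeds such examples back in as a regularizer.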

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward outcome signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach has great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive review of bioinformatics research applying deep learning in terms of input data, research objectives, and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition, and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning, and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1. Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2. Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3. IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.

4. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5. DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.

6. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7. Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9. Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In: Adv Neural Inf Process Syst 2015;577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In: Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In: Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptron: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In: International Joint Conference on Neural Networks, 1989. IJCNN, 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In: Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In: Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based deep learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: first release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and Fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In: 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In: Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In: International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep Motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed tomography: principles, design, artifacts, and recent advances. In: SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control, Automation, Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132 Bar Y Diamant I Wolf L et al Deep learning with non-medical training used for chest pathology identification InSPIE Medical Imaging International Society for Optics andPhotonics 2015 94140V-V-7

133 Li Q Feng B Xie L et al A cross-modality learning approachfor vessel segmentation in retinal images IEEE Trans MedImaging 201535(1)109ndash8

134 Ning F Delhomme D LeCun Y et al Toward automatic phe-notyping of developing embryos from videos IEEE TransImage Process 200514(9)1360ndash71

135 Turaga SC Murray JF Jain V et al Convolutional networkscan learn to generate affinity graphs for image segmenta-tion Neural Comput 201022(2)511ndash38

136 Helmstaedter M Briggman KL Turaga SC et alConnectomic reconstruction of the inner plexiform layer inthe mouse retina Nature 2013500(7461)168ndash74

137 Ciresan D Giusti A Gambardella LM et al Deep neural net-works segment neuronal membranes in electron micros-copy images In Advances in Neural Information ProcessingSystems 2012 2843ndash51

16 | Min et al

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

138 Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139 Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140 Roth HR, Lu L, Farag A, et al. Deeporgan: multi-level deep convolutional networks for automated pancreas segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141 Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142 Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143 Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144 Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145 Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146 Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147 Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148 Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149 Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150 Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151 Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152 Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153 Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154 Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155 De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156 Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157 Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158 Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159 An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160 Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161 Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162 Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163 Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164 Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165 Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In Advances in Neural Information Processing Systems, 2014, 1449–57.

166 Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167 Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168 Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169 Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170 Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171 Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172 Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173 Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In 2014 AAAI Spring Symposium Series, 2014.

174 Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175 Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176 Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177 Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178 Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179 Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180 Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181 He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182 Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183 Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184 Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185 Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186 Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.

187 Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In ECAI, 1998, 445–9. Citeseer.

188 Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189 Deng J, Dong W, Socher R, et al. Imagenet: a large-scale hierarchical image database. In CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190 Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191 Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192 Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193 Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194 Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, 2014, 2933–41.

195 Bengio Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196 Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, 2011, 2546–54.

197 Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198 Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, 2012, 2951–9.

199 Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200 Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201 Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202 Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203 Martens J. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204 Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205 Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206 Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207 Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208 Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012, 1223–31.

209 Kim H, Park J, Jang J, et al. DeepSpark: spark-based deep learning supporting asynchronous updates and caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210 Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org, 2015.

211 Simonite T. Thinking in silicon. MIT Technology Review, 2013.

212 Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213 Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214 Hof RD. Neuromorphic chips. MIT Technology Review, 2014.

215 Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216 Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217 Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218 Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219 Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220 Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221 Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014, 2672–80.

222 Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223 Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, 2015, 3532–40.

224 Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225 Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.



examples, which degrade performance with small, human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can act as a regularizer and thereby improve performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].
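To make the idea concrete, the following is a minimal sketch of the fast gradient sign method from Goodfellow et al. [220], applied to a hand-rolled logistic regression so that the input gradient is available in closed form. The weight vector, input, and epsilon below are illustrative assumptions, not values from any cited study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """Perturb x in the signed-gradient direction that increases the loss."""
    p = sigmoid(np.dot(w, x) + b)         # predicted probability of class 1
    grad_x = (p - y) * w                  # d(cross-entropy)/dx for logistic regression
    return x + epsilon * np.sign(grad_x)  # one fast-gradient-sign step

# Toy model and a correctly classified input (illustrative values).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 0.2])            # clean logit = 1.6 -> class 1
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.6)
logit_clean = np.dot(w, x) + b
logit_adv = np.dot(w, x_adv) + b
print(logit_clean > 0, logit_adv > 0)     # prediction flips: True False
```

Adversarial training in the sense discussed above simply mixes such perturbed examples (with their original labels) into the training set, which penalizes locally unstable decision boundaries.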

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLPs or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model. Reinforcement learning leverages reward signals resulting from actions rather than correctly labeled data. Since reinforcement learning most closely resembles how humans actually learn, this approach holds great promise for artificial general intelligence [224]. Currently, its applications are mainly focused on game playing [4] and robotics [225].
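The ladder-network objective described above can be sketched schematically: the total cost is a supervised term on the labeled batch plus weighted denoising (reconstruction) terms from each level of the model. The tiny linear "encoder", the noise model standing in for the decoder output, and the lambda weights are illustrative assumptions, intended only to show how the two costs combine.

```python
import numpy as np

rng = np.random.default_rng(0)

def supervised_cost(logits, labels):
    # Softmax cross-entropy over the labeled examples.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()

def denoising_cost(clean, reconstructed):
    # Squared error between clean activations and their denoised versions.
    return np.mean((clean - reconstructed) ** 2)

# Labeled and unlabeled toy batches (illustrative data).
x_lab = rng.normal(size=(4, 3))
y_lab = np.array([0, 1, 1, 0])
x_unl = rng.normal(size=(8, 3))

W = rng.normal(size=(3, 2))               # one-layer linear "encoder"
recon = x_unl + rng.normal(scale=0.1, size=x_unl.shape)  # stand-in decoder output

lambdas = [1.0]                           # per-level denoising weights
total = supervised_cost(x_lab @ W, y_lab) + lambdas[0] * denoising_cost(x_unl, recon)
```

In a real ladder network, `recon` comes from a decoder that denoises a noise-corrupted encoder pass, and a separate lambda weights the reconstruction cost at every layer; both terms are minimized jointly by backpropagation.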

Conclusion

As we enter the major era of big data, deep learning is taking center stage for international academic and business interests. In bioinformatics, where great advances have been made with conventional machine learning, deep learning is anticipated to produce promising results. In this review, we provided an extensive survey of bioinformatics research applying deep learning in terms of input data, research objectives, and the characteristics of established deep learning architectures. We further discussed limitations of the approach and promising directions of future research.

Although deep learning holds promise, it is not a silver bullet and cannot provide great results in ad hoc bioinformatics applications. There remain many potential challenges, including limited or imbalanced data, interpretation of deep learning results, and selection of an appropriate architecture and hyperparameters. Furthermore, to fully exploit the capabilities of deep learning, multimodality and acceleration of deep learning require further study. Thus, we are confident that prudent preparations regarding the issues discussed herein are key to the success of future deep learning approaches in bioinformatics. We believe that this review will provide valuable insight and serve as a starting point for application of deep learning to advance bioinformatics in future research.

Key Points

• As a great deal of biomedical data has been accumulated, various machine learning algorithms are now being widely applied in bioinformatics to extract knowledge from big data.

• Deep learning, which has evolved from the acquisition of big data, the power of parallel and distributed computing, and sophisticated training algorithms, has facilitated major advances in numerous domains such as image recognition, speech recognition, and natural language processing.

• We review deep learning for bioinformatics and present research categorized by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).

• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning, and training acceleration.

• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.

Acknowledgements

The authors would like to thank Prof. Russ Altman and Prof. Tsachy Weissman at Stanford University, Prof. Honglak Lee at the University of Michigan, Prof. V. Narry Kim and Prof. Daehyun Baek at Seoul National University, and Prof. Young-Han Kim at the University of California, San Diego, for helpful discussions on applying artificial intelligence and machine learning to bioinformatics.

Funding

This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Korean Government (Ministry of Science, ICT and Future Planning) (Nos. 2014M3C9A3063541 and 2015M3A9A7029735), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare (No. HI15C3224), and the SNU ECE Brain Korea 21+ project in 2016.

References

1 Manyika J, Chui M, Brown B, et al. Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

2 Ferrucci D, Brown E, Chu-Carroll J, et al. Building Watson: an overview of the DeepQA project. AI Magazine 2010;31(3):59–79.

3 IBM Watson for Oncology. IBM. http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html, 2016.

4 Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9.

5 DeepMind Health. Google DeepMind. https://www.deepmind.com/health, 2016.

6 Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinformatics 2006;7(1):86–112.

7 Goodfellow I, Bengio Y, Courville A. Deep Learning. Book in preparation for MIT Press, 2016.

8 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

9 Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 2013;35(8):1915–29.

10 Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. arXiv Preprint arXiv:1409.4842, 2014.

11 Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems, 2014, 1799–807.

12 Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13 Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14 Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15 Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems, 2015, 577–85.

16 Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17 Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.

18 Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.

19 Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20 Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21 Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22 Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23 Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24 Greenspan H, van Ginneken B, Summers RM. Guest editorial: deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25 LeCun Y, Ranzato M. Deep learning tutorial. In Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26 Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27 Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28 Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29 Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30 Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31 LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 1990. Citeseer.

32 Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33 Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34 Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35 Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36 Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37 Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38 Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In Advances in Neural Information Processing Systems, 2012, p. 512–20.

39 Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, 2009, p. 545–52.

40 Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41 Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.

42 Minsky M, Papert S. Perceptron: An Introduction to Computational Geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.

43 Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44 Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45 Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46 Hecht-Nielsen R. Theory of the backpropagation neural network. In International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.

47 Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48 Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49 Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.

50 Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51 Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52 Baldi P, Sadowski PJ. Understanding dropout. In Advances in Neural Information Processing Systems, 2013, 2814–22.

53 Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.

54 Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55 Ioffe S Szegedy C Batch normalization accelerating deepnetwork training by reducing internal covariate shift arXivPreprint arXiv150203167 2015

56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM, Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.

57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.

58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.

59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia. ACM, Washington, DC, 2014.

60. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.

61. Bergstra J, Breuleux O, Bastien F, et al. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010, p. 3. Austin, TX.

62. Bastien F, Lamblin P, Pascanu R, et al. Theano: new features and speed improvements. arXiv Preprint arXiv:1211.5590, 2012.

63. Chollet F. Keras: Theano-based Deep Learning library. Code: https://github.com/fchollet. Documentation: http://keras.io, 2015.

64. Dieleman S, Heilman M, Kelly J, et al. Lasagne: First Release, 2015.

65. van Merriënboer B, Bahdanau D, Dumoulin V, et al. Blocks and fuel: frameworks for deep learning. arXiv Preprint arXiv:1506.00619, 2015.

66. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv Preprint arXiv:1603.04467, 2016.

67. Nair V, Hinton G. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 807–14.

68. Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res 2010;11:625–60.

69. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 1968;195(1):215–43.

70. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.

71. Cenic A, Nabavi DG, Craen RA, et al. Dynamic CT measurement of cerebral blood flow: a validation study. Am J Neuroradiol 1999;20(1):63–73.

72. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn Reson Med 2003;50(5):1031–42.

73. Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Brief Bioinformatics 2005;6(1):57–71.

74. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv Preprint arXiv:1409.0473, 2014.

75. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. arXiv Preprint arXiv:1502.03044, 2015.

76. Cho K, Courville A, Bengio Y. Describing multimedia content using attention-based encoder–decoder networks. IEEE Trans Multimed 2015;17(11):1875–86.

77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, 2014, p. 2204–12.

78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.

79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.

80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.

81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.

82. Branden CI. Introduction to Protein Structure. Garland Science, New York, 1999.

83. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.

84. Lyons J, Dehzangi A, Heffernan R, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.

85. Heffernan R, Paliwal K, Lyons J, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5:11476.

86. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinformat 2015;12(1):103–12.

87. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. In 2014 International Joint Conference on Neural Networks (IJCNN), 2014, p. 2071–8. IEEE, New York.

88. Baldi P, Brunak S, Frasconi P, et al. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15(11):937–46.

89. Baldi P, Pollastri G, Andersen CA, et al. Matching protein beta-sheet partners by feedforward and recurrent neural networks. In Proceedings of the 2000 Conference on Intelligent Systems for Molecular Biology (ISMB00), La Jolla, CA, 2000, p. 25–36.

90. Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv Preprint arXiv:1412.7828, 2014.

91. Lena PD, Nagata K, Baldi P. Deep architectures for protein contact map prediction. Bioinformatics 2012;28(19):2449–57.

92. Baldi P, Pollastri G. The principled design of large-scale recursive neural network architectures – dag-rnns and the protein structure prediction problem. J Mach Learn Res 2003;4:575–602.

93. Leung MK, Xiong HY, Lee LJ, et al. Deep learning of the tissue-regulated splicing code. Bioinformatics 2014;30(12):i121–9.

94. Lee T, Yoon S. Boosted categorical restricted Boltzmann machine for computational prediction of splice junctions. In International Conference on Machine Learning, Lille, France, 2015, p. 2483–92.

95. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2015;gkv1025.

96. Chen Y, Li Y, Narayan R, et al. Gene expression inference with deep learning. Bioinformatics 2016;btw074.

Deep learning in bioinformatics | 15

Downloaded from http://bib.oxfordjournals.org/ at Cornell University Library on July 29, 2016

97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In International Conference on Machine Learning Workshop on Representation Learning, Atlanta, GA, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V-V-7.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. Deeporgan: multi-level deep convolutional networks for automated pancreas segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE, 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. Imagenet: a large-scale hierarchical image database. In CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: spark-based deep learning supporting asynchronous updates and caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org, 2015.

211. Simonite T. Thinking in Silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. Cnp: an fpga-based processor for convolutional networks. In FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic Chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.



estimation. In Advances in Neural Information Processing Systems, 2014, 1799–807.

12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, p. 362–70.

13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.

14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, p. 8614–8. IEEE, New York.

15. Chorowski JK, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems, 2015, 577–85.

16. Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In Advances in Neural Information Processing Systems, 2015, p. 3276–84.

17. Li J, Luong M-T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv Preprint arXiv:1506.01057, 2015.

18. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv Preprint arXiv:1508.04025, 2015.

19. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 2014.

20. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16(6):321–32.

21. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.

22. Leung MK, Delong A, Alipanahi B, et al. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 2016;104:176–97.

23. Mamoshina P, Vieira A, Putin E, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13:1445–54.

24. Greenspan H, van Ginneken B, Summers RM. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.

25. LeCun Y, Ranzato M. Deep learning tutorial. In Tutorials in International Conference on Machine Learning (ICML'13), 2013. Citeseer.

26. Svozil D, Kvasnicka V, Pospichal J. Introduction to multi-layer feed-forward neural networks. Chemometr Intell Lab Syst 1997;39(1):43–62.

27. Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, 2008, p. 1096–103. ACM, New York.

28. Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 2010;11:3371–408.

29. Hinton G, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.

30. Hinton G, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.

31. LeCun Y, Boser B, Denker JS, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 1990. Citeseer.

32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.

33. Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012, p. 1097–105.

34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.

35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.

36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

37. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 2000;12(10):2451–71.

38. Lena PD, Nagata K, Baldi PF. Deep spatio-temporal architectures and learning for protein structure prediction. In Advances in Neural Information Processing Systems, 2012, p. 512–20.

39. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, 2009, p. 545–52.

40. Hadsell R, Sermanet P, Ben J, et al. Learning long-range vision for autonomous off-road driving. J Field Robot 2009;26(2):120–44.

41. Masci J, Meier U, Ciresan D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin Heidelberg, 2011, 52–9.

42. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, Expanded Edition, 1969;19(88):2.

43. Fukushima K. Cognitron: a self-organizing multilayered neural network. Biol Cybern 1975;20(3–4):121–36.

44. Hinton G, Sejnowski TJ. Learning and relearning in Boltzmann machines. Parallel Distrib Process Explor Microstruct Cogn 1986;1:282–317.

45. Hinton G. A practical guide to training restricted Boltzmann machines. Momentum 2010;9(1):926.

46. Hecht-Nielsen R. Theory of the backpropagation neural network. In International Joint Conference on Neural Networks (IJCNN), 1989, p. 593–605. IEEE, Washington, DC.

47. Bottou L. Stochastic gradient learning in neural networks. Proc Neuro-Nîmes 1991;91(8).

48. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.

49. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980, 2014.

50. Moody J, Hanson S, Krogh A, et al. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–7.

51. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

52. Baldi P, Sadowski PJ. Understanding dropout. In Advances in Neural Information Processing Systems, 2013, 2814–22.

53. Goodfellow IJ, Warde-Farley D, Mirza M, et al. Maxout networks. arXiv Preprint arXiv:1302.4389, 2013.

54. Moon T, Choi H, Lee H, et al. RnnDrop: a novel dropout for RNNs in ASR. In Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, 2015.


55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.


79 Ponomarenko JV Ponomarenko MP Frolov AS et alConformational and physicochemical DNA features specificfor transcription factor binding sites Bioinformatics199915(7)654ndash68

80 Cai Y-D Lin SL Support vector machines for predictingrRNA- RNA- and DNA-binding proteins from amino acidsequence Biochim Biophys Acta (BBA) ndash Proteins Proteomics20031648(1)127ndash33

81 Atchley WR Zhao J Fernandes AD et al Solving the proteinsequence metric problem Proc Natl Acad Sci USA2005102(18)6395ndash400

82 Branden CI Introduction to protein structure Garland ScienceNew York 1999

83 Richardson JS The anatomy and taxonomy of protein struc-ture Adv Protein Chem 198134167ndash339

84 Lyons J Dehzangi A Heffernan R et al Predicting backboneCa angles and dihedrals from protein sequences by stackedsparse auto-encoder deep neural network J Comput Chem201435(28)2040ndash6

85 Heffernan R Paliwal K Lyons J et al Improving prediction ofsecondary structure local backbone angles and solvent ac-cessible surface area of proteins by iterative deep learningSci Rep 2015511476

86 Spencer M Eickholt J Cheng J A deep learning network ap-proach to ab initio protein secondary structure predictionIEEEACM Trans Comput Biol Bioinformat 201512(1)103ndash12

87 Nguyen SP Shang Y Xu D DL-PRO A novel deep learningmethod for protein model quality assessment In 2014International Joint Conference on Neural Networks (IJCNN) 2014p 2071ndash8 IEEE New York

88 Baldi P Brunak S Frasconi P et al Exploiting the past andthe future in protein secondary structure predictionBioinformatics 199915(11)937ndash46

89 Baldi P Pollastri G Andersen CA et al Matching proteinbeta-sheet partners by feedforward and recurrent neuralnetworks In Proceedings of the 2000 Conference on IntelligentSystems for Molecular Biology (ISMB00) La Jolla CA 2000 p25ndash36

90 Soslashnderby SK Winther O Protein secondary structure pre-diction with long short term memory networks arXivPreprint arXiv14127828 2014

91 Lena PD Nagata K Baldi P Deep architectures for proteincontact map prediction Bioinformatics 201228(19)2449ndash57

92 Baldi P Pollastri G The principled design of large-scale re-cursive neural network architectures ndash dag-rnns and theprotein structure prediction problem J Mach Learn Res20034575ndash602

93 Leung MK Xiong HY Lee LJ et al Deep learning of thetissue-regulated splicing code Bioinformatics201430(12)i121ndash9

94 Lee T Yoon S Boosted categorical restricted boltzmann ma-chine for computational prediction of splice junctions InInternational Conference on Machine Learning Lille France2015 p 2483ndash92

95 Zhang S Zhou J Hu H et al A deep learning framework formodeling structural features of RNA-binding protein targetsNucleic Acids Res 2015gkv1025

96 Chen Y Li Y Narayan R et al Gene expression inferencewith deep learning Bioinformatics 2016btw074


97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, Georgia, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. Biomed Eng IEEE Rev 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2016;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2016;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.

Deep learning in bioinformatics | 19

at Cornell U

niversity Library on July 29 2016

httpbiboxfordjournalsorgD

ownloaded from

  • bbw068-TF1
Page 15: Deep learning in bioinformatics - Persagen Consulting · Deep learning in bioinformatics Seonwoo Min, Byunghan Lee and Sungroh Yoon Corresponding author: Sungroh Yoon, Department

55 Ioffe S Szegedy C Batch normalization accelerating deepnetwork training by reducing internal covariate shift arXivPreprint arXiv150203167 2015

56 Deeplearning4j Development Team Deeplearning4j open-source distributed deep learning for the JVM ApacheSoftware Foundation License 20 httpdeeplearning4jorg2016

57 Bahrampour S Ramakrishnan N Schott L et alComparative study of deep learning software frameworksarXiv Preprint arXiv151106435 2015

58 Nervana Systems Neon httpsgithubcomNervanaSystemsneon 2016

59 Jia Y Caffe an open source convolutional architecture forfast feature embedding In ACM International Conference onMultimedia ACM Washington DC 2014

60 Collobert R Kavukcuoglu K Farabet C Torch7 a matlab-likeenvironment for machine learning In BigLearn NIPSWorkshop 2011

61 Bergstra J Breuleux O Bastien F et al Theano a CPU and GPUmath expression compiler In Proceedings of the Python forScientific Computing Conference (SciPy) 2010 p 3 Austin TX

62 Bastien F Lamblin P Pascanu R et al Theano new featuresand speed improvements arXiv Preprint arXiv121155902012

63 Chollet F Keras Theano-based Deep Learning library Codehttpsgithub comfchollet Documentation httpkerasio2015

64 Dieleman S Heilman M Kelly J et al Lasagne First Release2015

65 van Merrieuroenboer B Bahdanau D Dumoulin V et al Blocksand fuel frameworks for deep learning arXiv PreprintarXiv150600619 2015

66 Abadi M Agarwal A Barham P et al TensorFlow large-scalemachine learning on heterogeneous distributed systemsarXiv Preprint arXiv160304467 2016

67 Nair V Hinton G Rectified linear units improve re-stricted boltzmann machines In Proceedings of the 27thInternational Conference on Machine Learning (ICML-10) 2010 p807ndash14

68 Erhan D Bengio Y Courville A et al Why does unsupervisedpre-training help deep learning J Mach Learn Res201011625ndash60

69 Hubel DH Wiesel TN Receptive fields and functional archi-tecture of monkey striate cortex J Physiol 1968195(1)215ndash43

70 Schuster M Paliwal KK Bidirectional recurrent neural net-works IEEE Trans Signal Process 199745(11)2673ndash81

71 Cenic A Nabavi DG Craen RA et al Dynamic CT measure-ment of cerebral blood flow a validation study Am JNeuroradiol 199920(1)63ndash73

72 Tsao J Boesiger P Pruessmann KP k-t BLAST and k-t SENSEdynamic MRI with high frame rate exploiting spatiotempo-ral correlations Magn Reson Med 200350(5)1031ndash42

73 Cohen AM Hersh WR A survey of current work in biomed-ical text mining Brief Bioinformatics 20056(1)57ndash71

74 Bahdanau D Cho K Bengio Y Neural machine translationby jointly learning to align and translate arXiv PreprintarXiv14090473 2014

75 Xu K Ba J Kiros R et al Show attend and tell neural imagecaption generation with visual attention arXiv PreprintarXiv150203044 2015

76 Cho K Courville A Bengio Y Describing multimedia contentusing attention-based encoderndashdecoder networks IEEETrans Multimed 201517(11)1875ndash86

77 Mnih V Heess N Graves A Recurrent models of visual at-tention In Advances in Neural Information Processing Systems2014 p 2204ndash12

78 Jones DT Protein secondary structure prediction based onposition-specific scoring matrices J Mol Biol 1999292(2)195ndash202

79 Ponomarenko JV Ponomarenko MP Frolov AS et alConformational and physicochemical DNA features specificfor transcription factor binding sites Bioinformatics199915(7)654ndash68

80 Cai Y-D Lin SL Support vector machines for predictingrRNA- RNA- and DNA-binding proteins from amino acidsequence Biochim Biophys Acta (BBA) ndash Proteins Proteomics20031648(1)127ndash33

81 Atchley WR Zhao J Fernandes AD et al Solving the proteinsequence metric problem Proc Natl Acad Sci USA2005102(18)6395ndash400

82 Branden CI Introduction to protein structure Garland ScienceNew York 1999

83 Richardson JS The anatomy and taxonomy of protein struc-ture Adv Protein Chem 198134167ndash339

84 Lyons J Dehzangi A Heffernan R et al Predicting backboneCa angles and dihedrals from protein sequences by stackedsparse auto-encoder deep neural network J Comput Chem201435(28)2040ndash6

85 Heffernan R Paliwal K Lyons J et al Improving prediction ofsecondary structure local backbone angles and solvent ac-cessible surface area of proteins by iterative deep learningSci Rep 2015511476

86 Spencer M Eickholt J Cheng J A deep learning network ap-proach to ab initio protein secondary structure predictionIEEEACM Trans Comput Biol Bioinformat 201512(1)103ndash12

87 Nguyen SP Shang Y Xu D DL-PRO A novel deep learningmethod for protein model quality assessment In 2014International Joint Conference on Neural Networks (IJCNN) 2014p 2071ndash8 IEEE New York

88 Baldi P Brunak S Frasconi P et al Exploiting the past andthe future in protein secondary structure predictionBioinformatics 199915(11)937ndash46

89 Baldi P Pollastri G Andersen CA et al Matching proteinbeta-sheet partners by feedforward and recurrent neuralnetworks In Proceedings of the 2000 Conference on IntelligentSystems for Molecular Biology (ISMB00) La Jolla CA 2000 p25ndash36

90 Soslashnderby SK Winther O Protein secondary structure pre-diction with long short term memory networks arXivPreprint arXiv14127828 2014

91 Lena PD Nagata K Baldi P Deep architectures for proteincontact map prediction Bioinformatics 201228(19)2449ndash57

92 Baldi P Pollastri G The principled design of large-scale re-cursive neural network architectures ndash dag-rnns and theprotein structure prediction problem J Mach Learn Res20034575ndash602

93 Leung MK Xiong HY Lee LJ et al Deep learning of thetissue-regulated splicing code Bioinformatics201430(12)i121ndash9

94 Lee T Yoon S Boosted categorical restricted boltzmann ma-chine for computational prediction of splice junctions InInternational Conference on Machine Learning Lille France2015 p 2483ndash92

95 Zhang S Zhou J Hu H et al A deep learning framework formodeling structural features of RNA-binding protein targetsNucleic Acids Res 2015gkv1025

96 Chen Y Li Y Narayan R et al Gene expression inferencewith deep learning Bioinformatics 2016btw074

Deep learning in bioinformatics | 15

Downloaded from http://bib.oxfordjournals.org at Cornell University Library on July 29, 2016

97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.

98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.

99. Denas O, Taylor J. Deep modeling of gene expression regulation in an Erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning, Atlanta, GA, USA, 2013.

100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.

101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint arXiv:1605.01133, 2016.

102. Zeng H, Edwards MD, Liu G, et al. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016;32(12):i121–7.

103. Kelley DR, Snoek J, Rinn J. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. bioRxiv 2015;028399.

104. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

105. Park S, Min S, Choi H-S, et al. deepMiRGene: deep neural network based precursor microRNA prediction. arXiv Preprint arXiv:1605.00017, 2016.

106. Lee B, Lee T, Na B, et al. DNA-level splice junction prediction using deep recurrent neural networks. arXiv Preprint arXiv:1512.05135, 2015.

107. Lee B, Baek J, Park S, et al. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv Preprint arXiv:1603.09123, 2016.

108. Asgari E, Mofrad MR. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015;10(11):e0141287.

109. Hochreiter S, Heusel M, Obermayer K. Fast model-based protein homology detection without alignment. Bioinformatics 2007;23(14):1728–36.

110. Sønderby SK, Sønderby CK, Nielsen H, et al. Convolutional LSTM networks for subcellular localization of proteins. arXiv Preprint arXiv:1503.01919, 2015.

111. Fakoor R, Ladhak F, Nazi A, et al. Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the International Conference on Machine Learning, 2013.

112. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010;463(7280):457–63.

113. Jolliffe I. Principal Component Analysis. Wiley Online Library, 2002.

114. Park Y, Kellis M. Deep learning for regulatory genomics. Nat Biotechnol 2015;33(8):825–6.

115. Najarian K, Splinter R. Biomedical Signal and Image Processing. CRC Press, New York, 2005.

116. Edelman RR, Warach S. Magnetic resonance imaging. N Engl J Med 1993;328(10):708–16.

117. Ogawa S, Lee T-M, Kay AR, et al. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci 1990;87(24):9868–72.

118. Hsieh J. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, Bellingham, WA, 2009.

119. Chapman D, Thomlinson W, Johnston R, et al. Diffraction enhanced x-ray imaging. Phys Med Biol 1997;42(11):2015.

120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.

121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.

122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.

123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.

124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.

125. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. arXiv Preprint arXiv:1505.03046, 2015.

126. Roth HR, Yao J, Lu L, et al. Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications. In: Recent Advances in Computational Methods and Clinical Applications for Spine Imaging. Springer, Heidelberg, 2015, 3–12.

127. Li Q, Cai W, Wang X, et al. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, p. 844–8. IEEE, Singapore.

128. Ciresan DC, Giusti A, Gambardella LM, et al. Mitosis detection in breast cancer histology images with deep neural networks. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 411–8.

129. Ypsilantis P-P, Siddique M, Sohn H-M, et al. Predicting response to neoadjuvant chemotherapy with PET imaging using convolutional neural networks. PLoS One 2015;10(9):e0137036.

130. Zeng T, Li R, Mukkamala R, et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16(1):1–10.

131. Cruz-Roa AA, Ovalle JEA, Madabhushi A, et al. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 403–10.

132. Bar Y, Diamant I, Wolf L, et al. Deep learning with non-medical training used for chest pathology identification. In: SPIE Medical Imaging. International Society for Optics and Photonics, 2015, 94140V.

133. Li Q, Feng B, Xie L, et al. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans Med Imaging 2015;35(1):109–18.

134. Ning F, Delhomme D, LeCun Y, et al. Toward automatic phenotyping of developing embryos from videos. IEEE Trans Image Process 2005;14(9):1360–71.

135. Turaga SC, Murray JF, Jain V, et al. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput 2010;22(2):511–38.

136. Helmstaedter M, Briggman KL, Turaga SC, et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2013;500(7461):168–74.

137. Ciresan D, Giusti A, Gambardella LM, et al. Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems, 2012, 2843–51.


138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.

139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.

140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.

141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv Preprint arXiv:1506.07452, 2015.

142. Xu J, Xiang L, Liu Q, et al. Stacked Sparse Autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.

143. Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.

144. Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv Preprint arXiv:1511.06348, 2015.

145. Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.

146. Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv Preprint arXiv:1504.04003, 2015.

147. Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.

148. Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.

149. Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.

150. Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv Preprint arXiv:1502.00093, 2015.

151. Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.

152. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.

153. Buzsaki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.

154. Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.

155. De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.

156. Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.

157. Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.

158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.

159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.

160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.

161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.

162. Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.

163. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.

164. Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.

165. Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.

166. Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.

167. Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.

168. Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.

169. Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.

170. Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv Preprint arXiv:1511.04306, 2015.

171. Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.

172. Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.

173. Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.

174. Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.

176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.

177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.

178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.

179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 2011;8(2):316–25.

180. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.

181. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng 2009;21(9):1263–84.

182. Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.

183. Lopez V, Fernandez A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.

184. Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern 2009;39(2):539–50.

185. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.

186. Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newsl 2004;6(1):40–9.

187. Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI, 1998, 445–9. Citeseer.

188. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.

189. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, p. 248–55. IEEE.

190. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.

191. Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal, 2009;1341.

192. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv Preprint arXiv:1312.6034, 2013.

193. Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv Preprint arXiv:1412.0233, 2014.

194. Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.

195. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.

196. Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.

197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.

198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.

199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.

200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.

201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.

202. Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.

203. Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.

204. Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.

205. Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.

206. Bengio Y, Schwenk H, Senecal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.

207. Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.

208. Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.

209. Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv Preprint arXiv:1602.08191, 2016.

210. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

211. Simonite T. Thinking in silicon. MIT Technology Review, 2013.

212. Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.

213. Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.

214. Hof RD. Neuromorphic chips. MIT Technology Review, 2014.

215. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.

216. Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv Preprint arXiv:1511.05756, 2015.

217. Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv Preprint arXiv:1410.5401, 2014.

218. Weston J, Chopra S, Bordes A. Memory networks. arXiv Preprint arXiv:1410.3916, 2014.


219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.

220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.

221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.

222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.

223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.

224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.

225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.



178 Davidson PR Jones RD Peiris MT EEG-based lapse detectionwith high temporal resolution IEEE Trans Biomed Eng200754(5)832ndash9

179 Oh S Lee MS Zhang B-T Ensemble learning with active ex-ample selection for imbalanced biomedical data classifica-tion IEEEACM Trans Comput Biol Bioinformatics (TCBB)20118(2)316ndash25

180 Malin BA El Emam K OrsquoKeefe CM Biomedical data privacyproblems perspectives and recent advances J Am MedInform Assoc 201320(1)2ndash6

181 He H Garcia EA Learning from imbalanced data IEEE TransKnowledge Data Eng 200921(9)1263ndash84

182 Davis J Goadrich M The relationship between PrecisionndashRecalland ROC curves In Proceedings of the 23rd International Conferenceon Machine Learning 2006 p 233ndash40 ACM New York

183 Lopez V Fernandez A Garcıa S et al An insight into classifi-cation with imbalanced data empirical results and currenttrends on using data intrinsic characteristics Inform Sci2013250113ndash41

184 Liu X-Y Wu J Zhou Z-H Exploratory undersampling forclass-imbalance learning IEEE Trans Syst Man Cybern Part BCybern 200939(2)539ndash50

185 Chawla NV Bowyer KW Hall LO et al SMOTE synthetic mi-nority over-sampling technique J Artif Intell Res 2002321ndash57

186 Jo T Japkowicz N Class imbalances versus small disjunctsACM Sigkdd Explor Newslett 20046(1)40ndash9

187 Kukar M Kononenko I Cost-sensitive learning with neuralnetworks In ECAI 1998 445ndash9 Citeseer

188 Pan SJ Yang Q A survey on transfer learning IEEE TransKnowl Data Eng 201022(10)1345ndash59

189 Deng J Dong W Socher R et al Imagenet a large-scale hier-archical image database In CVPR 2009 IEEE Conference onComputer Vision and Pattern Recognition 2009 2009 p 248ndash55IEEE

190 Zeiler MD Fergus R Visualizing and understanding convo-lutional networks Computer VisionndashECCV 2014 Springer2014 818ndash33

191 Erhan D Bengio Y Courville A et al Visualizing higher-layer features of a deep network University of Montreal 20091341

192 Simonyan K Vedaldi A Zisserman A Deep inside convolu-tional networks visualising image classification models andsaliency maps arXiv Preprint arXiv13126034 2013

193 Choromanska A Henaff M Mathieu M et al The loss surfacesof multilayer networks arXiv Preprint arXiv14120233 2014

194 Dauphin YN Pascanu R Gulcehre C et al Identifying and at-tacking the saddle point problem in high-dimensional non-convex optimization In Advances in Neural InformationProcessing Systems 2014 2933ndash41

195 Bengio Y Practical recommendations for gradient-basedtraining of deep architectures In Neural Networks Tricks ofthe Trade Springer Heidelberg 2012 437ndash78

196 Bergstra J Bardenet R Bengio Y et al Algorithms for hyper-parameter optimization In Advances in Neural InformationProcessing Systems 2011 2546ndash54

197 Hutter F Hoos HH Leyton-Brown K Sequential model-based optimization for general algorithm configuration InLearning and Intelligent Optimization Springer Berlin 2011507ndash23

198 Snoek J Larochelle H Adams RP Practical bayesian opti-mization of machine learning algorithms In Advances inNeural Information Processing Systems 2012 2951ndash9

199 Bergstra J Bengio Y Random search for hyper-parameteroptimization J Mach Learn Res 201213(1)281ndash305

200 Ngiam J Khosla A Kim M et al Multimodal deep learningIn Proceedings of the 28th International Conference on MachineLearning (ICML-11) 2011 p 689ndash96

201 Cao Y Steffey S He J et al Medical image retrieval a multi-modal approach Cancer Inform 201413(Suppl 3)125

202 Ngiam J Coates A Lahiri A et al On optimization me-thods for deep learning In Proceedings of the 28thInternational Conference on Machine Learning (ICML-11) 2011 p265ndash72

203 Martens J Deep learning via Hessian-free optimization InProceedings of the 27th International Conference on MachineLearning (ICML-10) 2010 p 735ndash42

204 Raina R Madhavan A Ng AY Large-scale deep unsupervisedlearning using graphics processors In Proceedings of the 26thAnnual International Conference on Machine Learning 2009 p873ndash80 ACM

205 Ho Q Cipar J Cui H et al More effective distributed ml via astale synchronous parallel parameter server In Advances inNeural Information Processing Systems 2013 p 1223ndash31

206 Bengio Y Schwenk H Senecal J-S et al Neural probabilisticlanguage models Innovations in Machine Learning SpringerBerlin 2006 137ndash86

207 Li M Andersen DG Park JW et al Scaling distributed ma-chine learning with the parameter server In 11th USENIXSymposium on Operating Systems Design and Implementation(OSDI 14) 2014 p 583ndash98

208 Dean J Corrado G Monga R et al Large scale distributeddeep networks In Advances in Neural Information ProcessingSystems 2012 1223ndash31

209 Kin H Park J Jang J et al DeepSpark spark-based deep learn-ing supporting asynchronous updates and caffe compatibil-ity arXiv Preprint arXiv160208191 2016

210 Abadi M Agarwal A Barham P et al TensorFlow Large-scale machine learning on heterogeneous systems 2015Software available from tensorflow org 2015

211 Simonite T Thinking in Silicon MIT Technology Review2013

212 Ovtcharov K Ruwase O Kim J-Y et al Accelerating deepconvolutional neural networks using specialized hardwareMicrosoft Res Whitepaper 20152

213 Farabet C Poulet C Han JY et al Cnp an fpga-based proces-sor for convolutional networks In FPL 2009 InternationalConference on Field Programmable Logic and Applications 20092009 p 32ndash7 IEEE New York

214 Hof RD Neuromorphic Chips MIT Technology Review 2014215 Yao L Torabi A Cho K et al Describing videos by exploiting

temporal structure In Proceedings of the IEEE InternationalConference on Computer Vision 2015 p 4507ndash15

216 Noh H Seo PH Han B Image question answering using con-volutional neural network with dynamic parameter predic-tion arXiv preprint arXiv151105756 2015

217 Graves A Wayne G Danihelka I Neural turning machinesarXiv Preprint arXiv14105401 2014

218 Weston J Chopra S Bordes A Memory networks arXivPreprint arXiv14103916 2014

18 | Min et al

at Cornell U

niversity Library on July 29 2016

httpbiboxfordjournalsorgD

ownloaded from

219 Szegedy C Zaremba W Sutskever I et al Intriguing prop-erties of neural networks arXiv Preprint arXiv131261992013

220 Goodfellow IJ Shlens J Szegedy C Explaining and harness-ing adversarial examples arXiv Preprint arXiv141265722014

221 Goodfellow I Pouget-Abadie J Mirza M et al Generative ad-versarial nets In Advances in Neural Information ProcessingSystems 2014 2672ndash80

222 Lee T Choi M Yoon S Manifold regularized deep neural net-works using adversarial examples arXiv Preprint arXiv151106381 2015

223 Rasmus A Berglund M Honkala M et al Semi-supervisedlearning with ladder networks In Advances in NeuralInformation Processing Systems 2015 3532ndash40

224 Arel I Deep reinforcement learning as foundation for artifi-cial general intelligence In Theoretical Foundations of ArtificialGeneral Intelligence Springer Berlin 2012 89ndash102

225 Cutler M How JP Efficient reinforcement learning for robots usinginformative simulated priors In 2015 IEEE International Conferenceon Robotics and Automation (ICRA) 2015 p 2605ndash12 IEEE New York

Deep learning in bioinformatics | 19

at Cornell U

niversity Library on July 29 2016

httpbiboxfordjournalsorgD

ownloaded from

  • bbw068-TF1
Page 17: Deep learning in bioinformatics - Persagen Consulting · Deep learning in bioinformatics Seonwoo Min, Byunghan Lee and Sungroh Yoon Corresponding author: Sungroh Yoon, Department

138 Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139 Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv preprint arXiv:1505.03540, 2015.
140 Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141 Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. arXiv preprint arXiv:1506.07452, 2015.
142 Xu J, Xiang L, Liu Q, et al. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging 2015;35(1):119–30.
143 Chen CL, Mahjoubfar A, Tai L-C, et al. Deep learning in label-free cell classification. Sci Rep 2016;6.
144 Cho J, Lee K, Shin E, et al. Medical image deep learning with hospital PACS dataset. arXiv preprint arXiv:1511.06348, 2015.
145 Lee S, Choi M, Choi H-S, et al. FingerNet: deep learning-based robust finger joint detection from radiographs. In: Biomedical Circuits and Systems Conference (BioCAS), 2015, p. 1–4. IEEE, New York.
146 Roth HR, Lee CT, Shin H-C, et al. Anatomy-specific classification of medical images using deep convolutional nets. arXiv preprint arXiv:1504.04003, 2015.
147 Roth HR, Lu L, Seff A, et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, Heidelberg, 2014, 520–7.
148 Kraus OZ, Frey BJ. Computer vision for high content screening. Crit Rev Biochem Mol Biol 2016;51(2):102–9.
149 Gerven MAV, De Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput 2010;22(12):3127–42.
150 Koyamada S, Shikauchi Y, Nakae K, et al. Deep learning of fMRI big data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093, 2015.
151 Duryea J, Jiang Y, Countryman P, et al. Automated algorithm for the identification of joint space and phalanx margin locations on digitized hand radiographs. Med Phys 1999;26(3):453–61.
152 Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams & Wilkins, New York, 2005.
153 Buzsáki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012;13(6):407–20.
154 Marriott HJL, Wagner GS. Practical Electrocardiography. Williams & Wilkins, Baltimore, 1988.
155 De Luca CJ. The use of surface electromyography in biomechanics. J Appl Biomech 1997;13:135–63.
156 Young LR, Sheena D. Eye-movement measurement techniques. Am Psychol 1975;30(3):315.
157 Barea R, Boquete L, Mazo M, et al. System for assisted mobility using eye movements based on electrooculography. IEEE Trans Neural Syst Rehabil Eng 2002;10(4):209–18.
158 Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159 An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160 Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, p. 305–10. IEEE, New York.
161 Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, p. 30–7. IEEE, New York.
162 Zheng W-L, Guo H-T, Lu B-L. Revealing critical channels and frequency bands for emotion recognition from EEG with deep belief network. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, p. 154–7. IEEE, New York.
163 Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014;2014. doi:10.1155/2014/627892.
164 Stober S, Cameron DJ, Grahn JA. Classifying EEG recordings of rhythm perception. In: 15th International Society for Music Information Retrieval Conference (ISMIR'14), 2014, p. 649–54.
165 Stober S, Cameron DJ, Grahn JA. Using convolutional neural networks to recognize rhythm. In: Advances in Neural Information Processing Systems, 2014, 1449–57.
166 Cecotti H, Graeser A. Convolutional neural network with embedded Fourier transform for EEG classification. In: 19th International Conference on Pattern Recognition (ICPR 2008), 2008, p. 1–4. IEEE, New York.
167 Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2011;33(3):433–45.
168 Soleymani M, Asghari-Esfeden S, Pantic M, et al. Continuous emotion detection using EEG signals and facial expressions. In: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, p. 1–6. IEEE, New York.
169 Wang Z, Lyu S, Schalk G, et al. Deep feature learning using target priors with applications in ECoG signal decoding for BCI. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, p. 1785–91. AAAI Press, Palo Alto.
170 Stober S, Sternin A, Owen AM, et al. Deep feature learning for EEG recordings. arXiv preprint arXiv:1511.04306, 2015.
171 Huanhuan M, Yue Z. Classification of electrocardiogram signals with deep belief networks. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), 2014, p. 7–12. IEEE, New York.
172 Wulsin D, Gupta J, Mani R, et al. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011;8(3):036015.
173 Turner J, Page A, Mohsenin T, et al. Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection. In: 2014 AAAI Spring Symposium Series, 2014.
174 Zhao Y, He L. Deep learning in the EEG diagnosis of Alzheimer's disease. In: Computer Vision – ACCV 2014 Workshops. Springer, New York, 2014, 340–53.


175 Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176 Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177 Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178 Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179 Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 2011;8(2):316–25.
180 Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc 2013;20(1):2–6.
181 He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng 2009;21(9):1263–84.
182 Davis J, Goadrich M. The relationship between Precision–Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 233–40. ACM, New York.
183 López V, Fernández A, García S, et al. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform Sci 2013;250:113–41.
184 Liu X-Y, Wu J, Zhou Z-H. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern 2009;39(2):539–50.
185 Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
186 Jo T, Japkowicz N. Class imbalances versus small disjuncts. ACM SIGKDD Explor Newslett 2004;6(1):40–9.
187 Kukar M, Kononenko I. Cost-sensitive learning with neural networks. In: ECAI 1998, 445–9. Citeseer.
188 Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–59.
189 Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition, 2009, p. 248–55. IEEE.
190 Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Springer, 2014, 818–33.
191 Erhan D, Bengio Y, Courville A, et al. Visualizing higher-layer features of a deep network. University of Montreal 2009;1341.
192 Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
193 Choromanska A, Henaff M, Mathieu M, et al. The loss surfaces of multilayer networks. arXiv preprint arXiv:1412.0233, 2014.
194 Dauphin YN, Pascanu R, Gulcehre C, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems, 2014, 2933–41.
195 Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer, Heidelberg, 2012, 437–78.
196 Bergstra J, Bardenet R, Bengio Y, et al. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, 2011, 2546–54.
197 Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198 Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199 Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200 Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 689–96.
201 Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
202 Ngiam J, Coates A, Lahiri A, et al. On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, p. 265–72.
203 Martens J. Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, p. 735–42.
204 Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, p. 873–80. ACM.
205 Ho Q, Cipar J, Cui H, et al. More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in Neural Information Processing Systems, 2013, p. 1223–31.
206 Bengio Y, Schwenk H, Senécal J-S, et al. Neural probabilistic language models. In: Innovations in Machine Learning. Springer, Berlin, 2006, 137–86.
207 Li M, Andersen DG, Park JW, et al. Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, p. 583–98.
208 Dean J, Corrado G, Monga R, et al. Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, 2012, 1223–31.
209 Kim H, Park J, Jang J, et al. DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191, 2016.
210 Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
211 Simonite T. Thinking in Silicon. MIT Technology Review, 2013.
212 Ovtcharov K, Ruwase O, Kim J-Y, et al. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res Whitepaper 2015;2.
213 Farabet C, Poulet C, Han JY, et al. CNP: an FPGA-based processor for convolutional networks. In: FPL 2009, International Conference on Field Programmable Logic and Applications, 2009, p. 32–7. IEEE, New York.
214 Hof RD. Neuromorphic Chips. MIT Technology Review, 2014.
215 Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, p. 4507–15.
216 Noh H, Seo PH, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv preprint arXiv:1511.05756, 2015.
217 Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
218 Weston J, Chopra S, Bordes A. Memory networks. arXiv preprint arXiv:1410.3916, 2014.


219 Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
220 Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
221 Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222 Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv preprint arXiv:1511.06381, 2015.
223 Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224 Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225 Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 2605–12. IEEE, New York.
