Computational studies on Alzheimer's disease associated pathways and regulatory patterns using microarray gene expression and network data: Revealed association with aging and other diseases

Priya P. Panigrahi, Tiratha Raj Singh*
Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh 173234, India

Journal of Theoretical Biology 334 (2013) 109–121
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/yjtbi
http://dx.doi.org/10.1016/j.jtbi.2013.06.013
0022-5193/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +91 1792 239385; fax: +91 1792 245362. E-mail address: [email protected] (T.R. Singh).

HIGHLIGHTS
The association of novel genes and variants for their interaction with AD.
TFBS identified as a mediocre biological process for AD and AG.
Physico-chemical analysis for TFBS revealed novel associations.
Novel information for network motifs such as BiFan, MIM, SIM, and others.
Unique miRNA targets such as LDB2, and DOPEY1 as a regulatory process for AD.

ARTICLE INFO
Article history: Received 29 October 2012; received in revised form 7 June 2013; accepted 10 June 2013; available online 26 June 2013.
Keywords: Enrichment analysis; TFBS; miRNA; Network motifs; Nucleosomes

ABSTRACT
Alzheimer's disease (AD), which is one of the most common age-associated neurodegenerative disorders, affects millions of people worldwide. Due to its polygenic nature, AD is believed to be caused not by defects in single genes, but by variations in a large number of genes and their complex interactions, which ultimately contribute to the broad spectrum of disease phenotypes. Extraction of insights and knowledge from microarray and network data will lead to a better understanding of complex diseases. The present study aimed to identify genes with differential topology and their further association with other biological processes that regulate causative factors for AD, ageing (AG) and other diseases. Our analysis revealed a common sharing of important biological processes and putative candidate genes among AD and AG. Some significant novel genes and other variants for various biological processes have been reported as being associated with AD, AG, and other diseases, and these could be implicated in biochemical events leading to AD from AG through pathways, interactions, and associations. Novel information for network motifs such as BiFan, MIM (multiple input module), and SIM (single input module) and their close variants has also been discovered, and this implicit information will help to improve research into AD and AG. Ten major classes of TFs (transcription factors) have been identified in our data, where hundreds of TFBS patterns were found to be associated with AD and other diseases. Structural and physico-chemical property analysis for these TFBS classes revealed an association of biological processes involved with other severe human diseases. Positional information on nucleosomes and linkers could provide insights into key cellular processes. Unique miRNA (micro RNA) targets were identified as another regulatory process for AD. The association of novel genes and variants of existing genes has also been explored for their interaction and association with other diseases that are either directly or indirectly implicated through AG and AD.



    1. Introduction

    AD affects millions of people worldwide and is one of the most common age-associated neurodegenerative disorders (Chou, 2004). AD is characterized by a progressive decline in memory associated with other cognitive deficits: judgment, abstraction, language, attention and visuoconstructive abilities (Hommet et al., 2011). It is polygenic in nature and involves a large number of variations in genes and their critical interactions that lead to this disease (Ray et al., 2008). Extracellular amyloid beta (Aβ) plaques, intracellular neurofibrillary tangles (NFT), cerebrovascular amyloid, dystrophic neurites and loss of synaptic connections are standard markers of neurodegeneration in AD (Panigrahi and Singh, 2012; Tarawneh and Holtzman, 2010). Aberrant toxic Aβ aggregation causes synaptic dysfunction, oxidative stress, ionic dyshomeostasis, tau aggregation, and apoptosis (Hardy and Selkoe, 2002). The cytotoxic Aβ fibril is one of the main contenders for causing the initial damage to neurons in AD (Carter and Chou, 1998). Another notable fact is that AD does not affect the entire brain at once: the middle temporal gyrus (MTG) shows early AD pathology compared to other brain regions such as the entorhinal cortex (EC), hippocampus (HIP), and posterior cingulate cortex (PCC) (Ray and Zhang, 2010).

    Various techniques have been used for the analysis of gene expression data associated with neurodegenerative disorders like AD. DNA microarrays constitute a contemporary tool for hypothesis generation (Newman and Weiner, 2005). Using this technique, large amounts of gene expression data have been easily accumulated. The challenging task now is to extract valuable biological information from this immense amount of data (Kong et al., 2011). This is done either by identifying critical genes which might single-handedly produce a biological effect, or by finding patterns in the list that point to an underlying biological process; each gene on the list is then annotated, and groups of genes that share a particular characteristic are sought (Stekel and Bioinformatics, 2003). This shared or interacting nature of genes is crucial for the analysis of a complex polygenic disease like AD. The development of microarray technology provides researchers with a tool that measures the expression levels of thousands of genes at once, offering possible molecular clues regarding mechanisms underlying the disease pathophysiology (Huang et al., 2009). The Gene Ontology (GO) consortium has brought systematic order to the field of gene annotation by pre-categorizing genes by biological process, molecular function, and cellular component (Ashburner et al., 2000).

    With the increase in human protein interaction data, the focus of bioinformatics development has now shifted from understanding networks encoded by model species to understanding the networks underlying human disease (Kann, 2007). Combining these network-based disease studies with the original analyses of network properties in model organisms may override the conclusion that genes associated with a particular phenotype or function, including the progression of disease, are not randomly positioned in the network. Rather, they tend to exhibit specific patterns such as high connectivity, clustering together, and occurrence in central network locations. There is evidence based on network property values from which it has been concluded that overall degree or average distance to one another tends to lie between essential and nonessential genes, and provides patterns for the inclusion of all available interacting partners for a specific biological network (Said et al., 2004; Shachar et al., 2008). Network motifs play a central role in the identification and analysis of such specific patterns in biological networks and yield significant new insights into the complex biological processes involved in an intricate human disease such as AD.

    Recently some studies have been proposed which have commonalities in methods, while their objectives are distinct (Miller et al., 2008; Ray and Zhang, 2010). Additionally, the approaches as well as the findings are novel in all of these studies, including ours. There are also scientific proofs and suggestions for common features between AD and other diseases that establish a link and common pathogenic mechanisms for treatment strategies (Gotz et al., 2009). Several efforts have also been made recently to investigate AD using myriad computational approaches (Chou, 2004, 2005; Wei et al., 2005; Gu et al., 2009). This overall coordination of studies points to functional commonalities in the complex mechanisms involved in AD and its links to other diseases, and suggests common prediction practices and treatment strategies. The objective of this study was to find the relationship between AD, one of the most threatening diseases, and normal AG, or in other words the impact of the AG factor on this disease. This study also identifies genes with differential topology and their further association with other biological processes regulating causative factors for AD, aging and other diseases. The analysis was performed by applying an integrative approach to various aspects of molecular data, markers, and networks to study the complex disease AD. It has implications and applications for early AD detection and novel marker identification for AD.

    2. Materials and methods

    2.1. Data

    Three separate microarray data sets were used in this study. The first consists of microarrays assessing gene expression from the CA1 region of the hippocampus from 31 individuals, comprising nine controls, seven with incipient AD, eight with moderate AD, and seven with severe AD (Blalock et al., 2004). The second data set consists of 30 microarrays representing a study of the effects of aging on frontal lobe gene expression of individuals who died of natural causes between the ages of 26 and 106 (Lu et al., 2004). The AD study used Affymetrix HG-U133A chips containing 22,283 probe sets, and the aging study used HG-U95A chips with 12,625 probe sets. In addition to these data sets, one more data set, used for comparative analysis, consists of 14 normal controls and 14 AD-affected samples obtained from the Gene Expression Omnibus (GEO Accession Number: GDS2601) (Maes et al., 2007). Additionally, to incorporate network profile and network motif studies, network/pathway data associated with AD were taken from KEGG and other popular interaction resources.

    2.2. Data pre-processing

    Data filtering or normalization can reduce the dataset by removing poor or questionable data, and data deemed uninteresting or irrelevant to the analysis. In this study, normalization of the datasets was done using one of the tools of TM4, the Microarray Data Analysis System (MIDAS); the normalization modules used were locally weighted linear regression (Cleveland and Devlin, 1988) and total intensity normalization (Yang et al., 2002). The factors considered in the filtration of the datasets include low-intensity cutoff, intensity-dependent Z-score cutoffs and replicate consistency trimming, creating a highly customizable method for preparing expression data for subsequent comparison and analysis. MIDAS provides scatter plots that illustrate the effects of each algorithm on the data (Saeed et al., 2003). Preprocessed data were subjected to individual differential gene expression analysis followed by manual scrutiny.

    2.3. Differential gene expression

    Differential expression of probe sets for each dataset was assessed using significance analysis of microarrays (SAM) (Tusher et al., 2001). This supervised learning software for genomic expression data mining determines differentially expressed genes in a two-class experiment based on the statistical analysis of a modified gene-specific t test (Dziuda, 2010). The expressed genes of controls and the various AD stages (incipient, moderate, and severe) were considered and compared for the identification of the final set of differentially expressed genes in AG and AD. After normalization inclusive of all available methods, scrutiny of genes based on differential gene expression, and comparative analysis for AD and AG individually, we selected 62 and 658 informative or important genes for AD and AG, respectively, from the three datasets.
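The modified gene-specific t test used by SAM can be illustrated with a small sketch. It follows the relative difference d = (mean of class B - mean of class A) / (s + s0) described by Tusher et al. (2001), where s is the pooled standard error for the gene and s0 is a small positive constant. The function name, the argument names, and the choice to pass s0 in by hand are assumptions made for illustration; the SAM software estimates s0 from the data and was used through its own interface in the study.

```python
from math import sqrt


def sam_statistic(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_b - mean_a) / (s + s0) for one gene.

    group_a, group_b: expression values of the gene in the two classes.
    s0: small positive constant added to the denominator (hypothetical
        default of 0.0; SAM chooses it from the data).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations over both classes.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    # Gene-specific scatter: pooled standard error of the mean difference.
    s = sqrt((1.0 / n_a + 1.0 / n_b) * ss / (n_a + n_b - 2))
    return (mean_b - mean_a) / (s + s0)
```

With s0 = 0 the value reduces to the ordinary two-sample t statistic; a positive s0 damps the statistic for genes whose scatter is close to zero.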

    2.4. Clustering of co-expressed genes

    After selection of differentially expressed genes from all the datasets in this study, the genes were clustered based on expression level to find the co-expressed gene clusters. The Multi Experiment Viewer (MeV) package from TM4 was used for clustering of microarray data, using the Euclidean distance metric (Deza and Deza, 2009) and the average linkage clustering algorithm (Johnson, 1967). WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) was used to analyze the large number of gene lists (e.g. differentially expressed gene sets, co-expressed gene sets etc.) generated from functional, genomic, proteomic, and large-scale genetic studies (Duncan et al., 2010; Zhang et al., 2005). WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense of gene lists.
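A minimal sketch of agglomerative clustering with Euclidean distance and average linkage is given below to make the clustering step concrete. It is not the MeV implementation: the function name, the dictionary input format, and the stopping rule (merge until a requested number of clusters remains, rather than building a full dendrogram) are assumptions for illustration.

```python
from math import dist


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with Euclidean distance and average linkage.

    profiles: dict mapping gene name -> list of expression values.
    Returns a list of clusters, each a sorted list of gene names.
    """
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        # Average of all pairwise Euclidean distances between two clusters.
        total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

For example, four hypothetical genes with profiles near (0, 0) and (10, 10) separate into two clusters of two genes each when `n_clusters=2`.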

    Self-Organizing Maps (SOM), or Kohonen networks (Kohonen, 1982a, 1982b), were used to generate distinct clusters based upon the functional parameter and expression profile of each cluster independently. SOM is a visualization tool based on an unsupervised artificial neural network (ANN) as a learning algorithm. SOM is mostly used to cluster either genes or biological samples. In gene expression analysis, this method is used to group genes into clusters of similar expression profiles. We applied SOM with an input layer of N inputs representing the original N variables (here, the genes common to AD and AG), and an output layer consisting of the grid of neurons corresponding to cluster prototypes. Principal Component Analysis (PCA) (Hotelling, 1933) was also performed to determine the relationship between each module of the AD gene set and their phenotypic assessments for all three data types.

    2.5. Hypergeometric distribution and association of ranked genes

    A ranked list of genes is also informative when dealing with multiple (polygenic) processes, which is required in the case of AD as stated earlier. To implement this aspect, GOrilla was used, which identifies enriched GO terms in ranked lists of genes (Eden et al., 2009). GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. Both WebGestalt and GOrilla use the same statistical approach, the hypergeometric distribution (HGD), for enrichment analysis or statistical significance testing, while WebGestalt additionally uses Fisher's exact test for two independent gene sets (Eden et al., 2007; Sealfon et al., 2006).

    In a group of N genes, K genes are associated with a particular GO term. If we sample n genes out of N and find k associated genes among them, the probability of obtaining k or more GO-term-associated genes in a sample of n can be calculated via the HGD:

    $$p\text{-value} = 1 - \sum_{i=0}^{k-1} f_{HG}(i; N, K, n) = 1 - \sum_{i=0}^{k-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$
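The hypergeometric tail probability defined above can be evaluated directly from binomial coefficients. The sketch below uses only the Python standard library; the function name is an assumption for illustration and is not part of WebGestalt or GOrilla.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Probability of k or more associated genes in a sample of n.

    N: total number of genes; K: genes associated with the GO term;
    n: sample size; k: associated genes observed in the sample.
    Computed as 1 - sum_{i=0}^{k-1} C(K, i) * C(N - K, n - i) / C(N, n).
    """
    lower = sum(comb(K, i) * comb(N - K, n - i) for i in range(k))
    return 1.0 - lower / comb(N, n)
```

As a worked example, with N = 10, K = 3, n = 4 and k = 2 the lower sum is (35 + 105) / 210, so the p-value is 1 - 140/210 = 1/3.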

    GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. It applies a variant of the standard HGD, called mHG (minimum hypergeometric), based on a complete theoretical characterization of the underlying distribution (Eden et al., 2007). In most cases a fixed threshold (n) is not known; instead a ranking of all the elements is available, and the aim is to find the n which minimizes the HGD. Formally, if a ranked gene list g_1, …, g_N is provided in place of a target set, we define a label vector λ = λ_1, …, λ_N ∈ {0, 1}^N according to the association of the ranked genes with the given GO term, with λ_i = 1 if g_i is associated with the term (Eden et al., 2007). The mHG score is then defined as

    $$\mathrm{mHG}(\lambda) = \min_{1 \le n \le N} \mathrm{HGT}(N, B, n, b_n(\lambda))$$

    where B is the total number of genes associated with the term (K in the notation above), HGT is the hypergeometric tail probability, and

    $$b_n(\lambda) = \sum_{i=1}^{n} \lambda_i$$
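The mHG score can be sketched as a scan over all cutoffs n of the ranked list, keeping the smallest hypergeometric tail. The helper names below are assumptions for illustration. The sketch returns the raw mHG score only; it does not reproduce the exact p-value that GOrilla derives for that score.

```python
from math import comb


def hgt(N, B, n, b):
    """Hypergeometric tail: probability of b or more hits in the top n of N."""
    upper = sum(comb(B, i) * comb(N - B, n - i)
                for i in range(b, min(n, B) + 1))
    return upper / comb(N, n)


def mhg_score(labels):
    """Minimum hypergeometric score of a ranked 0/1 label vector.

    labels[i] is 1 if the gene at rank i + 1 is associated with the term.
    """
    N, B = len(labels), sum(labels)
    best, b_n = 1.0, 0
    for n, lam in enumerate(labels, start=1):
        b_n += lam  # b_n: number of associated genes among the top n
        best = min(best, hgt(N, B, n, b_n))
    return best
```

For the toy ranking [1, 1, 0, 0] the minimum is reached at n = 2, where both associated genes sit at the top of the list, giving a score of 1/6.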

    2.6. Topological overlap between co-expressed networks and other associated factors

    To analyze topological overlap (TO) between co-expressed networks, the method developed by Ray and Zhang (2010) was implemented. We identified genes with a topological difference (i.e. low TO) between co-expressed networks, where the actual amount of similarity between two neighborhoods (brain regions such as EC, HIP, MTG, and PCC, and their co-expressed networks) is only 5%. To assess the significance of the zero-TO genes, comparisons were made, using t statistics, against 1000 random networks generated by random addition or deletion of links in the original network while keeping the degree of each gene equal to that in the original network. The significance values (p-values) were calculated (with 999 degrees of freedom) using the formula and method given in Ray and Zhang (2010).
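The idea of comparing a gene's neighborhood across two co-expressed networks can be illustrated with a simple set overlap. The sketch below uses the Jaccard index of the two neighbor sets, which is an assumption chosen for illustration and is not the formula of Ray and Zhang (2010); the function name and the dictionary representation of a network are likewise hypothetical.

```python
def neighborhood_overlap(gene, net_a, net_b):
    """Jaccard overlap of a gene's neighbor sets in two networks.

    net_a, net_b: dicts mapping gene -> set of neighboring genes.
    Returns 0.0 when the gene has no neighbors in either network.
    """
    nbrs_a = net_a.get(gene, set())
    nbrs_b = net_b.get(gene, set())
    union = nbrs_a | nbrs_b
    if not union:
        return 0.0
    return len(nbrs_a & nbrs_b) / len(union)
```

Under this reading, a value near 0 for a gene indicates that its co-expression partners differ almost entirely between the two brain regions being compared.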

    2.7. Prioritization of gene candidates with molecular triangulation

    The availability, in computer-accessible form, of large molecular networks that encompass multiple genes harboring disease-related genetic variation motivated us to prioritize candidate genes. A distance-dependent decay function was applied: each seed node projects its evidence value to its immediate neighbor nodes, such that the secondary-evidence value decays with the distance from the seed node. The secondary-evidence value is then calculated in the following way:

    $$E(u) = \sum_{v \in B} E_p(v)\, f(d_{uv})$$

    where E(u) is the secondary evidence for node u, E_p(v) is the primary evidence for seed node v, B is the set of all seed nodes, d_{uv} is the distance between nodes u and v, and finally, f is a distance-dependent decay function (Krauthammer et al., 2004).
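A small sketch of the secondary-evidence calculation follows. It assumes that d_uv is the shortest-path length in an unweighted network and that the decay function is f(d) = 0.5^d; both are illustrative assumptions, since the text does not specify the distance measure or the form of f. The function names and the adjacency-list input are also hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return distances


def secondary_evidence(graph, primary, decay=lambda d: 0.5 ** d):
    """E(u) = sum over seed nodes v of E_p(v) * f(d_uv).

    graph: dict mapping node -> iterable of neighbors.
    primary: dict mapping seed node -> primary evidence E_p(v).
    decay: assumed decay function f; halves the evidence per step.
    """
    evidence = {}
    for seed, e_p in primary.items():
        for node, d in bfs_distances(graph, seed).items():
            evidence[node] = evidence.get(node, 0.0) + e_p * decay(d)
    return evidence
```

On a three-node path a-b-c with seeds a and c of primary evidence 1.0 each, node b receives 0.5 from each seed, for a total of 1.0, so a node lying between two seeds ranks highly.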

    The STRING (Szklarczyk et al., 2011), oPOSSUM (Ho et al., 2007), and JASPAR (Sandelin et al., 2004) web applications were used for interaction studies (network mapping for association) and enrichment analysis of microarray gene expression data. Each cluster of the gene set was analyzed for enrichment of TFBS using oPOSSUM. The conserved non-coding regions of the promoters were searched for matches to all TFBS profiles in the JASPAR database. The positioning of nucleosomes plays important roles in key cellular processes such as mRNA splicing, DNA replication, and DNA repair (Berbenetz et al., 2010; Tilgner et al., 2009; Yasuda et al., 2005; Sehgal and Singh, 2012). To evaluate this phenomenon, a sequence-based predictor named iNuc-PhysChem (Chen et al., 2012) was used to identify nucleosomes of the genes investigated in this study by their physico-chemical properties. All the above-mentioned tools and web applications were applied to infer knowledge from the processed data about various aspects of gene expression, interactions, pathways involved, and the role of other biological processes. After comprehensive analysis, the processes identified in our expression data range from protein–protein interaction (PPI) and transcription targets (TTGS), which include TFBS, to miRNA targets and KEGG file results (KEGG).

    2.8. Network motif analysis

    Network motifs are statistically overrepresented sub-structures (sub-graphs) in a network, and have been recognized as the simple building blocks of complex networks (Alon, 2006). Network motifs are important for understanding the modularity and the large-scale structure of biological networks. In this study, AD-associated biological networks were used for network motif identification and analysis. Available pathways and networks were taken from KEGG, REACTOME, BioGRID and other sources (Kanehisa and Goto, 2000; Croft et al., 2010; Stark et al., 2006; Kandasamy et al., 2010). The MFinder (Kashtan et al., 2005), FANMOD (Wernicke and Rasche, 2006), and MAVisto (Schreiber and Schwöbbermeyer, 2005) tools were used to identify and analyze network motifs. Motifs in the range of 3–8 nodes were selected for the study. Statistically significant motifs (depending upon the criteria of the tools used) were used for further annotation and analysis.
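To make the notion of a motif concrete, the sketch below counts occurrences of the bi-fan pattern (two source nodes that both point to the same two target nodes) in a directed network given as an edge list. It only counts occurrences; deciding whether the pattern is overrepresented requires comparing the count against randomized networks, which is what tools such as MFinder and FANMOD do and which is not reproduced here. The function name and input format are assumptions for illustration.

```python
from itertools import combinations
from math import comb


def count_bifans(edges):
    """Count bi-fan subgraphs in a directed network.

    edges: iterable of (source, target) pairs.
    A bi-fan is a pair of sources sharing a pair of targets; extra edges
    among the four nodes are ignored in this simple count.
    """
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    total = 0
    for u, v in combinations(targets, 2):
        # Targets shared by both sources, excluding the sources themselves.
        shared = (targets[u] & targets[v]) - {u, v}
        total += comb(len(shared), 2)
    return total
```

Two hypothetical regulators that share three targets contribute three bi-fans, one for each pair of shared targets.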

    3. Results and discussion

    3.1. Differential gene expression, clustering of co-expressed genes

    candidates have been selected through preliminary preprocessingand manual scrutiny based upon the comparative analysis. Theresults obtained through SAM revealed total 720 genes from allthree datasets that are differentially expressed at a false discoveryrate of 0.1%. Dynamic feature of SAM through t-statistics andANOVA F-statistic was applied in MeV for SAM plot generation(Fig. 2). Our study is based on multiple datasets and for the t test,the overall p-value of the F test has to be adjusted for multiplecomparisons. After adjustments, we obtained nal p-value of 0.55,which covers all plausible genes, which were subjected to furtherannotation and analysis. When manually compared, 36 genesout of these 720 genes were common for both AD and AG basedon relevant factors involved in both processes. After the GO

    Fig. 2. A scattered plot of the observed relative difference d (Y-axis) versus theexpected relative difference dE (X-axis) indicates the result of SAM on 720 geneswhere the dash lines are at a threshold distance from the ddE identity line. Inthe above plot the red colored points, that are outside the threshold linesrepresents common genes for AG and AD at the threshold level . Plot wasgenerated for multiple comparisons for three data sets by applying t-statistic andANOVA F-statistic with adjustable p-value. (For interpretation of the references tocolor in this gure caption, the reader is referred to the web version of this article.)

    P.P. Panigrahi, T.R. Singh / Journal of Theoretical Biology 334 (2013) 109121112It is relatively difcult to nd lists of differentially expressedgenes with signicant overlap between microarray studies (Kuoet al., 2002). To overcome from this limitation, multiple data setswere analyzed in this study to reveal novel patterns amid inter-acting and associating entities for AG, AD, and other diseases byapplying myriad statistical and computational techniques. Anintegrated multifaceted approach is applied to analyze this poly-genic disease (Fig. 1). Final set of important genes as robustFig. 1. Pictographic representation of an integrated multifaceted approach applied in this study which describes the whole methodology.

  • enrichment analysis it was found that out of the 36 genes that arecommon in both AD and AG, 26 genes are found to have essentialrole in AD, while 622 are AG specic (Table 1).

    Volcano plot was generated between the controls and diseasedstage genes of AD. Here we were able to clearly measure thedifference in the gene expressions between two groups. Groupingwas done based on the available expression data. In this case thegroup A is for controls and group B is for the various diseased states(incipient, moderate, and severe). Genes were found well organizedin control case (A) while diseased states genes (B) were scattered andshows random expression patterns (Fig. 3). SOM image was gener-

    components (Figs. 57). Based upon the functional enrichmentanalysis, nally 43 genes were found to be involved in PPI, TTGS,miRNAs, and KEGG (Supplementary Table S1). While manuallyanalyzed amongst 658 processed genes of AG and 62 processedgenes of AD, we found 36 genes common (Figs. 4 and 8). Thisnding prompted us to perform co-expression analysis of net-works. Co-expression network method developed by Ruan and

    ven different types of analyses applied on all data types and number of genes identied

    VA F

    Fig. 4. SOM result, showing grouped genes into 9 hexagonal clusters of similarexpression proles. Black for more conserved and white for less conservedexpression proles. 36 common genes were found distributed amongst 5 clusters(clusters 15).

    Fig. 3. Volcano plot for differentially expressed genes. The black longitudinal linebetween the group A and B is the mean of both groups. It can be clearly identiedthat the red dots (representation of genes) of group B (diseased states: incipient,moderate, and severe) are more scattered than the group A (control). (Forinterpretation of the references to color in this gure caption, the reader is referredto the web version of this article.)

    P.P. Panigrahi, T.R. Singh / Journal of Theoretical Biology 334 (2013) 109121 113ated to group common AD and AG genes into clusters of similarexpression proles (Fig. 4). Scale of the clusters in image ranges fromblack to white through few gray shades indicating maximal (black),moderate (gray), and minimal (white) co-expressed genes fall underrespective clusters. Out of 9, 5 clusters were identied with sig-nicant measures (cutoff for black and dark gray color only). Whilelooking for genes common in AD and AG, we were able to map all 36genes which were found in these 5 clusters where 20 were found in2 clusters (cluster 1, and 2: Fig. 4) and remaining 16 were distributedamong 3 other clusters (clusters 35: Fig. 4). Similar informationcould also be generated through SOM (distance based scale). Weveried these two scaled conditions for SOM and found same set ofgenes in both the conditions.

    PCA revealed the relationship between each modules of AG andAD gene set (total 89 samples from three experiments) and theirphenotypic assessments. PCA projections have been observed for3 different conditions for 3 component combinations. Analysis wasperformed for data 1, 2, and 3 for all three data sets, respectively, inthis study. It has been observed that component 1 (data set 1)interacting with components 2 (data set 2) (Supplementary Fig. S1)and 3 (Supplementary Fig. S2) respectively have shown similarpatterns while interaction of components 2 and 3 (data set 3)(Supplementary Fig. S3) indicates different pattern than others andsupport the differential expression levels at component level. Similarkind of modular patterns have been observed in other studies (Rayand Zhang, 2010) and support the modularity of expression levelswith reference to regions in the brain. One possible region for thisdifferentiation would be association of diverse brain regions in thesestudies. PCA mainly applicable for continuous data while our data setscould be categorized into different groups i.e. control, and variousdisease states. To evaluate this parameter and to further extendour PCA analysis, we applied correspondence analysis projections forall three data sets and were able to trace co-expression patternsfor genes in all three combinations. Black squares corresponds to co-expression patterns of genes in all three combinations (Supplemen-tary Figs. S4S6).

    3.2. Enrichment analysis through co-expressed networks and rankedlist of genes

    GO enrichment analysis shows the signicantly enriched GOcategories for biological process, molecular function and cellular

    Table 1Signicant number of genes identied through multifaceted integrative approach. Sefor all categories indicated against each one, respectively.

    Analysis types

    Differential gene expression (after multiple comparisons through t- and ANOClustering of co-expressed genesRanked list of genesGene enrichmentNetwork motifexpression of genes in brain regionsOver-represented transcription factor binding sitesNo. of Genes involved in interactions and associations

    -statistics) 62 (AD), 658 (AG)36434334754

    Zhang (Ruan and Zhang, 2006; Ruan et al., 2010) was applied to these genes to measure the pairwise expression similarity between genes and to construct gene co-expression networks. Besides the primary hub genes, some experimentally verified entries were identified, and pairwise expression similarity was also computed between these (co-expressed) genes. This set of genes has also been mapped onto a sub-network through co-expression analysis (Fig. 9 and Supplementary Table S2).

    Some genes are involved in multiple processes. For example, PSME3 (proteasome activator subunit 3) and PSME4 (proteasome 26S subunit, non-ATPase, 4) are involved in PPI, TTGS, and KEGG pathways. These two genes are also involved in the normal ageing process. When the output of the GO enrichment analysis (using WebGestalt and GOrilla) was cross-checked, these two genes were found in the nucleoplasm category with a high level of significance (p-value 5.43E-04; GOrilla). This indicates that a single method might not capture all the information latent in biological data, and that similar analyses with other tools or methods can provide insightful annotations. Similarly, genes such as TBL1X (transducin (beta)-like 1X-linked) and KIAA0528 are involved in TTGS and miRNA targets, and also in both AD- and AG-related metabolic processes.

    3.3. Novel gene variants, transcription factors, and miRNA targets

    Another interesting facet of this study was the identification of some novel entries or variants of genes that were not reported in previous studies. These novel entries are chromosome 1 open reading frame 115 (C1orf115); D4, zinc and double PHD fingers, family 3 (DPF3); proteasome (prosome, macropain) 26S subunit, non-ATPase, 4 (PSMD4); ubiquitin specific peptidase 25 (USP25); potassium voltage-gated channel, shaker-related subfamily, member 5 (KCNA5); leucine zipper, putative tumor suppressor 1 (LZTS1); chondroitin sulfate proteoglycan 5 (neuroglycan C) (CSPG5); and solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 6 (SLC25A6). The information on all these novel variants found associated with AD could be useful for targeting either different brain regions (conditioned on their presence) or various biomolecular entities when designing treatment strategies for AD and other diseases.

    Fig. 5. GO enrichment analysis: bar chart of the significantly enriched biological process categories for differentially expressed genes. Most genes fall under the categories biological regulation, localization, and various metabolic, organismal, and developmental processes.

    Fig. 6. GO enrichment analysis: bar chart of the significantly enriched molecular function categories for differentially expressed genes. Most genes fall under the categories protein and ion binding, and various activities including hydrolase, transporter, and transcription regulation.

    Fig. 7. GO enrichment analysis: bar chart of the significantly enriched cellular component categories for differentially expressed genes. Most genes fall under membrane, macromolecular complex, and nucleus. Others are distributed amongst Golgi apparatus, cytoskeleton, vesicle, and endoplasmic reticulum.
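The enrichment p-values quoted in this section (for example, the nucleoplasm term at p = 5.43E-04) come from over-representation statistics. As a rough illustration only, and not the exact statistic implemented in WebGestalt or GOrilla (GOrilla, for instance, works on ranked lists), a one-sided hypergeometric test could be sketched as follows. The function name `enrichment_p_value` and all numbers in the usage line are assumptions made for this example; they are not taken from the study.

```python
from math import comb


def enrichment_p_value(background, annotated, selected, overlap):
    """Return P(X >= overlap) under a hypergeometric model.

    background: number of genes in the reference set (N)
    annotated:  reference genes that carry the GO term (K)
    selected:   number of genes in the study list (n)
    overlap:    study-list genes that carry the GO term (k)
    """
    if not (0 <= overlap <= min(annotated, selected)
            and max(annotated, selected) <= background):
        raise ValueError("inconsistent counts")
    upper = min(annotated, selected)
    # Sum the upper tail of the hypergeometric distribution exactly,
    # using integer arithmetic until the final division.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(background, selected)


# Toy numbers, invented for illustration only (not results from the paper).
p = enrichment_p_value(background=1000, annotated=50, selected=43, overlap=8)
print(f"p = {p:.3e}")
```

In practice such p-values are computed for many GO terms at once, so a multiple-testing correction would normally follow; that step is omitted from this sketch.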

    Furthermore, we detected over-representation of TFBS in the promoter regions of co-expressed genes. For this analysis, the differentially expressed genes, along with other significant genes involved in AD pathways, were first taken as input (all the genes from Supplementary Tables S1 and S2). Then, we grouped all TFs according to their TFBS classes and found 10 different classes of over-represented TFBS for all the genes. One representative TF was selected from each class based on its frequency of occurrence in the input set of genes (the most frequent TF in a class was selected as the putative representative candidate for that class) (Table 2). Another aspect that we considered while selecting the representative TF for each class was the availability of its binding site in almost all the genes that were important to AD (Table 2 and Supplementary Table S3).

    Furthermore, literature analysis of these representative TFs revealed that AP1 (representative of class 1), or activating protein-1 (IUPAC code: NNNSTCA), is a leucine-zipper TF that is a heterodimer formed by c-Jun and c-Fos. AP1 acts synergistically with NFAT family proteins on composite regulatory elements involved in the regulation of the immune system (Macian et al., 2001). In the second class representative, ZEB1, zinc finger/homeodomain motifs serve as the DNA-binding domain. The ZEB1/zfh-1 transcriptional repressor regulates muscle differentiation and is expressed in the central nervous system (CNS). Schmalhofer et al. (2009) revealed the molecular interconnection of ZEB1 with E-cadherin, beta-catenin, and WNT signaling in cancerogenesis. FEV, the third class representative, functions as a transcriptional regulator essential for the differentiation and maintenance of the central serotonergic neurons of the adult human brain. Kriegebaum et al. (2010) assumed that any severe mutation in FEV would result in fetal death. The class 4 representative, SRY (sex-determining region Y), is a sex-determining gene on the Y chromosome in the therians (placental mammals and marsupials). It plays a major role in determining sex in humans (Ottolenghi et al., 2007).

    The class 5 representative, Tcfcp2l1, is preferentially upregulated in embryonic stem cells as a component of the LIF and BMP signaling pathways; it is a self-renewal regulator and key reprogramming factor but has uncharacterized DNA-binding properties and function (Chen et al., 2008). RUNX1, the representative of class 6, is a critical regulator of CD41 expression in early embryos. It also controls the early stages of hematopoietic development and is an essential regulator of blood cell specification (Tanaka et al., 2012). The class 7 representative, TLX1::NFIC, displayed cell-type-specific patterns, and these footprinting patterns are highly correlated with gene expression differences. There is also evidence that the homeoprotein TLX1 interacts with the CCAAT-binding TF NFIC (Boyle et al., 2011). This suggests that the differential binding of the TLX1/NFIC complex in these cell types identified by the footprinting data is likely mediated by NFIC expression.

    Fig. 8. Diagram showing independent and common genes amongst AD and AG after multiple comparison analysis through various statistical measures.

    Fig. 9. Sub-network representing the set of genes that have been mapped through co-expression analysis. BACE1 is held at the central position to look for its associates for co-expression.

    The class 8 representative, HOXA5 (Homeobox A5), is a member of the Homeobox family. During development it regulates organogenesis in lung, mammary, and tracheal tissues, and in adult tissues it regulates mammary gland development and function. HOXA5 is also believed to function as a tumor suppressor (breast, colon, and lung) by transactivating p53 to promote p53-dependent and p53-independent apoptotic signaling. The expression of HOXA5 is regulated at least in part through epigenetic modifications of the HOXA5 gene in these tumors (Atabakhsh et al., 2012). The representative of class 9, TBP (TATA box binding protein), is one of the most recognized DNA benders, and its interactions with TFs are largely responsible for spatiotemporal gene expression patterns (Hooghe et al., 2012). In class 10, T (T-box transcription family) is vital for morphogenetic movements in various processes of animal development (Yamada et al., 2012). All the class representative TFs for TFBS are shown in Fig. 10a (classes 1-5) and b (classes 6-10).

    It has been observed that conserved TFs are potent gene regulators for AG and AD. We also explored this conservation of TFs with respect to their regulation mechanism for genes involved in AG and AD. We found that TFs are shared between AG and AD and are associated with other diseases such as haematopoiesis, Epstein-Barr virus infection, viral carcinogenesis and other carcinomas, dominant optic atrophy, atrial fibrillation, coronary heart disease, and sudden cardiac death. The major classes of secondary structure elements were dominated by helix-turn-helix for the families homeo, Arid, and Myb, and by its winged version for the families Forkhead, IRF, and Ets (Aravind et al., 2005). Another major class found was zinc-coordinating, which comprises two families, hormone-nuclear receptor and beta-beta-alpha-zinc finger (Supplementary Table S3). These structural-level constraints suggest plausible targets for neuronal tangles and plaques through normal and winged helix-turn-helix and beta-beta-alpha structures. Linked TFBS for these physico-chemical elements could be manipulated to deal with the complexities involved.

    Another interesting observation that emerged from the analysis is the association of miRNAs and their target genes. miRNAs regulate target genes at the post-transcriptional level and play an important role in development and in human diseases including heart disease, schizophrenia, and psoriasis. In this study, out of the 43 genes (significant hub genes identified through enrichment analyses), 13 were found to be miRNA targets (Supplementary Table S1). Some other genes, such as LDB2 (LIM domain binding 2), DOPEY1 (dopey family member 1), DNM1L (dynamin 1-like), and EHD1 (EH-domain containing 1), are common to both AD and AG but are involved in different biological functions in AD. LDB2 is found in both TTGS and miRNA. DOPEY1 is found in miRNA, DNM1L is common in KEGG, and EHD1 is common in TTGS and KEGG. The differential patterns of interaction and association of these genes with diverse biological processes support the premise that all interactions are associations but not all associations are interactions.

    3.4. Brain regions and their pathway mapping

    In a recent study, microarray data from four different brain regions (EC, HIP, PCC, and MTG) of AD-affected and normal individuals were analyzed. Six sets of intersection genes were obtained from six comparisons: (1) EC and HIP; (2) EC and PCC; (3) EC and MTG; (4) HIP and PCC; (5) HIP and MTG; (6) PCC and MTG (for more details see Ray and Zhang, 2010). Co-expression networks were built for each brain region using the intersection genes. When the genes of the present study were compared with the results of Ray and Zhang (2010), many common genes were found to be involved in various brain regions. At least one common gene is found in almost all regions. Between the EC and HIP regions, KCNAB2 and SPF3 are found; between the EC and PCC regions, GPR22 is found; and between the HIP and PCC regions, KCNAB2 is found. Four genes from this study, TBL1X, EFNB2, RND2, and CDH10, were found between the HIP and MTG regions, and TBL1X and EFNB2 between the MTG, EC, and HIP regions (Ray and Zhang, 2010). Related pathways such as Wnt, axon guidance, and Akt have also been found to be associated, and the detailed descriptions and significance of these genes are given in Supplementary Table S1. These genes and their associated pathways could be treated as hotspots when planning experimental procedures for association studies.

    Wnt proteins are secreted morphogens that are required for basic developmental processes, such as cell-fate specification, progenitor-cell proliferation, and the control of asymmetric cell division, in many different species and organs. There are at least three different Wnt pathways: the canonical pathway, the planar cell polarity (PCP) pathway, and the Wnt/Ca2+ pathway. TBL1X is one of the core proteins of the canonical Wnt signaling pathway. In this pathway, the major effect of Wnt ligand binding to its receptor is the stabilization of cytoplasmic beta-catenin through inhibition of the beta-catenin degradation complex. Beta-catenin is then free to enter the nucleus and activate Wnt-regulated genes through its interaction with TCF (T-cell factor) family TFs and concomitant recruitment of coactivators (Nalbantoglu et al., 2012). The hubs of the canonical pathway were obtained as KC1AL (casein kinase I isoform alpha-like), YWHAZ (protein kinase C inhibitor protein 1), and TBL1XR1 (F-box-like/WD repeat containing protein). TBL1XR1, also a core protein in the canonical Wnt signaling pathway, is involved in signal transduction and cytoskeletal assembly, plays an essential role in transcription activation mediated by nuclear receptors, and has effects on cytotypic differentiation. In addition, low levels of TBL1XR1 gene expression cause breast cancer (Kadota et al., 2009). TBL1X is a protein that plays an essential role in transcription activation mediated by nuclear receptors. It is a component of an E3 ubiquitin ligase complex and could be a promising candidate for targeting APP regulatory inhibition.

    Table 2. Details of the 10 representative TFs for the 10 different TFBS classes that were identified for genes involved in this study. The representative TF for a class was selected based on its frequency of occurrence in that class; the most frequent TF was selected as the putative representative for the class.

    Class no.  Class name               Representative TF  Most frequently occurring TFBS  Gene hits  No. of TFs in one class  Total no. of target TFBS hits
    1          Zipper-type              AP1                TGAGTCA                         46         21                       2216
    2          Zinc-coordinating        ZEB1               CACCTG                          49         37                       4798
    3          Winged helix-turn-helix  FEV                CAGGAAGT                        48         19                       3792
    4          Other alpha-helix        SRY                ATAAACAAT                       45         8                        1403
    5          Other                    Tcfcp2l1           CCAGTCTGAGCCAG, CCAGACTGAACCAG  26         2                        151
    6          Ig-fold                  RUNX1              Not found                       41         8                        908
    7          Helix-turn-helix::other  TLX1::NFIC         TGGCAGCATGCCAA                  2          1                        2
    8          Helix-turn-helix         HOXA5              CATTAGTG, AATTTATG              48         18                       5435
    9          Beta-sheet               TBP                Not found                       30         1                        158
    10         Beta-hairpin-ribbon      T                  CTAGGTGTGAA                     6          1                        9
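The selection rule behind Table 2 (the most frequent TF in a TFBS class is taken as the representative of that class, with preference for TFs whose binding site is available in more of the input genes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `pick_representatives`, the tuple layout, the tie-breaking order, and the placeholder class, TF, and gene labels are all hypothetical.

```python
from collections import Counter, defaultdict


def pick_representatives(hits):
    """Return {tfbs_class: representative_tf}.

    hits is an iterable of (tfbs_class, tf_name, gene) tuples, one per
    predicted binding site. Within a class, the TF with the most sites wins;
    ties are broken by the number of distinct genes hit, then by name.
    """
    site_counts = defaultdict(Counter)  # class -> TF -> number of sites
    genes_hit = defaultdict(set)        # (class, TF) -> genes with a site
    for tfbs_class, tf_name, gene in hits:
        site_counts[tfbs_class][tf_name] += 1
        genes_hit[(tfbs_class, tf_name)].add(gene)

    representatives = {}
    for tfbs_class, counts in site_counts.items():
        representatives[tfbs_class] = min(
            counts,
            key=lambda tf: (-counts[tf], -len(genes_hit[(tfbs_class, tf)]), tf),
        )
    return representatives


# Placeholder data, invented for illustration only.
toy_hits = [
    ("class_A", "tf_1", "gene_1"),
    ("class_A", "tf_1", "gene_2"),
    ("class_A", "tf_2", "gene_1"),
    ("class_B", "tf_3", "gene_1"),
    ("class_B", "tf_3", "gene_1"),
    ("class_B", "tf_4", "gene_1"),
    ("class_B", "tf_4", "gene_2"),
]
print(pick_representatives(toy_hits))
```

In the toy data, `tf_3` and `tf_4` have the same number of sites in `class_B`, so the gene-coverage criterion decides between them.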

    Axon guidance represents a key stage in the formation of neuronal networks. Axons are guided by a variety of guidance factors, such as netrins, ephrins, slits, and semaphorins. These guidance cues are read by growth cone receptors, and signal transduction pathways downstream of these receptors converge onto the Rho GTPases to elicit changes in cytoskeletal organization that determine which way the growth cone will turn. The gene EFNB2 encodes an EFNB class ephrin (EPH) that binds to the EPHB4 and EPHA3 receptors (Hinck, 2004). EFNB2 could be a putative marker for disease progression and analysis. There is considerable evidence that various inhibitors, biomarkers, and other biological entities associated with the progression of AD and other diseases have been proposed and could provide molecular insights for the cure of AD (Chou et al., 2000; Chou and Howe, 2002).

    NOTCH2NL (Notch homolog 2 N-terminal-like), also known as N2N, is identified as another putative candidate shared between AD and AG. NOTCH2NL is a 236 amino acid protein that has a nonspecific function in Notch signaling. The Notch signaling pathway controls cellular interactions important for the specification of a variety of fates in both invertebrates and vertebrates. The Notch genes are expressed in a variety of tissues in both the embryonic and adult organism, suggesting that the genes are involved in multiple signaling pathways. The Notch proteins have been found to be overexpressed or rearranged in human tumors. In addition, mutations in Notch genes may cause hyperplasia of the nervous system (Duan et al., 2004). The association of the Notch signaling pathway and the MAPK pathway is well defined, and the identification of some marker genes involved (sharing biochemical signals) in AD and other diseases such as cancer, cardiovascular disease, and diabetes justifies the purpose of such comparative analyses.

    Fig. 10. (a) The sequence LOGOs for the class 1-5 representative TFs. Class representatives are AP1 (class 1), ZEB1 (class 2), FEV (class 3), SRY (class 4), and Tcfcp2l1 (class 5). (b) The sequence LOGOs for the class 6-10 representative TFs. Class representatives are RUNX1 (class 6), TLX1::NFIC (class 7), HOXA5 (class 8), TBP (class 9), and T: T-box transcription family (class 10).

    The results obtained from the nucleosome analysis are provided in Supplementary Table S4. In this table the output is divided into five columns. The first column contains the names of the genes, and the second column contains the length of each gene's DNA sequence in base pairs (bp). The third column gives the number of segments or sub-sequences (each 150 bp long) in each gene sequence. The fourth column indicates the presence or absence of nucleosomes in each segment of the genes. The fifth column indicates the presence or absence of linkers in the respective sequence. The range of linkers and their respective frequency are also given in square brackets ([]) along with each entry in column 5 (Supplementary Table S4). The information generated for nucleosomes and linkers could prove to be biologically meaningful for future structure-based studies of genes and proteins involved in AG and AD.

    3.5. Network motifs and their disease associated annotation

    Despite many ground-breaking discoveries during the past century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases and are still searching for cures for most complex diseases. Diseases are caused by the effect of several genes, as shown, for instance, by comprehensive studies on mutations in complex diseases such as breast cancer (Sjoblom et al., 2006) or other types of cancer. In recent years many important properties of complex networks have been delineated. In particular, significant progress has been made in understanding the relationship between the structural properties of networks and the nature of the dynamics taking place on these networks (Sporns and Honey, 2006; Quackenbush, 2006). The present study provides further support for the presence of small-world features in functional brain networks and demonstrates that AD is characterized by an association of small-world network distinctiveness.

    Interaction data were collected from resources such as KEGG, Reactome, and others for AD pathways. Network motif analysis was performed with FANMOD and cross-checked using MFinder and MAVisto. Identified network motifs were selected based upon the statistical criteria of the tools used (Z-score and p-value). We found network motifs that commonly occur in transcriptional networks, such as SIM, bifan, MIM, and complex networks (Fig. 11a-n). While annotating these motifs for their biological significance, we found specific ids matching the standard motif dictionary (Alon, 2006), which are referred to as identified motifs (Fig. 12). The analyzed network motifs were interpreted for their biological significance through association mapping onto standard biological network motifs.

    SIMs (Shen-Orr et al., 2002) are a family of structures with free parameters and are strong network motifs. In the SIM network motif, a master TF X controls a group of target genes, Y1, Y2, ..., Yn. Each of these target genes has only one input and is not regulated by any other TF. The regulation sign is also the same for all the genes in a SIM, and the master TF X is usually autoregulatory. A SIM has a dynamical function and can generate temporal programs of expression, in which genes are activated one by one in a defined order. The temporal order associated with a SIM follows the rule that the earlier the protein functions in the pathway, the earlier its gene is activated (Kalir and Alon, 2004). The temporal order generated by a SIM is maintained against mutations owing to the selective advantage afforded by just-when-needed production strategies. There is evidence of temporal order in damage repair systems controlled by SIMs, where genes responsible for the mildest form of repair are turned on first, and those for more severe damage are turned on later (Ronen et al., 2002). MIM arose as a generalization of SIM. Common four-node motifs among neuronal networks are diamond, biparallel, and bifan. A bifan motif is built from two regulators and two regulated genes, with the two regulators jointly regulating each target gene (Fig. 12, id282).

    This report is the first of its kind in which network motif analysis for AD-associated biological networks is presented.

    Fig. 11. (a-n) Network motifs identified for AD- and AG-associated pathways. The most frequent motifs belong to the four-node class (9 types: a-i), followed by the 5-node class (3 types: j-l), then one each for the 6- and 7-node classes (1 type: m and n, respectively). The search was performed for 3-8 node motifs. No significant motif was found in the 3- and 8-node classes.
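To make the bifan and SIM definitions from this section concrete, the following sketch counts bifan occurrences in a small directed edge list and lists the targets that have a given master regulator as their only input. It is an illustrative toy under stated assumptions, not a reimplementation of FANMOD, MFinder, or MAVisto: those tools also compare counts against randomized networks to obtain Z-scores and p-values, and they typically enumerate induced subgraphs, so their counts may differ from this simple count. The function names and the toy edges are hypothetical.

```python
from itertools import combinations
from math import comb


def count_bifans(edges):
    """Count bifan occurrences: two regulators that both regulate the same
    two targets. Self-loops are ignored; extra edges among the four nodes
    are not excluded (non-induced count)."""
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    total = 0
    for a, b in combinations(sorted(targets), 2):
        shared = (targets[a] & targets[b]) - {a, b}
        total += comb(len(shared), 2)  # every pair of shared targets
    return total


def single_input_targets(edges, master):
    """Return the targets whose only regulator is `master` (the SIM pattern)."""
    regulators = {}
    for src, dst in edges:
        if src != dst:
            regulators.setdefault(dst, set()).add(src)
    return sorted(t for t, regs in regulators.items() if regs == {master})


# Toy network, invented for illustration: X1 and X2 form a bifan over Y1, Y2;
# X3 is autoregulatory and is the single input of Z1, Z2, Z3.
toy_edges = [
    ("X1", "Y1"), ("X1", "Y2"),
    ("X2", "Y1"), ("X2", "Y2"),
    ("X3", "X3"),
    ("X3", "Z1"), ("X3", "Z2"), ("X3", "Z3"),
]
print(count_bifans(toy_edges))
print(single_input_targets(toy_edges, "X3"))
```

A significance estimate in the spirit of the Z-score criterion would repeat `count_bifans` on degree-preserving randomizations of the edge list and compare the observed count with the mean and standard deviation of the randomized counts; that step is left out here.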

    All of the identified network motifs resemble the standard motifs and have been found to be associated with specific proteins that play an active role in these networks through activation or inhibition. Significant entries were found for APP, APO-1, CAPN1, APAF1, APBB1, and others while performing annotations. As a multifactorial disorder, AD has frequently been linked to vascular risk factors such as hypertension, obesity, diabetes, and hyperlipidemia in numerous prospective cohort studies of the general population (Qiu, 2012). Previous studies have suggested that the molecular pathophysiology of AD significantly overlaps with that of type 2 diabetes and the metabolic syndrome, most notably in insulin resistance (Craft et al., 2012). Talbot et al. (2012) demonstrated that insulin resistance in AD occurs not only in peripheral tissues but also in the brain. The authors showed that hippocampal brain slices in AD were less responsive to insulin than controls because of increased phosphorylation of IRS-1, which attenuated downstream Akt and ERK signaling. Brain insulin resistance in AD was not dependent on diabetes or on the APOE4 genotype, which also affects the Akt pathway and is a major determinant of risk for non-Mendelian AD (Warren and Strittmatter, 2012). Analyses of this kind suggest common technical capabilities and treatment solutions not only for AD but also for other diseases in which biological markers are shared.

    Fig. 12. Biologically significant network motifs identified from AD and AG pathways. Bifan, 4-node sub-graph, and SIM (single input module) were common, with activation, inhibition, and stage change properties. Respective gene/protein names are given for each node.

    This study not only performs computational enrichment analysis but also evaluates the performance of various tools, methods, and applications available for analyzing gene expression and network data. Similar analyses were performed using several popular, high-quality methods and tools, which highlights the pros and cons of using freely available tools for academic and research purposes. This study revealed that important biological processes and genes are shared between AD and AG, which supports previous studies and hypotheses. Some novel genes and other variants for various biological processes have been reported as associated with AD and AG and could be implicated in the biochemical events leading to AD from AG through pathways and interactions. Quantitative measurement and assessment of patho-physiological processes in AD and AG could identify suitable gene candidates and provide more information about therapeutic targets. Modeling and simulation studies would provide system-level insights when selecting these newly reported entities as workhorses after their experimental verification.

    4. Conclusion

    In this study, an integrative systems biology approach is presented to study a complex disease like AD and its association with AG and other diseases. Genome-wide expression profiling along with interaction mapping studies allows researchers to discover disease genes systematically. In this paper, we studied several approaches for prioritizing genes by integrating gene expression profiles. The study of network motifs and the association of network motif entities with AD- and AG-related markers revealed a new direction for their biological annotation. The results show extensive links between AD and AG at the molecular level, identifying core biological processes and genes that they share. The identified TFs and their respective TFBS information should be biologically meaningful for the associated cellular and molecular processes of AD and AG. The major classes of secondary structure elements were dominated by helix-turn-helix for the family homeo, and its winged version for the family forkhead. There is no doubt that AD and AG share common patho-physiological processes. Not only do AG and AD share common biological processes, but other important human diseases are also involved in these biological processes. Interactions of TFBS, genes, and encoded proteins at the molecular level for other diseases such as diabetes, dominant optic atrophy, coronary heart disease, sudden cardiac arrest, Gaucher disease, myriad carcinomas, and cirrhosis signify a putative association among AG, AD, and the above-mentioned diseases. The unique miRNA targets identified as a regulatory process for AD, such as LDB2 and DOPEY1, could be verified for their active participation in gene regulation and inhibitory activities. Given these results, a comprehensive analysis of both conditions in tandem, for example using the same tissues and microarray platforms across both age and AD progression, would be quite powerful. Such a direct comparison would complement these analyses, permitting quantitative assessment of the biological differences, as well as the similarities, between AG and AD.

    Conflict of interest

    The authors declare that they have no conflict of interest.

    Acknowledgment

    We would like to express our gratitude to the editor and three anonymous reviewers, whose constructive and insightful comments were very helpful in strengthening this paper. P.P. Panigrahi is funded by the Jaypee University of Information Technology (JUIT) in-house research scholar funding scheme under the Jaypee Education System (JES).

    Appendix A. Supporting information

    Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2013.06.013.

    References

    Alon, U., 2006. Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall, London, UK.

    Aravind, L., Anantharaman, V., Balaji, S., et al., 2005. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. 2, 231-262.

    Ashburner, M., Ball, C.A., Blake, J.A., et al., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29.

    Atabakhsh, E., Wang, J.H., Wang, X., et al., 2012. RanBPM expression regulates transcriptional pathways involved in development and tumorigenesis. Am. J. Cancer Res. 2, 549-565.

    Berbenetz, N.M., Nislow, C., Brown, G.W., 2010. Diversity of eukaryotic DNA replication origins revealed by genome-wide analysis of chromatin structure. PLoS Genet. 6.

    Blalock, E., Geddes, J., Chen, K., et al., 2004. Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc. Natl. Acad. Sci. U. S. A. 101, 2173-2178.

    Boyle, A.P., Song, L., Lee, B.K., et al., 2011. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456-464.

    Carter, D.B., Chou, K.C., 1998. A model for structure dependent binding of Congo Red to Alzheimer beta-amyloid fibrils. Neurobiol. Aging 19, 37-40.

    Chen, W., Lin, H., Feng, P.M., et al., 2012. iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS One 7, e47843.

    Chen, X., Xu, H., Yuan, P., et al., 2008. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106-1117.

    Chou, K.C., 2004. Review: structural bioinformatics and its impact to biomedical science. Curr. Med. Chem. 11, 2105-2134.

    Chou, K.C., 2004. Insights from modelling the tertiary structure of BACE2. J. Proteome Res. 3, 1069-1072.

    Chou, K.C., 2005. Modeling the tertiary structure of human cathepsin-E. Biochem. Biophys. Res. Commun. 331, 56-60.

    Chou, K.C., Howe, W.J., 2002. Prediction of the tertiary structure of the beta-secretase zymogen. Biochem. Biophys. Res. Commun. 292, 702-708.

    Chou, K.C., Tomasselli, A.G., Heinrikson, R.L., 2000. Prediction of the tertiary structure of a caspase-9/inhibitor complex. FEBS Lett. 470, 249-256.

    Cleveland, W.S., Devlin, S.J., 1988. Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. 83, 596-610.

    Craft, S., et al., 2012. Intranasal insulin therapy for Alzheimer disease and amnestic mild cognitive impairment. Arch. Neurol. 69, 29-38.

    Croft, D., O'Kelly, G., Wu, G., et al., 2010. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691-D697.

    Deza, E., Deza, M.M., 2009. Encyclopedia of Distances. Springer, p. 94.

    Duan, Z., Li, F.Q., Wechsler, J., et al., 2004. A novel notch protein, N2N, targeted by neutrophil elastase and implicated in hereditary neutropenia. Mol. Cell. Biol. 24, 58-70.

    Duncan, D.T., Prodduturi, N., Zhang, B., 2010. WebGestalt2: an updated and expanded version of the Web-based Gene Set Analysis Toolkit. BMC Bioinform. 11, P10.

    Dziuda, D.M., 2010. Basic analysis of gene expression microarray data. In: Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley and Sons, Hoboken, pp. 17-93.

    Eden, E., Lipson, D., Yogev, S., Yakhini, Z., 2007. Discovering motifs in ranked lists of DNA sequences. PLoS Comput. Biol. 3, e39.

    Eden, E., Navon, R., Steinfeld, I., Lipson, D., Yakhini, Z., 2009. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinform. 10, 48.

    Gotz, J., Ittner, L.M., Lim, Y.A., 2009. Common features between diabetes mellitus and Alzheimer's disease. Cell. Mol. Life Sci. 66, 1321-1325.

    Gu, R.X., Gu, H., Xie, Z.Y., Wang, J.F., Arias, H.R., et al., 2009. Possible drug candidates for Alzheimer's disease deduced from studying their binding interactions with alpha 7 nicotinic acetylcholine receptor. Med. Chem. 5, 250-262.

    Hardy, J., Selkoe, D.J., 2002. The amyloid hypothesis of Alzheimer's disease: progress and problems on the road to therapeutics. Science 297, 353-356.

    Hinck, L., 2004. The versatile roles of axon guidance cues in tissue morphogenesis. Dev. Cell 7, 783-793.

    Ho, S.S.J., Fulton, L.D., Arenillas, D.J., Kwon, A.T., Wasserman, W.W., 2007. oPOSSUM: integrated tools for analysis of regulatory motif over-representation. Nucleic Acids Res. 35, W245-W252.

    Hommet, C., Mondon, K., Constans, T., et al., 2011. Review of cerebral microangiopathy and Alzheimer's disease: relation between white matter hyperintensities and microbleeds. Dementia Geriatric Cognitive Dis. 32, 367-378.

    Hooghe, B., Broos, S., van, R.F., De, B.P., 2012. A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Res. 40, e106.

    Hotelling, H., 1933. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417-441.

    Huang, Y., Zhou, X., Miao, B., et al., 2009. An image based system biology approach for Alzheimer's disease pathway analysis. IEEE NIH Life Sci. Syst. Appl. Workshop, pp. 128-32.

    Johnson, S.C., 1967. Hierarchical clustering schemes. Psychometrika 2, 241-254.

    Kadota, M., Sato, B, Duncan, et al., 2009. Identification of novel gene amplifications in breast cancer and coexistence of gene amplification with an activating mutation of PIK3CA. Cancer Res. 69, 7357-7365.

    Kalir, S., Alon, U., 2004. Using a quantitative blueprint to reprogram the dynamics of the flagella gene network. Cell 117, 713-720.

    Kandasamy, K., Mohan, S.S., Raju, R., Keerthikumar, S., et al., 2010. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 11, R3.

    Kanehisa, M., Goto, S., 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30.

    Kann, M.G., 2007. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Briefings Bioinform. 8, 333-346.

    Kashtan, N., Itzkovitz, S., Milo, R., et al., 2005. Network motif detection tool Mfinder tool guide. Technical report, Departments of Molecular Cell Biology and Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot.

    Kohonen, T., 1982a. Self-organized formation of topologically correct feature maps.Biol. Cybern. 43, 5969.

    Kohonen, T., 1982b. Analysis of a simple self-organizing process. Biol. Cybern. 44,135140.

    Kong, W., Mou, X., Hu, X., 2011. Exploring matrix factorization techniques forSignicant genes identication of Alzheimers disease microarray gene expres-sion data. BMC Bioinform. 12, S7.

    Krauthammer, M., Kaufmannd, C.A., Gilliam, T.C., Rzhetsky, A., 2004. Moleculartriangulation: bridging linkage and molecular-network information for identi-fying candidate genes in Alzheimers disease. Proc. Natl. Acad. Sci. U. S. A. 101,1514815153.

    Kriegebaum, B.C., Gutknecht, L., Bartke, L., et al., 2010. The expression of the transcription factor FEV in adult human brain and its association with affective disorders. J. Neural Trans. 117, 831–836.

    Kuo, W., Jenssen, T., Butte, A., et al., 2002. Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics 18, 405–412.

    Lu, T., Pan, Y., Kao, S.Y., et al., 2004. Gene regulation and DNA damage in the ageing human brain. Nature 429, 883–891.

    Macian, F., Lopez-Rodriguez, C., Rao, A., 2001. Partners in transcription: NFAT and AP-1. Oncogene 20, 2476–2489.

    Maes, O.C., Xu, S., Yu, B., Chertkow, H.M., et al., 2007. Transcriptional profiling of Alzheimer blood mononuclear cells by microarray. Neurobiol. Aging 28, 1795–1809.

    Miller, J.A., Oldham, M.C., Geschwind, D.H., 2008. A systems level analysis of transcriptional changes in Alzheimer's disease and normal aging. J. Neurosci. 28, 1410–1420.

    Nalbantoglu, B., Tekir, S.D., Ülgen, K.Ö., 2012. Wnt signaling network in Homo sapiens. In: Paula Bubulya (Ed.), Cell Metabolism – Cell Homeostasis and Stress Response.

    Newman, J.C., Weiner, A.M., 2005. L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol. 6, R81.

    Ottolenghi, C., Uda, M., Crisponi, L., et al., 2007. Determination and stability of sex. BioEssays: News Rev. Mol. Cell. Dev. Biol. 29, 15–25.

    Panigrahi, P.P., Singh, T.R., 2012. Computational analysis for functional and evolutionary aspects of BACE-1 and associated Alzheimer's related proteins. IJCI Stud. 1, 322–332.

    Qiu, C., 2012. Preventing Alzheimer's disease by targeting vascular risk factors: hope and gap. J. Alzheimers Dis. 32, 721–731.

    Quackenbush, J., 2006. Microarray analysis and tumor classification. N. Engl. J. Med. 354, 2463–2472.

    Ray, M., Zhang, W., 2010. Analysis of Alzheimer's disease severity across brain regions by topological analysis of gene co-expression networks. BMC Syst. Biol. 4, 136.

    Ray, M., Ruan, J., Zhang, W., 2008. Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases. Genome Biol. 9, R148.

    Ronen, M., Rosenberg, R., Shraiman, B.I., Alon, U., 2002. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. PNAS 99, 10555–10560.

    Ruan, J., Zhang, W., 2006. Identification and evaluation of functional modules in gene co-expression networks. In: Proceedings of RECOMB Satellite Conferences on Systems Biology and Computational Proteomics, San Diego, CA, pp. 57–76.

    Ruan, J., Dean, A.K., Zhang, W., 2010. A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC Syst. Biol. 4, 8.

    Saeed, A.I., Sharov, V., White, J., Li, J., et al., 2003. TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34, 374–378.

    Said, M.R., Begley, T.J., Oppenheim, A.V., et al., 2004. Global network analysis of phenotypic effects: protein networks and toxicity modulation in Saccharomyces cerevisiae. PNAS 101, 18006–18011.

    Sandelin, A., Alkema, W., Engstrom, P., et al., 2004. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32, D91–D94.

    Schmalhofer, O., Brabletz, S., Brabletz, T., 2009. E-cadherin, β-catenin, and ZEB1 in malignant progression of cancer. Cancer Metastasis Rev. 28, 151–166.

    Schreiber, F., Schwöbbermeyer, H., 2005. MAVisto: a tool for the exploration of network motifs. Bioinformatics 21, 3572–3574.

    Sealfon, R., Hibbs, M., Huttenhower, C., et al., 2006. GOLEM: an interactive graph-based gene-ontology navigation and analysis tool. BMC Bioinformatics 7, 443.

    Sehgal, M., Singh, T.R., 2012. Identification and analysis of biomarkers for repair proteins: a bioinformatic approach. J. Nat. Sci. Biol. Med. 2, 139–146.

    Shachar, R., Unger, L., Kupiec, M., et al., 2008. A systems-level approach to mapping the telomere-length maintenance gene circuitry. Mol. Syst. Biol. 4, 172.

    Shen-Orr, S.S., Milo, R., Mangan, S., Alon, U., 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68.

    Sjoblom, T., Jones, S., Wood, L.D., et al., 2006. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274.

    Sporns, O., Honey, Ch.J., 2006. Small world inside big brains. PNAS 103, 19219–19220.

    Stark, C., Breitkreutz, B.J., Reguly, T., et al., 2006. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539.

    Stekel, D., 2003. Microarray Bioinformatics. Analysis of Differentially Expressed Genes. Cambridge University Press, New York, pp. 110–138.

    Szklarczyk, D., Franceschini, A., Kuhn, M., et al., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39, D561–D568.

    Talbot, K., Wang, H.Y., Kazi, H., et al., 2012. Demonstrated brain insulin resistance in Alzheimer's disease patients is associated with IGF-1 resistance, IRS-1 dysregulation, and cognitive decline. J. Clin. Invest. 122, 1316–1338.

    Tanaka, Y., Joshi, A., Wilson, N.K., et al., 2012. The transcriptional programme controlled by Runx1 during early embryonic blood development. Dev. Biol. 366, 404–419.

    Tarawneh, R., Holtzman, D.M., 2010. Biomarkers in translational research of Alzheimer's disease. Neuropharmacology 59, 310–322.

    Tilgner, H., Nikolaou, C., Althammer, S., et al., 2009. Nucleosome positioning as a determinant of exon recognition. Nat. Struct. Mol. Biol. 16, 996–1001.

    Tusher, V.G., Tibshirani, R., et al., 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98, 5116–5121.

    Warren, J., Strittmatter, M.D., 2012. Alzheimer's disease: the new promise. J. Clin. Invest. 122, 1191.

    Wei, D.Q., Sirois, S., Du, Q.S., Arias, H.R., Chou, K.C., 2005. Theoretical studies of Alzheimer's disease drug candidate [(2,4-dimethoxy) benzylidene]-anabaseine dihydrochloride (GTS-21) and its derivatives. Biochem. Biophys. Res. Commun. 338, 1059–1064.

    Wernicke, S., Rasche, F., 2006. FANMOD: a tool for fast network motif detection. Bioinformatics 22, 1152–1153.

    Yamada, A., Koyanagi, K.O., Watanabe, H., 2012. In silico and in vivo identification of the intermediate filament vimentin that is downregulated downstream of Brachyury during Xenopus embryogenesis. Gene 491, 232–236.

    Yang, Y.H., Dudoit, S., Luu, P., et al., 2002. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30, e15.

    Yasuda, T., Sugasawa, K., Shimizu, Y., et al., 2005. Nucleosomal structure of undamaged DNA regions suppresses the non-specific DNA binding of the XPC complex. DNA Repair (Amst) 4, 389–395.

    Zhang, B., Kirov, S., Snoddy, J., 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33, W741–W748.
