Genomics and Bioinformatics of Parkinson’s...

16
Genomics and Bioinformatics of Parkinson’s Disease Sonja W. Scholz 1 , Tim Mhyre 1 , Habtom Ressom 2 , Salim Shah 3 , and Howard J. Federoff 1,4 1 Department of Neuroscience, Georgetown University, Washington, DC 20057 2 Department of Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057 3 Department of Biochemistry, Molecularand Cellular Biology, Georgetown University, Washington, DC 20057 4 Department of Neurology, Georgetown University, Washington, DC 20057 Correspondence: [email protected] Within the last two decades, genomics and bioinformatics have profoundly impacted our understanding of the molecular mechanisms of Parkinson’s disease (PD). From the descrip- tion of the first PD gene in 1997 until today, we have witnessed the emergence of new technologies that have revolutionized our concepts to identify genetic mechanisms impli- cated in human health and disease. Driven by the publication of the human genome se- quence and followed by the description of detailed maps for common genetic variability, novel applications to rapidly scrutinize the entire genome in a systematic, cost-effective manner have become a reality. As a consequence, about 30 genetic loci have been unequiv- ocally linked to the pathogenesis of PD highlighting essential molecular pathways underly- ing this common disorder. Herein we discuss how neurogenomics and bioinformatics are applied to dissect the nature of this complex disease with the overall aim of developing rational therapeutic interventions. K nowing is not enough; we must apply. Will- ing is not enough; we must do.” The pathogenesis of Parkinson’s disease (PD) as we understand it today includes a broad spectrum of metabolic pathways, from oxidative stress caused by mitochondrial dys- function, inflammation, abnormal protein me- tabolism, and aging (reviewed in Dawson and Dawson 2003; Dauer and Przedborski 2003). Most of these pathophysiological links with PD are the result of studying the functional consequences of gene mutations implicated in familial PD. With the introduction of modern genomic technologies numerous genetic risk loci involved in the more common nonfamil- ial form of PD are also being uncovered iden- tifying novel pathways, raising new research questions and hopes to better understand and treat this neurodegenerative condition more effectively. In this article, we will introduce some of the main concepts and discussions in PD genetics, outline the current status of PD genomics, dis- cuss the impact of new sequencing technologies, and chart the necessary steps for translating these findings into targeted therapeutics. Editor: Serge Przedborski Additional Perspectives on Parkinson’s Disease available at www.perspectivesinmedicine.org Copyright # 2012 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a009449 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 1 www.perspectivesinmedicine.org on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/ Downloaded from

Transcript of Genomics and Bioinformatics of Parkinson’s...

Page 1: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

Genomics and Bioinformatics of Parkinson’sDisease

Sonja W. Scholz1, Tim Mhyre1, Habtom Ressom2, Salim Shah3, and Howard J. Federoff1,4

1Department of Neuroscience, Georgetown University, Washington, DC 200572Department of Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 200573Department of Biochemistry, Molecular and Cellular Biology, Georgetown University, Washington, DC 200574Department of Neurology, Georgetown University, Washington, DC 20057

Correspondence: [email protected]

Within the last two decades, genomics and bioinformatics have profoundly impacted ourunderstanding of the molecular mechanisms of Parkinson’s disease (PD). From the descrip-tion of the first PD gene in 1997 until today, we have witnessed the emergence of newtechnologies that have revolutionized our concepts to identify genetic mechanisms impli-cated in human health and disease. Driven by the publication of the human genome se-quence and followed by the description of detailed maps for common genetic variability,novel applications to rapidly scrutinize the entire genome in a systematic, cost-effectivemanner have become a reality. As a consequence, about 30 genetic loci have been unequiv-ocally linked to the pathogenesis of PD highlighting essential molecular pathways underly-ing this common disorder. Herein we discuss how neurogenomics and bioinformatics areapplied to dissect the nature of this complex disease with the overall aim of developingrational therapeutic interventions.

“Knowing is not enough; we must apply. Will-ing is not enough; we must do.”

The pathogenesis of Parkinson’s disease(PD) as we understand it today includes abroad spectrum of metabolic pathways, fromoxidative stress caused by mitochondrial dys-function, inflammation, abnormal protein me-tabolism, and aging (reviewed in Dawson andDawson 2003; Dauer and Przedborski 2003).Most of these pathophysiological links withPD are the result of studying the functionalconsequences of gene mutations implicated infamilial PD. With the introduction of modern

genomic technologies numerous genetic riskloci involved in the more common nonfamil-ial form of PD are also being uncovered iden-tifying novel pathways, raising new researchquestions and hopes to better understand andtreat this neurodegenerative condition moreeffectively.

In this article, we will introduce some of themain concepts and discussions in PD genetics,outline the current status of PD genomics, dis-cuss the impact of new sequencing technologies,and chart the necessary steps for translatingthese findings into targeted therapeutics.

Editor: Serge Przedborski

Additional Perspectives on Parkinson’s Disease available at www.perspectivesinmedicine.org

Copyright # 2012 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a009449

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

1

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 2: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

“MENDELIAN” VERSUS COMPLEX DISEASE:SIMILAR IDEAS, DIFFERENT CONCEPTS

For simplicity, geneticists have separated geneticdiseases into two main categories. The first onerefers to “Mendelian” diseases such as Hunting-ton’s disease, muscular dystrophy, and cystic fi-brosis. “Mendelian” diseases are defined as theclassical familial forms of disease in which theunderlying genetic defect causes disease in alarge proportion of mutation carriers and there-fore typical inheritance patterns can be inferred.Previously, this disease category was the main-stay of genetic research; this is primarily becauseof the fact that until recently the standard ge-netic approach for disease gene discovery was alinkage study, a technique that relies on ascer-taining large families with multiple affected in-dividuals. For the most part, this approach wasvery successful with nearly 3000 Mendelian dis-orders deciphered to date (Lander 2011); how-ever, a linkage study design is only applicable to asmall proportion of human ailments that aretypically rather rare in the general population.Studying the genetics of complex diseases, whichconstitute the second disease category, proved tobe more challenging. Indeed, PD is a good ex-ample of a complex neurodegenerative disease;only a small subset of PD patients report a pos-itive family history of PD, and it is not surprisingthat the first mutations identified as a cause forPD were identified in this small subset of pa-tients. In the vast majority of patients, however,a family history of PD is absent and the etiologyis less clear. It is in these patients that most oftheresearch inthe last few yearshasbeenfocused.

The differences in studying complex diseaseas opposed to familial disease lie within thetechniques used, the study designs, the amountat which a particular genetic variant confers riskto disease, and the mechanistic ideas of diseasepathogenesis. To illustrate these concepts wehave to introduce some technical terms.

The distinction between association and cau-sation. When a genetic variant is investigated fora potential contribution to disease, a number ofquestions need to be addressed. Is the variantcommonly present in the population? If so, isthe frequency of this variant significantly differ-

ent in cases versus controls? If this is the obser-vation, significant association with a particularphenotype has been determined. It is importantto emphasize that establishing significant asso-ciation should not be misconstrued as drawinginferences about causation. For example, theAPOE14 allele on chromosome 19 has been con-sistently associated with increased risk for devel-oping Alzheimer’s disease, but carrying this riskallele is neither necessary nor sufficient to causedisease. Although an association signal some-times implies genes or genetic regions that playacausativerole in thepathogenesis ofdisease, it isnot appropriate to assume that this applies to allinstances in which associations are observed.

“Common disease–common variant hypoth-esis” versus “common disease–rare variants hy-pothesis.” In contrast to monogenic diseases inwhich a single mutation is sufficient to causedisease, complex diseases are thought to becaused by a combination of multiple genetic,environmental, and stochastic factors. Two dis-tinctive concepts have been postulated for thedetection of genetic variants underlying com-mon, complex diseases. The “common disease–common variant hypothesis” posits that multi-ple, common small-risk variants of small effectsize interact to cause common disease (Reichand Lander 2001). This hypothesis is the corebasis for genome-wide association studies(GWAS), a study design that relies on testingseveral hundred thousand common genetic var-iants throughout the human genome in largecase-control cohorts. Over the past few years,hundreds of new gene loci and pathways, in-cluding sixteen PD loci (Fig. 1) (Simon-San-chez et al. 2009; Consortium IPDG 2011; Con-sortium IPDG unpubl.), have been implicatedwith various human disease traits using a GWASdesign (an updated catalog of identified geneticrisk loci can be found at www.genome.gov/gwastudies/).

Despite the interest of the immediate past,heritability estimates have shown that large pro-portions of genetic risk underlying complex dis-ease have not yet been explained (Manolio et al.2009). In PD for example only �60% of herita-bility is understood, depending on the popula-tion studied (Consortium IPDG 2011). This

S.W. Scholz et al.

2 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 3: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

“missing heritability” is at the center of much ofthe current debate, and possible explanationsthat have been brought forward include lack ofpower to detect common low-risk variants, rarevariants, gene–gene interactions, gene–environ-ment interactions, structural variants such asdeletions or duplications, and inversions (Ma-nolio et al. 2009). Inparticular the“commondis-ease–rare variants hypothesis” has gained muchtraction, mainly because of the introduction ofadvanced sequencing technologies that allowscost-effective sequencing of entire genomes. Anearly lesson learned from these next-generationsequencing technologies is that there are numer-ous rare variants in the human genome, whichhave not yet been systematically explored. It ishoped that over the next couple of years moreinsights will be gained into the pathogenic rele-vance of rare genetic variability.

DISSECTING THE GENETICS OF COMPLEXDISEASE: THE REVOLUTION OF GENOMICSAND BIOINFORMATICS

Genomic research in the past few years has beendefined by the rapid integration of technologi-

cal advances as the immediate ramificationsof the human genome project. To reconstructthe developments, successes and challenges ingenomic PD research, we will point out some ofthe past milestones of the genomics revolu-tion, and discuss the events and developmentsachieved so far (Fig. 2 illustrates the selectedlandmark discoveries in genomic PD research).

The year 1997 marks the starting point forPD genomics. Using a linkage study approach,Polymeropoulos et al. (1997) reported the dis-covery of missense mutations in the SNCA gene,coding for a-synuclein, to underlie a rare fa-milial form of PD. This finding was crucial inthat it provided clear evidence that there aregenetic forms of disease, a view of which wasevolving from the concept that PD was onlya nongenetic disease. In consequence of thisseminal work, the availability of a disease-caus-ing gene allowed the generation of cell- andanimal-based model systems for studying thefunctional mechanisms in disease pathogenesis(Feany and Bender 2000; Masliah et al. 2000).Moreover, subsequent screening studies showedthat variability at the SNCA locus not only playsa role in this familial form of PD, but is also

DJ1

ATP13A2

PINK1

PARK10

PARK3

ACMSD

STK39

PARK11MCCC1/LAMP3

GAK

BST1

STBD1

SNCA

HLA-DR GPNM8 FGF20

LRRK2

SCA2

CCDC62/HIP1RPRKN

GBA

STY11

PARK16

1

13 14 15

SCA3

GWAS

Linkage study

Candidate gene study

Candidate gene study/GWAS

Linkage study/GWAS

STX1BMAPT FBX07

PLA2G6

16

PARK12

17 18 19 20 21 22 X

2 3 4 5 6 7 8 9 10 11 12

Figure 1. An overview of the genetic loci implicated in the pathogenesis of PD. The position of each locus relativeto the ideogram of each chromosome is depicted. The background color of each box indicates the method thatwas used to identify this locus. Abbreviation: GWAS, genome-wide association study.

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 3

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 4: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

associated with risk for disease in sporadic cases(Farrer et al. 2001; Maraganore et al. 2006).Soon after the discovery of SNCA other “Men-delian” PD genes (Parkin, PINK1, DJ1, andLRRK2) were revealed using a linkage mappingdesign (KItada et al. 1998; Valente et al. 2001;Bonifati et al. 2003; Paisan-Ruiz et al. 2004;Zimprich et al. 2004).

In contrast to the advances in dissecting thegenetics of rare familial forms of PD, the geneticfactors influencing common sporadic cases re-mained enigmatic. The publication of the hu-man genome sequence in 2001 marked an ex-citing turning point (Lander et al. 2001; Venteret al. 2001). For the first time researchers hadthe opportunity to examine the sequence of anentire human genome, and four years later adetailed catalog of common genetic variants(the HapMap) became available (http://hapmap.ncbi.nlm.nih.gov/). This catalog was thestarting point for studying the genomics ofcomplex diseases. Soon microarray platformsfor genotyping hundreds of thousands of thesecommon variants throughout the human ge-nome were developed and genome-wide asso-ciation testing became a reality. The revolution-ary aspect of this new technology was that itprovided a cost effective tool to rapidly scanhundreds of thousands of variants in the ge-nome in thousands of individuals. In PD,genome-wide association strategies have beenremarkably successful; because of this new tech-

nology the number of risk loci implicated in PDpathogenesis has doubled over the last 3 years(Fig. 2).

As new sequencing technologies for se-quencing entire genomes emerge, the era ofGWAS is diminishing in importance. In thesame way as GWAS was the standard technologyused to search for common risk variants, next-generation sequencing technologies are goingto determine the exploration of rare genetic var-iability and of structural genomic rearrange-ments. With costs of sequencing dropping andthe speed of genomic data generation reachingunprecedented scales, data handling and analy-sis have become big challenges in modern PDresearch. In fact, the generation of genomic datahas accelerated so rapidly that the amount ofdata produced exceeds the exponential growthof computer processing speed known as Moore’slaw. In other words, the bottleneck for genomicdiscovery is no longer generating extensive data-sets, but rather storing and analyzing the infor-mation; this problem has even generated interestin the lay press (Pollack 2011). To put the ad-vances of genomics into perspective, it took sev-eral thousand researchers 13 years to sequencethe first human genome at a cost of $3 billion(http://www.genome.gov/); today, one techni-cian can sequence an entire genome in less thana week for about $5000. With even faster se-quencing technologies about to be released, the$1000 genome, a commonly used benchmark

SNCA missensemutations cause

familial PD

First transgenic PDfly and mouse

models

Parkin mutationscause recessive

parkinsonism

Feany et al; Masliah et al.

Kitada et al. Paisan-Ruiz/Zimprich et al. Int. PD Genomics Consort.

SNCAtriplicationcauses PD

Singleton et al. Simon-Sanchez et al.

GWAS IdentifiesSNCA and MAPT ascommon risk genes

Mutations in LRRK2are a commoncause of PD

GWASIdentifies

five new risk loci

1997 1998.... 2000.... 2003 2004.... 2009.... 2011

Polymeropoulos et al.

Figure 2. Highlights of key genomic discoveries in PD over the past decade and a half.

S.W. Scholz et al.

4 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 5: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

at which genomic sequencing is considered eco-nomical for routine diagnostic testing, is immi-nent (Mardis 2006).

At the same time as genomics hastily moveson, the innovation of computational medi-cine and its diverse applications, commonly re-ferred to as bioinformatics, is confronted withchallenging demands to develop user-friendlymethods to parse, analyze, and share genomicdata. Automation of sequence data filtering,alignment with reference genomes, variant call-ing and statistical analyses across various plat-forms still pose a daunting task. Moreover, indepth analyses of epistatic effects, or gene–geneinteractions, as well as investigations of gene–environment interactions, which are at the cen-ter of research ambitions aimed at resolving thecomplete genomic architecture of complex dis-eases, depend on sophisticated bioinformaticsalgorithms.

INFORMATICS OF GENOMICS,TRANSCRIPTOMICS, PROTEOMICS,AND METABOLOMICS

Developing comprehensive informatics toolsrequires, concurrent with the development ofgenome-centric algorithms, a deeper under-standing of the functional consequences ofgenetic variability on disease vulnerability. Inother words, how do we translate the relativelyinvariant nature of our genomes into what issurely a dynamic, evolving risk of disease? Thisunderstanding involves connecting genetic in-formation onto the full molecular space, whichincludes genetics, epigenetics (i.e., noncodingchanges such DNA methylation, chromatinconformation, noncoding RNAs—all affectingthe read-out of the genome), gene expression,protein expression, and the resultant metabolicoutput of all of these processes. So far, techno-logical advances in gene expression analyses(RNA expression or transcriptomics) have par-alleled those in genome analyses, as similarchemistries allow for the rapid, high-through-put ascertainment of both DNA and RNA. Forexample, studies have begun to link data fromPD GWAS to specific gene expression and epi-genetic changes in brain tissue, as was recently

reported by the International Parkinson’s Dis-ease Genomics Consortium (2011). However,moving forward will involve a more complexunderstanding of the relationships between thestatic genome, the epigenome, and RNA expres-sion, including regulatory RNAs such as smallinterfering RNA (siRNA) and microRNA. Ad-ditionally, the technologies for detecting andquantifying nucleic acids are progressing, asare methods for detecting and quantifying pro-teins (proteomics) and metabolites (metabolo-mics). Thus, the next step forward is couplingthe information in the nucleic acid space to theproteomic (both at the transcriptional andposttranscriptional levels) and the metabolo-mic spaces. Collectively, we refer to the scienceand technology of measuring these biologicalmolecules as “-omics”—genomics, epigenom-ics, transcriptomics, proteomics, and meta-bolomics. The ongoing revolution in “-omics”technologies, coupled with advances in bioin-formatics, has greatly expanded our ability tolink genetic architecture to its functional out-put, namely the expression of genes, proteins,and metabolites. Thus, a key challenge to thefield is to detect and quantitatively measurethe full complement of biological moleculesand to integrate these into a meaningful under-standing of both normal function and dysfunc-tion (e.g., disease).

Although “-omics”-driven research holdsgreat promise in understanding disease biology,enthusiasm should be tempered. The currentstate-of-the-art technologically to meaningfullyinterpret large and diverse datasets is limiting.We greatly expand the complexity of analyses aswe integrate data across multiple domains,which includes not just those from the molec-ular space (e.g., RNA, protein, and metaboliteexpression), but also the clinical space, whichincludes neuroimaging among other sources.For example, how does genetic risk translatefrom the subcellular organelle (e.g., mitochon-drion), cellular (e.g., midbrain dopaminergicneuron), circuit (e.g., nigrostriatal system), or-gan (central nervous system) levels to an indi-vidual’s risk for developing PD? This is furthercomplicated by the complex interactions we asindividuals have with other organisms, be they

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 5

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 6: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

the microbiome of our gastrointestinal systemsto the human interactions that create our socialand environmental communities. Indeed, therelevance of this holistic view is exemplified bythe pioneering work of neuropathologist, HeikoBraak. On examining human postmortem tis-sues, Braak proposed a new framework for PDwherein the disease process begins outside themidbrain, the traditional locus of vulnerablenigrostriatal dopaminergic neuronal cell bodiesthat are the uniformly affected in PD (Braaket al. 2002, 2003a). Braak’s most recent conceptis that PD could be initiated in the enteric ner-vous system with ascending pathology to in-volve the dorsal motor nucleus of the vagus(Braak et al. 2003b, reviewed in Hawkes et al.2010). As we test new hypotheses of pathogen-esis such as these and others additional data willbe needed, which will further increase infor-matic complexity. One possibility is that theinformatic challenges will require the integra-tion of -omics data with other data to deriveempirically testable biological networks—thosethat explain an individual’s risk for PD. Oncerisk is stratified, we may then be positioned toconsider earlier interventions that could alterthe natural history of PD.

IDENTIFICATION OF ABERRANTNETWORK ACTIVITIES

As discussed above, increasing evidence indi-cates that complex diseases such as PD are as-sociated with multiple genetic polymorphismsthat are postulated to affect biological networks(Schadt 2009; Tan et al. 2009; Meyerson et al.2010). Biological networks represent series ofactions among molecules that lead to a certainspecific change in a cell (Croft et al. 2011). Thesenetworks may subserve many categories of cel-lular activities (Chang et al. 2009; Wang et al.2009), including cell signaling networks, pro-tein–protein interactions, and metabolic net-works (Kim et al. 2010). Identifying aberranciesinvolving small numbers of genes in specificbiological networks may lead to more precisediagnosis and treatment of a disease. How-ever, because tens or hundreds of genes areoften involved, conventional experimental sys-

tems developed for identifying aberrant gene(s)are inadequate for deciphering aberrancies innetwork activity. High throughput technolo-gies enable the simultaneous detection of a largenumber of alterations in molecular compo-nents, or nodes in network parlance. As notedabove, these high throughput technologies weredeveloped to fill this gap and the first successfuluse of technology was in DNA sequencing(Sanger et al. 1977), which later developed intoa gambit of genomics. As other -omics technol-ogies developed and enabled the simultaneousdetection of a large number of alterations inprotein expression and metabolites, the corre-lations and dependencies between molecularcomponents have become complex. However,these studies, when understood in a networkcontext, offer a systems level perspective on pro-cesses underling disease initiation and progres-sion.

Analyses of high dimensional data using ro-bust new informatics tools allows for the inte-gration of these different data sources to yieldnew information about network functions anddysfunctions. Figure 3 depicts the workflowof a typical study involving high dimensional-omics data to identify aberrant network activ-ities. Such data analyses often reveal that ourcurrent understanding of molecular and chem-ical biology underlying cellular functions re-mains incomplete (Ochs et al. 2011). However,an increasingly greater number of open-accessdatabases complement the analysis of aberrantnetwork activities from these high dimensionaldata (Peri et al. 2003; Kanehisa et al. 2004; Schae-fer et al. 2009). In addition, numerous tools havebeen developed for visually exploring and ana-lyzing biological networks, including Cytoscape(Smoot et al. 2010), VisANT (Hu et al. 2009),GeneGO (http://www.genego.com/), Ingenui-ty (http://www.ingenuity.com/), and PathwayStudio (Nikitin et al. 2003).

In recent years, questions have been raised asto how genetic differences between individualslead to differences in disease networks and, thus,to differences in phenotypes (Taylor et al. 2009;Vaske et al. 2010). Data driven computationalmodels, such as PARADIGM (Vaske et al. 2010)and ResponseNet (Yeger-Lotem et al. 2009)

S.W. Scholz et al.

6 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 7: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

among others (Amit et al. 2009; Tan et al. 2009;Kreeger and Lauffenburger 2010; Chien et al.2011), have been especially useful in this area.For example, progress has been made in apply-ing information about aberrant networks to theidentification of functional modules; that is,groups of biological entities (e.g., gene, protein)that perform biological tasks (e.g., protein deg-

radation) that are dependent on each constit-uent part (Qiu et al. 2009; Wu et al. 2010).However, sequenced mutations, copy numberalterations, gene fusion events, or epigeneticchanges are not well represented in these mod-els. Nevertheless, information derived from pu-tative aberrant network activities will facilitatethe detection of biomarkers (Singh et al. 2009),

Genomics

Sample collection/preparation High-throughput data acquisition

Transcriptomics

Proteomics

Metabolomics

PubMed KEGG

Databases and literature

Identification and validationIdentification of networksApplications

HMDB Ingenuity MetaGene

Data analysis

Figure 3. A hypothetical systems based approach to identify aberrant networks of disease. Data (includingbiological, clinical, imaging) and samples are collected from a population. High-dimensional -omics data areacquired, integrated with clinical data, analyzed, and validated to identify networks involved in disease. Geno-mics will predict aberrant networks, whereas transcriptomics, proteomics, and metabolomics will report theoutcomes of these networks. In turn, these networks and the aberrant nodes that are perturbed in disease canthen be used to develop biomarkers, prognostic markers (e.g., markers that report disease progress or thera-peutic efficacy), and rational therapeutics. The process is not inherently unidirectional nor is it intended to besingle pass. Instead, as technologies improve, the process can be employed in an iterative fashion to refine nodeswithin aberrant disease networks and to generate better biomarkers, targets, and therapies. The approach ispredicated on robust bioinformatics, and analytics that are critical to our abilities translate high-dimensionaldata to our understanding of disease and its treatment. (Image is from Wang et al. 2012; reprinted, withpermission, from the author.)

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 7

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 8: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

the identification of novel drug targets (Andreet al. 2009), the classification of disease types(Gatza et al. 2010), and the prediction of clinicaloutcomes (Taylor et al. 2009; Cerami et al. 2010;Chen et al. 2010; Gatza et al. 2010). Increasinguse of -omics data driven methodologies foridentifying biomarkers and therapeutic targetswill also include machine learning methods(Andre et al. 2009; Singh et al. 2009), graphtheory (Taylor et al. 2009), and statistical meth-ods (Singh et al. 2009).

Obviously, understanding of the aberran-cies in biological networks responsible for com-plex diseases is far from complete and the use ofhigh dimensional data together with large-scalebiological databases (e.g., protein–protein in-teraction and pathway databases) will be crucialfor uncovering aberrant biological processes.However, the challenge is to accommodate thelarge volume of -omics data, which is growingexponentially. Additionally, much work needsto be done to identify the subnetworks of met-abolic reactions associated with diseases suchas PD. Moreover, a reliable computational ap-proach to identify subnetwork-associated dis-ease processes is currently limited by the incom-pleteness of the available interactome maps andlimitations of the existing tools.

As widely acknowledged, gaining an inte-grated understanding of the interactions amongthe genome, proteome, metabolome, and envi-ronment as mediated by the underlying cellularnetwork, may offer a basis for future advances.However, some of the most difficult problems inthis area include discovering dynamic (ratherthan static) processes in cells, connecting mo-lecular level network activities to functionalbehavior at the cellular level, developing data-driven computational models that reflect thecausal relationships between molecules (in-cluding those designated as drug targets), bio-markers (as potential read-outs of networkmodulation as would be the case with a diseasemodifying therapy), measuring the consequentchanges that occur in cellular dynamic process-es, and predicting the impact of an interventionon the system to effect a change consistent witha beneficial outcome. An especially challengingfocus of importance is to find ways to model

changes in biological entities which could affectthe dynamics of the biological process usinglarge scale and diverse -omics data. To accom-modate these challenges, network-based re-search is shifting toward integrated multiple net-works or networks composed of heterogeneouslarge-scale data elements. To process large-scale-omics data, we need to develop next-generationalgorithms and tools to study the relationshipsbetween aberrant human genes, proteins, andinteractome networks. It is in this context thatwe delineate new disease-associated biologicalmolecules in relation to disease-specific net-works, that we understand how network pertur-bations can lead to disease, and that we use thisknowledge to develop better diagnostics andtherapeutics for disease. It is in the integrationof these various datasets both temporally andspatially that we begin to see the emergent prop-erties of disease-specific networks and that wethen use this knowledge to modify disease nat-ural history (Fig. 4).

INFORMATICS IN BIOMARKERDISCOVERY

In the era of -omics, moving forward in theclinical management of PD will require the de-velopment of specific and sensitive biomarkersthat could detect disease early, define a prognos-tic trajectory of disease, and/or provide an in-dicator of response to therapy. Ideally, these bio-markers would be present early enough in thedisease course that interventions could halt pro-gression or even reverse damage. Risk alleles inand of themselves are biomarkers, measurableentities that denote risk for disease; however,they do so in a nonspecific and nonsensitivemanner. The technologies outlined above pro-vide a way forward as we link genetic informa-tion to the other molecular and organismal lev-els, translating disease risk to disease-specificnetworks. In fact, a number of studies haveused various -omics approaches to identify po-tential PD biomarkers (reviewed in Caudle et al.2010). As we increase our understanding ofthese disease-specific networks, we can beginto identify other tissues which may be affectedand which could be used as surrogates for on-

S.W. Scholz et al.

8 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 9: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

going monitoring of disease processes withinthe CNS. For example, induced pluripotentstem (iPS) cells derived from patients may be-come an important resource not only for un-derstanding disease biology (Park et al. 2008)and fueling biomarker elucidation, but also asa strategic part of therapeutics development(Wernig et al. 2008; Cooper et al. 2010). Amore immediate application of recent advancesin genomics and informatics to biomarker de-velopment is as an initial screen for risk. Asmore risk loci are identified and as the costs ofscreening decline, more widespread screening ofpopulations of individuals will become feasible.Although this would not be a particularly sen-sitive or specific screen, it would provide for aneconomical triaging of high-risk subjects for

more in-depth screening. It is also importantto note that modalities other than molecularmarkers will form an important componentof our biomarker armamentarium and thatthese modalities will need to be integrated with-in the network-centric approach. These modal-ities range from the relatively inexpensive andinsensitive (history of constipation; hyposmia)to the expensive and sensitive (PET or SPECTneuroimaging).

As noted above, GWAS have been per-formed to reveal the genetic association of dis-ease with SNPs. Despite the success of GWAS,the heritability of common disorders such asPD cannot be fully explained by the genes thathave been discovered. A similar pattern is seenwhen we expand our search for biomarkers of

DiscoveryGenomics

Proteomics

DNAEpigenome

Transcriptomics

RNAmiRNA

Metabolomics

MetabolitesProteinPTMs

‘‘-Omics’’-based biomarkers Validation Application

Develop/monitor therapies

Rx

Rx

Rx

Rx

Rx

Rx

Prognosis/disease stratification Diagnosis

Figure 4. Application of “-omics” based biomarker strategy to discover, validate, and apply molecular profiles todisease diagnosis, prognosis, and therapeutic development. Biomarker discovery efforts follow a predictablemodel. A discovery cohort of case (red) and control (green) subjects is amassed. Biological samples along withclinical, demographic, and other data are collected. High-dimensional genomic, transcriptomic, proteomic, andmetabolomic data are generated and integrated with clinical data to elucidate the dynamic networks and theircritical nodes that contribute to risk and evolution of disease. These pathways are then validated in a separatecohort of cases and controls. Robust, sensitive, and specific profiles can then be applied on a population scale toprovide readouts for individuals’ risk (pink) for disease. These profiles can be informative to disease diagnosis(pink to red), to prognosis and disease stratification (affected individuals with different temporal progressionand severity), and in developing and monitoring therapies that slow, halt, or reverse disease progression.Additional historical (environment, lifestyle), clinical, and imaging data (e.g., PET, SPECT) will be integratedwith molecular pathway data that will also be informative in disease diagnosis, stratification, and therapies.(PTMs, posttranslational modifications.) Images of myoglobin structure (http://en.wikipedia.org/wiki/File:Myoglobin.png) and ribosome/mRNA translation (http://en.wikipedia.org/wiki/File:Ribosome_mRNA_translation_en .svg) have been released to the public domain.

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 9

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 10: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

disease risk to the other -omics. A number ofreasons can be ascribed to the limitations ofthese studies for common diseases with com-plex traits. One explanation is that the currentbiostatistical analyses are agnostic or unbiasedand thus ignore what is known about diseasepathology. In addition, the linear modeling inGWAS analyses usually considers only one SNPat a time, whereas ignoring the genomic andepigenomic factors of each SNP. However, re-cently, there has been a shift away from thisapproach toward a more holistic one that rec-ognizes the complexity of the genotype–pheno-type relationship that is likely characterized bysignificant genetic heterogeneity and gene–gene and gene–environment interactions. Fur-thermore, strategies have been used to iterativelymine GWAS data, including the identificationof potential PD targets and pathways usingmeta-analyses of previous GWAS and neuronaltranscriptomic profiling studies (Zheng et al.2010; Edwards et al. 2011).

The limitations of a linear model and otherparametric statistical approaches have motivat-ed the development of data mining and ma-chine learning methods (Hastie et al. 2009).The advantage of these computational ap-proaches is that they make fewer assumptionsabout the functional form of the model and theeffects being modeled. In effect, data miningand machine learning methods are much moreconsistent with the notion of having the datadirect the model, rather than forcing the datato fit a predetermined model. Several recent re-views highlight the need for these newer meth-ods, including machine learning approachessuch as random forests (RFs) and multifactordimensionality reduction (MDR), which havebeen developed to address some of these issues(Tarca et al. 2007; Ressom et al. 2008; Moore2010; Sun 2010). It is also clear that evolvinginformatics approaches will play an importantrole in addressing the complexity of the under-lying molecular basis of many common humandiseases. These methodologies have the poten-tial to identify other molecular species (pro-teins, metabolites), which could serve as diseasebiomarkers in addition to the genome and in-teractome. In identifying nodal proteins, pro-

teins that are present at the nodes of a network,proteomics approach provides a great platform,which typically assesses proteins in an unbiasedfashion and provides the means to study theproteomic profile of a complex biological sys-tem on a large scale. Several technologies con-tinue to evolve that are composed of integratedtechnical components, including separationtechnology, mass spectrometry (MS), and bio-informatics data processing. With advances inanalytical technology and statistical analyses,several studies have set out to develop proteomic“molecular profiles” of PD tissues, includingblood, CSF, and postmortem brain (reviewedin Caudle et al. 2010). Similar methodologiesand analytical approaches are being used inthe search for “metabolic profiles” of PD, whichinclude compounds such as lipids, amino acids,fatty acids, amines, alcohols, sugars, organicphosphates, hydroxyl acids, aromatics, purines,and other high abundance or clinically impor-tant molecules; however, these databases are in-complete for secondary metabolites, drugs, andenvironmental compounds.

As technology and analytic methods im-prove, we will generate more complete anno-tations of the genomic, transcriptomic, proteo-mic, and metabolic spaces, which would greatlyenhance the analysis of specific pathways andmolecules involved in PD and would yield ad-ditional insight into the pathogenesis of the dis-ease. In addition, it would provide a platformfor future meta-analytic studies, which couldassuage much of the between-study variabilitycurrently encountered when analyzing multi-ple studies. However, despite the advance in-omics, there are still several issues that needto be addressed and resolved. Most importantly,issues around data integration, analysis, and in-terpretation pose a great challenge, especially incontext of ever expanding data being generated.

TRANSLATING GENOMICS INTORATIONAL THERAPEUTICS

Drug discovery is the process by which newdrugs are identified. The traditional method re-lies on trial-and-error in testing chemical sub-stances against purified molecules and cultured

S.W. Scholz et al.

10 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 11: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

cells, and subsequently examining their effectson a model of disease before taking such a can-didate into clinical studies. However, in the lastfew decades, a new approach, termed rationaldrug discovery (RDD) has been adopted, whichrelies on characterized molecular mechanismsof disease. RDD posits that modulation of aspecific target, putatively causal in disease path-ogenesis, will have therapeutic value. This raisestwo fundamental questions: What is a specifictarget? and How is a modulator of this targetfound? The first question is at the core of drugdiscovery.

Current drug discovery assumes that diseasescan be characterized by a faulty protein structureor aberrant expression of a protein encoded by avariant gene, and that identification of candidatedrugs that modulate the activity of the proteinswill have an effect on phenotype and/or diseaseoutcome. This view is exemplified by expecta-tions heralded with human genome sequencingtechnology, in which it was estimated that about8000 genes would be available as drug targets(Imming et al. 2006). Currently, the numberof drug targets correlated with genetic variationor polymorphisms is �220 (Russ and Lampel2005). As we move from monogenic diseases tocomplex diseases, there is no consensus as tohow genetics will inform drug discovery. How-ever, it is expected that discovery of aberrantgenetic networks, populated by aberrant prote-omic and metabolic nodes, will provide targetcandidates for therapeutics development. It is inthis arena that disease specific networks andnodes can be used for the rational design ofcommon and even personalized therapeutics.

From the above, it should be clear that de-fining a disease target is not formulaic. How-ever, once a target is chosen, its prosecution un-folds in one of two ways: target-based or ligand-based drug discovery. Target based-drug discov-ery starts with the three-dimensional structureof a target. If a three-dimensional structure ofthe target is not available, it may be empiricallydetermined by crystallography or generated in-formatically via homology modeling using pro-teins with similar domains as a template.

Alternatively, ligand-based drug discoverystarts with structural information of a known

or predicted ligand. In either case, a pharmaco-phore is designed as bait. A pharmacophore isan abstract description of the molecular featuresthat are required for interaction between a li-gand and a target. More specifically, the IUPACdefines a pharmacophore to be “an ensemble ofsteric and electronic features that are necessaryto ensure the optimal supramolecular interac-tions with a specific biological target and totrigger (or block) its biological response” (Wer-muth et al. 1998). Because target/ligand inter-actions are “polar positive,” “polar negative,” or“hydrophobic,” typical features considered indesigning a pharmacophore are hydrophobic,aromatic, a hydrogen bond acceptor, a hydrogenbond donor, cation, or anion moieties. Becauseligand-based drug discovery relies on knowledgeof other molecules known to bind a biologicaltarget, the minimum necessary structural char-acteristics derived from these molecules are usedin designing the pharmacophore, which thencan be used to identify similar compounds viascreening of chemical libraries or through denovo synthesis. The basic principle is that similarmolecules behave similarly. In other words, sim-ilar chemical groups and entities will have sim-ilar biological effects. This gives rise to the con-cept of the structure-activity relationship, orSAR. As three-dimensional structures of biolog-ical targets increase, and detailed informationabout molecular interactions between ligandand target become available, the application ofSAR has become more complex. Drug discoveryhas evolved with the use of high performancecomputing to enable computer aided drug de-sign (CADD) and sophisticated statistical algo-rithms and molecular dynamic tools to providequantitative methodologies for SAR (QSAR) torank order the potential potency of a number ofbiologically similar compounds. Once a series ofcompounds is identified using the above ap-proach, theyare labeled as “hits,” which are readyto be tested in biological assay systems. As illus-trated by the analytical bottlenecks in PD geno-mics and biomarker discovery, the evolution ofbioinformatics will be critical to the successfuldevelopment of improved PD therapeutics.

The role of bioinformatics in connectingaberrant networks and nodes fundamental to

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 11

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 12: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

PD pathogenesis with validated targets devel-oped through computational chemistry andtarget modeling is anticipated to accelerate theprocess of drug discovery.

THE ROAD AHEAD . . .

The main aim of genomic research is to identifypathways that are suitable for targeted therapeu-tic interventions to prevent, slow, halt, or re-verse neurodegenerative disease processes. Tothat end, the success of translational researchrests on the resolution of the complex genomicarchitecture of human disease, translating thisto understanding aberrant networks and nodesassociated with disease, and implementing thisknowledge in the rational design of therapeu-tics, which could be tailored to the individual(Personalized Therapy, Fig. 5). However, thissuccess is not only dependent on advancement

of technologies and their applications. Successwill also depend on regional, national, and eveninternational collaborative efforts. In much thesame way that the neurogenetics field hasevolved, moving forward will require the collec-tive efforts of scientists, clinicians, healthcareproviders, policy makers, and importantly, pa-tients. The impact of these efforts will also gobeyond translational research and therapeuticsdevelopment. Given the potential to revolution-ize medicine, a host of societal issues will needto be addressed, including socioeconomic, eth-ical, clinical acceptance, medical education, costeffectiveness, and regulatory considerations.

ACKNOWLEDGMENTS

This work is supported in part by DAMD17-03-1-0009, W81XWH-09-1-0103, and W81XWH-09-1-0107 (H.J.F.).

Genetic risk locusdetermination

Functional analysis

Personalized therapy

Phenotypecharacterization

Figure 5. Genomics will play an integral role in the development of personalized therapeutics. The availability ofdetailed phenotype data from large patient/control cohorts is an important prerequisite for high-throughputgenetic screening studies, including GWAS and genomic sequencing. After genetic risk loci have been dissected,in silico, in vitro, and in vivo analyses establish the underlying functional pathways and help to posit targets forrational, personalized therapies.

S.W. Scholz et al.

12 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 13: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

REFERENCES

Amit I, Garber M, Chevrier N, Leite AP, Donner Y, Eisen-haure T, Guttman M, Grenier JK, Li W, Zuk O, et al. 2009.Unbiased reconstruction of a mammalian transcriptionalnetwork mediating pathogen responses. Science 326:257–263.

Andre F, Job B, Dessen P, Tordai A, Michiels S, Liedtke C,Richon C, Yan K, Wang B, Vassal G, et al. 2009. Molecularcharacterization of breast cancer with high-resolutionoligonucleotide comparative genomic hybridization ar-ray. Clin Cancer Res 15: 441–451.

Bonifati V, Rizzu P, van Baren MJ, Schaap O, Breedveld GJ,Krieger E, Dekker MC, Squitieri F, Ibanez P, Joosse M,et al. 2003. Mutations in the DJ-1 gene associated withautosomal recessive early-onset parkinsonism. Science299: 256–259.

Braak H, Del Tredici K, Bratzke H, Hamm-Clement J, Sand-mann-Keil D, Rub U. 2002. Staging of the intracerebralinclusion body pathology associated with idiopathic Par-kinson’s disease (preclinical and clinical stages). J Neurol249: III/1–5.

Braak H, Del Tredici K, Rub U, de Vos RA, Jansen Steur EN,Braak E. 2003a. Staging of brain pathology related tosporadic Parkinson’s disease. Neurobiol Aging 24: 197–211.

Braak H, Rub U, Gai WP, Del Tredici K. 2003b. IdiopathicParkinson’s disease: Possible routes by which vulnerableneuronal types may be subject to neuroinvasion by anunknown pathogen. J Neural Transm 110: 517–536.

Caudle WM, Bammler TK, Lin Y, Pan S, Zhang J. 2010.Using “omics” to define pathogenesis and biomarkersof Parkinson’s disease. Expert Rev Neurother 10: 925–942.

Cerami E, Demir E, Schultz N, Taylor BS, Sander C. 2010.Automated network analysis identifies core pathways inglioblastoma. PLoS ONE 5: e8918.

Chang JT, Carvalho C, Mori S, Bild AH, Gatza ML, Wang Q,Lucas JE, Potti A, Febbo PG, West M, et al. 2009. A ge-nomic strategy to elucidate modules of oncogenic path-way signaling networks. Mol Cell 34: 104–114.

Chen J, Sam L, Huang Y, Lee Y, Li J, Liu Y, Xing HR, LussierYA. 2010. Protein interaction network underpins concor-dant prognosis among heterogeneous breast cancer sig-natures. J Biomed Inform 43: 385–396.

Chien CH, Sun YM, Chang WC, Chiang-Hsieh PY, Lee TY,Tsai WC, Horng JT, Tsou AP, Huang HD. 2011. Identify-ing transcriptional start sites of human microRNAs basedon high-throughput sequencing data. Nucleic Acids Res392: 9345–9356.

Cooper O, Hargus G, Deleidi M, Blak A, Osborn T, MarlowE, Lee K, Levy A, Perez-Torres E, Yow A, et al. 2010.Differentiation of human ES and Parkinson’s diseaseiPS cells into ventral midbrain dopaminergic neuronsrequires a high activity form of SHH, FGF8a and specificregionalization by retinoic acid. Mol Cell Neurosci 45:258–266.

Croft D, O’Kelly G, Wu G, Haw R, Gillespie M, Matthews L,Caudy M, Garapati P, Gopinath G, Jassal B, et al. 2011.Reactome: A database of reactions, pathways and biolog-ical processes. Nucleic Acids Res 39: D691–D697.

Dauer W, Przedborski S. 2003. Parkinson’s disease: Mecha-nisms and models. Neuron 39: 889–909.

Dawson TM, Dawson VL. 2003. Molecular pathways of neu-rodegeneration in Parkinson’s disease. Science 302: 819–822.

Edwards YJ, Beecham GW, Scott WK, Khuri S, Bademci G,Tekin D, Martin ER, Jiang Z, Mash DC, ffrench-Mullen J,et al. 2011. Identifying consensus disease pathways inParkinson’s disease using an integrative systems biologyapproach. PLoS ONE 6: e16917.

Farrer M, Maraganore DM, Lockhart P, Singleton A, LesnickTG, de Andrade M, West A, de Silva R, Hardy J, Hernan-dez D. 2001. a-Synuclein gene haplotypes are associatedwith Parkinson’s disease. Hum Mol Genet 10: 1847–1851.

Feany MB, Bender WW. 2000. A Drosophila model of Par-kinson’s disease. Nature 404: 394–398.

Gatza ML, Lucas JE, Barry WT, Kim JW, Wang Q, CrawfordMD, Datto MB, Kelley M, Mathey-Prevot B, Potti A, et al.2010. A pathway-based classification of human breastcancer. Proc Natl Acad Sci 107: 6994–6999.

Hastie T, Tibshirani R, Friedman J. 2009. The elements ofstatistical learning: Data mining, inference, and prediction,2nd ed. Springer, New York.

Hawkes CH, Del Tredici K, Braak H. 2010. A timeline forParkinson’s disease. Parkinsonism Relat Disord 16: 79–84.

Hu Z, Hung JH, Wang Y, Chang YC, Huang CL, Huyck M,DeLisi C. 2009. VisANT 3.5: Multi-scale network visual-ization, analysis and inference based on the gene ontol-ogy. Nucleic Acids Res 37: W115–W121.

Imming P, Sinning C, Meyer A. 2006. Drugs, their targetsand the nature and number of drug targets. Nat Rev DrugDiscov 5: 821–834.

International HapMap Consortium. 2005. Haplotype mapof the human genome. Nature 437: 1299–1320.

International Parkinson Disease Genomics Consortium,Nalls MA, Plagnol V, Hernandez DG, Sharma M, SheerinUM, Saad M, Simon-Sanchez J, Schulte C, Lesage S, et al.2011. Imputation of sequence variants for identificationof genetic risks for Parkinson’s disease: A meta-analysis ofgenome-wide association studies. Lancet 377: 641–649.

International Parkinson’s Disease Genomics Consortium(IPDGC); Wellcome Trust Case Control Consortium 2(WTCCC2). 2011. A two-stage meta-analysis identifiesseveral new loci for Parkinson’s disease. PLoS Genet 7:e1002142.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M.2004. The KEGG resource for deciphering the genome.Nucleic Acids Res 32: D277–D280.

Kim TY, Kim HU, Lee SY. 2010. Data integration and anal-ysis of biological networks. Curr Opin Biotechnol 21:78–84.

Kitada T, Asakawa S, Hattori N, Matsumine H, Yamamura Y,Minoshima S, Yokochi M, Mizuno Y, Shimizu N. 1998.Mutations in the parkin gene cause autosomal recessivejuvenile parkinsonism. Nature 392: 605–608.

Kreeger PK, Lauffenburger DA. 2010. Cancer systems biol-ogy: A network modeling perspective. Carcinogenesis 31:2–8.

Lander ES. 2011. Initial impact of the sequencing of thehuman genome. Nature 470: 187–197.

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC,Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W,

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 13

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 14: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

et al. 2001. Initial sequencing and analysis of the humangenome. Nature 409: 860–921.

Manolio TA, Collins FS, Cox NJ, Goldstein DB, HindorffLA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR,Chakravarti A, et al. 2009. Finding the missing heritabil-ity of complex diseases. Nature 461: 747–753.

Maraganore DM, de Andrade M, Elbaz A, Farrer MJ, Ioan-nidis JP, Kruger R, Rocca WA, Schneider NK, Lesnick TG,Lincoln SJ, et al. 2006. Collaborative analysis of a-synu-clein gene promoter variability and Parkinson disease.JAMA 296: 661–670.

Mardis ER. 2006. Anticipating the 1,000 dollar genome.Genome Biol 7: 112.

Masliah E, Rockenstein E, Veinbergs I, Mallory M, Hashi-moto M, Takeda A, Sagara Y, Sisk A, Mucke L. 2000.Dopaminergic loss and inclusion body formation in a-synuclein mice: Implications for neurodegenerative dis-orders. Science 287: 1265–1269.

Meyerson M, Gabriel S, Getz G. 2010. Advances in under-standing cancer genomes through second-generation se-quencing. Nat Rev Genet 11: 685–696.

Moore JH. 2010. Detecting, characterizing, and interpretingnonlinear gene-gene interactions using multifactor di-mensionality reduction. Adv Genet 72: 101–116.

Nikitin A, Egorov S, Daraselia N, Mazo I. 2003. Pathwaystudio—the analysis and navigation of molecular net-works. Bioinformatics 19: 2155–2157.

Ochs MF, Karchin R, Ressom H, Gentleman R. 2011. Iden-tification of aberrant pathway and network activity fromhigh-throughput data—workshop introduction. Pac SympBiocomput 2011: 364–368.

Paisan-Ruiz C, Jain S, Evans EW, Gilks WP, Simon J, van derBrug M, Lopez de Munain A, Aparicio S, Gil AM, KhanN, et al. 2004. Cloning of the gene containing mutationsthat cause PARK8-linked Parkinson’s disease. Neuron 44:595–600.

Park IH, Arora N, Huo H, Maherali N, Ahfeldt T, Shima-mura A, Lensch MW, Cowan C, Hochedlinger K, DaleyGQ. 2008. Disease-specific induced pluripotent stemcells. Cell 134: 877–886.

Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnala-gadda CK, Surendranath V, Niranjan V, Muthusamy B,Gandhi TK, Gronborg M, et al. 2003. Development ofhuman protein reference database as an initial platformfor approaching systems biology in humans. Genome Res13: 2363–2371.

Pollack A. 2011. DNA sequencing caught in deluge of data.The New York Times, November 30.

Polymeropoulos MH, Lavedan C, Leroy E, Ide SE, DehejiaA, Dutra A, Pike B, Root H, Rubenstein J, Boyer R, et al.1997. Mutation in the a-synuclein gene identified infamilies with Parkinson’s disease. Science 276: 2045–2047.

Qiu YQ, Zhang S, Zhang XS, Chen L. 2009. Identifyingdifferentially expressed pathways via a mixed integer lin-ear programming model. IET Syst Biol 3: 475–486.

Reich DE, Lander ES. 2001. On the allelic spectrum of hu-man disease. Trends Genet 17: 502–510.

Ressom HW, Varghese RS, Zhang Z, Xuan J, Clarke R. 2008.Classification algorithms for phenotype prediction in ge-nomics and proteomics. Front Biosci 13: 691–708.

Russ AP, Lampel S. 2005. The druggable genome: An update.Drug Discov Today 10: 1607–1610.

Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR,Fiddes CA, Hutchison CA, Slocombe PM, Smith M.1977. Nucleotide sequence of bacteriophage phi X174DNA. Nature 265: 687–695.

Schadt EE. 2009. Molecular networks as sensors and driversof common human diseases. Nature 461: 218–223.

Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, HannayT, Buetow KH. 2009. PID: The Pathway Interaction Da-tabase. Nucleic Acids Res 37: D674–D679.

Simon-Sanchez J, Schulte C, Bras JM, Sharma M, Gibbs JR,Berg D, Paisan-Ruiz C, Lichtner P, Scholz SW, HernandezDG, et al. 2009. Genome-wide association study revealsgenetic risk underlying Parkinson’s disease. Nat Genet 15:15.

Singh A, Greninger P, Rhodes D, Koopman L, Violette S,Bardeesy N, Settleman J. 2009. A gene expression signa-ture associated with “K-Ras addiction” reveals regulatorsof EMTand tumor cell survival. Cancer Cell 15: 489–500.

Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. 2010.Cytoscape 2.8: New features for data integration and net-work visualization. Bioinformatics 27: 431–432.

Sun YV. 2010. Multigenic modeling of complex disease byrandom forests. Adv Genet 72: 73–99.

Tan CS, Bodenmiller B, Pasculescu A, Jovanovic M, Hen-gartner MO, Jorgensen C, Bader GD, Aebersold R, Paw-son T, Linding R. 2009. Comparative analysis reveals con-served protein phosphorylation networks implicated inmultiple diseases. Sci Signal 2: ra39.

Tarca AL, Carey VJ, Chen XW, Romero R, Draghici S. 2007.Machine learning and its applications to biology. PLoSComput Biol 3: e116.

Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C,Faria D, Bull S, Pawson T, Morris Q, Wrana JL. 2009.Dynamic modularity in protein interaction networkspredicts breast cancer outcome. Nat Biotechnol 27:199–204.

Valente EM, Bentivoglio AR, Dixon PH, Ferraris A, IalongoT, Frontali M, Albanese A, Wood NW. 2001. Localizationof a novel locus for autosomal recessive early-onset par-kinsonism, PARK6, on human chromosome 1p35–p36.Am J Hum Genet 68: 895–900.

Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J,Haussler D, Stuart JM. 2010. Inference of patient-specificpathway activities from multi-dimensional cancer geno-mics data using PARADIGM. Bioinformatics 26: 237–245.

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, SuttonGG, Smith HO, Yandell M, Evans CA, Holt RA, et al.2001. The sequence of the human genome. Science 291:1304–1351.

Wang CC, Cirit M, Haugh JM. 2009. PI3K-dependent cross-talk interactions converge with Ras as quantifiable inputsintegrated by Erk. Mol Syst Biol 5: 246.

Wang J, Zhang Y, Marian C, Ressom HW. 2012. Identifica-tion of aberrant pathways and network activities fromhigh-throughput data. Brief Bioinform doi: 10.1093/bib/bbs001.

Wermuth CG, Ganellin CR, Lindberg P, Mitscher LA. 1998.Glossary of terms used in medicinal chemistry (IUPAC

S.W. Scholz et al.

14 Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 15: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

Recommendations 1998). Pure Appl Chem 70: 1129–1143.

Wernig M, Zhao JP, Pruszak J, Hedlund E, Fu D, Soldner F,Broccoli V, Constantine-Paton M, Isacson O, Jaenisch R.2008. Neurons derived from reprogrammed fibroblastsfunctionally integrate into the fetal brain and improvesymptoms of rats with Parkinson’s disease. Proc NatlAcad Sci 105: 5856–5861.

Wu G, Feng X, Stein L. 2010. A human functional proteininteraction network and its application to cancer dataanalysis. Genome Biol 11: R53.

Yeger-Lotem E, Riva L, Su LJ, Gitler AD, Cashikar AG, KingOD, Auluck PK, Geddie ML, Valastyan JS, Karger DR,

et al. 2009. Bridging high-throughput genetic and tran-scriptional data reveals cellular responses to a-synucleintoxicity. Nat Genet 41: 316–323.

Zheng B, Liao Z, Locascio JJ, Lesniak KA, Roderick SS, WattML, Eklund AC, Zhang-James Y, Kim PD, Hauser MA,et al. 2010. PGC-1a, a potential therapeutic target forearly intervention in Parkinson’s disease. Sci Transl Med2: 52ra73.

Zimprich A, Biskup S, Leitner P, Lichtner P, Farrer M, Lin-coln S, Kachergus J, Hulihan M, Uitti RJ, Calne DB, et al.2004. Mutations in LRRK2 cause autosomal-dominantparkinsonism with pleomorphic pathology. Neuron 44:601–607.

Genomics and Bioinformatics of Parkinson’s Disease

Cite this article as Cold Spring Harb Perspect Med 2012;2:a009449 15

ww

w.p

ersp

ecti

vesi

nm

edic

ine.

org

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from

Page 16: Genomics and Bioinformatics of Parkinson’s Diseaseperspectivesinmedicine.cshlp.org/content/2/7/a009449.full.pdf · Genomics and Bioinformatics of Parkinson’s Disease Sonja W.

15, 20122012; doi: 10.1101/cshperspect.a009449 originally published online MayCold Spring Harb Perspect Med 

 Sonja W. Scholz, Tim Mhyre, Habtom Ressom, Salim Shah and Howard J. Federoff Genomics and Bioinformatics of Parkinson's Disease

Subject Collection Parkinson's Disease

Functional Neuroanatomy of the Basal Ganglia

ObesoJosé L. Lanciego, Natasha Luquin and José A. Dysfunction in Parkinson's Disease

as a Model to Study MitochondrialDrosophila

Ming Guo

GeneticsAnimal Models of Parkinson's Disease: Vertebrate

DawsonYunjong Lee, Valina L. Dawson and Ted M. Pathways

and DJ-1 and Oxidative Stress and Mitochondrial Parkinsonism Due to Mutations in PINK1, Parkin,

Mark R. CooksonInnate Inflammation in Parkinson's Disease

V. Hugh PerryProgrammed Cell Death in Parkinson's Disease

Katerina Venderova and David S. Park

NeuropathologyParkinson's Disease and Parkinsonism:

Dennis W. DicksonDiseaseGenomics and Bioinformatics of Parkinson's

al.Sonja W. Scholz, Tim Mhyre, Habtom Ressom, et

Parkinson's DiseasePhysiological Phenotype and Vulnerability in

Sanchez, et al.D. James Surmeier, Jaime N. Guzman, Javier

DiseaseMotor Control Abnormalities in Parkinson's

CortésPietro Mazzoni, Britne Shabbott and Juan Camilo

ManagementFeatures, Diagnosis, and Principles of Clinical Approach to Parkinson's Disease:

João Massano and Kailash P. Bhatia

Parkinson's Disease: Gene Therapies

Patrick AebischerPhilippe G. Coune, Bernard L. Schneider and

The Role of Autophagy in Parkinson's DiseaseMelinda A. Lynch-Day, Kai Mao, Ke Wang, et al.

Functional Neuroimaging in Parkinson's Disease

EidelbergMartin Niethammer, Andrew Feigin and David

Parkinson's DiseaseDisruption of Protein Quality Control in

PetrucelliCasey Cook, Caroline Stetler and Leonard

Key QuestionsLeucine-Rich Repeat Kinase 2 for Beginners: Six

Lauren R. Kett and William T. Dauer

http://perspectivesinmedicine.cshlp.org/cgi/collection/ For additional articles in this collection, see

Copyright © 2012 Cold Spring Harbor Laboratory Press; all rights reserved

on May 24, 2020 - Published by Cold Spring Harbor Laboratory Press http://perspectivesinmedicine.cshlp.org/Downloaded from