GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMA · PDF fileGENOME-WIDE GENETIC...
Transcript of GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMA · PDF fileGENOME-WIDE GENETIC...
GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMAUSING EXPERT KNOWLEDGE DERIVED FROM HUMAN PHENOTYPE
NETWORKS
TING HU†, CHRISTIAN DARABOS†, MARIA E. CRICCO, EMILY KONG, JASON H. MOORE
Institute for the Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth CollegeHanover, NH 03755, U.S.A.
E-mail: [email protected]
The large volume of GWAS data poses great computational challenges for analyzing geneticinteractions associated with common human diseases. We propose a computational framework forcharacterizing epistatic interactions among large sets of genetic attributes in GWAS data. We buildthe human phenotype network (HPN) and focus around a disease of interest. In this study, we usethe GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filterto prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify asignificant connected network of pairwise epistatic interactions among the prioritized SNPs. Theseclearly highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs byquantifying structural network characteristics. Through functional annotation of these key SNPsusing Biofilter, a software accessing multiple publicly available human genetic data sources, wefind supporting biomedical evidences linking glaucoma to an array of genetic diseases, proving ourconcept. We conclude by suggesting hypotheses for a better understanding of the disease.
Keywords: GWAS; Epistasis; Gene-gene interaction; Human phenotype network; Statistical epistasisnetwork; Biofilter; Eye diseases; Glaucoma; SNPs; Pathways.
1. Introduction
With the rapid development of genotyping technologies and exponential increase in computa-tional power, we are able to leverage the wealth of genome-wide association studies (GWAS)data to test millions of genetic variations (single nucleotide polymorphisms, SNPs) for theirassociations with common human diseases.1,2 However, common disorders are usually theresult of the non-linear combined effect of many variations. The study of these complex,epistatic interactions among multiple genetic attributes is crucial in explaining portions of thegenotype-to-phenotype associations.3,4
Detecting and analyzing complex genetic interactions in GWAS data pose great statisticaland computational challenges. Epistatic effects potentially involve any number of SNPs, froma couple to hundreds or even thousands. In turn, the uncertainty of size of the interacting SNPset would require interaction studies to comprehensively explore all possible combinations ofSNPs. A typical GWAS dataset encompasses hundreds of thousands of variations, makingthe enumeration of all pairwise SNP interactions computationally infeasible, and the com-putational cost increases exponentially with the number of features at hand. This inevitablybecomes a computational bottleneck of processing and analyzing GWAS data.
Therefore, genetic interaction studies meant to process modern GWAS data require thedevelopment of advanced informatics methodologies. These novel algorithms must be able to
†Co-first authors – TH and CD have contributed equally to this work.
simultaneously analyze large sets of interactions and identify genetic attributes potentiallyinvolved in higher-order interactions. Network modeling has emerged as a suitable frameworkfor such purposes,5–7 thanks to its ability to represent a large number of entities, as vertices,and their relationships, as edges.
In this study, we propose an informatics framework of detecting and analyzing genetic in-teractions for GWAS data. We use the example of the GLAUGEN study, a GWAS on glaucomaconsisting of over a million attributes. We pre-screen the dataset using a knowledge-based fil-ter by building human phenotype network (HPN) to prioritize SNPs and reduce the searchspace according to their known relationship to the disease/phenotype in question. This re-duced SNP list is then used to quantitatively evaluate all pairwise epistatic interactions andto build statistical epistasis network (SEN) to characterize their global interaction structure.The key SNPs identified by the SEN are functionally annotated and evaluated for their inter-action with other disorders. This new framework has the potential to elucidate large parts ofthe complex genetic architecture of common human diseases, such as glaucoma, and identifykey SNPs for further biological validations.
2. Dataset and Methods
2.1. Glaucoma and the GLAUGEN Study
Glaucoma, a neurodegenerative disease, is the primary cause of irreversible blindness, affectingover 60 million people worldwide. The most common kind of glaucoma, in all populations, isprimary open-angle glaucoma (POAG). Currently the only modifiable risk factor of POAGis intraocular pressure (IOP), but even lowering that will only slow the process, not stop it.8
However, a substantial amount of POAG has been shown to have a genetic basis. Familialaggregation of POAG has been long recognized and studied to find multiple loci linked tothem, causing the discovery of glaucoma-causing genes myocilin (MYOC), optineurin (OPTN)and WDa-repeat domain 36.9 About 5% of POAG is presently attributed to a single-gene orMendelian forms of glaucoma. More cases of POAG are caused by the combined epistaticeffects of many genetics risk factors.10
The Glaucoma Gene Environment (GLAUGEN) seeks to illuminate the origin of the dis-ease, to discover genetic loci associated with POAG, and to identify gene-gene and gene-environment associations.8 GLAUGEN is a GWAS, case-control study, with about 2,000 un-related cases and over 2,300 controls. To be involved in the study, both the cases and thecontrols had to be at least 40 years of age and European derived or Hispanic Caucasian.Subjects were genotyped with 1,048,965 SNPs examined.8
2.2. Human Phenotype Network (HPN)
The human disease network,11 or the more general human phenotype networks (HPNs),12,13
are mathematical graph models where nodes represent human genetic disorders and edgeslink those nodes with shared biology.14 The underlying connections of the HPN contribute to
atryptophan-aspartic acid
Permanent tooth development
Response to platinum-based chemotherapy in non-small-cell lung cancer
Ovarian reserve
beta Carotene
Heart Function Tests
Odontogenesis
Behcet's disease
Membranous Glomerulonephritis
Pulse
Serum uric acid levels
Response to TNF antagonist treatmentWalking
PR interval
Heart Atria
Alanine Transaminase
Tooth EruptionAcenocoumarol
Emphysema
Gallbladder cancer
Left Atrial Function
Brain structure
Lupus Vulgaris
Interleukin-1beta
Cyclic Peptides
bipolar disorder and depression (combined) Schizophrenia
Testicular Neoplasms
Parathyroid Hormone
1-Alkyl-2-acetylglycerophosphocholine Esterase
Electrocardiography
Sj�gren's syndrome
Lung cancer
Frontal Lobe
External Ear
Open-Angle Glaucoma
Dupuytren's disease
Exfoliation Syndrome
Phytosterols
Speech Perception
alpha-Macroglobulins
vWF and FVIII levels
Subcutaneous Fat
Glucose Tolerance Test
Adverse response to carbamapezine
Urinary Bladder Neoplasms
Hair
Paget's disease
Vascular endothelial growth factor levelsSelf-reported allergy
Primary biliary cirrhosis
5-HTT brain serotonin transporter levels
Bladder cancer
Coronary spasm
oxaliplatin [Supplementary Concept] Telomere length
Respiratory Function TestsBrain imaging
Menarche and menopause (age at onset) Cystatin CSepsis
Precursor Cell Lymphoblastic Leukemia-Lymphoma
Menarche
Nevus count
Citalopram
Carotid Artery Diseases
Bone Density
Atrial Natriuretic Factor
Lymphocytic Leukemia
Eye color
Ankle Brachial Index
Thyroid cancer
Idiopathic pulmonary fibrosis
Vitamin B 12
Common traits (Other)
Sleep
Forced Expiratory Volume
Aging traits
Depression
Gallbladder Neoplasms
White matter integrity
Esophageal cancer
Endometriosis
Ileal carcinoids
Major CVD
T-Lymphocytes
Atherosclerosis
Digit length ratio
Interleukin-6
Tumor biomarkers
Coronary Vasospasm
Carbamazepine
Biliary Liver CirrhosisBrain Mapping
Thyroid Neoplasms
Cytomegalovirus Vaccines
Soluble levels of adhesion molecules
Alcohol Drinking Sex Hormone-Binding Globulin
Thyrotoxic hypokalemic periodic paralysis
Quantitative traits
HIV-1 viral setpoint
Neurobehavioral Manifestations
Luteinizing Hormone
Glycoproteins
Pancreatic cancer
Optic Nerve
Antidepressive Agents
Urinary albumin excretion
Hodgkin Disease
CD8-Positive T-Lymphocytes
Partial Thromboplastin Time
Astigmatism
Myeloid Leukemia
Ferritins
Colorectal cancer
Primary Ovarian Insufficiency
Palmitic acid (16:0) plasma levels
Memory
Social Desirability
Apolipoproteins C
Non-albumin protein levels
Hoarding
Amygdala
Occipital Lobe
carbohydrate-deficient transferrin [Supplementary Concept]
Meningococcal disease
Aromatase Inhibitors
Nasopharyngeal Neoplasms
Cognitive test performance
Anemia
Cornea
Lymphoid Leukemia
Eosinophils
Visual Cortex
Benzodiazepines
Anthropometric traits
Metabolic Syndrome XEndometrial cancer
Perphenazine
Chemokines
Germinoma
Tumor Necrosis Factor-alpha
Sneezing
Cardiac Arrhythmias
Response to gemcitabine in pancreatic cancer
Movement Disorders
Select biomarker traits
Thoracic Aortic Aneurysm
Resting heart rate
Glycemic traits
Endometrial NeoplasmsCardiovascular Diseases
Tamoxifen
lipoprotein A-I [Supplementary Concept]
Intestinal Diseases
Maximal Midexpiratory Flow Rate
Extraversion (Psychology)
Pain
Ulcerative colitis
AIDS
Chronic Hepatitis B
Glucosylceramides
Dietary macronutrient intake
Carotid intima media thicknessDysplastic Nevus Syndrome
Migraine Disorders
Crohn's disease and celiac disease
Mean forced vital capacity from 2 exams
Tetralogy of Fallot
Celiac disease and Rheumatoid arthritis
Insulin-Like Growth Factor I
QT interval
Hyperkalemic Periodic Paralysis
Vital Capacity
Butyrylcholinesterase
Urinalysis
Cystatins
Beta-2 microglubulin plasma levels
gamma-Glutamyltransferase
Brain Natriuretic Peptide
Blood Glucose
Blue vs. brown eyes
Allopurinol
Complement C4b-Binding Protein
Reading and spelling
Proinsulin levels
Hyperactive-impulsive symptoms
Vascular Calcification
Obesity
Cortical thickness
Vitamin K
Oligospermia
Neurotic Disorders
Cardiovascular disease risk factors
Hormone measurements
Braces
Weight
Inflammatory biomarkers
Transforming Growth Factor beta1
Blood Platelets
Suntan
Gastric cancer
Cleft Lip
Interleukin-6 Receptors
T-tau
Personality dimensions
Uterine fibroids
Folic Acid
Intercellular Adhesion Molecule-1
L-Lactate Dehydrogenase
Skin pigmentation
Basal Cell Carcinoma
Word reading
Caffeine
Protein quantitative trait loci
Erectile Dysfunction
Follicle Stimulating Hormone
Primary sclerosing cholangitis
Isoxazoles
Breast Neoplasms
Morbidity-free survival
Insulin-Like Growth Factor Binding Protein 3
Interleukin-2 Receptor alpha Subunit
Migraine
Stearic acid (18:0) plasma levels
Folate pathway vitamin levels
Cardiac hypertrophy
Behavioural disinhibition (generation interaction)
Muscular Diseases
Monocyte count
Iris
Cervical cancer
Aorta
Chronic Kidney Failure
LDL Lipoproteins
Fuchs Endothelial Dystrophy
Hair Color
Polysaccharides
Hepatitis B Vaccines
Glaucoma
Corneal Topography
Eye
Alpha-Globulins
Hippocampus
Brain Waves
Optic Disk
Prostate cancer
Anti-cyclic Citrullinated Peptide Antibody
Intracranial volume
Reasoning
Two-hour glucose challenge
Inattentive symptoms
Circulating cell-free DNA
Floxacillin
Jaw Abnormalities
Cognitive decline
Papillomaviridae
Fibromyoma Myoglobin
Alopecia
Vitamin A
Central Nervous System
Vitamin B 6
Hippocampal volume
Apolipoproteins E
Acquired Immunodeficiency SyndromeAngiotensin-converting enzyme activity
Glomerulosclerosis
human [Supplementary Concept] F8 protein
Leptin
HIV-1 susceptibility
Blood trace element (Zn levels)
Chemokine CCL2
Serotonin Plasma Membrane Transport Proteins
MelanomaIGA Glomerulonephritis
InsulinEpirubicin-induced leukopenia
Lung Neoplasms
Allergic Rhinitis
Acetaminophen
Metabolic traits
Face
Aging
Neuroanatomy
SmokingDepressive Disorder
Transferrin
Blood pressure measurement (cold pressor test)
Inflammatory bowel disease
mean corpuscular hemoglobin concentration
Dengue shock syndrome
Autistic Disorder
Gallbladder Diseases
IgE levels
Colony-Forming Units Assay
alpha-Tocopherol
Red blood cell count
Chronic myeloid leukemia
Blood Flow Velocity
Esophagitis
Tissue Plasminogen Activator
Endothelial function traits
Response to acetaminophen (hepatotoxicity)
Smoking behavior
BrainNeutrophils
Common Carotid Artery
Temporal Lobe
Dengue Hemorrhagic Fever
Pure-Tone Audiometry
Abdominal Aorta
Taste
InflammationVenous thromboembolism
Antipsychotic Agents
Testicular cancer
Caffeine consumption
Cytomegalovirus antibody response
Intuition
Chronic hepatitis B infection
Migraine without aura
AsthmaKidney stones
Crohn Disease
Myeloproliferative Disorders
Follicule stimulating hormone
Liver Cirrhosis
Oocytes
Chemokine CCL4
Gout
Interleukin-12
Non-substance related behavioral disinhibition
Metabolism
Eye color traits
Transferrin Receptors
RisperidoneAddiction
Circadian Rhythm
Nasopharyngeal carcinoma
Adverse response to aromatase inhibitors
Subclinical atherosclerosis traits (other)
IgA levels
Sphingomyelins
Sudden Death
Osteitis Deformans
Creutzfeldt-Jakob Syndrome
End-stage coagulation
Rhabdomyolysis
Keratoconus
Cardiomegaly
Vascular Diseases
von Willebrand Factor
Neuropsychological Tests
Psychological Sexual Dysfunctions
Warfarin
Myopia
Soluble ICAM-1
Leukopenia
Stevens-Johnson Syndrome
Hearing Loss
Left Ventricular Function
Cardiac structure and function
Response to iloperidone treatment (PANSS-T score)
Autism
Prion diseases
Pancreatic Neoplasms
Alkaline Phosphatase
Anxiety
N-glycan levels
Multiple myeloma
Goiter
Event-related brain oscillations
Dialysis-related mortality
Allergic sensitization
Response to Vitamin E supplementation
Mental Disorders
Neuranatomic and neurocognitive phenotypes
Functional Laterality
Recombination rate (males)
Facial morphology
Foot
Hand Strength
Pulmonary function
Carotid Stenosis
Amyloid beta-Peptides
Response to platinum-based agents
Anthropometry
Religion and Psychology
Melanosis
Angiotensin-Converting Enzyme Inhibitors
Thyroid volume
Hodgkin's lymphoma
Retinal Vein
Interleukin-10
Echocardiographic traits
Androgen levelsTelomere
Response to cerivastatin
CerebrumSclerosis
Illicit drug useMyasthenia gravis
Heart rate variability traits
Gambling
Response to radiation
Immunoglobulin E
Thyrotropin
Mortality
Brugada syndromeCeliac disease
Blood Viscosity
Phospholipids
Mortality among heart failure patients
tau Proteins
Volumetric brain MRI
P-tau181p
Small Cell Lung Carcinoma
Adiponectin
Aortic root size
Systemic lupus erythematosus and Systemic sclerosis
Emphysema-related traits
gemcitabine [Supplementary Concept]
Albuminuria
Response to antineoplastic agents
Prostatic Neoplasms
Protein C
DehydroepiandrosteroneTanning
Kidney Diseases
Colonic Neoplasms
Working memory
Vascular Endothelial Growth Factor A
Brachial Artery
Oleic acid (18:1n-9) plasma levels
Essential Tremor
Iris characteristics
Caudate nucleus volume
Lipid Metabolism
Palmitoleic acid (16:1n-7) plasma levels
Hypothyroidism
Substance-Related Disorders
Gestational Diabetes
Paget's disease
Forced Vital Capacity
Angiography
Azoospermia
Hair morphology
Interleukin-18
Glioma
Tobacco Use Disorder
Head and Neck Neoplasms
Hematological and biochemical traits
Rheumatoid arthritis
Esophageal adenocarcinoma
Menopause
Focal Segmental Glomerulosclerosis
Hearing impairment
Drug-Induced Liver Injury
Iron deficiency
Esophageal Neoplasms
Interleukin-8
Psychomotor Performance
Blood Cells
Stomach Neoplasms
Personality
Sclerosing Cholangitis
Breast size
Metabolome
Anticonvulsants
Homocysteine
Attention Deficit Disorder with Hyperactivity
Serum selenium levels
Sunburns
Blood Sedimentation
Blood Vessels
Renal sinus fat
Clozapine
Macular Degeneration
Radiation response
Ovarian cancer
Autoimmune Diseases
Urinary metabolites
Short-Term Memory
Sensory disturbances after bilateral sagittal split ramus osteotomy
Colorectal Neoplasms
Thyroxine
Cleft Palate
Aspartate aminotransferase
Entorhinal Cortex
Acute lung injury
Vitamin K 1
Graves Disease Atrioventricular conduction
Vitamin E
Natriuretic peptide levels
Cortical structure
Erythrocyte Indices
Kawasaki disease
Nephropathy
DNA
radiation-related) Thyroid cancer (Papillary
Functional MRI
Kidney Calculi
Duodenal Ulcer
Coffee
Internal Carotid Artery
Pain Measurement
HIV-1
Handedness in dyslexia
Nevi and Melanomas
Migraine - clinic-based
Addictive Behavior
LongevityMonocyte early outgrowth colony forming units
Drug Toxicity
Vascular constriction
Chronic kidney disease
Insulin-Like Growth Factor Binding Protein 4
Parietal Lobe
Serum dimethylarginine levels (symmetric)
Comprehensive strength and appendicular lean mass
Aortic-valve calcification
Caudate Nucleus
Erythropoietin
Immunoglobulin A
Venous Thrombosis
Platelet function and related traits
Glucose Transporter Type 2
Biomedical quantitative traits
Exploratory Behavior
Soluble E-selectin levels
Marijuana Abuse
Mucocutaneous Lymph Node Syndrome
Osteoporosis
Attention Deficit and Disruptive Behavior Disorders
Insulin-Like Growth Factor Binding Protein 5
Antipsychotic-induced QTc interval prolongation
Asperger SyndromeLymphocytes
Sleep duration
Moyamoya disease
Survival
Infantile hypertrophic pyloric stenosis
E-Selectin
Carotid Arteries
Retinol levels
Fuchs's corneal dystrophy
gamma-Glutamylcyclotransferase
Ovarian NeoplasmsFructose-Bisphosphate Aldolase
Meningococcal Infections
Apolipoprotein A-I
Peroxidase
Intra-Abdominal Fat
Abdominal Fat
Coronary Disease
Vitamin D
Thyroid hormone levels
Blue vs. green eyes
Sphingolipids
Hypertrophic Pyloric Stenosis
Upper aerodigestive tract cancers
Response to tamoxifen in breast cancer
Life Expectancy
Ceramides
Serum alkaline phosphatase levels
β2-Glycoprotein I (β2-GPI) plasma levels
Progranulin levels
Insulin-like growth factors
Adverse response to lamotrigine and phenytoin
Dehydroepiandrosterone Sulfate
Dupuytren Contracture
Coronary restenosis
Abdominal Aortic Aneurysm
Male fertility
Leukocyte Count
fibrin fragment D [Supplementary Concept]
CD40 Ligand
Systemic Scleroderma
Apolipoprotein Levels
Chronic lymphocytic leukemia
Retinal vascular caliber
Behavior
Carcinoid Tumor
Heart Diseases
Interleukin 1 Receptor Antagonist ProteinBleomycin sensitivity
total Cholesterol
Iris color
Response to treatment for acute lymphoblastic leukemia
Corneal curvature
Iron-Regulatory Proteins
Coffee consumption
Pubertal anthropometrics
Blood Proteins
Barrett's esophagus
Treatment response for severe sepsis
Non-Small-Cell Lung Carcinoma
Genetic Recombination
Lymphoma
Migraine with aura
Digestive system disease (Barrett's esophagus and esophageal adenocarcinoma combined)
Haptoglobins
Blood Coagulation Factor Inhibitors
P-Selectin
Information processing speed
Circulating vasoactive peptide levels
Hepatocellular carcinoma
Type 2 diabetesChemerin levels
Hepatitis B vaccine response
0"
100"
200"
300"
1" 5" 9" 13" 17"
frequ
ency"
degree"
0.1$
1$
10$
100$
1000$
1$ 10$ 100$
Fig. 1. The SNP-based human phenotype network, filtered (edge weight cutoff = 0.002). Vertices are coloredby “modules”13 to increase readability. The vertex sizes are proportional to the number of associated SNPs.The degree distribution is given on both linear and logarithmic scales.
the understanding of the basis of disorders, which in turn leads to a better understanding ofhuman diseases.
In the present work, we rely on the SNP-based HPN, where connected diseases share com-mon SNPs. We use two sources for the phenotype-to-SNPs mapping: the National HumanGenome Research Institute GWAS catalog,15 a manually curated list of all the National Insti-tute of Heath (NIH) funded GWAS studies, and the NIH’s database of Genotypes and Phe-notypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap). To build the HPN, all traits listed inthe GWAS catalog and dbGaP are annotated with their risk-associated SNPs. All the traitsbecome nodes in our network. Then, traits that have one or more SNPs in common are linkedin the HPN by an edge, the weight of which is proportional to the size of the SNPs’ overlap.The GWAS catalog and dbGaP report 1,252 traits combined, annotated with 37,681 SNPs in16,411 loci. The resulting bipartite network is projected in the space of phenotype vertices toobtain the HPN shown in Fig. 1.
The HPN contains 985 vertices and over 26,000 edges, and encompasses all phenotypeslisted in the GWAS catalog and dbGaP, provided that they are connected to at least one othertrait. The resulting network is extremely dense, with an average degree greater than 500.
2.3. Statistical Epistasis Network (SEN)
Statistical epistasis network (SEN) is designed to study a global aggregation of pairwiseepistatic interactions among a large number of factors.16,17 First, pairwise epistatic interaction
is quantified using the information-theoretic measure called information gain18 for all possiblepairs of genetic attributes. Specifically, the genotypes of two SNPs A and B, and the discretephenotypic class C are regarded as variables. Mutual information I(A;C) or I(B;C) measuresthe shared information between (A or B) and C calculated as I(A;C) = H(A)+H(C)−H(A,C)
or I(B;C) = H(B) +H(C)−H(B,C), where entropy H measures the uncertainty of a randomvariable or joint multiple random variables. Therefore, mutual information I(A;C) or I(B;C)
describes the reduction of the uncertainty of the phenotypic class C given the knowledge ofthe genotype of A or B, known as the main effect of A or B on C. Similarly, the joint mutualinformation I(A,B;C) measures the total effect of combining A and B on explaining the phe-notype C. Subtracting the individual mutual information I(A;C) and I(B;C) from I(A,B;C)
captures the gained information on C by considering A and B together rather than individ-ually. This information gain IG(A;B;C) is a practical measure for the epistatic interactioneffect of SNPs A and B on phenotype C, and has been used as an efficient non-parametric andmodel-free statistical quantification of pairwise epistasis in genetic association studies.19–23
Second, all pairwise SNP-SNP interactions are quantified and ranked. We build statisticalepistasis networks where vertices are SNPs and edges are weighted pairwise interactions. Weonly include pairs of SNPs if their interaction strengths are stronger than a preset threshold.We analyze the network topological properties at each cutoff value of the threshold, such as thesize of the network (the number of its vertices and the number of its edges), the connectivity ofthe network (the size of its largest connected component), and its vertex degree distribution.
Then, a threshold of the pairwise interaction strength is determined systematically byfinding the cutoff where the topological properties of the real-data network differentiate themost from the null distribution of permutation testing. Such a SEN provides a significantglobal structure of clustered strong pairwise epistatic interactions associated with a particularphenotype. It serves as a map for further network properties investigation and key SNPsprioritization as discussed in the following section.
2.4. Network Property Analysis of SEN
The assortativity of a network measures the propensities of vertices with similar characteristicsto connect to one another.24,25 In the context of SENs, we are interested in looking into themain effect assortativity, i.e., whether there exists a correlation of main effects between pairsof interacting SNPs. Such main effect assortativity is calculated as the Pearson correlationcoefficient r of the main effects at either ends of an edge in a SEN,
r =M−1
∑i jiki − [M−1
∑iji+ki
2 ]2
M−1∑
ij2i +k
2i
2 − [M−1∑
iji+ki
2 ]2, (1)
where M is the total number of edges, and ji and ki are the main effects, calculated asthe mutual information of a SNP and the phenotypic class of the vertices (SNPs) at theends of the i-th edge, with i = 1, 2, ...,M . The coefficient r lies between -1 and 1, with r =
1 indicating perfectly assortative, r = 0 for non-assortative, and r = −1 for complete dis-assortative networks.
At the vertex level, centrality measures the importance of individual vertices in a network.The most commonly used centrality measure is the node’s degree CD(v), which is the total
Phenotype #SNPs DegreeExfoliation Syndrome 1 1Coronary Artery Disease 639 2Cardiovascular Diseases 70 2Corneal curvature 13 4Optic Disk 17 4Open-Angle Glaucoma 6 5Glioma 12 2Eye 13 4Glaucoma 18 9Coronary restenosis 56 3
2SHQ�$QJOH�*ODXFRPD
([IROLDWLRQ�6\QGURPH
&DUGLRYDVFXODU�'LVHDVHV
*ODXFRPD (\H
2SWLF�'LVN
&RURQDU\�$UWHU\�'LVHDVH
*OLRPD
&RURQDU\�UHVWHQRVLV
&RUQHDO�FXUYDWXUH
(a) (b)
Fig. 2. Glaucoma network neighborhood. (a) Phenotypes, including the number of SNPs associated witheach phenotype and the “strength” of the interaction (degree) with the rest of the sub-network. (b) a depictedrepresentation of the Glaucoma-centered sub-network, in which the node sizes are proportional to the numberof associated SNPs.
number of edges connected to vertex v. The degree of a SNP in the statistical epistasis networkshows the number of other SNPs with which it is interacting. Betweenness centrality is a moresophisticated metric that quantifies the number of times a vertex is part of the shortest pathbetween any pair of vertices,26 represented as CB(v) =
∑s 6=v 6=t∈V
σst(v)σst
, where σst is the totalnumber of shortest paths from vertex s to vertex t and σst(v) is the number of those pathsthat pass through vertex v. Closeness centrality, defined as CC(v) = 1∑
s 6=v dvs, where dvs is the
distance between vertices v and s.27,28 This metric describes how easily a given vertex canreach all other vertices in a network. In the context of SENs, centrality measures are used toidentify key SNPs that play an essential role in the global interaction structure.
3. Results
3.1. Pre-screening of Glaucoma Data Using HPN
To reduce the computational cost (in time and resources) of analyzing a GWAS dataset, thedata must be filtered, selected and/or prioritized prior to processing. Algorithmic filteringmethods fall into two main families: knowledge-driven and data-driven. Knowledge-drivenfilters relies on the actual known biology behind the data, whereas data-driven filtering solelyrelies on the statistical pre-processing of the data, regardless of their type. ReliefF, SURF,and TuRF29,30 are examples of data-driven statistical methods of preprocessing GWAS data.For a complete review of the knowledge-based vs. data-driven filtering, refer to Sun et al.31
The method we propose to use here starts with the full HPN described in Section 2.2.We identify the glaucoma vertex and establish a list of the disorders and phenotypes in itsimmediate neighborhood. The 10 members of the “glaucoma neighborhood” are presented intable in Fig. 2.
Using the sub-HPN, we compile a list of 948 SNPs that have been associated with anyof the phenotypes in the sub-network. We subsequently annotate each of the risk-associatedSNPs with its linkage disequilibrium32 SNPs, indirectly associated with an increased risk. Thefull list of 4,388 features containing both direct and indirect risk-associated SNPs is usedto filter the full one million SNPs of the GLAUGEN GWAS data. The subset made of theoverlap between our priority-list and the GLAUGEN dataset, composed of 2,380 SNPs, is
used to build the SNP interaction network presented in the following section.
3.2. Statistical Epistasis Network of Glaucoma
Using the list of 2,380 SNPs, we evaluate all the pairwise epistatic interactions (>2.8 million)using the information gain measure. The strongest interacting pair is SNPs rs13123790 andrs6572380 with a strength of 1.163% association of the disease class. We set the pairwiseinteraction strength threshold decreasing from the strongest observed value with a step of0.0001. At each cutoff, a SEN is built by including pairs of SNPs as edges and two endvertices if their interaction strength is above the cutoff. Then for each network we evaluate itstopological properties including the number of non-isolated vertices, the number of edges, thesum of weighted edges, and the size of its largest connected component. A 100-fold permutationtest assesses the significance of these observed network properties. We permute the data 100times by randomly shuffling the disease class column, and for each permuted dataset all thepairwise interaction strengths are calculated and networks are built using the same cutoffs asthe read-data networks.
We observe a network at cutoff 0.693% that has a largest connected component includingsignificantly more vertices (p = 0.05) than those permuted-data networks at the same cutoff.This largest connected component has 713 vertices and 789 edges (Fig. 3).
The main effect assortativity of this network is -0.053 with a significance of p = 0.04
using a 100-fold edge swapping permutation test where we randomly pick 10× |E| (|E| is thetotal number of edges) pairs of edges and swap their end vertices. This significant negativeassortativity indicates that the main effects of interacting SNPs are negatively correlated, i.e.,high main-effect vertices tend to interact more with low main-effect vertices.
The degree, betweenness, and closeness centralities of all the vertices in the network areevaluated, and their distributions are shown in Fig. 4. Most of vertices have degrees equal toor less than 3. However there are a minority vertices with a degree higher than 5, meaning thatthey interact with larger number of other SNPs. In the distribution of betweenness centrality,the majority of vertices have low betweenness. However, some vertices have a much higherbetweenness than the rest. The closeness centrality follows a normal distribution indicatingthat most vertices have similar access to all other vertices. We sort the vertices by descendingcentrality. At the top, we find the key SNPs that are potentially involved the genetic processesresponsible for glaucoma. We further investigate the clinical and biomedical implications ofthese SNPs through Biofilter33 functional annotations. Biofilter provides a single interfaceto multiple publicly available online human genetic datasources (https://ritchielab.psu.edu/software/biofilter-download).
4. Discussion
In this study, we proposed a genetic interaction analysis framework of using human phenotypenetwork (HPN) as knowledge-based data filter and subsequently statistical epistasis network(SEN) for quantitative epistatic interaction detection and visualization. Our method was suc-cessfully applied to GLAUGEN GWAS data and we were able to identify a connected networkof interacting SNPs that included significantly more SNPs than expected randomly. Such a
rs6571228
rs5768690
rs9941233
rs953259
rs956627
rs992014rs9366399
rs10814916
rs7678436
rs2822557
rs4640244
rs9841585
rs4369779
rs13266634
rs738494
rs10777084
rs2274557
rs1320288 rs3752591
rs13201929
rs10165941
rs5219
rs17141444
rs9563035
rs2079918
rs7466269
rs9557082
rs10505031 rs813164
rs10486771 rs7138792
rs1077165
rs4653061 rs10184275
rs3752823
rs2462140
rs10010325 rs10008860
rs10777845 rs245136
rs2277142 rs2277912
rs9374274
rs1401471
rs4743034 rs16872571
rs2605463
rs9472138
rs9349200
rs1570211
rs11669012
rs17706439
rs1771428
rs189798
rs6996971
rs1504749
rs7336332
rs1396477
rs12517906
rs1220025
rs11714343
rs11664670
rs150724
rs2449222
rs569688
rs7209435
rs930716
rs11580979 rs1533971
rs2732494
rs12076129 rs5015480
rs10505879
rs10508288
rs9935794
rs1816002 rs4455882
rs17103138
rs882300
rs6663404
rs2162256
rs11782819 rs284489
rs989488
rs9487094
rs10502372
rs10509779
rs2281680
rs12199222
rs4897336 rs7172432
rs4313471
rs2273194
rs4121677
rs7542900 rs7178902
rs10237118 rs11108037
rs739981
rs551719
rs1035025
rs688391rs2621469
rs6982250
rs2106913
rs3924153
rs1491192
rs1436900
rs1553847 rs1270837
rs756199
rs7017184 rs2291256
rs597425rs11730157
rs3125050 rs1490388
rs1568482
rs1524976
rs3825214 rs1251588
rs3748069
rs7914674
rs4371861
rs972869
rs891088
rs10848958
rs7525553
rs886374
rs892267
rs888663
rs894558
rs891835
rs896076
rs3916765
rs747175
rs353121
rs3743200
rs3743157 rs1858943
rs3745060
rs7171517
rs1935881
rs7249094
rs300483rs3805647
rs4537554
rs391300
rs1245531 rs3118914
rs173700rs2844479
rs11981985
rs7191006
rs710841
rs6885224
rs2149998
rs10499480
rs12502295
rs1037124
rs7318731
rs3766793
rs17034592
rs10468542 rs4242252
rs2580816 rs11856676
rs12741973
rs9449444
rs344907
rs10506546
rs1746048
rs7128948
rs2783169
rs17084051
rs6918956
rs923375
rs4503747
rs10906115
rs11154022
rs2286036
rs2782934
rs9946041 rs6764363
rs270504
rs10791993
rs2347824
rs1367158
rs1805015 rs1802295
rs4721923
rs2853552
rs2171
rs2849460
rs10518996
rs4722613
rs943855
rs10506627 rs1239947 rs7756992
rs944260
rs540937rs8048899
rs9326439
rs2522425
rs13097792
rs538827
rs4771591
rs282544
rs2015454
rs2829588 rs2522422
rs4672150
rs1558139
rs2750023
rs6020814
rs6504218
rs553346
rs10014145
rs322239
rs7743494
rs7502462
rs2670321
rs1532357
rs2948998
rs17171705
rs10424480
rs3827414
rs1944866
rs1949009
rs300493
rs6725887
rs7403531
rs10014970
rs4365558
rs11211081
rs7025486
rs13438327 rs334213
rs1880781
rs188019
rs1040046
rs9568793 rs1901103
rs8055236
rs535343
rs4951536
rs16852525
rs2024341
rs785422
rs517871
rs11131055
rs11123849
rs13077378
rs11708189
rs2074404
rs6585827
rs10502032
rs17286397
rs1950500
rs3825569
rs1945085
rs499818
rs11170631
rs7069291
rs1058793
rs2306374 rs7812590
rs1875206
rs1874326
rs2410501
rs2627775
rs727957
rs1950702
rs11122547
rs9626074
rs4793501
rs745557rs10829156
rs17356672
rs11926571 rs12592821
rs7954465
rs655487
rs1293151
rs1438342
rs12356233 rs10518527 rs2319517
rs1410518
rs10863936
rs1406961
rs804125
rs974110
rs11807640
rs10518622
rs7873584
rs7314446
rs2383208
rs6435396
rs3774937
rs11241936
rs9391221 rs560713
rs6499640
rs1847118 rs1407149
rs248471
rs1137
rs13012046
rs12576239
rs7833986
rs4982323 rs2478830
rs12150284
rs1229542 rs7669668
rs4842838 rs498872 rs1017307
rs1029973
rs11090754
rs7498403
rs6905922
rs10801913
rs3797235
rs6476642
rs928952
rs4792192
rs6129032
rs7810842
rs2385576
rs1399004
rs212062
rs11223996
rs11127920
rs10496091
rs13389923
rs1545552
rs1275419 rs1052131
rs1865640
rs2146758
rs1733724
rs1007000
rs10922273 rs2470984
rs1111233
rs6780569 rs10895547
rs1525361 rs10085496
rs10834309
rs11599750
rs925019
rs2853676
rs2876303
rs4379368
rs745324
rs6779469
rs2753268
rs1538138
rs1333040
rs4687161
rs17565975
rs914981
rs13123790 rs12651137
rs11921014 rs7239883
rs7669856
rs6572380
rs7754614 rs706080
rs10504130
rs7129429
rs9328448
rs9323914
rs1192415
rs4507859
rs6708562
rs2750024
rs1357523
rs9897551
rs2833836 rs9317297 rs10497323
rs1467242 rs1549250
rs1444439
rs7003283
rs1741344
rs6107955 rs7823896 rs11918414
rs779388 rs6010620
rs2171963
rs10509680
rs10510189
rs297367
rs7084752
rs7481311 rs1057510 rs1540923
rs918911
rs16852600
rs7792939
rs9498099 rs766992
rs2123808
rs4800148
rs2142219 rs1039524
rs2221618
rs1151008
rs2701248 rs10518221
rs7028544
rs4978434
rs11196686 rs1434709
rs11943513
rs7090871
rs1048228 rs701276
rs1325273
rs9863717
rs798489
rs9285397
rs803422
rs7031748 rs10932055
rs6895327
rs10807323
rs1859428
rs10816943
rs9540221 rs2047870
rs1861612
rs10738886
rs10828213
rs12359685
rs2223662
rs13102884
rs13117608
rs2080401
rs7208097
rs2048327
rs10493900
rs1559777
rs12418204
rs6566726
rs8190893
rs1167242
rs26438
rs7604827
rs4421636
rs6565681 rs1473533
rs1324034 rs1012997
rs10929060 rs10485733 rs6795970
rs6699417
rs834485rs3111828
rs17550532 rs5751614
rs2367563
rs1145153
rs831571
rs1145796
rs831983
rs7183263 rs12338076
rs10520520 rs815815
rs6284
rs968008
rs4380028
rs788178rs7856675
rs705384
rs7044529
rs2236691
rs7084584
rs1965072
rs4979264 rs9521112
rs1216472
rs7029757
rs7669522 rs13240504
rs2554380 rs9399799
rs284495
rs494459 rs3134904
rs7045593 rs704744
rs1044581
rs167275
rs1105778
rs17261321 rs4845405
rs1324982 rs2439312
rs4778818 rs16940375
rs17553657 rs3027322
rs10504592 rs1635281 rs472265
rs7648727 rs1376486
rs2547416
rs1022141
rs17041869
rs964184
rs10483727 rs3129055
rs7916697 rs9644059
rs17128272
rs557962
rs6856061
rs795955rs9294851
rs6013871
rs4129959
rs38831
rs1889151
rs12597778
rs12420080
rs1260836
rs1888814
rs3786800
rs7665764
rs16861329 rs885389
rs16860271
rs4952125
rs1412444 rs1932397
rs1412829
rs1935681
rs6774494
rs6780510
rs5215
rs7606754
rs280171
rs7112925
rs6773931
rs1230545
rs295208
rs9521509
rs6869688 rs1933988
rs8111071
rs10495936
rs1076673
rs17584499
rs1267043
rs4666636 rs7013518
rs4346352
rs1541010
rs1480577
rs1719079
rs1754073 rs1490331
rs10754458 rs9359617
rs4405822
rs587847
rs9358118
rs16893890
rs9296824 rs10282118
rs6671793
rs12358475
rs1870661 rs2168159
rs16897288
rs2206734
rs6667140
rs6670655
rs11656696
rs1293470
rs9325777
rs2717128
rs919129
rs7741993 rs7488278
rs642773
rs7963889
rs1476882
rs7742369
rs7651039
rs758099
rs10491472
rs6507838 rs1363258
rs12940030
rs6570507
rs6765739
rs12679004
rs4527850
rs1435066 rs10915424
rs10430923
rs6929078
rs4977574
rs13150558
rs7092929 rs792747
rs2794467
rs8000245
rs16886072
rs9825379
rs1410811
rs1898162 rs2390024
rs11782634 rs913170
rs9386234
rs7809017
rs2405781
rs140174
rs12682841
rs11861962
rs11859036
rs668204
rs1484948
rs17364464
rs3771395 rs941567
rs1549318
rs17291650
rs979976
rs343079
rs4794665 rs9318225 rs7660418
rs3847375
rs11900940 rs6564651
rs6480314 rs6563943 rs989548
rs1572072
rs274917
rs4072683
rs2188857
rs2376999
rs7000183
rs6036896
rs13167932 rs952361rs17288067
rs1281317
rs16907259
rs2767000
rs11062784
rs1870301
rs2501968 rs10502182 rs1968011 rs1019156
rs12583288 rs7332115
rs7698623 rs12579680
rs2305707
rs1315134
rs4130328
rs12931996
rs10514431 rs9304456
rs12970134
rs565659rs1053049
rs7404645
rs2495699
rs9848261
rs4648011 rs7027110
rs756202
rs10505784
rs2839670
rs1532358
rs17117825
rs2273301
rs1998038
rs6792584 rs12446047
rs463235
rs10511904
rs10496166
rs7283906 rs11763147
rs11749320 rs10758892 rs400337
rs2869832
rs10514995
rs2967055
rs342169rs8042680
rs6981943
Fig. 3. The SEN of Glaucoma. The network includes 713 SNPs (vertices) and 789 pairwise interactions(edges). The size of a vertex represents the strength of the main effect of its corresponding SNP, with thedisease association ranging from �0.001% to 1.124%. The width of an edge indicates the strength of itscorresponding interaction, with the disease association ranging from 0.693% to 1.163%.
degree
frequency
1 2 3 4 5 6 7 8
0
50
100
150
200
250
betweenness
frequency
0.0 0.1 0.2 0.3 0.4 0.5
0
100
200
300
400
500
closeness
frequency
0.03 0.04 0.05 0.06 0.07 0.08
0
20
40
60
80
100
120
140
(a) (b) (c)
Fig. 4. The distributions of (a) degree, (b) betweenness, and (c) closeness centralities of vertices in the SENof Glaucoma.
large connected SNP network indicated a complex genetic basis of glaucoma.HPN is an computational model that systematically organizes and clusters various pheno-
types and diseases based on their shared genetic association factors or related basic functionalunits. It also serves a powerful knowledge-based filter for GWAS data pre-screening and pri-oritization. Indeed, when compared to established data-driven filters, including the family ofRelief algorithms, the HPN filtering performed equally well. This was evaluated by quantifyingall the pairwise interactions of filtered sets of SNPs using different data-driven filters (datanot shown).
SEN is able to detect and analyze genetic interactions among a large set of attributes ingenetic association studies. The derived SNP interaction network of glaucoma in this studyhad a significant connected structure and may serve as a map for identifying key geneticfactors associated with the disease with further biological validations. The significant negativemain effect assortativity of such a network indicated that epistatic interactions were likely tohappen between SNPs with negatively correlated main effects. This finding suggests that mostexisting main-effect-based GWAS data filtering algorithms may overlook potential importantinteractions since some SNPs with low main effects would not be pre-selected using such filters.
The set of SNPs with highest ranked centralities were annotated using Biofilter33 to theirassociated genes and, in turn, the genes to pathways and functional groups. We used theannotation to identify other complex diseases possibly sharing biology with glaucoma andsuggest novel hypotheses for undocumented shared genetic interactions.
It is known that glaucoma and type II diabetes (T2D) are associated, but direct geneticrelationship remains unknown. We found SNPs connecting T2D to glaucoma: rs1333040 (CD =
5, CB = 0.043) and rs1053049 (CD = 5). The genes CDKN2B-AS1 and PPARD, which these twoSNPs are respectively located in, are known to be associated with diabetes. Peng et al. observedthat CDKN2A/B gene had a 1.17-fold (95% CI: 1.1-1.23) risk estimate for T2D.34 PPARDhas been found to be associated with higher fasting glucose, glycated hemoglobin (HbA1c),and increased risk of T2D and combined impaired fasting glucose/T2D in a population-basedChinese Han sample.35 SNP rs9540221 (CD = 5) is also intergenic between NFYAP1 and
LGMNP1, of which the former has also been associated with diabetes and obesity. Indeed,population-based studies suggest that persons with T2D have an increased risk of developingopen-angle glaucoma (OAG): the Beaver Dam Eye Study,36 the Framingham Eye Study,37
and the Los Angeles Latino Eye Study.38 Chopra et al. has noted that there is a positiveassociation between T2D and OAG in a sample Latino population study in Los Angeles,USA.38 The Rotterdam Study39 has also previously reported a positive association betweenDM and OAG, with a relative risk of 3.11 (95% CI, 1.12-8.66). Laboratory evidence for T2Dand glaucoma linkages also exists. Some have observed that there is an inner retinal celldeath, determined by the presence of hyaline bodies in the ganglion cell and retinal nerve fiberlayer of the eyes of post-mortem eyes of patients with T2D.40 Possible hypotheses of T2D toglaucoma direct linkage include how altered biochemical pathways and vascular changes inT2D reduce blood flow and impaired oxygen diffusion, thus increasing oxidative stress andendothelial cell injury.41 Excessive glial cell activation may contribute to chronic inflammation,making the retinal ganglion cells more susceptible to glaucomatous damage.42 Connectivetissue remodeling might also promote increased intraocular pressure (IOP) and greater opticnerve head mechanical stress, which can result in glaucomatous optic neuropathy.43 Diabetesis known to cause microvascular damage and may affect the vascular autoregulation of theoptic nerve and retina. Thus a longer duration of T2D would be associated with a higher riskof OAG.
Obesity, like T2D, is a trait often associated with glaucoma, but whose connection remainselusive. We found associations between obesity and glaucoma in SNPs rs12970134 (CD =
5, CC = 0.062), rs11807640 (CD = 8, CB = 0.516, CC = 0.071), rs9540221 (CD = 5), rs10777845(CD = 5), and rs4365558 (CD = 5, CC = 0.063), all of which showed significance in our glaucomastudy. SNP rs12970134 is located upstream of gene MC4R, the defects of which have beenidentified as a cause for autosomal dominant obesity.44 SNP rs11807640 is located downstreamto gene CDK4PS, which has been associated with obesity traits among some postmenuposalwomen.45 Both rs9540221, which is intergenic between NFYAP1 and LGMNP1 and rs4365558,which is intergenic between A4GALT and RPL5P34, are SNPs situated between genes thathave been associated with obesity as well. Finally, rs10777845 is upstream to gene RMST,which has been observed to be associated with severe early-onset obesity.46 In the investigationto determine direct linkage between glaucoma and obesity, several hypotheses exist. Newman-Casey et al. observed that not only was the frequency of OAG was higher among obeseindividuals (3.1%) than among non-obese individuals (2.5%), but that the relationship betweenOAG and obesity had significant correlations with gender.47 Obese women had a 6% increasedhazard of developing OAG when compared to non-obese women, while there was no significanteffect in men. Similar findings were made by Zang and Wynder, in which they observed OAGdiagnosis as twice as likely in women with a body mass index (BMI) above 27.5, but no impactof BMI in men.48 Some hypothesize that increased orbital pressure as a result of excess fattissue may cause a rise in venous pressure and a consequent increase in intraocular pressure(IOP).47
Our study also show links between Alzheimer’s disease (AD) and glaucoma. The SNPrs803422 (CD = 5, CC = 0.060), which we found to affect glaucoma is also associated with
AD. In addition, the two genes SORCS3 and MTHFD1L which contain two of the SNPs withthe strongest association are also known to be associated with AD. Patients with AD havea significantly increased rate of glaucoma occurrence. In a study in four nursing homes inGermany, 112 Alzheimer’s patients were taken as a case group. 29 of those 112 were found tohave Glaucoma, a rate of 25.9% as opposed to a 5.2% rate in the control group.49 However,in spite of significant research on this topic, there have still been no clear clinical or geneticrelationships found between the two diseases.50
Perhaps unsurprisingly, our research has also led to us connecting glaucoma with otherphenotypes relating to the eye, particularly cataracts and myopia. The association betweencataracts and glaucoma can be seen in the similar genes CDK4PS , NFYAP1, and LGMNP1,all of which contain SNPs found to be highly associated with glaucoma. Myopia and glaucomaare both directly related to the SNP rs569688 (CB = 0.415, CC = 0.074). There have been manyrecent studies about the connection between glaucoma and myopia, particularly primary open-angle glaucoma (POAG) and high myopia. Though the connection has still not been found, itis believed the high myopia is another risk factor of POAG, though it is not as important asintraocular pressure. However, it is recommended that those with high myopia be screen forglaucoma more frequently.51
Of the SNPs and genes associated to glaucoma in our study, the majority that are notdirectly eye-related, are heart related. Glaucoma is connected therefore probably connectedto cardiovascular disease, coronary artery disease, and coronary restenosis. SNPs rs17034592(CD = 7, CC = 0.060) and loci LDB2, CDKN2B and TRNAQ46P are all associated withcoronary artery disease. SNPs rs499818 (CD = 5, CC = 0.052), rs1333040 (CD = 5, CC = 0.052),and loci A4GALT, RPL5P34, and CST7 are all associated with myocardial infarction, andA4GALT and RPL5P34 are associated with cardiovascular disease. Studies have shown thatcardiovascular disease may play an important role in the development and progression ofglaucoma. However, it is also hypothesized that this relationship may be indirect, due tothe variable of age.52 Additional studies about the relationship between intraocular pressure,the main risk factor for glaucoma, and cardiovascular disease exist. However, any geneticrelationship between these two diseases is yet to be found.
An undocumented interaction found in this study is the link between glaucoma and col-orectal carcinoma. Relevant SNPs that were significant were rs972869 (CD = 7, CC = 0.052),located near the EPHB1 gene, and rs300493 (CD = 6, CB = 0.311, CC = 0.066), located inthe gene NAV3, both of which are linked to colorectal carcinoma. Studies have shown thatobesity and T2D are associated risks of colorectal cancer, which could lead to the hypothesisthat colorectal carcinoma and glaucoma may be co-morbidities and are affected by similarpathways.
In summary, the successful application of our methodology on GLAUGEN GWAS dataproves the effectiveness of our bioinformatics framework, and suggests its great potential indetecting and characterizing gene-gene interactions in GWAS data. In future studies, we expectto extend the interaction analysis beyond the order of pairs. The increased computationaldemand would, however, require a stricter data pre-filtering. The more stringent SNP filteringcan be achieved using a HPN based on SNP overlap to restrict the interactions to the most
significant ones. Other avenues of research would include using eye-specific tissue genotypedata to further refine the analysis and the drug response of the nodes of interest.
Acknowledgements
This work was supported by the National Institute of Health (NIH) grants R01-EY022300,R01-LM010098, R01-LM009012, R01-AI59694, P20-GM103506, P20-GM103534.
References
1. R. Sachidanandam, D. Weissman, S. C. Schmidt, J. M. Kakol, L. D. Stein and et al., Nature409, 928 (2001).
2. W. Y. S. Wang, B. J. Barratt, D. G. Clayton and J. A. Todd, Nature Review Genetics 6, 109(2005).
3. H. J. Cordell, Human Molecular Genetics 11, 2463 (2002).4. J. H. Moore, Nature Genetics 37, 13 (2005).5. N. A. Davis, J. E. Crowe Jr., N. M. Pajewski and B. A. McKinney, Genes and Immunity 11,
630 (2010).6. T. Hu and J. H. Moore, Network modeling of statistical epistasis, in Biological Knowledge Dis-
covery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, eds. M. Elloumiand A. Y. Zomaya (Wiley, 2013) pp. 175–190.
7. A. Pandey, N. A. Davis, B. C. White, N. M. Pajewski, J. Savitz, W. C. Drevets and B. A.McKinney, Translational Psychiatry 2, p. e154 (2012).
8. J. L. Wiggs, M. A. Hauser, W. Abdrabou, R. R. Allingham, D. L. Budenz, E. Delbono, D. S.Friedman, J. H. Kang, D. Gaasterland, T. Gaasterland, R. K. Lee, P. R. Lichter, S. Loomis,Y. Liu, C. McCarty, F. A. Medeiros, S. E. Moroi, L. M. Olson, A. Realini, J. E. Richards, F. W.Rozsa, J. S. Schuman, K. Singh, J. D. Stein, D. Vollrath, R. N. Weinreb, G. Wollstein, B. L.Yaspan, S. Yoneyama, D. Zack, K. Zhang, M. Pericak-Vance, L. R. Pasquale and J. L. Haines,J Glaucoma 22, 517 (Sep 2013).
9. M. Takamoto and M. Araie, Ophthalmology Journal 58, 1 (2014).10. J. H. Fingert, Eye (Lond) 25, 587 (May 2011).11. K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal and A.-L. Barabasi, Proceedings of the
National Academy of Sciences 104, 8685 (2007).12. C. Darabos, K. Desai, R. Cowper-Sallari, M. Giacobini, B. Graham, M. Lupien and J. Moore,
Lecture Notes in Computer Science 7833, 23 (2013).13. C. Darabos, M. J. White, B. E. Graham, D. N. Leung, S. Williams and J. H. Moore, BioData
Mining 7, p. 1 (Jan 2014).14. H. Li, Y. Lee, J. L. Chen, E. Rebman, J. Li and Y. A. Lussier, Journal of the American Medical
Informatics Association : JAMIA 19, 295 (January 2012).15. D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek,
T. Manolio, L. Hindorff and H. Parkinson, Nucleic Acids Res 42, D1001 (Jan 2014).16. T. Hu, N. A. Sinnott-Armstrong, J. W. Kiralis, A. S. Andrew, M. R. Karagas and J. H. Moore,
BMC Bioinformatics 12, p. 364 (2011).17. T. Hu, Y. Chen, J. W. Kiralis and J. H. Moore, Genetic Epidemiology 37, 283 (2013).18. T. M. Cover and J. A. Thomas, Elements of Information Theory: Second Edition (Wiley, 2006).19. A. Jakulin and I. Bratko, Analyzing attribute dependencies, in Proceedings of the 7th European
Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003), ,Lecture Notes in Artificial Intelligence Vol. 2838 (Springer-Verlag, 2003).
20. J. H. Moore, J. C. Gilbert, C.-T. Tsai, F.-T. Chiang, T. Holden, N. Barney and B. C. White,Journal of Theoretical Biology 241, 252 (2006).
21. D. Anastassiou, Molecular Systems Biology 3, p. 83 (2007).22. J. H. Moore, N. Barney, C.-T. Tsai, F.-T. Chiang, J. Gui and B. C. White, Human Heredity 63,
120 (2007).23. B. A. McKinney, J. E. Crowe, J. Guo and D. Tian, PLoS Genetics 5, p. e1000432 (2009).24. M. E. J. Newman, Networks: An Introduction (Oxford University Press, 2010).25. M. E. J. Newman, Physical Review Letters 89, p. 208701 (2002).26. L. C. Freeman, Sociometry 40, 35 (1977).27. A. Bavelas, Journal of the Acoustical Society of America 22, 725 (1950).28. G. Sabidussi, Psychometrika 31, 581 (1966).29. J. H. Moore and B. C. White, Lecture Notes in Computer Science 4447, 166 (2007).30. C. S. Greene, N. M. Penrod, J. Kiralis and J. H. Moore, BioData Min 2, p. 5 (2009).31. X. Sun, Q. Lu, S. Mukheerjee, P. K. Crane, R. Elston and M. D. Ritchie, Front Genet 5, p. 106
(2014).32. C. International HapMap, Nature 437, 1299 (Oct 2005).33. S. A. Pendergrass, A. Frase, J. Wallace, D. Wolfe, N. Katiyar, C. Moore and M. D. Ritchie,
BioData Min 6, p. 25 (2013).34. F. Peng, D. Hu, C. Gu, X. Li, Y. Li, N. Jia, S. Chu, J. Lin and W. Niu, Gene 531, 435 (Dec
2013).35. L. Lu, Y. Wu, Q. Qi, C. Liu, W. Gan, J. Zhu, H. Li and X. Lin, PLoS One 7, p. e34895 (2012).36. B. E. Klein, R. Klein and S. C. Jensen, Ophthalmology 101, 1173 (Jul 1994).37. P. Mitchell, W. Smith, T. Chey and P. R. Healey, Ophthalmology 104, 712 (Apr 1997).38. V. Chopra, R. Varma, B. A. Francis, J. Wu, M. Torres and S. P. Azen, Ophthalmology 115, 227
(Feb 2008).39. S. de Voogd, M. K. Ikram, R. C. W. Wolfs, N. M. Jansonius, J. C. M. Witteman, A. Hofman
and P. T. V. M. de Jong, Ophthalmology 113, 1827 (Oct 2006).40. J. R. Wolter, Am J Ophthalmol 51, 1123 (May 1961).41. V. H. Y. Wong, B. V. Bui and A. J. Vingrys, Clin Exp Optom 94, 4 (Jan 2011).42. M. Nakamura, A. Kanamori and A. Negi, Ophthalmologica 219, 1 (Jan-Feb 2005).43. V. C. Lima, T. S. Prata, C. G. V. De Moraes, J. Kim, W. Seiple, R. B. Rosen, J. M. Liebmann
and R. Ritch, Br J Ophthalmol 94, 64 (Jan 2010).44. K. M. Ling Tan, S. Q. D. Ooi, S. G. Ong, C. S. Kwan, R. M. E. Chan, L. K. Seng Poh, J. Mendoza,
C. K. Heng, K. Y. Loke and Y. S. Lee, J Clin Endocrinol Metab 99, E931 (May 2014).45. P. Coveney and R. Highfield, Frontiers of Complexity: The Search for Order in a Chaotic World
(Faber and Faber, London, 1995).46. E. Wheeler, N. Huang, E. G. Bochukova, J. M. Keogh, S. Lindsay, S. Garg, E. Henning, H. Black-
burn, R. J. F. Loos, N. J. Wareham, S. O’Rahilly, M. E. Hurles, I. Barroso and I. S. Farooqi,Nat Genet 45, 513 (May 2013).
47. P. A. Newman-Casey, N. Talwar, B. Nan, D. C. Musch and J. D. Stein, Ophthalmology 118,1318 (Jul 2011).
48. E. A. Zang and E. L. Wynder, Nutr Cancer 21, 247 (1994).49. A. U. Bayer, F. Ferrari and C. Erb, Eur Neurol 47, 165 (2002).50. P. Wostyn, K. Audenaert and P. P. De Deyn, Br J Ophthalmol 93, 1557 (Dec 2009).51. S.-J. Chen, P. Lu, W.-F. Zhang and J.-H. Lu, Int J Ophthalmol 5, 750 (2012).52. S. S. Hayreh, Survey of Ophthalmology 43, Supplement 1, S27 (1999).