GENOMIC CONVERGENCE ASSOCIATION STUDIES OF

EXPRESSION AND AGING IN THE HUMAN KIDNEY

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF GENETICS

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Heather Elizabeth Wheeler

March 2010

http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/ss495bw5478

© 2010 by Heather Elizabeth Wheeler. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Stuart Kim, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Gregory Barsh

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Anne Brunet

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Hua Tang

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Although family studies have shown that genes play a role in longevity and

tissue aging, it has proven difficult to identify the specific genetic variants involved.

Kidneys age at different rates, such that some people show little or no effects of aging

whereas others show rapid functional decline. We developed a sequential

transcriptional profiling and expression quantitative trait loci (eQTL) mapping

approach known as genomic convergence to find genes associated with aging in the

kidney. We first performed whole-genome transcriptional profiling to find 630 genes

that change expression with age in the kidney. Next, we used two methods to

determine which of these age-regulated genes are eQTLs, which means they contain

SNPs whose alleles associate with expression level. We found that 101 of the age-

regulated genes are eQTLs. We also found that the allele-specific eQTL detection

method, which compares the mRNA levels of the two alleles within heterozygous

individuals, was more sensitive than the total expression method in detecting allelic

expression differences. We tested the eQTLs for association with kidney aging,

measured by glomerular filtration rate (GFR) using combined data from the Baltimore

Longitudinal Study of Aging (BLSA) and the InCHIANTI study. We found a SNP

association (rs1711437 in MMP20) with kidney aging (uncorrected p = 3.6 × 10⁻⁵,

empirical p = 0.01) that explains 1-2% of the variance in GFR among individuals.

The results of this sequential analysis may provide the first evidence for a gene

association with kidney aging in humans. Our approach of combining both expression

and genotype data can be applied to any phenotype of interest to increase the power to

find genetic associations.

Acknowledgements

I am thankful for the opportunity to pursue my Ph.D. in genetics at Stanford

University. I am grateful to my advisor, Dr. Stuart Kim, for allowing me to join his

lab and study the genetics of human aging. His encouragement and guidance helped

me complete this dissertation. I appreciate all of his work securing collaborations in

order for me to gather the data I needed for this project. Whether discussing my

project, the latest discovery in the aging field or whether the Twins and/or Red Sox

were going to make the playoffs, my talks with Stuart were always entertaining, and

often educational.

This dissertation would not have been possible without the help of multiple

collaborators. Thanks to the donors for providing kidney tissue. John Higgins of the

Stanford Pathology Department and Janet Bueno of the Stanford Tissue Bank helped

me obtain kidney samples. Rick Myers, Devin Absher, Jun Li, Shannon Brady and

Amita Aggarwal from the Stanford Human Genome Center assisted with Illumina

genotyping assays. Ron Davis and Julie Wilhelmy from the Stanford Genome

Technology Center provided technical assistance for the Affymetrix expression

microarrays. Luigi Ferrucci, Jeff Metter and Toshiko Tanaka from the National

Institute on Aging provided genotype and phenotype data from the Baltimore

Longitudinal Study of Aging and the InCHIANTI Study. I also thank the members of

my thesis committee for their invaluable guidance and support: Greg Barsh, Anne

Brunet and Hua Tang. Thank you to John Boothroyd for serving as the chair of my

thesis committee.

I would like to thank the members of the Kim Lab who have worked alongside

me. I have learned much from such a diverse group of people who work on such

diverse projects. First, I thank Jacob Zahn, Graham Rodwell, Rebecca Sonu and

Emily Crane, who performed the initial transcriptional studies from which my

dissertation stems. I also thank two undergraduate researchers who assisted me with

my project, Kaeley Anderson and Mandy Kovach. In addition, I thank Jesse

Karmazin, who will be extending my project. I thank the lab’s “computational

people” for useful programming and statistical tips: Lucy Southworth, Sarah

Kummerfeld and Sarah Aerni. I am grateful to Flo Pauli and Yelena Budovskaya for

making life inside and outside of the lab more fun. And I thank the rest of the “worm

people”: Kendall Wu, Min Jiang, Xiao Liu, Adolfo Sanchez-Blanco, Xiao Xu, Eric

Van Nostrand, Dror Sagi, Cindie Slightham, and Biff Mann. Thanks to you, I feel like

I have earned a second Ph.D. in C. elegans aging. Someday, I will look back fondly

on our marathon practice talks and smile about irregular balls and sheep clouds.

I must thank the following graduate school friends for their support both in lab

and life matters: Tovi Anderson, Jason Hoyt, Alayne Brunner, Rayka Yokoo, Yuya

Kobayashi, Max Banko, Alyssa Wright, Matt Hill and Dasha Glazer. I also would like

to thank my fellow members of the departmental softball team, the Lethal Yellows, for

some good times spent outside the lab over the years. I feel fortunate to have been a

member of the Genetics Department at Stanford, which provided me a friendly and

supportive environment to grow as a scientist.

I am also grateful for several teaching experiences I received while at Stanford.

I designed and taught two semesters of genetics at the University of California

Berkeley Extension. Through this endeavor I gained valuable experience teaching a

class on my own. I thank Jim Ford for the opportunity to be a TA in his Human

Genetics class for medical students at Stanford. I also thank Katherine Moser and

Stuart Kim for allowing me to help design and implement four laboratory projects in

the AP Biology class at Gunn High School in Palo Alto.

Without my family, I would not have been able to accomplish so much. To my

parents, Terry and Tracy Wheeler, thank you for your love and support. To my sister,

Jamie Wheeler, thank you for your love and friendship. And to Marty Gabel, thank

you for being my loving partner and continuing this journey with me.

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures

Chapter 1: Introduction
    Genetics of Human Longevity
    Genetics of Kidney Aging
    Genomic Convergence
    Genome-wide Transcriptional Profile of Kidney Aging
    Significance and Dissertation Content

Chapter 2: Identification of eQTLs by Total Expression Analysis
    Background
    Results
        Ancestry Analysis
        Total Expression QTL Analysis
    Discussion
    Methods
        Stanford Kidney Samples
        RNA and DNA Preparation
        SNP Selection
        Genotyping
        Total Expression Quantification
        Ancestry Analysis
        Total Expression Regression Models

Chapter 3: Identification of eQTLs by Allele-Specific Expression Analysis
    Background
    Results
    Discussion
    Methods
        Stanford Kidney Samples
        RNA and DNA Preparation
        SNP Selection
        Genotyping
        Allele-Specific Expression Quantification

Chapter 4: Genetic Association with Kidney Aging
    Background
    Results
    Discussion
    Methods
        BLSA Samples
        InCHIANTI Samples
        Glomerular Filtration Rate Regression Models
        Testing for Evidence of SNP Association with GFR in Both Datasets
        Permutation Analysis

Chapter 5: Conclusions
    Summary and Discussion of Findings
    Future Directions for Human Aging Genomics

References

List of Tables

Chapter 2
    Table 2.1: SNPs that associate with total gene expression level
    Table 2.2: Common total expression associations across studies
    Table 2.3: The probability of the genotype data at K=1-7
    Table 2.4: Mean variance of the percent ancestry in each cluster

Chapter 3
    Table 3.1: eQTLs identified by allele-specific expression analysis
    Table 3.2: Common allele-specific expression across studies

Chapter 4
    Table 4.1: Characteristics of kidney aging study samples
    Table 4.2: Top SNPs that show association with kidney aging

List of Figures

Chapter 1
    Figure 1.1: The genomic convergence approach
    Figure 1.2: A transcriptional profile of kidney aging
    Figure 1.3: Chronicity index examples
    Figure 1.4: Common signature for aging

Chapter 2
    Figure 2.1: Total expression analysis
    Figure 2.2: Estimated genetic ancestry

Chapter 3
    Figure 3.1: Assay for allele-specific expression
    Figure 3.2: Distribution of allele-specific expression
    Figure 3.3: Distribution of mean allelic fold changes
    Figure 3.4: Allele-specific expression analysis
    Figure 3.5: Allele-specific expression eQTL characteristics
    Figure 3.6: Comparison of eQTL methods at one locus

Chapter 4
    Figure 4.1: A SNP in MMP20 associates with a kidney aging phenotype
    Figure 4.2: Linkage disequilibrium pattern of MMP20

Chapter 1: Introduction

Portions of this chapter will be submitted for publication in

Philosophical Transactions of the Royal Society B: Biological Sciences (2010) with the following authors:

Heather E. Wheeler¹ and Stuart K. Kim¹,²

¹Department of Genetics, Stanford University Medical Center, Stanford, CA, USA

²Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA

Genetics of Human Longevity

Aging trajectories vary among individuals. Both the age at which

physiological function begins to decline and the rate of such decline vary among

individuals. The heritability of human longevity ranges from 0.23 to 0.33, but little is

known about specific genes that affect the rate of aging or human lifespan (Herskind

et al., 1996; McGue et al., 1993; Mitchell et al., 2001). Because most organisms do

not escape predation and infection to reach old age in the wild, aging itself is not under

strong natural selection (Kirkwood, 1997). Instead, aging is probably an unregulated

side effect caused by the failure of natural selection to maintain function at the later

ages that few individuals reach in the wild (Partridge and Gems, 2002). If mutations

arise that cause deleterious effects late in life, whether they are neutral or beneficial

early in life, there is little or no selection to eliminate them from the population

(Hamilton, 1966; Kirkwood, 1997; Partridge, 2010; Williams, 1957). Because such

mutations will accumulate over time, aging is likely a highly polygenic trait and the

mechanisms involved may not be conserved among species.

Despite these evolutionary predictions, studies in model organisms have

revealed that mutations in single genes can extend lifespan. For example, mutations in

insulin or insulin-like signaling pathway genes have been shown to extend lifespan in

Caenorhabditis elegans (Kenyon et al., 1993), Drosophila melanogaster (Clancy et

al., 2001; Tatar et al., 2001) and mice (Bluher et al., 2003; Holzenberger et al., 2003).

Also, overexpression of the SIR2 deacetylase extends lifespan in Saccharomyces

cerevisiae (Kaeberlein et al., 1999), C. elegans (Tissenbaum and Guarente, 2001), and

Drosophila (Rogina and Helfand, 2004). These examples demonstrate that

evolutionary conservation is present in some molecular pathways of aging. Therefore,

aging genes found across model organisms are good candidates to test for association

with lifespan in humans with the caveat that any pathways unique to human aging will

be missed.

The average lifespan in the US is 77 years and only one in 10,000 individuals

survives to age 100 (Atzmon et al., 2006). Thus, enrichment in the frequency of certain

alleles in centenarians probably reflects a selection effect that increases the likelihood

of survival. The genetic influence on achieving extreme old age may be even greater

than the heritability of average lifespan. Siblings of centenarians have an 8- to 17-fold

greater relative risk of surviving to age 100 (Perls et al., 2002). Most of the work

comparing the genotypes of long-lived individuals to average-aged individuals has

taken place in candidate genes. For example, the ε4 allele of apolipoprotein E

(APOE), which is well known to increase risk of Alzheimer’s disease and

cardiovascular disease, is found in significantly lower proportions of nonagenarians

and centenarians (Corder et al., 1993; Kervinen et al., 1994; Schachter et al., 1994).

Conversely, the ε2 allele is enriched in long-lived individuals and may offer a

protective effect (Kervinen et al., 1994; Schachter et al., 1994). A study of 35

additional genes related to cardiovascular disease found that an allele in apolipoprotein

C3 (APOC3) is enriched in centenarians and their offspring compared to average-aged

controls in an Ashkenazi Jewish population (Atzmon et al., 2006).

In addition to testing genes known to be associated with age-related diseases

for association with longevity, genes known to promote longevity in model organisms

have also been examined in human populations. Mutations in insulin or insulin-like

signaling pathway genes have been shown to extend lifespan from C. elegans (Kenyon

et al., 1993) to mice (Bluher et al., 2003; Holzenberger et al., 2003). An

overrepresentation of rare functionally significant insulin-like growth factor I receptor

(IGF1R) mutations has been observed in centenarians (Suh et al., 2008). More

convincingly, the forkhead box O3A (FOXO3A) transcription factor, which is

regulated by the insulin/IGF1 signaling pathway, contains alleles associated with

longevity in both Asian and European populations (Flachsbart et al., 2009; Li et al.,

2009; Pawlikowska et al., 2009; Willcox et al., 2008). In a male population of

Japanese descent, the odds ratio for homozygous minor vs. homozygous major alleles

for SNP rs2802292 in FOXO3A between the long-lived individuals (≥95 years) and

controls was 2.75 (Willcox et al., 2008). In two replication studies including both

sexes, the odds ratios of the corresponding alleles were 1.26 for a German population

and 1.36 for a Chinese population (Flachsbart et al., 2009; Li et al., 2009). These

enriched alleles may promote better health and contribute toward extended lifespan.
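
As a brief illustration of how an odds ratio such as those quoted above is computed, the sketch below compares the odds of carrying a genotype among long-lived cases with the odds among controls. The genotype counts are hypothetical and chosen only for illustration; they are not taken from Willcox et al. (2008) or from the replication studies, and the function name is an assumption.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio from a 2x2 table of genotype counts.

    case_with / case_without: long-lived individuals with / without the genotype.
    control_with / control_without: controls with / without the genotype.
    Uses the cross-product form (a*d) / (b*c).
    """
    denominator = case_without * control_with
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_with * control_without) / denominator


# Hypothetical counts: homozygous minor vs. homozygous major genotypes.
example = odds_ratio(case_with=30, case_without=100,
                     control_with=20, control_without=200)
print(f"odds ratio = {example:.2f}")
```

An odds ratio above 1 indicates that the genotype is more common among the long-lived cases than among the controls; a value of 1 indicates no difference.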

Other aging genes found to affect model organism longevity have been

examined in humans. SIRT3 is a human homolog of SIR2, the deacetylase that, when

overexpressed, extends lifespan in yeast, worms and flies (Kaeberlein et al., 1999;

Rogina and Helfand, 2004; Tissenbaum and Guarente, 2001). An intronic VNTR

allele in SIRT3 was significantly depleted in a population of male Italian

nonagenarians and centenarians (Bellizzi et al., 2005). Mutant mice for the klotho

gene exhibit a syndrome resembling human aging, including atherosclerosis,

osteoporosis, emphysema, and infertility (Kuro-o et al., 1997). In humans, a

heterozygote advantage for longevity was observed at the KLOTHO locus in

Ashkenazi Jews (Arking et al., 2005). Unfortunately, attempts to replicate the SIRT3,

KLOTHO and APOC3 findings in additional long-lived populations have been

unsuccessful (Lescai et al., 2009; Novelli et al., 2008). The only two genes associated

with human longevity that have been replicated in multiple populations are APOE and

FOXO3A (Corder et al., 1993; Flachsbart et al., 2009; Kervinen et al., 1994; Li et al.,

2009; Pawlikowska et al., 2009; Schachter et al., 1994; Willcox et al., 2008). The

effect sizes of these two genes are small (odds ratios ranging from 1.26 to 1.45 in

replicate studies) and thus much of the heritability of longevity remains to be

explained (Flachsbart et al., 2009; Li et al., 2009; Nebel et al., 2005).

A portion of the mechanisms that contribute to longevity in humans over an

~80 year lifespan may be very different from those that affect longevity in mice that

have a 2-3 year lifespan. Unlike most other biological processes, the genetic factors

that influence aging may not be evolutionarily conserved because wild animals usually

die from predation and infection, not aging (Kirkwood and Austad, 2000). Since

animals do not usually live long enough to grow old in the wild, mutations that cause

damage in old age would not be selected against and would thus accumulate over time

(Harman, 1956). Since processes that occur late in life are under little or no natural

selection, some mutations that affect physiology late in life may be species-specific

and there is little reason to expect that all mutations that cause aging should be

conserved from rodents to humans. Thus, a troubling aspect of aging research is that animal

models may have limited relevance to human aging. Therefore, an unbiased

approach to search for genes that specify human aging is also necessary.

A genome-wide linkage scan of human longevity using long-lived siblings

identified a locus on chromosome 4 (Puca et al., 2001). Fine mapping of this 12 Mb

locus revealed an association between microsomal transfer protein (MTP) and human

lifespan (Geesaman et al., 2003). However, this finding could not be replicated in

additional populations, which highlighted the population structure problems that can

arise when the case-control design is used as a means to map longevity genes in

humans (Nebel et al., 2005). Genome-wide association studies usually test hundreds of

thousands to millions of single nucleotide polymorphisms (SNPs) across the genome

for association with a particular trait (Kottgen et al., 2009; Weedon et al., 2008;

WTCCC, 2007). Currently, no genome-wide association studies comparing

centenarians or other long-lived individuals to average-aged individuals have been

published. This is likely because large enough datasets of centenarians and well-

matched controls have not been collected to detect associations that probably involve

either small effect sizes or rare alleles. Another important concern is that while

case-control studies of centenarians may find global contributors to aging, they may

miss specific contributors to aging of a particular organ or tissue.
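
To make the cost of testing very many SNPs concrete, the following sketch computes a per-test significance threshold under a Bonferroni correction. The Bonferroni correction is used here only as one common, conservative choice and is an assumption for illustration, as are the numbers of SNPs and the function name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical study sizes: a focused candidate set vs. genome-wide scans.
for n_snps in (100, 500_000, 1_000_000):
    threshold = bonferroni_threshold(0.05, n_snps)
    print(f"{n_snps:>9,d} SNPs tested: per-SNP threshold = {threshold:.1e}")
```

The threshold shrinks in proportion to the number of SNPs tested, so an association of small effect size must be supported by a much larger sample to reach significance in a genome-wide scan than in a focused test.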

Genetics of Kidney Aging

We chose to identify genes that associate with a focused phenotype of aging

rather than the nonspecific phenotype of living to age 100. Important human aging

molecular pathways may be more easily found by examining physiological aging in

particular organs or tissues. Because tissues age at different rates and because the

presence of disease varies immensely among individuals, humans become increasingly

different from each other with age. Thus, chronological age fails to provide an

accurate indicator of the aging process. More important than simply reaching age 100

is the health of the individual. Physiological age serves as an indicator of an

individual’s general health status in a particular organ or tissue and may also serve as

an indicator of remaining healthy lifespan (Borkan and Norris, 1980). Measuring how

a tissue changes with chronological age in a population can identify biomarkers that

can be used as an index of physiological age (Karasik et al., 2005). The biomarker can

then be used to determine if an individual is physiologically younger or older than his

or her chronological age. Determining genes and pathways that associate with the

measures of physiological age can reveal molecular processes important for the aging

of particular tissues. Studies in different tissues can be compared to find specific and

common regulators of aging. Only common regulators are likely to be revealed by

centenarian studies.
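
The idea of using a biomarker as an index of physiological age can be sketched as follows: fit the population trend of the biomarker against chronological age, then ask whether an individual falls above or below that trend. This is a minimal sketch under stated assumptions, not the method used in the studies cited above. It assumes a simple linear trend, the data values are hypothetical, and the function names are illustrative.

```python
def fit_line(ages, values):
    """Ordinary least-squares fit of values = slope * age + intercept."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_value = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_value) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_value - slope * mean_age
    return slope, intercept


def biomarker_residual(age, value, slope, intercept):
    """Observed biomarker value minus the value expected at this chronological age."""
    return value - (slope * age + intercept)


# Hypothetical population in which the biomarker declines with age.
ages = [40, 50, 60, 70, 80]
values = [100, 90, 80, 70, 60]
slope, intercept = fit_line(ages, values)

# Hypothetical 60-year-old with a biomarker value of 90.
residual = biomarker_residual(60, 90, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, residual = {residual:.2f}")
```

For a biomarker that declines with age, a positive residual suggests that the individual is physiologically younger than his or her chronological age, and a negative residual suggests the opposite.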

Specifically, we examined aging in the kidney, an organ that shows an

objectively quantifiable decline in function with age. Overall, the kidney gets smaller,

particularly in the cortex, and kidney function begins to decline after age 40-50

(Gourtsoyiannis et al., 1990; Lindeman and Goldman, 1986). Old kidneys show an

involution and thinning of the renal cortical cells, increased renal vascular resistance,

reduced renal plasma flow and increased filtration fraction (Fliser et al., 1993;

Lindeman and Goldman, 1986). Furthermore, there is relatively little cell turnover

compared to other organs, such as the bone marrow that continuously generates blood

cells, so that kidney aging reflects post-mitotic tissue changes rather than changes in

cell proliferation capacity (Lindeman et al., 1985; Silva, 2005a, b). These age-related

changes in the kidney can be assessed and quantified with relative ease, which makes

the study of kidney aging particularly tractable for quantitative genetic analysis.

The four main compartments of the kidney (arteries, tubules, interstitium and

glomeruli) each show changes in morphology and function with age. Changes that

occur in one part of the kidney affect other parts of the kidney in a complex and

dynamic fashion. The renal arteries become rigid with age due to fibrous thickening of

the arterial interior. Aging causes structural changes in the tubules and interstitium.

The tubules begin to atrophy with age, and decrease in length and number. Aging

results in an increase in interstitial volume and interstitial fibrosis. These changes in

tubules and interstitium decrease the ability of the kidney to conserve or excrete NaCl

(Epstein and Hollenberg, 1976; Schmidt et al., 2001), excrete ammonium acid load

(Adler et al., 1968) and maintain other electrolyte balances (Faubert, 1998).

The glomeruli are ball-shaped structures in the kidney composed of capillary

blood vessels actively involved in the filtration of the blood to form urine. An

increasing fraction of glomeruli show global sclerosis with age (Kaplan et al., 1975;

Kasiske, 1987; Kincaid-Smith, 1991; Li et al., 2002; Marcantoni et al., 2002;

Neugarten et al., 2002). The remaining glomeruli show compensatory enlargement of

their capillary tufts (Goyal, 1982; Newbold et al., 1992). The rate at which blood is

filtered through all of the glomeruli, and thus the measure of the overall renal function,

is the glomerular filtration rate (GFR). The major aging phenotype in the kidney is a

25% decline in GFR starting at age 40 (Hoang et al., 2003; Lindeman et al., 1985;

Lindeman et al., 1984; Rowe et al., 1976a; Rowe et al., 1976b).

Individuals show variable rates of kidney aging. In one longitudinal study, one

third of individuals showed no decrease in GFR measured over a 20-year period,

whereas the remainder of the population showed a distinct decline (Lindeman et al.,

1985). For those individuals who showed a significant decline in GFR, the slope of

the decrease varied widely (Lindeman et al., 1985). The heritability of GFR is

estimated to be 0.40-0.46 (Fox et al., 2004; Hunt et al., 2004). One possible method to

search for genes that associate with kidney aging is a genome-wide association study

of GFR. This approach is unbiased, but when hundreds of thousands of SNPs are

tested, the multiple hypothesis testing penalty is high. We chose a more focused

approach that combines multiple types of genomic information, including gene

expression data, in order to limit our GFR association test to SNPs more likely to be

functional. This allowed us to perform our analysis of kidney aging using a smaller

sample size than is required for a genome-wide association study. Our approach is

called genomic convergence.
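
To make the multiple-testing argument concrete, the sketch below compares per-test significance cutoffs for a genome-wide scan and for a focused candidate set. It assumes a Bonferroni correction, which is only one common choice and is not stated here as the correction used in this work; the function name and the test counts (500,000 and 100) are illustrative assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical test counts: a genome-wide scan versus a focused candidate set.
genome_wide_cutoff = bonferroni_threshold(0.05, 500_000)
focused_cutoff = bonferroni_threshold(0.05, 100)

print(f"genome-wide cutoff: {genome_wide_cutoff:.1e}")
print(f"focused cutoff:     {focused_cutoff:.1e}")
```

Under these assumed counts the focused cutoff is several thousand times less stringent, which is why a smaller sample can retain useful power.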

Genomic Convergence

In genome-wide association studies the penalty for multiple hypothesis testing

is a large obstacle to overcome. A powerful alternative to genome-wide association

studies is genomic convergence, which selects candidate genes for a specific

phenotype based on genome-wide expression studies (Hauser et al., 2003; Le-

Niculescu et al., 2007; Liang et al., 2009; Mudge et al., 2008; Noureddine et al., 2005;

Oliveira et al., 2005). Differential expression between cases and controls may indicate

that the gene is functionally involved in disease pathogenesis. Gene expression

microarrays can be used to identify expression increases or decreases in affected

individuals compared to controls, and then SNPs within the genes that change

expression can be used as candidates in genetic association studies. This approach

scans the entire genome for expression changes associated with a disease in order to

prioritize genes with a greater chance of contributing to the disease phenotype.

Genomic convergence was first used to identify genes associated with Parkinson’s

disease, schizophrenia, and Alzheimer’s disease (Hauser et al., 2003; Le-Niculescu et

al., 2007; Liang et al., 2009; Mudge et al., 2008; Noureddine et al., 2005; Oliveira et

al., 2005).

We have extended the genomic convergence approach to find genes associated

with kidney aging by adding expression quantitative trait loci (eQTL) mapping after

the initial genome-wide transcriptional analysis. The eQTL analyses tested SNPs for

association with gene expression level. If a gene is functionally involved in kidney

aging and if DNA differences in the gene cause variation in expression among

individuals, then there may be an association between the specific allele carried by an

individual and that individual’s physiological aging trajectory. Finally, we tested the

set of eQTLs for association with kidney aging in two studies of normal aging, the

Baltimore Longitudinal Study of Aging and the InCHIANTI study. A schematic of

our genomic convergence method is shown in Figure 1.1.

Genome-wide Transcriptional Profile of Kidney Aging

Our genomic convergence study to find genes associated with kidney aging

began with genome-wide transcriptional profiling of normal kidney tissue from 74

individuals, aged 27-92 years (Rodwell et al., 2004). This study stands out because it

involves one of the largest numbers of samples of any study of human aging

performed to date, and thus has very high resolution and sensitivity. Gene expression

levels were measured in both kidney cortex and medulla samples using Affymetrix

microarrays. A linear regression algorithm was used to determine whether gene

expression increases or decreases with age (i.e. has a positive or negative slope;

p<0.001) and 447 age-regulated genes in the kidney were identified (Rodwell et al.,

2004; Figure 1.2).
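
As a rough illustration of testing one gene for age regulation, the sketch below fits expression against age by ordinary least squares and reports the slope and its t statistic. It is a simplified stand-in, not the published model: it uses age as the only predictor, the data are invented, and converting the t statistic to a p-value (for comparison with p<0.001) would need a Student's t distribution that is omitted here.

```python
import math


def age_regression(ages, expression):
    """Fit expression = intercept + slope * age by ordinary least squares.

    Returns (slope, t_statistic); t tests slope == 0 with n - 2 degrees of freedom.
    """
    n = len(ages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_age = sum(ages) / n
    mean_expr = sum(expression) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (e - mean_expr) for a, e in zip(ages, expression))
    slope = sxy / sxx
    intercept = mean_expr - slope * mean_age
    rss = sum((e - (intercept + slope * a)) ** 2 for a, e in zip(ages, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, t_stat


# Hypothetical log2 expression values for one gene in five donors.
ages = [30, 45, 60, 75, 90]
log2_expr = [8.1, 7.9, 7.4, 7.2, 6.8]
slope, t_stat = age_regression(ages, log2_expr)
direction = "decreases" if slope < 0 else "increases"
print(f"slope = {slope:.4f} log2 units per year; expression {direction} with age")
```

The sign of the slope gives the direction of age regulation, and the magnitude of the t statistic reflects how consistently the gene follows that trend.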

In addition to marking chronological age, these 447 genes were also shown to

mark physiological age. Some people age slowly and retain kidney function into their

70s whereas others age rapidly and show a marked decline in renal function. To

measure the relative physiology of the kidney, a histological score called the

chronicity index was developed (Rodwell et al., 2004). Three scores were given to

each kidney section corresponding to the appearance of the glomeruli, the tubules, and

the arteries. Scores ranged from zero (normal appearance, as seen in youthful patients) to

four (an advanced state of glomerular sclerosis, tubular atrophy/interstitial fibrosis,

or arterial intimal fibrosis; Figure 1.3). The glomerular, tubular, and arteriolar scores

were then added together to form the chronicity index ranging from zero (best) to 12

(worst).
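
The scoring scheme just described is a simple sum of three component scores. A minimal sketch follows; the function and argument names are our own and are not taken from Rodwell et al. (2004).

```python
def chronicity_index(glomerular: int, tubular: int, arterial: int) -> int:
    """Sum three histological scores (each 0-4) into an index from 0 (best) to 12 (worst)."""
    scores = {"glomerular": glomerular, "tubular": tubular, "arterial": arterial}
    for name, value in scores.items():
        if not 0 <= value <= 4:
            raise ValueError(f"{name} score must be between 0 and 4, got {value}")
    return sum(scores.values())


# Hypothetical component scores for a single kidney section.
print(chronicity_index(glomerular=4, tubular=3, arterial=3))
```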

For the 74 kidney samples, the physiological state of the organ was compared

to its respective gene expression profile (Figure 1.2). The gene expression profiles

were found to correlate well with chronicity index (Rodwell et al., 2004). Patients

with poor organ function for their age also had expression profiles that looked like

those from people who are much older. For example, the chronicity index of

individual number 81 was high for her age of 78 years, and her kidney expression

profile was similar to those from patients who were 10 to 20 years older. Conversely,

patients with good organ function for their age tended to have expression profiles

normally associated with younger people. For instance, individual number 95 was 81

years old but had an expression profile similar to those of patients who were 30-40 years

younger and had a lower chronicity index. Although the age-regulated genes were

selected solely on the basis of their change with chronological age, these results

indicate that the expression profiles also correlate with physiological aging. Thus,

some of the age-regulated genes may be functioning in the aging process, rather than

simply being markers of aging. The 447 age-regulated genes were used as candidates

in the next step of our genomic convergence approach, eQTL mapping.

In addition to the 447 age-regulated kidney genes, we included an additional

183 genes in our candidate set for eQTL mapping that stem from a study of common

gene regulation of aging across human tissues (Zahn et al., 2006). Expression profiles

that are common to aging in all tissues would reveal core mechanisms that underlie

cellular aging. Gene expression data from aging kidney (Rodwell et al., 2004), aging

muscle (Zahn et al., 2006) and aging brain (Lu et al., 2004) were compared by

analyzing the behavior of entire genetic pathways using an approach called Gene Set

Enrichment Analysis (Subramanian et al., 2005). With this approach, age-regulation

for every gene in a pathway (defined by the Gene Ontology Consortium) is combined

to generate an overall effect on regulation of the entire pathway (Ashburner et al.,

2000). This approach is more sensitive than examining genes one-at-a-time as

significant results can be obtained from the accumulation of small changes in many

genes in a pathway. This systems biology method is especially powerful in studies of

aging because of the polygenic nature of the phenotype. Furthermore, the specific

biological processes associated with each genetic pathway provide insights into

mechanisms of aging. From a total of 624 sets of genes defined by the Gene Ontology

Consortium, extracellular matrix genes and the cytosolic ribosomal genes were found

to increase expression with age in all three human tissues, whereas chloride transport

genes and electron transport genes were found to significantly decrease expression

with age in those same tissues (Zahn et al., 2006; Figure 1.4). These commonly age-

regulated pathways include 152 extracellular matrix genes, 85 ribosomal genes, 35

chloride transport genes and 95 electron transport chain genes (Zahn et al., 2006). We

combined the age-regulated genes with the age-regulated pathways and obtained a set

of 630 genes that were used as candidates in our eQTL analyses.
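
The four pathways list 152 + 85 + 35 + 95 = 367 genes, yet they add only 183 genes to the 447 age-regulated kidney genes (447 + 183 = 630). The text does not say how this reduction arises; overlap between lists and removal of unusable genes are both possibilities. The sketch below shows only the set-union step, under the assumption that duplicates are counted once. Gene names are placeholders.

```python
def combine_candidates(age_regulated, pathways):
    """Union of the age-regulated genes and every pathway gene set, counting each gene once."""
    combined = set(age_regulated)
    for genes in pathways.values():
        combined |= set(genes)
    return combined


# Placeholder gene names; GENE_C appears in both sources and is counted once.
age_regulated = {"GENE_A", "GENE_B", "GENE_C"}
pathways = {
    "extracellular matrix": {"GENE_C", "GENE_D"},
    "ribosome": {"GENE_E"},
}
candidates = combine_candidates(age_regulated, pathways)
print(len(candidates), sorted(candidates))
```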

Significance and Dissertation Content

In this dissertation, I describe our genomic convergence approach to find genes

associated with kidney aging. This is the first time such a sequence of genomic

approaches has been used to study aging in humans. We began with a genome-wide

transcriptional profile of aging in the kidney. Graham Rodwell and Jacob Zahn led

this profiling study before I arrived in the Stuart Kim laboratory. Importantly, not

only did the age-regulated genes mark chronological age, but their gene expression

profiles also correlated with kidney physiology, making the genes excellent candidates

for functioning in the aging process. This led us to our genomic convergence

approach, developed by Stuart Kim and myself. Rather than simply testing

polymorphisms in the age-regulated genes for association with kidney aging, we first

performed eQTL analyses on these candidate genes. We reasoned that if a gene is

functionally involved in kidney aging and if DNA differences in the gene cause

variation in expression among individuals, then there may be an association between

the specific allele carried by an individual and that individual’s physiological aging

trajectory. Portions of this dissertation have been published (Wheeler et al., 2009).

In Chapter 2, I describe our first method of eQTL analysis, the total expression

method. In this method, SNP genotypes are tested for association with total gene

expression level, taken from Affymetrix microarrays. In addition to the microarray

data available from Rodwell et al. (2004), I collected 26 new kidney tissue samples

and prepared the RNA for microarray analysis. I also extracted DNA from 96 kidney

samples and performed the genotyping using Illumina BeadChips. I performed all of

the statistical analyses. The total expression analysis led to the identification of 12

eQTLs in the kidney.

The subject of Chapter 3 is our second method of eQTL analysis, the allele-

specific expression method. Here, we examined heterozygotes for SNPs within a

transcript for differential expression of each allele. The cDNAs of heterozygotes were

examined for allelic transcript levels that differ from each other, using genomic DNA

allelic ratios as controls of 1:1 hybridization intensity. I made cDNA from each of the

96 kidney samples and hybridized both cDNA and genomic DNA to separate Illumina

genotyping BeadChips. I also performed all of the statistical analyses. Using the

allele-specific expression method, we found 93 kidney eQTLs. In this chapter, I also

discuss general trends observed in the data and compare the allele-specific expression

results to the total expression results.

Chapter 4 presents our genetic association study for kidney aging. We tested

the set of 101 eQTLs for association with kidney aging (GFR) in two studies of

normal aging, the Baltimore Longitudinal Study of Aging and the InCHIANTI study.

The SNP genotype and GFR data from these two studies were obtained from our

collaborators at the National Institute on Aging. I performed the statistical analyses

with some assistance from E. Jeffrey Metter and Toshiko Tanaka. Using this

sequential approach of genomic convergence, we were able to find SNPs in the matrix

metalloproteinase gene MMP20 that are significantly associated with kidney aging.

In Chapter 5, I present conclusions and future directions. The results of the

sequential genomic analyses presented in the previous four chapters may provide the

first evidence for a gene association with kidney aging in humans. I explain how our

method of genomic convergence can be applied to any phenotype of interest to

increase the power to find genetic associations. I also describe possible future

methods to find more genes that specify and control human aging.

Figure 1.1 The genomic convergence approach. Beginning with a genome-wide transcriptional profile, genes are filtered in each step for those the most likely to be functional. In the final step, expression quantitative trait loci (eQTLs) are tested for association with glomerular filtration rate (GFR), a phenotype of kidney aging.

Figure 1.2 A transcriptional profile of kidney aging. 447 genes in the kidney were significantly age-regulated (p<0.001). Rows correspond to age-regulated genes, ordered from most highly induced to most highly repressed. Columns correspond to individual patients, ordered from youngest to oldest. The age of certain patients is shown for reference. Left panel refers to data from cortex samples, and right panel depicts data from medulla samples. The first row shows the chronicity index (ChI; morphological appearance and physiological state of the kidney), from blue (healthiest) to yellow (least healthy) as indicated in the scale bar. Scale shows log2 of the expression level (Exp). Figure adapted from (Rodwell et al., 2004).

Figure 1.3 Chronicity index examples. Histology from patient aged 29 yrs is shown on the left, demonstrating a normal glomerulus (G), tubules and interstitial space (T), and artery (A), respectively (chronicity index of zero). Histology from patient aged 84 yrs is shown on the right, demonstrating glomerulosclerosis (g), tubular atrophy and interstitial fibrosis (t), and arterial intimal hyalinosis (a), respectively (chronicity index of ten). Hematoxylin and eosin staining of formalin-fixed, paraffin-embedded sections. Figure from (Rodwell et al., 2004).

Figure 1.4 Common signature for aging. Shown are aging signatures for four genetic pathways. Rows are human tissues. Columns correspond to individual genes in each gene set. Scale represents the slope of the change in log2 expression level with age. Gray indicates genes were not present in the dataset. Figure adapted from (Zahn et al., 2006).

Chapter 2: Identification of eQTLs by Total Expression Analysis

Portions of this chapter were previously published in PLoS Genetics (2009), 5(10): e1000685 with the following authors:

Heather E. Wheeler¹, E. Jeffrey Metter²,³, Toshiko Tanaka²,³, Devin Absher⁴, John Higgins⁵, Jacob M. Zahn⁶, Julie Wilhelmy⁶, Ronald W. Davis⁶, Andrew Singleton⁷, Richard M. Myers⁴, Luigi Ferrucci²,³, Stuart K. Kim¹,⁸

¹Department of Genetics, Stanford University Medical Center, Stanford, CA, USA

²Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging, Baltimore, MD, USA

³Medstar Research Institute, Baltimore, MD, USA

⁴HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA

⁵Department of Pathology, Stanford University Medical Center, Stanford, CA, USA

⁶Stanford Genome Technology Center, Palo Alto, CA, USA

⁷Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA

⁸Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA

Background

As discussed in the previous chapter, the expression profiles of the age-

regulated genes correlated with kidney physiology, making the genes excellent

candidates for functioning in the aging process. For example, a gene that decreases

expression with age may contribute to poor renal function because it is expressed at

levels below a physiological threshold in the elderly. If age-regulated genes are

important for kidney function, then variation in gene expression may correlate with

variation in kidney function. As the second step in our genomic convergence

approach (Figure 1.1), we performed expression quantitative trait locus (eQTL) mapping of

the age-regulated genes. We focused on finding expression-associated SNPs (eSNPs)

using two methods. The first method, known as total expression analysis, is the

subject of this chapter.

We searched for eQTLs by pooling individuals that have the same genotype

for a particular SNP, and then determining whether the different SNP genotypes are

associated with expression of the corresponding gene. The candidate genes were

chosen from two studies of gene expression changes that occur with age. We obtained

a set of 447 age-regulated genes from a genome-wide transcriptional profile of aging

in the human kidney (Rodwell et al., 2004). In addition, a previous gene set

enrichment analysis identified four genetic pathways that were coordinately age-

regulated in each of three human tissues (kidney, muscle and brain). These pathways

include 152 extracellular matrix genes, 85 ribosomal genes, 35 chloride transport

genes and 95 electron transport chain genes (Zahn et al., 2006). We combined the

age-regulated genes with the age-regulated pathways and obtained a set of 630 genes

that change expression with age.

We selected 1041 SNPs in the promoter regions and 386 SNPs in the coding

and untranslated regions of the 630 age-regulated genes. We first searched for

suitable coding and untranslated region SNPs because they can also be tested for

allele-specific expression (Chapter 3). For genes without suitable mRNA SNPs, we

chose SNPs in their promoter regions (defined as 5kb upstream and downstream of the

transcription start site). Chosen SNPs had a minor allele frequency greater than 0.05

in the HapMap CEU (Utah residents with northern and western European ancestry)

population (Altshuler et al., 2005). A list of SNPs tested for association with

expression is available at http://www.plosgenetics.org/doi/pgen.1000685 (Table S1).

We genotyped these SNPs, corrected for population structure, and then tested the

alleles for association with gene expression in 96 kidney samples.

Results

Ancestry Analysis

We first used a custom Illumina GoldenGate assay to genotype these 1427

SNPs using DNA from 96 frozen kidney tissue samples and 197 formalin-fixed,

paraffin-embedded kidney tissue samples. These normal kidney tissue samples were

obtained from Stanford University Medical Center with informed consent either from

biopsies of kidneys from transplantation donors or from nephrectomy patients with

localized pathology. Because our kidney tissue samples were from individuals living

in the diverse San Francisco Bay Area, we chose to control for population structure by

including a covariate for ancestry in our regression analysis. Most of the individuals

in our total expression association study self-reported their ancestry (84/96 frozen

kidney samples). Genetic clustering analysis has been shown to correlate highly with

self-identified ancestry (Tang et al., 2005). To determine the ancestry of the 12

unknown individuals, we used the clustering program STRUCTURE (Pritchard et al.,

2000). We included DNA from the 96 individuals who had frozen kidney tissue as

well as the 197 individuals with formalin-fixed, paraffin-embedded tissue. We used

the genotypes of 839 unlinked SNPs from our 293 samples and from the CEU, YRI,

and JPT+CHB HapMap populations in our analysis (Altshuler et al., 2005). The YRI

population contains individuals from the Yoruba people of Ibadan, Nigeria. The

JPT+CHB population contains individuals from Tokyo, Japan and Beijing, China. We

determined that our Stanford samples cluster with the greatest probability into three

populations, each clustering with one of the HapMap populations (see Methods for a

detailed analysis). Because most of the Stanford samples were predominantly of

Caucasian genetic ancestry and because it is simplest to use a Boolean covariate value

in regression analysis when chronological significance of the state (genetic ancestry in

this case) is unknown, we chose to divide the individuals into two groups. In the first

group we included individuals with an average percent CEU ancestry >75%. This

group included 211 individuals. The second group contained the other 82 individuals.
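
The two-group split can be expressed as a Boolean covariate derived from each individual's estimated CEU ancestry fraction. The sketch below assumes the fraction is already available (for example, from a clustering program's output); the function name, the sample labels and the fractions are hypothetical.

```python
def ancestry_group(ceu_fraction: float, cutoff: float = 0.75) -> int:
    """Return 1 if the estimated CEU ancestry fraction exceeds the cutoff, otherwise 0."""
    if not 0.0 <= ceu_fraction <= 1.0:
        raise ValueError("ceu_fraction must lie between 0 and 1")
    return 1 if ceu_fraction > cutoff else 0


# Hypothetical ancestry estimates for three samples.
estimates = {"sample_1": 0.97, "sample_2": 0.75, "sample_3": 0.12}
covariate = {sample: ancestry_group(f) for sample, f in estimates.items()}
print(covariate)
```

Because the criterion is strictly greater than 75%, a sample estimated at exactly 0.75 falls into the second group in this sketch.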

Total Expression QTL Analysis

RNA extraction of sufficient quantity and quality was only possible for our

frozen tissue samples, not the formalin-fixed, paraffin-embedded tissues. Total

expression data for 96 kidney tissue samples were obtained from whole-genome

microarrays of 70 kidneys from Rodwell et al. (2004) and new expression data from

26 kidney samples. The kidney samples were dissected into cortex (94 samples) and

medulla (59 samples). Kidney samples were from normal tissue from patients aged 29

to 92 years. Expression levels of each gene in the genome were determined using

Affymetrix HG-U133A and HG-U133B microarrays. DNA was also extracted from

these 96 samples and 1427 SNPs from our candidate genes were genotyped on custom

Illumina GoldenGate arrays.

We compared the genotypes from our chosen SNPs to their corresponding

gene expression levels using linear regression. Our model corrected for age, sex,

tissue type (cortex or medulla), and ancestry group. The CEU>75% ancestry group

included 72 individuals. The second group contained the other 24 individuals. We

found 16 SNPs in 12 genes associated with total expression level (Linear Regression,

p<0.001, Table 2.1). The percent of variance in gene expression explained by these

SNPs ranged from 6-43% (Table 2.1). Four of the genes have two significant SNPs;

in two cases, the SNPs are in different linkage disequilibrium blocks indicating that

the eSNPs are independent, and in two cases, the SNPs are linked to each other (r² > 0.8 in the

HapMap CEU population) and thus represent only one significant association

(Altshuler et al., 2005).
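
A covariate-adjusted regression of this kind can be sketched with a small least-squares routine. The code below is illustrative only: it assumes an additive genotype coding (0, 1 or 2 copies of one allele) and reports the partial R² of the genotype term as the "variance explained", neither of which is specified in the text above. The data are invented and noise-free, so the fit recovers the planted genotype effect of 0.5.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares fit with an intercept; returns (coefficients, residual sum of squares)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))
    return beta, rss


def genotype_effect(genotypes, covariates, expression):
    """Genotype coefficient and partial R^2 after adjusting for the covariates."""
    _, rss_reduced = ols(covariates, expression)
    if rss_reduced == 0:
        raise ValueError("covariates already explain all variance")
    full_rows = [[g] + list(c) for g, c in zip(genotypes, covariates)]
    beta, rss_full = ols(full_rows, expression)
    return beta[1], 1.0 - rss_full / rss_reduced


# Invented data: covariate columns are age, sex, tissue (0 cortex, 1 medulla), ancestry group.
genotypes = [0, 1, 2, 0, 1, 2, 1, 0]
covariates = [[30, 0, 0, 1], [45, 1, 0, 1], [60, 0, 1, 0], [75, 1, 1, 1],
              [90, 1, 0, 0], [50, 0, 1, 1], [65, 0, 0, 1], [80, 1, 1, 0]]
expression = [2.0 + 0.5 * g - 0.01 * c[0] + 0.2 * c[1] + 0.3 * c[2] + 0.1 * c[3]
              for g, c in zip(genotypes, covariates)]
effect, partial_r2 = genotype_effect(genotypes, covariates, expression)
print(f"genotype effect = {effect:.3f}, partial R^2 = {partial_r2:.3f}")
```

Comparing the residual sum of squares with and without the genotype term isolates the SNP's contribution from that of age, sex, tissue type and ancestry group.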

One example of a promoter region SNP that showed strong association with

total expression is rs705704, which is 274 base pairs upstream of the transcription start

site of ribosomal protein S26 (RPS26, p = 1.2 × 10⁻²⁰, Figure 2.1A). Individuals with

the AA genotype have the highest expression, heterozygotes have medium expression,

and GG homozygotes have the lowest expression of RPS26. RPS26 has been

identified as an eQTL in other studies (Figure 2.1B; Cheung et al., 2005; Dixon et al.,

2007; Myers et al., 2007; Schadt et al., 2008; Webster et al., 2009). The 12 kidney

eQTLs found by our total expression analysis were used as candidate genes in our

kidney aging association study, which is step three in our genomic convergence

approach (Figure 1.1) and the subject of Chapter 4. A second type of eQTL analysis,

allele-specific expression analysis, is the subject of Chapter 3.

Discussion

We tested 1427 SNPs in 630 age-regulated genes for association with gene

expression in kidney tissue from 96 individuals. Our goal was to find genes that

associate with kidney aging and as an intermediate step we performed this eQTL

analysis in hopes of converging on genes most likely to be functional. Genes that

show allele associations with expression level may also show allele associations with a

biological function, such as glomerular filtration rate, our chosen phenotype of kidney

aging. We found 16 SNPs in 12 genes that associate with total expression level.

Of these 12 genes, five have been shown to be cis-acting eQTLs in

other studies (Table 2.2; Myers et al., 2007; Schadt et al., 2008; Stranger et al., 2007;

Veyrieras et al., 2008). Three of our kidney eQTLs (RPS26, COX7A2L, RPS18) were

also eQTLs in the liver (Schadt et al., 2008). Two kidney eQTLs (RPL12, RPS9) were

also found in lymphoblastoid cell lines (Stranger et al., 2007; Veyrieras et al., 2008)

and one (RPS26) in brain cortex (Myers et al., 2007). In the liver and brain studies,

8% and 21% of tested genes were found to be cis-acting eQTLs, respectively (Myers

et al., 2007; Schadt et al., 2008). Here, just 2% of the genes we tested were found to

be cis-acting in the kidney. However, both the liver and brain studies had more

samples, 427 in liver and 193 in brain (Myers et al., 2007; Schadt et al., 2008).

Therefore, these studies had more power to detect expression associations.

Although our power to detect eQTLs using this total expression analysis was limited,

we did find 12, and each associated SNP explained a substantial proportion of the

variance in gene expression (0.06-0.43, Table 2.1). We were able to compare the

results of this total expression analysis to those of the allele-specific expression analysis

in the coding and untranslated region SNPs (Chapter 3). Also, for the 354 genes that

did not have assayable mRNA SNPs, we were able to test for eQTLs using this

method. SNPs in the 12 kidney eQTLs were tested for association with kidney aging

as the final step in our genomic convergence approach (Chapter 4).

Methods

Ethics Statement

Ethical approval for the study was obtained from the Stanford University

Institutional Review Board (IRB). All subjects provided written informed consent for

the collection of samples and subsequent analysis. This study was conducted

according to the principles expressed in the Declaration of Helsinki.

Stanford Kidney Samples

Normal kidney tissue was obtained from Stanford University Medical Center

with informed consent either from biopsies of kidneys from transplantation donors or

from nephrectomy patients with localized pathology. Kidney tissue from nephrectomy

patients was harvested meticulously with the intention of gathering normal tissue

uninvolved by the tumor. Samples that showed evidence of pathological involvement

or in which there was only tissue in close proximity to the tumor were not used.

Kidney sections were either immediately frozen on dry ice and stored at −80°C until

use or formalin-fixed and paraffin-embedded.

RNA and DNA Preparation

Frozen kidney samples were weighed (25-50 mg), cut into small pieces on dry

ice, and then placed in 1 ml of TRIzol Reagent (Invitrogen, Carlsbad, California,

United States) for RNA extraction or 600 µl of Buffer RLT Plus (Qiagen, Valencia,

California, United States) for DNA extraction. The tissue was homogenized using a

PowerGen700 homogenizer (Fisher Scientific, Pittsburgh, Pennsylvania, United

States). Total RNA was isolated according to the TRIzol Reagent protocol and

genomic DNA was isolated according to the Qiagen AllPrep DNA/RNA Mini Kit

protocol.

Normal kidney tissue (25-35 mg) from formalin-fixed, paraffin-embedded

blocks was cut out with a scalpel and crushed in liquid nitrogen with a mortar and

pestle. The samples were then treated with 1 ml xylene to remove the paraffin. DNA

was extracted from the tissue according to the protocol for the RecoverAll Total Nucleic Acid

Isolation Kit for FFPE (Ambion, Austin, Texas, United States).

SNP Selection

Candidate aging genes were chosen from previous transcriptional profiling

studies and include 447 age-regulated kidney genes (Rodwell et al., 2004) as well as

the genes in the four pathways that are commonly age-regulated in the kidney, muscle

and brain: extracellular matrix, ribosome, chloride transport and electron transport

chain (Zahn et al., 2006). The candidate kidney aging genes were first searched for

mRNA SNPs that could be used in an allele-specific expression assay. In addition to

being within the transcript on an autosome, the SNPs had to have a minor allele

frequency greater than 0.05 in the HapMap CEU population, an Illumina SNP score

greater than 0.4, and be greater than 30 bp from an exon boundary (NCBI Build 36.1)

to ensure the Illumina genotyping assay would work properly for both genomic DNA

and cDNA. For genes that had multiple assayable mRNA SNPs, those closest to the

5’ end of the gene were chosen, with a maximum of two SNPs per gene. These

criteria were met for 386 SNPs in 276 genes. For candidate aging genes that did not

have an appropriate mRNA SNP, promoter region (defined as 5 kb upstream or

downstream of the transcription start site) SNPs meeting the same minor allele

frequency (>0.05) and SNP score (>0.4) criteria were chosen. One to four SNPs were

chosen per gene for analysis, totaling 1041 promoter SNPs in 354 candidate aging

genes.
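The mRNA SNP selection rules described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Snp` record, its field names, and both function names are hypothetical and do not come from the study's actual files or scripts.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    # All field names are illustrative assumptions, not the study's data format.
    rsid: str
    gene: str
    chrom: str
    maf_ceu: float             # minor allele frequency in HapMap CEU
    illumina_score: float      # Illumina SNP design score
    in_transcript: bool        # SNP lies within the mRNA transcript
    bp_to_exon_boundary: int   # distance to the nearest exon boundary
    bp_from_5prime: int        # distance from the 5' end of the gene


def is_assayable_mrna_snp(snp):
    """Apply the criteria from the text: autosomal, in the transcript,
    MAF > 0.05, SNP score > 0.4, and > 30 bp from an exon boundary."""
    autosomal = snp.chrom not in {"X", "Y", "MT"}
    return (snp.in_transcript and autosomal
            and snp.maf_ceu > 0.05
            and snp.illumina_score > 0.4
            and snp.bp_to_exon_boundary > 30)


def select_mrna_snps(snps, max_per_gene=2):
    """Keep at most `max_per_gene` passing SNPs per gene, closest to the 5' end."""
    by_gene = {}
    for snp in filter(is_assayable_mrna_snp, snps):
        by_gene.setdefault(snp.gene, []).append(snp)
    return {gene: sorted(cands, key=lambda s: s.bp_from_5prime)[:max_per_gene]
            for gene, cands in by_gene.items()}
```

The promoter SNP fallback would use the same frequency and score thresholds with a different positional test and a cap of four SNPs per gene.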

Genotyping

The candidate aging SNPs were genotyped using a GoldenGate Custom Panel

from Illumina (San Diego, California, United States). Oligonucleotides specific for

each allele of each SNP were designed for use in a multiplex PCR. A standard

protocol designed by Illumina and implemented at the Stanford Human Genome

Center was used to determine the genotypes of the 96 individuals for whom we had

kidney tissue. Samples were hybridized to custom Sentrix Array Matrices and

scanned on the Illumina BeadStation 500GX. Allele calls were determined using the

Illumina BeadStudio clustering software. The genotyping was successful (>90% call

rate, HWE p > 0.001) at 1341/1427 of the SNP loci in 599/630 genes (95%). A list of

the 1341 SNPs is available at http://www.plosgenetics.org/doi/pgen.1000685 (Table

S1).
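The two genotyping quality filters (call rate above 90% and Hardy-Weinberg equilibrium p > 0.001) can be illustrated with a short sketch. The text does not state which HWE test was used, so the one-degree-of-freedom chi-square goodness-of-fit test below is an assumption, as are the function names and the 0/1/2/None genotype coding.

```python
import math


def call_rate(genotypes):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:                   # monomorphic SNP: nothing to test
        return 1.0
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(genotypes, min_call_rate=0.90, min_hwe_p=0.001):
    """Apply both thresholds; genotypes are coded 0, 1, 2 (AA, AB, BB) or None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    counts = [called.count(code) for code in (0, 1, 2)]
    return (call_rate(genotypes) > min_call_rate
            and hwe_chi2_pvalue(*counts) > min_hwe_p)
```

An exact test is often preferred for small samples or rare alleles; the chi-square version is used here only because it needs nothing beyond the standard library.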

Total Expression Quantification

Most of the microarrays (68 cortex and 59 medulla samples) used in our total

expression association study were previously analyzed (Rodwell et al., 2004). The

same Affymetrix (Santa Clara, California, United States) HG-U133A and HG-U133B

high-density oligonucleotide arrays used in Rodwell et al. were used here to measure

total expression levels in 26 additional cortex samples. The samples were processed at

the Stanford Genome Technology Center using their standard protocol (Rodwell et al.,

2004). Eight micrograms of total RNA was used to synthesize cRNA for each sample,

and 15 µg of cRNA was hybridized to each microarray. Using the dChip program

(Zhong et al., 2003), microarray data (.cel files) were normalized according to the

stable invariant set, and gene expression values were calculated using a perfect match

model. All arrays passed the quality controls set by dChip. The raw microarray data

are available at the Stanford Microarray Database (http://smd.stanford.edu).

Ancestry Analysis  

Because the samples come from the diverse San Francisco Bay Area

population, we needed to control for population structure. We chose to use the

program STRUCTURE to determine the genetic ancestry of the individuals in our

sample (Pritchard et al., 2000). We had self-reported ancestry for 84 of the 293

individuals in our sample and could compare these data to the results of STRUCTURE.

One key to successful use of the STRUCTURE program is that the markers used

cannot be closely linked. Some of our markers are partially linked as they lie within

the same gene region. We searched the HapMap CEU population and found no

pairwise linkage disequilibrium (LD) data for 839 of our SNPs; we therefore treated

them as unlinked and chose them for the STRUCTURE analysis. We used the

CEU LD data because the blocks of LD are known to be larger than in the YRI

population and of the samples for which we have self-reported ancestry, ~80% are

Caucasian. We included genotype data at our 839 SNPs from three HapMap

populations (CEU, JPT+CHB, YRI) in our analysis to verify that our SNPs can

distinguish genetic ancestry (Altshuler et al., 2005). Because the CEU and YRI

populations contain family trios, we only included the parents in our analysis.

We performed 3 runs of STRUCTURE at each K from 1-2 and 10 runs at each

K from 3-7 using the admixture ancestry model. K is the number of populations the

program assumes. The admixture model allows individuals to have mixed ancestry,

and is thus flexible enough to deal with the complexity of the Bay Area population. A burn-in

length of 10,000 and a run length of 10,000 were used for each run. The estimated

natural log probability of the data is shown in Table 2.3. The probabilities are similar

for K=3,4,6,7. In the documentation for the STRUCTURE software, the authors note

that P(K) is often very small for K less than the appropriate value (in our case, K=1

and 2), and then plateaus for larger K, as is observed in Table 2.3. In this situation,

where several values of K give similar estimates of Ln P(Data), it seems that the

smallest of these is often “correct” (Pritchard et al., 2007).

Figure 2.2 shows the clustering of genetic ancestry of one run at each value of

K from 3 to 7. At each value of K, we see the three HapMap populations cluster

perfectly, with the exception of a few individuals in the CEU population at K=6 and 7.

Most of the Stanford patients cluster with the CEU population at K=3, indicating they

are Caucasian. However, we do see some patients clustering with the JPT+CHB

population and a few with the YRI population, as well as some admixed individuals. Of

the 84 individuals with self-reported ancestry information, 78 matched the genetic

information. One individual self-reported as African American but, in all runs at K=3, was

estimated to be greater than 99% Caucasian. This may be a data collection error.

Three individuals were estimated as admixed for Asian and Caucasian ancestry at

K=3, while two of them reported being Asian and one reported being Caucasian. The

two individuals who self-reported as Hispanic show a mix of Asian and Caucasian

ancestry at K=3, which makes sense historically. At K=4, we see a fourth cluster

emerging from the Stanford patients, made up of individuals who were admixed for

Asian and Caucasian ancestry at K=3. This cluster (yellow on Figure 2.2) includes the

two self-reported Hispanic individuals, so this may be a Hispanic cluster. At K=5, we

see a subset of the Stanford patients who previously clustered with the CEU

population breaking into a second Caucasian population (purple on Figure 2.2). A

third Caucasian population emerges at K=6 (white on Figure 2.2) and a fourth at K=7

(gray on Figure 2.2). The mean variance of the percent ancestry in each cluster of

these additional Caucasian populations increases as K increases (Table 2.4). The

clustering of these Caucasian individuals is inconsistent as K increases and therefore

the best clustering to use in additional analyses is that at K=3. Because most of the

Stanford samples were predominantly of Caucasian genetic ancestry and because it is

simplest to use a Boolean covariate value in regression analysis when chronological

significance of the state (genetic ancestry in this case) is unknown, we chose to divide

the individuals into two groups for our total expression QTL analysis. In the first

group we included individuals with an average percent CEU ancestry >75%. This

group included 211 individuals. The second group contained the other 82 individuals.
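The two-group split can be expressed as a small helper that produces the Boolean ancestry covariate. This is a sketch only: it assumes that "average percent CEU ancestry" means the mean CEU membership fraction across the STRUCTURE runs at K=3, and the function name is hypothetical.

```python
from statistics import mean


def ancestry_covariate(ceu_fraction_per_run, threshold=0.75):
    """Return 0 if the mean CEU membership across runs exceeds the threshold,
    otherwise 1 (the coding used for the ancestry term in the regression)."""
    return 0 if mean(ceu_fraction_per_run) > threshold else 1
```

For example, an individual with CEU membership fractions of 0.99, 0.98 and 0.97 in three runs falls in the first group (covariate 0), while one with fractions near 0.5 falls in the second group (covariate 1).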

Total Expression Regression Models

We used a linear regression model to determine which SNP genotypes showed

a statistically significant association with gene total expression levels:

$$Y_{ij} = \beta_{0j} + \beta_{1j} g_{ij} + \beta_{2j}\,\mathrm{age}_i + \beta_{3j} t_i + \beta_{4j}\,\mathrm{anc}_i + \beta_{5j} s_i + \varepsilon_{ij} \qquad (2.1)$$

In equation 2.1, $Y_{ij}$ is the base 2 logarithm of the expression level for the gene of SNP $j$

in kidney sample $i$, $g_{ij}$ is the genotype (0, 1, 2 for AA, AB, BB) of individual $i$ at SNP $j$,

$\mathrm{age}_i$ is the age in years of individual $i$, $t_i$ is 0 if sample $i$ was from kidney cortex

and 1 if sample $i$ was from kidney medulla, $\mathrm{anc}_i$ is 0 if the individual contributing

sample $i$ has >75% CEU ancestry and 1 for other ancestry proportions, $s_i$ is 0 for males

and 1 for females, and $\varepsilon_{ij}$ is a random error term. The coefficients $\beta_{kj}$ for $k = 0, \dots, 5$ were

estimated by least squares from the data. Our primary interest was $\beta_{1j}$ values that

significantly differed from zero, indicating that SNP j associates with total expression

level. Because our microarrays were processed on two different scanners three years

apart, we analyzed the two sets of data separately. The first set comprised the 127

samples previously analyzed in Rodwell et al. and the second set comprised the 26

additional samples processed here. We combined the results from the two regression

analyses using Fisher’s combined probability test (Fisher, 1948). The $\beta_{1j}$ p-values

from each of the two analyses were combined into one test statistic ($\chi^2$) having a

chi-square distribution with four degrees of freedom using the formula:

$$\chi^2 = -2 \sum_{i=1}^{2} \log_e(p_i) \qquad (2.2)$$

Using Fisher’s method, we found 11 promoter SNPs in seven genes and five mRNA

SNPs in five genes that associated with total expression level (p < 0.001).
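The combination step can be sketched in a few lines. The statistic follows equation 2.2. The closed-form tail probability relies on the standard result that a chi-square variable with an even number of degrees of freedom (2k for k combined p-values) has survival function exp(-x/2) times a finite sum. The function name is hypothetical, and the sketch assumes every input p-value is strictly greater than zero.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value). The statistic has
    2 * len(pvalues) degrees of freedom, i.e. four for two analyses.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Survival function of chi-square with 2k df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, tail
```

With two analyses, the combined p-value reduces to p1·p2·(1 − ln(p1·p2)), so two p-values of 0.05 combine to roughly 0.017.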

Table 2.1 SNPs that associate with total gene expression level.

Gene      SNP          β1j gij P-value  R^2*    High expression allele  Low expression allele
RPS26     rs705704     1.2 x 10^-20     0.430   A                       G
POLR1D    rs10492487   6.2 x 10^-6      0.143   T                       A
COX7A2L   rs1997       8.9 x 10^-6      0.133   T                       A
ZNF6      rs1006629    1.2 x 10^-5      0.101   C                       T
RPL12     rs2247322    3.7 x 10^-5      0.103   A                       T
TXNDC5    rs8643       1.2 x 10^-4      0.100   G                       A
RPS9      rs2304524    1.3 x 10^-4      0.083   G                       C
RPS18     rs213204     2.5 x 10^-4      0.091   C                       A
CFB       rs641153     2.6 x 10^-4      0.094   T                       C
ANTXR1    rs7584385    2.8 x 10^-4      0.086   G                       A
POLR1D    rs7097       3.0 x 10^-4      0.095   G                       A
RPL12     rs1139400    3.7 x 10^-4      0.078   G                       A
COL15A1   rs1051105    7.0 x 10^-4      0.083   A                       G
ATP5F1    rs1264899    7.4 x 10^-4      0.076   A                       G
RPS18     rs213199     8.3 x 10^-4      0.077   C                       T
RPS9      rs3810229    9.5 x 10^-4      0.063   A                       C

*Proportion of variance in gene expression explained by the SNP

Figure 2.1 Total expression analysis. Genotypic associations with total expression level. (A) Boxplot of RPS26 expression according to genotype at the promoter SNP rs705704 (p = 1.2 x 10^-20). The boxes define the interquartile range and the thick line is the median. Open dots are possible outliers. (B) Haploview (Barrett et al., 2005) linkage disequilibrium (LD) plot of the RPS26 region. The SNP rs705704 is 274 bp upstream of the RPS26 transcription start site. Values in boxes correspond to the pairwise r^2 LD values (darker boxes correspond to higher r^2 values) for the HapMap CEU population. rs705704 (red) is partially linked to three SNPs (black) previously shown to associate with RPS26 expression levels (Cheung et al., 2005; Dixon et al., 2007; Myers et al., 2007; Schadt et al., 2008; Webster et al., 2009).

Table 2.2 Common total expression associations across studies.

Gene      SNP          Study          Tissue        Same?*
RPS26     rs705704     Myers, Schadt  brain, liver  snp
COX7A2L   rs2041354    Schadt         liver         gene
RPL12     rs2247322    Veyrieras      LCLs          snp
RPS9      rs17273267   Stranger       LCLs          gene
RPS18     rs1810472    Schadt         liver         gene

*Those labeled "snp" showed association with expression at the same snp in our dataset and the respective study (p < 0.001). Those labeled "gene" showed association with expression at the same gene in our dataset and at the shown snp in the respective study (p < 0.001). LCLs = lymphoblastoid cell lines. Table data compiled from the eQTL Genome Browser (http://eqtl.uchicago.edu).

Studies:
Myers, A.J. et al. A survey of genetic human cortical gene expression. Nat Genet 39, 1494-9 (2007).
Schadt, E.E. et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, e107 (2008).
Stranger, B.E. et al. Population genomics of human gene expression. Nat Genet 39, 1217-24 (2007).
Veyrieras, J.B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 4, e1000214 (2008).

Table 2.3 The probability of the genotype data at K=1-7.

K    Number of runs  Mean Ln P(Data)  Mean Var[Ln P(Data)]
1    3               -427220          414
2    3               -412674          1563
3    10              -397937          2455
4    10              -397893          3878
5    10              -399198          7467
6    10              -397796          5895
7    10              -397551          6108

K = number of populations the program STRUCTURE assumes.

Figure 2.2 Estimated genetic ancestry. Each individual is represented by a thin vertical line, which is partitioned into K colored segments that represent the individual's estimated membership fractions in K clusters. Black lines separate individuals of different populations. Populations are labeled above the figure. Made using the program DISTRUCT (Rosenberg, 2004).

Table 2.4 Mean variance of the percent ancestry in each cluster (10 runs at each K).

Cluster:  YRI          JPT+CHB      CEU          Asian/Cauc. Admix.  Cauc. 2      Cauc. 3      Cauc. 4
K = 3     5.0 x 10^-6  1.3 x 10^-5  1.5 x 10^-5
K = 4     1.4 x 10^-5  3.2 x 10^-4  8.0 x 10^-3  8.7 x 10^-3
K = 5     3.0 x 10^-5  3.7 x 10^-6  4.4 x 10^-2  6.2 x 10^-4         1.9 x 10^-1
K = 6     2.1 x 10^-5  2.0 x 10^-4  6.1 x 10^-2  9.8 x 10^-4         6.1 x 10^-2  2.4 x 10^-2
K = 7     1.1 x 10^-5  4.3 x 10^-5  9.4 x 10^-3  1.3 x 10^-2         2.7 x 10^-2  4.5 x 10^-2  2.2 x 10^-2

K = number of populations the program STRUCTURE assumes.

Chapter 3: Identification of eQTLs by Allele-Specific Expression Analysis

Portions of this chapter were previously published in PLoS Genetics (2009), 5(10): e1000685 with the following authors:

Heather E. Wheeler (1), E. Jeffrey Metter (2,3), Toshiko Tanaka (2,3), Devin Absher (4), John Higgins (5), Jacob M. Zahn (6), Julie Wilhelmy (6), Ronald W. Davis (6), Andrew Singleton (7), Richard M. Myers (4), Luigi Ferrucci (2,3), Stuart K. Kim (1,8)

(1) Department of Genetics, Stanford University Medical Center, Stanford, CA, USA
(2) Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging, Baltimore, MD, USA
(3) Medstar Research Institute, Baltimore, MD, USA
(4) HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
(5) Department of Pathology, Stanford University Medical Center, Stanford, CA, USA
(6) Stanford Genome Technology Center, Palo Alto, CA, USA
(7) Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
(8) Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA

Background

If age-regulated genes are important for kidney function, then variation in gene

expression may correlate with variation in kidney function. As the second step in our

genomic convergence approach (Figure 1.1), we performed expression quantitative

trait locus (eQTL) mapping of the age-regulated genes. We focused on finding

expression-associated SNPs (eSNPs) using two methods. The results of the first method, total

expression analysis, were presented in the previous chapter. The second method,

allele-specific expression, is presented here.

This second method identifies differential allelic expression within individuals

that are heterozygous for a specific SNP. In this method, the expression levels of each

allele are measured directly by assaying SNPs within the mRNA transcript. The

cDNAs of heterozygotes were examined for allelic transcript levels that differ from

each other. Genomic DNA allelic ratios were used as controls of 1:1 hybridization

intensity. Because differential expression is examined within heterozygotes, mRNA

levels are measured within the same genetic background and cellular environment.

We combined the 447 kidney age-regulated genes (Rodwell et al., 2004) with

the genes in the four commonly age-regulated pathways (Zahn et al., 2006) and

obtained a set of 630 genes that change expression with age. Of the 630 genes in this

candidate set, 276 of them had assayable SNPs within the coding or untranslated

regions. We genotyped both cDNA derived from kidney tissue and genomic DNA in

96 individuals and tested 386 SNPs in these 276 genes for allele-specific expression.

Results

Allele-specific expression analysis was used to test all of the age-regulated

genes that had SNPs in their mRNAs for differential allelic expression. We assayed

the relative expression levels of 386 mRNA SNPs in 276 age-regulated genes in 96

individuals. Most of the mRNA SNPs were in the 3’ untranslated regions of genes

(249), some were in coding regions (115), and a few were in the 5’ untranslated

regions (22).

Oligonucleotides specific for each allele of each SNP were designed for use in

the Illumina GoldenGate multiplex PCR assay. Kidney cortex mRNA was reverse

transcribed into cDNA prior to the start of the GoldenGate assay. In the assay, the

PCR products for each allele were labeled with a different fluorophore and the

intensities of each allele were compared to determine if one allele was expressed

higher than the other. The cDNA allelic intensities for each SNP were compared

within heterozygotes to test for differential allelic expression. Because the intensities

from each fluorophore (Cy3 and Cy5) can differ, the genomic DNA allelic intensities

of heterozygotes were used as a control to define a 1:1 allelic ratio for each SNP. A

schematic of the allele-specific expression assay is shown in Figure 3.1. The cDNA

allelic ratio for each heterozygote was compared to the 95% confidence interval

surrounding the mean genomic DNA allele intensity ratio for each SNP. At least five

heterozygotes were tested per SNP. If the cDNA allele intensity ratio for more than

50% of individual heterozygotes fell outside the 95% confidence interval and the

combined p-value was less than 10^-6, the SNP was considered to be an eSNP.

In total, 105 eSNPs in 93 age-regulated genes were detected (Table 3.1, Figure

3.2). The median fold-change of the higher expressed allele to the lower-expressed

allele was 2.1. The level of overexpression of one allele varied widely among genes,

from 1.4-fold to apparent monoallelic (>10-fold) expression (Table 3.1, Figure 3.3).

Two genes (SPP1 and TIMP3) had linked eSNPs (r^2 > 0.8 in the HapMap CEU population)

that both showed allele-specific differences in expression. Ten genes contained two

unlinked eSNPs that independently showed differences in expression.

For most of these eSNPs (96/105), the higher-expressed allele was usually the

same across heterozygotes. For example, the A allele is expressed higher than the C

allele in 11 of 12 heterozygotes tested at rs2245803 in the gene matrix

metalloproteinase 20 (MMP20, Figure 3.4), and the A allele is expressed higher than

the C allele in 12 of 13 heterozygotes tested at rs2296292 in LAMC1 (Figure 3.5A). In

these SNPs, the functional SNP causing the expression difference is likely linked to

the SNP we measured. For a smaller subset of the SNPs (9/105 eSNPs), both alleles

were observed at a higher level in different heterozygotes. One explanation for this is

that the functional SNP causing the expression difference is not closely linked to the

SNP we measured in the transcript. Another explanation is that epigenetic effects

such as imprinting could cause the differences in expression from the two homologs.

For example, one of the genes in which either allele was associated with higher

expression is PEG3 (paternally expressed 3), which is a known imprinted gene

(Figure 3.5B; Murphy et al., 2001; Van den Veyver et al., 2001). Presumably, the

higher-expressed allele in our studies is from the paternal homolog.

A total of 386 SNPs were tested for association with expression by both the

allele-specific method and the total expression method. While 105 eSNPs were identified by

the allele-specific method, only five eSNPs were identified by the total expression

method. Of the five SNPs found by the total expression method, four were also found

by the allele-specific expression method (Bold in Table 3.1). One example is rs8643

in the gene TXNDC5, in which both methods found that the G allele is associated with

higher expression than the A allele (Figure 3.6). These results indicate that the allele-

specific assay identified many more eSNPs and is likely more sensitive in detecting

expression differences than the total expression assay. A probable reason is that for

the allele specific assay, expression is measured from two alleles in heterozygotes and

thus variability due to genetic background and environmental effects is reduced or

eliminated.

Discussion

We tested 386 mRNA SNPs in 276 age-regulated genes for allele-specific

expression in kidney tissue from 96 individuals. Our goal was to find genes that

associate with kidney aging and as an intermediate step we performed this eQTL

analysis in hopes of converging on genes most likely to be functional. Genes that

show allele associations with expression level may also show allele associations with a

biological function, such as glomerular filtration rate, our chosen phenotype of kidney

aging. By comparing the cDNA of heterozygotes to their genomic DNA, we found

105 SNPs in 93 age-regulated genes that are allele-specifically expressed.

Other groups have used the allele-specific expression approach to identify

differentially-expressed genes in lymphoblastoid cell lines (Pastinen et al., 2005;

Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002), brain (Bray et al., 2003),

white blood cells (Pant et al., 2006), fetal kidney and fetal liver (Lo et al., 2003).

These studies found that 20-50% of the genes in the genome are differentially

expressed. Sixteen of the genes showing allele-specific expression found by our study

were also found in previous studies (Table 3.2; Lo et al., 2003; Milani et al., 2009;

Pant et al., 2006; Serre et al., 2008). Thus, 77 of the 93 allele-specifically expressed

genes identified in this work represent novel findings. Our finding that 41% of tested

genes showed allele-specific expression is similar to the percentage found in previous

studies (Bray et al., 2003; Lo et al., 2003; Pant et al., 2006; Pastinen et al., 2005;

Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002).

We were able to compare the results of this allele-specific expression analysis

to that of total expression analysis from Chapter 2. Specifically, 41% of genes

assayed contained eSNPs using the allele-specific expression method, whereas only

2% of genes assayed contained eSNPs using the total expression method. The

statistical cutoff for finding eSNPs using the allele-specific method was more stringent

than the one used for the total expression method. Thus, our results may

underestimate the improved sensitivity of the allele-specific method over the total

expression method. Unlike the total expression method, the allele-specific method

examines alleles within the same cellular environment in heterozygous individuals.

This maximizes the sensitivity of the assay because the alleles are expressed from the

same environment and genetic background. The implications of this result that the

allele-specific method is more sensitive than the total expression method are discussed

in Chapter 5. SNPs in the 93 kidney eQTLs found by the allele-specific expression

method were tested for association with kidney aging as the final step in our genomic

convergence approach (Chapter 4).

Methods

Ethics Statement

Ethical approval for the study was obtained from the Stanford University

Institutional Review Board (IRB). All subjects provided written informed consent for

the collection of samples and subsequent analysis. This study was conducted

according to the principles expressed in the Declaration of Helsinki.

Stanford Kidney Samples

Normal kidney tissue was obtained from Stanford University Medical Center

with informed consent either from biopsies of kidneys from transplantation donors or

from nephrectomy patients with localized pathology. Kidney tissue from nephrectomy

patients was harvested meticulously with the intention of gathering normal tissue

uninvolved by the tumor. Samples that showed evidence of pathological involvement

or in which there was only tissue in close proximity to the tumor were not used.

Kidney sections were immediately frozen on dry ice and stored at −80°C until use.

RNA and DNA Preparation

Frozen kidney samples were weighed (25-50 mg), cut into small pieces on dry

ice, and then placed in 1 ml of TRIzol Reagent (Invitrogen, Carlsbad, California,

United States) for RNA extraction or 600 µl of Buffer RLT Plus (Qiagen, Valencia,

California, United States) for DNA extraction. The tissue was homogenized using a

PowerGen700 homogenizer (Fisher Scientific, Pittsburgh, Pennsylvania, United

States). Total RNA was isolated according to the TRIzol Reagent protocol and

genomic DNA was isolated according to the Qiagen AllPrep DNA/RNA Mini Kit

protocol.

SNP Selection

Candidate aging genes were chosen from previous transcriptional profiling

studies and include 447 age-regulated kidney genes (Rodwell et al., 2004) as well as

the genes in the four pathways that are commonly age-regulated in the kidney, muscle

and brain: extracellular matrix, ribosome, chloride transport and electron transport

chain (Zahn et al., 2006). The candidate kidney aging genes were first searched for

mRNA SNPs that could be used in an allele-specific expression assay. In addition to

being within the transcript on an autosome, the SNPs had to have a minor allele

frequency greater than 0.05 in the HapMap CEU population, an Illumina SNP score

greater than 0.4, and be greater than 30 bp from an exon boundary (NCBI Build 36.1)

to ensure the Illumina genotyping assay would work properly for both genomic DNA

and cDNA. For genes that had multiple assayable mRNA SNPs, those closest to the

5’ end of the gene were chosen, with a maximum of two SNPs per gene. These

criteria were met for 386 SNPs in 276 genes.

Genotyping

The candidate aging SNPs were genotyped using a GoldenGate Custom Panel

from Illumina (San Diego, California, United States). Oligonucleotides specific for

each allele of each SNP were designed for use in a multiplex PCR. A standard

protocol designed by Illumina and implemented at the Stanford Human Genome

Center was used to determine the genotypes of the 96 individuals for whom we had

kidney tissue. Samples were hybridized to custom Sentrix Array Matrices and

scanned on the Illumina BeadStation 500GX. Allele calls were determined using the

Illumina BeadStudio clustering software. The genotyping was successful (>90% call

rate, HWE p > 0.001) at 95% of the genes.

Allele-Specific Expression Quantification

Total RNA was reverse transcribed into cDNA using the SuperScript Double-

Stranded cDNA Synthesis Kit (Invitrogen, Carlsbad, California, United States). The

same Illumina GoldenGate Custom Panel used for genotyping was used to measure

cDNA levels according to which allele of the SNP is present in the transcript. Only

SNPs for which the genomic DNA genotyping was successful were analyzed. After

the cDNA PCR products were hybridized and scanned, the raw allelic intensities were

first used to determine which transcripts were expressed. The expression threshold

was defined by the absent allele in normal homozygotes. That is, for an AA genotype,

the intensity of the B allele was taken to be background. The expression threshold

was calculated for each SNP as the mean of the background intensity plus two

standard deviations. SNPs with five or more heterozygotes showing expression of at

least one of the two alleles were carried through the rest of the analysis. Of the SNPs

measured, 309 of them in 225 genes were genotyped successfully (call rate >90%, HWE

p>0.001) and expressed above a background threshold in at least 5 heterozygotes. To

determine which alleles were associated with expression level, a confidence interval

was calculated for each SNP using the genomic DNA allele intensities of

heterozygotes. The confidence interval for each SNP was defined as the mean of the

normalized genomic DNA allele A/B raw intensity ratios plus or minus two standard

deviations. If the cDNA allele intensity ratio for more than 50% of individual

heterozygotes fell outside the 95% confidence interval and the meta p-value (Fisher,

1948) was less than 10^-6, the SNP was considered to be an eSNP. eSNPs were not

observed simply due to low, noisy transcript levels because the relative abundance of

each gene in the total cDNA sample (calculated from whole-genome microarray data)

was greater than the relative abundance of the gene in the genomic DNA sample.
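
The thresholding steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis code used in this study: the function names, the list-based data layout, and the use of the sample standard deviation are all hypothetical choices, and the Fisher meta p-value criterion is left out.

```python
from statistics import mean, stdev


def expression_threshold(background_intensities):
    """Mean background intensity plus two standard deviations.

    Background is the signal of the allele a homozygote does not carry
    (for example, the B-allele intensity in AA individuals).
    """
    return mean(background_intensities) + 2 * stdev(background_intensities)


def reference_interval(gdna_ratios):
    """Mean +/- 2 SD of the genomic DNA A/B ratios of heterozygotes."""
    m, s = mean(gdna_ratios), stdev(gdna_ratios)
    return m - 2 * s, m + 2 * s


def ase_proportion(cdna_ratios, gdna_ratios):
    """Fraction of heterozygotes whose cDNA ratio lies outside the interval."""
    low, high = reference_interval(gdna_ratios)
    outside = [r for r in cdna_ratios if r < low or r > high]
    return len(outside) / len(cdna_ratios)


def passes_ase_proportion(cdna_ratios, gdna_ratios, min_fraction=0.5):
    """True if more than `min_fraction` of heterozygotes show ASE."""
    return ase_proportion(cdna_ratios, gdna_ratios) > min_fraction
```

In this sketch a SNP would still need to meet the meta p-value cutoff and the minimum of five expressing heterozygotes before being called an eSNP.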

Figure 3.1 Assay for allele-specific expression. (A) Example: the G/A regulatory SNP in the upstream region causes the top transcript to be expressed higher than the bottom transcript. The C/G SNP is in the mRNA and linked to the regulatory SNP. (B) cDNA levels are measured by using a SNP genotyping assay designed to measure SNPs located in the mRNA of a gene (the C/G SNP in this example). The cDNA allelic ratio is compared to the genomic DNA allelic ratio (1:1 reference) to determine if the two alleles are expressed at significantly different levels (see Methods section in Chapter 3).

Table 3.1 Expression QTLs identified by allele-specific expression analysis in heterozygotes. Bold SNPs also significantly associated with total expression levels.

| Gene | SNP | Number of heterozygotes | ASE proportion | Major allele (# higher expression) | Minor allele (# higher expression) | Mean fold change (higher allele / lower allele) | Fisher's meta p-value |
|---|---|---|---|---|---|---|---|
| PEG3 | rs1055359 | 32 | 1 | A (12) | G (20) | 11.7 | <10^-100 |
| COL17A1 | rs805701 | 23 | 1 | A (23) | G (0) | 2.2 | <10^-100 |
| LAMC1 | rs2296292 | 13 | 1 | A (12) | C (1) | 2.2 | 10^-44 |
| MMP8 | rs1276282 | 12 | 1 | C (0) | T (12) | 2.6 | <10^-100 |
| CLCA1 | rs1882753 | 10 | 1 | T (10) | C (0) | 2.4 | <10^-100 |
| PAPPA2 | rs2294654 | 9 | 1 | G (0) | A (9) | 3.2 | <10^-100 |
| GABRG2 | rs211037 | 8 | 1 | C (1) | T (7) | 2.1 | <10^-100 |
| SLC16A7 | rs10506399 | 36 | 0.97 | G (0) | A (35) | 2 | <10^-100 |
| PDIA4 | rs1052549 | 31 | 0.97 | T (0) | G (30) | 1.7 | 10^-55 |
| ATHL1 | rs2242565 | 37 | 0.95 | A (34) | G (1) | 3.1 | <10^-100 |
| FAM83F | rs17406386 | 33 | 0.94 | A (31) | G (0) | 4.1 | <10^-100 |
| BRP44L | rs3728 | 43 | 0.93 | T (0) | G (40) | 1.6 | <10^-100 |
| LAMA3 | rs1154226 | 29 | 0.93 | C (0) | G (27) | 2 | <10^-100 |
| KERA | rs1990548 | 27 | 0.93 | A (24) | C (1) | 1.9 | <10^-100 |
| TXNDC5 | rs8643 | 15 | 0.93 | G (14) | A (0) | 3.1 | 10^-40 |
| FLJ38725 | rs7992315 | 14 | 0.93 | T (13) | C (0) | 1.6 | 10^-46 |
| COX7A2L | rs1997 | 52 | 0.92 | A (0) | T (48) | 1.6 | 10^-91 |
| MMP20 | rs2245803 | 12 | 0.92 | C (0) | A (11) | 2.1 | <10^-100 |
| COL17A1 | rs9425 | 12 | 0.92 | G (0) | A (11) | 1.8 | <10^-100 |
| RGS6 | rs3291 | 23 | 0.91 | A (20) | G (1) | 2.6 | <10^-100 |
| CXCL14 | rs1046092 | 11 | 0.91 | A (0) | G (10) | 1.8 | 10^-65 |
| SPP1 | rs4754 | 40 | 0.9 | T (34) | C (2) | 4.8 | <10^-100 |
| PHCA | rs591043 | 31 | 0.9 | A (0) | G (28) | 2.3 | 10^-99 |
| MATN2 | rs3088121 | 21 | 0.9 | A (1) | G (18) | 1.6 | 10^-62 |
| GPC6 | rs1535692 | 20 | 0.9 | G (0) | A (18) | 2.2 | <10^-100 |
| GPC5 | rs553717 | 10 | 0.9 | G (0) | A (9) | 2.1 | <10^-100 |
| GABRA4 | rs7678338 | 45 | 0.89 | T (40) | C (0) | 4.5 | <10^-100 |
| GPR61 | rs17575798 | 9 | 0.89 | G (0) | A (8) | 1.9 | 10^-23 |
| ADRA2A | rs3750625 | 9 | 0.89 | C (0) | A (8) | 1.8 | <10^-100 |
| TPD52 | rs10098470 | 8 | 0.88 | G (7) | A (0) | 1.4 | 10^-43 |
| TIMP3 | rs1065314 | 36 | 0.86 | T (31) | C (0) | 3.1 | <10^-100 |
| DSPP | rs2615489 | 22 | 0.86 | A (19) | G (0) | 2.8 | <10^-100 |
| SPP1 | rs9138 | 40 | 0.85 | A (28) | C (6) | 6.2 | <10^-100 |
| TMEM92 | rs2254177 | 12 | 0.83 | C (0) | T (10) | 1.8 | <10^-100 |
| CHRNA3 | rs660652 | 11 | 0.82 | G (0) | A (9) | 2.8 | <10^-100 |
| OSMR | rs1239344 | 36 | 0.81 | G (0) | A (29) | 1.7 | <10^-100 |
| TIMP3 | rs1427384 | 36 | 0.81 | A (1) | G (28) | 2.9 | <10^-100 |
| GABRP | rs929762 | 15 | 0.8 | T (12) | C (0) | 2.4 | <10^-100 |
| GPNMB | rs5850 | 42 | 0.79 | G (32) | A (1) | 1.8 | 10^-59 |
| C3 | rs2230199 | 28 | 0.79 | C (1) | G (21) | 2.8 | <10^-100 |
| C7 | rs14190 | 37 | 0.78 | A (6) | G (23) | 1.7 | 10^-77 |
| GOT2 | rs6993 | 32 | 0.78 | C (0) | T (25) | 1.5 | <10^-100 |
| RAFTLIN | rs6900 | 13 | 0.77 | C (10) | T (0) | 1.6 | 10^-16 |
| NDUFC2 | rs499799 | 29 | 0.76 | G (21) | C (1) | 4.6 | 10^-54 |
| THBS4 | rs423906 | 20 | 0.75 | G (0) | A (15) | 2.1 | <10^-100 |
| MMP9 | rs13969 | 42 | 0.74 | A (0) | C (31) | 3 | 10^-67 |
| RPL15 | rs1133926 | 27 | 0.74 | A (1) | G (19) | 3.6 | <10^-100 |
| LPL | rs3208305 | 41 | 0.73 | A (9) | T (21) | 2.3 | <10^-100 |
| MMP25 | rs1043298 | 37 | 0.73 | T (1) | A (26) | 1.8 | <10^-100 |
| NDUFAF1 | rs1899 | 33 | 0.73 | G (24) | A (0) | 2 | 10^-50 |
| CLCA1 | rs1321694 | 15 | 0.73 | A (0) | T (11) | 4 | <10^-100 |
| LOC387758 | rs7111860 | 15 | 0.73 | T (11) | G (0) | 3.4 | <10^-100 |
| SOHLH2 | rs2296967 | 11 | 0.73 | G (0) | A (8) | 3 | <10^-100 |
| SPARCL1 | rs9933 | 45 | 0.71 | C (32) | T (0) | 1.6 | 10^-69 |
| PLK2 | rs15915 | 35 | 0.71 | G (1) | A (24) | 1.7 | 10^-67 |
| THBS4 | rs1866389 | 28 | 0.71 | C (19) | G (1) | 1.8 | <10^-100 |
| MMP3 | rs602128 | 24 | 0.71 | T (17) | C (0) | 2.1 | <10^-100 |
| LTF | rs1126478 | 24 | 0.71 | A (3) | G (14) | 4.1 | <10^-100 |
| GPC6 | rs17645969 | 30 | 0.7 | C (2) | A (19) | 1.9 | <10^-100 |
| DCN | rs7441 | 10 | 0.7 | C (0) | T (7) | 1.8 | 10^-39 |
| FMO5 | rs894469 | 10 | 0.7 | A (7) | G (0) | 3.7 | <10^-100 |
| PRICKLE1 | rs1043652 | 35 | 0.69 | G (1) | A (23) | 2.1 | 10^-48 |
| FA2H | rs1046371 | 37 | 0.68 | C (1) | G (24) | 1.4 | 10^-44 |
| MMP9 | rs20544 | 49 | 0.67 | T (32) | C (1) | 3.2 | <10^-100 |
| HAPLN1 | rs2242128 | 18 | 0.67 | G (1) | C (11) | 2.8 | <10^-100 |
| SMPD2 | rs1476387 | 6 | 0.67 | G (3) | T (1) | 6.7 | 10^-95 |
| ATP5C1 | rs4655 | 41 | 0.66 | T (26) | C (1) | 2.1 | <10^-100 |
| RHOBTB3 | rs12351 | 37 | 0.65 | T (24) | G (0) | 3.3 | <10^-100 |
| PHYH | rs11133 | 31 | 0.65 | G (9) | A (11) | 1.7 | 10^-70 |
| IGF1R | rs2229765 | 23 | 0.65 | G (8) | A (7) | 1.4 | 10^-53 |
| POSTN | rs6750 | 20 | 0.65 | G (7) | C (6) | 2.8 | <10^-100 |
| SP2 | rs2229358 | 47 | 0.64 | G (1) | A (29) | 1.6 | <10^-100 |
| MATN1 | rs20566 | 36 | 0.64 | A (22) | G (1) | 1.6 | 10^-51 |
| RARRES1 | rs2307064 | 28 | 0.64 | C (9) | T (9) | 1.5 | <10^-100 |
| RPL28 | rs7255657 | 11 | 0.64 | A (6) | G (1) | 1.9 | 10^-21 |
| PECI | rs3177253 | 30 | 0.63 | G (18) | A (1) | 2.4 | 10^-59 |
| LAMB1 | rs7561 | 42 | 0.62 | C (25) | A (1) | 2.7 | 10^-59 |
| GABRA4 | rs17599102 | 39 | 0.62 | C (1) | T (23) | 2.2 | <10^-100 |
| ATP5F1 | rs1264899 | 38 | 0.61 | G (22) | A (1) | 1.4 | 10^-32 |
| MTR | rs2853522 | 33 | 0.61 | C (14) | A (6) | 1.5 | 10^-44 |
| NOV | rs14324 | 28 | 0.61 | C (1) | T (16) | 1.7 | 10^-58 |
| CLIC6 | rs2834601 | 18 | 0.61 | C (2) | T (9) | 2.2 | 10^-60 |
| KIAA0644 | rs740252 | 40 | 0.6 | G (1) | C (23) | 2 | <10^-100 |
| EGF | rs3733625 | 15 | 0.6 | A (1) | G (8) | 5.1 | <10^-100 |
| MFGE8 | rs8530 | 27 | 0.59 | G (3) | A (13) | 1.6 | <10^-100 |
| FLRT2 | rs17646457 | 22 | 0.59 | G (1) | A (12) | 1.9 | 10^-55 |
| LIX1 | rs316234 | 37 | 0.57 | C (1) | A (20) | 1.9 | <10^-100 |
| IGF1R | rs3743262 | 14 | 0.57 | C (2) | T (6) | 3.4 | 10^-54 |
| GLRB | rs1129304 | 36 | 0.56 | T (19) | A (1) | 1.5 | 10^-77 |
| FN1 | rs2289202 | 20 | 0.55 | G (0) | A (11) | 2 | 10^-42 |
| MAP4 | rs1061003 | 41 | 0.54 | C (22) | G (0) | 1.5 | 10^-32 |
| AATF | rs1045056 | 41 | 0.54 | T (0) | C (22) | 2.8 | 10^-30 |
| ADAMTS5 | rs457947 | 37 | 0.54 | C (19) | G (1) | 1.6 | <10^-100 |
| CFB | rs641153 | 13 | 0.54 | C (3) | T (4) | 1.4 | 10^-38 |
| TGFB2 | rs900 | 13 | 0.54 | A (7) | T (0) | 2.1 | 10^-15 |
| MMP7 | rs10502001 | 32 | 0.53 | C (13) | T (4) | 1.6 | 10^-25 |
| ADCY1 | rs2280495 | 19 | 0.53 | C (1) | T (9) | 1.8 | 10^-37 |
| SLC16A7 | rs3763979 | 17 | 0.53 | G (3) | A (6) | 2.5 | <10^-100 |
| HIBADH | rs1052741 | 29 | 0.52 | C (13) | T (2) | 2.7 | 10^-45 |
| FBLN2 | rs1061375 | 40 | 0.5 | G (1) | A (19) | 2 | 10^-35 |
| FLRT2 | rs10309 | 38 | 0.5 | G (2) | A (17) | 2.1 | <10^-100 |
| C18orf1 | rs3744811 | 32 | 0.5 | C (1) | T (15) | 1.9 | 10^-38 |
| SPARCL1 | rs1049539 | 32 | 0.5 | A (1) | G (15) | 1.7 | 10^-22 |
| PTPRO | rs1050646 | 18 | 0.5 | T (0) | C (9) | 1.4 | 10^-25 |
| COL6A3 | rs4663722 | 18 | 0.5 | C (2) | G (7) | 1.5 | 10^-24 |

Figure 3.2 Distribution of allele-specific expression. The white bars show the distribution of the allelic expression ratio for all heterozygotes that express the transcript of the 309 SNPs tested. The red bars show the distribution of the allelic expression ratio for heterozygotes that show allele-specific expression.

Figure 3.3 Distribution of mean allelic fold change in allele-specifically expressed genes. The level of overexpression of one allele varied widely among genes, from 1.4-fold to apparent monoallelic (>10-fold) expression. The median fold change was 2.1.

Figure 3.4 Allele-specific expression analysis. The red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. (A) Negative control: no allele-specific expression was observed at SNP locus rs11553763 in the gene TSC1 in the ten heterozygotes tested. (B) Allele-specific expression was observed at SNP locus rs2245803 in the gene MMP20 in 11 of 12 heterozygotes tested. The A allele was expressed higher than the C allele in all the individuals displaying allele-specific expression.

Figure 3.5 Allele-specific expression eQTL characteristics. In eQTLs discovered by allele-specific expression analysis, the more highly expressed allele was most often the same across heterozygotes. The red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. (A) In 12 of 13 heterozygotes at rs2296292 in LAMC1, the A allele is expressed higher than the C allele and in 1 heterozygote, the C allele is expressed higher than the A allele. In 91% of discovered eQTLs, greater than 75% of heterozygotes with differential expression have the same allele higher, which indicates the functional SNP is likely linked to the SNP that was interrogated in the transcript. (B) In 12 of 32 heterozygotes at rs1055359 in PEG3, the A allele is expressed higher than the G allele and in 20 of 32 heterozygotes, the G allele is expressed higher than the A allele. A similar pattern was observed in 9% of discovered eQTLs. PEG3 is a known imprinted gene.

Figure 3.6 Comparison of eQTL methods at one locus. (A) Allele-specific expression method: the red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. Allele-specific expression was observed at SNP locus rs8643 in the gene TXNDC5 in 14 of 15 heterozygotes tested. The G allele was expressed higher than the A allele in all the individuals displaying allele-specific expression. (B) Total expression method: boxplot of TXNDC5 total expression according to genotype at the 3' UTR SNP rs8643 (p = 1.2 x 10^-4). The boxes define the interquartile range and the thick line is the median. Open dots are possible outliers. GG homozygotes at rs8643 have higher expression of TXNDC5 than heterozygotes.

Table 3.2 Common allele-specific expression across studies.

| Gene | SNP | Study | Same?* |
|---|---|---|---|
| C3 | rs17030 | Lo, Pant | gene |
| CLIC6 | rs2834601 | Serre | snp |
| COX7A2L | rs1997 | Pant | snp |
| FBLN2 | rs9843344 | Milani | gene |
| LAMB1 | rs7561 | Serre | snp |
| LAMC1 | rs20563 | Milani | gene |
| LTF | rs1126478 | Pant | snp |
| MATN2 | rs2615 | Lo | gene |
| MFGE8 | rs1878326 | Milani | gene |
| MMP7 | rs10502001 | Serre | snp |
| MMP8 | rs1940475 | Pant | gene |
| MMP9 | rs13925 | Pant | gene |
| MTR | rs2229276 | Pant | gene |
| RGS6 | rs3291 | Lo | snp |
| SLC16A7 | rs3763980 | Pant | gene |
| SP2 | rs2229358 | Milani | snp |

*Those labeled "snp" showed ASE in our dataset and the respective study at the same SNP. Those labeled "gene" showed ASE at the same gene in our dataset and at the shown SNP in the respective study.

Studies:

Lo, H.S. et al. Allelic variation in gene expression is common in the human genome. Genome Res 13, 1855-62 (2003).

Milani, L. et al. Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation. Genome Res 19, 1-11 (2009).

Pant, P.V. et al. Analysis of allelic differential expression in human white blood cells. Genome Res 16, 331-9 (2006).

Serre, D. et al. Differential allelic expression in the human genome: a robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genet 4, e1000006 (2008).

Chapter 4: Genetic Association with Kidney Aging

Portions of this chapter were previously published in

PLoS Genetics (2009), 5(10): e1000685 with the following authors:

Heather E. Wheeler (1), E. Jeffrey Metter (2,3), Toshiko Tanaka (2,3), Devin Absher (4), John Higgins (5), Jacob M. Zahn (6), Julie Wilhelmy (6), Ronald W. Davis (6), Andrew Singleton (7), Richard M. Myers (4), Luigi Ferrucci (2,3), Stuart K. Kim (1,8)

(1) Department of Genetics, Stanford University Medical Center, Stanford, CA, USA

(2) Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging, Baltimore, MD, USA

(3) Medstar Research Institute, Baltimore, MD, USA

(4) HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA

(5) Department of Pathology, Stanford University Medical Center, Stanford, CA, USA

(6) Stanford Genome Technology Center, Palo Alto, CA, USA

(7) Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA

(8) Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA

Background

Our sequential genomic convergence approach identified 101 genes that show

age-related changes in expression in the kidney and that also contain SNPs associated

with expression (eSNPs), indicating the presence of functional polymorphisms. Two

methods were used to find these expression quantitative trait loci (eQTLs), the total

expression method and the allele-specific expression method (Chapters 2-3). We used

these eQTLs as candidates in a gene association study of normal kidney aging. We

genotyped a total of 2038 SNPs within these 101 genes in two different cohorts

selected to study normal aging. A list of the 2038 SNPs genotyped can be found at

http://www.plosgenetics.org/doi/pgen.1000685 (Table S4).

In the two cohorts of aging, the function of the kidney was measured by

glomerular filtration rate (GFR) using 24-hour creatinine clearance. The first cohort is

the Baltimore Longitudinal Study of Aging (BLSA), which is a long-running study of

human aging begun in 1958 (Lindeman et al., 1984). This study has enlisted over

3000 healthy volunteers from the Baltimore area for clinical evaluations of many age-

related traits and diseases (Ferrucci, 2008). GFR was measured at multiple ages for

each individual, with an average of 3-4 measurements per individual taken at different

times spanning decades. Thus, this study shows not only the average level of kidney

function with respect to age, but also shows the age-related downward trend in kidney

function for each individual. Multiple GFR measurements and genotype data were

available for 1066 participants.

The second cohort is the InCHIANTI study, which is a population-based

epidemiological study aimed at measuring factors important for aging in the older

population living in the Chianti region of Tuscany, Italy (Ferrucci et al., 2000). About

90% of the individuals age 65 and older from two towns participated in this study,

making it an exceptionally useful source to study genetic determinants of normal

aging. GFR measurements were performed at one age in 1130 individuals.

Characteristics of both cohorts are shown in Table 4.1. The 2038 genotyped SNPs in

the 101 eQTLs were tested for association with GFR, our chosen phenotype of kidney

aging.

Results

We used regression models that included age as a covariate to test the SNP

genotypes in each population for association with GFR (see Methods). In order for an

allelic association with GFR to be considered significant, we first required evidence of

association in both populations (p<0.05 in each population). A total of 13 genes

contained SNPs that met these criteria (Table 4.2). Next, we combined these p-values

using Fisher’s meta-analysis, a method for combining p-values from independent tests

with the same overall hypothesis (Fisher, 1948). To correct for multiple hypothesis

testing, we performed 1000 permutations of each model by swapping identification

labels and keeping the genotypes together to preserve linkage disequilibrium (see

Methods). Two linked SNPs (rs1711437 and rs1784418) in matrix metalloproteinase

20 (MMP20) remained significant after permutation testing (uncorrected p < 5 x 10^-5,

corrected p = 0.01).

We considered whether associations found in the BLSA cohort could have

been due to population structure. Concern for population structure was minimal in the

InCHIANTI cohort because it is a homogeneous Italian population. Most of the

BLSA cohort is made up of Caucasian individuals (84%). Our mixed-effect regression

model included a covariate for self-reported race, which should control for differences

due to population structure. In addition, we found that rs1711437 in MMP20 showed

an association with kidney aging using only data from self-reported Caucasians in the

BLSA cohort (uncorrected p = 0.0010). These results indicate that the MMP20 SNPs

associate with kidney aging per se, and are not artifacts arising from genetic

differences between races.

A SNP in the insulin-like growth factor 1 receptor gene (IGF1R) was strongly

associated with GFR when taking age into account in the meta-analysis (rs11630259,

p = 7.8 x 10^-5, Table 4.2). Decreased activity of this gene has been associated with

longer lifespan in model organisms and humans (Holzenberger et al., 2003; Kenyon et

al., 1993; Suh et al., 2008). However, SNPs in IGF1R did not remain significant

following permutation testing. Therefore, further studies are required to establish a

connection between this SNP and kidney aging.

In both populations, one or two copies of the A allele at rs1711437 in MMP20

associated with a higher GFR (Figure 4.1). For an individual who carries the A allele,

his or her creatinine clearance is approximately that of someone 4-5 years younger

who does not carry the A allele. In the BLSA population, the genotype of rs1711437

explains 2.1% of the variation in creatinine clearance and in the InCHIANTI

population, the genotype explains 0.9% of the variation. Similar results were found

for the second SNP rs1784418, which is in linkage disequilibrium with rs1711437.

Both rs1711437 and rs1784418 are associated with variation in kidney aging,

but the functional SNP is not known. The eSNP rs2245803 identified by allele-

specific expression analysis is not linked to rs1711437 and rs1784418 (Figure 4.2).

Thus, some other SNP in this linkage disequilibrium block, such as a coding SNP or a

different eSNP, may cause differences in activity of MMP20 and be responsible for

association with the kidney aging phenotype. Interestingly, two nonsynonymous

coding SNPs, rs1784424 (Asn281Thr) and rs1784423 (Ala275Val) are contained

within this linkage disequilibrium block (Figure 4.2). These amino acid differences

might affect MMP20 function and these coding changes may be causal for differences

in kidney aging among individuals.

Discussion

As the final step in our genomic convergence approach, we tested 2038 SNPs

in 101 eQTLs for association with kidney aging. Two SNPs in MMP20 significantly

associated with age-related decline in GFR of the kidney. Matrix metalloproteinases

degrade extracellular matrix proteins including laminin, elastin, proteoglycans,

fibronectin, and collagens (Jormsjo et al., 2001). Most previous studies of MMP20

describe its role in tooth development (Bartlett et al., 2006). A role for MMP20 in

renal function has not been previously described. Changes in the extracellular matrix

play a key role in aging of the kidney. Interstitial fibrosis occurs during aging because

of an increase in matrix (Abrass et al., 1995).

The insulin-like growth factor 1 receptor (IGF1R) was the second-highest

scoring gene in our kidney aging association study (Table 4.2). Although the SNP in

this gene did not reach statistical significance in this study, this result is interesting

because this gene is part of the insulin-like signaling pathway that has been shown to

be involved in aging in worms, flies and mice (Guarente and Kenyon, 2000). In

humans, rare variants in the IGF1R gene in centenarians are associated with reduced

IGF1R levels and defective IGF signaling (Suh et al., 2008).

In a genome-wide association study, SNPs in three gene regions (UMOD,

SHROOM3, GATM-SPATA5L1) were shown to associate with GFR (Kottgen et al.,

2009). None of these genes were age-regulated in the kidney and thus they were not

tested for expression-associated SNPs in our study. Also, that study did not have

longitudinal data like the BLSA. It was published just as our project was

finishing.

Our genomic convergence approach began with a genome-wide transcriptional

profile of kidney aging. We narrowed down which age-regulated genes to test for

association with kidney aging by performing eQTL mapping. Testing SNPs in 101

age-regulated eQTLs for association with GFR in two populations chosen to study

normal aging led to the discovery of a SNP in MMP20 that associates with GFR. This

finding needs to be replicated in additional populations, but may be the first gene

association found for normal kidney aging. Genomic convergence, combining

expression and association analyses, can be used to detect genetic associations with

any phenotype of interest.

Methods

BLSA Samples

The Baltimore Longitudinal Study of Aging (BLSA) is an intramural research

program within the National Institute on Aging (Lindeman et al., 1984). Healthy

volunteers aged 18 and older were enrolled in the study starting in 1958. BLSA

participants are predominantly Caucasian, community-residing volunteers who tend to

be well-educated, with above-average income and access to medical care. These

subjects visit the Gerontology Research Center at regular intervals for two days of

medical, physiological, and psychological testing. Each participant has a health

evaluation by a health provider (physician, nurse practitioner, or physician assistant).

Currently, the study population has 1450 active participants, aged 18-97 years

(http://www.grc.nia.nih.gov/branches/blsa/blsa.htm). The level of kidney function in

the participants has been measured longitudinally in each individual between 1 and 16

times over a 10 to 50 year time period. The kidney aging phenotype of glomerular

filtration rate (GFR) was measured by calculating creatinine clearance. Specifically,

serum creatinine and 24-hour urinary creatinine levels were obtained from participants

using standard clinical procedures (Metter et al., 2004), and were used to calculate

creatinine clearance as follows:

$$C_{Cr} = \frac{U_{Cr} \times V_U}{P_{Cr} \times 1440} \qquad (4.1)$$

where CCr is creatinine clearance in ml/min, UCr is urinary creatinine concentration,

VU is the volume of urine collected over 24 hours, PCr is the plasma concentration of

creatinine, and 1440 is the number of minutes in 24 hours. We were granted access to

genotype and GFR data for 1066 individuals. The genotype data comprised the 2038

SNPs genotyped on the Illumina HumanHap550 Genotyping BeadChip that are within

the 101 genes that contain SNP associations with expression and have minor allele

frequencies > 0.01 (Table S4). The GFR data included 3672 creatinine clearance

measurements.
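
As a small worked illustration of equation 4.1, the calculation can be written as a short Python function. This is a sketch only: the function name is hypothetical, and it assumes that urinary and plasma creatinine are given in the same concentration units (mg/dl here) and that urine volume is in ml.

```python
MINUTES_PER_DAY = 1440


def creatinine_clearance(urine_cr, urine_volume_ml, plasma_cr):
    """Creatinine clearance in ml/min from a 24-hour urine collection.

    urine_cr and plasma_cr must share the same concentration units
    (assumed mg/dl); urine_volume_ml is the 24-hour urine volume in ml.
    """
    if plasma_cr <= 0:
        raise ValueError("plasma creatinine must be positive")
    return (urine_cr * urine_volume_ml) / (plasma_cr * MINUTES_PER_DAY)
```

For example, with illustrative values of 100 mg/dl urinary creatinine, 1500 ml of urine, and 1.0 mg/dl plasma creatinine, `creatinine_clearance(100.0, 1500.0, 1.0)` gives about 104 ml/min.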

InCHIANTI Samples

The participants in the InCHIANTI study consist of residents of two small

towns in Tuscany, Italy (Ferrucci et al., 2000). The study includes 1320 participants

(age range 20-102 yrs), who were randomly selected from the population registry of

Greve in Chianti (population 11,709) and Bagno a Ripoli (population 4,704) starting

in 1998 (Ferrucci et al., 2000). Over 90% of the population who were over the age of

65 participated in this study, and thus the cohort is a good representation of normal

aging (http://www.inchiantistudy.net).

GFR was calculated using creatinine clearance from 24-hour urine collection

as in the BLSA study. In this study, the measurement for creatinine clearance was

performed at one age only. The genotype data generated by HumanHap550

Genotyping BeadChip consisted of the same 2038 SNPs in 101 candidate aging genes

obtained from the BLSA (http://www.plosgenetics.org/doi/pgen.1000685, Table S4).

The sample size was 1130 individuals.

Glomerular Filtration Rate Regression Models

Due to the longitudinal nature of the BLSA data, we used a mixed-effect

regression analysis to search for SNP associations with creatinine clearance. Because

the creatinine clearance measurements within one subject over time are correlated, the

regression coefficients are allowed to vary between individuals. First, we developed

the following model using a likelihood ratio approach to explain how creatinine

clearance changes with time:

$$Y_{ia} = \beta_{0i} + \beta_{1i} a_i + \beta_{2i} a_i^2 + \beta_{3i} d_{ia} + \beta_{4i} d_i^2 + \beta_{5i} s_i + \beta_{6i} r_i + \varepsilon_{ia} \qquad (4.2)$$

In equation 4.2, Yia is the creatinine clearance of subject i at age a, ai is the age of

subject i, dia is the date in decimal years of the visit of subject i at age a, si is the sex of

subject i, ri is the self-reported race of subject i, and εia is a random error term. Most

of the data points (84%) came from self-reported Caucasian individuals. These

individuals were coded 0 for the ri term and everyone else was coded 1. The

coefficients βki of each subject i for k = 0-6 were estimated by maximum likelihood

from the data using the “lmer” function from the “lme4” package of R version 2.8.0.

Next, to determine if the genotype of any of our candidate aging genes can account for

some of the variance in creatinine clearance, we added two terms to the model:

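$$Y_{ia} = \beta_{0ij} + \beta_{1ij} a_i + \beta_{2ij} a_i^2 + \beta_{3ij} d_{ia} + \beta_{4ij} d_i^2 + \beta_{5ij} s_i + \beta_{6ij} r_i + \beta_{7ij} g_{ij} + \beta_{8ij} (g_{ij} \times a_i) + \varepsilon_{ija} \qquad (4.3)$$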

In equation 4.3, gij is the genotype of SNP j in subject i. We obtained estimates for

three different inheritance models: additive, recessive and dominant. In the additive

model g is 0, 1, or 2 for homozygous dominant, heterozygous, and homozygous

recessive genotypes, respectively. In the recessive model, g is 0 for the homozygous

dominant and heterozygous genotypes and g is 1 for the homozygous recessive

genotype. In the dominant model, g is 0 for the homozygous dominant genotype and g

is 1 for the heterozygous and homozygous recessive genotypes. For each SNP and

each inheritance model, we compared the results from equation 4.3 to the results from

equation 4.2 using a likelihood ratio test to generate a p-value for each SNP. Even

though we included a self-reported race term in our models, we also confirmed the

rs1711437 association with GFR by analyzing only the data points from Caucasian

individuals (p = 0.0010).
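
The genotype coding for the three inheritance models, and the conversion of a likelihood ratio statistic into a p-value, can be sketched as follows. The function names are hypothetical, and the p-value assumes that the two added terms are tested as fixed effects so that the statistic follows a chi-square distribution with 2 degrees of freedom; the original analysis in R may have differed in these details.

```python
import math


def encode_genotype(recessive_allele_count, model):
    """Code a genotype as g under one inheritance model.

    recessive_allele_count: 0 (homozygous dominant), 1 (heterozygous),
    or 2 (homozygous recessive), following the terminology of the text.
    """
    if recessive_allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    if model == "additive":
        return recessive_allele_count
    if model == "recessive":
        return 1 if recessive_allele_count == 2 else 0
    if model == "dominant":
        return 1 if recessive_allele_count >= 1 else 0
    raise ValueError(f"unknown model: {model}")


def likelihood_ratio_p_value(loglik_null, loglik_full):
    """P-value for a likelihood ratio test with two added parameters.

    Assumes a chi-square reference distribution with 2 degrees of
    freedom, whose survival function is exp(-x / 2).
    """
    statistic = 2.0 * (loglik_full - loglik_null)
    return math.exp(-max(statistic, 0.0) / 2.0)
```

Here `loglik_null` and `loglik_full` stand for the maximized log-likelihoods of equations 4.2 and 4.3, respectively.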

For the InCHIANTI data, because the data are not longitudinal, we used a

simple linear regression model to search for SNP associations with creatinine

clearance. We tested the three inheritance models for SNP association with creatinine

clearance at every age (equation 4.4) and for SNP association with the rate of

creatinine clearance decline with age (equation 4.5):

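$$Y_i = \beta_{0j} + \beta_{1j} g_{ij} + \beta_{2j} a_i + \beta_{3j} s_i + \varepsilon_{ij} \qquad (4.4)$$

$$Y_i = \beta_{0j} + \beta_{1j} g_{ij} + \beta_{2j} a_i + \beta_{3j} (g_{ij} \times a_i) + \beta_{4j} s_i + \varepsilon_{ij} \qquad (4.5)$$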

In equations 4.4 and 4.5, Yi is the creatinine clearance of subject i, gij is the genotype

of subject i at SNP j, ai is the age of subject i, si is the sex of subject i, and εij is a

random error term. The coefficients were estimated by least squares from the data. In

equation 4.4, our primary interest was β1j values that significantly differed from zero,

indicating that SNP j associates with creatinine clearance at every age. In equation

4.5, our primary interest was β3j values that significantly differed from zero, indicating

that SNP j associates with the rate of creatinine clearance decline with age.
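
A least-squares fit of equation 4.5 for a single SNP can be sketched in plain Python. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, the normal equations are solved directly for brevity, and the standard errors and p-values of the coefficients are not computed.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(y, g, age, sex):
    """Least-squares estimates for equation 4.5.

    Returns [b0, b1, b2, b3, b4]: intercept, genotype, age,
    genotype x age interaction, and sex.
    """
    rows = [[1.0, gi, ai, gi * ai, si] for gi, ai, si in zip(g, age, sex)]
    k = 5
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Dropping the interaction column from `rows` would give the corresponding fit for equation 4.4.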

Testing for Evidence of SNP Association with GFR in Both Datasets

In order to be confident of a SNP association with GFR, we required the SNP

to show evidence of association in both the BLSA and InCHIANTI populations. That

is, we combined the p-values from the BLSA and InCHIANTI data using Fisher’s

method (equation 2.2) only if the individual p-values for a particular SNP and

inheritance model in each population were both less than 0.05. We used the p-value

from the likelihood ratio test for the BLSA data and the p-value from the β1j estimate

from equation 4.4 or the β3j estimate from equation 4.5 for the InCHIANTI data to

calculate the meta p-value.
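
The two-stage rule described above (require p < 0.05 in both cohorts, then combine with Fisher's method) can be sketched in Python. The function names are hypothetical. The sketch relies on the standard result that Fisher's statistic, -2 times the sum of ln(p), follows a chi-square distribution with 2k degrees of freedom for k independent tests, which has a closed-form tail probability.

```python
import math


def fisher_meta_p(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for even
    degrees of freedom (2k, where k is the number of p-values).
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def meta_p_if_replicated(p_blsa, p_inchianti, alpha=0.05):
    """Meta p-value only if both cohorts show p < alpha, otherwise None."""
    if p_blsa < alpha and p_inchianti < alpha:
        return fisher_meta_p([p_blsa, p_inchianti])
    return None
```

For two p-values the result reduces to p1 * p2 * (1 - ln(p1 * p2)).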

Permutation Analysis

To correct for multiple hypothesis testing, we performed permutations to test

how often our results could appear by chance. We resampled the data for each

population and each model 1000 times, keeping the genotypes together, but swapping

the sample labels. The creatinine clearance, age, date and sex information remained

together, but the 2038 SNP genotypes connected to each individual were changed in

each permutation. Therefore, only the phenotype-genotype relationship was altered by

permutation; the linkage disequilibrium patterns between SNPs remained the same.

For each permutation, we calculated Fisher’s meta p-values only when both individual

p-values from each population were less than 0.05, as we did in the observed data.

Then, for each model, we determined how many of the permutations met or exceeded

the number of SNPs we found in the observed data at meta p-value thresholds. The

permuted p-value was the number of permutations that met these criteria divided by

1000. Permuted p-values less than 0.05 were considered significant.
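The permutation scheme can be outlined as follows. This is a simplified sketch, not the code used in this work: it treats a single population, and `count_hits` is a hypothetical callback standing in for the full pipeline that fits the models, applies the both-populations filter, and counts SNPs below a meta p-value threshold.

```python
import random


def permute_genotype_rows(genotype_rows, rng):
    """Shuffle which subject each genotype row belongs to.

    Each row (all SNP genotypes of one individual) stays intact, so linkage
    disequilibrium between SNPs is preserved; only the pairing between
    genotype and phenotype changes.
    """
    shuffled = list(genotype_rows)
    rng.shuffle(shuffled)
    return shuffled


def permuted_p_value(observed_hits, permuted_hits):
    """Fraction of permutations with at least as many hits as the observed data."""
    as_extreme = sum(1 for hits in permuted_hits if hits >= observed_hits)
    return as_extreme / len(permuted_hits)


def run_permutations(phenotype_rows, genotype_rows, count_hits, n_perm=1000, seed=0):
    """Compare the observed hit count with hit counts from permuted data."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    observed = count_hits(phenotype_rows, genotype_rows)
    permuted = [
        count_hits(phenotype_rows, permute_genotype_rows(genotype_rows, rng))
        for _ in range(n_perm)
    ]
    return permuted_p_value(observed, permuted)
```

Phenotype rows (creatinine clearance, age, sex) are never reordered, which mirrors keeping that information together while the genotypes attached to each individual change.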

Table 4.1 Characteristics of kidney aging study samples.

Characteristic                      BLSA              InCHIANTI
                                    Mean (SD) or n    Mean (SD) or n
Age                                 57.6 (17.1)       68.4 (15.5)
Date of Birth                       1932 (13.5)       1931 (15.5)
No. Subjects                        1066              1130
No. GFR measurements per subject    3.4 (2.6)         1 (0)
No. Male datapoints                 2313              515
No. Female datapoints               1359              615
24-hour Creatinine Clearance        112.9 (42.4)      82.4 (30.2)

Table 4.2 Top SNPs that show association with kidney aging in two populations.

Gene      SNP          Model      BLSA P   InCHIANTI P   Fisher’s Meta P*   Permuted P
MMP20     rs1711437    DOM        0.0017   0.0015        3.6 x 10^-5        1.0 x 10^-2
IGF1R     rs11630259   REC        0.0001   0.0443        7.8 x 10^-5        NS
RGS6      rs8007684    ADD x AGE  0.0165   0.0009        1.9 x 10^-4        NS
FAM83F    rs3021274    DOM x AGE  0.0063   0.0234        1.4 x 10^-3        NS
MMP25     rs1004792    REC x AGE  0.0038   0.0427        1.6 x 10^-3        NS
ADCY1     rs11766192   REC x AGE  0.0352   0.0054        1.8 x 10^-3        NS
ADAMTS5   rs10482979   REC        0.0169   0.0211        3.2 x 10^-3        NS
GPC5      rs342693     REC x AGE  0.0325   0.0149        4.2 x 10^-3        NS
MTR       rs2275568    ADD        0.0286   0.0319        7.3 x 10^-3        NS
RPL15     rs2360610    DOM        0.0469   0.0226        8.3 x 10^-3        NS
GLRB      rs17035648   DOM x AGE  0.0252   0.0474        9.2 x 10^-3        NS
GPC6      rs4612931    DOM x AGE  0.0496   0.0270        1.0 x 10^-2        NS
SOHLH2    rs9593921    DOM x AGE  0.0380   0.0419        1.2 x 10^-2        NS

*Calculated only if individual p-values from each population were <0.05

Figure 4.1 A SNP in MMP20 associates with a kidney aging phenotype. Loess smoothing lines through a scatter plot of creatinine clearance versus age stratified by genotype at rs1711437 in the BLSA (A) and InCHIANTI (B) populations (corrected p = 0.01).

Figure 4.2 Linkage disequilibrium pattern of MMP20. The two SNPs (green) for which we found significant associations with kidney aging are located in introns of MMP20. They are linked to each other and to two nonsynonymous SNPs (black) located in exon 6 of MMP20. Pairwise r^2 LD values (darker boxes correspond to higher r^2 values) from the HapMap CEU population are displayed. These four SNPs are not linked to the SNP (red) in exon 1 that associated with expression level of the gene. Plot made using Haploview (Barrett et al., 2005).

Chapter 5: Conclusions

Portions of this chapter will be submitted for publication in Philosophical Transactions of the Royal Society B: Biological Sciences (2010)

with the following authors:

Heather E. Wheeler(1) and Stuart K. Kim(1,2)

(1) Department of Genetics, Stanford University Medical Center, Stanford, CA, USA

(2) Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA

Summary and Discussion of Findings

The goal of our approach was to converge on genes that influence human

kidney aging through sequential genomic analyses. Our genomic convergence

procedure began with a genome-wide transcriptional profile of aging in the human

kidney, which gave an unbiased view of gene expression changes that occur with age

(Rodwell et al., 2004). Then, we used total expression analysis and allele-specific

expression analysis to determine which alleles are differentially expressed. We

identified 101 age-regulated eQTLs. SNPs in one of these genes, MMP20, showed a

statistically significant association with normal kidney aging. Although this association was

significant when data from two independent populations were combined, the best way to confirm our

gene association with renal aging is to replicate the findings in additional populations.

The populations used to identify aging SNPs, BLSA and InCHIANTI, stand

out for their usefulness in studying normal kidney aging. Both of these studies were

purposefully designed to study healthy individuals, instead of those harboring diseases

associated with old age. The BLSA study includes longitudinal measurements of traits

associated with normal aging, which added considerable power to the analysis.

Two SNPs in MMP20 significantly associated with age-related decline in GFR

of the kidney. Matrix metalloproteinases are involved in the breakdown of

extracellular matrix in normal physiological processes, such as embryonic

development, reproduction, and tissue remodeling, as well as in disease processes,

such as arthritis and metastasis (Llano et al., 1997; Woessner, 1991). Matrix

metalloproteinases degrade extracellular matrix proteins including laminin, elastin,

proteoglycans, fibronectin, and collagens (Jormsjo et al., 2001). A role for MMP20 in

renal function has not been previously described, although prior studies show that

MMP20 plays an important role in tooth development (Bartlett et al., 2006). The

finding that a matrix metalloproteinase is involved in kidney aging is striking because

changes in the extracellular matrix play a key role in aging of the kidney. The

glomerular basement membrane thickens, and the mesangial matrix increases in

volume with age (McLachlan et al., 1977). Interstitial fibrosis occurs during aging

because of an increase in matrix and fibrillar collagen accumulation in the subintimal

space (Abrass et al., 1995).

MMP20 was included in our candidate aging gene set not because the gene

itself is significantly age-regulated in the kidney, but

because it is a component of the extracellular matrix, one of the pathways that

coordinately increased expression with age in three human tissues including the

kidney (Zahn et al., 2006). Therefore, polymorphisms in MMP20 may not only

associate with aging of the kidney, but may associate with phenotypes of aging in

other tissues as well. Additionally, if MMP20 is a common regulator of aging, certain

alleles may also be enriched in centenarians.

The second-highest scoring gene in our kidney aging association study is the

insulin-like growth factor 1 receptor. Although the SNP in this gene did not reach

statistical significance in this study, this result is interesting because this gene is part

of the insulin-like signaling pathway that has been shown to be involved in aging in

worms, flies and mice (Guarente and Kenyon, 2000). Specifically, reduced signaling

in this pathway results in longer lifespans for these model organisms. In worms, the

orthologous gene is called daf-2 (GeneID 175410), and daf-2 mutants can have

lifespans that are 100% longer than wild-type worms (Kenyon et al., 1993). In

humans, rare variants in the IGF1R gene in centenarians are associated with reduced

IGF1R levels and defective IGF signaling (Suh et al., 2008).

Sequential transcriptional profiling and eQTL mapping could serve as

a general method to increase the statistical power of any human gene association

study. Like candidate gene approaches, our approach to identifying

variants associated with kidney aging increases the statistical power of the

gene association study by restricting the SNPs that are tested to potentially

functional SNPs. An advantage of our sequential approach over a candidate gene

approach is that the entire genome was screened for genes that are age-regulated in the

first step.

Several groups have used DNA microarrays to measure gene expression in

lymphoblastoid cell lines and have found polymorphisms that associate with

expression level (Cheung et al., 2003; Cheung et al., 2005; Deutsch et al., 2005; Dixon

et al., 2007; Monks et al., 2004; Morley et al., 2004; Spielman et al., 2007; Stranger et

al., 2005; Stranger et al., 2007). In a total expression analysis of human brain cortical

tissue, 21% of genes have SNPs that associate with expression levels (Myers et al.,

2007). Other groups have used the allele-specific expression approach to identify

differentially-expressed genes in lymphoblastoid cell lines (Pastinen et al., 2005;

Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002), brain (Bray et al., 2003),

white blood cells (Pant et al., 2006), fetal kidney and fetal liver (Lo et al., 2003).

These studies found that 20-50% of the genes in the genome are differentially

expressed. Sixteen of the genes showing allele-specific expression found by our study

were also found in previous studies (Lo et al., 2003; Milani et al., 2009; Pant et al.,

2006; Serre et al., 2008). Thus, 77 of the 93 allele-specifically expressed genes

identified in this work represent novel findings. Our finding that 41% of tested genes

showed allele-specific expression is similar to the percentage found in previous studies

(Bray et al., 2003; Lo et al., 2003; Pant et al., 2006; Pastinen et al., 2005; Pastinen et

al., 2004; Serre et al., 2008; Yan et al., 2002).

Of the expression-associated SNPs we identified, most were found using

allele-specific expression measurements within heterozygotes. Specifically, 41% of

genes assayed contained eSNPs using the allele-specific expression method, whereas

only 2% of genes assayed contained eSNPs using the total expression method. The

statistical cutoff for finding eSNPs using the allele-specific method was more stringent

than the one used for the total expression method. Thus, our results may

underestimate the improved sensitivity of the allele-specific method over the total

expression method.

Unlike the total expression method, the allele-specific method examines alleles

within the same cellular environment in heterozygous individuals. This maximizes the

sensitivity of the assay because the alleles are expressed in the same environment

and genetic background. Previous work with a smaller set of 64 genes also showed

that allele-specific analysis in heterozygotes was more sensitive than total expression

methods for finding SNPs associated with expression levels in cis (Pastinen et al.,

2005). The results from the allele-specific analysis demonstrate that differential

expression is widespread across the human genome and suggest that differential

expression could be a major factor contributing to differences in phenotype among

individuals.

Future Directions for Human Aging Genomics

Finding new human aging genes, possibly MMP20, contributes to our

understanding of molecular mechanisms underlying the human aging process. Among

young individuals, an unfavorable SNP genotype may indicate risk for rapid decline in

kidney function and this information could be extremely useful to identify patients

who may require early intervention. Among older individuals, a favorable SNP

genotype may indicate that they may still be eligible as kidney donors even though

they are over the current upper age limit. As more aging genes are confirmed, the

alleles belonging to a patient can be combined to better predict the aging trajectory of

the kidney.

Our finding that the allele-specific expression method is more sensitive for

detecting associations than the total expression method has implications for everyone

studying the genetics of gene expression. The Genotype-Tissue Expression (GTEx)

project aims to study and map the relationship between human gene expression and

genetic variation. The project is currently in a pilot phase and will analyze dense

genotyping and expression data collected from multiple human tissues and will

correlate genetic variation and gene expression, thus producing a list of genetic

regions associated with expression of specific transcripts (Hardy and Singleton, 2009).

As the GTEx project moves forward, it will be important to consider allele-specific

expression data to maximize sensitivity for detecting differential allelic expression.

The GTEx project and other publicly available eQTL datasets will allow more

widespread use of the genomic convergence approach. Gene associations found in

genome-wide association studies (GWAS) that are also eQTLs provide a possible

functional mechanism and should be given higher priority in follow-up studies.

Expression studies are especially important because many of the SNP associations

found in GWAS are intergenic or intronic (Hardy and Singleton, 2009; Kottgen

et al., 2009; WTCCC, 2007).

Currently, GWAS are only able to identify common alleles (minor allele

frequencies ≥ 0.05) associated with phenotypes of interest. Next-generation

sequencing technologies from companies like Illumina, Roche and Helicos can

identify rare variants (Pushkarev et al., 2009; Wang et al., 2008; Wheeler et al., 2008).

Although faster and cheaper than traditional capillary sequencers, these

next-generation technologies produce shorter read lengths (35–250 bp, depending on the

platform) than capillary sequencers (650–800 bp) (Mardis, 2008). Therefore, they are

most useful when a reference genome is available and thus work well in human

studies. Deep human transcriptome-wide resequencing (RNA-seq) using next-

generation technologies has started to be used in allele-specific expression studies

(Bell and Beck, 2009; Wang et al., 2009). One study of primary T cells from four

individuals was able to test 1371 transcripts for allele-specific expression (Heap et al.,

2010). A major hurdle of allele-specific expression analysis using RNA-seq data is

read-mapping biases. When sequence reads are mapped back to the genome, there is a

significant bias toward higher mapping rates of the allele in the reference sequence,

compared with the alternative allele in heterozygotes (Degner et al., 2009). After

controlling for these biases by masking known SNPs and relaxing the mismatch

threshold, 7.5% of the 1371 tested transcripts in Heap et al. (2010) were allele-

specifically expressed. Future studies of cells or tissues from a greater number of

individuals will provide more heterozygotes and thus more transcripts to test for

allele-specific expression.
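To make the effect of read-mapping bias concrete, the sketch below tests a single heterozygous site for allelic imbalance with an exact two-sided binomial test on reference and alternative read counts. This is a generic illustration, not the procedure of Heap et al. (2010) or Degner et al. (2009); the function names and the `expected_ref_fraction` parameter, which stands for the reference-allele fraction expected in the absence of true imbalance, are assumptions introduced here.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def allelic_imbalance_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference read count under the null reference fraction.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    observed = binomial_pmf(ref_reads, n, expected_ref_fraction)
    total = 0.0
    for k in range(n + 1):
        prob = binomial_pmf(k, n, expected_ref_fraction)
        if prob <= observed * (1.0 + 1e-9):   # small tolerance for floating-point ties
            total += prob
    return min(1.0, total)
```

If reference reads map more readily, the null fraction lies above 0.5, so a site with 60 reference and 40 alternative reads looks less extreme when `expected_ref_fraction` is raised from 0.5 to 0.55. Ignoring the bias would therefore overstate the evidence for imbalance at such sites.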

All of the alleles found thus far to associate with aging and longevity have

small effect sizes. If rare variants rather than common variants explain most of the

genetic variation in aging among humans, new computational methods must be

developed to find genes and pathways involved in aging. Methods are needed that can

combine multiple rare variants in the same gene or the same pathway into a score that

can be tested for association with different phenotypes of aging and longevity.

Several association test methods for rare variants have been proposed and it remains to

be seen if they will be successful in accounting for the missing genetic variation

among individuals in aging and other complex traits (Guo and Lin, 2009; Li and Leal,

2008; Zhu et al., 2009).
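As a minimal illustration of combining rare variants into a score, the sketch below counts the rare minor alleles each individual carries within one gene. It is a simple burden-style collapse written for this example, not a reimplementation of any of the cited methods; the function name and the 1% frequency cutoff are assumptions.

```python
def burden_scores(genotypes, minor_allele_freqs, max_maf=0.01):
    """Per-individual count of rare minor alleles across the variants of one gene.

    genotypes: one list per individual with minor-allele counts (0, 1 or 2),
               in the same variant order as minor_allele_freqs.
    max_maf:   variants with a minor allele frequency below this cutoff count
               as rare (the 0.01 default is an assumed cutoff).
    """
    rare = [j for j, freq in enumerate(minor_allele_freqs) if freq < max_maf]
    return [sum(person[j] for j in rare) for person in genotypes]
```

The resulting score could then take the place of a single-SNP genotype in an association test against an aging phenotype.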

Another approach is to link genes and pathways to aging through molecular

networks. Examination of the perturbations of expression, protein and metabolite

networks that occur with age could reveal pathways important in the aging process. A

study in mice found that computationally identified targets of the NF-κB transcription

factor show decreased expression correlation with age (Southworth et al., 2009). Blockade of

NF-κB for 2 weeks in the epidermis of chronologically aged mice reverted the tissue

characteristics and global gene expression patterns to those of young mice,

emphasizing the importance of the pathway in aging (Adler et al., 2007). Also, by

examining tissue-to-tissue coexpression networks in mice, new obesity pathways were

identified (Dobrin et al., 2009). Complex machine learning techniques are likely

necessary to understand complex human phenotypes including aging. Aging and

common human diseases originate from a complex interplay between variations in

DNA (both rare and common) and a broad range of factors such as diet, sex and

exposure to environmental toxins. A benefit of examining molecular networks is that

both genetic and environmental perturbations affect the states of networks that in turn

affect the phenotype of interest (Schadt, 2009). Thus, molecular networks represent a

useful intermediate between the genetic/environmental input and the complex

phenotype to find associated molecular pathways. Molecular network data can also be

used to understand the biological context in which a given gene found in a traditional

association study operates.

Greater understanding of the pathways involved in aging of different human

tissues could lead to drug targets. For some aging genes, one allele may adversely

affect tissue function in old age because it increases the activity of the gene above a

healthy level. In these cases, one could develop a drug to target the gene or pathway in

individuals carrying the overactive allele, and thereby preserve function in old age.

The field of pharmacogenomics, which determines how genotype can predict an

individual’s response to a drug, will be important to treating age-related decline.

Already, the necessary dose of the drug warfarin is known to vary widely among

individuals according to their genotypes (Takeuchi et al., 2009). Warfarin is an

anticoagulant used to prevent age-related conditions like stroke, thrombosis,

pulmonary embolism and coronary malfunction (Daly and King, 2003). Dosing

strategies according to an individual’s genotype at the relevant loci are beginning to be

implemented clinically (Klein et al., 2009). As more aging genes are confirmed, the

alleles belonging to a patient can be combined to better predict the aging trajectory of

the relevant organ or tissue. Medical practitioners will know which organs or tissues

are likely to age the fastest and can act to prevent age-related decline.

Pharmacogenomics data will help medical practitioners determine the best mode of

prevention or treatment for an individual. The goal of human aging research is not

necessarily to extend lifespan, but instead is to extend the healthy years of life.

References

Abrass, C.K., Adcox, M.J., and Raugi, G.J. (1995). Aging-associated changes in renal extracellular matrix. Am J Pathol 146, 742-752.

Adler, A.S., Sinha, S., Kawahara, T.L., Zhang, J.Y., Segal, E., and Chang, H.Y. (2007). Motif module map reveals enforcement of aging by continual NF-kappaB activity. Genes Dev 21, 3244-3257.

Adler, S., Lindeman, R.D., Yiengst, M.J., Beard, E., and Shock, N.W. (1968). Effect of acute acid loading on urinary acid excretion by the aging human kidney. J Lab Clin Med 72, 278-289.

Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P. (2005). A haplotype map of the human genome. Nature 437, 1299-1320.

Arking, D.E., Atzmon, G., Arking, A., Barzilai, N., and Dietz, H.C. (2005). Association between a functional variant of the KLOTHO gene and high-density lipoprotein cholesterol, blood pressure, stroke, and longevity. Circ Res 96, 412-418.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.

Atzmon, G., Rincon, M., Schechter, C.B., Shuldiner, A.R., Lipton, R.B., Bergman, A., and Barzilai, N. (2006). Lipoprotein genotype and conserved pathway for exceptional longevity in humans. PLoS Biol 4, e113.

Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-265.

Bartlett, J.D., Skobe, Z., Lee, D.H., Wright, J.T., Li, Y., Kulkarni, A.B., and Gibson, C.W. (2006). A developmental comparison of matrix metalloproteinase-20 and amelogenin null mouse enamel. Eur J Oral Sci 114 Suppl 1, 18-23; discussion 39-41, 379.

Bell, C.G., and Beck, S. (2009). Advances in the identification and analysis of allele-specific expression. Genome Med 1, 56.

Bellizzi, D., Rose, G., Cavalcante, P., Covello, G., Dato, S., De Rango, F., Greco, V., Maggiolini, M., Feraco, E., Mari, V., et al. (2005). A novel VNTR enhancer within the SIRT3 gene, a human homologue of SIR2, is associated with survival at oldest ages. Genomics 85, 258-263.

Bluher, M., Kahn, B.B., and Kahn, C.R. (2003). Extended longevity in mice lacking the insulin receptor in adipose tissue. Science 299, 572-574.

Borkan, G.A., and Norris, A.H. (1980). Assessment of biological age using a profile of physical parameters. J Gerontol 35, 177-184.

Bray, N.J., Buckland, P.R., Owen, M.J., and O'Donovan, M.C. (2003). Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet 113, 149-153.

Cheung, V.G., Conlin, L.K., Weber, T.M., Arcaro, M., Jen, K.Y., Morley, M., and Spielman, R.S. (2003). Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33, 422-425.

Cheung, V.G., Spielman, R.S., Ewens, K.G., Weber, T.M., Morley, M., and Burdick, J.T. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365-1369.

Clancy, D.J., Gems, D., Harshman, L.G., Oldham, S., Stocker, H., Hafen, E., Leevers, S.J., and Partridge, L. (2001). Extension of life-span by loss of CHICO, a Drosophila insulin receptor substrate protein. Science 292, 104-106.

Corder, E.H., Saunders, A.M., Strittmatter, W.J., Schmechel, D.E., Gaskell, P.C., Small, G.W., Roses, A.D., Haines, J.L., and Pericak-Vance, M.A. (1993). Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261, 921-923.

Daly, A.K., and King, B.P. (2003). Pharmacogenetics of oral anticoagulants. Pharmacogenetics 13, 247-252.

Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y., and Pritchard, J.K. (2009). Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207-3212.

Deutsch, S., Lyle, R., Dermitzakis, E.T., Attar, H., Subrahmanyan, L., Gehrig, C., Parand, L., Gagnebin, M., Rougemont, J., Jongeneel, C.V., et al. (2005). Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum Mol Genet 14, 3741-3749.

Dixon, A.L., Liang, L., Moffatt, M.F., Chen, W., Heath, S., Wong, K.C., Taylor, J., Burnett, E., Gut, I., Farrall, M., et al. (2007). A genome-wide association study of global gene expression. Nat Genet 39, 1202-1207.

Dobrin, R., Zhu, J., Molony, C., Argman, C., Parrish, M.L., Carlson, S., Allan, M.F., Pomp, D., and Schadt, E.E. (2009). Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol 10, R55.

Epstein, M., and Hollenberg, N.K. (1976). Age as a determinant of renal sodium conservation in normal man. J Lab Clin Med 87, 411-417.

Faubert, P.F., and Parush, J.G. (1998). Disorders of potassium metabolism. In Renal Disease in the Elderly (New York City, Marcel Dekker, Inc.), pp. 39-60.

Ferrucci, L. (2008). The Baltimore Longitudinal Study of Aging (BLSA): a 50-year-long journey and plans for the future. J Gerontol A Biol Sci Med Sci 63, 1416-1419.

Ferrucci, L., Bandinelli, S., Benvenuti, E., Di Iorio, A., Macchi, C., Harris, T.B., and Guralnik, J.M. (2000). Subsystems contributing to the decline in ability to walk: bridging the gap between epidemiology and geriatric practice in the InCHIANTI study. J Am Geriatr Soc 48, 1618-1625.

Fisher, R.A. (1948). Combining independent tests of significance. American Statistician 2, 1.

Flachsbart, F., Caliebe, A., Kleindorp, R., Blanche, H., von Eller-Eberstein, H., Nikolaus, S., Schreiber, S., and Nebel, A. (2009). Association of FOXO3A variation with human longevity confirmed in German centenarians. Proc Natl Acad Sci U S A 106, 2700-2705.

Fliser, D., Zeier, M., Nowack, R., and Ritz, E. (1993). Renal functional reserve in healthy elderly subjects. J Am Soc Nephrol 3, 1371-1377.

Fox, C.S., Yang, Q., Cupples, L.A., Guo, C.Y., Larson, M.G., Leip, E.P., Wilson, P.W., and Levy, D. (2004). Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol 15, 2457-2461.

Geesaman, B.J., Benson, E., Brewster, S.J., Kunkel, L.M., Blanche, H., Thomas, G., Perls, T.T., Daly, M.J., and Puca, A.A. (2003). Haplotype-based identification of a microsomal transfer protein marker associated with the human lifespan. Proc Natl Acad Sci U S A 100, 14115-14120.

Gourtsoyiannis, N., Prassopoulos, P., Cavouras, D., and Pantelidis, N. (1990). The thickness of the renal parenchyma decreases with age: a CT study of 360 patients. AJR Am J Roentgenol 155, 541-544.

Goyal, V.K. (1982). Changes with age in the human kidney. Exp Gerontol 17, 321-331.

Guarente, L., and Kenyon, C. (2000). Genetic pathways that regulate ageing in model organisms. Nature 408, 255-262.

Guo, W., and Lin, S. (2009). Generalized linear modeling with regularization for detecting common disease rare haplotype association. Genet Epidemiol 33, 308-316.

Hamilton, W.D. (1966). The moulding of senescence by natural selection. J Theor Biol 12, 12-45.

Hardy, J., and Singleton, A. (2009). Genomewide association studies and human disease. N Engl J Med 360, 1759-1768.

Harman, D. (1956). Aging: a theory based on free radical and radiation chemistry. J Gerontol 11, 298-300.

Hauser, M.A., Li, Y.J., Takeuchi, S., Walters, R., Noureddine, M., Maready, M., Darden, T., Hulette, C., Martin, E., Hauser, E., et al. (2003). Genomic convergence: identifying candidate genes for Parkinson's disease by combining serial analysis of gene expression and genetic linkage. Hum Mol Genet 12, 671-677.

Heap, G.A., Yang, J.H., Downes, K., Healy, B.C., Hunt, K.A., Bockett, N., Franke, L., Dubois, P.C., Mein, C.A., Dobson, R.J., et al. (2010). Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum Mol Genet 19, 122-134.

Herskind, A.M., McGue, M., Holm, N.V., Sorensen, T.I., Harvald, B., and Vaupel, J.W. (1996). The heritability of human longevity: a population-based study of 2872 Danish twin pairs born 1870-1900. Hum Genet 97, 319-323.

Hoang, K., Tan, J.C., Derby, G., Blouch, K.L., Masek, M., Ma, I., Lemley, K.V., and Myers, B.D. (2003). Determinants of glomerular hypofiltration in aging humans. Kidney Int 64, 1417-1424.

Holzenberger, M., Dupont, J., Ducos, B., Leneuve, P., Geloen, A., Even, P.C., Cervera, P., and Le Bouc, Y. (2003). IGF-1 receptor regulates lifespan and resistance to oxidative stress in mice. Nature 421, 182-187.

Hunt, S.C., Coon, H., Hasstedt, S.J., Cawthon, R.M., Camp, N.J., Wu, L.L., and Hopkins, P.N. (2004). Linkage of serum creatinine and glomerular filtration rate to chromosome 2 in Utah pedigrees. Am J Hypertens 17, 511-515.

Jormsjo, S., Whatling, C., Walter, D.H., Zeiher, A.M., Hamsten, A., and Eriksson, P. (2001). Allele-specific regulation of matrix metalloproteinase-7 promoter activity is associated with coronary artery luminal dimensions among hypercholesterolemic patients. Arterioscler Thromb Vasc Biol 21, 1834-1839.

Kaeberlein, M., McVey, M., and Guarente, L. (1999). The SIR2/3/4 complex and SIR2 alone promote longevity in Saccharomyces cerevisiae by two different mechanisms. Genes Dev 13, 2570-2580.

Kaplan, C., Pasternack, B., Shah, H., and Gallo, G. (1975). Age-related incidence of sclerotic glomeruli in human kidneys. Am J Pathol 80, 227-234.

Karasik, D., Demissie, S., Cupples, L.A., and Kiel, D.P. (2005). Disentangling the genetic determinants of human aging: biological age as an alternative to the use of survival measures. J Gerontol A Biol Sci Med Sci 60, 574-587.

Kasiske, B.L. (1987). Relationship between vascular disease and age-associated changes in the human kidney. Kidney Int 31, 1153-1159.

Kenyon, C., Chang, J., Gensch, E., Rudner, A., and Tabtiang, R. (1993). A C. elegans mutant that lives twice as long as wild type. Nature 366, 461-464.

Kervinen, K., Savolainen, M.J., Salokannel, J., Hynninen, A., Heikkinen, J., Ehnholm, C., Koistinen, M.J., and Kesaniemi, Y.A. (1994). Apolipoprotein E and B polymorphisms--longevity factors assessed in nonagenarians. Atherosclerosis 105, 89-95.

Kincaid-Smith, P. (1991). Age-related glomerular sclerosis: baseline values in Hong Kong. Pathology 23, 275.

Kirkwood, T.B. (1997). The origins of human ageing. Philos Trans R Soc Lond B Biol Sci 352, 1765-1772.

Kirkwood, T.B., and Austad, S.N. (2000). Why do we age? Nature 408, 233-238.

Klein, T.E., Altman, R.B., Eriksson, N., Gage, B.F., Kimmel, S.E., Lee, M.T., Limdi, N.A., Page, D., Roden, D.M., Wagner, M.J., et al. (2009). Estimation of the warfarin dose with clinical and pharmacogenetic data. N Engl J Med 360, 753-764.

Kottgen, A., Glazer, N.L., Dehghan, A., Hwang, S.J., Katz, R., Li, M., Yang, Q., Gudnason, V., Launer, L.J., Harris, T.B., et al. (2009). Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet.

Kuro-o, M., Matsumura, Y., Aizawa, H., Kawaguchi, H., Suga, T., Utsugi, T., Ohyama, Y., Kurabayashi, M., Kaname, T., Kume, E., et al. (1997). Mutation of the mouse klotho gene leads to a syndrome resembling ageing. Nature 390, 45-51.

Le-Niculescu, H., Balaraman, Y., Patel, S., Tan, J., Sidhu, K., Jerome, R.E., Edenberg, H.J., Kuczenski, R., Geyer, M.A., Nurnberger, J.I., Jr., et al. (2007). Towards understanding the schizophrenia code: an expanded convergent functional genomics approach. Am J Med Genet B Neuropsychiatr Genet 144B, 129-158.

Lescai, F., Blanche, H., Nebel, A., Beekman, M., Sahbatou, M., Flachsbart, F., Slagboom, E., Schreiber, S., Sorbi, S., Passarino, G., et al. (2009). Human longevity and 11p15.5: a study in 1321 centenarians. Eur J Hum Genet 17, 1515-1519.

Li, B., and Leal, S.M. (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83, 311-321.

Li, M., Nicholls, K.M., and Becker, G.J. (2002). Glomerular size and global glomerulosclerosis in normal Caucasian donor kidneys: effects of aging and gender. J Nephrol 15, 614-619.

Li, Y., Wang, W.J., Cao, H., Lu, J., Wu, C., Hu, F.Y., Guo, J., Zhao, L., Yang, F., Zhang, Y.X., et al. (2009). Genetic association of FOXO1A and FOXO3A with longevity trait in Han Chinese populations. Hum Mol Genet.

Liang, X., Slifer, M., Martin, E.R., Schnetz-Boutaud, N., Bartlett, J., Anderson, B., Zuchner, S., Gwirtsman, H., Gilbert, J.R., Pericak-Vance, M.A., et al. (2009). Genomic convergence to identify candidate genes for Alzheimer disease on chromosome 10. Hum Mutat 30, 463-471.

Lindeman, R.D., and Goldman, R. (1986). Anatomic and physiologic age changes in the kidney. Exp Gerontol 21, 379-406.

Lindeman, R.D., Tobin, J., and Shock, N.W. (1985). Longitudinal studies on the rate of decline in renal function with age. J Am Geriatr Soc 33, 278-285.

Lindeman, R.D., Tobin, J.D., and Shock, N.W. (1984). Association between blood pressure and the rate of decline in renal function with age. Kidney Int 26, 861-868.

Llano, E., Pendas, A.M., Knauper, V., Sorsa, T., Salo, T., Salido, E., Murphy, G., Simmer, J.P., Bartlett, J.D., and Lopez-Otin, C. (1997). Identification and structural and functional characterization of human enamelysin (MMP-20). Biochemistry 36, 15101-15108.

Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H., and Lee, M.P. (2003). Allelic variation in gene expression is common in the human genome. Genome Res 13, 1855-1862.

Lu, T., Pan, Y., Kao, S.Y., Li, C., Kohane, I., Chan, J., and Yankner, B.A. (2004). Gene regulation and DNA damage in the ageing human brain. Nature 429, 883-891.

Marcantoni, C., Ma, L.J., Federspiel, C., and Fogo, A.B. (2002). Hypertensive nephrosclerosis in African Americans versus Caucasians. Kidney Int 62, 172-180.

Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends Genet 24, 133-141.

McGue, M., Vaupel, J.W., Holm, N., and Harvald, B. (1993). Longevity is moderately heritable in a sample of Danish twins born 1870-1880. J Gerontol 48, B237-244.

McLachlan, M.S., Guthrie, J.C., Anderson, C.K., and Fulker, M.J. (1977). Vascular and glomerular changes in the ageing kidney. J Pathol 121, 65-78.

Metter, E.J., Talbot, L.A., Schrager, M., and Conwit, R.A. (2004). Arm-cranking muscle power and arm isometric muscle strength are independent predictors of all-cause mortality in men. J Appl Physiol 96, 814-821.

Milani, L., Lundmark, A., Nordlund, J., Kiialainen, A., Flaegstad, T., Jonmundsson, G., Kanerva, J., Schmiegelow, K., Gunderson, K.L., Lonnerholm, G., et al. (2009). Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation. Genome Res 19, 1-11.

Mitchell, B.D., Hsueh, W.C., King, T.M., Pollin, T.I., Sorkin, J., Agarwala, R., Schaffer, A.A., and Shuldiner, A.R. (2001). Heritability of life span in the Old Order Amish. Am J Med Genet 102, 346-352.

Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A., and Schadt, E.E. (2004). Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 75, 1094-1105.

Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S., and Cheung, V.G. (2004). Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743-747.

Mudge, J., Miller, N.A., Khrebtukova, I., Lindquist, I.E., May, G.D., Huntley, J.J., Luo, S., Zhang, L., van Velkinburgh, J.C., Farmer, A.D., et al. (2008). Genomic convergence analysis of schizophrenia: mRNA sequencing reveals altered synaptic vesicular transport in post-mortem cerebellum. PLoS ONE 3, e3625.

Murphy, S.K., Wylie, A.A., and Jirtle, R.L. (2001). Imprinting of PEG3, the human homologue of a mouse gene involved in nurturing behavior. Genomics 71, 110-117.

Myers, A.J., Gibbs, J.R., Webster, J.A., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M., Leung, D., Bryden, L., Nath, P., et al. (2007). A survey of genetic human cortical gene expression. Nat Genet 39, 1494-1499.

Nebel, A., Croucher, P.J., Stiegeler, R., Nikolaus, S., Krawczak, M., and Schreiber, S. (2005). No association between microsomal triglyceride transfer protein (MTP) haplotype and longevity in humans. Proc Natl Acad Sci U S A 102, 7906-7909.

Neugarten, J., Kasiske, B., Silbiger, S.R., and Nyengaard, J.R. (2002). Effects of sex on renal structure. Nephron 90, 139-144.

Newbold, K.M., Sandison, A., and Howie, A.J. (1992). Comparison of size of juxtamedullary and outer cortical glomeruli in normal adult kidney. Virchows Archiv 420, 127-129.

Noureddine, M.A., Li, Y.J., van der Walt, J.M., Walters, R., Jewett, R.M., Xu, H., Wang, T., Walter, J.W., Scott, B.L., Hulette, C., et al. (2005). Genomic convergence to identify candidate genes for Parkinson disease: SAGE analysis of the substantia nigra. Mov Disord 20, 1299-1309.

Novelli, V., Viviani Anselmi, C., Roncarati, R., Guffanti, G., Malovini, A., Piluso, G., and Puca, A.A. (2008). Lack of replication of genetic associations with human longevity. Biogerontology 9, 85-92.

Oliveira, S.A., Li, Y.J., Noureddine, M.A., Zuchner, S., Qin, X., Pericak-Vance, M.A., and Vance, J.M. (2005). Identification of risk and age-at-onset genes on chromosome 1p in Parkinson disease. Am J Hum Genet 77, 252-264.

Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R., and Frazer, K.A. (2006). Analysis of allelic differential expression in human white blood cells. Genome Res 16, 331-339.

Partridge, L. (2010). The new biology of ageing. Philos Trans R Soc Lond B Biol Sci 365, 147-154.

Partridge, L., and Gems, D. (2002). A lethal side-effect. Nature 418, 921.

Pastinen, T., Ge, B., Gurd, S., Gaudin, T., Dore, C., Lemire, M., Lepage, P., Harmsen, E., and Hudson, T.J. (2005). Mapping common regulatory variants to human haplotypes. Hum Mol Genet 14, 3963-3971.

Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H., et al. (2004). A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16, 184-193.

Pawlikowska, L., Hu, D., Huntsman, S., Sung, A., Chu, C., Chen, J., Joyner, A., Schork, N.J., Hsueh, W.C., Reiner, A.P., et al. (2009). Association of common genetic variation in the insulin/IGF1 signaling pathway with human longevity. Aging Cell.

Perls, T.T., Wilmoth, J., Levenson, R., Drinkwater, M., Cohen, M., Bogan, H., Joyce, E., Brewster, S., Kunkel, L., and Puca, A. (2002). Life-long sustained mortality advantage of siblings of centenarians. Proc Natl Acad Sci U S A 99, 8442-8447.

Pritchard, J.K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945-959.

Pritchard, J.K., Wen, X., and Falush, D. (2007). Documentation for structure software: Version 2.2.

Puca, A.A., Daly, M.J., Brewster, S.J., Matise, T.C., Barrett, J., Shea-Drinkwater, M., Kang, S., Joyce, E., Nicoli, J., Benson, E., et al. (2001). A genome-wide scan for linkage to human exceptional longevity identifies a locus on chromosome 4. Proc Natl Acad Sci U S A 98, 10505-10508.

Pushkarev, D., Neff, N.F., and Quake, S.R. (2009). Single-molecule sequencing of an individual human genome. Nat Biotechnol 27, 847-852.

Rodwell, G.E., Sonu, R., Zahn, J.M., Lund, J., Wilhelmy, J., Wang, L., Xiao, W., Mindrinos, M., Crane, E., Segal, E., et al. (2004). A transcriptional profile of aging in the human kidney. PLoS Biol 2, e427.

Rogina, B., and Helfand, S.L. (2004). Sir2 mediates longevity in the fly through a pathway related to calorie restriction. Proc Natl Acad Sci U S A 101, 15998-16003.

Rosenberg, N.A. (2004). DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes 4, 137-138.

Rowe, J.W., Andres, R., and Tobin, J.D. (1976a). Letter: Age-adjusted standards for creatinine clearance. Ann Intern Med 84, 567-569.

Rowe, J.W., Andres, R., Tobin, J.D., Norris, A.H., and Shock, N.W. (1976b). The effect of age on creatinine clearance in men: a cross-sectional and longitudinal study. J Gerontol 31, 155-163.

Schachter, F., Faure-Delanef, L., Guenot, F., Rouger, H., Froguel, P., Lesueur-Ginot, L., and Cohen, D. (1994). Genetic associations with human longevity at the APOE and ACE loci. Nat Genet 6, 29-32.

Schadt, E.E. (2009). Molecular networks as sensors and drivers of common human diseases. Nature 461, 218-223.

Schadt, E.E., Molony, C., Chudin, E., Hao, K., Yang, X., Lum, P.Y., Kasarskis, A., Zhang, B., Wang, S., Suver, C., et al. (2008). Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, e107.

Schmidt, R.J., Beierwaltes, W.H., and Baylis, C. (2001). Effects of aging and alterations in dietary sodium intake on total nitric oxide production. Am J Kidney Dis 37, 900-908.

Serre, D., Gurd, S., Ge, B., Sladek, R., Sinnett, D., Harmsen, E., Bibikova, M., Chudin, E., Barker, D.L., Dickinson, T., et al. (2008). Differential allelic expression in the human genome: a robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genet 4, e1000006.

Silva, F.G. (2005a). The aging kidney: a review--part I. Int Urol Nephrol 37, 185-205.

Silva, F.G. (2005b). The aging kidney: a review--part II. Int Urol Nephrol 37, 419-432.

Southworth, L.K., Owen, A.B., and Kim, S.K. (2009). Aging mice show a decreasing correlation of gene expression within genetic modules. PLoS Genet 5, e1000776.

Spielman, R.S., Bastone, L.A., Burdick, J.T., Morley, M., Ewens, W.J., and Cheung, V.G. (2007). Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39, 226-231.

Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., Tavare, S., et al. (2005). Genome-wide associations of gene expression variation in humans. PLoS Genet 1, e78.

Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., et al. (2007). Population genomics of human gene expression. Nat Genet 39, 1217-1224.

Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550.

Suh, Y., Atzmon, G., Cho, M.O., Hwang, D., Liu, B., Leahy, D.J., Barzilai, N., and Cohen, P. (2008). Functionally significant insulin-like growth factor I receptor mutations in centenarians. Proc Natl Acad Sci U S A 105, 3438-3442.

Takeuchi, F., McGinnis, R., Bourgeois, S., Barnes, C., Eriksson, N., Soranzo, N., Whittaker, P., Ranganath, V., Kumanduri, V., McLaren, W., et al. (2009). A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet 5, e1000433.

Tang, H., Quertermous, T., Rodriguez, B., Kardia, S.L., Zhu, X., Brown, A., Pankow, J.S., Province, M.A., Hunt, S.C., Boerwinkle, E., et al. (2005). Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet 76, 268-275.

Tatar, M., Kopelman, A., Epstein, D., Tu, M.P., Yin, C.M., and Garofalo, R.S. (2001). A mutant Drosophila insulin receptor homolog that extends life-span and impairs neuroendocrine function. Science 292, 107-110.

Tissenbaum, H.A., and Guarente, L. (2001). Increased dosage of a sir-2 gene extends lifespan in Caenorhabditis elegans. Nature 410, 227-230.

Van den Veyver, I.B., Norman, B., Tran, C.Q., Bourjac, J., and Slim, R. (2001). The human homologue (PEG3) of the mouse paternally expressed gene 3 (Peg3) is maternally imprinted but not mutated in women with familial recurrent hydatidiform molar pregnancies. J Soc Gynecol Investig 8, 305-313.

Veyrieras, J.B., Kudaravalli, S., Kim, S.Y., Dermitzakis, E.T., Gilad, Y., Stephens, M., and Pritchard, J.K. (2008). High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 4, e1000214.

Wang, J., Wang, W., Li, R., Li, Y., Tian, G., Goodman, L., Fan, W., Zhang, J., Li, J., Guo, Y., et al. (2008). The diploid genome sequence of an Asian individual. Nature 456, 60-65.

Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57-63.

Webster, J.A., Gibbs, J.R., Clarke, J., Ray, M., Zhang, W., Holmans, P., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M., et al. (2009). Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84, 445-458.

Weedon, M.N., Lango, H., Lindgren, C.M., Wallace, C., Evans, D.M., Mangino, M., Freathy, R.M., Perry, J.R., Stevens, S., Hall, A.S., et al. (2008). Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40, 575-583.

Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.J., Makhijani, V., Roth, G.T., et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-876.

Wheeler, H.E., Metter, E.J., Tanaka, T., Absher, D., Higgins, J., Zahn, J.M., Wilhelmy, J., Davis, R.W., Singleton, A., Myers, R.M., et al. (2009). Sequential use of transcriptional profiling, expression quantitative trait mapping, and gene association implicates MMP20 in human kidney aging. PLoS Genet 5, e1000685.

Willcox, B.J., Donlon, T.A., He, Q., Chen, R., Grove, J.S., Yano, K., Masaki, K.H., Willcox, D.C., Rodriguez, B., and Curb, J.D. (2008). FOXO3A genotype is strongly associated with human longevity. Proc Natl Acad Sci U S A 105, 13987-13992.

Williams, G.C. (1957). Pleiotropy, natural selection, and the evolution of senescence. Evolution 11, 398-411.

Woessner, J.F., Jr. (1991). Matrix metalloproteinases and their inhibitors in connective tissue remodeling. FASEB J 5, 2145-2154.

WTCCC (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678.

Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B., and Kinzler, K.W. (2002). Allelic variation in human gene expression. Science 297, 1143.

Zahn, J.M., Sonu, R., Vogel, H., Crane, E., Mazan-Mamczarz, K., Rabkin, R., Davis, R.W., Becker, K.G., Owen, A.B., and Kim, S.K. (2006). Transcriptional profiling of aging in human muscle reveals a common aging signature. PLoS Genet 2, e115.

Zhong, S., Li, C., and Wong, W.H. (2003). ChipInfo: Software for extracting gene annotation and gene ontology information for microarray analysis. Nucleic Acids Res 31, 3483-3486.

Zhu, X., Feng, T., Li, Y., Lu, Q., and Elston, R.C. (2009). Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol.