Molecular Evolution of Type I Collagen (COL1a1) and
Its Relationship to Human Skeletal Diseases
by
Daryn Amanda Stover
A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy
Approved November 2010 by the Graduate Supervisory Committee:
Brian C. Verrelli, Chair
Thomas E. Dowling
Michael S. Rosenberg
Anne C. Stone
Gary T. Schwartz
ARIZONA STATE UNIVERSITY
December 2010
ABSTRACT
Skeletal diseases related to reduced bone strength, like osteoporosis, vary in
frequency and severity among human populations due in part to underlying
genetic differentiation. With >600 disease-associated mutations (DAMs),
COL1a1, which encodes the primary subunit of type I collagen, the main
structural protein in bone, is most commonly associated with this phenotypic
variation. Although numerous studies have explored genotype-phenotype
relationships with COL1a1, surprisingly, no study has undertaken an evolutionary
approach to determine how changes in constraint over time can be modeled to
help predict bone-related disease factors.
Here, molecular population and comparative species genetic analyses were
conducted to characterize the evolutionary history of COL1a1. First, nucleotide
and protein sequences of COL1a1 in 14 taxa representing ~450 million years of
vertebrate evolution were used to investigate constraint across gene regions.
Protein residues of historically high conservation are significantly correlated with
disease severity today, providing a highly accurate model for disease prediction,
yet interestingly, intron composition also exhibits high conservation suggesting
strong historical purifying selection. Second, a human population genetic analysis
of 192 COL1a1 nucleotide sequences representing 10 ethnically and
geographically diverse samples was conducted. This random sample of the
population shows surprisingly high numbers of amino acid polymorphisms (albeit
rare in frequency), suggesting that not all protein variants today are highly
deleterious. Further, an unusual haplotype structure was identified across
populations, but it is associated only with noncoding variation in the 5’
region of COL1a1, where gene expression alteration is most likely. Finally, a
population genetic analysis of 40 chimpanzee COL1a1 sequences shows no amino
acid polymorphism, yet does reveal an unusual haplotype structure with
significantly extended linkage disequilibrium >30 kilobases away, as well as a
surprisingly common exon duplication that is generally highly deleterious in
humans. Altogether, these analyses indicate a history of temporally and spatially
varying purifying selection on not only coding but also noncoding COL1a1 regions
that is also reflected in population differentiation. In contrast to clinical studies,
this approach reveals potentially functional variation, which in future analyses
could explain the observed bone strength variation seen not only within humans
but also in other closely related primates.
To my family, with love
ACKNOWLEDGMENTS
I would like to thank the members of my committee: Brian Verrelli, Anne Stone,
Thomas Dowling, Michael Rosenberg, and Gary Schwartz, for their advice and
support, both with my dissertation research and with my graduate education and
professional development. I would especially like to thank my chair, Brian
Verrelli, for dedicating a considerable amount of time and energy to helping me
reach my academic and professional goals. Regardless of the topic, be it statistical
analyses, writing grant proposals and manuscripts, or how to deal with a difficult
student, he was always willing to provide advice and guidance to help get me
through. Without his efforts, I would not be here today. A special thanks also to
Anne Stone for providing the primate DNA samples used in Chapter 4 and to
Michael Rosenberg for saving me a considerable amount of time by creating the
PhaseSeqs script to help transfer data among analysis programs.
I would also like to thank my family for their love, support, and
encouragement throughout my life as well as for their dedication to my education
and to fostering my personal and professional interests. Thank you to my friends
also for their love and support, and especially for the much-needed distractions
from graduate school. I also greatly appreciate the helpful discussion of my
research with past and present members of the Verrelli Lab and colleagues at
Arizona State University (ASU). Finally, I owe a special thank you to Michael
Hammer, Elizabeth Wood, and Matthew Kaplan for originally sparking my
interest in human evolutionary genetics as an undergraduate and for providing a
strong foundation in molecular laboratory techniques.
Funding for my dissertation research was provided by the National
Science Foundation via a Doctoral Dissertation Improvement Grant (DEB-
0909637), the Graduate and Professional Students Association at ASU, and
through the generous use of Verrelli Lab start-up funds. I would also like to thank
the School of Life Sciences and the Graduate College for funding my graduate
education as a teaching and research associate.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
    Type I Collagen and the COL1a1 Subunit
    Potential Importance of Noncoding COL1a1 Polymorphism
    Bone Phenotypic Variation among Primates
    Research Questions
2 COMPARATIVE VERTEBRATE EVOLUTIONARY ANALYSES OF COL1a1
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
    Conclusion
    Acknowledgements
3 HAPLOTYPE STRUCTURE AND AMINO ACID POLYMORPHISM AT HUMAN COL1a1
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
    Conclusion
    Acknowledgements
4 COMPARATIVE HUMAN AND CHIMPANZEE ANALYSES OF COL1a1
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
5 CONCLUSION
LITERATURE CITED
APPENDIX
A SUPPLEMENTARY MATERIAL: CHAPTER 2
B SUPPLEMENTARY MATERIAL: CHAPTER 3
C SUPPLEMENTARY MATERIAL: CHAPTER 4
LIST OF TABLES
Table
1. Human Clade A Collagen Gene Exon and Intron Characteristics
2. Clade A Collagen Gene Human-Chimpanzee Divergence Estimates
3. COL1a1 Human Population Diversity Estimates
4. Intraspecific and Interspecific Tests of Neutrality
5. Chimpanzee COL1a1 Diversity Estimates by Gene Region
6. Chimpanzee COL1a1 Haplogroup-Specific Diversity Estimates
LIST OF FIGURES
Figure
1. COL1a1 Gene Locus Diagram
2. COL1a1 Disease-Associated Mutations and Amino Acid Evolutionary Rates
3. Intron Length Frequency Distributions for Clade A Collagen Genes
4. Human COL1a1 Linkage Disequilibrium Patterns
5. Chimpanzee COL1a1 Exon Duplication Diagram
6. Chimpanzee Chromosome 17 Haplotypes
7. Chromosome 17 Gene and PCR Fragment Diagram
CHAPTER 1: INTRODUCTION
The incidence and severity of numerous complex human diseases vary greatly
among geographic populations (e.g., Abate and Chandalia 2003; Hajjar et al.
2006; Lau et al. 2006). Because a majority of these diseases have a genetic
component in addition to an environmental one, it is likely that variation within
the human genome is a significant source of observed phenotypic variation among
populations. Included among these diseases are skeletal disorders related to
variation in bone strength (measured as bone mineral density, BMD) like
osteoporosis, which has been shown to vary significantly among populations (e.g.,
Lauderdale et al. 1997; Looker et al. 1997; Melton 1997; Barrett-Connor et al.
2005), attributed in part to genetic differentiation (e.g., Dvornyk et al. 2003; Gong
and Haynatzki 2003; Gong et al. 2006; Koller et al. 2010). As we move into an
era of personalized, genome-based healthcare (Ng et al. 2008), characterization of
this genetic variation will enable medical practitioners to target preventative
measures to individuals with specific genotypes linked to increased risk of
developing skeletal disorders. In order to facilitate the design of novel treatments,
however, we must go a step further than simply identifying disease-associated
mutations (DAMs). Instead, it is crucial that we understand the evolutionary
context, both among populations and species, of potentially functional mutations.
Taking into account the historic effects of evolutionary pressures like natural
selection on the accumulation of this genetic variation will allow us to better
understand the origin and evolution of skeletal disease phenotypes. Specifically,
we can determine not only how and when such genetic variation is deleterious,
but, potentially, when it is beneficial, which can guide the design of innovative
treatments that could prevent the onset of disease symptoms entirely.
To date, >30 candidate genes have been associated with variation in BMD,
osteoporosis, or osteoporotic fracture (Ho et al. 2000; Shen et al. 2003; Liu et al.
2006; Gong et al. 2006; Rivadeneira et al. 2009; Ralston 2010). Of these, collagen
type I alpha 1 (COL1a1), which encodes part of the bone structural protein type I
collagen, consistently displays some of the strongest evidence of association with
disease across populations (e.g., Garcia-Giralt et al. 2002; Stewart et al. 2006;
Ioannidis et al. 2007; Jiang et al. 2007; Kaufman et al. 2008). Here, we examine
the recent and ancient evolutionary history of this >17-kb chromosome 17q21.33
locus using molecular evolutionary and population genetic approaches to better
understand the nature of human DAMs. Our results offer new insight into the
origin of skeletal disease in humans and what genetic variation may be
contributing to phenotypic differences in bone strength among populations.
Type I collagen and the COL1a1 subunit
Type I collagen, which comprises two subunits encoded by COL1a1 and one subunit
encoded by COL1a2 wound together to form a triple helix, is the most abundant protein in
vertebrates and is the main structural protein of bone, teeth, and tendon (Viguet-
Carrin et al. 2006). As such, mutations in these genes have been associated with
several skeletal and connective-tissue disorders (Dalgleish 1997; Marini et al.
2007). Within COL1a1 alone, >600 DAMs have been identified (Dalgleish 1997;
Marini et al. 2007), the majority of which are linked to osteoporosis, osteogenesis
imperfecta types I-IV, and Ehlers-Danlos Syndrome types I and VIIA, which
afflict ~200 million, ~500,000, and 200,000 individuals worldwide, respectively
(e.g., Stoll et al. 1989; Burrows 1999; Reginster and Burlet 2006). These DAMs
primarily affect protein coding regions, particularly within the triple-helix
domain, which is composed of a repeating amino acid sequence with glycine, the
smallest of the amino acids, in every third position, the repetition of which is
crucial to enabling the domain to wind into its compact structure in type I
collagen (Yamada et al. 1980; Bernard et al. 1983; Exposito et al. 2002; Boot-
Handford and Tuckwell 2003; Aouacheria et al. 2004; Wada et al. 2006). Thus,
substitutions of these glycines are often the most phenotypically severe DAMs
(e.g., resulting in lethal OI type II; Kuivaniemi et al. 1997; Dalgleish 1997;
Marini et al. 2007; Rauch et al. 2010).
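The glycine-in-every-third-position rule described above can be checked with a few lines of Python. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the peptide strings are invented toy sequences rather than real COL1a1 residues, and the input is assumed to begin at the first glycine of the repeat.

```python
def glycine_repeat_breaks(helix: str) -> list[int]:
    """Return 0-based positions that should hold glycine in a Gly-X-Y repeat but do not.

    Assumes `helix` is a one-letter amino acid string that starts at the
    first glycine of the repeat, so positions 0, 3, 6, ... are expected to be G.
    """
    return [i for i in range(0, len(helix), 3) if helix[i] != "G"]


# Invented toy peptides (not real COL1a1 sequence).
wild_type = "GPPGAPGEK"
mutant = "GPPSAPGEK"  # glycine at position 3 replaced by serine

print(glycine_repeat_breaks(wild_type))  # []
print(glycine_repeat_breaks(mutant))     # [3]
```

A scan of this kind only flags where the repeat is interrupted; it says nothing about the phenotypic severity of a given substitution.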
The phenotypic severity associated with COL1a1 mutations, however, can
be quite variable depending on the position of the affected amino acid and how
the mutation alters the thermostability of type I collagen, both of which have been
used previously to predict the phenotypic outcome of novel mutations that affect
glycine residues (e.g., Persikov et al. 2005; Marini et al. 2007; Bodian et al. 2008,
2009). These previous methods, however, do not incorporate an evolutionary
approach. For example, genome-wide studies (e.g., Miller and Kumar 2001;
Subramanian and Kumar 2006) using evolutionary site models have shown that
DAMs in general are found more often at amino acid positions that are highly
conserved across species. Applying similar means to examine the long-term
evolutionary history of COL1a1 in vertebrates may allow for increased accuracy
in predicting the phenotypic severity of novel mutations, particularly those that
affect non-glycine positions, which have been largely ignored by previous
prediction models.
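To make the idea of site-by-site conservation concrete, the sketch below scores each column of a protein alignment by its Shannon entropy, where 0 bits indicates an invariant site. This is a simpler stand-in chosen for illustration, not the evolutionary site models used in the studies cited above; the function names and the four-sequence alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def site_entropies(alignment: list[str]) -> list[float]:
    """Entropy per aligned site; lower values indicate higher conservation."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Invented toy alignment: one row per species, one column per site.
toy_alignment = ["GPAG", "GPSG", "GPTG", "GPAG"]
for site, h in enumerate(site_entropies(toy_alignment)):
    print(f"site {site}: entropy = {h:.2f} bits")
```

Entropy ignores the phylogeny and the time separating the taxa, which is precisely the information that rate-based evolutionary models add.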
Potential importance of noncoding COL1a1 polymorphism
Within the natural population (i.e., individuals who lack clinical symptoms of
skeletal disease), COL1a1 amino acid variation is rare. A study of 48
individuals (96 chromosomes) from each of four populations in the United States
(European-, African-, Mexican-, and Chinese-Americans) identified only 3 amino
acid mutations, each at an allele frequency <2% in the
total sample (Chan et al. 2008). As such, COL1a1 protein variation is unlikely to
explain the association of this gene with bone phenotypic variation among
populations. Rather, genetic variation in noncoding regions that affects the
expression of COL1a1 may contribute significantly to these population
differences in bone-related phenotypes, as has been hypothesized for phenotypic
variation among populations in general (e.g., Ge et al. 2009; Kasowski et al.
2010).
A mutation in an Sp1 transcription factor binding site in the first intron of
COL1a1, for example, has been shown to increase gene expression and likely
accounts for the associated change in the ratio of COL1a1 and COL1a2 subunits
that reduces the structural integrity of type I collagen (Grant et al. 1996; Mann et
al. 2001; Jin, van’t Hof et al. 2009). This mutation reaches frequencies >20%
among populations of western European ancestry (e.g., Grant et al. 1996; Ralston
et al. 2006; Jiang et al. 2007), but is only rarely found among Africans (e.g.,
Beavan et al. 1998) and is absent among Asians (e.g., Han et al. 1999; Nakajima
et al. 1999; Lau et al. 2004). Thus, this single noncoding mutation has been found
to contribute significantly to population variation in bone strength and fracture
risk (e.g., Beavan et al. 1998; Bandres et al. 2005; Ralston et al. 2006; Jiang et al.
2007). Further, this mutation is found in significant linkage disequilibrium with
two other mutations in the promoter that are also associated with reduced BMD
(Garcia-Giralt et al. 2002, 2005; Stewart et al. 2006; Jiang et al. 2007; Jin, Stewart
et al. 2009). Recently, these three polymorphisms have been shown to have
haplotype-specific effects on COL1a1 expression, resulting in not only low BMD,
but reduced overall bone quality as well (Jin, Stewart et al. 2009; Jin, van’t Hof et
al. 2009). Aside from these polymorphisms, however, little is known about
noncoding variation at COL1a1, particularly in the natural population (e.g., Chan
et al. 2008), as these regions have been largely ignored in previous studies in
favor of screening individuals for amino acid mutations. As such, it would be
interesting to investigate the extent of potentially functional genetic
differentiation in noncoding regions of COL1a1 in ethnically and geographically
diverse populations.
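Linkage disequilibrium between two polymorphisms, such as the intron and promoter variants discussed above, is commonly summarized by D, |D'|, and r². The sketch below computes these from phased haplotypes at two biallelic sites. The function name, the 0/1 allele coding, and the eight toy haplotypes are assumptions made for illustration and are not data from the cited studies.

```python
def ld_stats(haplotypes: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic sites.

    Each haplotype is (allele at site A, allele at site B), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site B
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                  # normalized to the range 0..1
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Invented toy sample: the two derived alleles always occur together.
toy = [(1, 1)] * 3 + [(0, 0)] * 5
d, d_prime, r2 = ld_stats(toy)
print(f"D = {d:.3f}, |D'| = {d_prime:.2f}, r^2 = {r2:.2f}")
```

Real sequence data are usually unphased genotypes, so haplotypes would first have to be inferred before statistics like these could be applied.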
Noncoding variation that may have functional implications for COL1a1
gene expression need not be restricted to transcription factor binding sites,
however, but can rather include intron compositional properties that impact gene
structure. For example, genome-wide studies have shown that highly-expressed
genes have higher GC-content and shorter introns, which are related to increased
transcriptional efficiency (Hurst et al. 1996; Castillo-Davis et al. 2002; Urrutia
and Hurst 2001, 2003; Comeron 2004; Kudla et al. 2006). COL1a1 is likely
highly expressed given its abundance in connective tissue and its importance
during fetal development and wound repair (e.g., Gelse et al. 2003; Hildebrand et
al. 2005; Cohen 2006), yet because studies are often constrained to analyses of
soft-tissues (e.g., Su et al. 2004; Blekhman, Oshlack et al. 2008), little is known
about COL1a1 expression. However, as with coding regions, comparisons of the
long-term evolutionary history of COL1a1 introns among vertebrates could also
reveal historical selective pressures related to functional constraint, thereby
providing novel targets in the search for polymorphisms associated with bone-
related phenotypic differences among populations.
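The two intron properties highlighted above, length and GC-content, are straightforward to tabulate once intron sequences are in hand. The following Python sketch shows one way to do so; the function names are hypothetical and the sequences are invented and far shorter than real introns.

```python
def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return (seq.count("G") + seq.count("C")) / called


def summarize_introns(introns: dict[str, str]) -> dict[str, tuple[int, float]]:
    """Map each intron name to (length in bp, GC fraction)."""
    return {name: (len(seq), gc_content(seq)) for name, seq in introns.items()}


# Invented toy sequences for illustration only.
toy_introns = {
    "intron_A": "GTGAGCGCCGGCAG",
    "intron_B": "GTAAGTATTTTTACAG",
}
for name, (length, gc) in summarize_introns(toy_introns).items():
    print(f"{name}: length = {length} bp, GC = {gc:.2f}")
```

Ambiguous bases such as N are excluded from the GC denominator but still count toward length, a design choice that should be revisited for low-coverage sequence.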
Bone phenotypic variation among primates
While examining COL1a1 variation in vertebrates in general will shed light on the
ancient evolutionary history of this gene, to better understand the origin of
skeletal disease in humans we must also determine how evolutionary pressures
have changed more recently within our lineage. Specifically, although bone
phenotypic differences exist among human populations, this does not mean that
such variation is unique to our species. Rather, evolutionary processes that led to
the prevalence of skeletal disease in humans could be shared with other closely-
related primate species. Although bone phenotypic data for non-human primates
are limited, two general trends have already emerged. First, BMD does vary within
other species (e.g., Sumner et al. 1989; Cerroni et al. 2000; Black et al. 2001;
Gunji et al. 2003; Havill et al. 2003), which may be due in part to underlying
genetic variation (e.g., Lipkin et al. 2001; Havill et al. 2005). Second, slight
differences in bone morphology have been documented among species, even
including differences between humans and our closest-living relative, the
chimpanzee, in osteoporotic-like symptoms, such as patterns of bone loss and the
accumulation of microfractures with age (e.g., Sumner et al. 1989; Wang et al.
1998; Gunji et al. 2003; Kikuchi et al. 2003; Mulhern and Ubelaker 2003, 2009;
Matsumura et al. 2010).
These data suggest that bone phenotypic variation common among
humans may not be isolated to our lineage. However, with the limited availability
of phenotypic data it is difficult to assess the extent of this variation within
species. Instead, population genetic comparisons of candidate genes can allow us
to make inferences about potential functional differences that may exist within
and between species, as has been done for other phenotypes like color vision and
resistance to viral infection (e.g., Wooding et al. 2005; Verrelli et al. 2008). Given
the link between COL1a1 and bone phenotypic variation in humans, this locus is a
perfect candidate for such an approach. For example, comparing patterns of
genetic variation at COL1a1 within chimpanzees to those observed in humans
would reveal if the evolutionary constraints at this locus shifted recently in our
lineage since its divergence from the last common ancestor of humans and
chimpanzees ~4-6 million years (My) ago.
Research Questions
1) Is there regional variation in historic selective pressures at COL1a1 and how
does this relate to the location and severity of DAMs?
Although COL1a1 has been extensively studied due to its direct link with
skeletal phenotype, previous studies have not only focused on coding regions, but
have largely ignored an evolutionary approach. As reported in Chapter 2, a
comparative species approach is used to examine variation in functional constraint
in both coding and noncoding regions of COL1a1 (Stover and Verrelli 2010).
Specifically, we examine evolutionary change at each amino acid site over the
past 450 My of vertebrate history to identify specific positions and overall protein
domains that are evolutionarily conserved, which are inferred to be sites or
regions of high functional constraint. Given this high constraint, these regions are
expected to result in more severe phenotypes if mutated, which we test using the
location and associated phenotypic severity of known DAMs, thereby generating
a model that can predict the severity of novel mutations.
2) Is the recent evolutionary history of COL1a1 in humans consistent with historic
selective pressures in vertebrates?
Population differences in BMD, osteoporosis, and osteoporotic fracture
are well established in humans as is the significant contribution of genetic
variation to these differences; however, the underlying cause of this genetic
diversity among populations is unknown. While it is possible that these patterns
of variation may be expected given historical selective pressures, they may,
alternatively, be an outcome of recent shifts in functional properties of bone in the
human lineage. For example, Wu and Zhang (2010) invoked variation in positive
selection as the general driving force behind genetic differentiation in skeletal
genes among populations. However, a recent weakening of purifying selection
among populations can also result in similar patterns of variation, such as an
excess of rare amino acid polymorphism (e.g., Bustamante et al. 2005; Boyko et
al. 2008; Lohmueller et al. 2008). Alternatively, genetic differentiation among
populations that results in bone phenotypic variation may simply be due to neutral
processes like genetic drift and differing demographic histories.
In Chapter 3, we use a population genetic approach to test these
possibilities for the recent evolution of COL1a1 in humans as compared to our
findings in Chapter 2 for the historic evolution of this gene. Specifically, we
collected nucleotide sequence data for the COL1a1 locus from a total of 96
individuals (192 chromosomes) representing 10 globally-distributed populations
and compare patterns of coding and noncoding diversity and haplotype structure
observed at COL1a1 with expectations under neutrality and different models of
selection. As this is the first comparative study of noncoding variation at COL1a1
among ethnically diverse, natural populations, we also discuss the potential
functional impact of genetic differentiation in introns.
3) Is the evolutionary history of COL1a1 in humans consistent with that in other
primates?
Phenotypic data suggest that skeletal variation exists both within and
between other primate species, which could imply that evolutionary processes that
have led to the prevalence of skeletal disease in humans may not be unique to our
lineage. We address this possibility in Chapter 4 using a population genetic
approach to determine if selective constraints acting on COL1a1 in humans (as
addressed in Chapter 3) are similar to those affecting the evolution of this locus in
chimpanzees. Specifically, we collected nucleotide sequence data for the COL1a1
locus from a total of 20 individuals (40 chromosomes) from the western African
Pan troglodytes verus subspecies. As with our human dataset, we compare
patterns of coding and noncoding diversity and haplotype structure with
expectations under neutrality and models of selection. By using our closest-living
relative for this comparative population approach, we can shed light on the origin
of skeletal disease in humans within the past 4-6 My as it relates to COL1a1.
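Comparisons of sequence diversity with neutral expectations, as proposed for the human and chimpanzee samples, typically begin from summary statistics such as nucleotide diversity (π) and Watterson's θ, and tests such as Tajima's D are built on the difference between these two estimators. The sketch below computes both per site from aligned sequences. Which statistics the later chapters actually use is not stated in this section, so the choice here is an assumption, as are the function names and the invented four-sequence sample.

```python
from itertools import combinations


def _check(seqs: list[str]) -> None:
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")


def nucleotide_diversity(seqs: list[str]) -> float:
    """Pi: mean number of pairwise differences, per site."""
    _check(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / len(seqs[0])


def watterson_theta(seqs: list[str]) -> float:
    """Theta_W: segregating sites scaled by the harmonic number a1, per site."""
    _check(seqs)
    segregating = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating / a1 / len(seqs[0])


# Invented toy sample of four chromosomes, four sites each.
toy = ["ACGT", "ACGA", "ACGT", "TCGT"]
print(f"pi      = {nucleotide_diversity(toy):.4f}")
print(f"theta_W = {watterson_theta(toy):.4f}")
```

Under neutrality and constant population size the two estimators have the same expectation, so a marked excess of rare variants shows up as π falling below θ.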
CHAPTER 2: COMPARATIVE VERTEBRATE EVOLUTIONARY
ANALYSES OF COL1a1 [a]
Abstract
Collagen type I alpha 1 (COL1a1), which encodes the primary subunit of type I
collagen, the main structural and most abundant protein in vertebrates, harbors
hundreds of mutations linked to human diseases like osteoporosis and
osteogenesis imperfecta. Previous studies have attempted to predict the
phenotypic severity associated with type I collagen mutations, yet an evolutionary
analysis that compares historical and recent selective pressures, including across
non-coding regions, has never been conducted. Here, we use a comparative
genomic and species evolutionary analysis representing ~450 My of vertebrate
history to investigate functional constraints associated with both exons and introns
of the >17-kb COL1a1 gene. We find that although the COL1a1 amino acid
sequence is highly conserved, there are both spatial and temporal signatures of
varying selective constraint across protein domains. Further, sites of high
evolutionary constraint significantly correlate with the location of disease-
associated mutations, the latter of which also cluster with respect to specific
severity classes typically categorized in clinical studies. Finally, we find that
COL1a1 introns are significantly short in length with high GC-content, patterns
that are shared across highly-diverged vertebrates, and which may be a signature
of strong stabilizing selection for high COL1a1 gene expression. In conclusion,
although previous studies focused on COL1a1 coding regions, the current results
implicate introns as areas of high selective constraint and targets of bone-related
phenotypic variation. From a broader perspective, our comparative evolutionary
approach provides further resolution to models predicting mutations associated
with bone-related function and disease severity.
[a] Published as: Stover DA, Verrelli BC. 2010. Comparative vertebrate evolutionary analyses of type I collagen: potential of COL1a1 gene structure and intron variation for common bone-related diseases. Mol. Biol. Evol., doi: 10.1093/molbev/msq221.
Introduction
Fibrillar collagens are the main structural proteins in vertebrates, abundant in
connective tissue, cartilage, bone, and tendon (e.g., Gelse et al. 2003). In humans,
fibrillar collagen proteins are linked to numerous skeletal diseases, such as
osteoporosis, osteoarthritis, osteogenesis imperfecta (OI) and chondrodysplasia
(e.g., Dalgleish 1997; Cohen 2006). The fibrillar collagen gene most commonly
associated with disease is COL1a1, which encodes two of the three subunits of
type I collagen (the third being encoded by COL1a2). Type I collagen is the most
abundant protein in mammals and the main structural protein of bone, teeth, and
tendon (Viguet-Carrin et al. 2006). Within COL1a1, >600 human disease-associated
mutations (DAMs) have been identified, the majority of which are linked to
osteoporosis, which afflicts ~200 million individuals globally, as well as to OI
(or “brittle-bone disease”) types I-IV and Ehlers-Danlos Syndrome types I and VIIA
(Dalgleish 1997; Marini et al. 2007), which afflict ~500,000 and 200,000 individuals,
respectively (e.g., Stoll et al. 1989; Burrows 1999; Reginster and Burlet 2006).
Interestingly, COL1a1 is also associated with population variation in bone
strength, measured as bone mineral density (Garcia-Giralt et al. 2002; Stewart et
al. 2006; Jiang et al. 2007). Thus, as frequency differences among populations in
osteoporosis are well-documented (e.g., Lauderdale et al. 1997; Looker et al.
1997; Melton 1997; Barrett-Connor et al. 2005), COL1a1 is a leading candidate
gene in predicting not only severe bone-related diseases, but also natural bone
variation among populations in general.
The majority of known COL1a1 DAMs affect protein coding regions,
typically within the triple-helix domain that is composed of a repeating amino
acid sequence with glycine, the smallest of the amino acids, in every third
position and primarily separated by proline residues (Yamada et al. 1980; Bernard
et al. 1983; Exposito et al. 2002; Boot-Handford and Tuckwell 2003; Aouacheria
et al. 2004; Wada et al. 2006). This repetition is crucial, as it enables the triple-
helix domain to wind into its compact structure in type I collagen. As such, the
most phenotypically severe (e.g., OI type II) DAMs often result from substitutions
of these glycines, while less severe phenotypes (e.g., OI type I) often result from
alterations of the length of the triple-helix domain, which is normally encoded by
43 of 51 exons (fig. 1) within the >17-kb COL1a1 locus (Kuivaniemi et al. 1997;
Dalgleish 1997; Marini et al. 2007; Rauch et al. 2010). Studies have attempted to
predict the phenotypic severity associated with COL1a1 mutations based on their
amino acid position and effect on the thermostability of type I collagen (e.g.,
Persikov et al. 2005; Marini et al. 2007; Bodian et al. 2008, 2009); however,
surprisingly, no model has incorporated an evolutionary approach. For example,
genome-wide studies (e.g., Miller and Kumar 2001; Subramanian and Kumar
2006) using evolutionary site models have shown that DAMs in general are found
more often at amino acid positions that are highly conserved across species.
Because previous efforts to identify DAMs at COL1a1 have focused
primarily on coding regions, it is unclear whether variation in other gene regions
is functionally relevant. In fact, given the abundance of rare disease-associated
amino acid variants (Dalgleish 1997; Marini et al. 2007) and the rarity of COL1a1
protein variation in general in the natural population (Chan et al. 2008), the
observed variation in bone strength among human populations implicates
COL1a1 non-coding regions as functionally relevant as well. For example, an Sp1
transcription factor binding site mutation in the first intron of COL1a1 increases
gene expression, and likely accounts for the associated change in the ratio of
COL1a1 and COL1a2 subunits that reduces the structural integrity of type I
collagen (Grant et al. 1996; Mann et al. 2001; Jin, van’t Hof et al. 2009).
Interestingly, this mutation reaches frequencies >20% in certain populations, but
is absent in others, thus contributing to ethnic and geographic variation in bone
strength and fracture risk (Bandres et al. 2005; Ralston et al. 2006; Jiang et al.
2007).
Potentially functional non-coding variation associated with gene
expression may also include intron compositional properties that impact gene
structure. For example, genome-wide studies have shown that highly-expressed
genes have higher GC-content (Urrutia and Hurst 2001; Comeron 2004; Kudla et
al. 2006), as well as shorter introns, which is related to transcriptional efficiency
(Hurst et al. 1996; Castillo-Davis et al. 2002; Urrutia and Hurst 2003; Comeron
2004). COL1a1 is likely highly expressed given its abundance in connective
tissue and its importance during fetal development and wound repair (e.g., Gelse
et al. 2003; Hildebrand et al. 2005; Cohen 2006), yet because studies are often
constrained to analyses of soft-tissues (e.g., Su et al. 2004; Blekhman, Oshlack et
al. 2008), little is known about COL1a1 gene expression. In this respect, in
addition to evolutionary analyses of COL1a1 coding regions, similar analyses of
introns (e.g., length and GC-content) could also reveal historical selective
pressures related to functional constraint.
Attempts have been made to predict mutations associated with bone-
related disease, the majority of which include COL1a1 because of its direct link to
skeletal phenotypes. However, these studies have focused on protein sequences,
and more specifically, have often ignored an evolutionary approach. Here, we
present the first molecular evolutionary and statistical analysis of both COL1a1
coding and non-coding sites to ask several questions: (1) Is there evidence of
differential selective pressure across protein domains, and how does this vary
across recent/historical periods of time? (2) Do amino acid sites of high and low
functional constraint over evolutionary time predict the gene locations and
severity of human DAMs? (3) Finally, are there particular aspects of the gene
structure that are evolutionarily unique, compared to even other fibrillar
collagens?
Materials and Methods
DNA Amplification and Sequencing
Our DNA sequences were collected either from public genome databases
(National Center for Biotechnology Information (NCBI) and Ensembl), or were
generated directly when necessary. For example, with an estimated molecular
divergence time from our common ancestor at ~4-6 My (e.g., Kumar et al. 2005),
the chimpanzee is our closest living primate relative and is necessary for estimates
of COL1a1 nucleotide site divergence in the human lineage. However, regions of
the chimpanzee (and to some extent the human) COL1a1 gene sequences have
assembly errors in their genome databases (which is likely a result of the
repetitive nature of the sequence). Thus, DNA sequences for the >17-kb
chromosome 17q21.33 locus (fig. 1) were generated from a human and a western
African Pan troglodytes verus chimpanzee, both sampled from the Coriell Institute
for Medical Research (Camden, NJ). Although there are several recognized
chimpanzee subspecies, P. t. verus is the most appropriate contrast with humans
because it has similar levels of nuclear diversity (Stone et al. 2002; Gilad et al.
2003; Fischer et al. 2004; Wooding et al. 2005; Verrelli et al. 2006, 2008; Claw et
al. 2010).
The overall high GC-content at COL1a1 (discussed below) often required
that our polymerase chain reaction (PCR) fragments be generated as short
segments (e.g., as small as 500 bp) under various temperature and buffer
conditions. PCR products were purified using shrimp alkaline phosphatase and
exonuclease I (US Biochemicals, Cleveland, OH) prior to DNA sequencing using
an ABI 3730 automated sequencer (Applied Biosystems, Foster City, CA).
Sequences were aligned and edited using Sequencher v. 4.5 (Gene Codes, Ann
Arbor, MI). All PCR and sequencing primers were designed from available
human sequence (accession # NT_010783.14) and are available upon request.
Gene Structure Analyses
To test hypotheses about selective pressures acting on gene structure, we first
used a gene family approach comparing COL1a1 in humans to other closely-
related fibrillar collagens. Phylogenetic comparisons of vertebrate fibrillar
collagen proteins reveal three clades with COL1a1 being one of five “clade A”
collagens (Boot-Handford and Tuckwell 2003; Aouacheria et al. 2004; Wada et
al. 2006). As such, length and GC-content data for all exons and introns were
collected from public databases for the four other clade A collagens (COL1a2,
COL2a1, COL3a1, and COL5a2). Sequence gaps and Alu elements were omitted,
the latter to avoid polymorphic insertions (e.g., a polymorphic Alu in
COL3a1; Milewicz et al. 1996). Further, estimates of intron nucleotide divergence
were calculated from sites that best reflect “neutrality.” Although we cannot say
with certainty which sites are selectively neutral, we do expect that purifying
selection acts relatively stronger on certain sites with putative function compared
to others (e.g., McDonald and Kreitman 1991; Wray et al. 2003). Thus, we
omitted first introns, which are typically enriched for transcription factor binding
sites (e.g., Bornstein et al. 1987; Majewski and Ott 2002), as well as intron 5’ and
3’ splice sites. While this first comparison examines the “uniqueness” of COL1a1
among fibrillar collagens, a second comparison was used to also gain an overall
genomic perspective. We used the Gazave et al. (2007) dataset, which includes
length, GC-content, and human-chimpanzee nucleotide divergence estimates for a
total of 51,673 introns compiled from 7,791 genes, representing the largest
evolutionary analysis of human genome introns. As above, first introns and
intron splice sites were omitted.
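As a minimal sketch of how per-intron length and GC-content could be tabulated once splice sites are excluded, consider the following. The function name, the two-base trim at each end, and the toy sequence are illustrative assumptions and do not describe the exact procedure used here.

```python
def intron_stats(seq, splice_trim=2):
    """Return (length, GC fraction) for one intron sequence.

    splice_trim bases are dropped from each end to exclude the 5' and 3'
    splice sites; the width of 2 is an assumption for illustration only.
    Gap characters and ambiguous bases are ignored.
    """
    core = seq.upper()[splice_trim:len(seq) - splice_trim]
    bases = [b for b in core if b in "ACGT"]
    if not bases:
        return 0, 0.0
    gc = sum(1 for b in bases if b in "GC")
    return len(bases), gc / len(bases)


# Toy intron: GT...AG splice sites flank an 8-bp core with 6 G/C bases.
length, gc_fraction = intron_stats("GTGCGCATGCAG")
```

The same function could be applied to every intron of a gene to build the length and GC-content distributions that are compared in the analyses that follow.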
Because we are interested in whether specific aspects of COL1a1 introns
are unusual, both independently and with respect to overall structure, analyses of
“means” alone for intron length and GC-content are not informative. Further, the
distribution of intron length in the human genome is dramatically skewed (i.e.,
lacks “normality,” Gazave et al. 2007); thus, because of the enormity of the
genome-wide sample relative to COL1a1, even non-parametric tests that compare
only means of distributions may not be statistically sensitive. Instead, we
constructed bins for the distributions of intron “length” and of GC-content
“percentage” for each clade A collagen and for the Gazave et al. (2007) dataset,
and compared these binned distributions using non-parametric row by column
(RxC) chi-squared tests (e.g., supplementary tables 1a-c, Appendix A). Among
clade A collagens, comparisons of binned distributions were also made for exon
length and GC-content. These analyses that compare distributions of site
frequency classes allow for fine-scale statistical tests and increased power,
especially in cases of unequal sample size (e.g., Akashi and Schaeffer 1997;
Verrelli and Tishkoff 2004; Hernandez et al. 2007). Statistical significance levels
for all tests were corrected for multiple comparisons using a standard Bonferroni
method.
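The binned comparison described above can be sketched as follows, using only the Python standard library. The bin cut points, the intron lengths, and the number of tests are made-up values for illustration; they are not the bins or data of the supplementary tables, and with expected counts this small the chi-squared approximation would be poor in practice.

```python
import math
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin; a value equal to a cut point goes to the higher bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the series for the regularized
    lower incomplete gamma function (loses precision below ~1e-15)."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half / (a + n)
        total += term
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def rxc_chi2(table):
    """Pearson chi-squared test of homogeneity on an R x C table of counts."""
    keep = [j for j, col in enumerate(zip(*table)) if sum(col) > 0]
    table = [[row[j] for j in keep] for row in table]  # drop empty bins
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(keep) - 1)
    return stat, df, chi2_sf(stat, df)


edges = [100, 200, 400]                           # hypothetical cut points (bp)
gene_a = [85, 90, 95, 110, 120, 150, 180, 250]    # made-up intron lengths
gene_b = [150, 300, 450, 500, 800, 900, 1200, 2500]
table = [bin_counts(gene_a, edges), bin_counts(gene_b, edges)]
stat, df, p = rxc_chi2(table)

n_tests = 4                                       # hypothetical number of comparisons
significant = p < 0.05 / n_tests                  # Bonferroni-adjusted threshold
```

Comparing whole binned distributions in this way is sensitive to differences in shape, such as an excess of very short introns, that a comparison of central tendency alone could miss.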
We also carried out comparative analyses across taxa to estimate how
aspects of the gene structure of COL1a1 may have changed over evolutionary
time. To avoid errors in alignment and problems associated with incomplete
sequences from available databases, we only used species for which complete
COL1a1 intron and exon sequences were publicly available. That is, species with
gaps in their genomic sequence, regardless of the length of the gap, were
excluded. As a result, the only species for which complete, high-quality COL1a1
sequences were publicly available are mouse (Mus musculus), dog (Canis
familiaris), cow (Bos taurus), western-clawed frog (Xenopus tropicalis), and
zebrafish (Danio rerio), which were added to our human and chimpanzee
sequences. This “7-species dataset” reflects a broad sampling across ~450 My of
vertebrate evolution. Unlike our previous comparisons within the human genome,
these analyses involve the same number of exons and introns. Thus, comparisons
across species were conducted for COL1a1 exon and intron length and GC-
content using Mann-Whitney U tests. To test hypotheses regarding the
conservation of independent introns across evolutionary time, we also used non-
parametric RxC chi-squared tests to compare orthologous introns and exons
among these species (e.g., supplementary tables 2a-e, Appendix A).
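For readers unfamiliar with the Mann-Whitney U test used for the cross-species comparisons, a minimal standard-library sketch is given below. It uses the normal approximation with a tie correction and no continuity correction, which is one common variant and an assumption here; the two samples are made-up values, not measured intron lengths.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0           # mean of ranks i+1 .. j
        t = j - i                              # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:                               # all observations identical
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


species_a = [1, 2, 3, 4]   # hypothetical values for one species
species_b = [5, 6, 7, 8]   # hypothetical values for another species
u_stat, p_value = mann_whitney_u(species_a, species_b)
```

For samples as small as this toy example an exact test would be preferable; the normal approximation is shown only because it is compact.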
Nucleotide Site Divergence
To estimate the impacts of long- and short-term selective pressures on the
COL1a1 protein, we first constructed a “vertebrate” dataset reflecting ~450 My
from available complete cDNA sequences, which includes the 7-species dataset in
addition to rat (Rattus norvegicus), donkey (Equus asinus), African clawed frog
(Xenopus laevis), rainbow trout (Oncorhynchus mykiss), and goldfish (Carassius
auratus). Amino acid sequences (which were the only available complete
sequence) from chicken (Gallus gallus) and Japanese firebelly newt (Cynops
pyrrhogaster) were also collected. Second, we also constructed a “primate”
dataset using our human and chimpanzee sequences in combination with database
partial-cDNA sequences (see supplementary table 3, Appendix A) from gorilla
(Gorilla gorilla), orangutan (Pongo abelii), macaque (Macaca mulatta), and
marmoset (Callithrix jacchus), reflecting ~45 My in total. Sequences within the
two datasets were aligned with a Clustal analysis in MEGA v.4 (Kumar et al.
2008), with the removal of alignment gaps.
With these two datasets, we conducted several analyses of the ratio of
divergence at nonsynonymous and synonymous sites (dN/dS; e.g., Nei and
Gojobori 1986). To identify statistically significant differences, we used the
GABranch algorithm of Pond and Frost (2005a) available through the
Datamonkey on-line server (Pond and Frost 2005b). This analysis enables more
flexible testing of hypotheses in identifying different dN/dS evolutionary
“classes” of constraint using a maximum-likelihood approach. For both our
general vertebrate and primate datasets, the relationships among species were
described with a maximum parsimony tree constructed in MEGA v.4 (Kumar et
al. 2008) and the best-fit nucleotide substitution model for each dataset was
determined by the selection procedure of Pond and Frost (2005c).
Other than the triple-helix domain, COL1a1 contains N- and C-terminal
non-collagenous domains, which are found in all fibrillar collagens and
hypothesized to have different functional constraints (e.g., Exposito et al. 2002;
Boot-Handford and Tuckwell 2003; Aouacheria et al. 2004). Thus, we also
applied the GABranch algorithm (Pond and Frost 2005a) to each COL1a1 domain
separately, and used the HyPhy program (Pond et al. 2005) to assess statistical
significance. Specifically, likelihoods are generated from a substitution rate model
for dN/dS evolving independently across all lineages in each domain, and then
compared to a second set of likelihoods (using a standard likelihood ratio test
employed by HyPhy) generated from models where each domain is constrained to
fit another. For example, is the rate for the triple-helix different from that
observed in the N- or C-terminal across lineages?
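The comparison of independent and constrained models described above follows the general form of a likelihood ratio test, sketched here in our own notation. The symbols are illustrative assumptions and are not taken from the HyPhy documentation: L_free is the maximized likelihood when each domain has its own dN/dS rates, L_constr is the maximized likelihood when one domain is constrained to the rates of another, and k is the number of parameters fixed by the constraint.

```latex
\Lambda = 2\left(\ln L_{\mathrm{free}} - \ln L_{\mathrm{constr}}\right),
\qquad
\Lambda \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{k}
```

A large value of the statistic relative to the chi-squared distribution with k degrees of freedom indicates that constraining the two domains to share rates fits significantly worse than letting them differ.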
Unlike the coding region analyses above, non-coding regions are
sufficiently diverged to preclude alignment across taxa, even for closely-related
non-human primates in our dataset. Thus, we estimated historical selective
constraint on the COL1a1 intron nucleotide sequence in humans with an analysis
of human-chimpanzee divergence. As above, we wish to examine how
divergence varies across introns (which is especially relevant to our analysis of
intron lengths), and not compare simple “means;” thus, we used similar RxC chi-
squared tests as above. Specifically, after removing alignment gaps and splice
sites, a distribution of divergence for COL1a1 introns was constructed as bins of
“percentage” (number of differences per nucleotide) and compared to similar bins
of human-chimpanzee divergence for introns in the other four clade A collagen
genes, as well as in the Gazave et al. (2007) dataset.
DAMs at COL1a1
Finally, we examined the association of COL1a1 amino acid sites of high and low
evolutionary conservation with where DAMs are located. Following the method
of Subramanian and Kumar (2006), we estimated the evolutionary substitution
rate at each amino acid site from our vertebrate dataset (14 species listed above
for which the entire amino acid sequence is available), using a maximum-
likelihood approach implemented in PAML v.4.2 (Yang 2007). Specifically, the
distribution of evolutionary rates among sites was categorized into an 8-class
discrete gamma model (with an estimated gamma shape parameter starting at 0.5),
and unequal substitution rates among amino acids were accounted for using the
model of Jones et al. (1992). A total of 293 unique missense and nonsense
COL1a1 DAMs that could be plotted within our vertebrate alignment were
collected from the Database of Osteogenesis Imperfecta and Type III Collagen
Mutations (Dalgleish 1997). We then used a chi-squared test to contrast the
distribution of these observed mutations with the distribution of expected
mutations in the 8-class model derived from the Subramanian and Kumar (2006)
analysis.
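The contrast between observed and expected DAM distributions across rate classes can be sketched as a chi-squared goodness-of-fit calculation. This sketch assumes that the expected number of mutations in each class is proportional to the number of alignment sites in that class; that assumption, the function name, and all counts below are made up for illustration and are not the data analyzed here.

```python
def rate_class_gof(observed_dams, sites_per_class):
    """Chi-squared goodness-of-fit statistic and degrees of freedom.

    Expected counts are taken to be proportional to the number of sites in
    each evolutionary rate class (an assumption of this sketch).
    """
    total_dams = sum(observed_dams)
    total_sites = sum(sites_per_class)
    stat = 0.0
    for obs, n_sites in zip(observed_dams, sites_per_class):
        expected = total_dams * n_sites / total_sites
        stat += (obs - expected) ** 2 / expected
    return stat, len(observed_dams) - 1


# Eight rate classes ordered from slowest (most conserved) to fastest.
sites = [100] * 8                          # hypothetical sites per class
dams = [30, 20, 15, 10, 10, 5, 5, 5]       # hypothetical DAM counts per class
stat, df = rate_class_gof(dams, sites)

CRITICAL_0_05_DF7 = 14.067                 # chi-squared critical value, 7 d.f.
excess_in_conserved = stat > CRITICAL_0_05_DF7
```

In this toy example the mutations are concentrated in the slowest-evolving classes, so the statistic far exceeds the critical value.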
As the majority of COL1a1 mutations are associated with OI, we also
examined this distribution according to the clinical severity of disease symptoms
(Sillence et al. 1979; Basel and Steiner 2009). Type I OI typically presents with
mild bone weakening similar to that caused by osteoporosis; OI II, the most
severe form, is typically lethal in the pre- or post-natal stage; OI III, the most
severe of survivable OI types, is characterized by severe skeletal deformities and
fragility fractures; and those that cannot be placed into types I-III are typically
classified as OI IV. With this classification, we used four categories: (1) OI I and
osteoporosis-associated mutations; (2) OI IV and related mutations (e.g., I/IV and
III/IV); (3) OI III mutations; and (4) lethal OI II and II/III mutations, which total
249 mutations that meet these criteria. We repeated the same chi-squared analyses
above using the expected distributions from the Subramanian and Kumar (2006)
analysis, but with each of these 4 severity categories as the observed distributions.
Finally, to determine if there is significant spatial clustering of these severity
categories across the COL1a1 amino acid sequence, we used the maximum-
likelihood approach of Zhang and Townsend (2009) as implemented in the
MACML program.
Results
Exon Pattern Analyses
COL1a1 exons, while seemingly short in length, are not unusual for clade A
fibrillar collagens (table 1; supplementary tables 1a and c, Appendix A). Exon
length does not differ between human and chimpanzee, and although there are
differences within our 7-species dataset (which are primarily restricted to the N-
terminal domain), they are not statistically unusual (supplementary tables 2a and
c-e, Appendix A). At 66%, COL1a1 exon GC-content is considerably higher than
that seen at human genes in general (International Human Genome Sequencing
Consortium 2001), which is not surprising given the abundance of GC-rich
glycine and proline codons typically found in collagens. However, COL1a1 exon
GC-content is still significantly greater than the other clade A fibrillar collagens
(except COL2a1), even when examining only glycine- and proline-rich triple-
helix coding regions (table 1; supplementary tables 1b and c, Appendix A).
Further, although COL1a1 exon GC-content overall differs for comparisons
between mammals and non-mammals, no differences are seen in comparisons of
orthologous exons (supplementary tables 2b-e, Appendix A).
Protein Evolution and DAMs
COL1a1 human-chimpanzee synonymous divergence is not unusual compared to
other genes (e.g., Chimpanzee Sequencing Consortium 2005), while
nonsynonymous divergence at COL1a1 (as well as at all clade A collagens), is
virtually absent (table 2). Low COL1a1 amino acid divergence is also observed
within our primate sample (dN/dS<0.15; supplementary table 3, Appendix A),
and although the GABranch analysis found evidence for two dN/dS rate classes
overall, the small number of fixations precluded analyses among domains. On the
other hand, the analysis of our vertebrate sample found evidence for six dN/dS
rate classes in the triple-helix domain (dN/dS<0.36) and three rate classes for each
of the C- and N-terminal domains (dN/dS<0.17 and <0.74, respectively;
supplementary fig. 1, Appendix A). With the HyPhy analysis, we find little
evidence for rate differences between the C- and N-terminal domains for
comparisons between primate and other mammalian lineages, whereas the rate in
the triple-helix is significantly less than that of the N-terminal, but significantly
greater than that of the C-terminal domain (likelihood ratio test, P<0.009).
Using the approach of Subramanian and Kumar (2006), our site analysis
finds spatial variation across COL1a1, with sites in the N-terminal domain having
a significantly higher evolutionary rate (fig. 2; supplementary tables 4a and b,
Appendix A). In addition, DAMs occur more often at highly conserved amino
acid positions, a relationship that is consistent for all of the phenotypic severity
categories associated with OI disease except, interestingly, the least severe
category 1 (fig. 2; supplementary tables 4a-c, Appendix A). In fact, when
focusing on the extremes of lethal (category 4) and more benign, but osteoporotic-
like (category 1) mutations, the former occur at much more highly conserved
amino acid positions a significantly greater proportion of the time (χ²=25.2,
P=0.0001). Finally, our MACML analysis finds clustering across COL1a1 in
several regions, with statistically significant evidence involving the two extreme
DAM categories, regardless of the model used from Zhang and Townsend (2009).
A significant cluster for category 1 includes residues 190-366 (P<0.05), which is
the N-terminal end of the triple-helix domain, whereas a significant cluster for
category 4 includes residues 352-1186 (P<0.05), which is centered on the triple-
helix domain.
Intron Pattern Analyses
COL1a1 introns are significantly shorter than those of the other clade A collagens
with the exception of COL2a1 (table 1, fig. 3). In fact, the intron structure at
COL1a1 - many introns, but all of them relatively short in length - is statistically
very unusual (χ²=150, P<10⁻⁶) when compared to human genes in general
(Gazave et al. 2007). We do find evidence for significant variation in length of
orthologous introns between taxa within our 7-species dataset; however, as
overall intron length has not changed dramatically (supplementary tables 5a and
c-e, Appendix A), COL1a1 intron content appears to be similar even across
vertebrates separated by ~450 My.
Intron GC-content at COL1a1 is significantly greater than that at the other
clade A collagens except COL2a1 (supplementary tables 6b and d, Appendix A).
However, we also found that GC-content was negatively correlated with intron
length at COL1a1 (supplementary table 6e, Appendix A), a pattern that has been
generally noted from genome-wide studies (e.g., International Human Genome
Sequencing Consortium 2001; Gazave et al. 2007; Pozzoli et al. 2008).
Nonetheless, even when the dataset for the 5 genes was standardized to include
introns only 80-500 bp in length to reflect intron sizes at COL1a1, the results did
not change. Unlike COL1a1, no other collagen gene shows a significant
correlation between intron length and GC-content; thus, instead of “percentage”
GC-content, we examined linear regressions of the number of G/C nucleotides in
each intron, which standardizes GC-content for intron length. F-test analyses
comparing these regressions confirm that intron GC-content at COL1a1 is
significantly higher than in all other clade A collagens except COL2a1 (supplementary
table 6f, Appendix A). In fact, when compared to the Gazave et al. (2007) dataset,
and genes with introns of only 80-500 bp, COL1a1 still has significantly greater
GC-content (χ²=35, P=0.0009). Unlike intron length, intron GC-content differs
significantly in our 7-species dataset at COL1a1, regardless of whether overall
distributions or pairwise comparisons among equivalent introns are compared
(supplementary tables 5b-e, Appendix A). Overall intron GC-content is
significantly higher in mammals vs. non-mammals, except for mouse, which has
significantly lower intron GC-content than even other mammals.
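The regression of G/C nucleotide count on intron length mentioned above can be sketched with an ordinary least-squares fit. Only the fit is shown; the F-test comparison between genes is not reproduced, and the data points are made up for illustration.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


intron_length = [100, 200, 300, 400]   # hypothetical intron lengths (bp)
gc_count = [60, 115, 185, 240]         # hypothetical G/C nucleotides per intron
slope, intercept = ols_fit(intron_length, gc_count)
```

Here the slope estimates the number of G/C nucleotides gained per additional base pair of intron, which allows genes to be compared without relying on a percentage that is itself correlated with length.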
Analyses of genome-wide introns indicate that length and GC-content are
also positively correlated with human-chimpanzee divergence (Gazave et al.
2007). While this seems to be the case for intron length and divergence at
COL1a1 and the other clade A collagens, the relationship for GC-content and
divergence is only weakly correlated for the collagen genes (supplementary table
6e, Appendix A). Thus, as in analyses above, we standardized datasets to include
introns 80-500 bp for divergence comparisons. Intron divergence at COL1a1 is
low, but not statistically unusual compared to the other collagens (table 2;
supplementary tables 6c and d, Appendix A) or human genes in general (χ²=8.0,
P=0.6). However, it is possible that our analysis of COL1a1 divergence is an
underestimate as it does not take into consideration the significantly high, gene-
specific GC-content (e.g., Hernandez et al. 2007). To test this latter hypothesis,
we used divergence at COL1a1 non-CG dinucleotides (0.43%) as a conservative,
“background” substitution rate (i.e., instead of the higher genome-wide estimate
of ~1%; Chimpanzee Sequencing Consortium 2005). This estimate was adjusted
for the ~10-fold increase in mutation rate expected for CG-dinucleotides in
general (Subramanian and Kumar 2003), and compared to the rate “observed” at
CG-dinucleotide sites (i.e., after using orangutan as an outgroup to correct for
recent changes in the human and chimpanzee lineages). When these two
estimates were applied to the total number of intron sites at COL1a1 to simulate
sampling variance, interestingly, observed human-chimpanzee divergence appears
to be reduced (χ²=6.14, P=0.013).
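The arithmetic behind the adjusted expectation can be sketched as follows. The background rate and the fold increase are the values stated in the text; treating the reported test as having one degree of freedom is our assumption for this sketch, made because it reproduces the reported P value.

```python
import math

NON_CG_DIVERGENCE = 0.0043   # 0.43% divergence at non-CG dinucleotides (from the text)
CG_FOLD_INCREASE = 10        # ~10-fold higher mutation rate at CG dinucleotides (from the text)


def expected_cg_divergence(background=NON_CG_DIVERGENCE, fold=CG_FOLD_INCREASE):
    """Expected divergence at CG-dinucleotide sites given the background rate."""
    return background * fold


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


expected = expected_cg_divergence()   # 0.043, i.e. 4.3% per CG site
p_value = chi2_sf_1df(6.14)           # assumes 1 d.f.; rounds to 0.013
```

Under these assumptions, CG-dinucleotide sites would be expected to diverge at about 4.3%, and an observed rate below this expectation is what the reported test detects.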
Discussion
Our evolutionary analyses of COL1a1 spanning the past ~450 My are consistent
with it being one of the most highly conserved proteins in vertebrates. While this may be
the case for the protein overall, we find evidence for spatial/temporal variation in
selective pressures across domains. In fact, the C-terminal domain that is
responsible for the recognition and assembly of type I collagen subunits (e.g.,
Exposito et al. 2002; Boot-Handford and Tuckwell 2003; Aouacheria et al. 2004),
is actually the most highly conserved domain. Like the N-terminal, the C-terminal
peptide is cleaved after the subunit’s triple-helix domain has been assembled into
type I collagen; thus, interestingly, the protein region that exhibits the strongest
signature of purifying selection is also one that is not part of the mature protein.
On the other hand, the relatively weaker constraint on the COL1a1 triple-helix
domain implies more “flexibility” in this region over evolutionary time, which
may contribute to observed structural and mechanical variation in bone, including
mineral content and organization of collagen fibers, among mammals, birds, and
reptiles (e.g., Currey 1987; Wang et al. 1998; Rensberger and Watabe 2000).
It is possible that the evolutionary differences seen across domains are the
result of directional selection pressures over time. Others have suggested this
explanation (e.g., Morgan et al. 2010); however, the rate variation observed here
is not what would be typically interpreted as a signature of “positive selection.”
Specifically, the estimates of dN/dS over both recent and deeper evolutionary
lineages are more consistent with purifying selection, and thus the differences
among domains in these analyses, while statistically significant, are more
reflective of variation in functional constraint and not adaptive evolution.
Furthermore, our observations of very little amino acid variation among primates,
including no fixation between human and chimpanzee, together with the observed
slight differences in the properties of bone documented among primates (e.g.,
Wang et al. 1998; Kikuchi et al. 2003), are consistent with strong purifying
selection that simply varies from ancient to more recent evolutionary time.
The abundance of DAMs at COL1a1 is also consistent with high
functional constraint; however, the fact that so many exist, and yet are not
associated with lethality (e.g., category 1 mutations) may suggest that purifying
selection has recently become weak in the human lineage. This hypothesis has
been championed as a general explanation for human disease prevalence, as high
amino acid polymorphism in humans (albeit rare in frequency), in spite of the
little amino acid divergence with chimpanzees, is a genome-wide pattern (e.g.,
Bustamante et al. 2005; Boyko et al. 2008; Lohmueller et al. 2008). However, the
fact that purifying selection, and the absence of any positive selection, dominates
the evolutionary history at COL1a1 is somewhat novel compared to other genes
strongly associated with human disease (e.g., Blekhman, Man et al. 2008). Thus,
instead of invoking weak purifying selection in the human lineage, a better
explanation for the pattern of DAMs seen today is that selective constraints
simply differ across protein domains, which is also consistent with the overall
vertebrate pattern at COL1a1.
In fact, our comparisons of the location of DAMs with the estimates of
evolutionary constraints at their respective sites help further support this
hypothesis. For example, although previous studies have suggested that the
probability of lethality for a mutation increases as one moves away from the N-
terminus (e.g., Byers et al. 1991; Marini et al. 2007; Rauch et al. 2010), we find
that DAMs not only cluster in specific protein sites of high evolutionary
constraint, but that they even cluster with respect to classes of lesser severity,
including more common, osteoporotic-like conditions. Interestingly, the N-
terminal domain, where only four DAMs have been observed (Dalgleish 1997), is
also the region here that shows the least evolutionary constraint over recent as
well as deep evolutionary time. On the other hand, ~7% of COL1a1 positions,
including those in the triple-helix, lack known DAMs and are also the most
rapidly evolving sites. Thus, while factors such as amino acid size,
thermostability, and domain are important to bone-related disease prediction
models (e.g., Persikov et al. 2005; Marini et al. 2007; Bodian et al. 2008, 2009),
an evolutionary site model may detect bone-related phenotypes of lesser, but more
common, severity with finer resolution, particularly for non-glycine mutations,
which have been largely ignored by previous prediction models.
Our final analyses of the evolutionary history of the overall gene structure
also find unusual patterns for non-coding DNA. Of particular interest is that
COL1a1 introns have remained relatively short over the past ~450 My despite
divergence in overall genome size (e.g., haploid human genome is ~2-fold larger
than that of zebrafish; Hedges and Kumar 2002). Nonetheless, there are several
reasons why these patterns are not expected a priori. First, because of the
repetitive nature of vertebrate fibrillar collagen coding sequence, which has been
proposed to have originated through a series of duplication events from an
ancestral collagen with a single exon of ~54 bp (Bernard et al. 1983; Valkkila et
al. 2001; Exposito et al. 2002; Boot-Handford and Tuckwell 2003; Aouacheria et
al. 2004; Wada et al. 2006), there is extensive homology among gene regions.
This homology presents the potential for increased unequal crossing-over events
that result in exon duplications and deletions in the triple-helix domain, which,
although important in the history of collagen evolution, are highly deleterious today
as they typically lead to severe disease (e.g., Barsh et al. 1985; Cohn et al. 1993;
Bodian et al. 2009). As such, separating the short COL1a1 exons to reduce
“interexon homology” (e.g., Cohn et al. 1993) would actually predict longer
introns, yet this is not observed here.
A second consideration involves the greater mutation rate associated with
high GC-content in general (e.g., Subramanian and Kumar 2003), and thus, the
expectation that mutational input at COL1a1, including introns, may be relatively
high. However, our conservative estimates suggest that intron site divergence at
COL1a1 may actually be reduced. We may expect that purifying selection on
deleterious amino acid variants coincidentally reduces neutral, linked
polymorphism in introns (as a result of background selection, e.g., Charlesworth
et al. 1995). However, under this scenario, we do not expect intron site
divergence, if neutral, to be reduced as gene regions become effectively unlinked
and evolutionarily independent over longer periods of time (e.g., Hellmann et al.
2003). For example, even at COL1a1, synonymous sites show typical levels of
neutral divergence, yet nonsynonymous sites are highly conserved. Thus, the
pattern found for intron site divergence at COL1a1 may be consistent with
purifying selection, which suggests that intron site variation has functional
implications as well.
As previously noted, transcriptional efficiency is correlated with GC-
content and intron length across the genome (Hurst et al. 1996; Castillo-Davis et
al. 2002; Urrutia and Hurst 2003; Vinogradov 2003; Comeron 2004; Kudla et al.
2006; Pozzoli et al. 2008); thus, one explanation for the significantly short, GC-
rich introns at COL1a1 is that high gene expression is maintained by stabilizing
selection. As type I collagen is the most abundant protein in mammals and is
consistently required across life stages, from fetal development to wound repair
(e.g., Gelse et al. 2003; Hildebrand et al. 2005; Cohen 2006), strong selection for
COL1a1 transcriptional efficiency would appear to be necessary. In addition, for a
gene the size of COL1a1, the observation of so few transposable elements (4,
including 1 Alu) is unusual given their frequency in the genome, yet interestingly,
this pattern is also consistent with human genes with high expression
(International Human Genome Sequencing Consortium 2001; Hackenberg et al.
2005; Pozzoli et al. 2008). COL2a1 may have similarly short, GC-rich introns
with few transposable elements (4, again including only 1 Alu) as COL1a1
because of similar constraints on gene expression. This may be the case as type II
collagen, encoded by COL2a1, is the main structural protein of cartilage and is,
therefore, also functionally important in bone development (Gelse et al. 2003;
Cohen 2006). Finally, while certain introns (e.g., first introns) may exhibit lower
evolutionary divergence as they are known to be enriched for transcription factor
binding domains, reduced divergence across COL1a1 introns overall may also be
related to intron length. For example, transcription enhancers often accumulate
within the ~150 bp of exon-intron boundaries (e.g., Majewski and Ott 2002), and
thus, given the relatively short intron lengths at COL1a1, purifying selection may
be expected to be higher if the number of intron nucleotide sites that are
selectively neutral is also relatively reduced.
Conclusion
Although previous analyses of the COL1a1 protein have invoked positive
selection to explain the molecular rate heterogeneity observed across both
lineages and time, our dataset, which spans ~450 My of vertebrate evolution,
indicates that these patterns are best explained by variation in purifying
selection. Furthermore, although low evolutionary COL1a1 site rates are
consistent with the abundance of DAMs found in the human population today, our
unique approach may predict not only the location of these mutations, but also
their degree of severity. In fact, these patterns imply that COL1a1-related diseases
are likely not isolated to humans, or primates in general, which suggests that even
distantly-related vertebrates would be suitable models for research on common
bone-related diseases such as osteoporosis. Finally, the unusual patterns seen for
the COL1a1 intron structure may be a signature of selective constraint for high
gene expression that has interesting implications. Specifically, given the inferred
history of purifying selection on them, these non-coding regions should also be
considered when predicting which variants at COL1a1 may explain potentially
deleterious, or even adaptive, bone-related phenotypes today.
Acknowledgments
The authors thank G. Perry and C. Hepp for database and PAML assistance,
respectively, and S. Kumar for comments on early versions of the analyses. This
work was supported by a National Science Foundation grant DEB-0909637 to
B.C.V. and D.A.S.
Table 1
Human Clade A Collagen Gene Exon and Intron Characteristics
| Gene Characteristic | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| Gene size (bp) | 16,012 | 35,362 | 30,915 | 37,285 | 145,535 |
| No. of Exons | 51 | 52 | 54 | 51 | 54 |
| Exon length (bp) | 86 ± 52 | 79 ± 49 | 83 ± 53 | 86 ± 53 | 83 ± 54 |
| Intron length (bp) | 193 ± 136 | 563 ± 502 | 416 ± 442 | 454 ± 316 | 1,380 ± 1,297 |
| Exon GC-content (%) | 66 ± 5 | 59 ± 8 | 64 ± 6 | 59 ± 7 | 57 ± 7 |
| Intron GC-content (%) | 59 ± 7 | 35 ± 5 | 55 ± 6 | 30 ± 4 | 32 ± 4 |

Note: Gene size is the distance from the start to the stop codon. Length and GC-content values denote means and standard deviations, excluding first introns and splice sites. See supplementary tables 1a, b, and 6a, b (Appendix A) for more information.
Table 2
Clade A Collagen Gene Human-Chimpanzee Divergence Estimates
| Sites | COL1a1 n (b) | COL1a1 d (c) | COL1a2 n | COL1a2 d | COL2a1 n | COL2a1 d | COL3a1 n | COL3a1 d | COL5a2 n | COL5a2 d |
|---|---|---|---|---|---|---|---|---|---|---|
| Introns (a) | 9,463 | 0.69 | 28,160 | 0.97 | 21,619 | 1.09 | 22,242 | 0.94 | 71,769 | 0.74 |
| Synonymous | 1,212 | 1.49 | 1,140 | 0.79 | 1,220 | 1.31 | 1,195 | 0.50 | 1,209 | 1.16 |
| Nonsynonymous | 3,180 | 0 | 2,958 | 0.10 | 3,240 | 0.03 | 3,203 | 0.03 | 3,288 | 0.06 |

(a) Excludes first introns, see table 1 and supplementary table 6c (Appendix A) for more information.
(b) Number of nucleotides
(c) Divergence as number of differences per nucleotide (%)
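To relate Table 2 to the argument that purifying selection dominates, the following sketch computes a crude nonsynonymous-to-synonymous divergence ratio for each gene from the tabulated percentages. This is an uncorrected ratio for illustration only, not the model-based dN/dS estimates discussed in the text, and the variable and function names are hypothetical.

```python
# Percent divergence per site, copied from Table 2.
table2 = {
    "COL1a1": {"synonymous": 1.49, "nonsynonymous": 0.0},
    "COL1a2": {"synonymous": 0.79, "nonsynonymous": 0.10},
    "COL2a1": {"synonymous": 1.31, "nonsynonymous": 0.03},
    "COL3a1": {"synonymous": 0.50, "nonsynonymous": 0.03},
    "COL5a2": {"synonymous": 1.16, "nonsynonymous": 0.06},
}

def crude_ratio(row):
    """Nonsynonymous / synonymous divergence (uncorrected, illustrative only)."""
    return row["nonsynonymous"] / row["synonymous"]

ratios = {gene: crude_ratio(row) for gene, row in table2.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: {ratio:.3f}")
```

Every ratio falls well below 1, the pattern expected when nonsynonymous changes are removed by purifying selection, with COL1a1 at zero because no amino acid differences were observed between human and chimpanzee.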
Fig. 1. The COL1a1 gene locus, with all coding and non-coding regions to scale.
DNA sequenced region in human and chimpanzee includes 1,223 bp of the
promoter (striped box), 263 bp of the 5’ and 3’ mRNA untranslated regions (white
boxes), the 4,392 bp of coding sequence for all 51 exons (black boxes), with non-
coding regions in between. The “*” denotes a 496-bp gap in the collected
sequence of intron 25 due to the presence of an Alu element.
Fig. 2. Locations of DAMs are shown as vertical markers along the COL1a1
amino acid sequence for (A) category 4 mutations, which include lethal OI II and
II/III groups, and (B) category 1 mutations, which include OI I and osteoporosis-
associated groups. Long and short vertical markers reflect non-glycine and
glycine substitutions, respectively, across the three protein domains. The relative
estimated evolutionary rates along the sequence are calculated from 14 diverse
vertebrate taxa using the Subramanian and Kumar (2006) analysis, with a
trendline summarizing sliding-window averages, each across 10
amino acid residues (position number based on human sequence). See Materials
and Methods and supplementary table 4a (Appendix A) for more information.
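The trendline described in this caption can be sketched as a running mean over per-site rates. The example below is a minimal illustration that assumes the window advances one residue at a time (the caption does not give the step size) and uses made-up rate values; `sliding_window_means` is a hypothetical helper, not part of the original analysis.

```python
def sliding_window_means(rates, window=10):
    """Mean of each run of `window` consecutive values, stepping one residue at a time."""
    if window <= 0 or window > len(rates):
        return []
    total = sum(rates[:window])
    means = [total / window]
    for i in range(window, len(rates)):
        # Slide the window: add the incoming value, drop the outgoing one.
        total += rates[i] - rates[i - window]
        means.append(total / window)
    return means

# Toy per-site relative rates (invented values, not data from the study).
toy_rates = [0.0, 0.2, 0.1, 0.0, 0.4, 0.3, 0.0, 0.1, 0.2, 0.7, 1.0, 0.9]
trend = sliding_window_means(toy_rates, window=10)
print(trend)
```

A sequence of L residues yields L - 9 window averages under this scheme, so the trendline is slightly shorter than the protein itself.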
Fig. 3. Frequency distributions of intron length for human clade A collagen genes,
excluding first introns. The X-axis reflects the intron length bins (all scales are the
same), and the Y-axis shows the number of introns within these bins. See
supplementary tables 6a and d (Appendix A) for more information.
CHAPTER 3: HAPLOTYPE STRUCTURE AND AMINO ACID
POLYMORPHISM AT HUMAN COL1a1
Abstract
Bone strength and the incidence and severity of related skeletal disorders like
osteoporosis vary significantly among human populations of different ethnic
origin due in part to underlying genetic differentiation, such as at the bone
structural protein gene, collagen type I alpha 1 (COL1a1). Previous research has
shown that, not only has the COL1a1 protein been highly conserved over deep
vertebrate evolutionary time, but that exon-intron structure has been as well,
suggesting that noncoding regions represent functionally important domains for
examining bone-related phenotypes. Here, we have collected DNA sequence
variation from both coding and noncoding regions of the >17-kb COL1a1 locus in
192 chromosomes from 10 ethnically and geographically diverse natural human
population samples to determine how recent and ancient evolutionary pressures
have differed and what this predicts about bone-related variation in populations
today. Surprisingly, we find population diversity for amino acid polymorphism to
be higher than that predicted from clinical studies, significant geographic
variation for an unusual haplotype block structure that includes the 5’ upstream
region of noncoding sequence, and an ancient origin for the functionally relevant
and well-studied Sp1 binding site polymorphism. While previous studies have
long focused on amino acid variation at COL1a1 as a source of deleterious
function, our evolutionary approach has led us to conclude otherwise; specifically,
protein variation is not only more abundant and older than believed, but
noncoding regions may also greatly contribute to existing variation in the
expression of bone mineral density across diverse human populations.
Introduction
Bone strength, measured as bone mineral density (BMD), varies significantly
among populations with individuals of African ancestry tending to have greater
bone strength and better overall bone quality than other populations (Looker et al.
1998; Bachrach et al. 1999; Barrett-Connor et al. 2005; Baxter-Jones et al. 2010).
Similar trends are also seen for disorders related to skeletal strength including
osteoporosis, which is generally at its highest frequency among western
Europeans (e.g., Lauderdale et al. 1997; Looker et al. 1997; Melton 1997; Barrett-
Connor et al. 2005). Although this variation in bone strength and related disease
susceptibility is due in part to environmental differences among populations (e.g.,
dietary intake of calcium and vitamin D; Matkovic et al. 1990; Lau et al. 2005;
Adami et al. 2009; Musumeci et al. 2009), the majority of this phenotypic
variation is also genetically related given that BMD is estimated to be as high as
~80% heritable (e.g., Gueguen et al. 1995; Prentice 2001; Brown et al. 2005;
Videman et al. 2007). As bone-related disorders like osteoporosis alone impact
>200 million worldwide (Reginster and Burlet 2006), it is clear that models
leading to the characterization of potential sources of this genetic variation would
be highly valuable, not only for screening genotypes, but also for the development
of effective drug treatments (e.g., Qureshi et al. 2002).
Among the candidate genes most commonly associated with variation in
bone strength and related disease susceptibility across populations is COL1a1
(e.g., Garcia-Giralt et al. 2002; Stewart et al. 2006; Jiang et al. 2007; Ioannidis et
al. 2007; Kaufman et al. 2008), which encodes the primary subunit of the main
structural protein in bone, type I collagen. Over 600 skeletal and connective tissue
disease-associated mutations (DAMs) have been identified within this single
gene, the majority of which are linked to osteoporosis, osteogenesis imperfecta
types I-IV, and Ehlers-Danlos Syndrome types I and VIIA (Dalgleish 1997;
Marini et al. 2007). One consideration is that estimates of amino acid
polymorphism abundance and deleterious function are biased as they come from
clinical genotype screens based on bone-related disorders. In fact, our previous
comparisons of estimated single amino acid evolutionary rates across a diverse
sample of taxa with the locations of DAMs in humans developed a model that
shows a history characterized by purifying selection, but also shows clustering of
variation in selective constraint across the protein sequence (Stover and Verrelli
2010). Thus, while it is clear that amino acid polymorphism can result in
deleterious phenotypes, we may predict that not all amino acid variation in natural
populations has severe or even detectable impacts on protein function.
Another consideration is the extent to which the focus of previous studies
on COL1a1 exons has also biased our perception of where functional variation
accumulates. For example, although one study has measured COL1a1 polymorphism
(Chan et al. 2008), it sampled only admixed-American populations and focused
on coding regions; thus, our understanding of other gene regions and of the natural
global human population is deficient at best. In fact, in the first evolutionary
analysis of coding and noncoding COL1a1 regions representing ~450 My of
vertebrate divergence, we previously showed that intron structure and content
have also been historically conserved (Stover and Verrelli 2010), likely due to the
need to maintain high transcriptional efficiency of this gene given its importance
in fetal development and wound repair (e.g., Gelse et al. 2003; Hildebrand et al.
2005; Cohen 2006). As such, introns and other noncoding regions may actually
represent better candidates for functional variation related to BMD expression if
amino acid variation is sufficiently rare in frequency. Nonetheless, while it is
clear that purifying selection has historically impacted noncoding regions of
COL1a1, it is unknown how selective pressures have impacted these regions over
recent evolutionary time.
The strongest evidence of association between COL1a1 noncoding genetic
variation and variation in bone strength and osteoporotic fracture-risk is a single
nucleotide polymorphism (SNP) located in an Sp1 transcription factor binding site
in the first intron (e.g., Efstathiadou et al. 2001; Bandres et al. 2005; Ralston et al.
2006; Jiang et al. 2007). This noncoding SNP increases the expression of COL1a1
and likely accounts for the associated change in the ratio of COL1a1 and
COL1a2 subunits in type I collagen that reduces its structural integrity (Grant et
al. 1996; Mann et al. 2001; Jin, van’t Hof et al. 2009). Despite its seemingly
negative impact on function, this SNP has reached a relatively high frequency
among individuals of western European ancestry (>20%; e.g., Grant et al. 1996;
Ralston et al. 2006; Jiang et al. 2007), but is only rarely found among Africans
(e.g., Beavan et al. 1998) and is absent among Asians (e.g., Han et al. 1999;
Nakajima et al. 1999; Lau et al. 2004). While these previous studies may suggest
an origin of the Sp1 SNP in Europe given its frequency, this would imply a
relatively recent age. As this SNP is the most-studied in genotype-phenotype
relationships at COL1a1, an understanding of its age and evolutionary history
would address questions of its relevance over time with respect to bone-related
disease and gene expression in humans.
Here we conduct the first natural population genetic analyses in a global
and random sample of humans to address the functional importance of COL1a1
coding and noncoding polymorphism. We are specifically interested in testing
hypotheses to answer: (1) Is amino acid polymorphism reflective of clinical
studies in that it is typically deleterious and thus expected to be rare? (2) Are there
significant patterns of variation associated with noncoding regions that imply
functional relevance as predicted by our previous comparative species studies? (3)
Finally, what do the age and geographic pattern of the functionally relevant Sp1
SNP predict about collagen-related gene expression in the global population?
Materials and Methods
Population Samples
As our overall objective was to obtain a general estimate of how COL1a1 genetic
diversity is distributed across natural populations from a global perspective, our
sample included 96 individuals (total of 192 chromosomes; table 3) with no
known bone abnormalities (i.e., a random sample with respect to phenotypic
diversity) representing 10 geographically and ethnically diverse populations
publicly available from the Coriell Institute for Medical Research (Camden, NJ).
These populations also reflect a random sample of human genetic diversity
outside and inside of sub-Saharan Africa, the latter typically having higher
nucleotide and haplotype diversity owing to its older history and larger estimated
effective population size (Ne) compared to the recent demographic history of
proposed expansion associated with “non-African” groups (e.g., Rosenberg et al.
2002; Cavalli-Sforza and Feldman 2003; Tishkoff and Verrelli 2003; Campbell
and Tishkoff 2008). These samples have been previously used in similar
population statistical analyses by this lab and others, and enable comparisons of
genetic diversity across loci (e.g., Bersaglieri et al. 2004; Xu et al. 2005; Evans et
al. 2006; Claw et al. 2010). Samples include: sub-Saharan African (18
chromosomes, catalog number HD12), North African (14, HD11), Middle Eastern
(20, HD05), Russian (20, HD23), Chinese (20, HD32), Japanese (20, HD07),
Southeast Asian (20, HD13), Mexican (20, HD08), Northern European (20,
HD01), and Italian (20, HD21). Finally, as the chimpanzee is our closest living
relative with an estimated molecular divergence time from our common ancestor
at ~4-6 My (e.g., Kumar et al. 2005), we used COL1a1 nucleotide sequence
previously collected by us (Stover and Verrelli 2010) from a Pan troglodytes
verus (western Africa subspecies) individual to conduct analyses that require
inferences of ancestral and derived states and estimates of locus-specific
divergence.
DNA Amplification and Sequencing
DNA sequence data were collected for a >17-kb region of the COL1a1 locus (fig.
4), which resides on chromosome 17q21 in human and chimpanzee genomes. Our
previous protocol for the generation and DNA re-sequencing of COL1a1
nucleotide sequences was followed here, specifically using short polymerase
chain reaction (PCR) fragments to avoid problems associated with sequence
secondary structure due to the repetitive nature of coding regions and the
unusually high GC-content (~60%) of this locus (Stover and Verrelli 2010). As
before, DNA sequencing of PCR fragments was conducted on an ABI 3730
(Applied Biosystems, Foster City, CA) and sequences were aligned and edited
using Sequencher v. 4.5 (Gene Codes, Ann Arbor, MI).
Statistical Analyses
Because of the documented genetic patterns associated with the different
evolutionary histories among geographic regions, all diversity analyses were
conducted for the global population as well as for each population sample
separately. Although synonymous exon sites and intron sites are not completely
“neutral,” in human datasets these “silent” sites are typically considered as the
most appropriate proxy for neutral evolution, especially when compared to
nonsynonymous sites in exons that can directly impact the protein. Other
noncoding regions may also have putative “functional” effects on gene expression
and regulation and even exhibit significant evolutionary constraint, such as first
introns (e.g., Bornstein et al. 1987; Majewski and Ott 2002), and 5’ and 3’
untranslated mRNA (UTR) and promoter regions (e.g., Wray et al. 2003;
Haygood et al. 2007; Cheung and Spielman 2009). In fact, although there is a
known minimal promoter length required for COL1a1 (Chu et al. 1985), sites as
far as 2 kb upstream of the transcription start site are known to affect COL1a1
expression (e.g., Garcia-Giralt et al. 2005). Thus, all diversity analyses were
applied to several gene regions, in addition to simply exons and introns, to test
hypotheses of functional constraint across the COL1a1 locus.
The DnaSP v. 5.1 program (Rozas et al. 2003) was used for genetic
diversity statistic estimates, unless otherwise noted. Insertion-deletion (indel)
polymorphisms were excluded from all nucleotide diversity estimates given that
they are not generally expected to reflect neutral mutation rates in the way that
SNPs do. We used the PHASE v. 2.1.1 program (Stephens et al. 2001) to
statistically resolve heterozygous sequences for each individual into haplotypes.
This analysis was repeated with 100, 500, and 800 iterations with phased
haplotypes from the run with the highest average goodness-of-fit used in
subsequent analyses (Stephens et al. 2001). Genetic diversity was estimated as
Watterson’s (1975) θW, which is based upon the number of segregating sites (S)
corrected for sample size, and as θπ, which is based upon the average number of
pairwise differences among sequences (Nei 1987). Under neutrality, these two
estimates of the parameter θ = 4Neµ (for an autosomal locus, with “µ” denoting
mutation rate per bp) are expected to be similar, which can be tested using
Tajima’s (1989) D statistic. This statistic can indicate skews in the SNP frequency
spectrum due to deviations from neutrality, yet it can also reflect demographic
attributes. To distinguish among these scenarios, we used the MS program
(Hudson 2002), when necessary, to conduct coalescent simulations that model
parameters such as population size and recombination, as in other human analyses
(e.g., Verrelli and Tishkoff 2004; Auton et al. 2009; Scheinfeld et al. 2009;
Claw et al. 2010; see Results).
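The diversity estimators described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation; the function name `diversity_stats` and the toy haplotypes are hypothetical, and the θ values are returned per locus rather than as a percentage per site. The variance terms follow Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, theta_W, theta_pi, Tajima's D) for aligned, phased haplotypes.

    Hypothetical helper for illustration; theta values are per locus.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])

    # S: number of segregating (polymorphic) sites in the sample
    seg_sites = sum(
        1 for j in range(length) if len({h[j] for h in haplotypes}) > 1
    )

    # Watterson's estimator: S corrected for sample size
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1

    # theta_pi: average number of pairwise differences among sequences
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs)
    theta_pi = total_diffs / len(pairs)

    # Tajima's D: standardized difference between the two estimates
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if seg_sites == 0 or variance <= 0:
        return seg_sites, theta_w, theta_pi, float("nan")
    return seg_sites, theta_w, theta_pi, (theta_pi - theta_w) / sqrt(variance)


# Toy data (assumed): intermediate-frequency variants give D > 0,
# whereas a sample made up only of singletons gives D < 0.
intermediate = ["AAAA", "AAAT", "AATT", "ATTT"]
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
```

A negative D, as in the singleton-only toy sample, corresponds to the skew towards rare alleles discussed for the global sample; whether such a skew reflects selection or demography is what the coalescent simulations are meant to address.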
Given the association of COL1a1 with variation in bone strength across
populations, we are interested in determining whether specific variants or
haplotypes show significant genetic differentiation among samples. We first used
an FST analysis from Hudson et al. (1992) to relate information about the
proportion of variation shared within and between groups. However, we also used
Hudson’s (2000) Snn statistic as it examines genetic differentiation across samples
by considering the actual pairwise differences among haplotypes, and not simply
haplotype frequencies. This analysis is statistically more powerful in this respect
and is less sensitive to sample size (Hudson 2000), and it also provides a
simulation of the data to detect statistical significance, with P values corrected for
multiple comparisons by a standard Bonferroni method.
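As an illustration of the nearest-neighbor logic behind Snn, the following Python sketch computes the statistic and a permutation-based P value. It is a simplified reading of Hudson (2000), not the code used in this study; the names `snn` and `snn_permutation_p`, the toy haplotypes, and the number of permutations are assumptions made for the example.

```python
import random


def hamming(a, b):
    """Number of pairwise differences between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def snn(haplotypes, labels):
    """Average fraction of each sequence's nearest neighbors (ties included)
    that were sampled from the same population as the sequence itself."""
    n = len(haplotypes)
    total = 0.0
    for j in range(n):
        dists = [(hamming(haplotypes[j], haplotypes[k]), k)
                 for k in range(n) if k != j]
        d_min = min(d for d, _ in dists)
        nearest = [k for d, k in dists if d == d_min]
        total += sum(labels[k] == labels[j] for k in nearest) / len(nearest)
    return total / n


def snn_permutation_p(haplotypes, labels, n_perm=1000, seed=0):
    """Permutation test: shuffle population labels and count how often the
    permuted Snn is at least as large as the observed value."""
    rng = random.Random(seed)
    observed = snn(haplotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if snn(haplotypes, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data (assumed): two fully differentiated samples give Snn = 1.
haps = ["AAAA", "AAAA", "AAAT", "TTTT", "TTTT", "TTTA"]
pops = ["pop1", "pop1", "pop1", "pop2", "pop2", "pop2"]
```

With many pairwise population contrasts, each P value from such a test would then be compared against a Bonferroni-adjusted threshold (the nominal α divided by the number of contrasts).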
While our Snn analyses are intended to detect haplotype differentiation
among groups, we also analyzed linkage disequilibrium (LD). The rationale is
that haplotype structure across our sample may result from different haplotypes
(i.e., different combinations of SNPs in different groups), or it may simply
result from similar haplotypes that differ in frequency. By using additional
analyses of LD, we should be able to distinguish
between these scenarios by determining if specific associations among SNPs or
haplotype “blocks” are evident within and among groups, and which may then
explain any detected significant haplotype differentiation from our Snn analyses
(e.g., Claw et al. 2010). We first used a simple model that measures correlations
among SNPs as r², with statistical significance assessed by chi-squared tests after
a Bonferroni correction, to detect LD across COL1a1. However, as LD is not
unusual in the human genome (e.g., as a result of historical population expansion
and reduced Ne), it is appropriate to incorporate a background model of
recombination when interpreting significant LD from an evolutionary perspective
(e.g., Hudson 2001). Thus, we also used the LDhat program (McVean et al. 2002),
which applies the approximate-likelihood method of Hudson (2001) and uses
permutation analyses to determine if pairwise comparisons among SNPs exhibit
significant LD given a locus-specific estimate of θ and the recombination
parameter ρ = 4Nec (where “c” denotes the recombination rate per nucleotide).
Variants that are rare in frequency and uninformative (as determined by LDhat)
were omitted prior to the analysis.
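The simple pairwise measure of LD can be sketched in a few lines of Python. This is an illustration only and does not reproduce LDhat's likelihood method; the function name `ld_r2` is hypothetical, alleles are assumed to be coded 0/1 on phased haplotypes, and the test statistic is taken to be the common approximation χ² = n·r² with one degree of freedom, which is an assumption about how the chi-squared test was applied.

```python
from math import erfc, sqrt


def ld_r2(site_a, site_b):
    """Return (r^2, chi-squared, P value) for two biallelic sites.

    site_a and site_b are equal-length lists of 0/1 alleles, one entry per
    phased chromosome. Hypothetical helper for illustration.
    """
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    chi2 = n * r2
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2.0))
    return r2, chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

For example, with 10 SNPs there are 45 pairwise comparisons, so a nominal α of 0.05 would correspond to a per-comparison threshold of about 0.0011.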
In the case of the Sp1 binding site SNP (or others that emerge) where we
desire to estimate the age of a mutation, we used the estimator of Thomson et al.
(2000), which has been similarly used by others for human datasets (e.g., Verrelli
et al. 2002; Scheinfeld et al. 2009; Claw et al. 2010) as assumptions of population
equilibrium and recombination are relaxed. The age estimate (t) involves the
relationship
$$t = \frac{\sum_{i=1}^{n} x_i}{n\mu}$$
where xi is the number of mutational differences between the ith sequence and the
most recent common ancestor of all sequences, n is the total number of sequences
in the sample, and µ is the mutation rate. The latter parameter is a “neutral” gene-
specific estimate based on the number of nucleotide substitutions between human
and chimpanzee COL1a1 sequences divided by twice the estimated divergence
time between species (i.e., 4-6 My; Kumar et al. 2005).
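The age estimator and the divergence-based mutation rate can be sketched in Python as follows. The function names and all numerical inputs are hypothetical placeholders chosen for illustration, not values from this study; µ is assumed here to be expressed per year for the whole sequenced region.

```python
def mutation_rate_per_year(substitutions, divergence_time_years):
    """Substitutions between two species divided by twice the divergence
    time, since mutations accumulate along both lineages."""
    return substitutions / (2.0 * divergence_time_years)


def thomson_age(x, mu):
    """Age estimate t = sum(x_i) / (n * mu), where x_i is the number of
    mutational differences between sequence i and the most recent common
    ancestor of the sample, and n is the number of sequences."""
    n = len(x)
    return sum(x) / (n * mu)


# Hypothetical inputs: 100 human-chimpanzee substitutions across the region
# and a 5 My divergence time give mu = 1e-5 substitutions per year.
mu = mutation_rate_per_year(100, 5e6)
# Three hypothetical sequences carrying 1, 2, and 3 mutations since the MRCA
age_years = thomson_age([1, 2, 3], mu)
```

Because t scales inversely with µ, using the bounds of the 4-6 My divergence range in place of a single value would give a corresponding range of age estimates.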
Finally, we also examined human-chimpanzee divergence in a test of
neutrality that contrasts intraspecific and interspecific patterns. As levels of
polymorphism and divergence are expected to be correlated under a neutral model
of evolution, we can compare these two classes of variation at putative functional
sites with those at our defined “neutral” sites using a Fisher’s exact test (i.e.,
McDonald and Kreitman 1991). This test enables us to examine how selective
pressures have differentially shaped the magnitude of human lineage-specific
variation over both short and long evolutionary timescales.
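The contrast underlying this test is a 2 × 2 table of polymorphic versus fixed differences for two site classes. A minimal Python sketch of a two-sided Fisher's exact test on such a table is given below; the function name is hypothetical, the two-sided rule (summing all tables no more probable than the observed one) is an assumption about how the test was run, and the example counts are the global-sample values reported in table 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table

                      polymorphic   fixed
        class 1            a          b
        class 2            c          d

    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def prob(x):
        # Probability that class 1 contributes x of the polymorphic sites
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Global sample (table 4): nonsynonymous 6 polymorphic, 0 fixed;
# synonymous 12 polymorphic, 18 fixed; silent 109 polymorphic, 83 fixed.
p_nonsyn_vs_syn = fisher_exact_two_sided(6, 0, 12, 18)
p_nonsyn_vs_silent = fisher_exact_two_sided(6, 0, 109, 83)
```

Under these assumptions the two global contrasts evaluate to roughly 0.02 and 0.04, in line with the P values listed in table 4.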
Results
Polymorphism and Divergence
A total of 16,993 bp of the COL1a1 locus was collected from each individual in
our human sample (fig. 4; supplementary table 1, Appendix B). Overall silent site
SNP diversity at COL1a1 (table 3) is consistent with the human genome average
of ~0.1% as well as with that seen in similar population samples (e.g.,
Sachidanandam et al. 2001; Verrelli et al. 2002; Verrelli and Tishkoff 2004;
Garrigan and Hammer 2006; Claw et al. 2010). While diversity in this sample
appears to be skewed towards rare alleles (as indicated by a negative Tajima's D;
table 3), this is not surprising as it is consistent with the aforementioned historical
population expansion signature seen in molecular studies. Of note is that
population samples also exhibit similar SNP diversity (i.e., θπ; table 3). In all,
COL1a1 silent site polymorphism appears to reflect what is generally expected
under neutrality.
In examining variation outside of introns (and synonymous sites), only
one SNP, which is found in the 3’ UTR, reaches a frequency >5%. Thus, in
general, variation associated with promoter, UTR, and nonsynonymous sites is
relatively rare (supplementary table 1, Appendix B). Even so, we identified 6
nonsynonymous SNPs found among 6 populations, all but one of which are
singletons (supplementary table 2, Appendix B). Interestingly, while 2 of these
variants, both found in the triple-helix domain, have been reported previously (1
with unknown phenotype and 1 in association with osteopenia; Spotila et al. 1994;
Dalgleish 1997), the other 4 variants are novel (i.e., never documented in clinical
reports; supplementary table 2, Appendix B). Two of these novel variants also
occur in the triple-helix domain (including one glycine-altering mutation) and the
others in the C-terminal non-collagenous domain.
Our interspecific comparisons with chimpanzee find no fixed
nonsynonymous differences and estimates of divergence across other putatively
functional domains, such as UTR and promoter regions, are comparable to silent
sites (supplementary table 1, Appendix B). When contrasted with intraspecific
variation for these site classes, a statistical excess of nonsynonymous
polymorphism is apparent (McDonald-Kreitman test, table 4). While this pattern
may be consistent with balancing selection, under that scenario we would not
expect the adaptive variants to be rare in frequency (e.g., Akashi and
Schaeffer 1992). In fact, when we removed rare variants (<5%) from all
polymorphic classes, patterns of polymorphism and divergence no longer deviated
from those expected under neutrality (data not shown).
Haplotype Structure and Age Estimates
As is typical with analyses of sub-Saharan Africans, we find significant
population differentiation when compared to other global groups. However, while
differentiation among these and human population samples in general is ~10-15%
globally (e.g., Rosenberg et al. 2002; Claw et al. 2010), several of our groups
differ dramatically for COL1a1, with pairwise comparisons varying from 0-35%
(supplementary fig. 1, Appendix B). Although contrasts involving sub-Saharan
Africans largely contribute to this diversity, even non-African samples exhibit
differentiation as high as 22%. While sampling variance may potentially explain
this observation for FST, our statistical assessment using Hudson's Snn reveals
significant differentiation not only between contrasts involving sub-Saharan
Africans, but also among several pairs, most involving Asian population samples
(supplementary fig. 1, Appendix B).
As illustrated in our LD analysis (fig. 4), several significant correlations
among SNPs are evident, primarily restricted to the 5’ and 3’ ends of COL1a1.
Overall we do not find significantly different combinations of SNPs among
samples, but instead that the significant Snn patterns are largely explained by
differences in haplotype frequency (supplementary fig. 2, Appendix B). Our
LDhat analysis finds that very few pairwise comparisons are significantly
associated given a background recombination rate model, suggesting overall low
rates of crossing-over in the region. Interestingly, several SNPs primarily located
between introns 11-16 exhibit significantly less LD than expected (fig. 4);
therefore, we employed the HOTSPOTTER program of Li and Stephens (2003),
which tests the hypothesis that elevated haplotype diversity in a gene region is
inconsistent with that observed across the gene given local and locus-specific
estimates of recombination rates. We first examined the entire global dataset and
located a “hotspot” centered on intron 13 with an estimated ρ that is 9.75-fold
greater than the background (ρ = 4) and that is 1.9 × 10^5 times more likely
(P<0.05) to explain the observed haplotype diversity than is a constant ρ model. When
populations were analyzed independently, this same pattern was still evident; yet,
likely due to a lack of statistical power with smaller sample sizes, significance
was not detected for the Italian, Middle Eastern, North African, and sub-Saharan
African samples.
With this elevated haplotype diversity centered on intron 13 identified by
the LD analyses, we re-examined the overall haplotype structure and significant
Snn patterns in the 5' and 3' “haploblocks” (supplementary fig. 2, Appendix B). As
such, FST and Snn were recalculated and permutation analyses performed as above
for each of the 5’ (positions 368-4245) and the 3’ (positions 5714-16094) regions,
excluding SNPs in the hotspot (positions 4560-5430; fig. 4). In the 5’ region,
several contrasts differ by as much as 62% (supplementary fig. 3, Appendix B),
yet conversely, in the 3’ region, estimates of genetic differentiation are relatively
much lower (supplementary fig. 4, Appendix B). Thus, it appears that the overall
haplotype structure variation among groups is largely explained by a ~4-kb
haploblock in the 5’ region that coincidentally includes the first-intron Sp1
binding site SNP (fig. 4). Specifically, this block is entirely absent from our
Chinese and Japanese samples and is found only once in the Southeast Asian
sample.
For our analysis of the Sp1 SNP, based on comparison with chimpanzee
sequence, the Sp1-T allele is the derived state, which shows considerable
variation in frequency (0-20%) among geographic regions (supplementary table 3,
Appendix B). For our Thomson et al. (2000) age estimate, we first constructed a
neighbor-joining tree for the entire dataset using MEGA v. 4 (Tamura et al. 2007),
which shows that Sp1-T alleles do not form a single group (supplementary fig. 5,
Appendix B). Of the 17 Sp1-T bearing haplotypes, even if we look at only the 11
that form a single group (and conservatively ignore what appear to be
recombinants), we obtain an age estimate of 190 ± 38 Ky. It is possible that
increased recombination rates could result in artificially elevated age estimates for
mutations as “old” and “young” haplotypes may recombine frequently; however,
other than the apparent “hotspot,” estimates of LD at COL1a1 suggest that
recombination rates overall are not unusual. In addition, the overall Sp1-T
haplotype frequency and associated SNP diversity (θπ = 0.04%) are not consistent
with a recent increase in its frequency by some form of population expansion or
selection. Altogether, with its geographic distribution, these data strongly support
a relatively old age estimate of the Sp1 SNP.
Discussion
This study follows up on the observations of Stover and Verrelli (2010) that
suggested introns and other noncoding regions at COL1a1 have been
evolutionarily conserved across vertebrates and represent viable candidates to
identify functional and disease-related variation today. With the first population
genetic contrasts of coding and noncoding COL1a1 gene regions in a globally
diverse and random sample of humans, our current analyses find unusual patterns
of amino acid polymorphism relative to divergence over deep time, evidence of a
geographically varying haploblock structure outside of coding regions, and
support for an ancient origin of the functionally-relevant Sp1 binding site variant.
Here, we discuss the implications these novel observations have for the long- and
short-term evolutionary pressures that have shaped bone-related phenotypes and
disease in natural human populations.
As our previous analyses of the protein sequence across ~450 My of
vertebrate evolution show a long history of strong functional constraint (Stover
and Verrelli 2010), we would predict that amino acid polymorphism in the
population today would be rare. This rationale is also supported by the
documented abundance of COL1a1 amino acid polymorphism associated with
severe disease in clinical studies. Thus, the observation of 6 amino acid
polymorphisms in our sample may be considered surprising. In addition, a
previous study that focused on coding regions in 4 populations of American-
admixed individuals identified 3 rare amino acid variants, 2 of which were novel
(Chan et al. 2008). An additional hypothesis for this study was that clinical
screens reveal COL1a1 amino acid polymorphisms because of their biased link to
disease, but that such variation in randomly sampled natural populations is likely
to have little phenotypic consequence. However, the 205 amino acid variant found
here, which has previously been associated with osteopenia (Spotila et al. 1994),
implies that this hypothesis is not entirely supported. Although phenotypic
information is unknown for the novel mutations, we can use the model we
generated based on evolutionary constraint at sites over time to estimate the
likelihood of severity (Stover and Verrelli 2010). Interestingly, the four novel
amino acid variants found in this study as well as one from the Chan et al. (2008)
screen occur at amino acid positions considered to have the lowest evolutionary
rates. Thus, we may predict that these amino acid polymorphisms likely have
some functional association with respect to skeletal phenotype.
Given the complete lack of amino acid divergence between human and
chimpanzee COL1a1, the pattern of rare human amino acid polymorphisms may
suggest that purifying selection against COL1a1 protein variation in human
populations has been relatively weak. However, as pointed out in Stover and
Verrelli (2010), mutations of varying degrees of severity cluster on the protein,
and these regions show varying levels of evolutionary constraint even within the
primate lineage beyond the last 4-6 My. Specifically, patterns of polymorphism
and divergence here suggest that certain mutations are rapidly removed by
purifying selection, while others may have less severe effects on bone phenotype.
Thus, purifying selection at COL1a1 is certainly not “weak,” but appears to vary
with respect to its strength across the protein sequence. In fact, individuals with a
COL1a1 amino acid polymorphism comprise >9% of our random sample, which
could be considered quite common compared to genome-wide DAMs in general.
In addition, these variants are distributed across 6 different geographic locales,
one of which is sub-Saharan Africa where most argue purifying selection has been
strongest compared to other recently colonized areas (e.g., Lohmueller et al.
2008). Thus, a random sampling not only reveals novel amino acid variants, but
also surprisingly suggests these mutations may not be as rare as predicted, and
could in fact be a source of observed bone strength variation among populations
(Looker et al. 1998; Bachrach et al. 1999; Barrett-Connor et al. 2005; Baxter-
Jones et al. 2010).
Interspecific patterns of COL1a1 intron length and content seen across
ancient and even recent evolutionary time for vertebrates implied strong historical
functional constraint in these regions (Stover and Verrelli 2010). Estimates of
silent site human polymorphism are on par with that seen at autosomal loci in
general; thus while COL1a1 has significantly high GC-content that may otherwise
cause elevated mutational input (Stover and Verrelli 2010), levels of noncoding
polymorphism are not unusual. That said, the interesting pattern of variation
associated with noncoding regions is the haplotype structure that effectively
separates the locus into two “blocks” with significant population differentiation
being associated with the 5’ region. Thus, in spite of strong purifying selection
acting on coding regions and overall exon-intron structure, likely to reduce exon
shuffling and unequal crossing-over events (Cohn et al. 1993; Stover and Verrelli
2010) that are highly deleterious (e.g., Barsh et al. 1985; Cohn et al. 1993), the 5’
end of COL1a1 appears evolutionarily independent with respect to effects of
hitchhiking. As bone phenotypic variation across human populations is likely
explained in part by gene expression variation (e.g., Dohi et al. 1998; Ota et al.
2001; Fang et al. 2003; Liu et al. 2003; Jin, van’t Hof et al. 2009), the fact that we
find highly significant geographic variation for COL1a1 haplotypes in the 5’
region, where polymorphism is most likely to alter promoter function and
expression, is most interesting and warrants inspection from a combined
evolutionary-functional perspective.
A perfect example is the Sp1 binding site functional variant. As previously
noted, although many studies have investigated its effect on COL1a1 gene
expression, few have examined its evolutionary history and origin (Gong and
Haynatzki 2003). The observed geographic distribution and age estimate suggest
that it is not recent, but instead may have existed prior to the emergence of
modern humans out of Africa (e.g., Garrigan and Hammer 2006). The possibility
that this variant and the associated impact on phenotype, such as increased
fracture risk, have been segregating in the population for a relatively long time is
further support against the hypothesis that COL1a1 amino acid polymorphism in
general is the result of very recent “weak” purifying selection. In fact, the patterns
observed here also do not rule out the possibility of the Sp1 SNP having historic
adaptive value, and thus, larger samples of this allele from global populations,
with long-range LD, could further clarify this hypothesis. Finally, it should be
noted that although haplotypes bearing the Sp1 SNP show geographic variation
among ethnically diverse groups that exhibit bone-related phenotypic variation,
this haplotype alone cannot explain the significant LD, 5’ haploblock, and
geographic differentiation observed at COL1a1. Thus, it is possible that other
expression variants are also responsible, yet it is difficult to say whether any
within the dataset here are more than candidates. However, our molecular
evolutionary analysis implies that further haplotype structure studies upstream of
the 5’ region and in larger samples would narrow down the potential sites with
which to conduct functional analyses (i.e., tests of the mutational impact on
promoter expression through reporter assays).
Conclusion
Although the abundance of DAMs discovered in clinical screens might suggest
that amino acid mutation at COL1a1 is highly deleterious in general, the
proportion of individuals with amino acid variants in our random sampling of the
natural population is not consistent with this hypothesis. This pattern suggests that
the natural population harbors more COL1a1 amino acid variation than previously
believed, and thus, that this variation could be contributing to the functional
variation in BMD observed across groups. In addition, the pattern of haploblock structure associated with
noncoding variation supports the prediction from our previous interspecific
comparisons among vertebrates suggesting that noncoding regions also represent
potential foci for functional variation. Similarly, the estimated ancient age and
geographic distribution of the Sp1 SNP also imply that it is not consistent with a
deleterious evolutionary model. In fact, together with the other patterns of
noncoding SNPs, the Sp1 variant may reflect gene expression variation that is
common, not new in origin, and leads to different BMD phenotypes across
populations, a rather surprising hypothesis given the functional constraint long
predicted for type I collagen.
Acknowledgments
The authors thank Michael Rosenberg for developing a PhaseSeqs data-script to
streamline the transfer of data for program analyses. This research was funded in
part by National Science Foundation grants BCS-0715972 (to B.C.V.) and
DEB-0909637 (to B.C.V. and D.A.S.).
Table 3
COL1a1 Population Diversity Estimates
| Population | n^a | S^b | θπ^c | D^d |
|---|---|---|---|---|
| Global | 192 | 109 | 0.11 | -1.22 |
| Sub-Saharan African | 18 | 61 | 0.14 | -0.69 |
| North African | 14 | 41 | 0.11 | -0.20 |
| Middle Eastern | 20 | 35 | 0.10 | 0.27 |
| Russian | 20 | 34 | 0.10 | 0.31 |
| Chinese | 20 | 27 | 0.08 | 0.51 |
| Japanese | 20 | 24 | 0.09 | 1.54 |
| Southeast Asian | 20 | 41 | 0.10 | -0.23 |
| Mexican | 20 | 39 | 0.07 | -1.06 |
| Northern European | 20 | 29 | 0.09 | 0.85 |
| Italian | 20 | 29 | 0.09 | 0.81 |

a Number of chromosomes
b Number of silent site SNPs (synonymous and intron sites, excluding the first intron and splice sites; see Materials and Methods)
c Average number of pairwise differences between sequences (%)
d Tajima’s D statistic
Table 4
Intraspecific and Interspecific Tests of Neutrality
| Sample | Sites | No. of Differences: Polymorphic | No. of Differences: Fixed | P-value |
|---|---|---|---|---|
| Global | Nonsynonymous | 6 | 0 | |
| | Synonymous | 12 | 18 | 0.02 |
| | Silent | 109 | 83 | 0.04 |
| Sub-Saharan African | Nonsynonymous | 1 | 0 | |
| | Synonymous | 7 | 18 | 0.31 |
| | Silent | 61 | 83 | 0.43 |
| Non-African | Nonsynonymous | 5 | 0 | |
| | Synonymous | 7 | 18 | 0.01 |
| | Silent | 84 | 83 | 0.05 |

Note: “Silent” sites include synonymous and intron sites, excluding the first intron and splice sites. “P-value” refers to McDonald-Kreitman analysis (Fisher’s Exact Test), and is the result of contrasts between “Nonsynonymous” with “Synonymous” sites and “Nonsynonymous” with “Silent” sites for each of the three samples.
Fig. 4. COL1a1 gene diagram. Black boxes denote exons, white boxes denote 5’
and 3’ mRNA untranslated regions, and the striped box denotes the region of the
promoter re-sequenced here. The “*” denotes a 496-bp gap in the collected
sequence of intron 25 due to the presence of an Alu element. Positions (in bp) are
numbered starting with the first nucleotide of the first exon. Positions in bold
represent the “5’ haplotype” absent among our Asian samples (see Results). The
Sp1 binding site variant in the first intron is at position 1126. Significant pairwise
correlations among polymorphic sites >5% frequency in 192 human
chromosomes are represented with light-shaded boxes, whereas dark-shaded
boxes represent comparisons that exhibit significantly less linkage disequilibrium
than expected given a gene-specific evolutionary model of recombination (see
Materials and Methods).
CHAPTER 4: COMPARATIVE HUMAN AND CHIMPANZEE
ANALYSES OF COL1a1
Abstract
The most abundant structural protein in vertebrates, type I collagen, is encoded in
part by the collagen type I alpha 1 (COL1a1) gene, which harbors >600 mutations
linked to human skeletal diseases like osteoporosis. Our previous comparative
species work reflecting ~450 My of vertebrate evolution has shown that the
COL1a1 protein exhibits patterns of varying selective constraint across the amino
acid sequence, and surprisingly strong evidence for functional constraint on intron
composition and content. In addition, our previous population genetic analyses of
a natural, global human sample are consistent with these species comparisons in
showing several amino acid variants, in addition to an unusual haplotype structure
for noncoding regions likely related to observed differences in gene expression
across populations. Here we conduct an analysis of the >17-kb COL1a1 locus in
40 chromosome sequences from a population sample of chimpanzees, our closest
living relative, to determine whether patterns seen in humans at coding and
noncoding regions are unique. Interestingly, although we find no amino acid
variation, we reveal a significant excess of intermediate frequency polymorphism
that segregates between two haplogroups, as well as a partial exon duplication at
~20% in frequency, which is surprising given that such duplications are extremely
rare and deleterious in humans. Finally, long-range linkage disequilibrium analyses of
flanking regions spanning ~180-kb of the COL1a1 chromosomal region find an
ancient age for the two haplogroups predating chimpanzee-bonobo divergence.
These patterns of noncoding haplotype structure and exon diversity are discussed
in light of their differences from humans as well as the implications they have for
functional significance and skeletal disease evolution.
Introduction
Bone strength as well as the incidence and severity of related skeletal disorders
like osteoporosis vary significantly among human populations (e.g., Lauderdale et
al. 1997; Looker et al. 1997; Melton 1997; Bachrach et al. 1999; Barrett-Connor
et al. 2005; Baxter-Jones et al. 2010) due in part to genetic differentiation (e.g.,
Dvornyk et al. 2003; Gong and Haynatzki 2003; Lui et al. 2003; Gong et al. 2006;
Koller et al. 2010). Phenotypic data available from non-human primates, though
limited, also suggest that bone strength varies within other species (e.g., Sumner
et al. 1989; Cerroni et al. 2000; Black et al. 2001; Gunji et al. 2003; Havill et al.
2003), which is also likely due in part to underlying genetic variation (e.g., Lipkin
et al. 2001; Havill et al. 2005). In addition, slight differences in bone morphology
have also been documented between humans and our closest-living relatives,
chimpanzees, for osteoporotic-like symptoms, such as patterns of bone loss and
the accumulation of microfractures with age (e.g., Sumner et al. 1989; Wang et al.
1998; Gunji et al. 2003; Kikuchi et al. 2003; Mulhern and Ubelaker 2003, 2009;
Matsumura et al. 2010). As such, these data suggest that underlying genetic
variation among species at bone-related genes may contribute to phenotypic
variation, the determination of which would further our understanding of not only
human skeletal disease, but also natural variation in bone strength.
Among the genes most commonly associated with bone-related
phenotypic variation in humans is COL1a1, which encodes the primary subunit of
type I collagen, the main structural protein of bone, teeth, and tendon (Viguet-
Carrin et al. 2006). With >600 disease-associated mutations (DAMs; primarily
linked to osteoporosis, osteogenesis imperfecta types I-IV, and Ehlers-Danlos
Syndrome types I and VIIA; Dalgleish 1997; Marini et al. 2007) as well as
associations with natural variation in bone strength among human populations
(e.g., Garcia-Giralt et al. 2002; Stewart et al. 2006; Jiang et al. 2007; Ioannidis et
al. 2007; Kaufman et al. 2008), this locus is a prime candidate for research in
bone-related phenotypic variation, not only among human populations, but among
other species as well (Stover and Verrelli 2010).
The majority of known COL1a1 DAMs affect protein coding regions,
typically within the triple-helix domain that is composed of a repeating amino
acid sequence with glycine, the smallest of the amino acids, in every third
position, which enables this domain to wind into its compact structure in type I
collagen (Yamada et al. 1980; Bernard et al. 1983; Exposito et al. 2002; Boot-
Handford and Tuckwell 2003; Aouacheria et al. 2004; Wada et al. 2006). Because
type I collagen is a triple helix composed of two COL1a1 and one COL1a2
subunits, known protein length mutations of any size are deleterious, often lethal,
since an increase in length of one subunit disrupts helix stability without a similar
lengthening of the other subunit (e.g., Cohn et al. 1993; Pace et al. 2001; Cabral et
al. 2003). As such, large duplications/deletions of the 51 exons encoded by the
>17-kb COL1a1 locus are exceedingly rare (e.g., Barsh et al. 1985; Cohn et al.
1993; Bodian et al. 2009). Amino acid mutations have similar effects on helix
stability with varying phenotypic outcomes from lethality to mild, osteoporotic-
like symptoms (Kuivaniemi et al. 1997; Dalgleish 1997; Marini et al. 2007; Rauch
et al. 2010), which we have previously shown can be predicted based upon the
degree of evolutionary conservation of amino acid positions over the past ~450
My, with a positive correlation between site conservation and the phenotypic
severity of DAMs (Stover and Verrelli 2010).
Within the natural human population, however, although the number of
COL1a1 amino acid polymorphisms is higher than expected given the overall
high evolutionary constraint at this locus, the low frequency of these variants
suggests that the association of COL1a1 with bone-related phenotypic variation
among populations cannot be explained by protein variation alone (Stover and
Verrelli, Chapter 3). Rather, this association is also driven by noncoding
variation. For example, a first intron, Sp1 transcription factor binding site
mutation has already been shown to increase COL1a1 gene expression, causing
population differences in reduced bone strength and increased fracture risk due to
variation in the frequency of this polymorphism (e.g., Grant et al. 1996; Mann et
al. 2001; Bandres et al. 2005; Ralston et al. 2006; Jiang et al. 2007; Jin, van’t Hof
et al. 2009). Further, because COL1a1 intron composition has been highly
conserved historically, even resulting in reduced human-chimpanzee intron
divergence, introns in general may harbor phenotypically important genetic
variation (Stover and Verrelli 2010). In fact, within the human population,
noncoding regions of COL1a1 demonstrate unusually high haplotype
differentiation among populations, particularly for the 5’ region of the gene where
variants that alter expression are most likely to occur (Stover and Verrelli,
Chapter 3).
Overall, several interesting patterns at COL1a1 in humans have emerged.
First, the number of amino acid polymorphisms in the natural population is
surprisingly high (Chan et al. 2008; Stover and Verrelli, Chapter 3), suggesting
that not all protein variants today are highly deleterious. Second, length variation
of COL1a1 coding regions has yet to be identified in the natural population,
unassociated with a deleterious disease phenotype (Chan et al. 2008; Bodian et al.
2009; Stover and Verrelli, Chapter 3). Third, noncoding variation at COL1a1
demonstrates unusual haplotype structure among populations (Stover and Verrelli,
Chapter 3). To determine how unusual these human population-level patterns may
be, however, they must be placed in an evolutionary context. Although our
comparisons among distantly-related species help identify ancient factors
affecting COL1a1, comparative population genetic analyses would reveal what
factors are similarly impacting other species today. As such, chimpanzees serve as
a perfect model for this purpose, particularly since available data suggest that
bone-related phenotypic differences may exist between our closely-related
species. Here we present the first population genetic analysis of COL1a1 in a
population sample of chimpanzees to determine if this gene has been subject to
similar selective pressures as in humans.
Materials and Methods
Population Samples
Although there are several recognized chimpanzee subspecies, the western African
Pan troglodytes verus subspecies represents the most appropriate contrast with
humans because of similar levels of nuclear population diversity (Stone et al.
2002; Gilad et al. 2003; Fischer et al. 2004; Wooding et al. 2005; Verrelli et al.
2006, 2008; Claw et al. 2010). As such, nucleotide sequence data were collected
from DNA samples of 20 wild-born, unrelated P. t. verus chimpanzees (40
chromosomes) that have been used previously in population genetic analyses of
other loci (Stone et al. 2002; Wooding et al. 2005, 2006; Verrelli et al. 2006,
2008; Claw et al. 2010). Available COL1a1 gene sequences for orangutan (Pongo
abelii) and macaque (Macaca mulatta) were also obtained from the National
Center for Biotechnology Information (NCBI) database in order to infer derived
versus ancestral states of polymorphic positions for estimates of divergence.
DNA Amplification and Sequencing
DNA sequence was collected for a total of 16,989 bp of the COL1a1 locus on
chromosome 17, including 1,223 bp of the promoter and 263 bp of the 5’ and 3’
mRNA untranslated regions. This sequence is contiguous, spanning all of the 51
exons and intervening introns except for a 508-bp gap in intron 25 that was
avoided due to the presence of an Alu element. Protocols follow those of Stover
and Verrelli (2010) in which both human and chimpanzee gene sequences were
initially generated. Polymerase chain reaction (PCR) and sequencing primers
were designed from human genome sequence (accession # NT_010783.14) and
are available upon request. PCR products were purified using shrimp alkaline
phosphatase and exonuclease I (US Biochemicals, Cleveland, OH) prior to DNA
sequencing with an Applied Biosystems (Foster City, CA) 3730 capillary
sequencer. Sequences were aligned and edited using Sequencher v. 4.5 (Gene
Codes, Ann Arbor, MI).
Statistical Analyses
While nonsynonymous exon sites are generally expected to be the most highly
conserved due to their importance in protein function, other noncoding regions
may also have putative “functional” effects on gene expression and regulation and
even exhibit significant evolutionary constraint, such as first introns (e.g.,
Bornstein et al. 1987; Majewski and Ott 2002), and 5’ and 3’ untranslated mRNA
(UTR) and promoter regions (e.g., Wray et al. 2003; Haygood et al. 2007; Cheung
and Spielman 2009). Synonymous exon sites and non-first introns on the other
hand, though not completely “silent” in terms of functional significance (e.g.,
Urrutia and Hurst 2003; Chamary and Hurst 2005), are generally less constrained
relative to these other sites (i.e., nonsynonymous, promoter, UTR) and thus, are
commonly used to reflect estimates of neutrality in population genetic studies
similar to this one (e.g., Haygood et al. 2007; Claw et al. 2010). As such, all
diversity analyses were applied to several gene regions, in addition to simply
exons and introns, to test hypotheses of functional constraint across the COL1a1
locus. The DnaSP v. 5.1 program (Rozas et al. 2003) was used for these diversity
statistic estimates, unless otherwise noted.
Heterozygous sequence data were resolved into haplotypes using PHASE
v. 2.1.1 (Stephens et al. 2001). To examine the consistency of haplotype
reconstruction among runs, this process was repeated with 100, 250, and 500
iterations, with the best-fit haplotypes from the likelihood model being used for
all subsequent analyses. Genetic diversity was estimated as Watterson’s (1975)
θW, which is based upon the number of segregating sites (S) corrected for sample
size, and as θπ, which is based upon the average number of pairwise differences
among sequences (Nei 1987). These two estimates of the population parameter θ
= 4Neµ (for an autosomal locus, with “µ” denoting mutation rate per bp) are
expected to be equal under neutrality; however, non-neutral and demographic
processes are expected to skew the SNP frequency spectrum, which can be
detected using Tajima’s (1989) D statistic. Significance of estimates of COL1a1
diversity and of D was determined through comparisons to estimates from
previously published P. t. verus studies of other autosomal loci (Gilad et al. 2003;
Yu et al. 2003; Fischer et al. 2004; Claw et al. 2010) using coalescent simulations
provided by DnaSP with 10,000 replicates. To compare putatively silent and
functional SNP diversity within and between chimpanzees and humans, we also
used McDonald and Kreitman (1991) interspecific tests of neutrality, which test
the hypothesis that functional sites exhibit patterns of diversity within recent and
historical time periods consistent with that seen at silent sites that reflect a simple
evolutionary model of drift.
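As a rough illustration of the diversity statistics described above, the following Python sketch computes the number of segregating sites (S), θW, θπ, and Tajima's D from a set of phased haplotypes. It is not the DnaSP implementation; the function name `diversity_stats` and the toy haplotypes are assumptions introduced here for illustration, and the θ values are per locus (dividing by the number of sites would give per-bp estimates).

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, theta_w, theta_pi, tajima_d) for equal-length haplotype strings.

    Values are per locus. tajima_d is None when there are no segregating sites.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])

    # A site is segregating if more than one allele is observed at it.
    seg_sites = [i for i in range(length)
                 if len({h[i] for h in haplotypes}) > 1]
    s = len(seg_sites)

    # theta_pi: average number of pairwise differences among sequences.
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(a[i] != b[i] for i in seg_sites) for a, b in pairs)
    theta_pi = total_diffs / len(pairs)

    # Watterson's theta_W: S corrected for sample size.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1

    if s == 0:
        return s, theta_w, theta_pi, None

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    tajima_d = (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, theta_w, theta_pi, tajima_d


# Toy example (hypothetical data): two equally frequent, divergent haplotypes
# give an excess of intermediate-frequency variants and a positive D.
two_groups = ["AA", "AA", "TT", "TT"]
# A single rare variant gives a negative D.
singleton = ["A", "A", "A", "T"]
```

In this toy setting, a sample split into two divergent groups yields θπ > θW and a positive D, whereas a sample carrying only a singleton yields a negative D.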
Finally, to enable comparisons of human and chimpanzee haplotype
structure and linkage disequilibrium (LD) at COL1a1, we first identified
associations among SNPs using r² with significance assessed by Fisher’s exact
tests after a standard Bonferroni correction for multiple comparisons as
implemented in DnaSP. Similar to analyses in Chapter 3, we also examined
correlations using the LDhat program of McVean et al. (2002), which uses the
approximate-likelihood method of Hudson (2001) and permutation analyses to
determine if pairwise comparisons among SNPs are in significantly more or less
LD than expected given the distance between them and the background rates of
recombination and mutation. Low-frequency SNPs are uninformative for these
comparisons; therefore, only polymorphisms >5% frequency were used in
haplotype analyses to increase our statistical power to detect correlations. As
previous analyses of P. t. verus populations, including those using these same
samples here, show no evidence of historical structure either geographically or
temporally (e.g., Wooding et al. 2005; Verrelli et al. 2006, 2008; Becquet et al.
2007; Claw et al. 2010; Leuenberger and Wegmann 2010), analyses such as
Hudson’s (2001) Snn that were performed in Chapter 3 for our human sample were
unnecessary here.
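The pairwise LD measure described above can be sketched as follows for two biallelic SNPs scored on phased chromosomes. This is a minimal illustration, not the DnaSP or LDhat procedure; the 0/1 allele coding, the function names, and the choice to apply the Bonferroni correction over all pairs of sites are assumptions made here.

```python
from math import comb


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites; alleles coded 0/1, one entry per chromosome."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def fisher_exact_two_sided(site_a, site_b):
    """Two-sided Fisher's exact P value for the 2x2 table of allele co-occurrence."""
    n = len(site_a)
    row1 = sum(site_a)
    col1 = sum(site_b)
    n11 = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1)

    def prob(k):
        # Hypergeometric probability of observing k chromosomes with both "1" alleles.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(n11)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


def bonferroni_alpha(alpha, n_sites):
    """Per-comparison threshold when every pair of n_sites SNPs is tested."""
    return alpha / comb(n_sites, 2)
```

For example, two sites with identical allele patterns across chromosomes give r² = 1, and two sites whose alleles co-occur exactly as often as expected by chance give r² = 0.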
Results
Chimpanzee COL1a1 Polymorphism and Haplotype Structure
A total of 16,989 bp of the COL1a1 locus was collected from each of the
chimpanzees with 10,677 bp representing silent sites (table 5). No
nonsynonymous mutations were detected in our sample (table 5; supplementary
fig. 1, Appendix C). Surprisingly, however, there is a duplication involving 36-bp
of exon 35 (fig. 5) found at an allele frequency of 17.5%, including one
homozygous individual. Intron splice sites surrounding this partial exon
duplication are intact, and thus, if this duplication is encoded, would result in an
additional 12 amino acids being added to the COL1a1 protein without altering the
downstream reading frame (fig. 5).
Although we find no significant pattern of diversity among gene regions
strictly based on “numbers” of mutations from our McDonald-Kreitman tests
(supplementary table 1, Appendix C), silent diversity at COL1a1 (θπ), both overall
and for intron and synonymous sites independently, is significantly higher than the
average (θπ=0.1%) of previously published autosomal loci (P=0.001; table 5;
Gilad et al. 2003; Yu et al. 2003; Fischer et al. 2004; Claw et al. 2010). Given
levels of silent diversity at COL1a1, the observed positive value of silent Tajima’s
D is significantly higher than expected under a standard neutral model (P=0.006;
table 5) and is also well outside the range of previously reported values across the
genome, suggesting that there is a significant excess of intermediate-frequency SNPs at
COL1a1.
LD analyses show that the significant excess of common polymorphisms
at COL1a1 is likely the result of sequences or lineages being split into two
haplogroups, designated haplogroups “A” (found at 55% allele frequency) and
“B” (found at 45% frequency), that extend the length of our sequenced region
(fig. 6; supplementary fig. 2, Appendix C). After excluding 5 predicted
recombinant sequences to be conservative about shared variation among lineages,
these two haplogroups are defined by 62 polymorphisms, 4 of which fall within
promoter and UTRs and an additional 8 within the first intron (supplementary fig.
1, Appendix C). Though silent SNP diversity (θπ) associated with haplogroup B
chromosomes is over two-fold lower than that associated with haplogroup A
chromosomes (excluding recombinants; table 6), coalescent simulations suggest
that this difference is not unusual. However, given the number of silent SNPs in
our overall sample found at an allele frequency >5% (S=70), it is highly unusual
to find only 4 such SNPs associated with haplogroup B chromosomes
(P<0.00001) and even 25 such SNPs associated with haplogroup A chromosomes
(P=0.01; table 6). This pattern further supports the hypothesis that the high levels
of diversity found in our overall sample are due to the presence of these two high
frequency, and highly divergent haplogroups.
A neighbor-joining tree constructed from the number of differences
between chromosomes for SNPs >5% allele frequency using MEGA v. 4 (Tamura
et al. 2007) also supports the hypothesis that the haplotype associated with the
partial exon 35 duplication (hereafter referred to as the “exon duplication-bearing
haplotype”) occurred on the haplogroup A background (fig. 6). Further, no SNPs
>5% frequency are found among the 7 chromosomes bearing the exon duplication
(table 6). Even if we only use the 8 silent SNPs found among the non-
recombinant haplogroup A chromosomes as a conservative expectation of the
level of silent diversity associated with this haplogroup, coalescent simulations
reveal that it is statistically unusual to find 7 chromosomes with no associated
polymorphisms (P=0.02).
Polymorphism and Haplotype Structure Surrounding COL1a1
While high levels of diversity and LD appear to be characteristic of COL1a1, it is
possible that these patterns are typical of this region of chromosome 17 in
chimpanzees. Estimates of LD surrounding COL1a1 can also be informative
about the age and origin of specific polymorphisms that can then be used to test
evolutionary hypotheses of adaptive vs. neutral scenarios, i.e., is the exon 35
duplication consistent with positive selection? Thus, we collected additional
nucleotide sequence data for a series of 1-2 kb PCR fragments 5’ and 3’ of
COL1a1, starting ~10-kb away and spanning a total of ~180-kb (fig. 7). Because
we are primarily interested in variation in silent diversity for testing these
evolutionary hypotheses, PCR fragments were targeted to intergenic and intron
regions when available in both directions.
Comparisons of rates of recombination and patterns of LD over short
intervals within the human and chimpanzee genomes have revealed significant
differences even over the relatively short time separating these two species (e.g.,
Ptak et al. 2005; Winckler et al. 2005). Thus, here we sampled diversity from
bonobo (Pan paniscus), which has an estimated divergence time from the
chimpanzee lineage at ~0.8-1.8 My (e.g., Stone et al. 2002; Won and Hey 2005;
Becquet et al. 2007; Wegmann and Excoffier 2010), as this can provide estimates
of LD and haplotype structure to better contrast patterns seen here at COL1a1 for
P. t. verus. DNA from 13 bonobos (26 chromosomes) was used in PCR and
sequencing as described above to examine these same regions flanking COL1a1
both 5’ and 3’ (fig. 7).
In total, we have collected an additional 13,836 bp of sequence 5’ and
18,907 bp 3’ of COL1a1, with 13,114 bp and 13,286 bp constituting silent sites,
respectively (supplementary table 2, Appendix C), giving a grand total of ~50-kb
of sequence data in chimpanzees spread across ~180-kb of chromosome 17.
Outside of the COL1a1 locus in chimpanzees, silent SNP diversity (θπ) decreases
with regional estimates consistent with those previously reported for other
autosomal loci (supplementary table 2, Appendix C; Gilad et al. 2003; Yu et al.
2003; Fischer et al. 2004; Claw et al. 2010). Similarly, Tajima’s D decreases
outside of the COL1a1 locus falling within the range of previously reported
values (supplementary table 2, Appendix C). There is also reduced support for
haplogroups A and B outside of the COL1a1 locus as evidenced by only a single
SNP in the surrounding regions (~10-kb 3’ of the COL1a1 UTR) found in
significant LD with these haplogroups and by the significant increase in haplotype
diversity on either side of COL1a1 as indicated by our LDhat analysis (fig. 6;
supplementary fig. 2, Appendix C). However, even when considering the
background rate of recombination with our LDhat analysis, there are still
haplotypes with significant LD among sites >100-kb away, included among
which is the COL1a1 exon duplication-bearing haplotype (supplementary fig. 2,
Appendix C).
We may expect that an allele that has been affected by recent positive
directional selection may bear a signature of reduced genetic diversity and
unusually long-range LD (e.g., Tishkoff et al. 2001, 2007; Sabeti et al. 2005,
2006; Saunders et al. 2006). To determine if the exon duplication-bearing
haplotype is associated with unusually long-range LD, we used the Long-Range
Haplotype test of Sabeti et al. (2002) as implemented in the program Sweep v. 1.1
(Sabeti et al. 2007). Non-overlapping cores of between 3 and 10 polymorphisms
identified by the method of Gabriel et al. (2002) were generated for the COL1a1
locus (for a total of 9 cores). Using polymorphisms >5% frequency from our
entire chromosome 17 sequenced region, extended haplotype homozygosity
(EHH) was calculated for each COL1a1 core haplotype at increasing distances
(measured in kb) from the core (Sabeti et al. 2002). Because a fine-scale
recombination map is not yet available for the chimpanzee genome, we corrected
for local variation in recombination rate using the relative EHH (REHH) measure,
which compares EHH of each core haplotype to all other core haplotypes at
COL1a1 (Sabeti et al. 2005). As implemented in the significance calculator
available in Sweep, we assessed the significance of the extent of relative LD
associated with the exon duplication-bearing haplotype by generating a
distribution of REHH measured at ~85-kb from either side of each core haplotype
and asked whether REHH associated with the exon duplication-bearing haplotype
is significantly greater than that of other core haplotypes of similar frequency at
COL1a1 (i.e., 15-20% frequency). Relative LD associated with the exon
duplication-bearing haplotype does not extend significantly further than expected
given the frequency of this haplotype (supplementary fig. 3, Appendix C). In fact,
EHH does not begin to decay within our sequenced region for 3 COL1a1
haplotypes (with a maximum frequency of 22.5%), including the exon
duplication-bearing haplotype.
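To make the EHH and REHH quantities used above concrete, the following Python sketch computes them for a toy set of chromosomes. It is an illustration under stated assumptions rather than the Sweep implementation: each chromosome is represented as a hypothetical `(core_haplotype, flank)` pair, where `flank` lists alleles at markers ordered by increasing distance from the core on one side, and REHH is taken as the EHH of the tested core haplotype divided by the pooled EHH of all other core haplotypes.

```python
from collections import Counter
from math import comb


def _homozygous_pairs(flanks):
    """Return (pairs of identical extended haplotypes, total pairs)."""
    counts = Counter(flanks)
    return sum(comb(m, 2) for m in counts.values()), comb(len(flanks), 2)


def ehh(samples, core, k):
    """EHH of `core` over the first k flanking markers.

    EHH is the probability that two randomly chosen chromosomes carrying the
    core haplotype are identical at all k markers. Returns None if fewer than
    two chromosomes carry the core haplotype.
    """
    flanks = [flank[:k] for c, flank in samples if c == core]
    same, total = _homozygous_pairs(flanks)
    return None if total == 0 else same / total


def rehh(samples, core, k):
    """Relative EHH: EHH of `core` divided by pooled EHH of all other cores."""
    own = ehh(samples, core, k)
    same_other, total_other = 0, 0
    for other in {c for c, _ in samples} - {core}:
        same, total = _homozygous_pairs(
            [flank[:k] for c, flank in samples if c == other])
        same_other += same
        total_other += total
    if own is None or total_other == 0 or same_other == 0:
        return None  # undefined when the comparison set carries no information
    return own / (same_other / total_other)


# Hypothetical toy data: core haplotype "X" stays homozygous farther than "Y".
toy = [("X", "AAA"), ("X", "AAA"), ("X", "AAT"),
       ("Y", "AAA"), ("Y", "ATA"), ("Y", "TTA")]
```

In the toy data, EHH for core "X" remains 1 across the first two markers and drops to 1/3 at the third, while core "Y" has already decayed to 1/3 at the first marker, giving an REHH of 3 for "X" at that distance.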
Within bonobos, although the number of polymorphisms identified in 5’
and 3’ regions was comparable to that observed in chimpanzees, the frequency
of these polymorphisms is reduced. For example, overall silent SNP diversity (θπ)
in both regions 5’ and 3’ of COL1a1 in bonobos is approximately half that
observed in chimpanzees and results in an overall trend toward negative Tajima’s
D values among all PCR fragments (supplementary table 3, Appendix C). In fact,
given levels of silent diversity within the 3’ region, the observed negative value of
silent Tajima’s D is significantly lower than expected (P=0.02; supplementary
table 3, Appendix C) indicating that there is a skew toward low frequency
polymorphism in bonobos. As such, only 45 polymorphisms in our dataset reach
an allele frequency adequate for linkage analyses (>5% frequency). Interestingly,
similar to the chimpanzee dataset, at least two long-range haplotypes exist in
bonobos with significant LD spanning sites >100-kb away (supplementary figs. 4
and 5, Appendix C).
COL1a1 Haplotype Age Estimates
We used the method of Thomson et al. (2000) to estimate the age of haplogroups
A and B as well as of the exon duplication-bearing haplotype. As previously
described (Thomson et al. 2000; Scheinfeldt et al. 2009; Claw et al. 2010), this
age estimate (t) is based upon the relationship:
$$t = \frac{\sum_{i=1}^{n} x_i}{n\mu}$$
where $x_i$ is the number of mutational differences between the $i$th sequence and the
estimated most recent common ancestor (MRCA) of all sequences, n is the total
number of sequences in the sample, and µ is the mutation rate. Here, as in Claw et
al. (2010), µ is estimated as the number of substitutions between human and
chimpanzee divided by twice the estimated molecular divergence time between
these species (5 ± 1 My; Kumar et al. 2005). As previously mentioned,
COL1a1 human-chimpanzee divergence may be lower than expected (Stover and
Verrelli 2010), which would result in an underestimate of µ and an overestimate
of age; therefore, all ages were calculated using two estimates of µ. First, we used
a gene-specific estimate based upon the number of substitutions between human
and chimpanzee observed at COL1a1 (i.e., 92). Second, we used the number of
substitutions (excluding nonsynonymous sites) observed in our regions
surrounding COL1a1 and standardized this divergence rate by the length of our
COL1a1 sequenced region to calculate the expected number of substitutions at
this locus (i.e., 120) if it were evolving at the same rate as the surrounding
regions, which we then used to get a regional estimate of µ. A neighbor-joining
tree constructed from the number of differences at COL1a1 between
chromosomes, including the 5 haplogroup A-B recombinant chromosomes, in
MEGA v. 4 (Tamura et al. 2007) was used to determine xi (supplementary fig. 6,
Appendix C). Because we are dealing with phased haplotypes (see Materials and
Methods), only SNPs >5% allele frequency were used for these age estimates,
which will cause a slight underestimate of xi and, therefore, an underestimate of
age. Thus, to further aid the resolution of haplotype ages, we also genotyped our
bonobo samples for the partial exon duplication and the A and B haplogroups by
sequencing two PCR fragments previously used to amplify COL1a1 in
chimpanzees. The A and B haplogroups were specifically genotyped using 6 of
the segregating SNPs in the first intron of COL1a1 (positions 111-702,
supplementary fig. 1, Appendix C).
Using our gene-specific µ, we estimate the age of the MRCA of
haplogroups A and B to be 3.6 ± 0.7 My. Even using the more conservative
regional estimate of µ, we still estimate that these haplogroups split 2.8 ± 0.6 My ago,
which predates the chimpanzee-bonobo molecular divergence time (0.8-1.8 My).
Our 26 bonobo chromosomes are fixed for the haplogroup B allele for 5 of our
genotyped first intron SNPs and the haplogroup A allele for the 6th, which is also
consistent with haplogroup A and B alleles existing in the ancestral population
prior to the divergence of chimpanzee and bonobo. We further estimate the
MRCAs of haplogroup A and B chromosomes within chimpanzees to be 1.5 ± 0.3
My and 169 ± 34 Ky, respectively (or 1.1 ± 0.2 My and 130 ± 26 Ky using our
regional estimate of µ).
As previously mentioned, chromosomes with the exon duplication-bearing
haplotype are not variable within the COL1a1 locus (for SNPs >5% frequency);
however, one SNP (position 7246, supplementary fig. 1, Appendix C) separates
this haplotype from the other members of haplogroup A. As such, we estimate
that the exon duplication-bearing haplotype diverged 109 ± 22 Ky ago (or 83 ± 17 Ky
using our regional estimate of µ). Examining our entire sequenced region, 2 SNPs
are polymorphic among chromosomes bearing this haplotype (supplementary fig.
1, 7, Appendix C), which when using an estimate of µ based upon the number of
substitutions between human and chimpanzee for the entire region (i.e., 367),
gives an estimated MRCA of the exon duplication-bearing haplotype as 27 ± 5
Ky. Consistent with this recent origin of the exon duplication-bearing haplotype,
we do not find the duplication in our bonobo sample.
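The arithmetic behind the age estimates above can be sketched in Python as follows. The sketch takes µ as the number of human-chimpanzee substitutions divided by twice the divergence time (5 ± 1 My) and the age as the mean number of differences from the MRCA divided by µ. The function names are hypothetical, and setting x_i = 1 for each of the 7 duplication-bearing chromosomes (one SNP separating them from the rest of haplogroup A) is an assumption of this example rather than a value reported in the text.

```python
def mutation_rate(substitutions, divergence_my):
    """Locus-wide mutation rate (substitutions per My) from between-species divergence."""
    return substitutions / (2.0 * divergence_my)


def haplotype_age(x, mu):
    """Age (in My): mean number of differences from the MRCA divided by mu."""
    return sum(x) / (len(x) * mu)


# Assumed input: each of the 7 duplication-bearing chromosomes differs from
# the ancestral haplogroup A state by a single mutation.
x_dup = [1] * 7

# Gene-specific rate (92 substitutions) and regional rate (120 expected substitutions).
mu_gene = mutation_rate(92, 5.0)       # 9.2 substitutions per My
mu_regional = mutation_rate(120, 5.0)  # 12 substitutions per My

age_gene_ky = 1000 * haplotype_age(x_dup, mu_gene)
age_regional_ky = 1000 * haplotype_age(x_dup, mu_regional)

# Propagating the +/- 1 My uncertainty in divergence time gives the range.
age_gene_low_ky = 1000 * haplotype_age(x_dup, mutation_rate(92, 4.0))
age_gene_high_ky = 1000 * haplotype_age(x_dup, mutation_rate(92, 6.0))
```

Under these assumptions the sketch gives roughly 109 Ky with the gene-specific rate and 83 Ky with the regional rate, with the gene-specific range spanning about 87 to 130 Ky, which agrees with the estimates reported above.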
Discussion
Human and Chimpanzee Amino Acid Polymorphism
Although the COL1a1 protein has been highly conserved over the past ~450 My
of vertebrate evolution, even including no amino acid divergence between human
and chimpanzee (Stover and Verrelli 2010), there is an abundance of DAMs in
humans (Dalgleish 1997; Marini et al. 2007). While this may be consistent with
varying selective pressures over space and time, it is unlikely to be the result of
simply weak purifying selection, as discussed in Chapter 2 and especially in
Chapter 3 where we find that amino acid variation overall is not very rare. On the
other hand, the lack of amino acid variation in chimpanzees suggests strong
purifying selection within this species. It is possible that these two divergent
patterns seen in populations today simply reflect different environmental
constraints between our species, such as in locomotion, diet, and skeletal growth
periods (e.g., Larsen 1995; Abbott et al. 1996; Bogin and Smith 1996; Cotter et al.
2009; Hancock et al. 2010). Tests of gene-specific variation in a functional setting
could evaluate this hypothesis.
Polymorphism and Haplotype Structure
In contrast to amino acid polymorphism, genetic diversity at COL1a1 in
chimpanzees is high and significantly skewed toward common variants, which
segregate into two high frequency and highly divergent haplogroups (A and B).
Age estimates suggest that the variation between these haplogroups existed prior
to the divergence of chimpanzee and bonobo. On the one hand, both haplogroup
A and B alleles have become fixed at different COL1a1 sites within the bonobo
lineage. Combined with the abundance of low frequency polymorphism observed
in the regions surrounding COL1a1, this fixation may be consistent with a recent
expansion of the bonobo population, as has been previously suggested based upon
other autosomal loci (Fischer et al. 2006, but see Eriksson et al. 2004 for
mtDNA). On the other hand, variation between haplogroups A and B still remains
in chimpanzees today.
Based upon our sequenced regions surrounding COL1a1, we estimate that
LD associated with these haplogroups in chimpanzees spans ~30-40 kb.
Combined with the high frequency polymorphism that segregates between
haplogroups, this is a classic signature of balancing selection expected in a region
of low recombination (Charlesworth 2006), as is inferred for COL1a1 compared
to the surrounding regions. However, certain demographic models can also cause
similar patterns. Specifically, if two populations that differed in allele frequencies
at COL1a1 were to admix, high levels of polymorphism and LD would be
observed at this locus (e.g., Smith et al. 2001; Smith and O’Brien 2005). There
are several reasons why this possibility is unlikely here. First, as previously noted,
there is no evidence of substructure within this subspecies of chimpanzee (e.g.,
Wooding et al. 2005; Verrelli et al. 2006, 2008; Becquet et al. 2007; Claw et al.
2010; Leuenberger and Wegmann 2010). Second, patterns of diversity and LD
observed at COL1a1 are unique among previously reported studies (e.g., Stone et
al. 2002; Gilad et al. 2003; Fischer et al. 2004; Wooding et al. 2005, 2006;
Verrelli et al. 2006, 2008; Becquet et al. 2007; Claw et al. 2010), which is not
expected under a demographic model as such processes are predicted to have
genome-wide effects. Finally, given the estimated age of the split of haplogroups
A and B, it is unlikely that such a strong pattern resulting from historic population
structure would still persist today since even low levels of recombination are
expected to break apart associations among polymorphisms relatively quickly
(e.g., Clegg et al. 1980; Asmussen and Clegg 1982). We can infer that at least low
levels of recombination exist at COL1a1 in chimpanzees because: 1) we find 5
recombinant A-B chromosomes within our population sample, and 2) even though
no nonsynonymous divergence has occurred between human and chimpanzee,
there are typical levels of synonymous divergence demonstrating that these sites
have become unlinked over time (Stover and Verrelli 2010). As such, we would
expect to find less haplotype structure at COL1a1 if these patterns were purely
due to demography. Thus, given our current understanding of chimpanzee
population genetics, there is a strong possibility that the pattern here reflects
ancient balancing selection.
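The expectation that recombination erodes associations among polymorphisms can be illustrated with the standard neutral two-locus result, in which linkage disequilibrium decays by a factor of (1 - r) each generation. The sketch below is only an illustration of that textbook relationship: the function names are hypothetical and the recombination fraction used is an arbitrary placeholder, not an estimate for COL1a1.

```python
from math import log


def ld_fraction_remaining(r, generations):
    """Fraction of the initial LD coefficient D left after the given
    number of generations, with recombination fraction r per generation
    and no selection, drift, or migration."""
    return (1.0 - r) ** generations


def generations_to_fraction(r, fraction):
    """Generations needed for LD to fall to the given fraction of its
    starting value (requires 0 < r < 1 and 0 < fraction < 1)."""
    return log(fraction) / log(1.0 - r)


# Placeholder value: r = 0.001 between two sites.
half_life = generations_to_fraction(0.001, 0.5)
print(f"LD halves in about {half_life:.0f} generations at r = 0.001")
```

Under this model even a small r removes most LD over the timescale separating ancient haplogroups, which is the reasoning behind expecting less haplotype structure under a purely demographic explanation.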
Polymorphic Exon Duplication
Surprisingly, a partial duplication of exon 35, which falls within the triple-helix
domain of type I collagen, has also reached a relatively high frequency of 17.5%
in our chimpanzee population. Within humans, mutations that alter the length of
fibrillar collagens in general, let alone type I collagen, and particularly the length of
the triple-helix domain, are exceedingly rare, as they are often lethal or
result in severely deleterious phenotypes (e.g., Barsh et al. 1985; Cohn et al. 1993;
Raff et al. 2000; Cabral et al. 2003; Bodian et al. 2009). As is expected given their
deleterious nature, length variants of the COL1a1 protein have not been found in
humans among samples of the natural population (Chan et al. 2008; Stover and
Verrelli, Chapter 3), which raises the question of how a COL1a1-exon duplication
could have risen to high frequency in chimpanzees.
The rarity of polymorphism among alleles with the exon 35 duplication,
the long-range LD associated with its haplotype, and the estimated age of these
alleles could suggest that the exon duplication rose to its current frequency
relatively recently and rapidly, as might be expected under positive directional
selection (e.g., Sabeti et al. 2006). However, other haplotypes at COL1a1 of
similar frequency also have little associated polymorphism (e.g., a 22.5%
frequency haplotype on the haplogroup B background that only has 5 associated
polymorphisms within our entire sequenced region; fig. 6). Further, LD associated
with the exon duplication-bearing haplotype does not extend beyond that of other
COL1a1 haplotypes of similar frequency. However, within humans, haplotype
blocks can easily extend beyond the ~180-kb length of our sequenced region in
chimpanzees, particularly when affected by directional selection (e.g., Sabeti et al.
2002, 2005; Voight et al. 2006; Tishkoff et al. 2007; Enard et al. 2010). Thus, one
could argue that our sampled region of chromosome 17 does not span enough
distance to be able to identify a signature of positive selection based on patterns of
LD, which is supported by our EHH comparisons that show homozygosity does
not decay within our sequenced region for 3 COL1a1 haplotypes. Thus, while it is
unclear if the pattern here supports an adaptive model for this duplication, it is
highly unlikely that this duplication is deleterious, which in itself is a surprising
result given the impact COL1a1 length variants have in humans.
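For readers unfamiliar with the EHH comparison mentioned above, the statistic can be sketched as the probability that two randomly chosen chromosomes carrying the same core haplotype are identical across all sites out to a given distance (following the definition in Sabeti et al. 2002). The code below is a minimal illustration with hypothetical toy data and a hypothetical function name; it is not the software used for the analyses in this chapter.

```python
from collections import Counter
from math import comb


def ehh(core_carriers, core_index, end_index):
    """Extended haplotype homozygosity between two site indices.

    core_carriers: equal-length haplotype strings that all carry the
    same core allele. Returns the proportion of chromosome pairs that
    are identical at every site from core_index to end_index inclusive.
    """
    n = len(core_carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    lo, hi = sorted((core_index, end_index))
    classes = Counter(h[lo:hi + 1] for h in core_carriers)
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


# Toy example: homozygosity decays as the window extends from site 0.
toy = ["AAAA", "AAAA", "AAAT", "AACT"]
decay = [ehh(toy, 0, end) for end in range(4)]
print(decay)

# Identical chromosomes give EHH = 1 at every distance (no decay).
flat = [ehh(["ACGT"] * 5, 0, end) for end in range(4)]
print(flat)
```

The second case mirrors the situation described in the text: when homozygosity does not decay within the sequenced window, the window is too short to distinguish among haplotypes on the basis of LD extent.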
Functional Implications and Future Directions
Although the pattern consistent with balancing selection does not readily point to
a “functional” SNP, there are numerous polymorphisms that could have
functional implications. First and foremost is the partial exon duplication. Even
though intact intron splice sites border the exon duplication, this does not
necessarily mean that the exon is encoded as part of the mature protein. Rather,
the entire duplicated region may simply be excised during mRNA processing as
part of the original intron 34, which would further support a neutral explanation
for the high frequency of this polymorphism.
Several other possibilities exist within COL1a1, including 4
polymorphisms in the promoter and UTR, which are regions known to affect gene
expression (e.g., Wray et al. 2003; Haygood et al. 2007). Additionally, 8
polymorphisms segregate between these haplogroups in the first intron in which
numerous transcription factor binding sites have been identified (Bornstein et al.
1987; Vergeer et al. 2000; Jin, van’t Hof et al. 2009) such that these
polymorphisms may also affect COL1a1 expression. As unusual patterns of
variation associated with noncoding regions are also seen in human populations, it
remains to be seen how this variation, adaptive or not, reflects functional variation
within and between species. In addition, functional analyses to determine whether
the exon duplication is transcribed and possibly translated would have a dramatic
impact on our assessment of how protein diversity can evolve at COL1a1.
Nonetheless, our results suggest significantly unusual patterns for humans,
chimpanzees, and bonobos for a gene that is otherwise believed to be highly
conserved. This raises the question of how population diversity levels may look for
other primates, both closely and distantly related. As these comparative
population and species genetic analyses continue, it is not unreasonable to
speculate that similar bone strength variation and even skeletal disorders exist
within other primate species, as patterns of similar variation among these groups
predict. Two clear examples, the human Sp1 polymorphism and the chimpanzee exon 35
duplication, already exist from population samples. Thus, studying skeletal
phenotypes, other than simply rare diseases, using non-human primates as
evolutionary models becomes an attractive possibility for medical intervention.
Table 5
COL1a1 Diversity Estimates by Gene Region
Region              Sites (a)   S (b)   θπ (c)   D (d)
Total               16,989      92      0.21     2.44
Promoter            1,223       4       0.14     1.95
UTR (e)             263         1       0.19     1.66
First intron (f)    1,462       11      0.30     2.05
Other introns (f)   9,470       64      0.26     2.30
Synonymous          1,207       12      0.41     2.26
Nonsynonymous       3,168       0       0        n/a
Silent (g)          10,677      76      0.28     2.37
a Number of nucleotides
b Number of SNPs
c Average number of pairwise differences between sequences (%)
d Tajima’s D statistic
e 5’ and 3’ mRNA untranslated regions (UTR)
f Excludes splice sites
g Includes synonymous and intron sites, excluding the first intron and splice sites
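As a reminder of how the summary statistics in Table 5 are defined, the sketch below computes the number of segregating sites (S), the average number of pairwise differences, and Tajima's D (Tajima 1989) from a set of aligned haplotypes. It is a minimal illustration on hypothetical toy data, with hypothetical function names, and is not the software used to generate the table. Note that the pairwise value returned here is a per-sequence count; dividing by the number of sites and multiplying by 100 gives the percentage scale used for θπ in the table.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(haps):
    """Number of alignment columns with more than one allele."""
    return sum(len(set(col)) > 1 for col in zip(*haps))


def pairwise_differences(haps):
    """Average number of differences over all pairs of sequences."""
    pairs = list(combinations(haps, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def tajimas_d(haps):
    """Tajima's D; returns None when there are no segregating sites."""
    n, s = len(haps), segregating_sites(haps)
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pairwise_differences(haps) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Two divergent haplotypes at equal frequency: excess of common variants, D > 0.
two_groups = ["ACGT", "ACGT", "TGCA", "TGCA"]
# Only singleton variants: excess of rare variants, D < 0.
singletons = ["AAAA", "AAAT", "AATA", "ATAA"]
print(tajimas_d(two_groups), tajimas_d(singletons))
```

The first toy data set mimics, in miniature, the pattern of two high-frequency divergent haplogroups that produces the positive D values reported for chimpanzees.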
Table 6
COL1a1 Haplogroup-Specific Diversity Estimates
Haplogroup                       n (a)   S (b)   θπ (c)
A                                22      25      0.08
A (excluding recombinants) (d)   17      8       0.04
B                                18      4       0.02
Exon duplication-bearing         7       0       0
a Number of chromosomes
b Number of silent-site SNPs >5% allele frequency
c Average number of pairwise differences between sequences (%)
d Haplogroup A members excluding 5 chromosomes predicted to have recombined with Haplogroup B
Fig. 5. Diagram of a 124-bp duplication in chimpanzees involving a partial
duplication of COL1a1 exon 35 (36 bp) that precedes the normal, full-length 54-
bp exon 35. A partial duplication of intron 34 separates these exons. “AG” and
“GT” indicate intact intron splice sites bordering the partial exon duplication.
[Fig. 5 diagram labels: intron 34; partial repeat comprising a 36-bp partial exon 35 and an 88-bp partial intron 34; 54-bp exon 35; intron 35; AG and GT splice sites]
Fig. 6. Inferred haplotypes for all polymorphisms with an allele frequency >5%.
Chromosomes with identical haplotypes have been combined into one row with
the number per haplotype listed on the side of the figure. The derived allele for
each site, as inferred from human-chimpanzee-macaque-orangutan contrasts is
represented with a grey box. The 85 polymorphisms at the COL1a1 locus are
indicated with a dashed line at the top of the figure. “5’ region” refers to 35
polymorphisms found within our additional sequenced fragments 5’ of COL1a1
and “3’ region” refers to 46 polymorphisms found within our additional fragments
3’ of COL1a1 as shown in fig. 7. Haplotypes belonging to COL1a1 haplogroup A
are indicated with a solid line on the side of the figure with the first two rows
being haplotypes with the exon duplication (see Results for more information);
remaining haplotypes belong to COL1a1 haplogroup B. See supplementary fig. 1
(Appendix C) for polymorphism positions and allelic states.
Fig. 7. Gene map of the chromosome 17 region surrounding COL1a1, drawn to
scale, with arrows indicating the orientation of genes in the 5’ to 3’ direction (note
this is the reverse orientation of genes in the genome in order to show COL1a1
from left to right). Positions are numbered according to the first base of the first
exon of COL1a1 as position 1 as determined from the human genome reference
sequence (NCBI build 36.1). Solid lines above the position scale indicate 1-2 kb
regions 5’ (to the left) and 3’ (to the right) of COL1a1 that were PCR amplified
and sequenced in 40 P. t. verus and 26 P. paniscus chromosomes. See Results for
more information.
[Fig. 7 gene map labels: COL1a1, TMEM92, SGCA, PPP1R9B, SAMD; x-axis: Nucleotide Position (kb), from -100 to 100]
CHAPTER 5: CONCLUSION
The main objective of this research was to examine the recent and ancient
evolutionary history of the COL1a1 gene, which codes for the primary subunit of
type I collagen, the main structural and most abundant protein in mammals, to
gain new perspectives on the molecular origins of human skeletal disease related
to reduced bone strength. The molecular variation at this gene was characterized
using three timescales: historically over the past ~450 million years (My) of
vertebrate evolution, recently within and among human populations, and within
the past 4-6 My since the divergence of human and chimpanzee. These timescales
allow for fine-scale resolution of evolutionary change at COL1a1, generating an
expectation based upon historic change and a means for estimating when shifts
may have occurred in selective pressures affecting this locus that could explain
the prevalence of disease-associated mutations (DAMs) in humans.
As discussed in Chapter 2, the COL1a1 amino acid sequence has been
highly conserved during vertebrate evolution; however, temporal and spatial
variation in selective constraint is still apparent among protein domains, which
may be contributing to bone phenotypic variation among vertebrates. Further, it
was shown that this variation in selective constraint can be used to predict the
phenotypic severity of human DAMs. In addition to the COL1a1 protein, this
locus is characterized by the conservation of unusually short, GC-rich introns,
which may be in response to strong stabilizing selection to maintain increased
gene expression. This functional constraint has even led to reduced human-chimpanzee
intron divergence relative to expectations for a GC-rich region.
Though historically considered to be non-functional, COL1a1 introns and the
variation within them may actually impact bone-related phenotypes. Given these
inferences from a molecular evolutionary model, it would be of particular interest
to determine if these patterns of functional constraint are consistent genome-wide
among genes known to be highly expressed across vertebrate lineages.
Specifically, compared to neutral expectations, is reduced human-chimpanzee
intron divergence typical of highly-expressed genes? If so, this pattern would
offer further evidence in support of the importance of intron structure and
composition to the efficiency of gene expression, which would greatly impact
future research in the identification of DAMs in general.
As discussed in Chapter 3, in contrast to historic divergence, no
significant reduction in intron variation was observed recently within humans.
Nonetheless, significant haplotype differentiation was found among populations,
including the absence of an entire haplotype-block from Asian samples. Increased
haplotype diversity provides evidence of a gene region with an increased rate of
recombination, the location of which suggests that the 5’ region of COL1a1 has
been evolutionarily unlinked from the 3’ region, allowing for independence
between the majority of coding and promoter variation. These results have
important implications for the design of future association studies of bone
phenotypic variation in humans. First, to accurately measure potential
associations with the COL1a1 locus, these studies should use polymorphisms both
5’ and 3’ of the region with elevated recombination. Second, genotype-phenotype
results based upon a single ethnic group cannot be extrapolated to other groups
given the high levels of population differentiation at COL1a1. Because this
differentiation primarily involves noncoding regions, and given the historic
selective constraint on COL1a1 intron structure and composition, it will be
interesting to determine if noncoding variation in general has functional
consequences that contribute to phenotypic differences in bone strength and
disease susceptibility among human populations.
Although the COL1a1 protein has been highly conserved historically,
>9% of the individuals from a random sampling of the natural human population
carry COL1a1 amino acid variation. Based on the evolutionary site model
discussed in Chapter 2, these amino acid variants are predicted to have at least
some impact on bone-related phenotypes. As with vertebrate comparisons, these
data are consistent with spatial variation in selective constraints at COL1a1, with
highly deleterious mutations being rapidly removed by purifying selection and
others of less severe phenotypic impacts remaining polymorphic within humans
and, therefore, likely contributing to population variation in skeletal phenotypes.
As discussed in Chapter 4, the absence of amino acid variation in chimpanzees, as
well as the lack of amino acid divergence between human and chimpanzee,
suggests that the abundance of DAMs and high proportion of amino acid variation
observed in the natural population in humans could be indicative of recent shifts
in selective pressures affecting COL1a1 within the past 4-6 My in the human
lineage. Thus, it would be interesting to sample additional primate populations to
see how this trend compares across species to address whether this level of
variation is truly unique to humans.
Other than protein variation, it appears that both humans and chimpanzees
share unusual patterns for noncoding COL1a1 variation. Specifically, population
variation in the 5’ region in humans, including noncoding variants like the Sp1
polymorphism, and long-range, seemingly ancient haplotype differentiation in
chimpanzees, were detected. This genetic variation may cause expression
differences across human populations and needs to be explored functionally; the
patterns in chimpanzees are equally surprising and warrant similar analyses in this species
to determine whether both species also share bone strength differences. While this is pure
speculation at this point, these comparative population genetic analyses are,
nonetheless, the first to identify these noncoding patterns of variation.
In addition, a partial exon duplication has reached a relatively high
frequency in chimpanzees, which, in contrast to COL1a1 exon length variants
found in humans, suggests that this duplication is not deleterious. As such, it is
important to determine if it is actually encoded, the resolution of which could
greatly improve our understanding of the evolution of fibrillar collagen gene
structure in which exon duplication was once adaptive in the proliferation of
collagen genes, but is deleterious in humans today. Unfortunately, no source of viable tissue
for the extraction of COL1a1 mRNA from a chimpanzee individual carrying this
duplication is currently known. If this duplication can be shown to be
transcribed and possibly even translated, it would represent the first example of viable
exon variation within populations and provide an exceptionally valuable model with
which to study how type I collagen has historically evolved its repetitive
structure. In addition, its adaptive potential also may shed light on how to develop
synthetic treatments to increase bone strength, which are needed for a large
proportion of the human population.
Overall, this research provides new insight into the molecular origins of
skeletal disease in humans as it relates to the COL1a1 locus. Specifically, patterns
of genetic variation are consistent with a history of temporal and spatial variation
in purifying selection, not only affecting coding regions, but also noncoding
regions of COL1a1. From a clinical perspective, noncoding regions of this locus
represent important targets for future investigations in identifying genetic
variation that impacts bone-related phenotypic differences within and among
populations and species. Because COL1a1 is only one of dozens of genes
associated with variation in bone strength and skeletal disease susceptibility, it
will be interesting to determine if similar evolutionary histories are common
among other candidate genes. This work shows that an understanding of even
single amino acid and nucleotide changes at the sequence level over time can
radically alter our perception of bone-strength variation; thus, programs that
involve molecular evolutionary analyses within and between populations and
species will prove to be successful in modeling functional bone-related
phenotypes in the population today.
LITERATURE CITED
Abate N, Chandalia M. 2003. The impact of ethnicity on type 2 diabetes. J. Diabetes Complications 17:39-58.
Abbott S, Trinkaus E, Burr DB. 1996. Dynamic bone remodeling in later Pleistocene fossil hominids. Am. J. Phys. Anthropol. 99:585-601.
Adami S, Bertoldo F, Braga V, Fracassi E, Gatti D, Gandolini G, Minisola S, Rini GB. 2009. 25-hydroxy vitamin D levels in healthy premenopausal women: Association with bone turnover markers and bone mineral density. Bone 45:423-426.
Akashi H, Schaeffer SW. 1997. Natural selection and the frequency distributions of “silent” DNA polymorphism in Drosophila. Genetics 146:295-307.
Aouacheria A, Cluzel C, Lethias C, Gouy M, Garrone R, Exposito JY. 2004. Invertebrate data predict an early emergence of vertebrate fibrillar collagen clades and an anti-incest model. J. Biol. Chem. 279:47711-47719.
Asmussen MA, Clegg MT. 1982. Rates of decay of linkage disequilibrium under 2-locus models of selection. J. Math. Biol. 14:37-70.
Auton A, Bryc K, Boyko AR, et al. (13 co-authors). 2009. Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res. 19:795-803.
Bachrach LK, Hastie T, Wang MC, Narasimhan B, Marcus R. 1999. Bone mineral acquisition in healthy Asian, Hispanic, black, and Caucasian youth: A longitudinal study. J. Clin. Endocrinol. Metab. 84:4702-4712.
Bandres E, Pombo I, Gonzalez-Huarriz M, Rebollo A, Lopez G, Garcia-Foncillas J. 2005. Association between bone mineral density and polymorphisms of the VDR, ERalpha, COL1A1 and CTR genes in Spanish postmenopausal women. J. Endocrinol. Invest. 28:312-321.
Barrett-Connor E, Siris ES, Wehren LE, Miller PD, Abbott TA, Berger ML, Santora AC, Sherwood LM. 2005. Osteoporosis and fracture risk in women of different ethnic groups. J. Bone Miner. Res. 20:185-194.
Barsh GS, Roush CL, Bonadio J, Byers PH, Gelinas RE. 1985. Intron-mediated recombination may cause a deletion in an alpha 1 type I collagen chain in a lethal form of osteogenesis imperfecta. Proc. Natl. Acad. Sci. U. S. A. 82:2870-2874.
Basel D, Steiner RD. 2009. Osteogenesis imperfecta: Recent findings shed new light on this once well-understood condition. Genet. Med. 11:375-385.
Baxter-Jones ADG, Burrows M, Bachrach LK, Lloyd T, Petit M, Macdonald H, Mirwald RL, Bailey D, McKay H. 2010. International longitudinal pediatric reference standards for bone mineral content. Bone 46:208-216.
Beavan S, Prentice A, Dibba B, Yan L, Cooper C, Ralston SH. 1998. Polymorphism of the collagen type Ialpha1 gene and ethnic differences in hip-fracture rates. N. Engl. J. Med. 339:351-352.
Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. 2007. Genetic structure of chimpanzee populations. PLoS Genet. 3:e66.
Bernard MP, Chu ML, Myers JC, Ramirez F, Eikenberry EF, Prockop DJ. 1983. Nucleotide sequences of complementary deoxyribonucleic acids for the pro alpha 1 chain of human type I procollagen. Statistical evaluation of structures that are conserved during evolution. Biochemistry 22:5213-5223.
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN. 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74:1111-1120.
Black A, Tilmont EM, Handy AM, Scott WW, Shapses SA, Ingram DK, Roth GS, Lane MA. 2001. A nonhuman primate model of age-related bone loss: A longitudinal study in male and premenopausal female rhesus monkeys. Bone 28:295-302.
Blekhman R, Man O, Herrmann L, Boyko AR, Indap A, Kosiol C, Bustamante CD, Teshima KM, Przeworski M. 2008. Natural selection on genes that underlie human disease susceptibility. Curr. Biol. 18:883-889.
Blekhman R, Oshlack A, Chabot AE, Smyth GK, Gilad Y. 2008. Gene regulation in primates evolves under tissue-specific selection pressures. PLoS Genet. 4:e1000271.
Bodian DL, Chan TF, Poon A, Schwarze U, Yang K, Byers PH, Kwok PY, Klein TE. 2009. Mutation and polymorphism spectrum in osteogenesis imperfecta type II: Implications for genotype-phenotype relationships. Hum. Mol. Genet. 18:463-471.
Bodian DL, Madhan B, Brodsky B, Klein TE. 2008. Predicting the clinical lethality of osteogenesis imperfecta from collagen glycine mutations. Biochemistry 47:5424-5432.
Bogin B, Smith BH. 1996. Evolution of the human life cycle. Am. J. Hum. Biol. 8:703-716.
Boot-Handford RP, Tuckwell DS. 2003. Fibrillar collagen: The key to vertebrate evolution? A tale of molecular incest. Bioessays 25:142-151.
Bornstein P, McKay J, Morishima JK, Devarayalu S, Gelinas RE. 1987. Regulatory elements in the first intron contribute to transcriptional control of the human alpha 1(I) collagen gene. Proc. Natl. Acad. Sci. U. S. A. 84:8869-8873.
Boyko AR, Williamson SH, Indap AR, et al. (14 co-authors). 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4:e1000083.
Brown LB, Streeten EA, Shapiro JR, McBride D, Shuldiner AR, Peyser PA, Mitchell BD. 2005. Genetic and environmental influences on bone mineral density in pre- and post-menopausal women. Osteoporosis Int. 16:1849-1856.
Burrows NP. 1999. The molecular genetics of the Ehlers-Danlos syndrome. Clin. Exp. Dermatol. 24:99-106.
Bustamante CD, Fledel-Alon A, Williamson S, et al. (14 co-authors). 2005. Natural selection on protein-coding genes in the human genome. Nature 437:1153-1157.
Byers PH, Wallis GA, Willing MC. 1991. Osteogenesis imperfecta - translation of mutation to phenotype. J. Med. Genet. 28:433-442.
Cabral WA, Mertts MV, Makareeva E, Colige A, Tekin M, Pandya A, Leikin S, Marini JC. 2003. Type I collagen triplet duplication mutation in lethal osteogenesis imperfecta shifts register of alpha chains throughout the helix and disrupts incorporation of mutant helices into fibrils and extracellular matrix. J. Biol. Chem. 278:10006-10012.
Campbell MC, Tishkoff SA. 2008. African genetic diversity: Implications for human demographic history, modern human origins, and complex disease mapping. Annu. Rev. Genomics Hum. Genet. 9:403-433.
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. 2002. Selection for short introns in highly expressed genes. Nat. Genet. 31:415-418.
Cavalli-Sforza LL, Feldman MW. 2003. The application of molecular genetic approaches to the study of human evolution. Nat. Genet. 33:266-275.
Cerroni AM, Tomlinson GA, Turnquist JE, Grynpas MD. 2000. Bone mineral density, osteopenia, and osteoporosis in the rhesus macaques of cayo santiago. Am. J. Phys. Anthropol. 113:389-410.
Chamary JV, Hurst LD. 2005. Evidence for selection on synonymous mutations
affecting stability of mRNA secondary structure in mammals. Genome Biol. 6:R75.
Chan TF, Poon A, Basu A, Addleman NR, Chen J, Phong A, Byers PH, Klein TE,
Kwok PY. 2008. Natural variation in four human collagen genes across an ethnically diverse population. Genomics 91:307-314.
Charlesworth D, Charlesworth B, Morgan MT. 1995. The pattern of neutral
molecular variation under the background selection model. Genetics 141:1619-1632.
Charlesworth D. 2006. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2:e64.
Cheung VG, Spielman RS. 2009. Genetics of human gene expression: Mapping DNA variants that influence gene expression. Nat. Rev. Genet. 10:595-604.
Chimpanzee Sequencing Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69-87.
Chu ML, Dewet W, Bernard M, Ramirez F. 1985. Fine-structural analysis of the human pro-alpha-1(i) collagen gene - promoter structure, alui repeats, and polymorphic transcripts. J. Biol. Chem. 260:2315-2320.
Claw KG, Tito RY, Stone AC, Verrelli BC. 2010. Haplotype structure and
divergence at human and chimpanzee serotonin transporter and receptor genes: Implications for behavioral disorder association analyses. Mol. Biol. Evol. 27:1518-1529.
Clegg MT, Kidwell JF, Horch CR. 1980. Dynamics of correlated genetic systems. V. rates of decay of linkage disequilibria in experimental populations of DROSOPHILA MELANOGASTER. Genetics 94:217-234.
Cohen MM, Jr. 2006. The new bone biology: Pathologic, molecular, and clinical correlates. Am. J. Med. Genet. A. 140:2646-2706.
Cohn DH, Zhang XM, Byers PH. 1993. Homology-mediated recombination between type-i collagen gene exons results in an internal tandem duplication and lethal osteogenesis imperfecta. Hum. Mutat. 2:21-27.
Comeron JM. 2004. Selective and mutational patterns associated with gene
expression in humans: Influences on synonymous composition and intron presence. Genetics 167:1293-1304.
Cotter MM, Simpson SW, Latimer BM, Hernandez CJ. 2009. Trabecular
microarchitecture of hominoid thoracic vertebrae. Anat. Rec. (Hoboken) 292:1098-1106.
Crawford DC, Bhangale T, Li N, Hellenthal G, Rieder MJ, Nickerson DA,
Stephens M. 2004. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36:700-706.
Currey JD. 1987. The evolution of the mechanical properties of amniote bone. J. Biomech. 20:1035-1044.
Dalgleish R. 1997. The human type I collagen mutation database. Nucleic Acids Res. 25:181-187.
Dohi Y, Iki M, Ohgushi H, Gojo S, Tabata S, Kajita E, Nishino H, Yonemasu K. 1998. A novel polymorphism in the promoter region for the human osteocalcin gene: The possibility of a correlation with bone mineral density in postmenopausal japanese women. J. Bone Miner. Res. 13:1633-1639.
Dvornyk V, Liu XH, Shen H, et al. (13 co-authors). 2003. Differentiation of
caucasians and chinese at bone mass candidate genes: Implication for ethnic difference of bone mass. Ann. Hum. Genet. 67:216-227.
Efstathiadou Z, Tsatsoulis A, Ioannidis JP. 2001. Association of collagen ialpha 1
Sp1 polymorphism with the risk of prevalent fractures: A meta-analysis. J. Bone Miner. Res. 16:1586-1592.
Enard D, Depaulis F, Roest Crollius H. 2010. Human and non-human primate genomes share hotspots of positive selection. PLoS Genet. 6:e1000840.
Eriksson J, Hohmann G, Boesch C, Vigilant L. 2004. Rivers influence the
population genetic structure of bonobos (pan paniscus). Mol. Ecol. 13:3425-3435.
Evans PD, Mekel-Bobrov N, Vallender EJ, Hudson RR, Lahn BT. 2006.
Evidence that the adaptive allele of the brain size gene microcephalin introgressed into homo sapiens from an archaic homo lineage. Proc. Natl. Acad. Sci. U. S. A. 103:18178-18183.
Exposito JY, Cluzel C, Garrone R, Lethias C. 2002. Evolution of collagens. Anat. Rec. 268:302-316.
Fang Y, Van Meurs JBJ, Bergink AP, Hofman A, Van Duijn CM, Van Leeuwen JP, Ap Pols H, Uitterlinden AG. 2003. Cdx-2 polymorphism in the promoter region of the human vitamin D receptor gene determines susceptibility to fracture in the elderly. J. Bone Miner. Res. 18:1632-1641.
Fischer A, Pollack J, Thalmann O, Nickel B, Paabo S. 2006. Demographic history and genetic differentiation in apes. Curr. Biol. 16:1133-1138.
Fischer A, Wiebe V, Paabo S, Przeworski M. 2004. Evidence for a complex demographic history of chimpanzees. Mol. Biol. Evol. 21:799-808.
Fullerton SM, Bernardo Carvalho A, Clark AG. 2001. Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18:1139-1142.
Gabriel SB, Schaffner SF, Nguyen H, et al. (18 co-authors). 2002. The structure of haplotype blocks in the human genome. Science 296:2225-2229.
Garcia-Giralt N, Enjuanes A, Bustamante M, Mellibovsky L, Nogues X, Carreras R, Diez-Perez A, Grinberg D, Balcells S. 2005. In vitro functional assay of alleles and haplotypes of two COL1A1-promoter SNPs. Bone 36:902-908.
Garcia-Giralt N, Nogues X, Enjuanes A, Puig J, Mellibovsky L, Bay-Jensen A,
Carreras R, Balcells S, Diez-Perez A, Grinberg D. 2002. Two new single-nucleotide polymorphisms in the COL1A1 upstream regulatory region and their relationship to bone mineral density. J. Bone Miner. Res. 17:384-393.
Garrigan D, Hammer MF. 2006. Reconstructing human origins in the genomic
era. Nat. Rev. Genet. 7:669-680.
Gazave E, Marques-Bonet T, Fernando O, Charlesworth B, Navarro A. 2007. Patterns and rates of intron divergence between humans and chimpanzees. Genome Biol. 8:R21.
Ge B, Pokholok DK, Kwan T, et al. (27 co-authors). 2009. Global patterns of cis
variation in human cells revealed by high-density allelic expression analysis. Nat. Genet. 41:1216-U78.
Gelse K, Poschl E, Aigner T. 2003. Collagens--structure, function, and biosynthesis. Adv. Drug Deliv. Rev. 55:1531-1546.
Gilad Y, Bustamante CD, Lancet D, Paabo S. 2003. Natural selection on the olfactory receptor gene family in humans and chimpanzees. Am. J. Hum. Genet. 73:489-501.
Gong G, Haynatzki G, Haynatzka V, Howell R, Kosoko-Lasaki S, Fu YX, Yu F,
Gallagher JC, Wilson MR. 2006. Bone mineral density-affecting genes in africans. J. Natl. Med. Assoc. 98:1102-1108.
Gong G, Haynatzki G. 2003. Association between bone mineral density and
candidate genes in different ethnic populations and its implications. Calcif. Tissue Int. 72:113-123.
Grant SFA, Reid DM, Blake G, Herd R, Fogelman I, Ralston SH. 1996. Reduced
bone density and osteoporosis associated with a polymorphic Sp1 binding site in the collagen type I alpha 1 gene. Nat. Genet. 14:203-205.
Gueguen R, Jouanny P, Guillemin F, Kuntz C, Pourel J, Siest G. 1995.
Segregation analysis and variance-components analysis of bone-mineral density in healthy families. J. Bone Miner. Res. 10:2017-2022.
Gunji H, Hosaka K, Huffman MA, Kawanaka K, Matsumoto-Oda A, Hamada Y,
Nishida T. 2003. Extraordinarily low bone mineral density in an old female chimpanzee (pan troglodytes schweinfurthii) from the mahale mountains national park. Primates 44:145-149.
Hackenberg M, Bernaola-Galvan P, Carpena P, Oliver JL. 2005. The biased
distribution of alus in human isochores might be driven by recombination. J. Mol. Evol. 60:365-377.
Hajjar I, Kotchen JM, Kotchen TA. 2006. Hypertension: Trends in prevalence,
incidence, and control. Annu. Rev. Public Health 27:465-490.
Han KO, Moon IG, Hwang CS, Choi JT, Yoon HK, Min HK, Han IK. 1999. Lack of an intronic sp1 binding-site polymorphism at the collagen type I alpha 1 gene in healthy korean women. Bone 24:135-137.
Hancock AM, Witonsky DB, Ehler E, et al. (11 co-authors). 2010. Colloquium
paper: Human adaptations to diet, subsistence, and ecoregion are due to subtle shifts in allele frequency. Proc. Natl. Acad. Sci. U. S. A. 107 Suppl 2:8924-8930.
Havill LM, Mahaney MC, Cox LA, Morin PA, Joslyn G, Rogers J. 2005. A
quantitative trait locus for normal variation in forearm bone mineral density in pedigreed baboons maps to the ortholog of human chromosome 11q. J. Clin. Endocrinol. Metab. 90:3638-3645.
Havill LM, Mahaney MC, Czerwinski SA, Carey KD, Rice K, Rogers J. 2003. Bone mineral density reference standards in adult baboons (papio hamadryas) by sex and age. Bone 33:877-888.
Haygood R, Fedrigo O, Hanson B, Yokoyama KD, Wray GA. 2007. Promoter
regions of many neural- and nutrition-related genes have experienced positive selection during human evolution. Nat. Genet. 39:1140-1144.
Hedges SB, Kumar S. 2002. Genomics. vertebrate genomes compared. Science 297:1283-1285.
Hellmann I, Ebersberger I, Ptak SE, Paabo S, Przeworski M. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72:1527-1535.
Hernandez RD, Williamson SH, Bustamante CD. 2007. Context dependence,
ancestral misidentification, and spurious signatures of natural selection. Mol. Biol. Evol. 24:1792-1800.
Hildebrand KA, Gallant-Behm CL, Kydd AS, Hart DA. 2005. The basics of soft
tissue healing and general factors that influence such healing. Sports Med. Arthrosc. 13:136-144.
Ho NC, Jia L, Driscoll CC, Gutter EM, Francomano CA. 2000. A skeletal gene database. J. Bone Miner. Res. 15:2095-2122.
Hudson RR, Slatkin M, Maddison WP. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.
Hudson RR. 2000. A new statistic for detecting genetic differentiation. Genetics 155:2011-2014.
Hudson RR. 2001. Two-locus sampling distributions and their application. Genetics 159:1805-1817.
Hudson RR. 2002. Generating samples under a wright-fisher neutral model of genetic variation. Bioinformatics 18:337-338.
Hurst LD, McVean G, Moore T. 1996. Imprinted genes have few and small introns. Nat. Genet. 12:234-237.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
Ioannidis JP, Ng MY, Sham PC, et al. (28 co-authors). 2007. Meta-analysis of genome-wide scans provides evidence for sex- and site-specific regulation of bone mass. J. Bone Miner. Res. 22:173-183.
Jiang H, Lei SF, Xiao SM, Chen Y, Sun X, Yang F, Li LM, Wu S, Deng HW.
2007. Association and linkage analysis of COL1A1 and AHSG gene polymorphisms with femoral neck bone geometric parameters in both caucasian and chinese nuclear families. Acta Pharmacol. Sin. 28:375-381.
Jin H, Stewart TL, Hof RV, Reid DM, Aspden RM, Ralston S. 2009. A rare
haplotype in the upstream regulatory region of COL1A1 is associated with reduced bone quality and hip fracture. J. Bone Miner. Res. 24:448-454.
Jin H, van't Hof RJ, Albagha OM, Ralston SH. 2009. Promoter and intron 1
polymorphisms of COL1A1 interact to regulate transcription and susceptibility to osteoporosis. Hum. Mol. Genet. 18:2729-2738.
Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
Kasowski M, Grubert F, Heffelfinger C, et al. (17 co-authors). 2010. Variation in transcription factor binding among humans. Science 328:232-235.
Kaufman J, Ostertag A, Saint-Pierre A, Cohen-Solal M, Boland A, Van Pottelbergh I, Toye K, de Vernejoul M, Martinez M. 2008. Genome-wide linkage screen of bone mineral density (BMD) in european pedigrees ascertained through a male relative with low BMD values: Evidence for quantitative trait loci on 17q21-23, 11q12-13, 13q12-14, and 22q11. J. Clin. Endocrinol. Metab. 93:3755-3762.
Kikuchi Y, Udono T, Hamada Y. 2003. Bone mineral density in chimpanzees, humans, and japanese macaques. Primates 44:151-155.
Koller DL, Ichikawa S, Lai D, et al. (13 co-authors). 2010. Genome-wide
association study of bone mineral density in premenopausal european-american women and replication in african-american women. J. Clin. Endocrinol. Metab. 95:1802-1809.
Kudla G, Lipinski L, Caffin F, Helwak A, Zylicz M. 2006. High guanine and
cytosine content increases mRNA levels in mammalian cells. PLoS Biol. 4:e180.
Kuivaniemi H, Tromp G, Prockop DJ. 1997. Mutations in fibrillar collagens
(types I, II, III, and XI), fibril-associated collagen (type IX), and network-forming collagen (type X) cause a spectrum of diseases of bone, cartilage, and blood vessels. Hum. Mutat. 9:300-315.
Kumar S, Filipski A, Swarna V, Walker A, Hedges SB. 2005. Placing confidence
limits on the molecular age of the human-chimpanzee divergence. Proc. Natl. Acad. Sci. U. S. A. 102:18842-18847.
Kumar S, Nei M, Dudley J, Tamura K. 2008. MEGA: A biologist-centric
software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 9:299-306.
Larsen CS. 1995. Biological changes in human-populations with agriculture. Annu. Rev. Anthropol. 24:185-213.
Lau CS, Yin G, Mok MY. 2006. Ethnic and geographical differences in systemic lupus erythematosus: An overview. Lupus 15:715-719.
Lau EMC, Choy DTK, Li M, Woo J, Chung T, Sham A. 2004. The relationship between COLI A1 polymorphisms (sp 1) and COLI A2 polymorphisms (eco R1 and puv II) with bone mineral density in chinese men and women. Calcif. Tissue Int. 75:133-137.
Lau HHL, Ng MYM, Ho AYY, Luk KDK, Kung AWC. 2005. Genetic and
environmental determinants of bone mineral density in chinese women. Bone 36:700-709.
Lauderdale DS, Jacobsen SJ, Furner SE, Levy PS, Brody JA, Goldberg J. 1997.
Hip fracture incidence among elderly asian-american populations. Am. J. Epidemiol. 146:502-509.
Laval G, Patin E, Barreiro LB, Quintana-Murci L. 2010. Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS One 5:e10284.
Leuenberger C, Wegmann D. 2010. Bayesian computation and model selection without likelihoods. Genetics 184:243-252.
Li N, Stephens M. 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213-2233.
Lipkin EW, Aumann CA, Newell-Morris LL. 2001. Evidence for common
controls over inheritance of bone quantity and body size from segregation analysis in a pedigreed colony of nonhuman primates (macaca nemestrina). Bone 29:249-257.
Liu YJ, Shen H, Xiao P, Xiong DH, Li LH, Recker RR, Deng HW. 2006.
Molecular genetic studies of gene identification for osteoporosis: A 2004 update. J. Bone Miner. Res. 21:1511-1535.
Liu YZ, Liu YJ, Recker RR, Deng HW. 2003. Molecular studies of identification of genes for osteoporosis: The 2002 update. J. Endocrinol. 177:147-196.
Lohmueller KE, Indap AR, Schmidt S, et al. (12 co-authors). 2008. Proportionally more deleterious genetic variation in european than in african populations. Nature 451:994-997.
Long JR, Zhao LJ, Liu PY, et al. (11 co-authors). 2004. Patterns of linkage
disequilibrium and haplotype distribution in disease candidate genes. BMC Genet. 5:11.
Looker AC, Orwoll ES, Johnston CC Jr, Lindsay RL, Wahner HW, Dunn WL, Calvo MS, Harris TB, Heyse SP. 1997. Prevalence of low femoral bone density in older U.S. adults from NHANES III. J. Bone Miner. Res. 12:1761-1768.
Looker AC, Wahner HW, Dunn WL, Calvo MS, Harris TB, Heyse SP, Johnston CC Jr, Lindsay R. 1998. Updated data on proximal femur bone mineral levels of US adults. Osteoporos. Int. 8:468-489.
Majewski J, Ott J. 2002. Distribution and characterization of regulatory elements
in the human genome. Genome Res. 12:1827-1836.
Mann V, Hobson EE, Li BH, Stewart TL, Grant SFA, Robins SP, Aspden RM, Ralston SH. 2001. A COL1A1 Sp1 binding site polymorphism predisposes to osteoporotic fracture by affecting bone density and quality. J. Clin. Invest. 107:899-907.
Mann V, Ralston SH. 2003. Meta-analysis of COLIA1 Sp1 polymorphism in
relation to bone mineral density and osteoporotic fracture. Bone 32:711-717.
Marini JC, Forlino A, Cabral WA, et al. (27 co-authors). 2007. Consortium for
osteogenesis imperfecta mutations in the helical domain of type I collagen: Regions rich in lethal mutations align with collagen binding sites for integrins and proteoglycans. Hum. Mutat. 28:209-221.
Matkovic V, Fontana D, Tominac C, Goel P, Chesnut CH. 1990. Factors that
influence peak bone mass formation - a study of calcium balance and the inheritance of bone mass in adolescent females. Am. J. Clin. Nutr. 52:878-888.
Matsumura A, Gunji H, Takahashi Y, Nishida T, Okada M. 2010. Cross-sectional morphology of the femoral neck of wild chimpanzees. Int. J. Primatol. 31:219-238.
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the adh locus in drosophila. Nature 351:652-654.
McVean G, Awadalla P, Fearnhead P. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241.
Melton LJ. 1997. The prevalence of osteoporosis. J. Bone Miner. Res. 12:1769-1771.
Milewicz DM, Byers PH, Reveille J, Hughes AL, Duvic M. 1996. A dimorphic alu sb-like insertion in COL3A1 is ethnic-specific. J. Mol. Evol. 42:117-123.
Miller MP, Kumar S. 2001. Understanding human disease mutations through the use of interspecific genetic variation. Hum. Mol. Genet. 10:2319-2328.
Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O'Connell MJ. 2010. Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evol. Biol. 10:39.
Mulhern DM, Ubelaker DH. 2003. Histologic examination of bone development in juvenile chimpanzees. Am. J. Phys. Anthropol. 122:127-133.
Mulhern DM, Ubelaker DH. 2009. Bone microstructure in juvenile chimpanzees. Am. J. Phys. Anthropol. 140:368-375.
Musumeci M, Vadala G, Tringali G, Insirello E, Roccazzello AM, Simpore J, Musumeci S. 2009. Genetic and environmental factors in human osteoporosis from sub-saharan to mediterranean areas. J. Bone Miner. Metab. 27:424-434.
Nakajima T, Ota N, Shirai Y, Hata A, Yoshida H, Suzuki T, Hosoi T, Orimo H,
Emi M. 1999. Ethnic difference in contribution of Sp1 site variation of COLIA1 gene in genetic predisposition to osteoporosis. Calcif. Tissue Int. 65:352-353.
Nei M, Gojobori T. 1986. Simple methods for estimating the numbers of
synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.
Nei M. 1987. Molecular Evolutionary Genetics. New York: Columbia University Press.
Ng PC, Zhao Q, Levy S, Strausberg RL, Venter JC. 2008. Individual genomes instead of race for personalized medicine. Clin. Phar. Therapeutics 84:306-309.
Ota N, Nakajima T, Nakazawa I, Suzuki T, Hosoi T, Orimo H, Inoue S, Shirai Y,
Emi M. 2001. A nucleotide variant in the promoter region of the interleukin-6 gene associated with decreased bone mineral density. J. Hum. Genet. 46:267-272.
Pace JM, Atkinson M, Willing MC, Wallis G, Byers PH. 2001. Deletions and
duplications of gly-xaa-yaa triplet repeats in the triple helical domains of type I collagen chains disrupt helix formation and result in several types of osteogenesis imperfecta. Hum. Mutat. 18:319-326.
Payseur BA, Nachman MW. 2002. Natural selection at linked sites in humans.
Gene 300:31-42.
Persikov AV, Ramshaw JA, Brodsky B. 2005. Prediction of collagen stability from amino acid sequence. J. Biol. Chem. 280:19343-19349.
Pond SL, Frost SD, Muse SV. 2005. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21:676-679.
Pond SL, Frost SD. 2005a. A genetic algorithm approach to detecting lineage-specific variation in selection pressure. Mol. Biol. Evol. 22:478-485.
Pond SL, Frost SD. 2005b. Datamonkey: Rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21:2531-2533.
Pond SL, Frost SD. 2005c. Not so different after all: A comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 22:1208-1222.
Pozzoli U, Menozzi G, Fumagalli M, Cereda M, Comi GP, Cagliani R, Bresolin N, Sironi M. 2008. Both selective and neutral processes drive GC content evolution in the human genome. BMC Evol. Biol. 8:99.
Prentice A. 2001. The relative contribution of diet and genotype to bone development. Proc. Nutr. Soc. 60:45-52.
Ptak SE, Hinds DA, Koehler K, Nickel B, Patil N, Ballinger DG, Przeworski M, Frazer KA, Paabo S. 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat. Genet. 37:429-434.
Qureshi AM, Herd RJ, Blake GM, Fogelman I, Ralston SH. 2002. Colia1 Sp1
polymorphism predicts response of femoral neck bone density to cyclical etidronate therapy. Calcif. Tissue Int. 70:158-163.
Raff ML, Craigen WJ, Smith LT, Keene DR, Byers PH. 2000. Partial COL1A2
gene duplication produces features of osteogenesis imperfecta and ehlers-danlos syndrome type VII. Hum. Genet. 106:19-28.
Ralston SH, Uitterlinden AG, Brandi ML, et al. (32 co-authors). 2006. Large-scale evidence for the effect of the COLIA1 Sp1 polymorphism on osteoporosis outcomes: The GENOMOS study. PLoS Med. 3:e90.
Ralston SH. 2010. Genetics of osteoporosis. Ann. NY Acad. Sci. 1192:181-189.
Rauch F, Lalic L, Roughley P, Glorieux FH. 2010. Genotype-phenotype correlations in nonlethal osteogenesis imperfecta caused by mutations in the helical domain of collagen type I. Eur. J. Hum. Genet. 18:642-647.
Reginster JY, Burlet N. 2006. Osteoporosis: A still increasing prevalence. Bone 38:S4-9.
Rensberger JM, Watabe M. 2000. Fine structure of bone in dinosaurs, birds and mammals. Nature 406:619-622.
Rivadeneira F, Styrkarsdottir U, Estrada K, et al. (36 co-authors). 2009. Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies. Nat. Genet. 41:1199-U58.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA,
Feldman MW. 2002. Genetic structure of human populations. Science 298:2381-2385.
Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. 2003. DnaSP, DNA
polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496-2497.
Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). 2002. Detecting recent
positive selection in the human genome from haplotype structure. Nature 419:832-837.
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A,
Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312:1614-1620.
Sabeti PC, Varilly P, Fry B, et al. (267 co-authors). 2007. Genome-wide detection
and characterization of positive selection in human populations. Nature 449:913-918.
Sabeti PC, Walsh E, Schaffner SF, et al. (15 co-authors). 2005. The case for selection at CCR5-Delta32. PLoS Biol. 3:e378.
Sachidanandam R, Weissman D, Schmidt SC, et al. (42 co-authors). 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.
Saunders MA, Good JM, Lawrence EC, Ferrell RE, Li WH, Nachman MW. 2006.
Human adaptive evolution at myostatin (GDF8), a regulator of muscle growth. Am. J. Hum. Genet. 79:1089-1097.
Scheinfeldt LB, Biswas S, Madeoy J, Connelly CF, Schadt EE, Akey JM. 2009.
Population genomic analysis of ALMS1 in humans reveals a surprisingly complex evolutionary history. Mol. Biol. Evol. 26:1357-1367.
Shen H, Recker RR, Deng HW. 2003. Molecular and genetic mechanisms of osteoporosis: Implication for treatment. Curr. Mol. Med. 3:737-757.
Sillence DO, Senn A, Danks DM. 1979. Genetic-heterogeneity in osteogenesis imperfecta. J. Med. Genet. 16:101-116.
Smith MW, Lautenberger JA, Shin HD, Chretien JP, Shrestha S, Gilbert DA, O'Brien SJ. 2001. Markers for mapping by admixture linkage disequilibrium in african american and hispanic populations. Am. J. Hum. Genet. 69:1080-1094.
Smith MW, O'Brien SJ. 2005. Mapping by admixture linkage disequilibrium: Advances, limitations and guidelines. Nat. Rev. Genet. 6:623-632.
Soares P, Achilli A, Semino O, Davies W, Macaulay V, Bandelt HJ, Torroni A, Richards MB. 2010. The archaeogenetics of europe. Curr. Biol. 20:R174-83.
Spotila LD, Colige A, Sereda L, et al. (15 co-authors). 1994. Mutation analysis of coding sequences for type-i procollagen in individuals with low bone-density. J. Bone Miner. Res. 9:923-932.
Stephens M, Smith NJ, Donnelly P. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978-989.
Stewart TL, Jin H, McGuigan FE, et al. (11 co-authors). 2006. Haplotypes defined by promoter and intron 1 polymorphisms of the COLIA1 gene regulate bone mineral density in women. J. Clin. Endocrinol. Metab. 91:3575-3583.
Stoll C, Dott B, Roth MP, Alembik Y. 1989. Birth prevalence rates of skeletal dysplasias. Clin. Genet. 35:88-92.
Stone AC, Griffiths RC, Zegura SL, Hammer MF. 2002. High levels of Y-chromosome nucleotide diversity in the genus pan. Proc. Natl. Acad. Sci. U. S. A. 99:43-48.
Stover DA, Verrelli BC. 2010. Comparative vertebrate evolutionary analyses of
type I collagen: potential of COL1a1 gene structure and intron variation for common bone-related diseases. Mol. Biol. Evol., doi: 10.1093/molbev/msq221.
Su AI, Wiltshire T, Batalov S, et al. (13 co-authors). 2004. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. U. S. A. 101:6062-6067.
Subramanian S, Kumar S. 2003. Neutral substitutions occur at a faster rate in
exons than in noncoding DNA in primate genomes. Genome Res. 13:838-844.
Subramanian S, Kumar S. 2006. Evolutionary anatomies of positions and types of
disease-associated and neutral amino acid mutations in the human genome. BMC Genomics 7:306.
Sumner DR, Morbeck ME, Lobick JJ. 1989. Apparent age-related bone loss
among adult female gombe chimpanzees. Am. J. Phys. Anthropol. 79:225-234.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24:1596-1599.
Thomson R, Pritchard JK, Shen P, Oefner PJ, Feldman MW. 2000. Recent
common ancestry of human Y chromosomes: Evidence from DNA sequence data. Proc. Natl. Acad. Sci. U. S. A. 97:7360-7365.
Tishkoff SA, Reed FA, Ranciaro A, et al. (19 co-authors). 2007. Convergent
adaptation of human lactase persistence in africa and europe. Nat. Genet. 39:31-40.
Tishkoff SA, Varkonyi R, Cahinhinan N, et al. (17 co-authors). 2001. Haplotype
diversity and linkage disequilibrium at human G6PD: Recent origin of alleles that confer malarial resistance. Science 293:455-462.
Tishkoff SA, Verrelli BC. 2003. Patterns of human genetic diversity: Implications
for human evolutionary history and disease. Annu. Rev. Genomics Hum. Genet. 4:293-340.
Urrutia AO, Hurst LD. 2001. Codon usage bias covaries with expression breadth
and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159:1191-1199.
Urrutia AO, Hurst LD. 2003. The signature of selection mediated by expression on human genes. Genome Res. 13:2260-2264.
Valkkila M, Melkoniemi M, Kvist L, Kuivaniemi H, Tromp G, Ala-Kokko L.
2001. Genomic organization of the human COL3A1 and COL5A2 genes: COL5A2 has evolved differently than the other minor fibrillar collagen genes. Matrix Biol. 20:357-366.
Vergeer WP, Sogo JM, Pretorius PJ, de Vries WN. 2000. Interaction of Ap1, Ap2,
and Sp1 with the regulatory regions of the human pro-alpha1(I) collagen gene. Arch. Biochem. Biophys. 377:69-79.
Verrelli BC, Lewis CM Jr, Stone AC, Perry GH. 2008. Different selective pressures shape the molecular evolution of color vision in chimpanzee and human populations. Mol. Biol. Evol. 25:2735-2743.
Verrelli BC, McDonald JH, Argyropoulos G, Destro-Bisol G, Froment A,
Drousiotou A, Lefranc G, Helal AN, Loiselet J, Tishkoff SA. 2002. Evidence for balancing selection from nucleotide sequence analyses of human G6PD. Am. J. Hum. Genet. 71:1112-1128.
Verrelli BC, Tishkoff SA, Stone AC, Touchman JW. 2006. Contrasting histories
of G6PD molecular evolution and malarial resistance in humans and chimpanzees. Mol. Biol. Evol. 23:1592-1601.
Verrelli BC, Tishkoff SA. 2004. Signatures of selection and gene conversion
associated with human color vision variation. Am. J. Hum. Genet. 75:363-375.
Videman T, Levalahti E, Battie MC, Simonen R, Vanninen E, Kaprio J. 2007.
Heritability of BMD of femoral neck and lumbar spine: A multivariate twin study of Finnish men. J. Bone Miner. Res. 22:1455-1462.
Viguet-Carrin S, Garnero P, Delmas PD. 2006. The role of collagen in bone strength. Osteoporosis Int. 17:319-336.
Vinogradov AE. 2003. DNA helix: The importance of being GC-rich. Nucleic Acids Res. 31:1838-1844.
Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4(3):e72.
Wada H, Okuyama M, Satoh N, Zhang S. 2006. Molecular evolution of fibrillar collagen in chordates, with implications for the evolution of vertebrate skeletons and chordate phylogeny. Evol. Dev. 8:370-377.
Wang X, Mabrey JD, Agrawal CM. 1998. An interspecies comparison of bone fracture properties. Biomed. Mater. Eng. 8:1-9.
Watterson GA. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.
Wegmann D, Excoffier L. 2010. Bayesian inference of the demographic history of chimpanzees. Mol. Biol. Evol. 27:1425-1435.
Winckler W, Myers SR, Richter DJ, et al. (11 co-authors). 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308:107-111.
Won YJ, Hey J. 2005. Divergence population genetics of chimpanzees. Mol. Biol. Evol. 22:297-307.
Wooding S, Bufe B, Grassi C, Howard MT, Stone AC, Vazquez M, Dunn DM, Meyerhof W, Weiss RB, Bamshad MJ. 2006. Independent evolution of bitter-taste sensitivity in humans and chimpanzees. Nature 440:930-934.
Wooding S, Stone AC, Dunn DM, Mummidi S, Jorde LB, Weiss RK, Ahuja S,
Bamshad MJ. 2005. Contrasting effects of natural selection on human and chimpanzee CC chemokine receptor 5. Am. J. Hum. Genet. 76:291-301.
Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano
LA. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20:1377-1419.
Wu DD, Zhang YP. 2010. Positive selection drives population differentiation in the skeletal genes in modern humans. Hum. Mol. Genet. 19:2341-2346.
Xu G, Bhatnagar V, Wen G, Hamilton BA, Eraly SA, Nigam SK. 2005. Analyses of coding region polymorphisms in apical and basolateral human organic anion transporter (OAT) genes [OAT1 (NKT), OAT2, OAT3, OAT4, URAT (RST)]. Kidney Int. 68:1491-1499.
Yamada Y, Avvedimento VE, Mudryj M, Ohkubo H, Vogeli G, Irani M, Pastan I,
de Crombrugghe B. 1980. The collagen gene - evidence for its evolutionary assembly by amplification of a DNA segment containing an exon of 54 bp. Cell 22:887-892.
Yang Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24:1586-1591.
Yu N, Jensen-Seaman MI, Chemnick L, Kidd JR, Deinard AS, Ryder O, Kidd
KK, Li WH. 2003. Low nucleotide diversity in chimpanzees and bonobos. Genetics 164:1511-1518.
Zhang Z, Townsend JP. 2009. Maximum-likelihood model averaging to profile
clustering of site types across discrete linear sequences. PLoS Comput. Biol. 5:e1000421.
APPENDIX A
SUPPLEMENTARY MATERIAL: CHAPTER 2
Supplementary Table 1a
Human Clade A Collagen Gene Exon Lengths
Exon Length (bp)

| Exon # | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| 1 | 103 | 70 | 85 | 79 | 97 |
| 2 | 195 | 11 | 207 | 203 | 225 |
| 3 | 35 | 15 | 17 | 51 | 14 |
| 4 | 36 | 36 | 33 | 114 | 33 |
| 5 | 102 | 93 | 33 | 81 | 33 |
| 6 | 72 | 54 | 54 | 54 | 54 |
| 7 | 45 | 45 | 102 | 54 | 111 |
| 8 | 54 | 54 | 78 | 54 | 78 |
| 9 | 54 | 54 | 45 | 54 | 45 |
| 10 | 54 | 54 | 54 | 54 | 54 |
| 11 | 54 | 54 | 54 | 54 | 54 |
| 12 | 54 | 54 | 54 | 45 | 54 |
| 13 | 45 | 45 | 54 | 54 | 54 |
| 14 | 54 | 54 | 54 | 45 | 54 |
| 15 | 45 | 45 | 45 | 54 | 45 |
| 16 | 54 | 54 | 54 | 99 | 54 |
| 17 | 99 | 99 | 45 | 45 | 45 |
| 18 | 45 | 45 | 54 | 99 | 54 |
| 19 | 99 | 99 | 99 | 54 | 99 |
| 20 | 54 | 54 | 45 | 108 | 45 |
| 21 | 108 | 108 | 99 | 54 | 99 |
| 22 | 54 | 54 | 54 | 99 | 54 |
| 23 | 99 | 99 | 108 | 54 | 108 |
| 24 | 54 | 54 | 54 | 99 | 54 |
| 25 | 99 | 99 | 99 | 54 | 99 |
| 26 | 54 | 54 | 54 | 54 | 54 |
| 27 | 54 | 54 | 99 | 54 | 99 |
| 28 | 54 | 54 | 54 | 54 | 54 |
| 29 | 54 | 54 | 54 | 45 | 54 |
| 30 | 45 | 45 | 54 | 99 | 54 |
| 31 | 99 | 99 | 54 | 108 | 54 |
| 32 | 108 | 108 | 45 | 54 | 45 |
| 33 | 108 | 54 | 99 | 54 | 99 |
| 34 | 54 | 54 | 108 | 54 | 108 |
| 35 | 54 | 54 | 54 | 54 | 54 |
| 36 | 108 | 54 | 54 | 108 | 54 |
| 37 | 54 | 108 | 54 | 54 | 54 |
| 38 | 54 | 54 | 54 | 54 | 54 |
| 39 | 162 | 54 | 108 | 162 | 108 |
| 40 | 108 | 162 | 54 | 108 | 54 |
| 41 | 108 | 108 | 54 | 108 | 54 |
| 42 | 54 | 108 | 162 | 54 | 162 |
| 43 | 108 | 54 | 108 | 108 | 108 |
| 44 | 54 | 108 | 108 | 54 | 108 |
| 45 | 108 | 54 | 54 | 108 | 54 |
| 46 | 54 | 108 | 108 | 54 | 108 |
| 47 | 108 | 54 | 54 | 108 | 54 |
| 48 | 283 | 108 | 108 | 298 | 108 |
| 49 | 191 | 259 | 54 | 188 | 54 |
| 50 | 243 | 185 | 108 | 243 | 108 |
| 51 | 144 | 243 | 289 | 144 | 292 |
| 52 | | 144 | 188 | | 188 |
| 53 | | | 243 | | 240 |
| 54 | | | 144 | | 144 |

Note: Lines within columns give approximate locations of triple-helix domain boundaries. RxC chi-squared tests (see supplementary table 1c) were conducted on comparisons among genes of binned distributions with bins in increments of 50 bp, up to 300 bp (e.g., 50, 100, 150, etc.). These bin intervals enabled the best resolution in comparisons among genes and bin size did not alter the results. See Materials and Methods (Chapter 2) for additional information.
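The binning and RxC comparison described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the dissertation: the function names `bin_counts` and `chi_squared` are hypothetical, bins are assumed to be closed on the right (1-50, 51-100, and so on), any value above the upper limit is assumed to fall in the last bin, and bins that are empty in every gene are dropped before the Pearson statistic is computed.

```python
def bin_counts(lengths, width=50, upper=300):
    """Count values per bin of `width` bp; bin k covers (k*width, (k+1)*width]."""
    n_bins = upper // width
    counts = [0] * n_bins
    for x in lengths:
        k = min((x - 1) // width, n_bins - 1)  # assumption: overflow goes in the last bin
        counts[k] += 1
    return counts


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an R x C count table."""
    # Drop columns (bins) that are empty in every row; they carry no information.
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = list(zip(*cols))
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Illustration only: the first six exons of two genes from supplementary table 1a.
col1a1 = [103, 195, 35, 36, 102, 72]
col1a2 = [70, 11, 15, 36, 93, 54]
stat, dof = chi_squared([bin_counts(col1a1), bin_counts(col1a2)])
print(f"chi-squared = {stat:.2f}, df = {dof}")
```

Because only six exons per gene are used here, the printed statistic is not comparable to the values reported in supplementary table 1c, which are based on all exons.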
Supplementary Table 1b
Human Clade A Collagen Gene Exon GC-Content
Exon GC-Content (%)

| Exon # | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| 1 | 62.1 | 47.1 | 67.1 | 46.8 | 43.3 |
| 2 | 63.6 | 36.4 | 58.0 | 47.3 | 49.3 |
| 3 | 62.9 | 33.3 | 58.8 | 58.8 | 42.9 |
| 4 | 72.2 | 63.9 | 48.5 | 58.8 | 48.5 |
| 5 | 76.5 | 69.9 | 57.6 | 48.1 | 60.6 |
| 6 | 51.4 | 48.1 | 55.6 | 70.4 | 59.3 |
| 7 | 71.1 | 64.4 | 69.6 | 59.3 | 62.2 |
| 8 | 66.7 | 57.4 | 52.6 | 57.4 | 48.7 |
| 9 | 63.0 | 64.8 | 71.1 | 61.1 | 57.8 |
| 10 | 66.7 | 59.3 | 53.7 | 50.0 | 59.3 |
| 11 | 61.1 | 55.6 | 64.8 | 48.1 | 59.3 |
| 12 | 55.6 | 57.4 | 57.4 | 53.3 | 51.9 |
| 13 | 60.0 | 53.3 | 59.3 | 68.5 | 61.1 |
| 14 | 75.9 | 68.5 | 63.0 | 60.0 | 51.9 |
| 15 | 64.4 | 66.7 | 64.4 | 64.8 | 64.4 |
| 16 | 68.5 | 70.4 | 70.4 | 58.6 | 64.8 |
| 17 | 71.7 | 64.6 | 75.6 | 53.3 | 53.3 |
| 18 | 55.6 | 57.8 | 66.7 | 66.7 | 50.0 |
| 19 | 70.7 | 68.7 | 68.7 | 61.1 | 64.6 |
| 20 | 59.3 | 63.0 | 51.1 | 55.6 | 55.6 |
| 21 | 68.5 | 63.9 | 67.7 | 61.1 | 63.6 |
| 22 | 63.0 | 59.3 | 53.7 | 66.7 | 53.7 |
| 23 | 64.6 | 64.6 | 69.4 | 64.8 | 60.2 |
| 24 | 64.8 | 57.4 | 64.8 | 58.6 | 55.6 |
| 25 | 67.7 | 64.6 | 70.7 | 63.0 | 62.6 |
| 26 | 72.2 | 53.7 | 61.1 | 59.3 | 46.3 |
| 27 | 66.7 | 59.3 | 65.7 | 57.4 | 65.7 |
| 28 | 68.5 | 59.3 | 63.0 | 55.6 | 59.3 |
| 29 | 61.1 | 63.0 | 63.0 | 64.4 | 61.1 |
| 30 | 66.7 | 57.8 | 66.7 | 64.6 | 55.6 |
| 31 | 67.7 | 63.6 | 63.0 | 59.3 | 59.3 |
| 32 | 65.7 | 60.2 | 71.1 | 63.0 | 60.0 |
| 33 | 63.0 | 53.7 | 67.7 | 61.1 | 54.5 |
| 34 | 70.4 | 68.5 | 69.4 | 63.0 | 59.3 |
| 35 | 74.1 | 70.4 | 61.1 | 61.1 | 50.0 |
| 36 | 66.7 | 61.1 | 66.7 | 59.3 | 61.1 |
| 37 | 64.8 | 59.3 | 64.8 | 61.1 | 63.0 |
| 38 | 66.7 | 66.7 | 68.5 | 59.3 | 57.4 |
| 39 | 65.4 | 63.0 | 69.4 | 65.4 | 57.4 |
| 40 | 63.9 | 61.1 | 66.7 | 64.8 | 55.6 |
| 41 | 63.0 | 60.2 | 66.7 | 58.3 | 61.1 |
| 42 | 68.5 | 63.9 | 66.7 | 57.4 | 67.3 |
| 43 | 69.4 | 64.8 | 66.7 | 60.2 | 66.7 |
| 44 | 77.8 | 60.2 | 67.6 | 72.2 | 57.4 |
| 45 | 66.7 | 59.3 | 68.5 | 56.5 | 59.3 |
| 46 | 64.8 | 57.4 | 66.7 | 68.5 | 64.8 |
| 47 | 65.7 | 63.0 | 68.5 | 57.4 | 57.4 |
| 48 | 64.0 | 62.0 | 65.7 | 51.3 | 58.3 |
| 49 | 55.0 | 55.6 | 59.3 | 43.6 | 51.9 |
| 50 | 62.6 | 47.6 | 65.7 | 46.1 | 58.3 |
| 51 | 59.0 | 46.5 | 64.4 | 43.8 | 56.2 |
| 52 | | 42.4 | 51.1 | | 42.0 |
| 53 | | | 56.4 | | 40.0 |
| 54 | | | 54.9 | | 47.9 |

Note: Lines within columns give approximate locations of triple-helix domain boundaries. RxC chi-squared tests (see supplementary table 1c) were conducted on comparisons among genes of binned distributions with bins starting at 40% in increments of 10%, up to 80% (e.g., 40, 50, 60, etc.). These bin intervals enabled the best resolution in comparisons among genes and bin size did not alter the results. See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 1c
Summary of RxC Chi-Squared Tests for Human Clade A Collagen Gene Exons
Chi-squared Value

| Region Compared | Included Exons | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| Exon Length | All | 1.0 | 1.0 | 2.0 | 1.0 |
| Exon GC-content | All | 17.6* | 4.9 | 24.0* | 37.5* |
| Exon GC-content | triple-helix | 11.3* | 1.1 | 16.1* | 29.7* |

Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.0125.
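The Bonferroni threshold quoted in this note is consistent with dividing a family-wise significance level by the number of comparisons. The sketch below assumes a family-wise level of 0.05 and four comparisons (one per gene column); neither number is stated in the note itself, and the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: family-wise alpha = 0.05 and four gene-wise comparisons
# (COL1a2, COL2a1, COL3a1, COL5a2).
threshold = bonferroni_threshold(0.05, 4)
print(threshold)  # 0.0125
```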
Supplementary Table 2a
COL1a1 Exon Lengths among Vertebrates
Exon Length (bp)

| Exon # | Human | Chimpanzee | Mouse | Dog | Cow | W. Frog | Zebrafish |
|---|---|---|---|---|---|---|---|
| 1 | 103 | 103 | 76 | 91 | 103 | 82 | 82 |
| 2 | 195 | 195 | 195 | 195 | 195 | 189 | 192 |
| 3 | 35 | 35 | 32 | 35 | 35 | 32 | 23 |
| 4 | 36 | 36 | 36 | 36 | 36 | 36 | 36 |
| 5 | 102 | 102 | 102 | 102 | 102 | 93 | 93 |
| 6 | 72 | 72 | 69 | 72 | 69 | 69 | 69 |
| 7 | 45 | 45 | 45 | 45 | 45 | 45 | 45 |
| 8 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 9 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 10 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 11 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 12 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 13 | 45 | 45 | 45 | 45 | 45 | 45 | 45 |
| 14 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 15 | 45 | 45 | 45 | 45 | 45 | 45 | 45 |
| 16 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 17 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 18 | 45 | 45 | 45 | 45 | 45 | 45 | 45 |
| 19 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 20 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 21 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 22 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 23 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 24 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 25 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 26 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 27 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 28 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 29 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 30 | 45 | 45 | 45 | 45 | 45 | 45 | 45 |
| 31 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 32 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 33 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 34 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 35 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 36 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 37 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 38 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 39 | 162 | 162 | 162 | 162 | 162 | 162 | 162 |
| 40 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 41 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 42 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 43 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 44 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 45 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 46 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| 47 | 108 | 108 | 108 | 108 | 108 | 108 | 108 |
| 48 | 283 | 283 | 283 | 283 | 283 | 280 | 280 |
| 49 | 191 | 191 | 191 | 191 | 191 | 191 | 191 |
| 50 | 243 | 243 | 243 | 243 | 243 | 243 | 243 |
| 51 | 144 | 144 | 144 | 144 | 144 | 144 | 144 |

Note: Lines within columns give approximate locations of triple-helix domain boundaries. Mann-Whitney U tests were conducted among species as were RxC chi-squared tests among orthologous exons (see supplementary tables 2d and e). See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 2b
COL1a1 Exon GC-Content among Vertebrates
Exon GC-content (%)

| Exon # | Human | Chimpanzee | Mouse | Dog | Cow | W. Frog | Zebrafish |
|---|---|---|---|---|---|---|---|
| 1 | 62.1 | 62.1 | 59.2 | 61.5 | 61.2 | 45.1 | 54.9 |
| 2 | 63.6 | 63.6 | 54.9 | 64.1 | 60.5 | 47.6 | 61.5 |
| 3 | 62.9 | 62.9 | 43.8 | 57.1 | 57.1 | 46.9 | 52.2 |
| 4 | 72.2 | 72.2 | 72.2 | 69.4 | 69.4 | 61.1 | 61.1 |
| 5 | 76.5 | 77.5 | 71.6 | 74.5 | 76.5 | 67.7 | 68.8 |
| 6 | 51.4 | 51.4 | 49.3 | 48.6 | 47.8 | 42.0 | 50.7 |
| 7 | 71.1 | 71.1 | 68.9 | 71.1 | 71.1 | 68.9 | 71.1 |
| 8 | 66.7 | 66.7 | 64.8 | 64.8 | 68.5 | 59.3 | 64.8 |
| 9 | 63.0 | 64.8 | 63.0 | 70.4 | 68.5 | 61.1 | 63.0 |
| 10 | 66.7 | 66.7 | 70.4 | 66.7 | 70.4 | 72.2 | 68.5 |
| 11 | 61.1 | 61.1 | 61.1 | 59.3 | 61.1 | 61.1 | 61.1 |
| 12 | 55.6 | 55.6 | 55.6 | 57.4 | 59.3 | 55.6 | 59.3 |
| 13 | 60.0 | 60.0 | 64.4 | 62.2 | 62.2 | 64.4 | 60.0 |
| 14 | 75.9 | 74.1 | 70.4 | 75.9 | 74.1 | 64.8 | 70.4 |
| 15 | 64.4 | 64.4 | 64.4 | 64.4 | 64.4 | 62.2 | 62.2 |
| 16 | 68.5 | 68.5 | 68.5 | 68.5 | 66.7 | 66.7 | 68.5 |
| 17 | 71.7 | 72.7 | 69.7 | 74.7 | 68.7 | 62.6 | 65.7 |
| 18 | 55.6 | 55.6 | 55.6 | 57.8 | 57.8 | 55.6 | 62.2 |
| 19 | 70.7 | 70.7 | 68.7 | 72.7 | 72.7 | 63.6 | 66.7 |
| 20 | 59.3 | 59.3 | 57.4 | 61.1 | 63.0 | 55.6 | 64.8 |
| 21 | 68.5 | 68.5 | 64.8 | 68.5 | 68.5 | 63.0 | 67.6 |
| 22 | 63.0 | 63.0 | 63.0 | 63.0 | 66.7 | 63.0 | 61.1 |
| 23 | 64.6 | 65.7 | 64.6 | 64.6 | 67.7 | 60.6 | 62.6 |
| 24 | 64.8 | 64.8 | 66.7 | 64.8 | 66.7 | 61.1 | 61.1 |
| 25 | 67.7 | 66.7 | 66.7 | 68.7 | 68.7 | 60.6 | 63.6 |
| 26 | 72.2 | 72.2 | 66.7 | 68.5 | 63.0 | 55.6 | 57.4 |
| 27 | 66.7 | 66.7 | 64.8 | 66.7 | 64.8 | 61.1 | 63.0 |
| 28 | 68.5 | 68.5 | 63.0 | 68.5 | 68.5 | 57.4 | 59.3 |
| 29 | 61.1 | 61.1 | 63.0 | 66.7 | 64.8 | 61.1 | 61.1 |
| 30 | 66.7 | 64.4 | 64.4 | 68.9 | 64.4 | 55.6 | 57.8 |
| 31 | 67.7 | 66.7 | 62.6 | 64.6 | 69.7 | 57.6 | 63.6 |
| 32 | 65.7 | 65.7 | 63.9 | 64.8 | 64.8 | 63.0 | 63.0 |
| 33 | 63.0 | 63.9 | 60.2 | 62.0 | 64.8 | 54.6 | 59.3 |
| 34 | 70.4 | 70.4 | 72.2 | 70.4 | 70.4 | 66.7 | 57.4 |
| 35 | 74.1 | 75.9 | 72.2 | 74.1 | 75.9 | 68.5 | 63.0 |
| 36 | 66.7 | 66.7 | 61.1 | 65.7 | 67.6 | 61.1 | 60.2 |
| 37 | 64.8 | 64.8 | 63.0 | 66.7 | 66.7 | 63.0 | 66.7 |
| 38 | 66.7 | 68.5 | 68.5 | 72.2 | 70.4 | 59.3 | 63.0 |
| 39 | 65.4 | 64.8 | 64.8 | 68.5 | 70.4 | 62.3 | 60.5 |
| 40 | 63.9 | 63.0 | 61.1 | 65.7 | 62.0 | 55.6 | 60.2 |
| 41 | 63.0 | 63.0 | 63.0 | 64.8 | 65.7 | 63.0 | 63.0 |
| 42 | 68.5 | 66.7 | 68.5 | 63.0 | 63.0 | 63.0 | 63.0 |
| 43 | 69.4 | 69.4 | 70.4 | 72.2 | 72.2 | 63.9 | 63.0 |
| 44 | 77.8 | 79.6 | 70.4 | 66.7 | 72.2 | 70.4 | 77.8 |
| 45 | 66.7 | 66.7 | 60.2 | 63.9 | 67.6 | 59.3 | 61.1 |
| 46 | 64.8 | 64.8 | 66.7 | 61.1 | 68.5 | 57.4 | 64.8 |
| 47 | 65.7 | 65.7 | 66.7 | 66.7 | 65.7 | 55.6 | 59.3 |
| 48 | 64.0 | 65.0 | 62.2 | 66.1 | 65.0 | 56.4 | 60.7 |
| 49 | 55.0 | 55.0 | 54.5 | 54.5 | 54.5 | 48.7 | 51.8 |
| 50 | 62.6 | 62.6 | 59.7 | 61.7 | 61.7 | 50.6 | 53.5 |
| 51 | 59.0 | 59.0 | 56.2 | 55.6 | 60.4 | 50.0 | 49.3 |

Note: Lines within columns give approximate locations of triple-helix domain boundaries. Mann-Whitney U tests were conducted among species as were RxC chi-squared tests among orthologous exons (see supplementary tables 2d and e). See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 2c
Summary of COL1a1 Exon Characteristics among Vertebrates
| Species | No. of amino acids | Exon length (bp) | Exon GC-content (%) |
|---|---|---|---|
| Human | 1464 | 86 ± 52 | 66 ± 5 |
| Chimpanzee | 1464 | 86 ± 52 | 66 ± 5 |
| Mouse | 1453 | 86 ± 52 | 64 ± 6 |
| Dog | 1460 | 86 ± 52 | 65 ± 5 |
| Cow | 1463 | 86 ± 52 | 66 ± 5 |
| Western clawed frog | 1449 | 85 ± 52 | 59 ± 6 |
| Zebrafish | 1447 | 85 ± 52 | 62 ± 5 |

Note: Length and GC-content values denote means and standard deviations.
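The human row of this summary can be recomputed from the exon lengths listed in supplementary table 2a. The sketch below is illustrative only: the variable names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the note does not say which estimator was used. For these data the sample and population estimators round to the same integer.

```python
import statistics

# Human COL1a1 exon lengths (bp), exons 1-51, copied from supplementary table 2a.
human_exon_lengths = [
    103, 195, 35, 36, 102, 72, 45, 54, 54, 54,
    54, 54, 45, 54, 45, 54, 99, 45, 99, 54,
    108, 54, 99, 54, 99, 54, 54, 54, 54, 45,
    99, 108, 108, 54, 54, 108, 54, 54, 162, 108,
    108, 54, 108, 54, 108, 54, 108, 283, 191, 243,
    144,
]

total_bp = sum(human_exon_lengths)          # total listed exon length in bp
n_codons = total_bp // 3                    # three nucleotides per amino acid
mean_len = statistics.mean(human_exon_lengths)
sd_len = statistics.stdev(human_exon_lengths)  # assumption: sample standard deviation

print(total_bp, n_codons)                   # 4392 1464
print(f"{mean_len:.0f} ± {sd_len:.0f}")     # 86 ± 52
```

The summed exon lengths divided by three give the 1464 amino acids listed for human, and the mean and standard deviation round to the tabulated 86 ± 52.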
Supplementary Table 2d
Summary of Mann-Whitney U Tests for COL1a1 Exons among Vertebrates
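Mann-Whitney U

| Region Compared | Included Exons | Species Compared | Human | Mouse | Dog | Cow | W. Frog |
|---|---|---|---|---|---|---|---|
| Exon Length | All | Chimpanzee | 1300.5 | | | | |
| Exon Length | All | Mouse | 1293.0 | | | | |
| Exon Length | All | Dog | 1294.0 | 1299.0 | | | |
| Exon Length | All | Cow | 1300.0 | 1293.5 | 1294.5 | | |
| Exon Length | All | W. Frog | 1285.5 | 1293.5 | 1291.5 | 1286.0 | |
| Exon Length | All | Zebrafish | 1286.6 | 1294.0 | 1292.5 | 1287.0 | 1299.5 |
| Exon Length | N-terminal domain (exons 1-6) | Chimpanzee | 18.0 | | | | |
| Exon Length | N-terminal domain (exons 1-6) | Mouse | 15.5 | | | | |
| Exon Length | N-terminal domain (exons 1-6) | Dog | 16.5 | 16.5 | | | |
| Exon Length | N-terminal domain (exons 1-6) | Cow | 17.5 | 16.0 | 17.0 | | |
| Exon Length | N-terminal domain (exons 1-6) | W. Frog | 14.5 | 17.5 | 15.5 | 15.0 | |
| Exon Length | N-terminal domain (exons 1-6) | Zebrafish | 14.5 | 17.0 | 15.5 | 15.0 | 18.0 |
| Exon GC-content | All | Chimpanzee | 1299.5 | | | | |
| Exon GC-content | All | Mouse | 1091.5 | | | | |
| Exon GC-content | All | Dog | 1281.5 | 1075 | | | |
| Exon GC-content | All | Cow | 1210.0 | 1011.5 | 1227.5 | | |
| Exon GC-content | All | W. Frog | 576.5* | 772.0* | 576.5* | 533.5* | |
| Exon GC-content | All | Zebrafish | 745.0* | 966.5 | 750.5* | 693.5* | 1039.5 |
| Exon GC-content | triple-helix domain (exons 7-47) | Chimpanzee | 832.5 | | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Mouse | 705.0 | | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Dog | 814.0 | 677.0 | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Cow | 742.0 | 610.5 | 771.5 | | |
| Exon GC-content | triple-helix domain (exons 7-47) | W. Frog | 349.0* | 433.5* | 324.0* | 275.5* | |
| Exon GC-content | triple-helix domain (exons 7-47) | Zebrafish | 460.5* | 569.0 | 442.0* | 377.5* | 651.5 |

Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.003. Because chimpanzee is identical or highly similar to human for all comparisons, chimpanzee was excluded from further analyses. Exon length comparisons were not conducted for the C-terminal domain (exons 48-51) as only a single exon differs and the difference is only 3 bp (see supplementary table 2a).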
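For readers who wish to check individual entries, the Mann-Whitney U statistic can be computed directly from pairwise comparisons. The sketch below is a minimal illustration, not the software used for the dissertation; the function name `mann_whitney_u` is hypothetical, and ties between the two samples are counted as one half, which is the usual convention. The example uses the N-terminal exon lengths (exons 1-6) for human and mouse from supplementary table 2a.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples; a tie between samples counts one half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two statistics always sum to n_x * n_y
    return u_x, u_y


# N-terminal exon lengths (exons 1-6) from supplementary table 2a.
human = [103, 195, 35, 36, 102, 72]
mouse = [76, 195, 32, 36, 102, 69]

print(min(mann_whitney_u(human, mouse)))  # 15.5
print(min(mann_whitney_u(human, human)))  # 18.0
```

The smaller of the two statistics for human versus mouse, 15.5, agrees with the corresponding N-terminal entry, and two identical samples of six exons give 6 × 6 / 2 = 18.0, as listed for human versus chimpanzee. That the table reports the smaller U is an inference from this agreement rather than something the note states.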
Supplementary Table 2e
Summary of RxC Chi-Squared Tests for COL1a1 Exons among Vertebrates
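Chi-Squared Value

| Region Compared | Included Exons | Species Compared | Human | Mouse | Dog | Cow | W. Frog |
|---|---|---|---|---|---|---|---|
| Exon Length | All | Chimpanzee | 0 | | | | |
| Exon Length | All | Mouse | 4.1 | | | | |
| Exon Length | All | Dog | 0.7 | 1.5 | | | |
| Exon Length | All | Cow | 0.1 | 4.1 | 0.8 | | |
| Exon Length | All | W. Frog | 2.9 | 0.7 | 5.6 | 7.4 | |
| Exon Length | All | Zebrafish | 9.7 | 2.1 | 7.9 | 9.7 | 6.0 |
| Exon Length | N-terminal domain (exons 1-6) | Chimpanzee | 0 | | | | |
| Exon Length | N-terminal domain (exons 1-6) | Mouse | 3.2 | | | | |
| Exon Length | N-terminal domain (exons 1-6) | Dog | 0.6 | 1.1 | | | |
| Exon Length | N-terminal domain (exons 1-6) | Cow | 0.1 | 3.3 | 0.7 | | |
| Exon Length | N-terminal domain (exons 1-6) | W. Frog | 1.4 | 0.7 | 0.3 | 1.6 | |
| Exon Length | N-terminal domain (exons 1-6) | Zebrafish | 3.2 | 1.9 | 2.2 | 3.3 | 1.5 |
| Exon GC-content | All | Chimpanzee | 0.2 | | | | |
| Exon GC-content | All | Mouse | 6.7 | | | | |
| Exon GC-content | All | Dog | 2.8 | 4.5 | | | |
| Exon GC-content | All | Cow | 2.5 | 5.5 | 2.7 | | |
| Exon GC-content | All | W. Frog | 13.5 | 4.8 | 13.8 | 12.7 | |
| Exon GC-content | All | Zebrafish | 13.0 | 6.3 | 11.3 | 11.3 | 9.5 |
| Exon GC-content | triple-helix domain (exons 7-47) | Chimpanzee | 0.1 | | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Mouse | 1.5 | | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Dog | 1.8 | 1.7 | | | |
| Exon GC-content | triple-helix domain (exons 7-47) | Cow | 1.7 | 1.9 | 1.7 | | |
| Exon GC-content | triple-helix domain (exons 7-47) | W. Frog | 3.8 | 2.5 | 4.1 | 2.9 | |
| Exon GC-content | triple-helix domain (exons 7-47) | Zebrafish | 3.5 | 3.8 | 4.6 | 3.2 | 2.7 |

Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.003. Because chimpanzee is identical or highly similar to human for all comparisons, chimpanzee was excluded from further analyses. Exon length comparisons were not conducted for the C-terminal domain (exons 48-51) as only a single exon differs and the difference is only 3 bp (see supplementary table 2a). The number of G and C bp were used for chi-squared comparisons of GC-content rather than percentage GC-content.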
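Because the chi-squared comparisons use counts of G and C bases rather than percentages, it may help to see how the two are related. The sketch below is illustrative only: both function names are hypothetical, and recovering a count by rounding a tabulated percentage is an assumption made here for demonstration; the original counts were presumably taken directly from the sequences.

```python
def gc_count(sequence):
    """Number of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


def gc_count_from_percent(percent, length_bp):
    """Recover an integer G+C count from a rounded percentage and a length in bp."""
    return round(percent / 100 * length_bp)


# Human COL1a1 exon 1: 103 bp (supplementary table 2a), 62.1% GC (supplementary table 2b).
count = gc_count_from_percent(62.1, 103)
print(count)                         # 64
print(round(count / 103 * 100, 1))   # 62.1
```

Using counts keeps the chi-squared test on whole-number frequencies, so longer exons contribute proportionally more observations than shorter ones.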
Supplementary Table 3
Pairwise dN and dS Comparisons Across COL1a1 Domains in Primates
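| Species Pair | N-terminal dN | N-terminal dS | Triple-helix dN | Triple-helix dS | C-terminal dN | C-terminal dS |
|---|---|---|---|---|---|---|
| H/C | 0 | 0.007 | 0 | 0.016 | 0 | 0.027 |
| H/O | 0 | 0.014 | 0 | 0.040 | 0 | 0.076 |
| H/M | 0 | 0.022 | 0 | 0.067 | 0 | 0.066 |
| C/O | 0 | 0.022 | 0 | 0.039 | 0 | 0.071 |
| C/M | 0 | 0.029 | 0 | 0.066 | 0 | 0.060 |
| O/M | 0 | 0.036 | 0 | 0.058 | 0 | 0.087 |
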
Note: H = human, C = chimpanzee, O = orangutan, and M = macaque. For the N-terminal domain, the average number of synonymous and nonsynonymous sites across species is 138 and 396 bp, respectively. For the triple-helix, the average number of synonymous and nonsynonymous sites across species is 822 and 1986 bp, respectively. For the C-terminal domain, all species have 183 synonymous and 633 nonsynonymous bp.
Supplementary Table 4a
Distributions of Amino Acid Positions and Observed Mutations among
Evolutionary Rate Bins
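| Evolutionary Rate Category | Total Positions | Positions per Domain: N-terminal | Positions per Domain: Triple-helix | Positions per Domain: C-terminal | Total Mutations | Phenotypic Severity Category 1 | Phenotypic Severity Category 2 | Phenotypic Severity Category 3 | Phenotypic Severity Category 4 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 889 | 57 | 670 | 162 | 275 | 38 | 45 | 36 | 112 |
| 2 | 64 | 13 | 17 | 34 | 5 | 2 | 2 | 0 | 1 |
| 3 | 136 | 10 | 103 | 23 | 7 | 7 | 0 | 0 | 0 |
| 4 | 94 | 17 | 60 | 17 | 1 | 0 | 0 | 0 | 1 |
| 5 | 78 | 8 | 60 | 10 | 3 | 2 | 0 | 1 | 0 |
| 6 | 66 | 11 | 42 | 13 | 1 | 0 | 0 | 0 | 1 |
| 7 | 68 | 19 | 43 | 6 | 1 | 1 | 0 | 0 | 0 |
| 8 | 38 | 19 | 14 | 5 | 0 | 0 | 0 | 0 | 0 |
| Total | 1433 | 154 | 1009 | 270 | 293 | 50 | 47 | 37 | 115 |
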
Note: Evolutionary rate categories correspond to increments of 0.125 up to 1.0 (e.g., 0.125, 0.25, 0.375, etc.). Amino acid positions were binned according to these categories after scaling down the original rate estimates (ranging from 0.291 to 3.961). The number of amino acid positions in each rate category in each protein domain is listed under “Positions per domain.” RxC chi-squared tests were conducted to compare these distributions of positions among rate categories across domains (see supplementary table 4b). “Total mutations” provides the number of disease-associated mutations occurring at amino acid positions in each of the evolutionary rate categories and “phenotypic severity category” separates these mutations into categories according to the severity of their associated phenotype. Chi-squared tests were used to compare the distributions of these mutations among rate categories (see supplementary table 4c). See Materials and Methods (Chapter 2) for additional information.
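The assignment of a scaled rate to one of the eight categories can be sketched as below. The note does not say how the original estimates were scaled down, so dividing by the largest observed rate (3.961) is purely an assumption made for this illustration, as are the function name `rate_category` and the choice of right-closed bins.

```python
import math


def rate_category(rate, max_rate=3.961, width=0.125, n_categories=8):
    """Map a rate estimate to a category 1..n_categories after scaling to (0, 1]."""
    scaled = rate / max_rate                  # assumption: scale by the maximum rate
    category = math.ceil(scaled / width)      # right-closed bins of width 0.125
    return min(max(category, 1), n_categories)


print(rate_category(0.291))  # slowest reported rate
print(rate_category(3.961))  # fastest reported rate
```

Under this assumed scaling, the slowest reported rate falls in category 1 and the fastest in category 8; a different scaling rule would shift the boundaries between the intermediate categories.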
Supplementary Table 4b
Summary of RxC Chi-Squared Tests of Amino Acid Positions
| Domain Compared | N-terminal | Triple-helix |
|---|---|---|
| Triple-helix | 124.8* | |
| C-terminal | 52.3* | 70.3* |

Note: * denotes statistical significance at P<0.000001.
Supplementary Table 4c
Summary of Chi-Squared Tests of Disease-Associated Mutations
| Amino Acid Mutations | Chi-squared Value |
|---|---|
| Total | 127.5* |
| Severity Category 1 | 10.6 |
| Severity Category 2 | 24.4** |
| Severity Category 3 | 20.0** |
| Severity Category 4 | 61.3** |

Note: * denotes statistical significance at P<0.000001. ** denotes comparisons that are statistically significant after a Bonferroni correction, P<0.0125.
Supplementary Table 5a
COL1a1 Intron Lengths among Vertebrates
Intron Length (bp)

| Intron # | Human | Chimpanzee | Mouse | Dog | Cow | W. Frog | Zebrafish |
|---|---|---|---|---|---|---|---|
| 2 | 151 | 144 | 118 | 82 | 143 | 287 | 377 |
| 3 | 98 | 98 | 115 | 99 | 106 | 99 | 193 |
| 4 | 86 | 86 | 86 | 95 | 85 | 796 | 631 |
| 5 | 718 | 719 | 781 | 814 | 837 | 113 | 319 |
| 6 | 223 | 223 | 233 | 263 | 213 | 860 | 214 |
| 7 | 154 | 154 | 158 | 159 | 146 | 92 | 288 |
| 8 | 159 | 159 | 156 | 191 | 177 | 417 | 452 |
| 9 | 494 | 494 | 369 | 526 | 501 | 874 | 185 |
| 10 | 112 | 112 | 115 | 116 | 113 | 982 | 124 |
| 11 | 335 | 335 | 326 | 325 | 328 | 92 | 84 |
| 12 | 84 | 84 | 77 | 98 | 83 | 88 | 81 |
| 13 | 112 | 112 | 94 | 111 | 112 | 680 | 84 |
| 14 | 110 | 110 | 109 | 126 | 105 | 100 | 244 |
| 15 | 174 | 174 | 132 | 159 | 163 | 344 | 83 |
| 16 | 253 | 253 | 230 | 257 | 247 | 494 | 442 |
| 17 | 84 | 84 | 85 | 84 | 82 | 84 | 90 |
| 18 | 99 | 99 | 89 | 87 | 92 | 167 | 674 |
| 19 | 127 | 127 | 115 | 128 | 124 | 788 | 85 |
| 20 | 214 | 214 | 177 | 124 | 161 | 84 | 197 |
| 21 | 90 | 90 | 99 | 76 | 74 | 738 | 93 |
| 22 | 121 | 118 | 279 | 261 | 274 | 647 | 90 |
| 23 | 161 | 161 | 144 | 170 | 195 | 113 | 94 |
| 24 | 84 | 84 | 85 | 88 | 82 | 84 | 931 |
| 25 | 898 | 911 | 519 | 589 | 601 | 141 | 88 |
| 26 | 139 | 139 | 185 | 169 | 186 | 543 | 95 |
| 27 | 99 | 99 | 96 | 99 | 100 | 115 | 246 |
| 28 | 107 | 107 | 104 | 112 | 94 | 119 | 107 |
| 29 | 446 | 446 | 396 | 426 | 432 | 125 | 135 |
| 30 | 89 | 89 | 83 | 88 | 87 | 451 | 86 |
| 31 | 293 | 293 | 208 | 269 | 263 | 616 | 152 |
| 32 | 454 | 454 | 352 | 422 | 564 | 150 | 101 |
| 33 | 216 | 216 | 187 | 186 | 205 | 1175 | 103 |
| 34 | 158 | 158 | 144 | 175 | 204 | 85 | 81 |
| 35 | 214 | 214 | 240 | 222 | 216 | 542 | 83 |
| 36 | 84 | 83 | 84 | 81 | 82 | 81 | 147 |
| 37 | 122 | 122 | 106 | 115 | 114 | 213 | 88 |
| 38 | 136 | 136 | 126 | 140 | 134 | 588 | 87 |
| 39 | 97 | 97 | 99 | 96 | 85 | 157 | 452 |
| 40 | 153 | 153 | 337 | 163 | 141 | 849 | 138 |
| 41 | 103 | 103 | 99 | 103 | 100 | 440 | 104 |
| 42 | 100 | 100 | 105 | 109 | 88 | 75 | 116 |
| 43 | 376 | 376 | 363 | 392 | 542 | 768 | 85 |
| 44 | 108 | 108 | 117 | 109 | 114 | 78 | 97 |
| 45 | 334 | 333 | 252 | 275 | 331 | 236 | 96 |
| 46 | 357 | 357 | 319 | 285 | 274 | 612 | 108 |
| 47 | 88 | 88 | 82 | 87 | 433 | 334 | 78 |
| 48 | 128 | 128 | 131 | 137 | 105 | 167 | 106 |
| 49 | 292 | 292 | 197 | 267 | 285 | 339 | 89 |
| 50 | 125 | 125 | 125 | 96 | 111 | 290 | 282 |

Note: Splice sites are excluded, but Alus are included in this case to provide an accurate representation of length differences that have accumulated over time. Mann-Whitney U tests were conducted among species as were RxC chi-squared tests among orthologous introns (see supplementary tables 5d and e). See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 5b
COL1a1 Intron GC-Content among Vertebrates
Intron GC-Content (%)

| Intron # | Human | Chimpanzee | Mouse | Dog | Cow | W. Frog | Zebrafish |
|---|---|---|---|---|---|---|---|
| 2 | 58.2 | 73.6 | 48.3 | 81.7 | 67.1 | 36.9 | 36.9 |
| 3 | 74.4 | 58.2 | 48.7 | 58.6 | 56.6 | 40.4 | 28 |
| 4 | 39.7 | 74.4 | 54.7 | 75.8 | 75.3 | 35.4 | 33 |
| 5 | 54.7 | 40.2 | 40.7 | 39.9 | 40.1 | 33.6 | 42.3 |
| 6 | 48.7 | 54.3 | 49.8 | 47.9 | 48.8 | 36.6 | 29.9 |
| 7 | 42.8 | 48.1 | 53.8 | 49.7 | 52.1 | 29.3 | 30.2 |
| 8 | 49.4 | 43.4 | 48.7 | 44.5 | 42.9 | 30.7 | 38.3 |
| 9 | 54.5 | 49.6 | 46.3 | 49.2 | 50.9 | 34.6 | 33.5 |
| 10 | 51.9 | 55.4 | 53.9 | 56 | 54.9 | 38.9 | 29 |
| 11 | 45.2 | 51.9 | 44.8 | 49.8 | 50 | 46.7 | 31 |
| 12 | 59.8 | 46.4 | 45.5 | 46.9 | 51.8 | 31.8 | 32.1 |
| 13 | 62.7 | 59.8 | 55.3 | 48.6 | 55.4 | 34.3 | 31 |
| 14 | 58 | 62.7 | 59.6 | 56.3 | 53.3 | 40 | 28.7 |
| 15 | 55.7 | 57.5 | 46.2 | 61 | 57.7 | 38.4 | 18.1 |
| 16 | 53.6 | 55.7 | 51.7 | 57.6 | 55.1 | 36.6 | 29.9 |
| 17 | 65.7 | 53.6 | 54.1 | 59.5 | 53.7 | 35.7 | 27.8 |
| 18 | 63 | 64.6 | 60.7 | 65.5 | 62 | 34.1 | 35.6 |
| 19 | 66.8 | 63 | 51.3 | 57.8 | 58.1 | 37.1 | 25.9 |
| 20 | 67.8 | 67.3 | 60.5 | 68.5 | 62.1 | 33.3 | 29.9 |
| 21 | 60.3 | 67.8 | 52.5 | 65.8 | 63.5 | 32.8 | 40.9 |
| 22 | 63.4 | 60.2 | 56.6 | 59 | 62 | 36.6 | 26.7 |
| 23 | 59.5 | 63.4 | 62.5 | 64.7 | 57.9 | 34.5 | 40.4 |
| 24 | 61.2 | 58.3 | 52.9 | 53.4 | 59.8 | 28.6 | 42.1 |
| 25 | 61.6 | 53.3 | 49.7 | 53.8 | 52.6 | 29.1 | 35.2 |
| 26 | 54.2 | 60.4 | 53.5 | 64.5 | 57 | 36.1 | 28.4 |
| 27 | 54.9 | 60.6 | 55.2 | 65.7 | 67 | 31.3 | 24 |
| 28 | 57.3 | 54.2 | 46.2 | 56.2 | 53.2 | 34.5 | 31.8 |
| 29 | 61.1 | 54.7 | 50 | 53.8 | 54.4 | 36.8 | 30.4 |
| 30 | 54.6 | 55.1 | 51.8 | 59.1 | 56.3 | 30.6 | 38.4 |
| 31 | 62.5 | 60.4 | 53.8 | 60.2 | 57.8 | 32.3 | 34.9 |
| 32 | 57.6 | 54.2 | 46 | 53.8 | 52.8 | 32 | 34.7 |
| 33 | 58.9 | 62 | 53.5 | 61.8 | 60 | 37.3 | 29.1 |
| 34 | 66.7 | 57 | 50.7 | 63.6 | 61.8 | 25.9 | 30.9 |
| 35 | 60.7 | 59.3 | 57.1 | 59 | 59.3 | 33.8 | 32.5 |
| 36 | 61.8 | 66.3 | 46.4 | 64.2 | 61 | 30.9 | 30.6 |
| 37 | 62.9 | 61.5 | 52.8 | 58.3 | 61.4 | 35.2 | 28.4 |
| 38 | 56.9 | 62.5 | 53.2 | 59.3 | 57.5 | 33.5 | 28.7 |
| 39 | 61.2 | 62.9 | 50.5 | 70.8 | 70.6 | 28.7 | 31.6 |
| 40 | 63 | 58.2 | 48.4 | 62.6 | 61.7 | 32.7 | 34.8 |
| 41 | 59.8 | 62.1 | 62.6 | 64.1 | 68 | 38.6 | 33.7 |
| 42 | 52.8 | 63 | 57.1 | 65.1 | 70.5 | 30.7 | 38.8 |
| 43 | 55.7 | 60.4 | 51.8 | 61 | 53.5 | 35.4 | 36.5 |
| 44 | 59.9 | 51.9 | 53 | 56.9 | 57 | 21.8 | 37.1 |
| 45 | 60.2 | 55 | 51.6 | 53.8 | 59.2 | 34.3 | 31.2 |
| 46 | 63.3 | 59.9 | 52.7 | 56.8 | 57.7 | 34.6 | 29.6 |
| 47 | 61 | 61.4 | 51.2 | 64.4 | 66.5 | 30.8 | 30.8 |
| 48 | 66.4 | 63.3 | 55 | 65 | 64.8 | 31.7 | 29.2 |
| 49 | 73.5 | 61.3 | 52.3 | 59.6 | 62.1 | 32.4 | 36 |
| 50 | 55.2 | 67.2 | 54.4 | 67.7 | 72.1 | 32.1 | 33.3 |

Note: Splice sites are excluded, but Alus are included in this case to provide an accurate representation of length differences that have accumulated over time. Mann-Whitney U tests were conducted among species as were RxC chi-squared tests among orthologous introns (see supplementary tables 5d and e). See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 5c
Summary of COL1a1 Intron Characteristics among Vertebrates
| Species | Intron length (bp) | Intron GC-content (%) |
|---|---|---|
| Human | 203 ± 167 | 59 ± 7 |
| Chimpanzee | 203 ± 168 | 59 ± 7 |
| Mouse | 188 ± 135 | 52 ± 5 |
| Dog | 197 ± 150 | 59 ± 8 |
| Cow | 211 ± 166 | 58 ± 7 |
| Western clawed frog | 374 ± 303 | 34 ± 4 |
| Zebrafish | 192 ± 179 | 32 ± 5 |

Note: Length and GC-content values denote means and standard deviations, excluding first introns and splice sites.
Supplementary Table 5d
Summary of Mann-Whitney U Tests for COL1a1 Introns among Vertebrates
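Mann-Whitney U

| Region Compared | Included Introns | Species Compared | Human | Chimp. | Mouse | Dog | Cow | W. Frog |
|---|---|---|---|---|---|---|---|---|
| Intron Length | All | Chimpanzee | 1198.0 | | | | | |
| Intron Length | All | Mouse | 1165.0 | 1167.5 | | | | |
| Intron Length | All | Dog | 1186.5 | 1187.0 | 1186.0 | | | |
| Intron Length | All | Cow | 1194.5 | 1193.0 | 1163.5 | 1165.0 | | |
| Intron Length | All | W. Frog | 890.5 | 886.5 | 853.0 | 863.5 | 878.0 | |
| Intron Length | All | Zebrafish | 965.0 | 968.0 | 1019.5 | 1004.0 | 1005.0 | 771.5* |
| Intron GC-content | All | Chimpanzee | 1197.5 | | | | | |
| Intron GC-content | All | Mouse | 442.5* | 462.0* | | | | |
| Intron GC-content | All | Dog | 1185.0 | 1169.5 | 508.0* | | | |
| Intron GC-content | All | Cow | 1120.0 | 1120.5 | 508.5* | 1117.0 | | |
| Intron GC-content | All | W. Frog | 5.0* | 4.0* | 8.0* | 4.0* | 3.0* | |
| Intron GC-content | All | Zebrafish | 4.0* | 4.0* | 3.0* | 4.0* | 4.0* | 907.0 |

Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.002.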
Supplementary Table 5e
Summary of RxC Chi-Squared Tests for COL1a1 Introns among Vertebrates
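Chi-squared Value

| Region Compared | Included Introns | Species Compared | Human | Chimp. | Mouse | Dog | Cow | W. Frog |
|---|---|---|---|---|---|---|---|---|
| Intron Length | All | Chimpanzee | 0.3 | | | | | |
| Intron Length | All | Mouse | 322.8* | 329.9* | | | | |
| Intron Length | All | Dog | 203.6* | 207.4* | 162.8* | | | |
| Intron Length | All | Cow | 438.4* | 445.4* | 452.9* | 309.1* | | |
| Intron Length | All | W. Frog | 6315.1* | 6341.8* | 5287.8* | 5626.6* | 6144.0* | |
| Intron Length | All | Zebrafish | 4641.1* | 4660.8* | 4129.9* | 4456.2* | 5013.6* | 7674.5* |
| Intron GC-content | All | Chimpanzee | 0.5 | | | | | |
| Intron GC-content | All | Mouse | 208.0* | 201.0* | | | | |
| Intron GC-content | All | Dog | 128.3* | 123.4* | 82.7* | | | |
| Intron GC-content | All | Cow | 295.3* | 289.7* | 278.7* | 190.9* | | |
| Intron GC-content | All | W. Frog | 2579.0* | 2564.1* | 2115.4* | 2288.0* | 2566.0* | |
| Intron GC-content | All | Zebrafish | 2116* | 2115.4* | 1817.3* | 2001.5* | 2254.2* | 2943.3* |

Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.002. The number of G and C bp were used for chi-squared comparisons of GC-content rather than percentage GC-content.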
Supplementary Table 6a
Human Clade A Collagen Gene Intron Lengths
Intron Length (bp)

| Intron # | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| 2 | 151 | 619 | 1483 | 230 | 5943 |
| 3 | 98 | 648 | 209 | 413 | 4120 |
| 4 | 86 | 1107 | 100 | 1276 | 1343 |
| 5 | 718 | 1274 | 103 | 937 | 1392 |
| 6 | 223 | 2931 | 159 | 451 | 4852 |
| 7 | 154 | 88 | 976 | 748 | 3533 |
| 8 | 159 | 88 | 624 | 642 | 1920 |
| 9 | 494 | 302 | 107 | 153 | 949 |
| 10 | 112 | 416 | 397 | 639 | 501 |
| 11 | 335 | 519 | 773 | 425 | 1118 |
| 12 | 84 | 1539 | 372 | 133 | 2936 |
| 13 | 112 | 287 | 127 | 457 | 954 |
| 14 | 110 | 95 | 302 | 654 | 866 |
| 15 | 174 | 385 | 522 | 416 | 489 |
| 16 | 253 | 494 | 2813 | 574 | 3074 |
| 17 | 84 | 139 | 475 | 147 | 3295 |
| 18 | 99 | 110 | 1398 | 204 | 3151 |
| 19 | 127 | 416 | 293 | 125 | 511 |
| 20 | 214 | 120 | 88 | 210 | 108 |
| 21 | 90 | 357 | 185 | 588 | 1228 |
| 22 | 121 | 109 | 387 | 330 | 227 |
| 23 | 161 | 909 | 366 | 215 | 1328 |
| 24 | 84 | 458 | 81 | 409 | 344 |
| 25 | 402 | 396 | 136 | 113 | 519 |
| 26 | 139 | 549 | 436 | 306 | 705 |
| 27 | 99 | 146 | 391 | 508 | 104 |
| 28 | 107 | 270 | 401 | 350 | 87 |
| 29 | 446 | 946 | 346 | 562 | 1245 |
| 30 | 89 | 1130 | 240 | 82 | 774 |
| 31 | 293 | 1216 | 243 | 260 | 1832 |
| 32 | 454 | 663 | 142 | 1497 | 317 |
| 33 | 216 | 941 | 234 | 81 | 997 |
| 34 | 158 | 677 | 337 | 704 | 289 |
| 35 | 214 | 100 | 273 | 599 | 2756 |
| 36 | 84 | 92 | 233 | 344 | 206 |
| 37 | 122 | 356 | 370 | 265 | 413 |
| 38 | 136 | 832 | 249 | 190 | 401 |
| 39 | 97 | 1000 | 487 | 109 | 104 |
| 40 | 153 | 958 | 440 | 981 | 519 |
| 41 | 103 | 669 | 748 | 744 | 686 |
| 42 | 100 | 381 | 194 | 81 | 639 |
| 43 | 376 | 82 | 169 | 480 | 1110 |
| 44 | 108 | 136 | 352 | 505 | 1080 |
| 45 | 334 | 367 | 168 | 273 | 2297 |
| 46 | 357 | 473 | 160 | 92 | 563 |
| 47 | 88 | 122 | 178 | 777 | 1916 |
| 48 | 128 | 327 | 240 | 952 | 373 |
| 49 | 292 | 403 | 439 | 278 | 1022 |
| 50 | 125 | 706 | 353 | 733 | 2018 |
| 51 | | 812 | 450 | | 2464 |
| 52 | | | 339 | | 1456 |
| 53 | | | 531 | | 695 |
Note: Splice sites, Alus, and alignment gaps are excluded. RxC chi-squared tests (see supplementary table 6d) were conducted on comparisons among genes of binned distributions with bins in increments of 50 bp up to 1000 bp, with increments of 1000 bp thereafter up to 5000 bp (e.g., 50, 100, 150...1000, 2000, 3000, etc.). These bins enabled the best resolution in comparisons among genes and bin size did not alter the results. See Materials and Methods (Chapter 2) for additional information.
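The binning scheme described in the note can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used for the dissertation: the function names are hypothetical, lengths above 5000 bp are pooled into one overflow bin (an assumption, since the note does not say how such introns were handled), and bins that are empty in every gene are dropped before the statistic is computed.

```python
from bisect import bisect_left

# Upper bin edges in bp: 50, 100, ..., 1000, then 2000, 3000, 4000, 5000.
BIN_EDGES = list(range(50, 1001, 50)) + list(range(2000, 5001, 1000))


def bin_counts(lengths, edges=BIN_EDGES):
    """Count intron lengths per bin (a length belongs to the first edge >= it).

    Lengths above the last edge go into one extra overflow bin; that pooling
    is an assumption of this sketch, not a detail taken from the note.
    """
    counts = [0] * (len(edges) + 1)
    for length in lengths:
        counts[bisect_left(edges, length)] += 1
    return counts


def rxc_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an R x C table.

    Columns that are empty in every row are dropped first, because their
    expected counts would be zero.
    """
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = [list(r) for r in zip(*cols)]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Usage with introns 2-6 of two genes from Supplementary Table 6a.
col1a1 = [151, 98, 86, 718, 223]
col5a2 = [5943, 4120, 1343, 1392, 4852]
stat, dof = rxc_chi_squared([bin_counts(col1a1), bin_counts(col5a2)])
```

With only five introns per gene the expected counts are far too small for a meaningful test; the usage lines only show how the two binned distributions are assembled into one contingency table.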
Supplementary Table 6b
Human Clade A Collagen Gene Intron GC-Content
Intron GC-content (%)

| Intron # | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| 2 | 73.5 | 29.4 | 47.0 | 30.0 | 28.6 |
| 3 | 58.2 | 25.9 | 38.8 | 25.7 | 32.4 |
| 4 | 74.4 | 31.6 | 48.0 | 31.0 | 33.5 |
| 5 | 39.7 | 32.9 | 52.4 | 28.5 | 31.2 |
| 6 | 54.7 | 32.7 | 51.6 | 27.9 | 36.9 |
| 7 | 48.7 | 29.5 | 40.9 | 24.9 | 34.6 |
| 8 | 42.8 | 33.0 | 48.1 | 31.2 | 33.4 |
| 9 | 49.4 | 29.1 | 38.3 | 22.9 | 31.8 |
| 10 | 54.5 | 31.0 | 52.4 | 26.9 | 27.9 |
| 11 | 51.9 | 35.6 | 48.9 | 23.1 | 30.7 |
| 12 | 45.2 | 32.4 | 52.2 | 30.1 | 34.2 |
| 13 | 59.8 | 29.3 | 47.2 | 29.3 | 31.7 |
| 14 | 62.7 | 31.6 | 56.6 | 30.1 | 32.3 |
| 15 | 58.0 | 36.9 | 45.0 | 35.8 | 35.0 |
| 16 | 55.7 | 35.0 | 52.5 | 29.3 | 38.0 |
| 17 | 53.6 | 36.0 | 48.6 | 36.1 | 32.8 |
| 18 | 65.7 | 44.5 | 55.7 | 30.9 | 34.8 |
| 19 | 63.0 | 36.3 | 51.9 | 24.0 | 28.8 |
| 20 | 66.8 | 28.3 | 62.5 | 31.9 | 43.5 |
| 21 | 67.8 | 38.4 | 58.9 | 29.9 | 27.2 |
| 22 | 60.3 | 40.4 | 56.1 | 29.7 | 31.3 |
| 23 | 63.4 | 30.8 | 57.1 | 27.0 | 37.3 |
| 24 | 59.5 | 36.2 | 60.5 | 31.3 | 42.4 |
| 25 | 55.2 | 39.6 | 62.5 | 35.4 | 26.8 |
| 26 | 61.2 | 33.2 | 52.8 | 36.3 | 29.1 |
| 27 | 61.6 | 41.1 | 56.3 | 31.3 | 32.7 |
| 28 | 54.2 | 36.7 | 57.4 | 24.6 | 26.4 |
| 29 | 54.9 | 30.5 | 61.6 | 30.2 | 33.4 |
| 30 | 57.3 | 31.5 | 61.2 | 30.5 | 26.9 |
| 31 | 61.1 | 34.7 | 53.1 | 30.8 | 35.4 |
| 32 | 54.6 | 36.8 | 61.3 | 31.1 | 26.8 |
| 33 | 62.5 | 35.0 | 59.0 | 37.0 | 30.1 |
| 34 | 57.6 | 37.4 | 55.5 | 32.1 | 34.3 |
| 35 | 58.9 | 37.0 | 64.8 | 32.7 | 35.3 |
| 36 | 66.7 | 39.1 | 60.9 | 33.4 | 33.5 |
| 37 | 60.7 | 33.7 | 64.6 | 29.4 | 28.6 |
| 38 | 61.8 | 38.8 | 55.0 | 23.7 | 24.2 |
| 39 | 62.9 | 34.7 | 60.2 | 24.8 | 37.5 |
| 40 | 56.9 | 38.7 | 64.3 | 30.0 | 32.0 |
| 41 | 61.2 | 36.3 | 58.6 | 29.6 | 30.2 |
| 42 | 63.0 | 40.7 | 63.4 | 25.9 | 28.5 |
| 43 | 59.8 | 34.1 | 62.1 | 34.2 | 35.0 |
| 44 | 52.8 | 49.3 | 59.7 | 25.9 | 28.7 |
| 45 | 55.7 | 32.4 | 56.0 | 33.3 | 34.7 |
| 46 | 59.9 | 34.5 | 60.0 | 35.9 | 31.8 |
| 47 | 60.2 | 52.5 | 62.9 | 33.2 | 27.6 |
| 48 | 63.3 | 37.9 | 56.2 | 27.5 | 32.4 |
| 49 | 61.0 | 35.7 | 56.0 | 31.3 | 28.8 |
| 50 | 66.4 | 32.9 | 54.4 | 30.4 | 36.7 |
| 51 | | 33.1 | 59.6 | | 35.6 |
| 52 | | | 53.1 | | 28.6 |
| 53 | | | 56.1 | | 30.6 |
Note: Splice sites, Alus, and alignment gaps are excluded. RxC chi-squared tests (see supplementary table 6d) were conducted on comparisons among genes of binned distributions with bins starting at 20% in increments of 5%, up to 85% (e.g., 20, 25, 30, etc.). These bins enabled the best resolution in comparisons among genes and bin size did not alter the results. See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 6c
Clade A Collagen Gene Intron Human-Chimpanzee Site Divergence
Intron Silent Divergence (%)

| Intron # | COL1a1 | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| 2 | 0.66 | 0.48 | 1.15 | 2.17 | 0.69 |
| 3 | 0.00 | 1.39 | 1.44 | 1.69 | 0.87 |
| 4 | 1.16 | 0.99 | 0.00 | 1.41 | 0.82 |
| 5 | 0.70 | 1.02 | 3.88 | 0.64 | 0.36 |
| 6 | 0.90 | 1.23 | 0.63 | 0.22 | 0.68 |
| 7 | 0.00 | 0.00 | 1.02 | 1.07 | 0.76 |
| 8 | 0.63 | 1.14 | 1.28 | 0.78 | 0.73 |
| 9 | 0.20 | 1.66 | 1.87 | 1.31 | 0.84 |
| 10 | 0.89 | 0.48 | 1.26 | 0.78 | 0.00 |
| 11 | 0.30 | 0.39 | 0.78 | 0.47 | 0.45 |
| 12 | 2.38 | 0.52 | 1.08 | 2.26 | 0.65 |
| 13 | 0.00 | 0.35 | 0.79 | 1.31 | 0.73 |
| 14 | 0.00 | 2.11 | 0.66 | 0.46 | 0.92 |
| 15 | 0.57 | 1.30 | 0.96 | 0.96 | 0.82 |
| 16 | 0.00 | 1.01 | 1.46 | 0.35 | 0.85 |
| 17 | 0.00 | 2.88 | 1.26 | 2.04 | 0.79 |
| 18 | 1.01 | 2.73 | 1.00 | 0.98 | 1.17 |
| 19 | 2.36 | 0.24 | 1.02 | 0.80 | 0.39 |
| 20 | 1.40 | 0.83 | 0.00 | 1.43 | 0.93 |
| 21 | 2.22 | 1.40 | 0.00 | 1.19 | 0.65 |
| 22 | 0.83 | 2.75 | 1.81 | 1.21 | 0.00 |
| 23 | 2.48 | 0.66 | 2.19 | 1.40 | 1.20 |
| 24 | 0.00 | 0.22 | 1.23 | 0.24 | 1.16 |
| 25 | 0.00 | 1.01 | 1.47 | 1.77 | 0.58 |
| 26 | 0.72 | 0.18 | 0.92 | 1.96 | 0.57 |
| 27 | 1.01 | 0.68 | 0.51 | 1.38 | 0.96 |
| 28 | 0.00 | 1.48 | 0.75 | 0.57 | 0.00 |
| 29 | 0.67 | 0.74 | 1.45 | 0.53 | 1.12 |
| 30 | 1.12 | 1.06 | 0.42 | 0.00 | 0.78 |
| 31 | 0.68 | 1.32 | 1.23 | 1.15 | 0.71 |
| 32 | 0.66 | 0.75 | 2.11 | 1.40 | 0.32 |
| 33 | 0.46 | 1.28 | 0.43 | 2.47 | 0.70 |
| 34 | 0.63 | 0.89 | 1.19 | 0.43 | 0.69 |
| 35 | 0.93 | 0.00 | 0.00 | 0.67 | 0.58 |
| 36 | 0.00 | 2.17 | 0.43 | 1.74 | 0.49 |
| 37 | 0.82 | 2.53 | 0.54 | 0.75 | 0.48 |
| 38 | 0.74 | 1.80 | 1.20 | 0.00 | 0.75 |
| 39 | 1.03 | 0.80 | 1.03 | 0.92 | 1.92 |
| 40 | 1.31 | 0.63 | 1.36 | 0.71 | 0.39 |
| 41 | 0.97 | 0.60 | 1.20 | 0.54 | 0.73 |
| 42 | 1.00 | 1.05 | 0.52 | 0.00 | 0.63 |
| 43 | 1.06 | 1.22 | 1.18 | 1.25 | 0.90 |
| 44 | 0.00 | 0.74 | 1.42 | 0.00 | 0.65 |
| 45 | 1.80 | 1.09 | 0.60 | 0.00 | 0.70 |
| 46 | 0.00 | 0.85 | 3.13 | 0.00 | 0.71 |
| 47 | 1.14 | 1.64 | 1.69 | 1.67 | 0.52 |
| 48 | 0.00 | 0.61 | 0.42 | 0.84 | 0.80 |
| 49 | 0.34 | 0.74 | 1.14 | 1.08 | 1.27 |
| 50 | 0.80 | 0.99 | 0.00 | 0.68 | 0.84 |
| 51 | | 0.62 | 0.44 | | 0.69 |
| 52 | | | 0.59 | | 0.48 |
| 53 | | | 1.13 | | 0.43 |
Note: Splice sites, Alus, and alignment gaps are excluded. RxC chi-squared tests (see supplementary table 6d) were conducted on comparisons among genes of binned distributions with bins in increments of 0.25%, up to 2.25% (0.25, 0.5, 0.75, etc.). These bins enabled the best resolution in comparisons among genes and bin size did not alter the results. See Materials and Methods (Chapter 2) for additional information.
Supplementary Table 6d
Summary of RxC Chi-Squared Tests for Clade A Collagen Gene Introns
Chi-squared Value

| Region Compared | Included Introns | COL1a2 | COL2a1 | COL3a1 | COL5a2 |
|---|---|---|---|---|---|
| Intron Length | All | 38.0* | 24.3 | 31.6* | 63.8* |
| Intron GC-content | All | 85.5* | 9.7 | 94.6* | 94.8* |
| Intron GC-content | 80-500 bp | 66.6* | 10.1 | 78.0* | 57.0* |
| Intron Divergence | 80-500 bp | 14.0 | 13.4 | 12.9 | 7.9 |
Note: * denotes comparisons that are statistically significant after a Bonferroni correction, P<0.0125.
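The P<0.0125 cutoff in the note matches a Bonferroni adjustment of a 0.05 family-wise level across the four gene comparisons in each row (0.05/4). The 0.05 level is an assumption here, since the note does not state it, and the function names below are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff for a family of n_tests comparisons."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each P-value that falls below the Bonferroni-adjusted cutoff.

    alpha=0.05 is an assumed family-wise level, not a value from the table.
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# Four comparisons per row, as in Supplementary Table 6d.
cutoff = bonferroni_threshold(0.05, 4)
```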
Supplementary Table 6e
Summary of Correlation Coefficient (r) Analyses for Clade A Collagen Gene Introns

| Gene | Length vs. GC-content | Length vs. No. of Differences | GC-content vs. Divergence |
|---|---|---|---|
| COL1a1 | -0.46* | 0.51* | 0.17 |
| COL1a2 | -0.27 | 0.89** | 0.25 |
| COL2a1 | -0.21 | 0.95** | -0.11 |
| COL3a1 | -0.02 | 0.80** | 0.36 |
| COL5a2 | 0.19 | 0.95** | 0.51* |

Note: * denotes statistical significance at P<0.001; ** denotes P<10^-6.
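As an illustration of how an r value of this kind is obtained, the sketch below computes a correlation coefficient for paired per-intron measurements. It assumes r is the Pearson product-moment coefficient, which the table does not state explicitly; the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Usage: length (bp) vs. GC-content (%) for COL1a1 introns 2-6,
# taken from Supplementary Tables 6a and 6b.
lengths = [151, 98, 86, 718, 223]
gc_content = [73.5, 58.2, 74.4, 39.7, 54.7]
r = pearson_r(lengths, gc_content)
```

Five introns are used only to keep the example short; the coefficients in the table are based on all introns of each gene.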
Supplementary Table 6f
Summary of F-tests Comparing Linear Regressions of the Number of G/C bp to
the Number of A/T bp for Human Clade A Collagen Gene Introns
| Genes Compared | COL1a1 |
|---|---|
| COL1a2 | 92.3* |
| COL2a1 | 0.5 |
| COL3a1 | 143.9* |
| COL5a2 | 9.8* |
Note: * denotes statistical significance at P<0.002.
Supplementary Fig. 1. Phylogenetic trees displaying dN/dS rate classes for each
COL1a1 domain. Specifically, color-coding is used to highlight branches that fall
into the 3, 3, and 6 significantly different evolutionary rate classes within each of
the (A) N-terminal domain, (B) C-terminal domain, and (C) triple-helix domain,
respectively, as determined by hypothesis testing with the GABranch algorithm.
See Materials and Methods (Chapter 2) for more information.
APPENDIX B
SUPPLEMENTARY MATERIAL: CHAPTER 3
Supplementary Table 1
Global Diversity and Human-Chimpanzee Divergence Estimates by Gene Region
| Region | Sites^a | S^b | θπ^c | D^d | d^e |
|---|---|---|---|---|---|
| Total | 16,993 | 133 | 0.07 | -1.37 | 0.62 |
| Promoter | 1,223 | 5 | 0.02 | -1.49 | 0.82 |
| UTR^f | 263 | 3 | 0.19 | -0.06 | 0.84 |
| First intron^g | 1,459 | 10 | 0.04 | -1.46 | 0.75 |
| Other introns^g | 9,463 | 97 | 0.12 | -1.02 | 0.69 |
| Synonymous | 1,210 | 12 | 0.02 | -2.16 | 1.49 |
| Nonsynonymous | 3,177 | 6 | 0.01 | -1.87 | 0 |
| Silent^h | 10,673 | 109 | 0.11 | -1.22 | 0.78 |

^a Number of nucleotides
^b Number of SNPs
^c Average number of pairwise differences between sequences (%)
^d Tajima’s D statistic
^e Divergence as number of differences per nucleotide (%)
^f 5’ and 3’ mRNA untranslated regions (UTR)
^g Excludes splice sites
^h Includes synonymous and intron sites, excluding the first intron and splice sites
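For readers unfamiliar with the summary statistics in this table, the sketch below shows how S, θπ, and Tajima's D are conventionally computed from a set of aligned sequences, using Tajima's (1989) standard formula. It is not the software used for the dissertation; the function names are hypothetical, and the input is assumed to be equal-length, gap-free sequences.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one state (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def theta_pi_percent(seqs):
    """Average pairwise differences per site, expressed as a percentage."""
    return 100.0 * mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (S = 0 or fewer than 4 sequences)."""
    n_seqs = len(seqs)
    s = segregating_sites(seqs)
    if n_seqs < 4 or s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n_seqs))
    a2 = sum(1.0 / i**2 for i in range(1, n_seqs))
    b1 = (n_seqs + 1) / (3.0 * (n_seqs - 1))
    b2 = 2.0 * (n_seqs * n_seqs + n_seqs + 3) / (9.0 * n_seqs * (n_seqs - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n_seqs + 2) / (a1 * n_seqs) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (invented for illustration, not data from this study).
toy = ["AC", "AC", "TC", "TG"]
d_stat = tajimas_d(toy)
```

Negative D values, as in most rows of the table, arise when the observed average pairwise difference is smaller than expected from the number of segregating sites, which is what an excess of low-frequency variants produces.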
Supplementary Table 2
COL1a1 Nonsynonymous Polymorphisms
| Exon | cDNA | Protein | Population^a |
|---|---|---|---|
| 8 | C613G | Pro205Ala^b | Italian (1) |
| 17 | G1135C | Ala379Pro | Japanese (1) |
| 24 | C1663T | Pro555Ser | Russian (1) |
| 44 | G3223A | Ala1075Thr^c | Middle Eastern (1), North African (3) |
| 49 | G3979A | Gly1327Ser | Middle Eastern (1) |
| 50 | C4195T | Arg1399Cys | Sub-Saharan African (1) |

Note: In “cDNA” and “Protein” columns, polymorphism listed denotes the change from ancestral to derived allele state (as inferred by comparison with chimpanzee as an outgroup). Numbers reflect position in sequence starting with either the first base of the start codon for “cDNA” or the first amino acid for “Protein.” The first 4 polymorphisms occur in the triple-helix domain and the last 2 in the C-terminal non-collagenous domain.
^a Population sample where SNP was identified with the frequency of SNP in parentheses.
^b Polymorphism previously identified in association with osteopenia (Spotila et al. 1994; Dalgleish 1997).
^c Polymorphism previously identified, but with no known associated phenotype (Dalgleish 1997).
Supplementary Table 3
Observed Sp1-T Allele Frequencies
| Population | Allele Frequency (%) |
|---|---|
| Global | 8.8 |
| Sub-Saharan African | 5.6 |
| North African | 14.3 |
| Middle Eastern | 10.0 |
| Russian | 10.0 |
| Chinese | 0 |
| Japanese | 0 |
| Southeast Asian | 0 |
| Mexican | 20.0 |
| Northern European | 15.0 |
| Italian | 15.0 |
Supplementary Fig. 1. Estimates of COL1a1 population differentiation. Pairwise
FST values are listed below the diagonal and Hudson’s (2000) Snn statistic above
the diagonal. Both measures were calculated using SNPs >5% in frequency. *
denotes statistically significant Snn values after a Bonferroni correction, P<0.001.
See Materials and Methods (Chapter 3) for additional information.
| | Sub-Saharan African | North African | Middle Eastern | Russian | Chinese | Japanese | Southeast Asian | Mexican | Northern European | Italian |
|---|---|---|---|---|---|---|---|---|---|---|
| Sub-Saharan African | | 0.72 | 0.83* | 0.85* | 0.88* | 0.93* | 0.82* | 0.82* | 0.82* | 0.82* |
| North African | 0.11 | | 0.63 | 0.61 | 0.88* | 0.90* | 0.77 | 0.70 | 0.76* | 0.69 |
| Middle Eastern | 0.25 | 0.07 | | 0.52 | 0.75* | 0.73 | 0.57 | 0.50 | 0.35 | 0.38 |
| Russian | 0.23 | 0.07 | 0 | | 0.81* | 0.76* | 0.61 | 0.64 | 0.57 | 0.49 |
| Chinese | 0.35 | 0.20 | 0.07 | 0.08 | | 0.57 | 0.47 | 0.73* | 0.68 | 0.67 |
| Japanese | 0.32 | 0.22 | 0.07 | 0.06 | 0 | | 0.44 | 0.75* | 0.58 | 0.63 |
| Southeast Asian | 0.26 | 0.14 | 0.02 | 0.01 | 0 | 0 | | 0.55 | 0.56 | 0.51 |
| Mexican | 0.30 | 0.06 | 0.06 | 0.09 | 0.10 | 0.18 | 0.10 | | 0.60 | 0.57 |
| Northern European | 0.24 | 0.11 | 0 | 0 | 0.02 | 0 | 0 | 0.11 | | 0.38 |
| Italian | 0.26 | 0.08 | 0 | 0 | 0.02 | 0.02 | 0 | 0.05 | 0 | |
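The Snn values above the diagonal measure how often a sequence's nearest neighbors come from its own population. The sketch below follows the usual definition of Hudson's (2000) statistic with simple site-difference distances; it is an illustration under those assumptions, not the program used to produce the matrix, and the function names are hypothetical. Significance is normally assessed by permuting population labels, which is omitted here.

```python
def site_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs, pops):
    """Nearest-neighbor statistic for sequences with population labels.

    For each sequence, find all other sequences at the minimum distance
    (ties are all counted) and record the fraction that share its
    population label; Snn is the mean of these fractions.
    """
    total = 0.0
    for j, seq in enumerate(seqs):
        dists = [(site_differences(seq, other), k)
                 for k, other in enumerate(seqs) if k != j]
        d_min = min(d for d, _ in dists)
        nearest = [k for d, k in dists if d == d_min]
        same = sum(pops[k] == pops[j] for k in nearest)
        total += same / len(nearest)
    return total / len(seqs)


# Toy example (invented data): two fully differentiated populations.
toy_seqs = ["AAAA", "AAAA", "TTTT", "TTTT"]
toy_pops = ["pop1", "pop1", "pop2", "pop2"]
snn_value = snn(toy_seqs, toy_pops)
```

For two samples of equal size, values near 0.5 indicate little differentiation and values near 1 indicate strong differentiation, which is the pattern seen in comparisons involving the Sub-Saharan African sample.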
Supplementary Fig. 2. Inferred haplotypes for SNPs in the 5’ (on the left) and 3’
(on the right) “haploblocks” for each of the 10 human populations. Note that
haplotypes in the 5’ region have been sorted independently of those in the 3’
region. The derived allele for each site, as inferred from human-chimpanzee
contrasts, is represented with a grey box. Positions (in bp) are numbered starting
with the first nucleotide of the first exon. See Results and fig. 4 (Chapter 3) for
additional information.
[Figure: haplotype diagrams, one panel per population (Sub-Saharan African, North African, Middle Eastern, Russian, Chinese, Japanese, Southeast Asian, Mexican, Northern European, Italian). SNP positions (bp) in the 5’ haploblock: 368, 1126, 1897, 2313, 2616, 2706, 3380, 3419, 3577, 3642, 4245. SNP positions (bp) in the 3’ haploblock: 5714, 6814, 7436, 8771, 8966, 9443, 9567, 9848, 10426, 11284, 11578, 13141, 13327, 13443, 14966, 15848, 16094.]
Supplementary Fig. 3. Estimates of COL1a1 population differentiation for the 5’
region. Pairwise FST values are listed below the diagonal and Hudson’s (2000) Snn
statistic above the diagonal. Both measures were calculated using SNPs >5% in
frequency positioned 5’ to the identified “hotspot” (i.e., SNPs 368-4245 in fig. 4,
Chapter 3). * denotes statistically significant Snn values after a Bonferroni
correction, P<0.001. See Materials and Methods (Chapter 3) for additional
information.
| | Sub-Saharan African | North African | Middle Eastern | Russian | Chinese | Japanese | Southeast Asian | Mexican | Northern European | Italian |
|---|---|---|---|---|---|---|---|---|---|---|
| Sub-Saharan African | | 0.58 | 0.78* | 0.81* | 0.88* | 0.88* | 0.83* | 0.81* | 0.76* | 0.78* |
| North African | 0.13 | | 0.52 | 0.52 | 0.66 | 0.67 | 0.60 | 0.52 | 0.61 | 0.57 |
| Middle Eastern | 0.51 | 0.14 | | 0.43 | 0.50 | 0.53 | 0.48 | 0.48 | 0.47 | 0.39 |
| Russian | 0.48 | 0.12 | 0 | | 0.51 | 0.57 | 0.48 | 0.45 | 0.53 | 0.47 |
| Chinese | 0.62 | 0.27 | 0.02 | 0.01 | | 0.48 | 0.46 | 0.56 | 0.53 | 0.50 |
| Japanese | 0.62 | 0.27 | 0.05 | 0.04 | 0 | | 0.49 | 0.62 | 0.50 | 0.52 |
| Southeast Asian | 0.53 | 0.17 | 0 | 0 | 0 | 0 | | 0.53 | 0.52 | 0.49 |
| Mexican | 0.42 | 0.07 | 0 | 0 | 0.08 | 0.11 | 0.02 | | 0.60 | 0.52 |
| Northern European | 0.51 | 0.15 | 0 | 0 | 0.02 | 0.01 | 0 | 0.01 | | 0.42 |
| Italian | 0.53 | 0.17 | 0 | 0 | 0.02 | 0.04 | 0 | 0 | 0 | |
Supplementary Fig. 4. Estimates of COL1a1 population differentiation for the 3’
region. Pairwise FST values are listed below the diagonal and Hudson’s (2000) Snn
statistic above the diagonal. Both measures were calculated using SNPs >5% in
frequency positioned 3’ to the identified “hotspot” (i.e., SNPs 5714-16094 in fig.
4, Chapter 3). * denotes statistically significant Snn values after a Bonferroni
correction, P<0.001. See Materials and Methods (Chapter 3) for additional
information.
| | Sub-Saharan African | North African | Middle Eastern | Russian | Chinese | Japanese | Southeast Asian | Mexican | Northern European | Italian |
|---|---|---|---|---|---|---|---|---|---|---|
| Sub-Saharan African | | 0.83* | 0.64 | 0.76* | 0.75* | 0.68 | 0.66 | 0.82* | 0.71 | 0.64 |
| North African | 0.10 | | 0.62 | 0.63 | 0.83* | 0.90* | 0.78* | 0.59 | 0.78* | 0.67 |
| Middle Eastern | 0.04 | 0.02 | | 0.51 | 0.58 | 0.69 | 0.51 | 0.46 | 0.48 | 0.38 |
| Russian | 0.01 | 0.05 | 0 | | 0.68 | 0.75* | 0.57 | 0.57 | 0.54 | 0.43 |
| Chinese | 0.12 | 0.14 | 0.08 | 0.05 | | 0.53 | 0.52 | 0.63 | 0.52 | 0.58 |
| Japanese | 0.07 | 0.19 | 0.09 | 0.03 | 0.01 | | 0.43 | 0.72* | 0.60 | 0.64 |
| Southeast Asian | 0.03 | 0.12 | 0.05 | 0 | 0 | 0 | | 0.57 | 0.48 | 0.48 |
| Mexican | 0.21 | 0.06 | 0.11 | 0.15 | 0.09 | 0.21 | 0.15 | | 0.57 | 0.51 |
| Northern European | 0.02 | 0.09 | 0.01 | 0 | 0.02 | 0 | 0 | 0.15 | | 0.44 |
| Italian | 0.01 | 0.02 | 0 | 0 | 0.02 | 0.03 | 0 | 0.08 | 0 | |
Supplementary Fig. 5. Neighbor-joining tree constructed from COL1a1 SNPs
>5% allele frequency in our global population of 192 chromosomes, rooted with
chimpanzee. Sp1-T alleles are boxed in red. Certain clades lacking these alleles
have been collapsed to conserve space. The population sample of origin and the
number of alleles per population are shown for each branch. SS = sub-Saharan
African; NA = North African; ME = Middle Eastern; RU = Russian; CH =
Chinese; JA = Japanese; SA = Southeast Asian; MX = Mexican; NE = Northern
European; and IT = Italian. The scale bar represents 2 substitutions. See Results
(Chapter 3) for additional information.
[Figure: neighbor-joining tree. Labeled chromosomes: NA17041b, NA17329a, NA17090b, NA17327b, NA17060a, NA17381b, NA16654a. Outgroup: Chimpanzee. Scale bar: 2 substitutions.]
APPENDIX C
SUPPLEMENTARY MATERIAL: CHAPTER 4
Supplementary Table 1
COL1a1 Gene Region Polymorphism and Divergence Estimates
| Sites | Polymorphism (number of differences) | Divergence (number of differences) |
|---|---|---|
| Silent | 76 | 74 |
| Synonymous | 12 | 15 |
| Nonsynonymous | 0 | 0 |
| Promoter | 4 | 8 |
| UTR | 1 | 1 |
| First intron | 11 | 9 |

Note: “Silent” includes synonymous and intron sites (excluding intron splice sites and the first intron). “Divergence” estimates are between human and chimpanzee.
Supplementary Table 2
Chimpanzee SNP Diversity Estimates for PCR Fragments 5’ and 3’ of COL1a1
| PCR | Start Position^a | Stop Position^a | Region^b | n^c | S^d | θπ^e | D^f | C-H Diver.^g | C-B Diver.^h |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -89718 | -87795 | Silent | 1748 | 4 | 0.037 | -0.73 | 1.030 | 0.400 |
| 2 | -81315 | -80010 | Silent | 1273 | 0 | 0 | n/a | 1.021 | 0.236 |
| 3 | -69735 | -68058 | Silent | 594 | 3 | 0.181 | 1.14 | 1.178 | 0.337 |
| | | | Promoter | 722 | 5 | 0.290 | 1.99 | 0.277 | 1.939 |
| 4 | -66607 | -65281 | Silent | 1260 | 3 | 0.047 | -0.33 | 0.873 | 0 |
| 5 | -35260 | -33999 | Silent | 1030 | 2 | 0.054 | 0.35 | 1.845 | 0.194 |
| 6 | -27926 | -26320 | Silent | 1337 | 8 | 0.199 | 1.18 | 1.047 | 0.150 |
| 7 | -21648 | -17817 | Silent | 3430 | 14 | 0.130 | 1.13 | 0.583 | 0.379 |
| 8 | -15499 | -13839 | Silent | 1271 | 4 | 0.134 | 1.95 | 0.708 | 0.157 |
| 9 | -12214 | -10925 | Silent | 1171 | 7 | 0.197 | 1.11 | 1.025 | 0.171 |
| 10 | 25163 | 26632 | Silent | 918 | 3 | 0.100 | 0.65 | 1.198 | 0.109 |
| | | | Nonsyn | 134 | 2 | 0.490 | 0.74 | 0.749 | 0 |
| | | | UTR | 235 | 0 | 0 | n/a | 2.979 | 0 |
| 11 | 27205 | 28521 | Silent | 1161 | 2 | 0.043 | 0.12 | 0.775 | 0.172 |
| 12 | 28764 | 30323 | Silent | 1479 | 10 | 0.176 | 0.32 | 1.082 | 0.203 |
| 13 | 30239 | 34092 | Silent | 2790 | 12 | 0.115 | 0.44 | 0.932 | 0.215 |
| | | | Nonsyn | 605 | 1 | 0.031 | -0.31 | 1.323 | 0.331 |
| 14 | 34938 | 37156 | Silent | 654 | 3 | 0.066 | -0.86 | 0.306 | 0.459 |
| | | | Nonsyn | 28 | 0 | 0 | n/a | 0 | 0 |
| | | | UTR | 36 | 0 | 0 | n/a | 2.778 | 0 |
| | | | Promoter | 1000 | 5 | 0.146 | 0.62 | 0.900 | 0.100 |
| | | | 1st intron | 487 | 3 | 0.162 | 0.26 | 0.821 | 0.205 |
| 15 | 42564 | 44028 | Silent | 1432 | 4 | 0.029 | -1.34 | 0.559 | 0.070 |
| 16 | 52684 | 53980 | 1st intron | 1198 | 1 | 0.004 | -1.12 | 0.584 | 0 |
| 17 | 67857 | 69935 | Silent | 1941 | 3 | 0.028 | -0.48 | 0.567 | 0.309 |
| 18 | 81653 | 82973 | 1st intron | 1301 | 1 | 0.014 | -0.31 | 0.922 | 0 |
| 19 | 86876 | 88525 | Silent | 1298 | 4 | 0.019 | -1.76 | 0.539 | 0.385 |
| | | | Nonsyn | 315 | 1 | 0.130 | 1.06 | 0.318 | 0 |
| 20 | 88741 | 90616 | Silent | 1612 | 9 | 0.141 | 0.22 | 0.931 | 0.062 |
| 5’^i | - | - | Silent | 13114 | 45 | 0.107 | 1.15 | 0.938 | 0.252 |
| 3’^i | - | - | Silent | 13286 | 50 | 0.084 | -0.18 | 0.790 | 0.211 |

^a Start and stop positions refer to the first and last bp of the PCR primers according to the first base of the first exon of COL1a1 as position 1 as determined from the human genome reference sequence (NCBI build 36.1). PCR fragments with overlapping DNA sequences have been combined into a single “fragment.” See fig. 7 (Chapter 4) for a visual representation.
^b “Silent” includes intergenic, synonymous, and intron sites (excluding first introns and splice sites; see Materials and Methods, Chapter 4). Potentially functional regions are listed separately as nonsynonymous, promoter, and UTR depending on the contents of each PCR fragment (e.g., some fragments only spanned introns).
^c Number of nucleotides
^d Number of SNPs
^e Average number of pairwise differences between sequences (%)
^f Tajima’s D statistic
^g Chimpanzee-Human divergence as number of differences per nucleotide (%)
^h Chimpanzee-Bonobo divergence as number of differences per nucleotide (%)
^i Estimates derived from concatenating sequences from all PCR fragments 5’ or 3’ of COL1a1
Supplementary Table 3
Bonobo SNP Diversity Estimates for PCR Fragments 5’ and 3’ of COL1a1
| PCR | Start Position^a | Stop Position^a | Region^b | n^c | S^d | θπ^e | D^f | B-H Diver.^g |
|---|---|---|---|---|---|---|---|---|
| 1 | -89718 | -87795 | Silent | 1748 | 2 | 0.017 | -0.96 | 1.030 |
| 2 | -81315 | -80010 | Silent | 1273 | 3 | 0 | -0.09 | 1.021 |
| 3 | -69735 | -68058 | Silent | 594 | 3 | 0.038 | -1.74 | 1.178 |
| | | | Promoter | 722 | 3 | 0.061 | -1.07 | 0.277 |
| 4 | -66607 | -65281 | Silent | 1260 | 3 | 0.035 | -1.09 | 0.873 |
| 5 | -35260 | -33999 | Silent | 1030 | 4 | 0.071 | -0.80 | 1.845 |
| 6 | -27926 | -26320 | Silent | 1337 | 9 | 0.168 | -0.16 | 1.047 |
| 7 | -21648 | -17817 | Silent | 3430 | 11 | 0.064 | -0.78 | 0.583 |
| 8 | -15499 | -13839 | Silent | 1271 | 4 | 0.066 | -0.54 | 0.708 |
| 9 | -12214 | -10925 | Silent | 1171 | 2 | 0.047 | 0.10 | 1.025 |
| 10 | 25163 | 26632 | Silent | 918 | 0 | 0 | n/a | 1.198 |
| | | | Nonsyn | 134 | 0 | 0 | n/a | 0.749 |
| | | | UTR | 235 | 0 | 0 | n/a | 2.979 |
| 11 | 27205 | 28521 | Silent | 1161 | 3 | 0.026 | -1.51 | 0.775 |
| 12 | 28764 | 30323 | Silent | 1479 | 5 | 0.044 | -1.42 | 1.082 |
| 13 | 30239 | 34092 | Silent | 2790 | 7 | 0.045 | -0.96 | 0.932 |
| | | | Nonsyn | 605 | 2 | 0.048 | -0.96 | 1.323 |
| 14 | 34938 | 37156 | Silent | 654 | 0 | 0 | n/a | 0.306 |
| | | | Nonsyn | 28 | 0 | 0 | n/a | 0 |
| | | | UTR | 36 | 0 | 0 | n/a | 2.778 |
| | | | Promoter | 1000 | 0 | 0 | n/a | 0.900 |
| | | | 1st intron | 487 | 1 | 0.016 | -1.16 | 0.821 |
| 15 | 42564 | 44028 | Silent | 1432 | 2 | 0.011 | -1.51 | 0.559 |
| 16 | 52684 | 53980 | 1st intron | 1198 | 3 | 0.067 | 0.03 | 0.584 |
| 17 | 67857 | 69935 | Silent | 1941 | 7 | 0.031 | -2.04 | 0.567 |
| 18 | 81653 | 82973 | 1st intron | 1301 | 2 | 0.045 | 0.25 | 0.922 |
| 19 | 86876 | 88525 | Silent | 1298 | 4 | 0.058 | -0.77 | 0.539 |
| | | | Nonsyn | 315 | 1 | 0.024 | -1.16 | 0.318 |
| 20 | 88741 | 90616 | Silent | 1612 | 8 | 0.059 | -1.71 | 0.931 |
| 5’^h | - | - | Silent | 13114 | 41 | 0.063 | -0.87 | 1.022 |
| 3’^h | - | - | Silent | 13286 | 36 | 0.035 | -1.89 | 0.858 |

^a Start and stop positions refer to the first and last bp of the PCR primers according to the first base of the first exon of COL1a1 as position 1 as determined from the human genome reference sequence (NCBI build 36.1). PCR fragments with overlapping DNA sequences have been combined into a single “fragment.” See fig. 7 (Chapter 4) for a visual representation.
^b “Silent” includes intergenic, synonymous, and intron sites (excluding first introns and splice sites; see Materials and Methods, Chapter 4). Potentially functional regions are listed separately as nonsynonymous, promoter, and UTR depending on the contents of each PCR fragment (e.g., some fragments only spanned introns).
^c Number of nucleotides
^d Number of SNPs
^e Average number of pairwise differences between sequences (%)
^f Tajima’s D statistic
^g Bonobo-Human divergence as number of differences per nucleotide (%)
^h Estimates derived from concatenating sequences from all PCR fragments 5’ or 3’ of COL1a1
Supplementary Fig. 1: Inferred haplotypes for polymorphisms >5% frequency in
chimpanzees. Chromosomes with identical haplotypes have been combined into
one row with the number per haplotype listed on the side. Positions (bp) of each
site are numbered according to the first base of the first exon of COL1a1 as
position 1 as determined from the human genome reference sequence (NCBI
build 36.1). “5’ region” and “3’ region” refer to polymorphisms found within our
additional sequenced fragments 5’ or 3’ of COL1a1, respectively. Outside of
COL1a1, polymorphisms at nonsynonymous, promoter, UTR, and first intron
sites are labeled as such. Gene regions of all sites within the COL1a1 locus are
labeled and the separation of haplotypes into haplogroups A and B is indicated on
the side. The “ancestral” state for each site was inferred from human-chimpanzee-
macaque-orangutan contrasts. See Materials and Methods and Results (Chapter 4)
for additional information.
![Page 186: Molecular Evolution of Type I Collagen (COL1a1) and Its … · 2011. 8. 12. · Skeletal diseases related to reduced bone strength, like osteoporosis, vary in frequency and severity](https://reader035.fdocuments.in/reader035/viewer/2022071218/605108cbd47a875cbc33bf5c/html5/thumbnails/186.jpg)
175
[Supplementary Fig. 1, haplotype table: 5' region and COL1a1. Rows: the ancestral sequence and the inferred haplotypes of haplogroups A and B, each with its number of chromosomes (# chromo.). Columns: polymorphic sites at positions -89270 to 2134 bp; site labels: TMEM92 Promoter, Promoter, Intron 1, Intron 2, Intron 4.]
176
[Supplementary Fig. 1, haplotype table: COL1a1 continued. Rows: the ancestral sequence and the inferred haplotypes of haplogroups A and B, each with its number of chromosomes (# chromo.). Columns: polymorphic sites at positions 2163 to 9569 bp; site labels run from Intron 4 through Intron 30 and include Exons 5, 8, 13, 17, 21, 23, and 28.]
177
[Supplementary Fig. 1, haplotype table: COL1a1 continued and 3' region. Rows: the ancestral sequence and the inferred haplotypes of haplogroups A and B, each with its number of chromosomes (# chromo.). Columns: the 124-bp duplication and polymorphic sites at positions 11171 to 90523 bp; site labels: Intron 34/Exon 35, Intron 40, Exon 48, Exon 49, Intron 49, Intron 50, 3'UTR, SGCA (Arg/Leu), SGCA (Asn/Ser), SGCA (Arg/Cys), SGCA Intron 1, SGCA Promoter, SAMD14 Intron 1, SAMD14 (Ser/Thr), PDK2 3'UTR.]
178
Supplementary Fig. 2. LD comparisons among informative polymorphisms (>5%
frequency) in 40 chimpanzee chromosomes for sequenced chromosome 17
regions in and around COL1a1. See Supplementary Fig. 1 for polymorphism
positions. Significant pairwise associations according to r² are represented by the
blue filled boxes above the diagonal. Filled boxes below the diagonal represent
pairwise comparisons in significantly more (blue boxes) or less (red boxes) LD
than expected given a locus-specific evolutionary model of recombination. See
Materials and Methods and Results (Chapter 4) for additional information.
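As an illustration of the r² statistic named in the caption, the following minimal sketch computes it for two biallelic sites from phased chromosomes, using the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. The function name and the toy input are hypothetical, and the significance test applied in the figure is not reproduced here.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites.

    site_a, site_b: sequences of alleles, one entry per phased chromosome,
    in the same chromosome order (hypothetical input format).
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same chromosomes")
    n = len(site_a)
    # Use the allele carried by the first chromosome as the reference allele.
    ref_a, ref_b = site_a[0], site_b[0]
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Two sites whose alleles always co-occur are in complete LD.
complete = r_squared("AAAATTTT", "CCCCGGGG")
# Two sites whose alleles are independent show no LD.
independent = r_squared("AATT", "CGCG")
```

Because r² is symmetric in the two alleles at each site, the choice of reference allele does not affect the result.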
179
Supplementary Fig. 3. REHH vs. COL1a1 core haplotype frequency. REHH is
measured ~85 kb away from each core haplotype. REHH values are grouped
by haplotype frequency in overlapping bins of 5% (e.g., 10-15%, 15-20%, etc.),
with horizontal lines indicating the mean REHH for each bin.
Circled data points are REHH (and EHH in parentheses) for the haplotype 5’
(3.414) and 3’ (3.200) of the COL1a1 exon duplication, which are not
significantly different from the mean REHH observed in the 15-20% frequency
bin (P>0.09). See Results (Chapter 4) for additional information.
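The binning described in the caption can be sketched as follows. This is a minimal illustration, not the procedure used for the figure: the function name and input format are hypothetical, and "overlapping" is interpreted here as adjacent 5% bins that share their edges, so a value at exactly 15% counts in both the 10-15% and the 15-20% bin.

```python
def binned_means(points, width=5, max_pct=100):
    """Mean value per frequency bin, with bins sharing their edges.

    points: iterable of (frequency_percent, value) pairs (hypothetical format).
    Returns a dict mapping (low, high) bin edges to the mean value in that bin;
    empty bins are omitted.
    """
    points = list(points)
    means = {}
    for low in range(0, max_pct, width):
        high = low + width
        # Both edges are inclusive: this is the assumed meaning of "overlapping".
        values = [value for freq, value in points if low <= freq <= high]
        if values:
            means[(low, high)] = sum(values) / len(values)
    return means


# Toy values only; these are not data from the figure.
example = binned_means([(12.0, 2.0), (15.0, 4.0), (18.0, 6.0)])
```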
180
Supplementary Fig. 4. Inferred haplotypes for polymorphisms >5% frequency in
bonobos. Chromosomes with identical haplotypes have been combined into one
row with the number per haplotype listed on the left. Positions (bp) of each site
are numbered relative to the first base of the first exon of COL1a1 (position 1),
as determined from the human genome reference sequence (NCBI build 36.1). “5’
region” and “3’ region” refer to polymorphisms found within our additional
sequenced fragments 5’ or 3’ of COL1a1, respectively. Polymorphisms in
nonsynonymous, promoter, UTR, and first intron sites are labeled as such. The
“ancestral” state for each site was inferred from human-chimpanzee-macaque-
orangutan contrasts. See Results (Chapter 4) for additional information.
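The step of combining identical haplotypes into one row with a count can be sketched as below. The function name and toy sequences are hypothetical. Writing a dot for sites that match the ancestral state is an assumption about the table's notation rather than something the caption states.

```python
from collections import Counter


def collapse_haplotypes(haplotypes, ancestral):
    """Combine identical haplotypes into (count, row) pairs.

    haplotypes: list of equal-length strings, one per chromosome.
    ancestral: string of the inferred ancestral state at each site.
    Sites matching the ancestral state are written as "." (assumed notation).
    """
    rows = []
    # most_common() orders by count, keeping first-seen order among ties.
    for hap, count in Counter(haplotypes).most_common():
        if len(hap) != len(ancestral):
            raise ValueError("haplotype and ancestral lengths differ")
        dotted = "".join("." if h == a else h for h, a in zip(hap, ancestral))
        rows.append((count, dotted))
    return rows


# Toy input: four chromosomes, four sites.
table = collapse_haplotypes(["CGAG", "TGAG", "TGAG", "CGCA"], ancestral="CGAG")
```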
[Supplementary Fig. 4, haplotype table: 5' region, COL1a1, and 3' region. Rows: the ancestral sequence and the inferred bonobo haplotypes, each with its number of chromosomes (# chromo.). Columns: polymorphic sites at positions -89390 to 90587 bp; site labels: TMEM92 Promoter, Intron 1, SGCA (Arg/His), SGCA Promoter, PPP1R9B Intron 1, SAMD14 Intron 1, PDK2 3'UTR.]
181
182
Supplementary Fig. 5. LD comparisons among informative polymorphisms (>5%
frequency) in 20 bonobo chromosomes for sequenced chromosome 17 regions.
Positions (in bp) of each site are numbered relative to the first base of the first
exon of COL1a1 (position 1), as determined from the human genome reference
sequence (NCBI build 36.1). Significant pairwise associations according to r² are
represented by the blue filled boxes above the diagonal. Filled boxes below the
diagonal represent pairwise comparisons in significantly more (blue boxes) or less
(red boxes) LD than expected given a locus-specific evolutionary model of
recombination. See Materials and Methods and Results (Chapter 4) for additional
information.
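The ">5% frequency" filter can be illustrated with a short sketch. It assumes that the threshold is strict and applies to the second most common allele at a site; both points are assumptions, as are the function name and input format. Under that reading, a variant seen on a single chromosome out of 20 has a frequency of exactly 5% and is excluded, so at least two copies are needed.

```python
from collections import Counter
from fractions import Fraction


def informative_sites(columns, min_freq=Fraction(1, 20)):
    """Indices of sites whose minor allele frequency exceeds min_freq.

    columns: list of strings, one per site, each holding the allele carried
    by every sampled chromosome (hypothetical input format).
    """
    keep = []
    for index, column in enumerate(columns):
        counts = sorted(Counter(column).values())
        if len(counts) < 2:
            continue  # monomorphic site
        minor = counts[-2]  # count of the second most common allele
        # Exact rational comparison avoids floating-point edge cases at 5%.
        if Fraction(minor, len(column)) > min_freq:
            keep.append(index)
    return keep


# Toy sample of 20 chromosomes: a singleton, a doubleton, a monomorphic site.
sites = ["A" * 19 + "G", "A" * 18 + "GG", "A" * 20]
kept = informative_sites(sites)
```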
[LD matrix axes, Position (bp): -89390, -80761, -80509, -69267, -69195, -68977, -65607, -65359, -34900, -34780, -27633, -27499, -27392, -27115, -26561, -26505, -21142, -20923, -20379, -20192, -19749, -19746, -19553, -15229, -14515, -11285, 132, 11253, 27465, 29882, 29972, 30735, 31268, 31536, 32173, 35518, 52871, 53484, 68441, 81946, 88063, 89074, 89324, 90232, 90587]
183
Supplementary Fig. 6. Neighbor-joining tree based on the number of substitutions
(excluding nonsynonymous sites) among 40 chimpanzee chromosome sequences
at the COL1a1 locus. The two high-frequency COL1a1 haplogroups are indicated
with “A” and “B.” Sequences bearing the exon duplication are boxed in red. The
tree is rooted using the “Ancestral” sequence inferred from human-chimpanzee-
orangutan-macaque contrasts. The scale bar represents 5 substitutions. See
Results (Chapter 4) for more information.
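To make the distance measure concrete, the sketch below counts pairwise substitutions between aligned sequences while skipping a set of excluded sites, then applies the standard neighbor-joining criterion Q(i, j) = (n - 2) d(i, j) - Σk d(i, k) - Σk d(j, k) to pick the first pair to join. It is a minimal illustration with hypothetical names and toy sequences, not the software or settings used to build the tree, and it shows only the first joining step.

```python
from itertools import combinations


def pairwise_differences(seqs, excluded=()):
    """Number of differing sites for every pair of aligned sequences.

    seqs: dict mapping a sequence name to an aligned string.
    excluded: 0-based site indices to skip (e.g., nonsynonymous sites).
    """
    skip = set(excluded)
    names = list(seqs)
    dist = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = sum(
            1
            for i, (x, y) in enumerate(zip(seqs[a], seqs[b]))
            if i not in skip and x != y
        )
        dist[a][b] = dist[b][a] = d
    return dist


def first_nj_pair(dist):
    """Pair minimizing the neighbor-joining Q criterion (first join only)."""
    names = list(dist)
    n = len(names)
    totals = {a: sum(dist[a].values()) for a in names}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    # Ties are resolved in favor of the first pair in input order.
    return min(combinations(names, 2), key=q)
```

A full tree would repeat this step, replacing each joined pair with a new node and updating the distances until all sequences are connected.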
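[Neighbor-joining tree. Scale bar: 5 substitutions. Root: Ancestral. Haplogroups A and B are marked. Tip labels (chromosomes), by group: 1b 6b 9a 9b 14b 18b 19b 20b; 2b 3a 19a; 3b 17b; 10b 12b 13b 16b; 7a 8b 10a 11a; 12a; 1a 5a 8a 15a 15b 16a 17a; 5b 20a; 11b 13a 14a; 2a 4a 4b 6a 18a.]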
184
185
Supplementary Fig. 7. Neighbor-joining tree based on the number of substitutions
(excluding nonsynonymous sites) among 40 chimpanzee chromosome sequences
for the COL1a1 locus and flanking regions spanning ~180 kb (see fig. 7, Chapter
4). The two high-frequency COL1a1 haplogroups are indicated with “A” and “B.”
Sequences bearing the COL1a1 exon duplication are boxed in red. The tree is
rooted using the “Ancestral” sequence inferred from human-chimpanzee-
orangutan-macaque contrasts. The scale bar represents 10 substitutions. See
Results (Chapter 4) for more information.
186
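[Neighbor-joining tree. Scale bar: 10 substitutions. Root: Ancestral. Haplogroups A and B are marked. Tip labels (chromosomes), by group: 6b 7b 18b 19b 20b; 1b 9a 14b; 3b 17b; 10b 12b 16b; 9b; 19a 2b; 13b; 3a; 2a 4b 18a; 6a; 8a 15a 15b 16a 17a; 5b; 10a 7a 8b 11a; 4a 12a; 13a 11b; 20a; 1a 5a; 14a.]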