Population Genetics and Genome Organization of Norway...

46
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2012 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 969 Population Genetics and Genome Organization of Norway Spruce HANNA LARSSON ISSN 1651-6214 ISBN 978-91-554-8465-1 urn:nbn:se:uu:diva-180370

Transcript of Population Genetics and Genome Organization of Norway...

Page 1: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2012

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 969

Population Genetics andGenome Organization ofNorway Spruce

HANNA LARSSON

ISSN 1651-6214ISBN 978-91-554-8465-1urn:nbn:se:uu:diva-180370

Page 2: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

Dissertation presented at Uppsala University to be publicly examined in Zootissalen,Evolutionsbiologiskt Centrum (EBC), Villavägen 9, Uppsala, Friday, October 19, 2012 at10:00 for the degree of Doctor of Philosophy. The examination will be conducted in English.

AbstractLarsson, H. 2012. Population Genetics and Genome Organization of Norway Spruce. ActaUniversitatis Upsaliensis. Digital Comprehensive Summaries of Uppsala Dissertations fromthe Faculty of Science and Technology 969. 44 pp. Uppsala. ISBN 978-91-554-8465-1.

Understanding the underlying genetic causes of adaptation to local conditions is one of themain goals of population genetics. A strong latitudinal cline in the phenotypic trait of bud set isobserved in present day populations of Norway spruce (Picea abies (L.) Karst). The first steptowards determining how this strong selection on adaptive traits translates at the loci underlyingthe trait was to use multilocus sequence data to gain information on the fundamental populationgenetic properties of Norway spruce. We determined that the level of LD was low and geneticdiversity was in the low range. Coalescent simulations revealed a demographic scenario of afairly old and severe bottleneck as consistent with the observed data. To examine the role ofselection at genes putatively involved in the control of bud set we, again, used a multilocus dataset to test for deviations from neutrality and demographic scenarios inferred from backgroundloci. Different candidate genes were identified by using different approaches, highlighting thedifficulty in predicting how local adaptation will manifest itself on different time scales and inrangewide samples. When examining properties important in the design of association studies,the inevitable next step in identifying genes involved in local adaptation, we found that previousestimates of a low level of LD were highly influenced by the joint analysis of several lociover a large distribution range and that estimates of LD was in fact heterogeneous across lociand increased within populations. In addition, we found that within species tests for deviationsfrom neutral expectations were highly sensitive to sample size. Additional genomic sequencecharacterization in Norway spruce is necessary to provide more comprehensive sets of markersfor association studies, also including gene promoters and non-genic regions of the genome. Inthe final paper we show that the HMPR method is effective in constructing libraries enrichedfor the single copy fraction of the genome when applied to the large and dominantly repetitivegenome of Norway spruce.

In summary, the studies presented in this thesis together constitute a foundation for futurestudies on adaptive evolution in Norway spruce.

Keywords: Picea abies, demographic history, linkage disequilibrium, HMPR

Hanna Larsson, Uppsala University, Department of Ecology and Genetics, Plant Ecology andEvolution, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden.

© Hanna Larsson 2012

ISSN 1651-6214ISBN 978-91-554-8465-1urn:nbn:se:uu:diva-180370 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-180370)

Page 3: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

For my boys Stefan, Holger and Folke

Page 4: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization
Page 5: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Heuertz, M., De Paoli, E., Källman, T., Larsson, H., Jurman, I., Morgante, M., Lascoux, M. and Gyllenstrand, N. (2006) Multilocus Patterns of Nucleotide Diversity, Linkage Disequilibrium and Demographic History of Norway Spruce [Picea abies (L.) Karst]. Genetics, 174:2095-2105

II Källman, T., De Mita, S., Larsson, H., Gyllenstrand N., Heuertz, M., Parducci, L., Suyama, Y., Lagercrantz, U., and Lascoux M. (2012). Evolution of photoperiod related genes in the conifer Norway spruce [Picea abies L. (Karst)]. Manuscript.

III Larsson, H., Källman, T., Gyllenstrand, N. and Lascoux M. (2012) Distribution of long-range Linkage Disequilibrium and Tajima’s D in Scandinavian populations of Norway spruce (Picea abies). Manu-script.

IV Larsson, H., De Paoli, E., Morgante, M., Lascoux, M. and Gyl-lenstrand, N. (2012) The HypoMethylated Partial Restriction (HMPR) method reduces the repetitive content of genomic libraries in Norway spruce (Picea abies). Accepted for publication in Tree Genetics & Genomes.

Paper I is reprinted with kind permission from The Genetics Society of America.

Page 6: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization
Page 7: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

Contents

Introduction ..................................................................................................... 9  Evolutionary processes ............................................................................. 10  Detecting evolutionary processes ............................................................. 11  

Selection ............................................................................................... 11  Recombination ..................................................................................... 12  Population structure ............................................................................. 13  

Disentangling evolutionary processes ...................................................... 14  Identifying genes involved in adaptation .................................................. 15  The plant species: Norway spruce (Picea abies) ...................................... 16  

Phylogeny of P. abies .......................................................................... 17  Phylogeography of P. abies ................................................................. 17  Genome of P. abies .............................................................................. 18  

Adaptive trait: Bud set and bud burst ....................................................... 20  The search for adaptive variation in Norway spruce ................................ 22  Aims of the study ...................................................................................... 23  

Results and discussion ................................................................................... 24  Paper I ....................................................................................................... 24  Paper II ..................................................................................................... 26  Paper III .................................................................................................... 27  Paper IV .................................................................................................... 30  

Conclusion ..................................................................................................... 32  

Svensk sammanfattning ................................................................................. 34  

Acknowledgements ....................................................................................... 36  

References ..................................................................................................... 38  

Page 8: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization
Page 9: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

9

Introduction

“Population geneticists spend most of their time doing one of two things: de-scribing the genetic structure of populations or theorizing on the evolutionary forces acting on populations. On a good day, these two activities mesh and true insights emerge.”

-John H Gillespie

The bewildering phenotypic variation that exists both within and among species is the result of populations constantly adapting to new or changing environments. Understanding the underlying genetic causes of adaptation to local conditions is one of the main goals of population genetics. Presuming that phenotypic differences are at least partly due to genetic differences among individuals, the genetic makeup of a population of individuals pro-vides valuable information on the underlying causes of the observed pheno-typic variation. Population genetics attempts to explain the phenomena of adaptation (and speciation) by studying the frequency and interaction of alleles and genes in populations. The distribution of allele frequencies within a population changes over time under the influence of four main evolution-ary processes; mutation, natural selection, genetic drift and gene flow. Other factors that need to be taken into account are recombination and population structure. The ability to link phenotypic variation to genotypic variation relies on the possibility to correctly identify the relative contributions of all factors affecting the observed distribution of allele frequencies in a popula-tion as the signatures left in the genome by neutral processes during the his-tory of a species need to be separated from the signatures created by selec-tion. While genome wide estimates of fundamental population genetics pa-rameters represent the best way of determining the foundation on which to search for adaptive variation, the gene space is routinely used to provide estimates in the absence of genome information.

Page 10: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

10

Evolutionary processes Mutation, a permanent change in the DNA sequence, is the ultimate and underlying cause for genetic variation upon which all other processes can act to maintain, remove or enhance the variation. Mutations range in size from a large segment of a chromosome down to a single DNA base and a mutation event may constitute a deletion or insertion of DNA base(s) or a substitution of one base for another. The phenotypic effect of a mutation depends on the nature and location of the mutation. Roughly, most mutations occurring outside of the gene space have little or no effect whereas mutations altering the coding sequence of a gene can have a large effect on the gene product, and hence the phenotype. If the effect of a mutation is harmful, rendering the gene product useless, the mutation is quickly removed from the population through the process of natural selection. Alternatively, a beneficial muta-tion once arisen may increase in frequency in a population because of natu-ral selection favoring the phenotype. Most mutations entering a population are, however, neutral (Wright 1931, Fisher 1930) and frequencies are decid-ed on a random basis from one generation to the next through the process that is known as genetic drift. Drift is a stochastic event, where, just by chance, one individual produces more offspring than another, altering the allele frequency in the population over generations. The effect of stochastic events is stronger in small populations, even outcompeting natural selection as the most important decider of allele frequency distributions over time. Gene flow, the transfer of alleles from one population to another, may create marked changes in allele frequencies with new alleles entering a population through the migration of individuals from one population to another. In gen-eral, maintained gene flow among populations tends to lessen the genetic difference between populations, making them more similar.

The combination of different alleles within a chromosome is called a haplo-type. A mutation event creates a unique allele in the genome where it occurs and, accordingly, a new haplotype in the population. This new allele is linked to all other alleles within the same chromosome. The association be-tween alleles is broken through recombination, an inherent property of the genome where a segment of DNA is physically broken and joined to another DNA strand during cell division. Recombination thereby acts to create new allele combinations and increase the haplotype diversity in a population. In a new combination, the frequency of the allele is again under the influence of drift and gene flow and, if the new combination is more or less favorable than other haplotypes, natural selection.

Page 11: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

11

The study of populations is not, as one might believe, straightforward. A geographic population occupying a defined area may not be one, genetically connected population. Barriers to gene flow may exist that are not easily detected by simple observations and individuals may have preferences in their choice of reproduction, resulting in small, genetically divergent popula-tions of the same species living in close proximity to each other. Conversely, individuals may be exchanging genetic material over large distances and populations at separate geographic locations may actually be genetically indistinguishable from each other. Studying the population structure of a species provides valuable insights into the way diversification and speciation occur in nature.

Detecting evolutionary processes In the analysis of DNA sequence data, population genetics is mostly con-cerned with single base mutations known as single nucleotide polymor-phisms (SNPs). Each polymorphism is regarded as one allele and the fre-quency of the allele is simply the proportion of individuals in a sample car-rying the allele. A sample of individuals is assumed to be a random repre-sentation of the allele frequencies in the population and the frequency obtained from the sample is consequently an estimate of the population fre-quency. The diversity of alleles, or nucleotide diversity, among individuals represents the genetic variation within a population.

Selection In the search for the signature that selection leaves on allele frequencies in a population the observed genetic variation is compared to a null hypothesis of neutral evolution formulated by Kimura in 1968 and formalized mathemati-cally by Kingman in 1982. This Standard Neutral Model (SNM) assumes a random-mating population of constant size where the observed variation mainly reflects neutral mutations under the influence of random genetic drift. The mathematics traces the origin of alleles backwards in time and describes the process whereby alleles meet, or coalesce, in previous ances-tors of the present population. The rate of coalescence is inversely propor-tional to the effective population size and the most recent common ancestor to an allele will be found after only a short evolutionary time in a small pop-ulation.

Under the SNM, allele frequencies in the population are at equilibrium be-tween mutation rate and genetic drift and both the number of polymorphic, or segregating, sites (S), standardized by a function of the sample size that reflects the structure of the gene genealogy (Watterson 1975), and the aver-

Page 12: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

12

age number of pairwise nucleotide differences between samples (π) will be unbiased estimators of the population mutation rate (θ), which is a product of the underlying mutation rate (µ) and the effective population size (Ne), θ=4Neµ. These properties were used by Fumio Tajima to construct a test of neutrality through a comparison of S and π where values of Tajima’s D sig-nificantly different from zero leads to a rejection of the assumption of neu-trality and the value of the test statistic gives important information on how the genetic diversity is shaped in the sample (Tajima 1989). Several other tests using the allele frequency spectrum in various ways to test for devia-tions from the SNM have been conceived since Tajima’s D (e.g. Fu and Li 1993), some of which utilize information from closely related species to infer the ancestral state of the allele (e.g. Fay and Wu 2000), and the results from several tests are often used to determine the most likely scenario caus-ing the observed frequencies. Other neutrality tests are based on comparison between polymorphisms and divergence such as Hudson et al. (1987) or McDonald and Kreitman (1991).

Recombination Several methods exist aimed at detecting the presence or absence of recom-bination. A widely used test is based on observing the existing combination of alleles at pairs of variable sites and deducing whether recombination has occurred between the sites or not (Weir 1979, Hudson and Kaplan 1985). By comparing all pairs of sites within a sample a minimum number of recombi-nation events (Rm) in the history of the sample can be determined. Another way of determining this number is by comparing the number of haplotypes to the number of polymorphic sites (Myers and Griffiths 2003). Both these methods are useful for determining the presence of recombination but esti-mates are conservative and generally lower than the actual number of re-combination events as simulations of the coalescent has shown that certain conditions need to be met in order for a recombination event to be detected through these methods (e.g. Wiuf et al. 2001).

With the absence of recombination breaking up combination of alleles, al-leles at different sites may be in linkage disequilibrium (LD). The term LD refers to the observed non-random association between alleles segregating in a population when compared to the expected random association of alleles that are neutrally inherited from one generation to the next. There are several measures of LD based on observed allele frequencies. One of them is the D statistic, which is calculated as the difference between observed and ex-pected frequencies of the haplotype formed by pairs of alleles (Lewontin and Kojima 1960). The standardized value of D (D´) is commonly used, where a value of 1 indicates a complete absence of recombination (Lewontin 1964). A drawback with estimates of recombination using D´ is that a failure to

Page 13: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

13

capture all four possible allele combinations in a sample automatically re-sults in a value of 1, resulting in inflated values of LD in small sample sizes. Another commonly used measure is the correlation of the squared allele frequencies (r2) between pairs of segregating sites, which is less sensitive to sample size (Hill and Robertson 1968).

LD tests can be used to measure the signal of recent positive selection, or selective sweeps, where a newly arisen allele is favored by natural selection and rises quickly in frequency in the population. With this increase in fre-quency follows the alleles associated with the beneficial allele since recom-bination does not have sufficient time to break the association between the new mutation and its surrounding alleles. Examples of LD based tests for selection are the K-test (Depaulis and Veuille 1998), Kelly’s ZnS (Kelly JK 1997) and the EHH-test (Sabeti et al. 2002).

The amount of LD between two alleles depends on the recombination rate between them and it follows that the underlying recombination rate can be inferred by looking at patterns of LD in populations. Under the SNM, an equilibrium will be reached regarding the diversity generated by recombina-tion and the diversity removed through genetic drift. The amount of recom-bination in a population is determined by the population recombination pa-rameter ρ=4Ner, where Ne is effective population size and r is the recombi-nation rate per generation. The underlying process of the coalescent can be modeled to incorporate recombination (Griffiths and Marjoram 1996, Hud-son and Kaplan 1988) and several likelihood-based methods have been de-veloped to estimate recombination rates from population data (see Table 1 in Stumpf and McVean (2003)). The full likelihood approaches are computa-tionally intensive and not feasible for large data sets. Approximate likeli-hood methods avoid the full computational burden by considering only a subset of sites at a time. The approach implemented in the LDhat software (McVean et al. 2002) estimates the ρ with the highest likelihood of produc-ing the observed frequencies for pairs of sites and then combines the sepa-rate likelihoods to a composite likelihood estimate of ρ for the entire sample.

Population structure One of the most common statistics to measure the degree of genetic structure between populations is the FST statistic, which specifically measures the variance in allele frequencies between subpopulations. With complex calcu-lations becoming less and less computationally demanding the use of meth-ods such as the Bayesian clustering method implemented in the software STRUCTURE (Pritchard et al. 2000) has increased. The method assumes a model in which K populations exist, each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned

Page 14: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

14

(probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed.

Disentangling evolutionary processes The SNM describes a population evolving neutrally where no changes in population size occur and individuals within the population are mating ran-domly. Even though the neutral model is rather robust to deviations from some of the assumptions, the assumptions of constant population size and random mating are violated by most, if not all, natural populations, which may lead to departures from the model for reasons other than selection. Ex-amining the population structure of the sample is therefore not only interest-ing in itself but of great importance to determine to what extent the assump-tion of random mating is violated since allele frequencies are independently evolving within every genetic population and pooling allele frequencies from several populations may skew the allele frequency spectrum and lead to false conclusions about the evolutionary processes at work. Over time, the distribution range of a population will change due to local or global changes in climate, individuals will colonize new areas, and found populations that accumulate independent genetic mutations, and migration between these divergent populations will occur at various degrees depending on local con-ditions. All these events make up the demographic history of a population and lead to violation of the SNM assumption of constant population size. A complex demographic history can leave signatures in the genome that are identical to those left by selection when identifying departures from the SNM. The difference between demographic events and selection events is that demography leaves a genome wide signature whereas selection acts on the areas of the genome that control the selected trait. Consequently, the analysis of a number of loci can be used to distinguish demography from selection.

Demographic changes and selection often go hand in hand; the colonization of new environments of a few individuals is followed by an expansion of the population with a simultaneous adaptation to the new selection pressures. The standard neutral coalescent has been extended to incorporate and model the outcomes of recombination, population structure and growth in popula-tions (Hudson 2002) and with the rapid progress in computational methods the ability to test multilocus polymorphism data against various demograph-ic scenarios has been made possible (Beaumont et al. 2002, Haddrill et al. 2005, Marjoram and Tavaré 2006, Thornton and Andolfatto 2006). The combined use of paleoecology climate records can help to evaluate the plau-sibility of the modeled demographic history.

Page 15: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

15

Violations of the assumptions of SNM also affect the estimate of population recombination rate, since genetic structure and demographic history may lead to artificial patterns of LD. In tests for selection using LD, a high de-gree of LD may be the result of selection for a favorable allele but may also reflect a reduced recombination rate of a specific gene evolving neutrally as recombination rates vary along chromosomes. Simulations of complex de-mographic scenarios combined with the presence of positive selection in a coalescent framework are being developed to determine areas under selec-tion in genome wide scans (e.g. Huff et al. 2010).

Identifying genes involved in adaptation A common approach to identify genes involved in local adaptation is to dis-sect phenotypic traits involved in local adaptation into their genetic compo-nents. Quantitative trait loci (QTL) mapping is a method of identifying ge-netic regions responsible for variation in complex traits, that is, traits under the control of several genes interacting with the environment. The method requires the phenotyping of the trait in a large number of individuals and, ideally, the genotyping of genetic markers evenly distributed across the en-tire genome. A controlled cross between individuals allows for calculation of the statistical association of phenotypic values and genotypes in the progeny. In model organisms, where genome data is readily available, QTL mapping has been a successful method. Association mapping, like QTL mapping, is also based on the identification of genetic markers in linkage disequilibrium but uses the historical LD in populations rather than LD generated through controlled crosses to find statistical associations between genetic markers and phenotype. In general, utilizing the complete recombinational history of a population allows for a finer mapping of variants controlling a trait since only tightly linked loci will show a statistical association after many genera-tions of recombination and random mating. The success and precision of association mapping relies heavily on knowledge about the fundamental population genetic properties and demographic history of the species. The ideal approach of QTL and association studies, to investigate all possible candidate genes for association, is not possible in species where genome data is unavailable and the knowledge of genome wide patterns of variability and LD along with the demographic history is limited. Instead, a selection of candidate genes is made based on functional knowledge or sequence similar-ity to genes of interest in a model species. Association mapping of candidate genes has been successful in several organisms. Problems may arise, howev-er, when a large evolutionary distance separates the species of interest from the model species. Another way of identifying candidate genes is to use population genetic data sets to identify genes showing signatures of selection compared to empirical or simulated genome wide expectations (Akey et al.

Page 16: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

16

2002, Thornton and Andolfatto 2006). A signature of selection is not, how-ever, a guarantee that the gene in question is involved in local adaptation and this approach needs to be coupled with a functional evaluation to be conclu-sive.

The plant species: Norway spruce (Picea abies) Norway spruce (Picea abies (L.) Karst) is a large evergreen coniferous tree dominating many of the forests in northern Europe, making it an ecological-ly important species. Its use as a source for high-quality pulp and, to some extent, timber also renders it a species of great economical importance. The natural distribution of Norway spruce ranges from northern Scandinavia to southern areas of the Balkan Peninsula and eastward into Russia from the east of France. The eastern limit in Russia is hard to define due to extensive hybridization with Siberian spruce (Picea obovata) but is usually given as the Ural Mountains (Figure 1).

Figure 1. Distribution map of Norway spruce (Picea abies) EUFORGEN 2009, www.euforgen.org

Norway spruce is monoecious, producing male and female flowers in sepa-rate cones on the same tree, and individuals in natural stands will generally set seeds for the first time at an age of 15-20 years or older (Wareing 1959) and continue to produce seeds during their average life span of several hun-dreds of years, although seed quality may decline with age. Male cones re-lease their pollen to the wind and the pollen is capable of reaching the fe-male cones of trees at a distance several hundred kilometers away (Tjoelker et al. 2007, and references therein). Seeds are also mainly wind dispersed

!!!!

!

!

!!!

!

!

!!

!!! !

!

!!!

!!!

80°E70°E60°E50°E40°E30°E20°E10°E0°10°W20°W

50°N

40°N

30°N

Picea abies

This distribution map, showing the natural distribution area of Picea abies was compiled by members of the EUFORGEN Networks based on an earlier map published by H. Schmidt-Vogt in 1977 (Die Fichte, Verlag Paul Parey, Hamburg and Berlin, p.647).

Citation: Distribution map of Norway spruce (Picea abies ) EUFORGEN 2009, www.euforgen.org.

First published online in 2003 - Updated on 24 July 20080 750 1,500375

Km

EUFORGEN Secretariat c/o Bioversity International Via dei Tre Denari, 472/a00057 Maccarese (Fiumicino)Rome, ItalyTel. (+39)066118251Fax: (+39)[email protected] information and other maps at:www.euforgen.org

Page 17: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

17

and may travel for many hundred meters, although animals and waterways can carry seeds over greater distances.

Phylogeny of P. abies Picea abies is one of around 34 species belonging to the genus Picea (Farjón, 1990). Together with their phylogenetically closest relatives, spe-cies of the Pinus genus, and some eight other genera they constitute the fam-ily Pinaceae that in turn is a member of the Pinales order (formerly Conifer-ales) of gymnosperms. The exact phylogenetic relationship between spruce species has not been unequivocally determined. The earliest attempts at re-constructing the species tree was based on morphological characters and resulted in a classification of species into groups of more closely related species but proved unsuccessful at resolving the phylogenetic relationships between the groups (Wright 1955). Recent phylogenies based on non-recombining molecular markers do not reflect the classification obtained by morphological markers and, in addition, are not congruent with each other (Ran et al. 2006, Bouillé et al. 2010).

Phylogeography of P. abies The current distribution of Norway spruce is largely a product of the spe-cies’ re-colonization of the European continent after the last Ice Age. At the time of the latest glacial maximum, around 18000 years ago, more or less all of Scandinavia and large parts of continental Europe was covered by ice (Clark and Mix 2002). The ice sheet forced spruce, along with many other present day species, to reside in refugia of favorable climatic conditions at more southern latitudes. As the ice receded, pollen data indicates that P. abies expanded strongly across Northwestern Europe (Giesecke and Bennett 2004, Knaap 2006), with populations from different refugia following dif-ferent re-colonization routes (Tollefsrud et al. 2009). The present day popu-lations of central Europe originate from at least two main refugia with sepa-rate re-colonization routes, resulting in a complex genetic structure among populations despite extensive gene flow (Chen et al. 2012, paper I). Scandi-navia was recolonized from populations located in central Russia, most like-ly with some contributions from cryptic refugia in the Northern part of Eu-ropean Russia (Väliranta et al. 2011, Chen et al. 2012). Eastern Finland was recolonized around 6,500 years ago and Norway spruce reached eastern central Sweden less than 4000 years later; about 2,700 years ago (Giesecke and Bennett 2004, Seppä et al. 2010). There is some evidence that small glacial refugia along the Norwegian coast has held surviving populations of P. abies (Parducci et al. 2012) but their contribution to current populations appears to have been limited (Chen et al. 2012). The contraction and subse-quent expansion of population size in response to the climate change during

Page 18: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

18

the last glacial maximum is one of many fluctuations in population size in response to glacial cycles occurring during the late Quaternary (Davis and Shaw 2001). The evidence of a severe restriction in population size (bottle neck) can be found in the genome of not only Norway spruce (paper I) but also of the conifer Pinus sylvestris and the angiosperm tree Populus tremu-lus, which share the present day distribution of Norway spruce to a large extent (Pyhäjärvi et al. 2007, Ingvarsson 2008).

Genome of P. abies Norway spruce is a diploid species with the genome arranged in 24 chromo-somes (Sax and Sax 1933). Generally, extant species of gymnosperms have large genome sizes and Norway spruce, with a genome size of ~18 Gbp, is no exception (Siljak-Yakovlev et al. 2002). For comparison, the genome size of Norway spruce is more than five times larger than the genome of humans and over a 100 times larger than the genome of the model plant Arabidopsis thaliana. The expansion of genome size in gymnosperms is thought to be the result of the propagation of repetitive sequences (see below) rather than the frequent occurrences of polyploidy responsible for most large genomes found in angiosperm species (Ahuja and Neale 2005). Indeed, the total amount of single copy sequences belonging to the gene space of Norway spruce has been estimated to comprise as little as 1% of the total DNA con-tent while the proportion of repetitive DNA sequences make up roughly 75% of the genome (Morgante and De Paoli 2011). The genome size and the huge amount of repetitive DNA have until recently deterred attempts to fully se-quence and assemble the genome of any conifer. However, with the discov-ery that most repetitive elements were active in a common ancestor to the Pinaceae family at least 87 million years ago (De Paoli 2006, Morse et al. 2009) the age of these repetitive sequences suggests that they will have ac-cumulated sufficient amounts of variation through time to make the assem-bly of a genomic sequence feasible and an attempt at sequencing the genome of Norway spruce is now underway (http://www.congenie.org), along with other conifer species. With the absence of complete genome data, we rely on the sequencing efforts that have led to the retrieval of sequence data from the expressed parts of the genome and some full-length genomic sequences of genes from Norway spruce (e.g. Chen 2012). Compared to other species of conifer the data is scarce, with only 8,715 Putative Unique Transcripts avail-able, and poorly annotated (http://www.plantgdb.com). The availability of expressed sequences from a closely related species (P. glauca) has been utilized to study changes in gene expression on a genome wide scale (Käll-man 2009) and candidate genes have been investigated for differences in gene expression in response to various conditions (Gyllenstrand et al. 2007, Chen et al. 2012, Karlgren et al. (unpublished)).

Page 19: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

19

Repetitive elements Based on their genomic organisation and position on the chromosomes, re-petitive DNA sequences in plants can be divided into two major categories of (1) tandemly repeated and (2) dispersed or interspersed transposable ele-ments (TE). The tandemly repeated sequences include the ribosomal RNA genes, which are found at a few sites in the genome, mainly clustered on the nucleolus-organising regions of the chromosomes. The dispersed transposa-ble elements represent the major fraction of repetitive DNA sequences and are divided into class I and class II transposable elements based on their mode of replication. Class I TEs, or retrotransposons, are replicated in a copy and paste fashion through an RNA intermediate and require reverse transcriptase (RT) to convert RNA back to DNA in order to reinsert them-selves into the genome. This replication of sequences has the potential to substantially increase the size of the host genome. Retrotransposons are fur-ther divided into autonomous retrotransposons, containing genes for synthe-sis of the RT required for their replication, and non-autonomous retrotrans-posons, lacking genes coding for RT and thereby being reliant on externally produced RT for replication. Autonomous retrotransposons include the pre-dominant order of retroelements in plants that contain a section of long ter-minal repeats (LTRs) as well as non-LTR retrotransposons related to long interspersed nuclear elements (LINEs). LTR retrotransposons are further sub-classified into the Ty1-copia-like and Ty3-gypsy-like groups based on both their degree of sequence similarity and the order of encoded gene prod-ucts. In large plant genomes, retrotransposons belonging to the copia and gypsy groups are commonly found in copy numbers up to a few million (Kumar and Bennetzen 1999). The largest group of non-autonomous re-trotransposons is short interspersed nuclear elements (SINEs), which, like LINE’s, also lack LTR. Class II TEs, or DNA transposons, replicate in a cut and paste fashion through a DNA intermediary. The transposition requires transposases to recognise specific target sites in the flanking regions of the element, which consists of terminal inverted repeats (TIR), cut the transpos-on from its current position on a chromosome and reinsert it at a donor site, resulting in target-site duplications flanking the element (Craig 1995). Du-plication of entire DNA transposons arises when transpositions take place during cell division. Similar to class I TEs, autonomous DNA transposons produce their own transposases whereas non-autonomous DNA transposons utilize transposases produced externally by transposons with a shared recog-nition site. The analysis of sequence similarity of transposase genes has re-vealed several superfamilies of autonomous DNA transposons, including hAT, CACTA and MULE. The exceptions among DNA transposons are the Helitrons, which have been proposed to replicate through a rolling-circle mechanism (Kapitonov and Jurka 2001).

Page 20: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

20

Repetitive elements in Norway spruce A substantial amount of the repetitive sequences in Norway spruce are very ancient repetitive elements responsible for the massive expansion of the genome to its current size through a number of waves of propagation (De Paoli 2006). An effort to characterize the repetitive content of the genome in Norway spruce revealed the largest recognizable repetitive component to be Class I transposable elements, in particular LTR retrotransposons (De Paoli 2006). LINEs were more common than in angiosperms but still constituted only a small part of all retrotransposons. Class II elements amounted to a minor fraction of the genome, even less than the gene content. The abun-dance of the gypsy superfamily was found to be two-fold higher than the copia superfamily, but the opposite result was obtained in a separate study (Sarri et al. 2011). As copia elements were generally found to be older than gypsy the discrepancy can be explained by the use of different techniques and analysis methods with different tolerances for sequence degeneration. However, almost 50% of the repetitive fraction remained unidentified de-spite extensive efforts of characterization, reflecting the poor genomic re-sources available for gymnosperms.

Adaptive trait: Bud set and bud burst With a distribution range in the temperate region of the world, individuals of Norway spruce, as well as other perennial plants, need to adjust the timing of crucial developmental processes with favorable seasons of the year to max-imize their chance of survival and reproductive success. This results in a cyclic shift over the year between a dormant phase during the winter fol-lowed by an active growth and reproductive phase during the summer. In preparation for the dormant period, the stems cease to grow, terminal buds form (bud set) and the tree enters endodormancy in the fall. After an extend-ed period of temperatures around 0°C the tree transitions from the endo-dormant state to a state of ectodormancy or quiescence, where it is capable of withstanding very low temperatures. With increasing temperatures in the spring, the tree is released from the ectodormant state and the buds initiated during the preceding autumn starts to elongate (bud flush) and growth is resumed (Figure 2). The timing of bud set in the autumn, and bud flush in the spring, represents a critical ecological and evolutionary trade-off be-tween survival and fitness. A late response in bud set may lead to frost dam-age if there has not been enough time to develop an adequate level of cold hardiness before the first frost of autumn. Likewise, if bud flush occurs too early in the spring, the growing tissue may be killed by a late frost. On the other hand, trees that set bud too early in the fall or initiate growth too late in

Page 21: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

21

the spring will have a shortened growing season, which reduces their com-petitive ability, growth potential and, ultimately, reproductive success.

Figure 2. The annual growth cycle of Norway spruce. From Gyllenstrand et al. 2007.

The timing of bud set and bud flush is achieved through a complex interac-tion between genes and environment. The increase in night length in the fall triggers a genetic pathway to initiate bud set while increasing temperatures in the spring controls bud flush (Dormling 1973, Hannerz 1999). Physiolog-ical studies have revealed that the critical night length that initiates the bud set response is correlated with the latitudinal origin of the trees (e.g. Dormling 1979, Hannerz et al. 1999). Populations growing at northern lati-tudes initiate growth cessation and bud set at night lengths of two to three hours, whereas the most southern populations require close to 10 hours of night to initiate the same response. This gradient is under strong genetic control and the trait has high estimates of heritability (Eriksson et al. 1978, Liesch 2005). The actual genetic background of the response to changing day length is largely unknown in P. abies. In the angiosperm model plant Arabidopsis thaliana however, the molecular mechanism behind sensing seasonal changes through day length, or photoperiod, has been well charac-terized and consists of a number of interconnected feedback loops with a circadian clock at its core by which the plant keeps track of diurnal changes and initiates appropriate physiological responses. Evidence has been found that the pathway is conserved between angiosperm trees and herbs (Böhleni-us et al. 2006), and genes with sequence similarity to photoperiod genes in

Page 22: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

22

Arabidopsis are also found in Norway spruce (Gyllenstrand et al. 2007, Källman 2009, Karlgren et al. 2011, Chen et al. 2012, paper I, paper III).

The search for adaptive variation in Norway spruce Picea abies, with its large and poorly characterized genome, would not ap-pear an ideal species to utilize in the search for the genetic basis of local adaptation and the situation is not improved by the long generation time, making changes in phenotypes difficult to observe within the time limit of standard research projects. There are, however, a number of mitigating cir-cumstances that set the species apart from the traditional organisms used as models for studying evolution. First, in working with Norway spruce we have access to natural populations with abundant genetic and phenotypic variation. Second, extant populations are the result of natural evolutionary forces and questions pertaining to adaptation and demography will be unaf-fected by the human impact of domestication. Third, a huge distribution range and the large effective population size implied by their outcrossing mode of reproduction suggest that natural selection, if present, will be highly efficient and its signature in the genome relatively easily identified. Fourth, being a forest tree of great economical importance, Norway spruce has been the subject of provenance tests, progeny testing and growth chamber exper-iments for decades, resulting in a very well characterized phenotypic and quantitative genetic variation. Last, but not least, we have the benefit of grand scale genetic surveys made with previously popular markers which, together with the efforts of paleoecologists, have endowed us with a rather detailed description of the history of the species over the last 20,000 years or so (Lagercrantz and Ryman 1990, Tollefsrud et al. 2008, 2009).

The strong latitudinal clines in the phenotypic trait of bud set we observe in present day populations of Norway spruce have been established during a short evolutionary period. This adaptation to local climate has occurred de-spite the extensive gene flow among populations that is evident by the lack of strong differentiation among populations observed at neutral markers (Lagercrantz and Ryman 1990). In order to investigate how this strong selec-tion on adaptive traits translate at the loci underlying the trait we need in-formation on the fundamental population genetic properties of Norway spruce, such as levels of diversity and patterns of linkage disequilibrium, and knowledge on how the demographic history of the species has shaped the genetic diversity. Being a non-model organism, genome data is not yet ac-cessible and the estimates obtained reflect the properties of genes only. While complete genome sequencing represents the ideal method to compare levels of diversity between organisms the use of genome wide genetic mark-ers can in most instances help answer the relevant questions of population

Page 23: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

23

genetics, association studies and evolution equally well. With the develop-ment of new sequencing techniques the discovery, sequencing and genotyp-ing of thousands of markers across an entire genome is now plausible. For Norway spruce, however, the size of the genome and the huge amount of repetitive sequences makes the endeavor to develop and re-sequence genetic markers across the entire genome of a large number of individuals economi-cally prohibitive.

Aims of the study This thesis attempts to fill in some of the gaps in the knowledge of the fun-damental population genetics parameters in Norway spruce and use it to identify the signature of selection in genes putatively involved in the pheno-typic adaptive trait of bud set. We also investigate the effect of sample size on estimates of neutrality in Norway spruce. In preparation for future large-scale association studies, we investigate levels of linkage disequilibrium in a number of genes and evaluate the possibility of reducing the repetitive con-tent of the genome of Norway spruce in the construction of whole genome libraries. The specific aims of each paper in the thesis are as follows:

I Characterize patterns of nucleotide diversity and linkage disequilibrium

in Norway spruce and infer the demographic history of the species. II Evaluate the role of selection in the evolution of putative photoperiod

related genes in Norway spruce using specific demographic scenarios.

III Characterize levels of linkage disequilibrium within Scandinavian popu-lations of Norway spruce and examine the effect of sample size on esti-mates of summary statistics of the site frequency spectrum.

IV Evaluate the possibility of reducing the complexity of the genome of Norway spruce for further genome wide studies of diversity and associa-tion with complex traits.

Page 24: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

24

Results and discussion

Paper I In paper I we used a multilocus data set to investigate the level of LD and patterns of diversity in populations sampled from the natural distribution range of Norway spruce. The summary statistics obtained were used to ex-amine the role of demography in shaping the observed variability. Sequence data from a total of 22 genes was retrieved from close to 50 gametes repre-senting seven natural populations. Half of the genes were originally identi-fied as genes similar to photoperiod related genes in the model organism Arabidopsis thaliana, and the remaining half was randomly chosen from a Norway spruce EST library. As in other conifers, genetic diversity estimated from both coding and non-coding sequences was in the low range (Brown et al. 2004, Pyhäjärvi et al. 2007), with non-synonymous diversity being a third of the diversity at silent sites. With the pooling of results of pairwise correlation estimates from all genes, LD was found to decay within a few hundred base pairs (Figure I.1). This is again consistent with results obtained in other conifers.

Figure I.1. Plot of the squared correlation of allele frequencies (r2) vs. distance in base pairs between polymorphic sites across 22 loci for different subsets of popula-tions. Copyright © 2006 by the Genetics Society of America.

Page 25: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

25

The estimate of overall population differentiation obtained with the sequence data in this study was higher than previously reported for neutral markers (Lagercrantz and Ryman 1990, Acheré et al. 2005). In general, diversity among the populations was differentiated into a geographic pattern corre-sponding with the proximity of present day populations to their inferred glacial refugia. This result was obtained both with model-based analysis (Structure) and traditional FST approaches. FST estimates also revealed that populations within the Scandinavian distribution were somewhat differenti-ated, whereas populations from the Alpine domain were not.

All loci seemed to evolve neutrally as no significant deviations from neutral-ity was found. With around 50% of all polymorphisms being singletons the mean value for Tajima’s D was negative (-0.92), suggesting that the diversi-ty in Norway spruce may have been shaped by demographic events. Com-bining the result of Tajima’s D with the estimated negative mean value of Fay and Wu’s H (-0.74), coalescent simulations revealed a demographic scenario of a fairly old and severe bottleneck as consistent with the data. Interestingly, the inferred bottleneck predates the latest glaciation events of Europe (18000ya) and even though we did not attempt to time it explicitly, due to the large uncertainty in estimates of mutation rate, the timing coin-cides with bottleneck events later inferred from two other European tree species, Pinus sylvestris and Populus tremula (Pyhäjärvi et al. 2007, Ingvarsson 2008).

The level of population structure detected in this study and the overall depar-ture from the SNM of spruce populations imply that these factors will have to be taken into account when carrying out association-mapping studies and when interrogating candidate genes for signatures of natural selection re-spectively. The rapid decay in LD in spruce will allow high-resolution map-ping in association studies, given that the right candidate genes are chosen, but will also require a high-density marker screening due to the limited pre-diction power of single SNPs over neighbor sequence diversity. However, pooling loci to estimate LD assumes that all loci have an equal recombina-tion rate but may in fact mask differences between loci. Also, the estimates in this study was obtained from sequences with a mean length of ~500 base pairs, with only one gene exceeding 3000 base pairs.

Page 26: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

26

Paper II Nineteen sequence homologs to photoperiod genes in Arabidopsis were identified as candidate genes for bud set response in Norway spruce. The genes were divided into three groups representing their putative position in the photoperiod pathway; photoreceptors, circadian clock genes and down-stream targets, and investigated for deviations from neutral evolution using two different approaches and a sample of ten populations distributed across the natural range of Picea abies.

The HKA test (Wright and Charlesworth 2004) compares within species diversity with divergence under a simple split model. Only genes from the photoreceptor group deviated significantly from the fit of a neutral model inferred from 14 background loci a priori not assumed to be involved in local adaptation. A previous study in Arabidopsis also found earlier acting genes of the photoperiod pathway to be under selection (Olsen et al. 2002) but a similar study in Populus tremula instead highlighted genes from the circadian clock (Hall et al. 2011).

Since we established in paper I that summary statistics suggest that the over-all distribution of diversity in populations deviate from the SNM, the candi-date genes were tested for deviations from three demographic models with increasing complexity using an Approximate Bayesian Computation (ABC) approach (DeMita et al. 2007, DeMita and Siol 2012). The main goal of the study was not to propose a new demographic model for Norway spruce, but rather to test patterns of variation at candidate genes against the standard neutral model as well as against a set of plausible and more realistic demo-graphic scenarios. Hence, the demographic models were the standard neutral model (SNM), a population expansion model (PEM), and a demographic scenario that aimed at capturing some of the main features of the demo-graphic history of the species (SPM) (Figure II.1).

Figure II.1. The three demographic scenarios modeled for evaluating signatures of selection in photoperiod genes using the ABC approach. From left to right; Standard Neutral Model (SNM), Population Expansion Model (PEM), Split Population Model (SPM).

PEM BNMSNM

!Present

Past

! !

T

df

"

!"

# $

!"

%

&'

!' !(

! !&(

Page 27: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

27

Summary statistics used in the demographic models were estimated from the 14 background loci and candidate genes were tested against the inferred posterior distribution of demographic models. Only one gene, the putative circadian clock gene PaPRR3, was identified as a robust outlier, departing significantly from all three demographic models.

Forest tree species in temperate regions generally show a strong latitudinal cline for growth cessation and bud set in response to photoperiod (e.g. Hol-liday et al. 2008; Ma et al. 2010; Keller et al. 2012; Kujala and Savolainen 2012) and we would therefore expect selection to have influenced nucleotide variation at some of the genes from the photoperiod pathway in Norway spruce. In the present study, we did indeed detect signatures of selection at some of the genes identified through sequence homology to Arabidopsis photoperiod genes. However, in spruce, as well as in other tree species, the identity of the genes at which selection is detected seem to strongly depend on the method and the sampling used to detect selection.

It is difficult to predict which groups of genes in the pathway would show the strongest signs of selection using the approach of this study, as adaptive evolution occurs on a local scale. We cannot from the present data pinpoint the nature of the selection that acted on the circadian clock associated gene PaPRR3, and in particular conclude whether PaPRR3 is involved in local adaptation. However, although PaPRR3 was not among the top candidate genes in studies of clinal variation in Norway spruce, signs of selection were detected (Chen et al. 2012). Functional studies have revealed a cyclic ex-pression of PaPRR3 supporting its role in the circadian clock (Gyllenstrand et al. unpublished) and validating the approach of using signatures of selec-tion as a way to identify candidate genes potentially involved in local adap-tation. Large-scale association studies are now the next step needed to reveal the role of PaPRR3, and other photoperiod pathway related genes, in the strong latitudinal cline in bud set commonly observed in Norway spruce.

Paper III In order to successfully design an association mapping experiment, a de-tailed knowledge about several basic population genetic parameters of the species of interest is required. In paper III, we take a closer look at Scandi-navian populations of Norway and focus on levels of linkage disequilibrium (LD) within genes, estimates of the site frequency spectrum (SFS) of muta-tions and the effect of sample size on SFS summary statistics. Ten popula-tions were sampled along a latitudinal cline along both sides of the Baltic Sea and 11 loci with a mean length of 3.7 kb were chosen for a paired end sequencing approach to obtain estimates of long range LD. Eight of the loci

Page 28: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

28

were hypothesized to be involved in the regulation of bud set due to se-quence similarity to known genes involved in the regulation of flowering time in the model species Arabidopsis thaliana. The remaining loci corre-spond to genes exhibiting a significant differential expression under altered day length conditions and are possibly involved in stress induced cellular responses. Eight individuals were sampled for seven of the populations, and three populations were more densely sampled using 24 individuals.

We were unable to detect a clear genetic structure among the populations sampled, confirming the general lack of structure observed in populations in central Scandinavia, and population structure was therefore not considered to severely influence estimates of LD and SFS.

The level of LD was generally low, with a mean r2 of 0.11 over all pairwise comparisons, and pooling all squared correlation frequencies resulted in a rapid decay of LD, similar to the results of Paper I (Figure III.1a). In within gene estimates of LD there were distinct differences between the genes ob-served, where the putative stress response gene PaZIP stands out with a mean r2 of 0.2 and polymorphisms in complete LD extending the length of the gene (4 kb) (Figure III.1b). Recombination was revealed as the major force behind the variation in all loci except PaZIP and the photoperiod relat-ed gene PaPRR1, where estimates of population recombination rate (ρ) were low. No significant correlation between ρ and mean r2 was found across the genes, which may be contributed to the flat likelihood curves when estimat-ing ρ for four of the genes.

Figure III.1. Left hand image: a) Plot of the squared correlation of allele frequencies (r2) vs. distance in base pairs between polymorphic sites across all eleven loci in this study. Right hand image: b) Plot of the squared correlation of allele frequencies (r2) vs. distance in base pairs between polymorphic sites in the gene PaZIP.

0 1000 2000 3000 4000 5000 6000

0.0

0.2

0.4

0.6

0.8

1.0

All loci

Distance in bp

r2

0 1000 2000 3000 4000

0.0

0.2

0.4

0.6

0.8

1.0

PaZIP

Distance in bp

r2

Page 29: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

29

The level of LD was drastically increased in the three more densely sampled populations (Figure III.2). In part, the inflated values of LD can be explained by the reduced sample size in within population estimates since an absence of rare alleles will tend to inflate LD estimates slightly (Hedrick and Kumar 2001). However, the level of LD was, apart from increased, also quite varia-ble between the three populations and the increase and difference in levels of LD most likely reflects different demographic histories among the popula-tions, despite the inability of this study to detect population structure be-tween the Scandinavian populations. A demographic explanation is also supported by the fact that differences in the decay of D´ was much less strik-ing than for the decay of r2 with distance, as D´ reflects recombination be-tween the two loci whereas r2 more reflects the gene genealogies (McVean 2001).

Figure III.2. From left to right; Plots of the squared correlation of allele frequencies across all eleven loci in this study in populations SE-61, SE-64 and FI-67.

Estimates of nucleotide diversity were not markedly affected by sample size but the estimate of deviations from neutrality using the SFS based test Taji-ma’s D showed high effect of sample size and revealed that care should be taken to draw conclusions from this test in small sample sizes, consistent with the results in Populus nigra (Marroni et al. 2011). The SFS was inves-tigated for a clinal pattern in departure from the SNM through Tajima’s D, as a study of climate related candidate genes in six populations of Sitka spruce suggested that sequential population bottlenecks during postglacial re-colonization created a pattern of rare variants being more common in the south, and medium-frequency variants being more common in the north (Holliday et al. 2010). We found no distinct correlation between latitude and mean values of Tajima’s D for all populations but as the results from resampling clearly revealed that estimates of Tajima’s D were severely af-fected by small sample size the results are inconclusive and confirmation using increased sample size is needed.

The result of a low level of LD within genes from paper I leads to the con-clusion that a candidate gene approach to association mapping is appropriate in Norway spruce, as the marker density required in genome wide associa-tion studies would not be cost effective. However, the results from this study

0 1000 2000 3000 4000 5000 6000

0.0

0.2

0.4

0.6

0.8

1.0

SE-61

Distance in bp

r2

0 1000 2000 3000 4000 5000 6000

0.0

0.2

0.4

0.6

0.8

1.0

SE-64

Distance in bp

r2

0 1000 2000 3000 4000 5000 6000

0.0

0.2

0.4

0.6

0.8

1.0

FI-67

Distance in bp

r2

Page 30: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

30

are in line with recent studies in conifers previously characterized with low levels of LD, which have revealed a high level of LD in some of the exam-ined genes (Lepoittevin et al. 2008, Namroud et al. 2010, Pavy et al. 2012, Pyhäjärvi et al. 2011), indicating a heterogeneous pattern in LD which could have important consequences for the design of future association studies in Norway spruce. The results from this study are based on a limited number of loci from the gene space only and, although genes are going to remain inter-esting for association mapping in Norway spruce with its large genome, efforts should now be made to estimate LD and factors that influence LD at full genome level.

Paper IV With the advent of new sequencing technologies, the discovery and genotyp-ing of genome wide genetic markers that can help to answer many of the important questions in population genetics is plausible for many species. For Norway spruce, however, the large repetitive, and mostly uninformative, portion of the genome makes the effort to utilize the entire genome for marker development cost-prohibitive. In paper IV, we constructed genomic libraries through the use of methylation sensitive restriction enzymes with the purpose of eliminating repetitive sequences; so called HypoMethylated Partial Restriction (HMPR) libraries. Following the studies made in the model plant maize (Emberton et al. 2005), we used the enzymes HpaII and HpyCH4IV and employed partial restriction on genomic DNA from both needles and seeds. A standard shotgun genome library was constructed for reference. Resulting clones were filtered for classification against nucleotide sequences, amino acid sequences and EST sequences deposited at Genbank, and RepBase was used specifically to identify known repetitive sequences. The complementary approach of classifying sequences of a repetitive nature by sequence similarity to other clones within the library was taken to ac-commodate for the low characterization of repetitive elements in Norway spruce.

We analyzed a total of 1213 fragments in the HMPR library and 7883 frag-ments in the reference library. The results were very encouraging; we ob-tained a nine-fold enrichment of genes and the repetitive content was re-duced by 45% in the HMPR libraries compared to the reference library. Apart from sequence recognition of actual genes, the increase in SSRs and Class II transposable elements, both commonly associated with single copy expressed parts of the genome, serves as good indications that the HMPR method indeed targeted the gene space. Due to the uncharacterized genome of Norway spruce, a large number of sequences remained unidentified after classification and an attempt was made to elucidate the nature of these se-

Page 31: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

31

quences. The amplification of some of these fragments in cDNA revealed a spatially differentiated expression, and the retrieval of full-length transcripts suggested they might be long non-coding RNAs, which are abundant in most organisms investigated and associated with the gene space.

Figure IV.1. The result of sequence classification in HMPR libraries (black bars) and a reference standard shotgun library (grey bars) constructed from genomic DNA of Norway spruce. Y-axis is percent; sequence categories are given on the X-axis.

We show in this paper that the HMPR method is effective in constructing libraries enriched for the genic fraction of the genome, while simultaneously reducing the repetitive fraction, when applied to the genome of Norway spruce. Combined with NGS technology, we believe that this method could be a valuable tool for years to come for the discovery, validation and as-sessment of genome wide genetic markers in populations of Norway spruce making future genome wide studies of genetic diversity and large-scale as-sociation studies in Norway spruce both plausible and cost-efficient.

7.7 3.8

11.6

41

16.4 19.5

0.2 1.1 2.5

15.4 11.7

69.2

Gene Putative gene

EST No hit Putative repetetive element

Repetitive element

HMPR libraries Reference library

Page 32: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

32

Conclusion

“Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.”

-Winston Churchill

The studies presented in this thesis together constitute a foundation for fu-ture studies on adaptive evolution in Norway spruce. Using multilocus data we were able to estimate key population genetics parameters and elucidate the demographic history of Norway spruce. With the understanding that the genetic diversity in Norway spruce reflects the changes in distribution range of the species as a result of past climate changes, we again used a multilocus data set to test for the presence of selection at genes putatively involved in local adaptation. The adaptive trait focused on in this study is bud set, a trait of major importance as it determines the growth and reproductive success of trees in temperate regions. The different approaches used to detect signatures of selection identified different candidate genes deviating from neutral ex-pectations, highlighting the difficulty in predicting how local adaptation will manifest itself on different time scales and in distribution wide samples. The approach of using genes in this thesis has proven useful in identifying de-mographic scenarios and in detecting genes under selection. However, the inevitable next step in identifying genes involved in local adaptation is large-scale association studies. Previous studies in conifers have led to the conclu-sion that a candidate gene approach to association studies would be the most appropriate, given the rapid breakdown of association between alleles. When examining properties important in the design of association studies, we found that previous estimates of a low level of LD were highly influenced by the joint analysis of several loci over a large distribution range and that esti-mates of LD were in fact heterogeneous across loci and increased within populations. We also found that within species tests for deviations from neutral expectations were highly sensitive to sample size. The question is whether estimates of population genetics parameters obtained from the gene space is sufficient to utilize in marker development for association studies. As the power to detect small deviations from neutrality is generally low in association studies, and the effects of individual loci on quantitative traits are mostly small, additional genomic sequence characterization in conifers is necessary to provide more comprehensive sets of markers, also including gene promoters and non-genic regions of the genome. The large size and

Page 33: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

33

dominant repetitive portion of the genome of Norway spruce makes the ef-fort to utilize the entire genome for marker development cost-prohibitive. In the final paper we show that the HMPR method is effective in constructing libraries enriched for the single copy fraction of the genome, while simulta-neously reducing the repetitive fraction, when applied to the genome of Norway spruce. In the end, a reference genome for conifers will be a funda-mental part of the information resources necessary for finding and under-standing the full range of relationships between alleles, phenotypes, and environments. However, genome wide genetic markers can help to answer many of the important questions in population genetics without sequencing entire genomes. The continued use of investigating specific candidate genes in association studies is likely to prevail for some time but combining candi-date genes with the expanded, genome wide sets of markers that can be de-veloped from a reduced genome using high throughput sequencing methods has the potential to greatly facilitate studies of genetic diversity and large-scale association studies in Norway spruce and other conifers. This would also provide a powerful tool for studies on forest trees that are directed to-wards obtaining faster genetic improvement in timber quality, growth rate, and stress and disease tolerance.

Page 34: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

34

Svensk sammanfattning

Den mångfald av variation som observeras i fysiska egenskaper (fenotyper) både inom och mellan arter är till stor del följden av populationers anpass-ning till nya och föränderliga miljöer. Att fastställa de underliggande gene-tiska orsakerna till lokal anpassning är en av de största drivkrafterna inom populationsgenetiken. Gran (Picea abies) dominerar stora delar av Europas naturliga skogar och uppvisar en latitudgradient i knoppsättning över hela sin utbredning. Denna gradient är mest troligt resultatet av populationers anpassning till lokala klimatförhållanden då knoppsättningen framförallt styrs av en förändring i dagslängd. Tiden för knoppsättning är avgörande för granens förmåga att överleva vinterns låga temperaturer då detta är start-punkten för uppbyggnaden av ett frostskydd. Granar längre norrut inducerar knoppsättning tidigare på hösten än granar i de södra delarna av utbred-ningsområdet, vilket innebär att populationerna maximerar sin tillväxtsäsong i förhållande till överlevnad. Det finns en stark genetisk komponent som styr knoppsättning och ett antal, men långt ifrån alla, gener har identifierats i gran som potentiellt är inblandade i processen. För att korrekt kunna associ-era fenotyper med underliggande genetisk variation krävs en detaljerad kun-skap om de faktorer som påverkar artens genetiska variation och denna av-handling utgör en grund för fortsatta studier av lokal anpassning i gran.

I den första artikeln analyserade vi ett flertal gener för att beskriva grund-läggande populationsgenetiska egenskaper hos gran. Analyserna visade att granens genetiska variation är lägre än förväntat med tanke på sin populat-ionsstorlek och sitt stora utbredningsområde. Detta är med största sannolik-het resultatet av fluktuationer i populationsstorlek under artens historia. Den nuvarande populationen visar tydliga tecken på populationstillväxt men detta tycks inte bero på en återerövring av den europeiska kontinenten efter den senaste istiden, utan kan istället spåras till händelser längre tillbaka i tiden. Den statistiska associationen mellan allelfrekvenser (så kallad linkage disequilibrium, LD) var låg, vilket var förväntat från granens låga förekomst av självbefruktning och stora populationsstorlek.

Med vetskapen om att granens demografiska historia måste beaktas i sö-kandet efter gener som uppvisar spår av selektion använde vi i den andra artikeln återigen information från flera lokus för att identifiera de gener som potentiellt är involverade i knoppsättning. Vi använde två olika metoder för att upptäcka gener som avvek från de neutrala förväntningar som simulera-des från bakgrundsgener. De olika metoderna resulterade i att olika gener

Page 35: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

35

identifierades och belyste svårigheterna i att förutse hur lokal anpassning manifesteras i olika tidsskalor och i olika urval av populationer. Det faktum att vissa gener uppvisar tecken på selektion stärker deras potentiella in-blandning i lokal anpassning.

Tillvägagångssättet att använda populationsgenetiska estimat från gener för att härleda granens demografiska historia och identifiera potentiella ge-ner inblandade i lokal adaption måste betraktas som framgångsrikt. För att bekräfta dessa, och andra, geners roll i lokal anpassning krävs dock storska-liga associationsstudier. I den tredje artikeln fokuserade vi därför på egen-skaper som är speciellt viktiga för utförandet av dessa studier. Vi fann att de tidigare låga estimaten av LD som både vi och andra studier i barrträd erhål-lit var kraftigt påverkade av att estimat från flera gener slagits samman inför analysen. Estimaten påverkades också av att de undersökta populationerna var från en stor del av artens distributionsområde. Då vi studerade generna individuellt klargjordes att olika gener uppvisar olika nivåer av LD och att den totala nivån av LD ökade då estimat från enstaka populationer beräkna-des. Vi fann också att vid användandet av ett test för avvikelser från neutrala förväntningar baserat på allelfrekvenser påverkades resultaten starkt av sam-pelstorleken på populationen, vilket tyder på att resultat från detta test bör tolkas med försiktighet vid små sampelstorlekar.

Resultaten från artikel tre visar att LD varierar över olika områden i gra-nens genom. Frågan är om estimat som endast är baserade på gener är till-räckligt för att utforma associationsstudier på ett sätt som identifierar gener inblandade i lokal anpassning. I associationsstudier ökar kapaciteten att upp-täcka gener som har en liten effekt på en fenotyp med antalet genetiska mar-körer och populationsgenetiska estimat från ytterligare genomregioner, såsom promotorer och intergeniska regioner, är nödvändiga för att utveckla en stor mängd markörer. Granens genom är mycket stort och består till 75% av repetitiva sekvenser. Detta försvårar sekvensering av genomet och där-med möjligheten att till fullo karaktärisera den genetiska variationen i gran. I den fjärde och sista artikeln utforskar vi därför möjligheten att konstruera helgenomsbibliotek med en reducerad andel av repetitiva sekvenser. Vi an-vände oss av metyleringskänsliga restriktionsenzym och separerade de oklippta, metylerade fragmenten innehållande de repetitiva sekvenserna från de övriga fragmenten. De resulterande biblioteken innehöll nio gånger fler genfragment och nästan hälften så mycket repetitiva sekvenser som ett refe-rensbibliotek. Denna metod, i kombination med de nya sekvenseringsme-toder, har potentialen att betydligt underlätta den fortsatta karaktäriseringen av genetisk variation i gran. Detta möjliggör i sin tur utvecklingen av gene-tiska markörer som kan användas i storskaliga studier av lokal anpassning i knoppsättning och för att identifiera gener inblandade i andra fenotyper av intresse.

Page 36: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

36

Acknowledgements

At the end of my time as a PhD student I am, admittedly, proud of the things I have learned and accomplished but I am first and foremost grateful to the people who have taught, helped, advised and supported me over the years. First of all, of course, I would like to thank my supervisor Martin for giving me this opportunity (even though you were afraid of me) and patiently wait-ing for me to finish my thesis through two maternity leaves. Your insights into population genetics, baked goods, politics, social interactions (=gossip) and the Swedish way of life have all been valuable. Ulf, my co-supervisor, thank you for always keeping your door open. Niclas, also co-supervisor, thank you for sharing all your lab expertise (“det är inte så noga”) and for all the really good laughs. Thomas, I could never mention or thank you enough for all the things you have helped me with, big and small, but above all I would like to thank you for always being there, as a colleague but also as a great friend, and never making me feel like I am bothering you with all my questions. Darling Kalle, your support has been invaluable, thank you for showing me your softer side when I needed it and for being your great, ironic, sarcastic, fun self the rest of the time. The conversations and laughs I’ve had with you guys have really made the high points of most workdays. Kerstin, thank you for all the great talks, for your constant cheerful and help-ful disposition and for keeping the lab in perfect shape (all three of them). You are truly a wonderful woman! My kids and I also thank you for assis-tance in the lab during my pregnancies. Rose-Marie, thank you for all the help I’ve received from you over the years and all the fun chats we’ve had. Michael, you are the best roommate anyone could ask for, thank you for sharing space, time and laughs with me. Jun, thank you for helping me with scripts and doing it (seemingly) happily! Fellow ladies of the Picea-group; Anna K and Sofia, thank you for talks about all things not Picea-related. To all the people at the department, past and present, thank you for making it a great place to work at. I would especially like to thank Susanne, Anna P, Per, Johan F (the best boss I’ve ever had), Johan W, and Marita (who talked even more than me) for fun times in the past. Anna-Carin, you one-of-a-kind lady, I have missed you sorely since you left. Thank you for giving out all the best advices and for always ruling the dance floor!

Page 37: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

37

I would also like to thank fellow authors and contributors to the work pre-sented in this thesis; Myriam (a force of nature), Emanuele, Laura, Yoshi, Stéphane DeMita, Michele Morgante, Irena Jurman and Xiao-Fei. Friends who have brightened my days; Karolina, sharing our burdens made them so much lighter; Andreas and Kristiina, time has a wonderful way of just flying by whenever I spend it with either of you (I don’t know if you know it but you made all the difference that evening); Björn, don’t ever change; Jonas, så; Andreas; thank you for helping me focus on the really important stuff (the party) in the last weeks of writing and for keeping my spirits up; Erika, soul mate; Sofia and Stina; can’t wait to get back to our dinner dates. Jenny and Turid, words are not enough. I love you both dearly. How did we become such grown-ups? To all the people near and dear to me: Thank you, you have all in one way or another encouraged me during my time as a PhD-student. This thesis would be a long way from finished if it had not been for the sup-port of my family. Mom and Dad, your unwavering love and support is overwhelming. I promise I will (probably) never ask for anything ever again after this summer. Gunnel, you came and made everything better in our time of need. Sist, men inte minst; mina grabbar. Holger och Folke, tack för att ni gett mig en enorm livsglädje, nyttigt tålamod, en gnutta självinsikt och massor av kärlek! Ni är min största lycka och ni gör mig stolt varje dag. Mamma älskar er. Stefan. Tack för att du står kvar. Du har mitt hjärta.

Page 38: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

38

References

Acheré VJM, Favre G, Besnard G, Jeandroz S (2005) Genomic organization of molecular differentiation in Norway spruce (Picea abies). Mol Ecol 14:3191-3201.

Ahuja MR and Neale DB (2005) Evolution of genome size in conifers. Silvae genet-

ica 54:126-137 Akey JM, Zhang G, Zhang K, Jin L, Shriver MD (2002) Interrogating a high-density

SNP map for signatures of selection. Genome Res 12(12):1805-14. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in

population genetics. Genetics 162(4):2025-35. Bouillé M, Senneville S, Bousquet J (2010) Discordant mtDNA and cpDNA phy-

logenies indicate geographic speciation and reticulation as driving factors for the diversification of the genus Picea. Tree Genet & Genomes 7(3):469-484.

Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB (2004) Nucleotide diversity

and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci USA 101(42):15255-60.

Böhlenius H, Huang T, Charbonnel-Campaa L, Brunner AM, Jansson S, Strauss SH,

Nilsson O (2006) CO/FT regulatory module controls timing of flowering and seasonal growth cessation in trees. Science 312(5776):1040-1043.

Chen J, Källman T, Ma X, Gyllenstrand N, Zaina G, Morgante M, Bousquet J, Eck-

ert A, Wegrzyn J, Neale D, Lagercrantz U, Lascoux M (2012) Disentangling the roles of history and local selection in shaping clinal variation in allele frequency and gene expression in Norway spruce (Picea abies). Genetics 191(3):865-881.

Chen J (2012) Conifer evolution, from demography and local adaptation to evolu-

tionary rates. Examples from the Picea genome. Dissertation, Uppsala universi-ty.

Clark P and Mix A (2002) Ice sheets and sea level of the last glacial maximum.

Quaternary Sci Rev 21:1-7. Craig N (1995) Unity in transposition reactions. Science 270:253-254 Davis MB and Shaw RG (2001) Range shifts and adaptive responses to Quaternary

climate change. Science 292(5517):673-679.

Page 39: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

39

De Mita S, Ronfort J, McKhann HI, Poncet C, El Malki R, Bataillon T (2007) In-vestigation of the demographic and selective forces shaping the nucleotide di-versity of genes involved in nod factor signaling in Medicago truncatula. Ge-netics 177: 2123–2133.

De Mita S and Siol M (2012) EggLib: processing, analysis and simulation tools for

population genetics and genomics. BMC Genet 13:27-38. DePaoli E (2006) Diversitá genetica, linkage disequilibrium e componente ripetitiva

del genoma in Abete Rosso (Picea abies (L.) Karst.). Dissertation, Universitá Degli Studi di Udine.

Depaulis F, Veuille M (1998) Neutrality tests based on the distribution of haplo-

types under an infinite-site model. Mol Biol Evol 15(12):1788-90. Dormling I (1973) Photoperiodic control of growth and growth cessation in Norway

spruce seedlings. International Symposium on Dormancy in Trees, IUFRO divi-sion 2, working party 2.01. 4, Kornik, Poland.

Dormling I (1979) Influence of light intensity and temperature on photoperiodic

response of Norway spruce provenances. In IUFRO Jt Meet work Parties Nor-way spruce provenances Norway spruce breed.

Emberton J, Ma J, Yuan Y, SanMiguel P, Bennetzen JL (2005) Gene enrichment in

maize with hypomethylated partial restriction (HMPR) libraries. Genome Res 15:1441-1446.

Eriksson G, Ekberg I, Dormling I, Matern B, von Wettstein D (1978) Inheritence of

bud-set and bud-flushing in Picea abies (L.) Karst. Theor Appl Genet 52:3-19. Eriksson G (2010) Picea abies Recent genetic research. Uppsala: Department of

Plant Biology and Forest Genetics, SLU. Farjón A (1990) Pinaceae: Drawings and descriptions of the genera Abies, Cedrus,

Pseudolarix, Keteleeria, Nothotsuga, Tsuga, Cathaya, Pseudotsuga, Larix and Picea. Koeltz ScientiWc Books.

Fisher RA (1930) The Genetical Theory of Natural Selection. Oxford University

Press. Fay JC and Wu CI (2000) Hitchhiking under positive Darwinian selection. Genetics

155(3):1405-13. Fu YX and Li WH (1993) Statistical tests of neutrality of mutations. Genetics

133(3):693-709. Giesecke T and Bennett KD (2004) The Holocene spread of Picea abies (L.) Karst.

in Fennoscandia and adjacent areas. J Biogeogr 31(9):1523-1548. Griffiths RC and Marjoram P (1996) Ancestral inference from samples of DNA

sequences with recombination. J of Comp Biol 3: 479-502.

Page 40: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

40

Gyllenstrand N, Clapham D, Källman T, Lagercrantz U (2007) A Picea abies FT homolog is implicated in control of growth rhythm in conifers. Plant Physiol 144:248-257.

Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P (2005) Multilocus patt-

terns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res 15(6):790-99.

Hall D, Ma X-F, Ingvarsson PK (2011) Adaptive evolution of the Populus tremula

photoperiod pathway. Mol Ecol 20: 1463–74. Hannerz M (1999) Evaluation of temperature models for predicting bud burst in

Norway spruce. Can J Forest Res 29:9-19. Hannerz M, Sonesson J, Ekberg I (1999) Genetic correlations between growth and

growth rhythm observed in a short-term test and performance in long-term field trials of Norway spruce. Can J For Res 29: 768-778.

Hedrick PW and Kumar S (2001) Mutation and linkage disequilibrium in human

mtDNA. European Journal of Human Genetics 9: 969-972. Hill WG and Robertson A (1968) Linkage disequilibrium in finite populations.

Theor Appl Genet 38: 226-231. Holliday JA, Ralph SG, White R, Bohlmann J, Aitken SN (2008) Global monitoring

of autumn gene expression within and among phenotypically divergent popula-tions of Sitka spruce (Picea sitchensis). New Phytol 178:103-22.

Holliday JA, Yuen M, Ritland K, Aitken SN (2010) Postglacial history of a wide-

spread conifer produces inverse clines in selective neutrality tests. Mol Ecol 19:3857-3864.

Hudson RR and Kaplan NL (1985) Statistical properties of the number of recombi-

nation events in the history of a sample of DNA sequences. Genetics 111: 147 - 164.

Hudson RR, Kreitman M, Aguadé M (1987) A test of neutral molecular evolution

based on nucleotide data. Genetics 116:153-159. Hudson RR and Kaplan NL (1988) The coalescent process in models with selection

and recombination. Genetics 120: 831-840. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of

genetic variation. Bioinformatics 18(2):337-338. Huff CD, Harpending HC, Rogers AR (2010) Detecting positive selection from

genome scans of linkage disequilibrium. BMC Genomics 11:8. Ingvarsson PK (2008) Multilocus patterns of nucleotide polymorphism and the

demographic history of Populus tremula. Genetics 180(1):329-340.

Page 41: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

41

Karlgren A, Gyllenstrand N, Källman T, Sundström JF, Moore D, Lascoux M, Lagercrantz U (2011) Evolution of the PEBP gene family in plants: functional diversification in seed plant evolution. Plant Physiol 156(4):1967-77.

Kapitonov VV and Jurka J (2001) Rolling-circle transposons in eukaryotes. Proc

Natl Acad Sci USA 98(15):8714-8719. Keller SR, Levsen N, Olson MS, Tiffin P (2012) Local Adaptation in the Flowering-

Time Gene Network of Balsam Poplar, Populus balsamifera L. Mol Biol Evol (in press)

Kelly JK (1997) A test of neutrality based on interlocus associations. Genetics

146(3):1197-206. Kimura M (1968) Evolutionary rate at the molecular level. Nature 217:624-626. Kingman JFC (1982) The coalescent. Stochastic Processes and their Applications

13: 235-248. Knaap WOVD (2006) Late Quaternary expansion of Norway spruce Picea abies (L.)

Karst. in Europe according to pollen data. Quaternary Sci Rev 25:2780-2805. Kujala S and Savolainen O (2012) Sequence variation patterns along a latitudinal

cline in Scots pine (Pinus sylvestris): signs of clinal adaptation? Tree Genet & Genomes DOI:10.1007/s11295-012-0532-5.

Kumar A, Bennetzen JL (1999) Plant retrotransposons. Annu Rev Genet 33:479-532 Källman T (2009) Adaptive evolution and demographic history of Norway spruce

(Picea abies). Dissertation, Uppsala university. Lagercrantz U and Ryman N (1990) Genetic structure of Norway spruce (Picea

sbies): concordance of morphological and allozymic variation. Evolution 44: 38-53.

Lepoittevin, C., Harvengt, L., Plomion, C., & Garnier-Gere, P. (2012). Association

mapping for growth, straightness and wood chemistry traits in the Pinus pinaster Aquitaine breeding population. Tree Genetics & Genomes 8: 113–126.

Lewontin RC and Kojima K (1960) The evolutionary dynamics of complex poly-

morphisms. Evolution 14: 458-472. Lewontin RC (1964) The interaction of selection and linkage. I. General

considerations; heterotic models. Genetics 49: 49-67. Liesch R (2005) Statistical Genetics for the Budset in Norway Spruce, Diploma

thesis, Swiss Federal Institute of Technology Zurich. Ma X-F, Hall D, St Onge KR, Jansson S, Ingvarsson PK (2010) Genetic Differentia-

tion, Clinal Variation and Phenotypic Associations With Growth Cessation Across the Populus tremula Photoperiodic Pathway. Genetics 186: 1033–1044.

Page 42: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

42

Marjoram P and Tavaré S (2006) Modern computational approaches for analyzing molecular genetic variation data. Nat Rev Genet 7(10): 759-70.

Marroni F, Pinosio S, Zaina G, Fogolari F, Felice N, Cattonaro F, Morgante M

(2011) Nucleotide diversity and linkage disequilibrium in Populus nigra cin-namyl alcohol dehydrogenase (CAD4) gene. Tree Genetics & Genomes 7: 1011–1023.

McDonald JH and Kreitman M (1991) Adaptive protein evolution at the Adh locus

in Drosophila. Nature 351(6328): 652-654. McVean GAT (2001). What do patterns of genetic variability reveal about mito-

chondrial recombination? Heredity 87: 613–620. McVean GAT, Awadalla P, Fearnhead P (2002) A coalescent-based method for

detecting and estimating recombination rates from gene sequences. Genetics 160:1231–1241.

Morgante M and De Paoli E (2011) Toward the conifer genome sequence. In: "Co-

nifers" (Eds. C. Plomion & J. Bousquet). Series on "Genomics of Industrial Crops" (Ed. C. Kole). Pp. 389-403, Science Publishers, Inc., New Hampshire; Edenbridge Ltd., British Isles.

Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA,

Kubisiak TL, Amerson HV, Carlson JE, Nelson CD, Davis JM (2009) Evolu-tion of genome size and complexity in Pinus. PLoS ONE 4(2): e4332

Myers SR and Griffiths RC (2003) Bounds on the minimum number of recombinat-

ion events in a sample history. Genetics 163: 375-394. Namroud M-C, Guillet-Claude C, Mackay J, Isabel N, Bousquet J (2010) Molecular

evolution of regulatory genes in spruces from different species and continents: heterogeneous patterns of linkage disequilibrium and selection but correlated recent demographic changes. J Mol Evol 70(4): 371-386.

Olsen KM, Womack A, Garrett AR, Suddith JI, Purugganan MD (2002) Contrasting

evolutionary forces in the Arabidopsis thaliana floral developmental pathway. Genetics 160: 1641–1650.

Parducci L, Jørgensen T, Tollefsrud MM, Elverland EE, Alm TR et al. (2012)

Glacial survival of boreal trees in northern Scandinavia. Science 335(6072): 1083-1086.

Pavy N, Namroud MC, Gagnon F, Isabel N, Bousquet J (2012) The heterogeneous

levels of linkage disequilibrium in white spruce genes and comparative analysis with other conifers. Heredity 108(3): 273-284.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of Population Structure

Using Multilocus Genotype Data. Genetics 155: 945-959.

Page 43: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

43

Pyhäjärvi T, García-Gil MR, Knürr T, Mikkonen M, Wachowiak W et al. (2007) Demographic history has influenced nucleotide diversity in European Pinus syl-vestris populations. Genetics 177(3): 1713-1724.

Pyhäjärvi T, Kujala ST, Savolainen O (2011) Revisiting protein heterozygosity in

plants-nucleotide diversity in allozyme coding genes of conifer Pinus sylvestris. Tree Genetics & Genomes 2: 385-397.

Ran JH, Wei XX, Wang XQ (2006) Molecular phylogeny and biogeography of

Picea (Pinaceae): implications for phylogeographical studies using cytoplasmic haplotypes. Mol Phylogenet Evol 41(2): 405-419.

Sabeti PC, Reich D, Higgins JM, Levine HZP, Richter DJ, et al. (2002) Detecting

recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.

Sarri V, Ceccarelli M, Cionini PG (2011) Quantitative evolution of transposable and

satellite DNA sequences in Picea species. Genome 54: 431-435. Sax K and Sax HJ (1933) Chromosome number and morphology in the conifers.

Jour Arnold Arbor 14: 356-375. Seppä H, Birks HJB, Bjune AE, Nesje A (2010) Current continental palaeoclimatic

research in the Nordic region (100 years since Gunnar Andersson 1909) –Introduction. Boreas 39(4): 649-654.

Siljak-Yakovlev S, Cerbah M, Coulaud J, Stoian V, Brown SC, Zoldos V, Jelenic S,

Papes D (2002) Nuclear DNA content, base composition, heterochromatin and rDNA in Picea omorika and Picea abies. Theor Appl Genet 104(2-3): 505-512

Stumpf MPH and McVean GAT (2003) Estimating recombination rates from popu-

lation-genetic data. Nat Rev Genet 4: 959-968. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by

DNA polymorphism. Genetics 123(3): 585-95. Thornton KR and Andolfatto P (2006) Approximate Bayesian inference reveals

evidence for a recent, severe bottleneck in a Netherlands population of Dro-sophila melanogaster. Genetics 172(3): 1607-19.

Tjoelker MG, Boratynski A, Bugala W (Eds.) (2007) Biology and ecology of Nor-

way spruce. Series: Forestry Sciences Vol. 78, Springer. Tollefsrud MM, Kissling R, Gugerli F, Johnsen OY, Skroppa T, Cheddadi R, der

Knaap, van WO, et al. (2008). Genetic consequences of glacial survival and postglacial colonization in Norway spruce: combined analysis of mitochondrial DNA and fossil pollen. Mol Ecol, 17(18): 4134–4150.

Tollefsrud MM, Sonstebo JH, Brochmann C, Johnsen O, Skroppa T, Vendramin G

G (2009). Combined analysis of nuclear and mitochondrial markers provide new insight into the genetic structure of North European Picea abies. Heredity, 102(6): 549–562.

Page 44: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

44

Väliranta M, Kaakinen A, Kuhry P, Kultti S, Salonen JS et al. (2011) Scattered late-

glacial and early Holocene tree populations as dispersal nuclei for forest devel-opment in north-eastern European Russia. J Biogeogr 38(5): 922-932.

Wareing PF (1959) Problems of juvenility and flowering in trees. Bot J Linn Soc,

London 56(366): 282-289. Watterson GA (1975) On the number of segregating sites in genetical models with-

out recombination. Theor Popul Biol 7(2): 256-276. Weir BS (1979) Inferences about Linkage Disequilibrium. Biometrics 35: 235-254. Wiuf C, Christensen T, Hein, J (2001) A simulation study of the reliability of re-

combination detection methods. Mol Biol Evol 18:1929–1939. Wright S (1931) Evolution in Mendelian Populations. Genetics 16(2): 97-159. Wright JW (1955) Species crossability in spruce in relation to distribution and tax-

onomy. Forest Science 1: 319-340. Wright SI, Charlesworth B (2004). The HKA test revisited: a maximum-likelihood-

ratio test of the standard neutral model. Genetics 168: 1071–6

Page 45: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization
Page 46: Population Genetics and Genome Organization of Norway Spruceuu.diva-portal.org/smash/get/diva2:549747/FULLTEXT01.pdf · Larsson, H. 2012. Population Genetics and Genome Organization

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 969

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science andTechnology, Uppsala University, is usually a summary of anumber of papers. A few copies of the complete dissertationare kept at major Swedish research libraries, while thesummary alone is distributed internationally throughthe series Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology.

Distribution: publications.uu.seurn:nbn:se:uu:diva-180370

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2012