The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of...

52
The genomic signatures of adaptive evolution in Populus Jing Wang Department of Ecology and Environmental Science Umeå 2016

Transcript of The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of...

Page 1: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

The genomic signatures of adaptive evolution in Populus

Jing Wang

Department of Ecology and Environmental Science

Umeå 2016

Page 2: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

This work is protected by the Swedish Copyright Legislation (Act 1960:729) Copyright ©Jing Wang 2016 ISBN: 978-91-7601-463-9 Cover picture: From Nathaniel R. Street and designed by the author Eletronic version available at http://umu.diva-portal.org/ Printed by: Print & Media Umeå, Sweden 2016

Page 3: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

To my lovely family

Page 4: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

i

Table of Contents

Table of Contents iList of papers iiiAuthor contributions vAbstract 1Sammanfattning 2Introduction 3Signatures of adaptive evolution at the macro-evolutionary level 4Signatures of adaptive evolution at the micro-evolutionary level 7The impact of recombination and other population genetic factors on adaptive

evolution 11Study system 14Research aims 15Methods 17Results and Discussion 21Paper II: Comparative population genomics among three related Populus

species 21Paper III: Genomic footprints of speciation between two aspen species 24Paper IV: The genomic signatures of local adaptation in P. tremula 26Conclusions and perspectives 30Acknowledgments 32References 33My gratitude 43

Page 5: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

ii

Page 6: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

iii

List of papers

This thesis is a summary of the following papers that are referred to in the text by their Roman numbers:

I. Wang, J., Scofield, D.G., Street, N.R., and Ingvarsson, P.K. 2015. Variant calling using NGS data in European aspen (Populus tremula). In: Sablok G, Kumar S, Ueno S, Kuo J, Varotto C, editors. Advances in the understanding of biological sciences using next generation sequencing (NGS) approaches. New York: Springer International Publishing. pp. 43-61.

II. Wang, J., Street, N.R., Scofield, D.G., and Ingvarsson, P.K. 2016. Natural selection and recombination rate variation shape nucleotide polymorphism across the genomes of three related Populus species. Genetics 202: 1185-1200

III. Wang, J., Street, N.R., Scofield, D.G., and Ingvarsson, P.K. 2016. Variation in linked selection and recombination drive genomic divergence during allopatric speciation between European and American aspens. Molecular Biology and Evolution doi: 10.1093/molbev/msw051.

IV. Wang, J., Tan, B., Street, N.R., Scofield, D.G., and Ingvarsson, P.K. The signatures of local adaptation at the genomic level in European aspen (Populus tremula). Manuscript.

Paper I is reprinted with the kind permission from Springer International Publishing.

Page 7: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

iv

I also contributed to the following articles during my graduate studies:

1. Peng, Y.L., Tian, B., Tian, X., Wang, J., Hensen, I., and Liu, J.Q. 2015. Range expansion during the Pleistocene drove morphological radiation of the fir genus (Abies, Pinaceae) in the Qinghai-Tibet Pleateau and Himalayas. Botanical Journal of the Linnean Society. 179: 444-453.

2. Wang, J., Abbott, R.J., Ingvarsson, P.K., and Liu, J.Q. 2014. Increased genetic divergence between two closely related fir species in areas of range overlap. Ecology and Evolution. 4: 1019-1029.

3. Lockwood, J.D., Aleksic, J.M., Zou, J.B., Wang, J., Liu, J.Q., and Renner, S.S. 2013. A new phylogeny for the genus Picea from plastid, mitochondrial, and nuclear sequences. Molecular Phylogenetics and Evolution. 69: 717-727.

4. Peng, Y.L., Yin, S., Wang, J., Tian, B., Ren, G., Guo, Q., and Liu, J.Q. 2012. Phylogeographic analysis of the fir species in southern China suggests complex origin and genetic admixture. Annals of Forest Science. 69: 409-416.

5. Lin, Y.C., Delhomme, N., Wang, J., Schiffthaler, B., Grabherr, M.G., Zamani, N., Höppner, M., Mannapperuma, C., Mähler, N., Sundell, D., Van de Peer, Y., Hvidsten, T., Jansson, S., Ingvarsson, P.K., Street, N.R. Genome sequencing and de novo assembly of aspen species. Manuscript

6. Mähler, N., Terebieniec, B.K., Wang, J., Ingvarsson, P.K., Street, N.R., Hvidsten, T.R. The genetic architecture of gene expression natural variation in a forest tree suggests buffering of central genes. Manuscript

Page 8: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

v

Author contributions

Authors are referred to by their initials.

Paper I: PKI designed experiment and obtained the sequencing data. JW performed all data analyses with support from NRS, DGS, and PKI. JW wrote the manuscript. All authors contributed with comments on the final version of the paper.

Paper II: PKI designed experiment and obtained the sequencing data. JW performed all data analysis with support from NRS, DGS, and PKI. JW wrote the manuscript. All authors contributed with comments on the final version of the paper.

Paper III: PKI designed experiment and obtained the sequencing data. JW performed all data analysis with support from NRS, DGS, and PKI. JW wrote the manuscript. All authors contributed with comments on the final version of the paper.

Paper IV: PKI designed experiment and obtained the sequencing data. JW performed most of the data analyses. BT performed imputation analyses. JW wrote the manuscript. All authors contributed with comments on the manuscript.

Page 9: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

1

Abstract

Understanding the genetic basis of adaptive evolution, and how natural selection has shaped patterns of polymorphism and divergence within and between species are enduring goals of evolutionary genetics. In this thesis, I used whole genome re-sequencing data to characterize the genomic signatures of natural selection along different evolutionary timescales in three Populus species: Populus tremula, P. tremuloides and P. trichocarpa. First, our study shows multiple lines of evidence suggesting that natural selection, due to both positive and purifying selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. Second, we characterize the evolution of genomic divergence patterns between two recently diverged aspen species: P. tremula and P. tremuloides. Our findings indicate that the two species diverged ~2.2-3.1 million years ago, coinciding with the severing of the Bering land bridge and the onset of dramatic climatic oscillations during the Pleistocene. We further explore different mechanisms that may explain the heterogeneity of genomic divergence, and find that variation in linked selection and recombination likely plays a key role in generating the heterogeneous genomic landscape of differentiation between the two aspen species. Third, we link whole-genome polymorphic data with local environmental variables and phenotypic variation in an adaptive trait to investigate the genomic basis of local adaptation in P. tremula along a latitudinal gradient across Sweden. We find that a majority of single nucleotide polymorphisms (SNPs) (>90%) identified as being involved in local adaptation are tightly clustered in a single genomic region on chromosome 10. The signatures of selection at this region are more consistent with soft rather than hard selective sweeps, where multiple adaptive haplotypes derived from standing genetic variation sweep through the populations simultaneously, and where different haplotypes rise to high frequency in different latitudinal regions. In summary, this thesis uses phylogenetic comparative approaches to elucidate how various evolutionary forces have shaped genome-wide patterns of sequence evolution in Populus.

Page 10: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

2

Sammanfattning

Ett av målen inom evolutionär genetik är att förstå den genetiska basen för olika evolutionära anpassningar samt hur naturliga selektion formar genetisk diversitet och divergens inom och mellan arter. I denna avhandling använder jag data från hel-genomsresekvensering för att karakterisera genetisk diversitet och naturlig selektion i tre olika arter av Populus: P. tremula, P. tremuloides och P. trichocarpa. Mina studier visar att både positiv och negativ naturlig selektion har i stor utsträckning format den genetiska diversiteteten inte bara på positio som är under direkt selektion utan även på kopplade, neutrala positioner i alla tre arter som studerats. De skillnader jag ser i effekterna av kopplad naturlig selektion mellan de tre arterna kan till stor del förklaras av skillnader i effektiv populationsstorlek och rekombination mellan arterna. Jag har även använt samma resekvenseringsdata för att karakterisera artbildningsprocessen mellan de två mest närbesläktade arterna, P. tremula och P. tremuloides. Mina resultat visar att de två arterna divergerade för ca. 2.2-3.1 miljoner år sedan, en period som väl sammanfaller med öppnandet av Berings sund och den period av dramatiska klimatsvängningar som karakteriserade denna tid av Pleistocen. Jag har studerat vilka olika mekanismer som kan förklara den heterogenitet i genetisk divergens vi ser över genomet när vi jämför de två arterna, och visar att även här spelar länkad naturlig selektion och rekombination en viktig roll. Slutligen har jag använt helgenomdata för att identifiera genetiska varianter och kopplat denna genetiska variation till variation i lokala miljövariabler och variation in adaptiv egenskap, tillväxtavslut, för att på så sätt undersöka den genetiska basen för lokal anpassning i P. tremula längs en latitudinell gradient genom Sverige. Resultaten visar att en majoritet (> 90%) av de genetiska varianter som jag identifier som involverade i lokal anpassning är tätt samlade i en enda genomisk region på kromosom 10. De tecken på naturlig selektion som jag kan identifierar i denna kromosomregion ligger väl i linje med sk. ”mjuka” ”selective sweeps”, snarare än ”hårda” sweeps som uppkommer genom nya adaptiva mutationer. Sammanfattningsvis så har ja i denna avhandling använt olika jämförande metoder för att studera hur evolutionära krafter har format genetisk variation Populus.

Page 11: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

3

Introduction

Understanding the genetic basis of how organisms adapt to diverse environments is a major goal in ecology and evolutionary biology (Lewontin 1974; Anderson, et al. 2011). With the advent of next generation sequencing (NGS) technologies, it has become feasible to perform genomic studies on ecologically and economically important organisms, rather than only on humans and laboratory-based model species (Stapley, et al. 2010; Neale and Kremer 2011).

The massive amounts of genome-wide data are transforming the field of population genetics to population genomics, and simultaneously, are revolutionizing our understanding of how natural populations and species evolve (Luikart, et al. 2003; Charlesworth 2010; Weigel and Nordborg 2015). Before the genomics era, population genetic studies mainly relied on the analysis of individual candidate genes to search for the evidence of adaptive evolution (Wright and Gaut 2005). However, this approach was limited due to a non-random sampling of selected genes and poor consideration of the effects of demographic history. With the advent of fully sequenced genomes of large population samples, the field of population genomics can provide effective solutions to these key limitations of candidate gene studies (Akey 2009; Ellegren 2014). In the context of genome-wide data analyses, it is much more feasible to elucidate the relative contribution of mutation, genetic drift, recombination and natural selection to extant amount and patterns of genetic variation (Sella, et al. 2009; Olson-Manning, et al. 2012; Cutter and Payseur 2013). Furthermore, genome scans can facilitate the identification of selection signatures by minimizing the confounding effects of population demographic history (Mitchell-Olds and Schmitt 2006).

To understand the genetic basis of adaptive evolution, a major challenge is to determine the amount and type of selection during evolutionary history, and to identify loci that have been subject to selection (Nielsen 2005; Mitchell-Olds and

Page 12: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

4

Schmitt 2006). Patterns of polymorphism within species and divergence between species carry the footprints of various evolutionary events, which in turn can be used to study how selection and other evolutionary forces have shaped genomes of organisms (Mitchell-Olds, et al. 2007).

In general, natural selection acts in different modes and over various timescales (Sabeti, et al. 2006). Positive selection increases the frequency of favored genetic variants, purifying selection eliminates detrimental genetic variants, and balancing selection maintains genetic polymorphism (Storz 2005; Sella, et al. 2009). Distinctive signatures of selection depend on the time scales over which it occurred. Between-species comparisons are usually used to identify older selective events, while segregating polymorphisms within species are informative of recent and/or ongoing selective events (Oleksyk, et al. 2010). Below, I provide an overview of the signatures of adaptive evolution at both macro- and micro-evolutionary levels as well as its impact on genomes.

Signatures of adaptive evolution at the macro-evolutionary level

Comparing homologous traits or DNA sequences among related species is a powerful way to detect adaptive evolution at the macro-evolutionary level. The neutral theory holds that the vast majority of DNA substitutions between species are neutral, with a negligible contribution of positive selection and that evolution is mainly governed by random genetic drift (Kimura 1968, 1984). Ohta noted that a substantial fraction of mutant substitutions should be nearly, rather than strictly, neutral because they might also be caused by random fixation of very slightly deleterious mutations (Ohta 1973, 1992). Despite contentious argument over these theories, they have formed the basis of inference in population genetics over the past four decades. However, some recent studies in Drosophila (Sella, et al. 2009), mammals (Halligan, et al. 2010) and a few plant species (Slotte, et al. 2010; Strasburg, et al. 2011) suggested that

Page 13: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

5

a significant proportion of genome-wide substitutions had been the product of adaptive evolution, which challenges the basic expectations of the (nearly) neutral theory. With the development of genome-wide polymorphism datasets, some species-based comparative approaches have been taken to determine the extent to which genomes are shaped by neutral evolution, selective constraint and adaptive evolution (Cutter and Payseur 2013; Hough, et al. 2013; Vitti, et al. 2013).

One of the best-known methods using between-species comparisons is the ratio of non-synonymous and synonymous nucleotide substitutions in protein coding regions (dN/dS) (Messier and Stewart 1997; Yang 1998; Yang and Bielawski 2000). Non-synonymous mutations change the amino acid whereas synonymous mutations alter the codon but not the amino acid. If both synonymous and non-synonymous substitutions are assumed to be neutral, the dN/dS ratio is expected to equal one. If an amino acid change is beneficial, it will be fixed at a higher rate than a synonymous mutation, resulting in dN/dS>1. Conversely, when the amino acid change is deleterious, it is expected to be constantly purged from the population by purifying selection, causing dN/dS<1. It is apparent that in most cases dN/dS<1, implying that the vast majority of amino acid sites are constrained by purifying selection (Charlesworth, et al. 1993). Only if there is significant adaptive evolution, is dN expected to be greater than dS (Yang and Bielawski 2000; Wright, et al. 2002; Bustamante, et al. 2005).

In addition to the dN/dS test, the McDonald-Kreitman (MK) test and its extensions (McDonald and Kreitman 1991) have been extensively used to detect evidence for positive selection. The MK test compares the amount of variation within a species to the divergence between species at two types of sites, typically, synonymous and non-synonymous sites within protein-coding regions (Figure 1). This test, in principle, is applicable for any two sets of sites for which one is likely to be evolving neutrally (Andolfatto 2008). If mutations at synonymous and non-synonymous sites are both neutral, we would expect they contribute in similar proportion to polymorphism and

Page 14: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

6

divergence. Therefore, the proportion of non-synonymous and synonymous changes of polymorphism within the species (pN/pS) is expected to be the same as the divergence between species (dN/dS) (Figure 1). By contrast, purifying selection removes deleterious non-synonymous variants faster, and fewer non-synonymous fixed differences between the two species are expected. Purifying selection therefore likely causes an elevated ratio of non-synonymous to synonymous polymorphism relative to divergence (pN/pS > dN/dS). However, if some of the non-synonymous mutations are under positive selection, these adaptive mutations are expected to contribute more to substitutions than to polymorphisms over a prolonged period, leading to a general increase in non-synonymous divergence (dN/dS > pN/pS) (Figure 1).

Positive selection Neutrality Outgroup

species

Non-synonymous (N)

Synonymous (S)

N/S

Polymorphism 3 6 0.5

Divergence 2 4 0.5

Non-synonymous (N)

Synonymous (S)

N/S

Polymorphism 2 4 0.5

Divergence 5 4 1.25 Figure 1. Evidence for positive selection based on the McDonald-Kreitman (MK) test. Light grey lines represent a protein-coding region of studied species and dark grey lines represent the same region in outgroup species. Circles represent polymorphisms and stars represent substitutions. Red and blue colours represent non-synonymous and synonymous sites, respectively. The tables give summaries of numbers of sites in each category.

When combining the patterns of polymorphism and divergence across multiple genes, it is possible to infer the fraction of amino acid divergence that is adaptive (commonly

Page 15: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

7

referred to as α), α=1-(pN/pS)/(dN/dS) (Smith and Eyre-Walker 2002). A variety of species have been estimated α using this or other related methods, with evidence of widespread positive selection varying among taxa. Estimates in Drosophila suggest that about 40-50% of the amino acid substitutions are adaptive (Smith and Eyre-Walker 2002; Welch 2006; Begun, et al. 2007). Similarly, high proportions of adaptive fixations have also been found in some bacteria and vertebrate species, such as in Escherichia coli (56%) (Charlesworth and Eyre-Walker 2006), Pseudomonas syringae (56%) (Hughes, et al. 2008) and Mus musculus castaneus (57%) (Halligan, et al. 2010). However, little evidence of adaptive evolution is found in humans and in most plants (Sequencing and Consortium 2005; Boyko, et al. 2008; Gossmann, et al. 2010). Such contrasting patterns in the prevalence of adaptive evolution among taxa has led to the suggestion that differences in the effective population size and population structure are likely to be the determinants of rates of adaptive evolution among species (Wright and Andolfatto 2008).

Overall, tests based on inter-specific genomic comparisons are indeed very useful to detect old, recurrent selective events over quite long evolutionary timescales. However, they have limited power to detect recent and/or ongoing selection within species as well as to precisely pinpoint loci that are subject to selection. In this case, intra-specific population genomic data are needed to detect recent selection and to identify the actual alleles and/or mutations underlying selection.

Signatures of adaptive evolution at the micro-evolutionary level

Evidence of more recent selection can be garnered from analysis of intraspecific polymorphism. As a new advantageous mutation rises rapidly to fixation, neutral variants closely linked to it will also increase in frequency as a result of the hitchhiking effect (Smith and Haigh 1974). Therefore, such “selective sweep” events leave a highly characteristic molecular signature

Page 16: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

8

involving a local reduction of nucleotide diveristy around the selected locus (Figure 2). Apart from this, a selective sweep can also cause a skew in the distribution of allele frequencies towards rare variants (Tajima 1989). This is because subsequent to a selective sweep, new mutations would initially occur at low frequencies before mutation-genetic drift equilibrium is restored (Figure 2).

One of the most widely used methods is Tajima’s D test, which compares estimates of θ (=4Neµ) based on the observed number of segregating sites in a sample (θω) with the mean pairwise difference between all sequences in the sample (θπ) (Tajima 1989). Under the standard neutral model, the expected value of Tajima’s D is zero. Negative values of Tajima’s D indicate an excess of rare polymorphism, which could result from either positive or purifying selection. In comparison, a positive Tajima’s D is the result of an excess of intermediate frequency alleles and may therefore indicate balancing selection. Furthermore, Fay and Wu’s H test extends the Tajima’s D test and provides more power to detect the excess of high-frequency derived alleles, which is a hallmark signature of a selective sweep (Fay and Wu 2000). Compared to the Tajima’s D test that relies on the folded spectrum, Fay and Wu’s H test relies on the unfolded spectrum where ancestral and derived variants are distinguished using an outgroup sequence, which enables it to be more efficient to detect an ongoing selective sweep (Przeworski 2002).

Another popular approach for detecting ongoing or incomplete selective sweeps is the extended haplotype test and its derivatives (Sabeti, et al. 2006; Voight, et al. 2006). These tests rely on the principle that the sudden rise of a selected allele leaves insufficient time for recombination to break down the linkage disequilibrium (LD) between the associated neutral sites, further generating an unusually long identical haplotype (Figure 2). As a result, the presence of a high frequency haplotype with long-range LD, which would not be expected under neutral genetic drift, is a strong sign of recent positive selection (Voight, et al. 2006). Methods aimed at detecting

Page 17: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

9

Figure 2. Signatures of selective sweeps at the micro-evolutionary level. The lines indicate individual DNA sequences or haplotypes, and the polymorphic mutations are depicted as blue bars. (a) The pattern of polymorphism in the ancestral population before selection occurred. (b,c) Two new advantageous mutations (indicated by red and green stars) respectively appear on different haplotypes in the two sub-populations (A and B). (d,e) Beneficial mutations bring nearby hitchhiker variants to high frequency or even fixation, creating unusually long haplotypes of low variation in both sub-populations. (f) Recent selective sweeps usually leave several signatures on patterns of genomic variation around the selected locus, including a population-wide reduction in the genetic diversity (Θπ), a surplus of rare alleles (negative values of Tajima’s D statistics), a higher level of linkage disequilibrium (LD) or extended haplotype homozygosity (EHH), and increased allele frequency differences between populations (higher values of FST). Red and green lines represent the statistical values from sub-population A and B, respectively. Black dashed lines represent the average values of the corresponding statistical methods across the genome.

Page 18: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

10

recent and incomplete selective sweeps using LD information include the extended haplotype homozygosity (EHH) test (Sabeti, et al. 2002), integrated Haplotype Score (iHS) (Voight, et al. 2006) and Cross Population Extended Haplotype Homozogosity (XP-EHH) (Sabeti, et al. 2007). However, such LD-based approaches will rapidly lose power once the sweep is complete, because little variation is left to assess patterns of LD after selected alleles approach fixation.

When spatially separate populations are subject to distinct ecological and environmental conditions, heterogeneous selection at different sites may increase the allele frequency differences between populations (Figure 2). Therefore, increased levels of population differentiation in specific genomic regions can be used to identify putative targets of spatially varying selection (Wright 1949; Holsinger and Weir 2009). The most commonly used statistical test for population differentiation is FST, a fixation index that measures the partitioning of genetic variance between versus within populations (Wright 1949; Holsinger and Weir 2009). The underlying rationale is that the expected pattern of genetic differentiation between populations is the same at all neutral loci and is mainly determined by neutral processes such as genetic drift and demographic history (Luikart, et al. 2003; Beaumont and Balding 2004). However, loci under selection and tightly linked neutral loci are expected to exhibit extreme levels of differentiation compared to the expectation under neutrality (Luikart, et al. 2003; Beaumont and Balding 2004) (Figure 2).

Most of the above signatures of recent selection are likely to be explained by the classical ‘hard’ selective sweep model, where selection quickly drives a new advantageous mutation to high frequency and ultimately to fixation and, as a consequence, all ancestral variation at closely linked sites is purged by the hitchhiking effect (Smith and Haigh 1974). However, recent empirical and theoretical studies emphasize that this type of selective sweep may not be prevalent in nature (Pritchard and Di Rienzo 2010; Pritchard, et al. 2010; Hernandez, et al. 2011). Adaptations may more likely originate from standing genetic

Page 19: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

11

variation following rapid environmental shifts because there is little time for new beneficial mutations to be introduced (Hermisson and Pennings 2005; Pennings and Hermisson 2006), resulting in a so-called “soft selective sweep” (Messer and Petrov 2013; Garud, et al. 2015). Importantly, simulations have shown that selection on standing genetic variation typically produces a weaker, less pronounced signature of selection, and therefore, may be more likely to be missed by current statistical methods based on hard sweep models (Prezeworski, et al. 2005; Teshima, et al. 2006). Traditional selection scans also have relatively low power to detect polygenic adaptation, which takes place by modest allele frequency changes across many loci. Accordingly, more efforts are still needed in future studies to examine the relative prevalence of hard sweeps, soft sweeps and polygenic adaptation in natural populations.

The impact of recombination and other population genetic factors on adaptive evolution

Recombination can impact adaptive evolution in a number of ways. First, as shown above, processes of both positive selection favoring beneficial mutations (genetic hitchhiking) and/or purifying selection against deleterious mutations (background selection) can cause a reduction in levels of linked neutral polymorphism (Begun and Aquadro 1992; Begun, et al. 2007). The impact of linked selection, however, depends on the local environments of recombination. Selection purges neutral polymorphic sites from physically larger regions as a result of the stronger linkage in low recombination regions compared to high-recombination regions (Begun and Aquadro 1992; Kulathinal, et al. 2008; Charlesworth and Campos 2014) (Figure 3). Therefore, a positive correlation between neutral polymorphism and recombination rate provides an independent evidence of the prevalence of linked selection across the genome (Begun and Aquadro 1992). Apart from influencing levels of polymorphism within species, linked selection can also increase inter-specific genetic differentiation as a result of accelerated linkage sorting in regions of low

Page 20: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

12

recombination (Cruickshank and Hahn 2014). Therefore, interactions between selection and recombination can also drive the evolution of heterogeneous genomic differentiation between recently diverged species (Renaut, et al. 2013; Burri, et al. 2015).

Figure 3. The impact of natural selection (denoted by red arrow) on linked neutral diversity (grey line) depends on the local environments of recombination. Longer footprints of the reduction in the genetic diversity are observed in regions of low recombination rate (b) than in regions of high recombination rate (a). The blue dotted line represent the baseline genetic diversity. Modified from Mejia (2012).

Furthermore, because the efficacy of selection (Nes) is determined by the product of the local effective population size (Ne) and the selection coefficient (s) (Kimura 1984), the reduced Ne due to the genetic linkage effects in regions of low recombination is expected to reduce the efficacy of both positive and purifying selection. Accordingly, recombination rate can affect the efficacy of selection at a given site that is linked to

Page 21: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

13

other targets of selection through the process known as Hill-Robertson interference (HBI) (Hill and Robertson 1966). In low recombination regions, we would therefore expect fewer beneficial mutations and larger load of deleterious mutations compared to those in regions of high recombination (Hill and Robertson 1966; Haddrill, et al. 2007; Campos, et al. 2014).

In addition to recombination, other factors such as species’ effective population sizes and complex demographic histories can also shape the course of adaptive evolution (Olson-Manning, et al. 2012; Phifer-Rixey, et al. 2012). For example, populations with large Ne experience more mutations per generation and thus have potentially more adaptive alleles than small ones. In addition, larger populations usually experience weaker effects of random genetic drift. Favorable alleles, therefore, are less likely to be lost by drift but rather more likely to eventually become fixed in large populations than in small populations (Wright and Andolfatto 2008). Moreover, complex demographic history, such as population expansion and/or bottlenecks, can also accelerate or inhibit adaptive evolution (Gossmann, et al. 2010). Expanding populations may contain large numbers of new mutations, resulting in a greater prevalence of hard selective sweeps compared to populations at equilibrium. Population bottlenecks, however, can increase stochastic variances throughout the genome and may lead to fixation of slightly deleterious mutations (Olson-Manning, et al. 2012).

Overall, as genome-wide data among multiple related species are becoming available, phylogenetic comparative approaches within and between species will thus place us in a stronger position to understand how various evolutionary forces have interacted to shape the heterogeneous patterns of nucleotide polymorphism and divergence across the genome (Hufford, et al. 2012; Cutter and Payseur 2013).

Page 22: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

14

Study system

The genus Populus has emerged as one of the model systems for studying multiple aspects of forest tree biology (Jansson and Douglas 2007), mainly because of its wide geographic distribution, important ecological role in a wide variety of habitats, multiple economic uses in wood and energy products, and relatively small genome size (Eckenwalder 1996; Tuskan, et al. 2006). In this thesis, I studied three Populus species that differ in morphology, geographic distribution, population size, and phylogenetic relationship (Figure 4).

The phylogenetic relationship of the three species [(P. tremula-P. tremuloides) P. trichocarpa] is well established by both chloroplast and nuclear DNA sequences (Hamzeh and Dayanandan 2004; Wang, et al. 2014) (Figure 4a). P. tremula and P. tremuloides (collectively, “aspens”) have wide native ranges across Eurasia and North America, respectively (Figure 4b). They are closely related and belong to the same section of the genus (section Populus) (Jansson, et al. 2010). Based on their morphological similarity and close phylogenetic relationship, they are considered to be sister species, or less commonly, conspecific subspecies (Eckenwalder 1996). In addition, the two aspen species can readily cross and artificial hybrids usually show high heterosis (Tullus, et al. 2012). In contrast, P. trichocarpa belongs to a different section of the genus (section Tacamahaca) that is reproductively isolated from members of the Populus section (Jansson, et al. 2010). The distribution of P. trichocarpa is restricted to western regions of North America (Figure 4b), and its distribution range is considerably smaller than both aspen species (Dickmann and Kuzovkina 2014). Importantly, P. trichocarpa is the first tree species having the genome sequenced (Tuskan, et al. 2006). Its genome sequence and annotation have undergone continual improvement (http://phytozome.jgi.doe.gov), which provides an important context for our genome comparisons.

Page 23: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

15

Figure 4. Study system. (a) The phylogenetic relationship of the three Populus species studied in this thesis, associated with the species that are focused in specific paper of this thesis. (b) Geographic distributions of P. tremula (orange areas), P. tremuloides (blue areas) and P. trichocarpa (green areas) and their sampling localities (black star symbols).

Research aims

The general aim of my thesis is to understand how natural selection has influenced the Populus genome over different timescales. To reach this goal, I used whole genome re-sequencing data from three related Populus species to characterize and compare genomic variation and divergence within and between species. In summary, this thesis consists of four papers and the specific aims for each paper are as follows:

Paper I: To establish bioinformatics pipeline for analyzing next-generation sequencing (NGS) data.

Paper II: To compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rate among three Populus species. To detect and

Page 24: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

16

compare the genomic signatures of both purifying and positive selection among these species, and to determine how selection and linkage have interacted to shape genome evolution in Populus. Finally, to characterize the dominant features driving the differences of linked selection among species.

Paper III: To estimate and infer the divergence time and historical demographic histories of the two aspen species (P. tremula and P. tremuloides), and to further investigate the extent to which neutral and selective forces have contributed to the overall pattern of genomic differentiation between the two species. Finally, to identify the candidate regions and/or genes likely have evolved in response to natural selection during the speciation process.

Paper IV: To obtain an empirical picture of the effects of local adaptation at the genomic level in Swedish populations of P. tremula, and finally to advance our understanding of the genomic architecture of local adaptation.

Page 25: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

17

Methods

The advent of NGS technologies has revolutionized biological disciplines in a variety of ways, and it advances sequence-based genomic research remarkably (Metzker 2010). Almost all NGS-based studies inevitably rely on the accurate detection of variants. Among the different types of variants detected from sequencing data, single nucleotide polymorphisms (SNPs) are likely to be the most abundant type (Li, et al. 2009; Nielsen, et al. 2011). In this thesis, I performed a number of computational analysis steps to translate the raw sequencing data into final high-quality SNPs and genotype calls for downstream analyses (Figure 5), which I explored in detail in Paper I.

Figure 5. Analysis workflow for SNP calling and genotype inference in this thesis.

Leaf samples were collected from 116 individuals of P. tremula (paper I-IV) and 24 individuals of P. tremuloides (paper II, III). Genomic DNA was extracted from these leaf

Page 26: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

18

samples, and paired-end sequencing libraries with insert sizes of 650bp were constructed for all trees. Whole genome re-sequencing with an expected depth of 20 × was performed on the Illumina HiSequation 2000 platform at the Science for Life Laboratory, Stockholm. Thirteen individuals of P. tremula and two individuals of P. tremuloides failed in the sequencing procedures and were then removed from subsequent analyses. In addition, I downloaded short-read Illumina data of 24 P. trichocarpa individuals from the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA) (Table S1 in Paper II), which were selected to have similar read depth as the samples of the two aspen species.

Checking the quality of the raw sequence data is always the first step in NGS analysis pipeline (Altmann, et al. 2012). We used FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to check the per-base sequence quality and found that quality always dropped toward the end of reads. We thus used the software Trimmomatic (Lohse, et al. 2012) to cut off bases from the start and/or end of reads when the quality values were <20. If the length of the processed reads was reduced to <36 bases after trimming, reads were completely discarded.

After quality control, all paired-end and orphaned single-end reads were mapped to the reference genome using the BWA-MEM algorithm with default parameters implemented in bwa-0.7.10 (Li 2013). In papers II and III, we mapped all reads to the P. trichocarpa genome (version 3.0) (Tuskan, et al. 2006) for comparative studies between species. In paper IV, we mapped all reads to a de novo assembly of the P. tremula genome (available at http://popgenie.org) (Sundell, et al. 2015) for local adaptation study within single species.

Several post-processing steps of alignments were performed to minimize the number of artifacts caused by misalignment. Given that sequence reads are often mapped with mismatching bases in regions with insertions and deletions (indels), we performed local realignment to solve such alignment problems.

Page 27: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

19

We first used the RealignerTargetCreator in GATK (DePristo, et al. 2011) to inspect suspicious-looking intervals that are likely in need of realignment, and then used IndelRealigner in GATK to realign reads over those intervals to get a final consensus alignment. Another commonly encountered issue is PCR duplicates, which are usually introduced during the sequencing library preparation. We used the MarkDuplicates in Picard package (http://broadinstitute.github.io/picard/) to remove orphan reads or read-pairs with identical external coordinates and insert lengths, and to retain only the read(s) with the highest summed base quality for downstream analyses. In addition, to exclude artifacts caused by paralogous or repetitive DNA sequences where reads were poorly mapped to the reference genome, we removed sites with extremely low- (less than an average of 4× per sample) and extremely high- (approximately twice as the mean depth) read depths after investigating the empirical distribution of read coverage. Because mapping quality score of zero is assigned to reads that can be mapped to multiple genomic locations with the same probability (Li, et al. 2009), we removed sites where, on average there were more than one multiple mapped reads per sample. Furthermore, because a short-read alignment is generally unreliable in highly repetitive genomic regions, we filtered out sites that overlapped with known repeat elements as identified by RepeatMasker (Tarailo-Graovac and Chen 2009)

Finally, we employed two complementary approaches for SNP and genotype calling. For those analyses that require accurate SNP and genotype calls, we performed multi-sample SNP calling using HaplotypeCaller implemented in GATK (DePristo, et al. 2011), and single-sample SNP calling using UnifiedGenotyper implemented in GATK (see Papers II-IV for details). However, as some recent studies have pointed out that single- or multiple- sample genotype calling can result in biases in the estimation of site frequency spectrum (SFS) (Nevado, et al. 2014), we employed another method implemented in the software ANGSD (Korneliussen, et al. 2014) to directly estimate the SFS and all other population genetic statistics derived from the SFS without calling genotypes. Later, a number of filtering steps were performed to reduce false positives, e.g. for genotype

Page 28: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

20

calling analyses we removed SNPs with a distance of less than 5 bp away from any indels, or those with high rates of missing genotypes; for direct estimations, only reads with a minimal mapping quality of 30 and bases with a minimal quality score of 20 were used. More details with regard to the filtering can be found in papers I-IV.

Page 29: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

21

Results and Discussion

Paper II: Comparative population genomics among three related Populus species

In paper II, we compared the genome-wide patterns of nucleotide polymorphism, site frequency spectrum and population-scaled recombination rate among three Populus species: P. tremula, P. tremuloides and P. trichocarpa. We found that both aspen species harbored substantial levels of nucleotide diversity throughout the genome (ΘΠ=0.0133 in P. tremula; ΘΠ=0.0144 in P. tremuloides), both of which were more than two times higher than diversity in P. trichocarpa (ΘΠ=0.0059). In accordance with the more extensive geographic ranges and larger census population sizes in both aspen species compared to P. trichocarpa, the higher genetic diversity most likely reflects their larger Ne. In contrast to the patterns of nucleotide diversity, P. tremuloides showed substantially greater negative values of Tajima’s D (average Tajima’s D=-1.169) across the genome than P. tremula (average Tajima’s D=-0.272) and P. trichocarpa (average Tajima’s D=0.064), reflecting large excess of low-frequency polymorphisms segregating in this species. In addition, the three Populus species exhibit different patterns of population-scaled recombination rates (ρ=4Nec, where Ne is the effective population size and c is the recombination rate), where P. tremuloides has the highest rates of recombination (8.42 kbp-1), followed by P. tremula (3.23 kbp-1) and P. trichocarpa (2.19 kbp-1). However, after accounting for the influence of Ne by dividing the nucleotide diversity (θW = 4Neµ, where Ne is the effective population size and µ is the mutation rate), we found that the ratio of recombination to mutation (ρ/θW or c/µ) was smaller in P. tremula (0.22) compared to P. tremuloides (0.39) and P. trichocarpa (0.38). If we assume that the mutation rates are unchanged between species, we could infer that P. tremula has lower recombination rates relative to the other two species, although such inference still deserves further investigation.

Page 30: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

22

Using four-fold synonymous sites as neutral reference, we employed a maximum-likelihood approach implemented in the program DFE-alpha to quantify the efficacy of both purifying and positive selection at zero-fold non-synonymous sites (Keightley and Eyre-Walker 2007; Eyre-Walker and Keightley 2009). We found that a majority of new non-synonymous mutations were strongly deleterious (Nes > 10) in all three species, suggesting that amino acid changes were likely to be under strong purifying selection in Populus. There were lower proportions of amino acid mutations that behave as effectively neutral (Nes < 1) in P. tremuloides (~16%), than in P. tremula (~23%) and P. trichocarpa (~31%). This likely reflects that the efficacy of purifying selection on weakly deleterious mutations is highest in P. tremuloides, lowest in P. trichocarpa and intermediate in P. tremula. This pattern among species is also highly associated with Ne that we have inferred for the three species. Furthermore, we employed an extension of the McDonald-Kreitman test (Eyre-Walker and Keightley 2009) to estimate the proportion (α) and the relative rate (ω) of adaptive substitution at zero-fold non-synonymous sites. We found prevalent adaptive evolution in all three species. In accordance with the pattern of purifying selection, the efficacy of positive selection is also correlated with Ne, where both α and ω are highest in P. tremuloides (α: ~65%, ω: ~0.24), intermediate in P. tremula (α: ~43%, ω: ~0.16) and lowest in P. trichocarpa (α: ~20%, ω: ~0.07).

There are multiple lines of evidence suggesting that genome-wide patterns of variation have been substantially shaped by widespread natural selection rather than neutral evolutionary processes in all three Populus species. First, we found significant positive correlations between levels of neutral diversity and population-scaled recombination rates in all species, even after controlling for confounding variables such as gene density, GC content, mutation rate, and the number of sites covered by the data. This pattern is only expected to occur if natural selection (either purifying or positive selection) is pervasive across the genome, because linked selection is supposed to leave a stronger imprint on patterns of diversity in low-recombination regions than in high-recombination regions

Page 31: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

23

(Begun and Aquadro 1992). The differences in the strength of the association between recombination and levels of neutral polymorphism between species likely reflect differences in Ne (Corbett-Detig, et al. 2015), as we observed substantially stronger signatures of linked selection in the two aspen species compared to P. trichocarpa. In addition to Ne, the impact of linked selection also depends greatly on the local environments of recombination, and in line with this, we observe the strongest signatures of linked selection in P. tremula rather than in P. tremuloides.

Second, we found slightly negative correlations between the ratio of non-synonymous to synonymous polymorphisms (Θzerofold/Θfourfold) and recombination rates in two aspen species. The pattern is likely to be explained by the reduction of the efficacy of purifying selection at eliminating weakly deleterious mutations in regions of low recombination. Given that weakly deleterious mutations are more likely to accumulate at a higher rate in regions of low recombination (Hill and Robertson 1966; Charlesworth, et al. 1993), the reduced efficacy of purifying selection can explain the observed pattern of low gene density in these regions as well.

Third, because genes usually represent the most likely targets of natural selection, regions of higher gene density are expected to have undergone stronger effects of linked selection and exhibit lower levels of neutral polymorphism. We did not, however, observe such correlations in genic regions among species. The obscured signatures of linked selection associated with gene density are likely caused by the positive co-variation between recombination rate and gene density since they have opposite effects on patterns of variation at linked neutral sites (Flowers, et al. 2012; Cutter and Payseur 2013). In contrast, more complex patterns were found in intergenic regions, where levels of intergenic polymorphism were mainly driven by recombination rates in regions of lower gene density, but primarily shaped by gene density in regions with higher density of genes.

Page 32: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

24

Finally, a distinctive signature of recurrent selective sweeps is the reduction of neutral polymorphism in regions experiencing frequent adaptive substitutions. In accordance with this prediction, we found significant negative correlations between levels of synonymous polymorphism (Θfourfold) and the rate of non-synonymous substitution (dzerofold) at the scale of single genes in all three species. This provides strong evidence of recurrent selective sweeps occurring in Populus.

Taken together, compared to the predictions of the neutral theory suggesting that most adaptive mutations contribute negligibly to fixed differences between species, we find that ~20-65% of non-synonymous substitutions are driven to fixation by adaptive evolution in three Populus species. Multiple lines of evidence indicate that natural selection (both purifying and positive selection) has largely shaped genome-wide patterns of neutral polymorphism in Populus. Differences in Ne and rates of recombination between species likely explain different magnitudes of linked selection that we observe between them.

Paper III: Genomic footprints of speciation between two aspen species

In paper III, we used whole genome re-sequencing data from 24 P. tremula and 22 P. tremuloides to study how genomes diverged during the process of speciation, and to characterize the genomic landscape of differentiation between the two species. We found clear population genetic structure between the two species using both a model-based approach and principal component analysis (PCA). Nevertheless, only 1.1% of mutations are fixed differences between species, and 16.7% are shared between species, with the remaining polymorphic sites being private in either P. tremula (32.0%) or P. tremuloides (50.2%). These results suggest that the two aspen species may have diverged too recently to allow complete lineage sorting.

Page 33: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

25

We therefore further characterized the historical demographic processes associated with species divergence. We used fastsimcoal2 (Excoffier, et al. 2013), a coalescent simulation-based method, to infer the past demographic histories of P. tremula and P. tremuloides from two-dimensional joint SFS. We evaluated eighteen divergence models based on the global maximum likelihood estimated from 50 independent runs using the Akaike information criterion and Akaike’s weight of evidence (Excoffier, et al. 2013). The best-fitting model was a simple isolation-with-migration model where, after the two species diverged around 2.3 million years ago (Mya), P. tremula experienced a stepwise population size change whereas exponential growth occurred in P. tremuloides. Migration rates between species in both directions were fairly low, which is not unexpected given their large geographical separation and disjunct distributions (Figure 4b). Our results thus support an allopatric model of speciation for the two species. Furthermore, we employed the multiple sequential Markovian coalescent (MSMC) (Schiffels and Durbin 2014) method to investigate changes of Ne over time in more detail. From this analysis, we found that the two aspen species experienced similar magnitudes of population decline following initial divergence. Population expansion of P. tremuloides occurred around 50,000-70,000 years ago and continued up to the present. In contrast, P. tremula experienced a more extended population contraction and the initiation of the population expansion occurred after the Last Glacial Maximum (LGM), as for many other forest trees in Europe (Hewitt 2000).

In accordance with the expectation of allopatric speciation where limited gene flow between species allows for the accumulation of genetic differentiation by stochastic drift (Coyne and Orr 2004), we detected relatively high levels genomic differentiation (mean FST=0.386) between species. By incorporating the inferred demographic scenarios into “the” null model, we investigated the extent to which demographic and selective events have contributed to the overall patterns of genomic differentiation between P. tremula and P. tremuloides. We found that neutral demographic processes could largely explain the observed patterns of genomic divergence. That said,

Page 34: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

26

a number of regions exhibit significantly (false discovery rate (FDR)<0.01) high and/or low inter-specific genetic differentiation compared to expectations based on the demographic coalescent simulations. These “outlier” regions also show multiple evidence of the action of either positive or balancing selection in both species.

If natural selection has truly been one of the dominant evolutionary forces shaping patterns of genomic differentiation between species, regions of low recombination would be expected to show increased FST values rather than increased dxy values. This occurs because natural selection removes neutral variation over longer distances in low recombination regions. As a consequence, relative measures of divergence (e.g. FST) that rely on intraspecific diversity are expected to be higher in regions of low recombination, whereas increased absolute divergence (e.g. dxy) is only expected if gene flow between species is reduced (Noor and Bennett 2009). Compared to the prediction of heterogeneous gene flow, we observe a significant negative correlation between recombination rate and FST, but not with dxy, in both aspen species. Our results thus highlight the significant effects of recombination and linked selection in generating the inter-specific heterogeneous patterns of genomic differentiation (Nachman and Payseur 2012; Cruickshank and Hahn 2014). Nonetheless, it should be noted that the coalescent simulations performed here are obvious simplifications of the true demographic histories of real populations. Considerable caution is thus needed in this study when interpreting candidate regions and/genes as targets of selection (Akey 2009).

Paper IV: The genomic signatures of local adaptation in P. tremula

In paper IV, we analyzed whole genome polymorphism data of 94 unrelated P. tremula trees sampled from twelve populations spanning latitudinal gradients across Sweden (Figure 4b). We identified a total of 4,432,934 SNPs with a minor allele

Page 35: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

27

frequency (MAF) greater than 5% across all samples after applying stringent criteria for quality control. Both model-based ADMIXTURE analysis (Alexander, et al. 2009) and principal component analysis (PCA) (Patterson, et al. 2006) suggest that genetic population structure is very weak in Swedish populations of P. tremula but that it is arranged along a north-south latitudinal gradient. Such clinal genetic structure corroborates earlier studies showing that the Swedish aspen populations are likely the result of a recent admixture between two distinct postglacial lineages following LGM (De Carvalho, et al. 2010). Although demographic processes such as recent admixture are capable of generating genome-wide patterns of clinal variation, the extremely low genetic differentiation (mean FST=0.0021) among these populations suggests that extensive gene flow within P. tremula has almost eradicated this signal.

We combined three complementary approaches to dissect the genotype-environment-phenotype links for identifying candidate SNPs involved in local adaptation. First, we used PCAdapt (Duforet-Frebourg, et al. 2015) to scan the genome for SNPs that are most strongly correlated to the observed population structure. Second, we used a latent factor mixed-effect model (LFMM) (Frichot, et al. 2013) to test for associations between SNPs and environmental variations, which were represented by the first environmental principal component following a PCA on the 39 environmental variables. Third, we used GEMMA (Zhou and Stephens 2012) to perform genome-wide association study (GWAS) on the timing of bud set, a highly heritable and adaptive trait (Hall, et al. 2007). The significant SNPs identified by the three methods largely overlapped, and we only considered those SNPs identified by at least two of the three methods to be associated with local adaptation. After anchoring the scaffolds of P. tremula to the corresponding chromosomes of the P. trichocarpa genome (Tuskan, et al. 2006, Lin, et al. in preparation), we found that 98.7% of the 910 SNPs identified by all three methods and 88.4% of the 705 SNPs that were identified by two of the three methods mapped to a single region (~700kbp) on chromosome 10. In accordance with recent theoretical predictions of adaptation in the face of migration (Yeaman 2015), our results

Page 36: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

28

provide empirical evidence suggesting that genomic architecture of local adaptation evolving under migration-selection balance is more likely to be enriched for few large-effect or tightly clustered loci rather than many independent small-effect loci. This is most likely because large-effect or tightly clustered loci are more able to overcome the homogenizing effect of migration and are highly swamping resistant, whereas small-effect loci are likely to be prone to swamping and may contribute ephemerally to local adaptation (Yeaman and Whitlock 2011).

We further performed a number of additional analyses to investigate the signature of selection at the candidate region on chromosome 10. First, we compared patterns of allele frequency among populations between the SNPs associated with local adaptation and 10,000 SNPs randomly selected across the genome. Compared to the negligible allele frequency differences between populations for randomly selected SNPs, we observed a steep allele frequency cline for candidate SNPs involved in local adaptation. This provides strong evidence of spatially varying selection operating directly on or near these variants. Second, we found significantly reduced genetic diversity in the candidate region relative to the genome-wide averages across the twelve local populations and all populations combined, which is in accordance with the expectation that positive selection acting on a locus will cause a local reduction in genetic diversity (Smith and Haigh 1974). In addition, we found more substantial reductions of genetic diversity and more derived alleles that were segregating at high frequency in the four northernmost populations (population 9-12). All these results suggest that populations of P. tremula at higher latitudes may have experienced a larger shift in environmental conditions following the postglacial re-colonization of Scandinavia and thus exhibit stronger signals of recent adaptation than do lower-latitude populations (Sedghifar, et al. 2016). Third, we calculated the standardized integrated haplotype score (iHS) (Voight, et al. 2006) for the candidate SNPs in this region and found stronger haplotype homozygosity among trees from high latitude populations than among those from low latitude populations. This provides further evidence supporting the

Page 37: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

29

geographically restricted occurrence of a recent, strong selective sweep in high latitude populations of P. tremula. Finally, we performed two newly developed haplotype statistics, H12 and H2/H1 (Garud, et al. 2015), to examine the relative roles of hard vs. soft selective sweeps involved in local adaptation at the candidate region. We found that most windows surrounding the candidate region exhibited both high values of H12 and H2/H1, suggesting that multiple frequent haplotypes occur across P. tremula populations in this candidate region. In particular, we found that alternate haplotypes rose to high frequency in different latitude regions. The occurrence of multiple adaptive haplotypes in this region are therefore more likely to have arisen from soft selective sweeps involving either multiple adaptive de novo mutations of independent origin or multiple alleles derived from standing genetic variation (Garud, et al. 2015).

A number of genes within the candidate region are known to play key roles in the regulation of growth and dormancy in forest trees. The SNP (Potra001246:25256) showing the strongest signal of local adaptation falls within an intronic region of FLOWERING LOCUS T (FT2, Potra001246g10694), which is a strong candidate gene controlling short-day induced growth cessation and bud set in Populus (Böhlenius, et al. 2006; Hsu, et al. 2011). We also found that trees carrying the derived G-allele for this SNP set bud much earlier than those carrying the ancestral T-allele, likely indicating the associations between the seasonal phenology and genotypes surrounding this SNP.

To conclude, this study links variation from whole genome re-sequencing data with environmental variables of interest and a locally adapted phenotype, to obtain an empirical picture of the effects of local adaptation at the whole-genome level in P. tremula.

Page 38: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

30

Conclusions and perspectives

Although the neutral theory of molecular evolution provides a theoretical basis and foundation for studying evolutionary genomics (Kimura 1968, 1984), rapidly accumulating large-scale genomic data are showing that this claim may not hold in a variety of species (Sella, et al. 2009). Using three Populus species as a study system, this thesis provides a genome-scale view of how different evolutionary forces have shaped patterns of sequence evolution at different timescales. At the same time, the data generated in this thesis also open up the door to a plethora of possibilities for future work.

First, regarding the distribution of fitness effects of new mutations, our results show that the majority of newly arising amino acid mutations are strongly deleterious and are likely to be under strong purifying selection in all three species. However, ~20-65% of amino acid substitutions between species have likely been driven to fixation by adaptive evolution. Our results thus argue that positive selection cannot be ignored in the study of genome evolution in Populus. In support of these arguments, multiple lines of evidence suggest that neutral genetic variation has been extensively shaped by both purifying and positive selection owing to the effects of selection at linked sites. Given the pervasive effects of linked selection that we have documented in these species, more attention should be paid on choosing reliable neutral sites for demographic inference in future studies. In addition, future studies to quantify the strength of positive and purifying selection in noncoding regulatory regions and across gene expression categories will further advance our understanding of the extent and nature of selection in Populus genomes (Hough, et al. 2013).

Next, this thesis examined the relative roles of different evolutionary forces in driving genomic divergence between two aspen species. Our results support an allopatric model of speciation, and suggest that neutral evolutionary processes such

Page 39: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

31

as stochastic genetic drift and demographic history can largely account for the observed pattern of genomic differentiation between the two species. Nonetheless, a few regions exhibit extreme differentiation relative to those expected under coalescent simulation, suggesting that natural selection is still a major evolutionary force generating the heterogeneous landscape of genomic divergence between incipient taxa. Our findings thus highlight that future work should integrate more information on the natural histories of speciation before interpreting the meaning of the observed heterogeneous patterns of differentiation between species. Moreover, although accumulating studies have provided evidence for the action of linked selection in shaping genetic variation within and between species (Cutter and Payseur 2013; Cruickshank and Hahn 2014), we remain uncertain about whether linked selection occurred in the form of positive selection for advantageous variants or purifying selection against slightly deleterious mutations. Future studies are required to quantify the effects and to distinguish the relative contributions of positive and purifying selection, which will facilitate demographic inference and will make it possible to build a null model in scans for targets of adaptation.

Finally, this thesis has characterized the genomic architecture of recent local adaptation in Swedish populations of P. tremula in an unprecedented detail. This may have important implications, not only for tree breeding and genetic improvement programmes (Neale and Kremer 2011), but also for predictions of how trees would response to future climate changes (Savolainen, et al. 2013). We have found that a majority of candidate SNPs (>90%) involved in local adaptation tightly cluster at a single genomic region on chromosome 10. Our results suggest that in the presence of migration, local adaptation tend to favor genomic architectures of few large-effect or tightly clustered loci than many loci of small effect. Nevertheless, the evolution of genomic architecture under migration-selection balance still needs to be studied in more detail, especially because detecting loci with small-effect sizes or with slight frequency changes remains difficult (Rockman 2012). Furthermore, several functional genes are found to be

Page 40: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

32

likely associated with local adaptation. One of the most promising gene is FT2, which has been known to play a key role in the regulation of short-day induced growth cessation in perennial plants (Hsu, et al. 2011). The majority of SNPs showing strong signatures of local adaptation are located in noncoding regions near or within genes, indicating that regulatory variants have probably played more important roles in driving local adaptation in P. tremula (Fraser 2013). A perspective for molecular genetics in future studies is therefore to identify causal variants and genes associated with the adaptive trait, and to further understand the cellular and molecular mechanisms of the traits underlying adaptation.

Acknowledgments

I would like to thank Barbara Giles for valuable comments on this summary. The research in this thesis was supported by the Swedish research council (to Pär K. Ingvarsson) and the JC Kempe Memorial Scholarship Foundation (to Jing Wang). The PhD study of Jing Wang in Sweden was funded by the State Scholarship from China Scholarship council.

Page 41: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

33

References Akey JM. 2009. Constructing genomic maps of positive

selection in humans: Where do we go from here? Genome Res. 19:711-722.

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19:1655-1664.

Altmann A, Weber P, Bader D, Preuß M, Binder EB, Müller-Myhsok B. 2012. A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum. Genet. 131:1541-1554.

Anderson JT, Willis JH, Mitchell-Olds T. 2011. Evolutionary genetics of plant adaptation. Trends Genet. 27:258-266.

Andolfatto P. 2008. Controlling type-I error of the McDonald–Kreitman test in genomewide scans for selection on noncoding DNA. Genetics 180:1767-1771.

Beaumont MA, Balding DJ. 2004. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13:969-980.

Begun D, Aquadro C. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519.

Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, Hahn MW, Nista PM, Jones CD, Kern AD, Dewey CN. 2007. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5:e310.

Böhlenius H, Huang T, Charbonnel-Campaa L, Brunner AM, Jansson S, Strauss SH, Nilsson O. 2006. CO/FT regulatory module controls timing of flowering and seasonal growth cessation in trees. Science 312:1040-1043.

Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR. 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4:e1000083.

Burri R, Nater A, Kawakami T, Mugal CF, Olason PI, Smeds L, Suh A, Dutoit L, Bureš S, Garamszegi LZ. 2015. Linked

Page 42: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

34

selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Res. 25:1656-1665.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD. 2005. Natural selection on protein-coding genes in the human genome. Nature 437:1153-1157.

Campos JL, Halligan DL, Haddrill PR, Charlesworth B. 2014. The relation between recombination rate and patterns of molecular evolution and variation in Drosophila melanogaster. Mol. Biol. Evol. 31:1010-1028.

Charlesworth B. 2010. Molecular population genomics: a short history. Genet. Res. 92:397-411.

Charlesworth B, Campos JL. 2014. The relations between recombination rate and patterns of molecular variation and evolution in Drosophila. Annu. Rev. Genet. 48:383-403.

Charlesworth B, Morgan M, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

Charlesworth J, Eyre-Walker A. 2006. The rate of adaptive evolution in enteric bacteria. Mol. Biol. Evol. 23:1348-1356.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Coyne JA, Orr HA. 2004. Speciation. Sunderland, MA. In: Sinauer Associates.

Cruickshank TE, Hahn MW. 2014. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23:3133-3157.

Cutter AD, Payseur BA. 2013. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14:262-274.

De Carvalho D, Ingvarsson PK, Joseph J, Suter L, Sedivy C, MACAYA-SANZ D, Cottrell J, Heinze B, Schanzer I, Lexer C. 2010. Admixture facilitates adaptation from standing variation in the European aspen (Populus tremula L.), a widespread forest tree. Mol. Ecol. 19:1638-1650.

DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, Hanna M.

Page 43: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

35

2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43:491-498.

Dickmann DI, Kuzovkina J. 2014. Poplars and willows of the world, with emphasis on silviculturally important species, pp. 8-91 in Poplars and Willows: Trees for Society and the Environment, edited by J. D. Isebrands and J. Richardson. The Food and Agriculture Organization of the United Nations and CAB International. Rome

Duforet-Frebourg N, Luu K, Laval G, Bazin E, Blum MG. 2015. Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes data. Mol. Biol. Evol. 33:1082-1093.

Eckenwalder JE. 1996. Systematics and evolution of Populus. Pp. 7-32 in Biology of Populus and its Implications for Management and Conservation (Part I), edited by R.F. Stettler, H. D. Bradshaw, P. E. Heilman, and T. M. Hinckley. NRC Research Press, Ottawa.

Ellegren H. 2014. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29:51-63.

Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9:e1003905.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol. Biol. Evol. 26:2097-2108.

Fay JC, Wu C-I. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.

Flowers JM, Molina J, Rubinstein S, Huang P, Schaal BA, Purugganan MD. 2012. Natural selection in gene-dense regions shapes the genomic pattern of polymorphism in wild and domesticated rice. Mol. Biol. Evol. 29:675-687.

Fraser HB. 2013. Gene expression drives local adaptation in humans. Genome Res. 23:1089-1096.

Frichot E, Schoville SD, Bouchard G, François O. 2013. Testing for associations between loci and environmental gradients using latent factor mixed models. Mol. Biol. Evol. 30:1687-1699.

Page 44: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

36

Garud NR, Messer PW, Buzbas EO, Petrov DA. 2015. Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet. 11:e1005004.

Gossmann TI, Song B-H, Windsor AJ, Mitchell-Olds T, Dixon CJ, Kapralov MV, Filatov DA, Eyre-Walker A. 2010. Genome wide analyses reveal little evidence for adaptive evolution in many plant species. Mol. Biol. Evol. 27:1822-1832.

Haddrill PR, Halligan DL, Tomaras D, Charlesworth B. 2007. Reduced efficacy of selection in regions of the Drosophila genome that lack crossing over. Genome Biol. 8:R18.

Hall D, Luquez V, Garcia VM, St Onge KR, Jansson S, Ingvarsson PK. 2007. Adaptive population differentiation in phenology across a latitudinal gradient in European aspen (Populus tremula, L.): a comparison of neutral markers, candidate genes and phenotypic traits. Evolution 61:2849-2860.

Halligan DL, Oliver F, Eyre-Walker A, Harr B, Keightley PD. 2010. Evidence for pervasive adaptive protein evolution in wild mice. PLoS Genet 6:e1000825.

Hamzeh M, Dayanandan S. 2004. Phylogeny of Populus (Salicaceae) based on nucleotide sequences of chloroplast trnT-trnF region and nuclear rDNA. Am. J. Bot. 91:1398-1408.

Hermisson J, Pennings PS. 2005. Soft sweeps molecular population genetics of adaptation from standing genetic variation. Genetics 169:2335-2352.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920-924.

Hewitt G. 2000. The genetic legacy of the Quaternary ice ages. Nature 405:907-913.

Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294.

Holsinger KE, Weir BS. 2009. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat. Rev. Genet. 10:639-650.

Hough J, Williamson RJ, Wright SI. 2013. Patterns of selection in plant genomes. Annu. Rev. Ecol. Evol. Syst. 44:31-49.

Page 45: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

37

Hsu C-Y, Adams JP, Kim H, No K, Ma C, Strauss SH, Drnevich J, Vandervelde L, Ellis JD, Rice BM. 2011. FLOWERING LOCUS T duplication coordinates reproductive and vegetative growth in perennial poplar. Proc. Natl. Acad. Sci. USA 108:10756-10761.

Hufford MB, Xu X, Van Heerwaarden J, Pyhäjärvi T, Chia J-M, Cartwright RA, Elshire RJ, Glaubitz JC, Guill KE, Kaeppler SM. 2012. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44:808-811.

Hughes AL, Friedman R, Rivailler P, French JO. 2008. Synonymous and nonsynonymous polymorphisms versus divergences in bacterial genomes. Mol. Biol. Evol. 25:2199-2209.

Jansson S, Bhalerao RP, Groover AT. 2010. Genetics and genomics of Populus, Springer, New York.

Jansson S, Douglas CJ. 2007. Populus: a model system for plant biology. Annu. Rev. Plant Biol. 58:435-458.

Keightley PD, Eyre-Walker A. 2007. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177:2251-2261.

Kimura M. 1968. Evolutionary rate at the molecular level. Nature 217:624-626.

Kimura M. 1984. The neutral theory of molecular evolution. Cambridge University Press.

Korneliussen TS, Albrechtsen A, Nielsen R. 2014. ANGSD: analysis of next generation sequencing data. BMC bioinformatics 15:356.

Kulathinal RJ, Bennett SM, Fitzpatrick CL, Noor MA. 2008. Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc. Natl. Acad. Sci. USA 105:10051-10056.

Lewontin RC. 1974. The genetic basis of evolutionary change. Columbia Univ. Press, New York.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence

Page 46: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

38

Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.

Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. 2009. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19:1124-1132.

Lohse M, Bolger A, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B. 2012. RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res. 40: W622-W627.

Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. 2003. The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet. 4:981-994.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

Mejia R. 2012. Natural selection, not mutation: recombination in Drosophila increases diversity. PLoS Biol. 10:e1001423.

Messer PW, Petrov DA. 2013. Population genomics of rapid adaptation by soft selective sweeps. Trends Ecol. Evol. 28:659-669.

Messier W, Stewart C-B. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151-154.

Metzker ML. 2010. Sequencing technologies—the next generation. Nat. Rev. Genet. 11:31-46.

Mitchell-Olds T, Schmitt J. 2006. Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature 441:947-952.

Mitchell-Olds T, Willis JH, Goldstein DB. 2007. Which evolutionary processes influence natural genetic variation for phenotypic traits? Nat. Rev. Genet. 8:845-856.

Nachman MW, Payseur BA. 2012. Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Phil. Trans. R. Soc. B. 367:409-421.

Neale DB, Kremer A. 2011. Forest tree genomics: growing resources and applications. Nat. Rev. Genet. 12:111-122.

Nevado B, Ramos-Onsins S, Perez-Enciso M. 2014. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Mol. Ecol. 23:1764-1779.

Page 47: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

39

Nielsen R. 2005. Molecular signatures of natural selection. Annu. Rev. Genet. 39:197-218.

Nielsen R, Paul JS, Albrechtsen A, Song YS. 2011. Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12:443-451.

Noor MA, Bennett SM. 2009. Islands of speciation or mirages in the desert? Examining the role of restricted recombination in maintaining species. Heredity 103:439-444.

Ohta T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Evol. Syst. 23:263-286.

Ohta T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96-98.

Oleksyk TK, Smith MW, O'Brien SJ. 2010. Genome-wide scans for footprints of natural selection. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 365:185-205.

Olson-Manning CF, Wagner MR, Mitchell-Olds T. 2012. Adaptive evolution: evaluating empirical support for theoretical predictions. Nat. Rev. Genet. 13:867-877.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2:e190.

Pennings PS, Hermisson J. 2006. Soft sweeps II—molecular population genetics of adaptation from recurrent mutation or migration. Mol. Biol. Evol. 23:1076-1084.

Phifer-Rixey M, Bonhomme F, Boursot P, Churchill GA, Piálek J, Tucker PK, Nachman MW. 2012. Adaptive evolution and effective population size in wild house mice. Mol. Biol. Evol. 29:2949-2955.

Prezeworski M, Coop G, Wall JD. 2005. The signature of positive selection on standing genetic variation. Evolution 59:2312-2323.

Pritchard JK, Di Rienzo A. 2010. Adaptation–not by sweeps alone. Nat. Rev. Genet. 11:665-667.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 20:R208-R215.

Przeworski M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189.

Renaut S, Grassa C, Yeaman S, Moyers B, Lai Z, Kane N, Bowers J, Burke J, Rieseberg L. 2013. Genomic islands of

Page 48: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

40

divergence are not affected by geography of speciation in sunflowers. Nat. Commun. 4:1827.

Rockman MV. 2012. The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66:1-17.

Sabeti P, Schaffner S, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen T, Altshuler D, Lander E. 2006. Positive natural selection in the human lineage. Science 312:1614-1620.

Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ. 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832-837.

Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R. 2007. Genome-wide detection and characterization of positive selection in human populations. Nature 449:913-918.

Savolainen O, Lascoux M, Merilä J. 2013. Ecological genomics of local adaptation. Nat. Rev. Genet. 14: 807-820.

Schiffels S, Durbin R. 2014. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46:919-925.

Sedghifar A, Saelao P, Begun DJ. 2016. Genomic Patterns of Geographic Differentiation in Drosophila simulans. Genetics 202: 1229-1240.

Sella G, Petrov DA, Przeworski M, Andolfatto P. 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5:e1000495.

Sequencing TC, Consortium A. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69-87.

Slotte T, Foxe JP, Hazzouri KM, Wright SI. 2010. Genome-wide evidence for efficient positive and purifying selection in Capsella grandiflora, a plant species with a large effective population size. Mol. Biol. Evol. 27:1813-1821.

Smith JM, Haigh J. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.

Smith NG, Eyre-Walker A. 2002. Adaptive protein evolution in Drosophila. Nature 415:1022-1024.

Page 49: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

41

Stapley J, Reger J, Feulner PG, Smadja C, Galindo J, Ekblom R, Bennison C, Ball AD, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends Ecol. Evol. 25:705-712.

Storz JF. 2005. Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol. Ecol. 14:671-688.

Strasburg JL, Kane NC, Raduski AR, Bonin A, Michelmore R, Rieseberg LH. 2011. Effective population size is positively correlated with levels of adaptive divergence among annual sunflowers. Mol. Biol. Evol. 28:1569-1580.

Sundell D, Mannapperuma C, Netotea S, Delhomme N, Lin YC, Sjödin A, Van de Peer Y, Jansson S, Hvidsten TR, Street NR. 2015. The plant genome integrative explorer resource: PlantGenIE. org. New Phytol. 208:1149-1156.

Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

Tarailo-Graovac M, Chen N. 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics 4: 1-4.

Teshima KM, Coop G, Przeworski M. 2006. How reliable are empirical genomic scans for selective sweeps? Genome Res. 16:702-712.

Tullus A, Rytter L, Tullus T, Weih M, Tullus H. 2012. Short-rotation forestry with hybrid aspen (Populus tremula L. × P. tremuloides Michx.) in Northern Europe. Scand J Forest Res. 27: 10-29).

Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596-1604.

Vitti JJ, Grossman SR, Sabeti PC. 2013. Detecting natural selection in genomic data. Annu. Rev. Genet. 47:97-120.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4:e72.

Wang Z, Du S, Dayanandan S, Wang D, Zeng Y, Zhang J. 2014. Phylogeny reconstruction and hybrid analysis of Populus (Salicaceae) based on nucleotide sequences of multiple

Page 50: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

42

single-copy nuclear genes and plastid fragments. PloS One 9:e103645.

Weigel D, Nordborg M. 2015. Population genomics for understanding adaptation in wild plant species. Annu. Rev. Genet. 49:315-338.

Welch JJ. 2006. Estimating the genomewide rate of adaptive protein evolution in Drosophila. Genetics 173:821-837.

Wright S. 1949. The genetical structure of populations. Ann. Eugenics 15:323-354.

Wright SI, Andolfatto P. 2008. The impact of natural selection on the genome: emerging patterns in Drosophila and Arabidopsis. Annu. Rev. Ecol. Evol. Syst. 39:193-213.

Wright SI, Gaut BS. 2005. Molecular population genetics and the search for adaptive evolution in plants. Mol. Biol. Evol. 22:506-519.

Wright SI, Lauga B, Charlesworth D. 2002. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol. Biol. Evol. 19:1407-1420.

Yang Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

Yang Z, Bielawski JP. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

Yeaman S. 2015. Local Adaptation by Alleles of Small Effect*. Amer. Nat. 186:S74-S89.

Yeaman S, Whitlock MC. 2011. The genetic architecture of adaptation using migration-selection balance. Evolution 65:1897-1911.

Zhou X, Stephens M. 2012. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44:821-824.

Page 51: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

43

My gratitude

The last four years of doing my PhD in Umeå have been a great experience!

There are many people to whom I would like to express my deepest gratitude. First of all, I want to thank my supervisor Pelle for all his guidance, support and encouragement during the last years. He gives me a lot of space to let me find my way to do scientific research. At the same time, he is always there with an open door and a positive attitude when needed. I have really enjoyed every moment of working with him!

I also want to thank my co-supervisor, Nat. He has taught me a great deal about bioinformatics, and he always helps me in a very kind and friendly way no matter how busy he is. Many thanks also to Doug. He seems to be my second co-supervisor. I am impressed by his extensive and broad knowledge of bioinformatics, computer science and biology. He was always patient and gave me good advice when I needed it. Even through he moved to Uppsala in 2013, he still provided many good comments for the manuscripts on which we have collaborated.

Thanks to the current and previous members of the Evolutionary Ecology group for all their help and valuable discussions. Biyue, Jin and Junwen, who are not only my academic family members but also are very good friends in my life. Carolina, Amarllyis, Amanda and Bettan, it has been fun working together with them and I really appreciate all their support and help. Xiao-Ru, Barbara and Folmer, they are excellent teachers making a great study atmosphere in our evolutionary group. I would also like to thank other members for scientific discussion in the Friday’s journal club: Baosheng, David, Alex, Wei, Jingxiang, Xavier, JungKoo, Tomas, Xi.

Page 52: The genomic signatures of adaptive evolution in Populus927986/FULLTEXT01.pdf · The impact of recombination and other population genetic factors on adaptive evolution 11 Study system

44

Many thanks also to my collaborators on aspen genome project: Nicolas, Yao-Cheng, Bastian, Niklas, David, Manfred, Torgeir, Chanaka. I would also thank Tiggy for collecting environmental variables used in the last paper of this thesis. Thanks Anna from WABI of SciLifeLab for providing us with bioinformatics support for detecting structural genomic variation.

A big thank to the entire staff at EMG for creating the amazing atmosphere of our department.

Finally, special thanks to my lovely family. Mama and Papa, they support me in all different ways. Everyday Facetime call with them provides me the biggest boost in motivation and energy. I would also like to thank my husband, Junyang, for accompanying me everyday and always making me happy.