THEME – 4 Genomic diversity of domestication in soybean
Institute of Crop Science
Chinese Academy of Agricultural Sciences
Li-juan Qiu
International Workshop on “Applied Mathematics and Omics Technologies for
Discovering Biodiversity and Genetic Resources for Climate Change Mitigation
and Adaptation to Sustain Agriculture in Drylands”
IAV Hassan II – Rabat - Morocco, 24-27 June 2014
Outline
1. Background
2. Genetic diversity of G. soja and G. max
3. Pan-genome of G. soja
4. Genomic variation between G. soja and G. max Williams 82
5. Genes under selection during domestication
Leguminosae, Papilionoideae, Glycine
Subgenus Glycine: 26 perennial wild species (mainly in Australia)
Subgenus Soja:
Annual wild soybean (G. soja) (East Asia)
Cultivated soybean (G. max) (worldwide)
1. Background
G. soja → (domestication) → G. max landraces → (improvement) → G. max modern cultivars
Glycine soja: the wild relative of cultivated soybean G. max
Secondary gene pool (GP-2): unknown
Tertiary gene pool (GP-3): wild perennial species
From: Harlan and de Wet (1971)
Two bottlenecks: domestication and breeding
G. max vs. G. soja
Plant
— Plant height
— Growth habit
Seed
— Size
— Color
— Pod dehiscence
Physiological traits
— Protein content
— Oil content
These differences are controlled by genetic variation.
Variation of the soybean genome during domestication
Genetic variation, e.g. SNPs, InDels, PAVs, CNVs
Domestication-related genes
What genetic variation distinguishes wild from cultivated soybean?
Domestication-related traits
Soybean has been cultivated for more than 4,500 years, since the agricultural ancestor Houji, who planted five crops including soybean.
According to written records, the earliest name for soybean was “shu”, in “The Book of Odes”.
The names for soybean in other languages were derived from “shu”.
Cultivated soybean is native to China
China holds the largest share of soybean germplasm
More than 170,000 soybean accessions are held in germplasm collections worldwide; of these, 45,000 are unique (Carter et al. 2004).
More than 23,000 cultivated and 7,000 wild soybean accessions are conserved in the Chinese National Gene Bank (CNGB).
Constructing core collections at different levels
Qiu et al. 2003, Scientia Agricultura Sinica; Qiu et al. 2009, PMB 2013
Core collection: a subset that represents the genetic diversity of a crop species and its relatives with minimal redundancy
[Figure: basic collection (23,587 accessions) → primary core collection (2,794), selected by location and phenotype → core collections at different levels (248; 433…), selected by phenotype and genotype. Letter blocks (AAAABBBB CCCCDDDDEEEE FFFGGGHHH → AABB CCDDEE FFGGHHH → ABCEFGH) illustrate redundancy being reduced at each level.]
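The redundancy-reduction idea behind a core collection (keep every group, drop near-duplicate accessions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used by Qiu et al.: each accession is described by a hypothetical group label (standing in for a stratum such as region of origin) and a set of marker alleles; the sketch keeps the allele-richest accession of each group, then greedily adds accessions that contribute the most alleles not yet covered. The function name `build_core` and the data layout are hypothetical.

```python
from collections import defaultdict


def build_core(accessions, size):
    """Pick a core subset from {name: (group, set_of_alleles)}.

    Hypothetical data model: `group` is a stratum such as region of origin,
    and the allele set stands in for marker genotypes.
    """
    # Step 1: stratify by group and keep the allele-richest accession of
    # each group, so every stratum is represented while room remains.
    by_group = defaultdict(list)
    for name, (group, _alleles) in accessions.items():
        by_group[group].append(name)
    core, covered = [], set()
    for group in sorted(by_group):
        if len(core) >= size:
            break
        best = max(sorted(by_group[group]), key=lambda n: len(accessions[n][1]))
        core.append(best)
        covered |= accessions[best][1]

    # Step 2: greedily add the accession contributing the most new alleles;
    # stop when the core is full or no accession adds anything new.
    remaining = sorted(set(accessions) - set(core))
    while len(core) < size and remaining:
        best = max(remaining, key=lambda n: len(accessions[n][1] - covered))
        if not accessions[best][1] - covered:
            break
        core.append(best)
        covered |= accessions[best][1]
        remaining.remove(best)
    return core, covered


if __name__ == "__main__":
    demo = {
        "a1": ("NE", {"x1", "x2"}),
        "a2": ("NE", {"x1"}),
        "b1": ("S", {"y1"}),
        "b2": ("S", {"y1", "y2", "x3"}),
        "c1": ("HH", {"z1", "x1"}),
    }
    print(build_core(demo, 4))
```

Here the redundant accessions a2 and b1 are left out because they add no alleles beyond what the core already holds, mirroring the letter-block illustration.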
The primary division of genetic diversity was between the
wild and domesticated accessions.
G. soja and G. max represent distinct germplasm pools.
[Figure: population structure of G. max and G. soja accessions (panels A and B) inferred from SSR, SNP and SSR+SNP markers at K = 2 to 6; sample groups labelled NER, NR, HR, SR and C, R, K, J]
2. Differentiation between G. soja and G. max
[Figure: population structure by origin (S, HH, N, NE, Russia, Korea, Japan) based on 99 SSRs, 554 SNPs and SSR+SNP]
Li et al. New Phytologist, 2010; Li et al. Theor Appl Genet, 2008
1,863 landraces genotyped with 59 SSRs; 112 wild soybean accessions genotyped with 99 SSRs and 554 SNPs
Within each species, population structure corresponded to geographic origin, in both cultivated and wild soybean.
Genetic diversity decreased markedly after domestication.
Hyten et al. (2006) PNAS: 26 G. soja and 94 G. max accessions; 111 fragments from 102 genes
Li et al. (2010) New Phytologist: 92 G. soja and 279 G. max accessions; 554 SNP and 99 SSR markers
[Figure: wild 1,807 and 0.871; cultivated 1,473 and 0.687; 78.3%, 81.5%]
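A common way to summarize the loss of diversity after domestication is Nei's gene diversity (expected heterozygosity) averaged over loci, together with the fraction of wild diversity retained in cultivated accessions. The slide does not say which statistic the 0.871 and 0.687 values are, so the choice of Nei's gene diversity here is an assumption, and the allele frequencies below are made up for illustration.

```python
def gene_diversity(freqs):
    """Nei's gene diversity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def mean_diversity(loci):
    """Average gene diversity over loci (each a list of allele frequencies)."""
    return sum(gene_diversity(f) for f in loci) / len(loci)


def retained(cultivated, wild):
    """Fraction of wild diversity retained in cultivated accessions."""
    return cultivated / wild


# Hypothetical allele frequencies at three loci (not data from the study).
wild_loci = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5], [0.4, 0.3, 0.3]]
cult_loci = [[0.7, 0.1, 0.1, 0.1], [0.9, 0.1], [0.6, 0.4]]
hw, hc = mean_diversity(wild_loci), mean_diversity(cult_loci)
print(f"wild {hw:.3f}, cultivated {hc:.3f}, retained {retained(hc, hw):.1%}")
```

Fixation of common alleles in the cultivated loci pushes the sum of squared frequencies up, which is what lowers the cultivated diversity.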
The development of sequencing technology
Cultivated soybean reference genome: G. max Williams 82 (Schmutz et al., Nature 2010; 463:178-183)
As an important source of genetic diversity, the gene repertoire of G. soja remains largely unexplored.
3. Pan-genome of G. soja
Pan-genome: the set of all genes present in the genomes of a group of organisms
Core genome: genes shared by all individuals
Dispensable genome: genes specific to one individual or shared by only some individuals
From: Morgante et al. Current Opinion in Plant Biology 10, 149-155 (2007)
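The core/dispensable split defined above reduces to set operations once genes from different genomes have been grouped into shared families. The sketch assumes such family IDs already exist (the clustering step is not shown); `partition_pan_genome` is a hypothetical helper, and it also separates the individual-specific part of the dispensable set.

```python
from collections import Counter


def partition_pan_genome(gene_sets):
    """Split genes from several genomes into pan, core and dispensable sets.

    gene_sets: {genome_name: set of gene-family IDs}; the IDs are assumed to
    come from a prior step that groups orthologous genes across genomes.
    """
    sets = list(gene_sets.values())
    pan = set().union(*sets)            # present in at least one genome
    core = set.intersection(*sets)      # present in every genome
    dispensable = pan - core            # missing from at least one genome
    counts = Counter(g for s in sets for g in s)
    private = {g for g in dispensable if counts[g] == 1}  # individual-specific
    return pan, core, dispensable, private


if __name__ == "__main__":
    genomes = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2", "g4"}, "C": {"g1", "g3", "g5"}}
    pan, core, dispensable, private = partition_pan_genome(genomes)
    print(sorted(core), sorted(dispensable), sorted(private))
```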
Why a pan-genome?
AMOVA (Li et al. New Phytologist, 2010): the largest component of variation (~75%) was among individuals within populations.
A single genome sequence may not reflect the entire genomic complement of a species.
[Figure: wild soybean accessions on the x axis (ticks 1 to 91), values from 0.0 to 1.0 on the y axis; GsojaA, GsojaB, GsojaC, GsojaD, GsojaE and GsojaG labelled]
Seven representative wild soybean accessions (New Phytologist, 2010)
China: Northeast, North, Huanghuai and South regions
Other countries: Japan, Korea, Russia
Three libraries: 180 bp, 500 bp and 2 kb
Data: 817 Gb, 111.9× average coverage
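Average sequencing depth is the number of sequenced bases divided by the genome size. The sketch below only illustrates that arithmetic with hypothetical inputs; the slide does not state which base count or genome size underlies the reported 111.9× average, so the function names and numbers are illustrative assumptions.

```python
def depth(total_bases, genome_size):
    """Average coverage (fold) = sequenced bases / genome size, in the same units."""
    return total_bases / genome_size


def mean_depth(runs):
    """Mean of per-accession depths; runs is a list of (bases, genome_size)."""
    return sum(depth(b, g) for b, g in runs) / len(runs)


# Hypothetical example: 110 Gb of reads on a 1,000 Mb genome.
print(depth(110e9, 1_000e6))
```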
Pan-genome of annual wild soybean
Summary of data and assembly
Accession                    GsojaA   GsojaB   GsojaC   GsojaD   GsojaE   GsojaF   GsojaG
Predicted genome size (Mbp)  981.0    1000.8   1053.78  1118.34  956.43   992.66   889.33
Assembled genome size (Mbp)  813      895      841      985      920      886      878
Contig N50 (kbp)*            9        22.2     8        11       27       24.3     19.2
Scaffold N50 (kbp)           18.3     57.2     17       48.7     65.1     52.4     44.9
Predicted genes              58,756   56,655   60,377   62,048   58,414   57,573   58,169
Confirmed genes              55,061   54,256   56,542   57,631   55,901   54,805   54,797
Average number of confirmed genes: 55,570 per genome
RNA-Seq validation: 67.3% of predicted genes
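The per-genome averages can be checked directly from the gene counts in the table. Doing so shows that 55,570 is the mean of the confirmed counts, while the predicted counts average 58,856. The per-accession confirmation share printed at the end is simply confirmed divided by predicted; it is a different quantity from the 67.3% RNA-Seq validation figure.

```python
from statistics import mean

# Gene counts per accession, GsojaA to GsojaG, copied from the table.
predicted = [58756, 56655, 60377, 62048, 58414, 57573, 58169]
confirmed = [55061, 54256, 56542, 57631, 55901, 54805, 54797]

print(round(mean(predicted)))  # 58856
print(round(mean(confirmed)))  # 55570
# Share of predicted genes that were confirmed, per accession.
print([f"{c / p:.1%}" for c, p in zip(confirmed, predicted)])
```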
The pan-genome is dynamic, and a single genome does not adequately represent the diversity of the species.
As additional genomes were added, the total number of genes increased and the number of core genes decreased.
The average pan-genome size of any two accessions accounted for 78.2% of that found using all seven accessions.
Pan-genome: 59,080 genes; genome size 986.3 Mbp
Core: 48.6% of genes and 80.1% of genome sequence
Dispensable: 51.4% of genes and 19.9% of genome sequence
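The growth of the pan-genome and the shrinking of the core genome as genomes are added can be computed by averaging over every combination of k genomes, which is feasible for a handful of genomes such as the seven here. The sketch assumes genes are already grouped into family IDs; `accumulation` is a hypothetical helper and the toy gene sets are invented.

```python
from itertools import combinations
from statistics import mean


def accumulation(gene_sets):
    """For k = 1..n genomes, average pan- and core-genome sizes over every
    combination of k genomes (exhaustive enumeration)."""
    genomes = list(gene_sets.values())
    curve = []
    for k in range(1, len(genomes) + 1):
        pan_sizes, core_sizes = [], []
        for combo in combinations(genomes, k):
            pan_sizes.append(len(set().union(*combo)))
            core_sizes.append(len(set.intersection(*combo)))
        curve.append((k, mean(pan_sizes), mean(core_sizes)))
    return curve


if __name__ == "__main__":
    genomes = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2", "g4"}, "C": {"g1", "g3", "g5"}}
    curve = accumulation(genomes)
    for k, pan, core in curve:
        print(k, round(pan, 2), round(core, 2))
    # Analogue of the "any two accessions vs. all accessions" ratio.
    print(f"{curve[1][1] / curve[-1][1]:.1%}")
```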
8.86/kb
19.93/kb
The dispensable gene set was more variable than the core gene set, both structurally and functionally.
The dispensable genes have experienced weaker purifying selection and evolved more quickly than core genes.
Core genome vs. dispensable genome
58.3% of dispensable genes could not be assigned any functional annotation, versus 33.9% of core genes.
95.5% of core genes had homologs in other species, based on BLAST searches against 32 plant genomes (excluding soybean), significantly more than the dispensable gene set (83.5%; chi-square test, p < 0.01).
Lineage-specific genes evolved faster than genes shared between species, either through a higher evolutionary rate or a higher rate of gene loss.
Core genes were more functionally conserved among plant species than dispensable genes.
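The core-vs-dispensable homolog comparison is a 2×2 contingency test. A minimal Pearson chi-square (without continuity correction) and its 1-degree-of-freedom p-value are sketched below; the counts in the example are hypothetical, chosen only to echo the 95.5% and 83.5% proportions, since the slide does not give the underlying gene counts or whether a correction was applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]],
    e.g. rows = core / dispensable genes, columns = with / without homologs."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 df, chi2 = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts per 1,000 genes (not the study's data).
chi2 = chi_square_2x2(955, 45, 835, 165)
print(chi2, p_value_df1(chi2))
```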
Evolution of the G. max/G. soja species complex
Based on 670 conserved single-copy gene orthologs, G. soja and G. max diverged more than 0.8 million years ago (mya).
This is nearly three times older than a previous estimate of 0.27 mya based on re-sequencing of a single G. soja genome.
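Divergence times of this kind are usually derived from a molecular clock, T = Ks / (2r), where Ks is the mean synonymous substitutions per synonymous site between orthologs and r is the substitution rate per site per year; the factor 2 reflects change accumulating on both lineages. The slide does not state the method or rate used, so both the formula and the rate of 6.1e-9 (a value often used for legumes) are assumptions here, and the Ks values are hypothetical.

```python
def divergence_time(ks, rate=6.1e-9):
    """Molecular-clock divergence time in years: T = Ks / (2 * rate).

    ks:   mean synonymous substitutions per synonymous site between orthologs
    rate: substitutions per site per year (assumed value, see lead-in)
    """
    return ks / (2 * rate)


# Hypothetical Ks values from a set of single-copy orthologs.
ks_values = [0.009, 0.011, 0.010]
mean_ks = sum(ks_values) / len(ks_values)
print(f"{divergence_time(mean_ks) / 1e6:.2f} million years")
```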
4. Genomic variation between G. soja and G. max Williams 82
SNPs: 3.63 to 4.72 million
InDels: 0.50 to 0.77 million
Structural variation: CNVs, PAVs
Thousands of genes are affected by these variations, some of which may be useful for future crop improvement.
G. soja vs. G. max: genomic basis of agronomic traits
Photosensing and light signaling coordinately control flowering
Two 3-nt InDels and 9 non-synonymous SNPs; two variation hotspots
Sequencing approaches compared:
Re-sequencing*: 1 G. soja + 1 G. max
Re-sequencing#: 25 G. soja + 30 G. max
De novo sequencing: 7 G. soja + 1 G. max
[Table: comparison values for each approach]
#: From Li et al. BMC Genomics, 2013; *: From Kim et al. PNAS, 2010
[Figure: variant classes: G. soja-specific, G. max-specific, CNV loss, CNV gain, large InDels (5-100 bp), small InDels (1-5 bp), SNPs, and SNPs missed in re-sequencing]
Specific variations identified in this comparison
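The size classes in the figure can be expressed as a simple rule on allele lengths. In this sketch the slide's overlapping ranges (1-5 bp and 5-100 bp) are resolved by assumption, counting a 5 bp length difference as a small InDel, and differences above 100 bp are labelled structural; `classify_variant` is a hypothetical helper, not the study's pipeline.

```python
def classify_variant(ref, alt):
    """Classify a variant by allele lengths into the slide's size classes.

    Assumption: a length difference of exactly 5 bp counts as a small InDel,
    since the slide's ranges (1-5 bp and 5-100 bp) overlap at 5.
    """
    if len(ref) == len(alt) == 1:
        return "SNP" if ref != alt else "no change"
    diff = abs(len(ref) - len(alt))
    if 1 <= diff <= 5:
        return "small InDel"
    if 5 < diff <= 100:
        return "large InDel"
    if diff > 100:
        return "structural variant"
    return "complex/MNP"  # equal-length alleles longer than 1 bp


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATTT"), ("A", "A" + "T" * 50)]:
        print(ref, alt, classify_variant(ref, alt))
```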
Example: 9 SNPs in a 62 bp fragment
The assembly-based method found more SNPs: 10 million, twice the number identified by re-sequencing (Li et al. BMC Genomics, 2013).
Most new SNPs came from divergent regions, where assembled sequences could still be aligned but short sequencing reads are difficult to map.
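In the assembly-based approach, SNPs come from comparing aligned assembled sequences column by column rather than from piling up short reads, which is why divergent regions can still yield calls. The sketch assumes a pairwise alignment has already been produced (for example by a whole-genome aligner); `snps_from_alignment` is a hypothetical helper that reports alignment columns, not genome coordinates, and skips gap and ambiguous columns.

```python
def snps_from_alignment(seq_a, seq_b, start=1):
    """List SNPs between two aligned sequences, e.g. a G. soja contig aligned
    to the reference. Columns with a gap ('-') or 'N' are skipped.

    Returns (alignment_column, base_a, base_b); columns count from `start`.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=start):
        if a != b and "-" not in (a, b) and "N" not in (a, b):
            snps.append((i, a, b))
    return snps


if __name__ == "__main__":
    print(snps_from_alignment("ACGTTAGC", "ATGTCAGA"))
```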
Copy number variation: 1,978 genes (1,179 loss, 726 gain, 73 both gain and loss)
R genes: more categories in G. soja than in G. max; more genes in G. max than in G. soja
PAV criteria: >100 bp and <95% identity
PAV sequence: 30.3 Mb (G. soja-specific: 11.3 Mb; G. max-specific: 19 Mb)
PAV genes: 354 (G. soja-specific: 338; G. max-specific: 16)
24.3% of PAV genes are involved in defense response
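The PAV threshold stated above (>100 bp and <95% identity) can be expressed as a filter over segments. The sketch assumes a hypothetical list of segments from one genome, each with its length and its best identity to the other genome (0 when unaligned); the `Segment` record and helper names are illustrative and do not represent the study's pipeline.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical record: a genome segment and its best match in the other genome."""
    name: str
    length: int      # bp
    identity: float  # best identity (%) to the other genome; 0 if unaligned


def pav_segments(segments, min_len=100, max_identity=95.0):
    """Keep segments longer than min_len bp with identity below max_identity %."""
    return [s for s in segments if s.length > min_len and s.identity < max_identity]


def total_mb(segments):
    """Total length of the segments in Mb."""
    return sum(s.length for s in segments) / 1e6


if __name__ == "__main__":
    segs = [Segment("s1", 150, 0.0), Segment("s2", 100, 10.0),
            Segment("s3", 5000, 96.0), Segment("s4", 2000, 94.9)]
    hits = pav_segments(segs)
    print([s.name for s in hits], total_mb(hits))
```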
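Gs1 to Gs3: genes involved in biotic and abiotic stress tolerance or plant development
In 56 re-sequenced accessions, their frequency is higher in G. soja than in G. max
[Figure: Gs1, Gs2 and Gs3 within an 8 kb region, in wild (1 to 5) and cultivated (1 to 5) accessions]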
A population bottleneck or artificial selection can result in the fixation of alleles during domestication.
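Why a bottleneck alone fixes alleles can be illustrated with a neutral Wright-Fisher simulation: in a small population, allele frequencies drift randomly and are quickly lost or fixed, whereas a large population keeps them segregating much longer. This toy model (hypothetical helper names, constant population size, no selection or mutation) only illustrates the statement above; it is not an analysis from the study.

```python
import random


def wright_fisher(p0, n_diploid, generations, seed=None):
    """Simulate neutral drift at one biallelic locus; return the final frequency."""
    rng = random.Random(seed)
    p = p0
    copies = 2 * n_diploid
    for _ in range(generations):
        # Binomial sampling of 2N gene copies at the current frequency p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        if p in (0.0, 1.0):  # allele lost or fixed: no further change
            break
    return p


def fixation_or_loss_share(p0, n_diploid, generations, reps=200, seed=1):
    """Fraction of replicate populations in which the allele was lost or fixed."""
    rng = random.Random(seed)
    ends = [wright_fisher(p0, n_diploid, generations, rng.randrange(2**32))
            for _ in range(reps)]
    return sum(p in (0.0, 1.0) for p in ends) / reps


if __name__ == "__main__":
    # Small (bottlenecked) vs. large population over the same 100 generations.
    print(fixation_or_loss_share(0.5, 10, 100))
    print(fixation_or_loss_share(0.5, 500, 100, reps=20))
```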
5. Genes under selection during domestication
G. soja → landrace → elite cultivar
25 accessions: 93.55 Gb; 98.2% (Glyma1.01 reference)
31 accessions (Lam et al. 2010): 17 G. soja and 14 G. max
Total: 5,102,244 SNPs, of which 25.5% are specific to our accessions (Li et al. BMC Genomics, 2013)
[Figure: number of selected regions and number of genes in selected regions on each chromosome, Gm01 to Gm20]
394 regions: 1.47% of the whole genome (950 Mb)
928 genes: 2.0% of 46,430 predicted genes
Statistics: θπ ratio (cultivated/wild), Tajima's D and FST
20 kb sliding windows with a 2 kb step
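The window scan can be sketched as follows, assuming per-site nucleotide diversity values (π at each variable site, from a prior calculation) are available for the cultivated and wild groups on one chromosome. The sketch sums them in windows (20 kb with a 2 kb step by default), divides by window length and reports θπ(cultivated)/θπ(wild); windows in the low tail of this ratio are candidate selected regions. The helper name and the small pseudocount that avoids division by zero are assumptions, and how the study combined this ratio with Tajima's D and FST is not shown.

```python
def window_pi_ratio(sites, chrom_len, window=20_000, step=2_000, eps=1e-9):
    """Sliding-window θπ(cultivated) / θπ(wild) along one chromosome.

    sites: list of (position, pi_cultivated, pi_wild), 1-based positions.
    Per-window θπ = summed per-site π / window length.
    eps is a hypothetical pseudocount so windows with no wild diversity
    do not divide by zero.
    Returns a list of (window_start, window_end, ratio).
    """
    out = []
    start = 1
    while start <= chrom_len:
        end = min(start + window - 1, chrom_len)
        length = end - start + 1
        cult = sum(pc for pos, pc, _ in sites if start <= pos <= end) / length
        wild = sum(pw for pos, _, pw in sites if start <= pos <= end) / length
        out.append((start, end, (cult + eps) / (wild + eps)))
        if end == chrom_len:
            break
        start += step
    return out


if __name__ == "__main__":
    toy = [(1, 0.1, 0.2), (3, 0.0, 0.4), (6, 0.2, 0.2)]
    for w in window_pi_ratio(toy, chrom_len=6, window=4, step=2):
        print(w)
```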
Artificial Selection
Selected regions were neither randomly nor uniformly distributed across the genome; they appeared to form clusters in certain genomic regions, e.g. on Gm08 and Gm12.
This is similar to the distribution pattern of QTLs underlying domestication-related traits (Ross-Ibarra, Genetics of Adaptation, 2005).
Gene under selection: Glyma03g35520.1
A homolog of the rice domestication gene Grain Incomplete Filling 1 (GIF1)
GIF1 encodes a cell-wall invertase that regulates sugar levels to meet the demands of cell division and growth during grain development.
Transgenic rice showed increased grain size and weight (Wang et al. Nat Genet, 2008).
GmTfl1 (Glyma19g37890.1): Tian et al. 2010; Liu et al. 2010
Nucleotide diversity (θ and π) in genomic DNA and cDNA
Gene / group              gDNA θ   gDNA π   cDNA θ   cDNA π
GmTfl1 (Glyma19g37890.1)
  Elite cultivars         1.86     0.98     0.98     0.52
  Landraces               1.78     1.05     1.78     1.61
  G. soja                 1.65     1.28     0        0
Glyma03g35250.1
  G. max (89)             0        0        0        0
  Elite cultivars         0        0        0        0
  Landraces               0        0        0        0
  G. soja (20)            0.66     0.73     0.85     0.54
The sunflower homolog of Glyma03g35250.1 experienced selective sweeps during evolution (Blackman et al. 2011).
Gene under selection: Glyma03g35250.1
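The θ and π columns correspond to two standard nucleotide-diversity estimators: Watterson's θ, based on the number of segregating sites, and π, the mean number of pairwise differences. A minimal per-site implementation for aligned sequences is sketched below; the table's units are not stated, so the example reports plain per-site values, and the helper name is hypothetical. Gaps are treated as ordinary characters in this sketch.

```python
from itertools import combinations


def diversity(seqs):
    """Return (Watterson's theta, pi) per site for equal-length aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    if n < 2:
        raise ValueError("need at least two sequences")
    # Segregating sites: alignment columns containing more than one base.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    a1 = sum(1 / i for i in range(1, n))  # harmonic number for n - 1
    theta_w = s / a1 / length
    # pi: mean number of differences over all n*(n-1)/2 sequence pairs.
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    pi = diffs / len(pairs) / length
    return theta_w, pi


if __name__ == "__main__":
    print(diversity(["AAAA", "AAAT", "AATT"]))
```

A monomorphic sample, like the zero rows in the table, gives θ = π = 0.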
Some regions or genes were confirmed:
• 100-seed weight: QTL by Yan et al., Plant Breeding, 2014
Type        No. of SNPs   No. of haplotypes   Haplotype diversity
Total       72            32                  0.762
G. soja     71            28                  0.952
Landrace    29            5                   0.568
Elite       3             4                   0.552
[Figure panels: total, wild, landrace, elite]
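Haplotype diversity is commonly computed with Nei's unbiased estimator, H = n/(n - 1) × (1 - Σ pᵢ²), where pᵢ are haplotype frequencies among n sampled sequences. The slide does not name the estimator, so using this one is an assumption; the helper and example haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: H = n/(n-1) * (1 - sum(p_i**2))."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


if __name__ == "__main__":
    print(haplotype_diversity(["AT", "AT", "GC", "GC"]))
```

Fewer haplotypes, or one dominant haplotype, as in landraces and elite cultivars, pushes H toward 0.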
Soybean seed coat color
Multiple-allele I locus: CHS1, CHS3, CHS4, CHS5 and CHS9
[Figure: seed coat color is black in G. soja, diverse in landraces and yellow in elite cultivars]
Summary
The hierarchical genetic structure of soybean landraces reflected their geographic regions of origin.
A pan-genome was constructed by de novo sequencing and assembly of seven G. soja accessions.
Inter-genomic comparisons identified up to 3,000 lineage-specific genes and genes with CNV, PAV or large-effect mutations, some of which may contribute to variation in agronomic traits such as resistance, seed composition, flowering time and biomass.
A set of candidate genes significantly affected by selection for preferred agricultural traits during soybean domestication was identified, and some of these genes were confirmed.
These results will facilitate harnessing untapped genetic diversity from wild soybean to develop elite cultivars.
Funding:
National Natural Science Foundation of China
State Key Basic Research and Development Plan of China (973)
National Key Technologies R&D Program in the 11th Five-Year Plan (863)
Acknowledgments
Novogene
Prof. Ruiqiang Li
Guangyu Zhou
Wenkai Jiang
Zhouhuao Zhang
University of Georgia
Prof. Scott A. Jackson
Purdue University
Dr. Jianxin Ma