Post doctoral Associate Cornell University ...

Post on 05-Dec-2021


Fei Lu
Post‐doctoral Associate

Cornell University
http://www.maizegenetics.net

Genotyping by sequencing (GBS) is simple and cost effective


1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing

(Elshire et al. 2011. PLoS ONE)

(Altshuler et al. 2000. Nature)

Reduced representation library approach

500,000 reads/sample (384‐plex)
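
Because adapters carry barcodes before the DNAs are pooled, reads from the pool can be assigned back to samples by their leading barcode. The following is a minimal sketch, assuming inline barcodes at the start of each read. The barcode sequences, sample names, and the `demultiplex` helper are hypothetical and are not taken from the GBS protocol or any published tool.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (assumption for illustration only).
BARCODES = {"CTCC": "sample_1", "TGCA": "sample_2", "ACTA": "sample_3"}

def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at the start of each read.

    Longer barcodes are tried first so that a short barcode that is a
    prefix of a longer one does not capture its reads.
    """
    by_sample = defaultdict(list)
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                by_sample[barcodes[bc]].append(read[len(bc):])  # strip barcode
                break
        else:
            # No barcode matched: keep the read aside instead of guessing.
            by_sample["unassigned"].append(read)
    return dict(by_sample)

reads = ["CTCCCAGCAAGT", "TGCACAGCTTTT", "GGGGCAGCAAAA"]
result = demultiplex(reads)
```

At 384‐plex and 500,000 reads per sample, one pool corresponds to 384 × 500,000 = 192 million reads, so a real implementation would stream reads from disk rather than hold them in a list.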

Universal Network Enabled Analysis Kit (UNEAK) 

A reference free SNP calling pipeline

Designed for species that:
• lack a reference genome
• are diploid or polyploid
• are inbreeders or outcrossers
• have limited genetic or genomic resources

Overview of UNEAK

Genome is digested, sequenced using GBS

Reads are trimmed to 64 bp

Identical reads = tag
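
The trimming and tag definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the UNEAK implementation: the `count_tags` helper is a hypothetical name, and skipping reads shorter than 64 bp is a choice made for this sketch only.

```python
from collections import Counter

TAG_LENGTH = 64  # trimming length stated on the slide

def count_tags(reads, tag_length=TAG_LENGTH):
    """Trim reads to tag_length and count identical trimmed reads as tags.

    Reads shorter than tag_length are skipped here (an assumption of this
    sketch; the slide does not say how short reads are handled).
    """
    tags = Counter()
    for read in reads:
        if len(read) >= tag_length:
            tags[read[:tag_length]] += 1
    return tags

reads = ["A" * 70, "A" * 64, "A" * 63 + "C" + "GG", "ACGT"]
tag_counts = count_tags(reads)
```

Here the first two reads collapse into one tag with count 2, the third read forms a second tag, and the 4 bp read is dropped.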


Overview of UNEAK – Network filter

Pairwise alignment to find tag pairs with 1 bp mismatch

Build tag networks

Topology of tag networks

Keep common reciprocal tags

[Figure, panels C–F: tag networks; legend: real tags, error; node annotation: count]
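
The network filter steps can be sketched as follows. This is a simplified illustration, assuming that "reciprocal" means each tag in a pair is the other's only 1 bp mismatch neighbor. The function names are hypothetical, read-count thresholds are left out, and the all-pairs comparison is quadratic, so it only suits small examples.

```python
from collections import defaultdict
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))

def build_networks(tags):
    """Connect every pair of tags that differ by exactly 1 bp."""
    neighbors = defaultdict(set)
    for a, b in combinations(tags, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def reciprocal_pairs(neighbors):
    """Keep pairs where each tag is the other's only neighbor."""
    pairs = set()
    for tag, nbrs in neighbors.items():
        if len(nbrs) == 1:
            (other,) = nbrs
            if neighbors[other] == {tag}:
                pairs.add(tuple(sorted((tag, other))))
    return sorted(pairs)

# Toy 4 bp tags: one isolated pair, one three-tag chain, one singleton.
tags = ["AAAA", "AAAT", "CCCC", "CCCG", "CCGG", "GGGG"]
pairs = reciprocal_pairs(build_networks(tags))
```

In this toy input only the `AAAA`/`AAAT` pair survives. The chain `CCCC`, `CCCG`, `CCGG` forms a larger network and is discarded, and `GGGG` has no neighbor at all.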

Topology of tag networks

Legend: tag, error

Plastid & highly repetitive tags

Moderately repetitive tags, paralogs & SNPs

Networks of 2496 tags

Details about network filter

SNP

Error tolerance

Program flowchart of UNEAK

Flowchart nodes: Fastq/Qseq, TagCount, Network filter, TagPair, TBT (Byte/Bit), MapInfo, Optional filters, HapMap

TagPair (Long, Long, Integer): Seq, Seq, Order

MapInfo includes:
• SNP
• Seq
• Count
• Count distribution
• Heterozygote code
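
The `TagPair (Long, Long, Integer)` layout suggests that each 64 bp tag sequence is packed into two 64‐bit words. The sketch below shows one common way to do this, assuming a 2‐bit‐per‐base code (A=0, C=1, G=2, T=3), so that 32 bases fill one word. The encoding UNEAK actually uses is not shown on the slide, and `encode_tag` and `decode_tag` are hypothetical names.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code
BASES = "ACGT"
BASES_PER_WORD = 32  # 32 bases x 2 bits = 64 bits

def encode_tag(seq):
    """Pack a 64 bp sequence into two unsigned 64-bit integers."""
    if len(seq) != 2 * BASES_PER_WORD:
        raise ValueError("expected a 64 bp tag")
    words = []
    for start in (0, BASES_PER_WORD):
        word = 0
        for base in seq[start:start + BASES_PER_WORD]:
            word = (word << 2) | CODE[base]  # first base ends up in the top bits
        words.append(word)
    return tuple(words)

def decode_tag(words):
    """Inverse of encode_tag: unpack two words back into a 64 bp string."""
    seq = []
    for word in words:
        seq.extend(BASES[(word >> shift) & 3] for shift in range(62, -2, -2))
    return "".join(seq)

tag = "ACGT" * 16
packed = encode_tag(tag)
```

Python integers are unbounded, so the words here are unsigned. A language with signed 64‐bit longs would store the same bit patterns but may display some of them as negative numbers.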

Pipeline validated with maize inbred linkage population – network filter

Evaluation criteria: allele frequency distribution; single‐site rate (BLAST to maize)

Step 1, pairwise alignment of tags: single‐site rate 23.3%

Step 2, network filter: single‐site rate 85.0%

[Figure: allele frequency distribution after each step; x‐axis: allele frequency; y‐axis: proportion of SNPs]

Pipeline validated with maize inbred linkage population – SNP validation

LD distribution (SNPs against 1106 markers)

Alignment (B97 tags against B97 shotgun genome from Maize HapMap2 data): 93.2% of SNPs are polymorphic

[Figure: LD distribution; x‐axis: LD (r2); y‐axis: proportion of SNPs; annotations: 0.2, 92.2%]
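
An LD check of a SNP against a marker panel can be sketched as below. It assumes r2 is the squared Pearson correlation between 0/1‐coded genotypes of inbred lines, and that each SNP is scored by its best r2 against any marker. The slide does not state how LD was computed, and `r_squared` and `max_r2` are hypothetical helpers.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors.

    Genotypes are coded 0/1 (the two parental alleles of an inbred line);
    positions with a missing call (None) in either vector are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic vector: r2 is undefined, reported as 0 here
    return sxy * sxy / (sxx * syy)

def max_r2(snp, markers):
    """Highest r2 between one SNP and any marker in the panel."""
    return max(r_squared(snp, m) for m in markers)

snp = [0, 0, 1, 1, 0, 1, None, 1]
markers = [
    [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the SNP wherever it is called
    [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly related to the SNP
]
best = max_r2(snp, markers)
```

A SNP whose best r2 against the panel is high is in LD with at least one known marker, which supports it being a real, mappable polymorphism.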

Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)

GWAS and GS require high‐density markers to accelerate breeding

SNP discovery

Genome Wide Association Study (GWAS)

Genomic Selection (GS)

Accelerate switchgrass breeding

Challenges and goals

Challenges
• No reference genome
• Multiple ploidy levels (4X, 6X and 8X)
• Highly heterozygous

Goals
• Discover high‐density SNPs
• Construct linkage disequilibrium (LD) map
• Evaluate population structure
• Reconstruct phylogeny

Switchgrass data set

Linkage populations
• Full‐sib population, n = 130 individuals
• Half‐sib population, n = 168 individuals

Association populations
• 66 diverse populations, n = 540 individuals
• Mostly northern‐adapted, Upland populations and cultivars

350 GB sequence; 1.2 million SNPs generated!

Tetraploid switchgrass behaves like a diploid

Allele frequency in full-sib population

[Figure: x‐axis: allele frequency; y‐axis: proportion of SNPs; peaks labeled 1:3, 1:1 and 3:1; cross types labeled AA×Aa, AA×aa, Aa×Aa and aa×Aa]

F1: most informative markers to construct linkage map
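
The expected allele frequencies for the four cross types can be worked out directly. The sketch below assumes diploid (disomic) Mendelian segregation and reads the labels 1:3, 1:1 and 3:1 as ratios of allele a to allele A among the F1. That reading is an interpretation of the figure labels, and `offspring_allele_freq` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def offspring_allele_freq(parent1, parent2, allele="a"):
    """Expected frequency of `allele` among F1 offspring of a diploid cross.

    Each parent is a two-letter genotype such as "Aa"; every combination of
    one gamete from each parent is equally likely.
    """
    combos = list(product(parent1, parent2))
    total = Fraction(0)
    for g1, g2 in combos:
        # Fraction of this offspring's two alleles that are `allele`.
        total += Fraction((g1 + g2).count(allele), 2)
    return total / len(combos)

crosses = [("AA", "Aa"), ("AA", "aa"), ("Aa", "Aa"), ("aa", "Aa")]
expected = {f"{p1}x{p2}": offspring_allele_freq(p1, p2) for p1, p2 in crosses}
```

Under these assumptions AA×Aa gives a frequency of 1/4 for allele a (ratio 1:3), AA×aa and Aa×Aa give 1/2 (ratio 1:1), and aa×Aa gives 3/4 (ratio 3:1). Peaks at those frequencies are what a diploid‐like segregation pattern would produce.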

50,000 SNPs

18 Linkage groups perfectly match the chromosome number of switchgrass

Correlation of linkage groups

Can we order the SNPs?

Yes, use synteny

3,000 high coverage SNPs

Linkage groups perfectly match syntenic chromosomes of foxtail millet (Setaria italica)

• Small (490 Mb) genome, diploid, n=9
• 13 million years divergent from switchgrass
• 10% of switchgrass SNPs map to the foxtail millet genome
• Constructed two framework linkage maps of 18 groups (3,224 and 4,001 markers)
• 42K paternal map and 47K maternal map

[Figure axes: linkage groups of switchgrass; chromosomes of foxtail millet]
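
Ordering SNPs by synteny can be sketched as follows. The sketch assumes each SNP that aligns to the related genome yields one (chromosome, position) hit, and that a linkage group is assigned to the chromosome receiving most of its hits. The input layout, the majority rule, and the `order_by_synteny` helper are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter, defaultdict

def order_by_synteny(hits):
    """Order SNPs within each linkage group by position on a related genome.

    `hits` is an iterable of (snp_id, linkage_group, chrom, pos) tuples, one
    per SNP that aligned to the related genome. Each linkage group is assigned
    to the chromosome with the most hits, and only SNPs on that chromosome
    are ordered; the rest are dropped as off-target.
    """
    by_group = defaultdict(list)
    for snp_id, group, chrom, pos in hits:
        by_group[group].append((chrom, pos, snp_id))

    ordered = {}
    for group, rows in by_group.items():
        majority_chrom = Counter(c for c, _, _ in rows).most_common(1)[0][0]
        kept = sorted((pos, snp) for c, pos, snp in rows if c == majority_chrom)
        ordered[group] = (majority_chrom, [snp for _, snp in kept])
    return ordered

# Hypothetical hits; identifiers and coordinates are made up for illustration.
hits = [
    ("snp1", "LG1", "chr1", 5_000_000),
    ("snp2", "LG1", "chr1", 1_200_000),
    ("snp3", "LG1", "chr7", 300_000),   # off-target hit, dropped
    ("snp4", "LG2", "chr1", 9_000_000),
    ("snp5", "LG2", "chr1", 2_500_000),
]
order = order_by_synteny(hits)
```

With 18 linkage groups and n=9 in foxtail millet, two linkage groups may point to the same chromosome, as LG1 and LG2 do in this toy input.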

Upland and lowland ecotypes clearly separate in phylogeny

[Figure: phylogeny with Upland and Lowland clades. Detail labels: Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich prairie, WI (two entries); WS4U]

Ploidy level resolves into distinct groups

[Figure: phylogeny with groups labeled Upland 8X (three groups), Upland 4X, and Lowland 4X (two groups)]

Ploidy level identified by flow cytometry (Costich et al. 2010. Plant Genome)

Geography shows isolation by distance

[Figure: groups labeled Upland 4X North; Upland 8X East; Lowland 4X Northeast; Upland 8X South; Upland 8X West; Lowland 4X South]

Upland 4X arose from Upland 8X

[Figure, panels a and b: NJ tree using 3,000 markers and NJ tree using 29,221 markers. Labels: Foxtail millet (outgroup); Upland; Lowland; Upland 8X West; Upland 4X North; Upland 8X East; Lowland 4X Southeast; Lowland 4X South; Upland 8X South. Node values: 96, 58, 100, 100, 100, 87, 16, 15, 66, 61]

Reduced diversity in Upland 4X compared with Upland 8X

[Figure: MDS plot; axes: Coordinate 1, Coordinate 2; groups: Upland 8X West, Upland 4X North, Upland 8X East, Upland 8X South]
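
NJ trees and MDS plots both start from a matrix of pairwise genetic distances between samples. The sketch below shows one simple way to build such a matrix, assuming diploid allele dosages (0, 1 or 2) and a mean absolute dosage difference as the distance. The slides do not state which distance measure was used, and the sample names and the `pairwise_distance` helper are hypothetical.

```python
def pairwise_distance(genotypes):
    """Mean per-site allele-dosage difference between every pair of samples.

    `genotypes` maps sample name -> list of dosages (0, 1 or 2 copies of the
    alternate allele, or None for a missing call). Differences are scaled to
    the range 0..1, and sites missing in either sample are skipped.
    """
    names = sorted(genotypes)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = [
                abs(x - y) / 2
                for x, y in zip(genotypes[a], genotypes[b])
                if x is not None and y is not None
            ]
            d = sum(diffs) / len(diffs) if diffs else float("nan")
            dist[a][b] = dist[b][a] = d
    return dist

# Toy genotypes for three hypothetical samples at five sites.
genotypes = {
    "upland_1": [0, 0, 2, 1, None],
    "upland_2": [0, 1, 2, 1, 2],
    "lowland_1": [2, 2, 0, 1, 0],
}
dist = pairwise_distance(genotypes)
```

In this toy data the two upland samples are close to each other and both are far from the lowland sample, which is the kind of structure a tree or MDS plot would then display. Dosage coding for 8X samples would need more than three states, which this sketch does not cover.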

Migration paths of switchgrass

[Figure: groups labeled Upland 4X North; Upland 8X East; Lowland 4X Northeast; Upland 8X South; Upland 8X West; Lowland 4X South]

Summary
• An effective SNP calling pipeline has been developed
• It works well for non‐reference, heterozygous, and polyploid species
• 1.2 million high‐density SNPs discovered for GWAS
• Tetraploid switchgrass behaves like a diploid
• Synteny‐based SNP maps constructed
• Robust phylogeny concurs well with ecotype, ploidy level and geographic distribution of switchgrass
• Data suggest that Upland 4X arose from Upland 8X

Future Direction

Putting it all together: GWAS and GS

• Flowering time
• Plant height
• Leaf length and width
• Standability
• Biomass quality traits

Linkage populations

Association populations

Caldwell Field, Cornell U, Ithaca, NY

Acknowledgements

Project Manager: Denise Costich (USDA‐ARS, Cornell)

PIs: Edward Buckler (USDA‐ARS, Cornell), Michael Casler (USDA‐ARS, UW‐Madison), Jerome Cherney (Cornell)

Bioinformatics: Dallas Kroon

Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu, Moira Sheehan

Statistics: Alex Lipka

Field: Ken Paddock, Nick Lepak, Nick Kaczmar

Institute for Genomic Diversity (Cornell)

Supported by DOE (including JGI), USDA, and NSF