Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf ·...

50
Genotyping By Sequencing (GBS) Method Overview Sharon E. Mitchell Institute for Genomic Diversity Cornell University http://www.maizegenetics.net/

Transcript of Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf ·...

Page 1: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Genotyping By Sequencing (GBS) Method Overview

Sharon E. MitchellInstitute for Genomic Diversity

Cornell University

http://www.maizegenetics.net/

Page 2: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

•Background/Goals

•GBS lab protocol

•Illumina sequencing review

•GBS adapter system

•How GBS differs from RAD

•Modifying GBS for different species

•GBS Workflow

Topics Presented

Page 3: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Background

Genotyping by sequencing (GBS) in any large genome species requires reduction of genome complexity.

II.  Restriction Enzymes (REs)I.  Target enrichment

•Long range PCR of specific genes or genomic subsets

•Molecular inversion probes

•Sequence capture approaches   hybridization‐based (microarrays) 

*Technically less challenging*

• Methylation sensitive REs filter out repetitive genomic fraction

Page 4: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

QTL are often located in non‐coding regionsVgt1, Tb, B regulatory regions 60‐150kb from gene

Exon Exon Exon Exon

Exon captureMap large numbers ofgenome‐wide markers

QTL

Page 5: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Create inexpensive, robust multiplex 

sequencing protocol

Low‐coverageSequencing Illumina GA

Informatics Pipelin

Anchor markers across the genomeImpute missing data

Combine genotypic & phenotypic data for GS and GWAS

Our goal is to create a public genotyping/informatics platform based on next‐generation sequencing

Page 6: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Open Source

• Method available for anyone to use / modify.• Analysis pipeline details and code are public.• Promote dataset compatibility.• Method published in PLoS ONE to promote accessibility.

• Genotype calls publically available.

Page 7: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

< 450 bp

Restriction site(   ) sequence tag

Loss of cut site

Sample1

Overview of Genotyping by Sequencing (GBS)

• Focuses NextGen sequencing power to ends of restriction fragments• Scores both SNPs and presence/absence markers

Sample2

Page 8: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

• Reduced sample handling• Few PCR & purification steps• No DNA size fractionation• Efficient barcoding system• Simultaneous marker discovery & genotyping

• Scales very well

GBS is a simple, highly multiplexed system for constructing libraries for next‐gen sequencing

Page 9: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)

1. Plate DNA & adapter pair

Now at 384‐plex

Page 10: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS Adapters and Enzymes

BarcodeAdapter

“Sticky Ends”

Barcode(4‐8 bp)

CommonAdapter

P1 P2

ApeKI G CWGCPstI CTGCA GEcoT22I    ATGCA T

5’ 3’

Restriction Enzymes

Illumina Sequencing Primer 2 

Illumina Sequencing Primer 1 

Page 11: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)

............. ...

..... ..................... ........... ..

..

. .. ...

.

...

..... ......... .. .........

.....

...

. .. ....... ......

...

.. .......

...

. .

..

.... . .. . ....

1. Plate DNA & adapter pair

5. PCR

Primers

2. Digest DNA with RE3. Ligate adapters

Clean‐up4. Pool 96 samples

Page 12: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

. .

..

..

.

....

.. ...

...........

.

.

.. ........

.

.

. ... ...

..

.

.

..

.

.

.

.....

.

.

...

... ..

..

.

.

.

.

...

.. ...

.. ...

......

.

..

.....

. ......

....

... ..

...

.. ..

. .

..

..

.

........

.

. .

P1 P2 P1 P2

.

O1 O2

PCR

Insert

Pooled Digestion/LigationReactions  (n=94)

GBS“Library”

PRC primers:

Insert

Page 13: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)

............. ...

..... ..................... ........... ..

..

. .. ...

.

...

..... ......... .. .........

.....

...

. .. ....... ......

...

.. .......

...

. .

..

.... . .. . ....

1. Plate DNA & adapter pair

5. PCR

Primers

2. Digest DNA with RE3. Ligate adapters4. Pool 96 DNA aliquots 

6.  Evaluate fragment sizes

Clean‐upClean‐up

Page 14: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Perform Titration to Minimize Adapter Dimers Before Sequencing

Size Standards

LibraryAdapterDimer

1500 bp15 bp

NOTE:  Done once with a small number of samples.Adapter dimers constitute only 0.05% of raw sequence reads

Fluo

rescen

seintensity

Time Time

Optimal adapter amount

Page 15: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Small Fragments are Enriched in GBS Libraries Total Fragm

ents

0%

5%

10%

15%

20%

25%

30%

35%

40%

0 50 100

150

200

250

300

350

400

450

500

550

600

650

700

750

800

850

900

950

1000

1000

0000

REFGENOME

IBM

ApeKI fragment size (bp) >10000

B73 RefGen v1IBM (B73 X Mo17) RILs

Page 16: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

0

100000

200000

300000

400000

500000

600000

700000

800000

900000

1000000

384‐plex GBS Results for Maize 

Reads

Mean read count per line = 528,000c.v.  = 0.22

Page 17: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Flowcell8 channels

Solid Phase Oligos

Illumina Sequencing by Synthesis Review

Page 18: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Flow cell with bound oligos

Denatured “Library”

Cluster Formation Amplifies Sequencing Signal

Bridge Amplification

Linearization

Page 19: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

ACGTGGC

ACGT

GC

T

G

TG

P1 primer

C

ACGT

GC

TG

GA

ACGT

GC

TGC

G

T TG TGC TGCA

Sequencing by Synthesis

FlowcellHiSeq 2000

Page 20: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp
Page 21: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GACGT

GC

T

G

ACGTGGC

T

C

ACGT

GC

TG

G

T N

ACGTGGC

T ACGTGGC

T ACGTGGC

T

“In Phase” “Out of Phase”

Page 22: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Read 1

GBS captures barcode and insert DNA sequence in single read

Insert DNA

Barcode

Sequencing Primer 1

Sequencing Primer 2

Illumina

Read 1

Read 2

Sequencing Primer 1

BarcodeRE cut‐site

Insert DNA

GBS

Page 23: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Variable Length GBS Barcodes Solves Sequence Phasing Issues

•First 12 nt used to calculate phasing.•Algorithm assumes random nt distribution.•Incorrect phasing causes incorrect base calls.

Barcode

Illumina

Insert DNA

•Good design and modulating the RE cut site position with variable length barcodes produces even nt distribution. 

BarcodeRE cut‐site

GBS

Insert DNA

Read 2

Read 1

Page 24: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Invariant GBS barcodes cause loss of signal intensity.

Page 25: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Successful GBS sequencing run.

Page 26: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS Adapter Design

Page 27: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Barcode Design Considerations• Barcode sets are enzyme specific

– Must not recreate the enzyme recognition site– Must have complementary overhangs

• Sets must be of variable length• Bases must be well balanced at each position• Must different enough from each other to avoid confusion if 

there is a sequencing error.– At least 3 bp differences among barcodes.

• Must not nest within other barcodes• No mononucleotide runs of 3 or more bases

http://www.deenabio.com

Page 28: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Most significant GBS technical issue?

DNA quality

Page 29: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Different Ends 50%

Same Ends 50%

100%

Ligation

LigationPCR

GBSStandard

GBS does not use standard “Y” adapters

Page 30: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Denatured “library”

Flow Cell with Bound Oligos

Same‐ended Fragments Do Not Form Clusters

Bridge Amplification

Cluster Formation  &“Linearization”

No P1 bindingsite

Cleaved from surface

Page 31: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

< 450 bp

Restriction site(   ) sequence tag

Loss of cut site

Sample1

GBS vs. RAD

• Focuses NextGen sequencing power to ends of restriction fragments• Scores both SNPs and presence/absence markers

Sample2

Page 32: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Digest

Ligate adapters

Pool

Random shear

Size select

Ligate Y adapters

PCR

RADGBS

ReferenceDavey et al. 2011

Page 33: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Modifying GBS

Considerations for using GBS with new species and / or different enzymes.

Page 34: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Why Modify the GBS Protocol?

• More markers• Fewer markers (deeper sequence coverage per locus)

• Increase multiplexing• More genome appropriate (avoid more repetitive DNA classes)

• Other novel applications (i.e., bisulfitesequencing).

Page 35: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Genome Sampling Strategies Vary by Species

Dependent on Factors that Affect Diversity:

•Mating System(Outcrosser, inbreeder, clonal?)

•Ploidy(Haploid, diploid, auto‐ or allopolyploid?)

•Geographical Distribution(Island population, cosmopolitan?)

Page 36: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Other Factors

• Genome size– The size of the genome has some bearing on the size of the fragment pool.

– Amount of repetitive DNA directly correlated with genome size.

• Genome composition– The base composition of the genome can affect the frequency and distribution of the cut sites.

Page 37: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Sampling large genomes with methylation‐sensitive restriction enzymes.

5‐base cutter

6‐base cutter

MethylatedDNA

UnmethylatedDNA

GBS LibrarySequenced Fragments

Page 38: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Scrub jay

Vole Giantsquid

Deer Mouse 

TunicateYeast

Solitary Bee

Grape Maize Cacao

Rice

Raspberry

Barley

Cassava

Goose

Sorghum

Cassava

Shrub willow

Optimizing GBS in New Species

Page 39: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Choosing Appropriate Restriction Enzymes

“Life is like a bowl of chocolates”.

Page 40: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Deer Mouse

ApeKI

PstI

Page 41: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Goose

PstI

Giant Squid

ApeKI

Page 42: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

B

Bovine GBS LibrariesPstI ApeKI

EcoT22I

X

Page 43: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

PstI ApeKI

Total  unique tags 442,813 1,138,366

Total SNPs 63,697 13,328

Tags with SNPs 14% 1%

Results for PstI and ApeKI libraries

Page 44: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

single copy92%

single copy93%

PstI Library ApeKI Library

2‐5 copies >5 copies

Proportion of sequence tags by copy number

Page 45: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

0%

10%

20%

30%

40%

50%

60%

70%

80%

1 2‐10 11‐100 101‐1000 1001‐10000 >10,0000

Proportion of sequences  by copy number

ApekIPstI

Page 46: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

0

20000000

40000000

60000000

80000000

100000000

120000000

140000000

160000000

180000000

0 1000 2000 3000

0

10000000

20000000

30000000

40000000

50000000

60000000

70000000

80000000

0 1000 2000 3000 4000 5000 6000 7000 8000

Bos taurusPstI SNPsChromosome 1

Sorghum bicolor ApeKI SNPsChromosome 1

Position

SNPs

Page 47: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

GBS SNP calls in Sorghum bicolorLots of Missing Data

091211.c10.hmp alleles chrom SC283 BR007B RIL_100 RIL_101 RIL_102 RIL_103 RIL_104 RIL_105 RIL_106 RIL_107 RIL_108 RIL_109 RIL_10 RIL_110 RIL_111 RIL_112 RIL_113 RIL_114 RIL_115

S10_29954 C/A 10 A N A C A N N C C N N N N N N N N C N

S10_29963 A/C 10 C N C A C N N A A N N N N N N N N A N

S10_29965 T/C 10 C N C T C N N T T N N N N N N N N T N

S10_84178 C/G 10 N N C N G N N N N N N N N N C N N N N

S10_84512 T/C 10 T T N N T N T N N N N C N N N N N N N

S10_84513 C/T 10 C C N N C N C N N N N T N N N N N N N

S10_85556 G/A 10 G A G A G N G G N N G G G R N A N R G

S10_85556 G/A 10 G N N A G A G N N G G G N N A N G G G

S10_115300 T/C 10 T C T C T C T N N T T N N N C N N T T

S10_133268 T/C 10 T C N N N N N T T N N N N N N N N N N

S10_158101 G/A 10 G N N A G N N N N N N N G A A N G G N

S10_158101 G/A 10 G A N A G A N N N G N G N N N N G G N

S10_163724 G/A 10 G A N A N N N G N N G N N N N N G N G

S10_163929 T/G 10 G T N T G N G N N G G N G T T T N G N

S10_163931 C/T 10 T C N C T N T N N T T N T C C C N T N

S10_180445 G/A 10 G A G N G A G G G N G N G A A N G G G

S10_209049 C/G 10 N G C S C G C C C N C C N C G G N C N

S10_209231 G/A 10 N A G A G A G G N N G G G N A A G G N

S10_214941 C/T 10 C N N T N N N N C N C N N N N N N C C

S10_234260 T/G 10 T N T N T G T N T T T T T N N G T N T

S10_234260 T/G 10 T G T G T G N T T T T T T N G G T T T

S10_238937 G/A 10 N A N A N N G G G N G N N N A N N N N

S10_252729 G/T 10 G T G T G N N G G G N N N N N T G G G

S10_252730 C/T 10 C T C T C N N C C C N N N N N T C C C

S10_261151 T/C 10 T C T C T C N N T T N N T N N N T N N

S10_268392 C/A 10 N N N N A N A N N N N N N N N N N N N

S10_268393 G/A 10 N N N N A N A N N N N N N N N N N N N

S10_274414 C/T 10 N N C N N N N C T C N N C T N N C N C

S10_281272 T/C 10 T N T N N N N N N N N N N N N N N T T

S10_287523 G/T 10 G N G N N N N G G G N N N N N N N N N

S10_292020 A/T 10 A T N N A N N A A A N A A N T T A A A

S10_292020 A/T 10 A T A T N N N A N A N A N N T T A N A

S10_294189 C/T 10 N N C N C N C C C N N N C T N N C N C

S10_302638 C/T 10 C T C T C N N C N C C N C T T N C C N

S10_312560 A/C 10 A C A N A N A A A A A A A C C N A A A

S10_347848 T/C 10 T C T C T C N T T T T N T N C N T T T

Page 48: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

Missing Data Strategies

• Impute• Technical Options

– Reduce the multiplexing level– Sequence the same library multiple times

• Molecular Options– Choose less frequently cutting enzymes

Page 49: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

DNA sample info entered to webform

HTS databaseProject approved?Samples shipped

GBS libraries made

Reference genome

Non‐reference genome

Data FileSNPs/Sample

Lab Analysis Pipelines

DNA sequence data Data File

States/Sample

GBS Workflow

Page 50: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp

BioinformaticsJeff GlaubitzQi SunJames Harriman

Method DevelopmentRob ElshireEd Buckler

Sharon Mitchell

GBS Team

Laboratory/ProductionCharlotte Acharya

Wenyan ZhuLisa Blanchard