Posted on 16-Aug-2019

Applied Bioinformatics for Plant Genome Characterisation using Next-Generation Sequence Data

David Edwards

University of Queensland, Australia

Dave.Edwards@uq.edu.au

Outline

• Sequencing wheat chromosome arms

• Wheat evolution

• Chickpea chromosomal genomics

• Skim GBS based genome assembly

• Skim GBS based trait association

• Assessing gene presence/absence variation

• Extreme non-model species

Chromosome sequencing

• Isolate chromosome arms using flow cytometry

• Generate NGS libraries and paired-end (PE) Illumina data

• De novo assemble
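
The slides do not name the assembler used for the chromosome-arm data. Many short-read assemblers are built on de Bruijn graphs, so the toy sketch below illustrates that general idea only, not the pipeline used here: reads are cut into k-mers, each k-mer links its (k-1)-mer prefix to its suffix, and unbranched paths are read off as contigs. It assumes error-free reads from one strand and ignores reverse complements, coverage and repeats longer than k.

```python
from collections import defaultdict


def de_bruijn_contigs(reads, k):
    """Toy de Bruijn assembler: (k-1)-mers are nodes, k-mers are edges.

    Returns the maximal non-branching paths (unitigs) as contig strings.
    Assumes error-free, single-strand reads; isolated cycles are not reported.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}

    out_edges = defaultdict(set)   # (k-1)-mer -> successor (k-1)-mers
    in_degree = defaultdict(int)   # (k-1)-mer -> number of incoming k-mers
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].add(suffix)
        in_degree[suffix] += 1
    nodes = set(out_edges) | set(in_degree)

    def is_one_in_one_out(node):
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for start in sorted(nodes):
        if is_one_in_one_out(start):
            continue  # interior of a path; reached by walking from the path's start
        for nxt in sorted(out_edges[start]):
            contig, node = start + nxt[-1], nxt
            # Extend while the path is unambiguous (one way in, one way out)
            while is_one_in_one_out(node):
                node = next(iter(out_edges[node]))
                contig += node[-1]
            contigs.append(contig)
    return contigs


# Three overlapping hypothetical reads from one short sequence
reads = ["ACGTTAG", "TTAGCCA", "GCCATG"]
print(de_bruijn_contigs(reads, k=4))  # ['ACGTTAGCCATG']
```

Real chromosome-arm assemblies also have to handle sequencing errors, both strands and the very high repeat content of the wheat genome, which is why dedicated assemblers are used in practice.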

Wheat genome

17 billion bases

Source: http://www.jic.ac.uk/staff/graham-moore/wheat_meiosis.htm

Mapping reads to reference genomes

[Figure: mapping of reads to reference genomes; labels 1 to 12]

Sequencing wheat chromosome arms

[Figure: wheat (Ta) chromosome arm 7DS compared with Brachypodium (Bd) chromosomes 1 and 3]

www.wheatgenome.info

Berkman, et al., Plant Biotechnology Journal (2011)

Wheat genome evolution

[Figure: AA and BB genomes form tetraploid AABB about 50,000 years ago; AABB and DD form hexaploid AABBDD about 10,000 years ago]

A little history

Source: http://www.nap.edu/openbook.php?record_id=12692&page=94

Wheat genome evolution

• When two genomes come together, genes are lost, as two copies may not be required or may even be harmful

• Can we see differential gene loss between the three wheat genomes?

Figure 1: Wheat genome evolution. The number of conserved genes within the syntenic builds for chromosomes 7A, 7B and 7D.
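
Differential gene loss between the homoeologues can be counted with simple set operations. The sketch below is a minimal illustration, assuming each chromosome's syntenic build has already been reduced to the set of expected (syntenic) genes it still carries; the gene IDs are hypothetical. A gene counts as "lost only here" when it is missing from one homoeologue but still present on both others.

```python
def gene_loss_summary(syntenic_genes, retained):
    """Count conserved and lost syntenic genes on each homoeologous chromosome.

    syntenic_genes: set of gene IDs expected from the syntenic build (hypothetical IDs)
    retained: {chromosome: set of those gene IDs still found on that chromosome}
    """
    summary = {}
    for chrom, genes in retained.items():
        present = genes & syntenic_genes
        lost = syntenic_genes - present
        # Genes still present on every other homoeologue, so this loss is specific to chrom
        kept_elsewhere = set.intersection(
            *(retained[other] for other in retained if other != chrom)
        )
        summary[chrom] = {
            "conserved": len(present),
            "lost": len(lost),
            "lost_only_here": len(lost & kept_elsewhere),
        }
    return summary


# Hypothetical example with six expected genes
syntenic = {"g1", "g2", "g3", "g4", "g5", "g6"}
retained = {
    "7A": {"g1", "g2", "g3", "g5"},
    "7B": {"g1", "g2", "g4", "g5", "g6"},
    "7D": {"g1", "g3", "g4", "g5", "g6"},
}
for chrom, counts in gene_loss_summary(syntenic, retained).items():
    print(chrom, counts)
```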

Wheat genome evolution

• Are there differences in the types of genes lost?

• Conservation of highly networked genes under neutral selection

• Strong selection pressure breaks networks and leads to loss of networked genes
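
One way to test these bullets is to compare how connected retained and lost genes are. The sketch below assumes a gene network given as an edge list and a set of lost genes (both hypothetical), and takes "highly networked" to mean high node degree, which is one simple choice among several centrality measures. A higher mean degree for retained genes would be consistent with the first bullet; similar or lower values would fit the second.

```python
from collections import defaultdict


def average(values):
    return sum(values) / len(values) if values else float("nan")


def mean_degree_retained_vs_lost(edges, lost_genes):
    """Compare the connectivity (node degree) of retained and lost genes.

    edges: iterable of (gene_a, gene_b) interactions (hypothetical network)
    lost_genes: set of genes no longer present in the genome being examined
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    retained = [d for gene, d in degree.items() if gene not in lost_genes]
    lost = [d for gene, d in degree.items() if gene in lost_genes]
    return average(retained), average(lost)


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
kept_mean, lost_mean = mean_degree_retained_vs_lost(edges, lost_genes={"c", "e"})
print(f"retained: {kept_mean:.2f}, lost: {lost_mean:.2f}")  # retained: 2.33, lost: 1.50
```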

7A gene network

7B gene network

7D gene network

Wheat genome evolution

[Figure: AA and BB form AABB about 50,000 years ago; AABB and DD form AABBDD about 10,000 years ago; annotated "Neutral selection" and "Strong selection"]

SGSautoSNP

Australian resequencing

4 million SNPs

Chromosome   SNPs        SNPs/Mb
7A           1,486,040   4,077
7B           1,860,295   4,737
7D           671,976     1,939
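
The table above gives SNP counts and densities but not the sequence length they were measured over. Assuming density = SNP count / analysed length in Mb (an assumption, since the slide does not define it), the short calculation below recovers the implied length per chromosome, the D-genome density relative to A and B, and checks that the three chromosomes account for the ~4 million SNPs quoted.

```python
# SNP counts and densities from the table (Australian resequencing, chromosomes 7A/7B/7D)
snp_counts = {"7A": 1_486_040, "7B": 1_860_295, "7D": 671_976}
snps_per_mb = {"7A": 4077, "7B": 4737, "7D": 1939}

total_snps = sum(snp_counts.values())

# Assumption: density = count / analysed length (Mb), so length = count / density
implied_mb = {chrom: snp_counts[chrom] / snps_per_mb[chrom] for chrom in snp_counts}

# D-genome SNP density relative to the A and B genomes
d_vs_a = snps_per_mb["7D"] / snps_per_mb["7A"]
d_vs_b = snps_per_mb["7D"] / snps_per_mb["7B"]

print(f"total SNPs: {total_snps:,}")  # total SNPs: 4,018,311
for chrom, mb in implied_mb.items():
    print(f"{chrom}: ~{mb:.1f} Mb analysed")
print(f"7D density is {d_vs_a:.2f}x that of 7A and {d_vs_b:.2f}x that of 7B")
```

The roughly half-as-dense D genome matches the slides' contrast between genomes with and without genetic exchange.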

Wheat genome evolution

[Figure: AA and BB form AABB about 50,000 years ago; AABB and DD form AABBDD about 10,000 years ago; annotated "Genetic exchange" and "No genetic exchange"]

SNP matrix

Number of SNPs between each pair of 16 wheat cultivars (lower triangle; column numbers refer to the numbered rows)

                      1        2        3        4        5        6        7        8        9       10       11       12       13       14       15       16
 1 AC Barrie          0
 2 Alsen        194,725        0
 3 Baxter       328,294  246,218        0
 4 Chara        592,193  438,075  146,171        0
 5 Drysdale     429,530  319,401  392,632  730,606        0
 6 Excalibur    346,557  273,217  324,087  567,179  367,279        0
 7 Gladius      529,898  327,659  472,457  906,611  616,253  491,885        0
 8 H45          385,753  265,113  339,227  627,589  298,414  280,576  519,690        0
 9 Kukri        245,356  208,666  290,506  541,524  428,134  318,029  480,575  345,358        0
10 Pastor       302,731  289,053  340,269  603,323  336,029  284,559  552,119  309,025  302,231        0
11 RAC875       412,818  257,630  390,967  722,089  429,038  368,152  158,973  386,145  418,037  375,137        0
12 VolcaniDDI   508,175  413,676  412,553  808,658  696,467  600,478  813,067  633,916  498,017  586,694  643,205        0
13 Westonia     354,599  276,490  310,192  623,591  500,461  362,800  557,464  405,842  346,683  349,542  403,411  678,631        0
14 Wyalkatchem  525,289  341,043  433,228  800,300  560,759  327,888  386,213  449,614  436,777  442,941  235,924  800,137  505,345        0
15 Xiaoyan 54   458,214  332,986  368,604  761,864  540,264  324,881  696,677  377,053  401,191  413,462  522,021  897,807  622,449  569,223        0
16 Yitpi        544,440  328,216  468,743  968,088  690,017  548,694  233,539  587,310  530,687  580,060  287,648  951,537  654,967  444,084  844,785        0

Phylogenetic tree
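
The slides do not state how the phylogenetic tree was built from the SNP matrix. As an illustration only, assuming the pairwise SNP counts are used directly as distances and clustered with UPGMA (average linkage; neighbour joining is a common alternative), the sketch below builds a tree for four cultivars whose values are copied from the matrix.

```python
def upgma(names, dist):
    """UPGMA (average-linkage) clustering of a symmetric distance matrix.

    Returns a Newick-style topology string (no branch lengths) and the
    distance at which each merge happened.
    """
    # Active clusters: id -> (Newick label, number of leaves)
    active = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j]) for i in active for j in active if i < j}
    next_id = len(names)
    merge_distances = []

    while len(active) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        (label_i, size_i), (label_j, size_j) = active.pop(i), active.pop(j)
        merge_distances.append(dij)
        del d[(i, j)]
        # New distance = leaf-count-weighted average of the merged clusters' distances
        for k in active:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        active[next_id] = (f"({label_i},{label_j})", size_i + size_j)
        next_id += 1

    label, _ = next(iter(active.values()))
    return label, merge_distances


# Pairwise SNP counts for four cultivars, taken from the SNP matrix
names = ["Chara", "Gladius", "RAC875", "Yitpi"]
dist = [
    [0, 906_611, 722_089, 968_088],
    [906_611, 0, 158_973, 233_539],
    [722_089, 158_973, 0, 287_648],
    [968_088, 233_539, 287_648, 0],
]
tree, merges = upgma(names, dist)
print(tree)    # (Chara,(Yitpi,(Gladius,RAC875)))
print(merges)  # [158973.0, 260593.5, 865596.0]
```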

GBrowse http://wheatgenome.info/

Chickpea kabuli reference

[Figure: Kabuli reference; labels: Desi, Kabuli]

Chickpea desi vs kabuli

[Figure: Desi reference; labels: Desi, Kabuli, Desi WGS]

Ruperao et al. Plant Biotechnology Journal (in press)

Skim GBS based genome validation

• Skim GBS SNP calling

• Make metaSNPs

• Merge contigs

• Genetic map

• Compare all blocks against all

• Apply clustering
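
The last two steps, comparing all blocks against all and clustering them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual scoring: each block (for example a contig's metaSNP) is summarised as a string of parental-origin codes across the population ('T' or 'N' for the two parents, '-' for missing), linkage is measured as the fraction of lines in which two blocks disagree, and blocks are joined by single linkage below a hypothetical threshold `max_r`.

```python
from itertools import combinations


def recombination_fraction(g1, g2):
    """Fraction of lines, genotyped in both blocks, whose parental origin differs."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a != "-" and b != "-"]
    if not pairs:
        return 1.0  # no shared information: treat as unlinked
    return sum(a != b for a, b in pairs) / len(pairs)


def linkage_groups(blocks, max_r=0.25):
    """Single-linkage clustering of blocks via union-find (max_r is a hypothetical threshold)."""
    parent = {name: name for name in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare all blocks against all and join every tightly linked pair
    for a, b in combinations(blocks, 2):
        if recombination_fraction(blocks[a], blocks[b]) < max_r:
            parent[find(a)] = find(b)

    groups = {}
    for name in blocks:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical blocks genotyped in five lines
blocks = {"c1": "TTNNT", "c2": "TT-NT", "c3": "TTNNN", "c4": "NTNTN", "c5": "NTNTT"}
print(linkage_groups(blocks))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

Ordering blocks within each group (as in the "after ordering" slide) would be a separate step, for example minimising the summed recombination fraction between neighbours.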

Skim GBS

• Determine SNPs by sequencing parents and running SGSautoSNP

• Skim sequence the segregating population at low coverage

• Map reads to the reference genome

• Call genotypes where reads cover previously defined SNPs

• Impute and clean to define haplotype blocks

Genotype calling

Call genotypes at previously predicted SNPs

[Figure: example SNPs C/A and T/C]
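
Calling genotypes only at previously predicted SNPs amounts to a pileup restricted to known positions. The sketch below is a minimal illustration under simplifying assumptions, not the pipeline's actual caller: alignments are ungapped and given as (start, sequence) with hypothetical 0-based coordinates, bases that match neither allele are ignored, and a SNP with no covering read is left as None for later imputation.

```python
from collections import Counter


def call_known_snps(snps, reads, min_reads=1):
    """Call genotypes only at SNP positions defined beforehand from the parents.

    snps:  {position: (allele_1, allele_2)}, 0-based reference positions
    reads: [(start, sequence)], ungapped alignments to the reference
    Returns {position: allele, "a1/a2" if both alleles are seen, or None if uncovered}.
    """
    pileup = {pos: Counter() for pos in snps}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if start + offset in pileup:
                pileup[start + offset][base] += 1

    calls = {}
    for pos, (a1, a2) in snps.items():
        n1, n2 = pileup[pos][a1], pileup[pos][a2]
        if n1 + n2 < min_reads:
            calls[pos] = None  # no usable read: leave for imputation
        elif n1 and n2:
            calls[pos] = f"{a1}/{a2}"  # both alleles observed
        else:
            calls[pos] = a1 if n1 else a2
    return calls


# Hypothetical SNP positions and skim reads
snps = {10: ("C", "A"), 25: ("T", "C"), 40: ("G", "A")}
reads = [(5, "ACGTACGTAAGT"), (20, "GGGGGCAAAA")]
print(call_known_snps(snps, reads))  # {10: 'C', 25: 'C', 40: None}
```

At skim coverage most SNPs are covered by zero or one read, which is why the missing calls are filled by imputation rather than by deeper sequencing.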

Haplotype blocks

Line   Alleles
TN1    A G G T C C A G G A T A A T
TN2    A G G T C C A G G A T A A T
TN3    T C C A G G C G G A T A A T
TN4    A G G T C C A G G A T A A T
TN5    T C C A G G C T C G C G G C
TN6    A G G T C C A G G A T A A T
TN7    T C C A G G C T C G C G G C
T      A G G T C C A G G A T A A T
N      T C C A G G C T C G C G G C

[Figure panels: Pre-imputation; After imputation and cleaning]
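
The imputation and cleaning step can be sketched on line TN3 from the haplotype-block slide, treating rows T and N as the two parental haplotypes (an assumption consistent with every TN line being a mosaic of them). The sparse read alleles, including one deliberate error, are hypothetical, and the fill-between-agreeing-calls and singleton rules are a simple illustrative choice, not the method used in the talk.

```python
def parent_origin(read_alleles, parent_t, parent_n):
    """Convert observed alleles (None = no read) into parental origin codes 'T'/'N'."""
    calls = []
    for obs, t, n in zip(read_alleles, parent_t, parent_n):
        if obs is None or t == n:
            calls.append(None)   # not covered, or the parents do not differ here
        elif obs == t:
            calls.append("T")
        elif obs == n:
            calls.append("N")
        else:
            calls.append(None)   # allele matches neither parent: treat as missing
    return calls


def impute_and_clean(calls):
    """Fill gaps between agreeing calls, then overwrite single calls that contradict both neighbours."""
    filled = list(calls)
    known = [i for i, c in enumerate(filled) if c is not None]
    for left, right in zip(known, known[1:]):
        if filled[left] == filled[right]:
            for i in range(left + 1, right):
                filled[i] = filled[left]
    # Gaps flanked by different parents stay None: the crossover lies somewhere inside
    cleaned = list(filled)
    for i in range(1, len(filled) - 1):
        before, after = filled[i - 1], filled[i + 1]
        if before is not None and before == after and filled[i] != before:
            cleaned[i] = before  # isolated disagreement treated as a genotyping error
    return cleaned


# Parental haplotypes from the slide; hypothetical sparse reads for TN3 (error at position 11)
parent_t = "AGGTCCAGGATAAT"
parent_n = "TCCAGGCTCGCGGC"
reads = ["T", None, "C", None, None, "G", None, "G", None, None, "T", "G", "A", "T"]
haplotype = impute_and_clean(parent_origin(reads, parent_t, parent_n))
print("".join(c or "-" for c in haplotype))  # NNNNNN-TTTTTTT
```

TN3 on the slide switches from the N to the T haplotype between the seventh and eighth SNP; the sketch recovers both blocks, corrects the erroneous call and leaves the uncovered SNP next to the crossover unresolved.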

Clustering

Linkage group 1 (LG 1) after ordering

Trait association

Disease resistance in canola

Drought tolerance in chickpea
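
The slides name the traits but not the association method. One simple approach for a biparental population genotyped by skim GBS, assumed here purely for illustration, is a single-marker scan: at each marker, compare the trait means of lines carrying each parental allele using Welch's t statistic. The marker names, genotypes and phenotype values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for a difference in means between two groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def single_marker_scan(genotypes, trait, min_per_class=2):
    """Score each marker by contrasting lines carrying parent T vs parent N alleles.

    genotypes: {marker: string of 'T', 'N' or '-' (missing), one character per line}
    trait:     list of phenotype values in the same line order
    """
    scores = {}
    for marker, calls in genotypes.items():
        t_values = [y for g, y in zip(calls, trait) if g == "T"]
        n_values = [y for g, y in zip(calls, trait) if g == "N"]
        if len(t_values) >= min_per_class and len(n_values) >= min_per_class:
            scores[marker] = welch_t(t_values, n_values)
    return scores


# Hypothetical phenotypes (e.g. a disease score) for six lines and two markers
trait = [10, 11, 4, 12, 5, 3]
genotypes = {"m1": "TTNTNN", "m2": "TNTNTN"}
for marker, t in single_marker_scan(genotypes, trait).items():
    print(marker, round(t, 2))
```

A large absolute t at m1 flags it as linked to the trait; with many markers, a multiple-testing correction or permutation threshold would be needed before calling an association.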

Gene loss

Cabbage

Brussels sprout

Cauliflower

Kale

Kohlrabi

Wild B. oleracea

Brassica pan-genome

• List all Brassica genes

• Classify them as essential (conserved) or optional (presence/absence variation, PAV)

• Associate PAVs with traits

• Relate the abundance of optional genes to fitness
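
The essential/optional split can be expressed directly on a gene presence matrix. A minimal sketch, assuming presence calls per line are already available; the gene IDs are hypothetical and the line names follow the B. oleracea slides. Essential genes are those found in every line, optional genes are the rest, and the frequency of each optional gene is the starting point for relating PAVs to traits or fitness.

```python
def classify_genes(presence):
    """Split a pan-genome into essential (core) and optional (PAV) genes.

    presence: {line: set of gene IDs detected in that line}
    Returns (essential, optional, number of lines carrying each optional gene).
    """
    all_genes = set().union(*presence.values())
    essential = set.intersection(*presence.values())
    optional = all_genes - essential
    frequency = {gene: sum(gene in genes for genes in presence.values()) for gene in optional}
    return essential, optional, frequency


# Hypothetical gene IDs for three B. oleracea morphotypes
presence = {
    "Cabbage": {"g1", "g2", "g3"},
    "Kale": {"g1", "g2", "g4"},
    "Kohlrabi": {"g1", "g2", "g3", "g4", "g5"},
}
essential, optional, frequency = classify_genes(presence)
print(sorted(essential), sorted(optional))  # ['g1', 'g2'] ['g3', 'g4', 'g5']
```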

Seagrass

Manatee grazing on seagrass (picture by David Peart).

Manacheese?

Seagrass

GO ID        Term
GO:0018871   1-aminocyclopropane-1-carboxylate metabolic process
GO:0042218   1-aminocyclopropane-1-carboxylate biosynthetic process
GO:0009692   ethylene metabolic process
GO:0009693   ethylene biosynthetic process
GO:0043449   cellular alkene metabolic process
GO:0043450   alkene biosynthetic process
GO:1900673   olefin metabolic process
GO:1900674   olefin biosynthetic process
GO:0048447   sepal morphogenesis
GO:0048451   petal formation
GO:0048453   sepal formation
GO:0048442   sepal development
GO:0048464   flower calyx development
GO:0048446   petal morphogenesis
GO:0010044   response to aluminum ion
GO:0071281   cellular response to iron ion
GO:0010039   response to iron ion
GO:0010105   negative regulation of ethylene mediated signalling pathway
GO:0070298   negative regulation of phosphorelay signal transduction system
GO:0048441   petal development
GO:0048465   corolla development
GO:0071248   cellular response to metal ion
GO:0009963   positive regulation of flavonoid biosynthetic process
GO:0010104   regulation of ethylene mediated signalling pathway
GO:0070297   regulation of phosphorelay signal transduction system
GO:1900378   positive regulation of secondary metabolite biosynthetic process
GO:0071241   cellular response to inorganic substance
GO:0009956   radial pattern formation
GO:0010375   stomatal complex patterning
GO:0048729   tissue morphogenesis
GO:2000038   regulation of stomatal complex development

GO terms for genes lost in seagrass
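
The slides list GO terms for the lost genes without saying how they were selected. A common approach, assumed here, is a one-sided hypergeometric (Fisher-type) over-representation test per term: how surprising is it that this many lost genes carry the term, given how common it is in the background? The counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_annotated, n_lost, n_lost_annotated):
    """One-sided hypergeometric P-value for over-representation of a GO term.

    n_genes:          genes in the background set
    n_annotated:      background genes annotated with the term
    n_lost:           genes in the lost set
    n_lost_annotated: lost genes annotated with the term
    Returns P(X >= n_lost_annotated) for X ~ Hypergeometric(n_genes, n_annotated, n_lost).
    """
    total = comb(n_genes, n_lost)
    upper = min(n_annotated, n_lost)
    tail = sum(
        comb(n_annotated, k) * comb(n_genes - n_annotated, n_lost - k)
        for k in range(n_lost_annotated, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 40 carry the term,
# 500 genes were lost and 8 of those carry the term (about 1 expected by chance)
print(f"{enrichment_p_value(20_000, 40, 500, 8):.2e}")
```

In practice each term's P-value would be corrected for the number of terms tested, and related terms (such as the ethylene and ACC terms above) would be interpreted together, since they share annotated genes.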

Conclusions

• Build high quality genome assemblies

• Identify variation between genomes

• Associate genome variation with agronomic traits

• Apply diverse genomic knowledge to improve crops

Acknowledgements

Philipp Bayer

Kenneth Chan

Pradeep Ruperao

Michal Lorenc

Agnieszka Golicz

Kaitao Lai

Paul Visendi

Paula Martinez

Jenny Lee

Juan Montenegro

Paul Berkman

Jiri Stiller

Sahana Manoli

Jacqueline Batley

Alice Hayward

Emma Campbell

Jessica Dalton-Morgan

Satomi Hayashi

Reece Tollenaere

Hana Šimková

Marie Kubaláková

Jaroslav Doležel

Tim Sutton

Deepa Jaganathan

Rajeev Varshney

(and colleagues)

Martin Schliep

Rudy Dolferus

Peter Ralph

Contact:

Dave.Edwards@uq.edu.au

Acknowledgements

Kaitao Lai

Philipp Bayer

Kenneth Chan

Michal Lorenc

Agnieszka Golicz

Paul Visendi

Pradeep Ruperao

Paul Berkman

Jiri Stiller

Sahana Manoli

Jacqueline Batley

Alice Hayward

Emma Campbell

Jessica Dalton-Morgan

Satomi Hayashi

Hana Šimková

Marie Kubaláková

Jaroslav Doležel

Contact:

Dave.Edwards@uq.edu.au

Advisory Board: Jeff Bennetzen, Jose Crossa, Robert Henry, Rodomiro Ortiz, Andrew Paterson, Kadambot Siddique, Mark Sorrells, Mark Tester, Michael Udvardi