Presentation at ZSJ 2013 by Shigehiro Kuraku

Slides from an oral presentation given in Japanese at the ZSJ 2013 Meeting in Okayama, Japan, in September 2013.

Transcript of Presentation at ZSJ 2013 by Shigehiro Kuraku

Page 1: Presentation at ZSJ 2013 by Shigehiro Kuraku
Page 2: Presentation at ZSJ 2013 by Shigehiro Kuraku

Page 3: Presentation at ZSJ 2013 by Shigehiro Kuraku
Page 4: Presentation at ZSJ 2013 by Shigehiro Kuraku

‘the complete set of phylogenetic trees derived from the proteome of an organism’

Sicheritz-Pontén and Andersson, 2001. Nucleic Acids Res. 29: 545

genome-wide events + gene family-specific events

August 2012. At Daitoku-ji Temple, Kyoto

Page 5: Presentation at ZSJ 2013 by Shigehiro Kuraku

[Figure: three alternative tree topologies (Hypothesis A, Hypothesis B, Hypothesis C), each relating human, chicken, shark, lamprey, hagfish, amphioxus, and tunicate; lamprey and hagfish are labeled Cyclostomes]

- Composition of Hox/Dlx clusters: Neidert et al., 2001. PNAS; Irvine et al., 2002. J Exp Zool B; Force et al., 2002. J Exp Zool B; etc.

- ParaHox clusters: Furlong et al., 2007. MBE

- Mol. phylogeny of 33 gene families: Escriva et al., 2002. MBE

- Amphioxus genome: Putnam et al., 2008. Nature


- Mol. phylogeny of 55 gene families: Kuraku et al., 2009. MBE

- Sea lamprey genome analysis: Smith, Kuraku et al., 2013. Nature Genetics

- Globin gene phylogeny: Hoffmann et al., 2010. PNAS

Page 6: Presentation at ZSJ 2013 by Shigehiro Kuraku

Heuristic ML JTT+G4

ML-BP/NJ-BP

Kuraku and Kuratani, 2011

Page 7: Presentation at ZSJ 2013 by Shigehiro Kuraku

(Kuraku & Kuratani, 2011. Genome Biol. Evol.)

(cf. hidden paralogy)

Page 8: Presentation at ZSJ 2013 by Shigehiro Kuraku

Informatics Modern sequencing

Molecular Developmental Biology

Genome Resource & Analysis Unit Center for Developmental Biology

RIKEN, Kobe, Japan

Page 9: Presentation at ZSJ 2013 by Shigehiro Kuraku

illumina HiSeq1500

Installed in November 2011

~150 bp reads in Rapid Run mode

Sanger sequencing, Cell sorting with FACS, clone distribution, etc.

Page 10: Presentation at ZSJ 2013 by Shigehiro Kuraku

Kuraku et al., 2013. Nucleic Acids Res.; Amemiya et al., 2013

Not only sequencing

Page 11: Presentation at ZSJ 2013 by Shigehiro Kuraku

Page 12: Presentation at ZSJ 2013 by Shigehiro Kuraku

Our experiences at GRAS

・Main applications: RNA-seq & ChIP-seq

・Troubleshooting with tight wet-dry communication

・Diverse non-model organisms for RNA-seq

・Many requests with limited sample amounts

Page 13: Presentation at ZSJ 2013 by Shigehiro Kuraku

・Look carefully for acceptable pricing and service contents

・Longer illumina reads are not necessarily beneficial

・Sequencers ‘can’ produce ‘data’ from problematic samples

Low quality DNA/RNA, contamination, over-amplification, …

For retrieving complete genome and original transcriptome

~150 bp on HiSeq & ~300 bp on MiSeq (as of September 2013)

Prep of libraries with longer inserts

e.g. How many reads do you need?
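
As a rough aid for the "how many reads" question, here is a minimal sketch. It assumes whole-genome shotgun sequencing and the common back-of-envelope relation depth = (number of reads × read length) / genome size. That relation is not stated on the slide, and the function name, genome size, and target depth below are hypothetical. For RNA-seq the answer depends on expression levels and the aim of the experiment, so this sketch does not apply there.

```python
import math


def reads_needed(genome_size_bp, target_depth, read_length_bp):
    """Smallest read count N with N * read_length / genome_size >= target_depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# Hypothetical example: a 1 Gb genome, 50x target depth, 150 bp reads
# (the read length matches the HiSeq figure quoted above).
print(reads_needed(1_000_000_000, 50, 150))
```

For paired-end runs, the number of read pairs is half of this count.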

Page 14: Presentation at ZSJ 2013 by Shigehiro Kuraku

Page 15: Presentation at ZSJ 2013 by Shigehiro Kuraku

Species | Sequenced at | Gene model by | Sequencing technology | Published in | # of authors | Started in
sea lamprey | Wash. Univ. | Yandell lab / Ensembl | Sanger | Nat. Genet. (2013) | 59 | 2005?
soft-shelled turtle | BGI | BGI / Ensembl | illumina | Nat. Genet. (2013) | 34 | 2010
coelacanth | Broad Institute | Broad / Ensembl | illumina | Nature (2013) | 91 | 2011

Page 16: Presentation at ZSJ 2013 by Shigehiro Kuraku

International consortium

Vertebrate ‘new genes’

GC & codon usage bias

Myelin-associated genes

Smith, Kuraku, et al. 2013. Nature Genetics

Sequenced at Wash. Univ. Genome Institute

In-house annotation effort

Horizontal gene transfer

Kuraku et al., 2012. Genome Biol. Evol.

GC-content & codon usage bias

Qiu et al., 2011. BMC Genomics

Trained gene prediction setting available at Augustus web server

Contributed analysis

http://www.ensembl.org/Petromyzon_marinus/Info/Index

Coding genes: 10,415

Incomplete genome assembly: Pax6 missing

Incomplete gene annotation: Fgf8/17-A missing

(as of September 2013; release 73)

Page 17: Presentation at ZSJ 2013 by Shigehiro Kuraku

Amino acid composition

Deviation of ‘gene model’ in lamprey genome

Smith, Kuraku, et al. 2013. Nature Genetics

Methods: Correspondence analysis for frequencies of 20 amino acids

Page 18: Presentation at ZSJ 2013 by Shigehiro Kuraku

Codon usage bias

Heavy use of GC-rich codons

sea lamprey stickleback Tetraodon

Takifugu platypus medaka

dog human mouse

ghost shark zebrafish

chicken anole lizard

opossum X. tropicalis

Methods: RSCU (Sharp et al., 1986) and ENc (Wright, 1990)

Qiu et al., 2011. BMC Genomics
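
To make the RSCU measure concrete, the following is a minimal sketch, not the pipeline used in the cited work. It assumes the standard genetic code and the usual definition of RSCU: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function and variable names are hypothetical, and ENc is not covered here.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the nuclear standard code applies to the sequences analysed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for amino acids with synonymous codons present in cds."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        # Skip stop codons, single-codon amino acids (Met, Trp), and absent families.
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy input: four alanine codons, three of them GCC.
print(rscu("GCCGCCGCCGCT"))
```

A value above 1 means the codon is used more often than expected under equal usage, so a genome with heavy use of GC-rich codons shows high RSCU for codons ending in G or C.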

Page 19: Presentation at ZSJ 2013 by Shigehiro Kuraku

Genomic DNA
↓ Sanger, 454, illumina, and/or PacBio
Raw reads
↓ Assembly
Genome assembly (contigs/scaffolds)
↓ Gene prediction (after ‘training’); reference: transcriptome, annotated genes in GenBank
‘Gene model’ (protein-coding sequences)

Heterochromatin etc.

Repeats, regions with low depth

‘Unusual’ genes

Page 20: Presentation at ZSJ 2013 by Shigehiro Kuraku

Genomic DNA
↓ Sanger, 454, illumina, and/or PacBio
Raw reads
↓ Assembly
Genome assembly (contigs/scaffolds)
↓ Gene prediction (after ‘training’); reference: transcriptome, annotated genes in GenBank
‘Gene model’ (protein-coding sequences)

Page 21: Presentation at ZSJ 2013 by Shigehiro Kuraku

(cf. Assemblathon2 - Bradnam et al., 2013)

‘NG50’ instead of N50

CEGMA (Parra et al., 2007) – coverage of CEGs

CGAL, REAPR, ALE – evaluation by identifying misassemblies

QUAST – computation of assembly summary
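
The difference between N50 and NG50 can be sketched as follows. This is an illustrative implementation of the usual definitions, not code from any of the tools listed; the function names, contig lengths, and genome sizes are hypothetical. N50 takes half of the total assembly length as its target, whereas NG50 takes half of the estimated genome size, which makes assemblies of different total length comparable.

```python
def _nx(lengths, total):
    """Length of the contig at which the running sum first reaches half of total."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return None  # the assembly covers less than half of total


def n50(lengths):
    """N50: the target is half of the summed contig/scaffold lengths."""
    return _nx(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the target is half of the estimated genome size."""
    return _nx(lengths, genome_size)


# Hypothetical contig lengths (arbitrary units) and genome size estimates.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))         # relative to the assembly itself
print(ng50(contigs, 500))   # relative to an assumed genome size of 500
print(ng50(contigs, 1000))  # None: assembly is shorter than half the genome
```

An incomplete assembly can have a flattering N50 simply because short or missing regions are excluded from the total, which is one reason to prefer NG50.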

Page 22: Presentation at ZSJ 2013 by Shigehiro Kuraku

248 core eukaryotic genes (CEGs)

Species | Assembly release | # of CEGs found (including ‘partial’) | Published?
human | GRCh37 (hg19) | 248 | First draft in 2001
mouse | GRCm38 (mm10) | 239 | First draft in 2002
X. tropicalis | JGI_4.2 | 239 | Hellsten et al., 2010
coelacanth | LatCal1 | 236 | Amemiya et al., 2013
spotted gar | LepOcu1 | 235 | unpublished
soft-shell turtle | PelSin_1.0 | 232 | Wang et al., 2013
anole lizard | AnoCar2.0 | 231 | Alföldi et al., 2011
zebrafish | Zv9 | 230 | Howe et al., 2013
chicken | galGal4 | 220 |
chicken | WASHUC2.63 (galGal3) | 210 | First draft in 2004
Japanese lamprey | LetCam1 | 199 | Mehta et al., 2013
sea lamprey | PerMar1 | 172 | Smith et al., 2013
little skate | version2 | 77 | unpublished
elephant shark | (1.4x) | 58 | Venkatesh et al., 2007
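
The counts above are easier to compare as a fraction of the 248 CEGs. The short sketch below does that arithmetic for a few rows copied from the table; the helper name is hypothetical, and expressing the count as a percentage is an illustration rather than something shown on the slide.

```python
TOTAL_CEGS = 248  # size of the CEG set stated on the slide

# A few rows copied from the table above (CEGs found, including 'partial').
cegs_found = {
    "human (GRCh37)": 248,
    "coelacanth (LatCal1)": 236,
    "sea lamprey (PerMar1)": 172,
    "elephant shark (1.4x)": 58,
}


def completeness(found, total=TOTAL_CEGS):
    """Percentage of the CEG set recovered in an assembly."""
    return 100.0 * found / total


for name, found in cegs_found.items():
    print(f"{name}: {completeness(found):.1f}%")
```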

Page 23: Presentation at ZSJ 2013 by Shigehiro Kuraku

Genomic DNA
↓ Sanger, 454, illumina, and/or PacBio
Raw reads
↓ Assembly
Genome assembly (contigs/scaffolds)
↓ Gene prediction (after ‘training’); reference: transcriptome, annotated genes in GenBank
‘Gene model’ (protein-coding sequences)

Page 24: Presentation at ZSJ 2013 by Shigehiro Kuraku

(cf. Assemblathon2 - Bradnam et al., 2013)

‘NG50’ instead of N50

‘Annotation Turnover’ and ‘AED’ (Eilbeck et al., 2009)

Also, run CEGMA to check transcript diversity?

CEGMA (Parra et al., 2007) – coverage of CEGs

CGAL, REAPR, ALE – evaluation by identifying misassemblies

QUAST – computation of assembly summary

Page 25: Presentation at ZSJ 2013 by Shigehiro Kuraku

– Nakamura et al., 2013

Page 26: Presentation at ZSJ 2013 by Shigehiro Kuraku

Page 27: Presentation at ZSJ 2013 by Shigehiro Kuraku

- Phylogenetic properties of your species of interest

e.g. Ploidy level, distance to close relatives, …

www.genomesize.com, www.timetree.org

- Any clues about its molecular attributes?

e.g. GC-content, repeats, intron/UTR length, …

Using existing resources at SRA & Sanger traces at NCBI dbEST

Page 28: Presentation at ZSJ 2013 by Shigehiro Kuraku

- Genome or transcriptome to sequence?

Any existing or emerging resources?

- Sample prep mostly determines the fate of the project

Quantification with Qubit; rRNA removal controlled with BioAnalyzer

- Rigorous QC of prepared libraries before sequencing

ChIP-qPCR before ChIP-seq

- RNA-seq: sequence identification or quantification?

Replication > Depth (Rapaport et al., 2013. Genome Biol.)

Page 29: Presentation at ZSJ 2013 by Shigehiro Kuraku

- Fostering more productive sequencing facilities in Japan

GRAS welcomes visits from facility managers and staff

[email protected]

- Education of researchers with dual (wet/dry) capabilities

‘A sequencer or a bioinformatician?’

Learning material: ‘Unix & Perl for Biologists’ by Korf Lab

http://korflab.ucdavis.edu/unix_and_Perl/

- Importing latest information from overseas