15 January 2006, PAG XIV San Diego, Rémy Bruggmann, MIPS/IBI, GSF: A Bioinformatic Framework to Unravel...
15 January 2006, PAG XIV San Diego
Rémy Bruggmann, MIPS/IBI, GSF
A Bioinformatic Framework to Unravel the Secrets of the Tomato Genome
Outline
Introduction
Data management
Annotation
Training/Test gene set
Summary
MIPS' Look at the Green Side of Life
Genome projects and database activities
Arabidopsis thaliana, Arabidopsis lyrata*, Capsella rubella*
Maize, Rice
Medicago, Lotus
Solanum lycopersicum
MIPS' Look at the Green Side of Life
Genome projects and database activities
Need to streamline and unify databases as well as analytical schemas and operating routines
Strong synergism and very robust
Risk of losing flexibility and "custom-tailored" attractiveness
Awareness that not every genome and every community "is just the same"
From Center-Centric Strategies to Distributed Approaches
Typically, genome projects undergo particular phases:
Sequenced BACs are annotated
Gene models are published to the community
Potentially generates competition rather than collaboration among groups
From Center-Centric Strategies to Distributed Approaches
Consequences can be:
Underlying analytical procedures are not always tested, trained and evaluated
More or less pronounced differences exist between groups, leading to differing, contradictory and conflicting data
Aim of all groups:
"an information-enriched, high-quality genome backbone to address genome-scale biological questions"
From Center-Centric Strategies to Distributed Approaches
An example ...
International Medicago Genome Annotation Group
Consists of groups participating in either the International or the European Medicago Genome Initiative annotation/bioinformatics programs
Agreement on common annotation standards, data exchange formats and naming conventions
Aims to produce and provide a unified, high-quality Medicago data set
From Center-Centric Strategies to Distributed Approaches
Advantages of sharing efforts in genome annotation within a common annotation pipeline
From Center-Centric Strategies to Distributed Approaches
Prevents:
(i) duplicated effort
(ii) conflicts resulting from different annotation "standards"
Ensures high-quality annotation standards
Ensures common (gene) naming and a common dataset
Integrates and profits from the knowledge and expertise of the individual groups
Data management
All data should be organized in a genome database
Wish list for a modern genome DB
Complete
Comprehensive
Up-to-date
Integrated
User interface
Application interface
State-of-the-art automatic analysis
Adaptable
Cross-genome comparison
…low cost, low manpower...
PlantsDB Philosophy
Plants Genome Resource: provides and integrates sequence data from European plant sequencing consortia along with publicly available data from the international initiative
PlantsDB communicates bioinformatic analysis data (visualization, genetic elements, structural data, ontologies, domains...; BLAST, browse and search, … comparative analysis)
Integration: provides a distributed network to integrate and retrieve data from heterogeneous resources using BioMOBY (connection to other plant DBs, PlaNET)
Preliminary Annotation Pipeline
Towards a preliminary annotation
Repeat detection (figure): Repeat Ontology and Repeat Database used by RepeatMasker
Outputs: masked sequences (for gene prediction) and repeat annotation (in GAME XML)
Gene Prediction
Figure: gene prediction workflow
External databases: EST DB (EST assemblies), protein DB (e.g. SwissProt)
Gene prediction programs: GenomeThreader, FGenesH++/ProtMap, GeneMarkHMM
GAME XML: document of computational results
Manual annotation in the Apollo Genome Viewer
Web access: PlantsDB, GBrowse
First Results
RepeatMasker
5.8 Mb analysed (48 BACs)
~6.7% repetitive elements (<0.2% to 23% per BAC)
~1 min/100 kb
Whole genome (euchromatic part): ~2 days
Figure: repeat content [%] per BAC
State: December 2005
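A per-BAC repeat content like the one summarized above can be computed from masked sequences. The sketch below assumes hard-masked RepeatMasker output, in which repeat bases are replaced by `N`; the function names and toy sequences are hypothetical, and assembly gaps that are already `N` would inflate the count.

```python
def masked_fraction(seq: str) -> float:
    """Fraction of bases replaced by 'N' in a hard-masked sequence (assumption: hard masking)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def repeat_summary(bacs: dict[str, str]) -> tuple[float, float, float]:
    """Return (overall %, min per-BAC %, max per-BAC %) of masked bases.

    The overall value is length-weighted (total masked bases / total bases),
    so long BACs count more than short ones; it is not a mean of per-BAC values.
    """
    per_bac = [100 * masked_fraction(s) for s in bacs.values()]
    total_masked = sum(s.upper().count("N") for s in bacs.values())
    total_len = sum(len(s) for s in bacs.values())
    overall = 100 * total_masked / total_len
    return overall, min(per_bac), max(per_bac)


if __name__ == "__main__":
    # Hypothetical toy BACs, not real tomato sequence.
    toy = {"BAC_A": "ACGTNNNNACGT", "BAC_B": "ACGTACGTACGTACGT"}
    print(repeat_summary(toy))
```

Reporting the length-weighted overall value next to the per-BAC range matches the way the slide gives both a single percentage and a spread across BACs.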
Preliminary Results
Comparison of different gene finders
ab initio predictions
Figure: EST/TC evidence compared with FGenesH and GeneMark predictions
ab initio predictions
ab initio predictions
FGenesH++ and GeneMarkHMM currently often generate incomplete or incorrect gene models
No matrices trained for tomato are available
Tomato matrices will increase prediction quality dramatically
Collection of annotated high-quality genes for a training/test set for EuGene, FGenesH, GeneMarkHMM, ...
Training/Test Gene Set
How can we get a training/test set?
Map available tomato cDNA/ESTs to the BACs (use only high-confidence matches)
Link experimental data to the gene models
Use this gene set for ab initio gene finder training
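The selection step above can be sketched as a filter over spliced-alignment summaries. The `SplicedAlignment` record and the identity/coverage thresholds are hypothetical illustrations, not values or formats from the slides; in practice the fields would be parsed from the aligner's output.

```python
from dataclasses import dataclass


@dataclass
class SplicedAlignment:
    """Hypothetical summary of one cDNA/EST-to-BAC spliced alignment."""
    gene_id: str
    bac_id: str
    identity: float        # fraction of identical aligned bases
    coverage: float        # fraction of the cDNA/EST covered by the alignment
    covers_full_cds: bool  # evidence spans the gene model from start to stop codon


def select_training_genes(alignments: list[SplicedAlignment],
                          min_identity: float = 0.98,
                          min_coverage: float = 0.95) -> list[str]:
    """Keep gene models supported by high-confidence, full-length evidence.

    Thresholds are illustrative assumptions. Returns gene IDs in input order,
    each at most once even if several alignments support it.
    """
    selected: list[str] = []
    seen: set[str] = set()
    for aln in alignments:
        if (aln.identity >= min_identity
                and aln.coverage >= min_coverage
                and aln.covers_full_cds
                and aln.gene_id not in seen):
            seen.add(aln.gene_id)
            selected.append(aln.gene_id)
    return selected
```

Requiring that the evidence cover the complete gene model mirrors the later criterion of genes "covered completely by cDNA/ESTs".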
GenomeThreader
GenomeThreader is used for EST/cDNA mapping:
Similarity-based approach: ESTs/proteins are used to predict gene structures via optimal spliced alignments
Offers many options (full user control)
Incremental updates (avoid much duplicated computation)
Improved GeneSeqer
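Spliced alignments yield exon coordinates, and a simple sanity check when linking them to gene models is whether the implied introns follow the canonical GT-AG rule. This is a minimal post-hoc sketch assuming plus-strand, 0-based half-open coordinates; it is not part of GenomeThreader, the function names are hypothetical, and non-canonical introns (e.g. GC-AG) are simply rejected here.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Derive intron intervals from half-open exon intervals [start, end)."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def has_canonical_splice_sites(genome: str, exons: list[tuple[int, int]]) -> bool:
    """True if every intron starts with GT and ends with AG (plus strand only)."""
    for start, end in introns_from_exons(exons):
        intron = genome[start:end].upper()
        if len(intron) < 4 or not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


if __name__ == "__main__":
    # Toy locus: exon ATG | intron GTAAAAG | exon CTGA
    locus = "ATGGTAAAAGCTGA"
    print(has_canonical_splice_sites(locus, [(0, 3), (10, 14)]))
```

A single-exon model has no introns and therefore passes trivially; such genes contribute no splice-site training data.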
GenomeThreader - calculations

| DB | Entries | Size [MB] | Calc. time/100 kb | Whole genome |
|---|---|---|---|---|
| Tomato | 32401 | 27 | 27 s | ~2.8 days (Tomato, MicroTom, Potato and Tobacco combined) |
| MicroTom | 26363 | 21 | 22 s | |
| Potato | 38239 | 34 | 23 s | |
| Tobacco | 28661 | 20 | 39 s | |
| Arabidopsis cDNAs | 31939 | 45 | 10 s | 0.3 days |
| Dicots | 404822 | 311 | 170 s | 4.3 days |
| rice cds | 15639 | 21 | 8 s | 0.2 days |
| Uni_trembl Plants | 185564 | 74 | 38 s | 1.0 day |
| Uniprot_swissprot | 181571 | 82 | 8 s | 0.2 days |
| Nonred | 1675230 | 662 | 437 s | 11.1 days |
| Total | 2834224 | 1433 | 14 min | 22 days |

(single CPU, euchromatic part)
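The whole-genome column is a linear extrapolation of the per-100-kb times. The slide does not state the euchromatic size used; assuming about 220 Mb (an assumption chosen because it reproduces most rows of the table), the arithmetic can be sketched as follows. The function name is hypothetical, and runtime is assumed to scale linearly with sequence length and number of CPUs.

```python
SECONDS_PER_DAY = 86_400


def whole_genome_days(seconds_per_100kb: float, genome_mb: float, cpus: int = 1) -> float:
    """Extrapolate a per-100-kb runtime to a whole genome, in days.

    genome_mb is an assumed input (not given on the slide); linear scaling
    with length and perfect parallelization over CPUs are also assumptions.
    """
    chunks = genome_mb * 1000 / 100  # number of 100-kb chunks
    return seconds_per_100kb * chunks / SECONDS_PER_DAY / cpus


if __name__ == "__main__":
    for db, secs in [("Dicots", 170), ("Nonred", 437), ("Uni_trembl Plants", 38)]:
        print(db, round(whole_genome_days(secs, genome_mb=220), 1))
```

Under the same assumption, the four Solanaceae rows sum to 111 s per 100 kb, which gives about 2.8 days, consistent with that entry covering them jointly.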
Example
Figure: tracks for Tomato, MicroTom, Potato and Tobacco
Examples - UK
Example
Number of High-Quality Genes
Number of genes: 164 (covered completely by cDNA/ESTs)
~3.4 genes/BAC (range: 0 to 9 genes/BAC)
These genes can be used to train gene finders
Figure: number of genes per BAC (only very good alignments considered)
Gene Finder
Which program can be trained for tomato?
One possibility is EuGene (VIB Gent)
- performed well, e.g. for Arabidopsis and Medicago
- available as soon as the test/training gene set is large enough
EuGene - overview
Plugins (figure):
Statistical contents: DNA Markov, AA Markov
Splice sites: NetGene2, SplicePredictor, SpliceMachine, GeneSplicer
Start sites: NetStart, SpliceMachine, ATRPred
Similarities: protein similarities, EST similarities, FL cDNA, exon conservation
Repeats
Workflow: TRAINING (plugin training; needs one dataset) → OPTIM (optimize the plugin combination; needs one dataset) → TEST (needs one new dataset)
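Because training, optimization and testing each need their own dataset, the high-quality gene set has to be partitioned into disjoint subsets. A minimal sketch, assuming a random split; the 60/20/20 fractions, the seed and the function name are hypothetical choices, not EuGene's own procedure.

```python
import random


def split_gene_set(gene_ids, fractions=(0.6, 0.2, 0.2), seed=42):
    """Partition gene IDs into disjoint training / optimization / test sets.

    Mirrors the TRAINING -> OPTIM -> TEST stages, each needing its own dataset.
    Fractions and seed are illustrative assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(set(gene_ids))          # deterministic order before shuffling
    random.Random(seed).shuffle(ids)     # reproducible shuffle
    n_train = int(len(ids) * fractions[0])
    n_optim = int(len(ids) * fractions[1])
    train = ids[:n_train]
    optim = ids[n_train:n_train + n_optim]
    test = ids[n_train + n_optim:]       # remainder goes to the test set
    return train, optim, test


if __name__ == "__main__":
    genes = [f"gene{i:03d}" for i in range(164)]  # hypothetical IDs for 164 genes
    train, optim, test = split_gene_set(genes)
    print(len(train), len(optim), len(test))
```

A purely random split can place close paralogs in different subsets, which tends to make test accuracy look better than it is; grouping homologous genes before splitting avoids this.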
EuGene
First round training:
- 500 high-quality tomato genes
- statistical models on codon usage and splice sites of Arabidopsis will be used
Second round training:
- 2000 high-quality tomato genes
- build a tomato-only version of EuGene
Approx. 150 BACs needed for first round training
Current state of sequenced BACs
Total number of BACs:
- unfinished: 71
- finished: 87
- available: 52
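The BAC estimates in this talk follow from simple arithmetic on the observed yield of high-quality genes (164 genes from 48 BACs). The sketch reproduces it, assuming the yield stays constant for new BACs and that the 48 BACs already analysed count toward the first-round requirement; both are assumptions, and the function name is hypothetical.

```python
import math


def bacs_needed(target_genes: int, genes_per_bac: float) -> int:
    """Number of BACs expected to yield target_genes high-quality genes."""
    return math.ceil(target_genes / genes_per_bac)


genes_per_bac = 164 / 48                          # observed yield, ~3.4 genes/BAC
first_round = bacs_needed(500, genes_per_bac)     # 147 BACs for 500 genes
second_round = bacs_needed(2000, genes_per_bac)   # 586 BACs for 2000 genes
still_needed = first_round - 48                   # 99, minus the 48 BACs already analysed

if __name__ == "__main__":
    print(first_round, second_round, still_needed)  # prints: 147 586 99
```

The results line up with the "approx. 150 BACs" for first-round training and the "another 100 BACs" in the summary.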
Summary
ab initio gene finders are not yet calibrated for tomato
A test/training gene set is needed to calibrate the gene finders
We need another 100 BACs to get enough genes for a first round of EuGene training
GenomeThreader produces good alignments only with ESTs from SOL species (tomato, potato, tobacco)
More repeats will be detected (and included in the RepeatMasker library)
Acknowledgments
Automated annotation
MIPS: Heidrun Gundlach, Georg Haberer, Manuel Spannagl, Klaus F.X. Mayer
Manual annotation/curation/web site (chromosome 4)
Imperial College: Daniel Buchan, James Abbot, Sarah Butcher, Gerard Bishop
Sequencing & assembly (chromosome 4)
Sanger Institute: Christine Nicholson, Sean Humphray
MPIZ Köln: Heiko Schoof
EuGene
VIB Gent: Stephane Rombauts
GenomeThreader
University of Hamburg: Gordon Gremme, Stefan Kurtz, Volker Brendel