2010-09-03.Whitty_B.Solanaceae_Genomics_
Solanaceae Genomics Resource
Brett R. Whitty, C. Robin Buell
Michigan State University, Department of Plant Biology, East Lansing MI 48824
Collectively, the Solanaceae (a family which includes Potato, Tomato, Tobacco, Pepper, Eggplant and Petunia) are a valuable component of U.S. agriculture. The major Solanaceae crop species share both sequence identity and gene order, thereby providing the basis for leveraging genomic resources across taxa. Transcriptome and genome sequencing projects have been initiated for the major crop species; however, none of the three genome initiatives (potato, tomato, tobacco) has yet released to the public a high quality, finished, complete genome sequence. Thus, it is essential that all of the partial Solanaceae transcript and genome sequence be integrated at the family level and linked to other model dicot species to provide contextual information on the putative function of Solanaceae homologs. In this project, we are working to identify putative orthologs, paralogs, and lineage-specific genes within the Solanaceae to facilitate intra- and inter-species comparisons. We also identify homologs of Solanaceae genes within three dicot species (Arabidopsis, Poplar, Grapevine) to permit leveraging resources from these model species to the Solanaceae. We are working to generate comparative analyses, alignments, views, and displays of the Solanaceae. Overall, we provide a robust and integrated comparative genomics resource that permits broad and deep data-mining of Solanaceae sequences by the community.
This project was initiated January 1, 2008, and we continue to update project data quarterly and to develop additional resources and tools for the Solanaceae community. It is supported by the National Research Initiative (NRI) Plant Genome Program of the USDA National Institute of Food and Agriculture (NIFA) (2008-35300-18671).
All project data is made available through our web site: http://solanaceae.plantbiology.msu.edu
Project email address: [email protected]
Model Dicot Comparative Genome Databases
Alignments of Solanaceae transcript assemblies against model dicot (Arabidopsis, Grapevine, Poplar) genomic and polypeptide sequence are available for display and search in a Gbrowse database.
Potato, Tomato and Tobacco Draft Genomes
Our analyses and databases include all public data releases from the three genome sequencing efforts in the Solanaceae. We obtain data as it is released to GenBank from the International Tomato Genome Sequencing Project, which includes gene models annotated by the project members, and from the International Potato Genome Sequencing Consortium (of which we are members), which to date has released assemblies with no gene annotations. Our annotation and analysis pipeline provides gene models for genes present on these assemblies, supplementary to any previously annotated gene models present in the public data.
A Community Resource
As a component of our project, we aim to provide a web portal that, in addition to presenting results from our comparative analyses, acts as a unified repository for genomic and transcriptomic data and related bioinformatic resources for the Solanaceae, thereby improving the accessibility of this data to the Solanaceae community.
Annotation/Analysis Pipelines
We retrieve all publicly available Solanaceae genomic sequences from GenBank, and the sequences are run through the GMOD MAKER gene annotation pipeline to provide a common set of evidence-supported gene model predictions; these supplement the models previously annotated (if any) on the public assemblies. Our transcriptomic analyses are performed on transcript assemblies generated by PlantGDB (PUTs).
Some of the analyses we perform on genomic and transcriptomic sequence include:
• Ortholog/paralog prediction by best hit and OrthoMCL clustering
• SSR identification in transcript and genome sequence, and generation of primers (using Primer3)
• Identification of putative SNPs in transcript assemblies
• Alignment of PlantGDB-assembled Solanaceae transcripts (PUTs) to the genomic sequence using exonerate
• Alignment of UniProt's SwissProt & UniRef protein databases to the genomic sequence using exonerate
• BLASTP of Solanaceae gene models against model dicot proteomes (Arabidopsis, Grapevine, Poplar)
• InterProScan search on the models to identify functional domains
• Repeat feature prediction (using RepeatMasker)
• ncRNA feature prediction (using tRNAscan-SE and RNAmmer)
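To illustrate one of the analyses listed above, the following is a minimal Python sketch of how perfect simple sequence repeats (SSRs) can be located in a nucleotide sequence with regular expressions. It is not the pipeline's actual SSR tool, which is not named here; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration.

```python
import re

# Hypothetical thresholds (assumed for illustration): the minimum number of
# tandem copies required for each motif length, from 1 bp to 6 bp.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs.

    Coordinates are 0-based and half-open, so seq[start:end] is the repeat.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" or "ATAT"); the shorter unit reports them.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TC"
    for start, end, motif, copies in find_ssrs(example):
        print(start, end, motif, copies)
```

The coordinates returned by a scan like this could then be used to extract flanking sequence as input for primer design with a tool such as Primer3.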
Integrated and Accessible Data
Available sequence data, analysis results, and tools for species in the Solanaceae are presented in centralized views on the project site to aid users in applying these resources in their research. At the genome level, our species overview page consolidates available sequence data, genome information and resources, and lists available analysis results and tools. At the transcript level, our gene overview page presents a summary of gene information and analyses, such as BLAST results, computationally predicted SNPs, SSRs, and orthology/paralogy, and links transcripts to other site resources including our genome browsers.
Solanaceae Comparative Genome Database
Our database contains annotation and comparative data for all public Solanaceae genomic sequence assemblies. We currently use the GMOD Generic Genome Browser (Gbrowse) to facilitate the web-based display and searching of our annotation and comparative analyses.
Potato Genome Sequencing Consortium Potato Draft Genome Browser
As members of the Potato Genome Sequencing Consortium, we are hosting the public Potato genome browser. Presently, the doubled monoploid Solanum phureja DM1-3 516R44 (CIP801092) v3.2 genome assembly and annotation are online. Visit http://potatogenome.net for details on this draft genome release.
In the genome browser, all aligned Solanaceae transcript assemblies are linked to the full set of resources associated with those assemblies provided by the Solanaceae Genomics Resource site.
Upcoming Features for 2010/2011
We expect that the finished Potato and Tomato genomes will be released to the public sequence databases in the coming months. At that time, we will integrate the complete genomes into our existing resources, and will make available additional tools and analysis results; one of these new tools will be a genome synteny viewer.
We have produced a significant amount of RNA-Seq data from our participation in the Potato Genome Sequencing Consortium (PGSC, http://potatogenome.net) and the Solanaceae Coordinated Agricultural Project (SolCAP, http://solcap.msu.edu). When publicly released, this data will be incorporated into the Solanaceae Genomics Resource databases and tools. It will greatly expand our existing SNP database tool, and we will provide new tools for the query and display of expression data.
Coming Soon