2010-09-03.Whitty_B.Solanaceae_Genomics_


Solanaceae Genomics Resource
Brett R. Whitty, C. Robin Buell
Michigan State University, Department of Plant Biology, East Lansing MI 48824

Collectively, the Solanaceae (a family which includes Potato, Tomato, Tobacco, Pepper, Eggplant and Petunia) are a valuable component of U.S. agriculture. The major Solanaceae crop species share both sequence identity and gene order, thereby providing the basis for leveraging genomic resources across taxa. Transcriptome and genome sequencing projects have been initiated for the major crop species; albeit none of the three genome initiatives (potato, tomato, tobacco) have yet released to the public a high quality, finished complete genome sequence. Thus, it is essential that all of the partial Solanaceae transcript and genome sequence be integrated at the family level and linked to other model dicot species to provide contextual information on the putative function of Solanaceae homologs. In this project, we are working to identify putative orthologs, paralogs, and lineage-specific genes within the Solanaceae to facilitate intra- and inter-species comparisons. We also identify homologs of Solanaceae species within three dicot species (Arabidopsis, Poplar, Grapevine) to permit leveraging resources from these model species to the Solanaceae. We are working to generate comparative analyses, alignments, views, and displays of the Solanaceae. Overall, we provide a robust and integrated comparative genomics resource that permits broad and deep data-mining of Solanaceae sequences by the community.

This project was initiated January 1, 2008 and we continue to update project data quarterly, and develop additional resources and tools for the Solanaceae community. It is supported by the National Research Initiative (NRI) Plant Genome Program of the USDA National Institute of Food and Agriculture (NIFA) (2008-35300-18671).

All project data is made available through our web site: http://solanaceae.plantbiology.msu.edu
Project email address: [email protected]

Model Dicot Comparative Genome Databases
Alignments of Solanaceae transcript assemblies against model dicot (Arabidopsis, Grapevine, Poplar) genomic and polypeptide sequence are available for display and search in a Gbrowse database.

Potato, Tomato and Tobacco Draft Genomes
Our analyses and databases include all public data releases from the three genome sequencing efforts in the Solanaceae. We obtain data as it is released to GenBank from the International Tomato Genome Sequencing Project, which includes gene models annotated by the project members, and the International Potato Genome Sequencing Consortium (of which we are members), which to date has released assemblies with no gene annotations. Our annotation and analysis pipeline provides gene models for genes present on these assemblies, supplementary to any previously annotated gene models present in the public data.

A Community Resource
As a component of our project we aim to provide a web portal that, in addition to presenting results from our comparative analyses, acts as a unified repository for genomic and transcriptomic data, and related bioinformatic resources for the Solanaceae, and thereby improves the accessibility of this data to the Solanaceae community.

Annotation/Analysis Pipelines
We retrieve all publicly available Solanaceae genomic sequences from GenBank, and the sequences are run through the GMOD MAKER gene annotation pipeline to provide a common set of evidence-supported gene model predictions; these supplement the models previously annotated (if any) on the public assemblies. Our transcriptomic analyses are performed on transcript assemblies generated by PlantGDB (PUTs).

Some of the analyses we perform on genomic and transcriptomic sequence include:
• Ortholog/paralog prediction by best hit and OrthoMCL clustering
• SSR identification in transcript and genome sequence, and generation of primers (using Primer3)
• Identification of putative SNPs in transcript assemblies
• Alignment of PlantGDB-assembled Solanaceae transcripts (PUTs) to the genomic sequence using exonerate
• Alignment of UniProt's SwissProt & UniRef protein databases to the genomic sequence using exonerate
• BLASTP of Solanaceae gene models against model dicot proteomes (Arabidopsis, Grapevine, Poplar)
• InterProScan search on the models to identify functional domains
• Repeat feature prediction (using RepeatMasker)
• ncRNA feature prediction (using tRNAscan-SE and RNAmmer)
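
The SSR identification step listed above can be illustrated with a small sketch. The poster does not name the SSR-finding tool or the thresholds the project uses, so the regular-expression scan below, the function name find_ssrs, and the minimum repeat counts in MIN_REPEATS are all illustrative assumptions rather than the project's actual method. Primer design with Primer3 is not shown.

```python
import re
from typing import Dict, List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start of the repeat tract
    end: int      # end of the tract (exclusive)
    motif: str    # repeated unit, e.g. "AT"
    repeats: int  # number of tandem copies


# Assumed thresholds (not from the poster): motif length -> minimum tandem copies.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Return perfect simple sequence repeats found in a DNA sequence."""
    seq = seq.upper()
    found: List[SSR] = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "ATAT"), since the shorter motif already reports that tract.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            found.append(
                SSR(m.start(), m.end(), motif, len(m.group(0)) // unit_len)
            )
    return sorted(found)


if __name__ == "__main__":
    # Toy sequence with one dinucleotide and one trinucleotide repeat.
    example = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"
    for ssr in find_ssrs(example):
        print(ssr)
```

The reported coordinates could then be used to extract flanking sequence as input for a primer design tool. Only perfect repeats are detected here; imperfect or compound repeats would need a dedicated SSR finder.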

Integrated and Accessible Data
Available sequence data, analysis results, and tools for species in the Solanaceae are presented in centralized views on the project site to aid users in applying these resources in their research. At the genome level, our species overview page consolidates available sequence data, genome information and resources, and lists available analysis results and tools. At the transcript level, our gene overview page presents a summary of gene information and analyses, such as BLAST results, computationally predicted SNPs, SSRs, orthology/paralogy, and links transcripts to other site resources including our genome browsers.

Solanaceae Comparative Genome Database
Our database contains annotation and comparative data for all public Solanaceae genomic sequence assemblies. We currently use the GMOD Generic Genome Browser (Gbrowse) to facilitate the web-based display and searching of our annotation and comparative analyses.

Potato Genome Sequencing Consortium Potato Draft Genome Browser
As members of the Potato Genome Sequencing Consortium we are hosting the public Potato genome browser. Presently, the doubled monoploid Solanum phureja DM1-3 516R44 (CIP801092) v3.2 genome assembly and annotation is online. Visit http://potatogenome.net for details on this draft genome release.

In the genome browser, all aligned Solanaceae transcript assemblies are linked to the full set of resources associated with those assemblies provided by the Solanaceae Genomics Resource site.

Upcoming Features for 2010/2011
We expect that the finished Potato and Tomato genomes will be released to the public sequence databases in the coming months. At that time, we will integrate the complete genomes into our existing resources, and will make available additional tools and analysis results; one of these new tools will be a genome synteny viewer.

We have produced a significant amount of RNA-Seq data from our participation in the Potato Genome Sequencing Consortium (PGSC) http://potatogenome.net and Solanaceae Coordinated Agricultural Project (SolCAP) http://solcap.msu.edu, and when publicly released it will be incorporated into the Solanaceae Genomics Resource databases and tools. This data will greatly expand our existing SNP database tool, and we will provide new tools for the query and display of expression data.
