New data and tools at TAIR
(The Arabidopsis Information Resource)
Overview of TAIR
Genome release
Published papers
Gene function
Journal collaborations
Direct submission
RNA-seq, proteomics, corrections
Other data: markers, ecotypes, gene symbols, new genomes
New tools
Researchers: directly (TAIR pages) and via other databases
TAIR10 Genome Release
• No assembly updates
• Will incorporate:
– 200M RNA-seq reads (Ecker and Mockler)
– Additional proteomics data
– Individual gene structure corrections sent to us
Mapping and Assembly
1. Mapping
• RNA-seq sequences (TopHat (C. Trapnell), Supersplat (T.C. Mockler))
• Peptides (6-frame translation, spliced exon graph)
2. Assembly approaches
• Augustus (M. Stanke)
o Uses spliced RNA-seq reads and peptides
o Aim: identify additional splice variants, update existing genes
• TAU (T.C. Mockler)
o Uses spliced RNA-seq reads
o Aim: identify additional splice variants
• Cufflinks (C. Trapnell)
o Uses spliced and unspliced RNA-seq data
o Aim: identify novel genes
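Peptides are mapped by searching them against a 6-frame translation of the genome. The minimal sketch below shows what that translation step produces: three reading frames on the forward strand and three on the reverse complement, all using the standard genetic code. The function names are hypothetical and this is not TAIR's actual mapping pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Complement each base and reverse the strand (non-ACGT characters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return the six conceptual translations keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`. A peptide identified by mass spectrometry can then be located by searching for it in these frames, with "*" marking stop codons.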
Preliminary Results
Augustus/TAU/Cufflinks predicted models are classified into the following categories:
Category | Models
Novel genes | 21
Updated genes | 812
Splice variants | 2134
B-list | 1586
Rejects | 2318
• TAIR10 release expected in August 2010
Experimentally Verified Gene Function
Where does it come from?
• From research articles read by TAIR curators
• From TAIR’s collaboration with journals
• From direct submissions by researchers to TAIR
Literature Curation
• How?
– Papers are prioritized according to the novelty of their gene function results
– The highest-priority papers are read and gene function is extracted
• Why?
– A lot of high-quality experimental gene function information is only available in the form of articles
• How many?
– About 1/3 of all new articles containing gene function data are curated at TAIR each year
Journal Collaboration
• How?
– Author instructions, Excel sheet or online form
• Why?
– To capture a larger fraction of gene function data
– Because publication is the right time to get the data into TAIR
• What journals?
– Plant Physiology (2008)
– The Plant Journal (2009)
– 2010: Journal of Integrative Plant Biology; Journal of Experimental Botany; Plant Science; Environmental Botany; Plant Physiology and Biochemistry; Plant, Cell and Environment
Direct Submission of Gene Function
• How?
– Excel sheet or online form
• Why?
– To capture more data with a small curation team
– Because researchers are the experts on the genes they study
New online submission form
Why Gene Ontology?
• Standardization allows comparison across experiments and species
• Hierarchical structure allows high-level categorization
• A well-structured ontology framework facilitates computational analysis
• Each annotation is attached to its data source (peer-reviewed published research)
• Experimental evidence can be distinguished from predictions
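The hierarchy is what makes high-level categorization possible. A gene annotated to a specific term also counts toward every broader term above it. The sketch below shows that roll-up over a small is_a graph. The hierarchy is a hypothetical, heavily simplified stand-in for the real GO graph, and the helper names are illustrative only.

```python
from collections import deque

# Hypothetical, simplified is_a hierarchy (term names only, no GO IDs).
# The real GO graph is far larger, and its exact parent terms may differ.
PARENTS = {
    "phototropism": ["tropism", "response to light stimulus"],
    "tropism": ["response to external stimulus"],
    "response to light stimulus": ["response to radiation"],
    "response to radiation": ["response to abiotic stimulus"],
    "response to external stimulus": ["response to stimulus"],
    "response to abiotic stimulus": ["response to stimulus"],
    "response to stimulus": ["biological_process"],
}


def ancestors(term):
    """All terms reachable from `term` by following is_a edges (breadth-first)."""
    seen, queue = set(), deque([term])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def genes_under(term, annotations):
    """Genes annotated to `term` directly or to any more specific term below it."""
    return {gene for gene, annotated in annotations
            if annotated == term or term in ancestors(annotated)}
```

With `annotations = [("Phot1", "phototropism")]`, `genes_under("response to stimulus", annotations)` returns `{"Phot1"}`. The specific annotation is therefore counted in a broad summary category without anyone re-curating it.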
Example Gene Ontology annotations
3 GO flavors: biological process, cellular component, molecular function
Gene | GO term | GO flavor | Evidence | Reference
Phot1 | Phototropism | Biological process | Mutant phenotype | Huala et al 1997
Phot1 | Cytoplasm | Cellular component | Direct assay | Sakamoto et al 2002
Phot1 | Serine/threonine kinase activity | Molecular function | Direct assay | Christie et al 1998
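To make "experimental evidence can be distinguished from predictions" concrete, the sketch below stores the Phot1 example annotations as records. The evidence descriptions are mapped to GO evidence codes (Mutant phenotype is IMP, Direct assay is IDA). Filtering then keeps only the experimental codes, and the records are counted per GO flavor. The record layout and function names are assumptions for illustration and do not reflect TAIR's internal schema.

```python
from collections import Counter
from dataclasses import dataclass

# GO evidence codes in the experimental group; electronic or computational
# codes such as IEA fall outside this set.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_term: str
    aspect: str     # "P" biological process, "F" molecular function, "C" cellular component
    evidence: str   # GO evidence code
    reference: str


# The three Phot1 rows from the example table.
ANNOTATIONS = [
    Annotation("Phot1", "phototropism", "P", "IMP", "Huala et al 1997"),
    Annotation("Phot1", "cytoplasm", "C", "IDA", "Sakamoto et al 2002"),
    Annotation("Phot1", "serine/threonine kinase activity", "F", "IDA", "Christie et al 1998"),
]


def experimental_only(annotations):
    """Keep annotations whose evidence code is experimental."""
    return [a for a in annotations if a.evidence in EXPERIMENTAL_CODES]


def count_by_aspect(annotations):
    """Tally annotations per GO flavor (P, F, C)."""
    return Counter(a.aspect for a in annotations)
```

All three Phot1 annotations pass `experimental_only`. An annotation with code IEA would be dropped, which is how experimentally supported function can be reported separately from predicted function.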
New online submission form
Autocomplete (just start typing to get a list of matching terms)
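Autocomplete of this kind can be served from a sorted term list: a binary search finds where the typed prefix would sit, and the matches follow contiguously from there. The sketch below is a hypothetical illustration of that idea using Python's `bisect` module. It is not how the TAIR form is actually implemented, and the sample term list is only for demonstration.

```python
from bisect import bisect_left


def build_index(terms):
    """Sort terms case-insensitively as (lowercased, original) pairs."""
    return sorted((t.lower(), t) for t in terms)


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` terms whose lowercased text starts with `prefix`."""
    p = prefix.lower()
    # (p, "") sorts before every entry whose key starts with p, so the
    # matching entries form a contiguous run beginning at `start`.
    start = bisect_left(index, (p, ""))
    matches = []
    for key, term in index[start:]:
        if not key.startswith(p) or len(matches) >= limit:
            break
        matches.append(term)
    return matches
```

For example, after `index = build_index(["phototropism", "photosynthesis", "photoperiodism", "cytoplasm"])`, typing "photo" returns `["photoperiodism", "photosynthesis", "phototropism"]`.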
What is the result of TAIR’s effort to capture gene function?
• How many genes have experimental gene function in TAIR?
[Figure: Number of genes in TAIR with experimental evidence for biological process, molecular function or cellular component; 9342 genes as of May 31 2010]
[Figure: Arabidopsis Gene Function in TAIR. Number of genes by year: protein coding genes, genes with predicted function, genes with experimental function]
[Figure: Experimental GO Annotations. Number of gene products with experimental biological process, cellular component and molecular function annotations, by organism: Arabidopsis, yeast, worm, fly, zebrafish, mouse, rat]
GBrowse_syn
Tool by Sheldon McKay, CSHL
Alignment data from Pedro Pattyn, Van de Peer lab, U. of Ghent
[Screenshot: GBrowse_syn view comparing A. lyrata, A. thaliana and poplar]
NBrowse
Tool by H.-L. Kao, F. Piano, M. Schuman, M. Gibson, Kris Gunsalus, NYU
Interaction datasets curated by TAIR, BioGRID and IntAct
Arabidopsis lyrata
• Genes have been loaded
• Working on adding some gene function information and improving searching
Central registry for Gene Symbols
Helpdesk
RSS news feed
TAIR Facebook Page
TAIR Twitter Feed
TAIR Staff
Tanya Berardini, Donghui Li
Gene Function/GO:
Bob Muller, Larry Ploetz, Chris Wilks (50%)
?
David Swarbreck, Philippe Lamesch, Rajkumar Sasidharan
Genome Annotation:
Tech Team:
Cynthia Lee, Shanker Singh