Production Priorities. Genome protein sets User Support Production systems change Database changes...

Production Priorities


• Genome protein sets

• User Support

• Production systems change

• Database changes

• On-the-fly species gene associations


Genome protein sets (gp2protein)

• FASTA files of all proteins believed to be encoded by a genome, not just those that are curated

• Provide standard defline format

• These datasets would be the input for Inparanoid and likely many other analysis projects

• [*Future*] Include ID mappings: UniProt, IPI, CCD, MOD IDs, GI, Protein_ID, RefSeq

• [*Future*] Mapping from proteins to gene
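To make the "standard defline format" point concrete, here is a minimal sketch of writing one FASTA record. The slides do not specify the defline layout, so the pipe-delimited form used here (`>DB|ACCESSION symbol taxon:ID`), the function name `format_record`, and its parameters are all assumptions for illustration only.

```python
def format_record(db, accession, symbol, taxon_id, sequence, width=60):
    """Return one FASTA record as a string.

    The defline layout is hypothetical: >DB|ACCESSION symbol taxon:ID
    It is not the format chosen by the project, only an illustration.
    """
    defline = f">{db}|{accession} {symbol} taxon:{taxon_id}"
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return defline + "\n" + "\n".join(chunks) + "\n"


def parse_defline(defline):
    """Split the hypothetical defline back into its four fields."""
    ident, symbol, taxon = defline[1:].split(" ")
    db, accession = ident.split("|", 1)
    return db, accession, symbol, int(taxon.split(":", 1)[1])
```

A fixed, machine-parseable defline is what lets a downstream tool recover the source database and accession without per-genome special cases, which is the practical benefit of agreeing on one format.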


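[Figure: bar chart by species (S. cerevisiae, D. discoideum, D. melanogaster, C. elegans, A. thaliana, M. musculus, D. rerio, H. sapiens), with one series each for Process, Function and Component; percentage axis from 0% to 70%. Only annotations with IMP, IDA, IPI, IGI and IEP evidence codes are included.]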
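The evidence-code restriction noted for this chart can be sketched as a filter over a tab-delimited gene association file. The exact metric plotted is not recoverable from the slide, so this example only counts distinct gene products per ontology aspect that have at least one annotation with one of the five listed codes. It assumes the gene association file layout in which column 2 is the DB object ID, column 7 the evidence code, and column 9 the aspect (P, F or C); the function name is hypothetical.

```python
from collections import defaultdict

# The five evidence codes listed on the slide.
EXPERIMENTAL_CODES = {"IMP", "IDA", "IPI", "IGI", "IEP"}


def annotated_products_by_aspect(lines):
    """Count distinct gene products per aspect, keeping only the listed codes."""
    products = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue  # header/comment lines start with "!"
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        object_id, evidence, aspect = cols[1], cols[6], cols[8]
        if evidence in EXPERIMENTAL_CODES:
            products[aspect].add(object_id)
    return {aspect: len(ids) for aspect, ids in products.items()}
```

Dividing each count by an estimated gene number for the species would give a percentage per aspect, but whether that is the quantity the chart shows is an assumption.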


[Figure: bar chart of Estimated Gene Number (axis from 0 to 30,000) for S. cerevisiae, D. discoideum, D. melanogaster, C. elegans, A. thaliana, M. musculus, D. rerio and H. sapiens.]


User Support

Email lists to report problems

• GO

• GO-DATABASE

• GO-WEBMASTER

• GOFRIENDS

• GO-IN

• GO-TOP

• …

To which list do we want users to send questions, bug reports, …?


Proposal

• Define specific email addresses for support

• Supported Annotation Staff would be in a rotation to monitor email queries.

– Answer questions that can be answered immediately

– Forward questions to the appropriate person or group as necessary

– Track resolution of the query


Production systems change

• Currently GODB & AmiGO run on main SGD database server

• Moving GO to cluster environment where there will be multiple GO DB servers and GO HTML servers

• Last year we started building the GO Lite DB three times a week. This means AmiGO is always using data that is 2-4 days old.

• More cluster nodes are on order, which may allow us to do daily updates (no need for GOTerm DB).

• CVS via HTTP


Potential Production DB Changes

• Build an updating script; currently the database can only be rebuilt from scratch

• Switch to Chado

• This would also allow the incorporation of new data types

• GOID & Term history tracking

• Need to consider what files need to be archived

• For some file types that are just derivative we can just provide a script
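As a rough illustration of the updating-script and history-tracking ideas above, the sketch below compares two snapshots of the ontology and derives both the changes to apply and the rows that a history table might record. It is not the project's design: the snapshot representation (a dict from GO ID to term name), the function names, and the history row layout are all assumptions.

```python
def diff_snapshots(old, new):
    """Compare two {go_id: term_name} snapshots.

    Returns (added, removed, renamed), so that an updater can apply only
    the differences instead of rebuilding the database from scratch.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    renamed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, renamed


def history_rows(old, new, build_date):
    """Turn a diff into (go_id, change, old_name, new_name, date) rows."""
    added, removed, renamed = diff_snapshots(old, new)
    rows = [(k, "added", None, name, build_date) for k, name in added.items()]
    rows += [(k, "removed", name, None, build_date) for k, name in removed.items()]
    rows += [(k, "renamed", a, b, build_date) for k, (a, b) in renamed.items()]
    return sorted(rows)
```

Keeping the diff as its own step means the same output can drive both the incremental update and the history record, so the two cannot drift apart.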


Survey of GO File Downloads (43 replying)

[Figure: bar chart with an axis from 0% to 50% and one bar for each of GO-DEV, GA, mySQL, XML, OWL and Flat Ontology.]


On-the-fly species gene associations

• Mainly useful for multi-species gene association data sets, e.g. GOA UniProt.
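One way to picture an on-the-fly species subset is a streaming filter over a multi-species gene association file. The slides do not describe the mechanism, so this is only a sketch: it assumes the tab-delimited gene association layout in which column 13 holds the taxon as `taxon:<id>`, with the gene product's own organism listed first when two taxa are given separated by `|`. The function name is hypothetical.

```python
def filter_by_taxon(lines, taxon_id):
    """Yield only the annotation lines whose gene product belongs to taxon_id."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # skip malformed rows
        # With two taxa, the first is the organism encoding the gene product.
        if cols[12].split("|")[0] == wanted:
            yield line
```

Because the function is a generator, it can be applied to an open file handle and written straight to a response or output file without loading the whole multi-species file into memory.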