Production Priorities
• Genome protein sets
• User Support
• Production systems change
• Database changes
• On-the-fly species gene associations
Genome protein sets (gp2protein)
• FASTA files of all proteins believed to be encoded by a genome, not just those that are curated
• Provide standard defline format
• These datasets would be the input for Inparanoid and likely many other analysis projects
• [*Future*] Include ID mappings: UniProt, IPI, CCD, MOD IDs, GI, Protein_ID, RefSeq
• [*Future*] Mapping from proteins to genes
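As a rough illustration of what a standard defline could involve, the sketch below writes FASTA records with one uniform header layout. The layout (`>DB:ID symbol taxon:NNNN`), the `ProteinRecord` type, and the example identifiers and sequence are all assumptions made for illustration; the slides do not specify the actual format.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    """Hypothetical container for one protein; field names are illustrative."""
    db: str          # contributing database, e.g. "SGD"
    accession: str   # identifier within that database
    symbol: str      # gene or protein symbol
    taxon_id: int    # NCBI taxonomy identifier
    sequence: str    # amino-acid sequence


def format_defline(rec: ProteinRecord) -> str:
    # Assumed layout, not the project's real one: >DB:ID symbol taxon:NNNN
    return f">{rec.db}:{rec.accession} {rec.symbol} taxon:{rec.taxon_id}"


def to_fasta(rec: ProteinRecord, width: int = 60) -> str:
    # Wrap the sequence at a fixed width so every file looks the same.
    chunks = [rec.sequence[i:i + width] for i in range(0, len(rec.sequence), width)]
    return "\n".join([format_defline(rec), *chunks]) + "\n"


if __name__ == "__main__":
    # Made-up record; the accession and sequence are placeholders.
    example = ProteinRecord("SGD", "X0000001", "ABC1", 4932, "M" * 70)
    print(to_fasta(example), end="")
```

Keeping the header construction in a single function means that every contributing group would emit the same defline, which is the property a downstream tool such as Inparanoid would rely on.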
[Figure: bar chart, vertical axis 0% to 70%, with Process, Function, and Component series for S. cerevisiae, D. discoideum, D. melanogaster, C. elegans, A. thaliana, M. musculus, D. rerio, and H. sapiens. Only annotations with the evidence codes IMP, IDA, IPI, IGI and IEP are included.]
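The restriction to IMP, IDA, IPI, IGI and IEP can be sketched as a filter over a gene association file. The example below is a minimal sketch, assuming tab-delimited GAF-style lines in which column 7 holds the evidence code and column 9 the aspect (P, F, or C); the function name and the sample rows are made up, and qualifiers such as NOT are not handled.

```python
from collections import defaultdict

EXPERIMENTAL_CODES = {"IMP", "IDA", "IPI", "IGI", "IEP"}
ASPECT_NAMES = {"P": "Process", "F": "Function", "C": "Component"}


def count_annotated_genes(lines):
    """Count distinct gene products per aspect, keeping only the listed codes."""
    genes = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row: too few columns to read the aspect
        evidence, aspect = cols[6], cols[8]
        if evidence in EXPERIMENTAL_CODES and aspect in ASPECT_NAMES:
            # A gene product is identified by (DB, DB_Object_ID).
            genes[ASPECT_NAMES[aspect]].add((cols[0], cols[1]))
    return {name: len(ids) for name, ids in genes.items()}


if __name__ == "__main__":
    def row(obj_id, evidence, aspect):
        # Placeholder values everywhere except the columns under test.
        return "\t".join(["SGD", obj_id, "SYM", "", "GO:0000000", "PMID:0",
                          evidence, "", aspect])

    sample = ["!gaf-version: 2.0",
              row("G1", "IDA", "P"), row("G1", "IEA", "F"),
              row("G2", "IMP", "P"), row("G2", "IPI", "C")]
    print(count_annotated_genes(sample))
```

A percentage could then be obtained by dividing each count by a gene total for the species; the slides do not state which denominator was used for the chart.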
[Figure: bar chart of Estimated Gene Number, axis 0 to 30,000, for S. cerevisiae, D. discoideum, D. melanogaster, C. elegans, A. thaliana, M. musculus, D. rerio, and H. sapiens.]
User Support
Email lists to report problems
• GO
• GO-DATABASE
• GO-WEBMASTER
• GOFRIENDS
• GO-IN
• GO-TOP
• …
To which list do we want users to send questions, bug reports, …?
Proposal
• Define specific email addresses for support
• Supported Annotation Staff would be in a rotation to monitor email queries.
– Answer questions that can be answered immediately
– Forward questions to the appropriate person or group as necessary
– Track resolution of the query
Production systems change
• Currently GODB & AmiGO run on main SGD database server
• Moving GO to cluster environment where there will be multiple GO DB servers and GO HTML servers
• Last year we started building the GO Lite DB three times a week. This means AmiGO is always using data that is 2-4 days old.
• More cluster nodes are on order, which may allow us to do daily updates (no need for GOTerm DB).
• CVS via HTTP
Potential Production DB Changes
• Build an update script; currently the database can only be rebuilt from scratch
• Switch to Chado
• This would also allow the incorporation of new data types
• GOID & Term history tracking
• Need to consider what files need to be archived
• For some file types that are just derivative we can just provide a script
Survey of GO File Downloads (43 replying)
[Figure: bar chart, axis 0% to 50%, with one bar for each of GO-DEV, GA, mySQL, XML, OWL, and Flat Ontology.]
On-the-fly species gene associations
• Mainly useful for multi-species gene association data sets, e.g. GOA UniProt.
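
One way to picture on-the-fly extraction is a streaming filter over a multi-species association file. The following is a minimal sketch, assuming tab-delimited GAF-style lines whose column 13 holds the taxon as `taxon:NNNN` (optionally followed by a second, pipe-separated taxon); the function name `filter_by_taxon` is hypothetical and does not describe how the project implements this.

```python
def filter_by_taxon(lines, taxon_id):
    """Yield only the association lines annotated to the given NCBI taxon."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            continue  # GAF comment/header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # malformed row: no taxon column
        # When two taxa are listed, the first is the annotated organism.
        if cols[12].split("|")[0] == wanted:
            yield line


if __name__ == "__main__":
    import sys

    # Usage sketch: python filter_taxon.py 9606 < multi_species.gaf > human.gaf
    for kept in filter_by_taxon(sys.stdin, int(sys.argv[1])):
        sys.stdout.write(kept)
```

Because the function is a generator, a large multi-species file can be filtered line by line without loading it into memory, so a per-species file would not need to be stored in advance.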