Biocurator 2012 poster P10


Community annotation with EcoliWiki and GONUTS

Daniel Renfro, Brenley McIntosh, Deborah Siegele and Jim Hu

Texas A&M University (College Station, TX)

Abstract

Community participation in content generation and maintenance for biological databases has long been viewed as a possible solution to the problems of cost and scalability that limit the classical model for biocuration. The success of Wikipedia has inspired a proliferation of biological wikis. EcoliWiki and GONUTS are two wikis that are designed for distinct but overlapping purposes. EcoliWiki is modeled on typical model organism databases, with a central component being gene-centric pages about genes, their products, expression and regulation, and evolution. GONUTS is a Gene Ontology browser and repository for term-specific usage notes for GO. It also supports community annotation for proteins with UniProt accessions. EcoliWiki and GONUTS share common wiki infrastructure for automated creation of pages from templates, handling references, and capturing tabular data to enable structured data mining. Both use the directed acyclic graph structure of MediaWiki categories to capture relationships between pages.

So far, the initial fear that wikis would introduce chaos into annotation has not been a problem. Instead, a common problem faced by wikis and other community annotation systems is that biologists have only weak incentives to participate in content curation. To increase participation and couple annotation to common career goals for academic biologists, we created the Community Assessment of Community Annotation with Ontologies (CACAO). In CACAO, biologists get teaching credit for having teams of students participate in GO annotation. Annotation is done as an intercollegiate competition on the GONUTS website, and annotations, along with student-generated notes, are submitted to GO and UniProt after review by curators. CACAO leverages the expertise of students, faculty supervisors, and biocurators and could be a viable model for other kinds of community efforts.
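The abstract notes that both wikis use the directed acyclic graph structure of MediaWiki categories to capture relationships between pages. A minimal sketch of that idea follows, assuming the category links have already been exported into a simple mapping. The page and category names, the `PARENTS` mapping, and both helper functions are hypothetical illustrations and are not taken from EcoliWiki or GONUTS.

```python
from collections import deque

# Hypothetical category graph: each page or category maps to its parent categories.
# The names are illustrative only and are not copied from EcoliWiki or GONUTS.
PARENTS = {
    "lacI": ["DNA binding", "Transcription regulation"],
    "recA": ["DNA binding"],
    "DNA binding": ["Nucleic acid binding"],
    "Nucleic acid binding": ["Binding"],
    "Transcription regulation": ["Regulation"],
    "Binding": [],
    "Regulation": [],
}


def ancestors(node, parents):
    """Return every category reachable from node by following parent links."""
    seen = set()
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node with several children can be reached more than once
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def shared_categories(a, b, parents):
    """Categories two pages have in common, directly or through ancestors."""
    return ancestors(a, parents) & ancestors(b, parents)
```

Because a page may sit in several categories, the traversal tracks visited nodes rather than assuming a tree. In this toy graph, `shared_categories("lacI", "recA", PARENTS)` relates the two pages through the binding branch only.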

Adapting wikis for annotation

• Traditional models of community curation create barriers to user participation
• Contributions are invisible while gatekeepers evaluate them
• Partial information is discouraged
• Wikis provide immediate feedback and allow submission of smaller units of information
• But wikis are traditionally too unstructured for efficient extraction of structured data
• TableEdit is a MediaWiki extension developed for EcoliWiki to address this problem
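To illustrate what extracting structured data from a wiki table involves, here is a minimal sketch that turns plain MediaWiki table markup into a list of records. It does not reproduce how TableEdit stores or edits tables; the `parse_wikitable` function, the column names, and the sample rows (including the placeholder references) are assumptions made for this example.

```python
# Made-up annotation table in plain MediaWiki table markup.
SAMPLE = """
{| class="wikitable"
! GO ID !! GO term name !! Evidence !! Reference
|-
| GO:0003677 || DNA binding || IDA || PMID:0000000
|-
| GO:0005515 || protein binding || IPI || PMID:0000001
|}
"""


def parse_wikitable(text):
    """Parse a simple wikitext table into a list of dicts keyed by header."""
    headers, rows, current = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the table opener, and captions.
        if not line or line.startswith("{|") or line.startswith("|+"):
            continue
        # A row separator or the table end closes the row being collected.
        if line.startswith("|-") or line.startswith("|}"):
            if current:
                rows.append(current)
                current = []
        elif line.startswith("!"):
            headers.extend(h.strip() for h in line[1:].split("!!"))
        elif line.startswith("|"):
            current.extend(c.strip() for c in line[1:].split("||"))
    if current:
        rows.append(current)
    return [dict(zip(headers, row)) for row in rows]


records = parse_wikitable(SAMPLE)
```

Each record is an ordinary dictionary, so the rows can be filtered or exported like any other structured data. The sketch handles only the single-line cell syntax shown in the sample.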

Gene Ontology (GO) is the de facto ontology for functional annotation. GO annotations for Escherichia coli gene products can be added to EcoliWiki (http://ecoliwiki.net) while annotations for any protein in UniProt can be added to GONUTS (http://gonuts.tamu.edu) by any registered user.

GO Annotations

GONUTS & CACAO

Scoreboard - Tracks team annotations, challenges and points in real time.

Team and Individual Contributions - A table on each team’s page tracks annotations from team members. A similar table shows each annotation contributed by the individual biocurator.

Assessment by Experienced Students - Graduate students or undergraduates who have completed at least one semester of CACAO initially assess every annotation as acceptable, unacceptable, requiring changes, or requiring additional review by a professional biocurator. In addition, these students judge challenges and refinements.

GONUTS & Electronic Jamborees

"Because the breadth of expertise necessary to annotate a complete genome does not exist in any single individual or organization, we hosted an 'Annotation Jamboree' involving more than 40 scientists from around the world, primarily from the Drosophila research community. Each was responsible for organizing and interpreting the gene set for a given protein family or biological process. Over a 2-week period, jamboree participants worked to define genes, to classify them according to predicted function, and to begin synthesizing information from a genome-wide perspective."
- Adams et al. (2000) The genome sequence of Drosophila melanogaster. Science 287:2185-2195

Annotation jamborees were first described for the annotation of the Drosophila genome.

Having multiple investigators travel to a single site is hard. GONUTS allows the Reference Genome project of the GO consortium to organize annotation jamborees via conference calls and over the internet.

Other groups can use GONUTS in similar ways.

Genes of interest for an annotation jamboree are tagged in GONUTS. These tags allow a set of software tools to generate graphs and tables that compare the GO annotations for each gene in the group.
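The kind of comparison table described here can be sketched as follows. This is not the GONUTS tooling itself; the gene names, the `EXAMPLE` data, and the functions `compare_annotations` and `render` are hypothetical, and the sketch assumes the GO annotations for each tagged gene are already available as sets of GO IDs.

```python
# Hypothetical tagged gene set: gene name -> set of GO IDs annotated to it.
EXAMPLE = {
    "geneA": {"GO:0003677", "GO:0005515"},
    "geneB": {"GO:0003677", "GO:0005524"},
    "geneC": {"GO:0005524"},
}


def compare_annotations(annotations):
    """Build a term-by-gene presence table for a group of tagged genes."""
    genes = sorted(annotations)
    terms = sorted(set().union(*annotations.values()))
    table = {term: [term in annotations[g] for g in genes] for term in terms}
    return genes, table


def render(genes, table):
    """Format the comparison as tab-separated text, one GO term per row."""
    lines = ["\t".join(["GO term"] + genes)]
    for term, flags in table.items():
        lines.append("\t".join([term] + ["x" if f else "-" for f in flags]))
    return "\n".join(lines)


genes, table = compare_annotations(EXAMPLE)
print(render(genes, table))
```

Rows where only some genes carry a term point to annotations that jamboree participants may want to discuss.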

Recruited via                 Institution
GO consortium                 TAMU, UCL, Miss State
Phage Meeting                 Mich State, Penn State, Wisconsin-Parkside, N. Dakota State, Central Florida
PortEco Steering Committee    Swarthmore, Wisconsin
ASM General Meeting           Hofstra
ASM CUE                       N. Texas
TAMU Seminar speakers         Miami Ohio
Other                         Houston Baptist

Growth in CACAO Activity (figure panels: Participation, Recruitment)

Challenges and Rebuttals - Submitted challenges are displayed in a table that allows for multiple challenges and rebuttals.