The Genomics Education Partnership TA Annotation Workshop 2006
August 21-23
Funded by the Howard Hughes Medical Institute
WU Program Participants
Sarah Elgin, Prof Biology & Genetics
Jeremy Buhler, Asst Prof Computer Science
Chris Shaffer, Biology, Senior Teaching Fellow
Wilson Leung, Biology, Res. Asst, TA & Web Master
Taylor Cordonnier (Teaching Assistant & Lab Participant)
John Russell (Professor, Director of DBBS)
Tricia Wallace (Tour Guide, WU Genome Sequencing Center)
Undergraduate alumni of Bio 4342:
Kasia Falkowska, David Desruisseau
Washington University Graduate Students
Michael Brooks (genetics/computational biology)
Deanna Mendez (biophysics/chromosomal proteins)
Sanjida Rangwala (genetics/plant genomes)
Participating Schools
Catherine Coyle-Thompson California State University - Northridge
Chunguang Du Montclair State University
Todd Eckdahl Missouri Western
Anya Goodman Cal Poly State University – San Luis Obispo
Charles Hauser St. Edward’s University
Karmella Haynes WU, Davidson College
Chris Jones Moravian College
Olga Ruiz Kopp Utah Valley State College
Gary Kuleck Loyola Marymount University
Jennifer Myka Thomas More College
Paul Overvoorde Macalester College
Debbie Parrilla-Hernandez Universidad de Puerto Rico en Humacao
Dennis Revie California Lutheran University
Stephanie Schroeder Webster University
Mary Shaw New Mexico Highlands University
Gary Skuse Rochester Institute of Technology
Colette Witkowski Southwest Missouri State
Goals
• Better integration of genomics into the undergraduate biology curriculum
• Better integration of research thinking into the academic year curriculum
• Creation of a dynamic student-scientist partnership to engage students in genomics research
• GOAL: To provide students the opportunity to work as a research team through a large-scale sequencing project.
• PROCESS: Students begin with sample preparation, data generation, finishing and quality control at the WU Genome Sequencing Center, and complete annotation and analysis with WU Computer Science faculty.
Virtual Tour of the Genome Sequencing Center - available on line, as CD, or DVD
• Web site: lecture notes, PowerPoint presentations, references, homework with answer keys, example student presentations
• Key analytical work is computer based
• Major resources for annotation, databases, are open access (NIH, UCSC, Ensembl)
Challenge: making it work at a distance, with your curriculum
Choice of research problems?
Comparative analysis of Drosophila dot chromosomes
D. erecta annotation; D. mojavensis sequencing
Annotation of corn genome?
Gut bacteria genomes?
Requires lead scientist(s) committed to publication
Our ‘04-’06 research goal:
To compare finished sequence from the dot chromosomes of D. melanogaster with D. virilis
The sequencing “pipeline”
• Genomes enter the GSC as BAC or fosmid library
• Clones to be sequenced are selected
• The GSC prepares ~2 kb libraries from each clone
• The 2 kb fragments are sequenced from each end (~700 bases each)
• Phred/Phrap assembles the sequenced fragments
• Finishers use Consed, request additional data to generate a single, high-quality contig
• Annotation identifies sequence features of interest
• Future: start from posted unfinished sequence: annotate D. erecta, finish & annotate D. mojavensis
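The paired-end step in the pipeline above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the GSC's actual software: fragments are drawn uniformly at random from the clone, every fragment is exactly 2 kb, and every read is exactly 700 bases with no sequencing errors. The function names (`paired_end_reads`, `reverse_complement`) are hypothetical.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(clone, n_fragments, fragment_len=2000, read_len=700, seed=0):
    """Sample fragments from a clone and read each one from both ends.

    Returns a list of (start, forward_read, reverse_read) tuples, where
    start is the fragment's position in the clone. The reverse read is
    reported as it would come off the opposite strand.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_fragments):
        start = rng.randrange(len(clone) - fragment_len + 1)
        fragment = clone[start:start + fragment_len]
        forward = fragment[:read_len]
        reverse = reverse_complement(fragment[-read_len:])
        pairs.append((start, forward, reverse))
    return pairs


# Toy usage: a random 40 kb "fosmid insert" (made-up sequence, not real data).
toy_rng = random.Random(1)
toy_clone = "".join(toy_rng.choice("ACGT") for _ in range(40000))
toy_pairs = paired_end_reads(toy_clone, n_fragments=5)
```

Because the two reads of a pair come from a fragment of known approximate size, an assembler can use them to order and orient contigs even when the middle of the fragment is unsequenced.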
Current status, spring 2006
• 13 fosmids (~40 kb each) were selected to be made into libraries for sequencing
• Each student sequences and annotates one fosmid
• 8 smaller gaps will be sequenced using a PCR-based method (summer work, Michelle & Taylor)
[Figure: D. virilis dot chromosome, reference strain: finished sequence, chosen fosmids, and remaining gaps (~12 kb, 15 kb, 8 kb, 9 kb, 12 kb, 13 kb, 3 kb)]
Shotgun sequencing & assembly
[Figure labels: genome; shotgun (paired ends); assemble sequence reads; scaffold; additional sequence reads needed]
Initial assembly, 2-fold coverage
From 2X reads to 6X coverage….
• Three significant contigs
• All gaps spanned
• Fair coverage, but weak spots
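Why going from 2X to 6X closes most gaps can be estimated with the idealized Lander-Waterman model, in which reads land uniformly at random and the expected fraction of bases left uncovered at coverage c is e^(-c). The sketch below is an illustration under that assumption, not the method used in the course. The 700-base read length and 40 kb target are taken from the approximate figures quoted elsewhere in these slides, and the function names are hypothetical.

```python
import math


def coverage(n_reads, read_len, target_len):
    """Average fold coverage: total bases read divided by target length."""
    return n_reads * read_len / target_len


def fraction_uncovered(c):
    """Expected fraction of bases with no read at coverage c (Lander-Waterman)."""
    return math.exp(-c)


def reads_needed(c, read_len, target_len):
    """Smallest number of reads that reaches at least c-fold coverage."""
    return math.ceil(c * target_len / read_len)


# Assumed values for illustration only: ~700-base reads, ~40 kb fosmid.
for fold in (2, 6):
    n = reads_needed(fold, read_len=700, target_len=40000)
    missed = fraction_uncovered(fold) * 40000
    print(f"{fold}X: about {n} reads, ~{missed:.0f} bases expected uncovered")
```

Under this model roughly 13.5% of bases are expected to be uncovered at 2X but only about 0.25% at 6X. Real reads are not uniformly distributed and low-quality ends are trimmed, so actual assemblies still show weak spots that finishers must resolve.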
GSC libraries for sequencing
[Figure: plasmid with insert (2-4 kb), primer, and read]
Sequence reads in a problem area: a run of C’s…
Final Assembly
• 40,809 base pairs
• 438 reads
• Good coverage, no low quality regions
Final check: EcoRI digest, actual vs. in silico
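An in silico digest predicts the fragment sizes a restriction enzyme should produce from the assembled sequence, which can then be compared with the bands seen on a gel. The sketch below is a minimal illustration, not the tool used in the course. It relies on the known EcoRI recognition site GAATTC, cut between G and A, and on the site being palindromic, so searching one strand is enough. The function names are hypothetical, and the sequence passed in is assumed to be the full molecule that was digested in the lab (insert plus vector if the whole clone was cut).

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts after the G: G^AATTC


def cut_positions(seq, site=ECORI_SITE, offset=CUT_OFFSET):
    """Return the 0-based positions at which the enzyme cuts seq."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def digest_fragment_lengths(seq, circular=False):
    """Predicted fragment lengths, largest first, for a complete digest."""
    cuts = cut_positions(seq)
    if not cuts:
        return [len(seq)]  # no site: the molecule stays in one piece
    if circular:
        # On a circle, the piece after the last cut wraps around to the first.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(len(seq) - cuts[-1] + cuts[0])
    else:
        bounds = [0] + cuts + [len(seq)]
        lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(lengths, reverse=True)


# Toy sequence with two EcoRI sites (made up for illustration).
toy = "AAGAATTCTTTTGAATTCGG"
print(digest_fragment_lengths(toy))                 # linear molecule
print(digest_fragment_lengths(toy, circular=True))  # circular molecule
```

If the predicted sizes disagree with the gel, the assembly may contain a misjoin, a collapsed repeat, or a missing segment in the region where the fragments differ.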
Annotation: analyzing sequence data
• Practice problem: genes and pseudogenes in man and chimpanzee
• Annotating Drosophila fosmid:
– Finding genes
– Finding repeats
– Searching for conserved elements
– Clustal analysis
– Evaluating synteny
• Final challenge: putting it all together
Working as a group, with TA assistance, is most effective
Partnership can be effective. Work on adjacent fosmids?
Annotation: what do students gain by analyzing sequence data?
• What tools are available for finding genes & other features of interest? How do they work? Managing data…
• How do you define a gene? a pseudogene?
• How are genomes organized? Repeats?
• Power of comparative genomics
• Questions of evolution
Initial analysis of D. virilis dot chromosome fosmids
27/28 genes remain on the dot, but rearrangements within the chromosome are common!
Examples of genome organization in Drosophila
[Figure: gene maps (5 kb scale bar) for D.v. Dot, D.m. Dot, D.m. Arm, and D.v. Arm, showing the genes Ephrin, CG1970, Pur-Alpha, Thd1, Zfh2, CG10440, and Egfr. Legend: Coding, UTR, DNA Transposons, Other Repetitive]
Dot chromosome genes have larger introns due to repetitious DNA
[Figure legend: Perc. D. virilis Dot; Perc. D. virilis Other; Perc. D. melanogaster Dot; Perc. D. melanogaster Other. Groups: Other Chromosomes, Dot Chromosomes]
The dot chromosomes of D. melanogaster and D. virilis both have a high density of repeat sequences, but differ in type of repeats
[Figure: repeat classes shown: 1360 Elements, DINEs, Other DNA Transposons, Unknown, Simple Repeats, Retroelements]
Resulting publication:
Slawson, E.E., Shaffer, C.D., Leung, W., Malone, C.D., Kellmann, E., Shevchek, R.B., Craig, C.A., Bloom, S., Bogenpohl, J. II, Dee, J., Morimoto, E.T.A., Myoung, J., Nett, A.S., Ozsolak, F., Tittiger, M.E., Zeug, A., Pardue, M.L., Buhler, J., Mardis, E., and Elgin, S.C.R. (2006) “Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains,” Genome Biology 7: R15.
• But required ca. 10 months additional full-time work!
Assessment: Likert Scale (5 = Agree, 1 = Disagree)
• Before the course, I understood how the human genome had been sequenced: 2.5
• After the course, I understood… how the human genome had been sequenced: 4.9; … how eukaryotic genomes are organized: 4.5; … the nature of genes: 4.4
• The course helped me improve my wet lab skills: 2.5
• The course helped me improve my computer skills: 4.5
• Genomics is awesome! I love the power of databases! 4.8
Learning Gains from WU Lab Courses Compared to Summer Program Research Experiences
Mean values, scale: 1-5
Data from Course Work (25), SURE 2003 (1135)
1. Understanding of the research process
2. Understanding how knowledge is constructed
3. Ability to analyze data
4. Skill in interpretation of results
5. Understanding how scientists work on real problems
6. Assertions require supporting evidence
7. Skill in scientific writing
Values shown: 4.24, 4.16, 4.08, 3.92, 3.88, 3.88, 3.80
Learning Gains from WU Lab Courses Compared to Summer Program Research Experiences (continued)
Mean values, scale: 1-5
Data from Course Work (25), SURE 2003 (1135)
8. Readiness for more research
9. Tolerance for obstacles
10. Ability to integrate theory and practice
11. Learning lab skills
12. Clarification of a career path
13. Learning to work independently
14. Understanding primary literature
15. Learning ethical conduct
Values shown: 3.64, 3.63, 3.60, 3.56, 3.13, 2.83, 2.79, 2.22
Comparison of Learning Gains from WU Lab Courses with Summer Research Experiences
[Figure: mean values (axis 1.5 to 4.5) for learning gains 1-15; series: Course Work, SURE 2003, SURE 2004. Labeled items: Understanding knowledge construction; Skill in scientific writing; Learning to work independently]
What Students Say They Learned:
Oral presentation skills, defending ideas
Scientific writing
Why you do things, and how to choose a strategy
That research doesn’t always work, and goes slowly
That research is collaborative
That science is more ambiguous than it appears in lectures
Things Students Said Helped Them Understand the Material Better:
Writing formal lab reports
Defending their work against challenges from others (in oral presentations)
Having lots of opportunities to ask questions
Doing trouble-shooting
Lessons Learned
• Students need ownership; it can come from the computer-based effort and does not require wet lab work.
• Generating letter grades - use staged problem sets to teach techniques, record progress; periodic reports with written and oral defense of conclusions.
• Challenging - work always changing, requires time commitment; computer support important
• Quality of the experimental work is very good! Finished sequence, publishable data, conclusions. Good student-scientist partnership.
Goals for workshop…
• Provide background experience in gene annotation; introduce computer-based training materials, problem sets; annotate a Drosophila gene
• Provide a review of genome sequencing, visit the WU Genome Sequencing Center
• Discuss your role as a TA
• Discuss plan to facilitate data in / data out from WU
• Discuss communications plan - Wiki? Help contacts?
• Discuss present and future projects of the GEP