The Genomics Education Partnership TA Annotation Workshop 2006

August 21-23. Funded by the Howard Hughes Medical Institute.


Page 1: The Genomics Education Partnership TA AnnotationWorkshop 2006

The Genomics Education Partnership TA Annotation Workshop 2006

August 21-23

Funded by the Howard Hughes Medical Institute

Page 2: The Genomics Education Partnership TA AnnotationWorkshop 2006

WU Program Participants

Sarah Elgin, Prof Biology & Genetics

Jeremy Buhler, Asst Prof Computer Science

Chris Shaffer, Biology, Senior Teaching Fellow

Wilson Leung, Biology, Res. Asst, TA & Web Master

Taylor Cordonnier (Teaching Assistant & Lab Participant)

John Russell (Professor, Director of DBBS)

Tricia Wallace (Tour Guide, WU Genome Sequencing Center)

Undergraduate alumni of Bio 4342: Kasia Falkowska, David Desruisseau

Washington University Graduate Students

Michael Brooks (genetics/computational biology)

Deanna Mendez (biophysics/chromosomal proteins)

Sanjida Rangwala (genetics/plant genomes)

Page 3: The Genomics Education Partnership TA AnnotationWorkshop 2006

Participating Schools

Catherine Coyle-Thompson California State University - Northridge

Chunguang Du Montclair State University

Todd Eckdahl Missouri Western

Anya Goodman Cal Poly State University – San Luis Obispo

Charles Hauser St. Edward’s University

Karmella Haynes WU, Davidson College

Chris Jones Moravian College

Olga Ruiz Kopp Utah Valley State College

Gary Kuleck Loyola Marymount University

Jennifer Myka Thomas More College

Paul Overvoorde Macalester College

Debbie Parrilla-Hernandez Universidad de Puerto Rico en Humacao

Dennis Revie California Lutheran University

Stephanie Schroeder Webster University

Mary Shaw New Mexico Highlands University

Gary Skuse Rochester Institute of Technology

Colette Witkowski Southwest Missouri State

Page 4: The Genomics Education Partnership TA AnnotationWorkshop 2006

Goals

• Better integration of genomics into the undergraduate biology curriculum

• Better integration of research thinking into the academic year curriculum

• Creation of a dynamic student-scientist partnership to engage students in genomics research

Page 5: The Genomics Education Partnership TA AnnotationWorkshop 2006

• GOAL: To provide students the opportunity to work as a research team through a large-scale sequencing project.

• PROCESS: Students begin with sample preparation, data generation, finishing and quality control at the WU Genome Sequencing Center, and complete annotation and analysis with WU Computer Science faculty.

Page 6: The Genomics Education Partnership TA AnnotationWorkshop 2006

Virtual Tour of the Genome Sequencing Center: available online, on CD, or on DVD

• Web site: lecture notes, PowerPoint presentations, references, homework with answer keys, example student presentations

• Key analytical work is computer based

• Major resources for annotation (databases) are open access (NIH, UCSC, Ensembl)

Challenge: making it work at a distance, with your curriculum

Page 7: The Genomics Education Partnership TA AnnotationWorkshop 2006

Choice of research problems?

Comparative analysis of Drosophila dot chromosomes

D. erecta annotation; D. mojavensis sequencing

Annotation of corn genome?

Gut bacteria genomes?

Requires lead scientist(s) committed to publication

Page 8: The Genomics Education Partnership TA AnnotationWorkshop 2006

Our ‘04-’06 research goal:

To compare finished sequence from the dot chromosomes of D. melanogaster with D. virilis

Page 9: The Genomics Education Partnership TA AnnotationWorkshop 2006

The sequencing “pipeline”

• Genomes enter the GSC as BAC or fosmid library

• Clones to be sequenced are selected

• The GSC prepares ~2 kb libraries from each clone

• The 2 kb fragments are sequenced from each end (~700 bases each)

• Phred/Phrap assembles the sequenced fragments

• Finishers use Consed, request additional data to generate a single, high-quality contig

• Annotation identifies sequence features of interest

• Future: start from posted unfinished sequence: annotate D. erecta, finish & annotate D. mojavensis

Page 10: The Genomics Education Partnership TA AnnotationWorkshop 2006

Current status, spring 2006

• 13 fosmids (~40 kb each) were selected to be made into libraries for sequencing

• Each student sequences and annotates one fosmid

• 8 smaller gaps will be sequenced using a PCR-based method (summer work, Michelle & Taylor)

[Figure: D. virilis dot chromosome, reference strain, showing finished sequence, chosen fosmids, and remaining gaps (~12 kb, 15 kb, 8 kb, 9 kb, 12 kb, 13 kb, 3 kb)]

Page 11: The Genomics Education Partnership TA AnnotationWorkshop 2006

Shotgun sequencing & assembly

[Figure labels: genome; shotgun (paired ends); assemble sequence reads; scaffold; additional sequence reads needed]

Page 12: The Genomics Education Partnership TA AnnotationWorkshop 2006

Initial assembly, 2-fold coverage

Page 13: The Genomics Education Partnership TA AnnotationWorkshop 2006

From 2X reads to 6X coverage….

• Three significant contigs

• All gaps spanned

• Fair coverage, but weak spots
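The jump from 2X to 6X matters because the chance that a base is missed by every read falls off exponentially with coverage. A minimal sketch, assuming the idealized Lander-Waterman model (reads placed uniformly at random) and a 40 kb fosmid as on the earlier status slide; the function names are illustrative, and real clones with repeats or cloning bias will behave worse than this:

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized Lander-Waterman estimate: probability that a given
    base is covered by no read at the given average coverage."""
    return math.exp(-coverage)


def expected_uncovered_bases(clone_length: int, coverage: float) -> float:
    """Expected number of bases with zero read coverage."""
    return clone_length * expected_uncovered_fraction(coverage)


# Assumption: a ~40 kb fosmid, as described on the status slide.
FOSMID_LENGTH = 40_000

for coverage in (2, 6):
    fraction = expected_uncovered_fraction(coverage)
    bases = expected_uncovered_bases(FOSMID_LENGTH, coverage)
    print(f"{coverage}X: {fraction:.2%} uncovered, about {bases:.0f} bases")
```

Under these assumptions roughly 13.5% of bases are expected to be uncovered at 2X and about 0.25% at 6X, which is consistent with gaps closing while some weak spots remain.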

Page 14: The Genomics Education Partnership TA AnnotationWorkshop 2006

GSC libraries for sequencing

[Figure labels: plasmid; insert (2-4 kb); primer; read]

Page 15: The Genomics Education Partnership TA AnnotationWorkshop 2006

Sequence reads in a problem area: a run of C’s…
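Long single-base runs like this can be flagged before looking at traces. A minimal sketch using a regular expression; the function name and the 8-base threshold are assumptions chosen for illustration, not settings taken from Consed or the course materials:

```python
import re


def homopolymer_runs(seq: str, min_len: int = 8):
    """Return (start, end, base) for each run of one base that is at
    least min_len long. Coordinates are 0-based, end exclusive."""
    # ([ACGT]) captures one base; \1{n,} requires n or more repeats of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


# Made-up example sequence containing a run of twelve C's.
example = "ATGGTA" + "C" * 12 + "GATTACA"
print(homopolymer_runs(example))  # [(6, 18, 'C')]
```

Regions reported this way are candidates for the additional reads that finishers request.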

Page 16: The Genomics Education Partnership TA AnnotationWorkshop 2006

Final Assembly

•40,809 base pairs

•438 reads

•Good coverage, no low quality regions
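The two numbers above give a rough average read depth. A back-of-the-envelope sketch, assuming every read contributes the nominal ~700 bases quoted on the pipeline slide; trimmed, high-quality read lengths are usually shorter, so this is an upper-end estimate:

```python
def average_coverage(num_reads: int, mean_read_length: float,
                     assembly_length: int) -> float:
    """Average depth = total sequenced bases / assembled length."""
    return num_reads * mean_read_length / assembly_length


# 438 reads and 40,809 bp are from the slide; 700 bases per read is
# the nominal figure from the pipeline slide (an assumption here).
depth = average_coverage(438, 700, 40_809)
print(f"Approximate average coverage: {depth:.1f}X")  # 7.5X
```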

Page 17: The Genomics Education Partnership TA AnnotationWorkshop 2006

Final check: EcoRI digest, actual vs. in silico
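The in silico half of this check amounts to cutting the assembled sequence at every EcoRI site and listing the fragment sizes. A minimal sketch, assuming a linear sequence and the standard EcoRI site G^AATTC; the function names and the 10% sizing tolerance are illustrative, and a real fosmid clone is circular and includes vector sequence, which this sketch ignores:

```python
def ecori_fragment_lengths(seq: str, site: str = "GAATTC",
                           cut_offset: int = 1) -> list:
    """Fragment lengths (largest first) from cutting a linear sequence
    at every site. EcoRI cuts after the first G of GAATTC; the site is
    palindromic, so scanning one strand finds every site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def matches_gel(predicted: list, observed: list,
                tolerance: float = 0.1) -> bool:
    """True if band counts agree and each observed band lies within
    the given fractional tolerance of the predicted size."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance * p
               for p, o in zip(sorted(predicted), sorted(observed)))


# Made-up toy sequence with two EcoRI sites.
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(ecori_fragment_lengths(toy))  # [14, 7, 5]
```

A mismatch in band count or size between the gel and the prediction suggests a misassembly; bands of similar size can co-migrate on a gel, so the count comparison is only a first pass.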

Page 18: The Genomics Education Partnership TA AnnotationWorkshop 2006

Annotation: analyzing sequence data

• Practice problem: genes and pseudogenes in man and chimpanzee

• Annotating Drosophila fosmid:

– Finding genes

– Finding repeats

– Searching for conserved elements

– Clustal analysis

– Evaluating synteny

• Final challenge: putting it all together
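To give a feel for what the "finding genes" step starts from, here is a toy open reading frame (ORF) scan. It is only a sketch under stated assumptions: it searches the forward strand, ignores introns, and uses a hypothetical `min_codons` cutoff, so it is not how real gene predictors work and is not one of the tools used in the course. Drosophila genes contain introns, which is exactly why annotation relies on gene finders and comparative evidence instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list:
    """Return (start, end, frame) for forward-strand ORFs that run
    from an ATG to the first in-frame stop codon. Coordinates are
    0-based, end exclusive; the codon count includes the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up toy sequence: ATG, four GCT codons, then a TAA stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(toy, min_codons=5))  # [(2, 20, 2)]
```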

Page 19: The Genomics Education Partnership TA AnnotationWorkshop 2006

Working as a group, with TA assistance, is most effective

Page 20: The Genomics Education Partnership TA AnnotationWorkshop 2006

Partnership can be effective. Work on adjacent fosmids?

Page 21: The Genomics Education Partnership TA AnnotationWorkshop 2006

Annotation: what do students gain by analyzing sequence data?

• What tools are available for finding genes & other features of interest? How do they work? Managing data…

• How do you define a gene? A pseudogene?

• How are genomes organized? Repeats?

• Power of comparative genomics

• Questions of evolution

Page 22: The Genomics Education Partnership TA AnnotationWorkshop 2006

Initial analysis of D. virilis dot chromosome fosmids

27/28 genes remain on the dot, but rearrangements within the chromosome are common!

Page 23: The Genomics Education Partnership TA AnnotationWorkshop 2006

Examples of genome organization in Drosophila

[Figure: gene maps (5 kb scale bar) for D.v. Dot, D.m. Dot, D.m. Arm, and D.v. Arm regions. Genes labeled: Egfr, CG10440, Ephrin, CG1970, Pur-Alpha, Thd1, Zfh2. Legend: Coding; UTR; DNA Transposons; Other Repetitive]

Page 24: The Genomics Education Partnership TA AnnotationWorkshop 2006

Dot chromosome genes have larger introns due to repetitious DNA

[Figure legend: Perc. D. virilis Dot; Perc. D. virilis Other; Perc. D. melanogaster Dot; Perc. D. melanogaster Other. Groups: Other Chromosomes; Dot Chromosomes]

Page 25: The Genomics Education Partnership TA AnnotationWorkshop 2006

The dot chromosomes of D. melanogaster and D. virilis both have a high density of repeat sequences, but differ in type of repeats

1360 Elements

DINES

Other DNA Transposons

Unknown

Simple Repeats

Retroelements

Page 26: The Genomics Education Partnership TA AnnotationWorkshop 2006

Resulting publication:

Slawson, E.E., Shaffer, C.D., Leung, W., Malone, C.D., Kellmann, E., Shevchek, R.B., Craig, C.A., Bloom, S., Bogenpohl, J. II, Dee, J., Morimoto, E.T.A., Myoung, J., Nett, A.S., Ozsolak, F., Tittiger, M.E., Zeug, A., Pardue, M.L., Buhler, J., Mardis, E., and Elgin, S.C.R. (2006) “Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains,” Genome Biology 7: R15.

• But required ca. 10 months additional full-time work!

Page 27: The Genomics Education Partnership TA AnnotationWorkshop 2006

Assessment: Likert Scale (5 = Agree, 1 = Disagree)

• Before the course, I understood how the human genome had been sequenced: 2.5

• After the course, I understood… how the human genome had been sequenced: 4.9; … how eukaryotic genomes are organized: 4.5; … nature of genes: 4.4

• The course helped me improve my wet lab skills: 2.5

• The course helped me improve my computer skills: 4.5

• Genomics is awesome! I love the power of databases! 4.8

Page 28: The Genomics Education Partnership TA AnnotationWorkshop 2006

Learning Gains from WU Lab Courses Compared to Summer Program Research Experiences

Mean Values

Scale: 1-5

1. Understanding of the research process

2. Understanding how knowledge is constructed

3. Ability to analyze data

4. Skill in interpretation of results

5. Understanding how scientists work on real problems

6. Assertions require supporting evidence

7. Skill in scientific writing

Data from Course Work (25) and SURE 2003 (1135)

Mean values shown: 4.24, 4.16, 4.08, 3.92, 3.88, 3.88, 3.80

Page 29: The Genomics Education Partnership TA AnnotationWorkshop 2006

Learning Gains from WU Lab Courses Compared to Summer Program Research Experiences

Mean Values

Scale: 1-5

8. Readiness for more research

9. Tolerance for obstacles

10. Ability to integrate theory and practice

11. Learning lab skills

12. Clarification of a career path

13. Learning to work independently

14. Understanding primary literature

15. Learning ethical conduct

Data from Course Work (25) and SURE 2003 (1135)

Mean values shown: 3.64, 3.63, 3.60, 3.56, 3.13, 2.83, 2.79, 2.22

Page 30: The Genomics Education Partnership TA AnnotationWorkshop 2006

Comparison of Learning Gains from WU Lab Courses with Summer Research Experiences

[Chart: Learning Gains. Mean values (y-axis, 1.5 to 4.5) for items 1-15 (x-axis). Series: Course Work; SURE 2003; SURE 2004. Highlighted items: Understanding knowledge construction; Skill in scientific writing; Learning to work independently]

Page 31: The Genomics Education Partnership TA AnnotationWorkshop 2006

What Students Say They Learned:

Oral presentation skills, defending ideas

Scientific writing

Why you do things, and how to choose a strategy

That research doesn’t always work, and goes slowly

That research is collaborative

That science is more ambiguous than it appears in lectures

Page 32: The Genomics Education Partnership TA AnnotationWorkshop 2006

Things Students Said Helped Them Understand the Material Better:

Writing formal lab reports

Defending their work against challenges from others (in oral presentations)

Having lots of opportunities to ask questions

Doing trouble-shooting

Page 33: The Genomics Education Partnership TA AnnotationWorkshop 2006

Lessons Learned

• Students need ownership; this can come from the computer-based effort and does not require wet lab.

• Generating letter grades - use staged problem sets to teach techniques, record progress; periodic reports with written and oral defense of conclusions.

• Challenging - work always changing, requires time commitment; computer support important

• Quality of the experimental work is very good! Finished sequence, publishable data, conclusions. Good student-scientist partnership.

Page 34: The Genomics Education Partnership TA AnnotationWorkshop 2006

Goals for workshop…

• Provide background experience in gene annotation; introduce computer-based training materials, problem sets; annotate a Drosophila gene

• Provide a review of genome sequencing, visit the WU Genome Sequencing Center

• Discuss your role as a TA

• Discuss plan to facilitate data in / data out from WU

• Discuss communications plan - Wiki? Help contacts?

• Discuss present and future projects of the GEP