Aliya Sadeque BIOC 599 Supervisory Committee Meeting Wednesday December 19, 2007.


Outline

About me
Thesis project blueprint
Course selection

Curriculum Vitae

Queen’s University. Bachelor of Science (Honours) in Biochemistry. Minor in Computing. Graduated May 2007.

Previous Coursework: Undergraduate Level

Biochemistry:
Proteins and Enzymes
Physical Biochemistry
Metabolism
Molecular Biology
Introductory Biochemistry Laboratory
Protein Structure and Function
Current Topics in Biochemistry
Biochemistry of the Cell
Advanced Molecular Biology

Computing:
Database Management Systems
Neural and Genetic Computing
Introduction to Data Mining
System Level Programming
Operating Systems

Mathematics:
Introduction to Statistics
Discrete Math for Computer Scientists
Modeling Techniques in Biology

Thesis Project Blueprint

Context
Why is this work necessary?
What kinds of tools have been used to address it?

Longest Common Subsequence

Part I: Explore LCSs in poxvirus
Visualization
Threshold frequency equation

Part II: Develop an interface for use by biologists

Background

“Promoter sequences might be identified as conserved islands in a divergent sea”

Observed: 42-bp sequence showing “unusually high degree of sequence conservation” (Brunetti et al.)
Are these claims reasonable?
How can they be tested?

Tools

Alignment
0-mismatch suffix tree
Longest Common Subsequence

Algorithm
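The algorithm itself is not reproduced in this transcript. As a point of reference only, the following is a minimal Python sketch of the textbook dynamic-programming solution to the longest common subsequence problem. It is not necessarily the method used in this project, and the function name `lcs` and the example sequences are illustrative assumptions.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP).

    Illustrative only: the slides do not specify the algorithm used.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical toy sequences, not poxvirus data.
    print(lcs("AGGTAB", "GXTXAYB"))
```

This version uses time and memory proportional to the product of the two sequence lengths, so it is suited to short examples rather than whole genomes.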

Visualization

Threshold Frequency

Figure 1. Table showing number of hits resulting from LCS trials with varying values of n and k, or subsequence length and error number, respectively.

        k = 1                  k = 2                  k = 3
length  # solutions    length  # solutions    length  # solutions
10      118643         15      58492          51      27
12      63845          17      6554           52      24
13      23723          18      2105           53      20
14      5966           19      1004           54      16
15      1350           20      667            55      12
17      344            25      216            56      10
20      191            30      114            57      7
25      101            40      46             59      7
30      48             45      24             60      5
35      28             50      14             61      2
36      25             53      6              62      0
40      13             54      5              63      0
45      6              55      4              64      0
50      1              57      2              65      0
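The slides do not state how a hit is counted. One possible reading, offered purely as an assumption, is that a hit is a pair of length-n windows, one from each sequence, that differ at no more than k positions. The brute-force Python sketch below counts hits under that reading; the function name `count_hits` and the toy sequences are hypothetical, and this is not the tool that produced the table.

```python
def count_hits(seq_a: str, seq_b: str, n: int, k: int) -> int:
    """Count window pairs of length n that differ at no more than k positions.

    Assumption: a "hit" is one window from seq_a and one from seq_b with at
    most k mismatches. The slides do not define the term.
    """
    hits = 0
    for i in range(len(seq_a) - n + 1):
        window_a = seq_a[i:i + n]
        for j in range(len(seq_b) - n + 1):
            window_b = seq_b[j:j + n]
            mismatches = 0
            for x, y in zip(window_a, window_b):
                if x != y:
                    mismatches += 1
                    if mismatches > k:
                        break  # too many errors, stop comparing this pair
            if mismatches <= k:
                hits += 1
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, not poxvirus data.
    for k in (0, 1):
        print(k, count_hits("ACGTACGT", "ACCTACGA", n=4, k=k))
```

Under this reading, the count can only fall or stay level as n grows for a fixed k, which matches the pattern in the table.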

User Interface

Design with usability in mind
Selection of inputs: What kind of genomes can/will this tool be used for?
Format of results: How should these be presented in order to allow interpretation?
Visualization
Further processing of output

Timeline

Part I: Poxvirus LCS data collection and analysis (2 months)

Part II: Interface (4-6 months)

Course Selection

BIOC 570 - completed
MICR 502 - Virology

Courses to sit in on:
Biochemistry courses?
Computing courses?
Data mining
Bioinformatics
Statistics