Aliya Sadeque BIOC 599 Supervisory Committee Meeting
Aliya Sadeque, BIOC 599, Supervisory Committee Meeting, Wednesday, December 19, 2007.
Outline
About me
Thesis project blueprint
Course selection
Curriculum Vitae
Queen’s University. Bachelor of Science (Honours) in Biochemistry, Minor in Computing. Graduated May 2007.
Previous Coursework (Undergraduate Level)
Biochemistry: Proteins and Enzymes; Physical Biochemistry; Metabolism; Molecular Biology; Introductory Biochemistry Laboratory; Protein Structure and Function; Current Topics in Biochemistry; Biochemistry of the Cell; Advanced Molecular Biology
Computing: Database Management Systems; Neural and Genetic Computing; Introduction to Data Mining; System Level Programming; Operating Systems
Mathematics: Introduction to Statistics; Discrete Math for Computer Scientists; Modeling Techniques in Biology
Thesis Project Blueprint
Context: What do we know so far? Why is this work important?
LCS hits curves: Where does the number of hits explode?
LCS visualization: Where are these regions? Further investigation of regions of interest
Promoter prediction tools: Existing tools (what’s out there?); developing a new tool
Visualization: Visualize all predicted promoters against LCS-identified regions
Context
“Promoter sequences might be identified as conserved islands in a divergent sea”
Longest Common Subsequence
subsequence length (nts)    # solutions
60                          0
52                          0
51                          0
50                          1
45                          6
40                          13
36                          25
35                          28
30                          48
25                          101
20                          191
17                          344
15                          1350
14                          5966
13                          23723
12                          63845
10                          118643
subsequence length (nts)    # solutions
60                          0
58                          0
57                          2
55                          4
54                          5
53                          6
50                          14
45                          24
40                          46
30                          114
25                          216
20                          667
19                          1004
18                          2105
17                          6554
15                          58492
subsequence length (nts)    # solutions
65                          0
64                          0
63                          0
62                          0
61                          2
60                          5
59                          7
57                          7
56                          10
55                          12
54                          16
53                          20
52                          24
51                          27
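The slides do not say how the solutions in these tables were counted, and the later mention of errors suggests approximate matching was involved. As a reference point only, here is a minimal sketch of the textbook dynamic-programming computation of the longest common subsequence length for two sequences. The function name lcs_length and the example sequences are illustrative assumptions, not taken from the project.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (textbook DP)."""
    # prev[j] holds the LCS length of the already-processed prefix of a
    # and the first j characters of b.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical short sequences, chosen only to exercise the function.
    print(lcs_length("TATAAT", "TATTAT"))  # 5 (for example, TATAT)
```

This version keeps only two rows of the table, so memory grows with the shorter of the two loop dimensions rather than with their product; recovering the subsequence itself would require keeping the full table.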
Longest Common Subsequence Hits Curves
Figure 1. Hits curve (full range view). x-axis: subsequence length (nts); y-axis: number of matches; series: error 1, errors 2, errors 3.
Figure 2. Hits curve (under 50 matches). x-axis: subsequence length (nts); y-axis: number of matches; series: error 1, errors 2, errors 3.
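One way to approach the question of where the number of hits explodes is to compare counts at neighbouring subsequence lengths. The sketch below uses the values from the first table of subsequence lengths and solution counts and reports the step with the largest growth per nucleotide. The function name steepest_step and the per-nucleotide normalisation are assumptions made for illustration, not the method used in the project.

```python
# (subsequence length in nts, number of solutions), copied from the first table.
hits = [
    (60, 0), (52, 0), (51, 0), (50, 1), (45, 6), (40, 13), (36, 25),
    (35, 28), (30, 48), (25, 101), (20, 191), (17, 344), (15, 1350),
    (14, 5966), (13, 23723), (12, 63845), (10, 118643),
]


def steepest_step(table):
    """Return (longer_len, shorter_len, growth_per_nt) for the steepest rise."""
    rows = sorted(table, reverse=True)  # longest subsequence first
    best = None
    for (len_hi, n_hi), (len_lo, n_lo) in zip(rows, rows[1:]):
        if n_hi == 0:
            continue  # growth relative to zero hits is undefined
        # The table is unevenly sampled, so normalise the fold change
        # by the gap in subsequence length.
        growth = (n_lo / n_hi) ** (1 / (len_hi - len_lo))
        if best is None or growth > best[2]:
            best = (len_hi, len_lo, growth)
    return best


if __name__ == "__main__":
    print(steepest_step(hits))
```

On these values the steepest step is from 15 to 14 nts, where the count rises from 1350 to 5966 (about 4.4-fold).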
Promoter Prediction
Existing tools: Interpolated Context Modeling (ICM); feedforward neural network
New ideas for promoter prediction:
Neural networks: Drosophila tool; GPCR tool
Data mining techniques: WEKA
Other forms of computer learning
Course Selection
BIOC 570 - completed
MICR 502 - Virology
Courses to sit in on:
Biochemistry courses?
Computing courses? Data mining; Bioinformatics
Questions for myself
Hits curves: what does it mean if they identify the same region?
Exact to approximate matches
Why is this study important?
Which WEKA tools would be good?