Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.
![Page 1: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/1.jpg)
Topics in Bioinformatics
CS832b
Bin Ma
![Page 2: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/2.jpg)
Lecture 1: Basic
![Page 3: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/3.jpg)
Three molecules we will study
• DNA
  • A string over alphabet {A,C,G,T}
• RNA
  • Primary structure – a string over alphabet {A,C,G,U}
  • Secondary and tertiary structures
• Protein
  • Primary structure – a string over alphabet {A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V}
  • Secondary and tertiary structures
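The three alphabets above can be written down directly as sets. The following is a minimal sketch, not part of the original slides; the names `DNA`, `RNA`, `PROTEIN`, and `is_over` are hypothetical helpers chosen for illustration.

```python
# Alphabets of the three molecules, as listed on the slide.
DNA = set("ACGT")
RNA = set("ACGU")
PROTEIN = set("ARNDCQEGHILKMFPSTWYV")  # the 20 standard amino acids


def is_over(sequence, alphabet):
    """Return True if every symbol of `sequence` belongs to `alphabet`."""
    return set(sequence.upper()) <= alphabet


if __name__ == "__main__":
    seq = "AGTAGCCTATGCGA"
    print(is_over(seq, DNA))      # every letter is one of A, C, G, T
    print(is_over(seq, RNA))      # contains T, which RNA does not use
    print(is_over(seq, PROTEIN))  # A, C, G, T are also amino acid letters
```

Note that a string over {A,C,G,T} is also a valid string over the protein alphabet, so the alphabet alone cannot always tell the molecule types apart.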
![Page 4: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/4.jpg)
![Page 5: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/5.jpg)
![Page 6: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/6.jpg)
[Figure: DNA strand ends labeled 5’ and 3’]
![Page 7: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/7.jpg)
![Page 8: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/8.jpg)
DNA
Double-stranded:
5’…AGTAGCCTATGCGA…3’
  …::::::::::::::…
3’…TCATCGGATACGCT…5’
Written as a single strand:
5’…AGTAGCCTATGCGA…3’
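Because the two strands pair A with T and C with G and run in opposite directions, one strand determines the other. A minimal sketch of that relationship follows; the function names `complement` and `reverse_complement` are assumptions made for illustration.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIR = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, read in the same left-to-right order."""
    return strand.upper().translate(PAIR)


def reverse_complement(strand):
    """The opposite strand, written 5' to 3'."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "AGTAGCCTATGCGA"              # 5'...3' strand from the slide
    print(complement(top))              # bottom strand as drawn, 3'...5'
    print(reverse_complement(top))      # bottom strand rewritten 5'...3'
```

The complement reproduces the lower strand exactly as drawn on the slide (read 3’ to 5’); the reverse complement is the same strand in the conventional 5’ to 3’ orientation.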
![Page 9: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/9.jpg)
>CHRXGATCACCTGACATCAGGAGTTCAAGACCAGCCTGCCAACGTGGTGAAACCCCATCTCTACTAAAAATAGGAAATTCACCTGGTGGCAGGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAAGAATCGCTTGAACCCAGGAGGTGGAGATTGCACTGAGCTGAGATCACGCCACTGCGCTCCAGCCTGGGTGACAGAGCAAGACTCCATAAAAAAAAAAATTATAACCTAATGATTAAATACTGTAGGGAAGAGCTTACCACAATTGCTGGCCCATGGCCAATGCTGGGTATAAGACAGCTACTGCAAACAACCATGATGATGATACATCTCTTGTGTAGGGTTAGGTTGTTTGAGACACATTCTATGCTCCTTGATTTGATTGGAAGGTACCTTGGTTCCTTGGGGACTTGGAGGTGACGAAAGCCTCCCTGGGGACAAAACTCACCTTCACTTCTCTAATATCAAGCTTCAGCAACCTGCTCCAGCTACAGCACAGGGTTGGACAGGCCCAACAACAGAGGAAATCCACAAAGTGTGTCTTGACACATACATCCACGGGGTCTAACGAGGTGAGGCCAATGACTGCTTCCACACACCCCAGCCAGACTCTGACTTCACTCCCGGCAGGTTTCAGTAGACTTGGCAGCAGTTGGAGCGAGCTGGCTTCTTGCGGTAGGCAGCCATGTTGGAAGAGCTCCCAATAGTCCTCGTTTCCTGGTAATCTCATGCTTGGATCATCTTCTTCTCTTGAGTGAAGAGAAGAACTGCAGAGAGAGACAGAGACAGAGAGACAGATCACAGGGGCAGTTTCCCCCATACTGTTCTCAAGATAAATGAGTCAACTCTTACACCTCTTTTCTCTGGTGTAAAACAAGGCTGGTGAACAGGCAGAGAGAACTGGGGTGTTGGAGTAGCATTGACCTTCCTTCTTCATCCCTCTATAATCTCTCCTAGTGCAGGAGTAGGAAAACTAAAAATCACACGTCTGATCATCTGTGATCTCAGAGTCTTGGACAAGCCTTGCTTGCCAATCAGCAGGGATGGGAGTTGGAGCCATCTCCAAGTGTCCCCCCACAAATCTATGTCCACCTGGAAGTTTCAAATGCAACTTTATTTGGGAAAGGCAATTTTGCAAATGTTATTAAGTGAAGGATCTAGGGATGAGATCATCCTGGAGTAGGGTGGGTCCTAGGTCAAATGACAGGAAATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCAAACCTGGCCTATCATTGATTTAATGATTAATACGGTTAGGCTCTGTGTCCCCACCCAAATCTCATCTCAAATTGTAATTCCCATGTGTCCAGGGAGGGAGCTTGTGGAAGGTGATTGGATCACAGGGGCAGTTTTTGTCATGCTGTTCTCATGATAAATGAGTCAATTCTCAGAAGAGATGATGGTTTTAAAGTGTGGCACTTCTTTGCTCTCTTGCTCTCTCTCTCTCCTGAGTAGACTGGCTCATTCTTTCTACTGGTTACAAGCAATAGAAGTGATAACAAAATTGATGGTTTCTCATTTCCTAAATGGTACCAGTGGATTCCTGGTTTCCTCTCTCTCTCTTCTCTCTCTCTATCAACTTTTCCCTCAATCTCTCTATCAACCTCCCTCTCTCTCAATCTCAATCTCTCTCAGTCTCATTCTCAATCTCTTTTGCTCAATCTCTTTCTCAGCTTCTCTCCCTCAATTTCTCTTTTGCAACTTCTCTCTCTCAGTCTGTGTCTCTCAATCTCCCTCTCTCAATCTCTCTTGTAGTCTCCCTGTCTCTCATACTCTCTCTGTTTCTGTCTGTCTCTGCCCTTGCTCTAGGGAAAGCAAGTTCTTATGCTGTAAGTTCTCCTGTAAAAAGGTCCACATGATACGGAACTGGCCATCTTTGGCCAACATGAGTGAGTTTAGAAGTGTGCCTTTCACCAGTTGAGCCTTCAAATGAGATCCCAGCCCTGG
ATGACACAGTGACAGTAACCTGCTAGGAACTGTGAACCAGAGGCACCCAGCCAAGCTGCTCCCAGACTCCCAACCCAGTGAAACCATAAGATAATAAATGCATGTTGTTTTAAGCTGCTAAGTTTGGGGGTCACTTGTTACACAGCAACAGCTGACTCATACATTTTCTTTGAAATTGATTTCCACTTCTGTCACCAGCATCATTCCATAAATTTGCTCTATGTGCATTGCTGACCTGCAGTAGAAGTTTTGGAGAAGTGAACCACATCCCCTTATCTGCCATTTGACAGCAAGCAGCCTCAAACATTCATAATTTCTTTCCTGACTCTCCACTCCACACTGTTGCCTGCCTTCCTGGTTCCAGATCTTTGGATCTGGACTGACACCTGGGCACTGTCATAGGCATCCGTGTGAAGAGACCACCAACAGGCTCTGTGTGAGCAATAAAGCTTTTTAATCACCTGGGTGCAGGTGGGCTGATTCTGAAAAGAGAGTCAGCAAAGAGTGGTGGGATTATCATTAGTTCTTATAGGTTCGGGATAGGTGGTGGAGTTAGGAGCAATTTTTTGTGGGCAGGGAGTGGATCTTACAAAGGACATTCTCAAGGGTGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGGTGGATATTACAAAGTACCTTCTTAAGGGCAGGGGAGGATATTACAAAGTACCTTCTCAAGGGGGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGGTGGATATTAGAAAGTACCTTCT
• The human genome is organized into 23 pairs of chromosomes; chromosome X is one of the two sex chromosomes.
• Chromosome X has about 162 million base pairs.
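The sequence above is shown in FASTA style: a header line beginning with ">" followed by the sequence itself. As a rough sketch of how such text could be read, the hypothetical helpers `parse_fasta` and `gc_content` below assume that the header occupies its own line and that the sequence may be wrapped over several lines.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # start a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span many lines
    return {h: "".join(parts) for h, parts in records.items()}


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


if __name__ == "__main__":
    # First bases of the chromosome X excerpt, wrapped over two lines.
    example = ">CHRX\nGATCACCTGACATCAGG\nAGTTCAAGACC\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), round(gc_content(seq), 3))
```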
![Page 10: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/10.jpg)
Genome Sizes
Species Size in bps
Amoeba dubia 670,000,000,000
Homo sapiens 3,400,000,000
Drosophila melanogaster 180,000,000
Mycoplasma genitalium 580,000
Human immunodeficiency virus type 1 9,750
![Page 11: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/11.jpg)
Protein and Amino Acids
![Page 12: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/12.jpg)
Protein
![Page 13: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/13.jpg)
Protein
GOT Ecoli
![Page 14: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/14.jpg)
A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
A protein sequence may contain from a few hundred to several thousand amino acids.
![Page 15: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/15.jpg)
RNA
![Page 16: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/16.jpg)
Animal cell
Nucleus
Chromatin
Mitochondrion
Nucleolus (rRNA synthesized)
Plasma membrane
Cell coat
Cytoplasm
![Page 17: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/17.jpg)
Protein synthesis
![Page 18: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/18.jpg)
Protein synthesis
![Page 19: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/19.jpg)
Genetic code
..ATT CAC AGT GGA..
   I   H   S   G
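The slide's example, ATTCACAGTGGA read as I, H, S, G, can be reproduced with the standard genetic code. The sketch below builds the 64-entry codon table from the usual T, C, A, G ordering; the names `CODON_TABLE` and `translate` are hypothetical, and the function assumes the input is a DNA coding strand read from its first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate consecutive codons of `dna`; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(translate("ATTCACAGTGGA"))  # the slide's example sequence
```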
![Page 20: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/20.jpg)
Notes on translation
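• Reading frame: the same sequence can be split into codons in three different ways, depending on where reading starts.
• Start and stop codons mark where translation begins and ends.
• The third base of a codon often does not change the encoded amino acid.
• Translation proceeds in the 5’ -> 3’ direction.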
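To make the notion of a reading frame concrete, the sketch below splits one strand into codons at offsets 0, 1, and 2. The helper names `codons` and `reading_frames` are assumptions for illustration; the three frames of the opposite strand could be obtained the same way from its reverse complement.

```python
def codons(sequence):
    """Split `sequence` into consecutive triplets, dropping any leftover bases."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


def reading_frames(sequence):
    """Return the three forward reading frames as lists of codons."""
    return [codons(sequence[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for offset, frame in enumerate(reading_frames("ATTCACAGTGGA")):
        print(offset, " ".join(frame))
```

Only the frame at offset 0 yields the codons ATT CAC AGT GGA; shifting the start by one or two bases produces entirely different codons from the same sequence.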
![Page 21: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/21.jpg)
DNA replication
![Page 22: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/22.jpg)
The Central Dogma of Molecular Biology
DNA → (transcription) → RNA → (translation) → Protein
DNA → DNA (replication)
genotype → phenotype
![Page 23: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/23.jpg)
Exception – retroviruses
DNA → (transcription) → RNA → (translation) → Protein
RNA → DNA (reverse transcription, used by retroviruses)
DNA → DNA (replication)
genotype → phenotype
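At the level of strings, transcription and the retroviral exception can be sketched as simple letter substitutions. This is a simplification with hypothetical function names: it assumes the DNA input is the coding (sense) strand, so the RNA has the same sequence with U in place of T, and it ignores the enzymes and strand mechanics involved.

```python
def transcribe(coding_strand):
    """DNA coding strand -> RNA with the same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna):
    """RNA -> DNA with the same sequence, U replaced by T (the retroviral direction)."""
    return rna.upper().replace("U", "T")


if __name__ == "__main__":
    dna = "ATTCACAGTGGA"
    rna = transcribe(dna)
    print(rna)
    print(reverse_transcribe(rna) == dna)  # the round trip recovers the DNA string
```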
![Page 24: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/24.jpg)
Protein (Phenotype)
DNA (Genotype)
Biology
![Page 25: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/25.jpg)
Genes
• One gene encodes one protein (or sometimes RNA).
• Like a program, it starts with a start codon (e.g. ATG), then each group of three nucleotides codes for one amino acid. Then a stop codon (e.g. TGA) signifies the end of the gene.
• Genes are dense in prokaryotes and sparse in eukaryotes.
• In the middle of a eukaryotic gene, there are introns that are spliced out (as junk) after transcription. The parts that are kept are called exons. Locating genes and their exon and intron boundaries is the task of gene finding.
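The "start codon, triplets, stop codon" description suggests a very simple scan for open reading frames, and splicing can be pictured as concatenating exon intervals. The sketch below is a toy illustration under those assumptions, not a real gene finder: `find_orfs` and `splice` are hypothetical names, only one strand is scanned, and the exon coordinates are assumed to be given as 0-based half-open intervals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return (start, end, sequence) for each ATG followed in-frame by a stop codon."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon until the first stop codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs


def splice(transcript, exons):
    """Concatenate the exon intervals (start, end), discarding the introns between them."""
    return "".join(transcript[start:end] for start, end in exons)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGA"))
    print(splice("AAAGTCCCAGTTT", [(0, 3), (10, 13)]))
```

Real gene finding is much harder than this scan, especially in eukaryotes, because the exon boundaries are not known in advance and must be inferred.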
![Page 26: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/26.jpg)
Introns and Exons
![Page 27: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/27.jpg)
Jumping genes
• Some DNA segments (transposable elements, or transposons) can move from one position in the genome to another.
![Page 28: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/28.jpg)
Gene-related diseases
• Hemophilia: gene on the X chromosome.
• Sickle-cell anemia: a single-nucleotide mutation in the first exon of the beta-globin gene (it removes a restriction enzyme cutting site). About 1 in 12 African Americans is a carrier; homozygotes have the disease.
• BRCA1 gene (chr. 17q) – responsible for about ½ of inherited breast cancer (inherited cases are about 10% of breast cancer).
• Fragile X syndrome (intellectual disability) – 1 in 1250 males, 1 in 2500 females (dominant, but females also have a partially expressed normal copy of the gene). FMR-1 gene: more than 200 tri-nucleotide repeats cause the disease.
• P53 gene: chr. 17p, implicated in about ½ of all cancers.
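The fragile X criterion (more than 200 copies of a tri-nucleotide repeat) is a counting problem on strings. The sketch below finds the longest uninterrupted run of a repeat unit; the function names are hypothetical, the default unit CGG is the repeat associated with FMR-1, and the threshold of 200 is the one stated on the slide. The test sequence is synthetic, not real data.

```python
import re


def longest_repeat_run(sequence, unit="CGG"):
    """Number of copies in the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def exceeds_threshold(sequence, unit="CGG", threshold=200):
    """True if the longest run has more than `threshold` copies of `unit`."""
    return longest_repeat_run(sequence, unit) > threshold


if __name__ == "__main__":
    synthetic = "AA" + "CGG" * 3 + "TT" + "CGG" * 5 + "A"  # made-up example
    print(longest_repeat_run(synthetic))
    print(exceeds_threshold(synthetic))
```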
![Page 29: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic.](https://reader036.fdocuments.in/reader036/viewer/2022062315/5697bfb61a28abf838c9dee7/html5/thumbnails/29.jpg)