Posted 19-Dec-2015
01101010101011010110101010010T10DL001DR10ET01TLVLNVATLPAEKMKPFFIND
Proteome Analyst: Accelerating Protein Research
www.cs.ualberta.ca/~bioinfo
White House, 2000. Courtesy, Reuters.
DNA Sequence
   1 cctcgcccgc ctgccgcctt tttgtgcgcg tgtgagtgtg ggccccagcg tgccctcccg
  61 ggggtgggtt ccgggcggaa ggcggaggcc cggcgcgcag cccgccgccc gcctgcccgc
 121 ggaccgggga gccggggtgc ttggagcggg ggacgccagg cgtgggctgg cggcgggacc
 181 aggaggagga ggaggaggag gaggagagcg cgggctggcg cttgcccggg cgcagtcggc
 241 ggggaccgag tcgtacttcc tgtgcgaaag gcggcccgac cctaaccgcc accccctccc
 301 cctgtctccc tctctgaacc cgcccattgg gggtaggaca ctcagccgtc accgctcgct
 361 ctgctggccg ctacctgcag caagataggg ccgccatcgc cgggcgacga cgaggaggag
 421 gcggccgccg cagccggggc ccccgccgcc gccggagcga caggtgattt ggcttctgca
 481 cagttaggag gagcaccaaa ccgatgggag gttttgtcag ccacacctac aactataaaa
 541 gatgaagctg gtaatctagt ccagattcca agtgctgcta cttcaagtgg gcagtatgtt
 601 cttccccttc agaatttgca gaatcaacaa atattttccg ttgcaccagg atcagattca
 661 tcaaatggta cagtgtccag tgttcaatat caagtgatac cacagatcca gtcagcagat
 721 ggtcagcagg ttcaaattgg tttcacaggc tcttcagata atgggggtat aaatcaagaa
 781 agcagtcaaa ttcagatcat tcctggctct aatcaaacct tacttgcctc tggaacacct
 841 tctgctaaca tccagaatct cataccacag actggtcaag tccaggttca gggagttgca
 901 attggtggtt catcttttcc tggtcaaacc caagtagttg ctaatgtgcc tcttggtctg
 961 ccaggaaata ttacgtttgt accaatcaat agtgtcgatc tagattcttt gggactctcg
1021 ggcagttctc agacaatgac tgcaggcatt aatgccgacg gacatttgat aaacacagga
1081 caagctatgg atagttcaga caattcagaa aggactggtg agcgggtttc tcctgatatt
1141 aatgaaacta atactgatac agatttattt gtgccaacat cctcttcatc acagttgcct
1201 gttacgatag atagtacagg tatattacaa caaaacacaa atagcttgac tacatctagt
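The numbered layout above resembles the GenBank flat-file convention: each line starts with the 1-based position of its first base, followed by blocks of ten bases. As a minimal sketch, assuming exactly that layout (`parse_numbered_sequence` is a hypothetical helper, not part of Proteome Analyst or any library), the lines can be joined into one continuous sequence while checking that the position numbers agree:

```python
import re


def parse_numbered_sequence(text: str) -> str:
    """Join GenBank-style numbered sequence lines into one lowercase string.

    Assumes every non-empty line is "<start position> <bases> <bases> ...",
    where <start position> is the 1-based index of the line's first base.
    """
    parts: list[str] = []
    length = 0  # number of bases read so far
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start = int(fields[0])
        # The leading number must continue exactly where the previous line ended.
        if start != length + 1:
            raise ValueError(f"line starts at {start}, expected {length + 1}")
        bases = "".join(fields[1:]).lower()
        if not re.fullmatch(r"[acgtn]*", bases):
            raise ValueError(f"unexpected symbol in line starting at {start}")
        parts.append(bases)
        length += len(bases)
    return "".join(parts)


excerpt = """
  1 cctcgcccgc ctgccgcctt tttgtgcgcg tgtgagtgtg ggccccagcg tgccctcccg
 61 ggggtgggtt ccgggcggaa ggcggaggcc cggcgcgcag cccgccgccc gcctgcccgc
"""
dna = parse_numbered_sequence(excerpt)
print(len(dna))  # 120
print(dna[:10])  # cctcgcccgc
```

The position check is the design choice worth noting: if a line is dropped or duplicated during copying, the numbers stop matching and the sketch raises an error instead of silently returning a corrupted sequence.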
Protein Sequence
>UniProt/Swiss-Prot|P30613|KPYR_HUMAN
MSIQENISSLQLRSWVSKSQRDLAKSILIGAPGGPAGYLRRASVAQLTQELGTAFFQQQQ
LPAAMADTFLEHLCLLDIDSEPVAARSTSIIATIGPASRSVERLKEMIKAGMNIARLNFS
HGSHEYHAESIANVREAVESFAGSPLSYRPVAIALDTKGPEIRTGILQGGPESEVELVKG
SQVLVTVDPAFRTRGNANTVWVDYPNIVRVVPVGGRIYIDDGLISLVVQKIGPEGLVTQV
ENGGVLGSRKGVNLPGAQVDLPGLSEQDVRDLRFGVEHGVDIVFASFVRKASDVAAVRAA
LGPEGHGIKIISKIENHEGVKRFDEILEVSDGIMVARGDLGIEIPAEKVFLAQKMMIGRC
NLAGKPVVCATQMLESMITKPRPTRAETSDVANAVLDGADCIMLSGETAKGNFPVEAVKM
QHAIAREAEAAVYHRQLFEELRRAAPLSRDPTEVTAIGAVEAAFKCCAAAIIVLTTTGRS
AQLLSRYRPRAAVIAVTRSAQAARQVHLCRGVFPLLYREPPEAIWADDVDRRVQFGIESG
KLRGFLRVGDLVIVVTGWRPGSGYTNIMRVLSIS
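The record above is in FASTA format: a header line beginning with `>`, followed by the residues wrapped over several lines. A minimal reader sketch follows; `parse_fasta` is a hypothetical helper, and splitting the header on `|` into database, accession, and entry name is an assumption based only on this one header.

```python
from typing import Optional


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text.

    The header is the text after '>' on its line; the sequence lines that
    follow are concatenated until the next header.
    """
    records: list[tuple[str, str]] = []
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


excerpt = """>UniProt/Swiss-Prot|P30613|KPYR_HUMAN
MSIQENISSLQLRSWVSKSQRDLAKSILIGAPGGPAGYLRRASVAQLTQELGTAFFQQQQ
LPAAMADTFLEHLCLLDIDSEPVAARSTSIIATIGPASRSVERLKEMIKAGMNIARLNFS
"""
[(header, protein)] = parse_fasta(excerpt)
# Assumes the header has exactly three '|'-separated fields, as in this record.
database, accession, entry_name = header.split("|")
print(accession, entry_name, len(protein))  # P30613 KPYR_HUMAN 120
```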
Annotation
• Knowledge of the DNA and protein sequences greatly accelerates lab research to discover protein function
• Time- and resource-intensive
• Human bottleneck
Sequence Database Growth
Figure: number of protein sequences (0 to 2,000,000) by year (1986 to 2004), comparing unannotated protein sequences (GenPept) with human-annotated protein sequences (SwissProt).
Protein Annotation
Proteome Analyst
Proteome Analyst (PA):
1. is a free, Web-based tool;
2. uses machine learning to make predictions, and can explain its predictions;
3. is very accurate (as measured by, e.g., precision and recall).
Goal: filter vast amounts of biological data, make meaningful predictions about the function and location of proteins, and so accelerate protein research.
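Precision and recall, the accuracy measures cited above, have standard definitions for a single predicted class: precision is the fraction of proteins assigned to the class that truly belong to it, TP / (TP + FP), and recall is the fraction of the class's true members that were assigned to it, TP / (TP + FN). A minimal sketch of both, using a hypothetical `precision_recall` helper and made-up subcellular-location labels (not data or results from Proteome Analyst):

```python
def precision_recall(
    truth: list[str], predicted: list[str], positive: str
) -> tuple[float, float]:
    """Precision and recall for one class, treating `positive` as the target label.

    precision = TP / (TP + FP); recall = TP / (TP + FN).
    A ratio whose denominator is zero is reported as 0.0.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1  # correctly assigned to the class
        elif p == positive:
            fp += 1  # assigned to the class, but wrongly
        elif t == positive:
            fn += 1  # a true member of the class that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up labels for illustration only.
truth = ["nucleus", "cytoplasm", "nucleus", "mitochondrion", "nucleus"]
predicted = ["nucleus", "nucleus", "nucleus", "nucleus", "cytoplasm"]
p, r = precision_recall(truth, predicted, "nucleus")
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Both numbers are reported because a predictor can raise one at the expense of the other: labelling every protein "nucleus" would give perfect recall for that class but poor precision.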