Proteome Analyst: Accelerating Protein Research
www.cs.ualberta.ca/~bioinfo
Date posted: 19-Dec-2015

1953

1990

$3 billion and 13 years later…

White House, 2000. Courtesy, Reuters.

DNA Sequence

1 cctcgcccgc ctgccgcctt tttgtgcgcg tgtgagtgtg ggccccagcg tgccctcccg
61 ggggtgggtt ccgggcggaa ggcggaggcc cggcgcgcag cccgccgccc gcctgcccgc
121 ggaccgggga gccggggtgc ttggagcggg ggacgccagg cgtgggctgg cggcgggacc
181 aggaggagga ggaggaggag gaggagagcg cgggctggcg cttgcccggg cgcagtcggc
241 ggggaccgag tcgtacttcc tgtgcgaaag gcggcccgac cctaaccgcc accccctccc
301 cctgtctccc tctctgaacc cgcccattgg gggtaggaca ctcagccgtc accgctcgct
361 ctgctggccg ctacctgcag caagataggg ccgccatcgc cgggcgacga cgaggaggag
421 gcggccgccg cagccggggc ccccgccgcc gccggagcga caggtgattt ggcttctgca
481 cagttaggag gagcaccaaa ccgatgggag gttttgtcag ccacacctac aactataaaa
541 gatgaagctg gtaatctagt ccagattcca agtgctgcta cttcaagtgg gcagtatgtt
601 cttccccttc agaatttgca gaatcaacaa atattttccg ttgcaccagg atcagattca
661 tcaaatggta cagtgtccag tgttcaatat caagtgatac cacagatcca gtcagcagat
721 ggtcagcagg ttcaaattgg tttcacaggc tcttcagata atgggggtat aaatcaagaa
781 agcagtcaaa ttcagatcat tcctggctct aatcaaacct tacttgcctc tggaacacct
841 tctgctaaca tccagaatct cataccacag actggtcaag tccaggttca gggagttgca
901 attggtggtt catcttttcc tggtcaaacc caagtagttg ctaatgtgcc tcttggtctg
961 ccaggaaata ttacgtttgt accaatcaat agtgtcgatc tagattcttt gggactctcg
1021 ggcagttctc agacaatgac tgcaggcatt aatgccgacg gacatttgat aaacacagga
1081 caagctatgg atagttcaga caattcagaa aggactggtg agcgggtttc tcctgatatt
1141 aatgaaacta atactgatac agatttattt gtgccaacat cctcttcatc acagttgcct
1201 gttacgatag atagtacagg tatattacaa caaaacacaa atagcttgac tacatctagt

Protein Sequence

>UniProt/Swiss-Prot|P30613|KPYR_HUMAN

MSIQENISSLQLRSWVSKSQRDLAKSILIGAPGGPAGYLRRASVAQLTQELGTAFFQQQQ

LPAAMADTFLEHLCLLDIDSEPVAARSTSIIATIGPASRSVERLKEMIKAGMNIARLNFS

HGSHEYHAESIANVREAVESFAGSPLSYRPVAIALDTKGPEIRTGILQGGPESEVELVKG

SQVLVTVDPAFRTRGNANTVWVDYPNIVRVVPVGGRIYIDDGLISLVVQKIGPEGLVTQV

ENGGVLGSRKGVNLPGAQVDLPGLSEQDVRDLRFGVEHGVDIVFASFVRKASDVAAVRAA

LGPEGHGIKIISKIENHEGVKRFDEILEVSDGIMVARGDLGIEIPAEKVFLAQKMMIGRC

NLAGKPVVCATQMLESMITKPRPTRAETSDVANAVLDGADCIMLSGETAKGNFPVEAVKM

QHAIAREAEAAVYHRQLFEELRRAAPLSRDPTEVTAIGAVEAAFKCCAAAIIVLTTTGRS

AQLLSRYRPRAAVIAVTRSAQAARQVHLCRGVFPLLYREPPEAIWADDVDRRVQFGIESG

KLRGFLRVGDLVIVVTGWRPGSGYTNIMRVLSIS

Annotation

• Knowledge of the DNA and protein sequences greatly accelerates lab research to discover protein function

• Manual annotation is time- and resource-intensive

• Human bottleneck

Sequence Database Growth

[Figure: number of protein sequences (0 to 2,000,000) by year (1986 to 2004), comparing unannotated protein sequences (GenPept) with human-annotated protein sequences (SwissProt).]

Protein Annotation

Proteome Analyst

Proteome Analyst (PA):

1. is a free, Web-based tool

2. uses machine learning to make predictions; can explain its predictions

3. is very accurate (as measured by, e.g., precision and recall)

Goal: Filter vast amounts of biological data and make meaningful predictions on the function and location of proteins; accelerate protein research.
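
The precision and recall measures mentioned above can be illustrated with a minimal sketch. This is not Proteome Analyst's own evaluation code: the function name `precision_recall`, the location labels, and the five example proteins are all hypothetical, chosen only to show how the two measures are computed for one predicted class.

```python
def precision_recall(predicted, actual, label):
    """Return (precision, recall) for a single class label.

    precision = correct predictions of `label` / all predictions of `label`
    recall    = correct predictions of `label` / all true instances of `label`
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == label and a == label)
    false_pos = sum(1 for p, a in pairs if p == label and a != label)
    false_neg = sum(1 for p, a in pairs if p != label and a == label)

    # Guard against division by zero when the label never occurs.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical subcellular-location labels for five proteins (illustration only).
actual = ["nucleus", "cytoplasm", "nucleus", "membrane", "nucleus"]
predicted = ["nucleus", "nucleus", "nucleus", "membrane", "cytoplasm"]

precision, recall = precision_recall(predicted, actual, "nucleus")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, two of the three "nucleus" predictions are correct and two of the three true "nucleus" proteins are found, so both measures come out to 2/3.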
