Biological sequence analysis and information processing by artificial neural networks Søren Brunak...
Posted 18-Dec-2015
Biological sequence analysis and information processing by artificial neural networks
Søren Brunak
Center for Biological Sequence Analysis
Technical University of Denmark
Pairwise alignment
>carp Cyprinus carpio growth hormone 210 aa vs.
>chicken Gallus gallus growth hormone 216 aa
scoring matrix: BLOSUM50, gap penalties: -12/-2
40.6% identity; Global alignment score: 487
carp     MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken  MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp     YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken  TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp     DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken  PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
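The 40.6% identity in the header can be checked directly from the aligned rows. The sketch below assumes identity means identical residue pairs divided by the total number of alignment columns, gaps included; with the rows above that gives 89 of 219 columns, which matches the reported figure. It does not recompute the global alignment score of 487, which would need the full BLOSUM50 matrix and the program's exact gap-penalty convention.

```python
# Aligned rows copied from the carp/chicken growth hormone alignment above;
# the three blocks are concatenated and "-" marks a gap.
CARP = (
    "MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD"
    "YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN"
    "DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL"
)
CHICKEN = (
    "MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE"
    "TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G"
    "PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI"
)


def ungapped_length(row):
    """Number of residues in an aligned row (gap characters removed)."""
    return len(row.replace("-", ""))


def percent_identity(row_a, row_b):
    """Identical residue pairs divided by alignment columns (gaps included)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identities = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    return identities, len(row_a), 100.0 * identities / len(row_a)


print(ungapped_length(CARP), ungapped_length(CHICKEN))  # 210 216
ident, cols, pct = percent_identity(CARP, CHICKEN)
print(f"{ident}/{cols} = {pct:.1f}% identity")  # 89/219 = 40.6% identity
```

The ungapped lengths also confirm that the rows are complete: 210 residues for carp and 216 for chicken, as stated in the header.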
Diversity of interactions in a network enables complex calculations
• Similar in biological and artificial systems
• Excitatory (+) and inhibitory (-) relations between compute units
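A single compute unit can be sketched as a weighted sum of its inputs passed through a squashing function: positive weights act as excitatory connections and negative weights as inhibitory ones. The logistic sigmoid and the weight values below are illustrative assumptions, not taken from the slides.

```python
import math


def unit_output(inputs, weights, bias=0.0):
    """Output of one compute unit: logistic sigmoid of the weighted input sum.

    Positive weights are excitatory (they push the output up);
    negative weights are inhibitory (they push the output down).
    """
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-activation))


# Hypothetical weights: input 0 is excitatory, input 1 is inhibitory.
weights = [2.0, -2.0]
print(unit_output([1, 0], weights))  # excitation only: above 0.5
print(unit_output([0, 1], weights))  # inhibition only: below 0.5
print(unit_output([1, 1], weights))  # the two cancel: exactly 0.5
```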
Transfer of biological principles to neural network algorithms
• Non-linear relation between input and output
• Massively parallel information processing
• Data-driven construction of algorithms
• Ability to generalize to new data items
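The first point, a non-linear relation between input and output, can be illustrated with the XOR function. No single linear threshold unit can compute XOR, but a network with one hidden layer can. The weights below are hand-picked for illustration (an assumption), not learned from data.

```python
def step(x):
    """Threshold unit: fires (1) when its net input is positive."""
    return 1 if x > 0 else 0


def xor_network(x1, x2):
    """Two hidden threshold units feeding one output unit (hand-set weights)."""
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are 1
    # Output: excitatory input from h_or, inhibitory input from h_and.
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

The inhibitory connection from the AND unit is what makes the output fall back to 0 when both inputs are on.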
Simplest non-trivial classification problem
CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...
• Two categories: positives and negatives
• Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
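With two features, a single linear threshold unit (a perceptron) can separate the two categories whenever a straight line in the feature plane divides them. The feature definitions in this sketch are simplifying assumptions. Net charge counts K and R as +1 and D and E as -1, ignoring His and the termini. The count of aromatic residues (F, W, Y) is a crude, hypothetical stand-in for side-chain volume. The training points are made-up toy values, not real phosphorylation data.

```python
# Hypothetical feature definitions (simplifications, not from the slides).
POSITIVE = set("KR")    # counted as charge +1
NEGATIVE = set("DE")    # counted as charge -1
AROMATIC = set("FWY")   # crude stand-in for large side-chain volume


def features(window):
    """Map a peptide window to (net charge, aromatic residue count)."""
    charge = sum(aa in POSITIVE for aa in window) - sum(aa in NEGATIVE for aa in window)
    aromatic = sum(aa in AROMATIC for aa in window)
    return (charge, aromatic)


def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Perceptron learning rule for labels +1/-1; returns (w1, w2, bias)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # wrong side of (or on) the line
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every point classified correctly: stop early
            break
    return w1, w2, b


def predict(model, point):
    """Return +1 or -1 depending on which side of the line the point lies."""
    w1, w2, b = model
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else -1


# Feature vectors for two of the windows listed above.
print(features("HIETRRA"))  # (1, 0)
print(features("CNHSYYP"))  # (0, 2)

# Made-up, linearly separable toy points (+1 = positive, -1 = negative).
toy_points = [(2, 2), (3, 1), (2, 3), (-1, 0), (0, -2), (-2, -1)]
toy_labels = [1, 1, 1, -1, -1, -1]
model = train_perceptron(toy_points, toy_labels)
print([predict(model, p) for p in toy_points])  # [1, 1, 1, -1, -1, -1]
```

For linearly separable data the perceptron rule is guaranteed to stop after a finite number of updates. The final weights depend on the order in which the points are presented.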
Features of phosphorylations sites
PKG: cGMP-dependent kinase
PKC: protein kinase C
CaM-II: Ca++/calmodulin-dependent kinase
cdc2: cyclin-dependent kinase 2
CK-II: casein kinase 2
Sparse encoding of nucleotide sequence windows
Nucleotides
4-letter alphabet
Normally no need for a fifth letter
ACGTAGGCAATCTCAGACGTTTATC
1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
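The bit string above can be reproduced with a short sketch: each nucleotide maps to four bits with exactly one bit set. The bit order A, C, G, T is inferred from the example itself (A = 1000, C = 0100, G = 0010, T = 0001). Raising an error on any other character is a design choice of this sketch, consistent with using no fifth letter.

```python
ALPHABET = "ACGT"  # bit order inferred from the example above


def sparse_encode(window):
    """Sparse (one-hot) encoding: 4 bits per nucleotide, exactly one set to 1."""
    bits = []
    for nt in window.upper():
        if nt not in ALPHABET:
            raise ValueError(f"unexpected character {nt!r}; only A, C, G, T are encoded")
        bits.extend("1" if nt == letter else "0" for letter in ALPHABET)
    return "".join(bits)


encoded = sparse_encode("ACGTAGGCAATCTCAGACGTTTATC")
print(len(encoded))  # 100 (25 nucleotides x 4 bits)
print(encoded)
```

In a network, each bit feeds one input unit, so a window of n nucleotides needs 4n input units.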