UNDERSTANDING INTELLIGENCE
![Page 1: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/1.jpg)
Predicting Protein Structures and Structural Features on a Genomic Scale
Pierre Baldi
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
University of California, Irvine
![Page 2: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/2.jpg)
UNDERSTANDING INTELLIGENCE
• Human intelligence (inverse problem)
• AI (direct problem)
• Choice of specific problems is key
• Protein structure prediction is a good problem
![Page 3: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/3.jpg)
PROTEINS
[Figure: polypeptide backbone chain with side chains R1, R2, R3 attached to successive Cα atoms]
![Page 4: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/4.jpg)
![Page 5: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/5.jpg)
Utility of Structural
Information
(Baker and Sali, 2001)
![Page 6: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/6.jpg)
CAVEAT
ALL-ALPHA ALL-BETA
MEMBRANE (25%) GLOBULAR (75%)
PROTEINS
![Page 7: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/7.jpg)
REMARKS
• Structure/Folding
• Backbone/Full Atom
• Homology Modeling
• Fold Recognition (Threading)
• Ab Initio (Physical Potentials/Molecular Dynamics, Statistical Mechanics/Lattice Models)
• Statistical/Machine Learning (Training Sets, SS prediction)
• Mixtures: ab-initio with statistical potentials, machine learning with profiles, etc.
![Page 8: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/8.jpg)
PROTEIN STRUCTURE PREDICTION (ab initio)
DECOMPOSITION INTO 3 PROBLEMS
1. FROM PRIMARY SEQUENCE TO SECONDARY STRUCTURE AND OTHER STRUCTURAL FEATURES
2. FROM PRIMARY SEQUENCE AND STRUCTURAL FEATURES TO TOPOLOGICAL REPRESENTATION
3. FROM TOPOLOGICAL REPRESENTATION TO 3D COORDINATES
![Page 9: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/9.jpg)
![Page 10: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/10.jpg)
Helices
1GRJ (GreA Transcript Cleavage Factor from Escherichia coli)
![Page 11: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/11.jpg)
Antiparallel β-sheets
1MSC (Bacteriophage MS2 Unassembled Coat Protein Dimer)
![Page 12: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/12.jpg)
Parallel β-sheets
1FUE (Flavodoxin)
![Page 13: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/13.jpg)
Contact map
![Page 14: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/14.jpg)
Secondary structure prediction
![Page 15: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/15.jpg)
GRAPHICAL MODELS: BAYESIAN NETWORKS
• X1, … ,Xn random variables associated with the vertices of a DAG = Directed Acyclic Graph
• The local conditional distributions P(Xi|Xj: j parent of i) are the parameters of the model. They can be represented by look-up tables (costly) or other more compact parameterizations (Sigmoidal Belief Networks, XOR, etc).
• The global distribution is the product of the local characteristics:
P(X1,…,Xn) = Πi P(Xi|Xj : j parent of i)
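As an illustrative sketch (not taken from the slides), the factorization can be checked on a hypothetical three-variable chain X1 → X2 → X3 whose local conditional distributions are stored as look-up tables. The variable names and probabilities below are made up for illustration only.

```python
from itertools import product

# Hypothetical DAG: X1 -> X2 -> X3, all variables binary.
PARENTS = {"X1": (), "X2": ("X1",), "X3": ("X2",)}

# Look-up tables: CPT[var][parent_values] = P(var = 1 | parents).
# The numbers are arbitrary illustrative values.
CPT = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.9},
    "X3": {(0,): 0.5, (1,): 0.1},
}

def joint(assignment):
    """P(X1,...,Xn) as the product of the local conditional distributions."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_one = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p

# Summing the joint over all assignments should give 1.
total = sum(
    joint(dict(zip(PARENTS, values)))
    for values in product((0, 1), repeat=len(PARENTS))
)
```

A look-up table needs one entry per parent configuration, which is why the slide calls it costly for nodes with many parents.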
![Page 16: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/16.jpg)
![Page 17: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/17.jpg)
![Page 18: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/18.jpg)
![Page 19: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/19.jpg)
![Page 20: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/20.jpg)
![Page 21: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/21.jpg)
DATA PREPARATION
Starting point: PDB database.
• Remove sequences not determined by X-ray diffraction.
• Remove sequences where DSSP crashes.
• Remove proteins with physical chain breaks (neighboring AA having distances exceeding 4 Angstroms).
• Remove sequences with resolution worse than 2.5 Angstroms.
• Remove chains with fewer than 30 AA.
• Remove redundancy (Hobohm’s algorithm, Smith-Waterman, PAM 120, etc.).
• Build multiple alignments (BLAST, PSI-BLAST, etc.).
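The per-chain filters above could be sketched as a single predicate. This is a minimal sketch under stated assumptions: the `Chain` record and its field names are hypothetical, and the redundancy-removal and alignment-building steps are not covered.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chain:
    """Hypothetical record for one PDB chain (field names are illustrative)."""
    chain_id: str
    method: str                      # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float                # in Angstroms (smaller is better)
    length: int                      # number of amino acids
    neighbor_distances: List[float]  # distances between neighboring AA, in Angstroms
    dssp_ok: bool                    # False if DSSP crashed on this chain

def keep(chain: Chain) -> bool:
    """Apply the listed filters (redundancy removal and alignments are separate steps)."""
    return (
        chain.method == "X-RAY DIFFRACTION"
        and chain.dssp_ok
        and all(d <= 4.0 for d in chain.neighbor_distances)  # no physical chain breaks
        and chain.resolution <= 2.5                          # not worse than 2.5 Angstroms
        and chain.length >= 30                               # at least 30 AA
    )
```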
![Page 22: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/22.jpg)
SECONDARY STRUCTURE PROGRAMS
DSSP (Kabsch and Sander, 1983): works by assigning potential backbone hydrogen bonds (based on the 3D coordinates of the backbone atoms) and subsequently by identifying repetitive bonding patterns.
STRIDE (Frishman and Argos, 1995): in addition to hydrogen bonds, it also uses dihedral angles.
DEFINE (Richards and Kundrot, 1988): uses difference distance matrices for evaluating the match of interatomic distances in the protein to those from idealized SS.
![Page 23: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/23.jpg)
SECONDARY STRUCTURE ASSIGNMENTS
DSSP classes:
• H = alpha helix
• E = sheet
• G = 3-10 helix
• S = kind of turn
• T = beta turn
• B = beta bridge
• I = pi-helix (very rare)
• C = the rest
CASP (harder) assignment:
• α = H and G
• β = E and B
• γ = the rest
Alternative assignment:
• α = H
• β = B
• γ = the rest
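The CASP reduction from eight DSSP classes to three can be written as a small look-up. This is a sketch only: the output letters H, E, C for α, β, γ are an assumption chosen to match the three-state strings shown on later slides.

```python
# CASP assignment from the slide: alpha = H and G, beta = E and B, the rest = coil.
# Output letters (H, E, C) are an illustrative choice.
CASP_MAP = {"H": "H", "G": "H", "E": "E", "B": "E"}

def reduce_to_three_classes(dssp_string: str) -> str:
    """Map an 8-class DSSP string to a 3-class string; anything unlisted becomes C."""
    return "".join(CASP_MAP.get(state, "C") for state in dssp_string)
```

For example, `reduce_to_three_classes("HGEBSTIC")` maps the eight classes in the order listed above.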
![Page 24: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/24.jpg)
ENSEMBLES
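| Profiles | n | s | f | o | W | Q3 residue |
|---|---|---|---|---|---|---|
| No | 7 | 2 | 8 | 11 | 1611 | 68.7% |
| No | 9 | 2 | 8 | 11 | 1899 | 68.8% |
| No | 7 | 3 | 8 | 11 | 1919 | 68.6% |
| No | 8 | 3 | 9 | 11 | 2181 | 68.8% |
| No | 20 | 0 | 17 | 11 | 2821 | 67.7% |
| Output | 9 | 2 | 8 | 11 | 1899 | 72.6% |
| Output | 8 | 3 | 9 | 11 | 2181 | 72.7% |
| Input | 9 | 2 | 8 | 11 | 1899 | 73.37% |
| Input | 8 | 3 | 9 | 11 | | 73.4% |
| Input | 12 | 3 | 9 | 10 | 2757 | 73.6% |
| Input | 7 | 3 | 8 | 11 | 1919 | 73.4% |
| Input | 8 | 3 | 9 | 10 | 2045 | 73.4% |
| Input | 12 | 3 | 9 | 11 | 2949 | 73.2% |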
![Page 25: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/25.jpg)
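SSpro 1.0 and SSpro 2.0 on the 3 test sets, Q3

| Test set | Class | SSpro 1.0 | SSpro 2.0 |
|---|---|---|---|
| R126 | H | 0.8079 | 0.8238 |
| R126 | E | 0.6323 | 0.6619 |
| R126 | C | 0.8056 | 0.8126 |
| R126 | Q3 | 0.7662 | 0.7813 |
| EVA | H | 0.8076 | 0.8248 |
| EVA | E | 0.625 | 0.6556 |
| EVA | C | 0.7805 | 0.7903 |
| EVA | Q3 | 0.76 | 0.7767 |
| CASP4 | H | 0.8386 | 0.8608 |
| CASP4 | E | 0.6187 | 0.6851 |
| CASP4 | C | 0.8099 | 0.822 |
| CASP4 | Q3 | 0.778 | 0.8065 |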
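For reference, Q3 is the fraction of residues whose three-state class is predicted correctly. The sketch below also computes a per-class score as the fraction of residues of a given true class that are predicted correctly; whether the H, E, C rows of the table use exactly this definition is an assumption. The example strings in the usage note are synthetic.

```python
from typing import Optional

def q3(true_ss: str, pred_ss: str) -> float:
    """Fraction of positions where the predicted class equals the true class."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)

def per_class_accuracy(true_ss: str, pred_ss: str, state: str) -> Optional[float]:
    """Fraction of residues whose true class is `state` that are predicted as `state`."""
    hits = [t == p for t, p in zip(true_ss, pred_ss) if t == state]
    return sum(hits) / len(hits) if hits else None
```

With the synthetic pair `true_ss = "CCHHHHEEEC"` and `pred_ss = "CCHHHCEEEE"`, eight of ten positions agree, so `q3` returns 0.8.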
![Page 26: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/26.jpg)
![Page 27: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/27.jpg)
![Page 28: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/28.jpg)
FUNDAMENTAL LIMITATIONS
100% CORRECT RECOGNITION IS PROBABLY IMPOSSIBLE FOR SEVERAL REASONS
• SOME PROTEINS DO NOT FOLD SPONTANEOUSLY OR MAY NEED CHAPERONES
• QUATERNARY STRUCTURE [BETA-STRAND PARTNERS MAY BE ON A DIFFERENT CHAIN]
• STRUCTURE MAY DEPEND ON OTHER VARIABLES [ENVIRONMENT, PH]
• DYNAMICAL ASPECTS
• FUZZINESS OF DEFINITIONS AND ERRORS IN DATABASES
![Page 29: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/29.jpg)
![Page 30: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/30.jpg)
![Page 31: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/31.jpg)
BB-RNNs
![Page 32: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/32.jpg)
2D RNNs
![Page 33: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/33.jpg)
2D INPUTS
• AA at positions i and j
• Profiles at positions i and j
• Correlated profiles at positions i and j
• + Secondary Structure, Accessibility, etc.
![Page 34: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/34.jpg)
![Page 35: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/35.jpg)
PERFORMANCE (%)

| | 6Å | 8Å | 10Å | 12Å |
|---|---|---|---|---|
| non-contacts | 99.9 | 99.8 | 99.2 | 98.9 |
| contacts | 71.2 | 65.3 | 52.2 | 46.6 |
| all | 98.5 | 97.1 | 93.2 | 88.5 |
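To make the three rows concrete, here is a minimal sketch of how a contact map at a given distance cutoff and the per-class percentages could be computed. The choice of one coordinate per residue, the strict `<` comparison, and scoring over all pairs i < j are assumptions, not details given on the slide.

```python
import math

def contact_map(coords, cutoff):
    """Boolean matrix: True where two residues are closer than `cutoff` Angstroms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
            for i in range(n)]

def accuracies(true_map, pred_map):
    """Percent correct on contacts, non-contacts, and all pairs i < j."""
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    n = len(true_map)
    for i in range(n):
        for j in range(i + 1, n):
            t = true_map[i][j]
            totals[t] += 1
            hits[t] += pred_map[i][j] == t

    def pct(h, tot):
        return 100.0 * h / tot if tot else float("nan")

    return {
        "contacts": pct(hits[True], totals[True]),
        "non-contacts": pct(hits[False], totals[False]),
        "all": pct(hits[True] + hits[False], totals[True] + totals[False]),
    }
```

Because non-contacts vastly outnumber contacts in real maps, the "all" figure is dominated by the non-contact row, which is consistent with the pattern in the table.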
![Page 36: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/36.jpg)
Protein Reconstruction
Using predicted secondary structure and predicted contact map
PDB ID : 1HCR, chain A
Sequence: GRPRAINKHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPASSIKKRMN
True SS : CCCCCCCCHHHHHHHHHHHCCCCHHHHHHHCECCHHHHHHHCCCCCCCCCCC
Pred SS : CCCCCCCHHHHHHHHHHHHCCCCHHHHEEHECHHHHHHHHCCCHHHHHHHCC
PDB ID: 1HCR
Chain A (52 residues)
Model # 147
RMSD 3.47Å
![Page 37: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/37.jpg)
Protein Reconstruction
Using predicted secondary structure and predicted contact map
PDB ID : 1BC8, chain C
Sequence: MDSAITLWQFLLQLLQKPQNKHMICWTSNDGQFKLLQAEEVARLWGIRKNKPNMNYDKLSRALRYYYVKNIIKKVNGQKFVYKFVSYPEILNM
True SS : CCCCCCHHHHHHHHCCCHHHCCCCEECCCCCEEECCCHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHCCEEECCCCCCEEEECCCCHHHCC
Pred SS : CCCHHHHHHHHHHHHHCCCCCCEEEEECCCEEEEECCHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHCCCEEECCCCEEEEEEECCHHHHCC
PDB ID: 1BC8
Chain C (93 residues)
Model # 1714
RMSD 4.21Å
![Page 38: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/38.jpg)
CASP6 Self Assessment
Evaluation based on GDT_TS of first submitted model
GDT_TS: Global Distance Test Total Score
GDT_TS = (GDT_P1 + GDT_P2 + GDT_P4 + GDT_P8 ) / 4
Pn : percentage of residues under distance cutoff n
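The formula above can be sketched for a single, fixed superposition of model and native structure. This is a simplification: the full GDT procedure searches over many superpositions to maximize each percentage, which this sketch does not attempt. The input format (a list of per-residue distances in Å) is an assumption.

```python
def gdt_ts(distances):
    """GDT_TS from per-residue model-to-native distances under one superposition.

    Each GDT_Pn is the percentage of residues under cutoff n (1, 2, 4, 8 Angstroms).
    """
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    percentages = [
        100.0 * sum(d < cutoff for d in distances) / n
        for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return sum(percentages) / 4.0
```

For example, distances of 0.5, 1.5, 3.0, 6.0 and 10.0 Å give percentages of 20, 40, 60 and 80, so the score is 50.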
![Page 39: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/39.jpg)
Hard Target Summary
N: number of targets predicted
Av.R.: average rank
sumZ: sum of Z scores on all targets in set
sumZpos: sum of Z scores for predictions with positive Z score

| group | N | Av.R. | sumZ | sumZpos |
|---|---|---|---|---|
| BAKER-ROBETTA | 25 | 9.12 | 27.81 | 27.94 |
| baldi-group-server | 24 | 10.04 | 20.47 | 22.33 |
| Rokky | 25 | 11.60 | 17.56 | 18.46 |
| Pmodeller5 | 20 | 12.55 | 14.35 | 16.11 |
| ZHOUSPARKS2 | 25 | 15.00 | 12.67 | 15.91 |
| ACE | 25 | 13.08 | 11.63 | 14.89 |
| Pcomb2 | 24 | 16.04 | 10.82 | 13.58 |
| RAPTOR | 24 | 16.33 | 9.81 | 13.21 |
| zhousp3 | 25 | 15.72 | 9.30 | 12.90 |
| PROTINFO-AB | 19 | 16.74 | 8.62 | 12.59 |
•Top 10 groups displayed, of 65 registered servers
•Assessment on 25 new fold and fold recognition analogous target domains
![Page 40: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/40.jpg)
Hard Target Summary
•Top 10 groups displayed, of 65 registered servers
•Assessment on 19 new fold and fold recognition analogous target domains less than 120 residues
N: number of targets predicted
Av.R.: average rank
sumZ: sum of Z scores on all targets in set
sumZpos: sum of Z scores for predictions with positive Z score

| group | N | Av.R. | sumZ | sumZpos |
|---|---|---|---|---|
| baldi-group-server | 19 | 6.74 | 20.61 | 20.61 |
| BAKER-ROBETTA | 19 | 9.11 | 20.44 | 20.57 |
| Rokky | 19 | 12.11 | 12.30 | 13.20 |
| PROTINFO-AB | 16 | 12.63 | 11.53 | 12.59 |
| ZHOUSPARKS2 | 19 | 15.32 | 8.91 | 11.87 |
| Pcomb2 | 18 | 15.39 | 9.54 | 11.48 |
| Pmodeller5 | 15 | 14.47 | 9.25 | 11.00 |
| PROTINFO | 18 | 16.22 | 8.66 | 10.56 |
| ACE | 19 | 14.21 | 6.98 | 10.24 |
| RAPTOR | 18 | 17.00 | 7.22 | 9.64 |
![Page 41: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/41.jpg)
Target T0281
Detailed Target Analysis
• Target Information
– Length: 70 amino acids
– Resolution: 1.52 Å
– PDB code: 1WHZ
– Description: Hypothetical Protein from Thermus thermophilus HB8
– Domains: single domain
• Assessment
– GDT_TS server rank of our 1st model: 2
– GDT_TS: 51.07
– RMSD to native: 6.15
![Page 42: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/42.jpg)
Target T0281
Contact Map Comparison
True Map vs. Predicted Map
True Map vs. Recovered Map
*note: true map is lower left
![Page 43: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/43.jpg)
Target T0281
Structure Comparison
true structure
predicted structure
![Page 44: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/44.jpg)
Target T0281
Structure Comparison Superposition
True structure: thick trace
Predicted structure: thin trace
![Page 45: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/45.jpg)
Target T0280_2
Detailed Target Analysis
• Target Information
– Length: 51 amino acids
– Resolution: 2.00 Å
– PDB code: 1WD5
– Description: Putative phosphoribosyl transferase, T. thermophilus
– Domains: 2nd domain, residues 53-103 of 208 AA sequence
• Assessment
– GDT_TS server rank of our 1st model: 1 (also 1st among human groups)
– GDT_TS: 54.41
– RMSD to native: 5.81
![Page 46: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/46.jpg)
Target T0280_2
Contact Map Comparison
True Map vs. Predicted Map
True Map vs. Recovered Map
*note: true map is lower left
![Page 47: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/47.jpg)
Target T0281
Structure Comparison
true structure
predicted structure
![Page 48: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/48.jpg)
Target T0281
Structure Comparison Superposition
True structure: thick trace
Predicted structure: thin trace
![Page 49: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/49.jpg)
THE SCRATCH SUITE
www.igb.uci.edu
– DOMpro: domains
– DISpro: disordered regions
– SSpro: secondary structure
– SSpro8: secondary structure
– ACCpro: accessibility
– CONpro: contact number
– DI-pro: disulphide bridges
– BETA-pro: beta partners
– CMAP-pro: contact map
– CCMAP-pro: coarse contact map
– CON23D-pro: contact map to 3D
– 3D-pro: 3D structure (homology + fold recognition + ab-initio)
![Page 50: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/50.jpg)
![Page 51: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/51.jpg)
SISQQTVWNQMATVRTPLNFDSSKQSFCQFSVDLLGGGISVDKTGDWITLVQNSPISNLLCCCECCCCCCEEEECCCCCCCCCCCCEEEEEEECCCCEEEECCCCCCEEEEECCHHHHHHCCCEEEEECEEEEECCCCCCCTCCCCEEEEEEEETCSEEEECTTTTEEEEEECCHHHHHH-----------+--------------+++++++++-+---------++++----++++++---------+-+++-----------++++++++++-+-++++---+++++++++++++++-----------+--++---------+++++++++----+++-----++++++++++++++---------+-++++++--------+++++++++---++++-----++++++++++++++eeeeee---e--e-e-eee-ee-eee---------e-e--eeeeee--------------
RVAAWKKGCLMVKVVMSGNAAVKRSDWASLVQVFLTNSNSTEHFDACRWTKSEPHSWELIHHHHHHCCCEEEEEEEEEECCEEECCCCCEEEEEEEECCCCCCCCCEEEEEECCCCCCCCHHHHHHTTCEEEEEEEEEEEEEEECCCCCEEEEEEEECCCTTCCCEEEEEEECCTCCEEE+++++--+++++++++++-+----------++++++---------+-+++----------+++----++++++++++++----------+++++++++------++++++++++-+-+--++++---+++++++++++++--+------+++++++++------++++++++++---+-+++++-+-++++++++++++----------+++++++++------++++++++++---+-+-----ee---e-------e-e-ee-e-e-e-----e--eeee--e-------e-e-ee-e
..
Solvent accessibility threshold: 25%
PSI-BLAST hits: 24
..
Query served in 151 seconds
![Page 52: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/52.jpg)
Advantage of Machine Learning
• Pitfalls of traditional ab-initio approaches
• Machine learning systems take time to train (weeks).
• Once trained, however, they can predict structures almost faster than proteins can fold.
• Predict or search protein structures on a genomic or bioengineering scale.
![Page 53: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/53.jpg)
DAG-RNNs APPROACH
• Two steps:
– 1. Build relevant DAG to connect inputs, outputs, and hidden variables
– 2. Use a deterministic (neural network) parameterization together with appropriate stationarity assumptions/weight sharing; the overall model remains probabilistic
• Process structured data of variable size, topology, and dimensions efficiently
• Sequences, trees, d-lattices, graphs, etc.
• Convergence theorems
• Other applications
![Page 54: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/54.jpg)
![Page 55: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/55.jpg)
Convergence Theorems
• Posterior Marginals:
– σBN → dBN in distribution
– σBN → dBN in probability (uniformly)
• Belief Propagation:
– σBN → dBN in distribution
– σBN → dBN in probability (uniformly)
![Page 56: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/56.jpg)
Structural Databases
• PPDB = Poxvirus Proteomic Database
• ICBS = Inter Chain Beta Sheet Database
![Page 57: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/57.jpg)
![Page 58: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/58.jpg)
![Page 59: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/59.jpg)
![Page 60: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/60.jpg)
Strategies for drug design
[Figure: chemical structure diagrams of β-sheet mimics interacting with protein 1 and protein 2]
• Block, modulate, mediate β-sheet interactions
• [Covalent modification of a chain to prevent β-sheet formation]
![Page 61: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/61.jpg)
![Page 62: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/62.jpg)
![Page 63: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/63.jpg)
Three-Stage Prediction of Protein Beta-Sheets Using Neural Networks, Alignments, and Graph Algorithms
Jianlin Cheng and Pierre Baldi
School of Info. and Computer Sci.
University of California Irvine
![Page 64: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/64.jpg)
Beta-Sheet Architecture
![Page 65: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/65.jpg)
Importance of Predicting Beta-Sheet Structure
• AB-Initio Structure Prediction
• Fold Recognition
• Model Refinement
• Protein Design
• Protein Stability
![Page 66: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/66.jpg)
Previous Work
• Methods
– Statistical potential approach for strand alignment. (Hubbard, 1994; Zhu and Braun, 1999)
– Statistical potentials to improve beta-sheet secondary structure prediction. (Asogawa, 1997)
– Information theory approach for strand alignment. (Steward and Thornton, 2000)
– Neural networks for beta-residue contacts. (Baldi et al., 2000)
• Shortcomings
– Focus on one single aspect
– Do not utilize structural contexts and evolutionary information
– Do not exploit constraints enough
– Not publicly available
![Page 67: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/67.jpg)
Three-Stage Prediction of Beta-Sheets
• Stage 1: Predict beta-residue pairings using 2D-Recursive Neural Networks (2D-RNN).
• Stage 2: Align beta-strands using alignment algorithms.
• Stage 3: Predict beta-strand pairs and beta-sheet architecture using graph algorithms.
![Page 68: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/68.jpg)
Dataset and Statistics

| | Num |
|---|---|
| Chains | 916 |
| Beta residues | 48,996 |
| Residue pairs | 31,638 |
| Beta strands | 10,745 |
| Strand pairs | 8,172 |
| Beta sheets | 2,533 |
![Page 69: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/69.jpg)
Stage 1: Prediction of Beta-Residue Pairings Using 2D-RNN
• Input Matrix I (m×m): entry Iij is the input vector for residue pair (i,j)
• 2D-RNN: O = f(I)
• Target / Output Matrix (m×m): Tij: 0/1, Oij: Pairing Prob.
• Iij covers positions i-2, i-1, i, i+1, i+2 and j-2, j-1, j, j+1, j+2, each with 20 profiles, 3 SS, 2 SA, plus |i-j|
• Total: 251 inputs
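The count of 251 follows from 10 window positions × (20 + 3 + 2) features, plus one value for |i-j|. A minimal sketch of assembling such an input vector is given below; the function name, the per-residue feature layout, and the zero padding beyond the chain termini are assumptions made for illustration.

```python
def pair_input(features, i, j, half_window=2):
    """Build the input vector for residue pair (i, j).

    features[k] is a per-residue list: 20 profile values + 3 SS + 2 SA = 25 numbers.
    Windows i-2..i+2 and j-2..j+2 are concatenated, followed by |i - j|.
    """
    n_feat = len(features[0])
    vec = []
    for center in (i, j):
        for k in range(center - half_window, center + half_window + 1):
            if 0 <= k < len(features):
                vec.extend(features[k])
            else:
                vec.extend([0.0] * n_feat)  # zero padding off the ends (assumption)
    vec.append(float(abs(i - j)))
    return vec
```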
![Page 70: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/70.jpg)
An Example Target
Protein 1VJG
Beta-Residue Pairing Map (Target Matrix)
![Page 71: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/71.jpg)
An Example Output
![Page 72: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/72.jpg)
Stage 2: Beta-Strand Alignment
• Use output probability matrix as scoring matrix
• Dynamic programming
• Disallow gaps and use simplified searching algorithms
Anti-parallel: strand 1..m aligned against strand n..1
Parallel: strand 1..m aligned against strand 1..n
Total number of alignments = 2(m+n-1)
![Page 73: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/73.jpg)
Strand Alignment and Pairing Matrix
• The alignment score (Pseudo Binding Energy) is the sum of the probabilities of paired residues.
• The best alignment is the alignment with maximum score.
• Strand Pairing Matrix.
Strand Pairing Matrix of 1VJG
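A gapless search of this kind can be sketched as follows: for two strands of lengths m and n, slide one against the other in both directions, score each of the 2(m+n-1) alignments as the sum of predicted pairing probabilities, and keep the maximum. The function name and data layout (strands as lists of residue indices, `prob` as a nested list) are hypothetical.

```python
def best_gapless_alignment(prob, strand_a, strand_b):
    """Return ((score, direction, residue_pairs), number_of_alignments_tried).

    prob[i][j] is the predicted pairing probability of residues i and j.
    """
    m, n = len(strand_a), len(strand_b)
    best = None
    tried = 0
    for direction, b in (("parallel", strand_b), ("anti-parallel", strand_b[::-1])):
        # Shifts from -(n-1) to m-1 give m+n-1 gapless alignments per direction.
        for shift in range(-(n - 1), m):
            pairs = [(strand_a[k], b[k - shift])
                     for k in range(m) if 0 <= k - shift < n]
            score = sum(prob[i][j] for i, j in pairs)  # pseudo binding energy
            tried += 1
            if best is None or score > best[0]:
                best = (score, direction, pairs)
    return best, tried
```

The best score for each pair of strands would then fill one entry of the strand pairing matrix.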
![Page 74: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/74.jpg)
Stage 3: Prediction of Beta-Strand Pairings and Beta-Sheet Architecture
Strand Pairing Constraints
![Page 75: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/75.jpg)
Minimum Spanning Tree Like Algorithm
Strand Pairing Graph (SPG)
Goal: Find a set of connected subgraphs that maximize the sum of pseudo-energy and satisfy the constraints.
Algorithm: Minimum Spanning Tree Like Algorithm.
![Page 76: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/76.jpg)
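Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0 | | | | | | |
| 2 | 1.3 | 0 | | | | | |
| 3 | .94 | .37 | 0 | | | | |
| 4 | .02 | .02 | .04 | 0 | | | |
| 5 | .02 | .02 | .03 | 1.9 | 0 | | |
| 6 | .10 | .05 | .74 | .04 | .04 | 0 | |
| 7 | .02 | .02 | .03 | .02 | .02 | .20 | 0 |

Assembly of beta-strands
Step 1: Pair strand 4 and 5
Assembly so far: 4 5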
![Page 77: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/77.jpg)
Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG: unchanged from Step 1
Assembly of beta-strands
Step 2: Pair strand 1 and 2
Assembly so far: 4 5; 2 1
![Page 78: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/78.jpg)
Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG: unchanged from Step 1
Assembly of beta-strands
Step 3: Pair strand 1 and 3
Assembly so far: 4 5; 2 1 3
![Page 79: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/79.jpg)
Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG: unchanged from Step 1
Assembly of beta-strands
Step 4: Pair strand 3 and 6
Assembly so far: 4 5; 2 1 3 6
![Page 80: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/80.jpg)
Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG: unchanged from Step 1
Assembly of beta-strands
Step 5: Pair strand 6 and 7
Assembly so far: 4 5; 2 1 3 6 7
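The five steps of the 1VJG example can be reproduced by a Kruskal-style greedy pass over the strand pairing matrix. This is a sketch under stated assumptions, not the authors' implementation: the strand pairing constraints are given on the slide only as a figure, so the two rules used here (never join two strands already connected in the same subgraph; stop once every strand has at least one partner) are inferred from the worked example.

```python
def greedy_strand_pairing(scores, n_strands):
    """Pick strand pairs in decreasing pseudo-energy, Kruskal-style.

    scores maps (i, j) with i < j (1-based strand numbers) to pseudo-energy.
    Assumed rules: skip a pair if both strands are already in the same
    connected subgraph; stop when every strand has a partner.
    """
    parent = list(range(n_strands + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    paired = set()
    chosen = []
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if len(paired) == n_strands:
            break
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # would close a cycle within one subgraph
        parent[ri] = rj
        paired.update((i, j))
        chosen.append((i, j))
    return chosen

# Lower triangle of the 1VJG strand pairing matrix, rows for strands 1..7.
LOWER_1VJG = [
    [],
    [1.3],
    [.94, .37],
    [.02, .02, .04],
    [.02, .02, .03, 1.9],
    [.10, .05, .74, .04, .04],
    [.02, .02, .03, .02, .02, .20],
]
SCORES_1VJG = {(c + 1, r + 1): v
               for r, row in enumerate(LOWER_1VJG)
               for c, v in enumerate(row)}

pairs_1vjg = greedy_strand_pairing(SCORES_1VJG, 7)
```

Here `pairs_1vjg` lists the pairs in the order they are accepted; the pair (2, 3) with score .37 is skipped because strands 2 and 3 are already connected through strand 1.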
![Page 81: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/81.jpg)
Beta-Residue Pairing Results
• Sensitivity = Specificity = 41%
• Base-line: 2.3%. Ratio of improvement = 17.8.
• ROC area: 0.86
• At 5% FPR, TPR is 58%
• CMAPpro: Spec. and Sens. is 27%. ROC area: 0.8. TPR = 42% at 5% FPR.
![Page 82: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/82.jpg)
Strand Pairing Results
• Naïve algorithm of pairing all adjacent strands
– Specificity = 42%
– Sensitivity = 50%
• MST like algorithm
– Specificity = 53%
– Sensitivity = 59%
– >20% of correctly predicted strand pairs are non-adjacent strand pairs
![Page 83: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/83.jpg)
Strand Alignment Results

On the correctly predicted pairs:

| | Pairing Direction | Align. All | Align. Anti-P | Align. Para. | Align. Bridge |
|---|---|---|---|---|---|
| Acc. | 93% | 72% | 69% | 71% | 88% |

On all native pairs:

| | Pairing Direction | Align. All | Align. Anti-P | Align. Para. | Align. Bridge |
|---|---|---|---|---|---|
| Acc. | 84% | 66% | 63% | 66% | 73% |

• Pairing direction accuracy is 15% higher than that of a random algorithm.
• Alignment accuracy is improved by >15%.
![Page 84: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/84.jpg)
Application and Future Work
• New methods for beta-residue pairings (e.g. Linear Programming, SVM), and strand alignment and pairings. More inputs (Punta and Rost, 2005).
• Applications
– AB-Initio Structure Sampling (beta-sheet)
– Fold Recognition (conservation of beta-sheets)
– Contact Map
– Model Refinement (pairing direction/alignment)
• Web server and dataset
http://www.ics.uci.edu/~baldig/betasheet.html
![Page 85: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/85.jpg)
A New Fold Example (CASP6)
1S12 (T0201, 94 residues)

Strand Pairing Matrix

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 1.71 | .05 | .29 | .33 |
| 2 | | 0 | .06 | .41 | .12 |
| 3 | | | 0 | .22 | .04 |
| 4 | | | | 0 | .53 |
| 5 | | | | | 0 |

CEEEEEECCEEEECCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHEHHCCCCEEEEHHHHHHHHHHHHHHHHHHHHHHHHHCCCCEEEEEEECCC
Predicted: 1-2, 2-4, 3-4, 4-5
CEEEEECCCEEEEECCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEECCEEEEEECCCCHHHHHHHHHHHHHHHHHHHHCCCCEEEEECCCCCC
True: 1-2, 2-4, 3-4, 1-5
True SS:
Predicted SS:
[Figure: structure rendered in RasMol, with strands labeled 1 to 5]
![Page 86: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/86.jpg)
ACKNOWLEDGMENTS
• UCI:
– Gianluca Pollastri, Pierre-Francois Baisnee, Michal Rosen-Zvi
– Arlo Randall, S. Joshua Swamidass, Jianlin Cheng, Yimeng Dou, Yann Pecout, Mike Sweredoski, Alessandro Vullo, Lin Wu
– James Nowick, Luis Villareal
• DTU: Soren Brunak
• Columbia: Burkhard Rost
• U of Florence: Paolo Frasconi
• U of Bologna: Rita Casadio, Piero Fariselli
www.igb.uci.edu/
www.ics.uci.edu/~pfbaldi
![Page 87: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/87.jpg)
1DFN | Defensin
![Page 88: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/88.jpg)
A Perfectly Predicted Example
Sequence with cysteine positions identified:
MSNHTHHLKFKTLKRAWKASKYFIVGLSC[29]LYKFNLKSLVQTALSTLAMITLTSLVITAIIYISVGNAKAKPTSKPTIQQTQQPQNHTSPFFTEHNYKSTHTSIQSTTLSQLLNIDTTRGITYGHSTNETQNRKIKGQSTLPATRKPPINPSGSIPPENHQDHNNFQTLPYVPC[173]STC[176]EGNLAC[182]LSLC[186]HIETERAPSRAPTITLKKTPKPKTTKKPTKTTIHHRTSPETKLQPKNNTATPQQGILSSTEHHTNQSTTQI
Length: 257, Total number of cysteines: 5
Four bonded cysteines form two disulfide bonds:
173 ------- 186 (red cysteine pair)
176 ------- 182 (blue cysteine pair)
Prediction Results from DIpro (http://contact.ics.uci.edu/bridge.html)
Predicted Bonded Cysteines: 173, 176, 182, 186
Predicted disulfide bonds:

| Bond_Index | Cys1_Position | Cys2_Position |
|---|---|---|
| 1 | 173 | 186 |
| 2 | 176 | 182 |

Prediction accuracy for both bond state and bond pair is 100%.
![Page 89: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/89.jpg)
A Hard Example with Many Non-Bonded Cysteines
Sequence with cysteine positions identified:
MTLGRRLAC[9]LFLAC[14]VLPALLLGGTALASEIVGGRRARPHAWPFMVSLQLRGGHFC[55]GATLIAPNFVMSAAHC[71]VANVNVRAVRVVLGAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVILQLNGSATINANVQVAQLPAQGRRLGNGVQC[151]LAMGWGLLGRNRGIASVLQELNVTVVTSLC[181]RRSNVC[187]TLVRGRQAGVC[198]FGDSGSPLVC[208]NGLIHGIASFVRGGC[223]ASGLYPDAFAPVAQFVNWIDSIIQRSEDNPC[254]PHPRDPDPASRTH
Length: 267, Total Cysteine Number: 11
Eight bonded cysteines form four disulfide bonds:
55 ----- 71 (Red), 151 ----- 208 (Blue), 181 ----- 187 (Green), 198 ----- 223 (Purple)
Prediction Results from DIpro (http://contact.ics.uci.edu/bridge.html)
Predicted Bonded Cysteines: 9, 14, 55, 71, 181, 187, 223, 254
Predicted Disulfide Bonds:

| Bond_Index | Cys1_Position | Cys2_Position | |
|---|---|---|---|
| 1 | 55 | 71 | (correct) |
| 2 | 9 | 14 | (wrong) |
| 3 | 223 | 254 | (wrong) |
| 4 | 181 | 187 | (correct) |

Bond State Recall = 5 / 8 = 0.625; Bond State Precision = 5 / 8 = 0.625
Pair Recall = 2 / 4 = 0.5; Pair Precision = 2 / 4 = 0.5
Bond number is predicted correctly.
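The four figures quoted for this example can be recomputed from the true and predicted bond lists on the slide. The sketch below treats precision and recall as set overlaps; the helper names are hypothetical.

```python
def precision_recall(predicted, true):
    """Precision and recall of a predicted set against a true set."""
    predicted, true = set(predicted), set(true)
    hits = len(predicted & true)
    return hits / len(predicted), hits / len(true)

def bonded_cysteines(bonds):
    """All cysteine positions that take part in some bond."""
    return {position for pair in bonds for position in pair}

# True and predicted disulfide bonds from the slide, as (Cys1, Cys2) positions.
true_bonds = [(55, 71), (151, 208), (181, 187), (198, 223)]
pred_bonds = [(55, 71), (9, 14), (223, 254), (181, 187)]

state_precision, state_recall = precision_recall(
    bonded_cysteines(pred_bonds), bonded_cysteines(true_bonds))
pair_precision, pair_recall = precision_recall(pred_bonds, true_bonds)
```

Cysteine 223 counts as a correct bond-state prediction even though its predicted partner (254) is wrong, which is why bond-state scores exceed pair scores here.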
![Page 90: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/90.jpg)
Prediction Accuracy on SP51 Dataset on All Cysteines

| Bond Num | Bond State Recall (%) | Bond State Precision (%) | Pair Recall (%) | Pair Precision (%) |
|---|---|---|---|---|
| 1 | 91 | 46 | 74 | 39 |
| 2 | 93 | 77 | 61 | 51 |
| 3 | 90 | 74 | 54 | 45 |
| 4 | 77 | 87 | 52 | 59 |
| 5 | 71 | 86 | 33 | 42 |
| 6 | 65 | 84 | 27 | 34 |
| 7 | 63 | 85 | 36 | 55 |
| 8 | 66 | 89 | 27 | 41 |
| 9 | 60 | 83 | 23 | 35 |
| 10 | 55 | 86 | 30 | 45 |
| 11 | 62 | 86 | 34 | 47 |
| 12 | 67 | 97 | 17 | 23 |
| 15 | 50 | 94 | 27 | 50 |
| 16 | 82 | 99 | 11 | 13 |
| 17 | 61 | 96 | 22 | 33 |
| 18 | 50 | 82 | 6 | 9 |
| 19 | 47 | 90 | 11 | 20 |

Overall bond state recall: 78%; overall bond state precision: 74%; bond number prediction accuracy: 53%; average difference between true bond number and predicted bond number: 1.1.
![Page 91: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/91.jpg)
CURRENT WORK
• Feedback:
Ex: SS → Contacts → SS → Contacts
• Homology, homology, homology:
SSpro 4.0 performs at 88%
![Page 92: UNDERSTANDING INTELLIGENCE](https://reader033.fdocuments.in/reader033/viewer/2022051419/568159fd550346895dc748ab/html5/thumbnails/92.jpg)