Protein Structure Nimrod Rubinstein Bioinformatics Seminar.

21
Protein Protein Structure Structure Nimrod Rubinstein Nimrod Rubinstein Bioinformatics Seminar Bioinformatics Seminar

Transcript of Protein Structure Nimrod Rubinstein Bioinformatics Seminar.

Protein StructureProtein Structure

Nimrod RubinsteinNimrod RubinsteinBioinformatics SeminarBioinformatics Seminar

Protein SynthesisProtein Synthesis

1.1. AttachmentAttachment of correct of correct amino acids (AAs) to amino acids (AAs) to their corresponding their corresponding tRNAs.tRNAs.

2.2. Initiation:Initiation: forming the forming the initiation complex.initiation complex.

3.3. Elongation:Elongation: sequentially forming sequentially forming peptide bonds.peptide bonds.

4.4. Termination:Termination: synthesis synthesis is terminated and the is terminated and the polypeptide is released.polypeptide is released.

From Sequence to StructureFrom Sequence to Structure

Structure Hierarchies:Structure Hierarchies:

Primary structure:Primary structure: the sequence of AAs the sequence of AAs covalently bound along the backbone of the covalently bound along the backbone of the polypeptide chain.polypeptide chain.

Cα Cα

Cα C

CN N

O

O

O ф ф ψ ψ

C N

ф ψ

Ala

Gly

Cys

-1800 ≤ ф ≤ 1800 -1800 ≤ ψ ≤ 1800

From Sequence to StructureFrom Sequence to Structure

Structure Hierarchies:Structure Hierarchies:Secondary structure:Secondary structure: local conformation of local conformation of some part of the polypeptide.some part of the polypeptide.

ββ Sheet Sheetαα HelixHelix

ParallelAnti Parallel

From Sequence to StructureFrom Sequence to Structure

Structure Hierarchies:Structure Hierarchies:Tertiary structure: Tertiary structure: the overall the overall 3-dimensional arrangement of 3-dimensional arrangement of all the atoms in the protein.all the atoms in the protein.

From Sequence to StructureFrom Sequence to Structure

Structure Hierarchies:Structure Hierarchies:

Quaternary structure: Quaternary structure: some proteins contain some proteins contain two or more separate polypeptide chains, which two or more separate polypeptide chains, which may be identical or different.may be identical or different.

Globular Fibrous

From Sequence to StructureFrom Sequence to Structure

Additional Parameters:Additional Parameters:Surface accessibility:Surface accessibility:

The surface area of the molecule that is exposed to the solvent, derived from the complete structure.

•VDW surface: the surface area of an atom.

•Connolly surface: the interface between the molecule and the solvent sphere (conventionally with r = 1.4Å) .

•Solvent accessible surface: the path of the center of the solvent sphere rolled over the VDW surface.

•Relative accessibility = (SAS)/(maxSAS)

•maxSAS = SAS(Gly-X-Gly)

From Sequence to StructureFrom Sequence to Structure

Additional Parameters:Additional Parameters:Coordination number:Coordination number:

•The number of structure stabilizing The number of structure stabilizing contacts each residue in the contacts each residue in the structure makes. structure makes.

•Computation: encapsulating an AA Computation: encapsulating an AA with a sphere, centered at the with a sphere, centered at the residue’s center of mass, and residue’s center of mass, and counting the number of residues counting the number of residues falling inside this sphere. falling inside this sphere.

•Usually done with different cutoff Usually done with different cutoff radii. radii.

From Sequence to StructureFrom Sequence to StructureProtein Folding:Protein Folding:

Luckily, nature works out with these sorts of numbers and the correct conformation of a protein is reached within seconds.

If each conformation were sampled in the shortest possible If each conformation were sampled in the shortest possible time (time of a molecular vibration ~ time (time of a molecular vibration ~ 1010-13-13 ss) it would take ) it would take an astronomical amount of time an astronomical amount of time ((~10~1077 77 yearsyears) to sample all possible conformations, in ) to sample all possible conformations, in order to find theorder to find the Native State Native State. .

The Levinthal paradox: The Levinthal paradox: [Levinthal C.; J. Chym. Phys. (1968)]

Assume a protein is comprised of 100 AAs. Assume a protein is comprised of 100 AAs. Assume each AA’s backbone can take up 10 Assume each AA’s backbone can take up 10 different conformations, defined by ф and ψ different conformations, defined by ф and ψ values. Altogether we get: values. Altogether we get: 1010100 100 conformationsconformations..

NPC even in the 2D caseNPC even in the 2D case

From Sequence to StructureFrom Sequence to StructureFolding Models:Folding Models:

The Backbone-Centric view:The Backbone-Centric view:

•Sequence order dependent interactions (фψфψ - propensitiespropensities and

H-bonds), produce local secondary structure elements (SSEs).

•Local SSEs later overgo longer-range interactions to form supersecondary structures.

•Supersecondary structures of ever-increasing complexity thus grow, ultimately into the native conformation.

From Sequence to StructureFrom Sequence to StructureFolding Models:Folding Models:

The Sidechain-Centric view:The Sidechain-Centric view:

•Hydrophobic sidechain Hydrophobic sidechain interactions are the strongest for interactions are the strongest for AAs in a water solution.AAs in a water solution.

•A few key hydrophobic residues A few key hydrophobic residues are responsible for a “hydrophobic are responsible for a “hydrophobic collapse” to the “molten globule” collapse” to the “molten globule” state.state.

•The “molten globule” might not The “molten globule” might not include SSEs, yet about this include SSEs, yet about this structure the remainder of the structure the remainder of the polypeptide chain condenses. polypeptide chain condenses.

•The conformation space is viewed The conformation space is viewed as “funnel shaped”. as “funnel shaped”.

Molten globule

states

From Sequence to StructureFrom Sequence to StructureFolding Models:Folding Models:

The Sidechain-Centric view - Larger The Sidechain-Centric view - Larger proteins:proteins:

•Intermediate states exist, which are highly populated.

•These states may assist in finding the Native Structure or may serve as traps that inhibit the folding process.

•Structurally aligning intermediate states against the SCOP found the corresponding Native Structures to have the highest scores.

•But, many features were missing:

• Well defined SSEs.

• A well formed hydrophobic core.

• High RMSDs (7-10Å). [Dobson C. M.; TRENDS in Biochemical Sciences; Jan 2005]

From Sequence to StructureFrom Sequence to StructureFolding Models:Folding Models:

Post-translationalPost-translational Vs. Vs. Co-Co-translationaltranslational

•Denaturation-Renaturation experiments are biased.

•An AA is added to the polypeptide chain in: 10-2 s.

•The rate at which an SSE is formed is: 10-7 – 10-4 s.

Anfinsen’s experiments:

•Exposure of a purified RNase-A enzyme to a concentrated urea solution in the presence of a reducing agent denaturizes the folded conformation resulting in a complete loss of catalytic activity.

•Removal of the urea and reducing agent causes the enzyme to accurately refold to its native structure and restore its catalytic activity.

[Anfinsen C. et al.; PNAS (1961)]

Determining the StructureDetermining the StructureCrystallization:Crystallization:• Assembling a solution of protein Assembling a solution of protein

molecules into a periodic lattice.molecules into a periodic lattice.

F F

X-Ray Diffraction:X-Ray Diffraction:• The crystal is bombarded with X-ray beams.The crystal is bombarded with X-ray beams.• The collision of the beams with the electrons creates a diffraction The collision of the beams with the electrons creates a diffraction

pattern.pattern.• The diffraction pattern is transformed into an electron density map of The diffraction pattern is transformed into an electron density map of

the protein from which the 3D locations of the atoms can be deduced.the protein from which the 3D locations of the atoms can be deduced.

Determining the StructureDetermining the StructureNucleotide Magnetic Nucleotide Magnetic

Resonance:Resonance:• A solution of the protein is placed A solution of the protein is placed

in a magnetic field.in a magnetic field.• spinsspins align parallel or anti-parallel align parallel or anti-parallel

to the field.to the field.• RF pulses of electromagnetic RF pulses of electromagnetic

energy shifts energy shifts spinsspins from their from their alignment.alignment.

• Upon radiation termination Upon radiation termination spinsspins re-align while emitting the energy re-align while emitting the energy they absorbed.they absorbed.

• The emission spectrum contains The emission spectrum contains information about the identity of information about the identity of the nuclei and their immediate the nuclei and their immediate environment. environment.

• The result is an ensemble of The result is an ensemble of models rather than a single models rather than a single structure. structure.

Structure SimilarityStructure SimilarityProtein Families:Protein Families:• Structures seem to be preserved much more than Structures seem to be preserved much more than

sequences, which is easily explainable due to neutral sequences, which is easily explainable due to neutral mutations.mutations.

1BRU: Pancreatic Elastase (Sus scorfa).

1CHG: Chymotrysinogen (Bos taurus).

Global Alignment: 39% identity

1BRU1CHG

Rigid Cα Alignment: RMSD 1.26Å

1CHG 1BRU

Structure SimilarityStructure Similarity

http://scop.mrc-lmb.cam.ac.uk/scop/

Protein Families:Protein Families:• Structures seems to be preserved much more than Structures seems to be preserved much more than

sequences, which is easily explainable due to neutral sequences, which is easily explainable due to neutral mutations.mutations.

• Structural Biologists claim that there are a limited Structural Biologists claim that there are a limited number of ways in which protein domains fold. There number of ways in which protein domains fold. There may be as few as ~2000 different folds (differing by their may be as few as ~2000 different folds (differing by their backbone topology). backbone topology).

• Nearly a 1000 different folds have already been resolved.Nearly a 1000 different folds have already been resolved.

Structure PredictionStructure PredictionHomology (Comparative) Modeling: Homology (Comparative) Modeling:

Guideline:Guideline: At least 30% sequence identity is needed between At least 30% sequence identity is needed between probe and template.probe and template.

1.1. Template Assignment:Template Assignment: creating a robust probe- creating a robust probe-template alignment (PWA/MSA).template alignment (PWA/MSA).

2.2. Model Construction:Model Construction:a.a. Generation of coordinates for conserved Generation of coordinates for conserved

segments: segments: superimposing/averaging/restrain superimposing/averaging/restrain based.based.

b.b. Generation of coordinates for variable segments: Generation of coordinates for variable segments: DB DB scanning/scanning/Ab InitioAb Initio/restrain based./restrain based.

c.c. Generation of coordinates for sidechain atoms: Generation of coordinates for sidechain atoms: superimposing/rotamer superimposing/rotamer libraries/restrain based.libraries/restrain based.

3.3. Model Evaluation:Model Evaluation: a.a. Assessment of to the ability to functionally identify Assessment of to the ability to functionally identify

the the active site of the model.active site of the model.

b.b. Assessment of physico-chemical or structural Assessment of physico-chemical or structural environment based on statistical analyses of DBs environment based on statistical analyses of DBs for characteristics such as:for characteristics such as: Intramolecular packing. Intramolecular packing. Bond geometry. Bond geometry. Solvent accessibility.Solvent accessibility.

bFGF

[Peitsch et al. (1999)]

Structure PredictionStructure PredictionThreading (Sequence-Structure Alignment):Threading (Sequence-Structure Alignment):

Identifying evolutionary unrelated proteins that have converged Identifying evolutionary unrelated proteins that have converged to similar folds.to similar folds.

• Scoring Scheme: Scoring Scheme: describes the propensity of each AA for its describes the propensity of each AA for its structural/physico-chemical environment:structural/physico-chemical environment: SS type, solvent accessibility, SS type, solvent accessibility, coordination number, etc…coordination number, etc…

• Profile construction:Profile construction: encoding the template’s AAs structural features to a encoding the template’s AAs structural features to a 1D profile and predicting such a profile for the probe.1D profile and predicting such a profile for the probe.

• Threading Algorithm:Threading Algorithm: Aligning the 1D profiles of the template and the Aligning the 1D profiles of the template and the probe using DP and the defined scoring scheme.probe using DP and the defined scoring scheme.

But: No adjustments to the template profile can be made thus substantial No adjustments to the template profile can be made thus substantial rearrangements are ignoredrearrangements are ignored

template

probe

[Bryant, Lawrence; Proteins (1993)]

Structure PredictionStructure Prediction

Ab InitioAb Initio Techniques: Techniques: Simulating the folding processSimulating the folding process

Simplifying the energy landscape:Simplifying the energy landscape:• Reducing the number of degrees of freedom:Reducing the number of degrees of freedom:

• Representing a group of atoms by a single atom.Representing a group of atoms by a single atom.• Reducing the number of atom interactions.Reducing the number of atom interactions.

• Sampling the conformation space:Sampling the conformation space:• Monte Carlo sampling.Monte Carlo sampling.• Genetic Algorithm.Genetic Algorithm.• Simulated Annealing.Simulated Annealing.

• Hierarchical folding simulation.Hierarchical folding simulation.

Blind PredictionBlind Prediction

CCritical ritical AAssessment of Protein ssessment of Protein SStructure tructure PPrediction – rediction – CASPCASP

Goal:Goal: “ to obtain an in-depth and objective assessment of our current abilities “ to obtain an in-depth and objective assessment of our current abilities and inabilities in the area of protein structure prediction”.and inabilities in the area of protein structure prediction”.

Groups use their tools to model proteins with pre-published structures.Groups use their tools to model proteins with pre-published structures. The predictions are thus evaluated against the subsequently determined The predictions are thus evaluated against the subsequently determined

structures.structures. CASP6 (2004) shows limited improvements compared to CASP5 (2003).CASP6 (2004) shows limited improvements compared to CASP5 (2003).