Structure Modeling and Bioimage informatics Unit 26 BIOL221T: Advanced Bioinformatics for...

Structure Modeling and Bioimage Informatics. Unit 26. BIOL221T: Advanced Bioinformatics for Biotechnology. Irene Gabashvili, PhD


Abstracts – approximate guidelines

Motivation: Why do we care? (importance, difficulty, impact)

Problem statement: What problem are you trying to solve? What is the scope of your work?

Approach: How did you go about solving or making progress on the problem? What was the extent of your work?

Results: What's the answer?

Abstracts

Limits: paragraph, ~150-200 words, one double-spaced page. More to include:

Numbers, if possible: how many genes, SNPs, sequence identity; xx percent faster, cheaper, smaller, better.

Conclusions: What are the implications? Have you found a path to change the world, was it a nice hack, or a road sign indicating that this path is a waste of time (all is useful!)? Can you generalize?

How will projects be graded?

Originality, structure, and scope
No copy/paste from the web, but it's OK to reference the source: publications and websites

Proteins play key roles in a living system

Three examples of protein functions

Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.

Transport: Some proteins transport various substances, such as oxygen and ions. Example: haemoglobin carries oxygen.

Information transfer: For example, hormones. Example: insulin controls the amount of sugar in the blood.

Amino Acid versus Residue

[Figure: structural formulas comparing a free amino acid (H2N-CHR-COOH) with an amino acid residue (-NH-CHR-CO-) as it appears inside a peptide chain.]

Amino acid: Basic unit of protein

[Figure: an amino acid, NH3+-CHR-COO-, with the amino group and the carboxylic acid group labelled.]

Different side chains, R, determine the properties of the 20 amino acids.

The DSSP code

"Dictionary of Protein Secondary Structure"

G = 3-turn helix (3_10 helix). Min length 3 residues.
H = 4-turn helix (alpha helix). Min length 4 residues.
I = 5-turn helix (pi helix). Min length 5 residues.
T = hydrogen bonded turn (3, 4 or 5 turn)
E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.
B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
S = bend (the only non-hydrogen-bond based assignment)
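The one-letter DSSP codes listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming the assignment is available as a per-residue string such as "HHHHTTEE"; the names DSSP_CODES, summarize_dssp and segments are illustrative and not part of the DSSP program itself.

```python
from itertools import groupby

# DSSP one-letter codes, as listed on the slide.
DSSP_CODES = {
    "G": "3_10 helix",
    "H": "alpha helix",
    "I": "pi helix",
    "T": "hydrogen-bonded turn",
    "E": "extended strand (beta sheet)",
    "B": "isolated beta-bridge",
    "S": "bend",
}


def summarize_dssp(assignment):
    """Count residues per structure class in a per-residue DSSP string."""
    counts = {}
    for code in assignment:
        # Anything not in the table (blank, '-', etc.) is lumped together.
        label = DSSP_CODES.get(code, "other/unassigned")
        counts[label] = counts.get(label, 0) + 1
    return counts


def segments(assignment):
    """Return consecutive runs as (code, start_index, length) tuples."""
    runs = []
    position = 0
    for code, group in groupby(assignment):
        length = len(list(group))
        runs.append((code, position, length))
        position += length
    return runs


if __name__ == "__main__":
    example = "HHHHTTEE"  # made-up assignment, for illustration only
    print(summarize_dssp(example))
    print(segments(example))
```

Listing the runs makes it easy to compare segment lengths against the minimum lengths quoted for each helix type.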

Protein structure

Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)

20 amino acids

Glycine (G)
Glutamic acid (E)
Aspartic acid (D)
Methionine (M)
Threonine (T)
Serine (S)
Glutamine (Q)
Asparagine (N)
Tryptophan (W)
Phenylalanine (F)
Cysteine (C)
Proline (P)
Leucine (L)
Isoleucine (I)
Valine (V)
Alanine (A)
Histidine (H)
Lysine (K)
Tyrosine (Y)
Arginine (R)

Colour key in the original figure: Yellow: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic

Proteins are linear polymers of amino acids

[Figure: two amino acids, NH3+-CHR1-COO- and NH3+-CHR2-COO-, condense with the release of H2O to form a peptide bond; repeated condensation extends the chain NH3+-CHR1-CO-NH-CHR2-CO-NH-CHR3-CO-...]

A carboxylic acid group condenses with an amino group, releasing one water molecule and forming a peptide bond.

The amino acid sequence is called the primary structure.

[Figure: an example chain drawn with one-letter residue codes (A, F, N, G, S, T, D, K).]

Amino acid sequence is encoded by DNA base sequence in a gene

[Figure: DNA molecule]

DNA base sequence (two complementary strands):
・CGCGAATTCGCG・
・GCGCTTAAGCGC・
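The two strands shown on this slide are base-paired complements of each other (A with T, C with G). As a small illustrative sketch, the pairing can be checked in Python; the function names are hypothetical helpers, and only the four unambiguous bases are handled.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the base-paired partner of each position, in the same order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "CGCGAATTCGCG"  # upper strand from the slide
    print(complement(top))          # lower strand as drawn on the slide
    print(reverse_complement(top))  # lower strand read in the opposite direction
```

For this particular sequence the reverse complement equals the original strand, so the duplex reads the same on both strands.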

Amino acid sequence is encoded by DNA base sequence in a gene

Codon table (rows: first letter; columns: second letter; last column: third letter)

First     Second letter                                     Third
letter    T           C           A           G             letter

T         TTT Phe     TCT Ser     TAT Tyr     TGT Cys       T
          TTC Phe     TCC Ser     TAC Tyr     TGC Cys       C
          TTA Leu     TCA Ser     TAA Stop    TGA Stop      A
          TTG Leu     TCG Ser     TAG Stop    TGG Trp       G

C         CTT Leu     CCT Pro     CAT His     CGT Arg       T
          CTC Leu     CCC Pro     CAC His     CGC Arg       C
          CTA Leu     CCA Pro     CAA Gln     CGA Arg       A
          CTG Leu     CCG Pro     CAG Gln     CGG Arg       G

A         ATT Ile     ACT Thr     AAT Asn     AGT Ser       T
          ATC Ile     ACC Thr     AAC Asn     AGC Ser       C
          ATA Ile     ACA Thr     AAA Lys     AGA Arg       A
          ATG Met     ACG Thr     AAG Lys     AGG Arg       G

G         GTT Val     GCT Ala     GAT Asp     GGT Gly       T
          GTC Val     GCC Ala     GAC Asp     GGC Gly       C
          GTA Val     GCA Ala     GAA Glu     GGA Gly       A
          GTG Val     GCG Ala     GAG Glu     GGG Gly       G
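The codon table above can be turned into a lookup for translating a coding sequence. This is a minimal sketch, assuming the standard genetic code in the same T, C, A, G order as the slide and a sequence that is already in the correct reading frame; translate is an illustrative name, not a library function.

```python
from itertools import product

BASES = "TCAG"  # same row/column order as the slide's codon table
# One-letter amino acids for TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third letter fastest, matching the string above.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon from position 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # Met, Ala, then a stop codon
```

Trailing bases that do not fill a complete codon are ignored, and characters other than A, C, G and T raise a KeyError in this sketch.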

Gene is protein's blueprint, genome is life's blueprint

[Figure: the genome (DNA) contains many genes; each gene encodes a protein.]

Gene is protein's blueprint, genome is life's blueprint

[Figure: the genome with its genes and the proteins they encode; the proteins are linked into a network, with the glycolysis network as the example.]

Each protein has a unique structure

Amino acid sequence:
NLKTEWPELVGKSVEEAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAEVPRVG

Folding!

Basic structural units of proteins: Secondary structure

[Figure: α-helix and β-sheet]

Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.

Three-dimensional structure of proteins

[Figure: examples of tertiary structure and quaternary structure]

Close relationship between protein structure and its function

[Figure: example of an enzyme reaction. The enzyme's shape matches substrate A but not B; the enzyme binds A and then digests it. Other examples shown: hormone receptor, antibody.]

More Links

BLOCKS: http://blocks.fhcrc.org/
www.sbc.su.se/~miklos/DAS
www.pdg.cnb.uam.es/EUCLID/Full_Paper/homepage.html
Eva: Cubic.bioc.columbia.edu/eva
Jpred: www.compbio.dundee.ac.uk/~www-jpred/
LOC3D: cubic.bioc.columbia.edu/db/LOC3D
Pfam: http://www.sanger.ac.uk/Software/Pfam/

More Links

PredictProtein: www.predictprotein.org
ProfTMB: http://www.predictprotein.org/cgi-bin/var/bigelow/proftmb/query
PROSITE: http://expasy.org/prosite/
ProtFun: http://www.cbs.dtu.dk/services/ProtFun/
PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
PSORT: http://psort.nibb.ac.jp/
SAM-T99: discontinued
SOSUI: http://bp.nuap.nagoya-u.ac.jp/sosui/sosui_submit.html
TargetP: http://www.cbs.dtu.dk/services/TargetP/

Databases

PDB: www.rcsb.org/
MSD: http://www.ebi.ac.uk/msd/
MMDB: http://www.ncbi.nlm.nih.gov/Structure/MMDB
PDBSum: www.ebi.ac.uk/pdbsum/
TargetDB: targetdb.pdb.org/

PDBsum

Provides an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/GetPage.pl

More links

AbCheck - Antibody Sequence Test: http://www.bioinf.org.uk/abs/seqtest.html
Atlas of protein Side chain interactions: http://www.biochem.ucl.ac.uk/bsm/sidechains/index.html#
The beta-turn prediction server: http://www.biochem.ucl.ac.uk/bsm/btpred/index.html

More links

CATH – protein structure classification: http://www.cathdb.info/latest/index.html
Protein Ligand Interactions: http://www.biochem.ucl.ac.uk/bsm/proLig/

More links

DB Browser, including protein sequence/structure DBs: http://www.bioinf.man.ac.uk/dbbrowser/
Dictionary of Homologous superfamilies: http://www.biochem.ucl.ac.uk/bsm/dhs/
PROCAT – a DB of 3D enzyme active site templates: http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html

More links

DOMPLOT – annotation by ligands: http://www.biochem.ucl.ac.uk/bsm/domplot/
Enzymes Structure database: http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html
Gene3D: http://gene3d.biochem.ucl.ac.uk/Gene3D/

More links

The Scorecons Server (scores residue conservation in a multiple sequence alignment): http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl

3D enzyme active site templates

PROCAT: http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
PROCAT has now been superseded by the Catalytic Site Atlas: http://www.ebi.ac.uk/thornton-srv/databases/CSA/

More Links

Protein Nucleic Acid interaction Server: http://www.biochem.ucl.ac.uk/bsm/DNA/server/
Protein DNA interaction, tax: http://www.biochem.ucl.ac.uk/bsm/prot_dna/prot_dna.html
SAS (Sequences Annotated by Structure): http://www.ebi.ac.uk/thornton-srv/databases/sas/

More Links

NACCESS – calculates residue accessibilities: http://www.bioinf.manchester.ac.uk/naccess/
The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file: http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html

Prediction

Homology modeling: >30% sequence identity to a protein of known structure
Threading: picks up where homology leaves off
Ab initio structure prediction
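The 30% figure is usually read as percent sequence identity between the target and a template of known structure. The sketch below shows one way to compute it from two sequences that have already been aligned (gaps written as "-"). Definitions of identity vary; this version divides by the number of ungapped aligned columns, which is an assumption, and both function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gapped columns are not counted in this definition
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggest_method(identity, threshold=30.0):
    """Rule of thumb from the slide: homology modeling above ~30% identity."""
    if identity > threshold:
        return "homology modeling"
    return "threading or ab initio prediction"


if __name__ == "__main__":
    identity = percent_identity("ACDE-G", "ACDFKG")  # toy alignment, not real proteins
    print(identity, suggest_method(identity))
```

The threshold is a guideline rather than a hard cutoff, so it is exposed as a parameter.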

Validation

DSSP
PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
VADAR
Verify3D: http://nihserver.mbi.ucla.edu/Verify_3D/

Visualization

Cn3D
UCSF Chimera (MidasPlus)
Rasmol
ProteinExplorer

Bioimaging

NIH sites for image processing software: http://www.cc.nih.gov/cip/visualization/vis_packages.html
NIH IMAGE: http://rsb.info.nih.gov/nih-image/
Spider & Web: http://www.wadsworth.org/spider_doc/spider/docs/spider.html
EMAN: http://blake.bcm.tmc.edu/eman/eman1/

DICOM

The Digital Imaging and Communications in Medicine standard

For all medical imaging modalities, such as CT scans, MRIs, and ultrasound.

All image files that are compliant with Part 10 of the DICOM standard (available in DocSharing) are DICOM format files.
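A Part 10 file begins with a 128-byte preamble followed by the four ASCII characters "DICM". The sketch below uses that prefix as a quick check for whether a file looks like a DICOM Part 10 file. It is only a signature test, not a validation of the whole file, and is_dicom_part10 is a hypothetical helper name.

```python
def is_dicom_part10(path):
    """Return True if the file has the 128-byte preamble followed by b'DICM'."""
    with open(path, "rb") as handle:
        header = handle.read(132)  # 128-byte preamble + 4-byte prefix
    return len(header) == 132 and header[128:132] == b"DICM"


if __name__ == "__main__":
    import sys

    # Usage: python check_dicom.py scan1.dcm scan2.dcm ...
    for filename in sys.argv[1:]:
        verdict = "looks like DICOM Part 10" if is_dicom_part10(filename) else "no DICM prefix"
        print(filename, verdict)
```

Reading the data elements themselves requires a full DICOM parser; this check only helps sort files before handing them to one.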

Disease models

Humans: mutant gene → mutant or missing protein → mutant phenotype (disease). Examples: SHH-/+, SHH-/-

Animal models: mutant gene → mutant or missing protein → mutant phenotype (disease model). Examples: shh-/+, shh-/-