Post on 12-Jan-2016
Modeling Functional Genomics Datasets
CVM8890-101, Lesson 6
11 July 2007, Bindu Nanduri
Lesson 6: Functional genomics modeling II: a pathway analysis example.
Introduction to protein interaction networks
[Figure: cell fate diagram. Panel labels: Cell, Cancer, CD4+ T "helper" lymphocyte, Lymphoma. Outcomes shown: proliferation, differentiation, quiescence, programmed cell death, anergy, activation.]
AgBase protein annotation process
Protein identifiers or FASTA format
GORetriever
Annotated Proteins
GOanna
Proteins with no annotations
GOSlimViewer
[Figure: Potential CD4+ T lymphocyte biological processes. Processes shown with percentage breakdowns: proliferation, angiogenesis, apoptosis, migration, quiescence, differentiation, anergy, activation, senescence, cell cycle.]
[Figure: Integrin signaling pathway. Labels: AP-1, AP-1 dependent gene expression, metastasis, tumor invasion.]
Hypothesis driven data analysis
Exploration of data to identify pathways of interacting proteins
Protein-protein interaction (PPI) networks
Why study PPIs
Proteins do not function alone!
PPIs are inherent to the function of multiprotein complexes
PPIs can help infer function where functional information is available for one partner
Changes in normal PPIs can result in disease
Types of PPI
PPI categories based on composition, affinity and timescale of interaction
Homo- and hetero-oligomeric complexes: interactions between identical or non-identical chains
Obligate PPI: protomers do not exist as stable structures in vivo; these are functionally obligate
Non-obligate PPI: protomers can exist as stable structures; they may co-localize for function or are co-localized
Obligate homodimer: Arc repressor (dimer necessary for DNA binding)
Non-obligate homodimer: sperm lysin
PPI based on the life time of the complex: transient or permanent
Permanent interactions are stable and exist only as a complex
Transient interactions are marked by association/dissociation cycles in vivo
Weak transient interactions (sperm lysin) continuously associate and dissociate
Strong transient interactions require a molecular trigger: the heterotrimeric G protein dissociates into G-alpha and G-beta-gamma when it binds GTP; the GDP-bound form is a trimer
Control of protein oligomerization
PPIs are a continuum of obligate and non-obligate states
Complex formation is driven by concentration and by the free energy of the complex relative to alternate states
Take home message of PPI types
PPIs are a continuum of obligate and non-obligate states
Complex formation is driven by concentration and by the free energy of the complex relative to alternate states
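The dependence on concentration and free energy can be made concrete with a small worked example. This is a minimal sketch, not part of the original lesson: it assumes a simple two-state homodimer equilibrium (2M <-> D) with ideal behavior, and the function names `kd_from_delta_g` and `dimer_fraction` are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def kd_from_delta_g(delta_g_assoc_kj, temperature_k=298.15):
    """Dissociation constant (in M, 1 M standard state) from the standard
    free energy of association in kJ/mol (negative means favorable)."""
    return math.exp(delta_g_assoc_kj / (R_KJ * temperature_k))


def dimer_fraction(total_conc_m, kd_m):
    """Fraction of protomers found in the dimer for 2M <-> D.

    Assumes Kd = [M]^2 / [D] and mass balance total = [M] + 2[D],
    which gives the quadratic 2[M]^2/Kd + [M] - total = 0.
    """
    monomer = kd_m * (math.sqrt(1.0 + 8.0 * total_conc_m / kd_m) - 1.0) / 4.0
    return 1.0 - monomer / total_conc_m


# Same complex, different concentrations: the oligomeric state shifts.
kd = kd_from_delta_g(-40.0)  # assumed value, for illustration only
for conc in (1e-9, 1e-7, 1e-5):
    print(f"{conc:.0e} M total protein: {dimer_fraction(conc, kd):.2f} in dimer")
```

Under these assumptions the same pair of protomers looks mostly monomeric at low concentration and mostly dimeric at high concentration, which is one way to read the "continuum" of states.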
How to identify PPI
Experimental: TAP assays, yeast two hybrid (Y2H), protein arrays
Computational: gene coexpression, sequence coevolution, phylogenetic profile, gene cluster, Rosetta Stone method, text mining
PLoS Computational Biology March 2007, Volume 3 e42
Y2H Assay
Eukaryotic transcription factors have a DNA-binding domain (BD) and an activation domain (AD)
Physical association of these domains activates transcription
Create chimeric proteins fused to either the BD or the AD and transfect yeast
Gal4/LexA based reporters
In vivo method that can detect transient PPI
TAP Assay
TAP tag consists of two IgG-binding domains of Staphylococcus protein A and a calmodulin-binding peptide, separated by a tobacco etch virus protease cleavage site
TAP provides direct information on protein complexes
O. Puig et al., Methods, 2001
PLoS Computational Biology March 2007, Volume 3 e42
Gene Coexpression
Expression profile similarity
Correlation coefficient between relative expression levels of two genes/proteins
Normalized difference between their absolute expression levels
The distribution for target proteins is compared with the distributions for random noninteracting protein pairs
Expression levels of physically interacting proteins coevolve
Coevolution of gene expression is a better predictor of protein interactions than coevolution of amino acid sequences
Good for studying permanent complexes: ribosome, proteasome
PLoS Computational Biology March 2007, Volume 3 e42
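The two similarity measures named on this slide can be sketched in a few lines. This is an illustrative example only: the expression values are made up, the gene names are hypothetical, and the particular form of the normalized difference, |a - b| / (a + b), is an assumption because the slide does not give a formula.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def normalized_difference(level_a, level_b):
    """Assumed form: absolute difference scaled by the sum of the two levels."""
    return abs(level_a - level_b) / (level_a + level_b)


# Hypothetical relative expression levels across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.5, 8.0, 10.0],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated pattern
}

r_candidate = pearson(profiles["geneA"], profiles["geneB"])
r_background = pearson(profiles["geneA"], profiles["geneC"])
print(f"candidate pair r = {r_candidate:.3f}, background pair r = {r_background:.3f}")
```

In practice the candidate-pair scores would be compared against the whole distribution of scores for random noninteracting pairs, not against a single background pair as in this toy case.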
Protein microarrays/chips
Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides
Target proteins are overexpressed, immobilized, and probed with fluorescently labeled proteins
H Zhu et al (2000) “Analysis of yeast protein kinases using protein chips” Nature Genetics 26: 283-289
Can detect PPI between actual proteins
PLoS Computational Biology March 2007, Volume 3 e42
Database          URL/FTP                                                       Type
DIP               http://dip.doe-mbi.ucla.edu                                   E,S
BIND              http://bind.ca                                                E,C,S
MPact/MIPS        http://mips.gsf.de/services/ppi                               E,C,F
STRING            http://string.embl.de                                         E,P,F
MINT              http://mint.bio.uniroma2.it/mint                              E,C
IntAct            http://www.ebi.ac.uk/intact                                   E,C
BioGRID           http://www.thebiogrid.org                                     E,C
HPRD              http://www.hprd.org                                           E,C
ProtCom           http://www.ces.clemson.edu/compbio/ProtCom                    S,H
3did, Interprets  http://gatealoy.pcb.ub.es/3did/                               S,H
Pibase, Modbase   http://alto.compbio.ucsf.edu/pibase                           S,H
CBM               ftp://ftp.ncbi.nlm.nih.gov/pub/cbm                            S
SCOPPI            http://www.scoppi.org/                                        S
iPfam             http://www.sanger.ac.uk/Software/Pfam/iPfam                   S
InterDom          http://interdom.lit.org.sg                                    P
DIMA              http://mips.gsf.de/genre/proj/dima/index.html                 F,S
Prolinks          http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/  F
Predictome        http://predictome.bu.edu/                                     F
Type of data: high-throughput experimental data (E), structural data (S), manual curation (C), functional predictions (F), and interface homology modeling (H)
Unit of interaction: P is protein
PLoS Computational Biology March 2007, Volume 3 e42
PPI database comparisons
Proteins: Structure, Function and Bioinformatics 63:490-500 2006
Experimental PPI dataset overlap is small
High false positive (FP) rate in high-throughput experiments
Interactions are difficult to confirm by multiple sources
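Measuring how much two interaction datasets agree is a simple set operation once each interaction is treated as an unordered pair. The sketch below uses invented protein identifiers and a hypothetical `overlap` helper; real comparisons also require mapping the databases' different identifier schemes onto one another first, which is not shown here.

```python
def normalize(pairs):
    """Treat each interaction as unordered, so (A, B) and (B, A) match."""
    return {frozenset(pair) for pair in pairs}


def overlap(dataset_1, dataset_2):
    """Return (number of shared interactions, Jaccard index)."""
    set_1, set_2 = normalize(dataset_1), normalize(dataset_2)
    shared = set_1 & set_2
    return len(shared), len(shared) / len(set_1 | set_2)


# Invented example data: only one interaction is reported by both sources.
screen_1 = [("A", "B"), ("B", "C"), ("C", "D")]
screen_2 = [("B", "A"), ("D", "E"), ("E", "F")]

shared_count, jaccard = overlap(screen_1, screen_2)
print(f"{shared_count} shared interaction(s), Jaccard index {jaccard:.2f}")
```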
How to identify PPI
Experimental: TAP assays, yeast two hybrid (Y2H), protein arrays
Computational: gene coexpression, sequence coevolution, phylogenetic profile, gene cluster/neighborhood, Rosetta Stone method, text mining
PLoS Computational Biology March 2007, Volume 3 e43
Phylogenetic profile (PP)
Hypothesis: functionally linked and potentially interacting nonhomologous proteins co-evolve and have orthologs in the same subset of fully sequenced organisms
PLoS Computational Biology March 2007, Volume 3 e43
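A phylogenetic profile can be represented as a presence/absence vector over a fixed list of sequenced genomes, and candidate functional links are proteins whose vectors match. The following is a minimal sketch with made-up profiles; the protein names, the five-genome layout, and the `max_mismatches` threshold are all assumptions for illustration.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Number of genomes in which presence/absence differs."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def linked_pairs(profiles, max_mismatches=0):
    """Protein pairs whose profiles differ in at most max_mismatches genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_mismatches
    ]


# 1 = ortholog present in that genome, 0 = absent (five hypothetical genomes).
profiles = {
    "P1": (1, 0, 1, 1, 0),
    "P2": (1, 0, 1, 1, 0),
    "P3": (0, 1, 1, 0, 1),
    "P4": (1, 0, 1, 0, 0),
}

print(linked_pairs(profiles))                    # identical profiles only
print(linked_pairs(profiles, max_mismatches=1))  # allow one differing genome
```

Profiles that are present in every genome or in almost none carry little information, so a real analysis would typically filter those out before pairing.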
Gene Cluster, Gene Neighborhood
Genes in the gene cluster/operon are co-regulated and participate in the same biological function
PLoS Computational Biology March 2007, Volume 3 e43
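One simple way to look for conserved gene neighborhoods is to count how often two ortholog groups sit next to each other across genomes. This sketch is a deliberate simplification: gene orders are invented, each genome is treated as a linear list of ortholog labels, and strand, intergenic distance, and circular chromosomes are ignored.

```python
from collections import Counter


def adjacent_pairs(gene_order):
    """Unordered pairs of genes that are direct neighbors in one genome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_neighbors(genomes, min_genomes=2):
    """Gene pairs that are adjacent in at least min_genomes genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        counts.update(adjacent_pairs(gene_order))
    return {tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders, written as ortholog group labels.
genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["c", "b", "a", "e"],
    "genome3": ["a", "d", "b", "c"],
}

print(conserved_neighbors(genomes))                 # adjacent in two or more genomes
print(conserved_neighbors(genomes, min_genomes=3))  # adjacent in all three
```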
Sequence Co-evolution
Interacting proteins very often co-evolve
Changes in one protein (loss of function or interaction) are compensated by correlated changes in another protein
The orthologs of co-evolving proteins tend to interact, thereby making it possible to infer unknown interactions in other genomes
Co-evolution can be reflected in terms of the similarity between phylogenetic trees of two non-homologous interacting protein families
PLoS Computational Biology March 2007, Volume 3 e43
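Tree similarity is often approximated by correlating the pairwise evolutionary distances of the two protein families over the same set of species. The sketch below follows that idea under stated assumptions: the distance values are invented, the function name `tree_similarity` is hypothetical, and no correction for the shared species tree is applied.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def tree_similarity(dist_a, dist_b, species):
    """Correlate the two families' distances over all species pairs."""
    pairs = list(combinations(sorted(species), 2))
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


species = ["s1", "s2", "s3", "s4"]
# Invented pairwise distances between orthologs of family A in each species pair.
family_a = {
    ("s1", "s2"): 0.10, ("s1", "s3"): 0.30, ("s1", "s4"): 0.50,
    ("s2", "s3"): 0.25, ("s2", "s4"): 0.45, ("s3", "s4"): 0.20,
}
# Family B evolves faster but with the same pattern (2 * distance + 0.05).
family_b = {pair: 2.0 * d + 0.05 for pair, d in family_a.items()}

print(f"tree similarity = {tree_similarity(family_a, family_b, species):.3f}")
```

Because all proteins from the same species share part of their evolutionary history, some correlation is expected even between noninteracting families, so scores need a background for comparison.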
Rosetta Stone method
Interacting proteins/domains have homologs in other genomes that are fused into one protein chain, a Rosetta Stone protein
Gene fusion occurs to optimize co-expression of genes encoding interacting proteins
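The Rosetta Stone idea reduces to a lookup once homology has been assigned: two separate proteins in the query genome are linked if some single protein elsewhere contains homologs of both. The sketch assumes homology is already summarized as family identifiers; all protein and family names are hypothetical.

```python
from itertools import combinations


def rosetta_stone_links(query_proteins, other_proteins):
    """Predict links between query proteins that are fused elsewhere.

    query_proteins: protein name -> homology family ID (one per protein).
    other_proteins: protein name -> set of family IDs found in that chain.
    Returns (protein_a, protein_b, [names of fused proteins]) tuples.
    """
    links = []
    for (a, fam_a), (b, fam_b) in combinations(sorted(query_proteins.items()), 2):
        if fam_a == fam_b:
            continue  # homologs of each other are not evidence of fusion
        fused = sorted(
            name for name, fams in other_proteins.items()
            if fam_a in fams and fam_b in fams
        )
        if fused:
            links.append((a, b, fused))
    return links


# Hypothetical family assignments.
query = {"protA": "F1", "protB": "F2", "protC": "F3"}
other = {"fusedAB": {"F1", "F2"}, "single": {"F3"}}

print(rosetta_stone_links(query, other))
```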
Text Mining
Utilizing the wealth of publicly available data
Search Medline or PubMed for words or word combinations
Co-occurrence of words is a simple metric, but it is prone to high false positive rates
Natural Language Processing (NLP) methods are specific
“A binds to B”; “A interacts with B”; “A associates with B”: difficult to detect, so NLP has a higher false negative rate
Normally requires a list of known gene names or protein names for a given organism
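The trade-off between co-occurrence counting and pattern-based extraction shows up even on a toy corpus. This is only a sketch: the three "abstracts" are invented sentences, the name list is a tiny stand-in for an organism-wide dictionary, and a regular expression is used as a crude substitute for real NLP.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(abstracts, names):
    """Count how many abstracts mention both names of each pair."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text))
        present = sorted(name for name in names if name in tokens)
        counts.update(combinations(present, 2))
    return counts


def pattern_hits(abstracts, names):
    """Find explicit 'A binds to / interacts with / associates with B' statements."""
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    verbs = r"(?:binds to|interacts with|associates with)"
    regex = re.compile(rf"\b({alternatives})\s+{verbs}\s+({alternatives})\b")
    return [match.groups() for text in abstracts for match in regex.finditer(text)]


names = {"Fos", "Jun", "Myc"}
abstracts = [
    "Fos binds to Jun in the AP-1 complex.",
    "Fos and Jun were measured; Myc was unchanged.",
    "Jun expression is independent of Myc.",
]

print(cooccurrence(abstracts, names))
print(pattern_hits(abstracts, names))
```

Here co-occurrence scores Jun and Myc as highly as Fos and Jun even though no sentence reports a Jun and Myc interaction, while the pattern search finds only the one explicitly worded statement and would miss any interaction phrased differently.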
GO ToolBox
Genome Biol. 2004;5(12):R101.
ProtQuant tool