Protein Analysis Tools 2nd April, 2012 Ansuman Chattopadhyay, PhD, Head Molecular Biology...

  • Slide 1

Protein Analysis Tools
2nd April, 2012
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System, University of Pittsburgh
[email protected]
http://www.hsls.pitt.edu/guides/genetics

  • Slide 2

What we'll do:
    Brief overview of CLC Main Workbench
    Find the genomic context of a protein sequence
    Search for the presence of conserved domains
    Create a multiple sequence alignment plot

  • Slide 3

What we'll do:
    Analyze primary structure: hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc.
    Predict secondary structure
    Predict post-translational modifications, e.g. phosphorylation, glycosylation
    Search for interacting partners
    Predict domain-driven protein-protein interactions

  • Slide 4

Workshop Resources
http://www.hsls.pitt.edu/molbio/tutorials

  • Slide 5

HSLS MolBio Videos

  • Slide 6

Sequence Analysis Software Suites
    Wisconsin GCG
    Vector NTI
    DNASTAR Lasergene
    Geneious
    CLC Main

  • Slide 7

Why CLC Main?
    Windows, Mac, Linux
    DNA, RNA, Protein, Microarray Data Analysis
    Regular updates
    HSLS licensed

  • Slide 8

CLC Main Access
    HSLS CLC Main registration link: http://www.hsls.pitt.edu/molbio/clcmain
    Access via Pitt - Network Connect
    Instruction video: http://goo.gl/JNjMt

  • Slide 9

CLC Main Workbench Overview
    Graphical User Interface
    Protein sequences
    Import
    Sequence navigation

  • Slide 10

CLC Main Graphical User Interface (GUI)

  • Slide 11

CLC Main

  • Slide 12

Navigate a protein sequence

  • Slide 13

Videos
    CLC Main getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf
    CLC Main Workbench Walkthrough (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf
    CLC Main Workbench Walkthrough (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf

  • Slide 14

Import a Protein Sequence

  • Slide 15

Protein Sequence
    Human PLCg1
    RefSeq no: NP_002651
    UniProt accession number: P19174
    FASTA file
    Raw sequence
    CLC features: Search, Import, Create new sequence

  • Slide 16

Videos
    Import a DNA/Protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf
    Import a DNA/Protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf

  • Slide 17

CLC protein sequence

  • Slide 18

Protein sequence manipulation
    Create a new protein with PLCg1 SH2-SH2-SH3 domains

  • Slide 19

Sequence Alignment
    Pair-wise Alignment: Global, Local
    Multiple Sequence Alignment

  • Slide 20

Sequence Alignment

  • Slide 21

Pair-wise Sequence Alignment

  • Slide 22

Multiple Sequence Alignment

  • Slide 23

Tools: ClustalW and T-Coffee

  • Slide 24

PLCg1 Orthologous Sequences
    Mouse: NP_067255
    Rat: NP_037319
    Cow: NP_776850
    Dog: XP_542998
    Zebrafish: NP_919388
    Human: NP_002651
NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651

  • Slide 25

Videos
    Create a multiple sequence alignment plot using CLC (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part1.swf
    Create a multiple sequence alignment plot using CLC (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part2.swf
    Create a multiple sequence alignment plot: http://media.hsls.pitt.edu/media/clres2705/msa.swf
    Compare two peptide sequences: http://media.hsls.pitt.edu/media/clres2705/blast2.swf

  • Slide 26

Starting with a short peptide sequence, find:
    the whole protein sequence
    orthologs in other species (nematode)
Tools: UCSC BLAT; NCBI BLAST against SwissProt

  • Slide 27

Peptide to whole protein
Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

  • Slide 28

Videos
    Place an mRNA or peptide sequence into the human genome (BLAT): http://www.hsls.pitt.edu/molbio/videos/play?v=12e
    Find homologous sequences: http://media.hsls.pitt.edu/media/clres2705/blast.swf

  • Slide 29

Find homologous sequence
SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

  • Slide 30

Sequence Manipulation & Format Conversion
    Sequence Manipulation Suite: http://bioinformatics.org/sms2/
    Readseq: http://thr.cit.nih.gov/molbio/readseq/
    Formats: GenPept, FASTA

  • Slide 31

Hands-On
    Retrieve the amino acid sequence between positions 25 and 45 in Sequence A (MS Word Doc)
    Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence
    Find the fruit fly homolog of this protein. What % identity does the fruit fly protein share with its rat homolog?
    Predict potential MAPK phosphorylation sites present in the fruit fly protein

  • Slide 32

Protein Domain Search: InterPro Scan
InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRG
YEWDAGDVGAAPPGAAPAPGIFSSQPG
HTPHPAASRDPVARTSPLQTPAAPGAAA
GPALSPVPPVVHLTLRQAGDDFSRRYRR
DFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMS
PLVDNIALWMTEYLNRHLHTWIQDNGG
WDAFVELYGPSMRPLFDFSWLSLKTLLS
LALVGACITLGAYLGHK

  • Slide 33

Videos
    Find protein domains, PTM, secondary structure, etc.: http://media.hsls.pitt.edu/media/clres2705/uniprot.swf
    Start with a protein pattern and find which proteins possess that domain: http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf
    Search for protein domains, repeats and sites: http://media.hsls.pitt.edu/media/clres2705/interpro.swf

  • Slide 34

Protein Domain Search: ScanProsite
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 35

Pattern Search
    [AC]-x-V-x(4)-{ED}
    This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
    F-[GSTV]-P-R-L-[G>]

  • Slide 36

Pattern Search

  • Slide 37

Protein Primary Structure Analysis
Tool: ExPASy from SIB
    Calculated Mol Wt
    Theoretical pI
    Extinction coefficients
    Estimated half-life
    Hydropathicity plot: Kyte & Doolittle
    Hydrophilicity plot: Hopp T.P., Woods K.R.

  • Slide 38

Antigenic Site Prediction
Tool: EMBOSS Antigenic
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 39

EMBOSS Antigenic
Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be a part of antigenic sites. A semi-empirical method which makes use of physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes was developed by Kolaskar and Tongaonkar to predict antigenic determinants on proteins. Application of this method to a large number of proteins has shown that it can predict antigenic determinants with about 75% accuracy, which is better than most of the known methods. This method is based on a single parameter and is thus very simple to use.

  • Slide 40

Transmembrane Region Prediction

  • Slide 41

Transmembrane Site Prediction
Tool: TMHMM Server
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 42

Protein Secondary Structure
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 43

Protein-Protein Interactions Prediction
Tool: STRING
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 44

Hands-on
Take the human BCL2 protein sequence and:
    Find its domain architecture
    Predict the topology of its transmembrane region
    Design a suitable antigenic site for antibody generation
    What are its calculated Mol Wt and extinction coefficient?
    Predict its secondary structure
    What % of this protein possesses alpha-helical structure?
    Predict its potential interacting partners

  • Slide 45

Hands-on
Prediction of potential phosphorylation sites present in a protein sequence.
Sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 46

Phosphorylation Site Prediction
Tool: NetPhos
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 47

Phosphorylation Site Prediction
Tool: GPS
Input sequence: human BCL2, NP_000624.2 (as on Slide 32)

  • Slide 48

Thank you! Any questions?
Carrie Iwema, 412-383-6887
Ansuman Chattopadhyay, 412-648-1297
[email protected]@pitt.edu
http://www.hsls.pitt.edu/guides/genetics
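
The pattern syntax on Slide 35 ([AC]-x-V-x(4)-{ED} and F-[GSTV]-P-R-L-[G>]) can be sketched in Python. This is a minimal illustration, assuming only the PROSITE-style elements that appear on the slide plus the `<` and `>` terminus anchors; the function name `prosite_to_regex` and the toy peptides are made up for this example, and it is not how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; it handles x, [..], {..},
    repeats such as x(4) or x(2,4), and the < / > terminus anchors.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]

        # Split "x(4)" into the core element "x" and its repeat counts.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()

        if core == "x":                  # any residue
            regex = "."
        elif core.startswith("{"):       # any residue except those listed
            regex = "[^" + core[1:-1] + "]"
        elif core.startswith("[") and ">" in core:
            # e.g. [G>]: either a listed residue or the end of the sequence
            regex = "(?:[" + core[1:-1].replace(">", "") + "]|$)"
        else:                            # [..] set or a single residue letter
            regex = core

        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


# Toy peptides (not real proteins), used only to exercise the two patterns.
first = re.search(prosite_to_regex("[AC]-x-V-x(4)-{ED}"), "MKACVAAAAKL")
second = re.search(prosite_to_regex("F-[GSTV]-P-R-L-[G>]"), "MFSPRL")
print(first.group(), second.group())
```

The same regular expression can be run against any sequence string, for example the BCL2 sequence from Slide 32, with `re.finditer` to list the start position of every match.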