Protein Structure Exercise

Bioinformatics Tools and Databases, Foothill College

Protein Sequences

• From NCBI, search either proteins or genomes – using keywords etc.

• From Genome, type in HIV-1

• What links do you have from there?

• Choose NC_001802, then coding region

• From that entry, save FASTA protein

• Identify the gag-pol and env sequences

HIV-1 Gag-Pol AA Sequence

>gi|28872819|ref|NP_057849.4| Gag-Pol; Gag-Pol polyprotein [Human immunodeficiency virus 1]

MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT

GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG

QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA

EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT

SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC

QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG

HQMKDCTERQANFLREDLAFLQGKAREFSSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQ

VTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI

GTVLVGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEK

EGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGD

AYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQY

MDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWT

VNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSK

DLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPI

QKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGR

QKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYL

AWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKC

QLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTI

HTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIH

NFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQD

NSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
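Once the FASTA protein file has been saved, picking out the gag-pol and env records can also be done with a few lines of Python. The sketch below is one possible approach, not part of the original exercise: the function names `parse_fasta` and `find_records` are made up for illustration, and it assumes that the record headers contain recognizable keywords such as "Gag-Pol" or "env".

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence rows
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_records(records, keyword):
    """Return the records whose header contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [(h, s) for h, s in records if keyword in h.lower()]
```

For example, after reading the saved file into a string `text`, `find_records(parse_fasta(text), "gag-pol")` would return the Gag-Pol polyprotein record, provided its header is worded like the one shown above.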

BLASTing PDB

• Open two browsers

• Open the URL to NCBI BLAST P

• BLAST the PDB database with the amino acid sequence from gag-pol and then env

• Go to http://us.expasy.org/tools/blast/

• BLAST the PDB database as above

• What are the top structures at each site?
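To compare the top structures from the two sites side by side, it can help to save each result as a hit table and rank the hits yourself. The following is a minimal sketch under stated assumptions: it expects a tab-separated table with the standard 12 columns of BLAST tabular output (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score). Whether a given site offers this export is not confirmed by the slides, and `top_hits` is a hypothetical helper name.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular format.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(table_text, n=5):
    """Return the n best hits from a tab-separated hit table, ranked by bit score."""
    hits = []
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#") or len(row) < len(COLUMNS):
            continue  # skip blank lines, comment lines, and short rows
        hit = dict(zip(COLUMNS, row))
        # Convert the numeric fields used for ranking and reporting.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    hits.sort(key=lambda h: h["bitscore"], reverse=True)
    return hits[:n]
```

Running `top_hits` on the table saved from each site and comparing the `sseqid` values shows quickly whether both searches agree on the best-matching PDB entries.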

Digging Deeper into Sequence

• From the ExPASy PDB BLAST return:

• Choose another sequence (close or far) and do a Multiple Sequence Alignment

• Choose BLOSUM or PAM matrices

• View the alignments in HTML format
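The choice of matrix matters because a substitution matrix assigns a score to every aligned residue pair, and the alignment tool tries to maximize the total. The sketch below illustrates the idea on two already-aligned sequences. All numbers in `TOY_MATRIX`, as well as the default and gap scores, are invented for illustration and are NOT the published BLOSUM or PAM values; the function names are hypothetical too.

```python
# Toy substitution scores for a handful of residue pairs. These numbers are
# illustrative only and are NOT the published BLOSUM or PAM values.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("L", "D"): -4, ("K", "D"): -1,
}
DEFAULT_MATCH = 4      # assumed score for identical pairs not listed above
DEFAULT_MISMATCH = -2  # assumed score for differing pairs not listed above
GAP_PENALTY = -4       # assumed flat penalty per gapped column


def pair_score(a, b, matrix=TOY_MATRIX):
    """Score one aligned column; '-' marks a gap."""
    if a == "-" or b == "-":
        return GAP_PENALTY
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:  # the matrix is symmetric
        return matrix[(b, a)]
    return DEFAULT_MATCH if a == b else DEFAULT_MISMATCH


def alignment_score(seq_a, seq_b, matrix=TOY_MATRIX):
    """Sum the column scores of two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a, seq_b):
    """Percentage of aligned columns (gaps included) with identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * same / len(seq_a)
```

With these toy values, conservative swaps such as L/I or K/R still earn positive scores, which is why a distant relative can align well even when its percent identity is low.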

NCBI BLASTp of PDB

• After doing the BLAST P of PDB:

• Click on related structures to see more

• Follow the PDB links to the MMDB

– Hint: you can use some of these structures at VAST for structure comparisons

• How can you display the structure?

• RasMol, SPDBV, and Cn3D viewers

BLAST P – Other Data

• From NCBI BLAST P – what are the conserved domains that are detected?

• Click on each to find the Pfam entries

• Show domain relatives (CDART)

• (The next two images show results for gag-pol and env proteins – try both)

• Path is from CDD to CDART – explore!

Conserved Domain Databases

• NCBI contains a database of conserved domains. These are linked, by sequence, to BLAST and other tools.

• Conserved domains represent “functional folds” in nature’s playbook.

• You can compare your sequence by alignment (Pfam) to other protein folds.

• Use CDART for graphical domain display.

ExPASy Proteomics Tools

• http://us.expasy.org/tools/

• Protein identification and characterization

• DNA -> Protein

• Similarity searches, pattern and profile searches

• Post-translational modifications

• Topology prediction

• Primary structure analysis

• Secondary structure prediction

• Tertiary structure

• Sequence alignment

• Biological text analysis

ExPASy ScanProsite

• Go to ExPASy ScanProsite: http://www.expasy.ch/tools/scanprosite/

• Enter either HIV sequence (gag-pol or env) into the search box

• You can choose email data return here

• What post-translational modifications are predicted? Click on the references.
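Many of the sites ScanProsite reports come from short PROSITE patterns, and it can be instructive to see how such a pattern is matched. The sketch below converts PROSITE pattern syntax to a Python regular expression and scans a sequence with it. It uses the N-glycosylation pattern `N-{P}-[ST]-{P}` as the example; the helper names are hypothetical, only the common syntax elements are handled, and this is not a description of how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as N-{P}-[ST]-{P} into a regex."""
    regex = pattern.strip().rstrip(".")
    regex = regex.replace("-", "")                       # drop element separators
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}  -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")    # x(2) -> x{2}
    regex = regex.replace("x", ".")                      # x    -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # sequence-end anchors
    return regex


def find_motifs(sequence, pattern):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    # The lookahead lets overlapping hits be reported separately.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# A short stretch taken from the Gag-Pol sequence shown earlier.
fragment = "HTDNGSNFTGATVRAACWW"
print(find_motifs(fragment, "N-{P}-[ST]-{P}"))
# prints [(4, 'NGSN'), (7, 'NFTG')]
```

A pattern hit is only a candidate site. Short patterns like this one match often by chance, so the references linked from ScanProsite are the place to check whether a modification is plausible for the protein in question.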

PIR – Georgetown University

• Go to http://pir.georgetown.edu/

• Choose the iProClass database

http://pir.georgetown.edu/iproclass/

• Paste in the gag-pol sequence

• Look at the BLAST hits

• Try the links to domain display and pattern match. What do you see?

Pfam

• Go to The Pfam Home page at: http://www.sanger.ac.uk/Software/Pfam

• Choose Protein search

• Enter the HIV-1 gag-pol sequence

• The search may take 3 to 5 minutes

• The page return will show protein families and conserved domains

SMART

• Simple Modular Architecture Research Tool

• Sequence analysis

• Architecture analysis

• Search with sequence or accession

• Don’t forget to check a database:

– Pfam

– Signal peptides

– Internal repeats

NCBI Structure Tools

• http://www.ncbi.nlm.nih.gov/Structure/

• Modeling tools for the MMDB and PDB

• MMDB – Molecular Modeling Database

• PDB – Protein Data Bank

• Search by keyword (HIV-1 or gag-pol)

• Follow links in and out of MMDB / PDB

• RasMol, Chime, Cn3D structure viewers
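Besides viewing a structure, you can inspect a downloaded PDB-format file directly, since it is plain text with fixed-width columns. The sketch below reads ATOM records and counts alpha carbons per chain as a rough residue count. It assumes the standard PDB column layout (atom name in columns 13-16, residue name in 18-20, chain ID in 22, residue number in 23-26, coordinates in 31-54); the function names are hypothetical.

```python
from collections import Counter


def parse_atoms(pdb_text):
    """Yield a dict for each ATOM record in PDB-format text (fixed-width columns)."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue  # ignore HEADER, HETATM, and other record types
        yield {
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "x": float(line[30:38]),          # columns 31-38
            "y": float(line[38:46]),          # columns 39-46
            "z": float(line[46:54]),          # columns 47-54
        }


def residues_per_chain(pdb_text):
    """Count alpha-carbon (CA) atoms per chain as a rough residue count."""
    counts = Counter()
    for atom in parse_atoms(pdb_text):
        if atom["name"] == "CA":
            counts[atom["chain"]] += 1
    return dict(counts)
```

The count is approximate: residues with alternate locations can be counted more than once, and residues missing from the model are not counted at all.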

MMDB

• Molecular Modeling Database

• http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

• Contains weekly updates from PDB

• “The structure database is considerably smaller than Entrez's protein or nucleotide databases, but a large fraction of all known protein sequences have homologs in this set”

Cn3D Structure Viewer

• Structure viewer

– PC, Unix / Linux, Mac OSX etc.

• Helper application

– Structure view / sequence view

– Can align and show multiple sequences

• Has a great online tutorial

– Read it carefully and try it out!

• Exports files as PNG for great presentations too!

1RTH Using Cn3D Saved as PNG

VAST and VAST Search

• Vector Alignment Search Tool

• VAST Search is a service that allows searching for structural neighbors starting with a set of 3D coordinates specified by the user.

• Type in a structure code (PDB), view similar alignments, and click to import.

• (Start by BLASTing the PDB database).

CDD and CDD Search

• http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

• Enter a sequence, accession number, or search by keyword

• CDD is linked from BLAST so you may enter it while doing sequence analysis

• Where there are conserved domains, there is often homology – or close cousins (UniGene)

The Protein Machine

• http://www2.ebi.ac.uk/translate/

• For translating nucleotide sequences into protein in three different modes

– You can choose the sense strand or complement or any reading frame

– You can start and end at any position

– You can select any translation table

• Or enter an accession number
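The options listed for this tool (strand, reading frame, translation table) map directly onto a small amount of code. The following is a minimal sketch, not the tool's own implementation: it uses the standard genetic code as the default table, treats `frame` as a 0-based offset (0, 1, or 2), and writes unknown or incomplete codons as "X" or drops them. The names `translate` and `reverse_complement` are chosen for illustration.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0, reverse=False, table=CODON_TABLE):
    """Translate DNA in the given frame (0, 1, or 2) of the chosen strand.

    Stop codons are written as '*'; codons with unknown bases as 'X'.
    A trailing partial codon is ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = dna.upper().replace("U", "T")  # accept RNA input as well
    if reverse:
        seq = reverse_complement(seq)
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(table.get(codon, "X") for codon in codons)


# A short DNA string chosen so that it encodes the first seven residues of
# the Gag-Pol sequence shown earlier (not claimed to be copied from NC_001802).
print(translate("ATGGGTGCGAGAGCGTCAGTA"))
# prints MGARASV
```

Calling `translate` with `frame` set to 0, 1, and 2, with and without `reverse=True`, gives all six reading frames, and passing a different dictionary as `table` corresponds to selecting another translation table.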