Protein Sequence, Structure, and Function | Gustavo Caetano - Anolles | 2015

Protein Sequence, Structure, and Function Lab

Gustavo Caetano - Anolles

PowerPoint by Casey Hanson

Exercise

In this exercise we will be doing the following:

1. Visualize the structure of various proteins in the Protein Data Bank.

2. Use the Superfamily HMM tool to uncover common protein domains in aligned sequences.

3. Reconstruct phylogenies of structurally related proteins using Mr. Bayes.

Step 0: Local Files

For viewing and manipulating the files needed for this laboratory exercise, insert your flash drive.

Denote the path to the flash drive as the following:

[course_directory]

We will use the files found in:

[course_directory]/08_Protein_Structure/data/

Protein Visualization in PDB

In this exercise, we will become familiar with the Protein Data Bank (PDB), a database that provides information on the structure and function of proteins. We will concentrate on acylphosphatase (2ACY) in our exercises.

We will primarily use this tool to visualize the 3D structure of proteins in the browser and then predict their secondary structure from that view. We will validate our predictions using a PDBsum HERA diagram.

Additionally, we will use CATH (a tool that imposes a hierarchical structure on PDB) to look at the folds (hierarchy) for 2ACY.

Step 1A: Accessing PDB

Open a browser and go to the following web address:

http://www.rcsb.org/pdb/

In the search box, type the PDB ID of acylphosphatase and press Enter:

2ACY
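For readers who prefer to script this lookup, here is a minimal sketch that turns a PDB ID into web addresses. The two URL patterns are assumptions based on the current RCSB site layout, not something the slides specify, and `normalize_pdb_id` and `pdb_urls` are hypothetical helper names.

```python
import re

# Assumed URL patterns for the RCSB PDB site; the slides only give the search page.
ENTRY_PAGE = "https://www.rcsb.org/structure/{pdb_id}"
COORD_FILE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def normalize_pdb_id(raw: str) -> str:
    """Return an upper-case 4-character PDB ID or raise ValueError."""
    pdb_id = raw.strip().upper()
    # Classic PDB IDs are a digit followed by three letters or digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {raw!r}")
    return pdb_id


def pdb_urls(raw: str) -> dict:
    """Build the entry-page and coordinate-file addresses for one PDB ID."""
    pdb_id = normalize_pdb_id(raw)
    return {
        "entry": ENTRY_PAGE.format(pdb_id=pdb_id),
        "coordinates": COORD_FILE.format(pdb_id=pdb_id),
    }


if __name__ == "__main__":
    for kind, url in pdb_urls("2acy").items():
        print(f"{kind}: {url}")
```

The sketch only builds the addresses; it does not download anything, so it runs without a network connection.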

Step 1B: Accessing PDB

On the right side of the screen, under Biological Assembly, next to 3D View, click JSmol.

On the next page you may get warnings regarding Java. If so, follow these directions:

1. If Java™ needs your permission to run, click Run This Time.

2. If a Security Warning pops up, select the checkbox and click Run.

3. If a Block Window pops up, select Don’t Block.

Step 2: Visualization of 2ACY

The visualization window should look something like this:

Holding Left Click down and moving the mouse should enable you to rotate the protein in 3D space!

Look at the protein. Can you detect what its secondary structure is from this 3D diagram?

Write down your prediction in Notepad and we will test it next.
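As a rough cross-check on a visual prediction, the PDB text format records annotated helices and strands in HELIX and SHEET lines (one SHEET line per strand). The sketch below counts those records. The sample text is made up for illustration and is NOT the real 2ACY annotation; `count_secondary_structure_records` is a hypothetical helper name.

```python
from collections import Counter


def count_secondary_structure_records(pdb_text: str) -> Counter:
    """Count HELIX and SHEET records in PDB-format text.

    Only the record name in columns 1-6 is inspected, so this is a tally
    of annotated elements, not a full parser.
    """
    counts = Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("HELIX", "SHEET"):
            counts[record] += 1
    return counts


# Made-up, simplified records for illustration only (not real 2ACY data).
EXAMPLE_TEXT = """\
HEADER    illustrative example
HELIX     1 first helix placeholder
HELIX     2 second helix placeholder
SHEET     1 first strand placeholder
SHEET     2 second strand placeholder
SHEET     3 third strand placeholder
ATOM      1 placeholder atom line
"""

if __name__ == "__main__":
    counts = count_secondary_structure_records(EXAMPLE_TEXT)
    print(f"helices: {counts['HELIX']}, strands: {counts['SHEET']}")
```

To apply it to a real entry, read the downloaded coordinate file into a string and pass that string to the function.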

Step 3A: Verification of 2ACY Secondary Structure

Though we could do this in PDB, we will consult a secondary resource to verify our prediction.

Go to the following web address:

http://www.ebi.ac.uk/pdbsum/

In the PDB code search box, type 2ACY and click Find.

Step 3B: Verification of 2ACY Secondary Structure

Under the Protein Chain header click the

The Protein Chain page should look like the following:

Step 3C: Verification of 2ACY Secondary Structure

How does your prediction compare with this domain?

Click on the domain icon on the right side of the screen for a nice diagram of the domain.

[Figure: diagram of the domain, labeled with the N terminus, C terminus, sheet, and helix]

Step 4A: Analysis of Folds for 2ACY

Now, we will look at 2ACY’s fold in the CATH hierarchy.

CATH (Class, Architecture, Topology, and Homologous Superfamily) is a hierarchical classification of protein domains according to these four attributes.

To view the CATH hierarchy from our 2ACY Domain page, click on the CATH button.

Step 4B: Analysis of Folds for 2ACY

The resulting page for Domain 2ACYA00 (our one domain in 2ACY) should include the following CATH Classification.

The decomposition of 2ACY according to CATH follows this path:

Class: Alpha Beta

Architecture: 2-Layer Sandwich

Topology: Alpha-Beta Plaits
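CATH identifies each node in its hierarchy with a dotted numeric code, one number per level. The sketch below expands such a code into its levels. The numeric codes 3, 3.30, and 3.30.70 are an assumption from memory of the CATH numbering and do not appear in the slides, so verify them against the CATH page; `cath_path` is a hypothetical helper name.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous Superfamily")

# Names come from the slide; the numeric codes are an assumption to verify in CATH.
NODE_NAMES = {
    "3": "Alpha Beta",
    "3.30": "2-Layer Sandwich",
    "3.30.70": "Alpha-Beta Plaits",
}


def cath_path(code: str) -> list:
    """Expand a dotted CATH code into (level name, cumulative code) pairs."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(CATH_LEVELS):
        raise ValueError(f"expected 1 to 4 dotted fields, got {code!r}")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"CATH fields must be numeric: {code!r}")
    return [
        (CATH_LEVELS[i], ".".join(parts[: i + 1]))
        for i in range(len(parts))
    ]


if __name__ == "__main__":
    for level, node in cath_path("3.30.70"):
        print(f"{level:<13}{node:<9}{NODE_NAMES.get(node, '(name not listed)')}")
```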

Using blastp to Find Sequence Matches to GATA-1

In this exercise, we will use a different BLAST tool, blastp, to find protein sequences that match GATA-1 (the erythroid transcription factor from an earlier lab).

Using SUPERFAMILY HMM, we will then analyze which protein domains these homologous sequences contain.

Step 5A: BLASTing GATA-1

Go to the following web address to access BLAST:

http://blast.ncbi.nlm.nih.gov/Blast.cgi

The program we want to run is protein blast.

Step 5B: BLASTing GATA-1

The protein FASTA sequence is available in our data directory:

[course_directory]/08_Protein_Structure/data/gata1.fasta

Click the Choose File button and upload our gata1.fasta file.

Under Database, choose Protein Data Bank (pdb).

Ensure that for Algorithm, blastp is selected.

Click BLAST.

Step 5C: BLASTing GATA-1

The screenshot below details the correct configuration.

Step 5D: BLASTing GATA-1

The distribution of hits should look similar to below:

Step 5E: BLASTing GATA-1

In this step, we will download all of the significant alignments in this plot.

Scroll down the window to the Sequences producing significant alignments box:

Click Select All.

Click Download.

Select FASTA (complete sequence).

Click Continue.
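Before passing the downloaded sequences to the next tool, it can help to confirm how many records the FASTA file holds. The sketch below is a minimal FASTA reader in plain Python. The two records in `EXAMPLE_FASTA` are invented placeholders, not real BLAST hits, and `read_fasta` is a hypothetical helper name.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary.

    Sequence lines that wrap over several lines are joined together.
    """
    chunks = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[header].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Invented placeholder records for illustration only.
EXAMPLE_FASTA = """\
>example_hit_1 placeholder description
MEFPGLGSLG
TSEPLPQ
>example_hit_2 placeholder description
CVNCGATATP
"""

if __name__ == "__main__":
    records = read_fasta(EXAMPLE_FASTA)
    print(f"{len(records)} sequences")
    for name, seq in records.items():
        print(f"{name}: {len(seq)} residues")
```

For the real file, read its contents with `open(path).read()` and pass the resulting string to `read_fasta`.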

Step 6A: Running Superfamily HMM

The file from the previous step is available in our data directory as:

[course_directory]/08_Protein_Structure/data/gata1_homologs.fasta

To run SUPERFAMILY HMM go to the following web address:

http://supfam.cs.bris.ac.uk/SUPERFAMILY/hmm.html

Step 6B: Running Superfamily HMM

On the next screen, next to Multiple Sequence FASTA File, click Choose File.

Select the homolog file we just downloaded, or the copy in the data directory: gata1_homologs.fasta

Ensure that Amino Acid sequence is selected from the dropdown menu at the top.

Ensure Notification is Browser.

Click Submit.

Step 6C: Running Superfamily HMM

Ignore the short sequence warnings and click the View the domain assignment results link at the bottom of the page.

The results are shown in pictorial and tabular form (scroll down on the page) and are sorted by the e-value of each sequence's assignment to a given superfamily.

The picture to the right shows a diverse set of domains showing up.

Step 6D: Running Superfamily HMM

Many homologs, though, show the same domain family as 1GAT.

In the tabular view, you can see the e-values for superfamily assignments and family assignments for each one of these homologs.

In general, the e-value of a superfamily assignment must not exceed 0.00001 to be considered significant, while the e-value of a family assignment cannot exceed 0.001.

Sequences that violate these constraints have their e-values grayed out in the tabular view.
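The two cutoffs above can be expressed as a small filter. This is a sketch of the rule as stated in this step, not SUPERFAMILY's own code; the rows are hypothetical values chosen for illustration, and `assess` is a hypothetical helper name.

```python
# Cutoffs as stated in this step.
SUPERFAMILY_MAX_EVALUE = 1e-5  # 0.00001
FAMILY_MAX_EVALUE = 1e-3       # 0.001


def assess(superfamily_evalue: float, family_evalue: float) -> dict:
    """Report whether each assignment passes its e-value cutoff."""
    return {
        "superfamily_significant": superfamily_evalue <= SUPERFAMILY_MAX_EVALUE,
        "family_significant": family_evalue <= FAMILY_MAX_EVALUE,
    }


# Hypothetical rows: (sequence label, superfamily e-value, family e-value).
EXAMPLE_ROWS = [
    ("homolog_1", 2.3e-12, 4.0e-4),
    ("homolog_2", 3.0e-4, 2.0e-2),
]

if __name__ == "__main__":
    for label, sf_evalue, fam_evalue in EXAMPLE_ROWS:
        print(label, assess(sf_evalue, fam_evalue))
```

A row that fails either check corresponds to an e-value that would appear grayed in the tabular view.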

Finding Structural Neighbors and Rebuilding Phylogenetic Trees

In this section, we will search a database for proteins with a structure similar to that of a protein of interest: 3GE4, a DNA starvation protein. In particular, we will look at Chain A.

Then, using Mr. Bayes, we will reconstruct a phylogenetic tree from the alignment data we get from DALI, our structural alignment program.

Step 7A: DALI

There is a nice web interface for using DALI at the following link:

http://ekhidna.biocenter.helsinki.fi/dali_server/start

To run our query against the database we need to just specify two things.

In PDB identifier type 3GE4

In Chain type A.

NOTE: DO NOT CLICK SUBMIT. WE HAVE PRECOMPUTED THE RESULTS.

Step 7B: DALI

The DALI results for this protein chain have already been computed and are available in the HTML file in our data directory.

[course_directory]/08_Protein_Structure/data/Dali_mol1A.html

In the browser, it should look similar to the following: a list of structures ranked by decreasing similarity to the query (3GE4).

Step 7C: DALI

Select the following hits: (Ctrl-F to search for something in the web page)

3ge4-A

1tjo-C

1ji4-L

1bcf-H

3uoi-J

1eum-A

Click on Structural Alignment
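Each hit label in this step pairs a four-character PDB ID with a chain identifier, such as 3ge4-A. A small sketch that splits the selected labels into those two parts follows; `split_hit` is a hypothetical helper name, and the label format is inferred from the hits listed in this step.

```python
# The six hits selected in this step.
SELECTED_HITS = ["3ge4-A", "1tjo-C", "1ji4-L", "1bcf-H", "3uoi-J", "1eum-A"]


def split_hit(label: str) -> tuple:
    """Split a 'pdbid-chain' label into (PDB ID in upper case, chain)."""
    pdb_id, separator, chain = label.strip().partition("-")
    if not separator or len(pdb_id) != 4 or not chain:
        raise ValueError(f"expected a label like '3ge4-A', got {label!r}")
    return pdb_id.upper(), chain


if __name__ == "__main__":
    for hit in SELECTED_HITS:
        pdb_id, chain = split_hit(hit)
        print(f"PDB entry {pdb_id}, chain {chain}")
```

The upper-case IDs can be pasted directly into the PDB search box from Step 1A to inspect each neighbor.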

Step 7D: DALI

The structural alignment is shown below, where the TOP figure shows the alignment of the residues while the BOTTOM figure shows the secondary structure identifier for each residue (L = coil, H = helix, E = strand).
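The one-letter codes can be collapsed into runs to read off the order of secondary structure elements. The sketch below does this with `itertools.groupby`; the input string is an invented example, not a row from the DALI output, and `segments` is a hypothetical helper name.

```python
from itertools import groupby

# Code meanings as given in this step.
LABELS = {"L": "coil", "H": "helix", "E": "strand"}


def segments(ss_string: str) -> list:
    """Collapse a per-residue code string into (element, length) runs."""
    unknown = set(ss_string) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    return [(LABELS[code], len(list(run))) for code, run in groupby(ss_string)]


if __name__ == "__main__":
    # Invented example string for illustration only.
    for element, length in segments("LLHHHHHLLEEEELL"):
        print(f"{element}: {length} residues")
```

Gap characters in a real alignment row would need to be stripped or handled separately before calling the function.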

Step 8A: Reconstructing Phylogenies Using Mr. Bayes

A NEXUS file for the hits we selected in the previous step is provided in the data directory:

[course_directory]/08_Protein_Structure/data/alignment.nex

We will run a program called Mr. Bayes that will reconstruct the phylogenies from these structural alignments.

Its icon is located on the desktop.

Step 8B: Reconstructing Phylogenies Using Mr. Bayes

Unfortunately, Mr. Bayes does not handle paths well.

In order to use our alignment.nex file, we have to copy it into the directory where Mr. Bayes is installed.

To navigate to this directory, Right Click on the Mr. Bayes icon on the Desktop.

Click Find Target…

Step 8C: Reconstructing Phylogenies Using Mr. Bayes

Open up our data directory in a window side by side with our Mr. Bayes directory.

Drag our alignment.nex file to the Mr. Bayes directory.

Step 8D: Reconstructing Phylogenies Using Mr. Bayes

Run the following commands in Mr. Bayes to reconstruct the phylogeny. The leading $ stands for the Mr. Bayes prompt, and the text after each # is an explanatory note; type neither.

$ execute alignment.nex

$ showmodel

$ set autoclose=yes; # close chains and go to next statement

$ mcmcp ngen=10000 printfreq=100 samplefreq=100 nchain=4 savebrlens=yes filename=alignment; # define parameters of the run

$ mcmc; # run Markov chain Monte Carlo

$ sump # summarize your mcmc results

$ sumt # output trees
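The commands in this step can also be collected into a "mrbayes" block, a section of a NEXUS file that the program runs when the file is executed. The sketch below only generates the text of such a block with the same settings as above. It spells the chain-count option as nchains, the full option name as I understand it, while the slide abbreviates it to nchain; treat that detail and the helper name `mrbayes_block` as assumptions.

```python
def mrbayes_block(ngen: int = 10000, printfreq: int = 100,
                  samplefreq: int = 100, nchains: int = 4,
                  filename: str = "alignment") -> str:
    """Return the text of a NEXUS 'mrbayes' block mirroring this step's commands."""
    if ngen <= 0 or samplefreq <= 0 or printfreq <= 0 or nchains <= 0:
        raise ValueError("all numeric settings must be positive")
    lines = [
        "begin mrbayes;",
        "  set autoclose=yes;",
        # 'nchains' is assumed to be the full option name (the slide writes 'nchain').
        f"  mcmcp ngen={ngen} printfreq={printfreq} samplefreq={samplefreq} "
        f"nchains={nchains} savebrlens=yes filename={filename};",
        "  mcmc;",
        "  sump;",
        "  sumt;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mrbayes_block(), end="")
```

Appending the printed block to a copy of alignment.nex would let a single execute command run the whole analysis; keeping it in a copy leaves the provided file unchanged.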

Step 9: Analyzing the Phylogenies

The phylogeny is shown in the output of Mr. Bayes.

A screenshot is shown below.