
Investigating mRNA’s of intrinsically disordered proteins

Harini Gopalakrishnan

Advisor: Dr. Predrag Radivojac


Basic Facts – mRNA

1. mRNA: messenger ribonucleic acid
2. A nucleic acid polymer built from nucleotide monomers carrying the bases adenine, guanine, cytosine, and uracil
3. Three important types of RNA:
• rRNA (ribosomal RNA)
• tRNA (transfer RNA)
• mRNA (messenger RNA)


Basic Facts – mRNA (contd.)

Encodes genetic information and carries it from DNA to the site of protein synthesis

Image: http://en.wikipedia.org/wiki/Image:Mature_mRNA.png


Basic Facts – mRNA (contd.)

What is the significance of mRNA folding?

Secondary structures have been used to explain:
• Translational control
• Regulatory functions in the cell, especially of non-coding mRNA

What are the different folding algorithms?

• Energy minimization
• Base pair maximization
• Covariation

E.g., Mfold, Vienna RNA Package
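To make "base pair maximization" concrete, here is a minimal sketch of a Nussinov-style dynamic program. It is not the method used by Mfold or the Vienna RNA Package (both minimize free energy); the function name `max_base_pairs`, the allowed pair set (Watson-Crick plus G-U wobble), and the `min_loop` parameter are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    Illustrative Nussinov-style recursion; min_loop is the assumed minimum
    number of unpaired bases enclosed by any pair (hairpin loop size).
    """
    # Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving at least min_loop bases between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

For a short hairpin such as `GGGAAACCC`, the three G-C pairs around the AAA loop are found. Energy minimization replaces the "+1 per pair" score with loop-based free energies.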


Basic Facts – Disordered Proteins

What is a disordered protein?

• Lacks a well-defined three-dimensional structure
• Conserved between species in composition and sequence
• Shows low sequence complexity
• Has an amino acid compositional bias away from bulky hydrophobic residues

What is the significance of disordered proteins?

They are involved in the regulation of transcription and translation, cellular signal transduction, protein phosphorylation, the storage of small molecules, and the regulation of the self-assembly of large multiprotein complexes such as the bacterial flagellum and the ribosome.


Basic Facts – Disordered Proteins

What is their role in disease?

Famous (or infamous?) disordered proteins in diseases:
• alpha-synuclein
• p53
• proteins in HPVs linked to ovarian cancer

Which predictors are used? (All take amino acid sequences as input.)

VL2, VSL2, PONDR, VLXT

Image courtesy: http://www.disprot.org


Snapshot from Previous Studies

• Third codon position and stability
• Speed of translation and protein secondary structures (alpha helices and beta sheets)
• The three bases in the codon:
  1st base: biosynthetic pathway
  2nd base: residue hydrophobicity
  3rd base: helix- or beta-strand-forming potential of the amino acid


In a Nutshell

• Check whether nucleotide composition differs between the mRNAs of ordered and disordered proteins.
• Check whether the stability of the RNA fold helps differentiate the two categories of proteins.
• This work is different because no earlier study has linked protein disorder with mRNA composition and stability.
• Establishing such a correlation would open new avenues for studying how protein structure can be inferred directly from its precursor, the mRNA.


Hypothesis

• Codon usage should differ between the mRNA sequences of ordered and disordered proteins.
• Folding energy (stability) should differ between the mRNAs of ordered and disordered proteins.
• The age of codons correlates with protein disorder.

[Figure: central dogma]


Method

• Data Collection
• Implementation
• Analysis
• Future Work


Data Collection

This is one of the most important phases, because the significance of the whole analysis rests on the quality of the data sets selected for the two categories of proteins.

After experimentation with various other databases, proteins were finally taken from unigene90, DisProt, and PDB.

Disorder was predicted using VSL2B.

Dataset:
• True dataset (experimentally verified)
• Predicted dataset (from disorder predictors)


Data Collection

Once we have the proteins of interest, we use UniProt to web-mine each protein and its corresponding mRNA based on its UniGene ID.

Problem! The retrieved sequences contain:
• Introns
• Poly-A tails

Both need to be removed: we need a clean data set in order to study codon usage and nucleotide composition.
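As a rough illustration of the poly-A cleanup step, the sketch below trims a trailing run of A's from a sequence. The function name `trim_poly_a` and the `min_tail` threshold of 10 are assumptions, not values from the slides, and intron removal is not attempted here (that requires a protein-to-nucleotide alignment).

```python
def trim_poly_a(seq, min_tail=10):
    """Remove a trailing poly-A run if it is at least min_tail bases long.

    min_tail is a hypothetical threshold: shorter runs of A are kept,
    since they may be genuine sequence rather than a poly-A tail.
    Note that the trim also removes any genuine A's directly before the tail.
    """
    seq = seq.upper()
    stripped = seq.rstrip("A")
    tail_len = len(seq) - len(stripped)
    return stripped if tail_len >= min_tail else seq
```

For example, `trim_poly_a("ATGCCC" + "A" * 12)` drops the 12-base tail, while a sequence ending in only three A's is returned unchanged.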


Solution: Alignment

BLAST
• Proved efficient when aligning the ordered proteins
• Extremely inefficient when aligning protein vs. mRNA for the disordered set of proteins
• Disordered proteins have more low-complexity regions

WISE
• Software from EMBL for aligning protein vs. nucleotide data
• Uses Markov chain methods to make gene predictions and hence identifies introns
• Extremely efficient and provided high-quality datasets


Data Collection – Final Input Statistics

[Chart with categories Predicted Order, Predicted Disorder, True Order, True Disorder; values shown: 343, 151, 9681]


Method – Overview

Two main characteristics of mRNA were analyzed:

Nucleotide composition of mRNA
• Codon usage
• Nucleotide composition

RNA folding energy and base pair analysis using Mfold
• Number of base pairs formed
• Total minimum free energy per RNA fold between
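The two composition measures listed above can be computed with a few lines of Python. This is a minimal sketch, not the original Perl/MATLAB pipeline; the function names are hypothetical, and the input is assumed to be an intron-free coding sequence that starts in frame.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the fraction of A, C, G, and T in seq (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}


def codon_usage(cds):
    """Count codons in an in-frame coding sequence; trailing partial codons are ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

Per-sequence fractions from `nucleotide_composition` can then be averaged within the ordered and disordered sets and compared.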


Methods

Mfold Snapshot


Mfold – Overview

What is Mfold?

An RNA secondary structure prediction algorithm by M. Zuker and N. Markham.

How does it work?

It is based on nearest-neighbor thermodynamic rules, in which free energies are assigned to loops rather than to base pairs. It predicts the optimal structure by minimizing the overall free energy of the fold, taking the coaxial stacking of helices into account.

What does it output?

Several output files are produced for every optimal and suboptimal fold within the allowable energy range. The energy dot plot (on the right) is one important component of this output.


Tools Employed

• Web parsing and mining were done in Perl
• Analysis and graphs were done in MATLAB
• Reporting and graphs were done in Excel
• Disorder prediction from mRNA inputs was done in MATLAB using an SVM


Results


Nucleotide Composition

Predicted Dataset (DP = predicted disorder, OP = predicted order)

Nucleotide   DP      OP      P-value (DP, OP)
A            0.267   0.239   0.0067
C            0.275   0.256   0
G            0.291   0.250   0
T            0.166   0.255   0

True Dataset (DT = true disorder, OT = true order)

Nucleotide   DT      OT      P-value (DT, OT)
A            0.275   0.267   1.06E-02
C            0.270   0.247   5.04E-17
G            0.271   0.259   4.55E-05
T            0.183   0.226   5.21E-57



Analysis of Codon Age

Analysis based on the Composition of mRNA


[Figure: old and new codons for each amino acid]

For 14 of the 18 amino acids, the disorder-promoting codon is the older one.

Two amino acids (M and W) are neutral, as each has only one codon.


Base Composition (Predicted Dataset)

Base   Third Base                  Second Base                 First Base
       OP   DP   Total   %         OP   DP   Total   %         OP   DP   Total   %
G      4    9    13      26.07     9    5    14      -4.88     10   6    16      -3.57
C      4    12   16      38.57     10   6    16      -3.57     8    8    16      10.48
T      14   2    16      -31.67    9    6    15      -0.71     7    5    12      0.83
A      13   1    14      -32.98    7    7    14      9.17      10   5    15      -7.74


Base Composition – Statistical Verification

Preferential selection of codons with “g” or “c” at the third base

Third Base

Base   Order   Disorder   T-test     R Test
g      33949   11475      4.49E-48   0.3263
c      26576   8594       1.17E-15   0.4424
t      23488   6404       3.03E-11   0.0308
a      31324   7721       2.40E-64   0.3548
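One way to run this kind of third-base comparison is sketched below: compute the third-position base fractions per coding sequence, then compare the ordered and disordered groups with a t statistic. The slides do not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, as are the function names. Only the statistic is computed; converting it to a p-value needs a t-distribution routine from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def third_base_fractions(cds):
    """Fraction of each base at the third codon position of an in-frame sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    n = len(thirds)
    return {base: (thirds.count(base) / n if n else 0.0) for base in "ACGT"}


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed test variant).

    Each sample needs at least two values, and the samples must not
    both have zero variance.
    """
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))
```

In use, one would collect, say, the G fraction from `third_base_fractions` for every sequence in each set and pass the two lists to `welch_t`.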


Energy of Folding and Base Pair

Energy of Folding


Predicted Dataset

Measure                         OP        DP         P-value
Average Minimum Energy (kcal)   -2230     -2487.27   7.08E-03
Average Energy (kcal)           -2170     -2428.29   6.93E-03
Average Length                  677.57    679.35     0.87


Base Pair Analysis


Summary – Nucleotide Analysis

Measure                OP        DP      P-value (OP vs. DP)
Average Length         1005.77   732.9   --
Average Bases          0.062     0.050   0.0063
Bonding ability of A   0.118     0.118   0.2367
Bonding ability of C   0.133     0.08    3.33e-06
Bonding ability of G   0.151     0.10    7.72e-08
Bonding ability of T   0.146     0.14    0.81
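The slides do not define "bonding ability". One plausible reading, used in the sketch below, is the fraction of each nucleotide that is base-paired in the predicted fold. The function name is hypothetical, and the structure is assumed to be given in dot-bracket notation (Mfold's own output files would first need converting).

```python
def paired_fraction_by_base(seq, structure):
    """Fraction of each base type that is paired in a dot-bracket structure.

    Assumes '(' and ')' mark paired positions and any other character
    marks an unpaired one; U is treated as T.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("U", "T")
    totals = {base: 0 for base in "ACGT"}
    paired = {base: 0 for base in "ACGT"}
    for base, mark in zip(seq, structure):
        if base not in totals:
            continue  # skip ambiguous symbols such as N
        totals[base] += 1
        if mark in "()":
            paired[base] += 1
    return {b: (paired[b] / totals[b] if totals[b] else 0.0) for b in "ACGT"}
```

For the hairpin `GGGAAACCC` with structure `(((...)))`, every G and C is paired and every A is unpaired.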


Sequence Entropy Plot


Future Work


Predictions

Using support vector machines (SVMs), based on:
• Codon composition
• Age of codons
• Base composition

Accuracies so far have been good and promising.

Aim: to predict disorder from mRNA using all of the above information


Acknowledgments

Dr. Predrag Radivojac

Dr. Haixu Tang

Dr. Vladimir Uversky

Amrita Mohan

Linda Hostetter

Informatics faculty and staff

My various Course Professors

Friends and Fellow Students

