Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department of Health Information Management

Genomics and Personalized Care in

Health Systems

Lecture 9 RNA and Protein Structure

Leming Zhou, PhD

School of Health and Rehabilitation Sciences

Department of Health Information Management

Outline

• RNA structure

• Protein structure

• Pharmacogenomics

Two Types of Genes

• Protein coding genes

– Common patterns: promoter region, start codon, codons, stop codon

– Translated to protein sequence

• RNA genes

– No consistent patterns common to all RNA genes

– Not translated to proteins

– Functional as RNA molecules
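
The start codon / codons / stop codon pattern of protein coding genes can be illustrated with a small open reading frame scan. This is only a sketch: it looks at the forward strand, assumes ATG as the start codon and TAA, TAG, TGA as stop codons, and ignores promoters, introns and the reverse strand. The function name find_orfs and the min_codons parameter are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand open reading frames.

    An ORF here starts at ATG and extends codon by codon to the first
    in-frame stop codon; `end` is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk downstream in steps of three bases (one codon at a time).
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                # Count the codons before the stop, including the start codon.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


# ATG AAA TTT TAG embedded in flanking bases
print(find_orfs("CCATGAAATTTTAGGG"))
```

RNA genes have no such shared pattern, which is why a scan like this cannot be reused for them.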

Types of RNA

• mRNA: messenger RNA

• tRNA: transfer RNA, which carries an anticodon and delivers the matching amino acid

• rRNA: ribosomal RNA for protein translation

• miRNA: microRNAs are small (about 22 nucleotides) non-coding RNA gene products that seem to regulate translation

• snRNAs: small nuclear RNAs

– Spliceosomal RNAs found in the spliceosome, which is involved in splicing

– Small nucleolar RNA located in the nucleolus

RNA Genes

• RNA has various functions

• Software has been developed to search for RNA genes in the genome.

– tRNAscan searches for tRNA genes

RNA Databases

• Ribosomal RNA database

– Ribosomal Database Project: http://rdp.cme.msu.edu/

• tRNA databases

– Genomic tRNA Database: http://gtrnadb.ucsc.edu/

• snoRNA databases

– Yeast snoRNA Database: http://people.biochem.umass.edu/fournierlab/snornadb/main.php

Secondary and Tertiary Structure

• RNA sequence → RNA structure

– folding and pairing of bases within the sequence

• Canonical pairing: G-C and A-U

– G-C pairing gives more energetic stability (3 hydrogen bonds)

• Non-canonical pairing: G-U (very common), A-C, A-G, etc.

• Double stranded regions and loop regions are the secondary structure elements

• Tertiary structure is the interaction between secondary structure elements
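
The pairing rules above can be written down as a small lookup. This sketch only separates the canonical pairs, the common G-U pair, and everything else; the function name classify_pair and the three labels are illustrative choices, not standard terminology.

```python
CANONICAL_PAIRS = {frozenset("GC"), frozenset("AU")}
GU_PAIR = frozenset("GU")


def classify_pair(base_a, base_b):
    """Label a pair of RNA bases as 'canonical', 'G-U', or 'other'."""
    pair = frozenset((base_a.upper(), base_b.upper()))
    if pair in CANONICAL_PAIRS:
        return "canonical"
    if pair == GU_PAIR:
        return "G-U"
    # Covers rarer non-canonical pairs (A-C, A-G, ...) and same-base pairs.
    return "other"


for a, b in [("G", "C"), ("U", "A"), ("G", "U"), ("A", "C")]:
    print(a, b, classify_pair(a, b))
```

The order of the two bases does not matter, because each pair is stored as an unordered set.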

RNA Secondary Structure

• For RNAs, secondary structures are conserved, but primary sequences are not necessarily conserved

http://rnajournal.cshlp.org/content/10/10/1541/F1.expansion

RNA Structure Prediction Methods

• Sequence and base pairing patterns

• Energy minimization

– Find the energetically most stable structure

– Energy calculations based on base pairings

– Candidate structures can be sampled using the Monte Carlo method

– Zuker and Stiegler (1981) used dynamic programming and energy rules to get the energetically most favorable structure.

– Mfold is software developed by Zuker and co-workers. It is very computationally expensive and can be used on a maximum of about 1000 nucleotides.
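
To give a feel for the dynamic programming idea, here is a deliberately simplified sketch. It is not the Zuker and Stiegler energy model and not what mfold computes: it uses the simpler Nussinov-style recurrence, which just maximizes the number of nested base pairs (G-C, A-U, G-U) and requires at least min_loop unpaired bases inside every hairpin. The function names and the min_loop default are assumptions made for this example.

```python
def can_pair(a, b):
    """True for G-C, A-U and G-U pairs (in either order)."""
    return {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"})


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recurrence)."""
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = most pairs that can form within rna[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving >= min_loop bases between.
            for k in range(i, j - min_loop):
                if can_pair(rna[k], rna[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

The table has one cell per pair of positions and each cell scans the positions between them, so the work grows roughly with the cube of the sequence length. An energy-based method follows the same table-filling idea but scores stacks and loops with measured energy rules instead of counting pairs.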

Exercises

Use mfold to predict the secondary structure of an RNA sequence

GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCTCACACGCGAAAGG

TCCCCGGTTCGAAACCGGGCGGAAACA

http://mfold.rna.albany.edu/?q=mfold
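
The exercise sequence is split over two lines and written with T rather than U. A small helper like the following could join the pieces and write them as a FASTA record with U in place of T before pasting into a folding tool. Whether a given tool needs this conversion is an assumption to check against its input instructions; the function name to_rna_fasta and the record name are made up for this example.

```python
def to_rna_fasta(name, fragments, width=60):
    """Join sequence fragments, convert T to U, and format as a FASTA record."""
    sequence = "".join(fragments).upper().replace("T", "U")
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


EXERCISE_FRAGMENTS = [
    "GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCTCACACGCGAAAGG",
    "TCCCCGGTTCGAAACCGGGCGGAAACA",
]

print(to_rna_fasta("exercise_sequence", EXERCISE_FRAGMENTS), end="")
```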

Protein Structure

Four Levels of Protein Structure

• Primary Structure – Sequence of amino acids

• Secondary Structure – Local structure such as alpha-helices and beta-sheets

• Tertiary Structure – Arrangement of the secondary structural elements to give 3D structure of a protein

• Quaternary Structure – Arrangement of the subunits to give a protein complex its 3D structure

Protein Basic Structure

• A protein is made of a chain of amino acids

• An amino acid sequence is generally reported from the N-terminal end to the C-terminal end

J. Biol. Chem. 1973, 248, p. 7670

Secondary Structure (Helices)

Helix Examples

Secondary Structure (Beta-sheets)

Beta Sheet Examples

Parallel beta sheet Anti-parallel beta sheet

Beta Sheet Examples (Cont’d)

Protein Structure Example

Figure labels: Beta Sheet, Helix, Loop

ID: 12as, 2 chains

Protein Classification

Domain and Motif

• Domain: a discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.

– Most proteins have multiple domains

• Motif:

– Frequently occurring structure patterns among multiple proteins

Protein Classification

• Family: the proteins in the same family are homologous, evolved from the same ancestor. Usually, the sequence identity between two members is very high.

• Superfamily: distantly homologous sequences, evolved from the same ancestor. Sequence identity is around 25%-30%.

• Fold: only shapes are similar, no homologous relationship. Usually, sequence identity is very low.

• Protein classification databases: SCOP, CATH
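
Sequence identity, used in the definitions above, is computed from an alignment of two sequences. A minimal sketch follows, assuming the two sequences are already aligned with "-" for gaps and that identity is reported over the columns where neither sequence has a gap. Other denominators are also in use, so this convention and the function name percent_identity are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACDE-F", "ACEEGF"))
```

Identity values are only a rough guide to the family and superfamily levels; the classification databases also rely on structural and functional evidence.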

SCOP

• The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

• Proteins are classified to reflect both structural and evolutionary relatedness.

– Many levels exist in the hierarchy

– The principal levels are family, superfamily and fold

CATH

• CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels:

– Class

– Architecture

– Topology

– Homologous superfamily
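
CATH identifiers are written as four dot-separated numbers, one per level, in the order Class.Architecture.Topology.Homologous superfamily. The sketch below splits such a code into named fields. The type name CathNode, the function name parse_cath_code and the example code are illustrative, and the class names in the table are the four main classes as commonly listed, so treat them as an assumption to check against the CATH site.

```python
from typing import NamedTuple

# Main CATH classes as commonly listed (assumption: verify on the CATH site).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathNode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code):
    """Split a code such as '3.40.50.300' into its four hierarchy levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers")
    return CathNode(*(int(p) for p in parts))


node = parse_cath_code("3.40.50.300")
print(node, CATH_CLASSES.get(node.cath_class, "unknown class"))
```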

CATH-Protein Structure Classification

Class

Architecture

Topology

Protein Structure Determination

Experimental Methods for Protein Structure Determination

• X-ray crystallography

– Crystallize proteins

– Measure X-ray diffraction pattern

• NMR spectroscopy

– Use nuclear magnetic resonance to estimate distances between different functional groups in a protein in solution.

– Calculate possible structure using these distances.

• Neutron diffraction

• Electron microscopy

• Atomic force microscopy

Limitations of Experimental Methods

• X-ray Diffraction

– Only a small number of proteins can be made to form crystals

– A crystal is not the protein’s native environment

– Very time consuming

• NMR Distance Measurement

– Not all proteins are found in solution

– This method generally looks at isolated proteins rather than protein complexes

– Very time consuming

Computational Structure Prediction

• The function of a protein is determined by its structure.

• Experimental methods to determine protein structure are time-consuming and expensive.

• Big gap between the available protein sequences and structures.

Observations

• Sequences determine structures

• Proteins fold into minimum energy state.

• Structures are more conserved than sequences. If two protein sequences share 30% identical residues, then they have a very good chance to have the same fold.

Prediction Methods

• Ab initio folding: build a structure without referring to an existing structure

• Homology Modeling: sequence-based method

• Protein Threading: sequence-structure alignment

• Consensus Method: vote a prediction from some candidates generated by several prediction programs
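
The voting idea behind the consensus method can be shown on a simplified case. The sketch below takes several predicted secondary-structure strings of equal length and keeps, at each position, the state predicted most often. Real consensus servers combine 3D model candidates and use more elaborate scoring, so this per-position majority vote, the function name consensus_prediction, and the tie-breaking rule (earliest predictor wins) are assumptions made for illustration.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Per-position majority vote over equal-length prediction strings."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # On a tie, keep the state from the earliest predictor in the list.
        winner = next(state for state in column if counts[state] == top)
        result.append(winner)
    return "".join(result)


print(consensus_prediction(["HHEL", "HEEL", "LHEL"]))
```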

Ab Initio Folding

• Based on first principles

• Build structures purely from protein sequences, no templates used

• Unaffordable computing demands

• Paradigm is changing, knowledge-based methods are proposed

Secondary Structure Prediction

• Three-state model: helix (H), strand (E), coil (L)

• Given a protein sequence:

– NWVLSTAADMQGVVTDGMASGLDKD…

• Predict a secondary structure sequence:

– LLEEEELLLLHHHHHHHHHHLHHHL…

– Accuracy: 50-85%
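
Accuracy figures for three-state prediction are usually reported as the percentage of residues whose predicted state (H, E or L) matches the observed one, often called Q3. A minimal sketch, assuming both strings cover the same residues in the same order; the function name q3_accuracy and the example strings are made up.

```python
def q3_accuracy(predicted, observed):
    """Percent of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical prediction compared with a hypothetical observed assignment.
print(round(q3_accuracy("LLEEHH", "LLEEHL"), 1))
```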

Predict Protein Secondary Structure Using PredictProtein

• Protein sequence:

>gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana]

MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNLDSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGWSSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLYSPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDAQAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM

• PredictProtein web server:

– http://www.predictprotein.org

Read the Results

Evolutionary Methods

• Taking into account related sequences helps in identification of “structurally important” residues.

• Algorithm:

– Find similar sequences

– Construct multiple alignment

– Use alignment profile for secondary structure prediction

• Additional information used for prediction

– Mutation statistics

– Residue position in sequence

– Sequence length
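
The "alignment profile" step can be pictured as a table of residue frequencies, one entry per alignment column. The sketch below builds such a table from an already constructed multiple alignment, ignoring gap characters. Real predictors use weighted, smoothed profiles, so the plain frequencies, the function name alignment_profile and the toy alignment are simplifying assumptions.

```python
from collections import Counter


def alignment_profile(alignment):
    """Return one {residue: frequency} dict per column of a multiple alignment."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gaps
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile


for position, frequencies in enumerate(alignment_profile(["ACD", "ACE", "A-E"])):
    print(position, frequencies)
```

Columns dominated by a single residue are the conserved positions that are likely to be structurally important.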

Sequence Similarity Methods for Structure Prediction

• These methods can be very accurate if there is >50% sequence similarity

• They are rarely accurate if the sequence similarity is <30%

• They use methods similar to those used for sequence alignment, such as the dynamic programming algorithm, hidden Markov models, and clustering algorithms.
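
As a reminder of what the dynamic programming algorithm looks like in sequence alignment, here is a minimal global alignment score in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name global_alignment_score are arbitrary choices for this example; protein alignment in practice would use a substitution matrix and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for the current prefix of a against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, which is enough to get the score; recovering the alignment itself would require storing the full table or traceback pointers.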