Capstone Project Presentation

Friday 17th December 2004, Stuart Young
Capstone Project Presentation: Predicting Deleterious Mutations
Young SP, Radivojac P, Mooney SD


Transcript of Capstone Project Presentation

Page 1: Capstone Project Presentation

Friday 17th December 2004, Stuart Young

Capstone Project Presentation

Predicting Deleterious Mutations

Young SP, Radivojac P, Mooney SD

Page 2: Capstone Project Presentation

Deleterious

“Hurtful or injurious to life or health; noxious” (Oxford English Dictionary)

“'Tis pity wine should be so deleterious, For tea and coffee leave us much more serious.” (Byron, Don Juan IV, 1821)

Page 3: Capstone Project Presentation

SNPs

What is an SNP (single nucleotide polymorphism)?
Why are SNPs important?
Some SNPs are nonsynonymous
The molecular effects of SNPs vary widely

Page 4: Capstone Project Presentation

MOTIVATION

Improve on the existing deleterious-mutation prediction methods
Use protein sequence, evolution and structure data combined with machine learning to identify potentially disease-causing SNPs

Page 5: Capstone Project Presentation

SNP data is increasingly available

Over 40 major online databases
dbSNP is the primary SNP database (contains 5,000,000+ validated human SNPs)
Many databases contain potentially disease-causing SNPs related to a particular disease

Page 6: Capstone Project Presentation

Deleterious effects of mutations on proteins

Function
Stability
Expression
Protein-Protein Interactions

Page 7: Capstone Project Presentation

Current Classification Tools

Sequence Approaches

BLOSUM62
  An amino acid substitution score matrix

SIFT
  Collects sequence homologues in multiple alignments and identifies non-conservative changes in amino acids
  Ng P & Henikoff S, 'Predicting Deleterious Amino Acid Substitutions'. Genome Research, 2001, 11:863-874.
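As a rough illustration of how a substitution score matrix can feed a classifier, the sketch below looks up the score for a wild-type/mutant residue pair. It is not the authors' code: the function name `substitution_score` is hypothetical, and the dictionary holds only a handful of BLOSUM62 entries rather than the full 20 x 20 matrix.

```python
# Small excerpt of BLOSUM62 scores (illustrative subset, not the full matrix).
BLOSUM62_EXCERPT = {
    ("L", "I"): 2,    # conservative: both aliphatic
    ("D", "E"): 2,    # conservative: both acidic
    ("K", "R"): 2,    # conservative: both basic
    ("F", "Y"): 3,    # conservative: both aromatic
    ("D", "L"): -4,   # non-conservative: acidic to aliphatic
    ("W", "W"): 11,   # identity score for tryptophan
}


def substitution_score(wild_type, mutant, matrix=BLOSUM62_EXCERPT):
    """Return the symmetric substitution score for a residue pair.

    Raises KeyError if the pair is not in the (partial) matrix.
    """
    if (wild_type, mutant) in matrix:
        return matrix[(wild_type, mutant)]
    return matrix[(mutant, wild_type)]


if __name__ == "__main__":
    print(substitution_score("I", "L"))  # conservative change, positive score
    print(substitution_score("L", "D"))  # non-conservative change, negative score
```

A low or negative score marks a substitution that is rarely seen in aligned homologues, which is why the score is a natural single-number feature for a mutation.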

Page 8: Capstone Project Presentation

Current Classification Tools

Structural Approaches

Expert rules
  Uses evolutionary and structural data
  Sunyaev et al, 'Prediction of deleterious human alleles'. Human Molecular Genetics, 2001, Vol. 10, No. 6, 593.

Decision Trees
  Improved performance based on sequence and structural data
  Produces intuitive rules

Page 9: Capstone Project Presentation

Our foundation for the project

Saunders CT & Baker D, 'Evaluation of Structural and Evolutionary Contributions to Deleterious Mutation Prediction'. J. Mol. Biol. (2002) 322, 891–901

Structural and evolutionary features
Trained classifiers based on two data sets: experimental mutations and human alleles

Page 10: Capstone Project Presentation

S&B (Saunders & Baker) Training Sets

Experimental mutations (~5,000)
  HIV-1 protease
  E. coli Lac repressor
  T4 lysozyme
Human alleles (~350 mutations)
  103 'hot' human genes

Page 11: Capstone Project Presentation

Why two training sets?

Unbiased human data is hard to get:
  Many disease-associated mutations are discovered through genetic association studies and may not be causative (i.e., only linked with the causative allele)
  Effect of mutations is hard to measure
Experimental 'whole gene mutagenesis' data is considered 'unbiased'

Page 12: Capstone Project Presentation

Features used in S&B Study

SIFT
SIFT + Solvent Accessibility (SA)
SIFT + normalized B-factor
SIFT + Sunyaev expert rules
SIFT + SA + B-factor

Page 13: Capstone Project Presentation

Hypothesis

Can we improve on the results of Saunders and Baker by using more structural and sequence properties?

Page 14: Capstone Project Presentation

Experimental Design

Classification algorithm
  Decision Trees
  Support Vector Machines
  Neural Nets
Additional Features
  Amino acid relative frequencies
  Additional structural properties

Page 15: Capstone Project Presentation

Structural Property Values

Russ Altman (Stanford) developed a vector representation of protein structural sites
Spheres (1.875 Å → 7.5 Å) centered on C-alpha atom of the mutation position
66 features
Atom/residue counts within sphere and other features, e.g.:
  Solubility
  Solvent accessibility
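The atom-count part of such a site vector can be sketched as follows. This is an illustration only, not the S-BLEST implementation: the function name `shell_counts` is hypothetical, and the four shell boundaries (1.875, 3.75, 5.625 and 7.5 Å) are an assumption that the spheres grow in equal steps between the two radii quoted on the slide.

```python
import math

# Assumed outer radii (in angstroms) of four concentric shells around the site.
SHELL_RADII = (1.875, 3.75, 5.625, 7.5)


def shell_counts(center, atoms, radii=SHELL_RADII):
    """Count atoms falling in each concentric shell around `center`.

    `center` and every entry of `atoms` are (x, y, z) coordinates in angstroms.
    An atom is assigned to the innermost shell whose outer radius contains it;
    atoms beyond the largest radius are ignored.
    """
    counts = [0] * len(radii)
    for atom in atoms:
        distance = math.dist(center, atom)
        for i, radius in enumerate(radii):
            if distance <= radius:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    c_alpha = (0.0, 0.0, 0.0)  # hypothetical C-alpha position of the mutated residue
    neighbours = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0),
                  (0.0, 0.0, 7.0), (9.0, 0.0, 0.0)]
    print(shell_counts(c_alpha, neighbours))  # one atom per shell, last atom ignored
```

Restricting `atoms` to C-beta positions and summing the counts would give a simple neighbour-density measure of the kind used as a solvent accessibility proxy.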

Page 16: Capstone Project Presentation

Amino Acid Windows

AA frequencies within a window on either side of the mutation position
20 AAs = 20 features
LEFT and RIGHT → 40 features
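The window features described above can be sketched in a few lines. The slides do not give the window width or say how sequence ends are handled, so the function name `window_frequencies`, the `width` parameter, truncation at the sequence ends and normalisation by the actual window length are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def window_frequencies(sequence, position, width):
    """Return 40 features for the residue at `position` (0-based).

    The first 20 values are relative amino acid frequencies in the window of
    up to `width` residues to the LEFT of the position; the last 20 are the
    same for the window to the RIGHT. Windows are truncated at sequence ends,
    and an empty window yields all zeros.
    """
    left = sequence[max(0, position - width):position]
    right = sequence[position + 1:position + 1 + width]
    features = []
    for window in (left, right):
        length = len(window)
        for aa in AMINO_ACIDS:
            features.append(window.count(aa) / length if length else 0.0)
    return features


if __name__ == "__main__":
    feats = window_frequencies("ACDEFGHIKL", position=4, width=2)
    print(len(feats))  # 40
```

Running the same function with several values of `width` gives windows of varying sizes without changing the feature layout.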

Page 17: Capstone Project Presentation


Amino Acid Windows

Page 18: Capstone Project Presentation

Tools

Databases
  PDB - Protein structure data
  S-BLEST - Structural features
Software
  Perl 5.8.0
  Matlab (NN, PRTools (DT), SVC)

Page 19: Capstone Project Presentation

List of Features Used

BLOSUM62, disorder, secondary structure, molecular weight
Grouped amino acid frequency windows of varying widths
SIFT
S-BLEST (vector contains four sub-shells spreading outward from site)
Solvent accessibility (C-beta density, i.e., the number of C-beta atoms around the site)

Page 20: Capstone Project Presentation


Comparison with S&B

Results

Page 21: Capstone Project Presentation

1. Human Data Set

Human allele dataset as train and test set
Ensembles of decision trees for classification
20-fold cross validation
Progressively added features to see their effect on performance
Because structural data was not available for all mutation sites, we used a subset of the original Saunders and Baker training set
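To make the 20-fold cross validation step concrete, the sketch below shows only how the samples are partitioned; training the decision tree ensembles is left out. The helper name `k_fold_indices` and the sample count in the usage example are hypothetical, and the original work was done in Matlab rather than Python.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Samples are split into k contiguous folds whose sizes differ by at most
    one. Each fold serves as the test set exactly once, and the remaining
    folds form the training set. Shuffle the data beforehand if its order
    is not random.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical data set size, chosen only for illustration.
    for train, test in k_fold_indices(n_samples=200, k=20):
        pass  # fit the classifier on `train`, score it on `test`
    print(len(train), len(test))  # sizes for the final fold
```

Averaging the per-fold accuracy gives a single figure for each feature set, so adding features one group at a time shows which groups actually help.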

Page 22: Capstone Project Presentation


Best Features

Page 23: Capstone Project Presentation

2. Experimental Data Set

Same as human data set but using experimental mutations for training and testing

Page 24: Capstone Project Presentation

Evaluation of S-BLEST Using a Random Subset of the Experimental Training Set

Page 25: Capstone Project Presentation

3. Cross-classification

Used the same features described above
Trained on one dataset and tested on the other:
  Human to experimental
  Experimental to human
  Experimental gene to experimental gene

Page 26: Capstone Project Presentation


Page 27: Capstone Project Presentation


Page 28: Capstone Project Presentation


Page 29: Capstone Project Presentation


Page 30: Capstone Project Presentation

Summary of Results

Human data set
  80% accuracy (up from 70%)
Experimental data set
  87% accuracy (up from 79.5%)

Page 31: Capstone Project Presentation

Conclusion

Prediction tools CAN identify deleterious mutations
We believe that further study is warranted to identify over-fitted classifiers and so improve classification accuracy on real-world data

Page 32: Capstone Project Presentation

Acknowledgements

People
  Andrew Campen (CCBB IT, IUPUI)
  Brandon Peters (CCBB, IUPUI)
  Haixu Tang (Capstone Coordinator, IUB)

Funding
  This work was funded by a grant from the Showalter Trust (Sean Mooney, PI), INGEN, and an IUPUI McNair Scholarship. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.

Page 33: Capstone Project Presentation


Thank You

Page 34: Capstone Project Presentation


Page 35: Capstone Project Presentation
