Computational Prediction of Protein-DNA InteractionsComputational Prediction of Protein-DNA...

Computational Prediction of Protein-DNA Interactions

Xide Xia Advisor: Dr. Mohammed AlQuraishi

Position Weight Matrix (PWM)

PWMs are often represented graphically as sequence logos.

Phase 1. Select a Model

Protein DNAAtom pairs: {Protein atom, DNA atom} 27*37 = 999

3DNA: Base mutation ➔ Input Feature Vector:

Output vector P: PWM

N* 999

4* N* 999

Kullback–Leibler divergence (KLD)

KLD = 1.3

KLD = 4.4

PWM (P)

Prediction (Q)

PWM (P)

Prediction (Q)

Define: X (Length = 999*N)

4* (999 * N)

Custom Model (Similar to Multinomial Logistic Regression)

Prediction Q

Goal: Minimizing

cvx_begin variable X Q = exp(X * phi) penalty = KL_div(P, Q)

minimize (penalty + |X|) cvx_end

CVX (a Matlab-based modeling system for convex optimization)

Data PWM Protein Sequence

Protein Structure

DNA Structur

ePhase 1 ✓ ✓ ✓ ✓

Data (size = 700)

Result

Nbins=2 Nbins=3 Nbins=4 Nbins=5

KLD 2.5421 2.4352 2.5435 2.5641

Binwidth = 1.3Å

Phase 2. Train Model on More Structure Data

3d-footprint

PDB name & Protein ID

Data PWM Protein Sequence

Protein Structure

DNA Structur

ePhase 1 ✓ ✓ ✓ ✓

Phase 2 ✓ ✓ ✓ ✓

(size: 1200 * 10 ≈ 12,000)

Fit the PWM along the DNA strand

Best structural location of PWM

-12.876

-0.8429

New Result

Lambda 0.1 0.05 0.01 0.005 0.0025

Nbin=3 1.9854 1.9145 1.5917 1.4569 1.3290

Nbin=4 1.9602 1.8704 1.4684 1.3089 1.1848

Predicting Protein-DNA Interactions based on Structures

Binwidth = 1.3Å

Old data

2.4352

2.5435

Phase 3. Train Model on Sequence PWM Protein

SequenceProtein

StructureDNA

Structure

Phase 1 ✓ ✓ ✓ ✓

Phase 2 ✓ ✓ ✓ ✓

Phase 3 ✓ ✓ ✗ ✗

CIS-BP

Protein Sequence

There are many databases available such as CIS-BP, UniProbe, and JASPAR.

(size: 5000 * 10 ≈ 50,000)

HHalign

Modeller

Green: Template Red: New structure

>1bdmA structureX: 1bdm.pdb MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGLEATDDPDVAFKDADYALLVGAAPRLQVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTNALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSIMFPDLFHAEVDGRPALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPSQGEYGIPEGIVYSFPVTAKDGAYRVVEGLEINEFARKRMEITELLDEMEQVKAL--GLI*

>TvLDH sequence:TvLDH MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFAQGG*

Method 1 ( Naive)

Method 2 ( similar to EM Algorithm)

1. Among all the output structures of HHalign, select all templates that have the probability to be a true positive higher than 0.9. Take them as inputs of Modeller.

2. Among all the output structures of Modeller, select the one with highest model score to be the “simulated” structure of input protein sequence.

1. Among all the output structures of HHalign,

select all templates that have the probability to be a true positive higher than 0.9. Take them as inputs of Modeller.

2. Among all the output structures of Modeller, select the one has minimal KL divergence result with true PWM on the proposed model.

3. Run the model on selected data and adjust the optimal parameters.

4. Repeat 2~3 until the model is always selecting the same structures. (Converge!)

New constructed structures

New model

Application

• Make prediction on interaction when mutation caused by diseases happen.

• Mutation on proteins (trans) • Mutation on DNA (cis)

Acknowledgements

• Dr. Mohammed AlQuraishi • Prof. Peter Sorger • Saroja Somasundaram • Samuel Cho • All LSP/HiTS Lab members

Thank you!

Computational Prediction of Protein-DNA InteractionsComputational Prediction of Protein-DNA...

Documents

Transcript of Computational Prediction of Protein-DNA InteractionsComputational Prediction of Protein-DNA...

DNA/Protein structure-function analysis and prediction · StructureFunction Analysis 17 Jan 2006 DNA/Protein structurefunction analysis and prediction • Protein Folding and energetics:

Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction

Workshop: computational gene prediction in DNA sequences (intro)

Global skin colour prediction from DNA - Springer10.1007/s00439-017-1808... · Global skin colour prediction from DNA ... The scale was established in 1975 by a dermatologist –

Transcription Factor DNA Binding Prediction

Model based prediction of human hair color using DNA variants

R EGULATORY M OTIF F INDING Mohammed AlQuraishi. T ALK O UTLINE Biology Background Algorithmic Problem Papers New Motif Finding Algorithm (MotifCut) Analysis.

Lecture 3 1.Protein Function prediction using network concepts 2.Application of network concepts in DNA sequencing.

Design & Analysis of DNA Microarray Experiments in ... · the analysis of DNA microarray data: Class prediction methods. Journal of the National Cancer Institute 95:14-18, 2003 •

Direct inference of protein DNA interactions using ...scholar.harvard.edu/files/alquraishi/files/Full Text.pdf · Direct inference of protein–DNA interactions using compressed sensing

DNA/Protein structure-function analysis and prediction

DNA-binding Residues and Binding Mode Prediction with Binding-Mechanism Concerned Models

Blind Testing and Evaluation of a Comprehensive DNA ...Prediction Results Methods Introduction Blind Testing and Evaluation of a Comprehensive DNA Phenotyping System Rachel Wiley1,

Design & Analysis of DNA Microarray Experiments …...class prediction using gene expression profiles. Journal of Computational Biology 9:505-511, 2002 • Simon R. Using DNA microarrays

Prediction of Complete Gene Structures Human Genomic DNA

Huda Alquraishi Rashid hospital Dubai - UAEs-Presentation.pdf · Huda Alquraishi Rashid hospital Dubai Huda Al Quraishi. Agenda • Introduction • Case presentation • Guidelines

Bioinformatics Master Course II: DNA/Protein structure-function analysis and prediction

Deeplex M TB...2020/07/06 · better, faster and culture-free, A new Mycobacterium tuberculosis drug resistance prediction assay, DNA extraction and purification Highlights • Prediction

Targeted RNA and DNA sequencing in disease prediction ...• Down to 2 ng fresh DNA QIAseq Targeted RNAscan Panels • UMIs • Known fusion genes (validation) • Unknown partners

Accurate and sensitive quantification of protein-DNA binding affinity · 2018-04-02 · prove prediction, partly because it better captures the effect of variation in DNA shape on