TRANSCRIPTION FACTOR BINDING POSITIONS ... - CAE...

TRANSCRIPTION FACTOR BINDING POSITIONS PREDICTION WITH CNN

Bowen Hu

TRANSCRIPTION FACTORS (TF)

➤ A protein that binds to a specific DNA sequence.

➤ Plays a key role in regulating (turning on or off) gene expression.

➤ Understanding where TFs bind will help us understand which genes are expressed in specific cell lines (skin tissue, brain tissue, …).

➤ Together with genotype and expression analysis, this can take us a step forward in understanding biological processes and disease states.

CHIP-SEQ DATA

➤ ChIP-sequencing is a method used to analyze protein interactions with DNA.

➤ Profiling every combination of TF and cell line experimentally is expensive and time-consuming.

➤ A method that accurately predicts whether a TF will bind to a given sequence is therefore needed.

DATA PREPARATION

➤ ChIP-seq data were downloaded from ENCODE (https://www.encodeproject.org), experiment ENCSR101FJT.

➤ The TF in this experiment is ZNF143-human, with a sample size of 21,679.

➤ Data cleaning:

Remove missing values.

Generate negative samples.

Truncate each sequence to a fixed length of 60 so that it can be expressed as an image.
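The cleaning steps above can be sketched in Python as follows. The slides do not say how missing values were identified, which 60 bases were kept, or how negatives were generated, so this sketch assumes: empty entries are missing values, the first 60 bases are kept, sequences that are too short or contain a non-ACGT base (such as N) are dropped, and each negative is a random shuffle of a positive sequence. The shuffle is a hypothetical choice, and `clean_sequences` and `shuffled_negatives` are illustrative names.

```python
import random

SEQ_LEN = 60               # fixed length from the slides
VALID_BASES = set("ACGT")

def clean_sequences(raw_seqs, seq_len=SEQ_LEN):
    """Drop missing or unusable sequences and truncate the rest to seq_len.

    Assumptions: None/empty entries are missing values, the first seq_len
    bases are kept, and sequences that are too short or contain a non-ACGT
    base (e.g. 'N') in the kept part are discarded.
    """
    cleaned = []
    for seq in raw_seqs:
        if not seq:                      # missing value
            continue
        seq = seq.upper()[:seq_len]      # truncate to the fixed length
        if len(seq) == seq_len and set(seq) <= VALID_BASES:
            cleaned.append(seq)
    return cleaned

def shuffled_negatives(positives, seed=0):
    """Hypothetical negatives: one shuffled copy of each positive sequence.

    Shuffling keeps the base composition but scrambles the order,
    so any binding motif is usually destroyed.
    """
    rng = random.Random(seed)
    negatives = []
    for seq in positives:
        bases = list(seq)
        rng.shuffle(bases)
        negatives.append("".join(bases))
    return negatives

raw = ["ACGT" * 20, None, "ACGN" * 20, "ACG" * 5]
positives = clean_sequences(raw)         # only the first entry survives
negatives = shuffled_negatives(positives)
```

Shuffled negatives are easy to generate but may be easier to separate from positives than real unbound genomic regions, so the choice of negative set can strongly affect measured accuracy.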

DATA TRANSFORMATION

➤ One-hot encoding.

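➤ For example, the DNA sequence “ACTA” is expressed as a 4 × 4 binary matrix in which each position has a single 1 marking its base.

➤ The first input DNA sequence, AAAGAATCCAGCTTAAATCGA, is shown on the next slide to illustrate the CNN model.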
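The one-hot encoding described above can be sketched in a few lines of Python. The row order A, C, G, T is an assumption, since the original matrix figure is not reproduced here, and `one_hot` is a hypothetical helper name.

```python
BASES = "ACGT"  # assumed row order (not given on the slide)

def one_hot(seq):
    """Return a 4 x len(seq) 0/1 matrix; row i marks positions holding BASES[i]."""
    return [[1 if base == b else 0 for base in seq.upper()] for b in BASES]

for b, row in zip(BASES, one_hot("ACTA")):
    print(b, row)
# A [1, 0, 0, 1]
# C [0, 1, 0, 0]
# G [0, 0, 0, 0]
# T [0, 0, 1, 0]
```

Under this encoding a 60-base input becomes a 4 × 60 matrix, which is the "image" the CNN receives.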

Convolutional neural network

WHY CNN?

➤ Traditional methods for predicting TF binding positions are based on position weight matrices (PWMs) or motifs.

➤ A likelihood ratio test or score test is then used to decide whether a sequence is a binding site.

➤ These methods do not use ChIP-seq data directly; they rely on summary statistics.

➤ Other biological features also influence binding behavior.

➤ I would expect higher accuracy from a model built directly on ChIP-seq data (a CNN).
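To make the contrast concrete, here is a minimal sketch of the traditional PWM approach: each window of a sequence is scored by the log-likelihood ratio of a motif model against a background model, and the best window is called a binding site if its score exceeds a threshold. The 3-position `TOY_PWM`, the uniform background, and the zero threshold are made-up assumptions for illustration, not the ZNF143 motif or any setting used in this project.

```python
import math

# Hypothetical 3-position PWM: probability of each base at each motif position.
# This is a made-up example, not the ZNF143 motif.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform

def llr_score(window, pwm=TOY_PWM, bg=BACKGROUND):
    """Log-likelihood ratio: log P(window | motif) - log P(window | background)."""
    return sum(math.log(pwm[i][base] / bg[base]) for i, base in enumerate(window))

def best_site(seq, pwm=TOY_PWM, threshold=0.0):
    """Score every window; return (position, score, called_as_binding_site)."""
    k = len(pwm)
    scores = [(llr_score(seq[i:i + k], pwm), i) for i in range(len(seq) - k + 1)]
    best_score, best_pos = max(scores)
    return best_pos, best_score, best_score > threshold

print(best_site("GGACTGG")[0])  # 2: the window "ACT" matches the toy motif
```

The decision depends only on the PWM and the threshold; no other information from the ChIP-seq experiment enters the score.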

CNN TRAINING

➤ Hyper-parameter selection.

Sequence length: 60;

Feature matrix (motif) length: 10;

Number of features (convolution filters): 600;

Window size of max pooling layer: 60;

Fully connected layer size: 50.
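A forward pass through a network with these hyperparameters can be sketched in plain Python. The slides give only the sizes, so everything else is assumed: "valid" convolution, ReLU activations, non-overlapping max pooling, a single sigmoid output unit, and random (untrained) Gaussian weights. `init_model`, `forward`, and `pooled_size` are hypothetical names. With a 60-base input and 10-base filters, the convolution produces 51 positions per filter, so a pooling window of 60 covers all of them and each of the 600 filters is reduced to a single value, which is effectively global max pooling.

```python
import math
import random

# Hyperparameters from the slide.
SEQ_LEN, MOTIF_LEN, N_FILTERS, POOL_WIN, FC_SIZE = 60, 10, 600, 60, 50

def pooled_size(seq_len, motif_len, n_filters, pool_win):
    """Length of the flattened max-pooling output (assumes 'valid' convolution)."""
    n_pos = seq_len - motif_len + 1          # 60 - 10 + 1 = 51 positions
    return n_filters * math.ceil(n_pos / pool_win)

def init_model(seq_len=SEQ_LEN, motif_len=MOTIF_LEN, n_filters=N_FILTERS,
               pool_win=POOL_WIN, fc_size=FC_SIZE, seed=0):
    """Randomly initialised (untrained) weights; shapes follow the hyperparameters."""
    rng = random.Random(seed)
    w = lambda: rng.gauss(0.0, 0.1)
    n_in = pooled_size(seq_len, motif_len, n_filters, pool_win)
    return {
        "pool_win": pool_win,
        "conv": [[[w() for _ in range(4)] for _ in range(motif_len)]
                 for _ in range(n_filters)],              # n_filters x motif_len x 4
        "conv_b": [0.0] * n_filters,
        "fc": [[w() for _ in range(n_in)] for _ in range(fc_size)],  # fc_size x n_in
        "fc_b": [0.0] * fc_size,
        "out": [w() for _ in range(fc_size)],             # single output unit
        "out_b": 0.0,
    }

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, model):
    """x: one-hot sequence as a list of [A, C, G, T] columns, one per position.

    Returns (binding probability, length of the pooled feature vector).
    """
    pool_win = model["pool_win"]
    motif_len = len(model["conv"][0])
    n_pos = len(x) - motif_len + 1
    pooled = []
    for filt, b in zip(model["conv"], model["conv_b"]):
        # Convolution + ReLU: match the filter against every window of the input.
        acts = [max(0.0, b + sum(filt[j][k] * x[p + j][k]
                                  for j in range(motif_len) for k in range(4)))
                for p in range(n_pos)]
        # Max pooling over non-overlapping windows of width pool_win.
        pooled.extend(max(acts[s:s + pool_win]) for s in range(0, n_pos, pool_win))
    # Fully connected layer (ReLU), then a sigmoid output unit.
    hidden = [max(0.0, b + sum(wi * vi for wi, vi in zip(row, pooled)))
              for row, b in zip(model["fc"], model["fc_b"])]
    logit = model["out_b"] + sum(wi * hi for wi, hi in zip(model["out"], hidden))
    return sigmoid(logit), len(pooled)

# Usage: push one random 60-base sequence through the untrained network.
seq = "".join(random.Random(1).choice("ACGT") for _ in range(SEQ_LEN))
x = [[1 if base == b else 0 for b in "ACGT"] for base in seq]
prob, n_pooled = forward(x, init_model())
print(n_pooled)  # 600
```

Because the pooling window spans the whole sequence, the network records how strongly each filter matches anywhere in the sequence, but not where; a smaller `pool_win` would keep coarse position information at the cost of a larger fully connected layer.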

➤ Data separation:

70% for training, 20% for validation, 10% for testing.
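The 70/20/10 split can be sketched as a shuffle followed by slicing. Shuffling with a fixed seed, using integer arithmetic for the cut points, and assigning any rounding leftovers to the test set are assumptions; the slides do not say how samples were assigned. `split_dataset` is a hypothetical helper.

```python
import random

def split_dataset(samples, percents=(70, 20, 10), seed=0):
    """Shuffle, then cut into train/validation/test by the given percentages.

    The test set takes whatever remains, so rounding never drops samples.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * percents[0] // 100
    n_val = len(items) * percents[1] // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 200 100
```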

CNN VISUALIZATION

Convolution layer

Max pooling

RESULTS

➤ The CNN achieves a prediction accuracy of 60%.

➤ Compared with the 90% accuracy of gkm-SVM, a currently prevailing method, this result is not satisfactory.

DISCUSSION

➤ Possible reasons for the CNN's lower accuracy:

Misleading negative samples.

The CNN failed to capture nonlinear features because it has too few layers.

The hyperparameters could be improved.

Tissue information was not incorporated.

➤ Improvements & future work:

Generate new negative samples.

Add more layers to the CNN.
