Post on 05-Dec-2014
ConTra v2: a tool to identify transcription factor binding
sites across species, update 2011
Stefan Broos
Prediction of functional regulatory units in noncoding regions
● Look for consensus sequence in certain genomic regions
● Example: TATA box consensus sequence TATA(T/A)A(A/T)(A/G)
● >chr1:2375696723757090_hg18_1000_+
TTAGTACTTAATGGAGACGGGTGTCATCATATACACAAGTGTTTAAAAATCGTTTATTATGCAAAATGTTAACTTTTATAAAAAGTTTAATATACATCGCATTGTTACAGAAAGTCAC
● Problem: a consensus sequence does not take into account how frequently each nucleotide occurs at each position
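A minimal sketch of the consensus search described above, assuming the consensus TATA(T/A)A(A/T)(A/G) is translated into a regular expression and only the forward strand is scanned. The function name `find_consensus` is hypothetical and is not part of ConTra.

```python
import re

# The slide's consensus TATA(T/A)A(A/T)(A/G) written as a regular expression.
# The lookahead lets overlapping hits be reported as well.
TATA_BOX = re.compile(r"(?=(TATA[TA]A[AT][AG]))")

def find_consensus(sequence, pattern=TATA_BOX):
    """Return (0-based start, matched text) for every hit on the given strand."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Example sequence copied from the slide.
seq = ("TTAGTACTTAATGGAGACGGGTGTCATCATATACACAAGTGTTTAAAAATCGTTTATTATGCAAAATG"
       "TTAACTTTTATAAAAAGTTTAATATACATCGCATTGTTACAGAAAGTCAC")

for start, site in find_consensus(seq):
    print(start, site)
```

Every position is treated as all-or-nothing here: a site either fits the consensus or it does not, which is exactly the limitation named in the last bullet.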
Prediction of functional regulatory units in noncoding regions
● A more advanced, and the most popular, way to represent binding sites is the position weight matrix (PWM)
● A 4 × L matrix, with L being the length of the binding site
● Each element of the matrix represents the frequency of a certain nucleotide (one of the 4 rows) at a given position (column) of the binding site
Prediction of functional regulatory units in noncoding regions
● Example of the position weight matrix of the TATA box:
A [  61  16 352   3 354 268 360 222 155  56  83  82  82  68  77 ]
C [ 145  46   0  10   0   0   3   2  44 135 147 127 118 107 101 ]
G [ 152  18   2   2   5   0  20  44 157 150 128 128 128 139 140 ]
T [  31 309  35 374  30 121   6 121  33  48  31  52  61  75  71 ]
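A minimal sketch of how such a count matrix can be used to score sequence windows. The counts are copied from the slide (every column sums to 389). The pseudocount of 1, the uniform background of 0.25 and the log-odds scoring are assumptions made for illustration and are not necessarily what ConTra or its scan tool use.

```python
import math

# Count matrix copied from the slide: 15 positions, 389 sites per column.
COUNTS = {
    "A": [61, 16, 352, 3, 354, 268, 360, 222, 155, 56, 83, 82, 82, 68, 77],
    "C": [145, 46, 0, 10, 0, 0, 3, 2, 44, 135, 147, 127, 118, 107, 101],
    "G": [152, 18, 2, 2, 5, 0, 20, 44, 157, 150, 128, 128, 128, 139, 140],
    "T": [31, 309, 35, 374, 30, 121, 6, 121, 33, 48, 31, 52, 61, 75, 71],
}

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn counts into one {base: log2 odds} dict per position (assumed scheme)."""
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Yield (start, score) for every window; expects only A, C, G, T."""
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        yield start, sum(col[base] for col, base in zip(pwm, window))

pwm = log_odds(COUNTS)
# Made-up test sequence: the most frequent base per column, flanked by C's.
best = max(scan("CCCCGTATAAAAGGCGGGGCCCC", pwm), key=lambda hit: hit[1])
print(best)
```

Unlike the consensus search, a window that deviates at one position still receives a score, only a lower one.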
Prediction of functional regulatory units in noncoding regions
● PWMs provide a more natural way to represent and search for binding sites
● Problem: motifs tend to be short and degenerate, and dependencies between positions are not taken into account
● Although this is the most popular method, most of the predicted sites are false positives with no known in vivo function (cf. the futility theorem)
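A back-of-the-envelope sketch of why short, degenerate motifs produce so many false positives. It assumes independent, uniformly distributed bases and a genome of about 3 billion bp scanned on both strands; both are simplifying assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

# Number of allowed bases per position in TATA(T/A)A(A/T)(A/G).
ALLOWED = [1, 1, 1, 1, 2, 1, 2, 2]

def match_probability(allowed, alphabet_size=4):
    """Chance that a random window fits the consensus (uniform bases assumed)."""
    p = Fraction(1)
    for n in allowed:
        p *= Fraction(n, alphabet_size)
    return p

def expected_hits(sequence_length, allowed, both_strands=True):
    """Expected number of chance matches in a random sequence."""
    windows = sequence_length - len(allowed) + 1
    strands = 2 if both_strands else 1
    return float(windows * strands * match_probability(allowed))

print(match_probability(ALLOWED))            # 1/8192
print(expected_hits(3_000_000_000, ALLOWED)) # assumed genome size
```

Under these assumptions a random window matches with probability 1 in 8,192, which gives on the order of 700,000 chance matches in a genome of that size.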
Prediction of functional regulatory units in noncoding regions
● Solutions:
– Use information from flanking sequences
– Use more complex models (HMMs and biophysical models)
– Use sequence conservation across species (if a site is conserved across species, there is a higher probability the site is functional)
– ...
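A minimal sketch of the conservation idea from the list above: a predicted site is kept only if every species in a multiple alignment shows a gap-free match at the same alignment columns. The alignment is made-up toy data, the requirement that all species match is a deliberately strict assumption, and `conserved_sites` is a hypothetical helper, not ConTra's implementation.

```python
import re

SITE = re.compile(r"TATA[TA]A[AT][AG]")  # TATA box consensus from the earlier slide
WIDTH = 8                                # length of that consensus

def conserved_sites(alignment, pattern=SITE, width=WIDTH):
    """Return alignment columns where every species has a gap-free match."""
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must have equal length")
    hits = []
    for col in range(length - width + 1):
        windows = [row[col:col + width].upper() for row in rows]
        # A window containing a gap character can never match the pattern.
        if all(pattern.fullmatch(w) for w in windows):
            hits.append(col)
    return hits

# Toy alignment (invented sequences, '-' marks a gap).
toy_alignment = {
    "human": "GGCTATAAAAGGC-TTATATAAGTCC",
    "mouse": "GGCTATATAAGGCATTACGTAAGTCC",
    "dog":   "GACTATAAATAGC-TTATATAAGTCC",
}
print(conserved_sites(toy_alignment))
```

In the toy data the second human site also matches in dog but not in mouse, so only the first site is reported.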
What is ConTra?
● A tool to visualize predicted and conserved transcription factor binding sites in a region of interest
● A tool to explore the regulatory potential of a set of binding sites in a region of interest
● Focus on ease of use
● Free access to the latest, most up-to-date versions of the TRANSFAC and JASPAR PWM libraries
First version of ConTra
● Published in 2008 by Hooghe, Hulpiau et al.
● Popular tool, cited 23 times
● Had some limitations
ConTra update
What is new?
● Update of PWM libraries
● More reference species were added
What is new?
● Users are no longer restricted to the promoter region. One can search for binding sites in 5' UTR, 3' UTR, promoter and intron regions
● Users can upload their own matrices (it is as simple as uploading a multi-FASTA file!)
● Users can upload a custom alignment
● Noncoding genes are no longer excluded from the analysis
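A minimal sketch of how a multi-FASTA file of binding sites can be turned into a count matrix. The slides do not specify the exact upload format ConTra expects, so this assumes a file of known sites that all have the same length; `read_fasta` and `count_matrix` are hypothetical helpers and the three sites are made up.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def count_matrix(sites):
    """Count each base at each position of equal-length binding sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all binding sites must have the same length")
    return {b: [sum(s[i] == b for s in sites) for i in range(width)]
            for b in "ACGT"}

# Made-up example standing in for an uploaded file.
example = StringIO(">site1\nTATAAAAG\n>site2\nTATATAAG\n>site3\nTATAAATA\n")
sites = [seq for _, seq in read_fasta(example)]
matrix = count_matrix(sites)
for base in "ACGT":
    print(base, matrix[base])
```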
PWM libraries
● TRANSFAC version 2010.04
● JASPAR update 2010
● PHYLOFACTS 2010
● All protein binding microarrays from Berger et al., Cell, 2008
● These PWM libraries are used in combination with the Match scan tool
Alignments in ConTra
● Alignments generated using MULTIZ
● Downloaded from the UCSC Genome Browser
How does it work?
● The analysis consists of a four-step process
Step 1
– Select type of analysis: visualization or exploration
– Select species
– Select gene of interest using the gene name or symbol, Ensembl gene ID (ENSG), Entrez gene ID, RefSeq (NM_|NR_) or Ensembl transcript ID (ENST)
How does it work?
● The analysis consists of a four-step process
Step 2
– All possible matches with your search term are listed. The search term is highlighted
– Select 1 transcript of your gene of interest
How does it work?
● The analysis consists of a four-step process
Step 3
– Select a genomic region of interest (promoter, 5' UTR, 3' UTR, intronic regions)
How does it work?
● The analysis consists of a four-step process
Step 4
– Select up to 20 PWMs from the TRANSFAC library, JASPAR library, PHYLOFACTS or PBM
– Select a cutoff (to minimize false positive predictions or to minimize false negative predictions)
– Run ConTra ...
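A minimal sketch of how a score cutoff trades false positives against false negatives. It assumes a relative score scaled between the worst and best possible window; this is in the spirit of matrix-similarity cutoffs but is not the Match algorithm itself. The toy matrix, the sequence and both cutoff values are made up.

```python
# Toy 4-position frequency matrix (made-up numbers, one dict per position).
TOY = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
]

def relative_score(window, matrix):
    """Scale the summed frequencies of a window to the range 0..1."""
    raw = sum(col[base] for col, base in zip(matrix, window))
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    return (raw - lowest) / (highest - lowest)

def hits_above(sequence, matrix, cutoff):
    """Return (start, window) for every window scoring at or above the cutoff."""
    width = len(matrix)
    return [
        (start, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
        if relative_score(sequence[start:start + width], matrix) >= cutoff
    ]

seq = "GGTATAGGTAAACC"
strict = hits_above(seq, TOY, cutoff=0.95)   # fewer false positives
lenient = hits_above(seq, TOY, cutoff=0.70)  # fewer false negatives
print(strict)
print(lenient)
```

The strict cutoff keeps only the perfect TATA window, while the lenient one also reports the weaker TAAA window.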
Who should use it and where to find it?
● You!
● To get an indication of how your gene is regulated
● To create publication-ready graphics
● To get a quick and easy visualization of some transcription factor binding sites
● http://bioit.dmbr.ugent.be/contrav2/index.php
Questions & Examples
● Analyse gene of interest
● Explore gene of interest
● Download and upload own alignment
● Make your own PWM
● Make beautiful publication graphics using ConTra and Jalview