MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a...

MEME homework:

probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2

(0.2)(0.3)(0.2)(0.3)(0.2)(0.3) = 2.16 x 10^-4

This is the probability that any given 6-mer will be this sequence by chance.

How many instances within 1,000 bp upstream of 6,000 genes?

Number of 6-mers per 1,000 bp: 1,000 bp - 5 bp (to account for the 6-mer start position) = 995 6-mers per gene upstream region

6,000 genes x 995 = 5.97 x 10^6 possible 6-mers total

Expected number that are your sequence: 2.16 x 10^-4 x 5.97 x 10^6 = 1,290 sites

BUT … the site can also occur as the reverse complement (i.e. on the other strand) = 2X possible sites (because in our bg model A = T and G = C, so the reverse complement has the same probability) = 2,580 possible matches
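The arithmetic above can be checked with a short script. This is a minimal sketch, assuming exact matches only and independent positions drawn from the background model; the function names are illustrative and are not part of MEME.

```python
# Background base frequencies from the homework (A = T = 0.3, G = C = 0.2).
BACKGROUND = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def site_probability(motif, background=BACKGROUND):
    """Probability that one position starts an exact copy of motif."""
    p = 1.0
    for base in motif:
        p *= background[base]
    return p


def reverse_complement(seq):
    """Return the sequence read on the opposite strand."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def expected_matches(motif, region_len=1000, n_regions=6000, both_strands=True):
    """Expected number of chance matches across all upstream regions."""
    windows = region_len - len(motif) + 1      # 1000 - 6 + 1 = 995 start positions
    total_windows = windows * n_regions        # 995 * 6000 = 5.97e6
    expected = total_windows * site_probability(motif)
    rc = reverse_complement(motif)
    # A palindromic motif matches both strands at the same site, so only
    # add the second strand when the reverse complement is a different word.
    if both_strands and rc != motif:
        expected += total_windows * site_probability(rc)
    return expected


one_strand = expected_matches("GAGTCA", both_strands=False)
two_strands = expected_matches("GAGTCA", both_strands=True)
print(round(one_strand), round(two_strands))
```

Without intermediate rounding the two-strand total is about 2,579; the 2,580 above comes from doubling the rounded value of 1,290.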

An alternative approach: Phylogenetic footprinting

Rather than look at multiple, different regulatory regions from one species, look at one regulatory region across its orthologs in many species.

Hypothesis: functional regions of the genome will be conserved more than ‘nonfunctional’ regions, due to selection.

Therefore, simply look for regions of sequence that are conserved above background.

Simplest case: stretches of very highly conserved sequence
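The simplest case can be sketched as a scan for runs of alignment columns that are identical in every species. This is only an illustration, assuming the orthologous sequences are already aligned to equal length; the toy alignment and the `min_len` threshold are made up, and this is not the procedure used in any of the studies cited here.

```python
# Hypothetical toy alignment: one row per species, already aligned.
TOY_ALIGNMENT = [
    "ACGTTGACTCATTGCA",
    "ACCTTGACTCATAGCA",
    "AGGTTGACTCATTGGA",
    "TCGTTGACTCATTGCT",
]


def conserved_columns(alignment):
    """Return one boolean per column: True if every row has the same base."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    flags = []
    for column in zip(*alignment):
        # Gap columns ("-") are never counted as conserved.
        flags.append(len(set(column)) == 1 and column[0] != "-")
    return flags


def conserved_stretches(alignment, min_len=6):
    """Return half-open (start, end) intervals of fully conserved runs."""
    flags = conserved_columns(alignment)
    stretches = []
    start = None
    for i, conserved in enumerate(flags + [False]):  # sentinel closes a final run
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


for start, end in conserved_stretches(TOY_ALIGNMENT):
    print(start, end, TOY_ALIGNMENT[0][start:end])
```

Requiring perfect identity is the strictest possible definition of "conserved above background"; a real analysis would compare the observed conservation to what is expected given the divergence between the species.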

Kellis et al. 2003 “Sequencing and comparison of yeast species to identify genes and regulatory elements”

Sequenced 4 closely related Saccharomyces genomes & identified conserved sequences in multiple alignments of orthologous sequences from the four species.

[Figure: information profile by position]

Incorporating evolutionary models can improve motif finding

Remember that evolution acts on functionally important base pairs …

Also remember from our motif finding exercise that not all contiguous base pairs are equally important (information content).
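As a reminder of what information content measures, here is a minimal sketch that computes it per position from a set of aligned binding sites. It assumes the relative-entropy definition (bits relative to a background), a uniform background by default, and no pseudocounts; the example sites are hypothetical.

```python
import math
from collections import Counter

# Hypothetical aligned binding sites, one per row.
TOY_SITES = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]


def information_profile(sites, background=None):
    """Return the information content (bits) of each alignment column."""
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform background
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must all have the same length")
    n_sites = len(sites)
    profile = []
    for column in zip(*sites):
        bits = 0.0
        for base, count in Counter(column).items():
            p = count / n_sites
            # Bases that never occur contribute nothing to the sum.
            bits += p * math.log2(p / background[base])
        profile.append(bits)
    return profile


for position, bits in enumerate(information_profile(TOY_SITES), start=1):
    print(position, round(bits, 2))
```

Positions where every site has the same base reach the 2-bit maximum under a uniform background, while positions that tolerate substitutions score lower.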


Moses et al. 2003 “Position specific variation in the rate of evolution in transcription factor binding sites”

The rate of evolution within a motif is inversely related to information content, so the most informative positions are the most conserved … important base pairs evolve more slowly

Sinha et al. 2004 “PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences”

Moses et al. 2004 “Monkey: identification of transcription factor binding sites in multiple alignments using a binding site-specific evolutionary model”

Siddharthan et al. 2005 “PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny.”

Wang & Stormo. 2003 (PhyloCon) “Combining phylogenetic data with co-regulated genes to identify regulatory motifs”

Prakash et al. 2004. (OrthoMEME) “Motif discovery in heterogeneous sequence data”

Multiple motif finding methods now work on multiple alignments of regulatory regions of coregulated genes.

Given:

1) a group of regulatory regions of coregulated genes

2) orthologs of each region, in the form of multiple alignments

Keep in mind that the relevant evolutionary models are specific to what one is looking for (TF binding sites, ncRNA, etc.)


VISTA suite for visualizing conservation in global alignments

Pre-computed multiple global alignments of mammalian genomes, visualized by conservation level.

-- Uses the BLAT local alignment tool to find seeds of high sequence similarity, then these seeds are used for global single- or multiple-genome alignment

Frazer et al. 2004 “VISTA: computational tools for comparative genomics”

Which species to compare?

Balance between:

-- species closely related enough that:

1) There’s enough similar sequence to get confident pairwise alignments

2) The sequences of interest and their corresponding functions have been conserved

-- species distantly enough related that:

1) Nonfunctional sequence has had time to diverge

The above approaches have focused on using similarity/conservation to identify important regions of the genome …

A large focus in genomics is understanding the differences in genome sequences and what accounts for the vast diversity in phenotypes within a population.

-- Analysis of single nucleotide polymorphisms (SNPs) within populations

-- Analysis of variations in gene expression within and between populations

-- Analysis of quantitative trait loci (QTLs) accounting for differences in gene expression

Connecting phenotype to genotype

-- Large variations in size, shape, health, etc in human populations

-- Much of that variation has to do with disease susceptibility

-- A major goal of genetics (and now genomics) is understanding the consequences of genetic variation.

A major effort in genomics is to identify and annotate SNPs in human populations, and to identify those related to disease

~2800 disease-associated genes known, mostly from positional cloning & mapping studies

Done by linkage analysis: pattern of marker inheritance in families with heritable diseases

Each base-pair position on human chromosome 21 is interrogated 8 times (4 in forward & 4 in reverse orientations)

GGAGATGAGTTCGATTACTCTTAGG

GGAGATGAGTTCAATTACTCTTAGG

GGAGATGAGTTCTATTACTCTTAGG

GGAGATGAGTTCCATTACTCTTAGG
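The four probes above differ only at the central base of a 25-mer. A minimal sketch of how such a probe set could be generated for one position follows. It assumes that the "reverse orientation" probes are the reverse complements of the forward ones and treats the first sequence above as the reference; both are assumptions, and the function name is hypothetical rather than part of any array-design tool.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the sequence read on the opposite strand."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def probes_for_position(reference, pos, probe_len=25):
    """Return (forward, reverse) probe lists interrogating reference[pos].

    Each list holds 4 probes, one per possible base at the central position,
    so the position is interrogated 8 times in total.
    """
    half = probe_len // 2
    if pos - half < 0 or pos + half >= len(reference):
        raise ValueError("position too close to the end of the reference")
    window = reference[pos - half : pos + half + 1]
    forward = [window[:half] + base + window[half + 1 :] for base in "ACGT"]
    # Assumption: reverse-orientation probes are the opposite-strand copies.
    reverse = [reverse_complement(probe) for probe in forward]
    return forward, reverse


# Assumed reference: the first 25-mer shown above, interrogated at its centre.
REFERENCE = "GGAGATGAGTTCGATTACTCTTAGG"
forward_probes, reverse_probes = probes_for_position(REFERENCE, pos=12)
for probe in forward_probes:
    print(probe)
```

The probe that hybridizes most strongly indicates which base is present at the central position in the sample.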

1.7 x 10^8 oligos total on eight Affy wafers were used to identify SNPs on human Chromosome 21 from 21 different individuals.

Array-based methods of SNP detection & Haplotype mapping

Each row = single SNP

Each column = Ch 21

Blue = major allele

Yellow = minor allele

Much of the chromosomal variation is explained with relatively limited haplotype diversity.

80% of haplotype structure can be captured with only 10% of the SNPs in that block (need only 2 SNPs to type)
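To see why a couple of SNPs can be enough to type a block, here is a toy sketch that brute-forces the smallest set of SNP positions distinguishing the common haplotypes. The haplotype strings (0 = major allele, 1 = minor allele), the `min_count` cutoff, and the exhaustive search are all illustrative assumptions and not the method used in the study described here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical block of 6 SNPs typed on 9 chromosomes.
TOY_CHROMOSOMES = [
    "000000", "000000", "000000", "000000",
    "110100", "110100", "110100",
    "001011", "001011",
    "010000",  # rare haplotype, seen once
]


def common_haplotypes(chromosomes, min_count=2):
    """Return the distinct haplotypes seen at least min_count times."""
    counts = Counter(chromosomes)
    return [hap for hap, n in counts.items() if n >= min_count]


def minimal_tag_snps(haplotypes):
    """Return the smallest tuple of SNP indices that tells all haplotypes apart."""
    n_snps = len(haplotypes[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            signatures = {tuple(hap[i] for i in subset) for hap in haplotypes}
            if len(signatures) == len(haplotypes):
                return subset
    return None  # only reached if the input contains duplicate haplotypes


common = common_haplotypes(TOY_CHROMOSOMES)
tags = minimal_tag_snps(common)
print(common, tags)
```

In this toy block three common haplotypes cover 9 of the 10 chromosomes, and 2 of the 6 SNPs are enough to tell them apart. Exhaustive search is only practical for small blocks.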

Haplotype length can vary from a few kb to megabases.

Phenotypic variation (including disease susceptibility) is often linked to copy-number changes

This is especially true of numerous types of cancers, where local amplifications and translocations increase the copy number of cell proliferation regulators, etc.

Amplifications in breast cancer lines increase the copy number of specific regulators …
