Bayesian detection of non-sinusoidal periodic patterns in circadian expression data

27
Bayesian detection of non-sinusoidal periodic patterns in circadian expression data Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120

description

Bayesian detection of non-sinusoidal periodic patterns in circadian expression data. Darya Chudova , Alexander Ihler , Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120. Outline. Introduction Methodology - PowerPoint PPT Presentation

Transcript of Bayesian detection of non-sinusoidal periodic patterns in circadian expression data

Bayesian detection of non-sinusoidal periodic patterns in circadian expression data

Bayesian detection of non-sinusoidal periodic patterns in circadian expression dataDarya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic SmythBIOINFORMATICS Gene expressionVol. 25 no. 23 2009, pages 3114-3120OutlineIntroductionMethodologyExperimental ResultsConclusionIntroductionCyclical biological processes :Cell cycle, hair growth cycle, mammary cycle and circadian rhythmsProduce coordinated periodic expression of thousands of genes.Existing computational methods are biased toward discovering genes that follow sine-wave patterns.The objective is to identify or rank which of these genes are most likely to be periodically regulated.

IntroductionTwo major categories :Frequency domainCompute the spectrum of the average expression profile for each probe.Test the significance of the dominant frequency against a suitable null hypothesis such as uncorrelated noise.Not well suited for short time courses.Time domainIdentification of sinusoidal expression patternsSimple and computational efficiencyNot effective at finding periodic signals which violate the sinusoidal assumption.IntroductionIn this article, a general statistical framework for detecting periodic profiles from time courseAnalyzing the similarity of observed profiles across the cycles.discover periodic transcripts of arbitrary shapes from replicated gen expression profiles.Provide an empirical Bayes procedure for estimating parameters of the prior distribution.Derive closed-formed expressions for the posterior probability of periodicity.IntroductionExpression profiles from the murine liver time course data set.Two of these probe sets (NrIdI and Arntl) correspond to well-established clock-control genes.

MethodologyProbabilistic mixture model:Differentially expressed geneschange their expression level in response to changes in experimental conditionsBackground genesremains constant throughout the experimentCoordinated expression across multiple cyclesModel periodic phenomenaMethodologyMode the data using a mixture of three components for background, differentially and periodically expressed profiles.

Compute the posterior probability that a given probe set was generated by the periodic component.

MethodologyA probabilistic model for periodicityN probe sets over C cycles of known length.Each cycle is represented by the same grid of T time points, indexed from 1 to T.Denote the number of replicate observations for probe set at time point of cycle by . : the expression intensity value for a particular probe set i , time point j and replicate k for cycle c. : the entire set of observations for probe set i.

MethodologyOur probabilistic model for expression , then consists of three components : background(b), differentailly expressed but aperiodic (d) and periodically expressed profiles (p).Let denote the component associated with probe set i.Each of the three component models consists of Normal/Inverse Gamma (NIG) prior distribution on the latent profile and additional Normal noise on the observations.

MethodologyNormal/Inverse Gamma (NIG) prior is a flexible and computationally convenient distribution commonly used as a prior model for latent expression levels and replicate variability.Scalar variables are distributed as NIG with parameters .

: inverse Gamma distribution with a degrees of freedom and scale parameters b, evaluated at x.

MethodologyThree type of unknown quantities:The prior parameters, denoted Determine via an empirical Bayesian procedureSubsequently treated as known and fixedProbe set-specific hidden variables: the latent profiles (consisting of a mean and variance) for each component. The component identify , indicating from which component the data ware generated.

Methodology

The observed profiles Y and latent variables Z (component identity) and {, }N probes sets, repeat N timesMethodologyThe background component model:NIG prior shared by all background probe sets and parameterized by four scalarsYi are modeled as independent samples from a Gaussian distribution with mean and variance

MethodologyThe differentially expressed component model: and be (C x T)-dimensional vectorThe prior distribution for this component is defined by four (C x T) dimensional parameters,

Mode observations as being independent given :

MethodologyThe periodic component model:Assume repeated expression of the same pattern across multiple cycles and are T-dimensional variables encoding expression levels and replicate variability in the ideal cycle.

MethodologyThe complete set of prior parameters includes the prior component probabilities z (corresponding to the relative frequencies of background, differentially expressed, and periodic probe sets)

MethodologyInferenceDetect periodic expression by computing the posterior probability of the periodic component

MethodologyAn analysis of variance periodicity detectorThe resulting inferential test for periodicity is quite close to a simplified, non-Bayesian test based on analysis of variance (ANOVA).Construct ANOVA testDividing the data into groups by their associated time points regardless of cycle numberAll replicates for c=1,..,C and k=1,, fall into the same group

Methodologytest whether the data support separation into these groupswhether the amount of variation between groups is significantly larger than the variation found within the groups.High values of the ratio of these quantities indicated that most of the variability in observations can be explained using a time-dependent, cycle-independent profile,

MethodologyEstimating parameters of the prior distribution:Develop an empirical Bayes procedure to determine the prior parameters Determine a tentative assignment of probe set to each componentUse this assignment to find approximate maximum likelihood estimates of the location scale and parameter of the inverse Gamma distribution (a,b); we set the location mean to o in all three components.MethodologyTo find a tentative initial assignment of probe sets for estimating prior parameters:Run ANOVA detector of differential expression and periodicity.To define parameters of the component for differential expressionProbe sets that vary significantly over time (P0.1)probe sets for estimating the prior parameters of the periodic componentchoosing those probe sets with P0.9 from our model.ConclusionWe argue that in typical experiments with only a small number of samples per cycle, we should test for arbitrary patterns which are repeated between cycles, rather than parametric shapes.To this end, we propose a Bayesian mixture model for identifying patterns of unconstrained shape, which stand out as both differentially and periodically expressed.