Chaos Game Representation


    Digital Search Trees and Chaos Game Representation

    Peggy Cenac (1), in collaboration with Brigitte Chauvin (2), Nicolas Pouyanne (2) and Stephane Ginouillac (2)

    Journees ALEA 2006 - CIRM Luminy

    (1) INRIA Rocquencourt
    (2) University of Versailles Saint Quentin

    Peggy Cenac Digital Search Trees and Chaos Game Representation


    Plan

    1 Chaos Game Representation (CGR)
        Definition
        Stochastic properties of the CGR

    2 Digital Search Tree (DST) and CGR
        The CGR-tree
        Construction of the CGR-tree
        Example
        Relation between the CGR-tree and DST
        State of the art

    3 Main Results
        Assumptions and notations
        Asymptotic results
        Numerical experiments
        Guidelines for the proofs

    4 Perspectives


    Chaos Game Representation (CGR)


    Definition

    Graphical representation of DNA in a bounded set.

    Storage tool

    Pattern visualization

    Sequence comparison (local/global)

    Iterative mapping technique

    DNA sequence $U = (u_i)_{i=1,\dots,n}$, where $u_i \in \{A, C, G, T\}$. The Chaos Game Representation of $U$ on the unit square $S$ is a sequence $\{X_0, \dots, X_n\}$ defined by

    $$X_0 = \left(\tfrac12, \tfrac12\right), \qquad X_{i+1} = \tfrac12 \left(X_i + u_{i+1}\right),$$

    where $A = (0, 0)$, $C = (0, 1)$, $G = (1, 1)$, $T = (1, 0)$.
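The recursion above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CORNERS` and `cgr_points` are ours, not the authors', and exact fractions are used so that the dyadic coordinates are not rounded.

```python
from fractions import Fraction

# Corner of the unit square assigned to each nucleotide, as in the definition.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(sequence):
    """Return [X_0, ..., X_n] with X_{i+1} = (X_i + u_{i+1}) / 2."""
    x, y = Fraction(1, 2), Fraction(1, 2)  # X_0 = (1/2, 1/2)
    points = [(x, y)]
    for letter in sequence:
        cx, cy = CORNERS[letter]
        # Move halfway towards the corner of the current letter.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


points = cgr_points("ATGCGAGTGT")
print(len(points))  # X_0 plus one point per nucleotide
print(points[1])    # X_1 = (X_0 + A) / 2
```

Each step halves the distance to the corner of the letter just read, which is why all points stay inside the unit square.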


    Examples (1)

    [Figure] CGR of the word ATGCGAGTGT.


    Examples (2)

    [Figure] CGR of 200000 nucleotides of Chromosome 2 of Homo sapiens (on the left) and of Bacteroides thetaiotaomicron (on the right).


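    [Figure] The unit square with corners A(0,0), C(0,1), T(1,0), G(1,1), divided into the four squares Sa, Sc, St, Sg (first panel) and into the sixteen squares Saa, Sca, Sac, Scc, Sta, Sga, Stc, Sgc, Sat, Sct, Sag, Scg, Stt, Sgt, Stg, Sgg (second panel).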

    $$S_w \overset{\text{def}}{=} \sum_{k=1}^{i} \frac{1}{2^{i-k+1}}\, v_k + \frac{1}{2^i}\, S, \quad \text{where } w \text{ is the word } v_1 \dots v_i.$$

    Counting points in $S_w$ amounts to counting occurrences of $w$.

    Each point contains the whole sequence history.
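As a hedged illustration of the link between points and word counts, the sketch below counts the CGR points lying strictly inside the square S_w and compares the result with a direct count of the occurrences of w. The helper names (`square`, `count_points_in_square`, `count_occurrences`) are ours. The strict inequalities are a choice of this sketch: they discard points sitting on the boundary of S_w, such as X_0, which correspond to prefixes shorter than w.

```python
from fractions import Fraction

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(sequence):
    """Return [X_0, ..., X_n] with X_{i+1} = (X_i + u_{i+1}) / 2."""
    x, y = Fraction(1, 2), Fraction(1, 2)
    points = [(x, y)]
    for letter in sequence:
        cx, cy = CORNERS[letter]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def square(word):
    """Lower-left corner and side length of S_w for w = v_1 ... v_i."""
    i = len(word)
    cx = sum(Fraction(CORNERS[v][0], 2 ** (i - k + 1)) for k, v in enumerate(word, 1))
    cy = sum(Fraction(CORNERS[v][1], 2 ** (i - k + 1)) for k, v in enumerate(word, 1))
    return (cx, cy), Fraction(1, 2 ** i)


def count_points_in_square(sequence, word):
    """Number of CGR points strictly inside S_w."""
    (cx, cy), side = square(word)
    return sum(
        1
        for x, y in cgr_points(sequence)
        if cx < x < cx + side and cy < y < cy + side
    )


def count_occurrences(sequence, word):
    """Number of (possibly overlapping) occurrences of word in sequence."""
    n = len(word)
    return sum(1 for i in range(len(sequence) - n + 1) if sequence[i:i + n] == word)


seq = "ATGCGAGTGT"
for w in ("G", "GT", "TG", "CGA"):
    print(w, count_points_in_square(seq, w), count_occurrences(seq, w))
```

The two counts agree because a point X_i lies in S_w exactly when the last |w| letters read are w.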


    Stochastic properties of the CGR

    U is assumed to be a stationary ergodic sequence.

    $(X_n)_{n \ge 0}$ is a Markov chain of order 1, and converges almost surely to a random vector $X$.

    When U is i.i.d. and uniformly distributed, the distribution of $X$ is the Lebesgue measure on S. Whenever U is not uniformly distributed, this distribution is continuous and singular with respect to the Lebesgue measure.

    The law of large numbers holds, and the empirical measures converge.


    Digital Search Tree (DST) and CGR


    The CGR-tree

    In the CGR of a sequence $U = U_1 \dots U_i \dots$, one successively represents

    $$U_1, \quad U_1 U_2, \quad \dots, \quad U_1 U_2 \dots U_n \overset{\text{def}}{=} U^{(n)}.$$

    $U^{(i)} \in S_w$ is equivalent to $U_{i-|w|+1} \dots U_{i-1} U_i = w$.

    We define a representation of a DNA sequence U as a quaternary tree, the CGR-tree, in which one can visualize repetitions of subwords.


    Construction

    We adopt the classical order (A, C, G, T) on letters.

    Let $\mathcal{T}$ be the complete infinite 4-ary tree. Each node of $\mathcal{T}$ has 4 branches corresponding to the letters (A, C, G, T), ordered in the same way.

    The CGR-tree of U is an increasing sequence $\mathcal{T}_1 \subset \mathcal{T}_2 \subset \dots \subset \mathcal{T}_n \subset \dots$ of finite subtrees of $\mathcal{T}$, each $\mathcal{T}_n$ having n nodes.

    Successively insert the reversed words $U_i \dots U_1$.
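A minimal sketch of this construction follows, assuming the usual digital search tree insertion rule: each reversed prefix is read letter by letter from the root, and a new node is created at the first free position. The function name `cgr_tree` and the representation of the tree as nested dictionaries are our own choices.

```python
def cgr_tree(sequence):
    """Build the CGR-tree of `sequence` by inserting reversed prefixes.

    Returns the tree as nested dicts (one key per child letter) and the
    labels of the created nodes, in insertion order. A label is the word
    read along the path from the root to the node.
    """
    root = {}
    labels = []
    for i in range(1, len(sequence) + 1):
        word = sequence[:i][::-1]  # reversed prefix U_i ... U_1
        node = root
        for depth, letter in enumerate(word, start=1):
            if letter not in node:
                # First free position on the path: create the node here.
                node[letter] = {}
                labels.append(word[:depth])
                break
            node = node[letter]
    return root, labels


tree, labels = cgr_tree("GAGCACAGTGGAAGGG")
print(labels)
```

Each insertion creates exactly one node, so the tree built from the first n letters has n nodes besides the root.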


    Example

    Construction of the tree for U = GAGCACAGTGGAAGGG.


    [Figure] Representation of 16 nucleotides of Mus musculus, GAGCACAGTGGAAGGG, in the CGR-tree (on the left) and in the normalized CGR (on the right).


    Remarks

    A CGR-tree without its labels is equivalent to a list of words in the sequence without their order.

    The shape of the CGR-tree gives a representation in the unit square. Each node $w = w_1 \dots w_d$ of the tree is associated with the point

    $$X_w \overset{\text{def}}{=} \sum_{k=1}^{d} \frac{w_k}{2^{d-k+1}} + \frac{X_0}{2^d},$$

    the center of the corresponding square $S_w$.
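The formula for the point attached to a node can be evaluated directly. The sketch below is illustrative only; `node_point` is a name we introduce, and letters are identified with the corners of the unit square as in the definition of the CGR.

```python
from fractions import Fraction

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def node_point(word):
    """Point X_w attached to the node w = w_1 ... w_d."""
    d = len(word)
    # The term X_0 / 2^d with X_0 = (1/2, 1/2).
    x = Fraction(1, 2 ** (d + 1))
    y = Fraction(1, 2 ** (d + 1))
    for k, letter in enumerate(word, start=1):
        cx, cy = CORNERS[letter]
        weight = 2 ** (d - k + 1)
        x += Fraction(cx, weight)
        y += Fraction(cy, weight)
    return x, y


print(node_point("G"))   # center of the quarter of the square next to G
print(node_point("AG"))
```

For a one-letter word the result is the center of the corresponding quarter of the unit square, for instance (3/4, 3/4) for G.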


    [Figure] Chaos Game Representation (on the left) and normalized CGR (on the right) of the first 400000 nucleotides of Chromosome 2 of Homo sapiens.


    Relation between the CGR-tree and DST

    Proposition

    The CGR-tree of a random sequence $U = U_1 U_2 \dots$ is a Digital Search Tree (DST), obtained by inserting in a quaternary tree the successive reversed prefixes

    $$W(1) = U_1, \quad W(2) = U_2 U_1, \quad \dots, \quad W(n) = U_n U_{n-1} \dots U_1, \quad \dots$$


    State of the art

    In the Bernoulli model, the trees are binary and built with independent successive sequences having the same distribution; the two letters have the same probability 1/2. Several results are known (see chap. 6 in Mahmoud (1992)) concerning the height, the insertion depth and the profile.

    Aldous and Shields (1988) prove by embedding in continuous time that the height satisfies

    $$H_n - \log_2 n \xrightarrow[n \to \infty]{P} 0.$$

    The height is concentrated (Drmota (2002)).

    For DSTs built from independent sequences on an alphabet with m letters, with nonsymmetric i.i.d. or Markovian sources, Pittel (1985) gets asymptotic results on the insertion depth and on the height.


    Theorem (Pittel, 1985)

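    Let us denote by $\ell_n$ (resp. $L_n$) the length of the shortest (resp. longest) branches. Then we have:

    $$\frac{\ell_n}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{h_+}, \quad \text{and} \quad \frac{L_n}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{h_-}.$$

    Moreover, in probability:

    $$\frac{D_n}{\ln n} \xrightarrow[n \to \infty]{P} \frac{1}{h},$$

    where $h_+$, $h_-$ and $h$ are some constants depending on the distribution of the source.

    In the CGR-tree, the successive inserted words are strongly dependent on each other.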


    The overlapping structure

    The main difficulty is the strong dependency between the words inserted in the CGR-tree, due to their overlapping structure.

    We need classical results on the distribution of word occurrences in a random sequence:

    Generating functions

    Markov chain embedding methods

    Martingale approach (Penney game)


    Main Results


    Assumptions and notations (1)

    $U = U_1 \dots U_n \dots$ is assumed to be a Markov chain of order 1, with transition matrix Qt and invariant measure as initial distribution.

    Let us denote $s^{(j)} \overset{\text{def}}{=} s_1 \dots s_j$, where $s_i$ denotes the i-th letter of the infinite sequence $s$.

    $p(s^{(j)})$ can be defined as $p(s^{(j)}) \overset{\text{def}}{=} P(U_1 = s_j, \dots, U_j = s_1)$.
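To make the reversed reading order in this definition concrete, here is a small sketch computing the probability of a word under an order-1 Markov chain. It rests on assumptions that are ours: `transition[a][b]` stands for P(U_{m+1} = b | U_m = a), `initial` is taken to be the invariant distribution, and the numerical values are hypothetical examples, not data from the talk.

```python
from fractions import Fraction


def word_probability(word, initial, transition):
    """Probability P(U_1 = s_j, ..., U_j = s_1) for word = s_1 ... s_j.

    Assumes transition[a][b] = P(U_{m+1} = b | U_m = a) and that `initial`
    is the invariant distribution of the chain.
    """
    reversed_word = word[::-1]  # U_1 = s_j, ..., U_j = s_1
    prob = initial[reversed_word[0]]
    for a, b in zip(reversed_word, reversed_word[1:]):
        prob *= transition[a][b]
    return prob


# Hypothetical example: uniform transitions, whose invariant measure is uniform.
letters = "ACGT"
uniform_initial = {a: Fraction(1, 4) for a in letters}
uniform_transition = {a: {b: Fraction(1, 4) for b in letters} for a in letters}
print(word_probability("GAT", uniform_initial, uniform_transition))  # (1/4)^3
```

The word is reversed before the product is taken because the tree stores reversed prefixes: the first letter of a node label is the most recent letter of the sequence.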


    Assumptions and notations (2)

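    We define the constants

    $$h_+ \overset{\text{def}}{=} \lim_{n \to +\infty} \frac{1}{n} \max \left\{ \ln \frac{1}{p(s^{(n)})} : p(s^{(n)}) > 0 \right\},$$

    $$h_- \overset{\text{def}}{=} \lim_{n \to +\infty} \frac{1}{n} \min \left\{ \ln \frac{1}{p(s^{(n)})} : p(s^{(n)}) > 0 \right\},$$

    $$h \overset{\text{def}}{=} \lim_{n \to +\infty} \frac{1}{n} E \left[ \ln \frac{1}{p(s^{(n)})} \right].$$

    Due to an argument of sub-additivity, these limits are well defined. Moreover, Pittel shows that there exist two infinite sequences, denoted here by $s_+$ and $s_-$, such that

    $$h_+ = \lim_{n \to \infty} \frac{1}{n} \ln \frac{1}{p(s_+^{(n)})}, \quad \text{and} \quad h_- = \lim_{n \to \infty} \frac{1}{n} \ln \frac{1}{p(s_-^{(n)})}.$$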
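In the special case of an i.i.d. source, the probability of a word is a product of letter probabilities and the three limits have closed forms: h_+ is ln(1/p_min), h_- is ln(1/p_max) and h is the Shannon entropy (in nats). The sketch below evaluates them; the function name `iid_constants` and the letter probabilities are hypothetical, chosen only for illustration.

```python
import math


def iid_constants(probs):
    """Return (h_plus, h_minus, h) for an i.i.d. source with letter law `probs`."""
    positive = [p for p in probs.values() if p > 0]
    h_plus = math.log(1 / min(positive))   # least likely words: rarest letter repeated
    h_minus = math.log(1 / max(positive))  # most likely words: commonest letter repeated
    h = -sum(p * math.log(p) for p in positive)  # Shannon entropy in nats
    return h_plus, h_minus, h


# Hypothetical non-uniform letter probabilities.
h_plus, h_minus, h = iid_constants({"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1})
print(h_plus, h_minus, h)
```

For the uniform distribution on four letters the three constants coincide and equal ln 4; otherwise h_- < h < h_+.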


    Assumptions and notations (3)

    $\mathcal{T}_j \overset{\text{def}}{=} \mathcal{T}_j(w)$: the finite tree with j nodes (without counting the root), built from the first j sequences $W(1), \dots, W(j)$, which are the successive suffixes of the reversed sequence $U^{(n)}$.

    $\ell_n$ (resp. $L_n$) denotes the length of the shortest path (resp. the longest path) from the root to a feasible external node of the tree $\mathcal{T}_{n-1}(w)$.

    $D_n$ denotes the insertion depth of $W(n)$ in $\mathcal{T}_{n-1}$ to build $\mathcal{T}_n$.

    $M_n$ is the length of a path of $\mathcal{T}_n$, randomly and uniformly chosen among the n possible paths.


    Asymptotic results

    Theorem

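    For a CGR-tree built on a Markovian sequence U of order 1,

    $$\frac{\ell_n}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{h_+} \quad \text{and} \quad \frac{L_n}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{h_-},$$

    $$\frac{D_n}{\ln n} \xrightarrow[n \to \infty]{P} \frac{1}{h} \quad \text{and} \quad \frac{M_n}{\ln n} \xrightarrow[n \to \infty]{P} \frac{1}{h}.$$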

    Remark

    For an i.i.d. sequence U, in the case when the random variables $U_i$ are not equiprobable, $D_n / \ln n$ does not converge a.s., since

    $$\limsup_{n \to \infty} \frac{D_n}{\ln n} \ge \frac{1}{h_-} > \frac{1}{h_+} = \liminf_{n \to \infty} \frac{D_n}{\ln n}.$$
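A rough way to get a feeling for these limits is to simulate an i.i.d. sequence, build its CGR-tree and compare the mean insertion depth divided by ln n with 1/h. The sketch below does this under our own assumptions (hypothetical letter probabilities, a fixed seed, n = 20000). Since the normalisation is only logarithmic, the agreement at this size should be expected to be approximate.

```python
import math
import random


def insertion_depths(seq):
    """Depth at which each reversed prefix U_i ... U_1 is inserted, i = 1..n."""
    root = {}
    depths = []
    for i in range(1, len(seq) + 1):
        node, depth = root, 0
        # Read U_i, U_{i-1}, ... until a free position is found.
        while True:
            letter = seq[i - 1 - depth]
            depth += 1
            if letter not in node:
                node[letter] = {}
                break
            node = node[letter]
        depths.append(depth)
    return depths


# Hypothetical non-uniform i.i.d. source.
weights = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}
rng = random.Random(0)
n = 20000
sequence = rng.choices(list(weights), weights=list(weights.values()), k=n)

depths = insertion_depths(sequence)
entropy = -sum(p * math.log(p) for p in weights.values())
mean_ratio = sum(depths) / n / math.log(n)  # empirical counterpart of M_n / ln n
print(mean_ratio, 1 / entropy)
print(max(depths) / math.log(n))  # largest insertion depth, normalised
```

The loop never reads before the start of the sequence: the tree holds i - 1 nodes before the i-th insertion, so a free position is always found after at most i letters.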


    Numerical experiments


    Guidelines for the proofs

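    We define, for a deterministic infinite sequence $s$, the random variables

    $$X_j(s) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } s_1 \text{ is not in } \mathcal{T}_j, \\ \max\{k : \text{the word } s^{(k)} \text{ is already inserted in } \mathcal{T}_j\} & \text{otherwise,} \end{cases}$$

    $$T_k(s) \overset{\text{def}}{=} \min\{j : X_j(s) = k\}.$$

    $X_j(s)$ and $T_k(s)$ are in duality: $\{X_j(s) \ge k\} = \{T_k(s) \le j\}$.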
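The two quantities and their duality can be checked on a small example. The sketch below is an illustration under our own conventions: the infinite sequence s is replaced by a finite prefix, the tree after j insertions is stored as a set of node labels, and the names `tree_snapshots`, `x_value` and `t_value` are hypothetical.

```python
def tree_snapshots(sequence):
    """snapshots[j] = set of node labels of the tree after j insertions."""
    root, labels, snapshots = {}, set(), [set()]
    for i in range(1, len(sequence) + 1):
        word = sequence[:i][::-1]  # reversed prefix U_i ... U_1
        node = root
        for depth, letter in enumerate(word, start=1):
            if letter not in node:
                node[letter] = {}
                labels.add(word[:depth])
                break
            node = node[letter]
        snapshots.append(set(labels))
    return snapshots


def x_value(nodes, s):
    """Largest k such that s_1 ... s_k is a node of the tree (0 if none)."""
    k = 0
    while k < len(s) and s[:k + 1] in nodes:
        k += 1
    return k


def t_value(snapshots, s, k):
    """First j with X_j(s) = k, or None if level k is not reached."""
    for j, nodes in enumerate(snapshots):
        if x_value(nodes, s) == k:
            return j
    return None


snapshots = tree_snapshots("GAGCACAGTGGAAGGG")
s = "GGGGGG"  # finite prefix standing in for the infinite sequence GGG...
print([x_value(nodes, s) for nodes in snapshots[1:]])
print([t_value(snapshots, s, k) for k in (1, 2, 3)])
```

Because each insertion adds a single node, X_j(s) grows by at most one per step, which is what makes T_k(s) a well-defined first passage time.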


    Lemma

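    Let $s$ be such that

    $$\lim_{n \to +\infty} \frac{1}{n} \ln \frac{1}{p(s^{(n)})} = h(s) > 0.$$

    Then we have

    $$\frac{X_n(s)}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{h(s)}.$$

    Corollary

    $$\frac{X_n(v)}{\ln n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{\ln \frac{1}{p}},$$

    where $p \overset{\text{def}}{=} P(U_i = v)$.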


    We decompose

    $$T_k(s) = \sum_{r=1}^{k} \left( T_r(s) - T_{r-1}(s) \right) \overset{\text{def}}{=} \sum_{r=1}^{k} Z_r(s).$$

    The random variables $(Z_r(s))_r$ are independent.

    The proofs are based on the generating functions of $Z_r(s)$.


    Perspectives

    Second order in the asymptotic behaviour

    Convergence in L1

    Central Limit Theorem
