Digital Search Trees and Chaos Game Representation
Peggy Cenac¹
in collaboration with Brigitte Chauvin², Nicolas Pouyanne² and Stephane Ginouillac²
Journees ALEA 2006 - CIRM Luminy
¹ INRIA Rocquencourt
² University of Versailles Saint Quentin
Peggy Cenac Digital Search Trees and Chaos Game Representation
Plan
1 Chaos Game Representation (CGR)
    Definition
    Stochastic properties of the CGR
2 Digital Search Tree (DST) and CGR
    The CGR-tree
    Construction of the CGR-tree
    Example
    Relation between the CGR-tree and DST
    State of the art
3 Main Results
    Assumptions and notations
    Asymptotic results
    Numerical experiments
    Guidelines for the proofs
4 Perspectives
Chaos Game Representation (CGR)
Definition
Graphical representation of DNA in a bounded set.
Storage tool.
Pattern visualization.
Sequence comparison (local/global).
Iterative mapping technique.
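DNA sequence $U = (u_i)_{i=1,\dots,n}$, where $u_i \in \{A, C, G, T\}$. The Chaos Game Representation of $U$ on the unit square $S$ is the sequence $\{X_0, \dots, X_n\}$ defined by
$$X_0 = \left(\tfrac12, \tfrac12\right), \qquad X_{i+1} = \tfrac12\left(X_i + u_{i+1}\right),$$
where each letter is identified with a corner of the square: $A = (0, 0)$, $C = (0, 1)$, $G = (1, 1)$, $T = (1, 0)$.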
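The recursion can be sketched in a few lines of Python. This is a minimal illustration under the corner assignment above, not the authors' code: the helper name `cgr_points` is hypothetical, and exact `fractions.Fraction` arithmetic is a choice made here so that every point is an exact dyadic rational.

```python
from fractions import Fraction

# Corner assignment of the Definition slide: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {
    "A": (Fraction(0), Fraction(0)),
    "C": (Fraction(0), Fraction(1)),
    "G": (Fraction(1), Fraction(1)),
    "T": (Fraction(1), Fraction(0)),
}


def cgr_points(seq):
    """Return [X_0, X_1, ..., X_n] for a DNA word seq (hypothetical helper name)."""
    x = (Fraction(1, 2), Fraction(1, 2))  # X_0 = (1/2, 1/2), the center of S
    points = [x]
    for letter in seq:
        cx, cy = CORNERS[letter]
        # X_{i+1} = (X_i + u_{i+1}) / 2: move halfway towards the corner of u_{i+1}
        x = ((x[0] + cx) / 2, (x[1] + cy) / 2)
        points.append(x)
    return points


if __name__ == "__main__":
    # The word of Examples (1)
    for px, py in cgr_points("ATGCGAGTGT"):
        print(float(px), float(py))
```

For instance, the word AT gives $X_1 = (1/4, 1/4)$ and $X_2 = (5/8, 1/8)$.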
Examples (1)
CGR of the word ATGCGAGTGT.
Examples (2)
CGR of 200000 nucleotides of chromosome 2 of Homo sapiens (on the left) and of Bacteroides thetaiotaomicron (on the right).
Figure: the unit square with corners A(0,0), C(0,1), T(1,0), G(1,1), divided first into the four subsquares $S_a$, $S_c$, $S_t$, $S_g$, then into the sixteen subsquares $S_{aa}, S_{ac}, \dots, S_{gg}$.
$$S_w \overset{\text{def}}{=} \sum_{k=1}^{i} \frac{1}{2^{i-k+1}}\, v_k + \frac{1}{2^i}\, S,$$
where $w$ is the word $v_1 \dots v_i$.
Counting points in $S_w$ $\iff$ counting occurrences of $w$.
Each point contains the whole history of the sequence.
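The equivalence between points in $S_w$ and occurrences of $w$ can be checked numerically. The sketch below is an illustration with hypothetical helper names (`square`, `points_in_square`, `occurrences`); it counts points strictly inside $S_w$. With that convention, the points $X_0, \dots, X_{|w|-1}$, which lie on the boundary lines of the level-$|w|$ squares, are not counted, and exactly the points ending an occurrence of $w$ remain.

```python
from fractions import Fraction
from itertools import product

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """X_0, ..., X_n of the chaos game, in exact dyadic arithmetic."""
    x, y = Fraction(1, 2), Fraction(1, 2)
    pts = [(x, y)]
    for letter in seq:
        cx, cy = CORNERS[letter]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts.append((x, y))
    return pts


def square(w):
    """Lower-left corner and side of S_w = sum_k v_k / 2^(i-k+1) + S / 2^i, w = v_1...v_i."""
    i = len(w)
    ax = sum(Fraction(CORNERS[v][0], 2 ** (i - k + 1)) for k, v in enumerate(w, start=1))
    ay = sum(Fraction(CORNERS[v][1], 2 ** (i - k + 1)) for k, v in enumerate(w, start=1))
    return ax, ay, Fraction(1, 2 ** i)


def points_in_square(pts, w):
    # Coordinates of X_n are odd multiples of 2^-(n+1): for n >= |w| the point lies
    # strictly inside a square of level |w|, while X_0, ..., X_{|w|-1} lie on grid
    # lines of that level, so strict inequalities count only X_{|w|}, ..., X_n.
    ax, ay, side = square(w)
    return sum(1 for x, y in pts if ax < x < ax + side and ay < y < ay + side)


def occurrences(seq, w):
    """Number of (possibly overlapping) occurrences of w in seq."""
    return sum(1 for j in range(len(seq) - len(w) + 1) if seq[j:j + len(w)] == w)


if __name__ == "__main__":
    seq = "GAGCACAGTGGAAGGG"
    pts = cgr_points(seq)
    for length in (1, 2, 3):
        for letters in product("ACGT", repeat=length):
            w = "".join(letters)
            assert points_in_square(pts, w) == occurrences(seq, w)
    print("points in S_w match occurrences of w for all |w| <= 3")
```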
Stochastic properties of the CGR
U is supposed to be a stationary ergodic sequence.
$(X_n)_{n \ge 0}$ is a Markov chain of order 1 and converges almost surely to a random vector $X$ with distribution $\mu$.
When $U$ is i.i.d. and uniformly distributed, $\mu$ is the Lebesgue measure on $S$. Whenever $U$ is not uniformly distributed, $\mu$ is continuous and singular with respect to the Lebesgue measure.
The law of large numbers holds, and the empirical measures converge.
Digital Search Tree (DST) and CGR
The CGR-tree
In the CGR of a sequence $U = U_1 \dots U_i \dots$, one successively represents $U_1$, $U_1 U_2$, ..., $U_1 U_2 \dots U_n \overset{\text{def}}{=} U^{(n)}$.
$U^{(i)} \in S_w$ is equivalent to $U_{i-|w|+1} \dots U_{i-1} U_i = w$.
We define a representation of a DNA sequence $U$ as a quaternary tree, the CGR-tree, in which one can visualize repetitions of subwords.
Construction
We adopt the classical order (A, C, G, T) on letters.
Let $\mathcal{T}$ be the complete infinite 4-ary tree. Each node of $\mathcal{T}$ has 4 branches corresponding to the letters (A, C, G, T), ordered in the same way.
The CGR-tree of $U$ is an increasing sequence $T_1 \subset T_2 \subset \dots \subset T_n \subset \dots$ of finite subtrees of $\mathcal{T}$, each $T_n$ having $n$ nodes. Successively insert the reverted words $U_i \dots U_1$.
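A minimal sketch of this construction, assuming the digital-search-tree rule stated in the Proposition on the relation between the CGR-tree and DST: each reverted word $W(n) = U_n \dots U_1$ descends along its own letters and occupies the first free node. Storing nodes as path strings in a Python set is an illustrative choice, and `cgr_tree` is a hypothetical name.

```python
def cgr_tree(seq):
    """Build the CGR-tree of seq by digital-search-tree insertion (hypothetical helper).

    Nodes are stored as root-to-node paths (strings over A, C, G, T).
    Returns the set of nodes and the insertion depths D_1, ..., D_n.
    """
    nodes = set()
    depths = []
    for n in range(1, len(seq) + 1):
        # W(n) = U_n ... U_1; its length-d prefix is the reversal of seq[n-d:n].
        d = 1
        while seq[n - d:n][::-1] in nodes:
            d += 1  # descend while the node already exists
        # The first missing node along W(n) becomes the n-th node of the tree.
        # d never exceeds n: T_{n-1} has n - 1 nodes, hence depth at most n - 1.
        nodes.add(seq[n - d:n][::-1])
        depths.append(d)
    return nodes, depths


if __name__ == "__main__":
    nodes, depths = cgr_tree("GAGCACAGTGGAAGGG")
    print(sorted(nodes, key=lambda w: (len(w), w)))
    print(depths)
```

On the Mus musculus word of the Example, this sketch builds a tree with 16 nodes and height 3.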
Example
Construction of the tree for $U$ = GAGCACAGTGGAAGGG:
Representation of 16 nucleotides of Mus musculus (GAGCACAGTGGAAGGG) in the CGR-tree (on the left) and in the normalized CGR (on the right).
Remarks
A CGR-tree without its labels is equivalent to a list of the words of the sequence, without their order.
Shape of the CGR-tree $\iff$ representation in the unit square. Each node $w = w_1 \dots w_d$ of the tree is associated with the point
$$X_w \overset{\text{def}}{=} \sum_{k=1}^{d} \frac{w_k}{2^{d-k+1}} + \frac{X_0}{2^d},$$
the center of the corresponding square $S_w$.
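The formula for $X_w$ is the chaos-game recursion of the Definition slide run on the word $w$ from $X_0$. The sketch below checks this, and the center property, on a few words; the helper names are hypothetical and exact `Fraction` arithmetic is an illustrative choice.

```python
from fractions import Fraction

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
X0 = (Fraction(1, 2), Fraction(1, 2))


def node_point(w):
    """X_w = sum_{k=1}^d w_k / 2^(d-k+1) + X_0 / 2^d for a node w = w_1...w_d."""
    d = len(w)
    x = sum(Fraction(CORNERS[c][0], 2 ** (d - k + 1)) for k, c in enumerate(w, start=1))
    y = sum(Fraction(CORNERS[c][1], 2 ** (d - k + 1)) for k, c in enumerate(w, start=1))
    return x + X0[0] / 2 ** d, y + X0[1] / 2 ** d


def chaos_game_endpoint(w):
    """Last point of the chaos game X_{i+1} = (X_i + w_{i+1}) / 2 started at X_0."""
    x, y = X0
    for c in w:
        x, y = (x + CORNERS[c][0]) / 2, (y + CORNERS[c][1]) / 2
    return x, y


if __name__ == "__main__":
    for w in ["A", "AC", "GGA"]:
        print(w, node_point(w), chaos_game_endpoint(w))
```

For example, $S_{AC}$ has lower-left corner $(0, 1/2)$ and side $1/4$, and `node_point("AC")` returns its center $(1/8, 5/8)$.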
Chaos Game Representation (on the left) and normalized CGR (on the right) of the first 400000 nucleotides of chromosome 2 of Homo sapiens.
Relation between the CGR-tree and DST
Proposition
The CGR-tree of a random sequence $U = U_1 U_2 \dots$ is a Digital Search Tree (DST), obtained by inserting in a quaternary tree the successive reverted prefixes
$$W(1) = U_1, \quad W(2) = U_2 U_1, \quad \dots, \quad W(n) = U_n U_{n-1} \dots U_1, \quad \dots$$
State of the art
In the Bernoulli model, the trees are binary and built from independent successive sequences having the same distribution; the two letters have the same probability $\frac12$. Several results are known (see Chap. 6 in Mahmoud (1992)) concerning the height, the insertion depth and the profile.
Aldous and Shields (1998) prove, by embedding in continuous time, that the height satisfies
$$H_n - \log_2 n \xrightarrow[n\to\infty]{P} 0.$$
The height is concentrated (Drmota (2002)).
For DSTs built from independent sequences on an alphabet with $m$ letters, with nonsymmetric i.i.d. or Markovian sources, Pittel (1985) obtains asymptotic results on the insertion depth and on the height.
Theorem (Pittel, 1985)
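Let us denote by $\ell_n$ (resp. $L_n$) the length of the shortest (resp. longest) branches; then
$$\frac{\ell_n}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{h_+} \quad\text{and}\quad \frac{L_n}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{h_-}.$$
Moreover, in probability,
$$\frac{D_n}{\ln n} \xrightarrow[n\to\infty]{P} \frac{1}{h}.$$
$h_+$, $h_-$ and $h$ are constants depending on the distribution of the source.
In the CGR-tree, the successively inserted words are strongly dependent on each other.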
The overlapping structure
The main difficulty is the strong dependency between the words inserted in the CGR-tree, due to their overlapping structure.
We need classical results on the distribution of word occurrences in a random sequence:
generating functions,
Markov chain embedding methods,
martingale approach (Penney game).
Main Results
Assumptions and notations (1)
$U = U_1 \dots U_n$ is supposed to be a Markov chain of order 1, with transition matrix $Q^t$ and its invariant measure as initial distribution.
Let us denote $s^{(j)} \overset{\text{def}}{=} s_1 \dots s_j$, where $s_i$ denotes the $i$-th letter of the infinite sequence $s$.
$p(s^{(j)})$ can then be defined as $p(s^{(j)}) \overset{\text{def}}{=} P(U_1 = s_j, \dots, U_j = s_1)$.
Assumptions and notations (2)
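We define the constants
$$h_+ \overset{\text{def}}{=} \lim_{n\to+\infty} \frac1n \max\left\{ \ln\frac{1}{p(s^{(n)})} : p(s^{(n)}) > 0 \right\},$$
$$h_- \overset{\text{def}}{=} \lim_{n\to+\infty} \frac1n \min\left\{ \ln\frac{1}{p(s^{(n)})} : p(s^{(n)}) > 0 \right\},$$
$$h \overset{\text{def}}{=} \lim_{n\to+\infty} \frac1n\, \mathbb{E}\left[ \ln\frac{1}{p(s^{(n)})} \right].$$
Due to a sub-additivity argument, these limits are well defined. Moreover, Pittel shows that there exist two infinite sequences, denoted here by $s_+$ and $s_-$, such that
$$h_+ = \lim_{n\to\infty} \frac1n \ln\frac{1}{p(s_+^{(n)})} \quad\text{and}\quad h_- = \lim_{n\to\infty} \frac1n \ln\frac{1}{p(s_-^{(n)})}.$$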
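For a concrete chain, the three constants can be approximated numerically. The sketch below assumes an irreducible, aperiodic chain with a hypothetical transition matrix, writing $Q_{ij}$ for the probability of moving from letter $i$ to letter $j$ and $\pi$ for the invariant measure (notation introduced for this sketch). It approximates $h_+$ and $h_-$ at a finite $n$ with a max-plus / min-plus dynamic program over words (a standard technique, not taken from the slides), and computes $h$ as the entropy rate $-\sum_i \pi_i \sum_j Q_{ij} \ln Q_{ij}$ of the stationary chain.

```python
import math


def stationary(Q, iters=10_000):
    """Invariant measure by power iteration (assumes an irreducible, aperiodic chain)."""
    m = len(Q)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * Q[i][j] for i in range(m)) for j in range(m)]
    return pi


def extreme_rates(Q, n=2000):
    """Approximate (h_+, h_-) at a finite n by dynamic programming over words of length n.

    best_hi[j] (resp. best_lo[j]) is the largest (resp. smallest) value of ln 1/p(w)
    over words w with p(w) > 0 ending with letter j. Reading the words backwards,
    as in p(s^(j)), only relabels them, so the extrema are unchanged.
    """
    m = len(Q)
    pi = stationary(Q)
    best_hi = [-math.log(p) if p > 0 else -math.inf for p in pi]
    best_lo = [-math.log(p) if p > 0 else math.inf for p in pi]
    for _ in range(n - 1):
        best_hi = [
            max((best_hi[i] - math.log(Q[i][j]) for i in range(m)
                 if Q[i][j] > 0 and best_hi[i] > -math.inf), default=-math.inf)
            for j in range(m)
        ]
        best_lo = [
            min((best_lo[i] - math.log(Q[i][j]) for i in range(m)
                 if Q[i][j] > 0 and best_lo[i] < math.inf), default=math.inf)
            for j in range(m)
        ]
    return max(best_hi) / n, min(best_lo) / n


def entropy_rate(Q):
    """h = -sum_i pi_i sum_j Q_ij ln Q_ij, the entropy rate of the stationary chain."""
    pi = stationary(Q)
    m = len(Q)
    return -sum(pi[i] * Q[i][j] * math.log(Q[i][j])
                for i in range(m) for j in range(m) if Q[i][j] > 0)


if __name__ == "__main__":
    Q = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical two-letter transition matrix
    h_plus, h_minus = extreme_rates(Q)
    print(h_minus, entropy_rate(Q), h_plus)  # ordering h_- <= h <= h_+
```

For an i.i.d. source (all rows of `Q` equal to the letter probabilities), the definitions give $h_+ = \ln(1/\min_a p_a)$, $h_- = \ln(1/\max_a p_a)$ and $h = -\sum_a p_a \ln p_a$, which the sketch reproduces.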
Assumptions and notations (3)
$T_j \overset{\text{def}}{=} T_j(w)$: the finite tree with $j$ nodes (not counting the root), built from the first $j$ sequences $W(1), \dots, W(j)$, which are the successive suffixes of the reversed sequence $U_n \dots U_1$.
$\ell_n$ (resp. $L_n$) denotes the length of the shortest path (resp. the longest path) from the root to a feasible external node of the tree $T_{n-1}(w)$.
$D_n$ denotes the insertion depth of $W(n)$ in $T_{n-1}$ to build $T_n$.
$M_n$ is the length of a path of $T_n$, chosen randomly and uniformly among the $n$ possible paths.
Asymptotic results
Theorem
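For a CGR-tree built on a Markovian sequence $U$ of order 1,
$$\frac{\ell_n}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{h_+} \quad\text{and}\quad \frac{L_n}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{h_-},$$
$$\frac{D_n}{\ln n} \xrightarrow[n\to\infty]{P} \frac{1}{h} \quad\text{and}\quad \frac{M_n}{\ln n} \xrightarrow[n\to\infty]{P} \frac{1}{h}.$$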
Remark
For an i.i.d. sequence $U$, when the random variables $U_i$ are not equiprobable, $\frac{D_n}{\ln n}$ does not converge almost surely, since
$$\limsup_{n\to\infty} \frac{D_n}{\ln n} \ge \frac{1}{h} > \frac{1}{h_+} = \liminf_{n\to\infty} \frac{D_n}{\ln n}.$$
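The theorem can be illustrated by a rough simulation. This sketch assumes an i.i.d. source (a special case of an order-1 Markov chain) with hypothetical letter probabilities; in that case $h_+ = \ln(1/\min_a p_a)$, $h_- = \ln(1/\max_a p_a)$ and $h = -\sum_a p_a \ln p_a$ follow directly from the definitions. The statistics are approximations: $\ell_n$ and $L_n$ are taken as the depth of the shallowest missing node and the height, up to a $\pm 1$ convention on external nodes, and the uniformly chosen path of $M_n$ is identified with a uniformly chosen node (an assumption). Since the normalization is by $\ln n$, the ratios at moderate $n$ need not be close to their limits.

```python
import math
import random


def simulate(n, probs, seed=0):
    """Grow the CGR-tree of an i.i.d. sequence of length n and return branch statistics.

    probs are hypothetical probabilities for A, C, G, T (all > 0, so every
    external node is feasible).
    """
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", weights=probs, k=n))
    nodes, depths = set(), []
    for m in range(1, n + 1):
        d = 1
        while seq[m - d:m][::-1] in nodes:  # prefixes of W(m) = U_m ... U_1
            d += 1
        nodes.add(seq[m - d:m][::-1])
        depths.append(d)
    per_level = {}
    for w in nodes:
        per_level[len(w)] = per_level.get(len(w), 0) + 1
    # Shallowest missing node: first level k that is not full (4^k nodes).
    shallowest = 1
    while per_level.get(shallowest, 0) == 4 ** shallowest:
        shallowest += 1
    return {
        "nodes": len(nodes),
        "shallowest_missing": shallowest,  # ~ l_n, up to a +-1 convention
        "height": max(depths),             # ~ L_n, up to a +-1 convention
        "mean_depth": sum(depths) / n,     # mean of M_n if a path is a uniform node
    }


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.1]
    n = 20_000
    h_plus = math.log(1 / min(probs))
    h_minus = math.log(1 / max(probs))
    h = -sum(p * math.log(p) for p in probs)
    stats = simulate(n, probs)
    ln_n = math.log(n)
    print("l_n / ln n:", stats["shallowest_missing"] / ln_n, "vs 1/h_+ =", 1 / h_plus)
    print("L_n / ln n:", stats["height"] / ln_n, "vs 1/h_- =", 1 / h_minus)
    print("mean depth / ln n:", stats["mean_depth"] / ln_n, "vs 1/h =", 1 / h)
```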
Numerical experiments
Guidelines for the proofs
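We define, for a deterministic infinite sequence $s$, the random variables
$$X_j(s) \overset{\text{def}}{=} \begin{cases} 0 & \text{if } s_1 \text{ is not in } T_j, \\ \max\{k : \text{the word } s^{(k)} \text{ is already inserted in } T_j\} & \text{otherwise,} \end{cases}$$
$$T_k(s) \overset{\text{def}}{=} \min\{j : X_j(s) = k\}.$$
$X_j(s)$ and $T_k(s)$ are in duality: $\{X_j(s) \ge k\} = \{T_k(s) \le j\}$.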
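The quantities $X_j(s)$ and $T_k(s)$ can be read off while the tree grows. In this sketch (hypothetical helper names `branch_profile` and `hitting_times`), the infinite sequence $s$ is replaced by a finite prefix assumed long enough. Because each insertion adds one node to a prefix-closed tree, $X_j(s)$ increases by at most one per step, so every level up to its maximum is hit and the duality can be checked directly.

```python
def branch_profile(seq, s):
    """Return [X_1(s), ..., X_n(s)] while growing the CGR-tree of seq.

    s is a finite prefix of the deterministic infinite sequence (assumed long enough);
    X_j(s) is the largest k such that s^(k) is a node of T_j (0 if s_1 is absent).
    """
    nodes, profile, x = set(), [], 0
    for n in range(1, len(seq) + 1):
        d = 1
        while seq[n - d:n][::-1] in nodes:  # descend along W(n) = U_n ... U_1
            d += 1
        nodes.add(seq[n - d:n][::-1])
        # The tree only grows, so X_j(s) is found by extending the previous value.
        while x < len(s) and s[:x + 1] in nodes:
            x += 1
        profile.append(x)
    return profile


def hitting_times(profile):
    """T_k(s) = min{j : X_j(s) = k} for every level k >= 1 reached by the profile."""
    times = {}
    for j, x in enumerate(profile, start=1):
        if x >= 1:
            times.setdefault(x, j)
    return times


if __name__ == "__main__":
    profile = branch_profile("GAGCACAGTGGAAGGG", "G" * 16)
    times = hitting_times(profile)
    print(profile)
    print(times)
    # Duality {X_j(s) >= k} = {T_k(s) <= j}, checked on every (j, k) seen here.
    for j, x in enumerate(profile, start=1):
        for k in times:
            assert (x >= k) == (times[k] <= j)
```

With $s = GGG\dots$ on the Example word, the branch along $s$ reaches depths 1, 2 and 3 at steps 1, 11 and 16.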
Lemma
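Let $s$ be such that
$$\lim_{n\to+\infty} \frac1n \ln\frac{1}{p(s^{(n)})} = h(s) > 0.$$
Then
$$\frac{X_n(s)}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{h(s)}.$$
Corollary
$$\frac{X_n(v)}{\ln n} \xrightarrow[n\to\infty]{a.s.} \frac{1}{\ln\frac{1}{p}},$$
where $p \overset{\text{def}}{=} P(U_i = v)$.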
We decompose
$$T_k(s) = \sum_{r=1}^{k} \big(T_r(s) - T_{r-1}(s)\big) \overset{\text{def}}{=} \sum_{r=1}^{k} Z_r(s).$$
The random variables $(Z_r(s))_r$ are independent. The proofs are based on the generating functions of $Z_r(s)$.
Perspectives
Second order in the asymptotic behaviour
Convergence in $L^1$
Central Limit Theorem