MTMS.02.056 Algorithmic Information Theory
What are random sequences?
Sven Laur, University of Tartu
Historical perspective
Three dominant ways to view probability
[Diagram: the notions Fair Price, Knowledge, and the Long Run arranged in a cycle around Probability, with Bayesianism, Frequentism, and Mathematical Statistics as the corresponding interpretations.]
- Three notions in the graph form a vicious cycle.
- Depending on the starting point we get different interpretations.
- Each of them has its own application area and weaknesses.
MTMS.02.056 Algorithmic Information Theory, What are random sequences?, October 4, 2013 1
Approximate time-line
Timeline from 1700 to 2000:

- 1713  J. Bernoulli: Ars Conjectandi
- 1764  T. Bayes: Bayes' theorem
- 1774  P. Laplace: Bayes' theorem
- 1810  P. Laplace: Central limit theorem
- 1834  A. Cournot: Finite frequentism
- 1900  K. Pearson: χ²-test
- 1919  R. von Mises: Kollektivs
- 1921  M. Keynes: Logical probability
- 1931  F. Ramsey: Subjective probability
- 1933  A. Kolmogorov: Grundbegriffe
- 1937  de Finetti: Coherence principle
- 1954  Savage: Subjective utility
- 1969  P. Martin-Löf: Random sequences
Kolmogorov’s neat axiomatisation of probability as a measure tipped the balance, and mathematical statistics quickly became the dominant school.

- It took decades for the other interpretations to return.
- The resurrection of von Mises’ theory of kollektivs is particularly interesting.
Kollektivs as a way to define randomness
In 1919 von Mises postulated what a random sequence is. Probability is a property of an infinite sequence. A sequence x ∈ {0, 1}^∞ is a collective if it satisfies the following conditions.

- The relative frequency of ones has a limiting value p(x).
- For any admissible subsequence x′, the corresponding relative frequency must converge to p(x).
- A subsequence is admissible if it is chosen by a method that uses only the values x_1, …, x_i to decide whether to take x_{i+1} or not.
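To make admissible selection concrete, here is a small Python sketch; the rule `after_zero` is a hypothetical example, not from the slides. It shows that the alternating sequence 1010… is not a collective: an admissible rule extracts a subsequence whose relative frequency is 1, not the overall frequency 1/2.

```python
from fractions import Fraction

def select_subsequence(x, phi):
    """Apply a place-selection rule phi to a finite sequence x.

    phi(prefix) may look only at the values x_1, ..., x_i and decides
    whether the next element x_{i+1} is taken into the subsequence.
    """
    return [x[i] for i in range(1, len(x)) if phi(x[:i])]

# Hypothetical admissible rule: take the element that follows every 0.
after_zero = lambda prefix: prefix[-1] == 0

x = [1, 0, 1, 0, 1, 0, 1, 0]            # the alternating sequence
sub = select_subsequence(x, after_zero)
print(sub)                               # [1, 1, 1]
print(Fraction(sum(sub), len(sub)))      # 1, far from the overall frequency 1/2
```

The rule is admissible because it never peeks at the element it is deciding about, only at the prefix before it.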
Formal definition
A sequence x = (x_i)_{i=1}^∞ is a uniformly random collective if for any set of selection functions φ_n : {0, 1}^n → {0, 1} the subsequence x′ = (x_i)_{i∈I} with I = {n + 1 : φ_n(x_1, …, x_n) = 1} has the limiting frequency

    p(x′) = lim_{n→∞} (1/n) · Σ_{k=1}^{n} x_{i_k} = 1/2 .
Theorem (Kamke 1932). There are no uniformly random collectives.

Proof. Fix a sequence x and define φ_n(·) = x_{n+1}. Then (φ_n)_{n=1}^∞ is an admissible family of selection functions, but p(x′) = 1.
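Kamke's counterexample is easy to replay in code: for any fixed sequence, the rule φ_n(·) = x_{n+1} selects exactly the positions whose value is 1, so the selected subsequence consists of ones only. A minimal sketch:

```python
import random

random.seed(0)
x = [random.randint(0, 1) for _ in range(1000)]      # any fixed sequence

# Kamke's rule is built from the sequence itself: phi_n(...) = x_{n+1},
# i.e. position n + 1 is selected exactly when x_{n+1} = 1.
selected = [x[n] for n in range(1, len(x)) if x[n] == 1]

print(all(b == 1 for b in selected))   # True: the subsequence is all ones
```

Each φ_n is a constant function of the prefix, so the family is formally admissible; the theorem shows that quantifying over all uncountably many such families is too strong, which motivates the restriction to countable families below.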
Mises-Wald-Church random sequences
Theorem (Wald 1936). Let us consider a countable set of place selection functions H = (A_ℓ : {0, 1}* → {0, 1})_{ℓ=1}^∞ which for any sequence (x_i)_{i=1}^∞ defines the set of selected elements as follows:

    I_ℓ(x) = {n + 1 : A_ℓ(x_1 … x_n) = 1}

Then the set of sequences (x_i)_{i=1}^∞ such that

    ∀ℓ ∈ N : p((x_i)_{i∈I_ℓ}) = 1/2            (1)

is uncountable. If an infinite sequence is generated by flipping a fair coin, the outcome (x_i)_{i=1}^∞ will satisfy condition (1) with probability 1.
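The probability-1 part of the theorem can be illustrated empirically; the three rules below are hypothetical examples of computable place selections:

```python
import random

random.seed(2)
N = 200_000
x = [random.randint(0, 1) for _ in range(N)]          # fair-coin flips

# Each rule decides about x_{n+1} from n and the prefix x_1..x_n only;
# these particular rules need just n and the last bit, so each step is O(1).
rules = {
    "select all":    lambda x, n: True,
    "after a one":   lambda x, n: x[n - 1] == 1,
    "odd positions": lambda x, n: n % 2 == 1,
}
freqs = {}
for name, rule in rules.items():
    sel = [x[n] for n in range(1, N) if rule(x, n)]
    freqs[name] = sum(sel) / len(sel)
    print(f"{name}: {freqs[name]:.3f}")               # each close to 0.500
```

Of course, a finite simulation only hints at the limit statement; the theorem itself is about the limiting frequency along the selected subsequence.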
Fréchet’s objections
Theorem I (Ville 1936). For any countable set of selection functions H there exists a specific sequence (x_i)_{i=1}^∞ that passes the Mises-Wald criterion for randomness but for which the following condition holds:

    ∃k_0 ∈ N : ∀k > k_0 : (1/k) · Σ_{i=1}^{k} x_i ≥ 1/2 .

In other words, a gambler always wins if he or she plays long enough.
Theorem II (Ville 1936). For any countable set of selection functions H there exists a specific sequence (x_i)_{i=1}^∞ that passes the Mises-Wald criterion for randomness but for which the law of the iterated logarithm does not hold.
Mises-Wald-Church sequences: Ville’s construction
Construction target
Let A_1, A_2, …, A_ℓ, … be a countable set of place selection functions and let φ(x) be a continuously increasing positive function such that

    lim_{x→∞} φ(x)/x = 0   and   lim_{x→∞} φ(x) = ∞ ,

i.e., φ(x) is a sub-linear function. Then there exists a sequence (x_i)_{i=1}^∞ such that for any ℓ ∈ N there exist constants α_ℓ, β_ℓ > 0 such that

    ∀n ∈ N : −α_ℓ/n ≤ (1/n) · Σ_{k=1}^{n} x_{i_k} − 1/2 < α_ℓ/n + β_ℓ · φ(n)/n

for I_ℓ = (i_1, i_2, …, i_k, …), and thus the sequence satisfies the Mises-Wald criterion.
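A concrete sub-linear choice, purely as an illustration (the slide does not fix φ), is φ(x) = √x:

```latex
\varphi(x) = \sqrt{x}, \qquad
\lim_{x\to\infty}\frac{\varphi(x)}{x}
  = \lim_{x\to\infty}\frac{1}{\sqrt{x}} = 0,
\qquad
\lim_{x\to\infty}\varphi(x) = \infty .
```

With this choice the one-sided slack β_ℓ · φ(n)/n = β_ℓ/√n decays much more slowly than the symmetric term α_ℓ/n, which is exactly the asymmetry shown in the illustration below.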
Corresponding illustration
[Plot: relative frequency of the selected subsequence versus n, squeezed between the lower envelope 1/2 − α_ℓ/n and the upper envelope 1/2 + α_ℓ/n + β_ℓ · φ(n)/n; the vertical axis runs from 0 to 1.]

For each selection function, the frequency converges asymmetrically to 1/2.
High-level description of the construction
Direct operation with the selection functions A_1, A_2, …, A_ℓ, … is troublesome. Instead, we construct a new set of selection functions B_1, B_2, …, B_ℓ, … such that for our sequence (x_i)_{i=1}^∞ we can express

    A_ℓ = D_ℓ ∨ B_{2^ℓ+1} ∨ … ∨ B_{2^{ℓ+1}}

where A ∨ B denotes the disjunction of selections and D_ℓ is a selection function that selects only a finite number of indices.

- As a result, it is sufficient to guarantee that (x_i)_{i=1}^∞ is random with respect to the selection functions B_1, B_2, …, B_ℓ, ….
- The specific form of the selection functions B_ℓ makes it easy to construct the sequence (x_i)_{i=1}^∞.
Logical operations with selection functions
Conjunction and disjunction of two selection functions:

    ∀n ∈ N : (A ∧ B)(x_1, …, x_n) = A(x_1, …, x_n) ∧ B(x_1, …, x_n)
    ∀n ∈ N : (A ∨ B)(x_1, …, x_n) = A(x_1, …, x_n) ∨ B(x_1, …, x_n)

Power operation with a Boolean constant c ∈ {0, 1}:

    ∀n ∈ N : (A^c)(x_1, …, x_n) = A(x_1, …, x_n)   if c = 1,
                                   ¬A(x_1, …, x_n)  if c = 0.

Special clipping operator for m ∈ N:

    A^{(m)}(x_1, …, x_n) = A(x_1, …, x_n)  if Σ_{i=1}^{n} A(x_1, …, x_i) ≤ m,
                           0                if Σ_{i=1}^{n} A(x_1, …, x_i) > m.
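These operators translate directly into code. A Python sketch, treating a selection function as a predicate on prefixes (the helper names are mine, not from the slides):

```python
def conj(A, B):
    """(A ∧ B): select only where both rules select."""
    return lambda prefix: A(prefix) and B(prefix)

def disj(A, B):
    """(A ∨ B): select where either rule selects."""
    return lambda prefix: A(prefix) or B(prefix)

def power(A, c):
    """A^c: the rule itself for c = 1, its pointwise negation for c = 0."""
    return A if c == 1 else (lambda prefix: not A(prefix))

def clip(A, m):
    """A^(m): behave like A while A has fired at most m times on the
    prefixes x_1..x_i for i = 1..n; afterwards select nothing."""
    def clipped(prefix):
        fired = sum(A(prefix[:i]) for i in range(1, len(prefix) + 1))
        return A(prefix) if fired <= m else False
    return clipped

always = lambda prefix: True
two_only = clip(always, 2)
picks = [two_only([0] * n) for n in range(1, 6)]
print(picks)   # [True, True, False, False, False]
```

Note that the clipping count includes the current decision, matching the sum up to i = n in the definition above.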
List of orthogonal selection functions
We will construct a set of selection functions C_1, C_2, …, C_ℓ, … such that

- a sequence element x_i can be chosen by only a single selection function;
- each selection function C_ℓ selects at most m_ℓ indices;
- for a particular (x_i)_{i=1}^∞ there always exists some C_ℓ that chooses x_i.
Double recursive definition of selection operators
Let (y_i)_{i=1}^∞ be an arbitrary binary sequence and (m_i)_{i=1}^∞ a sequence of selection limits. Then (y_i)_{i=1}^∞ together with the initial selection functions A_1, A_2, …, A_ℓ, … defines a series of selection functions:

    B_{y_1} = A_1^{y_1}
    C_{y_1} = B_{y_1}^{(m_1)}

    B_{y_1 y_2} = ¬C_{y_1} ∧ A_1^{y_1} ∧ A_2^{y_2}
    C_{y_1 y_2} = B_{y_1 y_2}^{(m_2)}

    …

    B_{y_1…y_k} = ¬C_{y_1} ∧ … ∧ ¬C_{y_1…y_{k−1}} ∧ A_1^{y_1} ∧ … ∧ A_k^{y_k}
    C_{y_1…y_k} = B_{y_1…y_k}^{(m_k)}
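The double recursion can be transcribed almost literally into Python. The sketch below, with toy base rules of my own choosing, also checks the disjointness property proven two slides later: on every prefix, at most one C_y fires.

```python
import itertools
import random

def make_C(As, ms):
    """B_{y1..yk} = ¬C_{y1} ∧ … ∧ ¬C_{y1..y(k-1)} ∧ A_1^{y1} ∧ … ∧ A_k^{yk},
    C_{y1..yk}   = B_{y1..yk}^{(m_k)}  (clipped to at most m_k selections)."""
    def B(y, prefix):
        if any(C(y[:j], prefix) for j in range(1, len(y))):
            return False                      # some shorter C still fires
        # A_i^{y_i} fires exactly when A_i agrees with the bit y_i
        return all(As[i](prefix) == (y[i] == '1') for i in range(len(y)))

    def C(y, prefix):
        fired = sum(B(y, prefix[:i]) for i in range(1, len(prefix) + 1))
        return B(y, prefix) if fired <= ms[len(y) - 1] else False
    return C

# Toy base rules A_1, A_2, A_3 (hypothetical, for illustration only).
As = [lambda p, b=b: (len(p) + b) % 2 == 0 for b in range(3)]
C = make_C(As, ms=[1, 2, 2])

random.seed(1)
x = [random.randint(0, 1) for _ in range(8)]
strings = [''.join(s) for L in (1, 2, 3) for s in itertools.product('01', repeat=L)]
for n in range(1, len(x) + 1):
    firing = [y for y in strings if C(y, x[:n])]
    assert len(firing) <= 1                   # orthogonality of the C_y
print("disjointness holds on all prefixes")
```

The exponential bookkeeping makes this usable only for tiny examples, but that is enough to see the cascade: C_{y_1} fires first, and once it is clipped the next level B_{y_1 y_2} takes over.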
Corresponding illustration
Let the sequence of selection limits be m_1 = 1, m_2 = 2, m_3 = 2, ….

    index   1 2 3 4 5 6 7 …
    (x_i)   1 0 1 0 1 0 1 …

    A_1     1 0 0 1 1 1 1 …
    A_2     0 0 1 1 0 0 0 …
    A_3     1 1 0 1 1 0 1 …

    B_1     1 0 0 1 1 1 1 …
    B_10    0 0 0 0 1 1 1 …
    B_101   0 0 0 1 0 0 1 …

    C_1     1 0 0 0 0 0 0 …
    C_10    0 0 0 0 1 1 0 …
    C_101   0 0 0 0 0 0 1 …
    C_111   0 0 0 1 0 0 0 …
Properties of the construction
Lemma. Let (x_i)_{i=1}^∞ be a candidate sequence, let (y_i)_{i=1}^∞ and (z_i)_{i=1}^∞ be arbitrary binary sequences, and let (m_i)_{i=1}^∞ be a sequence of selection limits. Then two different selection functions C_{y_1…y_k} and C_{z_1…z_ℓ} cannot choose the same element x_i:

    ∀i ∈ N : ¬C_{y_1…y_k}(x_1, …, x_{i−1}) ∨ ¬C_{z_1…z_ℓ}(x_1, …, x_{i−1})
Proof.
- If there exists j ≤ k, ℓ such that y_j ≠ z_j, the conjunction terms A_j^{y_j} and A_j^{z_j} cannot be true at the same time.
- If y_1…y_k is a proper prefix of z_1…z_ℓ, then the conjunction term ¬C_{y_1…y_k} does the trick.
Properties of the construction
Lemma. Let (x_i)_{i=1}^∞ be a candidate sequence and let the selection limits (m_i)_{i=1}^∞ satisfy Σ_{i=1}^∞ m_i = ∞. Then for any index i there exists a finite prefix y_1…y_ℓ such that C_{y_1…y_ℓ} chooses x_i:

    C_{y_1…y_ℓ}(x_1, …, x_{i−1}) = 1

and the selection functions C_{y_1…y_k} corresponding to its proper prefixes select exactly m_k indices.
Proof.
- Let us choose the bits y_k such that A_k^{y_k}(x_1, …, x_{i−1}) = 1.
- Then B_{y_1…y_k}(x_1, …, x_{i−1}) = 1 exactly when none of C_{y_1}, …, C_{y_1…y_{k−1}} fires, i.e., when all shorter selection functions have already been clipped.
- As each C_{y_1…y_k} can select at most m_k indices, the levels are exhausted one after another, and since Σ m_i diverges some level C_{y_1…y_ℓ} eventually chooses x_i.
Construction of the sequence
High-level goal. We have countably many selection functions (C_y)_{y∈{0,1}⁺} defined in terms of A_1, A_2, …, A_ℓ, …. We have to find a sequence (x_i)_{i=1}^∞ that for these finite selection functions looks suitably random.

Let I_{y_1…y_k}(x_1, …, x_i) = {n + 1 : n < i ∧ C_{y_1…y_k}(x_1, …, x_n) = 1} be the set of indices selected by looking at the first i − 1 sequence elements. Then we would like to construct (x_i)_{i=1}^∞ such that for all i ∈ N and for all y ∈ {0, 1}⁺:

    |I_y(x_1, …, x_i)| / 2 ≤ Σ_{j∈I_y(x_1,…,x_i)} x_j < |I_y(x_1, …, x_i)| / 2 + 1
Inductive construction
Basis. Set x_1 = 1. Then the goal holds for all non-empty index sets I_y.

Induction step. Assume that the goal holds for the sequence prefix x_1, …, x_i and for all non-empty index sets I_{y_1…y_k}(x_1, …, x_i).

- As I_{y_1…y_k}(x_1, …, x_i, x_{i+1}) does not depend on x_{i+1}, we can find out into which set I_y(x_1, …, x_i, x_{i+1}) the index i + 1 falls.
- If |I_y(x_1, …, x_i, x_{i+1})| = 1 then set x_{i+1} = 1.
- Otherwise we must fix x_{i+1} such that

    (|I_y(x_1, …, x_i)| + 1) / 2 ≤ Σ_{j∈I_y(x_1,…,x_i)} x_j + x_{i+1} < (|I_y(x_1, …, x_i)| + 1) / 2 + 1

This is always possible.
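The induction step is easy to mechanise. The sketch below keeps, for every selection set, the invariant |I_y|/2 ≤ (number of ones) < |I_y|/2 + 1; the classifier standing in for the C_y machinery is a hypothetical toy, since any function of the prefix works:

```python
def build_sequence(classify, n):
    """classify(prefix) names the set I_y that the next index falls into;
    it may inspect only the prefix, mirroring the fact that the set does
    not depend on the bit x_{i+1} being chosen."""
    x, counts = [], {}                       # counts[y] = (size, ones)
    for _ in range(n):
        y = classify(x)
        size, ones = counts.get(y, (0, 0))
        # Target: ones' = ceil((size + 1) / 2), so the invariant
        # size'/2 <= ones' < size'/2 + 1 keeps holding.
        bit = 1 if ones < (size + 2) // 2 else 0
        x.append(bit)
        counts[y] = (size + 1, ones + bit)
    return x, counts

# Toy classifier: route the next index by the parity of ones seen so far.
seq, counts = build_sequence(lambda prefix: sum(prefix) % 2, 50)
for size, ones in counts.values():
    assert size <= 2 * ones < size + 2       # the construction goal
print(seq[:10])
```

The first element of every set is forced to 1 and each set's count of ones never drifts from one half by more than a single element, exactly as the slide requires.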
The first claim about randomness
Theorem. The constructed sequence (x_i)_{i=1}^∞ is Mises-Wald random with respect to the selection functions (B_y)_{y∈{0,1}⁺}.

Proof. Let J_{y_1…y_k}(x_1, …, x_i) = {n + 1 : n < i ∧ B_y(x_1, …, x_n) = 1} be the set of selected indices with respect to the functions (B_y)_{y∈{0,1}⁺}. Then we can decompose this set into a disjoint union of sets I_{y_1…y_k…z_ℓ}(x_1, …, x_i).
Corresponding illustration
[Diagram: the binary tree of index sets rooted at I_y, with children I_{y0} and I_{y1}, grandchildren I_{y00}, …, I_{y11}, and leaves I_{y000}, …, I_{y111}.]

There are two types of non-empty sets: full sets, denoted by the olive colour, and incomplete sets, denoted by the red colour. We must make sure that the approximation is good, i.e., that the sets are large enough.
How big are the sets?
If the length of y is k and the length of the longest suffix that creates a non-empty set is ℓ, then the number of selected elements is bounded:

    |J_{y_1…y_k}(x_1, …, x_i)| ≥ m_k + m_{k+1} + … + m_{k+ℓ−1}
    |J_{y_1…y_k}(x_1, …, x_i)| ≤ m_k + 2·m_{k+1} + … + 2^ℓ·m_{k+ℓ}

The total number r of full and incomplete sets is bounded by 2^{ℓ+1}.
How good is the approximation?
Let a_j denote the size of the j-th non-empty selection set and b_j the number of ones in this selection set. Then by the construction

    a_j / 2 ≤ b_j < a_j / 2 + 1
    |J_{y_1…y_k}(x_1, …, x_i)| / 2 ≤ Σ_{j=1}^{r} b_j < |J_{y_1…y_k}(x_1, …, x_i)| / 2 + 2^{ℓ+1}

and consequently

    1/2 ≤ (1 / |J_{y_1…y_k}(x_1, …, x_i)|) · Σ_{j=1}^{r} b_j < 1/2 + 2^{ℓ+1} / |J_{y_1…y_k}(x_1, …, x_i)|

We must choose the set of selection limits (m_i)_{i=1}^∞ so that the last term converges to zero with the right speed.
What do we need?
We need at least that the right-hand side of the inequality goes to zero,

    ε(n) = 2^{ℓ+1} / |J_{y_1…y_k}(x_1, …, x_i)| ≤ 2^{ℓ+1} / (m_k + … + m_{k+ℓ−1}) ,

to prove that (x_i)_{i=1}^∞ is Mises-Wald random with respect to the functions (B_y)_{y∈{0,1}⁺}. This means that m_{k+ℓ} − m_k ≥ 2^{ℓ+2} for all k, ℓ ∈ N, i.e., m_k = Ω(2^k).
The second claim about randomness
Theorem. The constructed sequence (x_i)_{i=1}^∞ is Mises-Wald random with respect to the selection functions (A_ℓ)_{ℓ=1}^∞.

Proof. Let K_ℓ(x_1, …, x_i) = {n + 1 : n < i ∧ A_ℓ(x_1, …, x_n) = 1}. As the following disjunction is always true,

    A_ℓ = ∨_{y∈{0,1}^{ℓ−1}} ( ∧_{i=1}^{ℓ−1} A_i^{y_i} ∧ A_ℓ ) ,

we get

    A_ℓ(x_1, …, x_n) = ∨_{y∈{0,1}^{ℓ−1}} B_{y1}(x_1, …, x_n) ∨ D(x_1, …, x_n)

where D selects only a finite number of indices.
Direct consequences
By the disjointness of the selection functions (B_y)_{y∈{0,1}⁺}, we can express K_ℓ(x_1, …, x_i) in terms of the sets J_{y1}(x_1, …, x_i) where y ∈ {0, 1}^{ℓ−1}. Let a_y denote the size of J_{y1}(x_1, …, x_i) and b_y the number of ones in J_{y1}(x_1, …, x_i). Then we get

    a_y / 2 ≤ b_y < a_y / 2 + a_y · ε(a_y)

which yields

    1/2 < (1 / |K_ℓ(x_1, …, x_i)|) · Σ_{y∈{0,1}^{ℓ−1}} b_y < 1/2 + Σ_{y∈{0,1}^{ℓ−1}} a_y · ε(a_y) / |K_ℓ(x_1, …, x_i)| + o(1)

and thus the last term must converge to zero with the right speed.
Final push
It turns out that if we choose the selection limits (m_i)_{i=1}^∞ large enough, we can assure that

    ε(n) ≤ φ(n + φ(ℓ)) / (n · 2^{ℓ−1})

and thus

    Σ_{y∈{0,1}^{ℓ−1}} a_y · ε(a_y) / |K_ℓ(x_1, …, x_i)|
      ≤ (1 / |K_ℓ(x_1, …, x_i)|) · Σ_{y∈{0,1}^{ℓ−1}} φ(a_y + φ(ℓ)) / 2^{ℓ−1}
      ≤ φ(|K_ℓ(x_1, …, x_i)| + φ(ℓ)) / |K_ℓ(x_1, …, x_i)|
      = φ(n + φ(ℓ)) / n .
Properties of constructed random sequence
Note that the indices 1, …, n all belong to the sets J_y(x_1, …, x_n) for y ∈ {0, 1}^n, since

    ∨_{y∈{0,1}^n} ( ∧_{i=1}^{n} A_i^{y_i} ) ≡ 1 .

Note that by the construction the number of ones in these sets is always at least one half of their size. Thus, we have proven

    Σ_{j=1}^{i} x_j ≥ i/2 .
Mises-Wald-Church sequences
Definition. Let (A_ℓ)_{ℓ=1}^∞ be the set of all partially recursive functions. Then the sequence (x_i)_{i=1}^∞ is a Mises-Wald-Church sequence if the Mises-Wald criterion is satisfied.

Unpredictable sequences. For any algorithm B : {0, 1}* → {0, 1} we can compute its asymptotic accuracy as follows:

    Adv_B(n) = (1/(n + 1)) · Σ_{i=0}^{n} [B(x_1, …, x_i) = x_{i+1}]

A sequence is unpredictable if Adv_B(n) → 1/2 for any algorithm.
Corollary. Mises-Wald-Church sequences are unpredictable sequences.
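The accuracy measure is straightforward to estimate empirically on a finite prefix; in the sketch below the two predictors are hypothetical examples:

```python
import random

def advantage(predict, x):
    """Empirical Adv_B over the prefixes of a finite sequence x:
    the fraction of positions i where B(x_1..x_i) = x_{i+1}."""
    hits = sum(predict(x[:i]) == x[i] for i in range(len(x)))
    return hits / len(x)

random.seed(0)
N = 10_000
coin = [random.randint(0, 1) for _ in range(N)]
periodic = [0, 1] * (N // 2)

majority = lambda prefix: int(2 * sum(prefix) > len(prefix))
alternate = lambda prefix: len(prefix) % 2      # exploits the period

acc_coin = advantage(majority, coin)
acc_periodic = advantage(alternate, periodic)
print(f"{acc_coin:.3f}")     # close to 0.500
print(acc_periodic)          # 1.0: fully predictable
```

A predictable pattern lets some algorithm push the accuracy above 1/2, which by the corollary is exactly what cannot happen for a Mises-Wald-Church sequence.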