MTMS.02.056 Algorithmic Information Theory
What are random sequences?
Sven Laur, University of Tartu
Historical perspective
Three dominant ways to view probability
[Diagram: the notions Fair Price, Knowledge, and the Long Run arranged in a cycle around Probability, with Bayesianism, Frequentism, and Mathematical Statistics as the corresponding interpretations.]
- Three notions in the graph form a vicious cycle.
- Depending on the starting point we get different interpretations.
- Each of them has its own application area and weaknesses.
MTMS.02.056 Algorithmic Information Theory, What are random sequences?, October 4, 2013 1
Approximate time-line
Timeline from 1700 to 2000:

- 1713  J. Bernoulli: Ars Conjectandi
- 1764  T. Bayes: Bayes' theorem
- 1774  P. Laplace: Bayes' theorem
- 1810  P. Laplace: Central limit theorem
- 1834  A. Cournot: Finite frequentism
- 1900  K. Pearson: χ²-test
- 1919  R. von Mises: Kollektivs
- 1921  M. Keynes: Logical probability
- 1931  F. Ramsey: Subjective probability
- 1933  A. Kolmogorov: Grundbegriffe
- 1937  de Finetti: Coherence principle
- 1954  Savage: Subjective utility
- 1969  P. Martin-Löf: Random sequences
Kolmogorov’s neat axiomatisation of probability as a measure tipped the balance, and mathematical statistics quickly became the dominant school.

- It took decades for the other interpretations to return.
- The resurrection of von Mises’ theory of kollektivs is particularly interesting.
Kollektivs as a way to define randomness
In 1919 von Mises postulated what a random sequence is. Probability is a property of an infinite sequence. A sequence x ∈ {0, 1}^∞ is a collective if it satisfies the following conditions.

- The relative frequency of ones has a limiting value p(x).
- For any admissible subsequence x′, the corresponding relative frequency must converge to p(x).
- A subsequence is admissible if it is chosen by a method that uses only the values x_1, …, x_i to decide whether to take x_{i+1} or not.
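To make admissible selection concrete, here is a small Python sketch; the rule `after_zero` is a hypothetical example, not from the slides. It shows that the alternating sequence 1010… is not a collective: an admissible rule extracts a subsequence whose relative frequency is 1, not the overall frequency 1/2.

```python
from fractions import Fraction

def select_subsequence(x, phi):
    """Apply a place-selection rule phi to a finite sequence x.

    phi(prefix) may look only at the values x_1, ..., x_i and decides
    whether the next element x_{i+1} is taken into the subsequence.
    """
    return [x[i] for i in range(1, len(x)) if phi(x[:i])]

# Hypothetical admissible rule: take the element that follows every 0.
after_zero = lambda prefix: prefix[-1] == 0

x = [1, 0, 1, 0, 1, 0, 1, 0]            # the alternating sequence
sub = select_subsequence(x, after_zero)
print(sub)                               # [1, 1, 1]
print(Fraction(sum(sub), len(sub)))      # 1, far from the overall frequency 1/2
```

The rule is admissible because it never peeks at the element it is deciding about, only at the prefix before it.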
Formal definition
A sequence x = (x_i)_{i=1}^∞ is a uniformly random collective if for any set of selection functions φ_n : {0, 1}^n → {0, 1} the subsequence x′ = (x_i)_{i∈I} with I = {n + 1 : φ_n(x_1, …, x_n) = 1} has the limiting frequency

    p(x′) = lim_{n→∞} (1/n) · Σ_{k=1}^{n} x_{i_k} = 1/2 .
Theorem (Kamke 1932). There are no uniformly random collectives.

Proof. Fix a sequence x and define φ_n(·) = x_{n+1}. Then (φ_n)_{n=1}^∞ is an admissible family of selection functions, but p(x′) = 1.
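Kamke's counterexample is easy to replay in code: for any fixed sequence, the rule φ_n(·) = x_{n+1} selects exactly the positions whose value is 1, so the selected subsequence consists of ones only. A minimal sketch:

```python
import random

random.seed(0)
x = [random.randint(0, 1) for _ in range(1000)]      # any fixed sequence

# Kamke's rule is built from the sequence itself: phi_n(...) = x_{n+1},
# i.e. position n + 1 is selected exactly when x_{n+1} = 1.
selected = [x[n] for n in range(1, len(x)) if x[n] == 1]

print(all(b == 1 for b in selected))   # True: the subsequence is all ones
```

Each φ_n is a constant function of the prefix, so the family is formally admissible; the theorem shows that quantifying over all uncountably many such families is too strong, which motivates the restriction to countable families below.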
Mises-Wald-Church random sequences
Theorem (Wald 1936). Let us consider a countable set of place selection functions H = (A_ℓ : {0, 1}* → {0, 1})_{ℓ=1}^∞ which for any sequence (x_i)_{i=1}^∞ defines the set of selected elements as follows:

    I_ℓ(x) = {n + 1 : A_ℓ(x_1 … x_n) = 1}

Then the set of sequences (x_i)_{i=1}^∞ such that

    ∀ℓ ∈ N : p((x_i)_{i∈I_ℓ}) = 1/2            (1)

is uncountable. If an infinite sequence is generated by flipping a fair coin, the outcome (x_i)_{i=1}^∞ will satisfy condition (1) with probability 1.
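The probability-1 part of the theorem can be illustrated empirically; the three rules below are hypothetical examples of computable place selections:

```python
import random

random.seed(2)
N = 200_000
x = [random.randint(0, 1) for _ in range(N)]          # fair-coin flips

# Each rule decides about x_{n+1} from n and the prefix x_1..x_n only;
# these particular rules need just n and the last bit, so each step is O(1).
rules = {
    "select all":    lambda x, n: True,
    "after a one":   lambda x, n: x[n - 1] == 1,
    "odd positions": lambda x, n: n % 2 == 1,
}
freqs = {}
for name, rule in rules.items():
    sel = [x[n] for n in range(1, N) if rule(x, n)]
    freqs[name] = sum(sel) / len(sel)
    print(f"{name}: {freqs[name]:.3f}")               # each close to 0.500
```

Of course, a finite simulation only hints at the limit statement; the theorem itself is about the limiting frequency along the selected subsequence.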
Fréchet’s objections
Theorem I (Ville 1936). For any countable set of selection functions H there exists a specific sequence (x_i)_{i=1}^∞ that passes the Mises-Wald criterion for randomness but for which the following condition holds:

    ∃k_0 ∈ N : ∀k > k_0 : (1/k) · Σ_{i=1}^{k} x_i ≥ 1/2 .

In other words, a gambler always wins if he or she plays long enough.
Theorem II (Ville 1936). For any countable set of selection functions H there exists a specific sequence (x_i)_{i=1}^∞ that passes the Mises-Wald criterion for randomness but for which the law of the iterated logarithm does not hold.
Mises-Wald-Church sequences: Ville’s construction
Construction target
Let A_1, A_2, …, A_ℓ, … be a countable set of place selection functions and let φ(x) be a continuously increasing positive function such that

    lim_{x→∞} φ(x)/x = 0   and   lim_{x→∞} φ(x) = ∞ ,

i.e., φ(x) is a sub-linear function. Then there exists a sequence (x_i)_{i=1}^∞ such that for any ℓ ∈ N there exist constants α_ℓ, β_ℓ > 0 such that

    ∀n ∈ N : −α_ℓ/n ≤ (1/n) · Σ_{k=1}^{n} x_{i_k} − 1/2 < α_ℓ/n + β_ℓ · φ(n)/n

for I_ℓ = (i_1, i_2, …, i_k, …), and thus the sequence satisfies the Mises-Wald criterion.
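A concrete sub-linear choice, purely as an illustration (the slide does not fix φ), is φ(x) = √x:

```latex
\varphi(x) = \sqrt{x}, \qquad
\lim_{x\to\infty}\frac{\varphi(x)}{x}
  = \lim_{x\to\infty}\frac{1}{\sqrt{x}} = 0,
\qquad
\lim_{x\to\infty}\varphi(x) = \infty .
```

With this choice the one-sided slack β_ℓ · φ(n)/n = β_ℓ/√n decays much more slowly than the symmetric term α_ℓ/n, which is exactly the asymmetry shown in the illustration below.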
Corresponding illustration
[Plot: relative frequency of the selected subsequence versus n, squeezed between the lower envelope 1/2 − α_ℓ/n and the upper envelope 1/2 + α_ℓ/n + β_ℓ · φ(n)/n; the vertical axis runs from 0 to 1.]

For each selection function, the frequency converges asymmetrically to 1/2.
High-level description of the construction
Direct operation with the selection functions A_1, A_2, …, A_ℓ, … is troublesome. Instead, we construct a new set of selection functions B_1, B_2, …, B_ℓ, … such that for our sequence (x_i)_{i=1}^∞ we can express

    A_ℓ = D_ℓ ∨ B_{2^ℓ+1} ∨ … ∨ B_{2^{ℓ+1}}

where A ∨ B denotes the disjunction of selections and D_ℓ is a selection function that selects only a finite number of indices.

- As a result, it is sufficient to guarantee that (x_i)_{i=1}^∞ is random with respect to the selection functions B_1, B_2, …, B_ℓ, ….
- The specific form of the selection functions B_ℓ makes it easy to construct the sequence (x_i)_{i=1}^∞.
Logical operations with selection functions
Conjunction and disjunction of two selection functions:

    ∀n ∈ N : (A ∧ B)(x_1, …, x_n) = A(x_1, …, x_n) ∧ B(x_1, …, x_n)
    ∀n ∈ N : (A ∨ B)(x_1, …, x_n) = A(x_1, …, x_n) ∨ B(x_1, …, x_n)

Power operation with a Boolean constant c ∈ {0, 1}:

    ∀n ∈ N : (A^c)(x_1, …, x_n) = A(x_1, …, x_n)   if c = 1,
                                   ¬A(x_1, …, x_n)  if c = 0.

Special clipping operator for m ∈ N:

    A^{(m)}(x_1, …, x_n) = A(x_1, …, x_n)  if Σ_{i=1}^{n} A(x_1, …, x_i) ≤ m,
                           0                if Σ_{i=1}^{n} A(x_1, …, x_i) > m.
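These operators translate directly into code. A Python sketch, treating a selection function as a predicate on prefixes (the helper names are mine, not from the slides):

```python
def conj(A, B):
    """(A ∧ B): select only where both rules select."""
    return lambda prefix: A(prefix) and B(prefix)

def disj(A, B):
    """(A ∨ B): select where either rule selects."""
    return lambda prefix: A(prefix) or B(prefix)

def power(A, c):
    """A^c: the rule itself for c = 1, its pointwise negation for c = 0."""
    return A if c == 1 else (lambda prefix: not A(prefix))

def clip(A, m):
    """A^(m): behave like A while A has fired at most m times on the
    prefixes x_1..x_i for i = 1..n; afterwards select nothing."""
    def clipped(prefix):
        fired = sum(A(prefix[:i]) for i in range(1, len(prefix) + 1))
        return A(prefix) if fired <= m else False
    return clipped

always = lambda prefix: True
two_only = clip(always, 2)
picks = [two_only([0] * n) for n in range(1, 6)]
print(picks)   # [True, True, False, False, False]
```

Note that the clipping count includes the current decision, matching the sum up to i = n in the definition above.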
List of orthogonal selection functions
We will construct a set of selection functions C_1, C_2, …, C_ℓ, … such that

- a sequence element x_i can be chosen by only a single selection function;
- each selection function C_ℓ selects at most m_ℓ indices;
- for a particular (x_i)_{i=1}^∞ there always exists some C_ℓ that chooses x_i.
Double recursive definition of selection operators
Let (y_i)_{i=1}^∞ be an arbitrary binary sequence and (m_i)_{i=1}^∞ a sequence of selection limits. Then (y_i)_{i=1}^∞ together with the initial selection functions A_1, A_2, …, A_ℓ, … defines a series of selection functions:

    B_{y_1} = A_1^{y_1}
    C_{y_1} = B_{y_1}^{(m_1)}

    B_{y_1 y_2} = ¬C_{y_1} ∧ A_1^{y_1} ∧ A_2^{y_2}
    C_{y_1 y_2} = B_{y_1 y_2}^{(m_2)}

    …

    B_{y_1…y_k} = ¬C_{y_1} ∧ … ∧ ¬C_{y_1…y_{k−1}} ∧ A_1^{y_1} ∧ … ∧ A_k^{y_k}
    C_{y_1…y_k} = B_{y_1…y_k}^{(m_k)}
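The double recursion can be transcribed almost literally into Python. The sketch below, with toy base rules of my own choosing, also checks the disjointness property proven two slides later: on every prefix, at most one C_y fires.

```python
import itertools
import random

def make_C(As, ms):
    """B_{y1..yk} = ¬C_{y1} ∧ … ∧ ¬C_{y1..y(k-1)} ∧ A_1^{y1} ∧ … ∧ A_k^{yk},
    C_{y1..yk}   = B_{y1..yk}^{(m_k)}  (clipped to at most m_k selections)."""
    def B(y, prefix):
        if any(C(y[:j], prefix) for j in range(1, len(y))):
            return False                      # some shorter C still fires
        # A_i^{y_i} fires exactly when A_i agrees with the bit y_i
        return all(As[i](prefix) == (y[i] == '1') for i in range(len(y)))

    def C(y, prefix):
        fired = sum(B(y, prefix[:i]) for i in range(1, len(prefix) + 1))
        return B(y, prefix) if fired <= ms[len(y) - 1] else False
    return C

# Toy base rules A_1, A_2, A_3 (hypothetical, for illustration only).
As = [lambda p, b=b: (len(p) + b) % 2 == 0 for b in range(3)]
C = make_C(As, ms=[1, 2, 2])

random.seed(1)
x = [random.randint(0, 1) for _ in range(8)]
strings = [''.join(s) for L in (1, 2, 3) for s in itertools.product('01', repeat=L)]
for n in range(1, len(x) + 1):
    firing = [y for y in strings if C(y, x[:n])]
    assert len(firing) <= 1                   # orthogonality of the C_y
print("disjointness holds on all prefixes")
```

The exponential bookkeeping makes this usable only for tiny examples, but that is enough to see the cascade: C_{y_1} fires first, and once it is clipped the next level B_{y_1 y_2} takes over.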
Corresponding illustration
Let the sequence of selection limits be m_1 = 1, m_2 = 2, m_3 = 2, ….

    index   1 2 3 4 5 6 7 …
    (x_i)   1 0 1 0 1 0 1 …

    A_1     1 0 0 1 1 1 1 …
    A_2     0 0 1 1 0 0 0 …
    A_3     1 1 0 1 1 0 1 …

    B_1     1 0 0 1 1 1 1 …
    B_10    0 0 0 0 1 1 1 …
    B_101   0 0 0 1 0 0 1 …

    C_1     1 0 0 0 0 0 0 …
    C_10    0 0 0 0 1 1 0 …
    C_101   0 0 0 0 0 0 1 …
    C_111   0 0 0 1 0 0 0 …
Properties of the construction
Lemma. Let (x_i)_{i=1}^∞ be a candidate sequence, let (y_i)_{i=1}^∞ and (z_i)_{i=1}^∞ be arbitrary binary sequences, and let (m_i)_{i=1}^∞ be a sequence of selection limits. Then two different selection functions C_{y_1…y_k} and C_{z_1…z_ℓ} cannot choose the same element x_i:

    ∀i ∈ N : ¬C_{y_1…y_k}(x_1, …, x_{i−1}) ∨ ¬C_{z_1…z_ℓ}(x_1, …, x_{i−1})
Proof.
- If there exists j ≤ k, ℓ such that y_j ≠ z_j, the conjunction terms A_j^{y_j} and A_j^{z_j} cannot be true at the same time.
- If y_1…y_k is a proper prefix of z_1…z_ℓ, then the conjunction term ¬C_{y_1…y_k} does the trick.
Properties of the construction
Lemma. Let (x_i)_{i=1}^∞ be a candidate sequence and let the selection limits (m_i)_{i=1}^∞ satisfy Σ_{i=1}^∞ m_i = ∞. Then for any index i there exists a finite prefix y_1…y_ℓ such that C_{y_1…y_ℓ} chooses x_i:

    C_{y_1…y_ℓ}(x_1, …, x_{i−1}) = 1

and the selection functions C_{y_1…y_k} corresponding to its proper prefixes select exactly m_k indices.
Proof.
- Let us choose the bits y_k such that A_k^{y_k}(x_1, …, x_{i−1}) = 1.
- Then B_{y_1…y_k}(x_1, …, x_{i−1}) = 1 exactly when none of C_{y_1}, …, C_{y_1…y_{k−1}} fires, i.e., when all shorter selection functions have already been clipped.
- As each C_{y_1…y_k} can select at most m_k indices, the levels are exhausted one after another, and since Σ m_i diverges some level C_{y_1…y_ℓ} eventually chooses x_i.
Construction of the sequence
High-level goal. We have countably many selection functions (C_y)_{y∈{0,1}⁺} defined in terms of A_1, A_2, …, A_ℓ, …. We have to find a sequence (x_i)_{i=1}^∞ that for these finite selection functions looks suitably random.

Let I_{y_1…y_k}(x_1, …, x_i) = {n + 1 : n < i ∧ C_{y_1…y_k}(x_1, …, x_n) = 1} be the set of indices selected by looking at the first i − 1 sequence elements. Then we would like to construct (x_i)_{i=1}^∞ such that for all i ∈ N and for all y ∈ {0, 1}⁺:

    |I_y(x_1, …, x_i)| / 2 ≤ Σ_{j∈I_y(x_1,…,x_i)} x_j < |I_y(x_1, …, x_i)| / 2 + 1
Inductive construction
Basis. Set x_1 = 1. Then the goal holds for all non-empty index sets I_y.

Induction step. Assume that the goal holds for the sequence prefix x_1, …, x_i and for all non-empty index sets I_{y_1…y_k}(x_1, …, x_i).

- As I_{y_1…y_k}(x_1, …, x_i, x_{i+1}) does not depend on x_{i+1}, we can find out into which set I_y(x_1, …, x_i, x_{i+1}) the index i + 1 falls.
- If |I_y(x_1, …, x_i, x_{i+1})| = 1 then set x_{i+1} = 1.
- Otherwise we must fix x_{i+1} such that

    (|I_y(x_1, …, x_i)| + 1) / 2 ≤ Σ_{j∈I_y(x_1,…,x_i)} x_j + x_{i+1} < (|I_y(x_1, …, x_i)| + 1) / 2 + 1

This is always possible.
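The induction step is easy to mechanise. The sketch below keeps, for every selection set, the invariant |I_y|/2 ≤ (number of ones) < |I_y|/2 + 1; the classifier standing in for the C_y machinery is a hypothetical toy, since any function of the prefix works:

```python
def build_sequence(classify, n):
    """classify(prefix) names the set I_y that the next index falls into;
    it may inspect only the prefix, mirroring the fact that the set does
    not depend on the bit x_{i+1} being chosen."""
    x, counts = [], {}                       # counts[y] = (size, ones)
    for _ in range(n):
        y = classify(x)
        size, ones = counts.get(y, (0, 0))
        # Target: ones' = ceil((size + 1) / 2), so the invariant
        # size'/2 <= ones' < size'/2 + 1 keeps holding.
        bit = 1 if ones < (size + 2) // 2 else 0
        x.append(bit)
        counts[y] = (size + 1, ones + bit)
    return x, counts

# Toy classifier: route the next index by the parity of ones seen so far.
seq, counts = build_sequence(lambda prefix: sum(prefix) % 2, 50)
for size, ones in counts.values():
    assert size <= 2 * ones < size + 2       # the construction goal
print(seq[:10])
```

The first element of every set is forced to 1 and each set's count of ones never drifts from one half by more than a single element, exactly as the slide requires.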
The first claim about randomness
Theorem. The constructed sequence (x_i)_{i=1}^∞ is Mises-Wald random with respect to the selection functions (B_y)_{y∈{0,1}⁺}.

Proof. Let J_{y_1…y_k}(x_1, …, x_i) = {n + 1 : n < i ∧ B_y(x_1, …, x_n) = 1} be the set of selected indices with respect to the functions (B_y)_{y∈{0,1}⁺}. Then we can decompose this set into a disjoint union of sets I_{y_1…y_k…z_ℓ}(x_1, …, x_i).
Corresponding illustration
[Diagram: the binary tree of index sets rooted at I_y, with children I_{y0} and I_{y1}, grandchildren I_{y00}, …, I_{y11}, and leaves I_{y000}, …, I_{y111}.]

There are two types of non-empty sets: full sets, denoted by the olive colour, and incomplete sets, denoted by the red colour. We must make sure that the approximation is good, i.e., that the sets are large enough.
How big are the sets?
If the length of y is k and the length of the longest suffix that creates a non-empty set is ℓ, then the number of selected elements is bounded:

    |J_{y_1…y_k}(x_1, …, x_i)| ≥ m_k + m_{k+1} + … + m_{k+ℓ−1}
    |J_{y_1…y_k}(x_1, …, x_i)| ≤ m_k + 2·m_{k+1} + … + 2^ℓ·m_{k+ℓ}

The total number r of full and incomplete sets is bounded by 2^{ℓ+1}.
How good is the approximation?
Let a_j denote the size of the j-th non-empty selection set and b_j the number of ones in this selection set. Then by the construction

    a_j / 2 ≤ b_j < a_j / 2 + 1
    |J_{y_1…y_k}(x_1, …, x_i)| / 2 ≤ Σ_{j=1}^{r} b_j < |J_{y_1…y_k}(x_1, …, x_i)| / 2 + 2^{ℓ+1}

and consequently

    1/2 ≤ (1 / |J_{y_1…y_k}(x_1, …, x_i)|) · Σ_{j=1}^{r} b_j < 1/2 + 2^{ℓ+1} / |J_{y_1…y_k}(x_1, …, x_i)|

We must choose the set of selection limits (m_i)_{i=1}^∞ so that the last term converges to zero with the right speed.
What do we need?
We need at least that the right-hand side of the inequality goes to zero,

    ε(n) = 2^{ℓ+1} / |J_{y_1…y_k}(x_1, …, x_i)| ≤ 2^{ℓ+1} / (m_k + … + m_{k+ℓ−1}) ,

to prove that (x_i)_{i=1}^∞ is Mises-Wald random with respect to the functions (B_y)_{y∈{0,1}⁺}. This means that m_{k+ℓ} − m_k ≥ 2^{ℓ+2} for all k, ℓ ∈ N, i.e., m_k = Ω(2^k).
The second claim about randomness
Theorem. The constructed sequence (x_i)_{i=1}^∞ is Mises-Wald random with respect to the selection functions (A_ℓ)_{ℓ=1}^∞.

Proof. Let K_ℓ(x_1, …, x_i) = {n + 1 : n < i ∧ A_ℓ(x_1, …, x_n) = 1}. As the following disjunction is always true,

    A_ℓ = ∨_{y∈{0,1}^{ℓ−1}} ( ∧_{i=1}^{ℓ−1} A_i^{y_i} ∧ A_ℓ ) ,

we get

    A_ℓ(x_1, …, x_n) = ∨_{y∈{0,1}^{ℓ−1}} B_{y1}(x_1, …, x_n) ∨ D(x_1, …, x_n)

where D selects only a finite number of indices.
Direct consequences
By the disjointness of the selection functions (B_y)_{y∈{0,1}⁺}, we can express K_ℓ(x_1, …, x_i) in terms of the sets J_{y1}(x_1, …, x_i) where y ∈ {0, 1}^{ℓ−1}. Let a_y denote the size of J_{y1}(x_1, …, x_i) and b_y the number of ones in J_{y1}(x_1, …, x_i). Then we get

    a_y / 2 ≤ b_y < a_y / 2 + a_y · ε(a_y)

which yields

    1/2 < (1 / |K_ℓ(x_1, …, x_i)|) · Σ_{y∈{0,1}^{ℓ−1}} b_y < 1/2 + Σ_{y∈{0,1}^{ℓ−1}} a_y · ε(a_y) / |K_ℓ(x_1, …, x_i)| + o(1)

and thus the last term must converge to zero with the right speed.
Final push
It turns out that if we choose the selection limits (m_i)_{i=1}^∞ large enough, we can assure that

    ε(n) ≤ φ(n + φ(ℓ)) / (n · 2^{ℓ−1})

and thus

    Σ_{y∈{0,1}^{ℓ−1}} a_y · ε(a_y) / |K_ℓ(x_1, …, x_i)|
      ≤ (1 / |K_ℓ(x_1, …, x_i)|) · Σ_{y∈{0,1}^{ℓ−1}} φ(a_y + φ(ℓ)) / 2^{ℓ−1}
      ≤ φ(|K_ℓ(x_1, …, x_i)| + φ(ℓ)) / |K_ℓ(x_1, …, x_i)|
      = φ(n + φ(ℓ)) / n .
Properties of constructed random sequence
Note that the indices 1, …, n all belong to the sets J_y(x_1, …, x_n) for y ∈ {0, 1}^n, since

    ∨_{y∈{0,1}^n} ( ∧_{i=1}^{n} A_i^{y_i} ) ≡ 1 .

Note that by the construction the number of ones in these sets is always at least one half of their size. Thus, we have proven

    Σ_{j=1}^{i} x_j ≥ i/2 .
Mises-Wald-Church sequences
Definition. Let (A_ℓ)_{ℓ=1}^∞ be the set of all partially recursive functions. Then the sequence (x_i)_{i=1}^∞ is a Mises-Wald-Church sequence if the Mises-Wald criterion is satisfied.

Unpredictable sequences. For any algorithm B : {0, 1}* → {0, 1} we can compute its asymptotic accuracy as follows:

    Adv_B(n) = (1/(n + 1)) · Σ_{i=0}^{n} [B(x_1, …, x_i) = x_{i+1}]

A sequence is unpredictable if Adv_B(n) → 1/2 for any algorithm.
Corollary. Mises-Wald-Church sequences are unpredictable sequences.
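The accuracy measure is straightforward to estimate empirically on a finite prefix; in the sketch below the two predictors are hypothetical examples:

```python
import random

def advantage(predict, x):
    """Empirical Adv_B over the prefixes of a finite sequence x:
    the fraction of positions i where B(x_1..x_i) = x_{i+1}."""
    hits = sum(predict(x[:i]) == x[i] for i in range(len(x)))
    return hits / len(x)

random.seed(0)
N = 10_000
coin = [random.randint(0, 1) for _ in range(N)]
periodic = [0, 1] * (N // 2)

majority = lambda prefix: int(2 * sum(prefix) > len(prefix))
alternate = lambda prefix: len(prefix) % 2      # exploits the period

acc_coin = advantage(majority, coin)
acc_periodic = advantage(alternate, periodic)
print(f"{acc_coin:.3f}")     # close to 0.500
print(acc_periodic)          # 1.0: fully predictable
```

A predictable pattern lets some algorithm push the accuracy above 1/2, which by the corollary is exactly what cannot happen for a Mises-Wald-Church sequence.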