The beauty of prime numbers vs the beauty of the random
Ely Porat, Bar-Ilan University, Israel

Date posted: 29-Mar-2015

  • Slide 1

The beauty of prime numbers vs the beauty of the random
Ely Porat, Bar-Ilan University, Israel

  • Slide 2

Outline
Applications
Prime Numbers Group Testing
De-randomized approach for group testing
Applications: getting into details
Length Reduction

  • Slide 3

Pattern Matching
Given a text T and a pattern P, the problem is to find all substrings of T that are equal to P.
[Figure: example text T and pattern P]

  • Slide 4

Streaming Model
[Figure: text T and pattern P]
Our goal is to do that without saving P (P).
The characters of T arrive one by one.
We can't save T.

  • Slide 5

Streaming Model
[Figure: text T and pattern P]
Our goal is to do that without saving P (P).
The characters of T arrive one by one.
We can't save T.
Automata?

  • Slide 6

Hamming distance with wildcards
Find a pattern in a text with 2 complications:
Don't cares (wildcards)
Mismatches
[Figure: example text and pattern]

  • Slide 7

Summary of results
Offline
$O(nk \log^2 m)$ Hamming distance with wildcards
Online
Pattern matching
Hamming distance
$O(k \log^2 m)$ Hamming distance with wildcards
$O(k \log m)$ Edit distance
Streaming
$O(\log^2 m)$ space, $O(\log m)$ time: exact match
$O(k^3 \log^5 m)$ space, $O(k^2 \log^2 m)$ time: Hamming

  • Slide 8

Open problem
Online convolution in $o(\log^2 m)$ time per symbol.
Offline is done by FFT in $O(n \log m)$.
Text: $t_1\ t_2\ t_3\ t_4\ t_5\ t_6 \dots t_n$
Pattern ($m = 5$): $p_1\ p_2\ p_3\ p_4\ p_5$
First alignment: $t_1 p_1 + t_2 p_2 + \dots + t_5 p_5$
Second alignment: $t_2 p_1 + t_3 p_2 + \dots + t_6 p_5$

  • Slide 9

Problem Definition
m people, at most k of whom are sick.
Query: Is someone in this set sick?
Goal: identify the sick people using only a few tests.
Non-adaptive
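As an illustration of the non-adaptive setting on Slide 9, here is a minimal Python sketch. The function names (`run_tests`, `candidates`) and the bit-based test design for k = 1 are assumptions made for this example and do not come from the slides. The decoding rule is the simple one: everyone who appears in a negative test is cleared, and whoever remains is a candidate.

```python
def run_tests(tests, sick):
    """Answer every query 'is someone in this set sick?' at once (non-adaptive)."""
    return [bool(t & sick) for t in tests]


def candidates(m, tests, outcomes):
    """Clear everyone who took part in a negative test; return the rest."""
    cleared = set()
    for t, positive in zip(tests, outcomes):
        if not positive:
            cleared |= t
    return set(range(m)) - cleared


# Hypothetical test design for m = 8 people and k = 1 sick person:
# for each bit position, one set with the bit set and one with it clear.
m = 8
tests = []
for b in range(3):
    tests.append({x for x in range(m) if (x >> b) & 1})
    tests.append({x for x in range(m) if not (x >> b) & 1})

outcomes = run_tests(tests, sick={5})
found = candidates(m, tests, outcomes)
print(found)
```

With a single sick person, the six tests leave exactly that person as the only candidate. For larger k this particular design is no longer sufficient, which is where the schemes discussed in the talk come in.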
  • Slide 10

Motivations
Syphilis, HIV [Dor43]
Mapping genomes [BLC91, BBK+95, TJP00]
Quality control in product testing [SG59]
Searching files in storage systems [KS64]
Sequential screening of experimental variables [Li62]
Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
Data compression [HL00]
Software testing [BG02, CDFP97]
DNA sequencing [PL94]
Molecular biology [DH00, FKKM97, ND00, BBKT96]

  • Slide 11

Background
Same conditions:
Deterministic [KS64]
Random [KS64]
Heavy deterministic [AMS06]
Lower bound [CR96]
Relaxed conditions:
Fully adaptive
Two staged group testing and selectors [CGR00, Kni95, BGV03, CMS01, BV03, BGV05]
Optimal monotone encoding [AH08]
Similar problems:
Inhibitors [FKKM97, Dam98, BV98, BGV03]
Bayesian case [Kni95, BL02, BL03, A.J98, BGV03]
Errors [BGV98]
DIMACS 2006
[Table: scheme size for Deterministic, Random and Heavy deterministic, Lower bound]

  • Slide 12

Our Results
Deterministic
Size
Fast construction
[Table: scheme size for Deterministic, Random and Heavy deterministic, Lower bound]

  • Slide 13

Prime Numbers Group Testing
Positions of the sick people
Bad event: there exists y s.t. [formula]

  • Slide 14

Prime Numbers Group Testing
Bad event: there exists y s.t. [formula]
$x_1, x_2, x_3, x_4, \dots, x_k$
There is a dot below each prime.
There exists $x_i$ such that for primes $p_{i_1}, p_{i_2}, \dots, p_{i_d}$ with $p_{i_1} p_{i_2} \cdots p_{i_d} > n$ we have $y \bmod p_{i_j} = x_i \bmod p_{i_j}$ for every $j$.
By CRT, $x_i = y$.

  • Slide 15

Prime Numbers Group Testing
This gives a group testing scheme of size $p_1 + p_2 + \dots + p_r$.
By choosing good enough primes we get $O(k^2 \log^2 m)$.

  • Slide 16

Randomized Group Testing
Just choose $O(k^2 \log n)$ random sets of size $n/k$.
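The prime-based construction of Slides 13 to 15 can be sketched in Python as follows. This is only an illustration under stated assumptions: people are numbered 0 to n-1, one test is created per (prime, residue) pair, and the primes are simply the first primes whose product reaches n^k. That choice is a sufficient condition for the CRT argument on Slide 14 (any split of the primes among k sick people leaves one person with a product of at least n), not necessarily the "good enough primes" meant on Slide 15. All function names are hypothetical.

```python
def first_primes_with_product_at_least(bound):
    """Collect 2, 3, 5, ... until their product reaches `bound`."""
    primes, product, c = [], 1, 2
    while product < bound:
        if all(c % p for p in primes):  # c has no smaller prime divisor
            primes.append(c)
            product *= c
        c += 1
    return primes


def prime_scheme(n, k):
    """One test per (prime p, residue r): the set {x : x mod p == r}."""
    primes = first_primes_with_product_at_least(n ** k)
    return [(p, r) for p in primes for r in range(p)]


def run_tests(tests, sick):
    """A test is positive if some sick person falls in its residue class."""
    return [any(x % p == r for x in sick) for p, r in tests]


def decode(n, tests, outcomes):
    """Keep exactly the people all of whose tests came back positive."""
    positive = {t for t, ok in zip(tests, outcomes) if ok}
    primes = sorted({p for p, _ in tests})
    return {x for x in range(n) if all((p, x % p) in positive for p in primes)}


n, k = 30, 2
tests = prime_scheme(n, k)          # primes 2, 3, 5, 7, 11
sick = {4, 17}
print(len(tests), decode(n, tests, run_tests(tests, sick)))
```

The number of tests is the sum of the chosen primes, matching the size $p_1 + p_2 + \dots + p_r$ stated on Slide 15.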
  • Slide 17

Overall derandomization plan
Derandomization
Good group testing schemes
Reduction from error correction codes to group testing schemes
Good deterministic linear error correction codes
Good deterministic error correction codes
Method of conditional probabilities
Good random error correction codes

  • Slide 18

Error correction codes
Length of words = m
Number of words = [formula]
Distance = [formula]
Rate = R
Relative distance = [formula]
Linear code
[Figure: lengths Rm and m]

  • Slide 19

Good random linear error correction codes
GV bound: There exists [formula] with [formula]
Linear codes: faster construction
Algorithm: Pick the entries of the generating matrix uniformly and independently.

  • Slide 20

Method of conditional probabilities
Algorithm: Pick the entries of the generating matrix one by one. In each step, minimize the expected number of collisions between codewords.

  • Slide 21

[Figure: the code $C = [3,2,2]_3$-RS]

  • Slide 22

Reduction from error correction codes to group testing schemes
$C = [3,2,2]_3$-RS:
1: 0 0 0
2: 1 1 1
3: 2 2 2
4: 0 1 2
5: 1 2 0
6: 2 0 1
7: 0 2 1
8: 2 1 0
9: 1 0 2
GT scheme:
{1,4,7} {2,5,9} {3,6,8}
{1,6,9} {2,4,8} {3,5,7}
{1,5,8} {2,6,7} {3,4,9}

  • Slide 23

Why should it work?
Theorem: Let C be an [formula] code. Then F(C) is a group testing scheme for n people with up to [formula] sick people.
$C = [3,2,2]_3$-RS and its GT scheme, as listed on Slide 22 (up to 2 sick people).

  • Slide 24

Why should it work? Proof
A codeword representing a healthy man:
Codewords representing sick men:
[Figure: one healthy codeword compared with k sick codewords]

  • Slide 25

Worst Case
A codeword representing a healthy man:
Codewords representing sick men:
[Figure: one healthy codeword compared with k sick codewords]

  • Slide 26

What we got?
[Table: scheme size for Deterministic, Random and Heavy deterministic, Lower bound]

  • Slide 27

Applications: getting into details
Streaming, up to 1 mismatch:
Assume we have a black box for searching for exact match.
$P$: $p_1\ p_2\ p_3\ p_4\ p_5 \dots p_m$
$P_{1,2}$: $p_1\ p_3\ p_5 \dots p_m$
$P_{2,2}$: $p_2\ p_4 \dots$
There is more than one mistake.
The other way around isn't true.

  • Slide 28

Streaming: Up to 1 mismatch
$P$: $p_1\ p_2\ p_3\ p_4\ p_5 \dots p_m$
$P_{1,2}$: $p_1\ p_3\ p_5 \dots p_m$
$P_{2,2}$: $p_2\ p_4 \dots$
$P_{1,3}$: $p_1\ p_4 \dots p_m$
$P_{2,3}$: $p_2\ p_5 \dots$
$P_{3,3}$: $p_3 \dots$
$P_{q,q}$: $\dots$
$2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdots q > m$
With CRT we will be able to find the position of the mismatch.
In order to support more mistakes, we will add to that the prime numbers group testing.
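The idea on Slides 27 and 28 can be illustrated with a short Python sketch for one fixed alignment of the pattern against a text window. Several things here are assumptions made for the example: positions are 0-based (the slides count from 1), the exact-match black box is replaced by a direct comparison of the subsampled strings, and the function names are hypothetical. For each prime q, the subpattern for residue j takes every q-th character starting at position j; a single mismatch makes exactly one subpattern per prime fail, and the failing residues determine the position by CRT because the product of the primes exceeds m.

```python
def primes_with_product_over(m):
    """Collect 2, 3, 5, ... until their product exceeds m."""
    primes, product, c = [], 1, 2
    while product <= m:
        if all(c % p for p in primes):  # c has no smaller prime divisor
            primes.append(c)
            product *= c
        c += 1
    return primes


def failing_classes(window, pattern, q):
    """Residues j whose subpattern (positions j, j+q, ...) does not match.
    Stand-in for the exact-match black box assumed on Slide 27."""
    return [j for j in range(q) if window[j::q] != pattern[j::q]]


def locate_single_mismatch(window, pattern):
    """Return None on an exact match, the mismatch position if the
    evidence fits a single mismatch, and raise ValueError otherwise."""
    m = len(pattern)
    residues = {}
    for q in primes_with_product_over(m):
        bad = failing_classes(window, pattern, q)
        if not bad:
            return None  # every subpattern matches, so the window equals P
        if len(bad) > 1:
            raise ValueError("more than one mismatch")
        residues[q] = bad[0]
    # CRT by brute force: at most one position in [0, m) fits all residues.
    for i in range(m):
        if all(i % q == r for q, r in residues.items()):
            return i
    raise ValueError("more than one mismatch")


pattern = "abracadabra"
print(locate_single_mismatch("abracadxbra", pattern))
```

Two failing subpatterns for the same prime prove that there is more than one mismatch, while a single failing subpattern for one prime does not prove there is only one, which is why the sketch checks every prime before reporting a position.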