Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Page 1: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Ely Porat

Bar-Ilan University

Group Testing and New Algorithmic Applications

Page 2: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

• Theory of Big data
• Pattern matching
• Game theory
• Coding theory
• Compressive sensing
• Group testing
• Distributed

Page 3: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Theory of Big data

• Bloom filters
• Succinct data structures
• Streaming algorithms
• Sketching & LSH
• Big Databases

Page 4: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group Testing Overview

Test soldiers for a disease

WWII example: syphilis

Page 5: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group Testing Overview

Test an army for a disease

WWII example: syphilis

What if only one soldier has the disease?

We can pool blood samples and check whether at least one soldier has the disease.

Page 6: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

More Motivations

• Syphilis, HIV [Dor43]
• Mapping genomes [BLC91, BBK+95, TJP00]
• Quality control in product testing [SG59]
• Searching files in storage systems [KS64]
• Sequential screening of experimental variables [Li62]
• Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
• Data compression [HL00]
• Software testing [BG02, CDFP97]
• DNA sequencing [PL94]
• Molecular biology [DH00, FKKM97, ND00, BBKT96]

Page 7: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Adaptive group testing

Number of sick: d ≤ 2

Page 8: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Adaptive general case

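Number of sick ≤ d

Split the n soldiers into 2d groups and test each group. At most d groups are positive, so at most about n/2 soldiers remain.

Run in recursion.

Total: O(d log(n/d)) tests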
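The halving argument above can be sketched in Python as follows. This is a minimal illustration, not code from the talk: `adaptive_group_test` and the pooled-test oracle `is_positive` are hypothetical names, and the sketch assumes there are at most d ≥ 1 sick soldiers (with more than d, the loop is not guaranteed to shrink).

```python
from typing import Callable, List, Set, Tuple


def adaptive_group_test(
    items: List[int],
    d: int,
    is_positive: Callable[[List[int]], bool],
) -> Tuple[Set[int], int]:
    """Return (sick items, number of tests used), assuming at most d >= 1 are sick."""
    tests = 0
    candidates = list(items)
    # Each round splits the candidates into 2d groups and keeps only the
    # positive ones. At most d groups are positive, so about half survive.
    while len(candidates) > 2 * d:
        groups = [candidates[i::2 * d] for i in range(2 * d)]
        survivors: List[int] = []
        for group in groups:
            tests += 1
            if is_positive(group):
                survivors.extend(group)
        candidates = survivors
    # At most 2d candidates remain: test them one by one.
    sick: Set[int] = set()
    for item in candidates:
        tests += 1
        if is_positive([item]):
            sick.add(item)
    return sick, tests


# Usage with a simulated oracle (the hidden set is for illustration only).
hidden = {3, 500, 999}
found, used = adaptive_group_test(
    list(range(1000)), d=3, is_positive=lambda g: any(i in hidden for i in g)
)
```

Each round costs 2d tests and roughly halves the candidate set, which is where the O(d log(n/d)) total comes from.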

Page 9: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Non adaptive group testing

• All the tests are set in advance.

(Figure: a test matrix with t rows, one per test, and n columns, one per soldier.)

Page 10: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Non adaptive group testing

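Test matrix with t = 6 rows (tests) and n = 12 columns (soldiers):

1 0 1 1 0 0 0 1 1 0 1 0
0 0 1 0 1 0 1 0 1 0 1 1
0 1 0 1 0 1 1 0 0 1 0 1
1 0 1 1 0 1 0 1 0 1 0 0
1 1 0 1 1 0 0 1 0 0 1 0
0 1 0 0 1 0 1 0 1 0 1 1

times the column vector x = (0 0 1 0 0 0 0 0 1 0 0 0)

= (1 1 0 1 0 1)

(and,or) matrix vector multiplication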
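As a small worked example, the (and,or) product can be computed directly. The sketch below uses the six pools from the slide and assumes soldiers 3 and 9 (1-indexed) are the sick ones; `pooled_outcomes` is a hypothetical helper name.

```python
from typing import List

# The six pools (rows) over twelve soldiers (columns) shown on the slide.
M: List[List[int]] = [
    [1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1],
]


def pooled_outcomes(matrix: List[List[int]], x: List[int]) -> List[int]:
    """(AND, OR) product: test i is positive iff some soldier in pool i is sick."""
    return [int(any(m and s for m, s in zip(row, x))) for row in matrix]


x = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # soldiers 3 and 9 are sick
r = pooled_outcomes(M, x)  # the observed outcome vector
```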

Page 11: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Non adaptive group testing

M · x = r, where

• M is the t × n test matrix, to be designed (rows: tests 1, 2, 3, …, t; columns: soldiers 1, 2, 3, …, n), for example with rows
  1 0 0 1 …
  0 0 1 0 …
  0 0 0 1 …
  1 1 1 0 …
• x = (x1, x2, x3, …, xn) is unknown
• r = (r1, r2, r3, …, rt) is observed

Upper bound: t = O(d^2 log n) [PR08]

Lower bound: t = Ω(d^2 log_d n) [DR82]

Page 12: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Non adaptive group testing

Page 13: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

2-Stage group testing

Page 14: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

2-Stage group testing

We misclassified 2 soldiers.

Using O(d log(n/d)) measurements, we will misclassify O(d) soldiers, whom we can easily test one by one in a second stage.

Property of unbalanced expanders.
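A minimal two-stage sketch in Python is given below. It is an illustration under stated assumptions, not the construction from the talk: the pools are drawn at random (each soldier joins a pool with probability 1/d) as a stand-in for an unbalanced expander, the constant `c` is arbitrary, and `two_stage_group_test` is a hypothetical name. The sketch therefore does not itself guarantee the O(d) bound on misclassified soldiers; it only shows the two-stage structure.

```python
import math
import random
from typing import List, Set, Tuple


def two_stage_group_test(
    n: int, sick: Set[int], d: int, c: float = 4.0, seed: int = 0
) -> Tuple[Set[int], int, int]:
    """Return (found, stage-1 tests, stage-2 tests) for n soldiers, at most d sick."""
    rng = random.Random(seed)
    t = max(1, math.ceil(c * d * math.log2(max(2.0, n / d))))
    # Stage 1: all t pools are fixed in advance (non-adaptive).
    pools: List[List[int]] = [
        [i for i in range(n) if rng.random() < 1.0 / d] for _ in range(t)
    ]
    outcomes = [any(i in sick for i in pool) for pool in pools]
    # Anyone who appears in a negative pool is certainly healthy.
    cleared: Set[int] = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    candidates = [i for i in range(n) if i not in cleared]
    # Stage 2: test the remaining candidates one by one (in parallel, one more round).
    found = {i for i in candidates if i in sick}
    return found, t, len(candidates)


found, stage1, stage2 = two_stage_group_test(n=2000, sick={7, 42, 1234}, d=3)
```

A sick soldier never appears in a negative pool, so the second stage always recovers every sick soldier; the design only affects how many healthy candidates are left to re-test.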

Page 15: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Adaptive vs Non adaptive

Time: if one test takes a day to perform, adaptive testing might take a month.

2-stage group testing takes 2 days.

Store less, to be checked later.

Page 16: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group testing for Pattern Matching

Text: n

Pattern: m

Page 17: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group testing for Pattern Matching

Part of a 20M€ consortium project supported by MOI (cyber security)

Page 18: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Motivation…

• Stock market

Page 19: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Motivation…

• Espionage

The rest we monitor

Page 20: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Motivation…

• Viruses and malware

Software solutions:
• Snort: 73.5Mb
• ClamAV: 1.48Gb

Using TCAMs:
• Snort: 680Kb
• ClamAV: 25Mb

Our solution (software):
• Snort: 51Kb
• ClamAV: 216Kb

Page 21: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group testing for Pattern Matching

Text: n

Pattern: m

• Pattern matching with wildcards – O(n log m) [CH02]

• Up to k mismatches [CEPR07, CEPR09]

• Sketching Hamming distance [PL07, AGGP13]

• Pattern matching in the streaming model [PP09]

Page 22: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Group testing for Pattern Matching

Text:

Pattern:

• Up to k mismatches using group testing

Group testing scheme

Performing the tests is easy.

However, how can we analyze the results?

Page 23: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Fast Decoding

The naïve decoding takes O(nt) time.
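For reference, the naïve decoder can be sketched as below: an item is reported positive only if every test containing it is positive, which costs up to t lookups for each of the n items. The matrix and outcome vector are the small 6 × 12 example from the non-adaptive slides; `naive_decode` is a hypothetical name.

```python
from typing import List

ROWS = [
    "101100011010",
    "001010101011",
    "010101100101",
    "101101010100",
    "110110010010",
    "010010101011",
]
M: List[List[int]] = [[int(ch) for ch in row] for row in ROWS]


def naive_decode(matrix: List[List[int]], outcomes: List[int]) -> List[int]:
    """Return the 0-indexed items that appear in no negative test (O(n*t) time)."""
    t, n = len(matrix), len(matrix[0])
    positives = []
    for j in range(n):  # n items
        # Up to t checks per item: every test containing j must be positive.
        if all(outcomes[i] for i in range(t) if matrix[i][j]):
            positives.append(j)
    return positives


decoded = naive_decode(M, [1, 1, 0, 1, 0, 1])
```

On this example the decoder returns the 0-indexed items 2 and 8, that is, soldiers 3 and 9.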

Page 24: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Fast Decoding

We perform 3 GT schemes:

1. The original.
2. First projection.
3. Second projection.

Page 25: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Fast Decoding

We first decode the projections.

Then we check the d^2 options naively.

In [NPR11] we manage to obtain a scheme with an optimal number of measurements and decoding time O(d^2 log^2 n) (using recursion and 2-stage GT).

If we use the 2-stage GT scheme, we will have 4d^2 candidates to check.

Page 26: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Faster Decoding

According to the LW theorem, the number of candidates in the join is d^1.5.

In [NPRR12] we show how to do the join in optimal time (best paper award).

This gives a scheme with an optimal number of measurements, which can be decoded in time O(d^(1+ε) poly(log n)).

Page 27: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Compressive Sensing

(Figure: a t × n measurement matrix and the measurement vector (2 2 0 1 0 1).)

Page 28: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Compressive Sensing

Measurement matrix with t = 6 rows and n = 12 columns:

1 0 1 1 0 0 0 1 1 0 1 0
0 0 1 0 1 0 1 0 1 0 1 1
0 1 0 1 0 1 1 0 0 1 0 1
1 0 1 1 0 1 0 1 0 1 0 0
1 1 0 1 1 0 0 1 0 0 1 0
0 1 0 0 1 0 1 0 1 0 1 1

times the column vector x = (0 0 1 0 0 0 0 0 1 0 0 0)

= (2 2 0 1 0 1)

Page 29: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Compressive Sensing

Measurement matrix with t = 6 rows and n = 12 columns:

1 0 1 1 0 0 0 1 1 0 1 0
0 0 1 0 1 0 1 0 1 0 1 1
0 1 0 1 0 1 1 0 0 1 0 1
1 0 1 1 0 1 0 1 0 1 0 0
1 1 0 1 1 0 0 1 0 0 1 0
0 1 0 0 1 0 1 0 1 0 1 1

times the column vector x = (0.2 0.1 5.8 0.1 0.3 0.1 0.2 0.1 7.3 0.1 0.2 0.1)

= (13.7 13.9 0.7 6.4 1.0 8.2)

Page 30: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Compressive Sensing: problem definition

Find a matrix Φ and an algorithm A s.t.:

$\forall x \in \mathbb{R}^n:\quad y = \Phi x,\quad x^* = A(y)$

$\|x^* - x\|_p \le C\,\|x - x_d\|_q$

$x_d = \arg\min_{x_k:\ |\mathrm{support}(x_k)| \le d} \|x - x_k\|_q$

In [PS12] we gave the first scheme with an optimal number of measurements and sublinear decoding time, for p=q=1.

In [GLPS09, GNPRS13] we gave a randomized solution (foreach) for p=q=2 with sublinear decoding.
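The benchmark on the right-hand side of the guarantee, the ℓq distance from x to its best d-sparse approximation x_d, is easy to compute when x is known. The sketch below is only an illustration of that quantity (the function names are hypothetical), using a nearly 2-sparse vector with the same entries as the noisy example earlier in the talk.

```python
from typing import List


def best_d_term(x: List[float], d: int) -> List[float]:
    """Keep the d entries of largest magnitude and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:d])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def lq_norm(v: List[float], q: float) -> float:
    return sum(abs(a) ** q for a in v) ** (1.0 / q)


def tail_error(x: List[float], d: int, q: float) -> float:
    """||x - x_d||_q, the quantity the recovery error is compared against."""
    x_d = best_d_term(x, d)
    return lq_norm([a - b for a, b in zip(x, x_d)], q)


x = [0.2, 0.1, 5.8, 0.1, 0.3, 0.1, 0.2, 0.1, 7.3, 0.1, 0.2, 0.1]
err_l1 = tail_error(x, d=2, q=1)  # sum of the ten small entries
```

For this vector the two large entries are kept and the ℓ1 tail is the sum of the ten small ones, about 1.5, so an algorithm meeting the p=q=1 guarantee must return x* within C times that amount in ℓ1 distance.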

Page 31: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

How Compressive Sensing helps Massive Recommender Systems

• Consider designing a recommender system for web pages
  – The time a user examines a page is an implicit rating
  – Millions of users
  – Each user examines thousands of pages throughout the year
  – Hard to store and process the information

Page 32: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Fingerprint Based Approach

(Figure: users a1, a2, …, an with item sets C1, C2, …, Cn and fingerprints F1, F2, …, Fn.)

Similarity (ai,aj)...

Page 33: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Sampling Approach

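a1: C1 = {a,c,d,f,h,l,m,n,p,r,s,t}, sample {c,l,t}

a2: C2 = {a,b,c,f,h,l,m,n,o,p,r,s}, sample {f,m,s}

Regular sampling doesn’t work: the two samples are disjoint, although the sets share 10 of their 14 distinct elements.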

Page 34: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Minwise hashing approach

a1: {a,c,d,f,h,l,m,n,p,r,s,t} → h → h(x): 5, 3, 7, 9, 2, 8

a2: {a,b,c,f,h,l,m,n,o,p,r,s} → h → h(x): 5, 4, 3, 7, 2, 8

[BHP09, BPR09, BP10, FPS11, FPS12, T13]

Page 35: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Min wise hash function

A B

$\arg\min_{x \in A \cup B} h(x) = \arg\min_{x \in A \cap B} h(x)$

Page 36: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Min wise hash function

A B

Page 37: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Similarity

A B

We get a ±ε approximation with probability 1-δ

Min wise independent
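A minimal min-wise hashing sketch in Python, assuming fully random permutations of a small universe (which are exactly min-wise independent). The function names are hypothetical, and the two sets are the letter sets from the sampling slide. The fraction of hash functions on which the two minima agree estimates the Jaccard similarity.

```python
import random
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


def make_rankings(universe: List[str], k: int, seed: int = 0) -> List[Dict[str, int]]:
    """k random permutations of the universe, each stored as element -> rank."""
    rng = random.Random(seed)
    rankings = []
    for _ in range(k):
        order = list(universe)
        rng.shuffle(order)
        rankings.append({x: pos for pos, x in enumerate(order)})
    return rankings


def minhash_signature(s: Set[str], rankings: List[Dict[str, int]]) -> List[str]:
    """For each permutation, the element of s with the smallest rank."""
    return [min(s, key=rank.__getitem__) for rank in rankings]


def estimate_jaccard(sig_a: List[str], sig_b: List[str]) -> float:
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


A = set("acdfhlmnprst")
B = set("abcfhlmnoprs")
rankings = make_rankings(sorted(A | B), k=2000)
estimate = estimate_jaccard(minhash_signature(A, rankings), minhash_signature(B, rankings))
exact = jaccard(A, B)  # 10 shared elements out of 14
```

With k hash functions the standard deviation of the estimate is about sqrt(J(1-J)/k), so the number of hash functions needed grows like 1/ε² for a ±ε approximation.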

Page 38: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Reducing sketching space [BP10]

Instead of

Additional pairwise independent hash

It was discovered independently by Ping Li and Christian König.
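One way to read the idea is sketched below, under assumptions that are mine rather than the slide's: each min-hash value is compressed to a single bit by an additional hash, so two sketches agree on a coordinate with probability p = J + (1-J)/2, and J is recovered as 2p - 1. This is consistent with the implication J > 1-t ⇒ p > 1-0.5t used later in the talk. A fully random bit per element stands in for the pairwise independent hash, and all names are hypothetical.

```python
import random
from typing import List, Set


def one_bit_sketch(s: Set[str], universe: List[str], k: int, seed: int = 0) -> List[int]:
    """k one-bit coordinates: a random bit attached to the min-wise element of s."""
    rng = random.Random(seed)  # same seed => same hash functions for every set
    bits = []
    for _ in range(k):
        order = list(universe)
        rng.shuffle(order)
        rank = {x: pos for pos, x in enumerate(order)}
        bit = {x: rng.getrandbits(1) for x in universe}  # the additional hash
        bits.append(bit[min(s, key=rank.__getitem__)])
    return bits


def estimate_from_bits(bits_a: List[int], bits_b: List[int]) -> float:
    p = sum(x == y for x, y in zip(bits_a, bits_b)) / len(bits_a)
    return 2 * p - 1  # invert p = (1 + J) / 2


A = set("acdfhlmnprst")
B = set("abcfhlmnoprs")
universe = sorted(A | B)
j_hat = estimate_from_bits(one_bit_sketch(A, universe, 4000), one_bit_sketch(B, universe, 4000))
```

Each coordinate now costs one bit instead of a full hash value, at the price of needing more coordinates for the same accuracy.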

Page 39: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Reducing sketching space [BP10]

Our algorithm estimates

Page 40: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Reducing sketching space even further [BP10]

We are usually interested in the case where the sets are very similar. Assume J > 1-t ⇒ p > 1-0.5t.

A:   0 1 1 0 1 0 0 1 0 1
B:   0 1 0 0 1 0 1 1 0 1
A-B: 0 0 1 0 0 0 -1 0 0 0

CS: 2 0 -2

Page 41: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Reducing sketching space even further [BP10]

We are usually interested in the case where the sets are very similar. Assume J > 1-t ⇒ p > 1-0.5t.

A:       0 1 1 0 1 0 0 1 0 1
B:       0 1 0 0 1 0 1 1 0 1
A xor B: 0 0 1 0 0 0 1 0 0 0

CS: 1 0 1

This give an improvement of2

2log2

tt

Page 42: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Removing the min wise independent requirement [BP11]

• [KNW10] gave an O((1/ε^2) log(1/δ)) bits sketch for distinct count (F0)

• Their sketch is not linear. However, given S(A) and S(B) one can calculate S(A+B) (which gives the size of the union)

Page 43: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Removing the min wise independent requirement [BP11]

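$J = \frac{|A \cap B|}{|A \cup B|} = \frac{|A| + |B| - |A \cup B|}{|A \cup B|}$

$\tilde{J} = \frac{|A| + |B| - \widetilde{|A \cup B|}}{\widetilde{|A \cup B|}} = J \pm O(\varepsilon)$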

Using F2 instead of F0 we managed to reduce the sketch size to

tt

O1

log1

log)(

12

Using more randomness, we manage to remove a factor of log(1/t).

Page 44: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

File sharing

The naïve way

Page 45: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

File sharing

Torrent/Emule/Kazaa

Page 46: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

File sharing

Source:

Clients:

Coupon collector: O(n log n)

In practice it could be 7Gb instead of 1Gb
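The coupon-collector overhead can be checked with a short simulation. The sketch assumes each received piece is uniformly random among the n pieces of the file, which is a simplification of how real clients pick pieces; the names and the choice n = 100 are for illustration only.

```python
import random
from typing import List


def draws_until_complete(n: int, rng: random.Random) -> int:
    """Number of uniformly random pieces received until all n pieces are held."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def harmonic(n: int) -> float:
    return sum(1.0 / i for i in range(1, n + 1))


rng = random.Random(0)
n = 100
trials: List[int] = [draws_until_complete(n, rng) for _ in range(200)]
average = sum(trials) / len(trials)
expected = n * harmonic(n)  # n * H_n, roughly n ln n
overhead = average / n      # how many times the file size is transferred
```

The expected number of draws is n·H_n, so the traffic exceeds the file size by a factor of about H_n ≈ ln n, which is the O(n log n) on the slide.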

Page 47: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Network coding

Page 48: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Network coding

Source: packets 1, 2, …, i, …, n

Client 1: 3X_7 + 2X_17, 5X_2 + X_5 + 4X_10, ....
Client 2: 2X_1 + 3X_3 + X_17, ....
Client 3:
Client 4:

In a big field, n linear combinations will suffice.

We require 1Gb upload for a 1Gb file.
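A minimal sketch of random linear network coding, assuming arithmetic modulo the prime 2^31 - 1 as the "big field" and dense random coefficients (the slide shows sparse combinations). The names are hypothetical. A client that holds n independent combinations recovers the n source packets by Gaussian elimination.

```python
import random
from typing import List, Optional, Tuple

P = 2_147_483_647  # the prime 2^31 - 1; an assumed stand-in for the big field

Packet = List[int]


def random_combination(packets: List[Packet], rng: random.Random) -> Tuple[List[int], Packet]:
    """Return (coefficients, payload) for one random linear combination mod P."""
    coeffs = [rng.randrange(P) for _ in packets]
    width = len(packets[0])
    payload = [sum(c * pkt[j] for c, pkt in zip(coeffs, packets)) % P for j in range(width)]
    return coeffs, payload


def decode(coded: List[Tuple[List[int], Packet]], n: int) -> Optional[List[Packet]]:
    """Gauss-Jordan elimination mod P; None if the combinations have rank < n."""
    rows = [list(c) + list(y) for c, y in coded]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # inverse via Fermat's little theorem
        rows[col] = [(v * inv) % P for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][n:] for i in range(n)]


rng = random.Random(0)
source = [[rng.randrange(256) for _ in range(8)] for _ in range(4)]
received = [random_combination(source, rng) for _ in range(4)]
recovered = decode(received, n=4)
```

Over a field this large, n random combinations are linearly independent except with probability about n/P, so every received combination is useful and the upload volume matches the file size.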

Page 49: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Poison

Torrent/Emule/Kazaa

Page 50: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Signatures against poison

Packets: 1, 2, …, i, …, n

Si = MD5 of packet i

.torrent file: S1, S2, ..., Sn

We might receive a poisoned packet, but we won't forward it.

Page 51: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Signatures in network coding

Packets: 1, 2, …, i, …, n

Si = MD5 of packet i

.torrent file: S1, S2, ..., Sn, S(X1+X2), S(X1+X3), .......

There is an exponential number of options.

Page 52: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Zhao - Homomorphic signature

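M = [packets | I], where row i holds packet i followed by row i of the n × n identity matrix:

packet 1 | 1 0 ... 0
packet 2 | 0 1 ... 0
   ...   | . . . .
packet n | 0 0 ... 1

We can find a vector u s.t. Mu=0.

A correct packet v will be orthogonal to u: <v,u>=0.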
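The linear-algebra core of this check can be sketched as follows. This is only the orthogonality test over an assumed prime field (modulo 2^31 - 1), with hypothetical names; it is not the signature scheme itself and offers no protection if u is revealed.

```python
import random
from typing import List

P = 2_147_483_647  # assumed prime modulus, 2^31 - 1


def dot(a: List[int], b: List[int]) -> int:
    return sum(x * y for x, y in zip(a, b)) % P


def augmented_rows(packets: List[List[int]]) -> List[List[int]]:
    """Row i of M: packet i followed by row i of the identity matrix."""
    n = len(packets)
    return [list(pkt) + [int(j == i) for j in range(n)] for i, pkt in enumerate(packets)]


def orthogonal_vector(packets: List[List[int]], rng: random.Random) -> List[int]:
    """Pick u = (head, tail) with M u = 0: tail_i = -<packet_i, head> mod P."""
    head = [rng.randrange(1, P) for _ in range(len(packets[0]))]
    tail = [(-dot(pkt, head)) % P for pkt in packets]
    return head + tail


def combine(rows: List[List[int]], coeffs: List[int]) -> List[int]:
    return [sum(c * row[j] for c, row in zip(coeffs, rows)) % P for j in range(len(rows[0]))]


rng = random.Random(1)
packets = [[rng.randrange(256) for _ in range(5)] for _ in range(3)]
rows = augmented_rows(packets)
u = orthogonal_vector(packets, rng)
v = combine(rows, [3, 0, 2])            # a correct coded packet
tampered = [(v[0] + 1) % P] + v[1:]     # a poisoned packet
```

Every linear combination of the rows of M stays orthogonal to u, while the tampered packet has inner product u[0] ≠ 0 with u and is rejected.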

Page 53: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Zhao - Homomorphic signature

We can find a vector u s.t. Mu=0.

A correct packet v will be orthogonal to u: <v,u>=0.

But if Eve knows u, then she can find a v which is orthogonal to u.

Solution: instead of sending u to everyone, send vector

Page 54: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Zhao - Homomorphic signature

Given v which is a linear combination of the file's packets

It requires n+m power operations.

In practice it takes more time than downloading.

Page 55: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Selective verification [PW12]

Packet i carries two signatures: S'i and S''i

If we have both signatures we can choose randomly which to check

Page 56: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Problem

Eve can combine signatures

Page 57: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Solution

Use a linear error correcting code.

(Figure: rows 1, 2, …, n, each followed by the matching row of the identity matrix.)

We perform the Zhao signature on each block.

Page 58: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Analysis

q^n – True combinations

(Figure: rows 1, 2, …, n, each followed by the matching row of the identity matrix.)

= defective (for our GT)

Page 59: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Analysis

$\Pr[\text{one block passes the test}] < q^n / q^{dn} = q^{-(d-1)n}$

$\Pr[r/2 \text{ out of } r \text{ blocks pass the test}] < 2^r q^{-(d-1)nr/2}$

(Figure labels: n+m, dn, blocks 1, 2, …, r)

Page 60: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Analysis

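(Figure labels: n+m, dn, blocks 1, 2, …, r)

$\Pr[\text{one block passes the test}] < q^n / q^{dn} = q^{-(d-1)n}$

$\Pr[r/2 \text{ out of } r \text{ blocks pass the test}] < 2^r q^{-(d-1)nr/2}$

Using the union bound over the $q^{n+m}$ possible packets, the probability that a bad packet exists is bounded by $q^{(n+m) + r/\log_2 q - (d-1)nr/2}$

In practice we improve the Zhao signature by a factor of 60.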

Page 61: Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications.

Conclusion

• Group testing/Compressive sensing is a very effective tool.

• We improved both constructions and achieved sublinear decoding time.

• Surprising and important applications.