Length Reduction in Binary Transforms Oren Kapah Ely Porat Amir Rothschild Amihood Amir Bar Ilan...
Length Reduction in Binary Transforms
Oren Kapah, Ely Porat, Amir Rothschild
Amihood Amir, Bar Ilan University and Johns Hopkins University
Bar Ilan University   U. of Minnesota
Error in Address:
Error in Content:
U. of Minnesota Bar Ilan University
Motivation: Architecture.
Assume distributed memory.
Our processor has text and requests pattern of length m.
Pattern arrives in m asynchronous packets, of the form:
<symbol, addr>
Example: <A, 3>, <B, 0>, <A, 4>, <C, 1>, <B, 2>
Pattern: BCBAA
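A minimal sketch of how the packets above determine the pattern, assuming error-free addresses; the function name `assemble_pattern` is hypothetical and not from the talk.

```python
def assemble_pattern(packets):
    """Place each <symbol, addr> packet at its address, whatever the arrival order."""
    pattern = [None] * len(packets)
    for symbol, addr in packets:
        pattern[addr] = symbol
    return "".join(pattern)


packets = [("A", 3), ("B", 0), ("A", 4), ("C", 1), ("B", 2)]
print(assemble_pattern(packets))  # BCBAA
```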
Our Model…
Text: T[0],T[1],…,T[n]
Pattern: P[0]=<C[0],A[0]>, P[1]=<C[1],A[1]>, …, P[m]=<C[m],A[m]>;
C[i] ∈ ∑, A[i] ∈ {1,…,m}.
Standard Pattern Matching: no error in A.
Asynchronous Pattern Matching: no error in C.
Eventually: error in both.
Address Register: log m bits, some of them “bad” bits.
What does “bad” mean?
1. Bit “flips” its value.
2. Bit sometimes flips its value.
3. Transient error.
We will now concentrate on consistent bit flips
Example: Let ∑={a,b}
T[0] T[1] T[2] T[3]
 a    a    b    b
P[0] P[1] P[2] P[3]
 b    b    a    a
Naïve Algorithm
For each of the 2^{log m} = m different bit combinations, try matching.
Choose the match with the minimum number of flipped bits.
Time: O(m^2).
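The naïve algorithm can be sketched as follows. This assumes the text and pattern have the same power-of-two length and that a consistent flip sends pattern address i to i XOR mask; the function name `min_bit_flip_mask` is an illustrative choice, not from the talk.

```python
def min_bit_flip_mask(text, pattern):
    """Try every mask; return a matching mask with the fewest 1 bits, or None."""
    m = len(pattern)  # assumed to be a power of two
    best = None
    for mask in range(m):
        # Undo the flip: the symbol meant for address i was stored at i ^ mask.
        if all(pattern[i ^ mask] == text[i] for i in range(m)):
            if best is None or bin(mask).count("1") < bin(best).count("1"):
                best = mask
    return best


# The example above: T = aabb, P = bbaa. Masks 10 and 11 both match; 10 flips fewer bits.
print(min_bit_flip_mask("aabb", "bbaa"))  # 2
```

Each of the m masks costs O(m) to check, which gives the O(m^2) bound.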
Approximate Pattern Matching
Hamming distance: for every location, write the number of mismatches.
Text: A B B A B C B A A B C B A B B C
Pattern: A B C B A
Number of mismatches at the first five locations: 3, 3, 5, 0, 4
Naïve Algorithm Time: O(nm)
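A short sketch of the naïve O(nm) computation on the text and pattern of this example; the function name `hamming_distances` is hypothetical.

```python
def hamming_distances(text, pattern):
    """For every alignment of pattern inside text, count the mismatching positions."""
    n, m = len(text), len(pattern)
    return [
        sum(text[i + k] != pattern[k] for k in range(m))
        for i in range(n - m + 1)
    ]


dist = hamming_distances("ABBABCBAABCBABBC", "ABCBA")
print(dist[:5])  # [3, 3, 5, 0, 4]
```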
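In Pattern Matching
Polynomial Multiplication: (a_4 a_3 a_2 a_1 a_0) × (b_2 b_1 b_0)
Partial products:
b_2a_4 b_2a_3 b_2a_2 b_2a_1 b_2a_0
b_1a_4 b_1a_3 b_1a_2 b_1a_1 b_1a_0
b_0a_4 b_0a_3 b_0a_2 b_0a_1 b_0a_0
Result: r_2 r_1 r_0
Sliding view: b_0 b_1 b_2 is placed under a_0 … a_4 at each of three positions.
Naïve Time: O(nm)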
What do the Two Examples have in Common?
What Really Happened?
0 0 0 T[0] T[1] T[2] T[3] 0 0 0
C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3]
Dot products array:
P[0] P[1] P[2] P[3]
Another way of defining the transform:
C(T,P)[j] = \sum_{i=0}^{m} T[i] \cdot P[i-j],   j = -m, …, m
where we define P[x] = 0 for x < 0 and x > m.
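A direct sketch of the dot products array, assuming the transform is C[j] = sum over i of T[i]·P[i−j] with P treated as 0 outside 0..m; the name `shift_convolution` is illustrative only.

```python
def shift_convolution(T, P):
    """Return {j: C[j]} for j = -m..m, where C[j] = sum_i T[i] * P[i - j]."""
    m = len(P) - 1

    def p(x):
        # P[x] is defined as 0 outside the range 0..m.
        return P[x] if 0 <= x <= m else 0

    return {
        j: sum(T[i] * p(i - j) for i in range(len(T)))
        for j in range(-m, m + 1)
    }


C = shift_convolution([1, 2, 3, 4], [1, 1, 1, 1])
print(C[-3], C[0], C[3])  # 1 10 4
```

At j = −3 only P[3] overlaps T[0], and at j = 3 only P[0] overlaps T[3], matching the padded picture above.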
FFT solution to the “shift” convolution:
1. Compute F_m(X) = V in time O(m log m) (the values of X at the roots of unity).
2. For polynomial multiplication A × B, compute the values of the product polynomial at the roots of unity in time O(m log m): F_m(A) · F_m(B) = V.
3. Compute the coefficients of the product polynomial, again in time O(m log m): A × B = F_m^{-1}(V).
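The three steps can be sketched with a textbook recursive radix-2 FFT. This is a standard-library-only illustration assuming integer coefficients; the names `fft` and `multiply` are hypothetical, and a production implementation would normally use a numerical library.

```python
import cmath


def fft(a, invert=False):
    """Evaluate the polynomial a at the len(a)-th roots of unity (len(a) a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply(A, B):
    """Coefficients of A(x) * B(x), lowest degree first."""
    size = 1
    while size < len(A) + len(B) - 1:
        size *= 2
    fa = fft(list(A) + [0] * (size - len(A)))      # step 1: values of A
    fb = fft(list(B) + [0] * (size - len(B)))      # step 1: values of B
    v = [x * y for x, y in zip(fa, fb)]            # step 2: values of the product
    coeffs = fft(v, invert=True)                   # step 3: back to coefficients
    return [round((c / size).real) for c in coeffs[:len(A) + len(B) - 1]]


print(multiply([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```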
A General Convolution C_f
Bijections f_j: {0,…,m} → {0,…,m};  j = 1,…,O(m)
C_f(T,P)[j] = \sum_{i=0}^{m} T[i] \cdot P[f_j(i)],   j = 1,…,O(m)
Consistent bit flip as a Convolution
Construct a mask of length log m that has 0 in every bit except for the bad bits, where it has a 1.
Example: Assume the bad bits are at indices i, j, k ∈ {0,…,log m}. Then the mask has a 1 at positions i, j and k and 0 elsewhere: 000001000100001000
An exclusive OR between the mask and a pattern index gives the target index.
Our Case:
Denote our convolution by T ⊗ P.
Our convolution: for each of the 2^{log m} = m masks j ∈ {0,1}^{log m}, let
(T ⊗ P)[j] = \sum_{i=0}^{m} T[i] \cdot P[j ⊕ i]
To compute min bit flip:
Let T, P be over alphabet {0,1}.
For each j, P[j ⊕ 0], …, P[j ⊕ m] is a permutation of P.
Thus, only the j’s for which (T ⊗ P)[j] = number of 1’s in T are valid flips, since for them all 1’s match 1’s and all 0’s match 0’s.
Choose the valid j with the minimum number of 1’s.
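A sketch of the selection rule for binary strings, computing the XOR convolution directly in O(m^2). It assumes T and P are 0/1 lists of the same power-of-two length with the same number of 1’s; the function names are illustrative.

```python
def xor_convolution(T, P):
    """result[j] = sum_i T[i] * P[j ^ i], computed directly."""
    m = len(T)
    return [sum(T[i] * P[i ^ j] for i in range(m)) for j in range(m)]


def min_valid_flip(T, P):
    """Among masks j whose convolution value equals the number of 1's in T, pick the sparsest."""
    ones = sum(T)
    conv = xor_convolution(T, P)
    valid = [j for j, c in enumerate(conv) if c == ones]
    return min(valid, key=lambda j: bin(j).count("1"), default=None)


T = [0, 1, 0, 0, 0, 0, 1, 0]
P = [1, 0, 0, 0, 0, 0, 0, 1]
print(xor_convolution(T, P))  # [0, 2, 0, 0, 0, 0, 2, 0]
print(min_valid_flip(T, P))   # 1
```

Masks 001 and 110 are both valid here; 001 wins because it flips a single bit.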
Time
All convolutions can be computed in time O(m^2), after preprocessing the permutation functions as tables.
Can we do better? (As in the FFT, for example)
Idea – Divide and Conquer – Walsh Transform
1. Split T and P into the length m/2 arrays T′, T″, P′, P″.
2. Compute T′ ⊗ P′ and T″ ⊗ P″.
3. Use their values to compute T ⊗ P in time O(m).
Time: Recurrence: t(m) = 2t(m/2) + m. Closed form: t(m) = O(m log m).
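One standard way to realize an O(m log m) XOR convolution is the fast Walsh–Hadamard transform, sketched below in its iterative form. This is the textbook technique, offered as an assumption about what the divide-and-conquer step amounts to, not as the exact construction on the slide; the function names are illustrative.

```python
def fwht(a):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    a = list(a)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y   # sums and differences of the two halves
        h *= 2
    return a


def xor_convolution_fast(T, P):
    """result[j] = sum_i T[i] * P[j ^ i], via three transforms and a pointwise product."""
    ft, fp = fwht(T), fwht(P)
    prod = fwht([x * y for x, y in zip(ft, fp)])
    # The transform is its own inverse up to a factor of len(T).
    return [v // len(T) for v in prod]


T = [0, 1, 0, 0, 0, 0, 1, 0]
P = [1, 0, 0, 0, 0, 0, 0, 1]
print(xor_convolution_fast(T, P))  # [0, 2, 0, 0, 0, 0, 2, 0]
```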
Sparse Transform
Applications where most of the input is 0.
The locations of the “1”s are given as input.
We are only interested in the transform results for the locations where all pattern “1”s match text “1”s.
Motivation – Point Set Matching
1-D Point Set Matching:
T: (t1,t2,…,tn)
P: (p1,p2,…,pm)
2-D Point Set Matching – Searching in Music:
Notations:
Length of text: N
Length of pattern: M
Number of “1”s in text: n
Number of “1”s in pattern: m
Length Reduction in DFT
Goal: Given two vectors V1 and V2, obtain two vectors V’1 and V’2 of size O(n’) such that every non-zero of V1 and of V2 appears as a singleton in V’1 and V’2 respectively, while maintaining the distance property.
The Distance Property: If V’2[h(0)] is aligned with V’1[h(i)], then V’2[h(j)] is aligned with V’1[h(f_i(j))] = V’1[h(i+j)].
Using the reduced-size vectors, matching can be done in time O(n’ log n’) using the FFT algorithm.
Example: Length Reduction
The vectors are given as sets of pairs: (index, value)
V1: (0, 5), (6, 2), (13, 3), (19, 1)
V2: (0, 2), (7, 3)
Length Reduction Hash Function: mod(5)
V’1: 5 2 0 3 1
V’2: 2 0 3 0 0
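The mod(5) reduction in this example can be reproduced with a few lines; the function name `reduce_mod` is hypothetical, and colliding entries are simply added together in this sketch.

```python
def reduce_mod(pairs, s):
    """Hash each (index, value) pair to slot index % s of a length-s vector."""
    out = [0] * s
    for index, value in pairs:
        out[index % s] += value
    return out


V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]
print(reduce_mod(V1, 5))  # [5, 2, 0, 3, 1]
print(reduce_mod(V2, 5))  # [2, 0, 3, 0, 0]
```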
The Randomized Algorithm of Cole & Hariharan [STOC 02]
Idea: Find a set of log(n) short vectors in which, with high probability, each non-zero in V appears as a singleton in at least one of the vectors.
Hash functions: (ax mod q) mod s, where q is a large prime number and s is O(n).
If s is c·n, then the probability of a non-zero appearing as a multiple (colliding with another non-zero) is constant.
Using log(n) different hash functions reduces the failure probability exponentially.
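A small sketch of the singleton idea, assuming the hash form (ax mod q) mod s stated above. The helper names, the fixed seed, and the example parameters are illustrative choices, not part of the Cole & Hariharan algorithm.

```python
import random
from collections import Counter


def hash_index(x, a, q, s):
    """The hash (a*x mod q) mod s."""
    return (a * x % q) % s


def singletons(indices, a, q, s):
    """Indices that land alone in their bucket under one hash function."""
    buckets = Counter(hash_index(x, a, q, s) for x in indices)
    return {x for x in indices if buckets[hash_index(x, a, q, s)] == 1}


def covered_by_some_hash(indices, q, s, rounds, seed=0):
    """Indices that are singletons under at least one of `rounds` random multipliers."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(rounds):
        a = rng.randrange(1, q)
        covered |= singletons(indices, a, q, s)
    return covered


print(singletons([0, 6, 13, 19], a=1, q=101, s=5))  # all four indices are singletons
print(singletons([0, 5], a=1, q=101, s=5))          # set(): both land in bucket 0
```

Repeating with fresh multipliers gives each colliding index another chance to land alone, which is where the exponential drop in failure probability comes from.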
Problem
For the Walsh Transform, the mod function is useless.
The distance property has to do with exclusive or, not addition!
IDEA
Instead of the modulo function, do an exclusive or (⊕) of the index bits with a random bit string.
Walsh transform example. For each location, XOR the pattern addresses with the location, then take the dot product with the text.

address:  000 001 010 011 100 101 110 111
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1

Location | Pattern addresses after XOR     | Dot product
000      | 000 001 010 011 100 101 110 111 | 0
001      | 001 000 011 010 101 100 111 110 | 2
010      | 010 011 000 001 110 111 100 101 | 0
011      | 011 010 001 000 111 110 101 100 | 0
100      | 100 101 110 111 000 001 010 011 | 0
101      | 101 100 111 110 001 000 011 010 | 0
110      | 110 111 100 101 010 011 000 001 | 2
111      | 111 110 101 100 011 010 001 000 | 0
The Length Reduction
Reduce the length by half: choose a mask of log n − 1 bits at random, prepend a most significant bit 1, and XOR it with each index in the second half.
This randomly hashes all 1’s in the second half into the first half.
Length Reduction - Example
address:  000 001 010 011 100 101 110 111
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1
Let mask be 101
address:  000 001 010 011 001 000 011 010
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1
Reduced Strings - Example
address:  000 001 010 011 100 101 110 111
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1
mask is 101
address:  000 001 010 011
Text:      0   1   0   1
Pattern:   1   0   1   0
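The halving step of this example can be sketched as below. The function name `halve` is hypothetical, and the sketch assumes the mask has its most significant bit set so that every second-half index lands in the first half.

```python
def halve(vec, mask):
    """Fold the second half of vec into the first half by XOR-ing indices with mask."""
    half = len(vec) // 2
    out = list(vec[:half])
    for i in range(half, len(vec)):
        # mask has MSB 1, so i ^ mask clears the top bit and falls in 0..half-1.
        out[i ^ mask] += vec[i]
    return out


text = [0, 1, 0, 0, 0, 0, 1, 0]
pattern = [1, 0, 0, 0, 0, 0, 0, 1]
print(halve(text, 0b101))     # [0, 1, 0, 1]
print(halve(pattern, 0b101))  # [1, 0, 1, 0]
```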
Reduced Strings - Example
Walsh Transform of reduced string:
address:          000 001 010 011
Text:              0   1   0   1
Pattern:           1   0   1   0
Walsh transform:   0   2   0   2
Questions:
1. Does the distance property hold?
2. Which of these results is “legal”?
3. Where should it be mapped?
Answers: Distance property
The Distance Property: If T[h(0)] is aligned with P[h(i)], then T[h(j)] is aligned with P[h(f_i(j))] = P[h(i ⊕ j)].
Holds because both h and f are XOR functions and because of the commutativity and associativity of XOR.
Answers: which are “legal”?
address:  000 001 010 011 100 101 110 111
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1
Let mask be 101
address:  000 001 010 011 001 000 011 010
Text:      0   1   0   0   0   0   1   0
Pattern:   1   0   0   0   0   0   0   1
Reduced strings, with s marking original tenants and m marking Johnny-come-latelies:
address:  000 001 010 011
Text:      0   s   0   m
Pattern:   s   0   m   0
Observation: Legal multiplications are:
1. all s in text by s in pattern and m in text by m in pattern, or
2. all s in text by m in pattern and m in text by s in pattern.
This can be checked by a constant number of binary DWTs, with an added benefit:
Case 1 means the result stays at its address.
Case 2 means the result is moved to its address XOR the mask.
Reduced Strings - Example
Walsh Transform of reduced string:
address:          000 001 010 011
Text:              0   1   0   1
Pattern:           1   0   1   0
Walsh transform:   0   2   0   2
Address 001: result is in the correct address.
Address 011: result belongs in address 011 ⊕ 101 = 110.
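A sketch of the legality check on this example, under two assumptions: a result is kept only when every pattern 1 is matched, and the four tenant/latecomer convolutions are computed directly rather than with fast transforms. All function names are hypothetical, and collisions inside the reduced vectors are not handled.

```python
def xor_conv(a, b):
    """result[j] = sum_i a[i] * b[i ^ j], computed directly."""
    n = len(a)
    return [sum(a[i] * b[i ^ j] for i in range(n)) for j in range(n)]


def split_tenants(vec, mask):
    """Return (original tenants, latecomers moved in by the mask) as half-length vectors."""
    half = len(vec) // 2
    stayed = list(vec[:half])
    moved = [0] * half
    for i in range(half, len(vec)):
        moved[i ^ mask] += vec[i]
    return stayed, moved


def full_matches(text, pattern, mask):
    """Map each recovered original location to its match count."""
    ones = sum(pattern)
    ts, tm = split_tenants(text, mask)
    ps, pm = split_tenants(pattern, mask)
    ss, mm = xor_conv(ts, ps), xor_conv(tm, pm)
    sm, ms = xor_conv(ts, pm), xor_conv(tm, ps)
    found = {}
    for j in range(len(ts)):
        if ss[j] + mm[j] == ones:      # case 1: result stays at address j
            found[j] = ones
        if sm[j] + ms[j] == ones:      # case 2: result moves to j XOR mask
            found[j ^ mask] = ones
    return found


text = [0, 1, 0, 0, 0, 0, 1, 0]
pattern = [1, 0, 0, 0, 0, 0, 0, 1]
print(full_matches(text, pattern, 0b101))  # {1: 2, 6: 2}
```

The recovered locations 001 and 110 agree with the non-zero entries of the full-length dot product.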
Reminder – the dot product
address:        000 001 010 011 100 101 110 111
Text:            0   1   0   0   0   0   1   0
address ⊕ 111:  111 110 101 100 011 010 001 000
Pattern:         1   0   0   0   0   0   0   1
Dot product:     0   2   0   0   0   0   2   0
Analysis:
We could continue this process recursively and analyze probability of clash of masked element with an element that is already there but…
More elegant solution:
Polynomials over a finite field
Consider indices as elements in F_{2^L}.
In F_{2^L}: x + y = x ⊕ y.
Length Reduction: Every index in F_{2^L} is written as a polynomial in F_{2^ℓ}[X] of degree d = L/ℓ − 1.
Length reduction example:
Index = 17
In binary = 10001
Take ℓ = 2: 1 | 00 | 01
The polynomial: 1·X^2 + 00·X + 01·1 = X^2 + 1
Choose a value for X from F_{2^ℓ} at random and evaluate the polynomial.
Length reduction example:
Recall that in F_{2^ℓ}:
addition is exclusive or;
multiplication is polynomial multiplication modulo some irreducible polynomial.
So, evaluating a polynomial at X gives a number with ℓ bits.
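A sketch of the evaluation for the index 17 example with ℓ = 2. The irreducible polynomial x^2 + x + 1 (bit pattern 0b111) is an assumption made for illustration, since the talk does not name one, and the function names are hypothetical.

```python
def gf_mul(x, y, ell, modulus):
    """Multiply x and y in GF(2^ell); modulus is the irreducible polynomial's bit pattern."""
    result = 0
    while y:
        if y & 1:
            result ^= x          # addition in the field is XOR
        y >>= 1
        x <<= 1
        if x >> ell:             # degree reached ell: reduce by the modulus
            x ^= modulus
    return result


def reduce_index(index, point, ell=2, modulus=0b111):
    """Split index into ell-bit coefficients and evaluate the polynomial at `point`."""
    chunks = []
    while index:
        chunks.append(index & ((1 << ell) - 1))
        index >>= ell
    value = 0
    for c in reversed(chunks):   # Horner's rule, highest coefficient first
        value = gf_mul(value, point, ell, modulus) ^ c
    return value


# Index 17 = 1|00|01 gives X^2 + 1; evaluate at every X in GF(4).
print([reduce_index(17, x) for x in range(4)])  # [1, 0, 2, 3]
```

Because the coefficients of i ⊕ j are the XORs of the coefficients of i and j, this reduction satisfies h(i ⊕ j) = h(i) ⊕ h(j), which is the form of locality the distance property needs.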
Probability of Collision:
Probability of collision of indices i and j = probability that the chosen value of X is a root of the difference polynomial P_i(X) − P_j(X),
where P_i, P_j are the polynomials of index i and j, respectively.
The degree of the difference polynomial is at most d, so it has at most d roots and the probability is at most d/2^ℓ.
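Written out as one chain, assuming i ≠ j (so the difference polynomial is non-zero) and X drawn uniformly from F_{2^ℓ}:

```latex
\Pr_X\bigl[P_i(X) = P_j(X)\bigr]
  = \Pr_X\bigl[(P_i - P_j)(X) = 0\bigr]
  \le \frac{\deg(P_i - P_j)}{\lvert F_{2^\ell} \rvert}
  \le \frac{d}{2^\ell}
```

The first inequality uses the fact that a non-zero polynomial of degree k over a field has at most k roots.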
Moral of the Story:
Polynomials are a good candidate for locality preserving length reductions for discrete transforms.