Embedding and Similarity Search for Point Sets under Translation
![Page 1: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/1.jpg)
Embedding and Similarity Search for Point Sets under Translation
Minkyoung Cho and David M. Mount University of Maryland
SoCG 2008
![Page 2: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/2.jpg)
Point Pattern Matching
Given two point sets P and Q, find Q' ⊆ Q that minimizes Dist(P, Q') = min_t dist(tP, Q'), where t is a geometric transformation (e.g., translation, rotation, …).
[Figure: example point sets P and Q]
![Page 3: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/3.jpg)
Point Pattern Similarity Search
A collection of point sets S = {P1, P2, …, PN} has been preprocessed. Given a query set Q, find the (approximate) nearest Pi with respect to a distance function and a transformation group.
[Figure: query Q compared against the collection S = {P1, P2, …, PN}]
![Page 4: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/4.jpg)
Results

| Method | Transformation | Space | Index | Note |
|---|---|---|---|---|
| Geometric Hashing [Wolfson & Rigoutsos 97] | Translation, Rotation, Affine, … | O(N n^(k+1)) (k: frame size) | YES | Space complexity |
| EMD into Euclidean space [Indyk & Thaper 03] | None | O(Nn) | YES | Embedding EMD to L1 |
| EMD under transformation sets [Cohen & Guibas 99] | Scaling, Translation | O(Nn) | NO | Brute-force, heuristic |
| Ours | Translation | O(Nn log^2 n) | YES | Embedding SD to L1 |

EMD: Earth Mover's Distance; SD: Symmetric Difference Distance
![Page 5: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/5.jpg)
Problem Definition
Point Pattern Similarity Searching:
• Distance Measure: Symmetric Difference Distance
• Error Model: Outliers (but no noise)
• Transformation: Translation
• Restriction: Coordinates are integers

[Figure: Venn diagram of P \ Q, Q \ P, and P Δ Q]
Example: P = {p1, p2, p3, p4} and Q = {p1, p2, p5, p6}, so P Δ Q = {p3, p4} ∪ {p5, p6} and |P Δ Q| = 4.

Example under translation:
P = {0, 12, 14, 23, 35, 54, 59, 64}
Q = {15, 17, 20, 26, 38, 57, 65, 67}
With t = 3, Q − t = {12, 14, 17, 23, 35, 54, 62, 64}, which shares {12, 14, 23, 35, 54, 64} with P. The unmatched points are {0, 59} in P and {17, 62} in Q − t, so the symmetric difference has size 4.
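The distance in this example can be checked by brute force. The following is a minimal sketch for 1-dimensional integer sets, not the method of the talk; the function name `sym_diff_under_translation` is a hypothetical helper introduced here. It relies on the observation that only translations aligning some point of P with some point of Q can create any overlap.

```python
def sym_diff_under_translation(P, Q):
    """Return (distance, t) minimizing |(P + t) Δ Q| over integer translations t.

    Brute force: tries every translation that maps some p onto some q.
    """
    P, Q = set(P), set(Q)
    best = (len(P) + len(Q), 0)  # distance when nothing overlaps
    for t in sorted({q - p for p in P for q in Q}):
        dist = len({p + t for p in P} ^ Q)  # ^ is set symmetric difference
        if dist < best[0]:
            best = (dist, t)
    return best


P = [0, 12, 14, 23, 35, 54, 59, 64]
Q = [15, 17, 20, 26, 38, 57, 65, 67]
distance, shift = sym_diff_under_translation(P, Q)
print(distance, shift)
```

With up to n² candidate translations and O(n) work per candidate, this costs O(n³) per pair, which is the kind of per-query cost an embedding is meant to avoid.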
![Page 6: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/6.jpg)
Motivation: Sources of Complexity
• The difficulty comes from the combination of translation + outliers.
• Translation only: translate the point set by aligning its leftmost point to the origin; matching is then trivial.
• Outliers only: reduce to nearest neighbor search in the Hamming cube (by hashing or random sampling).
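The two easy cases, and why their combination is hard, can be illustrated with a short sketch. The helpers `canonical` and `hamming` are hypothetical names introduced here, and the example sets are made up for illustration.

```python
def canonical(P):
    """Translate a 1-D integer point set so its leftmost point sits at the origin."""
    m = min(P)
    return frozenset(p - m for p in P)


def hamming(P, Q, universe):
    """Hamming distance between the indicator vectors of P and Q over range(universe)."""
    return sum((i in P) != (i in Q) for i in range(universe))


# Translation only: canonical forms of translated copies coincide.
print(canonical({3, 6, 10}) == canonical({13, 16, 20}))

# Outliers only: the Hamming distance equals the symmetric difference size.
print(hamming({0, 1}, {1, 2}, 4))

# Both together: Q is P shifted by 3 with the leftmost point missing.
P = {0, 12, 14}
Q = {15, 17}
print(len(canonical(P) ^ canonical(Q)))
```

In the last case the true distance under translation is 1, but losing the leftmost point misaligns the canonical forms and the naive comparison reports 3.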
![Page 7: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/7.jpg)
Intuition
[Figure: each point set P1, P2, P3, P4, …, PN and the query Q is mapped by f into a metric space]
![Page 8: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/8.jpg)
Embedding: Basic Definitions
Given metric spaces (X, d) and (X', d'), a map f: X → X' is called an embedding.
The contraction of f is the maximum factor by which distances are shrunk, i.e.,
$$\max_{x,y \in X} \frac{d(x,y)}{d'(f(x),f(y))}$$
The expansion or stretch of f is the maximum factor by which distances are stretched:
$$\max_{x,y \in X} \frac{d'(f(x),f(y))}{d(x,y)}$$
The distortion of f is the product of the contraction and expansion.
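For a finite set of points these three quantities can be evaluated directly. The sketch below assumes distinct points and an injective map f (so no distance is zero); the function name `distortion` and the toy map x ↦ x² are illustrative choices made here.

```python
from itertools import combinations


def distortion(points, d, f, d_prime):
    """Return (contraction, expansion, distortion) of f on a finite point set.

    Assumes the points are distinct and f is injective, so no ratio divides by zero.
    """
    pairs = list(combinations(points, 2))
    contraction = max(d(x, y) / d_prime(f(x), f(y)) for x, y in pairs)
    expansion = max(d_prime(f(x), f(y)) / d(x, y) for x, y in pairs)
    return contraction, expansion, contraction * expansion


def line_metric(a, b):
    return abs(a - b)


# Uniform scaling has distortion 1; squaring stretches distant pairs more than near ones.
print(distortion([0, 1, 3], line_metric, lambda x: 2 * x, line_metric))
print(distortion([0, 1, 3], line_metric, lambda x: x * x, line_metric))
```

Uniform scaling changes every distance by the same factor, so contraction and expansion cancel; this is why distortion, rather than expansion alone, measures the quality of an embedding.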
![Page 9: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/9.jpg)
Main Result: Preliminaries
• Main result: There exists a randomized embedding that maps point sets under symmetric difference with respect to translation into the metric space L1 with distortion O(log^2 n).
• Assumptions:
  – Each point set has at most n elements and lies in dimension d.
  – Coordinates are integers of magnitude polynomial in n.
• Distance function: symmetric difference with respect to translation,
$$\langle P \Delta Q \rangle = \min_{t} |(P + t) \Delta Q|$$
• Target metric: L1,
$$\|x - y\|_1 = \sum_{i=1}^{d} |x_i - y_i|, \quad x, y \in \mathbb{R}^d$$
![Page 10: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/10.jpg)
Outline of Algorithm
1. Transform d-dimensional points into 1-dimensional points. (Distortion: 1)
2. Reduce the domain size using a linear hash function. (Distortion: O(1))
3. Make the representation invariant under translation. (Distortion: O(log^2 n))
4. Reduce the target domain size using a universal hash function. (Distortion: O(1))
[Figure: running example in which the set {3,6,10,14,22} is hashed to a bit vector of length O(n log n), read as bit patterns {101010, ..., 010100, …, 11101}, and then hashed into a vector of counts]
![Page 11: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/11.jpg)
Translation Invariant
[Figure: the bit vector P of length s is read through a probe of size ρ = 4 at each shift, giving the patterns {1101, 0001, 0000, 0010, 1100, 1010}, …]
![Page 12: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/12.jpg)
Intuition
[Figure: bit vectors hP and hQ of length s, read through the same probes]
Φ2P = {10,01,00,10,01,00,10,00,00,01,00}
Φ2Q = {10,00,01,00,11,00,10,01,00,11,00}
Φ4P = {1101,0000,0010,1100,0000,0001,1000,0010,0101,0000,0010}
Φ4Q = {1011,0100,0010,0101,1000,0011,1100,0010,0100,1001,0000}
If one of the probes hits a mismatched position, the generated bit patterns may differ.
The probability that some probe hits a mismatched position increases as the probe size increases.
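The idea of reading a bit vector through a fixed probe at every cyclic shift can be sketched as follows. This is an illustration under assumptions, not the construction from the talk: the probe positions are fixed by hand rather than drawn at random, the patterns are collected as a multiset, and the names `probe_invariant`, `cyclic_shift`, and `l1_distance` are hypothetical.

```python
from collections import Counter


def probe_invariant(bits, probe):
    """Multiset of bit patterns seen through `probe` at every cyclic shift of `bits`."""
    s = len(bits)
    return Counter(
        tuple(bits[(pos + t) % s] for pos in probe)
        for t in range(s)
    )


def cyclic_shift(bits, t):
    """Rotate a bit vector to the right by t positions."""
    s = len(bits)
    return [bits[(i - t) % s] for i in range(s)]


def l1_distance(a, b):
    """L1 distance between two pattern-count vectors (missing patterns count as 0)."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


hP = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
probe = (0, 2, 3, 7)  # rho = 4 probe positions, chosen by hand for this example

# A cyclic shift only permutes the shifts at which each pattern is seen.
print(probe_invariant(hP, probe) == probe_invariant(cyclic_shift(hP, 3), probe))

# One mismatched position is hit by exactly rho of the s shifts.
hQ = list(hP)
hQ[4] ^= 1
print(l1_distance(probe_invariant(hP, probe), probe_invariant(hQ, probe)))
```

A single mismatched bit changes at most ρ of the s patterns, so the L1 distance between the two count vectors is at most 2ρ. Larger probes are more sensitive to mismatches, which matches the two observations above.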
![Page 13: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/13.jpg)
Relationship between ρ (probe size) and δ*
[Figure: distance of invariants |ΦρP Δ ΦρQ| as the probe size ρ increases, with regions labeled "Upper bound", "Expectation", and "Unknown"]
δ: estimated distance; δ*: original distance
![Page 14: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/14.jpg)
Embedding
[Figure and derivation: the distance of invariants |ΦρP Δ ΦρQ| is considered for probe sizes at the scales 2^0, 2^1, 2^2, …, 2^L, …, 2^H, …, 2^(log 2n) = 2n, and the per-scale terms are summed over i = 0, …, log 2n to relate ‖ΨP − ΨQ‖₁ to δ* up to O(log n) and O(log s) factors]
δ: estimated distance; δ*: original distance
![Page 15: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/15.jpg)
Build Time
The expensive operations are building the invariant and hashing over a large domain.
Building the invariant: (# of probes) × (# of translations)
  Trivial: O(s) × s = O(n log n) × O(n log n) = O(n^2 log^2 n)
Universal hash function: (# of elements) × (matrix operation) = (# of elements) × (input size) × (output size)
  Trivial: O(s) × O(s) × O(log s) = O(s^2 log s) = O(n^2 log^3 n)
We can improve this to O(n log^3 n) if we merge the two operations. Surprise!
![Page 16: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/16.jpg)
Merge Two Operations
[Figure: the bit vector P of length s, the rows r0, …, r_(log s) of the hash matrix H, and the outputs y0, y1, y2, …, y_(s−1), computed as Conv((r0 f), P)]
Convolution can be computed in O(n log n) time, where n is the size of the array.
![Page 17: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/17.jpg)
Main Result: Formal Statement
Given a failure probability β, there exists a randomized embedding from a point set P into a vector ΨP of dimension O(n log^2 n log(1/β)) such that for any P, Q,
(i) $\|\Psi P - \Psi Q\|_1 \le 2 \log n \cdot \langle P \Delta Q \rangle$
(ii) $\|\Psi P - \Psi Q\|_1 \ge \frac{1}{17 \log n} \langle P \Delta Q \rangle$
with probability at least 1 − β.
This embedding can be computed in time O(n log^4 n log(1/β)).
![Page 18: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/18.jpg)
Open Problems
• Q1. Can we improve the distortion bound (currently O(log^2 n))? Cormode & Muthukrishnan show how to embed a string under edit distance with moves into L1 with O(log n log* n) distortion.
• Q2. Can we derandomize the algorithm? Cormode & Muthukrishnan's algorithm is deterministic.
• Q3. Can we improve the space/time complexities?
![Page 19: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/19.jpg)
Other Extensions
• Q1. Can we support a distance measure that is robust to noisy data (e.g., Hausdorff distance)?
• Q2. Can we handle other transformation groups?
  - integer scaling?
  - integer scaling + translation?
  - affine transformations over finite vector spaces?
Point Pattern Similarity Searching:
• Distance Measure: Symmetric Difference Distance
• Error Model: Outliers (but no noise)
• Transformation: Translation
• Restriction: Coordinates are integral
![Page 20: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/20.jpg)
Thank You!
![Page 21: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/21.jpg)
Translation Invariant
P = {3,6,10,14,22}
h(x) = x mod s (e.g., s = 11)
hP = bit vector of length s with a 1 at each hashed position
ρ = 4
Patterns read through the probe: {1101, 0001, 0000, 0010, 1100, 1010}, …
ΦρP = {13, 0, 2, 12, 1, …, 10}
h'(x): (for simplicity, x mod 10)
[Figure: after applying h', ΦρP is stored as a vector of counts indexed 0 1 2 3 4 5 6 7 8 9]
![Page 22: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/22.jpg)
Trial 1: Geometric Hashing for Translation
• Naïve version:
  - Space complexity is O(N n^2), since the frame size is 1.
  - With outliers in a query, the number of queries increases.
• Adaptive version: to reduce space complexity, store only c transformed sets; the number of queries then increases.
• Outliers may lead to false matches, so they increase the probability of a false positive.
![Page 23: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/23.jpg)
Geometric Hashing with Outliers (delete)
Depending on the number of outliers $r$ and the frame size $k$, more queries are needed to get a correct result.
• Method 1: Pr[choose a valid frame set] = (1 − r/n)^k
• Method 2: (r + 1) different trials (deterministic)
• Method 3: pigeonhole principle; Pr[choose a valid frame set] = 1 − r/(n/k)
[Grimson & Huttenlocher 90]: Outliers lead to false matches and increase the probability of a false positive.
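The two probabilistic estimates above are easy to evaluate numerically. The sketch below simply codes the formulas as written on the slide; the function names are hypothetical, and the expected number of trials assumes independent repeated attempts (a geometric distribution), which is an assumption added here rather than a claim from the talk.

```python
def p_valid_random_frame(n, r, k):
    """Method 1: probability that k randomly chosen frame points are all non-outliers."""
    return (1 - r / n) ** k


def p_valid_partition_frame(n, r, k):
    """Method 3: split the n points into n/k frames; at most r of them contain an outlier."""
    return 1 - r / (n / k)


def expected_trials(p):
    """Expected number of independent attempts until success (assumed geometric model)."""
    return float("inf") if p <= 0 else 1 / p


n, r, k = 100, 10, 2
for p in (p_valid_random_frame(n, r, k), p_valid_partition_frame(n, r, k)):
    print(round(p, 4), round(expected_trials(p), 4))
```

Both estimates degrade as the frame size k grows, which is one reason larger transformation groups make geometric hashing more sensitive to outliers.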
![Page 24: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/24.jpg)
d-Dimension → 1-Dimension
Let u be the maximum coordinate value of any point. Then we can map a d-dimensional point set to a 1-dimensional point set with coordinates of size at most (3u)^d, without changing the symmetric difference distance under translation.
[Figure: a 2-dimensional grid is laid out row by row in one dimension, with padding after each row (e.g., [6,15] and [21,30]); the points (1,1) and (5,3) map to 1 and 35]
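One way to realize such a mapping is a mixed-radix encoding with base 3u, sketched below. It assumes coordinates in [0, u]; the names `flatten` and `min_sym_diff` are hypothetical, and the encoding differs from the figure's layout by a constant offset, which does not affect distances under translation.

```python
def flatten(point, u):
    """Encode a d-dimensional integer point as one integer, using base 3u per coordinate."""
    base = 3 * u
    return sum(c * base**i for i, c in enumerate(point))


def min_sym_diff(P, Q):
    """Brute-force symmetric difference under translation for sets of integer tuples."""
    P, Q = set(P), set(Q)
    best = len(P) + len(Q)
    for p in P:
        for q in Q:
            t = tuple(b - a for a, b in zip(p, q))  # translation taking p to q
            shifted = {tuple(a + d for a, d in zip(x, t)) for x in P}
            best = min(best, len(shifted ^ Q))
    return best


u = 5
P = {(1, 1), (5, 3), (2, 4), (3, 3)}
Q = {(0, 2), (4, 4), (1, 5), (5, 0)}  # three points of P shifted by (-1, +1), plus one outlier

P1 = {(flatten(p, u),) for p in P}  # 1-D points written as 1-tuples
Q1 = {(flatten(q, u),) for q in Q}
print(min_sym_diff(P, Q), min_sym_diff(P1, Q1))
```

The padding is what makes this work: coordinate differences lie in [−u, u], and base 3u is large enough that two distinct difference vectors never encode to the same integer, so a 1-dimensional translation matches exactly the pairs that some d-dimensional translation matches.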
![Page 25: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/25.jpg)
# of Primes & Collision Prob.
• Collision probability
  h(x) = x mod s, where s is a prime number in Θ(n log n) chosen uniformly at random.
  For x ≠ y:
  Pr[h(x) = h(y)] = Pr[(x mod s) = (y mod s)] = Pr[(x − y) mod s = 0]
  Since x, y ∈ Z_(n^c), |x − y| < n^c, so fewer than c primes of this size can divide x − y, and
  Pr[h(x) = h(y)] < c / (# of primes) = O(1/n)
• Prime Number Theorem
  There are Θ(m / log m) prime numbers in the range between 1 and m.
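The collision bound can be explored on small numbers. This is a sketch under assumptions: the prime range [10, 20] is a toy stand-in for Θ(n log n), and `primes_between` and `collision_probability` are hypothetical helper names.

```python
def primes_between(lo, hi):
    """Primes p with lo <= p <= hi (hi >= 2), via a sieve of Eratosthenes."""
    sieve = [True] * (hi + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(hi**0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [p for p in range(max(lo, 2), hi + 1) if sieve[p]]


def collision_probability(x, y, primes):
    """Fraction of candidate moduli s for which x mod s == y mod s."""
    return sum((x - y) % s == 0 for s in primes) / len(primes)


candidates = primes_between(10, 20)
print(candidates)
# 3 and 14 differ by 11, so they collide only when s = 11.
print(collision_probability(3, 14, candidates))
```

Two values collide exactly when the chosen prime divides their difference, so the probability is the number of prime divisors of x − y in the range divided by the number of primes in the range.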
![Page 26: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/26.jpg)
Distance Distortion by Hashing
We can achieve O(1) distortion with a hash function whose collision probability is O(1/n).
Note that the distance can only be contracted, because of collisions.
![Page 27: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/27.jpg)
Linear Hash Function (X)
• h(x) = x mod s, where s is a prime number in Θ(n log n)
• Linearity: h(x + t) = h(x) + h(t) (mod s), so a translation of P becomes a cyclic shift of hP, and ΦρP = Φρ(P + t)
[Figure: P = {3,6,10,14,22} hashed to a bit vector of length s]
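The effect of linearity can be seen on the running example with s = 11. In this minimal sketch, `hashed_bits` and `rotate` are hypothetical helper names; translating P by t rotates the hashed bit vector by t positions.

```python
def hashed_bits(P, s):
    """Bit vector of length s with a 1 at x mod s for every x in P."""
    bits = [0] * s
    for x in P:
        bits[x % s] = 1
    return bits


def rotate(bits, t):
    """Rotate a bit vector to the right by t positions (cyclically)."""
    s = len(bits)
    return [bits[(i - t) % s] for i in range(s)]


P = [3, 6, 10, 14, 22]
s = 11
hP = hashed_bits(P, s)
print(hP)  # 3 and 14 collide, so only four positions are set
print(hashed_bits([x + 5 for x in P], s) == rotate(hP, 5))
```

Because every translation acts as a rotation, any quantity computed over all cyclic shifts of hP is the same for P and P + t.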
![Page 29: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/29.jpg)
Universal Hash Function for large domain
Since the maximum probe size is O(n log n), the input domain of the hash function has size 2^O(n log n). However, it contains only Θ(n log n) elements.
• H: {0,1}^s → {0,1}^k
  H(x) = R x + b (mod 2, componentwise)
  R: a random k × s binary matrix
  b: a random k-bit vector
• Time complexity:
  Computing one value: O(k s) = O((log n) · n log n) = O(n log^2 n)
  For all s (= O(n log n)) values, the time is O(n^2 log^3 n).
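A hash of the form H(x) = Rx + b over bits can be sketched in a few lines. The factory name `make_universal_hash` and the use of a seeded `random.Random` are choices made here for reproducibility; they are not taken from the talk.

```python
import random


def make_universal_hash(s, k, seed=None):
    """Return H(x) = R x + b (mod 2) for a random k x s bit matrix R and k-bit vector b."""
    rng = random.Random(seed)
    R = [[rng.randint(0, 1) for _ in range(s)] for _ in range(k)]
    b = [rng.randint(0, 1) for _ in range(k)]

    def H(x):
        # One dot product per output bit: O(k * s) bit operations per value.
        return [
            (sum(r_ij * x_j for r_ij, x_j in zip(row, x)) + b_i) % 2
            for row, b_i in zip(R, b)
        ]

    return H


H = make_universal_hash(s=11, k=4, seed=1)
x = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(H(x))
```

Each output bit is the parity of a random subset of the input bits. That structure matters later: a parity over a fixed set of positions, evaluated at every shift, is a correlation, which is what allows all shifts to be processed together.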
![Page 30: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/30.jpg)
Relationship between ρ and δ*
[Figure: distance of invariants |ΦρP Δ ΦρQ| as a function of the probe size ρ, with regions labeled "Upper bound", "Expectation", and "Unknown"]
δ is a guessed distance; δ* is the optimal distance
![Page 31: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/31.jpg)
Effect of Hash Functions
[Figure: plot of |ΦρP Δ ΦρQ| against the probe size ρ, annotated with the hash functions h and h']
![Page 32: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/32.jpg)
Merge Two Operations using FFT & Convolution
Π = random_probe(ρ, s)
For t = 1, …, s: x(t) = (hP + t)[Π]   // make an invariant
For t = 1, …, s: x'(t) = H x(t) + b (mod (2,2,…,2))   // H: O(log s) × ρ matrix
  ΦρP[x'(t)]++
Time complexity: O(s) × O(matrix multiplication) = O(s) × O(s log s)
------------------------------------------------------------------------
H = [r1, r2, …, r_O(log s)]'   // ri: a binary row vector
H x(t) = [r1 x(t), r2 x(t), r3 x(t), …, r_O(log s) x(t)]'
ri x(t) = ri (hP + t)[Π] = (hP + t)[Π ri]
[ri x(0), ri x(1), …, ri x(s)] = fliplr(hP) ∗ [Π ri]   // ∗: convolution
Time complexity: O(log s) × O(convolution) = O(log s) × O(s log s)
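The step from per-shift dot products to a single convolution can be sketched in pure Python. This is an illustration under assumptions, not the authors' implementation: it handles one row r of H, assumes distinct probe positions, uses a textbook radix-2 FFT, and flips the mask rather than hP (an equivalent way to turn convolution into correlation). All function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def convolve(a, b):
    """Linear convolution of two integer sequences, computed with the FFT."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], inverse=True)
    return [round(v.real / size) for v in prod[: len(a) + len(b) - 1]]


def parity_all_shifts(h_bits, probe, r):
    """[r . x(t) mod 2 for t in range(s)], where x(t)[j] = h_bits[(probe[j] + t) % s]."""
    s = len(h_bits)
    mask = [0] * s
    for pos, bit in zip(probe, r):
        mask[pos] = bit  # the row r placed at the probe positions
    doubled = list(h_bits) + list(h_bits)  # unrolls the cyclic shifts
    conv = convolve(doubled, mask[::-1])  # flipped mask: convolution acts as correlation
    return [conv[t + s - 1] % 2 for t in range(s)]


def parity_direct(h_bits, probe, r):
    """Same values computed shift by shift, for comparison."""
    s = len(h_bits)
    return [
        sum(bit * h_bits[(pos + t) % s] for pos, bit in zip(probe, r)) % 2
        for t in range(s)
    ]


hP = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(parity_all_shifts(hP, [0, 2, 3, 7], [1, 0, 1, 1]))
```

One convolution yields one output bit for all s shifts at once, so the O(log s) rows of H need O(log s) convolutions in total, instead of a separate matrix product for every shift.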
![Page 33: Embedding and Similarity Search for Point Sets under Translation](https://reader030.fdocuments.in/reader030/viewer/2022032709/568130f5550346895d971bd2/html5/thumbnails/33.jpg)
Build Time

| Step | Trivial running time | Ours |
|---|---|---|
| d-dimension → 1-dimension | O(dn) | O(dn) |
| Linear hashing | O(n) | O(n) |
| Invariant under translation | O(n^2 log^2 n) | O(n log^3 n) (both steps merged) |
| Universal hashing (due to the domain size, we need to use matrix multiplication) | O(n^2 log^4 n) | (merged with the row above) |