15-211 Fundamental Data Structures and Algorithms Klaus Sutner April 20, 2004 Equivalence and Union-Find

15-211 Fundamental Data Structures and Algorithms

Klaus Sutner, April 20, 2004

Equivalence and Union-Find

Announcements

HW7 make sure to get your games in ...

Quiz 3: Thursday April 22

Final Exam on Tuesday May 4, 5:30 pm

Review session April 29

Chameleon Island

On a tropical island there are three kinds of chameleons perambulating: red, green and blue. If a red and a green chameleon meet, they both change color to blue; likewise, a red and a blue both turn green, and a green and a blue both turn red.

Initially there are 12 red, 13 green and 14 blue chameleons.

Can the chameleons turn into a homogeneous population?

Brute Force

We can compute this to death: use a digraph with nodes (r,g,b) and edges

(r,g,b) → (r-1,g-1,b+2)
(r,g,b) → (r-1,g+2,b-1)
(r,g,b) → (r+2,g-1,b-1)

provided that the numbers are non-negative.

How many nodes are there?

The starting configuration is (12,13,14) so the total number of animals is n = 39.

Reachability

The number of nodes is C(39+2,2) = 820.

We can simply use DFS or BFS to compute the nodes reachable from (12,13,14) and check if we run into one of (39,0,0), (0,39,0), (0,0,39).

It turns out, we don't. OK but rather crude.
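The brute-force search can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `ChameleonSearch` and the method name are made up for this sketch, and a configuration is stored as the pair (r,g) because b is determined by the total.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper class; names are illustrative and not from the slides.
class ChameleonSearch {
    // BFS over configurations: true if a one-color population is reachable from (r, g, b).
    static boolean canBecomeHomogeneous(int r, int g, int b) {
        int n = r + g + b;
        // b = n - r - g, so a node is identified by (r, g).
        boolean[][] seen = new boolean[n + 1][n + 1];
        Deque<int[]> queue = new ArrayDeque<>();
        seen[r][g] = true;
        queue.add(new int[] {r, g});
        // The three edge types as changes to (r, g, b).
        int[][] moves = { {-1, -1, 2}, {-1, 2, -1}, {2, -1, -1} };
        while (!queue.isEmpty()) {
            int[] c = queue.poll();
            int cr = c[0], cg = c[1], cb = n - cr - cg;
            if (cr == n || cg == n || cb == n) return true;
            for (int[] m : moves) {
                int nr = cr + m[0], ng = cg + m[1], nb = cb + m[2];
                if (nr < 0 || ng < 0 || nb < 0) continue;   // edge exists only for non-negative counts
                if (!seen[nr][ng]) {
                    seen[nr][ng] = true;
                    queue.add(new int[] {nr, ng});
                }
            }
        }
        return false;
    }
}
```

Calling `ChameleonSearch.canBecomeHomogeneous(12, 13, 14)` explores the reachable part of the 820-node graph and reports whether one of the three target nodes was found.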

Is there a more elegant solution?

How about invariants?

Invariants

If we suspect that some configuration cannot occur, we can try to prove this by finding some property P such that:

- P holds on the initial configuration,

- P is preserved in every single transition of the system,

- P does not hold on the specific target configuration.

Your favorite method: Induction.

Information Hiding

For the chameleons, the key observation is that modulo 3 the three types of edges are all the same:

(r,g,b) → (r+2,g+2,b+2) mod 3

Note that this quotient operation preserves paths, so it suffices to observe

(0,1,2) → (2,0,1) → (1,2,0) → (0,1,2)

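Of course, we lose a lot of information, but this is enough to answer the original question: each homogeneous target (39,0,0), (0,39,0), (0,0,39) is (0,0,0) modulo 3, and (0,0,0) never appears on this cycle.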
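The modulo 3 argument is easy to check mechanically. The following is a minimal sketch, assuming a hypothetical class `ChameleonInvariant`; it walks the residue cycle and tests whether the residue of any homogeneous target shows up. A result of false proves the target is unreachable, while true only means this particular invariant does not rule it out.

```java
// Hypothetical helper; the class and method names are illustrative only.
class ChameleonInvariant {
    static boolean residueAllowsHomogeneous(int r, int g, int b) {
        int t = (r + g + b) % 3;              // residue of the one nonzero count in a target
        int[] c = { r % 3, g % 3, b % 3 };
        for (int i = 0; i < 3; i++) {         // the residue cycle has length at most 3
            boolean hit = (c[0] == t && c[1] == 0 && c[2] == 0)
                       || (c[0] == 0 && c[1] == t && c[2] == 0)
                       || (c[0] == 0 && c[1] == 0 && c[2] == t);
            if (hit) return true;
            // Every edge type adds 2 to each coordinate modulo 3.
            for (int j = 0; j < 3; j++) c[j] = (c[j] + 2) % 3;
        }
        return false;
    }
}
```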

Equivalence Relations

There is an important idea hiding here: identify objects that are distinct but share some property.

Modeled by a binary relation ~ on some carrier set A.

- reflexive: x ~ x
- symmetric: x ~ y implies y ~ x
- transitive: x ~ y and y ~ z implies x ~ z

Examples

congruence modulo m

polygons of same area

people of same age

reachable from each other in an undirected graph

programs with same input/output behavior

Classes and Quotients

Equivalence class of x: [x] = { y | y ~ x }

Quotient: A/~ = { [x] | x in A }

Index of ~: cardinality of A/~

Note that equivalence classes form a partition of A. In fact, partitions and equivalence relations are essentially the same.

Kernel Relations

Given any function f : A → B we can form a relation K(f) by defining

x K(f) y iff f(x) = f(y).

Note that K(f) is always an equivalence relation.

If R = K(f) we say that f is a (kernel) representation for R.

(As opposed to a list of pairs, an adjacency matrix, … ).

Everybody is a Kernel

Claim: All equivalence relations are of this form. In fact we can choose a function f : A → A.

This is intuitively clear: we map all x in an equivalence class to some special member of that class.

(Take a course in set theory if you want to know why there are problems with this.)

Computational Aspects

The last observation allows us to represent an equivalence relation on [n] = {1,2,...,n} compactly:

Instead of n^2 bits for a Boolean matrix representation we only need n integers for an array representing f.

We can still check if two elements are equivalent in O(1) time.

What is a good choice for the function f?

The Canonical Representation

f(x) = min{ z | z ~ x }

For example, if x ~ y iff x = y mod 3 on [10] we get

x     1  2  3  4  5  6  7  8  9  10
f(x)  1  2  3  1  2  3  1  2  3  1
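As a rough sketch of how the canonical array could be built when the relation is only available as a yes/no test, the code below scans for the smallest equivalent element. The class name `CanonicalKernel` and the `BiPredicate` parameter are assumptions of this sketch; index 0 of the array is left unused so that indices match [n].

```java
import java.util.function.BiPredicate;

// Hypothetical helper; names are illustrative and not from the slides.
class CanonicalKernel {
    // Returns f with f[x] = min { z | z ~ x } for x in 1..n (index 0 unused).
    // Quadratic in n: for each x it scans z = 1, 2, ... until an equivalent z is found.
    static int[] canonical(int n, BiPredicate<Integer, Integer> equiv) {
        int[] f = new int[n + 1];
        for (int x = 1; x <= n; x++) {
            for (int z = 1; z <= x; z++) {
                if (equiv.test(z, x)) {   // reflexivity guarantees a hit at z = x at the latest
                    f[x] = z;
                    break;
                }
            }
        }
        return f;
    }

    // Equivalence test in O(1): compare representatives.
    static boolean equivalent(int[] f, int x, int y) {
        return f[x] == f[y];
    }
}
```

For congruence modulo 3 one would call `CanonicalKernel.canonical(10, (a, b) -> a % 3 == b % 3)`.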

Index

x     1  2  3  4  5  6  7  8  9  10
f(x)  1  1  3  1  5  1  1  5  3  1

Question: How does one compute the index of R from a kernel representation for R?
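One possible answer, sketched in Java with hypothetical names: for an arbitrary kernel representation count the distinct values of f, and for the canonical representation count the fixed points f(x) = x, since each class has exactly one minimal element.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; f is stored in f[1..n], index 0 is unused.
class KernelIndex {
    // Works for any kernel representation: the index is the number of distinct values.
    static int index(int[] f) {
        Set<Integer> values = new HashSet<>();
        for (int x = 1; x < f.length; x++) values.add(f[x]);
        return values.size();
    }

    // Works for the canonical representation only: count elements with f(x) = x.
    static int indexCanonical(int[] f) {
        int count = 0;
        for (int x = 1; x < f.length; x++) {
            if (f[x] == x) count++;
        }
        return count;
    }
}
```

For the table above both methods count the three representatives 1, 3 and 5.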

Refinement

Suppose we have two equivalence relations R and S on [n], both given by their canonical kernel function. How do we compute their intersection?

x int(R,S) y iff x R y and x S y

In other words, we want to compute the canonical representation for T = int(R,S).

Example

   1  2  3  4  5  6  7  8
R  1  1  3  1  5  3  3  1
S  1  1  3  3  5  5  5  1
T  1  1  3  4  5  6  6  1

Code

initialize H hashmap;

for x = 1,...,n do
    if H( (R[x],S[x]) ) is undefined then
        T[x] = H( (R[x],S[x]) ) = x;
    else
        T[x] = H( (R[x],S[x]) );

Expected linear time.

Could also replace H by an n × n array (interesting if the initialization cost can be amortized).
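The pseudocode translates almost directly into Java. This is a sketch under assumptions: the class name `Refinement` is made up, arrays are 1-based with index 0 unused, and the pair (R[x],S[x]) is packed into a single `long` key instead of a pair object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative and not from the slides.
class Refinement {
    // R and S are canonical kernel arrays on 1..n; returns the canonical array of int(R,S).
    static int[] intersect(int[] R, int[] S) {
        int n = R.length - 1;
        int[] T = new int[n + 1];
        Map<Long, Integer> H = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            long key = (long) R[x] * (n + 1) + S[x];   // encodes the pair (R[x], S[x])
            Integer rep = H.putIfAbsent(key, x);       // null if the pair was not seen before
            T[x] = (rep == null) ? x : rep;            // first x with this pair is the minimum
        }
        return T;
    }
}
```

Because x runs upward, the first element that shows a given pair is the smallest member of its class, so the result is again canonical.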

Small Machines

Recall: Finite State Machines

Recall that a finite state machine is essentially a lookup table with one entry for each symbol/state combination, plus an initial state and some final states.

An Experiment

Think of the finite state machine as a black box. Suppose you can perform the following experiment as often as you wish:

- reset the machine to some state p,
- feed some string to the machine, and
- observe whether the resulting state is final.

Of course, you are not allowed to open up the machine.

Which states could be distinguished from each other by this experiment?

A Black Box

Call p and q (behaviorally) equivalent if they cannot be distinguished.

Claim:

1. We can distinguish final from non-final states.

2. If we can distinguish p and q and d(p',a) = p and d(q',a) = q then we can also distinguish p' and q'.

Who Cares?

If two states are equivalent, we may as well collapse them into a single state.

More precisely, we can replace the state set Q by Q/~.

The latter may be much smaller, so we can build potentially smaller machines.

Fact: One can show that the smallest possible finite state machine (for a given language) can be obtained this way.

Example

[Figure: state diagram of a finite state machine over the alphabet {a,b}; only the edge labels survive in the transcript.]

Computing Behavioral Equiv.

How do we actually compute the behavioral equivalence relation ~?

Refine partitions.

Initially only distinguish between F and Q – F.

Then refine the partition as follows: Suppose we have an equivalence relation E. Define E' by

p E' q iff p E q and for all symbols s: d(p,s) E d(q,s).

Computing Behavioral Equiv.

But that's just an intersection operation:

Define p E_s q iff d(p,s) E d(q,s).

Then E' = int( E, E_a, E_b, ... ).

When E' = E for the first time we have E = ~.

Can be computed in O(k n^2) steps where n is the number of states and k the number of input symbols.
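The refinement loop might look as follows in Java. This is a sketch, not the course's code: the class name, the 0-based state numbering and the transition-table layout `delta[p][s]` are assumptions, and the intersection int(E, E_a, E_b, ...) is done in one pass by hashing the tuple (E[p], E[d(p,a)], E[d(p,b)], ...).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; states are 0..n-1, symbols are 0..k-1.
class BehavioralEquivalence {
    // delta[p][s] is the successor of state p on symbol s; fin[p] marks final states.
    // Returns the canonical kernel array of the behavioral equivalence.
    static int[] compute(int[][] delta, boolean[] fin) {
        int n = fin.length, k = delta[0].length;
        int[] E = new int[n];
        // Initial partition: F versus Q - F, each class named by its least state.
        int firstFinal = -1, firstNonFinal = -1;
        for (int p = 0; p < n; p++) {
            if (fin[p]) { if (firstFinal < 0) firstFinal = p; E[p] = firstFinal; }
            else        { if (firstNonFinal < 0) firstNonFinal = p; E[p] = firstNonFinal; }
        }
        while (true) {
            int[] next = new int[n];
            Map<String, Integer> H = new HashMap<>();
            for (int p = 0; p < n; p++) {
                // Signature of p: its own class plus the classes of all successors.
                StringBuilder key = new StringBuilder().append(E[p]);
                for (int s = 0; s < k; s++) key.append(',').append(E[delta[p][s]]);
                Integer rep = H.putIfAbsent(key.toString(), p);
                next[p] = (rep == null) ? p : rep;
            }
            if (Arrays.equals(next, E)) return E;   // E' = E: fixed point reached
            E = next;
        }
    }
}
```

Each round costs about k n steps with hashing, and there can be at most n rounds because every round that changes E splits at least one class.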

Example

[Figure: state diagram of a machine with states 1 to 6 over the alphabet {a,b}; only the labels survive in the transcript.]

      1  2  3  4  5  6
init  1  1  3  3  1  1
a     1  1  1  1  1  1
b     3  3  1  1  1  1
      1  1  3  3  5  5
a     1  1  5  5  5  5
b     3  3  5  5  5  5
      1  1  3  3  5  5

Dynamic Equivalence Relations

Recall: Mazes

Think about a grid of rooms separated by walls. Each room can be given a name.

a b c d
h g f e
i j k l
p o n m

Randomly knock out walls until we get a good maze.

The Party Problem

You arrive at a party. As usual, there are separate groups of people standing around. In each group people talk to each other, but they don't talk to anyone outside of the group.

You scan the groups, find someone that you know and join the corresponding group. If someone in another group knows you too, the two groups merge.

How do we figure out the groups given a list of “is-friend-of” relations? The list is revealed step by step; we don't have access to the whole list from the start.

Dynamic E-Relations

So far we have only dealt with static equivalence relations: the whole relation is given from the start and we can represent it by the canonical kernel function.

Often that is not the case: all we have is knowledge about some equivalent pairs (x,y) of elements. The corresponding equivalence relation is thus given implicitly.

This is really a closure problem: we have some (arbitrary) relation R and we want to compute the least equivalence relation eqc(R) that contains R.

Say What?

R is arbitrary.

We want S such that

- x R y implies x S y
- S is reflexive, symmetric and transitive
- S is the finest such relation.

Thus x S y only if this is forced by R and the equivalence condition. We do not frivolously identify elements.

Transitivity

Making S reflexive and symmetric is no problem: we can just make R reflexive and symmetric. The difficult part is transitivity:

Whenever there is a chain

x_1 R x_2 R x_3 ... x_{n-1} R x_n

we need to set x_1 S x_n.

Static is Easy

If R is static this is an old problem:

Think of R as a graph and use DFS/BFS or Warshall.
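For the static case, a depth-first search over the undirected graph of R yields the canonical kernel array of eqc(R) directly. The sketch below uses hypothetical names and assumes elements 1..n with pairs given as two-element arrays.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper; names are illustrative and not from the slides.
class StaticClosure {
    // Returns f[1..n] with f[x] = least element of the connected component of x.
    static int[] closure(int n, int[][] pairs) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i <= n; i++) adj.add(new ArrayList<>());
        for (int[] p : pairs) {            // symmetric closure: add both directions
            adj.get(p[0]).add(p[1]);
            adj.get(p[1]).add(p[0]);
        }
        int[] f = new int[n + 1];          // 0 means "not visited yet"
        for (int x = 1; x <= n; x++) {
            if (f[x] != 0) continue;
            f[x] = x;                      // x is the least element of a new component
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(x);
            while (!stack.isEmpty()) {     // iterative DFS
                int u = stack.pop();
                for (int v : adj.get(u)) {
                    if (f[v] == 0) { f[v] = x; stack.push(v); }
                }
            }
        }
        return f;
    }
}
```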

But what to do when the pairs in R pop up one after the other?

Kernel Schmernel

Suppose we have the canonical kernel representation f for S.

If we get another pair (x,y), how can we update S?

If already f(x) = f(y) we're OK.

But otherwise we have to scan the whole array to update the entries affected by setting x equivalent to y. Takes time linear in n.
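To make the cost concrete, here is a sketch of that update on a canonical array (hypothetical class and method names; index 0 unused). The loop over the whole array is what makes every merge linear in n.

```java
// Hypothetical helper illustrating the linear-time update of a canonical kernel array.
class CanonicalUpdate {
    // Makes x and y equivalent in f[1..n] and keeps f canonical.
    static void merge(int[] f, int x, int y) {
        int a = f[x], b = f[y];
        if (a == b) return;                        // already equivalent
        int keep = Math.min(a, b);                 // new class minimum
        int drop = Math.max(a, b);
        for (int z = 1; z < f.length; z++) {       // full scan: linear in n
            if (f[z] == drop) f[z] = keep;
        }
    }
}
```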

Problem: Our representation is too uptight.

Fixed Points

We need to relax the conditions on f a little.

But how?

Let FP(f,x) be the element z such that

f(z) = z and f^k(x) = z for some k.

Needless to say, fixed points do not exist in general, but we will make sure that f is constructed properly so that there is no problem.

FP versus EQ

Let's say that f represents relation R if

x R y iff FP(f,x) = FP(f,y).

Clearly R has to be an equivalence relation.

Note that the canonical kernel function would work here. But the whole point is that many other functions also work. And that makes it much easier to update.

Also note: a query “x R y?” is no longer O(1) but a priori only O(n).

Testing Equivalence

To test whether x is equivalent to y we do

x' = FP(f,x);
y' = FP(f,y);
return ( x' == y' );

Running time is clearly O(n).

But if we use a “good” f it can be close to O(1).

Updating Equivalence

Suppose we are told that x is equivalent to y. To update, do the following:

x' = FP(f,x);
y' = FP(f,y);
if( x' != y' ) then
    f[x'] = y' or f[y'] = x';

Picking the right alternative will be important for running time.
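The fixed-point test and update can be sketched as a small Java class. Everything here is an assumption made for illustration: the class name `NaiveForest`, the 0-based elements, and the arbitrary choice of always setting f[x'] = y'. Roots are stored as literal fixed points f[z] = z, as in the definition of FP.

```java
// Hypothetical naive version: no union by size, no path compression.
class NaiveForest {
    final int[] f;

    NaiveForest(int n) {
        f = new int[n];
        for (int i = 0; i < n; i++) f[i] = i;   // every element is its own fixed point
    }

    // FP(f,x): follow f until a fixed point is reached.
    int fp(int x) {
        while (f[x] != x) x = f[x];
        return x;
    }

    boolean equivalent(int x, int y) {
        return fp(x) == fp(y);
    }

    // Always picks the alternative f[x'] = y'; this can produce long chains.
    void update(int x, int y) {
        int a = fp(x), b = fp(y);
        if (a != b) f[a] = b;
    }
}
```

With this fixed choice a sequence such as update(0,1), update(0,2), update(0,3), ... builds a single path, so `fp` can take time linear in n.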

Union-Find

In the world of programming the key operations are called

- find(x): return the fixed point
- union(x,y): union the classes of x and y

So far, this is clever but not too exciting: both operations may be linear in n.

We need to be more careful about how to perform the union operation. Note that our definition of representation gives us a lot of leeway.

Example

             {1} {2} {3} {4} {5} {6} {7}
union(2,3)   {1} {2,3} {4} {5} {6} {7}
union(3,4)   {1} {2,3,4} {5} {6} {7}
union(5,6)   {1} {2,3,4} {5,6} {7}
union(6,3)   {1} {2,3,4,5,6} {7}
union(2,6)   {1} {2,3,4,5,6} {7}

Think Tree

It is helpful to think of the representing function f as a rooted tree.

[Figure: two drawings of rooted trees on the nodes 0 to 5; the edges are not recoverable from the transcript.]

Keeping the Trees Shallow

If we think of f as a collection of rooted trees it is natural to try to keep the depth of these trees small.

Several plausible strategies:

Union by depth: attach the shallower tree to the deeper one.

Union by size: attach the smaller tree to the larger one.

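A Trick: Path Compression

Since we have to traverse a path from a node to the root we might as well smash all the nodes on that path up to the root, i.e. make each of them a child of the root.

E.g., find(0) would produce:

[Figure: the tree on the nodes 0 to 5 before and after find(0); the edges are not recoverable from the transcript.]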

How Hard to Implement?

One might wonder how hard it is to code all these tricks (without union by size/depth and path compression the code is nearly trivial).

Also, what is the actual payoff in the end?

As it turns out, the code is really simple, and the payoff is tremendous.

The Code

All the code

class UnionFind {
    // u[i] >= 0: parent of i; u[i] < 0: i is a root and -u[i] is the size of its tree
    int[] u;

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;    // n singleton classes
    }

    int find(int i) {
        int j, root;
        for (j = i; u[j] >= 0; j = u[j]) ;        // walk up to the root
        root = j;
        while (u[i] >= 0) {                       // path compression
            j = u[i]; u[i] = root; i = j;
        }
        return root;
    }

    void union(int i, int j) {
        i = find(i); j = find(j);
        if (i != j) {
            if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }   // tree i is larger
            else             { u[j] += u[i]; u[i] = j; }
        }
    }
}

The UnionFind class

class UnionFind {
    int[] u;

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }

    int find(int i) { ... }

    void union(int i, int j) { ... }
}

Iterative find

int find(int i) {
    int j, root;

    for (j = i; u[j] >= 0; j = u[j]) ;
    root = j;

    while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }

    return root;
}

union by size

void union(int i, int j) {
    i = find(i); j = find(j);

    if (i != j) {
        if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }
        else             { u[j] += u[i]; u[i] = j; }
    }
}
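For comparison, here is a compact variant that could be used to replay a sequence of unions. It is a sketch with assumed names: `UnionFindSketch`, the recursive `find` and the `size` helper are not from the slides, although the array encoding (negative root entries hold the negated class size) and the union rule are the same.

```java
import java.util.Arrays;

// Illustrative variant; class name, recursive find and size() are assumptions of this sketch.
class UnionFindSketch {
    private final int[] u;   // u[i] >= 0: parent of i; u[i] < 0: root of a tree with -u[i] nodes

    UnionFindSketch(int n) {
        u = new int[n];
        Arrays.fill(u, -1);
    }

    // Recursive find with path compression: every node on the path ends up below the root.
    int find(int i) {
        if (u[i] < 0) return i;
        return u[i] = find(u[i]);
    }

    // Union by size: the root of the smaller tree becomes a child of the other root.
    void union(int i, int j) {
        i = find(i);
        j = find(j);
        if (i == j) return;
        if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }
        else             { u[j] += u[i]; u[i] = j; }
    }

    // Size of the class containing i, read off the root entry.
    int size(int i) {
        return -u[find(i)];
    }
}
```

Replaying union(2,3), union(3,4), union(5,6), union(6,3), union(2,6) on eight elements leaves 2, 3, 4, 5, 6 in one class of size 5 and every other element alone. The recursion is shallow here because union by size keeps tree depth logarithmic.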

Time bounds

Variables: M operations, N elements.

Algorithms:

Simple forest representation
- Worst: find O(N), mixed operations O(MN).
- Average: tricky

Union by height; Union by size
- Worst: find O(log N), mixed operations O(M log N).
- Average: mixed operations O(M) [see text]

Path compression in find
- Worst: mixed operations “nearly linear” [analysis in 15-451]