Connectivity A Semi-External Algorithm Analysis: Scan vertex set to load vertices into main memory...


Connectivity: A Semi-External Algorithm

Analysis:

• Scan vertex set to load vertices into main memory

• Scan edge set to carry out algorithm

• O(scan(|V| + |E|)) I/Os

Theorem: If |V| ≤ M, the connected components of a graph can be computed in O(scan(|V| + |E|)) I/Os.
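The two scans in the analysis can be sketched as follows. This is a minimal in-memory illustration, assuming a union-find structure as the per-vertex state (the slide does not name the structure); `semi_external_components` and `edge_stream` are hypothetical names, and the edge stream stands in for a sequential scan of the edge file on disk.

```python
def semi_external_components(vertices, edge_stream):
    """Label components with one pass over the vertices and one over the edges."""
    # Scan 1: load the vertex set into memory (this is where |V| <= M is needed).
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Scan 2: stream the edges once; edges are never stored in memory.
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Assumed convention: the smaller id becomes the representative.
            parent[max(ru, rv)] = min(ru, rv)

    return {v: find(v) for v in parent}


labels = semi_external_components(range(6), iter([(0, 1), (1, 2), (3, 4)]))
```

Each vertex ends up labelled with the smallest vertex id in its component; only the vertex table is resident in memory, which matches the |V| ≤ M assumption of the theorem.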


Connectivity: The General Case

Idea [Chiang et al 1995]:

• If |V| ≤ M
– Use semi-external algorithm

• If |V| > M
– Identify simple connected subgraphs of G
– Contract these subgraphs to obtain graph G’ = (V’, E’) with |V’| ≤ c|V|, c < 1
– Recursively compute connected components of G’
– Obtain labelling of connected components of G from labelling of components of G’


Connectivity: The General Case

[Figure: example graph with vertices a to n, its contraction to vertices A to E, and component labels 1 and 2.]


Connectivity: The General Case

Main steps:

• Find smallest neighbors

• Compute connected components of graph H induced by selected edges

• Contract each component into a single vertex

• Call the procedure recursively

• Copy label of every vertex v ∈ G’ to all vertices in G represented by v
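The five main steps can be traced with a small in-memory sketch. It only illustrates the control flow, not the I/O behaviour: `label_small` is a hypothetical union-find helper that stands in both for the semi-external base case and for the forest routine used on H, and `memory_limit` is an assumed stand-in for M.

```python
def label_small(vertices, edges):
    """Union-find labelling; each vertex gets the smallest id in its component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return {v: find(v) for v in parent}


def connected_components(vertices, edges, memory_limit=4):
    """Smallest-neighbour contraction; returns {vertex: component label}."""
    vertices = list(vertices)
    edges = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    if len(vertices) <= memory_limit or not edges:
        return label_small(vertices, edges)       # stand-in for the base case

    # Step 1: smallest neighbour w(v) of every non-isolated vertex.
    w = {}
    for u, v in edges:
        w[u] = min(w.get(u, v), v)
        w[v] = min(w.get(v, u), u)

    # Step 2: components of H, the graph formed by the edges {v, w(v)}.
    comp = label_small(vertices, list(w.items()))

    # Step 3: contract each component of H into a single vertex of G'.
    new_vertices = sorted(set(comp.values()))
    new_edges = sorted({(min(comp[u], comp[v]), max(comp[u], comp[v]))
                        for u, v in edges if comp[u] != comp[v]})

    # Step 4: recurse on G'.
    sub = connected_components(new_vertices, new_edges, memory_limit)

    # Step 5: copy the label of each contracted vertex back to the vertices it represents.
    return {v: sub[comp[v]] for v in vertices}


result = connected_components(range(5), [(0, 2), (1, 3), (2, 3)], memory_limit=1)
```

In this example the first round selects the edges {0,2} and {1,3}, so H has two non-trivial components; the remaining edge {2,3} survives contraction and is resolved by the recursive call.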


Finding smallest neighbors

To find smallest neighbor w(v) of every vertex v:

Scan edges and replace each undirected edge {u,v} with directed edges (u,v) and (v,u)

Sort directed edges lexicographically

This produces adjacency lists

Scan adjacency list of v and return as w(v) first vertex in list

This takes overall O(sort(|E|)) I/Os

To produce edge set of (undirected) graph H, sort and scan edges {v, w(v)} to remove duplicates

This takes another O(sort(|V|)) I/Os
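The sort-and-scan procedure above can be sketched in memory as follows. This is an illustration under the assumption that the graph has no self-loops; `smallest_neighbor_edges` is a hypothetical name, and Python's in-memory sort stands in for an external-memory sort.

```python
def smallest_neighbor_edges(edges):
    """Return (w, h_edges): smallest neighbour of each vertex and the edge set of H."""
    # Replace each undirected edge {u, v} with directed edges (u, v) and (v, u).
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    # Lexicographic sort groups the edges into adjacency lists, neighbours ascending.
    directed.sort()

    # Scan: the first entry of each adjacency list is the smallest neighbour.
    w = {}
    previous = object()  # sentinel that equals no vertex
    for u, v in directed:
        if u != previous:
            w[u] = v
            previous = u

    # Edge set of H: normalise {v, w(v)}, then sort and drop duplicates.
    h_edges = sorted({(min(v, wv), max(v, wv)) for v, wv in w.items()})
    return w, h_edges


w, h_edges = smallest_neighbor_edges([(0, 2), (1, 3), (2, 3)])
```

Duplicates arise exactly when two vertices select each other, as vertices 0 and 2 do here.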


Computing Conn Comps of H

Cannot use same algorithm recursively (didn’t reduce vertex set)

Exploit following property:

Lemma: Graph H is a forest.

Proof: Assume not. Then H must contain a cycle x_0, x_1, …, x_k = x_0. Since there are no duplicate edges, k ≥ 3. Since each vertex v has at most one incident edge {v, w(v)} in H, w.l.o.g. x_{i+1} = w(x_i) for 0 ≤ i < k. Then the existence of edge {x_{i-1}, x_i} implies that x_{i-1} > x_{i+1}. Similarly, x_{k-1} > x_1.

If k even: x_0 > x_2 > … > x_k = x_0 yields a contradiction.

If k odd: x_0 > x_2 > … > x_{k-1} > x_1 > x_3 > … > x_k = x_0 yields a contradiction.


Exploit Property that H is a Forest

Apply Euler tour to H in order to transform each tree into a list

Now compute connected components using ideas from list ranking:

Find large independent set I of H and remove vertices in I from H

Recursively find connected components of smaller graphs

Reintegrate vertices in I (assign component label of neighbor)

This takes O(sort(|H|)) = O(sort(|V|)) I/Os


Recursive Calls

Every connected component of H has size at least 2 ⇒ |V’| ≤ |V|/2 ⇒ O(log(|V|/M)) recursive calls

Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.


Improved Connectivity via BFS

• BFS in O(|V| + sort(|E|)) I/Os [Munagala & Ranade 99]
• BFS can be used to identify connected components
• When |V| = |E|/B, algorithm takes O(sort(|E|)) I/Os
• Same algorithm, but stop recursion earlier, when number of vertices is reduced to |E|/B (after log(|V|B/|E|) recursive calls)
• At this point, apply BFS rather than semi-external connectivity

Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.


Minimum Spanning Tree (MST)

Can push same ideas to work on MSTs:

Theorem: An MST of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.

Theorem: An MST of a graph G = (V,E) can be found in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.


Three Techniques for Graph Algs

• Time-forward processing:
– Express graph problems as evaluation problems of DAGs

• Graph contraction:
– Reduce the size of G while maintaining the properties of interest
– Solve problem recursively on compressed graph
– Construct solution for G from solution for compressed graph

• Bootstrapping:
– Switch to generally less efficient algorithm as soon as (part of the) input is small enough


Cache Oblivious Algorithms


Typical Cache Configuration

Cache Oblivious Model

Introduced by Frigo, Leiserson, Prokop & Ramachandran [FLPR99, Pro99]. Its principal idea is simple: design external-memory algorithms without knowing B and M (internal details of the hierarchical memory). But this simple idea has several surprisingly powerful consequences.

Consequences of Cache Oblivious

If a cache-oblivious algorithm performs well between two levels of the memory hierarchy, then it must automatically work well between any two adjacent levels of the memory hierarchy.

Self-tuning: a cache-oblivious algorithm should work well on all machines without modification (still subject to some tuning, e.g., where to trim base case of recursion).

In contrast to the external-memory model, algorithms in the cache-oblivious model cannot explicitly manage the cache.

Assumptions of Cache Oblivious

How can we design algorithms that minimize the number of block transfers if we do not know the page-replacement strategy?

An adversarial page replacement strategy could always evict the next block that will be accessed…

Cache oblivious model assumes an ideal cache: page replacement is optimal, and cache is fully associative.

Assumptions of Cache Oblivious

Optimal Page Replacement: Page replacement strategy knows the future and always evicts the page that will be accessed farthest in the future.

Real-world caches do not know the future, and employ more realistic page replacement strategies such as evicting the least-recently-used block (LRU) or evicting the oldest block (FIFO).

Assumptions of Cache Oblivious

Full Associativity: Any block can be stored anywhere in cache.

In contrast, most caches have limited associativity: each block belongs to a cluster, and at most some small constant c of blocks from a common cluster can be stored in cache at once.

Typical real-world caches are either direct mapped (c = 1) or 2-way associative (c = 2). Some caches have more associativity (4-way or 8-way), but the constant c is certainly limited.

Justification of Ideal Cache

Frigo et al. [FLPR99, Pro99] justify the ideal-cache model by a collection of reductions that modify an ideal-cache algorithm to operate on a more realistic cache model.

Running time of the algorithm degrades somewhat, but in most cases by only a constant factor.

Will outline major steps, without going into the details of the proofs.

Justification of Ideal Cache

Replacement Strategy: The first reduction removes the optimal (omniscient) replacement strategy that uses information about future requests.

Lemma [FLPR99]. If an algorithm makes T memory transfers on a cache of size M/2 with optimal replacement, then it makes at most 2T memory transfers on a cache of size M with LRU or FIFO replacement (and same block size B).

I.e., LRU and FIFO do just as well as optimal replacement up to constant factors of memory transfers and wastage of the cache. This competitiveness property of LRU and FIFO goes back to a 1985 paper of Sleator and Tarjan.

Another Assumption: Tall Cache

Commonly assumed that the cache is taller than wide, i.e., the number of blocks, M/B, is larger than the size of each block, B: M = Ω(B^2)

Particularly important in more sophisticated cache-oblivious algorithms: ensures that cache provides polynomially large “buffer” for guessing block size slightly wrong.

Also commonly assumed in external-memory algorithms.

Ideal Cache Oblivious Model

Algorithm designer does not need to know parameters M and B explicitly.

Sometimes, tall cache assumption: M = Ω(B^2), usually true in practice.

Focus on two levels:
Level 1 has size M
Level 2 transfers blocks of size B.

(Easy) Cache Oblivious Algs

Scanning N elements stored in a contiguous segment of memory costs at most ⌈N/B⌉ + 1 memory transfers.

Reversing an array costs the same as scanning.
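The "+1" in the scanning bound comes from misalignment: a segment that does not start on a block boundary can straddle one extra block. The sketch below counts touched blocks for a given start offset and shows array reversal as two simultaneous scans from both ends. The function names are illustrative, and the block size is passed in only to check the bound; the algorithms themselves never use it.

```python
import math


def blocks_touched(start, n, block_size):
    """Number of blocks overlapped by the n cells start, ..., start + n - 1."""
    if n == 0:
        return 0
    return (start + n - 1) // block_size - start // block_size + 1


def scan_bound(n, block_size):
    """The bound from the slide: ceil(N/B) + 1 memory transfers."""
    return math.ceil(n / block_size) + 1


def reverse_in_place(a):
    """Reverse a list with two scans moving towards each other."""
    i, j = 0, len(a) - 1
    while i < j:
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    return a


aligned = blocks_touched(0, 16, 8)       # segment starts on a block boundary
misaligned = blocks_touched(3, 16, 8)    # segment straddles one extra block
reversed_list = reverse_in_place([1, 2, 3, 4, 5])
```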

Matrix Transposition

for (i = 0; i < N; i++)
    for (j = i+1; j < N; j++)
        swap(A[i][j], A[j][i])

How many cache misses?

O(N^2) in the worst case.

How to improve this?

Recursion (divide & conquer) may be helpful.

Cache Oblivious Matrix Transposition

[Figure: matrix region spanning an x-range and a y-range, split at the midpoint xmid of the x-range.]

Which problem must be solved recursively?

Cache Oblivious Matrix Transposition

O(N^2/B) cache misses
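A minimal sketch of the divide-and-conquer idea, assuming an out-of-place variant that always halves the longer side of the current region (the slide's exact scheme is in a figure that did not survive). `transpose` and `cutoff` are illustrative names; Python lists of lists are not contiguous in memory, so this shows the access pattern rather than real cache behaviour.

```python
def transpose(a, b, r0, r1, c0, c1, cutoff=4):
    """Write the transpose of a[r0:r1][c0:c1] into b, so that b[j][i] == a[i][j]."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= cutoff and cols <= cutoff:
        # Base case: the region is small, use the simple double loop.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer side (rows) at its midpoint.
        mid = r0 + rows // 2
        transpose(a, b, r0, mid, c0, c1, cutoff)
        transpose(a, b, mid, r1, c0, c1, cutoff)
    else:
        # Split the longer side (columns) at its midpoint.
        mid = c0 + cols // 2
        transpose(a, b, r0, r1, c0, mid, cutoff)
        transpose(a, b, r0, r1, mid, c1, cutoff)


n = 7
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[0] * n for _ in range(n)]
transpose(a, b, 0, n, 0, n, cutoff=2)
```

The recursion never refers to B or M; once a region is small enough to fit in cache, all further work on it stays in cache, which is where the O(N^2/B) bound comes from.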


Rough Experiments

Athlon 1Ghz, 512M RAM, Linux

Stop Recursion Earlier

Stop recursion when problem size becomes less than a certain block size and use simple for loop implementation inside block.

Using different block sizes seems to have little effect on running time.

Why Divide & Conquer Works?

Divide & conquer repeatedly refines problem size. Eventually, problem will fit in cache (size ≤ M), and later will fit in single block (size ≤ B).

For divide & conquer recursion dominated by leaf costs, algorithm will usually use within a constant factor of the optimal number of memory transfers.

If divide and merge can be done using few memory transfers, then divide & conquer approach is efficient even when cost is not dominated by leaves.

Divide & Conquer OK: Selection

Median and Selection: find k-th item in unsorted sequence

Classical (internal memory) algorithm [Blum et al]:

Recurrence on running time T(N) is: T(N) = T(N/5) + T(7N/10) + O(N) = O(N)

Cache Oblivious Implementation

Step 1: conceptual; do nothing.
Step 2: in two parallel scans: one reads array 5 items at a time, other writes new array of computed medians. Assuming M ≥ 2B, that’s O(1 + N/B) memory transfers.
Step 3: recursive call of size N/5.
Step 4: in three parallel scans: one reads array, two others write partitioned arrays. Again, parallel scans use O(1 + N/B) memory transfers (M ≥ 3B).
Step 5: recursive call of size at most 7N/10.

Recurrence on memory transfers T(N) is: T(N) = T(N/5) + T(7N/10) + O(1 + N/B)
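The five steps can be sketched with the standard median-of-medians scheme of Blum et al. The comments map each line to the step numbers used above. Two details are assumptions of this sketch: elements equal to the pivot are counted rather than written out, and the partition is written as two comprehensions for brevity where the slide describes a single three-stream scan.

```python
def select(items, k):
    """Return the k-th smallest element (k = 0 is the minimum) of a non-empty list."""
    if len(items) <= 5:
        return sorted(items)[k]

    # Steps 1-2: conceptually group by 5; one scan reads groups, one writes medians.
    medians = []
    for i in range(0, len(items), 5):
        group = sorted(items[i:i + 5])
        medians.append(group[len(group) // 2])

    # Step 3: recursive call on about N/5 medians to choose the pivot.
    pivot = select(medians, len(medians) // 2)

    # Step 4: partition around the pivot (reads the array, writes two arrays).
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    equal = len(items) - len(smaller) - len(larger)

    # Step 5: recurse into the side that contains rank k (size at most about 7N/10).
    if k < len(smaller):
        return select(smaller, k)
    if k < len(smaller) + equal:
        return pivot
    return select(larger, k - len(smaller) - equal)


data = [(7 * i) % 101 for i in range(101)]  # a permutation of 0..100
median = select(data, 50)
```

Every pass over the data is a sequential scan, which is what makes the O(1 + N/B) term in the recurrence achievable without knowing B.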

Failed Attempt in the Analysis

Recurrence on memory transfers T(N) is: T(N) = T(N/5) + T(7N/10) + O(1 + N/B)

Wish to prove O(1 + N/B) memory transfers

If T(O(1)) = O(1), each leaf incurs a constant number of memory transfers.

How many leaves does the recurrence tree have?

L(N) total number of leaves: L(N) = L(N/5) + L(7N/10)

If L(N) = N^c, then (1/5)^c + (7/10)^c = 1, i.e., c ≈ 0.8397803

But T(N) is Ω(N^c), which is still larger than O(1 + N/B) when B ≤ N ≤ B·N^c, i.e., B ≤ N ≤ B^{1/(1−c)} ≈ B^{6.24}
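The exponent c has no closed form, but it is easy to check numerically. The sketch below solves (1/5)^c + (7/10)^c = 1 by bisection on [0, 1], where the left-hand side is strictly decreasing in c; `leaf_exponent` is an illustrative name.

```python
def leaf_exponent(tol=1e-12):
    """Solve (1/5)**c + (7/10)**c == 1 for c by bisection."""
    def f(c):
        return 0.2 ** c + 0.7 ** c - 1.0  # strictly decreasing in c

    lo, hi = 0.0, 1.0  # f(0) = 1 > 0 and f(1) = -0.1 < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = leaf_exponent()
threshold_exponent = 1 / (1 - c)  # N^c exceeds N/B exactly when N < B**threshold_exponent
```

The two values reproduce the constants quoted on the slide, c ≈ 0.84 and 1/(1−c) ≈ 6.24.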

Refined Analysis

Recurrence on memory transfers T(N) is: T(N) = T(N/5) + T(7N/10) + O(1 + N/B)

Luckily, can use base case stronger than T(O(1)) = O(1): T(O(B)) = O(1) (once problem fits into O(1) blocks, all 5 steps incur only constant number of memory transfers)

Stop recursion at O(B): then there are only (N/B)^c leaves in recursion tree, which cost only O((N/B)^c) = o(N/B) memory transfers. Thus cost per level decreases geometrically from root, so total cost is cost of root: O(1 + N/B).

Cache Oblivious Implementation

Theorem. The worst-case linear-time median algorithm, implemented with appropriate scans, uses O(1 + N/B) memory transfers, provided M ≥ 3B.

Key part of analysis is to identify relevant base case, so that “overhead term” does not dominate cost for small problem sizes relative to cache. Other than the new base case, analysis is same as for classic (internal memory) algorithm.

Divide & Conquer KO: Binary Search

Binary search has the following recurrence: T(N) = T(N/2) + O(1)

Cost of leaves balances with cost of root: cost of every level is the same, so extra log N factor

Hope to reduce log N factor in a blocked setting by using stronger base case T(O(B)) = O(1)

In this case, solution to recurrence becomes: T(N) = log N − |Θ(log B)|

However, stronger base case does not help much: it only reduces the number of levels in the recursion tree by an additive Θ(log B)

Will see later how to get O(log_B N) with a different layout than the sorted one

Matrix Multiplication

Wish to compute C = A · B. For sake of simplicity, square matrices whose dimensions are powers of two (this is w.l.o.g.)

Trivial algorithm: For each c_ij, scan in parallel row i of A and column j of B. Ideally, A stored in row-major and B in column-major order. Then each element of C requires O(1 + N/B) memory transfers, if M ≥ 3B. Cost could only be smaller if M large enough to store previously visited row or column. If M ≥ N, relevant row of A remembered for an entire row of C. But for column of B to be remembered, M ≥ N^2, in which case entire problem fits in cache.

Theorem. Assume A stored in row-major and B in column-major order. Then trivial matrix multiplication uses O(N^2 + N^3/B) memory transfers if 3B ≤ M < N^2 and O(1 + N^2/B) memory transfers if M ≥ 3N^2.

Matrix Multiplication

Point of theorem is that, even with ideal storage order of A and B, trivial algorithm still requires O(N^3/B) memory transfers unless entire problem fits in cache. Can do better, and achieve a bound of O(N^2/B + N^3/(B√M)) memory transfers.

In external memory, this bound was first achieved by Hong and Kung [HK81]

Cache-oblivious solution uses same idea as external-memory solution: block matrices.

Matrix Multiplication

Can write C = A · B as a divide-and-conquer recursion using block-matrix notation:

C11 = A11·B11 + A12·B21,  C12 = A11·B12 + A12·B22
C21 = A21·B11 + A22·B21,  C22 = A21·B12 + A22·B22

This way, reduce N × N multiplication problem down to eight (N/2) × (N/2) multiplication subproblems, plus four (N/2) × (N/2) addition subproblems (which can be solved by single scan in O(1 + N^2/B) memory transfers). Thus, we get following recurrence:

T(N) = 8 T(N/2) + O(1 + N^2/B)
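The eight-way recursion can be sketched as below. This variant accumulates each block product directly into C instead of forming the products separately and adding them in a final scan, which is an assumed simplification; it also recurses down to 1 × 1 blocks and uses ordinary nested lists rather than a recursive layout. `matmul` and the offset parameters are illustrative names, and N is assumed to be a power of two.

```python
def matmul(a, b, c, n, ar=0, ac=0, br=0, bc=0, cr=0, cc=0):
    """Add the product of two n x n blocks of a and b into an n x n block of c.

    (ar, ac), (br, bc) and (cr, cc) are the top-left corners of the blocks.
    """
    if n == 1:
        c[cr][cc] += a[ar][ac] * b[br][bc]
        return
    h = n // 2
    for i in (0, h):          # block row of C
        for j in (0, h):      # block column of C
            for k in (0, h):  # C_ij += A_ik * B_kj: eight half-size products in total
                matmul(a, b, c, h, ar + i, ac + k, br + k, bc + j, cr + i, cc + j)


n = 4
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
c = [[0] * n for _ in range(n)]
matmul(a, b, c, n)
```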

Matrix Layout

To make small matrix blocks fit into blocks or main memory, matrix is not stored in row-major or column-major order, but rather in recursive layout.

Each matrix A is laid out so that each of the blocks A11, A12, A21, A22 occupies a consecutive segment of memory, and these four segments are stored together in arbitrary order.

Base Case

Base case becomes trickier, as both B and M relevant.

Certainly, T(O(√B)) = O(1), because an O(√B) × O(√B) submatrix fits in a constant number of blocks. But this base case turns out to be irrelevant.

More interesting is T(c√M) = O(M/B), where constant c chosen so that three c√M × c√M submatrices fit in cache, and hence each block is read or written at most once.

Analysis

Recurrence is T(N) = 8 T(N/2) + O(1 + N^2/B)

Stronger base case T(c√M) = O(M/B).

At level i of recurrence tree:
• 8^i nodes, matrix dimension is N/2^i
• total cost 8^i · O(N^2/(2^{2i} B)) = 2^i · O(N^2/B)

Recursion stops at level L where N/2^L = c√M, i.e., L = O(log(N/√M))

Total cost is

Σ_{i=0}^{L} 2^i · O(N^2/B) = (2^{L+1} − 1) · O(N^2/B) = O(N^2/B) + O(N^3/(B√M))

(That’s divide-merge cost at root plus total leaf cost.) Divide/merge cost at root of the recursion tree is O(N^2/B). These two costs balance when N = Θ(√M), when depth of tree is O(1).

Matrix Multiplication

Trivial: O(N^3/B)

Cache oblivious: O(N^2/B + N^3/(B√M))

[Figure: Trivial vs. blocked cache oblivious]


Static Searching

Cache Oblivious Searching

Divide and conquer on tree layout (van Emde Boas O(log log U) priority queue)

Split tree at middle level, resulting in one top tree and ≈ √N bottom subtrees, each of size ≈ √N

Recursively lay out top subtree followed by bottom subtrees
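The recursive layout can be sketched for a complete binary tree whose nodes are named by their BFS index (root 1, children 2i and 2i + 1). The function returns the order in which nodes would be stored in memory. The rounding rule used here (top tree gets ⌊height/2⌋ levels) is an assumption of this sketch, and `veb_order` is an illustrative name.

```python
def veb_order(root, height):
    """BFS indices of the complete subtree (root, height levels) in van Emde Boas order."""
    if height == 1:
        return [root]
    top_h = height // 2          # levels in the top tree (assumed rounding)
    bottom_h = height - top_h    # levels in each bottom subtree

    # Lay out the top tree first.
    order = veb_order(root, top_h)

    # Then each bottom subtree, left to right. Their roots are the descendants
    # of `root` exactly top_h levels down: root * 2^top_h, ..., + 2^top_h - 1.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(r, bottom_h))
    return order


layout = veb_order(1, 4)  # 15-node tree: top tree of 3 nodes, four bottom trees of 3 nodes
```

For the 15-node tree, the top tree {1, 2, 3} is stored first and each bottom subtree (for example {4, 8, 9}) occupies three consecutive positions, so any root-to-leaf path touches only two of these contiguous groups.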


Cache Oblivious Searching

If height not power of 2, each split rounds so that bottom subtrees have heights power of 2:


CO Searching

• Recursively split tree (cut at middle level) until every recursive subtree has size at most B (or small enough to fit into cache line)

• Each recursive subtree is stored in an interval of memory of size at most B, so occupies at most two blocks.

• Each recursive subtree except topmost has same height.

• Since trees are cut at middle level in each step, this height may be as small as (log B)/2, for subtree of size Θ(√B), but no smaller.

CO Searching

• Search visits nodes along root-to-leaf path of length log N, visiting sequence of recursive subtrees along the way.

• All but first recursive subtree have height at least (log B)/2, so number of visited recursive subtrees is ≤ 1 + 2(log N)/(log B) = 1 + 2 log_B N.

• Each recursive subtree may incur up to two memory transfers, for a total of ≤ 2 + 4 log_B N memory transfers.

• Faster than trivial search by log_2 N / (4 log_B N) = (log_2 B)/4

• (log_2 B)/2 more realistic (each recursive subtree in a block)

• For disk blocks of 1024 elements, expect speedup ≈ 5 (or ≈ 2.5)

O(log_B N) cache misses


Experiments on CO Searching

256 bytes tree nodes


Resilient Algorithms and Data Structures


Memory Errors

Memory error: one or multiple bits read differently from how they were last written.

Many possible causes:
• electrical or magnetic interference (cosmic rays)
• hardware problems (bit permanently damaged)
• corruption in data path between memories and processing units

Errors in DRAM devices have been a concern for a long time [May & Woods 79, Ziegler et al 79, Chen & Hsiao 84, Normand 96, O’Gorman et al 96, Mukherjee et al 05, … ]


Memory Errors

Soft Errors: Randomly corrupt bits, but do not leave any physical damage --- cosmic rays

Hard Errors: Corrupt bits in a repeatable manner because of a physical defect (e.g., stuck bits) --- hardware problems


Error Correcting Codes (ECC)

Error correcting codes (ECC) allow detection and correction of one or multiple bit errors

Typical ECC is SECDED (i.e., single error correct, double error detect)

Chip-Kill can correct up to 4 adjacent bits at once

ECC has several overheads in terms of performance (33%), size (20%) and money (10%).

ECC memory chips are mostly used in memory systems for server machines rather than for client computers


Impact of Memory Errors

Consequence of a memory error is system dependent

1. Correctable errors : fixed by ECC

2. Uncorrectable errors :

2.1. Detected : Explicit failure (e.g., a machine reboot)

2.2. Undetected :

2.2.1. Induced failure (e.g., a kernel panic)

2.2.2. Unnoticed (but application corrupted, e.g., segmentation fault, file not found, file not readable, … )


How Common are Memory Errors?


[Schroeder et al 2009]: experiments over 2.5 years (Jan 06 – Jun 08) on the Google fleet (10^4 machines, ECC memory)

Memory errors are NOT rare events!


Memory Errors

Not all machines (clients) have ECC memory chips.

Increased demand for larger capacities at low cost just makes the problem more serious – large clusters of inexpensive memories

Need for reliable computation in the presence of memory faults


Memory Errors

Other scenarios in which memory errors have impact (and seem to be modeled in an adversarial setting):

• Memory errors can cause security vulnerabilities:
– Fault-based cryptanalysis [Boneh et al 97, Xu et al 01, Bloemer & Seifert 03]
– Attacking Java Virtual Machines [Govindavajhala & Appel 03]
– Breaking smart cards [Skorobogatov & Anderson 02, Bar-El et al 06]

• Avionics and space electronic systems:
– Amount of cosmic rays increases with altitude (soft errors)


Memory Errors in Space


Recap on Memory Errors

1. Memory errors can be harmful: uncorrectable memory errors cause some catastrophic event (reboot, kernel panic, data corruption, …)


I’m thinking of getting back into crime, Luigi. Legitimate business is too corrupt…


A small example

Classical algorithms may not be correct in the presence of (even very few) memory errors

An example: merging two ordered lists A and B, each of length Θ(n)

[Figure: A = 1, 2, …, 10 and B = 11, 12, …, 20 are merged into Out. With the first element of A corrupted to 80, Out = 11, 12, 13, …, 20, 80, 2, 3, 4, …, 9, 10, which contains Θ(n^2) inversions.]
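The example can be replayed with a textbook two-way merge. The sketch below corrupts a single cell of A before merging and counts the inversions in the output; the quadratic-time `count_inversions` helper is only there for checking and both function names are illustrative.

```python
def merge(a, b):
    """Standard two-way merge; correct only if both inputs are really sorted."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def count_inversions(seq):
    """Number of pairs (i, j) with i < j and seq[i] > seq[j] (brute force)."""
    return sum(1 for i in range(len(seq))
               for j in range(i + 1, len(seq)) if seq[i] > seq[j])


a = list(range(1, 11))    # 1..10
b = list(range(11, 21))   # 11..20
a[0] = 80                 # a single memory fault in the first cell of A
out = merge(a, b)
inversions = count_inversions(out)
```

The corrupted 80 blocks A until all of B has been emitted, so every element of B ends up ahead of every uncorrupted element of A: one fault displaces Θ(n) correct values and produces Θ(n^2) inversions.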


Recap on Memory Errors

2. Memory errors are NOT rare: even a small cluster of computers with few GB per node can experience one bit error every few minutes.


I know my PIN number: it’s my name I can’t remember…


Memory Errors

Mem. size    Mean Time Between Failures
512 MB       2.92 hours
1 GB         1.46 hours
16 GB        5.48 minutes
64 GB        1.37 minutes
1 TB         5.13 seconds

In the field study, Google researchers observed mean error rates of 2,000 – 6,000 per GB per year (25,000 – 75,000 FIT/Mbit)
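The table entries appear to follow from the upper end of the quoted range, 6,000 errors per GB per year, if errors are assumed to be spread uniformly over time and 1 TB is taken as 1024 GB. Both assumptions are mine, not stated on the slide; the sketch below redoes the arithmetic under them.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def mtbf_hours(memory_gb, errors_per_gb_per_year=6000):
    """Mean time between errors in hours, assuming a uniform error rate."""
    return HOURS_PER_YEAR / (errors_per_gb_per_year * memory_gb)


mtbf_512mb_hours = mtbf_hours(0.5)
mtbf_1gb_hours = mtbf_hours(1)
mtbf_16gb_minutes = mtbf_hours(16) * 60
mtbf_64gb_minutes = mtbf_hours(64) * 60
mtbf_1tb_seconds = mtbf_hours(1024) * 3600
```

At the lower rate of 2,000 errors per GB per year the intervals would be three times longer, which still means minutes between errors for memories of tens of GB.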


Recap on Memory Errors

3. ECC may not be available (or may not be enough): inexpensive memories have no ECC. ECC does not guarantee complete fault coverage, is expensive, and halts the system upon detection of uncorrectable errors, causing service disruption.


Impact of Memory Errors


Resilient Algorithms and Data Structures

Make sure that the algorithms and data structures we design are capable of dealing with memory errors.

Resilient algorithms and data structures: capable of tolerating memory errors on data (even throughout their execution) without sacrificing correctness, performance and storage space.


Faulty-Memory Model [Finocchi, I. 04]

• Memory fault = the correct data stored in a memory location gets altered (destructive faults)

• Faults can appear:

– at any time

– in any memory location

– simultaneously

• Assumptions:

– Only O(1) words of reliable memory (safe memory)

– Corrupted values indistinguishable from correct ones

Wish to produce correct output on uncorrupted data (in an adversarial model)

• Even recursion may be problematic in this model.


Terminology

δ = upper bound known on the number of memory errors (may be a function of n)

α = actual number of memory errors (those that happen during a specific execution)

Note: typically α ≤ δ

All the algorithms / data structures described here need to know δ in advance


Other Faulty Models

Design of fault-tolerant algorithms has received attention for 50+ years

Liar Model [Ulam 77, Renyi 76,…]

Comparison questions answered by a possibly lying adversary. Can exploit query replication strategies.

Fault-tolerant sorting networks [Assaf Upfal 91, Yao Yao 85,…]

Comparators can be faulty. Exploit substantial data replication using fault-free data replicators.

Parallel Computations [Huang et al 84, Chlebus et al 94, …]

Faults on parallel/distributed architectures: PRAM or DMM simulations (rely on fault-detection mechanisms)


Other Faulty Models

Robustness in Computational Geometry [Schirra 00, …]

Faults from unreliable computation (geometric precision) rather than from memory errors

Noisy / Unreliable Computation [Braverman Mossel 08]

Faults (with given probability) from unreliable primitives (e.g., comparisons) rather than from memory errors

Memory Checkers [Blum et al 93, Blum et al 95, …]

Programs not reliable objects: self-testing and self-correction. Essential error detection and error correction mechanisms.

………………………………………


Outline of the Talk

1. Motivation and Model

2. Resilient Algorithms:

• Sorting and Searching

3. Resilient Data Structures

• Priority Queues

• Dictionaries

4. (Ongoing) Experimental Results

5. Conclusions and Open Problems


Resilient Sorting

We are given a set of n keys that need to be sorted

Value of some keys may get arbitrarily corrupted

We cannot tell which is faithful and which is corrupted

Q1. Can we efficiently sort the correct values in the presence of memory errors?

Q2. How many memory errors can we tolerate in the worst case if we wish to maintain optimal time and space?


Terminology

• Faithfully ordered sequence = ordered except for corrupted keys

• Resilient sorting algorithm = produces a faithfully ordered sequence (i.e., wish to sort correctly all the uncorrupted keys)

• Faithful key = never corrupted

• Faulty key = corrupted

Example (figure): the keys 1 2 3 4 5 6 7 8 9 10, together with one corrupted key 80, form a faithfully ordered sequence
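The definition of a faithfully ordered sequence can be checked offline in an experiment where the positions of the injected faults are recorded. The sketch below assumes exactly that: `faulty_positions` is a hypothetical set of indices known to the experimenter (the algorithm itself cannot tell faulty keys from faithful ones), and the function name is illustrative.

```python
from typing import Sequence, Set


def is_faithfully_ordered(keys: Sequence[int], faulty_positions: Set[int]) -> bool:
    """Return True if the keys at non-faulty positions are in non-decreasing order."""
    faithful = [k for i, k in enumerate(keys) if i not in faulty_positions]
    return all(a <= b for a, b in zip(faithful, faithful[1:]))


if __name__ == "__main__":
    keys = [1, 2, 3, 80, 4, 5]  # illustrative data; position 3 holds a corrupted key
    print(is_faithfully_ordered(keys, {3}))    # the corrupted key is ignored
    print(is_faithfully_ordered(keys, set()))  # treating 80 as faithful breaks the order
```

Corrupted keys may sit anywhere in the output; only the relative order of the faithful keys matters.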


Trivially Resilient

Resilient variable: consists of (2δ+1) copies x_1, x_2, …, x_(2δ+1) of a standard variable x

Value of resilient variable given by majority of its copies:

• cannot be corrupted by faults

• can be computed in linear time and constant space [Boyer Moore 91]

Trivially-resilient algorithms and data structures have Θ(δ) multiplicative overheads in terms of time and space

Note: Trivially-resilient does more than ECC (SECDED, Chip-Kill, ….)
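A minimal sketch of a resilient variable, assuming the copy count is 2δ+1 where δ is the known upper bound on memory errors. The class name and its methods are illustrative. The read uses Boyer-Moore majority voting, which makes a single pass and keeps only a candidate and a counter; in the faulty-memory model those two words would have to live in safe memory.

```python
from typing import List


class ResilientVariable:
    """Stores 2*delta + 1 copies of a value; a read returns the majority copy."""

    def __init__(self, value: int, delta: int) -> None:
        self.copies: List[int] = [value] * (2 * delta + 1)

    def write(self, value: int) -> None:
        # Overwrite every copy, which also repairs earlier corruptions.
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self) -> int:
        # Boyer-Moore majority vote: one pass, O(1) extra words.
        candidate, count = self.copies[0], 0
        for c in self.copies:
            if count == 0:
                candidate, count = c, 1
            elif c == candidate:
                count += 1
            else:
                count -= 1
        return candidate


if __name__ == "__main__":
    x = ResilientVariable(42, delta=2)  # 5 copies
    x.copies[0] = 7                     # simulate two memory faults
    x.copies[3] = 9
    print(x.read())
```

With at most δ corrupted copies, at least δ+1 copies still hold the correct value, so a strict majority exists and the vote returns it. Every read and write touches all 2δ+1 copies, which is where the multiplicative overhead in δ comes from.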


Trivially Resilient Sorting

Can trivially sort in O(δ n log n) time in the presence of memory errors

O(n log n) sorting algorithm able to tolerate only O(1) memory errors


Resilient Sorting

Upper Bound [Finocchi, Grandoni, I. 05]:

Comparison-based sorting algorithm that takes O(n log n + δ²) time in the presence of memory errors

O(n log n) sorting algorithm able to tolerate up to O((n log n)^(1/2)) memory errors

Lower Bound [Finocchi, I. 04]:

Any comparison-based resilient O(n log n) sorting algorithm can tolerate the corruption of at most O((n log n)^(1/2)) keys
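A short worked step connecting the running time to the tolerance threshold, assuming the running time has the form O(n log n + δ²) with δ the upper bound on memory errors (this form is an assumption chosen to be consistent with the stated threshold):

```latex
n \log n + \delta^2 = O(n \log n)
\iff \delta^2 = O(n \log n)
\iff \delta = O\big(\sqrt{n \log n}\big)
```

For n = 5,000,000 (log2 n ≈ 22) this is a tolerance on the order of 10^4 memory errors, up to constant factors, before the optimal O(n log n) running time is lost.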


Resilient Sorting (cont.)

Integer Sorting [Finocchi, Grandoni, I. 05]:

Randomized integer sorting algorithm that takes O(n + δ²) time in the presence of memory errors

O(n) randomized integer sorting algorithm able to tolerate up to O(n^(1/2)) memory errors


Resilient Binary Search

[Figure: binary search on a sorted array containing corrupted keys, where search(5) = false]

Wish to get correct answers at least on correct keys:

search(s) either finds a key equal to s, or determines that no correct key is equal to s

If only faulty keys are equal to s, answer uninteresting (cannot hope to get trustworthy answer)


Trivially Resilient Binary Search

Can search in O(δ log n) time in the presence of memory errors
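Trivially resilient binary search can be sketched by storing every key as 2δ+1 copies and taking a majority vote at each probe, so each of the O(log n) probes costs O(δ). The layout (a list of copy lists) and the function names are assumptions for illustration. `Counter` is used for brevity; it needs extra space proportional to the number of copies, whereas Boyer-Moore voting would keep constant space.

```python
from collections import Counter
from typing import List, Optional


def majority(copies: List[int]) -> int:
    """Value held by most copies (correct if at most delta of 2*delta+1 are corrupted)."""
    return Counter(copies).most_common(1)[0][0]


def trivially_resilient_search(table: List[List[int]], s: int) -> Optional[int]:
    """Binary search over keys that are each stored as a list of 2*delta+1 copies.

    Returns the index of a key whose majority value equals s, or None.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = majority(table[mid])  # O(delta) work per probe
        if key == s:
            return mid
        if key < s:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


if __name__ == "__main__":
    delta = 1
    table = [[k] * (2 * delta + 1) for k in [2, 3, 4, 5, 8, 9, 13, 20, 26]]
    table[3][0] = 80  # one corrupted copy of the key 5
    print(trivially_resilient_search(table, 5))
    print(trivially_resilient_search(table, 7))
```

The corrupted copy is outvoted, so the search still finds the key 5 at index 3 and correctly reports that 7 is absent.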


Resilient Searching

Upper Bounds:

Randomized algorithm with O(log n + δ) expected time [Finocchi, Grandoni, I. 05]

Deterministic algorithm with O(log n + δ) time [Brodal et al. 07]

Lower Bounds:

Ω(log n + δ) lower bound (deterministic) [Finocchi, I. 04]

Ω(log n + δ) lower bound on expected time [Finocchi, Grandoni, I. 05]


Outline of the Talk

1. Motivation and Model

2. Resilient Algorithms:

• Sorting and Searching

3. Resilient Data Structures

• Priority Queues

• Dictionaries

4. (Ongoing) Experimental Results

5. Conclusions and Open Problems


Resilient Data Structures

Data structures more vulnerable to memory errors than algorithms:

• Algorithms affected by errors during execution

• Data structures affected by errors in lifetime


Resilient Priority Queues

Maintain a set of elements under insert and deletemin

insert adds an element

deletemin deletes and returns either the minimum uncorrupted value or a corrupted value

Consistent with resilient sorting


Resilient Priority Queues

Upper Bound: Both insert and deletemin can be implemented in O(log n + δ) time [Jorgensen et al. 07] (based on cache-oblivious priority queues)

Lower Bound: A resilient priority queue with n > δ elements must use Ω(log n + δ) comparisons to answer an insert followed by a deletemin [Jorgensen et al. 07]


Resilient Dictionaries

Maintain a set of elements under insert, delete and search

insert and delete as usual, search as in resilient searching:

search(s) either finds a key equal to s, or determines that no correct key is equal to s

Again, consistent with resilient sorting


Resilient Dictionaries

Randomized resilient dictionary implements each operation in O(log n + δ) time [Brodal et al. 07]

More complicated deterministic resilient dictionary implements each operation in O(log n + δ) time [Brodal et al. 07]


Resilient Dictionaries

Pointer-based data structures

Faults on pointers likely to be more problematic than faults on keys

Randomized resilient dictionaries of Brodal et al. built on top of traditional (non-resilient) dictionaries

Our implementation built on top of AVL trees


Outline of the Talk

1. Motivation and Model

2. Resilient Algorithms:

• Sorting and Searching

3. Resilient Data Structures

• Priority Queues

• Dictionaries

4. (Ongoing) Experimental Results

5. Conclusions and Open Problems


Experimental Framework

Algorithm / Data Structure    Running time
Non-Resilient                 O(f(n))
Trivially Resilient           O(δ · f(n))
Resilient                     O(f(n) + g(δ))

Resilient sorting from [Ferraro-Petrillo et al. 09]

Resilient dictionaries from [Ferraro-Petrillo et al. 10]

Implemented resilient binary search and heaps

Implementations of resilient sorting and dictionaries more engineered than resilient binary search and heaps


Experimental Platform

• 2 CPUs Intel Quad-Core Xeon E5520 @ 2.26 GHz

• L1 cache 256 KB, L2 cache 1 MB, L3 cache 8 MB

• 48 GB RAM

• Scientific Linux release with Linux kernel 2.6.18-164

• gcc 4.1.2, optimization flag -O3


Fault Injection

This talk: Only random faults

Algorithm / data structure and fault injection implemented as separate threads

(Run on different CPUs)

Preliminary experiments (not here): error rates depend on memory usage and time.


Resiliency: Why should we care?

What’s the impact of memory errors?

Try to analyze impact of errors on mergesort, priority queues and dictionaries using a common framework (sorting)

Attempt to measure error propagation: try to estimate how far the output sequence is from being sorted (because of memory errors)

Heapsort implemented on array. For coherence, in AVLSort we do not induce faults on pointers

We’ll measure faults on AVL pointers in a separate experiment


Error Propagation

• k-unordered sequence = faithfully ordered except for k (correct) keys

• k-unordered sorting algorithm = produces a k-unordered sequence, i.e., it faithfully sorts all but k correct keys

Example (figure): the sequence 1 2 3 4 9 5 7 8 6 10, together with one corrupted key 80, is 2-unordered

• Resilient = 0-unordered, i.e., it faithfully sorts all correct keys
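One way to measure how unordered an output is, assuming the definition is read as "removing k correct keys leaves a faithfully ordered sequence": the smallest such k equals the number of correct keys minus the length of a longest non-decreasing subsequence of the correct keys. The sketch below computes it in O(n log n) time. The function name and the recorded `faulty_positions` set are illustrative, and the position of the corrupted key 80 in the example is an assumption.

```python
from bisect import bisect_right
from typing import List, Sequence, Set


def unordered_k(keys: Sequence[int], faulty_positions: Set[int]) -> int:
    """Smallest k such that the sequence is k-unordered."""
    # tails[i] = smallest possible last key of a non-decreasing
    # subsequence of length i + 1 among the correct keys seen so far.
    tails: List[int] = []
    correct = 0
    for i, key in enumerate(keys):
        if i in faulty_positions:
            continue  # corrupted keys are ignored by the definition
        correct += 1
        pos = bisect_right(tails, key)
        if pos == len(tails):
            tails.append(key)
        else:
            tails[pos] = key
    return correct - len(tails)


if __name__ == "__main__":
    # Keys from the 2-unordered example; index 9 is assumed to hold the corrupted key 80.
    keys = [1, 2, 3, 4, 9, 5, 7, 8, 6, 80, 10]
    print(unordered_k(keys, {9}))
```

For the example, the correct keys 9 and 6 are the two that must be set aside, matching the "2-unordered" label.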


The Importance of Being Resilient

[Figures: error propagation plots for n = 5,000,000, one plot per slide; the surviving annotations are listed below]

Plot 1: 0.01% (random) errors in input → 0.13% errors in output; 0.02% (random) errors in input → 0.22% errors in output

Plot 2: 0.01% (random) errors in input → 0.40% errors in output; 0.02% (random) errors in input → 0.47% errors in output

Plot 3: 0.01% (random) errors in input → 68.20% errors in output; 0.02% (random) errors in input → 79.62% errors in output

Error Amplification

Mergesort: 0.002-0.02% (random) errors in input → 24.50-79.51% errors in output!!!

AVLsort: 0.002-0.02% (random) errors in input → 0.39-0.47% errors in output

Heapsort: 0.002-0.02% (random) errors in input → 0.01-0.22% errors in output

They all show some error amplification.

Large variations likely to depend on data organization

Note: Those are errors on keys. Errors on pointers are more dramatic for pointer-based data structures
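A small illustration of why a standard merge can amplify errors. It is a sketch on toy data, not the experimental setup of the talk: one corrupted key at the head of a run blocks that run until the other run is exhausted, so many correct keys end up out of order.

```python
from typing import List


def merge(a: List[int], b: List[int]) -> List[int]:
    """Standard (non-resilient) two-way merge of runs assumed to be sorted."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


if __name__ == "__main__":
    a = [1, 3, 5, 7, 9]
    b = [2, 4, 6, 8, 10]
    a[0] = 80  # a single memory fault corrupts the head of run a
    print(merge(a, b))
```

The merge returns [2, 4, 6, 8, 10, 80, 3, 5, 7, 9]: a single fault leaves four of the nine correct keys out of order, and in mergesort the damaged run is fed into further merges.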


The Importance of Being Resilient

AVL with n = 5,000,000; errors on memory used (keys, parent pointers, pointers, etc…)

100,000 searches around searches fail: on the

avg, able to complete only about

(100,000/ searches before crashing


Isn’t Trivial Resiliency Enough?

Memory errors are a problem

Do we need to tackle it with new algorithms / data structures?

Aren’t simple-minded approaches enough?

[Figures: four plots; the surviving panel labels are "random search", "random ops" and "random ops, no errors on pointers"]

All experiments for 10^5 ≤ n ≤ 5·10^5, δ = 1024, unless specified otherwise

Mergesort: Trivially resilient about 100-200X slower than non-resilient

Binary Search: Trivially resilient about 200-300X slower than non-resilient

Dictionaries: Trivially resilient AVL about 300X slower than non-resilient

Heaps: Trivially resilient about 1000X slower than non-resilient (δ = 512) [deletemin are not random and slow]


Performance of Resilient Algorithms

Memory errors are a problem

Trivial approaches produce slow algorithms / data structures

Need non-trivial (hopefully fast) approaches

How fast can resilient algorithms / data structures be?

[Figures: eight performance plots; the surviving panel labels are "random search" (two plots) and "random ops" (four plots)]

Performance of Resiliency

All experiments for 10^5 ≤ n ≤ 5·10^5, unless specified otherwise

Mergesort: Resilient mergesort about 1.5-2X slower than non-resilient mergesort [Trivially resilient mergesort about 100-200X slower]

Binary Search: Resilient binary search about 60-80X slower than non-resilient binary search [Trivially resilient binary search about 200-300X slower]

Heaps: Resilient heaps about 20X slower than non-resilient heaps [Trivially resilient heaps about 1000X slower]

Dictionaries: Resilient AVL about 10-20X slower than non-resilient AVL [Trivially resilient AVL about 300X slower]


Larger Data Sets

How well does the performance of resilient algorithms / data structures scale to larger data sets?

Previous experiments: 10^5 ≤ n ≤ 5·10^5

New experiment with n = 5·10^6 (no trivially resilient)

[Figures: plots for n = 5,000,000 (log2 n ≈ 22); the surviving panel labels are "100,000 random search on n = 5,000,000 elements", "100,000 random ops on a heap with n = 5,000,000" and "100,000 random ops on AVL with n = 5,000,000"]

All experiments for n = 5·10^6

Mergesort [was 1.5-2X for 10^5 ≤ n ≤ 5·10^5]: Resilient mergesort is 1.6-2.3X slower (requires ≤ 0.04% more space)

Binary Search [was 60-80X for 10^5 ≤ n ≤ 5·10^5]: Resilient search is 100-1000X slower (requires ≤ 0.08% more space)

Heaps [was 20X for 10^5 ≤ n ≤ 5·10^5]: Resilient heap is 100-1000X slower (requires 100X more space)

Dictionaries [was 10-20X for 10^5 ≤ n ≤ 5·10^5]: Resilient AVL is 6.9-14.6X slower (requires about 1/3 space)


Sensitivity to δ

How critical is the choice of δ?

Underestimating δ compromises resiliency

Overestimating δ gives some performance degradation


Performance Degradation

Mergesort: Resilient mergesort improves by 9.7% in time and degrades by 0.04% in space

Binary Search: Resilient search degrades to 9.8X in time and by 0.08% in space

Heaps: Resilient heap degrades to 13.1X in time and by 59.28% in space

Dictionaries: Resilient AVL degrades by 49.71% in time

(… but algorithm overestimates δ)


Robustness

Resilient mergesort and dictionaries appear more robust than resilient search and heaps

I.e., resilient mergesort and dictionaries scale better with n and are less sensitive to δ, so they are less vulnerable to bad estimates of δ

How much of this is due to the fact that their implementations are more engineered?


Outline of the Talk

1. Motivation and Model

2. Resilient Algorithms:

• Sorting and Searching

3. Resilient Data Structures

• Priority Queues

• Dictionaries

4. (Ongoing) Experimental Results

5. Conclusions and Open Problems


Concluding Remarks

• Need for reliable computation in the presence of memory errors

• Investigated basic algorithms and data structures in the faulty memory model: do not wish to detect / correct errors, only produce correct output on correct data

• Tight upper and lower bounds in this model

• After first tests, resilient implementations of algorithms and data structures look promising


Future Work and Open Problems

• More (faster) implementations, engineering and experimental analysis?

• Resilient graph algorithms?

• Lower bounds for resilient integer sorting?

• Better faulty memory model?

• Resilient algorithms oblivious to δ?

• Full repertoire for resilient priority queues (delete, decreasekey, increasekey)?


Thank You!


My memory’s terrible these days…