Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California...

Post on 17-Dec-2015

  • Slide 1
  • Distributed Storage Allocation Problems. Derek Leong, Alexandros G. Dimakis, Tracey Ho. California Institute of Technology. NetCod 2009, 2009-06-16
  • Slide 2
  • Motivation
  • Slide 3
  • [Figure omitted; surviving label: 0.1]
  • Slide 4
  • [Figure: three candidate allocations, labeled A, B, and C]
  • Slide 5
  • Success probability of allocation A = $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 2 + 0.9^2\,0.1^3 \cdot 7 + 0.9^3\,0.1^2 \cdot 9 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.99$, where the factor multiplying $0.9^k\,0.1^{5-k}$ is the number of successful $k$-subsets ($k = 0, \dots, 5$)
  • Slide 6
  • Motivation: success probability of allocation B = $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 0 + 0.9^2\,0.1^3 \cdot 0 + 0.9^3\,0.1^2 \cdot 10 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.99144$ (factors: numbers of successful $k$-subsets, $k = 0, \dots, 5$)
  • Slide 7
  • Motivation: success probability of allocation C = $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 0 + 0.9^2\,0.1^3 \cdot 6 + 0.9^3\,0.1^2 \cdot 10 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.9963$ (factors: numbers of successful $k$-subsets, $k = 0, \dots, 5$)
  • Slide 8
  • Motivation: success probabilities A = 0.99, B = 0.99144, C = 0.9963
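The motivation slides weight each count of successful k-subsets by the probability that exactly that set of k nodes (out of 5) is accessed. The Python sketch below reproduces that arithmetic using only the subset counts printed on the slides. The function name `success_from_counts` is our own. Rational arithmetic (`Fraction`) keeps the results exact, so they can be compared directly with 0.99, 0.99144 and 0.9963.

```python
from fractions import Fraction

def success_from_counts(counts, p):
    """Success probability given counts[k] = number of successful k-subsets.

    Each specific set of k accessed nodes (out of n = len(counts) - 1)
    occurs with probability p**k * (1 - p)**(n - k).
    """
    n = len(counts) - 1
    return sum(c * p**k * (1 - p) ** (n - k) for k, c in enumerate(counts))

p = Fraction(9, 10)  # each node is accessed with probability 0.9
subset_counts = {
    "A": [0, 2, 7, 9, 5, 1],
    "B": [0, 0, 0, 10, 5, 1],
    "C": [0, 0, 6, 10, 5, 1],
}
for name, counts in subset_counts.items():
    print(name, float(success_from_counts(counts, p)))
```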
  • Slide 9
  • [Figure: access model; surviving label: 0.1]
  • Slide 10
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Parts: Storage Allocation; Access by the Data Collector; Objective
  • Slide 11
  • Problem Description, Storage Allocation: Source s has a data object of unit size. It can use n storage nodes to store amounts $x_1, x_2, \dots, x_n$ of data, but it faces an aggregate storage budget T, i.e. $x_1 + x_2 + \cdots + x_n \le T$.
  • Slide 12
  • Problem Description, Access by the Data Collector: Data collector t attempts to recover the data object by accessing a subset $r$ of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in r} x_i \ge 1$.
  • Slide 13
  • Problem Description, Objective: We seek the optimal allocation $(x_1, \dots, x_n)$ that maximizes the probability of successful recovery, $\mathbb{P}\left[\sum_{i \in r} x_i \ge 1\right]$, subject to $\sum_{i=1}^{n} x_i \le T$ and $x_i \ge 0$.
  • Slide 14
  • Problem Description, Difficulty: The problem is nonconvex, and there is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).
  • Slide 15
  • [1] Deterministic Allocation with Probabilistic Access: The data collector accesses each storage node independently with constant probability p.
  • Slide 16
  • [1] Deterministic Allocation with Probabilistic Access: Symmetric allocations can be suboptimal. Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. (Problem originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.)
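The claim for n = 5, T = 12/5, p = 0.9 can be checked by brute force. There are only 2^5 access patterns, so the success probability of any allocation can be summed exactly. The Python sketch below does this and searches symmetric allocations that split T evenly over m = 1..n nodes. The slides do not say which nonsymmetric allocation they have in mind. The allocation (2/3, 2/3, 1/3, 1/3, 1/3) used here is our own hypothetical example, and it spends only 7/3 ≤ 12/5. The function names are also our own.

```python
from fractions import Fraction
from itertools import product

def recovery_probability(x, p):
    """Exact probability that the accessed nodes hold a total of at least 1,
    when each node is accessed independently with probability p (model [1])."""
    n = len(x)
    total = Fraction(0)
    for pattern in product((0, 1), repeat=n):  # all 2**n access patterns
        accessed = sum(xi for xi, a in zip(x, pattern) if a)
        if accessed >= 1:
            k = sum(pattern)
            total += p**k * (1 - p) ** (n - k)
    return total

def best_symmetric(n, T, p):
    """Best symmetric allocation: budget T split evenly over m nodes, m = 1..n.
    Returns (success probability, m)."""
    results = []
    for m in range(1, n + 1):
        x = [T / m] * m + [Fraction(0)] * (n - m)
        results.append((recovery_probability(x, p), m))
    return max(results)

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
prob_sym, m_best = best_symmetric(n, T, p)

# Hypothetical nonsymmetric allocation (not taken from the slides); uses 7/3 <= T.
x_nonsym = [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3
prob_nonsym = recovery_probability(x_nonsym, p)

print("best symmetric:", m_best, "nodes,", float(prob_sym))
print("nonsymmetric:", float(prob_nonsym), "budget used:", sum(x_nonsym))
```

In our calculation, the best even split uses four nodes (3/5 each) and succeeds with probability 0.9963. The hypothetical nonsymmetric allocation reaches 0.99711 while spending less than the budget.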
  • Slide 17
  • [2] Deterministic Allocation with Fixed Access: The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.
  • Slide 18
  • [2] Deterministic Allocation with Fixed Access: Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery.
  • Slide 19
  • [2] Deterministic Allocation with Fixed Access: Example with (n, r) = (6, 2). Question: For any budget T, is there always a symmetric allocation that achieves the maximum success probability?
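Under the fixed-access model, the success probability is the fraction of r-subsets whose stored amounts add up to at least 1. The brute-force Python sketch below computes that fraction. It is applied to the (n, r) = (6, 2) setting with two made-up allocations, each using budget T = 3. The function name and both allocations are hypothetical illustrations, not taken from the slides.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def fixed_access_success(x, r):
    """Probability of recovery when the collector reads an r-subset chosen
    uniformly at random: (number of successful r-subsets) / C(n, r)."""
    successful = sum(1 for subset in combinations(x, r) if sum(subset) >= 1)
    return Fraction(successful, comb(len(x), r))

# Two hypothetical allocations for (n, r) = (6, 2), each with budget T = 3.
spread = [Fraction(1, 2)] * 6          # every node stores 1/2
concentrated = [1, 1, 1, 0, 0, 0]      # three nodes store the whole object
print(fixed_access_success(spread, 2), fixed_access_success(concentrated, 2))
```

With the same budget, spreading lets every pair recover. Concentrating fails exactly on the 3 of the 15 pairs that hit two empty nodes.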
  • Slide 20
  • [2] Deterministic Allocation with Fixed Access: Question: What is the optimal symmetric allocation? For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. One exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
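For a symmetric allocation, the success probability has a closed form. Assume the budget is split evenly as T/m over m nodes. Recovery then needs at least k = ⌈m/T⌉ of those nodes among the r accessed, which is a hypergeometric tail. The Python sketch below evaluates every m and picks the best. It assumes that a symmetric allocation always spends the whole budget evenly, and the function names are our own.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_success(n, r, T, m):
    """Success probability when budget T is split evenly over m of the n nodes
    and the collector reads a uniformly random r-subset (model [2])."""
    k = ceil(m / T)  # fewest used nodes whose shares T/m add up to >= 1
    hits = sum(comb(m, j) * comb(n - m, r - j) for j in range(k, min(m, r) + 1))
    return Fraction(hits, comb(n, r))

def best_node_count(n, r, T):
    """Number of nodes m that maximizes symmetric_success (first one on ties)."""
    return max(range(1, n + 1), key=lambda m: symmetric_success(n, r, T, m))

T = Fraction("4.6")  # exact 23/5 avoids floating-point trouble in ceil(m / T)
m = best_node_count(15, 3, T)
print(m, float(symmetric_success(15, 3, T, m)))
```

For (15, 3, 4.6) this picks m = 9, with success probability 300/455. By the same formula, concentrating on m = 4 nodes gives 290/455 and spreading over m = 13 nodes gives 286/455.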
  • Slide 21
  • [2] Deterministic Allocation with Fixed Access: For probability-1 recovery, the problem reduces to a simple LP. Result 1: If we require every r-subset to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_1 = \cdots = x_n = 1/r$; i.e., it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
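A short counting argument shows why probability-1 recovery cannot be achieved with a budget below n/r. This is our own sketch and is not reproduced from the slides.

```latex
\text{Require } \sum_{i \in S} x_i \ge 1 \text{ for every } S \subseteq \{1,\dots,n\},\ |S| = r.\\
\text{Sum over all } \tbinom{n}{r} \text{ such } S \text{; each } x_i \text{ lies in } \tbinom{n-1}{r-1} \text{ of them:}\\
\binom{n-1}{r-1} \sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r}
\;\Longrightarrow\;
T \;\ge\; \sum_{i=1}^{n} x_i \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} \;=\; \frac{n}{r}.\\
\text{The allocation } x_i = 1/r \ (i = 1,\dots,n) \text{ attains } T = n/r \text{, since every } r\text{-subset then sums to exactly } 1.
```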
  • Slide 22
  • [3] Symmetric Probabilistic Allocation with Fixed Access: Each storage node is used independently with constant probability s/n to store the same amount of data, 1/ℓ, and the total storage used must be at most budget T in expectation.
  • Slide 23
  • [3] Symmetric Probabilistic Allocation with Fixed Access: The probability of successful recovery can be written as $\mathbb{P}[\mathrm{Bin}(r, s/n) \ge \ell]$, where Bin(n, p) denotes the binomial random variable with n trials and success probability p. Reparameterizing in terms of budget T gives the success probability P(r, T, ℓ). Each nonempty node stores 1/ℓ amount of data.
  • Slide 24
  • [3] Symmetric Probabilistic Allocation with Fixed Access: Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal; i.e., it is best to spread the budget maximally (each nonempty node stores 1/ℓ amount of data).
  • Slide 25
  • [3] Symmetric Probabilistic Allocation with Fixed Access: As we increase the budget T, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them. [Figure: r = 5]
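The slides plot the optimal ℓ against the budget but do not give the formula for P(r, T, ℓ). The Python sketch below assumes one natural reading of the model described above. Setting s = ℓT makes each node nonempty with probability ℓT/n, which spends the whole budget in expectation, so P(r, T, ℓ) = P[Bin(r, ℓT/n) ≥ ℓ]. The number of nodes n = 100 is a hypothetical choice, r = 5 matches the figure label, and the function names are our own.

```python
from math import comb

def binom_tail(trials, q, k):
    """P[Bin(trials, q) >= k]."""
    return sum(comb(trials, j) * q**j * (1 - q) ** (trials - j)
               for j in range(k, trials + 1))

def success(r, T, ell, n):
    """Assumed form of P(r, T, ell): each of n nodes is nonempty with
    probability q = ell*T/n (capped at 1), so expected storage is
    n * q * (1/ell) <= T; recovery needs >= ell nonempty nodes among r read."""
    q = min(ell * T / n, 1.0)
    return binom_tail(r, q, ell)

def best_ell(r, T, n):
    """ell in 1..r with the highest assumed success probability."""
    return max(range(1, r + 1), key=lambda ell: success(r, T, ell, n))

n, r = 100, 5  # hypothetical number of nodes; r = 5 as in the slides' figures
for T in (5, 10, 15, 19):
    print(T, best_ell(r, T, n))
```

Under these assumptions, the best choice is ℓ = 1 at T = 5 (success about 0.226) and ℓ = r = 5 at T = 19 (success 0.95^5 ≈ 0.774). This is the same qualitative switch the slides describe.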
  • Slide 26
  • [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r. [Figure: r = 5; each nonempty node stores 1/ℓ amount of data]
  • Slide 27
  • [Figure, r = 5: as the budget increases, the optimal choice moves from storing more per node to storing less per node]
  • Slide 28
  • Summary & Future Work: [1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations. [2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme-point solutions are not necessarily optimal for symmetric allocations; is there always a symmetric optimal allocation? [3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime; is there a phase transition in the optimal allocation with increasing budget?