Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
Storage Allocation
Access by the Data Collector
Objective
Slide 11
Problem Description
Storage Allocation: Source s has a data object of unit size. It can use n storage nodes to store $x_1, x_2, \dots, x_n$ amount of data, but faces an aggregate storage budget T, i.e., $\sum_{i=1}^{n} x_i \le T$.
Slide 12
Problem Description
Access by the Data Collector: Data collector t attempts to recover the data object by accessing a subset $\mathbf{r}$ of the storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e., $\sum_{i \in \mathbf{r}} x_i \ge 1$.
Slide 13
Problem Description
Objective: We seek the optimal allocation $(x_1, \dots, x_n)$ that maximizes the probability of successful recovery.
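Putting the three parts together, the problem can be stated compactly as follows. Here $\mathbf{r}$ is the random subset of nodes accessed by the data collector, whose distribution depends on the access model; this compact form is our own summary of the definitions above rather than a formula reproduced from the slides.

```latex
\begin{aligned}
\underset{x_1,\dots,x_n}{\text{maximize}} \quad & \mathbb{P}\Big[\sum_{i \in \mathbf{r}} x_i \ge 1\Big] \\
\text{subject to} \quad & \sum_{i=1}^{n} x_i \le T, \qquad x_i \ge 0,\; i = 1,\dots,n.
\end{aligned}
```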
Slide 14
Problem Description
Difficulty: The problem is nonconvex, and there is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).
Slide 15
[1] Deterministic Allocation with Probabilistic Access
The data collector accesses each storage node independently with constant probability p.
Slide 16
[1] Deterministic Allocation with Probabilistic Access
Symmetric allocations can be suboptimal: given n = 5 storage nodes, budget T = 12/5, and p = 0.9, there is a nonsymmetric allocation that performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial.
This problem originally came from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.
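A brute-force check of this example can be sketched as follows, assuming independent access with probability p and enumerating all $2^n$ access patterns with exact rational arithmetic. The nonsymmetric allocation (2/3, 2/3, 1/3, 1/3, 1/3) is our own illustrative choice (it uses 7/3 ≤ 12/5 of the budget), since the allocation shown on the slide was not preserved; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def success_prob_independent(x, p):
    """P[sum of accessed x_i >= 1] when each node is accessed independently w.p. p."""
    n = len(x)
    total = Fraction(0)
    # Enumerate every access pattern (1 = node accessed, 0 = not accessed).
    for pattern in product((0, 1), repeat=n):
        k = sum(pattern)
        weight = p**k * (1 - p) ** (n - k)
        accessed = sum((xi for xi, a in zip(x, pattern) if a), Fraction(0))
        if accessed >= 1:
            total += weight
    return total


def best_symmetric_independent(n, T, p):
    """Best symmetric allocation: spread T evenly over m nodes, m = 1..n."""
    best = Fraction(0)
    for m in range(1, n + 1):
        # Storing more per node never hurts, so each of the m nodes gets T/m.
        x = [T / m] * m + [Fraction(0)] * (n - m)
        best = max(best, success_prob_independent(x, p))
    return best


if __name__ == "__main__":
    n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
    nonsym = [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3  # illustrative, hypothetical
    assert sum(nonsym) <= T
    print("best symmetric:", float(best_symmetric_independent(n, T, p)))
    print("nonsymmetric:  ", float(success_prob_independent(nonsym, p)))
```

Running it gives 0.9963 for the best symmetric allocation (four nodes storing 3/5 each) and 0.99711 for the illustrative nonsymmetric one. Exhaustive enumeration only scales to small n, but it makes the comparison exact.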
Slide 17
[2] Deterministic Allocation with Fixed Access
The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.
Slide 18
[2] Deterministic Allocation with Fixed Access
Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery.
Slide 19
[2] Deterministic Allocation with Fixed Access
Example: (n, r) = (6, 2).
Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
Slide 20
[2] Deterministic Allocation with Fixed Access
Question: What is the optimal symmetric allocation? For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. An exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
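Under fixed access, the success probability of an allocation is the fraction of r-subsets whose stored amounts sum to at least 1. A minimal sketch, with hypothetical helper names and exact fractions (T = 4.6 written as 23/5), enumerates those subsets and searches over symmetric allocations that spread T evenly over m nodes; under these assumptions it reproduces the (15, 3, 4.6) exception.

```python
from fractions import Fraction
from itertools import combinations


def success_prob_fixed(x, r):
    """Fraction of r-subsets of nodes whose stored amounts sum to at least 1."""
    subsets = list(combinations(range(len(x)), r))
    good = sum(1 for s in subsets if sum(x[i] for i in s) >= 1)
    return Fraction(good, len(subsets))


def best_symmetric_fixed(n, r, T):
    """Return (m, probability) for the best symmetric allocation T/m on m nodes."""
    best_m, best_p = 0, Fraction(-1)
    for m in range(1, n + 1):
        # Storing more per node never hurts, so each used node gets T/m.
        x = [T / m] * m + [Fraction(0)] * (n - m)
        p = success_prob_fixed(x, r)
        if p > best_p:  # strict comparison: ties keep the smaller m
            best_m, best_p = m, p
    return best_m, best_p


if __name__ == "__main__":
    m, p = best_symmetric_fixed(15, 3, Fraction(23, 5))
    print(m, p)
```

This prints `9 60/91`. For comparison, storing the whole object on 4 nodes gives 58/91, and spreading over 13 nodes (the largest m for which 3 nodes can still recover) gives 22/35.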
Slide 21
[2] Deterministic Allocation with Fixed Access
For probability-1 recovery, the problem reduces to a simple LP.
Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $(x_1, \dots, x_n) = (1/r, \dots, 1/r)$, i.e., it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
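A short argument for the minimum budget in Result 1, offered as our own sketch of the LP bound: summing the recovery constraint $\sum_{i \in S} x_i \ge 1$ over all $\binom{n}{r}$ r-subsets S counts each $x_i$ exactly $\binom{n-1}{r-1}$ times.

```latex
\begin{aligned}
\sum_{|S| = r} \sum_{i \in S} x_i \ge \binom{n}{r}
&\;\Longrightarrow\; \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \ge \binom{n}{r} \\
&\;\Longrightarrow\; \sum_{i=1}^{n} x_i \ge \frac{\binom{n}{r}}{\binom{n-1}{r-1}} = \frac{n}{r}.
\end{aligned}
```

The allocation $x_i = 1/r$ meets every constraint with equality and uses exactly $n/r$, so it attains this bound.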
Slide 22
[3] Symmetric Probabilistic Allocation with Fixed Access
Each storage node is used independently with constant probability $s/n$ to store the same amount of data, $1/\ell$, and the total storage used must be at most the budget T in expectation.
Slide 23
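[3] Symmetric Probabilistic Allocation with Fixed Access
The probability of successful recovery can be written as $\mathbb{P}[\mathrm{Bin}(r, s/n) \ge \ell]$, where $\mathrm{Bin}(n, p)$ denotes the binomial random variable with n trials and success probability p: recovery succeeds exactly when at least ℓ of the r accessed nodes are nonempty. Since the expected total storage is $n \cdot \frac{s}{n} \cdot \frac{1}{\ell} = s/\ell$, reparameterizing in terms of the budget T gives the success probability $P(r, T, \ell)$.
Each nonempty node stores $1/\ell$ amount of data.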
Slide 24
[3] Symmetric Probabilistic Allocation with Fixed Access
Result 2: For any $r \ge 2$, and at any budget T large enough to support a success probability $P(r, T, \ell) > 0.9$ for some ℓ, the choice $\ell = r$ is optimal, i.e., it is best to spread the budget maximally.
Slide 25
[3] Symmetric Probabilistic Allocation with Fixed Access
As we increase the budget T, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them.
[Figure: r = 5]
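The trade-off can be explored numerically with the sketch below. It assumes (our assumption, not stated on the slides) that the budget is spent exactly: the expected storage is $s/\ell$, so $s = \ell T$ and each node is used with probability $\ell T/n$, capped at 1, giving success probability $\mathbb{P}[\mathrm{Bin}(r, \ell T/n) \ge \ell]$. The function names and the choice n = 100 are hypothetical.

```python
from fractions import Fraction
from math import comb


def binom_tail(r, q, ell):
    """P[Bin(r, q) >= ell], computed exactly with fractions."""
    return sum(comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(ell, r + 1))


def success_prob(r, T, ell, n):
    """Assumed model: usage probability ell*T/n (capped at 1); used nodes store 1/ell."""
    q = min(Fraction(ell) * T / n, Fraction(1))
    return binom_tail(r, q, ell)


def best_ell(r, T, n):
    """Return the ell in 1..r maximizing the success probability (ties -> smaller ell)."""
    return max(range(1, r + 1), key=lambda ell: success_prob(r, T, ell, n))


if __name__ == "__main__":
    r, n = 5, 100
    for T in (Fraction(2), Fraction(10), Fraction(19)):
        print(T, best_ell(r, T, n))
```

With r = 5 and n = 100 this selects ℓ = 1 at T = 2 and T = 10 but ℓ = 5 at T = 19, in line with the sharp change described above; the specific thresholds depend on the assumed model.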
Slide 26
[3] Symmetric Probabilistic Allocation with Fixed Access
We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability $P(r, T, \ell)$ is either ℓ = 1 or ℓ = r.
[Figure: r = 5]
Slide 27
[3] Symmetric Probabilistic Allocation with Fixed Access
[Figure (r = 5) with annotations: "store less", "store more", "increasing budget per node"]
Slide 28
Summary & Future Work
[1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations.
[2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme-point solutions are not necessarily optimal for symmetric allocations; is there always a symmetric optimal allocation?
[3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime; is there a phase transition in the optimal allocation with increasing budget?
Slide 29
Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009, 2009-06-16