Transcript of Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California...
- Slide 1
- Distributed Storage Allocation Problems. Derek Leong, Alexandros G. Dimakis, Tracey Ho. California Institute of Technology. NetCod 2009, 2009-06-16.
- Slide 2
- Motivation
- Slide 3
- [Figure; surviving label: 0.1]
- Slide 4
- [Figure: three allocations labeled A, B, and C]
- Slide 5
- Success probability of allocation A: $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 2 + 0.9^2\,0.1^3 \cdot 7 + 0.9^3\,0.1^2 \cdot 9 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.99$, where the multipliers 0, 2, 7, 9, 5, 1 are the numbers of successful 0-, 1-, 2-, 3-, 4-, and 5-subsets.
- Slide 6
- Motivation. Success probability of allocation B: $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 0 + 0.9^2\,0.1^3 \cdot 0 + 0.9^3\,0.1^2 \cdot 10 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.99144$, where the multipliers 0, 0, 0, 10, 5, 1 are the numbers of successful 0-, 1-, 2-, 3-, 4-, and 5-subsets.
- Slide 7
- Motivation. Success probability of allocation C: $0.9^0\,0.1^5 \cdot 0 + 0.9^1\,0.1^4 \cdot 0 + 0.9^2\,0.1^3 \cdot 6 + 0.9^3\,0.1^2 \cdot 10 + 0.9^4\,0.1^1 \cdot 5 + 0.9^5\,0.1^0 \cdot 1 = 0.9963$, where the multipliers 0, 0, 6, 10, 5, 1 are the numbers of successful 0-, 1-, 2-, 3-, 4-, and 5-subsets.
- Slide 8
- Motivation. Success probabilities: A = 0.99, B = 0.99144, C = 0.9963.
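The arithmetic on slides 5 to 7 can be checked with a short Python sketch. It assumes, as the sums imply, n = 5 storage nodes, each accessed independently with probability 0.9; the subset counts are copied from the slides, and `success_from_counts` is a hypothetical helper name. Exact fractions avoid floating-point rounding in the comparison.

```python
from fractions import Fraction

def success_from_counts(counts, p):
    """Sum over k of p^k (1-p)^(n-k) * counts[k], where counts[k] is the
    number of successful k-subsets of the n = len(counts) - 1 nodes."""
    n = len(counts) - 1
    return sum(p**k * (1 - p)**(n - k) * c for k, c in enumerate(counts))

p = Fraction(9, 10)  # each node is accessed independently with probability 0.9
counts = {
    "A": [0, 2, 7, 9, 5, 1],   # successful 0- to 5-subsets, from slide 5
    "B": [0, 0, 0, 10, 5, 1],  # from slide 6
    "C": [0, 0, 6, 10, 5, 1],  # from slide 7
}
for name, c in counts.items():
    print(name, float(success_from_counts(c, p)))
# prints: A 0.99, B 0.99144, C 0.9963 (one per line)
```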
- Slide 9
- [Figure: access model; surviving label: 0.1]
- Slide 10
- Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Outline: Storage Allocation; Access by the Data Collector; Objective.
- Slide 11
- Problem Description, Storage Allocation: Source s has a data object of unit size. It can use n storage nodes to store amounts $x_1, x_2, \ldots, x_n$ of data, but faces an aggregate storage budget T, i.e., $\sum_{i=1}^{n} x_i \le T$.
- Slide 12
- Problem Description, Access by the Data Collector: Data collector t attempts to recover the data object by accessing a subset r of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e., $\sum_{i \in r} x_i \ge 1$.
- Slide 13
- Problem Description, Objective: We seek the optimal allocation that maximizes the probability of successful recovery.
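Taken together, slides 11 to 13 describe the following optimization problem. This is a sketch in our own notation, with $\mathbf{r}$ denoting the random subset of nodes accessed by the data collector; the constraint and success condition are the ones stated on the slides:

$$\max_{x_1,\ldots,x_n \ge 0} \ \Pr\Big[\sum_{i \in \mathbf{r}} x_i \ge 1\Big] \quad \text{subject to} \quad \sum_{i=1}^{n} x_i \le T.$$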
- Slide 14
- Problem Description, Difficulty: The problem is nonconvex, and the space of possible symmetric and nonsymmetric allocations is large (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).
- Slide 15
- [1] Deterministic Allocation with Probabilistic Access: The data collector accesses each storage node independently with constant probability p.
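For a concrete allocation, the success probability under this access model can be computed exactly by enumerating all $2^n$ access patterns. The sketch below does that; the function name `success_probability` is hypothetical, and the brute-force enumeration is only practical for small n. For example, spreading T = 12/5 evenly over four of five nodes (0.6 each) with p = 0.9 gives 0.9963.

```python
from fractions import Fraction
from itertools import product

def success_probability(x, p):
    """Exact recovery probability for allocation x when each node is
    accessed independently with probability p (enumerates 2^n patterns)."""
    n = len(x)
    total = Fraction(0)
    for pattern in product((0, 1), repeat=n):
        k = sum(pattern)                      # number of nodes accessed
        prob = p**k * (1 - p)**(n - k)        # probability of this pattern
        collected = sum(xi for xi, used in zip(x, pattern) if used)
        if collected >= 1:                    # the data object has unit size
            total += prob
    return total

x = [Fraction(6, 10)] * 4 + [0]               # budget 12/5 over 4 of 5 nodes
print(success_probability(x, Fraction(9, 10)))  # prints: 9963/10000
```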
- Slide 16
- [1] Deterministic Allocation with Probabilistic Access: Symmetric allocations can be suboptimal. Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, the nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.
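Choosing a symmetric allocation amounts to choosing how many nodes m to spread the budget over. A minimal sketch, assuming each of the m used nodes stores T/m (so that at least ⌈m/T⌉ of them must be accessed) and using the hypothetical helper `symmetric_success`, sweeps m for the slide's parameters. Under these assumptions m = 4 gives the highest value, 0.9963; the values for m = 2, 5, and 4 coincide with the probabilities reported for A, B, and C. The nonsymmetric allocation from the slide is not reproduced because the transcript does not record it.

```python
import math
from fractions import Fraction

def symmetric_success(m, T, p):
    """Recovery probability when budget T is split evenly over m nodes,
    each accessed independently with probability p."""
    need = math.ceil(Fraction(m) / T)  # accessed nodes needed, each holds T/m
    return sum(math.comb(m, k) * p**k * (1 - p)**(m - k)
               for k in range(need, m + 1))

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
results = {m: symmetric_success(m, T, p) for m in range(1, n + 1)}
best_m = max(results, key=results.get)
for m, prob in results.items():
    print(m, float(prob))  # 0.9, 0.99, 0.972, 0.9963, 0.99144
print("best:", best_m)     # best: 4
```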
- Slide 17
- [2] Deterministic Allocation with Fixed Access: The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.
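Under fixed access, the success probability of an allocation is simply the fraction of r-subsets whose stored amounts sum to at least 1. A minimal sketch (the name `fixed_access_success` is hypothetical), illustrated with (n, r) = (6, 2) and a budget of T = 3: spreading it as 1/2 per node makes every pair succeed, while putting a full copy on three nodes leaves 3 of the 15 pairs empty.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def fixed_access_success(x, r):
    """Fraction of r-subsets of nodes whose stored data sums to at least 1."""
    n = len(x)
    good = sum(1 for subset in combinations(range(n), r)
               if sum(x[i] for i in subset) >= 1)
    return Fraction(good, comb(n, r))

print(fixed_access_success([Fraction(1, 2)] * 6, 2))  # prints: 1
print(fixed_access_success([1, 1, 1, 0, 0, 0], 2))    # prints: 4/5
```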
- Slide 18
- [2] Deterministic Allocation with Fixed Access: Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery.
- Slide 19
- [2] Deterministic Allocation with Fixed Access: Example: (n, r) = (6, 2). Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
- Slide 20
- [2] Deterministic Allocation with Fixed Access: Question: What is the optimal symmetric allocation? For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. An exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
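The exception can be checked by sweeping the number of nodes m. A minimal sketch, assuming each of the m used nodes stores T/m, so recovery needs at least ⌈m/T⌉ nonempty nodes among the r accessed; the count of such r-subsets is a hypergeometric sum. The helper name `symmetric_fixed_access` is hypothetical. For (15, 3, 4.6) it picks m = 9 with probability 300/455 = 60/91; for comparison, m = 4 (each node holds more than the whole object) gives 290/455, and m = 13 (the largest m for which three nonempty nodes still suffice) gives 286/455.

```python
import math
from fractions import Fraction

def symmetric_fixed_access(n, r, T, m):
    """Success probability when budget T is split evenly over m of n nodes
    and the collector reads a uniformly random r-subset."""
    need = math.ceil(Fraction(m) / T)  # nonempty nodes needed, each holds T/m
    good = sum(math.comb(m, k) * math.comb(n - m, r - k)
               for k in range(need, r + 1))  # empty if need > r
    return Fraction(good, math.comb(n, r))

n, r, T = 15, 3, Fraction(23, 5)  # T = 4.6
probs = {m: symmetric_fixed_access(n, r, T, m) for m in range(1, n + 1)}
best_m = max(probs, key=probs.get)
print(best_m, probs[best_m])  # prints: 9 60/91
```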
- Slide 21
- [2] Deterministic Allocation with Fixed Access: For probability-1 recovery, the problem reduces to a simple LP. Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_1 = x_2 = \cdots = x_n = 1/r$, i.e., it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
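One way to see where the minimum budget $n/r$ comes from (a standard averaging argument; the slide does not show its proof): summing the recovery constraint $\sum_{i \in S} x_i \ge 1$ over all $\binom{n}{r}$ r-subsets $S$ counts each $x_i$ exactly $\binom{n-1}{r-1}$ times, so

$$\binom{n-1}{r-1}\sum_{i=1}^{n} x_i \;=\; \sum_{|S|=r}\,\sum_{i \in S} x_i \;\ge\; \binom{n}{r} \quad\Longrightarrow\quad \sum_{i=1}^{n} x_i \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} \;=\; \frac{n}{r}.$$

The allocation $x_1 = \cdots = x_n = 1/r$ meets this bound with equality, since every r-subset then collects exactly $r \cdot (1/r) = 1$.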
- Slide 22
- [3] Symmetric Probabilistic Allocation with Fixed Access: Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation.
- Slide 23
- [3] Symmetric Probabilistic Allocation with Fixed Access: The probability of successful recovery can be written as $\Pr[\mathrm{Bin}(r, s/n) \ge \ell]$, where Bin(n, p) denotes the binomial random variable with n trials and success probability p. Reparameterizing in terms of the budget T gives the success probability P(r, T, ℓ). (Each nonempty node stores 1/ℓ amount of data.)
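Recovery needs at least ℓ nonempty nodes among the r accessed, since each stores 1/ℓ. Assuming node usage is independent of which nodes are accessed, that count is binomial, and its tail can be computed directly. A minimal sketch with the hypothetical name `prob_success`, where q stands for the per-node usage probability s/n:

```python
from fractions import Fraction
from math import comb

def prob_success(r, q, ell):
    """P[Bin(r, q) >= ell]: probability that at least ell of the r accessed
    nodes are nonempty, each nonempty independently with probability q."""
    return sum(comb(r, k) * q**k * (1 - q)**(r - k) for k in range(ell, r + 1))

print(prob_success(5, Fraction(1, 2), 5))  # prints: 1/32
print(prob_success(2, Fraction(1, 2), 1))  # prints: 3/4
```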
- Slide 24
- [3] Symmetric Probabilistic Allocation with Fixed Access: Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal, i.e., it is best to spread the budget maximally. (Each nonempty node stores 1/ℓ amount of data.)
- Slide 25
- [3] Symmetric Probabilistic Allocation with Fixed Access: As we increase the budget T, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them. [Figure: r = 5]
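The slides do not show the formula for P(r, T, ℓ), so the following is only a sketch under an assumed parameterization. With n nodes, each used with probability q and storing 1/ℓ, the expected total storage is nq/ℓ; spending the whole budget T gives q = ℓT/n. The sketch therefore works with a hypothetical per-node budget τ = T/n, which may differ from the normalization used in the talk, and the names `prob_success` and `best_ell` are hypothetical. Under these assumptions, with r = 5 it picks ℓ = 1 at τ = 1/100 and ℓ = r at τ = 1/5, consistent with the sharp change described above.

```python
from fractions import Fraction
from math import comb

def prob_success(r, q, ell):
    """P[Bin(r, q) >= ell]."""
    return sum(comb(r, k) * q**k * (1 - q)**(r - k) for k in range(ell, r + 1))

def best_ell(r, tau):
    """Best ell in 1..r when each node is used with probability q = tau * ell
    (tau = T/n is an assumed per-node budget), so expected storage equals T."""
    candidates = {ell: prob_success(r, tau * ell, ell)
                  for ell in range(1, r + 1) if tau * ell <= 1}
    return max(candidates, key=candidates.get)

r = 5
print(best_ell(r, Fraction(1, 100)))  # small budget per node: prints 1
print(best_ell(r, Fraction(1, 5)))    # large budget per node: prints 5
```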
- Slide 26
- [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r. [Figure: r = 5; each nonempty node stores 1/ℓ amount of data]
- Slide 27
- [3] Symmetric Probabilistic Allocation with Fixed Access: [Figure: r = 5; each nonempty node stores 1/ℓ amount of data; annotations "store less", "store more", "increasing budget per node"]
- Slide 28
- Summary & Future Work. [1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations. [2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme-point solutions are not necessarily optimal for symmetric allocations; is there always a symmetric optimal allocation? [3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime; is there a phase transition in the optimal allocation with increasing budget?