Computation and Minimax Risk
• The most challenging topic…
• Some recent progress:
– tradeoffs between time and accuracy via convex relaxations (Chandrasekaran & Jordan, 2013)
– constraints on computation via optimization oracles (Duchi, McMahan & Jordan, 2014)
– parallelization via optimistic concurrency control (Pan, et al., 2014)
Concurrency Control for Distributed Machine Learning
Michael I. Jordan
University of California, Berkeley
(with Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick and Joseph Bradley)
Distributed Computing Meets Large-Scale Statistical Inference
• In many areas of statistics, parallel/distributed approaches are increasingly essential (e.g., to provide time/sample tradeoffs)
• Many methods, either optimization-based or integration-based, involve exploring models having variable structure
• Leading to a core problem: how to ensure that statistical consistency and coherence are maintained when multiple processors are making structural changes to a model?
Serial Inference
[Figure: Data, Model State]

Coordination Free Parallel Inference
[Figure: Data, Model State, Processor 1, Processor 2]
Keep Calm and Carry On.

[Figure: plot with axes Accuracy (Low to High) and Scalability (Low to High), locating three approaches: Serial, Coordination-free, and Concurrency Control]
Concurrency Control
• Database mechanisms
– Guarantee correctness
– Maximize concurrency
• Mutual exclusion
• Optimistic CC

Mutual Exclusion Through Locking
[Figure: Data, Model State, Processor 1, Processor 2]
Introducing locking (scheduling) protocols to identify potential conflicts.
Enforce serialization of computation that could conflict.
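
As a rough illustration of mutual exclusion through locking, the sketch below guards a shared model state with a single lock so that conflicting updates are serialized. The class `LockedModelState`, the helper names, and the use of a simple counter as the "model state" are all hypothetical choices for this sketch, not taken from the slides.

```python
import threading


class LockedModelState:
    """Hypothetical shared model state guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def add(self, value):
        # Only one processor at a time may run this read-modify-write.
        with self._lock:
            self.count += value


def process(state, data_chunk):
    # Each processor folds its share of the data into the shared state.
    for value in data_chunk:
        state.add(value)


def run_locked(chunks):
    # One thread per data chunk stands in for one processor.
    state = LockedModelState()
    workers = [threading.Thread(target=process, args=(state, chunk))
               for chunk in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state.count
```

The result is always the same as a serial run, but every update pays for the lock whether or not a conflict would actually have occurred.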

Optimistic Concurrency Control
[Figure: Data, Model State, Processor 1, Processor 2]
• Allow computation to proceed without blocking.
• Validate potential conflicts.
– Valid outcome (✔)
– Invalid outcome (✗): take a compensating action
• Compensating actions:
– Amend the value
– Rollback and redo
Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

Optimistic Concurrency Control
• Non-blocking computation: provides concurrency
• Validation (identify errors): provides accuracy; requirement: fast
• Resolution (correct errors, e.g., rollback and redo): provides accuracy; requirement: infrequent
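
The three ingredients (non-blocking computation, validation, resolution) can be sketched generically as follows. This is a minimal illustration under stated assumptions: the `VersionedState` class, the version-counter validation, and the retry loop as the "rollback and redo" resolution are hypothetical choices for this sketch. A lock is held only for the short read and validate-and-commit steps, never while the proposal is being computed.

```python
import threading


class VersionedState:
    """Hypothetical shared state with a version counter used for validation."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # held only briefly, never during computation
        self.value = value
        self.version = 0

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, new_value, seen_version):
        # Validation: accept only if nobody committed since our read.
        with self._lock:
            if self.version != seen_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def optimistic_update(state, fn):
    """Apply fn optimistically; return how many times the work was redone."""
    retries = 0
    while True:
        value, version = state.read()
        proposal = fn(value)            # computation proceeds without blocking
        if state.try_commit(proposal, version):
            return retries
        retries += 1                    # resolution: roll back and redo


def run_optimistic(n_workers, n_updates):
    state = VersionedState(0)

    def work():
        for _ in range(n_updates):
            optimistic_update(state, lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.value
```

The scheme pays off only when validation is cheap and failed validations are rare, which matches the two requirements listed above.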

Concurrency Control
Coordination Free: Provably fast, and correct under key assumptions.
Concurrency Control: Provably correct, and fast under key assumptions.
Systems Ideas to Improve Efficiency

Examples
• Clustering: DP-means
• Submodularity: Double Greedy
• Bayesian Nonparametrics: Chinese Restaurant Process
[Figures: keywords (A to H) and queries (1 to 8) with costs and values; parameters θ1 to θ6 and ϕ1 to ϕ4]
Clustering with DP-means

Bayesian Nonparametrics Meets Optimization
• A methodology whereby optimization functionals arise when “small-variance asymptotics” are applied to Bayesian models based on combinatorial stochastic process priors
• Inspiration: the venerable, scalable K-means algorithm can be derived as the limit of an Expectation-Maximization algorithm for fitting a mixture model
• We do something similar in spirit, taking limits of various Bayesian nonparametric models:
– Dirichlet process mixtures
– hierarchical Dirichlet process mixtures
– beta processes and hierarchical beta processes
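
The slides do not write out the limits themselves. As a hedged illustration, the first display below is the standard small-variance limit of the EM responsibilities for a Gaussian mixture with shared covariance σ²I (assuming positive mixing weights π_c and a unique nearest center), and the second is the DP-means objective as commonly stated following Kulis and Jordan (2012). The symbols γ, π, μ, ℓ and k are notation chosen for this sketch.

```latex
% Soft EM assignments harden into K-means assignments as the variance vanishes.
\[
\gamma_{ic}
  = \frac{\pi_c \exp\!\left(-\lVert x_i - \mu_c \rVert^2 / 2\sigma^2\right)}
         {\sum_{j} \pi_j \exp\!\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
  \;\xrightarrow{\;\sigma^2 \to 0\;}\;
  \begin{cases}
    1 & \text{if } c = \arg\min_j \lVert x_i - \mu_j \rVert^2, \\
    0 & \text{otherwise.}
  \end{cases}
\]

% The analogous limit for a Dirichlet process mixture gives a K-means-like
% objective with a penalty \lambda on the number of clusters k.
\[
\min_{k,\ \ell_1,\dots,\ell_k}\;
  \sum_{c=1}^{k} \sum_{x \in \ell_c} \lVert x - \mu_c \rVert^2 + \lambda k,
\qquad
\mu_c = \frac{1}{|\ell_c|} \sum_{x \in \ell_c} x .
\]
```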

DP-Means Algorithm [Kulis and Jordan, ICML’12]
• Computing cluster membership [Figure labeled λ]
• Updating cluster centers
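
A minimal serial sketch of the two steps named here, following the commonly stated form of DP-means (Kulis and Jordan, 2012). The function names, the comparison of squared Euclidean distance against λ, the initialization at the global mean, and the dropping of empty clusters are assumptions of this sketch rather than details given in the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dp_means(points, lam, max_iters=100):
    centers = [mean(points)]          # assumption: start from one global cluster
    assign = [0] * len(points)
    for _ in range(max_iters):
        changed = False
        # Step 1: compute cluster membership, opening a new cluster
        # whenever a point is farther than lam from every center.
        for i, x in enumerate(points):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(tuple(x))
                k = len(centers) - 1
            if assign[i] != k:
                assign[i] = k
                changed = True
        # Step 2: update each center to the mean of its members,
        # dropping clusters that ended up empty and relabeling.
        members = {}
        for i, k in enumerate(assign):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        relabel = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        assign = [relabel[k] for k in assign]
        if not changed:
            break
    return centers, assign
```

For example, `dp_means([(0, 0), (0, 1), (10, 10), (10, 11)], lam=4)` separates the two well-spaced pairs into two clusters without the number of clusters being specified in advance.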

DP-Means Parallel Execution
Computing cluster membership in parallel:
[Figure: CPU 1, CPU 2; distance < λ]
Cannot introduce overlapping clusters in parallel

Optimistic Concurrency Control for Parallel DP-Means
• Optimistic assumption: no new cluster created nearby
• Validation: verify that new clusters don’t overlap
• Resolution: assign new cluster center to existing cluster
[Figure: CPU 1, CPU 2; distance < λ]
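
The optimistic assumption, validation, and resolution steps can be sketched for a single membership pass as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the use of a thread pool to stand in for CPUs, the single epoch over all data, and the squared-distance threshold are choices made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, centers):
    """Return (index, squared distance) of the nearest center, or (None, inf)."""
    best, best_d = None, float("inf")
    for k, c in enumerate(centers):
        d = sq_dist(x, c)
        if d < best_d:
            best, best_d = k, d
    return best, best_d


def propose(block, centers, lam):
    """Worker phase: assign to known centers, or optimistically propose new ones."""
    assigned, proposals = {}, []
    for i, x in block:
        k, d = nearest(x, centers)
        if d > lam:
            proposals.append((i, x))   # optimistic: assume no new cluster nearby
        else:
            assigned[i] = k
    return assigned, proposals


def occ_membership_pass(points, centers, lam, n_workers=2):
    snapshot = tuple(centers)          # centers visible to every worker
    indexed = list(enumerate(points))
    blocks = [indexed[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda b: propose(b, snapshot, lam), blocks))

    centers = list(snapshot)
    n_old = len(centers)
    assign = [None] * len(points)
    rejected = 0
    for assigned, _ in results:
        for i, k in assigned.items():
            assign[i] = k
    # Serial validation: check each proposal against new centers accepted so far.
    for _, proposals in results:
        for i, x in proposals:
            k, d = nearest(x, centers[n_old:])
            if d > lam:
                centers.append(x)              # valid: accept the new cluster
                assign[i] = len(centers) - 1
            else:
                assign[i] = n_old + k          # resolution: join existing new cluster
                rejected += 1
    return centers, assign, rejected
```

Only the proposals pass through the serial validator, so the serial work in this sketch grows with the number of proposed clusters rather than with the number of points.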

Concurrency Control for DP-means
Correctness
• Theorem: OCC DP-means is serializable, i.e. equivalent to some sequential execution.
• Corollary: OCC DP-means preserves theoretical properties of DP-means.
Concurrency
• Theorem: Assuming well-spaced clusters, expected overhead of OCC DP-means, in terms of number of rejected proposals, does not depend on size of data set.

Empirical Validation Failure Rate
[Plot, λ-separable clusters: OCC overhead (points failing validation) vs. dataset size, for 2, 4, 8, 16, and 32 processors]
Independence of dataset size
[Plot, overlapping clusters: OCC overhead (points failing validation) vs. dataset size, for 2, 4, 8, 16, and 32 processors]
Weak dependence on dataset size

Distributed Evaluation Amazon EC2
[Plot: runtime in seconds per complete pass over data (axis 0 to 3500) vs. number of machines (axis 1 to 8); series: OCC DP-means runtime, projected linear scaling]
2x #machines ≈ ½x runtime
~140 million data points; 1, 2, 4, 8 machines

Summary
| Approach | Accuracy | Scalability |
| --- | --- | --- |
| Sequential | Appealing theoretical properties | Little |
| Coordination-free | Approximate, under assumptions | Always fast |
| Concurrency Control | Always correct | Good, under assumptions |
• Coordination-free approach guarantees speed, and analysis focuses on showing accuracy under assumptions.
• Our approach guarantees accuracy, and analysis focuses on showing speed under assumptions.
Conclusions
• Many conceptual and mathematical challenges arising in taking seriously the problem of “Big Data”
• Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations – thus reshaping both disciplines