Introduction Benchmarks Challenges
Elevating R to Supercomputers
Drew Schmidt
National Institute for Computational Sciences, University of Tennessee, Knoxville
November 11, 2013
Drew Schmidt Elevating R to Supercomputers
The pbdR Core Team
Wei-Chen Chen (1)
George Ostrouchov (1, 2)
Pragneshkumar Patel (2)
Drew Schmidt (2)
Support
This work used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work also used resources of National Institute for Computational Sciences at the University of Tennessee, Knoxville, which is supported by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No. ARRA-NSF-OCI-0906324 for NICS-RDAV center. This work used resources of the Newton HPC Program at the University of Tennessee, Knoxville.
(1) Oak Ridge National Laboratory. Supported in part by the project “Visual Data Exploration and Analysis of Ultra-large Climate Data” funded by U.S. DOE Office of Science under Contract No. DE-AC05-00OR22725.
(2) University of Tennessee. Supported in part by the project “NICS Remote Data Analysis and Visualization Center” funded by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No. ARRA-NSF-OCI-0906324 for NICS-RDAV center.
Contents
1 Introduction
2 Benchmarks
3 Challenges
pbdR
Why R?
1 Because.
2 R community has growing data size problem.
3 HPC community has growing need for data analytics.
Elevating R to Supercomputers
1 Existing code.
2 Syntax.
3 Philosophy.
Programming with Big Data in R (pbdR)
Productivity, Portability, Performance
Free R packages (MPL, BSD, and GPL licensed).
Bridging the high performance of C with the high productivity of R.
Distributed data details implicitly managed.
Methods have syntax identical to R.
pbdR Packages
pbdMPI vs Rmpi: API
Reduction Operations
Rmpi
# int
mpi.allreduce(x, type = 1)
# double
mpi.allreduce(x, type = 2)
pbdMPI
allreduce(x)
Types in R
> is.integer(1)
[1] FALSE
> is.integer(2)
[1] FALSE
> is.integer(1:2)
[1] TRUE
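The type check above matters because a bare numeric literal such as 1 is a double in R, so a caller who must pass a type code by hand can easily pick the wrong one. As a rough illustration only, a wrapper could choose the code by inspecting the vector. The helper `pick_type_code` below is hypothetical: it is not part of Rmpi or pbdMPI, and it does not claim to show how pbdMPI selects types internally. The codes 1 and 2 are the ones used in the Rmpi calls on this slide.

```r
# Hypothetical helper (an assumption for illustration, not a library function):
# map an R vector to the type codes used above (1 = integer, 2 = double).
pick_type_code <- function(x) {
  if (is.integer(x)) {
    1L
  } else if (is.double(x)) {
    2L
  } else {
    stop("unsupported type: ", typeof(x))
  }
}

code_literal <- pick_type_code(1)    # 2: a bare literal is a double
code_suffix  <- pick_type_code(1L)   # 1: the L suffix makes an integer
code_range   <- pick_type_code(1:2)  # 1: the colon operator yields integers here
```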
pbdMPI vs Rmpi: Performance
Table: Runtimes (seconds) for a 10,000 × 10,000 allgather with Rmpi and pbdMPI.
Cores  Rmpi  pbdMPI  Speedup
32     24.6  6.7     3.67
64     25.2  7.1     3.55
128    22.3  7.2     3.10
256    22.4  7.1     3.15
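The Speedup column is the ratio of the Rmpi runtime to the pbdMPI runtime at the same core count. A minimal base R sketch that recomputes it from the runtimes on this slide (the variable names are illustrative):

```r
# Runtimes in seconds, copied from the slide.
cores  <- c(32, 64, 128, 256)
rmpi   <- c(24.6, 25.2, 22.3, 22.4)
pbdmpi <- c(6.7, 7.1, 7.2, 7.1)

# Speedup of pbdMPI over Rmpi, rounded to two decimals as on the slide.
speedup <- round(rmpi / pbdmpi, 2)
result  <- data.frame(cores, rmpi, pbdmpi, speedup)
print(result)
```

Both runtimes stay roughly flat as cores increase, so the ratio stays between about 3.1 and 3.7 across the four runs.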
pbdR Example Syntax
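x <- x[-1, 2:5]          # drop the first row, keep columns 2 to 5
x <- log(abs(x) + 1)     # elementwise transform
xtx <- t(x) %*% x        # cross-product matrix
ans <- svd(solve(xtx))   # SVD of the inverse
Look familiar?
The above runs on 1 core with R or 10,000 cores with pbdR.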
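For the single-core case, the four lines can be tried in plain R once `x` exists. The sketch below covers only that serial side: the input matrix, its size, and the seed are assumptions made for illustration. Running the same lines under pbdR would need `x` to be a distributed matrix instead, which is not shown here.

```r
# Assumed input: a small random 10 x 6 matrix (illustrative only).
set.seed(1)
x <- matrix(rnorm(60), nrow = 10, ncol = 6)

# The four lines from the slide, unchanged.
x <- x[-1, 2:5]
x <- log(abs(x) + 1)
xtx <- t(x) %*% x
ans <- svd(solve(xtx))

dims_x <- dim(x)          # 9 rows, 4 columns after subsetting
n_sv   <- length(ans$d)   # one singular value per column
```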
Shared and Distributed Memory Machines
Shared Memory
Direct access to read/change memory (one node)
Distributed
No direct access to read/change memory.
Shared and Distributed Memory Machines
Shared Memory Machines
Thousands of cores
Nautilus, University of Tennessee: 1024 cores, 4 TB RAM
Distributed Memory Machines
Hundreds of thousands of cores
Kraken, University of Tennessee: 112,896 cores, 147 TB RAM
Benchmarks
Non-Optimal Choices Throughout
1 Only libre software used (no MKL, ACML, etc.).
2 1 core = 1 MPI process.
3 No tuning for data distribution.
Benchmark Data
1 Random normal N(100, 10000).
2 Local problem size of ≈ 43.4 MiB.
3 Three sets: 500, 1000, and 2000 columns.
4 Several runs at different core sizes within each set.
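Holding the local problem size fixed means the total amount of data grows in proportion to the number of cores. A rough sketch of that arithmetic, assuming the ≈ 43.4 MiB local size applies to every core and using core counts that appear on the benchmark plots. Because 43.4 is itself a rounded value, the totals are approximate.

```r
# Assumption: every core holds about 43.4 MiB of the matrix.
local_mib <- 43.4
cores     <- c(504, 1008, 2016, 4032, 8064)

# Total problem size in GiB (1 GiB = 1024 MiB).
total_gib <- cores * local_mib / 1024
print(data.frame(cores, total_gib = round(total_gib, 2)))
```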
Covariance Code
# n, p, mean, and sd are set by the benchmark configuration
x <- ddmatrix("rnorm", nrow=n, ncol=p, mean=mean, sd=sd)

cov.x <- cov(x)
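For comparison, a serial base R analogue of the same computation, with a check of what `cov()` returns for a matrix. This is a sketch under assumptions: the sizes and seed are illustrative, and `sd = 100` assumes the second parameter of N(100, 10000) is a variance. It does not use the distributed `ddmatrix` type.

```r
set.seed(42)
n <- 1000   # rows (illustrative)
p <- 5      # columns (illustrative)

# Ordinary in-memory matrix in place of the distributed one.
x <- matrix(rnorm(n * p, mean = 100, sd = 100), nrow = n, ncol = p)
cov_x <- cov(x)

# The same result by hand: center each column, then form X'X / (n - 1).
centered   <- sweep(x, 2, colMeans(x))
cov_manual <- crossprod(centered) / (n - 1)

same <- isTRUE(all.equal(cov_x, cov_manual))
```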
cov()
[Figure: Calculating cov(x) With Fixed Local Size of ~43.4 MiB. Three panels by number of predictors (500, 1000, 2000); x-axis: Cores (1, 504, 1008, 2016, 4032, 8064); y-axis: Run Time (Seconds), ticks at 4, 8, 12, 16; point labels in each panel: 0.04, 21.34, 42.68, 85.35, 170.7, 341.41.]
Linear Model Code
x <- ddmatrix("rnorm", nrow=n, ncol=p, mean=mean, sd=sd)
beta_true <- ddmatrix("runif", nrow=p, ncol=1)

y <- x %*% beta_true

beta_est <- lm.fit(x=x, y=y)$coefficients
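The same steps can be tried serially in base R. Because `y` is built without noise, the fitted coefficients should match `beta_true` up to floating-point error. The sizes, seed, and `sd = 100` below are assumptions for illustration, and ordinary matrices stand in for the distributed ones.

```r
set.seed(7)
n <- 200   # rows (illustrative)
p <- 4     # predictors (illustrative)

x         <- matrix(rnorm(n * p, mean = 100, sd = 100), nrow = n, ncol = p)
beta_true <- matrix(runif(p), nrow = p, ncol = 1)

# Noise-free response, as in the benchmark code.
y <- x %*% beta_true

beta_est <- lm.fit(x = x, y = y)$coefficients

# Largest absolute difference between estimated and true coefficients.
max_err <- max(abs(as.vector(beta_est) - as.vector(beta_true)))
```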
lm.fit()
[Figure: Fitting y~x With Fixed Local Size of ~43.4 MiB. Three panels by number of predictors (500, 1000, 2000); x-axis: Cores (504, 1008, 2016, 4032, 8064, 24000); y-axis: Run Time (Seconds), ticks at 25, 50, 75, 100, 125; point labels in each panel: 21.34, 42.68, 85.35, 170.7, 341.41, 1016.09.]
But wait! There’s more...
Anything worth doing is worth overdoing.
— Mick Jagger
Data Generation
[Figure: x-axis: Cores (1, 504, 1008, 2016, 4032, 8064, 24000, 48000); y-axis: Data Generation (GiB per Second), ticks at 0, 200, 400, 600, 800.]
lm.fit()
[Figure: Fitting y~x With Fixed Local Size of ~43.4 MiB. One panel (500 predictors); x-axis: Cores (504, 1008, 2016, 4032, 8064, 24000, 48000); y-axis: Run Time (Seconds), ticks at 25, 30, 35; point labels: 21.34, 42.68, 85.35, 170.7, 341.41, 1016.09, 2032.18.]
Challenges
Perceptions.
“R? Isn’t that slow?” (HPC people)
“HPC? Isn’t that hard?” (R people)
Package loading.
Profiling.
Data distribution and performance.
Covariance Revisited: Distributed Data Parameter Calibration
[Figure: Construct Covariance Matrix for 100,000x1000 Matrix on 64 Processors. Three panels by process grid (16x4 Cores, 8x8 Cores, 4x16 Cores); x-axis: Blocking Factor (2x2, 4x4, 8x8); y-axis: Run time (seconds), ticks at 0, 200, 400, 600.]
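The two parameters varied here, the process grid shape and the blocking factor, together decide which process holds each matrix entry. As a hedged illustration, the sketch below uses the standard two-dimensional block-cyclic mapping from ScaLAPACK-style libraries. The function `owner` and its argument names are hypothetical, and the slide itself does not spell out the mapping, so treat this as an assumption about the layout and not a description of pbdR internals.

```r
# Hypothetical helper: zero-based (process row, process column) that owns
# global element (i, j) under a 2D block-cyclic layout.
#   mb, nb       : blocking factor (rows and columns per block)
#   nprow, npcol : shape of the process grid
owner <- function(i, j, mb, nb, nprow, npcol) {
  c(row = ((i - 1) %/% mb) %% nprow,
    col = ((j - 1) %/% nb) %% npcol)
}

# 2x2 blocks on a 16x4 grid (64 processes in total).
first_block <- owner(1, 1, mb = 2, nb = 2, nprow = 16, npcol = 4)   # row 0, col 0
next_block  <- owner(3, 1, mb = 2, nb = 2, nprow = 16, npcol = 4)   # row 1, col 0
wrapped     <- owner(33, 9, mb = 2, nb = 2, nprow = 16, npcol = 4)  # wraps back to row 0, col 0
```

Under this mapping, a small blocking factor spreads neighboring entries across many processes, while a larger one keeps bigger contiguous tiles on each process.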
Thanks for coming!
Questions?
http://r-pbd.org/