Computing Outside The Box September 2009
Transcript of Computing Outside The Box September 2009
![Page 1: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/1.jpg)
1
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
![Page 2: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/2.jpg)
3
“I’ve been doing cloud computing since before it
was called grid.”
![Page 3: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/3.jpg)
4
1890
![Page 4: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/4.jpg)
5
1953
![Page 5: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/5.jpg)
6
“Computation may someday be organized as a public utility …
The computing utility could become the basis for a new and important
industry.”
John McCarthy
(1961)
![Page 6: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/6.jpg)
7
![Page 7: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/7.jpg)
8
[Figure: connectivity (on log scale) over time. Label: Science]
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)
Grid
![Page 8: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/8.jpg)
9
Application
Infrastructure
![Page 9: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/9.jpg)
10
Layered grid architecture
Application
Fabric: “Controlling things locally”: access to, and control of, resources
Connectivity: “Talking to things”: communication (Internet protocols) and security
Resource: “Sharing single resources”: negotiating access, controlling use
Collective: “Managing multiple resources”: ubiquitous infrastructure services
User: “Specialized services”: user- or application-specific distributed services
Internet Protocol Architecture: Application, Transport, Internet, Link
(“The Anatomy of the Grid,” 2001)
![Page 10: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/10.jpg)
11
Application
Infrastructure
Service oriented infrastructure
![Page 11: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/11.jpg)
12
![Page 12: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/12.jpg)
13
www.opensciencegrid.org
![Page 13: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/13.jpg)
14
www.opensciencegrid.org
![Page 14: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/14.jpg)
15
Application
Infrastructure
Service oriented infrastructure
![Page 15: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/15.jpg)
16
Application
Service oriented applications
Infrastructure
Service oriented infrastructure
![Page 16: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/16.jpg)
17
![Page 17: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/17.jpg)
18
As of Oct 19, 2008:
122 participants
105 services (70 data, 35 analytical)
![Page 18: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/18.jpg)
19
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
Workflow in/output
caGrid services
“Shim” services
others
Wei Tan
![Page 19: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/19.jpg)
20
Infrastructure
Applications
![Page 20: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/20.jpg)
21
Energy
Progress of adoption
![Page 21: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/21.jpg)
22
Energy
Progress of adoption
$$ $$$$
![Page 22: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/22.jpg)
23
Energy
Progress of adoption
$$ $$$$
![Page 23: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/23.jpg)
24
[Figure: connectivity (on log scale) over time. Labels: Science, Enterprise]
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)
Grid Cloud
![Page 24: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/24.jpg)
25
![Page 25: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/25.jpg)
26
![Page 26: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/26.jpg)
27
US$3
![Page 27: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/27.jpg)
28
Credit: Werner Vogels
![Page 28: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/28.jpg)
29
Credit: Werner Vogels
![Page 29: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/29.jpg)
30
Animoto EC2 image usage
[Figure: EC2 image usage from Day 1 to Day 8; axis range 0 to 4000]
![Page 30: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/30.jpg)
31
Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways
Platform
Infrastructure
![Page 31: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/31.jpg)
32
Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways
Platform
Infrastructure: Amazon, GoGrid, Sun, Microsoft, …
![Page 32: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/32.jpg)
33
Software: Salesforce.com, Google, Animoto, …, …, caBIG, TeraGrid gateways
Platform: Google, Microsoft, Amazon, …
Infrastructure: Amazon, GoGrid, Microsoft, Flexiscale, …
![Page 33: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/33.jpg)
34
![Page 34: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/34.jpg)
35
Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP ’07)
Simple query model
Weak consistency, no isolation
Stringent SLAs (e.g., 300 ms for 99.9% of requests; peak 500 requests/sec)
Incremental scalability
Symmetry
Decentralization
Heterogeneity
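The SLA bullet on the Dynamo slide (300 ms for 99.9% of requests) is a statement about the tail of the latency distribution, not the average. A minimal sketch of how such a check might look, assuming latencies are available as a plain list of milliseconds; the function name `meets_sla` and the sample data are hypothetical and not taken from the Dynamo paper:

```python
def meets_sla(latencies_ms, bound_ms=300.0, target_per_mille=999):
    """Return True if at least target_per_mille/1000 of requests finished within bound_ms.

    The target is given per mille (999 = 99.9%) so the comparison stays in
    integer arithmetic and avoids floating-point rounding at the boundary.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for x in latencies_ms if x <= bound_ms)
    return within * 1000 >= target_per_mille * len(latencies_ms)


# Made-up samples: mostly fast requests plus a few slow outliers.
ok_sample = [20.0] * 9990 + [450.0] * 10    # exactly 99.9% within 300 ms
bad_sample = [20.0] * 9989 + [450.0] * 11   # one slow request too many

print(meets_sla(ok_sample))
print(meets_sla(bad_sample))
```

Both samples have nearly the same mean latency, which is why an SLA stated at the 99.9th percentile is stricter than one stated on the average.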
![Page 35: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/35.jpg)
Technologies used in Dynamo

| Problem | Technique | Advantage |
|---|---|---|
| Partitioning | Consistent hashing | Incremental scalability |
| High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates |
| Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available |
| Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background |
| Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information |
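To illustrate the first row of the Dynamo technologies list (consistent hashing for partitioning, giving incremental scalability), here is a toy hash ring. It is a sketch under stated assumptions, not Dynamo's implementation: the class `HashRing`, the node names, the choice of MD5, and the 64 virtual nodes per server are all illustrative.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the design is visible in `moved`: adding a node only takes keys away from existing nodes and hands them to the new one, so capacity can grow one server at a time without reshuffling the whole key space.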
![Page 36: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/36.jpg)
Using IaaS for elastic capacity
Nimbus
Amazon EC2
STAR nodes
Local cluster
STAR nodes
Kate Keahey et al.
![Page 37: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/37.jpg)
38
Application
Service oriented applications
Infrastructure
Service oriented infrastructure
![Page 38: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/38.jpg)
39
The Globus-based LIGO data grid
LIGO Gravitational Wave Observatory
Replicating >1 Terabyte/day to 8 sites
>100 million replicas so far
MTBF = 1 month
[Map labels: Birmingham, Cardiff, AEI/Golm]
![Page 39: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/39.jpg)
40
Data replication service
Pull “missing” files to a storage system
Input: list of required files
Data Replication: Data Replication Service
Data Location: Local Replica Catalog, Replica Location Index
Data Movement: Reliable File Transfer Service, GridFTP
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
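The "pull missing files" idea on the data replication slide can be sketched in a few lines. This is a simplified illustration, not the Globus service: the local replica catalog is modeled as a set, the replica location index as a dictionary from file name to site list, and `plan_replication`, the file names, and the site names are all hypothetical.

```python
def plan_replication(required, local_catalog, location_index):
    """Decide which required files must be pulled and from where.

    required:        iterable of file names the local site should hold
    local_catalog:   set of file names already stored locally
    location_index:  dict mapping file name -> list of remote sites holding a copy
    Returns (transfers, unavailable): a list of (file, source_site) pairs and a
    list of files with no known replica.
    """
    transfers, unavailable = [], []
    for name in required:
        if name in local_catalog:
            continue  # already present locally, nothing to pull
        sources = location_index.get(name, [])
        if sources:
            transfers.append((name, sources[0]))  # naive choice: first listed site
        else:
            unavailable.append(name)
    return transfers, unavailable


required = ["a.dat", "b.dat", "c.dat", "d.dat"]
local_catalog = {"a.dat"}
location_index = {
    "a.dat": ["site-1"],
    "b.dat": ["site-2", "site-3"],
    "c.dat": ["site-3"],
}
transfers, unavailable = plan_replication(required, local_catalog, location_index)
print(transfers)
print(unavailable)
```

In a real deployment the transfer list would presumably be handed to a reliable transfer layer and each completed copy registered in the local catalog; those steps are left out here.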
![Page 40: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/40.jpg)
41
Specializing further …
User
“Provide access to data D at S1, S2, S3 with performance P”
Service Provider
Replica catalog, user-level multicast, …
“Provide storage with performance P1, network with P2, …”
Resource Provider
[Diagram: data D replicated at S1, S2, S3]
![Page 41: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/41.jpg)
42
Using IaaS in biomedical informatics
[Diagram labels: My servers, Chicago, handle.net, BIRN, IaaS provider]
![Page 42: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/42.jpg)
43
Clouds and supercomputers: Conventional wisdom?

| | Loosely coupled applications | Tightly coupled applications |
|---|---|---|
| Clouds/clusters | ✔ | Too slow |
| Supercomputers | Too expensive | ✔ |
![Page 43: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/43.jpg)
44
Ed Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” ;login:, October 2008.
![Page 44: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/44.jpg)
45
Ed Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” ;login:, October 2008.
![Page 45: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/45.jpg)
46
Ed Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” ;login:, October 2008.
![Page 46: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/46.jpg)
47
Ed Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” ;login:, October 2008.
![Page 47: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/47.jpg)
48
D. Nurmi, J. Brevik, R. Wolski: QBETS: queue bounds estimation from time series. SIGMETRICS 2007: 379-380
![Page 48: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/48.jpg)
49
D. Nurmi, J. Brevik, R. Wolski: QBETS: queue bounds estimation from time series. SIGMETRICS 2007: 379-380
![Page 49: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/49.jpg)
50
![Page 50: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/50.jpg)
51
Clouds and supercomputers: Conventional wisdom?

| | Loosely coupled applications | Tightly coupled applications |
|---|---|---|
| Clouds/clusters | ✔ | Good for rapid response |
| Supercomputers | Too expensive | ✔ |
![Page 51: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/51.jpg)
52
Loosely coupled problems
Ensemble runs to quantify climate model uncertainty
Identify potential drug targets by screening a database of ligand structures against target proteins
Study economic model sensitivity to parameters
Analyze turbulence dataset from many perspectives
Perform numerical optimization to determine optimal resource assignment in energy problems
Mine collection of data from advanced light sources
Construct databases of computed properties of chemical compounds
Analyze data from the Large Hadron Collider
Analyze log data from 100,000-node parallel computations
![Page 52: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/52.jpg)
53
Many many tasks: Identifying potential drug targets
2M+ ligands x protein target(s)
(Mike Kubal, Benoit Roux, and others)
![Page 53: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/53.jpg)
54
[Workflow diagram, from start to report, for one protein target]
Inputs: PDB protein descriptions, 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB)
Manually prep DOCK6 rec file: DOCK6 Receptor (1 per protein: defines pocket to bind to)
Manually prep FRED rec file: FRED Receptor (1 per protein: defines pocket to bind to)
DOCK6 / FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs
Select best ~5K
Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
BuildNABScript: NAB script parameters (defines flexible residues, #MD steps), NAB Script Template, NAB Script
Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs
Select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
Intermediate data: ligands, complexes
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
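The per-stage estimates on the docking workflow slide are simple products of task count, time per task, and CPUs per task. The sketch below redoes that arithmetic from the slide's own approximate inputs; the stage tuples and the helper `cpu_hours` are illustrative, and the year length of 8,760 hours is an assumption. The slide's totals are rounded, so the computed values are expected to agree only in order of magnitude.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def cpu_hours(tasks, seconds_per_task, cpus_per_task):
    """CPU-hours consumed by a stage: tasks x time per task x CPUs per task."""
    return tasks * seconds_per_task * cpus_per_task / 3600


# (stage name, tasks, seconds per task, CPUs per task), taken from the slide's "~" figures
stages = [
    ("DOCK6/FRED", 4_000_000, 60, 1),
    ("Amber", 10_000, 20 * 60, 1),
    ("GCMC", 500, 10 * 3600, 100),
]

per_stage = {name: cpu_hours(n, s, c) for name, n, s, c in stages}
total_hours = sum(per_stage.values())
total_years = total_hours / HOURS_PER_YEAR

for name, hours in per_stage.items():
    print(f"{name}: {hours:,.0f} cpu-hrs ({hours / total_hours:.1%} of total)")
print(f"total: {total_hours:,.0f} cpu-hrs, about {total_years:.0f} cpu-years")
```

The breakdown makes the shape of the funnel clear: the first stage accounts for almost all of the tasks, while the final stage, applied to only a few hundred candidates, accounts for most of the CPU time.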
![Page 54: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/54.jpg)
55
![Page 55: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/55.jpg)
56
DOCK on BG/P: ~1M tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%; Overall: 78.3%
• GPFS
• 1 script (~5KB)
• 2 file read (~10KB)
• 1 file write (~10KB)
• RAM (cached from GPFS on first task per node)
• 1 binary (~7MB)
• Static input data (~45MB)
Ioan Raicu, Zhao Zhang, Mike Wilde
[Figure x-axis: Time (secs)]
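The headline numbers for the DOCK run on BG/P can be cross-checked against each other. The sketch below assumes that "overall utilization" means total compute time divided by the core-seconds available over the whole run (cores x elapsed time) and that a CPU year is 365 days; the slide states neither definition, so treat both as assumptions.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

cores = 118_784
tasks = 934_803
elapsed_s = 7_257
compute_cpu_years = 21.43

# Core-seconds the machine could have delivered during the run.
available_core_s = cores * elapsed_s
# Core-seconds actually spent computing, from the reported compute time.
used_core_s = compute_cpu_years * SECONDS_PER_YEAR

overall_utilization = used_core_s / available_core_s
tasks_per_second = tasks / elapsed_s

print(f"overall utilization: {overall_utilization:.1%}")
print(f"average dispatch rate: {tasks_per_second:.0f} tasks/sec")
```

Under these assumptions the computed utilization lands within a fraction of a percentage point of the overall figure on the slide. The gap between the sustained and overall figures is consistent with cores sitting idle during ramp-up and while the last long tasks finish, although the slide does not break that down.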
![Page 56: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/56.jpg)
57
Managing 160,000 cores
Slower shared storage
High-speed local “disk”
Falkon
![Page 57: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/57.jpg)
58
Scaling POSIX to petascale
Global file system
Staging: Chirp (multicast), MosaStore (striping)
Intermediate: CN-striped intermediate file system, over torus and tree interconnects
Local: LFS on each compute node (local datasets)
Large dataset
![Page 58: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/58.jpg)
59
Efficiency for 4 second tasks and varying data size (1KB to 1MB) for CIO and GPFS up to 32K processors
![Page 59: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/59.jpg)
60
“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node
Ioan Raicu
![Page 60: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/60.jpg)
61
“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node
Ioan Raicu
![Page 61: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/61.jpg)
62
Same scenario, but with dynamic resource provisioning
![Page 62: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/62.jpg)
63
Same scenario, but with dynamic resource provisioning
![Page 63: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/63.jpg)
64
Data diffusion sine-wave workload: Summary
GPFS: 5.70 hrs, ~8 Gb/s, 1138 CPU hrs
DD+SRP: 1.80 hrs, ~25 Gb/s, 361 CPU hrs
DD+DRP: 1.86 hrs, ~24 Gb/s, 253 CPU hrs
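The three summary rows are easier to compare as ratios against the GPFS baseline. A small sketch using only the numbers on the slide; the dictionary layout and the helper `relative_to_baseline` are illustrative.

```python
# configuration -> (wall-clock hours, approx. throughput in Gb/s, CPU hours)
runs = {
    "GPFS": (5.70, 8, 1138),
    "DD+SRP": (1.80, 25, 361),
    "DD+DRP": (1.86, 24, 253),
}


def relative_to_baseline(runs, baseline="GPFS"):
    """Return {config: (time speedup, CPU-hour reduction factor)} versus the baseline."""
    base_hours, _, base_cpu = runs[baseline]
    return {
        name: (base_hours / hours, base_cpu / cpu)
        for name, (hours, _, cpu) in runs.items()
    }


ratios = relative_to_baseline(runs)
for name, (speedup, cpu_factor) in ratios.items():
    print(f"{name}: {speedup:.2f}x faster, {cpu_factor:.2f}x fewer CPU hours")
```

Read this way, the two data diffusion configurations finish in roughly a third of the baseline time, and the DD+DRP row gives up a few percent of wall-clock time relative to DD+SRP in exchange for a noticeably lower CPU-hour bill.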
![Page 64: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/64.jpg)
65
Clouds and supercomputers: Conventional wisdom?

| | Loosely coupled applications | Tightly coupled applications |
|---|---|---|
| Clouds/clusters | ✔ | Good for rapid response |
| Supercomputers | Excellent | ✔ |
![Page 65: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/65.jpg)
66
“The computer revolution hasn’t happened yet.”
Alan Kay, 1997
![Page 66: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/66.jpg)
67
[Figure: connectivity (on log scale) over time. Labels: Science, Enterprise, Consumer]
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)
Grid Cloud ????
![Page 67: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/67.jpg)
68
Energy Internet
The Shape of Grids to Come?
![Page 68: Computing Outside The Box September 2009](https://reader035.fdocuments.in/reader035/viewer/2022062703/554ea038b4c9055f7b8b468d/html5/thumbnails/68.jpg)
Computation Institute
www.ci.uchicago.edu
Thank you!