CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get...
Transcript of CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get...
![Page 1: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/1.jpg)
CSC2231: A Case for NOW
Stefan SaroiuDepartment of Computer Science
University of Toronto
http://www.cs.toronto.edu/~stefan/courses/csc2231/05au
![Page 2: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/2.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Administrivia• Research report:
– If you don’t have a group, see me after class– Choose topic <-- DUE on Monday at noon!
• Project:– Form group <-- DUE on Monday at noon!
• If you are debating whether to take the class– My advice: Don’t take it!!!
![Page 3: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/3.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Playfield• Supercomputers (Cray)
– Engineered and tuned for performance• Massively parallel processors (CM-5)
– Commodity processors, custom interconnect, integration, OS– Typically NUMA (each CPU has own memory, OS)
• Symmetric multiprocessors (SUN Enterprise, SGI machines)– Commodity processors, custom integration– Shared memory, commodity SMP-aware OS
• Clusters/NOW– Commodity nodes, OS, LAN– Custom applications, glue, network components
![Page 4: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/4.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Cray
![Page 5: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/5.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
FOR SALE: Cray Y-MPC90• In 2000, on eBay:
– There is a Cray Y-MP C90 supercomputer for sale oneBay. The current bid as this is written is US$44,500.69.The system features 16 processors, 4 GB of mainmemory, 4 GB of solid state storage, and 130 GB ofRAIDed hard drive space. The original price in 1991 was$10 million.
![Page 6: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/6.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
CM-5
![Page 7: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/7.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
SUN Enterprise 450
![Page 8: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/8.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Cluster
![Page 9: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/9.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Hardware food chain
supercomputer:supercomputer: O($10 million) O($10 million) O(100s) sold O(100s) sold highest highest perfperf./unit./unit worst price/worst price/perfperf..
MPP:MPP: O($1 million) O($1 million) O(1000s) sold O(1000s) sold medium medium perfperf./unit./unit medium price/ medium price/perfperf..
workstation/PC:workstation/PC: O($2000) O($2000) O(millions) sold O(millions) sold lowest lowest perfperf./unit./unit best price/ best price/perfperf..
![Page 10: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/10.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Explaining price/performance
hardware integration OS market lag applications
supercomputer custom custom custom 3-4 years customized
MPP commodity custom modified 1-2 years customized
workstation commodity commodity commodity none commodity
• SCs / MPPs lag workstations in the technology curve– more expensive custom hardware– time consuming integration– costly, time consuming software development
• Can one get performance of SC / MPP, with commodity PCs?
![Page 11: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/11.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
The Now Idea
• hey, let’s build a SC / MPP using of commodity PCs– best price / perf., avoids market lag of components
![Page 12: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/12.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
What makes this hard?
– fortunately, commodity LANs began to catch up ~1995• Myrinet 100 Mb/s switched ethernet Gigabit ethernet
•• high bandwidth, low latency, scalable network fabricshigh bandwidth, low latency, scalable network fabrics– crucial for fine-grained parallel apps, latency sensitive apps
•• need to scale and increase performance in OSneed to scale and increase performance in OS– NOW cannot afford to fork off of commodity OS development– insight: build “glue layer” for unix (GLUnix) (also what Google does)
•• even so, some differences are still visibleeven so, some differences are still visible– architectural: shared nothing, partial failure, heterogeneity
– software: commodity network stack, multiple OSs
![Page 13: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/13.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Timeline of SC/parallel apps
case for NOW
1994
simulation ofparticles, bombs, chemicals
timesharing installations[distributed OSs]
1940 --- 1993
•• ““setting a research agendasetting a research agenda”” 101: 101:– predict new apps for emerging technologies
• rarely get this right– or, project old apps onto new technology trends
![Page 14: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/14.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Projecting apps onto NOW• one installation to handle interactive and parallel jobs
– interactive: lots of idle resources, absorb for parallel jobs• harvest desktop PCs into NOW
– parallel: hogs, need to displace when interactive returns• PC is unit of scaling: all resources scale up with cluster
– exploit extra resources, idle resources• network RAM• cooperative caching
• network latency crucial for fine grained parallel apps– get the OS network stack out of the way– user-level networks (begat VIA, infiniband, DAFS, …)
![Page 15: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/15.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Were these good bets?• mostly, but some surprises along the way…• shared nothing is a mixed blessing
+ incremental scalability+ fault tolerance compared with SCS, MPPs– but, partial failure is a nasty mess
• the R in RAID: # failures scales with # nodes– group membership, replicated data, … [vaxclusters, parallel DB]– why programming parallel software is incredibly hard
•• system administration costs can scale up toosystem administration costs can scale up too– if not careful, cost proportional to # nodes
•• PC lifecycle creates significant heterogeneityPC lifecycle creates significant heterogeneity– HW balancing, network & CPU load balancing, capital depreciation
![Page 16: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/16.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Rest of timeline of parallel applications
Internet services
1995
case for NOW
1994
simulation ofparticles, bombs, chemicals
timesharing installations[distributed OSs]
1940 --- 1993
homelandsec., sensor
fusion
protein foldingcomp. bio.
1999 2003
•• A year after the paper was written, everything changedA year after the paper was written, everything changed• of course, this was basically impossible to predict
• but it turned out to be a perfect fit for clusters
![Page 17: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/17.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Implications of this (next week)• dedicated clusters
– forget about interactive jobs, distributed OSs– cluster as virtual server: three-tier model
• L4 switch, web server FE, middleware, DB back-end
• easier programming models– “embarassingly parallel”– “Internet consistency”, “reload consistency”
• different physical packaging– machine room: high density, low futz– real estate and energy became real costs (in silicon valley)
• manageability more important than performance– people are the most expensive resource
![Page 18: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/18.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Paper’s contributions• non-contributions
– inventing clusters, distributed operating systems– inventing (or predicting!) scalable Internet services
• follow-on work [Brewer et al.] did go after this• quasi-contributions
– practical user-level networking for clusters– notions of cooperative caching in VM, FS, …
• real contributions– recognizing that commodity LANs are cluster enablers– putting another nail in the coffin for MPPs– predicting enormous rise in popularity of clusters
• got simulation app right, initially missed Internet apps
![Page 19: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/19.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion…
![Page 20: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/20.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• Is xFS the right model?
![Page 21: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/21.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• is xFS the right model?
– some traction, but not all the reasons are in the paper
• similarity between xFS and P2P– both shun any centralized dependency– P2P: legal xFS: practical issue for building-wide FS
• don’t always believe in centralized scaling bottlenecks– don’t need to be fully serverless to get benefits
• parallelize dominant cost to scale performance– disk I/Os, disk capacity
• centralize control structure, make pairwise redundant
• today’s version of xFS: cluster file systems, SANs
![Page 22: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/22.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• Will network RAM work in practice?
![Page 23: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/23.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• Will network RAM work in practice?
– yes, with strong caveats– still much worse latency than real RAM
• 10 ns vs. 50 microseconds– partial failures nail you
• back to MPP failure mode– requires free memory on some nodes
• multiplexing argument: what happens as utilization grows?
![Page 24: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/24.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• “security to be achilles’ heel of NOWs” – true?
![Page 25: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/25.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• “security to be achilles’ heel of NOWs” – true?
– not true anymore for closed machine room, single app cluster• but, clusters-on-demand, cluster reserves, emulab• safely time and space division multiplex cluster
– clusters are a nice multiplicative constant for attacks• break into one, you can break into all – zombie army
![Page 26: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/26.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• what about blades?
![Page 27: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/27.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• what about blades?
– unit of field replacement is smaller than workstation+ hot-swap failed component rather than entire node+ cabinetry is better designed (no visible wires, headless)– pay packaging cost per disk, CPU, NIC rather than per node– currently, components lag PCs by about a year
– higher density+ less real estate cost– power density and cooling requirements beyond data centers
![Page 28: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/28.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• limits of scale of NOWs/clusters?
![Page 29: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/29.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• limits of scale of NOWs/clusters?
– Google: 20,000 PCs (old figure)– Bomb supercomputers: 10,000 PCs
• all of the benefits at O(100)
![Page 30: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/30.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• Did cooperative caching have legs?
![Page 31: CSC2231: A Case for NOW€¦ · –costly, time consuming software development • Can one get performance of SC / MPP, with commodity PCs? CSC2231: Internet Systems Stefan Saroiu](https://reader036.fdocuments.in/reader036/viewer/2022070113/605c1887cac781686769554f/html5/thumbnails/31.jpg)
CSC2231: Internet Systems Stefan Saroiu 2005
Discussion• Did cooperative caching have legs?
– useful if:• cluster is multiplexed, we’re in a load valley (idle resources), and
one program can scale up to absorb the idle• multiple consumers that share data
– implicit or explicit sharing– by partitioning shared piece, each node only manages its
own non-shared, plus a small fraction of shared– good idea if non-shared working set smaller than cache size
– this idea came up later in the Internet services world• LARD: partition working set instead of replicating it