CSTB_SuperComputing_Study_Group.ppt

Information Centric Super Computing. Jim Gray, Microsoft Research, [email protected]. Talk at http://research.microsoft.com/~gray/talks. 20 May 2003. Presentation to the Committee on the Future of Supercomputing of the National Research Council's Computer Science and Telecommunications Board.


Transcript of CSTB_SuperComputing_Study_Group.ppt

Page 1: CSTB_SuperComputing_Study_Group.ppt

1

Information Centric Super Computing

Jim Gray
Microsoft Research

[email protected]
Talk at http://research.microsoft.com/~gray/talks

20 May 2003
Presentation to

Committee on the Future of Supercomputing of the National Research Council's

Computer Science and Telecommunications Board

Page 2: CSTB_SuperComputing_Study_Group.ppt

2

Committee Goal

… assess the status of supercomputing in the United States, including the characteristics of relevant systems and architecture research in government, industry, and academia and the characteristics of the relevant market. The committee will examine key elements of context--the history of supercomputing, the erosion of research investment, the needs of government agencies for supercomputing capabilities--and assess options for progress. Key historical or causal factors will be identified. The committee will examine the changing nature of problems demanding supercomputing (e.g., weapons design, molecule modeling and simulation, cryptanalysis, bioinformatics, climate modeling) and the implications for systems design. It will seek to understand the role of national security in the supercomputer market and the long-term federal interest in supercomputing.

Page 3: CSTB_SuperComputing_Study_Group.ppt

3

Summary: It’s the Software…

• superComputing is Information centric

• Scientific computing is Beowulf computing

• Scientific computing becoming Info-centric.

• Adequate investment in files/OS/networking

• Underinvestment in Scientific Information management and visualization tools.

• Computation Grid moves too much data, DataGrid (or App Grid) is right concept

Page 4: CSTB_SuperComputing_Study_Group.ppt

4

Thesis

• Most new information is digital (and old information is being digitized)

• A Computer Science Grand Challenge:
– Capture
– Organize
– Summarize
– Visualize
this information

• Optimize Human Attention as a resource

• Improve information quality

Page 5: CSTB_SuperComputing_Study_Group.ppt

5

Information Avalanche
• The Situation
– We can record everything
– Everything is a LOT!
• The Good News
– Changes science, education, medicine, entertainment, ….
– Shrinks time and space
– Can augment human intelligence
• The Bad News
– The end of privacy
– Cyber Crime / Cyber Terrorism
– Monoculture
• The Technical Challenges
– Amplify human intellect
– Organize, summarize and prioritize information
– Make programming easy

Page 6: CSTB_SuperComputing_Study_Group.ppt

6

Super Computers
• You and others use every day
– Google, Inktomi, …
– AOL, MSN, Yahoo!
– Hotmail, MSN, …
– eBay, Amazon.com, …
• All are more than 10 Tops
• All more than 1 PB
• IntraNets
– Wal-Mart
– Federal Reserve
– Amex
– 1 Tflops
• All more than 1 PB

They are ALL Information Centric

Page 7: CSTB_SuperComputing_Study_Group.ppt

7

Q: How can I recognize a SuperComputer?
A: Costs 10M$

Gordon Bell’s Seven Price Tiers
10$: wrist watch computers (sensors)
100$: pocket/palm computers (phone/camera)
1,000$: portable computers (tablet)
10,000$: personal computers (workstation)
100,000$: departmental computers (closet)
1,000,000$: site computers (glass house)
10,000,000$: regional computers (glass castle, SC)

Super Computer / “Mainframe”: costs more than 1M$
Must be an array of processors, disks, comm ports

Page 8: CSTB_SuperComputing_Study_Group.ppt

8

Computing is Information Centric: that’s why they call it IT

• Programs capture, organize, abstract, filter, present Information to people.
• Networks carry Information.
• File is wrong abstraction: Information is typed / schematized
  words, pictures, sounds, arrays, lists, ..
• Notice that none of the examples on the previous slide serve files – they serve typed information.
• Recommendation: Increase Research investments ABOVE the OS level:
  Information Management/Visualization

Page 9: CSTB_SuperComputing_Study_Group.ppt

9

Summary: It’s the Software…

• Computing is Information centric

• Scientific computing is Beowulf computing

• Scientific computing becoming Info-centric.

• Adequate investment in files/OS/networking

• Underinvestment in Scientific Information management and visualization tools.

• Computation Grid moves too much data, DataGrid (or App Grid) is right concept

Page 10: CSTB_SuperComputing_Study_Group.ppt

10

Anecdotal Evidence: Everywhere I go I see Beowulfs

• Clusters of PCs (or high-slice-price micros)
• True: I have not visited Earth Simulator, but…
  Google, MSN, Hotmail, Yahoo, NCBI, FNAL, Los Alamos, Cal Tech, MIT, Berkeley, NRAO, Smithsonian, Wisconsin, eBay, Amazon.com, Schwab, Citicorp, Beijing, CERN, BaBar, NCSA, Cornell, UCSD, and of course NASA and Cal Tech

Page 11: CSTB_SuperComputing_Study_Group.ppt

11

Super Computing: The Top 10 of Top 500

Adapted from Top500 Nov 2002

Rank  Hardware         TeraFlops  Site           Cumulative TF
1     NEC Earth-Sim    35.9       Earth Sim Ctr  36
2     HP ASCI Q        7.7        LLNL           44
3     HP ASCI Q        7.7        LLNL           51
4     IBM ASCI White   7.2        LLNL           59
5     Intel/NetworX    5.7        LLNL           64
6     HP Alpha         4.5        PSC            69
7     HP Alpha         4.0        CEA            73
8     Intel/HPTi       3.3        NOAA           76
9     IBM SP2          3.2        HPCx           79
10    IBM SP2          3.2        NCAR           82

skip

Page 12: CSTB_SuperComputing_Study_Group.ppt

12

Seti@Home: The world’s most powerful computer
• 61 TF is sum of top 4 of Top 500.
• 61 TF is 9x the number 2 system.
• 61 TF more than the sum of systems 2..10

Seti@Home totals, 20 May 2003
http://setiathome.ssl.berkeley.edu/totals.html

                           Total                     Last 24 Hours
Users                      4,493,731                 1,900
Results received           886 M                     1.4 M
Total CPU time             1.5 M years               1,514 years
Floating Point Operations  3 E+21 ops (3 zetta ops)  5 E+18 FLOPS/day (61.3 TeraFLOPs)
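
The comparisons between Seti@Home and the Top 10 can be checked with a few lines of arithmetic. The sketch below uses the rounded TeraFlops figures from the Top 10 slide and the 61.3 TF figure from this one; the variable names are illustrative only. With these rounded inputs the ratio to the number 2 system comes out nearer 8x than 9x, and the top 4 sum to about 58.5 TF, so the slide's claims should be read as approximate.

```python
# Rounded TeraFlops figures for the Top 10 (Top500, Nov 2002), as listed on the slide.
top10_tf = [35.9, 7.7, 7.7, 7.2, 5.7, 4.5, 4.0, 3.3, 3.2, 3.2]
seti_tf = 61.3  # Seti@Home sustained rate quoted on the slide


def cumulative(values):
    """Running total, matching the 'Cumulative TF' column."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


cum = cumulative(top10_tf)
top4 = sum(top10_tf[:4])                # sum of systems 1..4
systems_2_to_10 = sum(top10_tf[1:])     # sum of systems 2..10
ratio_to_number_2 = seti_tf / top10_tf[1]
ops_per_day = seti_tf * 1e12 * 86_400   # sustained rate converted to operations per day

print("cumulative TF:", [round(c) for c in cum])
print(f"top 4 sum: {top4:.1f} TF, systems 2..10: {systems_2_to_10:.1f} TF")
print(f"Seti@Home / number 2: {ratio_to_number_2:.1f}x")
print(f"operations per day: {ops_per_day:.1e}")
```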

skip

Page 13: CSTB_SuperComputing_Study_Group.ppt

13

And…

• Google:
– 10k cpus, 2 PB, … as of 2 years ago
– 40 Tops
• AOL, MSN, Hotmail, Yahoo!, …
– all ~10K cpus
– all have ~1 PB … 10 PB storage
• Wal-Mart is a PB poster child
• Clusters / Beowulf everywhere you go.

skip

Page 14: CSTB_SuperComputing_Study_Group.ppt

14

Scientific == Beowulf (clusters)
• Scientific / Beowulf / Grid computing: 70’s style computing
– process / file / socket
– byte arrays, no data schema or semantics
– batch job scheduling
– manual parallelism (MPI)
– poor / no Information management support
– poor / no Information visualization toolkits
• Recommendation:
– Increase investment in Info-Management
– Increase investment in Info-Visualization

Page 15: CSTB_SuperComputing_Study_Group.ppt

15

Summary: It’s the Software…

• Computing is Information centric

• Scientific computing is Beowulf computing

• Scientific computing becoming Info-centric.

• Adequate investment in files/OS/networking

• Underinvestment in Scientific Information management and visualization tools.

• Computation Grid moves too much data, DataGrid (or App Grid) is right concept

Page 16: CSTB_SuperComputing_Study_Group.ppt

16

The Evolution of Science
• Observational Science
– Scientist gathers data by direct observation
– Scientist analyzes Information
• Analytical Science
– Scientist builds analytical model
– Makes predictions.
• Computational Science
– Simulate analytical model
– Validate model and make predictions
• Science-Informatics: Information Exploration Science
  Information captured by instruments, or Information generated by simulator
– Processed by software
– Placed in a database / files
– Scientist analyzes database / files

Page 17: CSTB_SuperComputing_Study_Group.ppt

17

How Discoveries Made?
Adapted from slide by George Djorgovski

• Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation …
  Theoretical, may be inspired by observations
• Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, Obscured Universe …
  Empirical, inspire theories, can be motivated by them

[Diagram: New Technical Capabilities, Observational Discoveries, Theory]

Phenomenological Discoveries:
• Explore parameter space
• Make new connections (e.g., multi-λ)
Understanding of complex phenomena requires complex, information-rich data (and simulations?)

Page 18: CSTB_SuperComputing_Study_Group.ppt

18

The Information Avalanche: both comp-X and X-info generating Petabytes

• Comp-Science generating Information avalanche:
  comp-chem, comp-physics, comp-bio, comp-astro, comp-linguistics, comp-music, comp-entertainment, comp-warfare
• Science-Info generating Information avalanche:
  bio-info, astro-info, text-info,

Page 19: CSTB_SuperComputing_Study_Group.ppt

19

Information Avalanche Stories
• Turbulence: 100 TB simulation, then mine the Information
• BaBar: Grows 1 TB/day
  2/3 simulation Information
  1/3 observational Information
• CERN: LHC will generate 1 GB/s, 10 PB/y
• VLBA (NRAO) generates 1 GB/s today
• NCBI: “only ½ TB” but doubling each year; very rich dataset.
• Pixar: 100 TB/Movie

Page 20: CSTB_SuperComputing_Study_Group.ppt

20

Astro-Info: World Wide Telescope

http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/

• Premise: Most data is (or could be) online
• Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio, ..
– As deep as the best instruments (2 years ago).
– It is up when you are up.
  The “seeing” is always great (no working at night, no clouds, no moons, no ..).
– It’s a smart telescope: links objects and data to literature on them.

Page 21: CSTB_SuperComputing_Study_Group.ppt

21

Why Astronomy Data?
• It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
• It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
• Many different instruments from many different places and many different times
• But, it’s the same universe, so comparisons make sense & are interesting.
• Federation is a goal
• There is a lot of it (petabytes)
• Great sandbox for data mining algorithms
– Can share cross company
– University researchers
• Great way to teach both Astronomy and Computational Science

[Images: the sky in several bands: IRAS 100 μm, ROSAT ~keV, DSS Optical, 2MASS 2 μm, IRAS 25 μm, NVSS 20cm, WENSS 92cm, GB 6cm]

Page 22: CSTB_SuperComputing_Study_Group.ppt

22

Summary: It’s the Software…

• Computing is Information centric

• Scientific computing is Beowulf computing

• Scientific computing becoming Info-centric.

• Adequate investment in files/OS/networking

• Underinvestment in Scientific Information management and visualization tools.

• Computation Grid moves too much data, DataGrid (or App Grid) is right concept

Page 23: CSTB_SuperComputing_Study_Group.ppt

23

What X-info Needs from us (cs)
(not drawn to scale)

[Diagram: four groups and what each contributes]
Scientists: Science Data & Questions
Plumbers: Database to store data, execute queries
Miners: Data Mining Algorithms
Tools: Question & Answer, Visualization

Page 24: CSTB_SuperComputing_Study_Group.ppt

24

Data Access is hitting a wall: FTP and GREP are not adequate

• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years.
• Oh!, and 1 PB ~ 5,000 disks

• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (= 1 $/GB)
• … 1 TB: 2 days and 1K$
• … 1 PB: 3 years and 1M$

• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
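
A rough sketch of the arithmetic behind these figures follows. It assumes a sustained sequential scan rate of 10 MB/s, which is an illustrative round number and not a figure from the slide (the slide's own data points imply anything from about 6 MB/s to about 17 MB/s), and it uses the 1 $/GB transfer price quoted above. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

ASSUMED_SCAN_RATE = 10e6   # bytes/second; an assumption, not taken from the slide


def scan_seconds(size_bytes, bytes_per_second=ASSUMED_SCAN_RATE, parallel_streams=1):
    """Time to read every byte once, optionally split evenly over parallel streams."""
    return size_bytes / (bytes_per_second * parallel_streams)


def transfer_cost_dollars(size_bytes, dollars_per_gb=1.0):
    """Wide-area transfer cost at the slide's 1 $/GB."""
    return size_bytes / 1e9 * dollars_per_gb


for label, size in [("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)]:
    days = scan_seconds(size) / SECONDS_PER_DAY
    print(f"{label}: scan {days:10.3f} days, transfer ${transfer_cost_dollars(size):,.0f}")

# Spreading the petabyte over 1,000 independent streams brings the scan back to about a day.
print(scan_seconds(1e15, parallel_streams=1000) / SECONDS_PER_DAY, "days")
```

At the assumed rate a terabyte takes a bit over a day and a petabyte a bit over three years, which is the order of magnitude the slide quotes; the last line shows why parallel search is part of the remedy.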

Page 25: CSTB_SuperComputing_Study_Group.ppt

25

Next-Generation Data Analysis
• Looking for
– Needles in haystacks – the Higgs particle
– Haystacks: Dark matter, Dark energy
• Needles are easier than haystacks
• Global statistics have poor scaling
– Correlation functions are N^2, likelihood techniques N^3
• As data and processing grow at same rate, we can only keep up with N log N
• A way out?
– Discard notion of optimal (data is fuzzy, answers are approximate)
– Don’t assume infinite computational resources or memory
• Requires combination of statistics & computer science
• Recommendation: invest in data mining research, both general and domain-specific.
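
The scaling argument can be made concrete with a small calculation. The sketch below assumes data volume and processing speed both grow by the same factor and asks how the running time of an N log N, N^2, or N^3 algorithm changes; the function name and the chosen sizes are illustrative.

```python
import math


def relative_runtime(cost, n, growth):
    """Runtime after data AND compute both grow by `growth`, relative to the runtime before."""
    return cost(n * growth) / growth / cost(n)


n = 1e9        # illustrative dataset size (number of objects)
growth = 10    # data and processing both grow 10x

for name, cost in [
    ("N log N", lambda k: k * math.log(k)),
    ("N^2", lambda k: k ** 2),
    ("N^3", lambda k: k ** 3),
]:
    print(f"{name:8s} runtime changes by a factor of {relative_runtime(cost, n, growth):.2f}")
```

Only the N log N case stays close to 1; the N^2 analysis gets 10x slower and the N^3 analysis 100x slower even though the hardware kept pace with the data.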

Page 26: CSTB_SuperComputing_Study_Group.ppt

26

Analysis and Databases
• Statistical analysis deals with
– Creating uniform samples
– Data filtering & censoring bad data
– Assembling subsets
– Estimating completeness
– Counting and building histograms
– Generating Monte-Carlo subsets
– Likelihood calculations
– Hypothesis testing
• Traditionally these are performed on files
• Most of these tasks are much better done inside a database, close to the data.
• Move Mohamed to the mountain, not the mountain to Mohamed.
• Recommendation: Invest in database research:
  extensible databases: text, temporal, spatial, …
  data interchange, parallelism, indexing, query optimization
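
As a minimal illustration of doing the filtering and histogram-building inside the database, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The table name, columns, and synthetic data are hypothetical and stand in for a real archive; the point is only that the query returns a handful of bin counts rather than shipping every row to the analysis program.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, magnitude REAL, quality INTEGER)"
)

# Synthetic stand-in data: 10,000 objects, roughly half flagged as bad (quality = 0).
rng = random.Random(0)
rows = [(i, rng.uniform(10.0, 25.0), rng.choice([0, 1])) for i in range(10_000)]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)

# Censor bad data and build the histogram next to the data;
# only one (bin, count) pair per magnitude bin crosses the interface.
histogram = conn.execute(
    """
    SELECT CAST(magnitude AS INTEGER) AS bin, COUNT(*) AS n
    FROM objects
    WHERE quality = 1
    GROUP BY bin
    ORDER BY bin
    """
).fetchall()

for bin_start, count in histogram:
    print(f"{bin_start:2d}-{bin_start + 1:2d}: {count}")
```

The file-based alternative would read all 10,000 rows into the program before discarding half of them; on a terabyte-scale archive that difference is the whole argument of the slide.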

Page 27: CSTB_SuperComputing_Study_Group.ppt

27

Goal: Easy Data Publication & Access

• Augment FTP with data query: Return intelligent data subsets
• Make it easy to
– Publish: Record structured data
– Find:
  • Find data anywhere in the network
  • Get the subset you need
– Explore datasets interactively
• Realistic goal:
– Make it as easy as publishing/reading web sites today.

Page 28: CSTB_SuperComputing_Study_Group.ppt

28

Federation

Data Federations of Web Services
• Massive datasets live near their owners:
– Near the instrument’s software pipeline
– Near the applications
– Near data knowledge and curation
– Super Computer centers become Super Data Centers
• Each Archive publishes a web service
– Schema: documents the data
– Methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple Archives
– A common global schema

Page 29: CSTB_SuperComputing_Study_Group.ppt

29

Web Services: The Key?
• Web SERVER:
– Given a url + parameters
– Returns a web page (often dynamic)
• Web SERVICE:
– Given an XML document (soap msg)
– Returns an XML document
– Tools make this look like an RPC.
  • F(x,y,z) returns (u, v, w)
– Distributed objects for the web.
– + naming, discovery, security, ..
• Internet-scale distributed computing

[Diagram: Your program sends http to a Web Server and gets back a Web page; Your program sends soap to a Web Service and gets back an object in xml, which becomes Data in your address space]
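
The "XML document in, XML document out, made to look like an RPC" idea can be sketched in a few lines. The example below is deliberately simplified: it uses plain XML rather than a real SOAP envelope, it calls the service in-process instead of over HTTP, and the element names and the arithmetic inside the service are invented for illustration.

```python
import xml.etree.ElementTree as ET


def service(request_xml: str) -> str:
    """Hypothetical service endpoint: takes an XML document, returns an XML document."""
    req = ET.fromstring(request_xml)
    x, y, z = (float(req.findtext(name)) for name in ("x", "y", "z"))
    resp = ET.Element("response")
    # Placeholder computation; a real service would run a query or a method on an object.
    for name, value in (("u", x + y), ("v", y + z), ("w", x * z)):
        ET.SubElement(resp, name).text = repr(value)
    return ET.tostring(resp, encoding="unicode")


def f(x, y, z):
    """Client-side stub: hides the XML so that the call looks like F(x,y,z) returns (u,v,w)."""
    req = ET.Element("request")
    for name, value in (("x", x), ("y", y), ("z", z)):
        ET.SubElement(req, name).text = repr(value)
    # In a deployed system this call would be an HTTP POST to the service's URL.
    reply = ET.fromstring(service(ET.tostring(req, encoding="unicode")))
    return tuple(float(reply.findtext(name)) for name in ("u", "v", "w"))


print(f(1, 2, 3))
```

The caller sees only typed values going in and coming out, which is the contrast with a web server returning a page meant for a person to read.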

Page 30: CSTB_SuperComputing_Study_Group.ppt

30

The Challenge
• This has failed several times before – understand why.

• Develop
– Common data models (schemas),
– Common interfaces (class/method)

• Build useful prototypes (nodes and portals)

• Create a community that uses the prototypes and evolves the prototypes.

Page 31: CSTB_SuperComputing_Study_Group.ppt

31

Grid and Web Services Synergy
• I believe the Grid will be many web services
• IETF standards provide
– Naming
– Authorization / Security / Privacy
– Distributed Objects: Discovery, Definition, Invocation, Object Model
– Higher level services: workflow, transactions, DB, ..
• Synergy: commercial Internet & Grid tools

Page 32: CSTB_SuperComputing_Study_Group.ppt

32

Summary: It’s the Software…

• Computing is Information centric

• Scientific computing is Beowulf computing

• Scientific computing becoming Info-centric.

• Adequate investment in files/OS/networking

• Underinvestment in Scientific Information management and visualization tools.

• Computation Grid moves too much data, DataGrid (or App Grid) is right concept

Page 33: CSTB_SuperComputing_Study_Group.ppt

33

Recommendations

• Increase Research investments ABOVE the OS level: Information Management/Visualization

• Invest in database research:
  extensible databases: text, temporal, spatial, …
  data interchange, parallelism, indexing, query optimization

• Invest in data mining research, both general and domain-specific

Page 34: CSTB_SuperComputing_Study_Group.ppt

34

Stop Here

• Bonus slides on Distributed Computing Economics

Page 35: CSTB_SuperComputing_Study_Group.ppt

35

Distributed Computing Economics

• Why is Seti@Home a great idea?

• Why is Napster a great deal?

• Why is the Computational Grid uneconomic?

• When does computing on demand work?

• What is the “right” level of abstraction?

• Is the Access Grid the real killer app?

Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24

http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Page 36: CSTB_SuperComputing_Study_Group.ppt

36

Computing is Free

• Computers cost 1k$ (if you shop right)

• So 1 cpu day == 1$

• If you pay the phone bill (and I do), Internet bandwidth costs 50 … 500 $/Mbps/month (not including routers and management).

• So 1GB costs 1$ to send and 1$ to receive
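
These two rules of thumb follow from simple division. The sketch below assumes a 1,000-day (roughly three-year) useful life for the 1k$ computer and a fully utilized link over a 30-day month; both are assumptions made here for illustration, not figures from the slide.

```python
def cpu_day_cost(computer_price=1000.0, lifetime_days=1000.0):
    """Dollars per CPU day if the purchase price is spread evenly over the machine's life."""
    return computer_price / lifetime_days


def dollars_per_gb(dollars_per_mbps_month, days_per_month=30):
    """Dollars per gigabyte moved over a link rented per Mbps per month and kept full."""
    seconds = days_per_month * 86_400
    gigabytes_moved = (1e6 / 8) * seconds / 1e9   # 1 Mbps sustained for the whole month
    return dollars_per_mbps_month / gigabytes_moved


print(f"cpu day: ${cpu_day_cost():.2f}")
for price in (50, 500):
    print(f"{price} $/Mbps/month -> ${dollars_per_gb(price):.2f} per GB")
```

A fully used 1 Mbps link moves about 324 GB in a month, so the quoted price range works out to roughly 0.15 to 1.5 $/GB; real links are rarely full, which pushes the effective figure toward the round 1 $/GB used in the rest of the talk.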

Page 37: CSTB_SuperComputing_Study_Group.ppt

37

Why is Seti@Home a Good Deal?

• Sending 300 KB costs 3e-4$

• User computes for ½ day: benefit 5e-1$ (half a 1$ cpu day)

• ROI: 1500:1

Page 38: CSTB_SuperComputing_Study_Group.ppt

38

Why is Napster a Good Deal?

• Send 5 MB costs 5e-3$
• ½ a penny per song
• Both sender and receiver can afford it.

• Same logic powers web sites (Yahoo!...):
– 1e-3$/page view advertising revenue
– 1e-5$/page view cost of serving web page
– 100:1 ROI
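
The return-on-investment figures for Seti@Home, Napster, and web page serving can be reproduced from the two unit prices used in this part of the talk (1 $ per GB sent and 1 $ per CPU day). The sketch below is only that arithmetic; the constant and function names are illustrative.

```python
DOLLARS_PER_GB = 1.0        # network price used in the talk
DOLLARS_PER_CPU_DAY = 1.0   # compute price used in the talk


def send_cost(size_bytes):
    """Cost of sending a message of the given size over the wide-area network."""
    return size_bytes / 1e9 * DOLLARS_PER_GB


# Seti@Home: ship a 300 KB work unit, receive half a day of donated computation.
seti_cost = send_cost(300e3)
seti_benefit = 0.5 * DOLLARS_PER_CPU_DAY
seti_roi = seti_benefit / seti_cost

# Napster: ship a 5 MB song.
napster_cost = send_cost(5e6)

# Web sites: advertising revenue per page view against serving cost per page view.
web_roi = 1e-3 / 1e-5

print(f"Seti@Home: cost ${seti_cost:.0e}, benefit ${seti_benefit}, ROI {seti_roi:,.0f}:1")
print(f"Napster: ${napster_cost:.3f} per song")
print(f"Web page: ROI {web_roi:.0f}:1")
```

With these inputs the Seti@Home ratio is about 1,700:1, which the slide rounds to 1500:1; either way the value of the computation dwarfs the cost of moving the work unit.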

Page 39: CSTB_SuperComputing_Study_Group.ppt

39

The Cost of Computing: Computers are NOT free!

• Capital cost of a TPC-C system is mostly storage and storage software (database)
• IBM 32 cpu, 512 GB ram, 2,500 disks, 43 TB (680,613 tpmC @ 11.13 $/tpmC, available 11/08/03)
  http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
• A 7.5M$ super-computer
• Total Data Center Cost:
  40% capital & facilities
  60% staff (includes app development)

TPC-C Cost Components, DB2/AIX
http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
cpu/mem   29%
storage   61%
software  10%

Page 40: CSTB_SuperComputing_Study_Group.ppt

40

Computing Equivalents1 $ buys

• 1 day of cpu time

• 4 GB ram for a day

• 1 GB of network bandwidth

• 1 GB of disk storage

• 10 M database accesses

• 10 TB of disk access (sequential)

• 10 TB of LAN bandwidth (bulk)

Page 41: CSTB_SuperComputing_Study_Group.ppt

41

Some consequences
• Beowulf networking is 10,000x cheaper than WAN networking; factors of 10^5 matter.
• The cheapest and fastest way to move a Terabyte cross country is sneakernet.
  24 hours = 4 MB/s
  50$ shipping vs 1,000$ WAN cost.
• Sending 10 PB of CERN data via network is silly: buy disk bricks in Geneva, fill them, ship them.

TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange. Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg. Microsoft Technical Report, May 2002, MSR-TR-2002-54
http://research.microsoft.com/research/pubs/view.aspx?tr_id=569

Page 42: CSTB_SuperComputing_Study_Group.ppt

42

How Do You Move A Terabyte?

Context     Speed (Mbps)  Rent ($/month)  $/Mbps  $/TB sent  Time/TB
Home phone  0.04          40              1,000   3,086      6 years
Home DSL    0.6           70              117     360        5 months
T1          1.5           1,200           800     2,469      2 months
T3          43            28,000          651     2,010      2 days
OC3         155           49,000          316     976        14 hours
100 Mbps    100                                              1 day
Gbps        1000                                             2.2 hours
OC 192      9600          1,920,000       200     617        14 minutes

Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
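
The Time/TB and $/TB columns can be derived from the link speed and the monthly rent. The sketch below assumes 1 TB = 10^12 bytes and a 30-day billing month; those two conventions are assumptions chosen here because they reproduce the table's figures, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 86_400   # assumption: 30-day billing month
MEGABITS_PER_TB = 8e6             # assumption: 1 TB = 1e12 bytes = 8e6 megabits

# (speed in Mbps, rent in $/month) for the links that have a price on the slide
links = {
    "Home phone": (0.04, 40),
    "Home DSL": (0.6, 70),
    "T1": (1.5, 1_200),
    "T3": (43, 28_000),
    "OC3": (155, 49_000),
    "OC 192": (9600, 1_920_000),
}


def seconds_per_tb(speed_mbps):
    """Time to push one terabyte through a link running flat out."""
    return MEGABITS_PER_TB / speed_mbps


def dollars_per_tb(speed_mbps, rent_per_month):
    """Share of the monthly rent consumed while the terabyte is in flight."""
    return rent_per_month * seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH


for name, (speed, rent) in links.items():
    days = seconds_per_tb(speed) / 86_400
    print(f"{name:10s} {days:10.2f} days  ${dollars_per_tb(speed, rent):8,.0f} per TB")
```

Note that the cost per terabyte barely improves with faster links: a faster line finishes sooner but rents for proportionally more, which is why shipping disks stays competitive.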

Page 43: CSTB_SuperComputing_Study_Group.ppt

43

Computational Grid Economics

• To the extent that the computational grid is like Seti@Home or ZetaNet or Folding@home or… it is a great thing

• To the extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs.

• The Internet is NOT the cpu backplane.

• The USG should not hide this economic fact from the academic/scientific research community.

Page 44: CSTB_SuperComputing_Study_Group.ppt

44

Computing on Demand
• Was called outsourcing / service bureaus in my youth. CSC and IBM did it.
• It is not a new way of doing things: think payroll. Payroll is standard outsource.
• Now we have Hotmail, Salesforce.com, Oracle.com, ….
• Works for standard apps.
• Airlines outsource reservations. Banks outsource ATMs.
• But Amazon, Amex, Wal-Mart, ... can’t outsource their core competence.
• So, COD works for commoditized services.

Page 45: CSTB_SuperComputing_Study_Group.ppt

45

What’s the right abstraction level for Internet Scale Distributed Computing?
• Disk block? No, too low.
• File? No, too low.
• Database? No, too low.
• Application? Yes, of course.
– Blast search
– Google search
– Send/Get eMail
– Portals that federate astronomy archives (http://skyQuery.Net/)
• Web Services (.NET, EJB, OGSA) give this abstraction level.

Page 46: CSTB_SuperComputing_Study_Group.ppt

46

Access Grid• Q: What comes after the telephone?

• A: eMail?

• A: Instant messaging?

• Both seem retro technology: text & emoticons.

• Access Grid could revolutionize human communication.

• But, it needs a new idea.

• Q: What comes after the telephone?

Page 47: CSTB_SuperComputing_Study_Group.ppt

47

Distributed Computing Economics

• Why is Seti@Home a great idea?

• Why is Napster a great deal?

• Why is the Computational Grid uneconomic?

• When does computing on demand work?

• What is the “right” level of abstraction?

• Is the Access Grid the real killer app?

Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24

http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Page 48: CSTB_SuperComputing_Study_Group.ppt

48

Turbulence, an old problem

Observational
• Described 5 centuries ago by Leonardo

Theoretical
• Best minds have tried and …. “moved on”:
– Lamb: … “When I die and go to heaven…”
– Heisenberg, von Weizsäcker … some attempts
– Partial successes: Kolmogorov, Onsager
– Feynman: “…the last unsolved problem of classical physics”

Adapted from ASCI ASCP gallery
http://www.cacr.caltech.edu/~slombey/asci/fluids/turbulence-volren.med.jpg

Page 49: CSTB_SuperComputing_Study_Group.ppt

49

• How does the turbulent energy cascade work?
• Direct numerical simulation of “turbulence in a box”
• Pushing comp-limits along specific directions:
– Three-dimensional (512^3 - 4,096^3), but only static information
– 8192^2, but only two-dimensional

Ref: Cao, Chen et al.
Ref: Chen & Kraichnan

Simulation: Comp-Physics

Slide courtesy of Charles Meneveau @ JHU

Page 50: CSTB_SuperComputing_Study_Group.ppt

50

We can now “put it all together”:
• Large scale range, scale-ratio O(1,000)
• Three-dimensional in space
• Time-evolution and Lagrangian approach (follow the flow)

Turbulence data-base:

• Create a 100 TB database of O(2,000) consecutive snapshots of a 1,024^3 turbulence simulation.

• Mine the database to understand flows in detail

Data-Exploration: Physics-Info

Slide courtesy of Charles Meneveau, Alex Szalay @ JHU

Page 51: CSTB_SuperComputing_Study_Group.ppt

51

Following 18 slides from 1997

• Bell & Gray Computer Industry “laws”

• Rules of thumb

• Still relevant

Page 52: CSTB_SuperComputing_Study_Group.ppt

52

Computer Industry Laws (rules of thumb)
• Metcalfe’s law
• Moore’s First Law
• Bell’s Computer Classes (7 price tiers)
• Bell’s Platform Evolution
• Bell’s Platform Economics
• Bill’s Law
• Software Economics
• Grove’s law
• Moore’s second law
• Is Info-Demand Infinite?
• The Death of Grosch’s Law

Page 53: CSTB_SuperComputing_Study_Group.ppt

53

Metcalfe’s Law
Network Utility = Users^2
• How many connections can it make?
– 1 user: no utility
– 1K users: a few contacts
– 1M users: many on net
– 1B users: everyone on net
• That is why the Internet is so “hot”
– Exponential benefit

Page 54: CSTB_SuperComputing_Study_Group.ppt

54

Moore’s First Law

[Chart: 1 chip memory size (2 MB to 32 MB), 1970 to 2000; chip capacity in bits rising through 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M; memory-size axis from 8KB to 1GB]

• XXX doubles every 18 months: 60% increase per year
– Micro Processor speeds
– chip density
– Magnetic disk density
– Communications bandwidth: WAN bandwidth approaching LANs
• Exponential Growth:
– The past does not matter
– 10x here, 10x there, soon you're talking REAL change.
• PC costs decline faster than any other platform
– Volume & learning curves
– PCs will be the building bricks of all future systems

Page 55: CSTB_SuperComputing_Study_Group.ppt

55

Bumps in the Moore’s Law Road

[Charts: $/MB of DRAM, 1970 to 2000, log scale from 1 to 1,000,000; $/MB of DISK, 1970 to 2000, log scale from .01 to 10,000]

• DRAM:
– 1988: US Anti-Dumping rules
– 1993-1995: ?? price flat
• Magnetic Disk:
– 1965-1989: 10x/decade
– 1989-2002: 7x/3 years! 1,000x/decade

Page 56: CSTB_SuperComputing_Study_Group.ppt

56

Gordon Bell’s 1975 VAX planning model...

He didn’t believe it!
System Price = 5 x 3 x .04 x memory size / 1.26^(t-1972) K$
5x: Memory is 20% of cost
3x: DEC markup
.04x: $ per byte

He didn’t believe: the projection, a 500$ machine
He couldn’t comprehend implications

[Chart: projected system price, log scale from 0.01 K$ to 100,000 K$, 1960 to 2000, with curves for 16 KB, 64 KB, 256 KB, 1 MB and 8 MB memory sizes]

Page 57: CSTB_SuperComputing_Study_Group.ppt

57

Gordon Bell’s Processing, memories, & comm 100 years

[Chart: log scale from 1.E+00 to 1.E+18, 1947 to 2047; series: Processing, Pri. Mem, Sec. Mem., POTS (bps), Backbone]

Page 58: CSTB_SuperComputing_Study_Group.ppt

58

Gordon Bell’s Seven Price Tiers
• 10$: wrist watch computers
• 100$: pocket/palm computers
• 1,000$: portable computers
• 10,000$: personal computers (desktop)
• 100,000$: departmental computers (closet)
• 1,000,000$: site computers (glass house)
• 10,000,000$: regional computers (glass castle)

SuperServer: Costs more than 100,000 $
“Mainframe”: Costs more than 1M$
Must be an array of processors, disks, tapes, comm ports

Page 59: CSTB_SuperComputing_Study_Group.ppt

59

Bell’s Evolution of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance

[Chart: Log Price against Time for Mainframes (central), Minis (dep’t.), WSs, PCs (personals), ??]

1.26 = 2x/3 yrs -- 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs -- 100x/decade; 1/1.6 = .62
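
The annual factors 1.26 and 1.6 and the multi-year rates quoted with them are related by simple compounding, which the following sketch checks. The function name is illustrative.

```python
def compound(annual_factor, years):
    """Total change after `years` of growth at `annual_factor` per year."""
    return annual_factor ** years


for annual in (1.26, 1.6):
    print(
        f"{annual}/yr -> {compound(annual, 3):.2f}x per 3 years, "
        f"{compound(annual, 10):.0f}x per decade, "
        f"price ratio per year at constant performance {1 / annual:.2f}"
    )
```

The results are 2.0x and about 10x for 1.26, and 4.1x and about 110x for 1.6, so the slide's "2x/3 yrs, 10x/decade" and "4x/3 yrs, 100x/decade" are rounded but consistent.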

Page 60: CSTB_SuperComputing_Study_Group.ppt

60

Gordon Bell’s Platform Economics

[Chart: Price (K$), Volume (K) and App price by computer type (Mainframe, WS, Browser), log scale from 0.01 to 100,000]

• Traditional computers: Custom or Semi-Custom, high-tech and high-touch
• New computers: high-tech and no-touch

Page 61: CSTB_SuperComputing_Study_Group.ppt

61

Software Economics

CIRCA 1997
• An engineer costs about 150 k$/year
• R&D gets [5%…15%] of budget
• Need [3M$…1M$] revenue per engineer

Revenue breakdown (pie charts):
Company    Revenue  R&D  SG&A  Product&Service  Tax  Profit
Microsoft  9 B$     16%  34%   13%              13%  24%
Intel      16 B$    8%   11%   47%              12%  22%
IBM        72 B$    8%   22%   59%              5%   6%
Oracle     3 B$     9%   43%   26%              7%   15%

Page 62: CSTB_SuperComputing_Study_Group.ppt

62

Software Economics: Bill’s Law

• Bill Joy’s law (Sun): Don’t write software for less than 100,000 platforms.
  @ 10M$ engineering expense, 1,000$ price
• Bill Gates’s law: Don’t write software for less than 1,000,000 platforms.
  @ 10M$ engineering expense, 100$ price
• Examples:
– UNIX vs NT: 3,500$ vs 500$
– Oracle vs SQL-Server: 100,000$ vs 6,000$
– No Spreadsheet or Presentation pack on UNIX/VMS/...
• Commoditization of base Software & Hardware

Price = Fixed_Cost / Units + Marginal_Cost
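
The price relation at the foot of this slide appears to read Price = Fixed Cost / Units + Marginal Cost; that reading of the extraction is an assumption. Under it, the two "laws" can be worked through as below, with marginal cost taken as zero purely for illustration.

```python
def unit_price(fixed_cost, units, marginal_cost=0.0):
    """Break-even price per unit: fixed cost spread over the units, plus marginal cost."""
    return fixed_cost / units + marginal_cost


engineering = 10e6   # 10M$ engineering expense, from the slide

joy = unit_price(engineering, 100_000)       # Bill Joy's threshold
gates = unit_price(engineering, 1_000_000)   # Bill Gates's threshold

print(f"100,000 platforms:   ${joy:.0f} engineering per copy, {joy / 1000:.0%} of a 1,000$ price")
print(f"1,000,000 platforms: ${gates:.0f} engineering per copy, {gates / 100:.0%} of a 100$ price")
```

In both cases engineering comes to 10% of the selling price, which sits inside the 5% to 15% R&D share quoted on the Software Economics slide.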

Page 63: CSTB_SuperComputing_Study_Group.ppt

63

Grove's Law: The New Computer Industry
• Horizontal integration is new structure
• Each layer picks best from lower layer.
• Desktop (C/S) market
– 1991: 50%
– 1995: 75%

Function         Example
Operation        AT&T
Integration      EDS
Applications     SAP
Middleware       Oracle
Baseware         Microsoft
Systems          Compaq
Silicon & Oxide  Intel & Seagate

Page 64: CSTB_SuperComputing_Study_Group.ppt

64

Moore’s Second Law
• The Cost of Fab Lines Doubles Every Generation (3 years)
• Money Limit: hard to imagine
– 10 B$ line
– 20 B$ line
– 40 B$ line
• Physical limit: Quantum Effects
– at 0.25 micron now
– 0.05 micron seems hard
– 12 years, 3 generations
• Lithograph: need Xray below 0.13 micron

[Chart: M$ / Fab Line by Year, 1960 to 2000, log scale from $1 to $10,000]

Page 65: CSTB_SuperComputing_Study_Group.ppt

65

Constant Dollars vs Constant Work

• Constant Work:
– One SuperServer can do all the world’s computations.
• Constant Dollars:
– The world spends 10% on information processing
– Computers are moving from 5% penetration to 50%
  • 300 B$ to 3T$
• We have the patent on the byte and algorithm

Page 66: CSTB_SuperComputing_Study_Group.ppt

66

Crossing the Chasm

[Diagram: 2x2 matrix of Old/New Market against Old/New Technology. Cell labels: “Boring, Competitive, Slow Growth”; “No Product, No Customers”; “product finds customers”; “Customers find product”. Difficulty labels: hard, Hard, Very Very Hard]

Page 67: CSTB_SuperComputing_Study_Group.ppt

67

Billions of Clients Need Millions of Servers

[Diagram: Clients (mobile clients, fixed clients) connected to Servers (server, superserver)]

• All clients are networked to servers; may be nomadic or on-demand
• Fast clients want faster servers
• Servers provide data, control, coordination, communication
• Super Servers: Large Databases, High Traffic shared data

Page 68: CSTB_SuperComputing_Study_Group.ppt

68

The Parallel Law of Computing

Grosch's Law: 2x $ is 4x performance
1 MIPS: 1 $
1,000 MIPS: 32 $ (.03 $/MIPS)

Parallel Law: 2x $ is 2x performance
1 MIPS: 1 $
1,000 MIPS: 1,000 $
Needs Linear Speedup and Linear Scaleup
Not always possible
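
The two pricing rules on this slide can be compared directly. In the sketch below, Grosch's Law is modelled as performance growing with the square of cost and the Parallel Law as performance growing linearly with cost, both normalized so that 1 MIPS costs 1 $; the function names are illustrative.

```python
import math


def grosch_cost(mips):
    """Grosch's Law: performance ~ cost^2, so cost ~ sqrt(performance)."""
    return math.sqrt(mips)


def parallel_cost(mips):
    """Parallel Law: performance ~ cost, given linear speedup and scaleup."""
    return float(mips)


target = 1000  # MIPS
print(f"Grosch:   {grosch_cost(target):6.0f} $ ({grosch_cost(target) / target:.2f} $/MIPS)")
print(f"Parallel: {parallel_cost(target):6.0f} $ ({parallel_cost(target) / target:.2f} $/MIPS)")
```

Under Grosch's Law one large machine was the cheap way to get 1,000 MIPS; once commodity processors made the small unit the cheapest per MIPS, the best available outcome became the linear one, and only for workloads that parallelize.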

Page 69: CSTB_SuperComputing_Study_Group.ppt

69

Our Biggest Problem

[Pie chart: software 5%, hardware 20%, maintenance (care & feeding) 75%]

What is the trend line?

This wasn’t a problem when MIPS cost 100k$ and Disks cost 1k$/MB