Jim Gray / Presented at U. Tokyo / 23 Jan 1999

Jim Gray Talk at University of Tokyo

Personal views on PITAC report: invest in long term research

Preview of Turing lecture: 10 long term research problems

• Bush: Summarize info in cyberspace

• Turing: Intelligent Computers

• 7 9s: build systems that are always up and prove it.

5-Minute rule

• For disks

• For tapes

Sorting Progress

• PennySort

• Terabyte Sort (!)

Slides will be at http://research.Microsoft.com/~Gray/talks

Presidential Advisory Committee on High Performance Computing and Communications, Information Technologies, and the Next Generation Internet

Information Technology

http://www.ccic.gov/ac/interim/ or http://research.microsoft.com/~Gray/papers/PITAC_Interim_Report_8_98.doc

Charter for the Committee: provide an independent assessment of

High-Performance Computing and Communications (HPCC)

• Progress

• Balance among research components;

Next Generation Internet initiative;

• Progress

• Balance

IT Research and development

• Maintain United States leadership in

—IT and

—Applications

Committee Members

Co-Chairs:

• Bill Joy, Sun Microsystems
• Ken Kennedy, Rice University

Members:

• Eric Benhamou, 3Com
• Vinton Cerf, MCI
• Ching-chih Chen, Simmons
• David Cooper, LLNL
• Steve Dorfman, Hughes
• David Dorman, PointCast
• Bob Ewald, SGI
• David Farber, U. of Pennsylvania
• Sherri Fuller, U. of Washington
• Hector Garcia-Molina, Stanford
• Susan Graham, UC Berkeley
• Jim Gray, Microsoft
• Danny Hillis, Disney, Inc
• John Miller, Montana State Univ.
• David Nagel, AT&T
• Raj Reddy, Carnegie Mellon
• Ted Shortliffe, Stanford
• Larry Smarr, U. of Illinois @ UC
• Joe Thompson, Miss. State U.
• Les Vadasz, Intel
• Andy Viterbi, Qualcomm
• Steve Wallach, Centerpoint
• Irving Wladawsky-Berger, IBM

My Summary of the Report

1/3 of the US economic growth since 1992 was in the IT sector.

IT is key to our health, wealth, and safety. Created 400 B$ of wealth in last 3 years (!!)

Federal IT research funding of twenty years ago created the boom. Federal IT research funding for the last decade has been flat (in constant dollars).

Research funding is increasingly near-term & applied development.

The committee recommends: increase long-term research funding in:

• Software design and implementation technologies
• Technologies to scale the Next Generation Internet to 6 billion users.
• Tools, algorithms, and systems for high-performance computing.

Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.

Myths

1. Now that IT is a big business, industry will do long term research.

FACT: industry spends LITTLE on long-term research; it is not in their best interest.

2. IT research = buy computers for scientists.

FACT: computer science research is different from the application of computers to some discipline.

Research Priorities

Findings:

• Total federal information technology R&D investment is inadequate

• Federal IT R&D is excessively focused on near-term problems

Recommendations:

• Create a strategic initiative in long-term IT R&D

• Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts

Software Research

Findings:

• Demand for software far exceeds the nation’s ability to produce it

• The nation depends on fragile software

• Technologies to build reliable and secure software are inadequate

• The nation is under-investing in fundamental software research

Recommendations:

• Fund more fundamental research in software development methods and component technologies

• Sponsor a national library of software components

• Make software research a substantive component of every major IT research initiative

• Support research in human-computer interfaces and interaction

Make fundamental software research an absolute priority

Scalable Information Infrastructure

Findings:

• The Internet has grown well beyond the intent of its original designers

• Our nation’s dependence on the information infrastructure is increasing daily

• We cannot safely extend what we currently know to more complex systems

• Learning how to build large-scale, highly reliable and secure systems requires research

Recommendations:

• Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure

• Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies

High-End Computing

Findings: HEC is

• essential for science and engineering research
• an element of the United States national security
• ripe for new applications
• suppliers suffer from unusual market pressures

Research & Development Recommendations

• Fund innovative technologies and architectures
• Fund HEC software (parallel programming)
• Aim for a real application petaops by 2010 through both hardware and software strategies
• Fund HEC systems for science and engineering research

Social, Economic, Workforce Recommendations

Expand research on the social and economic impacts of information technology diffusion and adoption

Expand initiatives to increase IT literacy, access and research capabilities

Address the shortage of high-technology workers

Programs to re-train “stale” IT workers

Encourage participation by women and minorities

Short-term increase in immigration of skilled IT workers

Conclusions

IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security:

• Dramatically transform the way we communicate, learn, deal with information and conduct research
• Transform the nature of work, nature of commerce, product design cycle, practice of health care, and the government itself

The total Federal IT R&D investment is inadequate

The Federal IT R&D is excessively focused on near-term problems

U. S. government must:

• Create a strategic initiative in long-term IT R&D
• Establish an effective structure for managing and coordinating IT

Vannevar Bush: Memex

Memex: Proposed putting all information online (1945)

It will happen

Result: InfoGlut. Too much information in the shoebox

Challenge:

• Organize the information.

• Give answers as good as an expert in the field.

• Anticipate questions and so inform “subscriber”

Protect personal privacy

• A hacker cannot get access to your personal information without your consent.

Turing’s Test (1950): Intelligent Machines

Computers helped with the 4-color problem end game

Computers (and people) won world chess championship

Computers will likely be our 5th brain

• Augment our intelligence
• See for us, hear for us, read for us,
• Prosthetic eyes, ears, voices, arms, legs,….

Probably computers will be intelligent like plants and animals.

Perhaps computers can be intelligent like people

• Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)
• Translating telephone (as good as a human translator)
• Read a textbook and pass the written exam.
• Pass a graduate programming class
• Pass a graduate literature class

Radical: Download someone.

Dependable Systems

Build a system used by millions of people each day.

Then:

• Prove that it does what it is supposed to do (code matches spec).

• Prove that it delivers 99.99999% (7 9s) availability (1 hr per millennium)

• Prove that it cannot be “hacked” for less than 1B$ (Y2K $)

Then build the system automatically from the specification.
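
The "1 hr per millennium" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `downtime_hours` and the choice of a 365.25-day year are assumptions, not part of the talk.

```python
# Downtime permitted by an availability target expressed as a count of nines.
# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24


def downtime_hours(nines: int, years: float) -> float:
    """Hours of allowed downtime over `years` at availability 1 - 10**-nines."""
    unavailability = 10.0 ** -nines
    return unavailability * years * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} nines: {downtime_hours(n, 1000):.2f} hours per millennium")
```

For seven nines this gives a little under one hour per thousand years, which agrees with the slide's rounded figure.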

Storage Hierarchy (9 levels)

Cache 1, 2

Main (1, 2, 3 if NUMA).

Disk (1 (cached), 2)

Tape (1 (mounted), 2)

Meta-Message: Technology Ratios Are Important

If everything gets faster & cheaper at the same rate THEN nothing really changes.

Things getting MUCH BETTER:

• communication speed & cost 1,000x
• processor speed & cost 100x
• storage size & cost 100x

Things staying about the same

• speed of light (more or less constant)
• people (10x more expensive)
• storage speed (only 10x better)

Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure, two panels, both plotted against Access Time (seconds, 10^-9 to 10^3). "Size vs Speed": Typical System (bytes), 10^3 to 10^15. "Price vs Speed": $/MB, 10^-4 to 10^4. Levels shown in each panel: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]

Storage Ratios Changed

10x better access time

10x more bandwidth

4,000x lower media price

DRAM/DISK 100:1 to 10:1 to 50:1

[Figure, three panels, each plotted against Year (1980 to 2000) on log scales. "Disk Performance vs Time": access time (ms) and bandwidth (MB/s). "Disk Performance vs Time (accesses/second & Capacity)": Accesses per Second and Disk Capacity (GB). "Storage Price vs Time": $/MB.]

The 5 Minute Rule Derived

M$: cost of a RAM page

\[ M\$ = \frac{\text{RAM \$/MB}}{\text{PagesPerMB} \times \text{Lifetime}} \]

A$: cost of a disk access

\[ A\$ = \frac{\text{DiskPrice}}{\text{AccessesPerSec} \times \text{Lifetime}} \]

RI: Reference Interval, the time between accesses to a page

[Figure: cost ($) vs. Reference Interval (time). The disk access cost A$/RI falls as the interval grows and crosses the constant cost of a RAM page, M$, at the breakeven point M$ = A$/RI.]

Breakeven: M$ = A$ / Reference Interval

\[ \text{Reference Interval} = \frac{A\$}{M\$} = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAMprice} \times \text{AccPerSec}} \]

The Five Minute Rule Observations

Break even has two terms:

(1) Technology term: PageSize / DiskAccPerSec ~ 8KB : 80 = 100:1

(2) Economic term: DiskPrice / RAM_MB_Price ~ 400:4 = 100:1

Economic term trends down

Technology term trends up to compensate.

Still at 5 minute for random, 1 minute sequential

\[ \text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofRAM}} \]
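
As a rough check, the break-even interval can be computed as the product of the technology term and the economic term. This is a minimal sketch, not the speaker's own calculation: the function and parameter names are illustrative, the inputs are the round numbers quoted on this slide (a 400$ disk, 4$/MB RAM, 8 KB pages, 80 accesses per second), and the conversion of 8 KB pages to 128 pages per MB assumes 1 MB = 1024 KB.

```python
def break_even_seconds(pages_per_mb: float,
                       accesses_per_sec: float,
                       disk_price: float,
                       ram_price_per_mb: float) -> float:
    """Reference interval (seconds) at which caching a page in RAM
    costs the same as re-reading it from disk."""
    technology_term = pages_per_mb / accesses_per_sec   # pages per MB per (access/s)
    economic_term = disk_price / ram_price_per_mb       # $ per disk over $ per MB
    return technology_term * economic_term


if __name__ == "__main__":
    pages_per_mb = 1024 / 8  # assumption: 8 KB pages, 1 MB = 1024 KB
    interval = break_even_seconds(pages_per_mb, 80, 400, 4)
    print(f"break-even reference interval: {interval:.0f} s "
          f"({interval / 60:.1f} minutes)")
```

With these round inputs the interval comes out at a few minutes. Pages re-referenced more often than that are cheaper to keep in RAM, and pages referenced less often are cheaper to fetch from disk.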

Shows Best Index Page Size ~16KB

Index Page Utility vs Page Size and Index Element Size

Page Size (KB)       2     4     8    16    32    64   128
16 byte entries   0.64  0.72  0.78  0.82  0.79  0.69  0.54
32 byte entries   0.54  0.62  0.69  0.73  0.71  0.63  0.50
64 byte entries   0.44  0.53  0.60  0.64  0.64  0.57  0.45
128 byte entries  0.34  0.43  0.51  0.56  0.56  0.51  0.41

Index Page Utility vs Page Size and Disk Performance

Page Size (KB)       2     4     8    16    32    64   128
40 MB/s           0.65  0.74  0.83  0.91  0.97  0.99  0.94
10 MB/s           0.64  0.72  0.78  0.82  0.79  0.69  0.54
5 MB/s            0.62  0.69  0.73  0.71  0.63  0.50  0.34
3 MB/s            0.51  0.56  0.58  0.54  0.46  0.34  0.22
1 MB/s            0.40  0.44  0.44  0.41  0.33  0.24  0.16

Standard Storage Metrics

Capacity:

• RAM: MB and $/MB: today at 10MB & 100$/MB
• Disk: GB and $/GB: today at 10 GB and 200$/GB
• Tape: TB and $/TB: today at .1TB and 25k$/TB (nearline)

Access time (latency)

• RAM: 100 ns
• Disk: 10 ms
• Tape: 30 second pick, 30 second position

Transfer rate

• RAM: 1 GB/s
• Disk: 5 MB/s - - - Arrays can go to 1GB/s
• Tape: 5 MB/s - - - striping is problematic

New Storage Metrics: Kaps, Maps, SCAN?

Kaps: How many KB objects served per second

• The file server, transaction processing metric
• This is the OLD metric.

Maps: How many MB objects served per sec

• The Multi-Media metric

SCAN: How long to scan all the data

• The data mining and utility metric

And

• Kaps/$, Maps/$, TBscan/$

For the Record (good 1998 devices packaged in system http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)

                     DRAM     DISK     TAPE robot
Unit capacity (GB)   1        18       35 X 14
Unit price $         4000     500      10000
$/GB                 4000     28       20
Latency (s)          1.E-7    1.E-2    3.E+1
Bandwidth (Mbps)     500      15       7
Kaps                 5.E+5    1.E+2    3.E-2
Maps                 5.E+2    13.04    3.E-2
Scan time (s/TB)     2        1200     70000
$/Kaps               9.E-11   5.E-8    3.E-3
$/Maps               8.E-8    4.E-7    3.E-3
$/TBscan             $0.08    $0.35    $211
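
The Kaps and Maps rows appear to follow from latency and bandwidth alone. The sketch below is a guess at how such rows could be derived, not the author's stated method: it assumes each access costs one latency plus one transfer, that 1 KB = 0.001 MB, and that price is spread over a 3-year lifetime (an assumption borrowed from the sort benchmark rules, not stated on this slide). The class and method names are illustrative.

```python
from dataclasses import dataclass

# Assumption: 3-year depreciation; this slide does not state a lifetime.
LIFETIME_SECONDS = 3 * 365 * 24 * 3600


@dataclass
class Device:
    name: str
    price_dollars: float
    latency_s: float
    bandwidth_mb_per_s: float

    def objects_per_second(self, object_mb: float) -> float:
        """Serial accesses per second: one latency plus one transfer each."""
        return 1.0 / (self.latency_s + object_mb / self.bandwidth_mb_per_s)

    def kaps(self) -> float:
        return self.objects_per_second(0.001)  # 1 KB objects

    def maps(self) -> float:
        return self.objects_per_second(1.0)    # 1 MB objects

    def dollars_per_access(self, rate: float) -> float:
        """Price divided by every access the device could serve in its lifetime."""
        return self.price_dollars / (rate * LIFETIME_SECONDS)


# Unit price, latency, and bandwidth as listed in the table.
DEVICES = [
    Device("DRAM", 4000, 1e-7, 500),
    Device("DISK", 500, 1e-2, 15),
    Device("TAPE robot", 10000, 3e1, 7),
]

if __name__ == "__main__":
    for d in DEVICES:
        print(f"{d.name:10s} Kaps={d.kaps():.3g} Maps={d.maps():.4g} "
              f"$/Kaps={d.dollars_per_access(d.kaps()):.1g} "
              f"$/Maps={d.dollars_per_access(d.maps()):.1g}")
```

Under these assumptions the disk works out to about 13.04 Maps and roughly 100 Kaps, in line with the table. The $/TBscan row is left out because its derivation is not clear from the slide.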

[Figure: log-scale bar chart (1.E-12 to 1.E+06) of the same 1998 figures for DRAM, DISK, and TAPE robot (X 14): $/GB, Bandwidth (Mbps), Kaps, Maps, Scan time (s/TB), $/Kaps, $/Maps, $/TBscan.]

How To Get Lots of Maps, SCANs

Parallelism: use many little devices in parallel

Beware of the media myth

Beware of the access time myth

1 Terabyte at 10 MB/s: 1.2 days to scan

1 Terabyte, 1,000 x parallel: 100 seconds SCAN.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
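
The scan times quoted on this slide follow from dividing the data size by the aggregate bandwidth. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly even striping across devices; the function name is illustrative.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6


def scan_seconds(data_bytes: float,
                 bytes_per_sec_per_device: float,
                 devices: int = 1) -> float:
    """Time to read all the data once when it is spread evenly over `devices`."""
    return data_bytes / (bytes_per_sec_per_device * devices)


if __name__ == "__main__":
    serial = scan_seconds(1 * TB, 10 * MB)
    parallel = scan_seconds(1 * TB, 10 * MB, devices=1000)
    print(f"one device:    {serial:.0f} s = {serial / 86400:.2f} days")
    print(f"1,000 devices: {parallel:.0f} s")
```

The single-device case takes 100,000 seconds, about 1.2 days, and the 1,000-way parallel case takes 100 seconds.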

The Disk Farm On a Card

The 1 TB disc card: an array of discs

Can be used as

• 100 discs
• 1 striped disc
• 10 Fault Tolerant discs
• ....etc

LOTS of accesses/second, bandwidth

[Figure: a 14" card holding the disc array.]

Life is cheap, it’s the accessories that cost ya.

Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).

Tape Farms for Tertiary Storage, Not Mainframe Silos

Many independent tape robots (like a disc farm). Scan in 27 hours.

One robot: 10K$, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps

100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr Scan

Tape & Optical: Beware of the Media Myth

Optical is cheap: 200 $/platter 2 GB/platter => 100$/GB (2x cheaper than disc)

Tape is cheap: 30 $/tape 20 GB/tape => 1.5 $/GB (100x cheaper than disc).

Tape & Optical Reality: Media is 10% of System Cost

Tape needs a robot (10 k$ ... 3 M$) with 10 ... 1000 tapes (at 20GB each) => 20$/GB ... 200$/GB (1x…10x cheaper than disc)

Optical needs a robot (100 k$) with 100 platters = 200GB (TODAY) => 400 $/GB (more expensive than mag disc)

Robots have poor access times

Not good for Library of Congress (25TB)

Data motel: data checks in but it never checks out!

The Access Time Myth

The Myth: seek or pick time dominates

The reality:

(1) Queuing dominates
(2) Transfer dominates BLOBs
(3) Disk seeks often short

Implication: many cheap servers better than one fast expensive server

• shorter queues
• parallel transfer
• lower cost/access and cost/byte

This is now obvious for disk arrays

This will be obvious for tape arrays

[Figure: two access timelines, one showing Seek, Rotate, Transfer and one showing Wait, Seek, Rotate, Transfer.]

Penny Sort Ground Rules

http://research.microsoft.com/barc/SortBenchmark

How much can you sort for a penny.

• Hardware and Software cost
• Depreciated over 3 years
• 1M$ system gets about 1 second,
• 1K$ system gets about 1,000 seconds.
• Time (seconds) = 946,080 / SystemPrice ($)

Input and output are disk resident

Input is

• 100-byte records (random data)
• key is first 10 bytes.

Must create output file and fill with sorted version of input file.

Daytona (product) and Indy (special) categories
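
The time budget follows from the depreciation rule: three years is 94,608,000 seconds, so one penny buys one hundredth of (lifetime / system price) seconds. A small sketch, assuming 365-day years; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day years
DEPRECIATION_YEARS = 3


def penny_seconds(system_price_dollars: float) -> float:
    """Seconds of machine time that one penny buys under 3-year depreciation."""
    lifetime_seconds = DEPRECIATION_YEARS * SECONDS_PER_YEAR  # 94,608,000
    return lifetime_seconds / (100 * system_price_dollars)


if __name__ == "__main__":
    for price in (1_000_000, 1_000, 1_107):
        print(f"{price:>9,}$ system: {penny_seconds(price):8.1f} s per penny")
```

A 1M$ system gets just under 1 second and a 1K$ system gets about 946 seconds, which agrees with the rounded figures in the rules.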

PennySort

Hardware

• 266 MHz Intel PPro

• 64 MB SDRAM (10ns)

• Dual Fujitsu DMA 3.2GB EIDE disks

Software

• NT workstation 4.3

• NT 5 sort

Performance

• sort 15 M 100-byte records (~1.5 GB)

• Disk to disk

• elapsed time 820 sec

—cpu time = 404 sec

[Figure: pie chart of the PennySort Machine (1107$) cost: cpu 32%, Disk 25%, board 13%, Memory 8%, Other 22% (Cabinet + Assembly 7%; Network, Video, floppy 9%; Software 6%).]

Sort Speed Doubles Every Year

[Figure: log-scale plot (1E-3 to 1E+8) against year (1985 to 2000) with two series: Records Sorted per Second (doubles every year) and GB Sorted per Dollar (doubles every year).]

Recent Results

NOW Sort: 9 GB on a cluster of 100 UltraSparcs in 1 minute

MillenniumSort: 16x Dell NT cluster: 100 MB in 1.8 Sec (Datamation)

Tandem/Sandia Sort: 68 CPU ServerNet 1 TB in 47 minutes

Rumor of IBM Sort: 7000 cpu Blue Pacific 1 TB in 1024 seconds (17 minutes). 10 Mrps (1GBps)
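
To compare these results on one scale, each can be converted to a throughput. The sketch below assumes decimal units and 100-byte records for every entry, as in the benchmark rules; the dictionary labels and the function name are illustrative, and the figures are the ones quoted above.

```python
RECORD_BYTES = 100  # assumption: every result uses the benchmark's 100-byte records


def sort_rate(total_bytes: float, seconds: float) -> tuple[float, float]:
    """Return (MB per second, million records per second)."""
    mb_per_s = total_bytes / seconds / 1e6
    mrps = total_bytes / RECORD_BYTES / seconds / 1e6
    return mb_per_s, mrps


# (bytes sorted, elapsed seconds) as quoted on the slide
RESULTS = {
    "NOW Sort (100 UltraSparcs)": (9e9, 60),
    "16x Dell NT cluster": (100e6, 1.8),
    "Tandem/Sandia (68 CPU)": (1e12, 47 * 60),
    "IBM Blue Pacific (rumor)": (1e12, 1024),
}

if __name__ == "__main__":
    for name, (size, secs) in RESULTS.items():
        mb_s, mrps = sort_rate(size, secs)
        print(f"{name:28s} {mb_s:8.1f} MB/s {mrps:6.2f} Mrps")
```

The rumored 1 TB in 1024 seconds works out to just under 1 GB/s and just under 10 million records per second, which agrees with the rounded "10 Mrps (1GBps)".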
