Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray...

28
Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution 2. What California proposed Co workers: Mike Stonebraker: Producer / Director / Script Writer / Propeller Head Bill Farrell: Ramrod and Computer-literate DirtBag Jeff Dozier: Godfather Special effects: Earth Science: Frank Davis, C. Roberto Mechoso, Jim Frew Computer Science: Reagan Moore, Jim Gray, Joe Pasquale Administration: Claire Mosher

Transcript of Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray...

Page 1: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

1

EOSDIS Alternate Architecture StudyJim Gray

McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com

1. Background - problem and proposed solution

2. What California proposed

Co workers:

Mike Stonebraker: Producer / Director / Script Writer / Propeller Head

Bill Farrell: Ramrod and Computer-literate DirtBag

Jeff Dozier: Godfather

Special effects:

Earth Science: Frank Davis, C. Roberto Mechoso, Jim Frew

Computer Science: Reagan Moore, Jim Gray, Joe Pasquale

Administration: Claire Mosher

Writing: Stephanie Sides

Prototypes: many-many....people

Page 2: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

2

What’s The Problem?Antarctica is melting -- 77% of fresh water liberated

=> sea level rises 70 meters => Chico & Memphis are beach front propertyNew York, Washington, SF, LA, London, Paris

Let’s study it! Mission to Planet Earth

EOS: Earth Observing System (17B$ => 10B$) 50 instruments on 10 satellites 1997-2001

Plus Landsat (added later)

EOS DIS: Data Information System:3-5 MB/s raw, 30-50 MB/s processed.4 TB/day, 15 PB by year 2007

Issues

How to store it?

How to serve it to users?

Page 3: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

3

What Happened?1986: Mission to Planet Earth

1989: Bids from Hughes & TRW

1993: Contract Grant, Public Review:

customers do not want it (tape/mainframe centric)

1994: Alternate Architecture

Three “outside teams”Wyoming: Internet 20,000,000

Maryland: Software Engineering

California: DB centric

One “home team” CORBA & Z 39.50 & UNIX

1995: Drifting in the Sequoia direction

Page 4: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

4

The Hughes Plan8 DAACs (Data Active Archive Centers) = Bytes

(one per congressional district?)

N SCFs (Scientific Computation Facilities) = MIPS(typically instrument or science teams)

Thin wires among them

90% of DAAC processing is PULLbuilding standard data products

fixed pipeline: calibrate, grid, derive

Typical subscriber gets tapes or CDroms(standard data products)

One “chauffeur” per 10 customers (high ops costs)

Build everything (operations, HSM, DBMS,...) from scratchCORBA and Z 39.50 is the glue.

Criticism: not evolvable, not open, not online, not useful.

Page 5: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

5

What California Proposed0. Design for success: expect that millions will use the system (online)1. DBMS centric design automates discovery, access, management2. Object relational databases enable

Automate access to data so that the NASA 500, Global Change 10,000 and Internet 20,000,000 can use system.

Cache popular results, not all results (saves 3x or more) Compute on demand (saves lots of storage and cpu).

Emphasize pull processing rather than push processing.Use parallelism to get scaleup.

Do Batch as a data pump

3 Be Smart Shoppers: Use COTS hardware/software (saves 400M$)

Just-in-time acquisition (saves 400M$)Use workstation not mainframe technology (gives 10x more stuff)Depreciate over 3 years (ends in 2007 with "fresh" equipment)

4. 2 + N node architecture 2 Super DAACs for fault tolerance and for growth.

Unify the 2 "big" data storage centers with 2 big data analysis centers.Allow many “little” peer-DAACs at science/user groups

Page 6: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

6

Meta-Model for Sequoia ProposalBe technological optimists:

couldn’t build it today, count on progress.

ride technology wave (= not water cooled)

Buy or Seed, do not build.

Use COTS where possible

Fund 2 or more COTS vendors if need product

OR DBMS

HSM

Operations.

Replace people with technology (= OR DBMS):

automate data discovery, access, visualization

DBMS Centric view.

Page 7: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

7

DBMS Centric ViewThis is a database problem (no kidding)!

This is not

a file system problem (file wrong abstraction)

a rpc problem (CORBA wrong abstraction)

a Z 39.50 problem (Z 39.50 is a FAP).

This is a operations problem

Hierarchical storage management

Network management

Source code control

client-server tools.

You can BUY all this stuff. Fund COTS.

BUILD AS LITTLE AS POSSIBLE

Page 8: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

8

What California Proposed0. Design for success: expect that millions will use the system (online)1. DBMS centric design automates discovery, access, management2. Object relational databases enable

Automate access to data so that the NASA 500, Global Change 10,000 and Internet 20,000,000 can use system.

Cache popular results, not all results (saves 3x or more) Compute on demand (saves lots of storage and cpu).

Emphasize pull processing rather than push processing.Use parallelism to get scaleup.

Do Batch as a data pump

3 Be Smart Shoppers: Use COTS hardware/software (saves 400M$)

Just-in-time acquisition (saves 400M$)Use workstation not mainframe technology (gives 10x more stuff)Depreciate over 3 years (ends in 2007 with "fresh" equipment)

4. 2 + N node architecture 2 Super DAACs for fault tolerance and for growth.

Unify the 2 "big" data storage centers with 2 big data analysis centers.Allow many “little” peer-DAACs at science/user groups

Page 9: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

9

Design for Success: Expect Lots of UsersExpect that millions will use the system (online)

Three user categories:

NASA 500 -- funded by NASA to do science

Global Change 10 k - other dirt bags

Internet 20 m - everyone elseGrain speculatorsEnvironmental Impact ReportsNew applications

=> discovery & access must be automatic

Allow anyone to set up a peer-DAAC & SCFDesign for Ad Hoc queries, Not Standard Data Products

If push is 90%, then 10% of data is read (on average).

=> A failure: no one uses the data, in DSS, push is 1% or less.

=> computation demand is 100x Hughes estimate (pull is 10x to 100x greater than push)

Page 10: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

10

The Process FlowData arrives and is pre-processed.

instrument data is calibrated,

gridded

averaged

Geophysical data is derived

Users ask for stored data

OR to analyze and combine data.

Can make the pull-push split dynamically

Pull Processing Push ProcessingOther Data

Page 11: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

11

The Software Model: Global View

SQL* is the FAP and API.

Applications use it to access data.

It includes

stored procedures

(so RPC)

GC class libraries

Computation is data driven

Gateways for other interfaces

HTTP, Z 39.50, Corba & COM

TP or TP-lite manages workflow

COTS middleware

client program

client program

other protocol

gateway

gateway

SQL–*SQL–*

SQL–*

non-SQL–* DBMS on

peerDAAC orsuperDAAC or

non-EOSDIS system

SQL–*COTS SQL-*local DBMS

on peerDAAC orsuperDAAC other protocol

Page 12: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

12

Automate access to data Invest in:

Design global change schema.cooperate with standards groups.

OR DBMS class libraries for GC datatypes

Develop browser to do resource discovery

Community will develop access & vis tools

OR DBMS will do

PUSH processing: triggers and workflow

PULL processing: query optimization.

(some assembly required).

Page 13: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

13

How Well Did SQL Work?Bill Farrell and others did 30 user scenarios schema,

application, SQL, performance

Snow cover, CO2, GCM,...

Avg ad hoc scenario generated about 30% of

EOSDIS baseline processing

=> validated PULL over PUSH demand

SQL was indeed a power tool:

Many scenarios became a few simple SQL queries:

Need a spatial & temporal SQL.

Personal view:

It’s great!, much better than Farrell or I expected.

Page 14: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

14

Compute on demand90% of data is NEVER used (according to Hughes).

Some data is used only once.

Data is often re-calculated

repair hardware/software bugs,

new & better algorithms

Optimization: store only popular data.

Compute this based on past use(of this data and related data)

Balance two costs:1. Re_Compute_Cost / Re_Use_Interval2. Storage_Cost x Re_Use_Interval

Recompute is often cheaper (saves 3x we think).

Page 15: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

15

Use parallelism to get scaleup.Many queries look at 100s or 1,000s of data tiles.

e.g. Berkeley weekly Landsat images since 1972.

= 1000 tape accesses.

= 4,000 tape minutes = 6 days.

Done 1,000 way parallel: = 4 minutes.Disk & tape demands are huge: multi-GOX

Computation demands are huge: tera-ops.

Only solution:

Use parallel execution

Use parallel data access

SQL* does this for you automatically.

Page 16: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

16

Data PumpCompute on demand small jobs

less than 1,000 tape mountsless than 100 M disk accessesless than 100 TeraOps.(less than 30 minute response time)

For BIG JOBS scan entire 15PB database once a day /week

Any BIG JOB can piggyback on this data scan.

DAAC in 2007:

15 PB of Tape Robot

1 PB of Disk

10-TB RAM 500 nodes

10,000 drives

1,000 robots

Page 17: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

17

What California Proposed0. Design for success: expect that millions will use the system (online)1. DBMS centric design automates discovery, access, management2. Object relational databases enable

Automate access to data so that the NASA 500, Global Change 10,000 and Internet 20,000,000 can use system.

Cache popular results, not all results (saves 3x or more) Compute on demand (saves lots of storage and cpu).

Emphasize pull processing rather than push processing.Use parallelism to get scaleup.

Do Batch as a data pump

3 Be Smart Shoppers: Use COTS hardware/software (saves 400M$)

Just-in-time acquisition (saves 400M$)Use workstation not mainframe technology (gives 10x more stuff)Depreciate over 3 years (ends in 2007 with "fresh" equipment)

4. 2 + N node architecture 2 Super DAACs for fault tolerance and for growth.

Unify the 2 "big" data storage centers with 2 big data analysis centers.Allow many “little” peer-DAACs at science/user groups

Page 18: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

18

Use COTS hardware/software (saves 400M$)

Defense contractors want to build (and maintain) stuff.(they do it for the money)

Fund SQL* (SQL-2007): Object-Relational (extensible)

supports Global Change data types

Automates access

Reliable storage

Tertiary storage

Parallel data search (automatic)

Workflow (job control)

Reliable

Fund Operations software companies (Tivoli...)

Page 19: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

19

Use workstation technology (NOW)Use workstation hardware technology,

not Super Computers

0.5$/MB of disk vs 30$/MB of disk

100$/MIPS vs 18,000$/MIPS

3k$/tape drive vs 50k$/tape drive

Processor, Disk, Tape ARRAYS: connected by ATM

a NOW

Gives 10x (?100x) more stuff for same dollars

Allows ad hoc query load

Allows a scaleable design

Allows same hardware: SuperDAACs = PeerDAACs

Page 20: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

20

Use workstation technology (NOW)Study used RS/6000 and DEC 7000 as workstation

(they are 100k$/slice).

Should have used Compaq.

Price for 20GFlop, 24 TB disk, 2PB tape TODAY

Compaq/DLT prices computed by Gray.10% Peer DAAC costs 3M$ today, 1% Micro DAAC (200TB) costs 300K$

Type ofplatform

Maxnumberof daysto readarchive

Numberof dataservers

Numberof

compute

servers

Numberof taperobots

Dataserver

cost($M)

Diskcachecost($M)

Archive tapecost($M)

Computeplatform

cost($M)

Totalhardware

cost ($M)

WS/NTP 27.7 18 106 8 $29.5 $31.5 $12.8 $55.7 $129.5

Vector/NTP 27.7 2 4 8 $31.0 $31.5 $12.8 $128.0 $203.3

Compaq/dlt 1 100 200 1000 $4 $12 $6 $8 $30

Page 21: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

21

Just-in-time acquisition (saves 400M$)Hardware prices decline 20%-40%/year

So buy at last moment

Buy best product that day: commodity

Depreciate over 3 years so that facility is fresh. (after 3 years, cost is 23% of original). 60% decline peaks at 10M$

1996

EOS DIS Disk Storage Size and Cost

1994 1998 2000 2002 2004 2006 2008

Storage Cost M$

Data Need TB

1

10

10

10

10

10

2

3

4

5 (assume 40% price decline/year and 10% on disk.)

Page 22: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

22

What California Proposed0. Design for success: expect that millions will use the system (online)1. DBMS centric design automates discovery, access, management2. Object relational databases enable

Automate access to data so that the NASA 500, Global Change 10,000 and Internet 20,000,000 can use system.

Cache popular results, not all results (saves 3x or more) Compute on demand (saves lots of storage and cpu).

Emphasize pull processing rather than push processing.Use parallelism to get scaleup.

Do Batch as a data pump

3 Be Smart Shoppers: Use COTS hardware/software (saves 400M$)

Just-in-time acquisition (saves 400M$)Use workstation not mainframe technology (gives 10x more stuff)Depreciate over 3 years (ends in 2007 with "fresh" equipment)

4. 2 + N node architecture 2 Super DAACs for fault tolerance and for growth.

Unify the 2 "big" data storage centers with 2 big data analysis centers.Allow many “little” peer-DAACs at science/user groups

Page 23: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

23

2+N DAAC architecture2 Super-DAACs Have 2 BIG sites which

Each store ALL the data (back each other up)

no other way to archive these 15 PB databases

Each service 1/2 the queries and run a data pump

Each produces 1/2 the standard data products

Each has a BIG MIP farm next to the Byte farm (a SCF science computation facility).

N Peer-DAACs

Each stores part of the data (got from a super DAAC)

Can be NASA sponsored or private.

Same software and hardware as Super-DAACs

Super-DAACs are “banks”, Peer-DAACs are “pubs” careful anything

goes

Page 24: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

24

Minimize Operations Costs

Reduced sites (DAACs) have reduced costs

Use Mosaic, Email, Telephone user support model

Count on vendors to provide:

Network management (NetView & SMTP)

Data replication

Application software version control

Workflow control

Help desk software

More reliable hardware/software

Page 25: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

25

Unify data storage centers with data analysisData analysis (Science Computation Facilities)

need quick & high bandwidth access to DB.

WAN technology is good but not that good.

WAN technology is not free.

=> Co-Locate DAACs and SCFs.

=> two super SCFs, many peer SCFs.

Instrument teams often find a bug or new algorithm

=> reprocess all the base data to make new data set.

=> ripple effect to data consumers

=> must track data lineage.

Page 26: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

26

BudgetWe had a VERY difficult time discovering a budget.

So we did our own.

It was less.

Big savings in operations and development

Hardware savings could give bigger DAACsHughes Sequoia

COTS 108 52Development 207 50Operations 260 120Other 193 100total 766SCFs 312 DAACs 158 SuperTotal 1236

0

100

200

300

400

500

600

700

800

Hughes Sequoia

COTS

Operations

Development

Other

10-year costs

Page 27: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

27

What California Proposed0. Design for success: expect that millions will use the system (online)1. DBMS centric design automates discovery, access, management2. Object relational databases enable

Automate access to data so that the NASA 500, Global Change 10,000 and Internet 20,000,000 can use system.

Cache popular results, not all results (saves 3x or more) Compute on demand (saves lots of storage and cpu).

Emphasize pull processing rather than push processing.Use parallelism to get scaleup.

Do Batch as a data pump

3 Be Smart Shoppers: Use COTS hardware/software (saves 400M$)

Just-in-time acquisition (saves 400M$)Use workstation not mainframe technology (gives 10x more stuff)Depreciate over 3 years (ends in 2007 with "fresh" equipment)

4. 2 + N node architecture 2 Super DAACs for fault tolerance and for growth.

Unify the 2 "big" data storage centers with 2 big data analysis centers.Allow many “little” peer-DAACs at science/user groups

Page 28: Jim Gray 1 EOSDIS Alternate Architecture Study Jim Gray McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com 1. Background - problem and proposed solution.

Jim Gray

28

Challenging Problems

Design the Global Change Schema

Understand data lineage

Build discovery, analysis, visualization tools

Build an OR DBMS Including distributed,

parallel, workflowlazy-eager evaluationtertiary storage, SQLworkflow

Build a decent & reliable HSM

Build a way to operate a 1,000 node NOW.