On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011 Allan Darling Deputy Director NCEP Central Operations Where America’s Climate, Weather, Ocean and Space Weather Services Begin


Transcript of On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Page 1: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

On-Time Product Delivery
COPC - HPCC Best Practices

14-15 March 2011

Allan Darling
Deputy Director

NCEP Central Operations

Where America’s Climate, Weather, Ocean and Space Weather Services Begin

Page 2: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011


On-Time Product Delivery

Page 3: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

NCEP Mission

NCEP delivers science-based environmental predictions to the Nation and the global community. We collaborate with partners and customers to produce reliable, timely, and accurate analyses, guidance, forecasts and warnings for the protection of life and property and the enhancement of the national economy.

NCEP Goals and Strategies

• Information Systems
  – Enhance the real-time, on-time, all the time access, display and delivery of NCEP products and services.

Page 4: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

On-Time Product Delivery

The principal performance metric for NCEP Operational Supercomputing, measured since 1999

Underlying Philosophy

Product delivery is the last event in the whole modeling process. To deliver on time, the entire chain of events must work as intended.

One Measurement of Operational Success

Page 5: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Incentives for Capability

Measurement Area | Indicator | 2010 Baseline | Calendar Year 2010
Customer Results | 1-day Precipitation Forecast threat score | 29 | 35
Customer Results | Seasonal Heidke Temperature skill score | 19 | 18
Mission and Business Results | 48-Hour Hurricane Tracking Forecast | 142 miles | 95 nm*
Mission and Business Results | 48-hr Hurricane tracking intensity Forecast | 14 knots | 14.7*
Processes and Activities | On Time Product Generation | 99.92% | 99.85%
Technology | System Availability | 99% | 99.98%
Technology | Time to Switch to Backup System | 30 min. | 9.6 min

Comments: *The final outcomes of these measures are reported at the end of hurricane season which is Nov. 30.

Page 6: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011


On-Time Product Delivery

Dual System CM & Ops Practice Refinement

Page 7: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011


Enabling the Capability

System Architecture

Technical Practice

High Availability Configuration Management

Operations Practices

On-Time Product Delivery

Metrics

Page 8: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Technical Practice
Measurement

• Products are “on time” if they are released within 15 minutes of their assigned target delivery time

• Target delivery times are based on 30-day average availability times of products

• Target delivery times are adjusted as needed
  – Model changes
  – System changes

• New products added as part of the model implementation process

• Timeliness measured for ~720,000 products today
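The measurement rule on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not NCEP's implementation: the function names, the choice of "seconds after cycle start" as the time unit, and the sample values are all hypothetical, and "within 15 minutes" is read here as "no more than 15 minutes after the target".

```python
from statistics import mean

ON_TIME_WINDOW_S = 15 * 60  # 15-minute window from the slide, in seconds


def target_delivery_time(availability_history_s):
    """Target = average of the most recent 30 daily availability times.

    Each entry is seconds after the start of the model cycle
    (an assumed unit chosen for this sketch).
    """
    recent = availability_history_s[-30:]
    if not recent:
        raise ValueError("no availability history for this product")
    return mean(recent)


def is_on_time(release_s, target_s, window_s=ON_TIME_WINDOW_S):
    """Assumed reading: on time if released no later than target + window."""
    return release_s - target_s <= window_s


# Hypothetical product that is usually available about 3h30m after cycle start.
history = [12600, 12540, 12660] * 10  # 30 illustrative daily values
target = target_delivery_time(history)

print(is_on_time(target + 14 * 60, target))  # 14 minutes after target
print(is_on_time(target + 16 * 60, target))  # 16 minutes after target
```

Adjusting a target after a model or system change would, in this sketch, amount to rebuilding the history from runs made after the change.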

Page 9: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Technical Practice
Measurement

• Some products are excluded from measurement
  – Inconsistent delivery times (e.g. on-demand dispersion models)
  – Not delivered through operational dissemination services

• Measurement performed daily at 1200Z
  – Entire previous day
  – First half of current day
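A small Python sketch of the daily measurement window and the two exclusions listed on this slide. The field names `fixed_schedule` and `operational_dissemination`, the product identifiers, and the half-open interval convention are assumptions made for illustration; the slides do not describe how the measurement is coded.

```python
from datetime import date, datetime, time, timedelta, timezone


def measurement_window(run_date):
    """Window covered by the report produced at 1200Z on run_date.

    Spans the entire previous day plus 00Z to 12Z of the current day,
    treated here as the half-open interval [start, end) in UTC.
    """
    start = datetime.combine(run_date - timedelta(days=1), time(0, 0), tzinfo=timezone.utc)
    end = datetime.combine(run_date, time(12, 0), tzinfo=timezone.utc)
    return start, end


def is_measured(product):
    """Apply the two exclusions from the slide (field names are hypothetical)."""
    return product["fixed_schedule"] and product["operational_dissemination"]


start, end = measurement_window(date(2011, 3, 3))

products = [
    {"id": "example_model_grid", "fixed_schedule": True, "operational_dissemination": True},
    {"id": "example_on_demand_dispersion", "fixed_schedule": False, "operational_dissemination": True},
]
measured = [p["id"] for p in products if is_measured(p)]
```

For a run on 3 March 2011 the window starts at 00Z on 2 March and ends at 12Z on 3 March, 36 hours in total.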

Page 10: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Operations Practice

• Daily Meeting to review:
  – Operations log
  – Status of open issues
  – On time delivery metrics
  – Calendar of planned events

• Weekly Meeting with HPC vendors to review:
  – Facility and system status
  – System utilization
  – Vendor open issues

Page 11: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

[Figure: Percent On-Time Product Creation for Stratus/Cirrus. Series: <15 Minutes, <10 Minutes, <5 Minutes. X-axis: Time (GMT), 12:10:00 on 3/2/2011 to 11:40:00 on 3/3/2011 in 30-minute steps. Y-axis: Percent, 0 to 100.]

Percent less than 15 min delayed for Wednesday (00Z to 23:59Z): 95.1258

Percent less than 15 min delayed for Thursday (00Z to 12Z): 100

Page 12: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011
Page 13: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

[Figure: On-time Product Delivery. Series: Daily On-time, Month-to-date. X-axis: Date, 20110201 to 20110228. Y-axis: Products Delivered On-time (%), 80 to 100.]

Month-to-Date On Time: 99.334%

2/15/2011: 06Z HI and AK smoke products MISSING due to firewall problem that created comms loss; 06Z HIRESW products 85 minutes late due to time out from over-loaded node

2/16/2011: 12Z GFS, 12Z GEFS, 12Z WAVE, 12Z HIRESW, 14Z RUC, 15Z SREF, 15Z/20Z RTMA, 16Z/17Z/18Z LAMP, 18Z NAM, and 18Z GFS products 15-150 minutes late due to CCS issues

2/17/2011: 06Z HIRESW and 06Z GFS storm surge files 18-36 minutes late due to landing on a problematic node

2/19/2011: 18Z GFS, 18Z HIRESW, and 18Z OMB files 15-42 minutes late due to gpfs resource contention

2/28/2011: 12Z ECMWF products 19 minutes late due to implementation glitch on ftp server
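The daily and month-to-date series plotted on this slide could be derived along the following lines. This is a sketch only: the slides do not say how the month-to-date figure is computed, so the code assumes it is the cumulative on-time count divided by the cumulative product count (weighted by volume, not an average of daily percentages). The function name and the sample counts are hypothetical and are not NCEP figures.

```python
def month_to_date(daily_counts):
    """daily_counts: list of (on_time, total) per day, in date order.

    Returns a list of (daily_pct, month_to_date_pct) tuples.
    """
    results = []
    cum_on_time = 0
    cum_total = 0
    for on_time, total in daily_counts:
        if total <= 0:
            raise ValueError("each day needs at least one measured product")
        cum_on_time += on_time
        cum_total += total
        daily_pct = 100.0 * on_time / total
        mtd_pct = 100.0 * cum_on_time / cum_total  # assumed volume-weighted
        results.append((daily_pct, mtd_pct))
    return results


# Illustrative counts only: one bad day out of three.
sample = [(1000, 1000), (950, 1000), (1000, 1000)]
series = month_to_date(sample)

for day, (daily_pct, mtd_pct) in enumerate(series, start=1):
    print(f"day {day}: daily {daily_pct:.3f}%  month-to-date {mtd_pct:.3f}%")
```

Under this assumption a single bad day pulls the month-to-date line down sharply early in the month and less so later, because the denominator keeps growing.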

Page 14: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

[Figure: On-time Product Delivery. Series: Daily On-time, Month-to-date. X-axis: Date, 20110101 to 20110131. Y-axis: Products Delivered On-time (%), 98 to 100.]

Month-to-Date On Time: 99.953%

1/28/2011: 00Z ECMWF data was corrupt or missing. All data was then resent. All data arrived 417 minutes late; 15Z SREF products and 18Z NAM products 15-32 minutes late due to silent failure of 15Z SREF

1/31/2011: 00Z UKMET products 88 minutes late to MISSING due to dataflow problems at UKMET

Page 15: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011


On-Time Product Delivery

Dual System CM & Ops Practice Refinement

Page 16: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

CM Incentive

• Backup supercomputer implemented, with associated IT infrastructure and requirements
  – Network between systems
  – System configuration synchronization
  – Coordinated model implementations
  – Failover capability

Expectation – Better Performance

Reality – Greater Complexity

Page 17: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Configuration Management

• Ensure system integrity

• Weekly meeting to review executed and proposed changes

• Before change occurs…
  – Validate and test
  – Schedule appropriately
  – Review and approve
  – Communicate with customers

• After change occurs…
  – Identify and communicate outcomes

Page 18: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Configuration Management

• Covers all NCO IT practice, not just supercomputers

• Includes NWS and other partners

• Full-time staff (primary and backup)

• Weekly tempo with daily tie-in to operations

Page 19: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011


On-Time Product Delivery

Dual System CM & Ops Practice Refinement

Page 20: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

CM Evolution w/ On-time Feedback

[Figure: monthly on-time percentage, 2004 through 2010. X-axis: month. Y-axis: Percent on Time, 98.0 to 100.0. Annotated phases: First CM Attempt, CM Process Focus, CM Refinement.]

Page 21: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

[Figure: RFC Classification and Rate of Problems. Legend: Accelerated Instability, Accelerated High Benefit, Routine, Significant, Major, Problematic, Withdrawn before Implementation, Withdrawn after Implementation. X-axis: Week (5 through 51, then 1 through 7). Two y-axes labeled Executed Changes and Problematic or withdrawn, with tick ranges 0 to 120 and 0 to 30. Annotations on the plot: GEFS, Q2, Q3, Q4, GFS, Q1, AQFS HI&AK, SSTOIQD, GENESIS (5), MAG (4), WSR_88d (2), GLOFS & NAEFS, NDFD & RTMA, Q2.]

Change Metrics

Last 12 months – 15 changes withdrawn out of 1004
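As a quick check of what that ratio means, the arithmetic can be written out in Python. The variable names are illustrative, and the sketch assumes 1004 is the total number of changes counted over the 12 months.

```python
withdrawn = 15        # changes withdrawn in the last 12 months (from the slide)
total_changes = 1004  # changes in the same period (from the slide)

withdrawal_rate_pct = 100.0 * withdrawn / total_changes
retained_pct = 100.0 - withdrawal_rate_pct

print(f"withdrawn: {withdrawal_rate_pct:.2f}%")  # prints "withdrawn: 1.49%"
print(f"not withdrawn: {retained_pct:.2f}%")     # prints "not withdrawn: 98.51%"
```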

Page 22: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Ancillary Benefits

• Daily review
  – Identifies performance problems before customers are affected
  – Reveals silent failures

• Weekly & Monthly Reviews
  – Identify system management gaps
  – Identify model instability

Page 23: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

On-Time Product Delivery

Yearly Average
2006: 99.42%
2007: 99.70%
2008: 99.82%
2009: 99.85%
2010: 99.83%

Page 24: On-Time Product Delivery COPC - HPCC Best Practices 14-15 March 2011

Questions / Discussion