
U.S. Department of the Interior

U.S. Geological Survey

Contractor for the USGS at the EROS Data Center

EDC CR1 Storage Architecture

August 2003

Ken Gacke, Systems Engineer

(605) 594-6846, [email protected]

Slide 2

Storage Architecture Decisions

Evaluated and recommended through engineering white papers and weighted decision matrices

Requirements Factors
- Reliability – Data Preservation
- Performance – Data Access
- Cost – $/GB, Engineering Support, O&M
- Scalability – Data Growth, Multi-mission, etc.
- Compatibility with current architecture

Program/Project selects best solution
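The weighted decision matrix mentioned above can be sketched in a few lines. The weights and scores below are purely illustrative placeholders, not values from the actual engineering white papers:

```python
# Hypothetical weighted decision matrix for the five requirement factors.
# Weights and option scores are illustrative, not from the deck.
WEIGHTS = {"reliability": 0.30, "performance": 0.25,
           "cost": 0.20, "scalability": 0.15, "compatibility": 0.10}

# Each candidate architecture scored 1 (poor) to 5 (excellent) per criterion.
OPTIONS = {
    "DAS": {"reliability": 3, "performance": 5, "cost": 4, "scalability": 2, "compatibility": 5},
    "NAS": {"reliability": 4, "performance": 3, "cost": 4, "scalability": 3, "compatibility": 4},
    "SAN": {"reliability": 5, "performance": 4, "cost": 2, "scalability": 5, "compatibility": 3},
}

def weighted_score(scores):
    """Sum of each criterion score multiplied by that criterion's weight."""
    return sum(WEIGHTS[c] * s for c, s in scores.items())

# Rank options from highest to lowest weighted score.
ranking = sorted(OPTIONS, key=lambda o: weighted_score(OPTIONS[o]), reverse=True)
```

The Program/Project then selects among the top-ranked options, weighing factors the matrix cannot capture.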

Slide 5

Storage Technologies

Online Storage Characteristics
- Immediate data access
- Server limitations
  - Number of I/O slots
  - System bandwidth
- Cost is linear
  - High-performance RAID – $30/GB using 146GB drives
  - Low-cost RAID – $5/GB using ATA or IDE drives
  - Non-RAID – less than $5/GB using 146GB drives
- Facility costs
  - Disk drives are always powered up
  - Increased cooling requirements
- Life cycle of 3 to 4 years
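"Cost is linear" means total cost is simply the per-gigabyte price times capacity. A minimal sketch using the per-gigabyte figures above (assuming decimal terabytes, 1 TB = 1000 GB):

```python
# Per-gigabyte prices for online storage from the slide (August 2003).
COST_PER_GB = {"high_perf_raid": 30.0, "low_cost_raid": 5.0}

def online_cost(tier, capacity_tb):
    """Linear cost model: price scales directly with capacity (1 TB = 1000 GB)."""
    return COST_PER_GB[tier] * capacity_tb * 1000

# e.g. 10 TB of high-performance RAID at $30/GB comes to $300,000.
```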

Slide 6

Storage Technologies

Online Storage
- Direct Attached Storage (DAS): storage directly attached to a server
- Network Attached Storage (NAS): TCP/IP access to storage, typically via CIFS and NFS
- Storage Area Network (SAN): dedicated high-speed network connecting storage devices; storage devices are disassociated from the server

Slide 7

Storage Technologies

Direct Attached Online Storage
- Disk is directly attached to a single server
- System configuration
  - SCSI or Fibre Channel RAID; Fibre Channel devices are typically SAN-ready
  - Just a Bunch of Disks (JBOD) or Redundant Array of Independent Disks (RAID)
- High performance on the local server
- Manageability
  - Simple configuration
  - Resource reallocation requires a physical move of controllers and disk

Slide 8

Storage Technologies

Direct Attached Online Storage
- Advantages
  - High performance on the local server
  - Good for image processing and database applications
- Disadvantages
  - Data sharing limited to slower network performance
  - Difficult to reallocate resources to other servers

Slide 9

Storage Technologies

[Diagram: Direct Attached configuration – Hosts A, B, and C each own a private file system over 100MB Fibre Channel; hosts share data over a 100Mb network (FTP/NFS)]

Slide 10

Storage Technologies

NAS Online Storage
- Disk attached to a server, accessible over a TCP/IP network
- System configuration
  - Fibre Channel RAID configurations
  - Switched network environment
- Performance: network switches and/or dedicated network topologies
- Reliability
  - NAS server performs a single function, thereby reducing faults
  - RAID, mirror, and snapshot capabilities
- Easy to manage

Slide 11

Storage Technologies

Network Attached Online Storage
- Advantages
  - Easy to share files among servers
  - Network storage supports NFS and CIFS
  - Servers can use existing network infrastructure
  - Good for small-file sharing such as office automation
  - Fault protection available, such as snapshot and mirroring
- Disadvantages
  - Slower performance due to TCP/IP overhead
  - Increases network load
  - Backup/restore to tape may be difficult and/or slow
  - Does not integrate with nearline storage

Slide 12

Storage Technologies

[Diagram: Network Attached configuration – Hosts A, B, and C share files from a NAS server over a 1Gb network (NFS/CIFS); the file systems reside on the NAS server]

Slide 15

Storage Technologies

SAN Online Storage
- Disk attached within a fabric network
- System configuration: Fibre Channel RAID configurations
- Scalable, high performance
- High reliability with redundant paths
- Manageability
  - Configuration becomes more complex
  - Logical reallocation of resources

Slide 17

Storage Technologies

[Diagram: Redundant SAN configuration – Hosts A, B, and C (one running DMF) connect through dual Fibre switches to shared storage; hosts also share a 100Mb network]

Slide 18

Storage Technologies

SAN Online Storage Architecture
- Disk farm
  - Multiple servers share a large disk farm
  - Each server mounts unique file systems
- Clustered file systems
  - Multiple servers share a single file system
  - Software required – vendor solutions include SGI CXFS, ADIC StorNext File System, and Tivoli SANergy

Slide 19

Storage Technologies

[Diagram: Disk farm SAN configuration – Hosts A, B, and C connect through a Fibre switch to a shared disk farm, allowing logical reallocation of disk; hosts share a 100Mb network]

Slide 20

Storage Technologies

[Diagram: Cluster SAN configuration – Hosts A, B, and C run CXFS and mount a single clustered file system through a Fibre switch; hosts share a 100Mb network]

Slide 21

Storage Technologies

SAN Risks
- Cost is higher than DAS/NAS
- Technology maturity
  - Solutions are typically vendor-specific
  - Application software dependencies
- Infrastructure support
  - Complexity of architecture
  - Management of SAN resources
  - Sharing of storage resources across multiple Programs/Projects

Slide 22

Storage Technologies

SAN Benefits
- Administration flexibility
  - Logically move disk space among servers
  - Large-capacity drives can be sliced into smaller file systems
  - Scales better than direct attach
  - Integrates within a nearline configuration
- Data reliability
  - Storage disassociated from the server
  - Fault tolerant with redundant paths
- Increased resource utilization
  - Reduces the number of FTP network transfers
  - Logically allocate space among servers

Slide 23

Storage Technologies

[Diagram: SAN with nearline configuration – Hosts A, B, and C run CXFS (one host running DMF/CXFS) against a clustered file system through a Fibre switch, with a tape library behind DMF; hosts share a 1Gb network]

Slide 24

Online/Nearline Cost Comparison

[Chart: 5-year cost in $1000s (y-axis 0–4000) for 5TB, 10TB, 20TB, 40TB, and 80TB capacities, comparing Performance RAID, Bulk RAID, PH 9840C, and PH 9940B; the tape options make use of existing infrastructure (CR1 silo)]

Slide 25

Storage Technologies

Bulk RAID Storage Considerations
- Manageability
  - Server connectivity constraints
  - Many "islands" of storage
  - Multiple storage management utilities
  - Multiple vendor maintenance contracts
- Data reliability
  - Loss of an online file system requires a full restore from backup
  - On average, could restore one to two terabytes per day
- Performance: multiple-user access will reduce performance
- Life cycle: disk storage life cycle is shorter than tape technologies
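The one-to-two-terabytes-per-day restore figure can be sanity-checked against a sustained tape transfer rate. At roughly 12 MB/sec (the average rate reported later in this deck), streaming one terabyte from tape takes about a day:

```python
def restore_days(capacity_gb, rate_mb_per_sec):
    """Days needed to stream capacity_gb from tape at a sustained rate
    (binary units: 1 GB = 1024 MB)."""
    seconds = capacity_gb * 1024 / rate_mb_per_sec
    return seconds / 86400  # seconds per day

# At 12.1 MB/sec, 1024 GB (1 TB) takes almost exactly one day to restore.
```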

Slide 26

Storage Technologies

SAN Nearline Storage
- Data access
  - Data stored on a virtually infinite file system
  - Immediate access to data residing on disk cache
  - Delayed access for data retrieved from tape
- Access via LAN using FTP/NFS
- Access via SAN clustered file system
  - SGI DMF/CXFS server
  - SGI, Sun, Linux, NT clients

Slide 27

Storage Technologies

SAN Cluster Proposal
- Mass Storage System & Product Distribution System (PDS)
- Limit exposure to risk
  - Servers are homogeneous
  - Implement with a single dataset
  - Data is file oriented
  - Data is currently transferred via FTP
- Anticipated benefits
  - Improved performance
  - Reduced total disk capacity requirements
  - Experience for future storage solutions

Slide 29

Current DMF/SAN Configuration

- DMF Server, Product Distribution, CXFS SAN Storage
- Tape drives: 8x 9840, 2x 9940
- Connectivity: 1Gb Fibre, 2Gb Fibre
- Disk cache:
  - /dmf/edc 68GB
  - /dmf/doqq 547GB
  - /dmf/guo 50GB
  - /dmf/pds 223GB
  - /dmf/pdsc 547GB
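As a quick check, the disk cache file systems listed above sum to roughly 1.4 TB of cache sitting in front of the tape library:

```python
# Disk cache file systems from the slide (sizes in GB).
DMF_CACHE = {"/dmf/edc": 68, "/dmf/doqq": 547, "/dmf/guo": 50,
             "/dmf/pds": 223, "/dmf/pdsc": 547}

total_gb = sum(DMF_CACHE.values())  # 1435 GB (~1.4 TB) of DMF disk cache
```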

Slide 30

CR1 Mass Storage System

Nearline Data Storage

[Chart: nearline data stored, Dec-93 through Dec-02, in terabytes (y-axis 0–48)]

Slide 31

CR1 Mass Storage System

Nearline Data Storage by Data Type

[Chart: nearline data stored by data type (General, Archive, Ortho, PDS), Dec-93 through Dec-03, in terabytes (y-axis 0–24)]

Slide 32

CR1 Mass Storage System

Nearline Data Storage

[Chart: nearline data stored and projected (General, Archive, Ortho, PDS, Total), Dec-93 through Dec-04, in terabytes (y-axis 0–100)]

Slide 33

CR1 Mass Storage

Nearline Monthly Average Data Archive/Retrieve
[Chart: terabytes per month for data archived and data retrieved, 1993 through 2003 (y-axis 0–9)]

Slide 34

CR1 Mass Storage

Nearline Average Transfer Rate
[Chart: MB/sec for data archived and data retrieved, 1994 through 2003 (y-axis 0–8)]

Slide 35

CR1 Mass Storage

Largest Single Day Data Transfers
[Chart: gigabytes archived and retrieved on the largest single day, for 1996, 1999, 2002, and 2003 (y-axis 0–1400)]
Description: 1996 – 3490, pre-DOQQ; 1999 – D-3, DOQQ; 2002 – 9840, DOQQ; 2003 – 9840/9940, UA/AVHRR
Average: 12.1 MB/sec

Slide 36

CR1 DMF FY04 Budget

Description                        Estimated Cost
StorageTek Maintenance             $41,000.00
SGI Maintenance (O300, DMF/SAN)    $22,000.00
Sun Maintenance                    $1,300.00
ITS Charges (Labor, Legato)        $20,000.00
Infrastructure Upgrades            $41,700.00
Project Staff                      $64,000.00

Total                              $190,000.00
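The line items above sum to the stated total, which is easy to verify:

```python
# FY04 budget line items from the slide, in dollars.
BUDGET = {
    "StorageTek Maintenance": 41_000,
    "SGI Maintenance (O300, DMF/SAN)": 22_000,
    "Sun Maintenance": 1_300,
    "ITS Charges (Labor, Legato)": 20_000,
    "Infrastructure Upgrades": 41_700,
    "Project Staff": 64_000,
}

total = sum(BUDGET.values())  # 190,000 — matches the slide's stated total
```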

Slide 37

Storage Technologies

Multi-Tiered Storage Vision
- Online
  - Supported configurations
    - DAS – local processing such as image processing
    - NAS – data sharing such as office automation
    - SAN – production processing such as product generation
  - Data accessed frequently
- Nearline
  - Integrated within the SAN
  - Scalable for large datasets and less frequently accessed data
  - Multiple copies and/or offsite storage
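The multi-tiered vision above amounts to a routing rule: frequently accessed data lands on the online tier matched to its workload, and everything else goes nearline. A hypothetical sketch (the access threshold and workload labels are illustrative, not from the deck):

```python
def choose_tier(access_per_month, workload):
    """Pick a storage tier for a dataset.

    workload: 'local' (image processing), 'office' (file sharing), or
    'production' (product generation), per the slide's supported configs.
    The 10-accesses-per-month cutoff is an assumed, illustrative threshold.
    """
    if access_per_month < 10:
        return "nearline"  # tape-backed DMF, integrated within the SAN
    return {"local": "DAS", "office": "NAS", "production": "SAN"}[workload]
```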

Slide 38

Storage Technologies

SAN – Final Thoughts
- SAN technology maturity: the SAN solution should be from a single vendor
- Program/Project SAN solution benefits
  + Decreased storage requirements
  + Increased performance
  + Increased reliability
  + Increased flexibility of resource allocations
  - Increased cost (hardware/software)
  - Increased configuration complexity