Storage Management in Grid December 11, 2002
![Page 1: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/1.jpg)
Storage Management in Grid
December 11, 2002
Sangyong Ha, Chan-Hyun Youn
Information and Communications Univ.
![Page 2: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/2.jpg)
2
Outline
Background
Existing Storage Management Systems
  DPSS
  SRB
  Data management in Globus
Our Approach
Concluding Remarks
![Page 3: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/3.jpg)
3
Background
Problem
  To enable a geographically distributed community to perform analyses on petabytes of data efficiently and cost-effectively
  A large, geographically dispersed group of researchers requires access to huge amounts of data
Solution
  Services for handling remote access to large data sets (storage systems) in a grid environment
  Aimed at data-intensive Grid applications
    High Energy Physics, Astronomy, Climate modeling, Bioinformatics, many others
![Page 4: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/4.jpg)
4
Background
Data Grid Requirements
  Seamless access to data and information stored at local and remote sites
  Virtualization of data, collections and meta-information
  Dataset scaling: size and number
  Integration of data collections and associated metadata
  Multiplicity of platforms, resources and data types
  Authentication, access control and auditing facilities
  Handling of legacy data and methods
![Page 5: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/5.jpg)
5
Existing Storage Management Systems in Data Grid
Data storage resource management systems
  DPSS, HPSS: focus on high-performance access; utilize parallel data transfer and striping
  SRB: connects heterogeneous data collections; uniform client interface, metadata queries
  DFS: focus on high-volume usage, dataset replication, local caching
  Globus data grid support: interface to many storage systems, common extensible transfer protocol
Incompatible existing storage systems
![Page 6: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/6.jpg)
DPSS(Distributed Parallel Storage Server)
![Page 7: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/7.jpg)
7
DPSS(A data cache storage server)
Developed by LBNL with support from DoE
A network data cache that provides high-speed parallel access to remote, large, image-like, read-mostly data (a cache, not a storage system)
At the application level, the DPSS is a semi-persistent cache of named data objects; at the storage level it is a logical block server
Parallel transfer (parallelism across many components), pipeline support, network tuning, agent-based management
Not appropriate for small-block reads and writes
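The logical block server idea can be illustrated with a minimal sketch. It assumes round-robin placement of fixed-size blocks and uses in-memory objects in place of disk servers. The `BlockServer` class, the block size and the placement rule are hypothetical and are not taken from the DPSS implementation.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes per logical block; tiny on purpose so the example is easy to follow


class BlockServer:
    """Hypothetical stand-in for one disk server holding a subset of the logical blocks."""

    def __init__(self):
        self.blocks = {}

    def put(self, block_no, payload):
        self.blocks[block_no] = payload

    def get(self, block_no):
        return self.blocks[block_no]


def store_striped(data, servers):
    """Split data into logical blocks and place them round-robin across the servers."""
    n_blocks = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    for block_no in range(n_blocks):
        start = block_no * BLOCK_SIZE
        servers[block_no % len(servers)].put(block_no, data[start:start + BLOCK_SIZE])
    return n_blocks


def read_parallel(n_blocks, servers):
    """Request all blocks concurrently and reassemble them in logical order."""
    def fetch(block_no):
        return servers[block_no % len(servers)].get(block_no)

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map returns results in input order, so the blocks join correctly.
        return b"".join(pool.map(fetch, range(n_blocks)))


servers = [BlockServer() for _ in range(3)]
dataset = b"large image-like read-mostly data"
count = store_striped(dataset, servers)
restored = read_parallel(count, servers)
```

With large blocks the per-request overhead is amortized over many bytes, which is one way to read the remark that the design does not suit small-block reads and writes.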
![Page 8: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/8.jpg)
8
DPSS(Architecture)
Source : LBNL
![Page 9: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/9.jpg)
9
DPSS(An Architectural Model)
Source : LBNL
![Page 10: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/10.jpg)
10
DPSS(Overall Architecture)
Source : LBNL
![Page 11: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/11.jpg)
11
Data Cache Research Items
How and when to migrate files
When it is better to move the processing to the data instead of the data to the processing
How to reserve space on the data cache
How to achieve high data rates across wide area networks
How to provide a global data set name space
How to ensure efficient data set consistency
![Page 12: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/12.jpg)
SRB(Storage Resource Broker)
![Page 13: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/13.jpg)
13
Overview
Developed at the San Diego Supercomputer Center (SDSC)
A middleware that provides distributed clients with uniform access to diverse storage resources, including:
  Unix file systems
  Archival storage systems such as UniTree and HPSS
  Database objects managed by various DBMSs, including DB2, Oracle and Illustra
MCAT (Metadata Catalog) to facilitate the brokering
  SRB metadata is managed by an MCAT server
  Stores metadata associated with data sets, users (access control) and resources
![Page 14: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/14.jpg)
14
Architecture
[Figure: SRB architecture diagram. Labels: Application (C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Web; Prolog; Python); SRB; MCAT; HRM; Dublin Core; Resource, User; User Defined; Application Meta-data; Remote Proxies; DataCutter; Third-party copy; Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Sybase); File Systems (Unix, NT, Mac OS X)]
Source : SDSC
![Page 15: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/15.jpg)
15
Concept
Abstraction of User Space
  Single sign-on, multiple authentication schemes
Virtualization of Resources
  Resource location, type and access transparency
  Logical resource definitions (bundling)
Abstraction of Data and Collections
  Virtual collections: persistent identifier and global name space
  Replication and segmentation
Data Discovery: system and application metadata
  User-defined metadata
  Attribute-based access (path names become irrelevant)
Uniform Access Methods
  APIs, command line, GUI browsers, web access (portal, CGI)
  Parallel access with both client-driven and server-driven strategies
![Page 16: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/16.jpg)
16
MCAT(Metadata Catalog)
Stores metadata about
  Data sets, collections, users, resources, proxy methods
Maintains replica information for data and containers
Provides a “Collection” abstraction for data
Provides a “Global User” name space and authentication
Provides authorization through ACLs and tickets
Maintains an audit trail on data and collections
Maintains metadata for methods and resources
Provides resource transparency: logical resources
Implemented as a relational database
  Oracle, DB2 or Sybase
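Because the catalog is a relational database, attribute-based access can be pictured as a join between metadata attributes and replica records. The sketch below uses an in-memory SQLite database purely for illustration. The table names, columns, logical names and sample rows are hypothetical and do not reflect the actual MCAT schema.

```python
import sqlite3

# In-memory database standing in for the catalog; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replicas (logical_name TEXT, resource TEXT, physical_path TEXT);
CREATE TABLE attributes (logical_name TEXT, attr TEXT, value TEXT);
""")

conn.executemany(
    "INSERT INTO replicas VALUES (?, ?, ?)",
    [
        ("/demo/run42.dat", "hpss-archive", "/hpss/a/0001"),
        ("/demo/run42.dat", "unix-disk", "/data/run42.dat"),
        ("/demo/sky.img", "unix-disk", "/data/sky.img"),
    ],
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [
        ("/demo/run42.dat", "experiment", "hep"),
        ("/demo/sky.img", "experiment", "astronomy"),
    ],
)


def find_by_attribute(db, attr, value):
    """Return every replica of the data sets whose metadata matches attr=value."""
    return db.execute(
        "SELECT r.logical_name, r.resource, r.physical_path "
        "FROM attributes a JOIN replicas r ON r.logical_name = a.logical_name "
        "WHERE a.attr = ? AND a.value = ? "
        "ORDER BY r.logical_name, r.resource",
        (attr, value),
    ).fetchall()


hep_replicas = find_by_attribute(conn, "experiment", "hep")
```

The caller never supplies a physical path: it asks by attribute and receives the logical name together with the resources that hold copies, which is the sense in which path names become irrelevant.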
![Page 17: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/17.jpg)
17
Research Items
Large datasets; large number of datasets; scaling
Distributed, heterogeneous storage; handling legacy data and methods
Discovery and search, fault tolerance and load distribution, replication
Scheduling, caching and data placement, data migration over time and space
Uniform name space
Types of metadata
  XML to unstructured
  Standardized to user-defined metadata
  Large number of attributes
Presentation (user friendly), maintenance
![Page 18: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/18.jpg)
18
DPSS + HPSS + SRB
HPSS
  High Performance Storage System
HPSS + DPSS
  HPSS <-> DPSS
Integration
  Work on integrating the DPSS into several Grid-like tools
  SRB + DPSS: DPSS can be used as an SRB device
  Globus + DPSS
  SRB + Globus
Source : LBNL
![Page 19: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/19.jpg)
Data Management in Globus
![Page 20: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/20.jpg)
20
Overview
Interface to many storage systems (DPSS, file systems, SRB)
Decouples low-level data transfer mechanisms from storage services
Three major components
  Data Transport and Access: GridFTP, based on GSI (Grid Security Infrastructure)
  Data Replication: a Replica Location Service and Replica Management
  Globus Access to Secondary Storage (GASS): allows applications to access data stored in any remote file system by specifying a URL (HTTP URL or x-gass URL)
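URL-based access means the application names a remote file with a single string and leaves protocol selection to the library. The sketch below only shows that dispatch step, assuming the two URL schemes mentioned on this slide. The function name, host and port are hypothetical, and no Globus API is called.

```python
from urllib.parse import urlparse

# Schemes named on the slide; anything else is rejected in this sketch.
SUPPORTED_SCHEMES = {"http", "x-gass"}


def describe_remote_file(url):
    """Break a remote-file URL into the parts a transfer layer would need."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return {
        "protocol": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "path": parsed.path,
    }


info = describe_remote_file("x-gass://storage.example.org:20000/data/run42.dat")
```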
![Page 21: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/21.jpg)
21
Architecture
Source : Globus, ANL
![Page 22: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/22.jpg)
22
Major Functions(1)
Data Transport and Access(Grid FTP)
  PKI or Kerberos support
  Third-party control of data transfer
  Parallel data transfer
  Striped data transfer
  Partial file transfer
  Automatic negotiation of TCP buffer/window sizes
  Support for reliable and restartable data transfer
  Integrated instrumentation for monitoring ongoing transfer performance
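Partial, parallel and restartable transfer fit together naturally: a requested byte range is divided among several streams, and the ranges that completed are remembered so that a restart moves only the remainder. The following sketch illustrates that bookkeeping with in-memory buffers. It does not speak the GridFTP protocol, and the function names and range-splitting rule are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_ranges(offset, length, streams):
    """Divide [offset, offset + length) into up to `streams` contiguous byte ranges."""
    base, extra = divmod(length, streams)
    ranges, start = [], offset
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        if size:
            ranges.append((start, start + size))
        start += size
    return ranges


def transfer(source, dest, ranges, done):
    """Copy each pending range on its own worker; `done` records finished ranges."""
    def copy(byte_range):
        start, end = byte_range
        dest[start:end] = source[start:end]
        return byte_range

    pending = [r for r in ranges if r not in done]
    with ThreadPoolExecutor(max_workers=max(1, len(pending))) as pool:
        for finished in pool.map(copy, pending):
            done.add(finished)
    return done


source = bytes(range(100))
dest = bytearray(len(source))

# Partial transfer: only bytes 20..69 are requested, split over four streams.
ranges = plan_ranges(offset=20, length=50, streams=4)

# First attempt is "interrupted" after two ranges; the restart copies only the rest.
done = transfer(source, dest, ranges[:2], set())
done = transfer(source, dest, ranges, done)
```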
![Page 23: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/23.jpg)
23
Major Functions(2)
Data Replication
  Maintains a mapping between logical names for files and collections and one or more physical locations
  Low-level replica location and high-level reliable replication
  Combines with GIS (NWS, MDS) to build a replica selection service (find the best replica)
GASS
  Libraries and utilities are provided to eliminate the need to
    manually log in to sites and ftp files
    install a distributed file system
  Currently the ftp and x-gass (GASS server) protocols are supported
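The replica mapping and the selection step can be sketched in a few lines. The catalog below is a plain dictionary from a logical name to physical URLs, and a table of predicted bandwidths stands in for the measurements an information service such as NWS or MDS would supply. All names, hosts and figures are illustrative assumptions, and "highest predicted bandwidth" is only one possible meaning of "best replica".

```python
from urllib.parse import urlparse

# Logical file name -> physical locations (illustrative entries).
replica_catalog = {
    "lfn://climate/run7.nc": [
        "gsiftp://site-a.example.org/data/run7.nc",
        "gsiftp://site-b.example.org/archive/run7.nc",
    ],
}

# Stand-in for information-service data: predicted bandwidth to each host in Mbit/s.
predicted_mbps = {
    "site-a.example.org": 40.0,
    "site-b.example.org": 95.0,
}


def select_replica(logical_name, catalog, bandwidth):
    """Pick the physical replica whose host has the highest predicted bandwidth."""
    candidates = catalog.get(logical_name, [])
    if not candidates:
        raise KeyError(f"no replica registered for {logical_name}")
    # Hosts without a prediction are treated as the slowest option.
    return max(candidates, key=lambda url: bandwidth.get(urlparse(url).hostname, 0.0))


best = select_replica("lfn://climate/run7.nc", replica_catalog, predicted_mbps)
```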
![Page 24: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/24.jpg)
24
Operation of SRB and Globus
[Figure: two stacks shown side by side. Globus: Application, Client, Replica Catalog, FTP Daemon, Storage System. SRB: Application, SRB Client, Metadata Catalog, SRB Server, Storage System.]
Source : GGF Performance & Information WG
![Page 25: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/25.jpg)
25
Data Handling Implementations
Source : GGF Performance & Information WG
![Page 26: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/26.jpg)
Our Approach
![Page 27: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/27.jpg)
27
Architectural Model
![Page 28: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/28.jpg)
28
Test System
[Figure: test machines connected through a network switch]
  PC, P-III, Linux: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
  PC, P-III, Windows: CD-ROM, CD-ROM, IDE HDD, IDE HDD, Zip Drive
  PC, P-IVI, Windows: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
  PC, AMD, Linux: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
  File Server, Linux: Dual NIC, MySQL
  External
  Network switch
![Page 29: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/29.jpg)
29
Replica Management
Replica Creation/Location/Registration
Replica Discovery/Lookup/Selection
Replica Deletion and Consistency
Replication and Load Balancing
Replication and Robustness(Redundancy)
Expression of Replica Data(Metadata)
Replication Overlay Network
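Several of these functions (registration, lookup, deletion and a consistency check) can be illustrated with a small in-memory manager. This is a sketch under stated assumptions, not the system described in these slides: the `ReplicaManager` class, the use of integer version numbers to detect stale copies, and the site names are all hypothetical.

```python
class ReplicaManager:
    """Toy replica manager that tracks which version each location holds."""

    def __init__(self):
        self.latest = {}    # logical name -> newest version number seen
        self.replicas = {}  # logical name -> {location: version held there}

    def register(self, name, location, version):
        """Record that `location` holds `version` of the data set `name`."""
        self.replicas.setdefault(name, {})[location] = version
        self.latest[name] = max(self.latest.get(name, 0), version)

    def lookup(self, name):
        """Return all locations holding a replica, in sorted order."""
        return sorted(self.replicas.get(name, {}))

    def delete(self, name, location):
        """Forget one replica; the newest known version number is left unchanged."""
        self.replicas.get(name, {}).pop(location, None)

    def stale(self, name):
        """Return locations whose copy is older than the newest known version."""
        newest = self.latest.get(name, 0)
        held = self.replicas.get(name, {})
        return sorted(loc for loc, version in held.items() if version < newest)


manager = ReplicaManager()
manager.register("dataset-1", "site-a", 1)
manager.register("dataset-1", "site-b", 1)
manager.register("dataset-1", "site-a", 2)   # site-a receives an updated copy

stale_before = manager.stale("dataset-1")    # site-b still holds version 1
manager.delete("dataset-1", "site-b")        # drop the outdated replica
remaining = manager.lookup("dataset-1")
```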
![Page 30: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/30.jpg)
30
A Scenario
Replica Selection Process
Read Process
![Page 31: Storage Management in Grid December 11, 2002](https://reader033.fdocuments.in/reader033/viewer/2022051705/58a725bf1a28ab0d0d8b4f99/html5/thumbnails/31.jpg)
31
Concluding Remarks
Introduction to Storage Management in Grid
  DPSS, SRB, Globus Data Grid support
Our approach for a storage management system
  Develop a dynamic data (replica) selection and scheduling model for data-intensive Grid applications