Data Systems Modernization (DSM) Project: Development ... · 1 Data Systems Modernization (DSM)...

3
1 Data Systems Modernization (DSM) Project: Development, Deployment, and Direction Robert M. Whitten Jr., ORNL ABSTRACT: The Data Systems Modernization (DSM) project was undertaken to consolidate and update the current information systems of the Oak Ridge Leadership Computing Facility (OLCF). The project combined the Resource Allocation and Tracking System (RATS), the New Account Creation System (NACS) and open-source process management and business intelligence software to streamline the data processing systems of the OLCF. This paper will discuss the development, deployment and future directions of this ongoing project. KEYWORDS: XT5, Cray, RATS, Jaguar, Gaea, Kraken, NACS, DSM 1 Introduction 1 1.1 ORNL Computing Facility The ORNL computing facility is home to the Department of Energy’s (DOE) Oak Ridge Leadership Computing Facility (OLCF). The facility houses some of the world’s largest computational resources. The center is also known as the National Center for Computational Sciences (NCCS). In addition to DOE resources, the facility also houses systems for the National Science Foundation (NSF) and the National Oceanic and Atmospheric Administration (NOAA). (see the below image) Sciences Office, Office of Advanced Scientific Computing Research, Office of Science, U. S. Department of Energy, under contract No.DE-AC05-00OR22725 with UT-Battelle, LLC. 1.2 What is DSM? The Data Systems Modernization (DSM) project is a software project that strives to consolidate the various data sinks that exist the NCCS. DSM is a business intelligence tool as well as a data-warehousing tool. Its architecture can be thought of as a silo-architecture of middle-ware applications. DSM has the primary function of acting as an extract-transform-load (ETL) tool. DSM is a combination of several middle-ware tools that are in use at the NCCS. These tools include the Resource Allocation and Tracking System (RATS) the New Account Creation System (NACS), the DowntimeDB, and HPSS stats. Components of these middle-ware tools use a combination of technologies such as MySQL, LDAP, and scripts to access data. DSM adds additional components such as ProcessMaker, LDAP synchronization scripts, and LogiXML. 2.0 Middle-ware Applications 2.1 RATS The Resource Allocation and Tracking System (RATS) is the primary data source of the NCCS. The system tracks allocation usage on a per-user basis. User allocation usage is then charged on a per-project basis. 2 OLCF/NCCS Computing Complex Peak performance 1.03 PF/s Memory 132 TB Disk bandwidth > 50 GB/s Square feet 2,300 Power 3 MW !"#$% '( )*"+,-./ 0'/$ #'1"+(23 4'0#2$"+ 567'*63 84"6*94 6*: ;$0'/#<"+94 ;:09*9/$+67'*./ 0'/$ #'1"+(23 4'0#2$"+ !"#$"% Peak performance 2.33 PF/s Memory 300 TB Disk bandwidth > 240 GB/s Square feet 5,000 Power 7 MW '%"()* +,-- .")" Peak Performance 1.1 PF/s Memory 248 TB Disk Bandwidth 104 GB/s Square feet 1,600 Power 2.2 MW 567'*63 =49"*4" >'2*:67'*./ 0'/$ #'1"+(23 4'0#2$"+ ?@ ?A ?B@

Transcript of Data Systems Modernization (DSM) Project: Development ... · 1 Data Systems Modernization (DSM)...

1

Data Systems Modernization (DSM) Project: Development, Deployment, and Direction

Robert M. Whitten Jr., ORNL

ABSTRACT: The Data Systems Modernization (DSM) project was undertaken to consolidate and update the current information systems of the Oak Ridge Leadership Computing Facility (OLCF). The project combined the Resource Allocation and Tracking System (RATS), the New Account Creation System (NACS) and open-source process management and business intelligence software to streamline the data processing systems of the OLCF. This paper will discuss the development, deployment and future directions of this ongoing project. KEYWORDS: XT5, Cray, RATS, Jaguar, Gaea, Kraken, NACS, DSM

1 Introduction1 1.1 ORNL Computing Facility The ORNL computing facility is home to the Department of Energy’s (DOE) Oak Ridge Leadership Computing Facility (OLCF). The facility houses some of the world’s largest computational resources. The center is also known as the National Center for Computational Sciences (NCCS). In addition to DOE resources, the facility also houses systems for the National Science Foundation (NSF) and the National Oceanic and Atmospheric Administration (NOAA). (see the below image)

1 Research supported by the Mathematics, Information and Computational

Sciences Office, Office of Advanced Scientific Computing Research,

Office of Science, U. S. Department of Energy, under contract

No.DE-AC05-00OR22725 with UT-Battelle, LLC.

1.2 What is DSM? The Data Systems Modernization (DSM) project is a software project that strives to consolidate the various data sinks that exist the NCCS. DSM is a business intelligence tool as well as a data-warehousing tool. Its architecture can be thought of as a silo-architecture of middle-ware applications. DSM has the primary function of acting as an extract-transform-load (ETL) tool. DSM is a combination of several middle-ware tools that are in use at the NCCS. These tools include the Resource Allocation and Tracking System (RATS) the New Account Creation System (NACS), the DowntimeDB, and HPSS stats. Components of these middle-ware tools use a combination of technologies such as MySQL, LDAP, and scripts to access data. DSM adds additional components such as ProcessMaker, LDAP synchronization scripts, and LogiXML. 2.0 Middle-ware Applications 2.1 RATS The Resource Allocation and Tracking System (RATS) is the primary data source of the NCCS. The system tracks allocation usage on a per-user basis. User allocation usage is then charged on a per-project basis.

2

OLCF/NCCS Computing Complex

Peak performance 1.03 PF/s

Memory 132 TB

Disk bandwidth > 50 GB/s

Square feet 2,300

Power 3 MW

!"#$%&'(&)*"+,-./&0'/$&#'1"+(23&4'0#2$"+&

567'*63&84"6*94&6*:&&;$0'/#<"+94&;:09*9/$+67'*./&&0'/$&#'1"+(23&4'0#2$"+&

!"#$"%&

Peak performance 2.33 PF/s

Memory 300 TB

Disk bandwidth > 240 GB/s

Square feet 5,000

Power 7 MW

'%"()*&

+,--&.")"&

Peak Performance 1.1 PF/s

Memory 248 TB

Disk Bandwidth 104 GB/s

Square feet 1,600

Power 2.2 MW

567'*63&=49"*4"&&>'2*:67'*./&0'/$&&#'1"+(23&4'0#2$"+&

?@&

?A&

?B@&

2

The below diagram describes the components of RATS.

Jobs IDLog

Cycle Servers

Jobs Monitor

Metascheduler

Job Status

AdmissibilityTester

Job Statistics

Static Attributes

HostConfiguration

ResourceCharges

Projects

RATS Users

Submit JobQuery Job/Receive Job Info

???

Submitted Job

Check Scheduled Job Info/Remove Info

Validate

Char

ges

Validate R

ATSU

sers

Res

ourc

eC

onsu

mpt

ion

Rep

ort

Test

Job V

alidit

y

Check Job Validity/Ack

Check Machine Availabilty

Stats f

rom

Consu

mption

Platform Users

Validate

PlatformU

sers

Sch 0 Sch N...

Resource Status

Jobs ID LogManager

Job ID Registration

Update Resources

ScheduledJobs Dataset

Scheduled JobsManager

Report Job Charges

ResourceDataset

ProjectsDataset

RATS UsersDataset

PlatformUsers Dataset

Job StatusDatasetJob Statistics

Dataset

Resource StatusDataset

Host ConfDataset

StaticAttributesDataset

2.1 NACS The New Account Creation System (NACS) is a ‘last mile’ system designed to take user applications from a web-based interface and create system user accounts and file system spaces. NACS uses a push-pull architecture to update necessary system resources such as LDAP and file systems.

7

!"#$%&'(')'*+%

!"#$%$,-./(*%

0&"1%

&'('%$23-,+%03*(-+%

!"#$%%

!4$%

2.2 DowntimeDB The DowntimeDB provides a mechanism for system administrators to manually enter downtime information. The data is then used to provide management reports.

8

DowntimeDB

• Manual entry of downtime information

!"#$%&'(!)*)+),'(

!)*)(-"./0'( 1'2"/*,(

2.3 HPSS Stats Users store data on the High Performance Storage System (HPSS). The HPSS Stats component reads the appropriate metadata stores to determine the amount of data stored for archival.

9

HPSS Stats

• Data read directly from HPSS metadata

!"##$ %&'()*+$

3.0 Why DSM? The issue with running multiple middle-ware applications is data redundancy and inconsistency. Having user data common among the different data sinks causes inaccurate data to exist in one area. This problem is exacerbated by not having a consistent interface to entering updates or changes to the various data sinks. DSM’s primary goal is to provide a consistent interface to the data stored for NCCS users and resources. A secondary goal of DSM is provide substantially better reporting mechanism to allow management personnel an easier time at generating needed reports.

3

4.0 Additional DSM Components 4.1 ProcessMaker ProcessMaker is open source workflow software solution. It provides a way for users to translate their business processes into an automated system capable of generating emails and notifications to personnel of pending workflow items. In the initial phases, DSM will use ProcessMaker for account and project application creation. In later phases, DSM will migrate increased functionality to ProcessMaker. 4.2 Interface Scripts DSM provides scripts developed at ORNL to allow staff to modify user, group, project, and allocation attributes. These scripts are mostly written in Python. The functionality provided by these scripts will eventually be migrated to ProcessMaker. 4.3 LogiXML LogiXML is a business intelligence (BI) tool that provides enhanced ability to produce reports and information based on data stored in DSM. It is a commercial product similar to Crystal Reports or any of the other BI products on the market. 5.0 DSM Deployment Schedule 5.1 Phase 1 In this phase, DSM will be deployed on NOAA systems. This instance of DSM will consist of the RATS and NACS components. The difference will be in how data is entered into the system. A synchronization script will retrieve data from a remote LDAP instance and store directly into a local LDAP instance. There will be neither LogiXML nor ProcessMaker component. NOAA is a remote user facility and provides their own application and reporting functionality. This phase was completed in the first quarter of fiscal year 2011.

5.2 Phase 2 In this phase, DSM will be deployed on DOE systems. LogiXML and ProcessMaker components will be deployed. Additional access scripts will be created to aid personnel in accessing DSM. This phase is targeted for completion in the fourth quarter of fiscal year 2011. 5.3 Phase 3 In this phase, additional requirements that were gathered during previous phases will be considered. Added functionality beyond account creation will be implemented in ProcessMaker. RATS has an open source descendent, which has recently (as of April 2011) been released. Knows as DataMux (available on Source Forge), this release has enhanced components that may replace the core components of RATS as in deployed with DSM. Consolidation of the NOAA and DOE instances of DSM will also be considered. 6.0 Conclusion DSM has to goal of consolidating and augmenting the software data systems of the NCCS. By combining feature of RATS, NACS, and other systems, DSM will become the authoritative data source of the NCCS.

About the author

Robert Whitten Jr. is a member of the User Assistance and Outreach Group at Oak Ridge National Laboratory and can be reached at [email protected].