Utilizing Databases in Grid Engine 6 - University of Liverpool · 2011-09-13
Joachim Gabler
Software Engineer
Sun Microsystems
Utilizing Databases in Grid Engine 6.0
http://sun.com/grid
Copyright © 2003 Sun Microsystems Inc.
Current status
● flat file spooling
  – binary format for jobs
  – ASCII format for other objects
● accounting file
● statistics file
● history file
Reasons for database usage
● Scalability
  – bottleneck in qmaster spooling
  – accounting
● Accessibility: SQL, ODBC, JDBC
● ACID: atomicity, consistency, isolation, durability
● Maintenance: online backup, truncation of historical data
Plans for 6.0
Two separate applications of databases:
● Spooling Database
  – replaces the current flat file spooling
  – reflects qmaster's internal data structures
● Accounting and Reporting Database
  – no impact on the core system
  – structure suited for report generation
  – precalculated values
Database in the core system
● Spooling Framework
  – abstraction layer
● Multiple spooling methods
  – classic
  – new flat file
  – SQL database (PostgreSQL)
  – Berkeley DB
The spooling framework
● abstraction layer (spooling context)
● extensible (add shared libraries)
● combine spooling methods
Figure: Application (qmaster) → Spooling Context → Classic Spooling and Flatfile Spooling (filesystem), SQL-DB Spooling (spooling database on a database server), Berkeley-DB Spooling (database file)
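The spooling context can be pictured as a dispatcher that hands each object type to whichever spooling method is configured for it, which is what makes combining methods possible. The Python sketch below is purely illustrative: `SpoolingMethod`, `InMemorySpooling`, `SpoolingContext` and the object-type names are hypothetical and are not Grid Engine's actual interfaces; an in-memory dict stands in for real file or database backends.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class SpoolingMethod(ABC):
    """Hypothetical interface every spooling backend implements."""

    @abstractmethod
    def write(self, obj_type: str, key: str, data: dict) -> None:
        ...

    @abstractmethod
    def read(self, obj_type: str, key: str) -> Optional[dict]:
        ...


class InMemorySpooling(SpoolingMethod):
    """Stand-in backend; a real one would write files or database rows."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[Tuple[str, str], dict] = {}

    def write(self, obj_type: str, key: str, data: dict) -> None:
        self.store[(obj_type, key)] = dict(data)

    def read(self, obj_type: str, key: str) -> Optional[dict]:
        return self.store.get((obj_type, key))


class SpoolingContext:
    """Routes each object type to its spooling method; others go to the default."""

    def __init__(self, default: SpoolingMethod) -> None:
        self.default = default
        self.routes: Dict[str, SpoolingMethod] = {}

    def register(self, obj_type: str, method: SpoolingMethod) -> None:
        self.routes[obj_type] = method

    def _method_for(self, obj_type: str) -> SpoolingMethod:
        return self.routes.get(obj_type, self.default)

    def spool(self, obj_type: str, key: str, data: dict) -> None:
        self._method_for(obj_type).write(obj_type, key, data)

    def fetch(self, obj_type: str, key: str) -> Optional[dict]:
        return self._method_for(obj_type).read(obj_type, key)


# Combine methods: jobs go to a database-like backend, queues to flat files.
database = InMemorySpooling("berkeleydb")
flatfile = InMemorySpooling("flatfile")
ctx = SpoolingContext(default=database)
ctx.register("queue", flatfile)

ctx.spool("job", "4711", {"owner": "alice"})
ctx.spool("queue", "all.q", {"slots": 4})
print(ctx.fetch("job", "4711"), sorted(flatfile.store))
```

Because the application only talks to the context, adding a backend means registering another `SpoolingMethod`; the caller code does not change.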
Classic spooling
● pros
  – no database server to install
  – files can be read and edited (except jobs, which are stored in binary format)
● cons
  – slow, especially on NFS-mounted filesystems
  – difficult to analyze / generate reports
  – hardcoded functions for each data type
  – very limited transaction handling
  – no guaranteed consistency
New Flat File Spooling
● pros
  – no database server to install
  – files can be read and edited (except jobs, which are stored in binary format)
  – more flexible than classic spooling (generic spooling functions)
● cons
  – slow, especially on NFS-mounted filesystems
  – difficult to analyze / generate reports
  – very limited transaction handling
  – no guaranteed consistency
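"Generic spooling functions" means one routine can write any object type by walking its fields, instead of one hardcoded writer per type. A minimal sketch, assuming a hypothetical "attribute value" per-line file layout and one directory per object type (neither is taken from the actual Grid Engine file formats); `spool_object` and `read_object` are illustrative names:

```python
import os
import tempfile


def spool_object(directory: str, obj_type: str, key: str, fields: dict) -> str:
    """Write any object as 'name value' lines; one generic function for all types."""
    type_dir = os.path.join(directory, obj_type)
    os.makedirs(type_dir, exist_ok=True)
    path = os.path.join(type_dir, key)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for name, value in fields.items():
            f.write(f"{name} {value}\n")
    os.replace(tmp, path)  # atomic for this one file only
    return path


def read_object(directory: str, obj_type: str, key: str) -> dict:
    """Read a spooled object back; all values come back as strings."""
    fields = {}
    with open(os.path.join(directory, obj_type, key), encoding="utf-8") as f:
        for line in f:
            name, _, value = line.rstrip("\n").partition(" ")
            fields[name] = value
    return fields


spool_dir = tempfile.mkdtemp()
spool_object(spool_dir, "queue", "all.q", {"qname": "all.q", "slots": 4})
spool_object(spool_dir, "host", "node01", {"hostname": "node01", "processors": 8})
print(read_object(spool_dir, "queue", "all.q"))
```

`os.replace` makes each single-file write atomic, but nothing ties several files together, which is why flat file spooling still offers only very limited transaction handling.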
SQL Database
● pros
  – fast
  – ACID
  – easy to access with standard tools
  – generic spooling functions
  – open for future development (replication, distributed database ...)
● cons
  – a database server has to be installed and maintained
  – requires tuning expertise
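With a SQL database, several spooled objects can be written in one transaction, so a failure leaves no partial state (the ACID property listed above). In this hedged sketch Python's built-in `sqlite3` stands in for PostgreSQL, and the single generic `spool` table with JSON-encoded fields is a hypothetical layout, not the schema Grid Engine uses:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spool (obj_type TEXT, obj_key TEXT, data TEXT, "
    "PRIMARY KEY (obj_type, obj_key))"
)


def spool_many(conn: sqlite3.Connection, objects) -> None:
    """Spool several objects in one transaction: all are written, or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        for obj_type, key, fields in objects:
            conn.execute(
                "INSERT OR REPLACE INTO spool VALUES (?, ?, ?)",
                (obj_type, key, json.dumps(fields)),
            )


spool_many(conn, [("job", "4711", {"owner": "alice"})])

try:
    # The second object is not JSON-serializable, so json.dumps raises mid-transaction.
    spool_many(conn, [("job", "4712", {"owner": "bob"}),
                      ("job", "4713", {"owner": object()})])
except TypeError:
    pass

rows = [r[0] for r in conn.execute("SELECT obj_key FROM spool ORDER BY obj_key")]
print(rows)  # ['4711']: job 4712 was rolled back together with 4713
```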
Berkeley DB
● pros
  – very fast
  – ACID
  – no database server to install
  – flexible spooling functions
  – open for future development (replication ...)
● cons
  – difficult data access (requires a maintenance client)
  – not supported by standard analysis tools
Characteristics of Berkeley DB
● embedded database library
● compact
● high performance
● transaction protected
● simple function call API: C, C++, Java, Perl, Tcl, Python, PHP
● many supported platforms: almost all UNIX and Linux variants, Windows, embedded real-time OSs
● commonly available in standard UNIX distributions
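"Embedded" means the database is a library linked into the process and opened like a file, with no server to install. Python's standard library has no Berkeley DB binding, so this sketch uses `dbm.dumb` purely as a stand-in for the key/value access model; it is neither transactional nor Berkeley DB. The `TYPE:key` key naming is a hypothetical convention for keeping several object types in one store:

```python
import dbm.dumb
import json
import os
import tempfile

# dbm.dumb is only a stand-in for an embedded key/value library such as Berkeley DB.
path = os.path.join(tempfile.mkdtemp(), "spooldb")

db = dbm.dumb.open(path, "c")  # opened in-process, like a file; no server
try:
    db[b"JOB:4711"] = json.dumps({"owner": "alice", "slots": 2}).encode()
    db[b"QUEUE:all.q"] = json.dumps({"slots": 4}).encode()
finally:
    db.close()  # writes the index to disk

db = dbm.dumb.open(path, "r")
try:
    job = json.loads(db[b"JOB:4711"])
    # Hypothetical TYPE:key naming lets one store hold several object types.
    job_keys = sorted(k for k in db.keys() if k.startswith(b"JOB:"))
finally:
    db.close()

print(job["owner"], job_keys)
```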
Future plans
● new spooling methods (e.g. LDAP for user management)
● clients access the database instead of querying qmaster
● Future development based on Berkeley DB
  – use the replication subsystem
  – use the transaction subsystem in SGE components
Accounting and reporting DB
Why two separate databases?
● standardized access (SQL, ODBC, JDBC)
● simpler, easier-to-use database structure
● historical data (not needed in the core system)
● preprocessed data (sums, averages ...)
● queries don't affect the core system
● lower requirements on availability
Architecture
● Java application
● Database access through JDBC
● loosely coupled to the SGE system
Figure: Accounting, Statistics, Sharelog and Reporting files (labelled SGE 5.3 and SGE 6.0) → Reporting-Writer (raw data, build derived values) → Reporting-DB
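The Reporting-Writer's job, reading raw records and building derived values from them, can be sketched in a few lines. Assumptions: the colon-separated record layout below is a simplified, hypothetical format (the real accounting file has many more fields), `sqlite3` replaces a JDBC connection to PostgreSQL, MySQL or Oracle, and the table names `job_log` and `user_totals` are invented for illustration:

```python
import sqlite3

# Hypothetical, simplified raw records: owner:job_number:submit:start:end:cpu_seconds
RAW = """\
alice:101:1000:1060:1660:580.0
bob:102:1000:1300:1500:190.5
alice:103:1100:1120:1420:295.0
"""

db = sqlite3.connect(":memory:")  # stand-in for a JDBC connection
db.execute("CREATE TABLE job_log (owner TEXT, job_number INTEGER, submit_time INTEGER,"
           " start_time INTEGER, end_time INTEGER, cpu REAL)")
db.execute("CREATE TABLE user_totals (owner TEXT PRIMARY KEY, jobs INTEGER, cpu REAL)")


def load_raw(db: sqlite3.Connection, text: str) -> None:
    """Store raw records, then rebuild the derived per-user totals."""
    rows = []
    for line in text.splitlines():
        owner, job, submit, start, end, cpu = line.split(":")
        rows.append((owner, int(job), int(submit), int(start), int(end), float(cpu)))
    with db:  # raw and derived data are committed together
        db.executemany("INSERT INTO job_log VALUES (?, ?, ?, ?, ?, ?)", rows)
        db.execute("DELETE FROM user_totals")
        db.execute("INSERT INTO user_totals"
                   " SELECT owner, COUNT(*), SUM(cpu) FROM job_log GROUP BY owner")


load_raw(db, RAW)
totals = db.execute("SELECT * FROM user_totals ORDER BY owner").fetchall()
print(totals)  # [('alice', 2, 875.0), ('bob', 1, 190.5)]
```

Rebuilding the totals in the same transaction keeps raw and derived data consistent for report queries; for large tables an incremental update would be cheaper.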
Supported Databases
● PostgreSQL
● MySQL
● Oracle
● “any relational database providing a JDBC interface and standard SQL”
Database Schema
Stored Data
● Job related information: times, user, project, exit status ...
● Host and queue related information: load information, consumables ...
● Sharetree: configured shares, actual shares ...
● Precomputed, derived values: sums, averages per host, queue, user, project ...
● Daemon statistics
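As an example of a precomputed value, host load samples can be condensed into hourly averages, minima and maxima per host. The tables `host_values` and `host_values_hourly`, their columns and the sample data are hypothetical, and `sqlite3` stands in for the reporting database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE host_values (host TEXT, sample_time INTEGER, variable TEXT, value REAL);
CREATE TABLE host_values_hourly (host TEXT, hour_start INTEGER, variable TEXT,
                                 avg_value REAL, min_value REAL, max_value REAL);
""")

# Raw load samples; sample_time is in seconds.
db.executemany("INSERT INTO host_values VALUES (?, ?, ?, ?)", [
    ("node01", 0, "load_avg", 0.5),
    ("node01", 1800, "load_avg", 1.5),
    ("node01", 3600, "load_avg", 2.0),
    ("node02", 600, "load_avg", 4.0),
])

with db:
    # Integer division buckets each sample into the hour it belongs to.
    db.execute("""
        INSERT INTO host_values_hourly
        SELECT host, (sample_time / 3600) * 3600, variable,
               AVG(value), MIN(value), MAX(value)
        FROM host_values
        GROUP BY host, sample_time / 3600, variable
    """)

hourly = db.execute(
    "SELECT * FROM host_values_hourly ORDER BY host, hour_start").fetchall()
for row in hourly:
    print(row)
```

Report queries can then read the small hourly table instead of scanning every raw sample.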
Metrics (examples)
● Average (min, max) job wait time
● Number of jobs completed
● Total (avg, max) CPU time consumed
● Total (avg, min, max) job runtime
● Utilization of slots, CPUs, hosts, queues
● Cluster availability, outage times
● Share utilization
● Memory abuse
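Several of the metrics listed above reduce to aggregate SQL over a job table. The sketch assumes a hypothetical `job_log` table with submit, start and end times in seconds, where every row is a finished job; wait time is taken as start minus submit and runtime as end minus start. `sqlite3` stands in for the reporting database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE job_log (owner TEXT, submit_time INTEGER,"
           " start_time INTEGER, end_time INTEGER, cpu REAL)")
db.executemany("INSERT INTO job_log VALUES (?, ?, ?, ?, ?)", [
    ("alice", 1000, 1060, 1660, 580.0),
    ("bob", 1000, 1300, 1500, 190.5),
    ("alice", 1100, 1120, 1420, 295.0),
])

# Per-user metrics: jobs completed, wait time, CPU time and runtime.
METRICS = """
SELECT owner,
       COUNT(*)                      AS jobs_completed,
       AVG(start_time - submit_time) AS avg_wait,
       MIN(start_time - submit_time) AS min_wait,
       MAX(start_time - submit_time) AS max_wait,
       SUM(cpu)                      AS total_cpu,
       SUM(end_time - start_time)    AS total_runtime
FROM job_log
GROUP BY owner
ORDER BY owner
"""
metrics = db.execute(METRICS).fetchall()
for row in metrics:
    print(row)
```

Grouping by a different column (host, queue, project) gives the same metrics along other dimensions.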
Summary
● Improve scalability and performance of the core system through integration of Berkeley DB
● Provide easy access to a reporting and analysis facility through use of a standard SQL database