SSS Validation and Testing
September 11, 2003
Rockville, MD
William McLendon
Neil Pundit
Erik DeBenedictis
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Overview
• APItest
• Release Testing Experiences at Sandia
• Status Daemon
Distributed Runtime System Testing
• Complex system of interactions
• Approach to testing
– Component Testing
– Benchmarks
• Performance / Functionality
– Operational Profile
– Stress Testing
• Users expect a high degree of quality in today’s high-end systems!
APItest
APITEST - Overview
• Unit-testing tool for network components
– Targeted for networked applications
– Extensible framework
– Dependency calculus for inter-test relationships
• Scriptable Tests (XML Schema Grammar)
• Multi-Protocol Support– TCP/IP, SSSLib, Portals, HTTP
Accomplishments Since Last Meeting
• Spent a week at Argonne (July)
– Major rework of the APItest framework
• Individual tests are atomic.
• Framework handles the hard work of checking tests, dependencies, and aggregate results.
– Extensibility
• New test types are easy to create
• Dependency System
– Define relationships as a DAG encoded in XML.
– Boolean dependencies on edges.
Supported Test Types
• sssTest
– uses ssslib to communicate with ssslib-enabled components
• shellTest
– executes a shell command
• httpTest
– e.g., applications testing web interfaces (à la Globus, etc.)
• tcpipTest
– raw socket communication via TCP/IP
Creating New Test Types is Easy
A simple test that will always pass:
class passTest(Test):
    __attrfields__ = ['name']
    typemap = {'dependencies': TODependencies}

    def setup(self):
        pass

    def execute(self, scratch):
        self.expect['foo'] = [('REGEXP', 'a')]
        self.response['foo'] = 'a'
Matching and Aggregation
• An individual test can be executed many times in a sequence.
– PASS/FAIL can be determined based on the percent of runs that matched.
– Percent Match can be specified as a range as well.
• Expected result is specified as a regular expression (REGEXP) or a string for exact matching (TXTSTR)
• Notation:
– M[min:max] - Percent matching.
• min/max = bounds on the % of runs where actual and expected results match.
• If the actual match percentage is within the specified range the test will PASS; otherwise it will FAIL.
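As a small illustration of the M[min:max] rule described above, the pass/fail decision reduces to a bounds check on the percentage of matching runs. This is a sketch, not APItest's actual internals; the function names are hypothetical.

```python
def percent_matched(results):
    """Percent of runs whose actual output matched the expected output.

    `results` is a list of booleans, one per repetition (hypothetical
    representation; APItest's real bookkeeping may differ).
    """
    if not results:
        return 0.0
    return 100.0 * sum(1 for r in results if r) / len(results)


def passes(results, min_pct=0.0, max_pct=100.0):
    """PASS iff the match percentage falls within [min_pct, max_pct]."""
    pct = percent_matched(results)
    return min_pct <= pct <= max_pct


# M[40:90]: 3 of 5 runs matched -> 60%, inside the range -> PASS
print(passes([True, True, True, False, False], 40.0, 90.0))  # True
```

Note that a range like M[0:0] inverts the usual sense: the test passes only when *no* run matches, which is how the examples above express "expect failure".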
Test Dependencies
[Diagram: two example dependency DAGs with M[min:max] edge labels.
Left: T iff A[40:90] OR B[0:0].
Right: T iff (A[100:] AND B[90:]) OR C[:].]
M[40:90] : >= 40% and <= 90% of test runs matched
An Example Dependency
(A[100:] AND ((B[100:100] AND C) OR D[:0]))
<dependencies>
  <AND>
    <dependency name='A' minPctMatch='100'/>
    <OR>
      <AND>
        <dependency name='B' minPctMatch='100'/>
        <dependency name='C'/>
      </AND>
      <dependency name='D' maxPctMatch='0'/>
    </OR>
  </AND>
</dependencies>
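A boolean dependency tree in this grammar can be evaluated recursively. The sketch below is a hypothetical evaluator, not APItest's actual implementation; `pct` maps each test name to its observed percent-matched value, and missing min/max attributes default to 0 and 100 as in the metadata table.

```python
import xml.etree.ElementTree as ET

def evaluate(node, pct):
    """Recursively evaluate a <dependencies> tree (hypothetical sketch)."""
    if node.tag in ('dependencies', 'AND'):
        return all(evaluate(c, pct) for c in node)
    if node.tag == 'OR':
        return any(evaluate(c, pct) for c in node)
    if node.tag == 'dependency':
        lo = float(node.get('minPctMatch', '0.0'))
        hi = float(node.get('maxPctMatch', '100.0'))
        return lo <= pct[node.get('name')] <= hi
    raise ValueError('unknown tag: %s' % node.tag)

doc = """
<dependencies>
  <AND>
    <dependency name='A' minPctMatch='100'/>
    <OR>
      <AND>
        <dependency name='B' minPctMatch='100'/>
        <dependency name='C'/>
      </AND>
      <dependency name='D' maxPctMatch='0'/>
    </OR>
  </AND>
</dependencies>
"""
root = ET.fromstring(doc)
# A matched 100%, B matched 100%, C matched 50% (any % passes C) -> True
print(evaluate(root, {'A': 100.0, 'B': 100.0, 'C': 50.0, 'D': 100.0}))
```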
An Example Test Sequence
[Diagram: a test sequence "reset daemon" → "test daemon" → "test other stuff", with edge labels M[:], M[:], and M[:30].]
Standard Test Metadata Attributes
Attribute    Type     Req'd  Default   Description
name         string   YES    -         name of test
numReps      integer  NO     1         number of times to execute the test
minPctMatch  float    NO     0.0       min % of repetitions that must match for the test to pass
maxPctMatch  float    NO     100.0     max % of repetitions that must match for the test to pass
preDelay     float    NO     0.0       delay in seconds prior to executing the test
postDelay    float    NO     0.0       delay in seconds after executing the test before continuing to the next test
iterDelay    float    NO     0.0       delay in seconds between repetitions (only used if numReps > 1)
onMismatch   string   NO     CONTINUE  action when a test fails to match: CONTINUE, BREAK, or HALT
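The attributes above suggest how a single test's repetition loop might be driven. The following is a hedged sketch under that reading (the names mirror the table, but the control flow is an assumption, not APItest's actual code):

```python
import time

def run_test(execute, num_reps=1, pre_delay=0.0, post_delay=0.0,
             iter_delay=0.0, min_pct=0.0, max_pct=100.0,
             on_mismatch='CONTINUE'):
    """Run one test num_reps times; `execute` returns True on a match."""
    matches = 0
    time.sleep(pre_delay)                    # preDelay: before first run
    for i in range(num_reps):
        if i > 0:
            time.sleep(iter_delay)           # iterDelay: between repetitions
        if execute():
            matches += 1
        elif on_mismatch == 'BREAK':
            break                            # stop repeating this test
        elif on_mismatch == 'HALT':
            raise SystemExit('HALT on mismatch')
    time.sleep(post_delay)                   # postDelay: before next test
    pct = 100.0 * matches / num_reps
    return 'PASS' if min_pct <= pct <= max_pct else 'FAIL'

print(run_test(lambda: True, num_reps=3, min_pct=100.0))  # PASS
```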
Example Scripts
• A simple shell execution test:
<shellTest name='test1' numReps='1' preDelay='5'
           postDelay='5.3' command='ls -ltr'/>
• Test with a dependency and stdout matching:
<shellTest name='test2' command='apitest.py --test'>
  <output format='REGEXP' type='stdout'>.*stdout.*</output>
  <dependencies>
    <dependency name='test1' minPctMatch='100.0'/>
  </dependencies>
</shellTest>
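Parsing a script element like the shellTest above amounts to reading its attributes and applying the documented defaults from the metadata table. This is an illustrative sketch (the attribute names and defaults come from the slides; the parsing code itself is hypothetical):

```python
import xml.etree.ElementTree as ET

# (default-as-string, type-cast) for each optional metadata attribute
DEFAULTS = {'numReps': ('1', int), 'minPctMatch': ('0.0', float),
            'maxPctMatch': ('100.0', float), 'preDelay': ('0.0', float),
            'postDelay': ('0.0', float), 'iterDelay': ('0.0', float),
            'onMismatch': ('CONTINUE', str)}

def parse_test(elem):
    """Extract metadata attributes from a test element, filling defaults."""
    meta = {'name': elem.get('name'), 'command': elem.get('command')}
    for attr, (default, cast) in DEFAULTS.items():
        meta[attr] = cast(elem.get(attr, default))
    return meta

t = ET.fromstring("<shellTest name='test1' numReps='1' preDelay='5' "
                  "postDelay='5.3' command='ls -ltr'/>")
print(parse_test(t)['postDelay'])  # 5.3
```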
APItest Output
iterations  test name  % matched  Pass/Fail  message
----------  ---------  ---------  ---------  -------
[1 of 1]    A          100.00%    PASS
[1 of 1]    K          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    J            0.00%    FAIL       m[90.0% : 90.0%]
[5 of 5]    M          100.00%    PASS
[1 of 1]    L          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    N          100.00%    PASS
[0 of 1]    T          DEPENDENCY FAILURE(S)
            F expected [0.0% : 90.0%], got 100.0
            J expected [90.0% : 100.0%], got 0.0
[0 of 1]    R          DEPENDENCY FAILURE(S)
            J expected [90.0% : 100.0%], got 0.0
            K expected [0.0% : 90.0%], got 100.0
[1 of 1]    S          100.00%    PASS
[1 of 1]    U1         100.00%    PASS
[1 of 1]    U2         100.00%    PASS
[0 of 1]    U3         DEPENDENCY FAILURE(S)
            N expected [0.0% : 90.0%], got 100.0
            S expected [0.0% : 90.0%], got 100.0
[1 of 1]    U4         100.00%    PASS
sssTest outputs from Chiba City
iterations  test name        % matched  Pass/Fail
----------  ---------------  ---------  ---------
[1 of 1]    add-location     100.00%    PASS
[1 of 1]    QuerySDComps     100.00%    PASS
[1 of 1]    QuerySDHost      100.00%    PASS
[1 of 1]    QuerySDProtocol  100.00%    PASS
[1 of 1]    QuerySDPort      100.00%    PASS
[1 of 1]    del-location     100.00%    PASS
[1 of 1]    val-removal      100.00%    PASS

iterations  test name     % matched  Pass/Fail
----------  ------------  ---------  ---------
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
Release Testing…
Tales from Cplant Release Testing
• Methodical execution of production jobs and 3rd-party benchmarks to identify system instabilities so they can be resolved, e.g.:
– Rapid job turnover rate (caused mismatches between the scheduler and the allocator)
– Heavy I/O (I/O which passes through the launch node process instead of going directly to ENFS; “yod-io”)
• Wrapping the above codes into the Ctest framework to enable portable compilation, launch, and analysis of synthetic workloads
Ctest
• Extension of Mike Carifio’s work
– Presented at the SciDAC meeting in Houston during Fall of 2002
– A make-based structure that holds a suite of independent applications.
– Tools to launch as a reproducible workload.
– Goal: 30 users and 60 concurrent apps
Sample Load Profile on CPlant
Issue Tracking
• SNL uses a program called RT
– A centralized repository for issue tracking helps give an overall picture of what the problems are.
– Helps summarize progress.
• Bugzilla is on the SciDAC SSS website– http://bugzilla.mcs.anl.gov/scidac-sss/
– Who’s using it?
Status Daemon …
Status Daemon
• Highly configurable monitoring infrastructure for clusters.
– Does not need to run a daemon on the node being monitored.
– XML configurable
– Web interface
• “Cluster aware”
• Used on CPlant production clusters
• James Laros ([email protected])
Status Daemon Communication
[Diagram: the admin node runs the status daemon, driven by an XML config file; it runs local tests, writes status and XML data to disk, and issues remote tests to the compute nodes. On each group of compute nodes, a leader node's daemon runs local tests and returns XML data and status updates to the admin node.]
Summary
• New Hire– Ron Oldfield
• APItest functionality and flexibility increases
• Release testing experience
• Status Daemon
Plans
• APItest
– User / Programmer Manuals
– User Interface
• GUI? HTTP?
– Daemon mode for parallel testing
– DB Connectivity
– Test Development
• ssslib event tests
• HTTPtest work
• ptlTest (SNL)
• SWP Integration– Port SWP to Chiba for SC2003?
Questions?