SSS Validation and Testing
September 11, 2003
Rockville, MD
William McLendon
Neil Pundit
Erik DeBenedictis
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Overview
• APItest
• Release Testing Experiences at Sandia
• Status Daemon
Distributed Runtime System Testing
• Complex system of interactions
• Approach to testing
– Component Testing
– Benchmarks
• Performance / Functionality
– Operational Profile
– Stress Testing
• Users expect a high degree of quality in today’s high-end systems!
APItest
APITEST - Overview
• Unit-testing tool for network components
– Targeted for networked applications
– Extensible framework
– Dependency calculus for inter-test relationships
• Scriptable Tests (XML Schema Grammar)
• Multi-Protocol Support– TCP/IP, SSSLib, Portals, HTTP
Accomplishments Since Last Meeting
• Spent a week at Argonne (July)
– Major rework of the APItest framework
• Individual tests are atomic.
• Framework handles the hard work of checking tests, dependencies, and aggregate results.
– Extensibility
• New test types are easy to create
• Dependency System
– Define relationships as a DAG encoded in XML.
– Boolean dependencies on edges.
Supported Test Types
• sssTest
– uses ssslib to communicate with ssslib-enabled components
• shellTest
– executes a shell command
• httpTest
– e.g., applications testing web interfaces (à la Globus, etc.)
• tcpipTest
– raw socket communication via TCP/IP
Creating New Test Types is Easy
A simple test that will always pass:
class passTest(Test):
    __attrfields__ = ['name']
    typemap = {'dependencies': TODependencies}

    def setup(self):
        pass

    def execute(self, scratch):
        self.expect['foo'] = [('REGEXP', 'a')]
        self.response['foo'] = 'a'
Matching and Aggregation
• An individual test can be executed many times in a sequence.
– PASS/FAIL can be determined based on the percent of runs that matched.
– Percent Match can be specified as a range as well.
• Expected result is specified as a regular expression (REGEXP) or a string for exact matching (TXTSTR)
• Notation:
– M[min:max] - Percent matching.
• min/max = bounds on the % of runs where actual and expected results match.
• If the actual match percentage is within the specified range the test will PASS; otherwise it will FAIL.
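As a small illustration of the M[min:max] rule described above, the pass/fail decision reduces to a bounds check on the percentage of matching runs. This is a sketch, not APItest's actual internals; the function names are hypothetical.

```python
def percent_matched(results):
    """Percent of runs whose actual output matched the expected output.

    `results` is a list of booleans, one per repetition (hypothetical
    representation; APItest's real bookkeeping may differ).
    """
    if not results:
        return 0.0
    return 100.0 * sum(1 for r in results if r) / len(results)


def passes(results, min_pct=0.0, max_pct=100.0):
    """PASS iff the match percentage falls within [min_pct, max_pct]."""
    pct = percent_matched(results)
    return min_pct <= pct <= max_pct


# M[40:90]: 3 of 5 runs matched -> 60%, inside the range -> PASS
print(passes([True, True, True, False, False], 40.0, 90.0))  # True
```

Note that a range like M[0:0] inverts the usual sense: the test passes only when *no* run matches, which is how the examples above express "expect failure".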
Test Dependencies
[Diagram: two example dependency DAGs with M[min:max] edge labels.
Left: T iff A[40:90] OR B[0:0].
Right: T iff (A[100:] AND B[90:]) OR C[:].]
M[40:90] : >= 40% and <= 90% of test runs matched
An Example Dependency
(A[100:] AND ((B[100:100] AND C) OR D[:0]))
<dependencies>
  <AND>
    <dependency name='A' minPctMatch='100'/>
    <OR>
      <AND>
        <dependency name='B' minPctMatch='100'/>
        <dependency name='C'/>
      </AND>
      <dependency name='D' maxPctMatch='0'/>
    </OR>
  </AND>
</dependencies>
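A boolean dependency tree in this grammar can be evaluated recursively. The sketch below is a hypothetical evaluator, not APItest's actual implementation; `pct` maps each test name to its observed percent-matched value, and missing min/max attributes default to 0 and 100 as in the metadata table.

```python
import xml.etree.ElementTree as ET

def evaluate(node, pct):
    """Recursively evaluate a <dependencies> tree (hypothetical sketch)."""
    if node.tag in ('dependencies', 'AND'):
        return all(evaluate(c, pct) for c in node)
    if node.tag == 'OR':
        return any(evaluate(c, pct) for c in node)
    if node.tag == 'dependency':
        lo = float(node.get('minPctMatch', '0.0'))
        hi = float(node.get('maxPctMatch', '100.0'))
        return lo <= pct[node.get('name')] <= hi
    raise ValueError('unknown tag: %s' % node.tag)

doc = """
<dependencies>
  <AND>
    <dependency name='A' minPctMatch='100'/>
    <OR>
      <AND>
        <dependency name='B' minPctMatch='100'/>
        <dependency name='C'/>
      </AND>
      <dependency name='D' maxPctMatch='0'/>
    </OR>
  </AND>
</dependencies>
"""
root = ET.fromstring(doc)
# A matched 100%, B matched 100%, C matched 50% (any % passes C) -> True
print(evaluate(root, {'A': 100.0, 'B': 100.0, 'C': 50.0, 'D': 100.0}))
```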
An Example Test Sequence
[Diagram: a test sequence "reset daemon" → "test daemon" → "test other stuff", with edge labels M[:], M[:], and M[:30].]
Standard Test Metadata Attributes
Attribute    Type     Req'd  Default   Description
name         string   YES    -         name of test
numReps      integer  NO     1         number of times to execute the test
minPctMatch  float    NO     0.0       min % of repetitions that must match for the test to pass
maxPctMatch  float    NO     100.0     max % of repetitions that must match for the test to pass
preDelay     float    NO     0.0       delay in seconds prior to executing the test
postDelay    float    NO     0.0       delay in seconds after executing the test before continuing to the next test
iterDelay    float    NO     0.0       delay in seconds between repetitions (only used if numReps > 1)
onMismatch   string   NO     CONTINUE  action when a test fails to match: CONTINUE, BREAK, or HALT
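The attributes above suggest how a single test's repetition loop might be driven. The following is a hedged sketch under that reading (the names mirror the table, but the control flow is an assumption, not APItest's actual code):

```python
import time

def run_test(execute, num_reps=1, pre_delay=0.0, post_delay=0.0,
             iter_delay=0.0, min_pct=0.0, max_pct=100.0,
             on_mismatch='CONTINUE'):
    """Run one test num_reps times; `execute` returns True on a match."""
    matches = 0
    time.sleep(pre_delay)                    # preDelay: before first run
    for i in range(num_reps):
        if i > 0:
            time.sleep(iter_delay)           # iterDelay: between repetitions
        if execute():
            matches += 1
        elif on_mismatch == 'BREAK':
            break                            # stop repeating this test
        elif on_mismatch == 'HALT':
            raise SystemExit('HALT on mismatch')
    time.sleep(post_delay)                   # postDelay: before next test
    pct = 100.0 * matches / num_reps
    return 'PASS' if min_pct <= pct <= max_pct else 'FAIL'

print(run_test(lambda: True, num_reps=3, min_pct=100.0))  # PASS
```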
Example Scripts
• A simple shell execution test:
<shellTest name='test1' numReps='1' preDelay='5'
           postDelay='5.3' command='ls -ltr'/>
• Test with a dependency and stdout matching:
<shellTest name='test2' command='apitest.py --test'>
  <output format='REGEXP' type='stdout'>.*stdout.*</output>
  <dependencies>
    <dependency name='test1' minPctMatch='100.0'/>
  </dependencies>
</shellTest>
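Parsing a script element like the shellTest above amounts to reading its attributes and applying the documented defaults from the metadata table. This is an illustrative sketch (the attribute names and defaults come from the slides; the parsing code itself is hypothetical):

```python
import xml.etree.ElementTree as ET

# (default-as-string, type-cast) for each optional metadata attribute
DEFAULTS = {'numReps': ('1', int), 'minPctMatch': ('0.0', float),
            'maxPctMatch': ('100.0', float), 'preDelay': ('0.0', float),
            'postDelay': ('0.0', float), 'iterDelay': ('0.0', float),
            'onMismatch': ('CONTINUE', str)}

def parse_test(elem):
    """Extract metadata attributes from a test element, filling defaults."""
    meta = {'name': elem.get('name'), 'command': elem.get('command')}
    for attr, (default, cast) in DEFAULTS.items():
        meta[attr] = cast(elem.get(attr, default))
    return meta

t = ET.fromstring("<shellTest name='test1' numReps='1' preDelay='5' "
                  "postDelay='5.3' command='ls -ltr'/>")
print(parse_test(t)['postDelay'])  # 5.3
```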
APItest Output
iterations  test name  % matched  Pass/Fail  message
----------  ---------  ---------  ---------  -------
[1 of 1]    A          100.00%    PASS
[1 of 1]    K          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    J            0.00%    FAIL       m[90.0% : 90.0%]
[5 of 5]    M          100.00%    PASS
[1 of 1]    L          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    N          100.00%    PASS
[0 of 1]    T          DEPENDENCY FAILURE(S)
            F expected [0.0% : 90.0%], got 100.0
            J expected [90.0% : 100.0%], got 0.0
[0 of 1]    R          DEPENDENCY FAILURE(S)
            J expected [90.0% : 100.0%], got 0.0
            K expected [0.0% : 90.0%], got 100.0
[1 of 1]    S          100.00%    PASS
[1 of 1]    U1         100.00%    PASS
[1 of 1]    U2         100.00%    PASS
[0 of 1]    U3         DEPENDENCY FAILURE(S)
            N expected [0.0% : 90.0%], got 100.0
            S expected [0.0% : 90.0%], got 100.0
[1 of 1]    U4         100.00%    PASS
sssTest outputs from Chiba City
iterations  test name        % matched  Pass/Fail
----------  ---------------  ---------  ---------
[1 of 1]    add-location     100.00%    PASS
[1 of 1]    QuerySDComps     100.00%    PASS
[1 of 1]    QuerySDHost      100.00%    PASS
[1 of 1]    QuerySDProtocol  100.00%    PASS
[1 of 1]    QuerySDPort      100.00%    PASS
[1 of 1]    del-location     100.00%    PASS
[1 of 1]    val-removal      100.00%    PASS

iterations  test name     % matched  Pass/Fail
----------  ------------  ---------  ---------
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
Release Testing…
Tales from Cplant Release Testing
• Methodical execution of production jobs and 3rd-party benchmarks to identify system instabilities so they can be resolved, e.g.:
– Rapid job turnover rate (caused mismatches between the scheduler and the allocator)
– Heavy I/O (I/O which passes through the launch node process instead of going directly to ENFS; “yod-io”)
• Wrapping the above codes into the Ctest framework to enable portable compilation, launch, and analysis of synthetic workloads
Ctest
• Extension of Mike Carifio’s work
– Presented at the SciDAC meeting in Houston during Fall of 2002
– A make-based structure that holds a suite of independent applications.
– Tools to launch as a reproducible workload.
– Goal: 30 users and 60 concurrent apps
Sample Load Profile on CPlant
Issue Tracking
• SNL uses a program called RT
– A centralized repository for issue tracking helps give an overall picture of what the problems are.
– Helps summarize progress.
• Bugzilla is on the SciDAC SSS website– http://bugzilla.mcs.anl.gov/scidac-sss/
– Who’s using it?
Status Daemon …
Status Daemon
• Highly configurable monitoring infrastructure for clusters.
– Does not need to run a daemon on the node being monitored.
– XML configurable
– Web interface
• “Cluster aware”
• Used on CPlant production clusters
• James Laros ([email protected])
Status Daemon Communication
[Diagram: the admin node runs the status daemon, driven by an XML config file; it runs local tests, writes status and XML data to disk, and issues remote tests to the compute nodes. On each group of compute nodes, a leader node's daemon runs local tests and returns XML data and status updates to the admin node.]
Summary
• New Hire– Ron Oldfield
• APItest functionality and flexibility increases
• Release testing experience
• Status Daemon
Plans
• APItest
– User / Programmer Manuals
– User Interface
• GUI? HTTP?
– Daemon mode for parallel testing
– DB Connectivity
– Test Development
• ssslib event tests
• HTTPtest work
• ptlTest (SNL)
• SWP Integration– Port SWP to Chiba for SC2003?
Questions?