Transcript of “The Future of Software Testing”: Probabilistic Stochastic Test Data · 2019-07-25
Probabilistic Stochastic Test Data
Bj Rollison, Microsoft, USA
Europe’s Premier Software Testing Event · World Forum Convention Centre, The Hague, Netherlands
WWW.QUALTECHCONFERENCES.COM
“The Future of Software Testing”
Bj Rollison, Test Architect
Microsoft
http://www.TestingMentor.com
http://blogs.msdn.com/imtesty
Customer provided data
- Domain expertise
- Generally very limited in scope
Tester generated data
- Happy path, probabilistic data
- Input population poorly defined, human bias
- Random data not representative of population
Static data files
- Library of historical failure indicators
- Too restrictive
- Ineffective with multiple iterations
Large number of variables
Variable sequences can result in a virtually infinite number of combinations
Impractical to test all values and combinations of values in any reasonable testing cycle
Example:
NetBIOS name: 15 alphanumeric characters
Using ASCII-only characters, 82 allowable characters (0x20 \ * + = | : ; “ ? < > , are invalid)
Total number of possible input tests equals 82^15 + 82^14 + 82^13 + … + 82^1 = 51,586,566,049,662,994,687,009,994,574
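That total is easy to verify with a short script (a sketch; any language with arbitrary-precision integers works):

```python
# The input space for a 1- to 15-character name over 82 allowable
# characters is a geometric series: 82^15 + 82^14 + ... + 82^1.
total = sum(82 ** k for k in range(1, 16))
print(f"{total:,}")  # 51,586,566,049,662,994,687,009,994,574
```

Exhaustively testing a space that size is clearly off the table, which motivates sampling instead.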
It does not “look” like real world test data.
Years ago developers would argue that a name textbox couldn’t contain a number!
To a computer, what is the difference between the strings Margaret and ksjCu9ls?
Random data is not reproducible.
A seeded random generator will produce the same exact result given the same seed value
Random data violates constraints of real data
Representative data from population
Deterministic algorithms
Sampling is commonly used in risk based testing
Samples must be representative
Samples must be statistically unbiased
Samples set must include variability for breadth
Random data generation provides variability, but
Simple random data may not be representative
Simple random data hard to reproduce
Goal – generate random data that is
Representative of the input data set
Statistically unbiased - random sample of elements from a probability distribution
Value – input test data that
Provides greater variability
Includes expected and unexpected sequences
Eliminates human bias
Is better at evaluating robustness
Is dynamic!
System.Security.Cryptography.RandomNumberGenerator class
Encrypted data indistinguishable from random
Cannot be seeded; no repeatability
System.Random class
Sequence of numbers that meet certain statistical requirements for randomness
Can be seeded for repeatability
Not perfect, but reasonably random for practical purposes
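The repeatability property can be demonstrated with any seedable PRNG; here is a sketch using Python’s random module standing in for .NET’s System.Random:

```python
import random

seed = 12345  # arbitrary example seed

# Two generators created with the same seed...
run1 = random.Random(seed)
run2 = random.Random(seed)

# ...produce exactly the same sequence, so a failing test run
# can be replayed just by logging and reusing its seed value.
assert [run1.randint(0, 99) for _ in range(5)] == \
       [run2.randint(0, 99) for _ in range(5)]
```

This is why a seedable generator, not a cryptographic one, is the right tool for reproducible test data.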
[Chart: comparison between the RandomNumberGenerator class (red) and the Random class (blue). Both are pseudo-random; no obvious pattern. Based on a sample by Jeff Atwood, http://www.codinghorror.com]
User defined seed
- Tester provides seed value for repeatability
Dynamic seed
- New seed value generated at runtime
- Seed variable must be preserved in test log
public static int GetSeedValue(string seedValue)
{
    int seed = 0;
    if (seedValue != string.Empty)
    {
        // Tester-supplied seed for repeatability
        seed = int.Parse(seedValue);
    }
    else
    {
        // No seed given: generate a dynamic seed at runtime
        // (preserve it in the test log so the run can be reproduced)
        Random r = new Random();
        seed = r.Next();
    }
    return seed;
}
Define the representative data set
Example – Credit card numbers
341846580149320
Bank Identification Number (BIN) – between 1 and 4 digits, depending on card type
Card length (BIN + digits) – between 14 and 19, depending on card type
Checksum – Luhn (Mod 10) algorithm
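The Luhn (Mod 10) check itself is only a few lines; a sketch in Python (the talk’s tooling is .NET, but the algorithm is language-neutral):

```python
def luhn_valid(number: str) -> bool:
    """Luhn (Mod 10): double every second digit from the right,
    subtract 9 from any double greater than 9, then the total
    must be divisible by 10."""
    total = 0
    for i, digit in enumerate(int(ch) for ch in reversed(number)):
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

print(luhn_valid("341846580149320"))  # True: the American Express example above
```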
Equivalence class partitioning decomposes data into discrete valid and invalid class subsets
Card type          Valid class subsets         Invalid class subsets
American Express   BIN – 34, 37                Unassigned BINs
                   Length – 15 digits          Length >= 16 digits
                   Checksum – Mod 10           Length <= 14 digits
                                               Fail checksum
Maestro            BIN – 5020, 5038,           Unassigned BINs
                     6034, 6759                Length <= 15 digits
                   Length – 16, 18             Length >= 19 digits
                   Checksum – Mod 10           Length == 17 digits
                                               Fail checksum
[Diagram: credit card number generator. Inputs – card type and an optional seed. A seed generator feeds a random number generator; the “get credit card info” step looks up the valid BIN number(s) & length and card length(s) by type; the Luhn algorithm checks “is valid”. Output – card number.]
Assigned BINs ensures the data looks real
The Mod10 check ensures the data feels real
Result is representative of real data!
Deterministic algorithm to generate a valid random credit card number:

GetCardNumber(int cardType, int seed)
    Get BIN(cardType, seed);
    Get CardLength(cardType, seed);
    Assign BIN to cardNumber;
    Generate a new random object;
    while (cardNumberLength < CardLength)
        Generate a random number 0 to 9;
        Append it to the cardNumber;
    while (IsNotValidCardNumber(cardNumber))
        increment last number by 1;
    return cardNumber;
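A runnable sketch of this algorithm, in Python rather than the talk’s .NET: the CARD_TYPES table and the function names are illustrative, with the BINs and lengths taken from the slides.

```python
import random

# Illustrative card-type table; BINs and lengths are from the slides.
CARD_TYPES = {
    "amex":    {"bins": ["34", "37"], "lengths": [15]},
    "maestro": {"bins": ["5020", "5038", "6034", "6759"], "lengths": [16, 18]},
}

def luhn_valid(number: str) -> bool:
    """Luhn (Mod 10) checksum over a digit string."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:                 # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def get_card_number(card_type: str, seed: int) -> str:
    """Deterministically generate a valid random card number from a seed."""
    rng = random.Random(seed)                    # seeded for repeatability
    info = CARD_TYPES[card_type]
    bin_ = rng.choice(info["bins"])              # pick a BIN for the type
    length = rng.choice(info["lengths"])         # pick a length for the type
    number = bin_ + "".join(str(rng.randint(0, 9))
                            for _ in range(length - len(bin_)))
    # Fix up the checksum: bump the last digit (mod 10) until Luhn passes;
    # exactly one of its ten values satisfies Mod 10, so this terminates.
    while not luhn_valid(number):
        number = number[:-1] + str((int(number[-1]) + 1) % 10)
    return number
```

Because the generator is seeded, logging the seed is enough to reproduce any failing card number.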
1. Model test data – decompose the data set for each parameter using equivalence class partitioning
2. Generate test data – generate valid and invalid test data adhering to parameter properties, business rules, and the test hypothesis
3. Apply test data – apply the test data to the application under test
4. Verify results – verify the actual results against the expected results – the oracle!
JCB Type 1: BIN = 35, Len = 16
JCB Type 2: BIN = 1800, 2131; Len = 15
Robust testing
Multi-language input testing
- String length: fixed or variable
- Seed value
- Custom range for greater control
- Unicode language families
- Assigned code points
- Reserved characters
- Unicode surrogate pairs
1000 Unicode characters from the sample population
Character corruption and data loss: 135 characters (bytes), obvious data loss
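A seeded multi-language string generator along these lines might look as follows in Python; the code-point ranges and names here are illustrative, and the surrogate check matters because 0xD800–0xDFFF is reserved and cannot stand alone as characters:

```python
import random

def random_unicode_string(rng, length, ranges):
    """Build a string of `length` code points drawn from the given
    (lo, hi) ranges, skipping reserved surrogate code points."""
    chars = []
    while len(chars) < length:
        lo, hi = rng.choice(ranges)
        cp = rng.randint(lo, hi)
        if 0xD800 <= cp <= 0xDFFF:   # reserved surrogate range
            continue
        chars.append(chr(cp))
    return "".join(chars)

# Illustrative Unicode language families: Basic Latin, Cyrillic, CJK.
RANGES = [(0x0020, 0x007E), (0x0400, 0x04FF), (0x4E00, 0x9FFF)]

s = random_unicode_string(random.Random(42), 15, RANGES)
assert len(s) == 15  # same seed always yields the same 15 characters
```

Feeding such strings through a round trip (save, reload, compare) makes character corruption and data loss immediately visible.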
Static test data wears out!
Random test data that is not repeatable or not representative may find defects, but…
Probabilistic stochastic test data
Is a modeled representation of the population
Is statistically unbiased
Is especially good at testing robustness
Recommend using both static (real-world) test data and probabilistic stochastic test data for breadth
Practice .NET Testing with IR Data – Bj Rollison – http://www.stpmag.com/issues/stp-2007-06.pdf
Automatic test data generation for path testing using a new stochastic algorithm – Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa – http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
Data Generation Techniques for Automated Software Robustness Testing – Matthew Schmid & Frank Hill – http://www.cigital.com/papers/download/ictcsfinal.pdf
Tools: http://www.TestingMentor.com