12. Experimental Evaluation 18-749: Fault-Tolerant Distributed Systems

12. Experimental Evaluation

18-749: Fault-Tolerant Distributed Systems

Tudor Dumitraş & Prof. Priya Narasimhan

Carnegie Mellon University

Recommended readings and these lecture slides are available on CMU’s BlackBoard

Electrical & Computer Engineering

2

What Are We Going To Do Today?

– Overview of experimental techniques
– Case study: “Fault-Tolerant Middleware and the Magical 1%”
– Experimental requirements for the project

3

Overview of Experimental Techniques

Basics
– Probability distributions, density functions
– Outlier detection: 3σ test

Visual representation of data
– Boxplots
– 3D, contour plots
– Multivariate plots

Do’s and don’ts of experimental science

4

Experimental Research

“God has chosen that which is the most simple in hypotheses and the most rich in phenomena [...] But when a rule is extremely complex, that which conforms to it passes for random.”

Gottfried Wilhelm Leibniz, Discours de Métaphysique, 1686

5

Statistical Distributions

If a metric is measured repeatedly, then we can determine its probability distribution function (PDF)

– PDF(x) is the probability that the metric takes the value x
– $\int_a^b PDF(x)\,dx = \Pr[a \le \text{metric} \le b]$
– Matlab function ksdensity

Common statistics
– Mean = sum of values / #measurements (mean)
– Median = half the measured values are below this point (median)
– Mode = measurement that appears most often in the dataset
– Standard deviation (σ) = how widely spread the data points are (std)

$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$

where $X_i$ is a measurement and $\bar{X}$ is the mean
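As a rough illustration of the common statistics listed here, the sketch below computes them with Python's standard `statistics` module rather than Matlab. The sample values and the helper name `summarize` are made up for the example.

```python
import statistics

def summarize(samples):
    """Return the mean, median, mode and sample standard deviation."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "mode": statistics.mode(samples),
        # statistics.stdev divides by n - 1 (sample standard deviation)
        "std": statistics.stdev(samples),
    }

# Hypothetical latency measurements, in microseconds
latencies_us = [120, 130, 130, 140, 150, 900]
summary = summarize(latencies_us)
print(summary)
```

Note how a single large measurement pulls the mean far above the median, which is one reason to look at more than one statistic.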

6

Statistical Tools

Percentiles
– “The Nth percentile” is a value X such that N% of the measured samples are less than X
– The median is the 50th percentile
– Matlab function prctile

Outlier detection: 3σ test
– Any value that is more than 3 standard deviations away from the mean is an outlier
– For example, for latency: $Latency_{outlier} > \overline{Latency} + 3\sigma$
– In Matlab: outliers = a(a > mean(a) + 3*std(a))
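A minimal sketch of both tools in Python, under stated assumptions: `percentile` uses the simple nearest-rank definition (Matlab's prctile interpolates, so its results can differ slightly), and `three_sigma_outliers` flags values on either side of the mean. Both function names and the data are hypothetical.

```python
import statistics

def percentile(samples, n):
    """Nearest-rank percentile: smallest sample with at least n% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * n // 100))  # integer ceil(len * n / 100)
    return ordered[rank - 1]

def three_sigma_outliers(samples):
    """Return the samples more than 3 standard deviations away from the mean."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return [x for x in samples if abs(x - mean) > 3 * sigma]

# Hypothetical data: 19 ordinary latencies and one very slow invocation
latencies_us = [100] * 19 + [5000]
median = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
outliers = three_sigma_outliers(latencies_us)
```

With very few samples a single extreme value inflates σ so much that it can never fail the test, so the 3σ test is only meaningful on reasonably large datasets.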

7

Basic Plots

Line plot (plot)
– Y-axis is a function of X-axis values
– Can use error bars to show standard deviation
– Can also do an area plot to emphasize overhead or difference between similar metrics

Scatter plot (plot, scatter)
– Determine a relationship between two variables
– Reveal clustering of data

Bar graphs (bar, bar3)
– Compare discrete values

Pie charts (pie, pie3)
– Breakdown of a metric into its constituent components

[Figures: example plots with the axis labels “Rounds”, “Nodes reached” (series: 0% data upsets, 50% data upsets), “Latency [in µs]” and “Client-perceived throughput [bytes/s]”]

8

Boxplots

A “box and whisker” plot describes a probability distribution
– The box represents the size of the inter-quartile range (the difference between the 25th and 75th percentiles of the dataset)
– The whiskers indicate the maximum and minimum values
– The median is also shown
– Matlab function boxplot

In 1970, US Congress instituted a random selection process for the military draft

– All 366 possible birth dates were placed in a rotating drum and selected one by one

– The order in which the dates were drawn defined the priority for drafting

The boxplots show that men born later in the year were more likely to be drafted

From http://lib.stat.cmu.edu/DASL/Stories/DraftLottery.html
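The numbers that a box-and-whisker plot draws can be computed directly. The sketch below is one way to do it in Python, assuming the "inclusive" quartile method of `statistics.quantiles`; Matlab's boxplot may place the quartiles slightly differently. The helper name `box_summary` and the data are illustrative only.

```python
import statistics

def box_summary(samples):
    """Return the five values a box-and-whisker plot displays."""
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "min": min(samples),   # lower whisker
        "q1": q1,              # bottom of the box (25th percentile)
        "median": median,      # line inside the box
        "q3": q3,              # top of the box (75th percentile)
        "max": max(samples),   # upper whisker
        "iqr": q3 - q1,        # height of the box
    }

box = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```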

9

Impact of Two Variables

3D plots
– Z axis is a function of X and Y values
– Surface plots: mesh, surf
– Scatter plots: plot3, scatter3
– Volume: display convex hull using convhulln and trisurf

Contour plots
– Represents a function of 2 variables (the X and Y axes)
– Suggests the values of the function through color and annotations
– Displays the isolines (variable combinations that yield the same value) of the function

[Figure: contour plot over the axes p and pupset, with isolines labeled from 62 to 110]

10

Impact of Many Variables

Multi-variate plot

11

Do …

Make Results Comparable
– Use same hardware for all the experiments
– Use same versions of your software
– Avoid interference from other programs or make sure you always get the same interference
– Vary one parameter at a time

Make Results Reproducible
– Record and report all the parameters of your experimental setup
– Archive and publish raw data

Be Rigorous
– Minimize the impact of your monitoring infrastructure
– Report number of runs
– Report mean values and standard deviations
– Examine statistical distributions (modes, long tails, etc.)

12

Don’t …

Forget to label the axes of your figures

Use different axis limits when comparing results

Plot mean values without looking at the error margin

[Figures: example plots of latency [s] versus number of clients (0 to 20); the panels use different y-axis limits (2000 to 14000, 0 to 15000, and 0 to 3 x 10^4)]

13

FT Middleware and the Magical 1%

– Unpredictability of FT middleware
– Unpredictability limited to 1% of remote invocations

T. Dumitraş and P. Narasimhan. Fault-Tolerant Middleware and the Magical 1%. In ACM/IFIP/USENIX Conference on Middleware, Grenoble, France, Nov.-Dec. 2005. http://www.ece.cmu.edu/~tdumitra/public_documents/dumitras05magical.pdf

14

Predictability in FT Middleware Systems?

[Figure: a replicated client and a replicated server; on each host the application (client or server) runs over CORBA, a Replicator and the host OS, and the hosts are connected through group communication over the network]

– Faults are inherently unpredictable
– What about the fault-free case?

15

System Configuration for Predictability

Can we configure an FT CORBA system for predictable latency?

Software configuration
– Operating system: RedHat Linux w/ TimeSys 3.1 kernel
– Group Communication: Spread v. 1.3.1
– Replication: MEAD v. 1.1
– ORB: TAO Real Time ORB v. 1.4
– Micro-benchmark: 10,000 remote invocations per client

Hardware configuration
– 25 hosts on the Emulab test bed
– Pentium III at 850 MHz
– 100 Mb/s LAN

16

Experimental Methodology

Parameters varied:
– Replication style: active, warm passive
– Replication degree: 1, 2, 3 replicas
– Number of clients: 1, 4, 7, 10, 13, 16, 19, 22 clients
– Request arrival rates: 0, 0.5, 2, 8, 32 ms client pause
– Sizes of reply messages: 16, 256, 4096, 65536 bytes

Tested all 960 combinations, collected 9.1 Gb of data
– Trace available at: www.ece.cmu.edu/~tdumitra/MEAD_trace

Statistical analysis of end-to-end latency:
– Means, medians, standard deviations
– Maximum and minimum values
– 1st, 5th, 95th, 99th percentiles
– Numbers and sizes of the outliers
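The figure of 960 combinations follows from the parameter lists: 2 x 3 x 8 x 5 x 4 = 960. A short sketch of how such a full sweep could be enumerated in Python; the variable names are illustrative and are not taken from the MEAD tooling.

```python
from itertools import product

# Parameter values as listed on the slide
styles = ["active", "warm_passive"]
replicas = [1, 2, 3]
clients = [1, 4, 7, 10, 13, 16, 19, 22]
pauses_ms = [0, 0.5, 2, 8, 32]
reply_bytes = [16, 256, 4096, 65536]

# One tuple per experiment: (style, replicas, clients, pause, reply size)
configurations = list(product(styles, replicas, clients, pauses_ms, reply_bytes))
print(len(configurations))
```

Generating the list up front makes it easy to script the runs and to check afterwards that no configuration was skipped.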

17

Example of Unpredictability

– Maximum latency can be several orders of magnitude larger than the average
– Distribution is skewed to the right and has a long tail
– Long tail occurs on only one side because the latency cannot be arbitrarily low
  • MEAD latency is lower-bounded by CORBA and group communication latency

18

Systematic Unpredictability

Average values increase linearly with the number of clients

Maximum values are unpredictable

19

Counting the Outliers

An outlier is a measurement that fails the 3σ test

In most cases, fewer than 1% of the measured latencies are outliers

Outliers originate in various modules of the system:
– The ORB
– The group communication
– The application

20

The “Magical” 1%

21

The “Magical” 1%

The “haircut” effect of removing 1% of the highest remote latencies
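The "haircut" can be sketched as sorting the measurements and dropping the top 1%. This is an illustrative Python version with made-up data, not the analysis code used in the paper; the function name `haircut` is hypothetical.

```python
def haircut(samples, fraction=0.01):
    """Return the samples with the highest `fraction` of values removed."""
    ordered = sorted(samples)
    removed = int(len(ordered) * fraction)
    return ordered[:len(ordered) - removed]

# Hypothetical run: 990 ordinary latencies plus 10 extreme ones (in µs)
latencies_us = [100 + (i % 50) for i in range(990)] + [100_000] * 10
trimmed = haircut(latencies_us)
print(max(latencies_us), max(trimmed))
```

In this made-up dataset the maximum drops from 100000 µs to 149 µs once the top 1% is removed, which is the kind of effect the slide title refers to.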

22

Observable Trends

The 99th percentile helps us identify trends in the data
– E.g., latency increases with request rate and size

[Figures: two 3D plots over request rate [req/s] (0 to 2000) and request size [bytes] (16 to 65536); one shows maximum latency [s] (10^4 to 10^7), the other 99% latency [s] (10^3 to 10^6)]

23

Interpretation

Predictable maximum latencies are hard to achieve
– Tried to achieve predictability by selecting a good FT CORBA configuration
– Even in the fault-free case, end-to-end latencies have skewed distributions for almost all 960 parameter combinations
– Maximums are several orders of magnitude higher than averages
– Unpredictability cannot be isolated to a single component

Magical 1%: achieving predictability through statistical approaches
– We remove 1% of the highest measured latencies
– Remaining samples have more deterministic properties
  • 99th percentile helps us identify trends in the data
– This allows us to extract tunable, predictable behavior out of fairly complex, dependable systems

24

Experimental Evaluation of 18-749 Projects

Requirements for experimental evaluation
– List of client invocations
– Probes
– Graphs

Tips

Digging deeper

25

Requirements for Experimental Evaluation

Things to hand in:
– List of client invocations: the server methods you’re going to exercise
– Raw data from the 7 probes in your application
– Graphs of end-to-end latency
– Interpretation of the results

Constraints
– All clients must run on separate machines
– Each client must issue at least 10,000 requests
– All requests must receive a reply (two-way invocations)
– The middle tier must have 2 replicas (e.g., primary & backup)
– Try all 48 combinations of the following:
  • Number of clients: 1, 4, 7, 10
  • Size of reply message: original, 256, 512, 1024 bytes
  • Inter-request time: 0 (no pause), 20, 40 ms

Administrative
– Each team must designate a chief experimenter

26

List of Client Invocations

METHOD        ONE_WAY   DB_ACCESS   SZ_REQUEST   SZ_REPLY
createObj()   No        Yes         16           4
getInfo()     No        Yes         4            256
deleteObj()   No        Yes         4            4

Columns
– METHOD: name of the remote invocation
– ONE_WAY: is it a one-way (no reply) invocation?
– DB_ACCESS: does it require a DB access (all 3 tiers are involved)?
– SZ_REQUEST: size of the forward message before marshaling (the combined sizes of all the in and inout parameters)
– SZ_REPLY: size of the return message before marshaling (the combined sizes of all the out and inout parameters)

27

Application Modifications

Use only two-way invocations
– The client must receive a reply from the server for each invocation
– Suggestion: have at least 2 different invocations in your benchmark

Tunable size of replies
– Add a variable-sized parameter that is returned by the server (e.g., sequence<octet>)
– Try the following reply sizes: original, 256 bytes, 512 bytes and 1024 bytes

Inter-request time
– Insert a pause in-between requests
– Try the following pauses: 0 (no pause), 20, 40 ms
– CAUTION:
  • sleep(0) inserts a non-zero pause
  • On most Linux kernels, you cannot pause for less than 10 ms
  • For more information: http://www.atl.lmco.com/projects/QoS/RTOS_html/periodic.html

28

Experiments Make Your Life Meaningful

29

Stages of an Invocation

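[Figure: the client and the server are each drawn as a stack of Application, Replication, Middleware and Network layers, with “out” and “in” points at every layer; the request travels from the client to the server and on to the database, and the reply travels back]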

30

Data Probes (1 of 7)

[Figure: invocation-stage diagram with probe P1 marked]

Legend
${STY}   Replication style (ACTIVE or WARM_PASSIVE)
${C}     Number of clients
${IRT}   Inter-request time (in µs)
${BYT}   Reply size (in bytes)
${HOST}  Hostname
${N}     Your team number

File name: DATA749_app_out_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: time (in µs) when each request is issued

Example:
67605
69070
69877
72807
...
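Because every probe file follows the same naming pattern, the names can be generated rather than typed by hand. A small Python sketch, assuming the DATA749 pattern shown for the probe files; the function `probe_file_name`, its parameter names and the example values are hypothetical.

```python
def probe_file_name(probe, side, style, clients, irt_us, reply_bytes, host, team):
    """Build a probe file name following the DATA749 naming pattern.

    probe: "out", "in", "msg" or "source"; side: "cli" or "srv".
    """
    return (
        f"DATA749_app_{probe}_{side}_{style}_2srv_"
        f"{clients}cli_{irt_us}us_{reply_bytes}req_{host}_team{team}.txt"
    )

# Example: client-side "out" probe, 4 clients, 20 ms pause, 256-byte replies
name = probe_file_name("out", "cli", "ACTIVE", 4, 20000, 256, "black", 3)
print(name)
```

Note that the inter-request time goes into the file name in microseconds, so a 20 ms pause becomes 20000us.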

31


Data Probes (2 of 7)

[Figure: invocation-stage diagram with probes P1 and P2 marked]

File name: DATA749_app_in_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: time (in µs) when each reply is received

Example:
67605
69070
69877
72807
...

Legend: ${HOST} = hostname


32


Data Probes (3 of 7)

[Figure: invocation-stage diagram with probes P1 to P3 marked]

File name: DATA749_app_msg_cli_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: name of each invocation

Example:
createObj()
createObj()
getInfo()
deleteObj()
...

Legend: ${HOST} = hostname

33

Data Probes (example)

[Figure: invocation-stage diagram with probes P1 to P3 marked]

Example:
probe1.record(new Long(gettimeofday()));    // P1: time when the request is issued
remoteFactory.createObj();                  // the two-way remote invocation
probe2.record(new Long(gettimeofday()));    // P2: time when the reply is received
probe3.record(new String("createObj()"));   // P3: name of the invocation

34


Data Probes (4 of 7)

[Figure: invocation-stage diagram with probes P1 to P4 marked]

File name: DATA749_app_in_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: time (in µs) when each request is received

Example:
67605
69070
69877
72807
...

Legend: ${HOST} = hostname

35


Data Probes (5 of 7)

[Figure: invocation-stage diagram with probes P1 to P5 marked]

File name: DATA749_app_out_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: time (in µs) when each reply is completed

Example:
67605
69070
69877
72807
...

Legend: ${HOST} = hostname

36


Data Probes (6 of 7)

[Figure: invocation-stage diagram with probes P1 to P6 marked]

File name: DATA749_app_msg_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: name of each invocation

Example:
createObj()
createObj()
getInfo()
deleteObj()
...

Legend: ${HOST} = hostname


37


Data Probes (7 of 7)

[Figure: invocation-stage diagram with probes P1 to P7 marked]

File name: DATA749_app_source_srv_${STY}_2srv_${C}cli_${IRT}us_${BYT}req_${HOST}_team${N}.txt

Data: hostname of the client sending the invocation

Example:
black
black
blue
magenta
...

Legend: ${HOST} = hostname


38

Probe Invariant

[Figure: invocation-stage diagram with probes P1 to P7 marked]

Probes at the same side and same level must have the same number of records!
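The invariant is easy to check automatically before handing in data. The sketch below assumes that P1 to P3 are the client-side application probes and P4 to P7 the server-side ones, as their file names (`_cli_` and `_srv_`) suggest; the function `invariant_violations` and the in-memory data layout are illustrative.

```python
def invariant_violations(records_by_probe,
                         groups=(("P1", "P2", "P3"), ("P4", "P5", "P6", "P7"))):
    """Return the record counts of every probe group whose counts disagree."""
    violations = []
    for group in groups:
        counts = {probe: len(records_by_probe[probe]) for probe in group}
        if len(set(counts.values())) > 1:
            violations.append(counts)
    return violations

# Hypothetical data: P5 is missing one record on the server side
records = {
    "P1": [10, 20], "P2": [15, 27], "P3": ["createObj()", "getInfo()"],
    "P4": [12, 22], "P5": [14], "P6": ["createObj()", "getInfo()"],
    "P7": ["black", "black"],
}
problems = invariant_violations(records)
```

An empty result means every group is consistent; otherwise each entry shows the differing counts, which helps locate the probe that dropped or duplicated a record.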

39

Computing End-To-End Latency

[Figure: invocation-stage diagram with probes P1 to P7 marked]

For request i:

$Latency(i) = P_2(i) - P_1(i)$

40

Computing the Components of Latency

[Figure: invocation-stage diagram with probes P1 to P7 marked]

For request i:

$Server(i) = P_5(i) - P_4(i)$

$Middleware(i) = Latency(i) - Server(i)$
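A minimal Python sketch of the latency breakdown, where end-to-end latency is P2 minus P1, server time is P5 minus P4, and the middleware share is the difference between the two. It assumes that record i in every probe refers to the same request, which holds for a single client; with several clients the server-side records would first have to be matched to each client, for instance using P7. The function name and timestamps are made up.

```python
def latency_components(p1, p2, p4, p5):
    """Return (latency, server, middleware) per request, all in µs."""
    components = []
    for t1, t2, t4, t5 in zip(p1, p2, p4, p5):
        latency = t2 - t1            # end-to-end, measured at the client
        server = t5 - t4             # time spent in the server application
        middleware = latency - server
        components.append((latency, server, middleware))
    return components

# Hypothetical timestamps for two requests
p1 = [0, 1000]      # request issued (client)
p2 = [800, 2100]    # reply received (client)
p4 = [300, 1400]    # request received (server)
p5 = [500, 1700]    # reply completed (server)
breakdown = latency_components(p1, p2, p4, p5)
```

Only differences taken on the same host are used here, so the client and server clocks do not need to be synchronized.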

41

Computing the Request Arrival Rate

[Figure: invocation-stage diagram with probes P1 to P7 marked]

For request i:

$Req\_rate(i) = \frac{10^6}{P_4(i) - P_4(i-1)}$

42

Computing the Server Throughput

[Figure: invocation-stage diagram with probes P1 to P7 marked]

For request i:

$Throughput(i) = \frac{10^6}{P_4(i) - P_4(i-1)} \times replySize$
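The request arrival rate and the server throughput can both be derived from consecutive P4 timestamps. The sketch below assumes that the interval is measured back to the previous request, P4(i) - P4(i-1), and that timestamps are in µs, so the factor 10^6 yields requests (or bytes) per second. The function names are illustrative, and two requests with identical timestamps would need special handling to avoid a division by zero.

```python
def request_rates(p4):
    """Requests per second, from the gaps between consecutive arrivals (µs)."""
    return [1e6 / (current - previous) for previous, current in zip(p4, p4[1:])]

def throughputs(p4, reply_size):
    """Bytes per second: reply size times the request rate."""
    return [rate * reply_size for rate in request_rates(p4)]

# Hypothetical arrival times at the server, in µs
p4 = [0, 1000, 3000]
rates = request_rates(p4)
bytes_per_second = throughputs(p4, reply_size=256)
```

The first request has no predecessor, so each list has one entry fewer than the number of requests.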

43

Graphs Required

Line plots of latency for increasing number of clients and different reply sizes (no pause)

Area plots of (mean, max) latency and (mean, 99%) latency, sorted by increasing mean values

Bar graphs of latency component break-down for outliers and normal requests

3D scatter plots of reply size and request rate impact on max and 99% latency

Latency vs. throughput

44

Interpretation of Results

Short write-up containing the “lessons learned” from the experiments

What did you learn about your system?
– What can you tell (good or bad) about the performance, dependability and robustness of your application?
– Were the results surprising?
– If you observed some behavior you didn’t expect, how can you explain it?
– What further experiments would be needed to verify your hypothesis?

Do your results confirm or refute the magical 1% theory?

45

Tips for Experimental Evaluation

Avoid interference
– Use separate machines for each client, server replica, NamingService/JNDI, FT manager, database, etc.
– Make sure there are no other processes using your CPU or bandwidth

Minimize impact of monitoring
– Store data in pre-allocated memory buffer
– Flush buffers to the disk at the end
– Record timestamps as time from the start of the process
  • Use 4-byte integers (long) for the timestamps

Automate the experimental process as much as possible
– Create scripts for launching the servers and clients, for collecting data, for analyzing it and for creating the graphs

Use Matlab for graphs and data processing
– This is installed on the ECE cluster and is available to students
  • Can also download it from https://www.cmu.edu/myandrew/
– If you need help with plotting your graphs, please send email to us
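The monitoring tips (pre-allocated buffer, timestamps relative to process start, flush at the end) can be combined into a small probe class. This is only a Python sketch of the idea: the class name `Probe`, its methods and the output file are hypothetical, and a real project would implement the same pattern in its own language.

```python
import os
import tempfile
import time

class Probe:
    """Records timestamps in a pre-allocated buffer; writes to disk only at the end."""

    def __init__(self, capacity):
        self._buffer = [0] * capacity          # allocated once, up front
        self._count = 0
        self._start = time.perf_counter()      # timestamps are relative to this

    def record(self):
        elapsed_us = int((time.perf_counter() - self._start) * 1_000_000)
        self._buffer[self._count] = elapsed_us  # IndexError if capacity is exceeded
        self._count += 1

    def flush(self, path):
        with open(path, "w") as out:
            for value in self._buffer[:self._count]:
                out.write(f"{value}\n")

probe = Probe(capacity=10_000)
for _ in range(3):
    probe.record()                              # would wrap a remote invocation
path = os.path.join(tempfile.mkdtemp(), "probe_demo.txt")
probe.flush(path)
with open(path) as recorded:
    lines = recorded.read().split()
```

Keeping all file I/O out of `record` is the point of the design: the measured invocation is not delayed by disk writes.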

46

Digging Deeper

Do the same thing while injecting faults

Other probes
– CPU usage (time spent in kernel, user mode)
– Memory (total, resident set)
– Bandwidth usage
– Context switches
– Major/minor page faults (page not in physical memory)

Other ways to represent data
– Boxplots for end-to-end latency
– Impact of varying #clients, size, request rate on #outliers, size of outliers, latency, etc.
– Do you see multi-modal distributions (can you explain them)?

Interpretation of results
– Are outliers isolated or do they come in bursts?
– What is the source of the outliers?
– Can you predict anything about the behavior of your system?
– What questions can you answer by looking at this data?
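One way to approach the question of whether outliers are isolated or come in bursts is to group consecutive outlier requests into runs. The Python sketch below assumes a list of booleans, one per request in issue order, marking which requests were classified as outliers; the function name `outlier_bursts` is illustrative.

```python
def outlier_bursts(is_outlier):
    """Group consecutive outliers into (start_index, length) runs."""
    bursts = []
    start = None
    for index, flag in enumerate(is_outlier):
        if flag and start is None:
            start = index                          # a new run begins
        elif not flag and start is not None:
            bursts.append((start, index - start))  # the run just ended
            start = None
    if start is not None:                          # run reaching the last request
        bursts.append((start, len(is_outlier) - start))
    return bursts

flags = [False, True, False, False, True, True, True, False, True]
bursts = outlier_bursts(flags)
```

Runs of length 1 are isolated outliers; longer runs suggest a disturbance that affected several consecutive requests.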

47

Summary of Lecture

What matters to you?
– What experiments should you run?
– What data should you collect?
– How should you present your data?
– What should you analyze?
– What lessons might you learn about your system?

Email all questions to the course mailing list
– The other two TAs and I (Tudor) are on this list
– We’re happy to sit down and work out the details with you and to help you run your experiments

It might sound like a lot of work, but the hard part is behind you: you’ve already built your system
– Now, it’s time to understand what you actually built!
