Dynamic Floating-Point Cancellation Detection

Michael O. Lam (Presenter)
Jeffrey K. Hollingsworth
G. W. Stewart

University of Maryland, College Park




Background (Floating-Point Representation 101)

Floating-point represents real numbers as (± sig × 2^exp)
    Sign bit
    Significand (“mantissa” or “fraction”)
    Exponent

Floating-point numbers have finite binary precision
    Single-precision: 24 binary digits (~7 decimal digits)
    Double-precision: 53 binary digits (~16 decimal digits)

Examples:
    π = 3.141592… (decimal) = 11.0010010… (binary)
    1/10 = 0.1 (decimal) = 0.0001100110… (binary)

Image from Wikipedia (“Single precision”)
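
As a minimal sketch of the (± sig × 2^exp) representation, the C function below splits a finite double into its sign, significand, and exponent using the standard `frexp` routine. The `fp_parts` type and the `decompose` name are illustrative assumptions and do not come from the original slides.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper type: the three fields of (± sig × 2^exp). */
typedef struct {
    int    sign;  /* +1 or -1 */
    double sig;   /* significand, normalized to [1, 2) for nonzero input */
    int    exp;   /* binary exponent */
} fp_parts;

/* Split a finite double into sign, significand, and exponent. */
fp_parts decompose(double x)
{
    fp_parts p;
    int e = 0;
    double m = frexp(fabs(x), &e);  /* fabs(x) == m * 2^e, m in [0.5, 1) */

    p.sign = signbit(x) ? -1 : 1;
    if (m == 0.0) {                 /* zero has no normalized form */
        p.sig = 0.0;
        p.exp = 0;
    } else {
        p.sig = m * 2.0;            /* renormalize to [1, 2) */
        p.exp = e - 1;
    }
    return p;
}
```

For example, `decompose(-6.0)` yields sign -1, significand 1.5, and exponent 2, since 6 = 1.5 × 2^2. On IEEE 754 platforms, `FLT_MANT_DIG` and `DBL_MANT_DIG` from `<float.h>` report the 24 and 53 binary digits quoted above.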


Motivation

Finite precision causes round-off error
    Compromises ill-conditioned calculations
    Hard to detect and diagnose

Increasingly important as HPC scales

Need to balance speed and accuracy
    Lower precision is faster
    Higher precision is more accurate

Industry-standard double precision may still fail on long-running computations


Previous Solutions

Analytical
    Requires numerical analysis expertise
    Conservative static error bounds are largely unhelpful

Ad-hoc
    Run experiments at different precisions
    Increase precision where necessary
    Tedious and time-consuming


Instrumentation Solution

Automated (vs. manual)
    Minimize developer effort
    Ensure consistency and correctness

Binary-level (vs. source-level)
    Include shared libraries without source code
    Include compiler optimizations

Runtime (vs. compile time)
    Dataset and communication sensitivity


Solution Components

Dyninst-based instrumentation utility (“mutator”)
    Cross-platform
    No special hardware required
    Stack walking and binary rewriting

Shared library with runtime analysis routines
    Flexibility and ease of development

Java-based log viewer GUI
    Cross-platform
    Minimal development effort


Analysis Process

Run mutator
    Find floating-point instructions
    Insert calls to shared library

Run instrumented program
    Executes analysis alongside original program
    Stores results in a log file

View output with GUI


Analysis Types

Cancellation detection

Shadow-value analysis


Cancellation

Loss of significant digits during subtraction operations

Cancellation is a symptom, not the root problem

Indicates that a loss of information has occurred that may cause problems later

      1.613647  (7)          1.613647  (7)
    - 1.613635  (7)        - 1.613647  (7)
      0.000012  (2)          0.000000  (0)
    (5 digits cancelled)   (all digits cancelled)

The second pair of operands with one more digit:

      1.6136473
    - 1.6136467
      0.0000006
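
The decimal example above can be reproduced with a small sketch that simulates 7-digit arithmetic by rounding each operand to a fixed number of significant decimal digits before subtracting. The helper names `round_sig` and `rounded_difference` are hypothetical, and the print-and-parse rounding is only a convenient stand-in for a limited-precision format.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: round x to `digits` significant decimal digits
 * by printing it in scientific notation and parsing it back. */
double round_sig(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    return strtod(buf, NULL);
}

/* Subtract after first rounding both operands to `digits` digits. */
double rounded_difference(double a, double b, int digits)
{
    return round_sig(a, digits) - round_sig(b, digits);
}
```

With 7 digits, `rounded_difference(1.6136473, 1.6136467, 7)` is exactly zero because both operands round to 1.613647, whereas the unrounded operands differ by about 0.0000006. The information was lost when the operands were rounded; the subtraction only exposes the loss.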


Detecting Cancellation

For each addition/subtraction:
    Extract value of each operand
    Calculate result and compare magnitudes (binary exponents)
    If e_ans < max(e_x, e_y) there is a cancellation

For each cancellation event:
    Calculate “priority”: max(e_x, e_y) - e_ans
    If above threshold, save event information to log
    For some events, record operand values
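
The exponent comparison described above can be sketched in C as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the function names are hypothetical, a result of exactly zero is treated as cancelling all 53 significand bits of a double, and "above threshold" is read as "greater than or equal to".

```c
#include <float.h>
#include <math.h>

/* Binary exponent as reported by frexp(); only differences are used,
 * so the frexp() normalization convention does not matter here. */
static int binary_exponent(double v)
{
    int e = 0;
    (void)frexp(v, &e);
    return e;
}

/* Priority of the cancellation in x + y (pass -y for a subtraction):
 * max(e_x, e_y) - e_ans. A value <= 0 means no cancellation.
 * Inputs are assumed to be finite. */
int cancellation_priority(double x, double y)
{
    double ans = x + y;
    int ex, ey, emax;

    if (x == 0.0 || y == 0.0)
        return 0;                   /* nothing to cancel against */
    if (ans == 0.0)
        return DBL_MANT_DIG;        /* assumption: all bits cancelled */

    ex = binary_exponent(x);
    ey = binary_exponent(y);
    emax = ex > ey ? ex : ey;
    return emax - binary_exponent(ans);
}

/* Hypothetical logging filter: report only events at or above threshold. */
int should_log(double x, double y, int threshold)
{
    int priority = cancellation_priority(x, y);
    return priority > 0 && priority >= threshold;
}
```

For instance, 1.5 - 1.25 = 0.25 is 1.1 - 1.01 = 0.01 in binary, so `cancellation_priority(1.5, -1.25)` reports that the two leading bits cancelled.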


Experiments

Gaussian elimination
    Benefits of partial pivoting
    Differing runtime behavior of popular algorithms


Gaussian Elimination

A → [L, U]

Partial pivoting
    Nominally to avoid division by zero
    Also avoids inaccurate results from small pivots
    This can be detected using cancellation

[Figure: matrix diagram; label: swap]
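
To make the pivoting discussion concrete, here is a hedged sketch of an in-place LU factorization that optionally uses partial pivoting and counts cancellations in the elimination updates. The routine `lu_factor`, its parameters, and the helper `cancelled_bits` are assumptions for illustration; the experiments in the talk instrumented existing binaries rather than adding counters to source code.

```c
#include <math.h>
#include <stddef.h>

/* Leading bits cancelled in a - b (0 if none), using the exponent rule
 * max(e_a, e_b) - e_result. */
static int cancelled_bits(double a, double b)
{
    double r = a - b;
    int ea = 0, eb = 0, er = 0, emax;
    if (a == 0.0 || b == 0.0) return 0;
    (void)frexp(a, &ea);
    (void)frexp(b, &eb);
    emax = ea > eb ? ea : eb;
    if (r == 0.0) return 53;  /* assumption: full double significand */
    (void)frexp(r, &er);
    return emax > er ? emax - er : 0;
}

/* In-place LU factorization of the n x n row-major matrix a.
 * On return, U is in the upper triangle and the multipliers of L are
 * below the diagonal; perm[k] is the row swapped into position k.
 * Returns the number of updates that cancelled at least `threshold`
 * bits (threshold >= 1), or -1 if a zero pivot is encountered. */
int lu_factor(size_t n, double *a, size_t *perm, int use_pivoting, int threshold)
{
    int events = 0;
    for (size_t k = 0; k < n; k++) {
        size_t p = k;
        if (use_pivoting)   /* pick the largest entry in column k */
            for (size_t i = k + 1; i < n; i++)
                if (fabs(a[i * n + k]) > fabs(a[p * n + k])) p = i;
        perm[k] = p;
        if (p != k)
            for (size_t j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
        if (a[k * n + k] == 0.0) return -1;
        for (size_t i = k + 1; i < n; i++) {
            double l = a[i * n + k] / a[k * n + k];
            a[i * n + k] = l;   /* store the multiplier in place of the zero */
            for (size_t j = k + 1; j < n; j++) {
                double prod = l * a[k * n + j];
                if (cancelled_bits(a[i * n + j], prod) >= threshold) events++;
                a[i * n + j] -= prod;
            }
        }
    }
    return events;
}
```

Calling `lu_factor` on the same matrix with `use_pivoting` set to 0 and then 1, at several thresholds, gives a rough way to compare cancellation counts with and without pivoting.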


[Figure labels: pivot, cancellation, loss of data]


Gaussian Cancellation

Cancellation Counts

log(diag. element size)      -2     -4     -6     -8
Threshold                     1      7     13     17   Estimate

Matrix size
    10 x 10                  66     37     37     34         25
    15 x 15                 225    123    122    122        100
    20 x 20                 663    247    252    257        225
    25 x 25                1227    394    423    441        400


Gaussian Elimination

This suggests that cancellation can be used to detect the effects of a small pivot

Useful in sparse elimination with limited ability to pivot

Threshold must be kept high enough


Gaussian Elimination

A → [L, U]

[Figure panels: Classical, Bordered]


[Figure: size of diagonal elements vs. iterations of algorithm; panels: Classical, Bordered]

                            Classical                Bordered
threshold                1   2   3   4   5        1   2   3   4   5

smallest diag. value
    10^-5               14   8   1   0   0        8   7   6   5   4
    10^-10              29  23  16  11   3        8   8   7   7   6
    10^-15              39  33  27  21  17        9   9   9   8   8


Gaussian Elimination

Classical method: many small cancellations

Bordered method: fewer but larger cancellations

Our tool can detect these differences and inform the developer, who can then make decisions regarding which algorithm to use


Other Results

Approximate nearest neighbor
    More cancellations in denser point sets

SPEC benchmarks milc and lbm
    Cancellations in error calculations indicate good results

SPEC benchmark povray
    Cancellations indicate color black


Conclusions

It is important to vary the threshold
    Most calculations have background cancellations
    Small cancellations can hide large ones

Cancellation results require interpretation by someone who is familiar with the algorithm

Properly employed, cancellation detection can help find “trouble spots” in numerical codes


Ongoing Research

Shadow value analysis
    Replace floating-point numbers with pointers to auxiliary information (higher precision, etc.)

    #include <stdio.h>

    double x = 1.0;

    void func(void) {
        double y = 4.0;
        x = x + y;
    }

    int main(void) {
        func();
        printf("%f\n", x);
        return 0;
    }

Values annotated on the slide: 1.000, 4.000, 5.000 (“shadow value”)
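
One way to picture a shadow value is as a higher-precision copy that travels alongside each program value. The sketch below is a deliberate simplification under stated assumptions: it keeps the shadow in a struct field instead of behind a pointer, it fixes the program precision to `float` and the shadow precision to `double`, and all type and function names are hypothetical.

```c
/* Hypothetical shadowed number: the value the program computes in single
 * precision plus a higher-precision shadow computed alongside it. */
typedef struct {
    float  value;   /* what the original program would hold */
    double shadow;  /* auxiliary higher-precision copy */
} shadowed;

/* Create a shadowed number from a double-precision input. */
shadowed shadow_make(double d)
{
    shadowed s;
    s.value = (float)d;
    s.shadow = d;
    return s;
}

/* Perform the addition in both precisions. */
shadowed shadow_add(shadowed a, shadowed b)
{
    shadowed r;
    r.value = a.value + b.value;
    r.shadow = a.shadow + b.shadow;
    return r;
}

/* Difference between the shadow and the value the program holds. */
double shadow_error(shadowed s)
{
    return s.shadow - (double)s.value;
}
```

Adding the shadowed versions of 1.0 and 4.0 gives 5.0 in both fields, so the error is zero. Adding 1.0 and 2^-30 leaves the `float` field at 1.0 while the shadow keeps the small term, and `shadow_error` reports the discrepancy.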


Shadow Value Analysis

Current status: allows programmers to automatically test their entire program in different precisions

Next step: selectively instrument particular code blocks or data structures

Goal: automated floating-point analysis and recommendation framework


Thank you!

Code available upon request

Questions?


[Figure: size of diagonal elements vs. iterations of algorithm; panels: Classical, Bordered]

C = Classical, B = Bordered

threshold                  1         2         3         4         5
smallest diag. value     C   B     C   B     C   B     C   B     C   B
    10^-5               14   8     8   7     1   6     0   5     0   4
    10^-10              29   8    23   8    16   7    11   7     3   6
    10^-15              39   9    33   9    27   9    21   8    17   8


Gaussian Cancellation

log(pivot)       -2     -4     -6     -8
Threshold         1      7     13     17

n = 10
    Count        66     37     37     34
    Trunc        55     37     37     34
    Est          25     25     25     25

n = 15
    Count       225    123    122    122
    Trunc       154    122    122    122
    Est         100    100    100    100

n = 20
    Count       663    247    252    257
    Trunc       298    245    252    257
    Est         225    225    225    225

n = 25
    Count      1227    394    423    441
    Trunc       447    381    423    441
    Est         400    400    400    400