University of Maryland Dynamic Floating-Point Error Detection Mike Lam, Jeff Hollingsworth and Pete Stewart


University of Maryland

Dynamic Floating-Point Error Detection

Mike Lam, Jeff Hollingsworth and Pete Stewart


Motivation

• Finite precision -> roundoff error
  • Compromises ill-conditioned calculations
  • Hard to detect and diagnose
• Increasingly important as HPC grows
  • Single-precision is faster on GPUs
  • Double-precision fails on long-running computations
• Previous solutions are problematic
  • Numerical analysis requires training
  • Manual re-writing and testing in higher precision is tedious and time-consuming


Our Solution

• Instrument floating-point instructions

• Automatic
  • Minimize developer effort
  • Ensure analysis consistency and correctness
• Binary-level
  • Include shared libraries w/o source code
  • Include compiler optimizations
• Runtime
  • Data-sensitive


Our Solution

• Three parts
  • Utility that inserts binary instrumentation
  • Runtime shared library with analysis routines
  • GUI log viewer
• General overview
  • Find floating-point instructions and insert calls to shared library
  • Run instrumented program
  • View output with GUI


Our Solution

• Dyninst-based instrumentation
  • Cross-platform
  • No special hardware required
  • Stack walking and binary rewriting
• Java GUI
  • Cross-platform
  • Minimal development effort


Our Solution

• Cancellation detection
  • Instrument addition & subtraction
  • Compare runtime operand values
  • Report cancelled digits
• Side-by-side (“shadow”) calculations
  • Instrument all floating-point instructions
  • Higher/lower precision
  • Different representation (e.g. rationals)
  • Report final errors


Cancellation Detection

• Overview
  • Loss of significant digits during operations
• For each addition/subtraction:
  • Extract value of each operand
  • Calculate result and compare magnitudes (binary exponents)
  • If $e_{ans} < \max(e_x, e_y)$ there is a cancellation
• For each cancellation event:
  • Record a “priority”: $\max(e_x, e_y) - e_{ans}$
  • Save event information to log
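The exponent comparison above can be modelled in a few lines. This is a minimal sketch only, not the tool's binary-level analysis routine: the function names `binary_exponent` and `cancelled_bits` are hypothetical, inputs are assumed to be finite doubles, and reporting 53 bits (a full double significand) when the result is exactly zero is an assumption of this sketch rather than something the slides specify.

```python
import math

def binary_exponent(v):
    """Binary exponent e such that v = m * 2**e with 0.5 <= |m| < 1 (None for zero)."""
    if v == 0.0:
        return None  # zero has no meaningful exponent
    return math.frexp(v)[1]

def cancelled_bits(x, y, subtract=False):
    """Return the cancellation 'priority' max(e_x, e_y) - e_ans, or 0 if none."""
    ans = x - y if subtract else x + y
    ex, ey = binary_exponent(x), binary_exponent(y)
    if ex is None or ey is None:
        return 0  # adding or subtracting zero cannot cancel digits
    if ans == 0.0:
        return 53  # assumption: treat total cancellation as a full double significand
    eans = binary_exponent(ans)
    # Cancellation occurred only if the result's exponent dropped below the operands'.
    return max(max(ex, ey) - eans, 0)
```

For example, `cancelled_bits(1.0, 1.0 - 2.0**-20, subtract=True)` compares exponents 1 and 0 for the operands with -19 for the result, so it reports 20 cancelled bits. `math.frexp` uses a different exponent convention from the IEEE 754 stored exponent, but the offset is the same for all three values and drops out of the difference.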


Gaussian Elimination

A -> [L,U]

• Comparison of eight methods
  • Classical
  • Classical w/ partial pivoting
  • Classical w/ full pivoting
  • Bordering (“Sherman’s march”)
  • “Pickett’s charge”
  • “Pickett’s charge” w/ partial pivoting
  • Crout’s method
  • Crout’s method w/ partial pivoting
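For reference, the first method in the list (classical elimination without pivoting) can be sketched as follows. This is an illustrative textbook version, not the code that was instrumented in the study; the function name `lu_classical` is hypothetical, and the matrix is assumed to be square with nonzero pivots.

```python
def lu_classical(a):
    """Classical Gaussian elimination without pivoting: return (L, U) with A = L * U."""
    n = len(a)
    u = [[float(v) for v in row] for row in a]          # working copy, becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)]     # unit lower-triangular L
         for i in range(n)]
    for k in range(n - 1):
        if u[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this variant does no pivoting")
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]   # multiplier that eliminates u[i][k]
            l[i][k] = m
            for j in range(k, n):
                # Each of these subtractions is a candidate cancellation site.
                u[i][j] -= m * u[k][j]
    return l, u
```

The subtraction in the innermost loop is the kind of operation that cancellation detection would instrument; the other variants in the list compute the same factors but order these operations differently.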


Gaussian Elimination


Gaussian Elimination

Classical vs. Bordering


Gaussian Elimination

                 Classical   Bordering
Operations       285         294
Cancellations    39          9
Cancels/ops      14%         3%
Average bits     5.23        22.78


SPEC Benchmarks

• Results are hard to interpret without domain knowledge

• Overheads:


Roundoff Error

• Sparse “shadow value” table
  • Maps memory addresses to alternate values
  • Shadow values can be single-, double-, quad- or arbitrary-precision
  • Other ideas: rationals, # of significant digits, etc.
• Instrument every FP instruction
  • Extract operation type and operand addresses
  • Perform the same operation on corresponding shadow values
  • Output shadow values and errors upon termination
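The shadow-value idea can be illustrated with a small model. This is a sketch under stated assumptions, not the tool's runtime library: the class `ShadowTable` and its methods are hypothetical, "addresses" are plain integer keys, and exact rationals (Python's `fractions.Fraction`) stand in for the shadow representation.

```python
from fractions import Fraction

class ShadowTable:
    """Maps an address to a native double and to an exact rational shadow value."""

    def __init__(self):
        self.native = {}   # address -> float, what the program computes
        self.shadow = {}   # address -> Fraction, the side-by-side calculation

    def store(self, addr, literal):
        """Initialise both copies from the same literal (e.g. the string "0.1")."""
        self.native[addr] = float(literal)
        self.shadow[addr] = Fraction(literal)

    def apply(self, op, dst, lhs, rhs):
        """Perform the same operation on the native values and on the shadow values."""
        ops = {
            "add": lambda p, q: p + q,
            "sub": lambda p, q: p - q,
            "mul": lambda p, q: p * q,
            "div": lambda p, q: p / q,
        }
        f = ops[op]
        native_result = f(self.native[lhs], self.native[rhs])
        shadow_result = f(self.shadow[lhs], self.shadow[rhs])
        self.native[dst] = native_result
        self.shadow[dst] = shadow_result

    def relative_error(self, addr):
        """Relative error of the native value measured against the shadow value."""
        exact = self.shadow[addr]
        diff = abs(Fraction(self.native[addr]) - exact)
        if exact == 0:
            return 0.0 if diff == 0 else float("inf")
        return float(diff / abs(exact))
```

As a usage example, storing `"0"` at one address and `"0.1"` at another, then applying `"add"` ten times into the first address, leaves the shadow value at exactly 1 while the native double ends up slightly below 1, so `relative_error` reports a small nonzero value.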


More Gaussian Elimination

Maximum relative error

                    25x25      50x50      100x100
Partial pivoting    9.3e-10    2.3e-2     1.0
Full pivoting       1.3e-15    2.4e-15    4.8e-15


Issues & Possible Solutions

• Expensive overheads (100-500X)
  • Optimize with inline snippets
  • Reduce workload with data flow analysis
• Following values through compiler optimizations
  • Selectively instrument MOV instructions
• Filtering false positives
  • Deduce “root cause” of error using data flow


Conclusion

• Analysis of floating-point error is hard

• Our tool provides automatic analysis of such error

• Work in progress


Thank you!