Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior
2014-06-26 - A guide to undefined behavior in c and c++
A Guide to Undefined Behavior in C and C++
StanleySIT@Synology
2014-06-26
Great Articles
● A Guide to Undefined Behavior in C and C++
  ○ http://blog.regehr.org/archives/213
● What Every C Programmer Should Know About Undefined Behavior
  ○ http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
    ■ by Chris Lattner, primary author of the LLVM project
Let’s have a quiz: 1. What’s the bug inside?
void *safe_alloc(size_t n, size_t m) {
    size_t total_size = n * m;
    if (n > 0 && SIZE_MAX / n < m)
        return 0;
    return malloc(total_size);
}
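void *safe_alloc(size_t n, size_t m) {
    // The product is computed before it is checked. For signed operands,
    // overflow is undefined, so the compiler may assume n * m never exceeds
    // the maximum and remove the check. (size_t itself is unsigned and wraps
    // modulo SIZE_MAX + 1, so with size_t the check happens to survive.)
    size_t total_size = n * m;
    if (n > 0 && SIZE_MAX / n < m)
        return 0;
    return malloc(total_size);
}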
Revised version
void *safe_alloc(size_t n, size_t m) {
    size_t total_size = 0;
    if (n > 0 && SIZE_MAX / n < m)
        return 0;
    total_size = n * m;   // multiply only after the overflow check
    return malloc(total_size);
}
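The same "check first, then compute" rule matters most for signed integers, where the overflow itself is undefined. A minimal sketch for int, using a hypothetical helper checked_mul_int (not from the original slides) that decides whether a * b fits before any multiplication is evaluated; every comparison below uses only divisions that cannot overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: stores a * b in *out and returns 1, or returns 0
   without multiplying when the product would not fit in an int. */
int checked_mul_int(int a, int b, int *out)
{
    assert(out != NULL);                      /* precondition */
    if (a > 0) {
        if (b > 0) {
            if (a > INT_MAX / b) return 0;    /* positive * positive */
        } else {
            if (b < INT_MIN / a) return 0;    /* positive * non-positive */
        }
    } else {
        if (b > 0) {
            if (a < INT_MIN / b) return 0;    /* non-positive * positive */
        } else {
            if (a != 0 && b < INT_MAX / a)
                return 0;                     /* non-positive * non-positive */
        }
    }
    *out = a * b;                             /* known not to overflow here */
    return 1;
}
```

Because the test happens before the multiplication, there is no undefined operation for the optimizer to reason backwards from.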
Let’s have a quiz: 2. What will main() return?
int a;
int assign_a (int val) {
    a = val;
    return val;
}
int main (void) {
    assign_a (0) + assign_a (1);
    return a;
}
http://blog.regehr.org/archives/161
int a;
int assign_a (int val) {
    a = val;
    return val;
}
int main (void) {
    // The order of evaluation of the operands of + is unspecified in C,
    // so main() may return either 0 or 1.
    assign_a (0) + assign_a (1);
    return a;
}
Revised version
int a;
int assign_a (int val) {
    a = val;
    return val;
}
int main (void) {
    int x = assign_a (0);   // separate statements fix the order
    int y = assign_a (1);
    (void)(x + y);          // the sum itself is unused
    return a;               // always 1
}
“undefined behavior”
● Anything at all can happen
● The standard imposes no requirements:
  ○ the program may fail to compile
  ○ it may crash
  ○ it may silently generate incorrect results
  ○ it may, by luck, do exactly what the programmer intended
List of undefined behavior
● Use of an uninitialized variable
  ○ int a;
  ○ if (a > 0) {}
● Signed integer overflow
  ○ INT_MAX + 1 is not guaranteed to be INT_MIN
● Oversized Shift Amounts
  ○ shifting a uint32_t by 32
  ○ 1 << 32
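The last two items can be avoided by testing the operands before the operation rather than inspecting the result afterwards. A minimal sketch with hypothetical helper names add_checked and shl32_or_zero; returning 0 for a shift amount of 32 or more is an arbitrary choice made here, not something the standard prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b and returns 1, or returns 0 if the sum
   would overflow. Testing `a + b < a` afterwards would not work: the
   overflow itself is already undefined. */
int add_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

/* Hypothetical helper: shifting a uint32_t by 32 or more is undefined,
   so such amounts are handled explicitly instead of by whatever the CPU does. */
uint32_t shl32_or_zero(uint32_t x, unsigned n)
{
    return n < 32 ? x << n : 0;
}
```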
● Dereferences of Wild Pointers and Out of Bounds Array Accesses
  ○ int *a = (int *)rand();
  ○ *a = 1;
● Dereferencing a NULL Pointer
  ○ int *a = NULL;
  ○ *a = 1;
    ■ contrary to popular belief, it is not defined to trap
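Since an invalid access is undefined the moment it is evaluated, the pointer and index have to be validated before the access. A minimal sketch of a checked read, using a hypothetical function get_checked that rejects a NULL array or an out-of-range index without touching memory:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies arr[i] into *out and returns 1, or returns 0
   when arr is NULL or i is outside [0, len). */
int get_checked(const int *arr, size_t len, size_t i, int *out)
{
    assert(out != NULL);          /* caller must supply a destination */
    if (arr == NULL || i >= len)
        return 0;                 /* reject before any dereference */
    *out = arr[i];
    return 1;
}
```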
● Violating Type Rules
  ○ cast an int* to a float* and access the int through it
● Divide by zero
● …
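Both items have well-defined alternatives. A minimal sketch with hypothetical helper names: float_bits reads a float's bit pattern with memcpy instead of a pointer cast (this assumes float and uint32_t have the same size, which the static assertion checks, and the test value assumes IEEE 754 single precision), and div_checked rejects a zero divisor as well as INT_MIN / -1, whose quotient does not fit in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* memcpy copies bytes, so no object is accessed through the wrong type. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: stores a / b and returns 1, or returns 0 when the
   division would be undefined. */
int div_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if (b == 0 || (a == INT_MIN && b == -1))
        return 0;
    *out = a / b;
    return 1;
}
```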
Why Is Undefined Behavior Good?
● The only good thing:
  ○ it simplifies the compiler’s job
  ○ it lets the compiler generate very efficient code
● Avoid overhead
  ○ initialization
  ○ array range checks
● Enable loop optimizations
● The compiler doesn’t have to deal with the differing behavior of various CPUs
● Enable advanced optimization techniques
○ "Type-Based Alias Analysis" (TBAA)
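A minimal sketch of what type-based alias analysis permits, using a hypothetical function store_then_reload: since an int object may not be modified through a float lvalue, an optimizing compiler may assume the store through fp leaves *ip unchanged and compile the final load as the constant 1. Whether a given compiler actually does so depends on its flags (GCC, for example, enables strict aliasing at -O2).

```c
#include <assert.h>

/* Under the type rules, *fp (a float) cannot legally alias *ip (an int),
   so the compiler may skip reloading *ip after the store to *fp. */
int store_then_reload(int *ip, float *fp)
{
    assert((void *)ip != (void *)fp);   /* documents the no-aliasing assumption */
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}
```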
Why Is Undefined Behavior Bad?
● Application developers may not be aware that
  ○ their code triggers undefined behavior
  ○ modern compiler optimizers contain many optimizations
  ○ different compilers often have substantially different optimizers
A bug in the Linux kernel
void contains_null_check(int *P) {
    int dead = *P;
    if (P == 0)
        return;
    *P = 4;
}
Compiler optimizations:
● Dead Code Elimination
● Redundant Null Check Elimination
If these two optimizations run in a different order on this code snippet, the results differ:
If the compiler runs Dead Code Elimination first
Dead Code Elimination:
void contains_null_check_after_DCE(int *P) {
    int dead = *P;   // deleted by the optimizer.
    if (P == 0)
        return;
    *P = 4;
}
Redundant Null Check Elimination:
void contains_null_check_after_DCE_and_RNCE(int *P) {
    // Null check is kept.
    if (P == 0)
        return;
    *P = 4;
}
If the compiler runs Redundant Null Check Elimination first
Redundant Null Check Elimination:
void contains_null_check_after_RNCE(int *P) {
    int dead = *P;
    if (false)   // P was dereferenced by this point, so it can't be null
        return;
    *P = 4;
}
Dead Code Elimination:
void contains_null_check_after_RNCE_and_DCE(int *P) {
    //int dead = *P;   // deleted by the optimizer.
    //if (false)       // deleted: this branch can never be taken
    //    return;
    *P = 4;            // the null check is gone: a NULL P now reaches this store
}
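One way to make the check survive either pass order is to test the pointer before the first dereference, so no earlier load lets the compiler conclude that P is non-null. A minimal sketch; the name contains_null_check_fixed is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The null check comes first: no dereference precedes it, so Redundant
   Null Check Elimination has nothing to justify deleting it. */
void contains_null_check_fixed(int *P)
{
    if (P == NULL)
        return;
    int value = *P;   /* P is known to be non-null here */
    (void)value;      /* unused, like `dead` in the original */
    *P = 4;
    assert(*P == 4);  /* postcondition */
}
```

GCC also offers -fno-delete-null-pointer-checks, which tells the optimizer not to remove null checks on the basis of earlier dereferences.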
If performance is not your only goal
● undefined behavior is often scary
CVE-2009-1897: a bug in the Linux kernel
● kernel/git/torvalds/linux.git
  ○ tun subsystem in the Linux kernel 2.6.30 and 2.6.30.1
  ○ drivers/net/tun.c
● null check removed by the optimizer
  ○ http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-1897
  ○ Fun with NULL pointers, part 1
    ■ http://lwn.net/Articles/342330/
Does a large codebase contain undefined behavior?
● There is no good way to determine this
● Some useful tools can help find such bugs
(Chris Lattner, primary author of the LLVM project)
● Enable and pay attention to compiler warnings, preferably using multiple compilers
● Use static analyzers (like Clang’s, Coverity, etc.) to get even more warnings
● Use compiler-supported dynamic checks
  ○ gcc’s -ftrapv flag generates code to trap signed integer overflows
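Besides trapping at run time with -ftrapv, GCC (5 and later) and Clang provide overflow-checking builtins that report whether an operation overflowed without invoking undefined behavior. A minimal sketch wrapping __builtin_add_overflow in a hypothetical function sum_ints; the builtin is a compiler extension, not standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sums v[0..n-1] into *total, returning false instead
   of overflowing. __builtin_add_overflow returns true when the exact sum
   does not fit in the result object. */
bool sum_ints(const int *v, size_t n, int *total)
{
    assert(total != NULL && (n == 0 || v != NULL));
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_add_overflow(acc, v[i], &acc))
            return false;
    }
    *total = acc;
    return true;
}
```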
● Use tools like Valgrind to get additional dynamic checks
● When functions are “type 2” (defined for some inputs but undefined for others, in Regehr’s classification), document their preconditions and postconditions
● Use assertions to verify that functions’ preconditions and postconditions actually hold
● Particularly in C++, use high-quality data structure libraries
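A minimal sketch of documenting and asserting the contract of a “type 2” function, using a hypothetical array_max that is only defined for a non-null array with at least one element. The assertions catch contract violations during testing; they are compiled out when NDEBUG is defined, so they do not protect release builds.

```c
#include <assert.h>
#include <stddef.h>

/* array_max: returns the largest element of a[0..n-1].
   Precondition:  a != NULL and n > 0 (undefined otherwise).
   Postcondition: the result is >= every element and equals one of them. */
int array_max(const int *a, size_t n)
{
    assert(a != NULL && n > 0);            /* check the precondition */
    int best = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
#ifndef NDEBUG
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        assert(a[i] <= best);              /* check the postcondition */
        if (a[i] == best)
            found = 1;
    }
    assert(found);
#endif
    return best;
}
```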
● Clang has an experimental -fcatch-undefined-behavior mode (superseded in current Clang and GCC by -fsanitize=undefined)
  $ clang t.c
  $ ./a.out
  $ clang t.c -fcatch-undefined-behavior
  $ ./a.out
  Illegal instruction
Reference
● The C FAQ
  ○ http://c-faq.com/ansi/undef.html
● Undefined behavior
  ○ http://en.wikipedia.org/wiki/Undefined_behavior
Q&A