SECURE PROGRAMMING Chapter 2 Strings
SECURE PROGRAMMING
Chapter 2
Strings
Overview
● Arrays and their Problems
● Character Strings
● Common String Manipulation Errors
● String Vulnerabilities and Exploits
● Mitigation Strategies
● String Handling Functions, the Bad and the Good
● Runtime Protection Strategies
● Some Notable Vulnerabilities
● Summary
Arrays and their Problems
1) Hard to determine size.
2) Size defaults may not work.
3) Easy to index an array out of bounds.
4) Easy to write non-portable code (inconsistent handling, for example).
5) Size parameters may be wrong (see 3).
6) Array copying may overflow the array
7) Pointer arithmetic may be incorrect.
Character Strings
The problem: many strings come from outside the program:
• Command line arguments
• Environment variables
• Console or other input
• Text files
• Network connections
Strings are not a built-in type in C/C++, though there is (some) library support.
Character Strings: String Data Type
In C, a string is a null-terminated array of characters, addressed by a pointer. Strings have all the problems of arrays, magnified because most string manipulation is done through library functions.
Five Important terms for arrays:
1. Bound = size of the array.
2. Lo = Address of first element of the array
3. Hi = Address of last element of the array
4. TooFar = The address of the one-too-far element of the array = Hi + 1 = Lo + Bound
5. Target size (Tsize) = Bound
Character Strings: String Data Type
Two more terms for strings.
1. Null-terminated: the array contains at least one null character.
2. Length: For null-terminated strings, the number of characters before the (first) null terminator.
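To make the array and string terms concrete, here is a small C sketch. The names BOUND, lo, hi, too_far and length_within() are hypothetical, chosen for illustration; length_within() behaves much like the POSIX strnlen() function.

```c
#include <stddef.h>

enum { BOUND = 8 };                        /* Bound (= Tsize): number of elements */

static char arr[BOUND] = "abc";            /* 'a','b','c' followed by five '\0's */
static char *const lo = &arr[0];           /* Lo: address of the first element */
static char *const hi = &arr[BOUND - 1];   /* Hi: address of the last element */
static char *const too_far = arr + BOUND;  /* TooFar = Hi + 1 = Lo + Bound:
                                              may be compared, never dereferenced */

/* Length: number of characters before the first null character in p[0..bound).
   Returns bound when the array is NOT null-terminated, instead of reading past it. */
static size_t length_within(const char *p, size_t bound) {
    size_t n = 0;
    while (n < bound && p[n] != '\0')
        ++n;
    return n;
}
```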
Problem with determining array size (the clear() procedure: sizeof applied to an array parameter yields the size of a pointer, not of the array).
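A minimal sketch of the clear() problem and one common fix. The broken version is kept in a comment so it is not compiled; the fixed clear() signature and the CLEAR_ARRAY macro are our own illustration, not the textbook's code.

```c
#include <stddef.h>

/* The problem, shown in a comment only:
 *
 *   void clear(int array[]) {
 *       for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); ++i)
 *           array[i] = 0;
 *   }
 *
 * `int array[]` in a parameter list really declares `int *array`, so the loop
 * bound is sizeof(int *) / sizeof(int) (2 on a typical 64-bit platform),
 * no matter how long the caller's array is.
 */

/* Fix: the element count travels alongside the pointer. */
static void clear(int *array, size_t count) {
    for (size_t i = 0; i < count; ++i)
        array[i] = 0;
}

/* Only valid where `a` is a true array in scope, not a pointer:
   there sizeof sees the whole array. */
#define CLEAR_ARRAY(a) clear((a), sizeof (a) / sizeof (a)[0])
```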
Character Strings: String Data Type
More problems:
What characters? The “execution character set”.
It is locale-specific; the locale is selected with the setlocale() function.
Basic execution character set: 26 uppercase and 26 lowercase letters, 10 digits, 29 graphic characters, space, the control characters HT, VT, FF, alert (BEL), BS, CR, and NL, and the null character.
The execution character set may contain many more characters, some of which require multiple bytes to represent (a multibyte character set); the basic character set is still present. Multibyte encodings may have locale-specific shift states.
Character Strings: UTF-8
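UTF-8 can represent every character in the Unicode character set, using 1 to 4 bytes per character.
Code points 0-127 (ASCII): 1 byte.
Otherwise: the first byte starts with as many 1 bits as there are bytes in the sequence, followed by a 0 bit; every following byte starts with the bits 10.
Thus, by the leading bits of a byte:
If 0: a complete 1-byte character
If 11: the start of a multibyte sequence
If 10: a continuation byte of a multibyte sequence
(Watch out for vulnerabilities! Decoders must reject malformed input such as overlong encodings, e.g. 0xC0 0xAF for '/'.)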
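The lead-byte rules above translate directly into a validating decoder. This is a minimal sketch with hypothetical function names; it checks sequence structure and rejects the always-overlong lead bytes 0xC0/0xC1 and the out-of-range leads 0xF5-0xFF, but it does NOT reject overlong 3- and 4-byte forms or surrogate code points, which a production decoder must also do.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by byte b, or 0 if b cannot start one. */
static int utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;   /* 0xxxxxxx: single byte (ASCII) */
    if (b < 0xC2) return 0;   /* 10xxxxxx continuation, or C0/C1 (always overlong) */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    if (b < 0xF5) return 4;   /* 11110xxx, up to U+10FFFF */
    return 0;                 /* F5..FF never appear in valid UTF-8 */
}

/* Counts the code points in s[0..n); returns -1 on a malformed sequence. */
static long utf8_count(const unsigned char *s, size_t n) {
    long count = 0;
    size_t i = 0;
    while (i < n) {
        int len = utf8_seq_len(s[i]);
        if (len == 0 || (size_t)len > n - i)
            return -1;                          /* bad lead byte or truncated sequence */
        for (int k = 1; k < len; ++k)
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;                      /* continuation bytes must be 10xxxxxx */
        i += (size_t)len;
        ++count;
    }
    return count;
}
```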
Wide Strings
Wide characters (wchar_t) are 16 or 32 bits, depending on the implementation (e.g., 16 bits on Windows, 32 bits on Linux).
Terminated with a null wide character.
As is the case with regular strings (with caveats!):
● A pointer to a wide string points to its left-most wide character.
● The length is the number of wide characters preceding the null wide character.
● The value is the sequence of code values of the contained wide characters, in order.
String Literals
Enclosed in double quotes (").
Wide string literals prefixed by L
Adjacent string literal tokens are concatenated. If any of them is prefixed by L, the result is a wide string literal (example in text, page 34). A null character is appended, and the result is used to initialize an array of static storage duration.
In C, a string literal has type char[] rather than const char[], so the compiler may not stop you from modifying it, but modification is undefined behavior.
Watch for declarations of the form:
const char s[3] = "abc"; // Not a null-terminated string: no room for '\0'. Use:
const char s[] = "abc"; // size 4, including the null terminator
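A short sketch of how literal initializers size their arrays; the array names are hypothetical. Note that the 3-element form is accepted by C but rejected by C++.

```c
/* Too small: holds 'a','b','c' and NO terminator (legal in C, an error in C++).
   Passing it to strlen() or printf("%s") would read past the end. */
static const char no_nul[3] = "abc";

/* Size deduced from the literal: 4 elements, the last one '\0'. */
static const char with_nul[] = "abc";

/* Adjacent literals are concatenated first; one '\0' is then appended. */
static const char joined[] = "ab" "cd";
```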
Strings in C++
● Proliferation of string classes.
● Standardized in the C++ standard library as:
  ● std::string = typedef for basic_string<char>
  ● std::wstring = typedef for basic_string<wchar_t>
● The standard also defines:
  ● null-terminated byte string (NTBS)
  ● null-terminated multibyte string (NTMBS): an NTBS that contains a sequence of valid multibyte characters and ends in the same shift state it starts in.
Strings in C++ (2)
basic_string class template specializations are safer than NTBS, but NTBS are required all over the place:
● Literals are NTBS
● Existing libraries need NTBS or NTMBS
string objects are passed by value or reference, while C-strings are passed by pointer.
Thank goodness for the member functions c_str() and data(), which return a pointer to the contents as an NTBS (data() is guaranteed to be null-terminated only since C++11).
Character Types
Three types:
● Plain char
● signed char
● unsigned char
Using the wrong type may cause compiler warnings.
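int
Some gotchas:
● getc() and friends return an int, so that EOF (a negative value, typically -1) can be distinguished from every valid character value.
● Functions in ctype.h (cctype) such as isalpha() accept an int because they might be passed the result of getc() or similar. The argument must be EOF or a value representable as unsigned char, so cast plain char arguments to unsigned char first.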
● In C, a character constant such as 'a' has type int, so sizeof('a') equals sizeof(int) (typically 4), not 1. In C++ a character constant has type char and its size is 1.
Wide character literals have type wchar_t and multicharacter literals have type int.
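Both int-related gotchas in one sketch; the function names are hypothetical. The cast to unsigned char before isalpha() and the int variable for getc()'s result are the points being illustrated.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Counts alphabetic characters in s. isalpha() requires EOF or a value
   representable as unsigned char; plain char may be signed, so a byte such as
   0xE9 would otherwise arrive as a negative int (undefined behavior). */
static size_t count_alpha(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (isalpha((unsigned char)*s))
            ++n;
    return n;
}

/* Counts the bytes in a stream. c must be an int: with a char, a 0xFF byte
   would look like EOF (signed char, EOF == -1) or EOF would never match
   (unsigned char). */
static size_t count_bytes(FILE *fp) {
    size_t n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        ++n;
    return n;
}
```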
Unsigned char and wchar_t
Unsigned char: all bits handled equally; pure binary. No padding bits, no trap representation, no sign extension, etc.
wchar_t: Can be used for natural-language character data. For characters in the basic character set, it does not matter, except for type compatibility issues.
Sizing Strings: Headaches
Three important numbers:
Size = number of bytes allocated to the array (sizeof(a))
Count = number of elements in the array (maybe different from size!)
Length = Number of characters before null terminator.
Notes:
If characters are wide, size = count * sizeof(wchar_t), i.e., 2*count or 4*count depending on the platform.
Length MUST be smaller than count (one element is needed for the null terminator).
See Program fragments in book, pages 40-41.
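Size, count and length for a wide string, plus a duplicate function that allocates the right number of bytes. COUNT_OF, greeting and wide_dup() are hypothetical names for this sketch (POSIX also offers wcsdup()).

```c
#include <stdlib.h>
#include <wchar.h>

/* Count of elements: only correct for a true array, never a pointer. */
#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

static const wchar_t greeting[] = L"hello";
/* size   = sizeof greeting      -> 6 * sizeof(wchar_t) bytes
   count  = COUNT_OF(greeting)   -> 6 elements (5 characters + L'\0')
   length = wcslen(greeting)     -> 5 */

/* Allocates a copy of src. wcslen() counts wide characters, not bytes, so the
   byte count is (length + 1) * sizeof(wchar_t); strlen() or dropping either
   factor would under-allocate. Caller frees. */
static wchar_t *wide_dup(const wchar_t *src) {
    size_t len = wcslen(src);
    wchar_t *copy = malloc((len + 1) * sizeof *copy);
    if (copy != NULL)
        wmemcpy(copy, src, len + 1);   /* also copies the null wide character */
    return copy;
}
```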
Common String Manipulation Errors
● Use of gets(): NO, NO, NO!!! It cannot be used safely and was removed from the language in C11.
● Improperly bounded string copies. Do not use:
  ● strcpy()
  ● strcat()
  ● sprintf()
● Watch out for:
  ● Input strings
  ● Environment strings
  ● Parameter strings... (see programs, pp 42-47)
Common String Manipulation Errors
● Sizing strings:
  ● Do not use strlen() for wide strings; use wcslen().
  ● Multiply the result (plus one for the terminator) by sizeof(wchar_t) to get the size in bytes.
  Programs, pages 41-42
● Improperly bounded string input. Do not use:
  ● gets()
  ● cin >> into a character array without setting the field width (cin.width())
  ● scanf() with "%s" and no field width
See programs pp 42-43 (the program on page 43 is a typical implementation of gets).
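A bounded alternative to an unbounded "%s": give the conversion a field width one less than the buffer size. This sketch uses sscanf() so it can be tested without a terminal; the same field width works with scanf(). WORD_MAX, STR and read_word() are hypothetical names.

```c
#include <stdio.h>

#define WORD_MAX 15                 /* hypothetical limit for this sketch */
#define STR_(x) #x
#define STR(x) STR_(x)              /* STR(WORD_MAX) expands to "15" */

/* Reads one whitespace-delimited word from src into word[WORD_MAX + 1].
   The width in "%15s" stops the conversion after WORD_MAX characters, leaving
   room for the '\0' that %s always appends. Returns 1 on success, 0 otherwise. */
static int read_word(const char *src, char word[WORD_MAX + 1]) {
    return sscanf(src, "%" STR(WORD_MAX) "s", word) == 1;
}
```

Building the format string from WORD_MAX keeps the width and the buffer size from drifting apart.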
Common String Manipulation Errors
● Careless copying and concatenation of strings (program, page 44)
  ● Watch for strcpy(), strcat(), memcpy(), sprintf(), etc.
● Off-by-one errors (see program, page 47)
● Null-termination errors (p 49)
● String truncation
● If you implement these functions yourself, you may still be in trouble! (page 50)
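A sketch of a hand-written bounded copy that avoids the three errors above: the `<` comparison avoids the off-by-one, the explicit terminator avoids unterminated output, and the return value reports truncation. copy_bounded() is a hypothetical helper, not a standard function, and it assumes dest_size > 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dest[dest_size] and always null-terminates.
   Returns false if src had to be truncated. Precondition: dest_size > 0. */
static bool copy_bounded(char *dest, size_t dest_size, const char *src) {
    size_t len = strlen(src);
    if (len < dest_size) {            /* '<', not '<=': the terminator needs a slot */
        memcpy(dest, src, len + 1);   /* copy including '\0' */
        return true;
    }
    memcpy(dest, src, dest_size - 1); /* truncate ... */
    dest[dest_size - 1] = '\0';       /* ... but still terminate */
    return false;
}
```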
String Vulnerabilities and Exploits
● Where does your data come from? Are you sure?
Program on page 51 is bad:
● Uses gets()
● Doesn't even check the return value of gets()
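A hedged sketch of how the IsPasswordOK() idea can be written without gets(): fgets() bounds the read, its return value is checked, and over-long input is rejected rather than silently truncated. Reading from a FILE * parameter (instead of stdin) and the placeholder password are assumptions made so the sketch can be tested.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hardened variant of IsPasswordOK(). */
static bool is_password_ok(FILE *in) {
    char password[12];

    /* fgets() stores at most sizeof password - 1 characters plus '\0'. */
    if (fgets(password, sizeof password, in) == NULL)
        return false;                 /* check the return value: EOF or read error */

    char *newline = strchr(password, '\n');
    if (newline == NULL)
        return false;                 /* line too long (or no final newline): reject */
    *newline = '\0';

    return strcmp(password, "goodpass") == 0;   /* placeholder password */
}
```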
String Vulnerabilities and exploits
(see ASM code, pp 56-58)
Overwriting the return address (and other stack data) this way is called “stack smashing”.
Example follows (remember the code from IsPasswordOK?)
String Vulnerabilities and exploits
This exploit is called “arc injection”
String Vulnerabilities and exploits
● Code injection:
  ● Injection of a malicious address and malicious code
  ● Must be acceptable as legitimate input
  ● May not cause abnormal termination
  ● Must result in execution of the malicious code
● IsPasswordOK is vulnerable (page 65)
● Exploit with fgets and strcpy on page 66 (unclear; obviously not tested).
String Vulnerabilities and exploits
Arc injection aka return-into-libc includes:
Branching to an existing function
system(), exec(), and setuid() are favorites
Example of vulnerable code, page 70
Defeats memory-based protection schemes such as nonexecutable stacks, because no injected code is ever executed.
String Vulnerabilities and exploits
Return-Oriented Programming
“gadget” = sequence of instructions followed by return.
Turing-complete gadget sets have been found for many architectures, including x86 and Solaris libc, and there is a compiler that targets them.
Programs live on the stack: values are pushed/popped as data, and branching is done by adjusting the stack pointer so that return addresses are skipped.
Actually similar to FORTH programming.
Mitigation Strategies
Two kinds:
Prevent buffer overflows
Detect buffer overflows and recover securely
Best to do defense in depth and apply both.
Mitigation Strategies
Preventing Buffer Overflows:
CERT recommends using a consistent plan for managing strings.
Three models:
1) Caller allocates and frees
Most likely to prevent memory leaks
2) Callee allocates, caller frees
Ensures sufficient memory is available
3) Callee allocates and frees (only available in C++)
Most secure of the three solutions
Mitigation Strategies
Mitigation strategies:
Caller allocates and frees: the C <string.h> family was expanded with the C11 (Annex K, optional) bounds-checking functions:
strcpy_s strcat_s strncpy_s strncat_s
See example 2.5, 2.6, pages 74,75
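A sketch of the caller-allocates-and-frees model: the caller owns dest and passes its size. Annex K is optional and many libraries (glibc, for one) do not provide it, so this sketch uses strcpy_s() only when __STDC_LIB_EXT1__ is defined and otherwise falls back to snprintf(). copy_string() is a hypothetical wrapper; note the branches differ on overflow, since strcpy_s() invokes the runtime-constraint handler while the fallback truncates and reports -1.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* request Annex K declarations, if available */
#include <stdio.h>
#include <string.h>

/* Copies src into the caller-owned dest[dest_size]. Returns 0 on success, -1 on failure. */
static int copy_string(char *dest, size_t dest_size, const char *src) {
#ifdef __STDC_LIB_EXT1__
    return strcpy_s(dest, dest_size, src) == 0 ? 0 : -1;
#else
    /* snprintf() never writes more than dest_size bytes and always terminates
       when dest_size > 0; a return value >= dest_size means truncation. */
    int n = snprintf(dest, dest_size, "%s", src);
    return (n >= 0 && (size_t)n < dest_size) ? 0 : -1;
#endif
}
```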
Mitigation Strategies
Callee allocates, caller frees (ISO/IEC TR 24731-2)
Biggest problems:
DoS (denial-of-service) attack by exhausting memory
Dynamic memory management errors
Example 2.7 p 77
fmemopen() and open_memstream() (signatures, p 78) return a FILE * for doing memory “I/O”
Example code, page 79
Dynamic allocation disallowed in safety-critical systems
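A sketch of callee-allocates, caller-frees with open_memstream() (POSIX.1-2008, not ISO C): the stream grows its buffer as needed, and after fclose() the caller owns a null-terminated string. format_pair() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L   /* open_memstream() is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated "key=value" string, or NULL on failure. Caller frees. */
static char *format_pair(const char *key, int value) {
    char *buf = NULL;
    size_t len = 0;
    FILE *stream = open_memstream(&buf, &len);
    if (stream == NULL)
        return NULL;

    if (fprintf(stream, "%s=%d", key, value) < 0) {
        fclose(stream);
        free(buf);
        return NULL;
    }
    if (fclose(stream) != 0) {    /* fclose() finalizes buf and len */
        free(buf);
        return NULL;
    }
    return buf;                   /* the caller must free() this */
}
```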
Mitigation Strategies
C++ string class pp 80-83
String Handling Functions, the bad and the good
gets: replace with fgets or getchar
Examples 2.9, 2.10, pp 84-86
… or gets_s
Example 2.11, page 87
… or getline() (~= getdelim())
Example 2.12, p88
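A sketch of reading lines of any length with getline() (POSIX.1-2008): the library allocates and grows the buffer, so there is no fixed-size array to overflow. read_line() is a hypothetical helper that strips the trailing newline.

```c
#define _POSIX_C_SOURCE 200809L   /* getline() is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Returns the next line without its '\n' (caller frees), or NULL at EOF/error. */
static char *read_line(FILE *in) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, in);
    if (n < 0) {                  /* EOF or error; the buffer may still need freeing */
        free(line);
        return NULL;
    }
    if (n > 0 && line[n - 1] == '\n')
        line[n - 1] = '\0';
    return line;
}
```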
String Handling Functions, the bad and the good
strcpy() and strcat()
Fixes:
Allocate required space dynamically
strncpy() and strncat() are not recommended.
strlcpy() and strlcat() (always null-terminate the result, provided the destination size is nonzero)
strcpy_s() and strcat_s() (implementation, page 91)
strdup() (result is dynamically allocated and must be released with free())
Summary, pp 92-93
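A sketch of the "allocate required space dynamically" fix for strcat(): measure both strings, guard the size computation against wraparound, then allocate exactly enough. concat_alloc() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated a + b, or NULL on overflow or allocation failure.
   Caller frees. */
static char *concat_alloc(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    if (la > SIZE_MAX - 1 - lb)
        return NULL;                  /* la + lb + 1 would wrap around */
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);      /* includes b's terminator */
    return out;
}
```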
String Handling Functions, the bad and the good
strncpy() and strncat() (p 93)
See strncpy_s (p 95) and strncat_s (pp 97-98)
strndup() (uses dynamic memory allocation)
Summary on p 99
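If strncpy() and strncat() must be used anyway, two rules avoid their classic traps: strncpy() does not terminate when the source is too long, and strncat()'s count is the number of characters to append, not the destination size. A sketch with hypothetical wrapper names, assuming dest_size > 0 and, for the append, that dest is already null-terminated.

```c
#include <string.h>

/* strncpy() leaves dest unterminated when strlen(src) >= n, so terminate explicitly. */
static void copy_with_strncpy(char *dest, size_t dest_size, const char *src) {
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}

/* strncat() appends at most n characters and then always writes '\0',
   so the room left is dest_size - strlen(dest) - 1. */
static void append_with_strncat(char *dest, size_t dest_size, const char *src) {
    size_t used = strlen(dest);
    if (used + 1 < dest_size)
        strncat(dest, src, dest_size - used - 1);
}
```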
String Handling Functions, the bad and the good
memcpy() and memmove(): replace by memcpy_s() and memmove_s() respectively
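Watch out for strlen() on data that may not be null-terminated. Bounded alternatives are strnlen() (POSIX) and strnlen_s() (C11 Annex K); they behave alike, except that strnlen_s() returns 0 for a null pointer.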
Runtime Protection Strategies
Detection and recovery
Provided via:
input validation
the compiler and its runtime system (e.g., array bounds checking)
Operating system
Runtime Protection Strategies: Input Validation
Input data size checking.
Object size checking with __builtin_object_size() (GCC/Clang): enable by compiling with -D_FORTIFY_SOURCE=n for n ⩾ 1, with optimization turned on (p 104, 105)
Runtime Protection Strategies: The Compiler and Runtime System
Visual Studio Compiler-Generated Runtime Checks
Turn on with the /RTCs flag, which enables checks for:
Local variable overflows (including arrays)
Use of uninitialized variables
Stack pointer corruption
Can be tweaked: #pragma runtime_checks("s", off) and #pragma runtime_checks("s", restore)
Runtime Bounds Checkers:
Libsafe
Libverify
CRED
Runtime Protection Strategies: The Compiler and Runtime System
Stack Canaries:
StackGuard
GCC's Stack-Smashing Protector aka ProPolice
-fstack-protector[-all] -Wstack-protector
Visual C++ .NET stack overrun detection capability (/GS)
Recommend adding: #pragma strict_gs_check(on)
Recommend compiling with the /GS flag and linking only with libraries that were also compiled with /GS.
Runtime Protection Strategies: The Operating System
Address space layout randomization
Linux (PaX project, 2000)
Windows, since Vista
Mac OS X since 2007/2011, iOS since 4.3
Nonexecutable Stacks
W^X
Data Execution Prevention (Microsoft Windows)
PaX marked stack as non-executable
StackGap
Some Notable Vulnerabilities
rlogin – strcpy
Kerberos
Summary
● Arrays and their Problems
● Character Strings
● Common String Manipulation Errors
● String Vulnerabilities and Exploits
● Mitigation Strategies
● String Handling Functions, the Bad and the Good
● Runtime Protection Strategies
● Some Notable Vulnerabilities