Enhancing Security of Real-World Systems with a Better Understanding of the Threats

Shuo Chen
Candidate of Ph.D. in Computer Science
Center for Reliable and High Performance Computing
Coordinated Science Laboratories
University of Illinois at Urbana-Champaign

Security Threat Analysis and Mitigations in Real-World Systems

– Investigate the impact of hardware memory errors on the security of Internet servers and firewalls.
    – Simulate random hardware memory errors
    – Stochastic model to estimate the probability of security violations.
– Analyze and model a wide spectrum of software security vulnerabilities reported by CERT and Bugtraq.
    – Decompose each vulnerability to many primitive operations.
    – Introduce formalism into reasoning and description of real vulnerabilities.
    – Interesting outcome: discovered a new security bug in an HTTP server, now published in Bugtraq.
– Construct non-traditional methods to attack major Internet server programs without being detected by most current defense techniques. This represents a new challenge for defense research.
– Develop techniques to provide a better security protection for real-world systems
    – A theorem proving based code analysis
    – A processor architecture level runtime defense

Focus of this talk

My Dissertation

Earlier work

PART I:

Analyzing and Identifying Security Threats on Real-World Software

Significance of Memory Vulnerabilities

CERT Advisories: 66% vulnerabilities are low level memory errors in software.

Widely exploited by attackers, worms and viruses.

Chart data (share of vulnerabilities by type):
    Buffer Overflow    44%
    Heap Corruption     8%
    Format String       7%
    Integer Overflow    6%
    Globbing            2%
    Other              33%

Widely Understood Threats of Memory Corruptions

Once a memory error is found, it is straightforward to take control of the victim system by control-hijacking attacks.
– First, overwrite control data, such as return addresses, function pointers, GOT entries or DTOR entries.
– Program control is hijacked to execute code with malicious purposes.
– The malicious code is able to make system calls with the privilege of the victim process and do real damage to the system.

Current Techniques to Defeat Memory Corruption Attacks

Control hijacking is the most dominant form of memory corruption attacks (CERT and Microsoft Security Bulletin).

Accordingly, many current defense techniques are designed to enforce program control flow integrity in order to provide software security. This research area has been active for many years.

A common justification: attacks not hijacking program control flow (i.e., non-control-hijacking attacks) are rare against real-world software.

Important question:
– How confidently can we rely on this justification to build defenses?
– Is it possible that people currently underestimate the real threats of memory corruption attacks?
– Specifically, does the dominance of control-hijacking attacks reflect attackers' incapability, or their lack of incentive, to mount non-control-hijacking attacks?

Our Claim: General Applicability of Non-control-hijacking Attacks

Our previous papers suggest an initial doubt
– Even random hardware memory errors can subvert the security of real-world systems with a non-negligible probability. None of the compromises is due to control hijacking.
– Software vulnerabilities are more deterministic and more amenable to attacks. Why would attackers be incapable of mounting non-control-hijacking attacks against real-world systems?

We make a hypothetical claim:
– Many real-world software applications are susceptible to non-control-hijacking attacks;
– The severity of the attack consequences is equivalent to that due to control hijacking attacks.

If the claim is indeed true, it represents a new challenge to defense techniques.

Goal: Empirical Validation of the Claim

Investigate many "representative software applications". Try to break into them using non-control-hijacking attacks.

Choose representative software applications
– We did a quick survey on the recent four years of CERT advisories. Over 1/3 vulnerabilities are in FTP, SSH, Telnet and HTTP servers.

Construct non-control-hijacking attacks to compromise these servers. Each attack results in the root compromise of the victim server.

Non-control-hijacking attack on WU-FTP Server (via a format string bug)

    int x;
    FTP_service(...) {
        authenticate();
        x = user ID of the authenticated user;
        seteuid(x);
        while (1) {
            get_FTP_command(...);
            if (a data command?)
                getdatasock(...);
        }
    }
    getdatasock(...) {
        seteuid(0);
        setsockopt(...);
        seteuid(x);
    }

Slide annotations, following the attack through the code:
– At authenticate(): x uninitialized, run as EUID 0
– After x is assigned: x=109, run as EUID 0
– After seteuid(x): x=109, run as EUID 109. Lose the root privilege!
– At get_FTP_command(): get a special SITE EXEC command. Exploit a format string vulnerability. x=0, still run as EUID 109.
– Get a data command (e.g., PUT)
– In getdatasock(), after seteuid(0): x=0, run as EUID 0
– In getdatasock(), after seteuid(x): x=0, run as EUID 0
– On return to the service loop, the process still runs as EUID 0 (root). This allows me to upload /etc/passwd, so I can grant myself the root privilege!

Only corrupt an integer, not control hijacking.
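The privilege logic above can be traced with a small simulation. This is only a sketch under stated assumptions: no real system calls are made, `sim_seteuid` is a hypothetical and deliberately simplified stand-in for `seteuid()` (root may switch to any ID, and a non-root process may switch back to its saved ID), and the format string exploit is reduced to a single assignment that sets `x` to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated process credentials; no real system calls are made. */
struct creds {
    int euid;       /* effective user ID */
    int saved_uid;  /* saved set-user-ID, stays 0 for a server started as root */
};

/* Simplified model of seteuid(): root may switch to any ID, and a non-root
 * process may switch back to its saved ID. Returns 0 on success, -1 on failure. */
static int sim_seteuid(struct creds *c, int uid)
{
    if (c->euid == 0 || uid == c->saved_uid) {
        c->euid = uid;
        return 0;
    }
    return -1;
}

/* Mirrors getdatasock() on the slide: raise to root, do work, drop to x. */
static void sim_getdatasock(struct creds *c, const int *x)
{
    sim_seteuid(c, 0);
    /* setsockopt(...) would run here with root privilege */
    sim_seteuid(c, *x);
}

/* Runs one session and returns the effective UID seen by the service loop
 * after a data command. If corrupt_x is true, x is overwritten with 0
 * between login and the data command, standing in for the exploit. */
static int run_session(int login_uid, bool corrupt_x)
{
    struct creds c = { 0, 0 };   /* server starts as root */
    int x = login_uid;           /* x = user ID of the authenticated user */

    sim_seteuid(&c, x);          /* drop privilege after authentication */
    assert(c.euid == login_uid);

    if (corrupt_x)
        x = 0;                   /* the only thing the attacker changes */

    sim_getdatasock(&c, &x);     /* data command, e.g. PUT */
    return c.euid;
}
```

Calling `run_session(109, false)` models the normal session and `run_session(109, true)` models the attack; the two differ only in the value of one integer, and no code pointer is touched in either case.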

Non-control-hijacking attack on NULL-HTTP Server (via a heap overflow bug)

Attack the configuration string of CGI-BIN path.

Mechanism of CGI
– Suppose server name = www.foo.com, CGI-BIN = /usr/local/httpd/exe
– Requested URL = http://www.foo.com/cgi-bin/bar
– The server executes /usr/local/httpd/exe/bar

Our attack
– Exploit a heap overflow vulnerability to overwrite CGI-BIN to /bin
– Request URL http://www.foo.com/cgi-bin/sh
– The server executes /bin/sh

The server gives me a root shell! Only overwrite four characters in the CGI-BIN string. Not control hijacking.
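How the configuration string decides what gets executed can be sketched as follows. The function `build_cgi_path` and its rule (strip the `/cgi-bin/` prefix and append the rest to the configured directory) are assumptions made for illustration, not NULL-HTTP's actual code; nothing is executed, and the heap overflow is replaced by a plain `strcpy` into the configuration buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the path the server would execute for a /cgi-bin/ request.
 * Returns 0 on success, -1 if the URL path is not a CGI request or the
 * result does not fit in out. */
static int build_cgi_path(const char *cgi_bin, const char *url_path,
                          char *out, size_t out_size)
{
    static const char prefix[] = "/cgi-bin/";
    size_t prefix_len = sizeof(prefix) - 1;
    int n;

    assert(out_size > 0);
    if (strncmp(url_path, prefix, prefix_len) != 0)
        return -1;

    n = snprintf(out, out_size, "%s/%s", cgi_bin, url_path + prefix_len);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}

/* Returns 1 if the two scenarios on the slide resolve as described. */
static int cgi_demo(void)
{
    char cgi_bin[32] = "/usr/local/httpd/exe";  /* configuration string */
    char path[64];

    if (build_cgi_path(cgi_bin, "/cgi-bin/bar", path, sizeof(path)) != 0)
        return 0;
    if (strcmp(path, "/usr/local/httpd/exe/bar") != 0)
        return 0;

    /* Stand-in for the heap overflow: only the configuration data changes. */
    strcpy(cgi_bin, "/bin");

    if (build_cgi_path(cgi_bin, "/cgi-bin/sh", path, sizeof(path)) != 0)
        return 0;
    return strcmp(path, "/bin/sh") == 0;
}
```

The request handling code is identical in both scenarios; only the data it reads has changed, which is why a defense that watches control flow has nothing to flag.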

Non-control-hijacking attack on SSH Communications SSH Server (via an integer overflow bug)

    void do_authentication(char *user, ...) {
        int auth = 0;
        ...
        while (!auth) {
            /* Get a packet from the client */
            type = packet_read();
            switch (type) {
            ...
            case SSH_CMSG_AUTH_PASSWORD:
                if (auth_password(user, password))
                    auth = 1;
            case ...
            }
            if (auth) break;
        }
        /* Perform session preparation. */
        do_authenticated(...);
    }

Slide annotations, in the order they appear:
– auth = 0
– auth = 0
– auth = 1
– Password incorrect, but auth = 1
– auth = 1
– Logged in without correct password

More non-control-hijacking attacks

Against NetKit Telnet server (default Telnet server of Redhat Linux)
– Exploit a heap overflow bug
– Overwrite two strings:
    /bin/login -h foo.com -p    (normal scenario)
    /bin/sh -h -p -p            (attack scenario)
– The server runs /bin/sh when it tries to authenticate the user.

Against GazTek HTTP server
– Exploit a stack buffer overflow bug
    – Send a legitimate URL http://www.foo.com/cgi-bin/bar
    – The server checks that "/.." is not embedded in the URL
    – Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh
    – The server executes /bin/sh
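The GazTek scenario is a check that goes stale: the URL is validated, then changed, then used. A minimal sketch of that ordering follows, assuming hypothetical helper names, using only the path part of the URL, and replacing the stack overflow with a plain `strcpy`. The `recheck_at_use` flag is an illustrative addition, not something described in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The check described on the slide: reject URLs that embed "/..". */
static bool url_passes_check(const char *url)
{
    return strstr(url, "/..") == NULL;
}

/* Simulates one request. The URL is validated first; if corrupt is true,
 * the buffer is then rewritten (a stand-in for the stack overflow) before
 * it is used. Returns true if a URL containing "/.." reaches the point of use. */
static bool traversal_reaches_use(bool corrupt, bool recheck_at_use)
{
    char url[64] = "/cgi-bin/bar";
    bool ok = url_passes_check(url);      /* check happens here */

    assert(ok);                           /* the legitimate URL always passes */

    if (corrupt)
        strcpy(url, "/cgi-bin/../../../../bin/sh");

    if (recheck_at_use)
        ok = ok && url_passes_check(url); /* validate the bytes actually used */

    return ok && !url_passes_check(url);
}
```

With `corrupt` set and no re-check, the traversal path reaches the point of use even though the original check passed; the corrupted data is user input that had already been validated.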

Implications of Non-Control-Hijacking Attacks

Control flow integrity is not a sufficiently accurate approximation to software security.
– Given a memory bug in real software, attackers' behaviors can be very diverse.

Although non-control-hijacking attacks are specific to application semantics, there are many types of non-control data critical to software security
– E.g., user identity data, configuration data, user input data and decision-making Booleans.

Once attackers have the incentive, they are likely to succeed in non-control-hijacking attacks.

Re-Examining Current Defense Techniques

They were mainly tested against control-hijacking attacks. Need to re-examine the effectiveness.
– Many of them are based on control flow integrity
    – Monitor system call sequence
    – Protect control data
    – Non-executable stack and heap
– Pointer encryption (PointGuard)
    – Need to encrypt pointers in libraries to be effective (challenging because of insufficient type information, frequent type casting, and performance cost).
– Address space randomization
    – Good idea. In each run of the program, memory layout is different.
    – Challenging to deploy on all program segments.
    – Even if every segment is randomized, a recent paper shows that deployment on a 32-bit address space doesn't provide enough entropy.
– StackGuard, Libsafe and FormatGuard
    – They are specific to defeating stack smashing attacks and format string attacks. Not generic solutions.

Building a generic and secure defense technique to defeat memory corruption attacks is still an open problem.

Future defense research should consider non-control-hijacking attacks more seriously.

PART II:

Pointer Taintedness Detection: Towards a Better Security Protection for Real-World Systems

Pointer Taintedness

Pointer Taintedness: a pointer value, including a return address, is derived from user input.

Most memory corruption attacks are due to pointer taintedness.
– It allows attackers to specify the memory locations to read, write or transfer control to. Usually a pathological program behavior.

Pointer taintedness provides a unifying perspective for reasoning about a significant number of security vulnerabilities.

Most Memory Corruption Attacks are Due to Pointer Taintedness

Format string attack
– Taint an argument pointer of functions such as printf, fprintf, sprintf and syslog.

Stack buffer overflow (stack smashing)
– Taint a function frame pointer or a return address.

Heap corruption
– Taint the free-chunk doubly-linked list of the heap.

Glibc globbing attack
– User input resides in a location that is used as a pointer by the parent function of glob().

Stack Buffer Overflow

Vulnerable code:

    char buf[100];
    strcpy(buf, user_input);

Stack layout shown on the slide, from High to Low addresses (the arrow is labeled "Stack growth"):

    Return addr
    Frame pointer
    buf[99]
    ...
    buf[1]
    buf[0]

user_input is copied into buf.

Frame pointer or return address can be tainted.

Format String Attack

Vulnerable code:

    recv(buf);
    printf(buf);   /* should be printf("%s",buf) */

In vfprintf(), if (fmt points to "%n") then **ap = (character count)

Input: \xdd \xcc \xbb \xaa %d %d %d %n

Stack contents shown on the slide, from High to Low addresses (the arrow is labeled "Stack growth"):

    ...
    %n
    %d
    %d
    %d
    0xaabbccdd

fmt: format string pointer
ap: argument pointer

*ap is a tainted value.
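The fix noted in the slide's comment, passing user data as an argument to a constant `"%s"` format, can be checked without triggering the bug. The helper names below are hypothetical, `snprintf` stands in for `printf` so the result can be inspected, and the attack string is the slide's input with the display spacing between the address bytes removed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats untrusted text with a constant "%s" format, so any conversion
 * specifiers inside user_text are copied as ordinary characters.
 * Returns the number of characters written, or -1 on error or truncation. */
static int format_untrusted(char *out, size_t out_size, const char *user_text)
{
    int n;

    assert(out_size > 0);
    n = snprintf(out, out_size, "%s", user_text);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}

/* Returns 1 if the attack string comes out byte-for-byte unchanged. */
static int attack_string_is_inert(void)
{
    const char attack[] = "\xdd\xcc\xbb\xaa%d %d %d %n";
    char out[64];

    if (format_untrusted(out, sizeof(out), attack) < 0)
        return 0;
    return strcmp(out, attack) == 0;
}
```

Because the format string is a constant, the formatting routine never interprets the `%d` and `%n` inside the input, so `ap` is never advanced over attacker-supplied bytes and `*ap` is never a tainted pointer.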

Heap Corruption Attack

Heap layout shown on the slide:

    Free chunk A
    Free chunk B (fd=A, bk=C)
    Allocated buffer buf   <- user input
    Free chunk C

Vulnerable code:

    buf = malloc(1000);
    recv(sock,buf,1024);
    free(buf);

In free():

    B->fd->bk = B->bk;
    B->bk->fd = B->fd;

When B->fd and B->bk are tainted, the effect of free() is to write a user specified value to a user specified address.
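The two stores in `free()` can be exercised in isolation to see the write primitive. This is a sketch, not allocator code: `struct chunk` holds only the two links, and all names are hypothetical. To keep the example well defined, the "user specified address" is the `bk` field of a real object (`victim`) and the "user specified value" is the address of another real object (`scratch`); in an actual attack both would be raw values taken from the overflowing input.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal free-chunk header: forward and backward links only. */
struct chunk {
    struct chunk *fd;
    struct chunk *bk;
};

/* The two stores from the slide, performed when chunk b is unlinked. */
static void unlink_chunk(struct chunk *b)
{
    assert(b != NULL && b->fd != NULL && b->bk != NULL);
    b->fd->bk = b->bk;
    b->bk->fd = b->fd;
}

/* Returns 1 if unlinking a chunk with attacker-chosen links stores the
 * attacker-chosen value into the attacker-chosen location. */
static int unlink_writes_chosen_value(void)
{
    struct chunk victim  = { NULL, NULL };  /* victim.bk is the target location */
    struct chunk scratch = { NULL, NULL };  /* its address is the value to write */
    struct chunk b;

    /* Tainted links: in an attack these come from the overflowing input. */
    b.fd = &victim;
    b.bk = &scratch;

    unlink_chunk(&b);

    /* The first store wrote the chosen value; the second is a side effect. */
    return victim.bk == &scratch && scratch.fd == &victim;
}
```

The second store is why the written value must itself be a writable address in this scheme: `b.bk->fd` is assigned as well.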

Building Defense Techniques based on Pointer Taintedness

Static code analysis: analyze the source code to extract the conditions under which the possibility of pointer taintedness exists.
– To uncover potential vulnerabilities

Runtime detection: monitor at runtime whether a tainted value is dereferenced as a pointer.
– To defeat memory corruption attacks (both control-hijacking and non-control-hijacking attacks)

Project A

Formal Reasoning about Pointer Taintedness: To Extract Security Specifications of Library Functions

Project Overview

Our analysis on CERT advisories shows
– A significant portion of vulnerabilities (33.6%) are due to errors in library functions or incorrect invocations of library functions.
– Need a more rigorous reasoning on library function specifications.

Library function specifications are currently ad-hoc. Many of them are specified after real attacks are discovered.
– printf(fmt,…): fmt cannot be a user-specified string
– strcpy(d,s): the length of string s should not exceed the size of buffer d, and d and s must not overlap.
– d = savestr(s): do not free d if this is not the first invocation of savestr.
– free(p): p must be a pointer obtained from a previous malloc; p must not have been freed already.
– glob(p): p cannot be a string starting with '~' and ending with '{'.

What is a unified reason why these specifications are required?
– Answer: they are required to eliminate the possibility of pointer taintedness.

Extraction of security specifications of a function is reduced to a theorem proving problem: under which conditions can a function eliminate the possibility of pointer taintedness.
– I develop an equational logic based theorem proving approach to extract security specifications.

Extracting Function Specifications by Theorem Prover

Flow shown on the slide:

    C source code of a library function
      -> automatically translated to a formal semantic representation
      -> theorem generation: for each pointer dereference in an assignment, generate a theorem stating that the pointer is not tainted
      -> theorem proving
      -> a set of sufficient conditions that imply the validity of the theorems. They are the security specifications of the analyzed function.

Example: vfprintf()

    int vfprintf (FILE *s, const char *format, va_list ap)
    {
        char *p, *q;
        int done, data, n, state;
        char buf[10];

        p = format;
        done = 0;
        if (p == NULL) return 0;
        state = NO_PENDING;
        while (*p != 0) {
            if (state == NO_PENDING) {
                if (*p == '%') state = PENDING;
                else outchar(s, *p);
            }
            else {
                switch (*p) {
                case '%':
                    outchar(s, '%');
                    break;
                case 'd':
                    data = va_arg(ap, int);
                    if (data < 0) { outchar(s, '-'); data = -data; }
                    n = 0;
                    while (data > 0 && n < 10) {
                        buf[n] = data % 10 + '0';   /* pointer dereference in an assignment: buf+n */
                        data /= 10;
                        n++;
                    }
                    while (n > 0) { n--; outchar(s, buf[n]); }
                    break;
                case 's':
                    q = va_arg(ap, char *);
                    if (q == NULL) break;
                    while (*q != 0) {
                        outchar(s, *q);
                        q++;
                    }
                    break;
                case 'n':
                    q = va_arg(ap, void *);
                    *(int *) q = done;              /* pointer dereference in an assignment: q */
                    break;
                default:
                    outchar(s, *p);
                }
                state = NO_PENDING;
            }
            p++;
        }
        return done;
    }

Theorem1: buf+n should not be a tainted value

Theorem2: q should not be a tainted value

Extracting the Specifications of vfprintf()

Try to prove the two theorems
– Initially, the theorem prover cannot complete the proof, because the theorems are only valid under certain preconditions.
– Add these preconditions as axioms to the theorem prover.
– Repeat the above step until the theorems are proved.

Finally, the following four preconditions are added, which are the specifications of vfprintf (FILE *s, const char *format, va_list ap)
– ap never points to any location within the current function frame.
– *ap never points to the location of variable ap, i.e., *ap ≠ &ap
– Suppose the memory segment that ap sweeps over is called ap_activitiy_range, then *ap never points to any location within ap_activitiy_range.
– No locations within ap_activitiy_range are tainted before vfprintf() is called.

Callout on the slide: these suggest the scenario of the format string vulnerability.

Other Studied Examples

Function strcpy()
– Four security specifications indicating buffer overflow, buffer overlapping and buffer underflow scenarios causing pointer taintedness.

Function free() of a heap management system
– Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities.

Socket read functions of Apache HTTPD and NULL HTTPD
– The Apache function is proven to be free of pointer taintedness.
– Two (known) vulnerabilities are exposed in the theorem proving process of NULL HTTPD function.

Project B

Runtime Pointer Taintedness Detection: To Defeat Memory Corruption Attacks

Page 29: 1 Enhancing Security of Real-World Systems with a Better Understanding of the Threats Shuo Chen Candidate of Ph.D. in Computer Science Center for Reliable.

2929

Project Overview

We propose a processor-architecture-level mechanism to detect pointer taintedness
– Implemented on the SimpleScalar simulator
– An extended memory system with a taintedness bit attached to every byte
– Enhanced load, store and ALU instructions to track taintedness bits in memory
– Detects security attacks when tainted data are dereferenced.

Evaluation
– It detects both control-hijacking and non-control-hijacking attacks against real-world software.
– No known false positives: no alarm during normal executions of network servers and SPEC benchmarks. Fully compatible with existing applications.
– Transparent to applications. We can run precompiled binaries on the architecture.
– Some potential false negative scenarios. They are rare and not defeated by current generic detection techniques either.
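The mechanism in the overview can be illustrated with a toy software model. The following is a minimal Python sketch, not the SimpleScalar implementation: the class `TaintedMachine`, its method names, and the register names are hypothetical, and refinements of the real design are omitted. It keeps one taint bit per memory byte, propagates taint through load, store and ALU operations, and raises an alarm when a tainted value is used as an address.

```python
class TaintAlarm(Exception):
    """Raised when a tainted value is about to be used as an address."""


class TaintedMachine:
    """Toy model: every memory byte and every register carries a taint bit."""

    def __init__(self, size):
        self.mem = [0] * size            # byte contents
        self.mem_taint = [False] * size  # one taint bit per byte
        self.reg = {}                    # register name -> (value, taint)

    def receive_input(self, addr, data):
        """Model an input call: every byte written here is marked tainted."""
        for offset, byte in enumerate(data):
            self.mem[addr + offset] = byte
            self.mem_taint[addr + offset] = True

    def load_imm(self, rd, value):
        """Constants embedded in the program are never tainted."""
        self.reg[rd] = (value, False)

    def add(self, rd, r1, r2):
        """ALU rule: the result is tainted if either operand is tainted."""
        v1, t1 = self.reg[r1]
        v2, t2 = self.reg[r2]
        self.reg[rd] = (v1 + v2, t1 or t2)

    def _address(self, ra):
        """Detection point: refuse to dereference a tainted value."""
        addr, tainted = self.reg[ra]
        if tainted:
            raise TaintAlarm(f"tainted value {addr:#x} used as an address")
        return addr

    def load(self, rd, ra):
        """Load copies the taint bit of the byte into the register."""
        addr = self._address(ra)
        self.reg[rd] = (self.mem[addr], self.mem_taint[addr])

    def store(self, rs, ra):
        """Store copies the taint bit of the register into the byte."""
        addr = self._address(ra)
        value, tainted = self.reg[rs]
        self.mem[addr] = value & 0xFF
        self.mem_taint[addr] = tainted


def attack_is_detected():
    m = TaintedMachine(64)
    m.receive_input(16, b"\x20")   # attacker-controlled byte at address 16
    m.load_imm("r1", 16)
    m.load("r2", "r1")             # r2 now holds a tainted value
    m.load_imm("r3", 4)
    m.add("r4", "r2", "r3")        # taint survives the addition
    try:
        m.load("r5", "r4")         # tainted value used as a pointer
    except TaintAlarm:
        return True
    return False
```

Calling `attack_is_detected()` exercises the full path from input to alarm. A program that only dereferences untainted addresses runs without any alarm, which is the property behind the "no known false positives" claim above.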

Conclusions

Our analysis shows that real-world software can be compromised by corrupting non-control data. Non-control-hijacking attacks represent a realistic threat.
– It is insufficient to rely on control flow integrity for software security.

Pointer taintedness is a common characteristic of most memory corruption attacks, including control-hijacking and non-control-hijacking attacks.

A theorem-proving-based code analysis approach is designed to reason about possibilities of pointer taintedness.
– E.g., to formally extract security specifications of library functions.

A runtime pointer taintedness detection mechanism is designed. It can effectively detect most memory corruption attacks.

Summary of My Research Methodology

Analysis-centric approach
– Analyzed the impact of hardware faults on security (fault injection + stochastic modeling)
– Analyzed Bugtraq and CERT vulnerability databases
– Analyzed application source code, attacks and current defense techniques
– Analysis results motivate us to
    • expose new security threats
    • propose new defense techniques

I like doing analysis of real data and incidents
– Tedious? Sometimes, but it is a crucial step toward a lot of fun.
– Rewarding? Definitely. Analysis is especially important for systems research.
– Goal: strongly motivate research topics that solve real-world problems.

Backup Slides

Static and Dynamic Approaches

Static approaches (avoid producing memory vulnerabilities in programs)
– Writing code in a type-safe language
– Compiler techniques to uncover memory vulnerabilities
– Compiler instruments source code according to program annotations.
– Challenges: legacy code and low-level code, compatibility and performance.
– Fact: memory vulnerabilities are still constantly discovered and exploited.

Intrusion detection techniques (defeat attacks, given the existence of vulnerabilities)
– Specialized techniques
    • Defeat stack buffer overflow and format string attacks.
– Generic defense techniques
    • Most techniques are designed to defeat control-hijacking attacks: host intrusion detection systems and control flow integrity protection techniques. A very active research area.
    • Others have constraints and difficulties in their deployment (pointer encryption and address randomization).

One-Slide Intro to Equational Logic

Use term rewriting to establish proofs of theorems.
Natural number addition expressed in the Maude system.

0 : Natural .
s_ : Natural -> Natural .
_+_ : Natural Natural -> Natural .

vars N M : Natural
Axiom: N + 0 = N .
Axiom: N + s M = s (N + M) .

(s s s 0) + (s s 0)
= s ((s s s 0) + (s 0))
= s (s ((s s s 0) + 0))
= s (s (s s s 0))
= s s s s s 0

Intuitively, this is a proof of “3 + 2 = 5” in natural number algebra.
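The rewriting proof above can be replayed mechanically. The following is a small Python sketch, not Maude: the tuple encoding of terms and the helper names (`rewrite_step`, `normalize`, `numeral`) are assumptions made for illustration. It applies the two addition axioms as left-to-right rewrite rules until no rule applies.

```python
ZERO = ("0",)


def s(n):
    """Successor constructor: s N."""
    return ("s", n)


def plus(n, m):
    """Unevaluated addition term: N + M."""
    return ("+", n, m)


def rewrite_step(term):
    """Apply one axiom to the term; return None if it is in normal form."""
    tag = term[0]
    if tag == "+":
        n, m = term[1], term[2]
        if m == ZERO:                 # Axiom: N + 0 = N
            return n
        if m[0] == "s":               # Axiom: N + s M = s (N + M)
            return s(plus(n, m[1]))
        return plus(n, rewrite_step(m))   # reduce a nested sum on the right first
    if tag == "s":
        inner = rewrite_step(term[1])
        return None if inner is None else s(inner)
    return None                       # the constant 0 cannot be rewritten


def normalize(term):
    """Return the list of terms visited while rewriting to normal form."""
    steps = [term]
    while True:
        nxt = rewrite_step(steps[-1])
        if nxt is None:
            return steps
        steps.append(nxt)


def numeral(k):
    """Build the term s s ... s 0 with k successors."""
    term = ZERO
    for _ in range(k):
        term = s(term)
    return term
```

For example, `normalize(plus(numeral(3), numeral(2)))` visits the same intermediate terms as the derivation above and ends at `numeral(5)`.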

Axioms of Eval and ExpT operations

Eval(S, I) = I    // I is an integer constant
Eval(S, ^ E1) = Ftch(S, Eval(S, E1))
Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
… …
ExpT(S, I) = false
ExpT(S, ^ E1) = LocT(S, Eval(S, E1))
ExpT(S, E1 + E2) = ExpT(S, E1) or ExpT(S, E2)
ExpT(S, E1 - E2) = ExpT(S, E1) or ExpT(S, E2)
… …

E.g., is the expression (^100) - 2 tainted in store S?
ExpT(S, (^100) - 2)
= ExpT(S, (^100)) or ExpT(S, 2)
= LocT(S, 100) or false
= LocT(S, 100)

Note: ^ is the dereference operator; ^100 gives the content in the location 100.
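The axioms listed above translate almost line for line into a recursive evaluator. The following is a minimal Python sketch under stated assumptions: the expression classes (`Const`, `Deref`, `Add`, `Sub`), the `Store` class, and the choice to read unset locations as 0 are all hypothetical and only cover the four axiom pairs shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Deref:          # ^ E1
    inner: object


@dataclass(frozen=True)
class Add:            # E1 + E2
    left: object
    right: object


@dataclass(frozen=True)
class Sub:            # E1 - E2
    left: object
    right: object


class Store:
    """Snapshot of memory: content and taintedness for each location."""

    def __init__(self, content, tainted):
        self.content = dict(content)   # address -> integer content
        self.tainted = set(tainted)    # addresses whose content is tainted

    def ftch(self, addr):
        return self.content.get(addr, 0)   # assumption: unset locations read as 0

    def loct(self, addr):
        return addr in self.tainted


def eval_exp(S, e):
    """Eval(S, E): the value of expression E in store S."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Deref):
        return S.ftch(eval_exp(S, e.inner))
    if isinstance(e, Add):
        return eval_exp(S, e.left) + eval_exp(S, e.right)
    if isinstance(e, Sub):
        return eval_exp(S, e.left) - eval_exp(S, e.right)
    raise TypeError(f"unsupported expression: {e!r}")


def exp_t(S, e):
    """ExpT(S, E): whether expression E is tainted in store S."""
    if isinstance(e, Const):
        return False
    if isinstance(e, Deref):
        return S.loct(eval_exp(S, e.inner))
    if isinstance(e, (Add, Sub)):
        return exp_t(S, e.left) or exp_t(S, e.right)
    raise TypeError(f"unsupported expression: {e!r}")
```

With `S = Store({100: 7}, {100})`, the slide's example `Sub(Deref(Const(100)), Const(2))` evaluates to 5 and is tainted exactly because location 100 is tainted.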

Taintedness-Aware Memory Model

• A store represents a snapshot of the memory state at a point in the program execution.
• For each memory location, we can evaluate two properties: content and taintedness (true/false).
• Operations on memory locations:
    • The fetch operation Ftch(S,A) gives the content of the memory address A in store S.
    • The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S.
• Operations on expressions:
    • The evaluation operation Eval(S,E) evaluates expression E in store S.
    • The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.

Semantics of Language L

The following instructions are defined:
– mov [Exp1] <- Exp2
– branch (Condition) Label
– call FuncName(Exp1,Exp2,…)

Axioms defining mov instruction semantics
– Specify the effects of applying the mov instruction on a store
– Allow taintedness to propagate from Exp2 to [Exp1].

Axioms defining the semantics of recv (similarly, scanf, recvfrom: user input functions)
– Specify the memory locations tainted by the recv call.
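The slide does not list the mov and recv axioms themselves, so the following Python sketch is only one plausible reading of them. Stores are plain dictionaries, expressions are nested tuples, and every function name is hypothetical. Two assumptions are made explicit: a mov whose source is untainted clears the taint of the destination, and recv taints exactly the locations it writes.

```python
def ftch(S, addr):
    """Ftch(S, A): content of address A (unset locations read as 0 here)."""
    return S["content"].get(addr, 0)


def loct(S, addr):
    """LocT(S, A): taintedness of location A."""
    return addr in S["tainted"]


def eval_exp(S, e):
    """Expressions: an int constant, ("^", e), ("+", e1, e2) or ("-", e1, e2)."""
    if isinstance(e, int):
        return e
    if e[0] == "^":
        return ftch(S, eval_exp(S, e[1]))
    left, right = eval_exp(S, e[1]), eval_exp(S, e[2])
    return left + right if e[0] == "+" else left - right


def exp_t(S, e):
    """Taintedness of an expression, following the ExpT axioms."""
    if isinstance(e, int):
        return False
    if e[0] == "^":
        return loct(S, eval_exp(S, e[1]))
    return exp_t(S, e[1]) or exp_t(S, e[2])


def mov(S, exp1, exp2):
    """mov [Exp1] <- Exp2: return the store that results from the instruction."""
    addr = eval_exp(S, exp1)
    content = dict(S["content"])
    content[addr] = eval_exp(S, exp2)
    tainted = set(S["tainted"])
    if exp_t(S, exp2):
        tainted.add(addr)       # taintedness propagates from Exp2 to [Exp1]
    else:
        tainted.discard(addr)   # assumption: an untainted source clears the bit
    return {"content": content, "tainted": tainted}


def recv(S, buf_addr, data):
    """Model of recv: each location written by the call becomes tainted."""
    content = dict(S["content"])
    tainted = set(S["tainted"])
    for offset, byte in enumerate(data):
        content[buf_addr + offset] = byte
        tainted.add(buf_addr + offset)
    return {"content": content, "tainted": tainted}
```

Each instruction maps an old store to a new one rather than mutating it, which mirrors how the theorems on the following slides talk about "the store before" and "the store after" a given line.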

Example: strcpy()

char * strcpy (char * dst, char * src) {
    char * res;
L0: res = dst;
    while (*src != 0) {
L1:     *dst = *src;
        dst++;
        src++;
    }
L2: *dst = 0;
    return res;
}

L0: mov [res] <- ^ dst
    lbl(#while#6)
    branch (^ ^ src is 0) #ex#while#6
L1: mov [^ dst] <- ^ ^ src
    mov [dst] <- (^ dst) + 1
    mov [src] <- (^ src) + 1
    branch true #while#6
    lbl(#ex#while#6)
L2: mov [^ dst] <- 0
    mov [ret] <- ^ res

Workflow: translate to formal semantics, then theorem generation, then theorem proving.

Generated theorems:

a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false

b) If S0 is the store before Line L0, and S2 is the store after Line L1, then
   I < Eval(S0, ^dst) or Eval(S0, ^dst + dstsize) <= I => LocT(S2,I) = LocT(S0,I)

c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false

Specifications Extracted

Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen.

Specifications that are extracted by the theorem proving approach
– srclen <= dstsize
– The buffers src and dst do not overlap in such a way that the buffer dst covers the string terminator of the src string.
– The buffers dst and src do not cover the function frame of strcpy.
– Initially, dst is not tainted

(Slide callouts: some of these specifications are documented in the Linux man page; others are not documented.)
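The first specification can be checked on a toy taint-aware memory. The following is a small Python sketch with hypothetical names and a made-up memory layout (a 32-byte memory, src at address 0, dst at address 16); it assumes srclen counts the terminating zero byte. It runs the strcpy loop with taint propagation and reports which locations past the destination buffer end up tainted.

```python
def strcpy_taint(mem, taint, dst, src):
    """Byte-by-byte copy as in the strcpy loop, carrying taint bits along."""
    while mem[src] != 0:
        mem[dst] = mem[src]
        taint[dst] = taint[src]   # taintedness follows the copied byte
        dst += 1
        src += 1
    mem[dst] = 0                  # terminator is a program constant
    taint[dst] = False


def tainted_outside_dst(user_input, dstsize=4):
    """Return addresses beyond the dst buffer that became tainted."""
    mem = [0] * 32
    taint = [False] * 32
    src, dst = 0, 16              # hypothetical layout: src first, dst after it
    for offset, byte in enumerate(user_input):
        mem[src + offset] = byte
        taint[src + offset] = True    # user input is tainted
    strcpy_taint(mem, taint, dst, src)
    return [a for a in range(dst + dstsize, len(mem)) if taint[a]]
```

With a 4-byte destination, `tainted_outside_dst(b"abc")` finds no tainted location outside the buffer, while `tainted_outside_dst(b"abcdef")` reports locations 20 and 21. Those are locations outside the buffer whose taintedness has changed, so the condition srclen <= dstsize is exactly what the prover needs in order to discharge theorem b).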