Enhancing Security of Real-World Systems with a Better Understanding of the Threats

Shuo Chen
Candidate of Ph.D. in Computer Science
Center for Reliable and High Performance Computing
Coordinated Science Laboratories
University of Illinois at Urbana-Champaign

Security Threat Analysis and Mitigations in Real-World Systems

My Dissertation

– Investigate the impact of hardware memory errors on the security of Internet servers and firewalls.
    • Simulate random hardware memory errors
    • Stochastic model to estimate the probability of security violations.
– Analyze and model a wide spectrum of software security vulnerabilities reported by CERT and Bugtraq.
    • Decompose each vulnerability to many primitive operations.
    • Introduce formalism into reasoning and description of real vulnerabilities.
    • Interesting outcome: discovered a new security bug in an HTTP server, now published in Bugtraq.
– Construct non-traditional methods to attack major Internet server programs without being detected by most current defense techniques. This represents a new challenge for defense research.
– Develop techniques to provide a better security protection for real-world systems
    • A theorem proving based code analysis
    • A processor architecture level runtime defense

Slide labels: Earlier work; Focus of this talk

PART I:

Analyzing and Identifying Security Threats on Real-World Software

Significance of Memory Vulnerabilities

• CERT Advisories: 66% of vulnerabilities are low-level memory errors in software.
• Widely exploited by attackers, worms and viruses.

Pie chart (vulnerability categories):
    Buffer Overflow     44%
    Other               33%
    Heap Corruption      8%
    Format String        7%
    Integer Overflow     6%
    Globbing             2%

Widely Understood Threats of Memory Corruptions

• Once a memory error is found, it is straightforward to take control of the victim system by control-hijacking attacks.
  – First, overwrite control data, such as return addresses, function pointers, GOT entries or DTOR entries.
  – Program control is hijacked to execute code with malicious purposes.
  – The malicious code is able to make system calls with the privilege of the victim process and do real damage to the system.

Current Techniques to Defeat Memory Corruption Attacks

• Control hijacking is the most dominant form of memory corruption attacks (CERT and Microsoft Security Bulletin)
• Accordingly, many current defense techniques are designed to enforce program control flow integrity in order to provide software security. This research area has been active for many years.
• A common justification: attacks not hijacking program control flow (i.e., non-control-hijacking attacks) are rare against real-world software.
• Important question:
  – How confidently can we rely on this justification to build defenses?
  – Is it possible that people currently underestimate the real threats of memory corruption attacks?
  – Specifically, is the dominance of control-hijacking attacks due to attackers' incapability or their lack of incentive to mount non-control-hijacking attacks?

Our Claim: General Applicability of Non-control-hijacking Attacks

• Our previous papers suggest an initial doubt
  – Even random hardware memory errors can subvert the security of real-world systems with a non-negligible probability. None of the compromises is due to control hijacking.
  – Software vulnerabilities are more deterministic and more amenable to attacks. Why would attackers be incapable of mounting non-control-hijacking attacks against real-world systems?
• We make a hypothetical claim:
  – Many real-world software applications are susceptible to non-control-hijacking attacks;
  – The severity of the attack consequences is equivalent to that due to control-hijacking attacks.
• If the claim is indeed true, it represents a new challenge to defense techniques.

Goal: Empirical Validation of the Claim

• Investigate many “representative software applications”. Try to break into them using non-control-hijacking attacks.
• Choose representative software applications
  – We did a quick survey of the most recent four years of CERT advisories. Over 1/3 of the vulnerabilities are in FTP, SSH, Telnet and HTTP servers.
• Construct non-control-hijacking attacks to compromise these servers. Each attack results in the root compromise of the victim server.

Non-control-hijacking attack on WU-FTP Server (via a format string bug)

int x;
FTP_service(...) {
    authenticate();
    x = user ID of the authenticated user;
    seteuid(x);
    while (1) {
        get_FTP_command(...);
        if (a data command?)
            getdatasock(...);
    }
}
getdatasock( ... ) {
    seteuid(0);
    setsockopt( ... );
    seteuid(x);
}

Attack trace (slide callouts, in execution order):
  – authenticate(): x uninitialized, run as EUID 0
  – x = user ID: x=109, run as EUID 0
  – seteuid(x): x=109, run as EUID 109. Lose the root privilege!
  – get_FTP_command(): get a special SITE EXEC command. Exploit a format string vulnerability. x=0, still run as EUID 109.
  – get_FTP_command(): get a data command (e.g., PUT)
  – seteuid(0) in getdatasock(): x=0, run as EUID 0
  – seteuid(x) in getdatasock(): x=0, run as EUID 0
  – On return to the service loop, the server still runs as EUID 0 (root). This allows me to upload /etc/passwd. I can grant myself the root privilege!

Only corrupt an integer, not control hijacking.

Non-control-hijacking attack on NULL-HTTP Server (via a heap overflow bug)

• Attack the configuration string of CGI-BIN path.
• Mechanism of CGI
  – Suppose server name = www.foo.com, CGI-BIN = /usr/local/httpd/exe
  – Requested URL = http://www.foo.com/cgi-bin/bar
  – The server executes /usr/local/httpd/exe/bar
• Our attack
  – Exploit a heap overflow vulnerability to overwrite CGI-BIN to /bin
  – Request URL http://www.foo.com/cgi-bin/sh
  – The server executes /bin/sh
• The server gives me a root shell! Only overwrite four characters in the CGI-BIN string. Not control hijacking.

Non-control-hijacking attack on SSH Communications SSH Server (via an integer overflow bug)

void do_authentication(char *user, ...) {
    int auth = 0;
    ...
    while (!auth) {
        /* Get a packet from the client */
        type = packet_read();
        switch (type) {
        ...
        case SSH_CMSG_AUTH_PASSWORD:
            if (auth_password(user, password))
                auth = 1;
        case ...
        }
        if (auth) break;
    }
    /* Perform session preparation. */
    do_authenticated(…);
}

Attack trace (slide callouts, in order):
  – auth = 0
  – auth = 0
  – auth = 1
  – Password incorrect, but auth = 1
  – auth = 1
  – Logged in without correct password

More non-control-hijacking attacks

• Against NetKit Telnet server (default Telnet server of Red Hat Linux)
  – Exploit a heap overflow bug
  – Overwrite two strings:
      /bin/login -h foo.com -p   (normal scenario)
      /bin/sh -h -p -p           (attack scenario)
  – The server runs /bin/sh when it tries to authenticate the user.
• Against GazTek HTTP server
  – Exploit a stack buffer overflow bug
      • Send a legitimate URL http://www.foo.com/cgi-bin/bar
      • The server checks that “/..” is not embedded in the URL
      • Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh
      • The server executes /bin/sh

Implications of Non-Control-Hijacking Attacks

• Control flow integrity is not a sufficiently accurate approximation to software security.
  – Given a memory bug in real software, attackers' behaviors can be very diverse.
• Although non-control-hijacking attacks are specific to application semantics, there are many types of non-control data critical to software security
  – E.g., user identity data, configuration data, user input data and decision-making Booleans.
• Once attackers have the incentive, they are likely to succeed in non-control-hijacking attacks.

Re-Examining Current Defense Techniques

• They were mainly tested against control-hijacking attacks. Need to re-examine the effectiveness.
  – Many of them are based on control flow integrity
      • Monitor system call sequence
      • Protect control data
      • Non-executable stack and heap
  – Pointer encryption (PointGuard)
      • Need to encrypt pointers in libraries to be effective (challenging because of insufficient type information, frequent type casting, and performance).
  – Address space randomization
      • Good idea. In each run of the program, memory layout is different.
      • Challenging to deploy on all program segments.
      • Even if every segment is randomized, a recent paper shows the deployment on a 32-bit address space doesn't provide enough entropy.
  – StackGuard, Libsafe and FormatGuard
      • They are specific to defeating stack smashing attacks and format string attacks. Not generic solutions.
• Building a generic and secure defense technique to defeat memory corruption attacks is still an open problem.
• Future defense research should consider non-control-hijacking attacks more seriously.

PART II:

Pointer Taintedness Detection: Towards a Better Security Protection for Real-World Systems

Pointer Taintedness

• Pointer Taintedness: a pointer value, including a return address, is derived from user input.
• Most memory corruption attacks are due to pointer taintedness.
  – It allows attackers to specify the memory locations to read, write or transfer control to. Usually a pathological program behavior.
• Pointer taintedness provides a unifying perspective for reasoning about a significant number of security vulnerabilities.

Most Memory Corruption Attacks are Due to Pointer Taintedness

• Format string attack
  – Taint an argument pointer of functions such as printf, fprintf, sprintf and syslog.
• Stack buffer overflow (stack smashing)
  – Taint a function frame pointer or a return address.
• Heap corruption
  – Taint the free-chunk doubly-linked list of the heap.
• Glibc globbing attack
  – User input resides in a location that is used as a pointer by the parent function of glob().

Stack Buffer Overflow

Vulnerable code:
    char buf[100];
    strcpy(buf,user_input);

Stack layout (High address at top, Low at bottom; the stack grows toward Low, while user_input is copied into buf from buf[0] upward):
    Return addr
    Frame pointer
    buf[99]
    …
    buf[1]
    buf[0]

Frame pointer or return address can be tainted.

Format String Attack

Vulnerable code:
    recv(buf);
    printf(buf); /* should be printf("%s",buf) */

In vfprintf(), if (fmt points to “%n”) then **ap = (character count)

Attacker input in buf: \xdd \xcc \xbb \xaa %d %d %d %n

Stack layout (High address at top, Low at bottom; the stack grows toward Low):
    …
    %n
    %d
    %d
    %d
    0xaabbccdd

Pointers shown on the slide: fmt (format string pointer) and ap (argument pointer).

*ap is a tainted value.

Heap Corruption Attack

Vulnerable code:
    buf = malloc(1000);
    recv(sock,buf,1024);
    free(buf);

Diagram elements: Free chunk A; Free chunk B (fd=A, bk=C); Free chunk C; allocated buffer buf, which receives the user input.

In free():
    B->fd->bk = B->bk;
    B->bk->fd = B->fd;

When B->fd and B->bk are tainted, the effect of free() is to write a user specified value to a user specified address.

Building Defense Techniques based on Pointer Taintedness

• Static code analysis: analyze the source code to extract the conditions under which the possibility of pointer taintedness exists.
  – To uncover potential vulnerabilities
• Runtime detection: monitor at runtime whether a tainted value is dereferenced as a pointer.
  – To defeat memory corruption attacks (both control-hijacking and non-control-hijacking attacks)

Project A
Formal Reasoning about Pointer Taintedness: To Extract Security Specifications of Library Functions

Project Overview

• Our analysis of CERT advisories shows
  – A significant portion of vulnerabilities (33.6%) are due to errors in library functions or incorrect invocations of library functions.
  – Need a more rigorous reasoning on library function specifications.
• Library function specifications are currently ad hoc. Many of them are specified after real attacks are discovered.
  – printf(fmt,…): fmt cannot be a user-specified string
  – strcpy(d,s): the length of string s should not exceed the size of buffer d, and d and s must not overlap.
  – d = savestr(s): do not free d if this is not the first invocation of savestr.
  – free(p): p must be a pointer obtained from a previous malloc; p must not have been freed before.
  – glob(p): p cannot be a string starting with ‘~’ and ending with ‘{’.
• What is a unified reason why these specifications are required?
  – Answer: they are required to eliminate the possibility of pointer taintedness.
• Extraction of security specifications of a function is reduced to a theorem proving problem: under which conditions can a function eliminate the possibility of pointer taintedness?
  – I develop an equational logic based theorem proving approach to extract security specifications.

Extracting Function Specifications by Theorem Prover

Flow shown on the slide:
  – C source code of a library function
  – Automatically translated to formal semantic representation
  – Theorem generation: for each pointer dereference in an assignment, generate a theorem stating that the pointer is not tainted
  – Theorem proving
  – Result: a set of sufficient conditions that imply the validity of the theorems. They are the security specifications of the analyzed function.

Example: vfprintf()

int vfprintf (FILE *s, const char *format, va_list ap)
{
    char *p, *q;
    int done, data, n, state;
    char buf[10];
    p = format;
    done = 0;
    if (p == NULL) return 0;
    state = NO_PENDING;
    while (*p != 0) {
        if (state == NO_PENDING) {
            if (*p == '%') state = PENDING;
            else outchar(s, *p);
        }
        else {
            switch (*p) {
            case '%':
                outchar(s, '%');
                break;
            case 'd':
                data = va_arg(ap, int);
                if (data < 0) { outchar(s, '-'); data = -data; }
                n = 0;
                while (data > 0 && n < 10) {
                    buf[n] = data % 10 + '0';   /* Theorem 1 applies here */
                    data /= 10;
                    n++;
                }
                while (n > 0) { n--; outchar(s, buf[n]); }
                break;
            case 's':
                q = va_arg(ap, char *);
                if (q == NULL) break;
                while (*q != 0) {
                    outchar(s, *q);
                    q++;
                }
                break;
            case 'n':
                q = va_arg(ap, void *);
                *(int *) q = done;              /* Theorem 2 applies here */
                break;
            default:
                outchar(s, *p);
            }
            state = NO_PENDING;
        }
        p++;
    }
    return done;
}

Theorem1: buf+n should not be a tainted value

Theorem2: q should not be a tainted value

Extracting the Specifications of vfprintf()

• Try to prove the two theorems
• Initially, the theorem prover cannot complete the proof, because the theorems are only valid under certain preconditions.
• Add these preconditions as axioms to the theorem prover.
• Repeat the above step until the theorems are proved.
• Finally, the following four preconditions are added, which are the specifications of vfprintf (FILE *s, const char *format, va_list ap)
  – ap never points to any location within the current function frame.
  – *ap never points to the location of variable ap, i.e., *ap ≠ &ap
  – Suppose the memory segment that ap sweeps over is called ap_activity_range, then *ap never points to any location within ap_activity_range.
  – No locations within ap_activity_range are tainted before vfprintf() is called.

Slide callout: Suggest the scenario of format string vulnerability

Other Studied Examples

• Function strcpy()
  – Four security specifications indicating buffer overflow, buffer overlapping and buffer underflow scenarios causing pointer taintedness.
• Function free() of a heap management system
  – Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities.
• Socket read functions of Apache HTTPD and NULL HTTPD
  – The Apache function is proven to be free of pointer taintedness.
  – Two (known) vulnerabilities are exposed in the theorem proving process of NULL HTTPD function.

Project B
Runtime Pointer Taintedness Detection: To Defeat Memory Corruption Attacks

Project Overview

• We propose a processor architectural level mechanism to detect pointer taintedness
  – Implemented on the SimpleScalar simulator
  – An extended memory system with a taintedness bit attached to every byte
  – Enhanced load, store and ALU instructions to track taintedness bits in memory
  – Detecting security attacks when tainted data are dereferenced.
• Evaluation
  – It detects both control-hijacking and non-control-hijacking attacks against real-world software.
  – No known false positive: no alarm during normal executions of network servers and SPEC benchmarks. Fully compatible with existing applications.
  – Transparent to applications. We can run precompiled binaries on the architecture.
  – Some potential false negative scenarios. They are rare and not defeated by current generic detection techniques either.

Conclusions

• Our analysis shows that real-world software can be compromised by corrupting non-control data. Non-control-hijacking attacks represent a realistic threat.
  – It is insufficient to rely on control flow integrity for software security.
• Pointer taintedness is a common characteristic of most memory corruption attacks, including control-hijacking and non-control-hijacking attacks.
• A theorem proving based code analysis approach is designed to reason about possibilities of pointer taintedness.
  – E.g., to formally extract security specifications of library functions.
• A runtime pointer taintedness detection mechanism is designed. It can effectively detect most memory corruption attacks.

Summary of My Research Methodology

• Analysis-centric approach
  – Analyzed the impact of hardware faults on security (fault injection + stochastic modeling)
  – Analyzed Bugtraq and CERT vulnerability databases
  – Analyzed application source code, attacks and current defense techniques
  – Analysis results motivate us
      • To expose new security threats
      • To propose new defense techniques
• I like doing analysis of real data and incidents
  – Tedious? Sometimes, but it is a crucial step toward a lot of fun.
  – Rewarding? Definitely. Analysis is especially important for systems research.
  – Goal: strongly motivate research topics that solve real problems.

Backup Slides

Static and Dynamic Approaches

• Static approaches (avoid producing memory vulnerabilities in programs)
  – Writing code with type safe language
  – Compiler techniques to uncover memory vulnerabilities
  – Compiler instruments source code according to program annotations.
  – Challenges: legacy code and low level code, compatibility and performance.
  – Fact: Memory vulnerabilities are still constantly discovered and exploited.
• Intrusion detection techniques (defeat attacks, given the existence of vulnerabilities)
  – Specialized techniques
      • Defeat stack buffer overflow and format string attacks.
  – Generic defense techniques
      • Most techniques are designed to defeat control-hijacking attacks: host intrusion detection systems and control flow integrity protection techniques. A very active research area.
      • Others have constraints and difficulties in their deployments (pointer encryption and address randomization).

35

One-Slide Intro to Equational Logic

Use term rewriting to establish proofs of theorems.
Natural number addition expressed in the Maude system.

0 : Natural .
s_ : Natural -> Natural .
_+_ : Natural Natural -> Natural .

vars N M : Natural
Axiom: N + 0 = N .
Axiom: N + s M = s (N + M) .

(s s s 0) + (s s 0)
= s ((s s s 0) + (s 0))
= s (s ((s s s 0) + 0))
= s (s (s s s 0))
= s s s s s 0

Intuitively, this is a proof of “3 + 2 = 5” in natural number algebra.
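The rewriting proof above can be replayed mechanically. The following is a minimal Python sketch, not Maude itself: the tuple encoding of terms and the helper names (`s`, `plus`, `rewrite_step`, `normalize`, `numeral`, `to_int`) are assumptions made for illustration. It applies the two addition axioms as left-to-right rewrite rules until no rule applies.

```python
# Terms: "0" is zero, ("s", t) is the successor of t, ("+", a, b) is a + b.
ZERO = "0"

def s(t):
    return ("s", t)

def plus(a, b):
    return ("+", a, b)

def rewrite_step(t):
    """Apply one axiom to t, or return None if t is already in normal form."""
    if t == ZERO:
        return None
    if t[0] == "s":
        inner = rewrite_step(t[1])
        return None if inner is None else s(inner)
    _, n, m = t
    if m == ZERO:                    # Axiom: N + 0 = N
        return n
    if m[0] == "s":                  # Axiom: N + s M = s (N + M)
        return s(plus(n, m[1]))
    inner = rewrite_step(m)          # right operand is itself a sum: reduce it first
    return None if inner is None else plus(n, inner)

def normalize(t):
    """Return the list of terms visited while rewriting t to normal form."""
    trace = [t]
    while True:
        nxt = rewrite_step(trace[-1])
        if nxt is None:
            return trace
        trace.append(nxt)

def numeral(k):
    """Build the term s s ... s 0 with k successors."""
    t = ZERO
    for _ in range(k):
        t = s(t)
    return t

def to_int(t):
    """Count the successors of a numeral in normal form."""
    n = 0
    while t != ZERO:
        assert t[0] == "s", "term is not a numeral in normal form"
        t = t[1]
        n += 1
    return n

trace = normalize(plus(numeral(3), numeral(2)))
print(len(trace) - 1, "rewrite steps, result =", to_int(trace[-1]))
```

Each element of `trace` corresponds to one line of the derivation on the slide; the last two lines of the slide are the same term written with and without parentheses, so the sketch takes three rewrite steps.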

36

Axioms of Eval and ExpT operations

Eval(S, I) = I // I is an integer constant
Eval(S, ^ E1) = Ftch(S, Eval(S,E1))
Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
… …
ExpT(S, I) = false
ExpT(S, ^ E1) = LocT(S, Eval(S,E1))
ExpT(S, E1 + E2) = ExpT(S,E1) or ExpT(S,E2)
ExpT(S, E1 - E2) = ExpT(S,E1) or ExpT(S,E2)
… …

E.g., is the expression (^100) - 2 tainted in store S?
ExpT(S, (^100) - 2)
= ExpT(S, (^100)) or ExpT(S, 2)
= LocT(S,100) or false
= LocT(S,100)

Note: ^ is the dereference operator; ^100 gives the content of location 100.
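The axioms above translate almost line for line into a recursive evaluator. This is a hedged Python sketch covering only the four expression forms listed: the `Store` class, the tuple encoding of expressions, and the choice that unwritten addresses read as 0 are assumptions for illustration, not part of the original formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    """Snapshot of memory: content and taintedness per address (illustrative layout)."""
    content: dict = field(default_factory=dict)   # address -> integer content
    tainted: set = field(default_factory=set)     # addresses whose content is tainted

def ftch(S, a):
    # Assumption: an address that was never written reads as 0.
    return S.content.get(a, 0)

def loct(S, a):
    return a in S.tainted

# Expressions: an int constant, ("^", E), ("+", E1, E2) or ("-", E1, E2).
def eval_expr(S, e):
    if isinstance(e, int):                 # Eval(S, I) = I
        return e
    op = e[0]
    if op == "^":                          # Eval(S, ^E1) = Ftch(S, Eval(S, E1))
        return ftch(S, eval_expr(S, e[1]))
    if op == "+":
        return eval_expr(S, e[1]) + eval_expr(S, e[2])
    if op == "-":
        return eval_expr(S, e[1]) - eval_expr(S, e[2])
    raise ValueError(f"unsupported expression: {e!r}")

def exp_t(S, e):
    if isinstance(e, int):                 # ExpT(S, I) = false
        return False
    op = e[0]
    if op == "^":                          # ExpT(S, ^E1) = LocT(S, Eval(S, E1))
        return loct(S, eval_expr(S, e[1]))
    if op in ("+", "-"):                   # taintedness of either operand propagates
        return exp_t(S, e[1]) or exp_t(S, e[2])
    raise ValueError(f"unsupported expression: {e!r}")

# The slide's example: is (^100) - 2 tainted? It reduces to LocT(S, 100).
expr = ("-", ("^", 100), 2)
S = Store(content={100: 7}, tainted={100})
print(eval_expr(S, expr), exp_t(S, expr))
```

With location 100 marked tainted, `exp_t` reports the whole expression as tainted; with the same content but an empty taint set it does not.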

37

Taintedness-Aware Memory Model

• A store represents a snapshot of the memory state at a point in the program execution.
• For each memory location, we can evaluate two properties: content and taintedness (true/false).
• Operations on memory locations:
  • The fetch operation Ftch(S,A) gives the content of the memory address A in store S.
  • The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S.
• Operations on expressions:
  • The evaluation operation Eval(S,E) evaluates expression E in store S.
  • The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.

38

Semantics of Language L

The following instructions are defined:
– mov [Exp1] <- Exp2
– branch (Condition) Label
– call FuncName(Exp1,Exp2,…)
Axioms defining mov instruction semantics
– Specify the effects of applying the mov instruction on a store
– Allow taintedness to propagate from Exp2 to [Exp1].
Axioms defining the semantics of recv (similarly scanf, recvfrom: user input functions)
– Specify the memory locations tainted by the recv call.
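The effect of `mov` and `recv` on a store can be sketched as functions from one store to the next. This is an illustrative Python model, not the actual axioms: the `Store` class, the simplified `recv(S, buf, data)` signature, and the defaults for unwritten memory are all assumptions.

```python
class Store:
    """Store with content and taintedness per address; each instruction yields a new Store."""
    def __init__(self, content=None, taint=None):
        self.content = dict(content or {})
        self.taint = dict(taint or {})

def ftch(S, a):
    return S.content.get(a, 0)        # assumption: unwritten memory reads as 0

def loct(S, a):
    return S.taint.get(a, False)      # assumption: unwritten memory is untainted

def eval_expr(S, e):
    if isinstance(e, int):
        return e
    if e[0] == "^":
        return ftch(S, eval_expr(S, e[1]))
    left, right = eval_expr(S, e[1]), eval_expr(S, e[2])
    return left + right if e[0] == "+" else left - right

def exp_t(S, e):
    if isinstance(e, int):
        return False
    if e[0] == "^":
        return loct(S, eval_expr(S, e[1]))
    return exp_t(S, e[1]) or exp_t(S, e[2])

def mov(S, exp1, exp2):
    """mov [Exp1] <- Exp2: write the value and the taintedness of Exp2 to address Eval(S, Exp1)."""
    addr = eval_expr(S, exp1)
    new = Store(S.content, S.taint)
    new.content[addr] = eval_expr(S, exp2)
    new.taint[addr] = exp_t(S, exp2)
    return new

def recv(S, buf, data):
    """Hypothetical simplified recv: every location filled with user input becomes tainted."""
    new = Store(S.content, S.taint)
    for i, byte in enumerate(data):
        new.content[buf + i] = byte
        new.taint[buf + i] = True
    return new

S0 = Store()
S1 = recv(S0, 200, [65, 66])          # user input lands at 200..201
S2 = mov(S1, 300, ("^", 200))         # copying tainted data taints location 300
S3 = mov(S2, 300, 5)                  # overwriting with a constant clears the taint
print(loct(S2, 300), loct(S3, 300))
```

Returning a fresh store from each instruction mirrors the way the axioms relate the store before an instruction to the store after it.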

39

Example: strcpy()

char * strcpy (char * dst, char * src) {
   char * res;
0: res = dst;
   while (*src != 0) {
1:    *dst = *src;
      dst++; src++;
   }
2: *dst = 0;
   return res;
}

0: mov [res] <- ^ dst

lbl(#while#6)

branch (^ ^ src is 0) #ex#while#6

1: mov [^ dst] <- ^ ^ src

mov [dst] <- (^ dst) + 1

mov [src] <- (^ src) + 1

branch true #while#6

lbl(#ex#while#6)

2: mov [^ dst] <- 0

mov [ret] <- ^ res

Translate to formal semantics

a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false

b) If S0 is the store before Line L0, and S2 is the store after Line L1, then
   I < Eval(S0, ^dst) or Eval(S0, ^dst + dstsize) <= I  =>  LocT(S2,I) = LocT(S0,I)

c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false

Theorem generation

Theorem proving
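Condition (b) can be checked on concrete stores by executing the language-L translation of strcpy directly. This Python sketch only tests individual cases and is not the theorem-proving approach of the slides. The variable addresses (`DST`, `SRC`, `RES`), the buffer addresses 100 and 200, and the helper names are hypothetical.

```python
DST, SRC, RES = 1, 2, 3   # hypothetical addresses of the variables dst, src, res

def make_store(src_bytes, dst_addr=200, src_addr=100):
    """Build content/taint maps with a tainted, 0-terminated src string."""
    content = {DST: dst_addr, SRC: src_addr}
    taint = {}
    for i, b in enumerate(src_bytes):
        content[src_addr + i] = b
        taint[src_addr + i] = True            # user input is tainted
    content[src_addr + len(src_bytes)] = 0    # string terminator
    return content, taint

def run_strcpy(content, taint):
    """Execute the language-L translation of strcpy, mutating the store in place."""
    def ev(e):
        if isinstance(e, int):
            return e
        if e[0] == "^":
            return content.get(ev(e[1]), 0)
        return ev(e[1]) + ev(e[2])            # only "+" occurs in this program

    def tainted(e):
        if isinstance(e, int):
            return False
        if e[0] == "^":
            return taint.get(ev(e[1]), False)
        return tainted(e[1]) or tainted(e[2])

    def mov(e1, e2):
        addr, value, flag = ev(e1), ev(e2), tainted(e2)
        content[addr], taint[addr] = value, flag

    mov(RES, ("^", DST))                      # 0: mov [res] <- ^ dst
    while ev(("^", ("^", SRC))) != 0:         # branch (^ ^ src is 0) exits the loop
        mov(("^", DST), ("^", ("^", SRC)))    # 1: mov [^ dst] <- ^ ^ src
        mov(DST, ("+", ("^", DST), 1))
        mov(SRC, ("+", ("^", SRC), 1))
    mov(("^", DST), 0)                        # 2: mov [^ dst] <- 0

# Take dstsize = 4, so the destination buffer is 200..203 and 204 lies outside it.
short_c, short_t = make_store([97, 98, 99])
run_strcpy(short_c, short_t)
long_c, long_t = make_store([97, 98, 99, 100, 101, 102])
run_strcpy(long_c, long_t)
print(short_t.get(204, False), long_t.get(204, False))
```

With a 3-byte input the location just past the buffer keeps its taintedness; with a 6-byte input it becomes tainted, so the conclusion of (b) no longer holds for that location.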

40

Specifications Extracted

Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen.

Specifications that are extracted by the theorem proving approach:
– srclen <= dstsize
– The buffers src and dst do not overlap in such a way that the buffer dst covers the string terminator of the src string.
– The buffers dst and src do not cover the function frame of strcpy.
– Initially, dst is not tainted

Documented in Linux man page

Not documented
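For a concrete call, the four extracted conditions can be written as a simple predicate. The Python sketch below rests on assumptions that are not in the slides: buffers and the stack frame are modeled as half-open address ranges, `srclen` excludes the terminator, and the function and parameter names are hypothetical.

```python
def overlaps(lo1, hi1, lo2, hi2):
    """True if the half-open ranges [lo1, hi1) and [lo2, hi2) intersect."""
    return lo1 < hi2 and lo2 < hi1

def check_strcpy_specs(dst, dstsize, src, srclen, frame_lo, frame_hi, dst_tainted):
    """Evaluate each extracted specification for one call; returns name -> bool."""
    terminator = src + srclen                  # address of the src string terminator
    dst_hi = dst + dstsize
    return {
        "srclen <= dstsize": srclen <= dstsize,
        "dst does not cover src terminator": not (dst <= terminator < dst_hi),
        "buffers do not cover strcpy frame": not (
            overlaps(dst, dst_hi, frame_lo, frame_hi)
            or overlaps(src, terminator + 1, frame_lo, frame_hi)
        ),
        "dst initially untainted": not dst_tainted,
    }

ok = check_strcpy_specs(dst=200, dstsize=8, src=100, srclen=5,
                        frame_lo=1000, frame_hi=1032, dst_tainted=False)
bad = check_strcpy_specs(dst=102, dstsize=8, src=100, srclen=5,
                         frame_lo=1000, frame_hi=1032, dst_tainted=False)
print(all(ok.values()), [name for name, holds in bad.items() if not holds])
```

In the second call the destination buffer starts inside the source string and covers its terminator, so only the overlap condition fails.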
