Finding Bugs in Dynamic Web Applications
Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst
Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis)
Dynamic Web Application
• Generates pages (HTML content) on the fly
• Content varies by user and by user-specified criteria
• Produced by server-side programming
• Virtually all large, well-known web applications are dynamic web applications
Source: Dynamic Web Application Development using PHP and MySQL – By Simon Stobart and David Parsons
Web Threats
• Web script crashes and malformed dynamically-generated Web pages impact usability of Web applications
• Current tools for Web-page validation cannot handle the dynamically-generated pages
Web Script Crash
• Missing included file
• Call to undefined method
• Incorrect database query
• Uncaught exceptions
Malformed HTML
• HTML that does not conform to the WDG (Web Design Group) or W3C (World Wide Web Consortium) standards
  – Using tags not defined by the W3C (defined tags include <html>, <table>, <div>, etc.)
  – Not maintaining the document structure (e.g. <html><head></head><body> .. </body></html>)
  – Not matching each opening tag with a proper closing tag
  – etc.
• Web scripting languages can generate HTML
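The malformation categories above can be illustrated with a small checker. This is only a sketch in Python, not the WDG or W3C validator the slides refer to: the `KNOWN` and `VOID` tag sets are deliberately tiny assumptions, and `check_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Assumption: a tiny whitelist standing in for the tags a real standard defines.
VOID = {"br", "hr", "img", "input", "meta", "link"}
KNOWN = {"html", "head", "title", "body", "div", "p", "h1", "h2",
         "table", "tr", "td", "form"} | VOID


class TagChecker(HTMLParser):
    """Collects unknown tags and mismatched closing tags while parsing."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (non-void) tags
        self.problems = []   # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN:
            self.problems.append(f"unknown tag <{tag}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # void elements such as <br /> never stay open
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_html(html):
    """Return a list of problems; an empty list means no problem was found."""
    checker = TagChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"unclosed <{tag}>" for tag in reversed(checker.stack)]
    return checker.problems + unclosed


print(check_html("<html><body><j2>x</j2></body></html>"))
print(check_html("<html><body><h2>Welcome"))
```

A real validator checks far more (attributes, nesting rules, document type); the sketch only covers undefined tags and unbalanced open/close pairs.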
The Problem
• Bad scripts create syntactically malformed HTML
  – Partially displayable or non-displayable HTML
  – Browsers' attempts to correct the HTML can lead to crashes
  – Slower HTML rendering
  – Important information discarded
  – Search engines have trouble indexing the correct pages
• Example
More Problems
• Dynamic web page testing challenges
  – HTML validation tools only test static pages
  – They cannot fully capture behavior, since not all of the code's functionality shows up in the resulting HTML
  – No automatic validator exists for scripting languages that dynamically generate HTML pages
  – HTML Kit validates every generated page but requires manual generation of the inputs that lead to displaying pages
What this paper presents…
• Presents an automated technique for finding faults manifested as Web script crashes or malformed HTML; it extends dynamic test generation to scripting languages
• Identifies the minimal part of the input responsible for triggering a failure
• Uses an oracle to determine whether the HTML is well-formed
• Creates a tool, Apollo, that implements all of these in the context of PHP
Why PHP?
• Widely used in Web development
  – Network interactions
  – Database
  – HTTP processing
• Object-oriented
• Scripting
• 21 million domains¹ (75%) are powered by PHP, including large websites like Wikipedia, WordPress, Facebook, Digg, etc.
¹ Source: Netcraft, April 2007
Example: program
• SchoolMate.php
  – Allows school administrators to manage classes and users, teachers to manage assignments and grades, and students to access their information
• Typical URL: schoolmate.php?page=1&page2=100&login=1&username=user&password=password
• Faults highlighted in the example code:
  – ‘printReportCards.php’ missing
  – make_footer() not executed in certain situations: unclosed HTML tag
  – Generates illegal <j2> tag
Failures in PHP programs
• Targets two types of failures
  – Execution failures
    • Web script crashes
  – HTML failures
    • Malformed HTML
Failure-Finding in PHP Applications
• Concolic testing: a dynamic test generation technique. Execute the application
  1. Initially on an empty input
  2. Then on additional inputs, obtained by solving constraints that are derived from control flow paths
• Extensions
  – Validate the correctness of program output by using an oracle
  – Handle isset, empty, require, etc., which require generation of constraints absent in other OOPLs
  – Use a pre-specified set of values for database authentication
  – Simulate each user input by transforming the source code
Transformation of Code
• Interactive HTML pages with buttons and menus
• For each page (h) that contains N buttons– Add additional input parameter p to PHP program
• Values range from 1 to N
– Switch statement inserted including appropriate PHP source file, depending on p
An example
<?php
/* Simulated user input: the added parameter _btn selects the button */
switch ($_GET["_btn"]) {
    case 1:
        require_once("mainmenu.php");
        break;
    case 2:
        require_once("newuser.php");
        break;
}
?>
<?php
echo "<h2>Webchess " . $Version . " login</h2>";
?>
<form method="post" action="mainmenu.php">
<p>
Nick: <input name="txtNick" type="text" size="15" /><br />
Password: <input name="pwdPassword" type="password" size="15" />
</p>
<p>
<input name="login" value="login" type="submit" />
<input name="newAccount" value="New Account" type="button" onClick="window.open('newuser.php', '_self')" />
</p>
</form>
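The transformation described above (one extra parameter, one switch case per button) can be sketched as a small generator. This is an illustrative Python sketch, not Apollo's implementation: `simulate_user_input` and its arguments are hypothetical names, and discovering which buttons a page contains is assumed to have been done already.

```python
def simulate_user_input(button_targets, param="_btn"):
    """Build a PHP snippet with one switch case per button target.

    button_targets: PHP files that the page's N buttons lead to, in order.
    param: name of the added input parameter (values range from 1 to N).
    """
    lines = [
        "<?php",
        "/* Simulated user input */",
        f'switch ($_GET["{param}"]) {{',
    ]
    for number, target in enumerate(button_targets, start=1):
        lines += [
            f"    case {number}:",
            f'        require_once("{target}");',
            "        break;",
        ]
    lines += ["}", "?>"]
    return "\n".join(lines)


snippet = simulate_user_input(["mainmenu.php", "newuser.php"])
print(snippet)
```

Treating the button choice as an ordinary input parameter lets the test generator explore button clicks the same way it explores any other input.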
The Failure Detection Algorithm
parameters: Program P, oracle O
result: Bug reports B
B : setOf(<failure, setOf(pathConstraint), setOf(input)>)
P′ ≔ simulateUserInput(P);
B ≔ empty;
pcQueue ≔ emptyQueue();
enqueue(pcQueue, emptyPathConstraint());
while not empty(pcQueue) and not timeExpired() do
    pathConstraint ≔ dequeue(pcQueue);
    input ≔ solve(pathConstraint);
    if input ≠ ⊥ then
        output ≔ executeConcrete(P′, input);
        failures ≔ getFailures(O, output);
        foreach f in failures do
            merge <f, pathConstraint, input> into B;
        c1 ∧ … ∧ cn ≔ executeSymbolic(P′, input);
        foreach i = 1, …, n do
            newPC ≔ c1 ∧ … ∧ ci−1 ∧ ¬ci;
            enqueue(pcQueue, newPC);
return B;
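The loop above can be tried on a toy example. The following Python sketch rests on several assumptions that are not in the slides: conjuncts are `(param, op, value)` tuples with `op` being `"=="` or `"!="`, `solve` is a trivial stand-in for a real constraint solver, `toy_program` records its own branch conditions (playing the role of concrete plus symbolic execution), `max_runs` replaces `timeExpired()`, and a `seen` set is added so the toy run terminates.

```python
from collections import deque

NEGATE = {"==": "!=", "!=": "=="}


def solve(path_constraint):
    """Toy solver: return an input dict satisfying all conjuncts, or None."""
    equal, forbidden = {}, {}
    for param, op, value in path_constraint:
        if op == "==":
            if equal.setdefault(param, value) != value:
                return None  # two different required values
        else:
            forbidden.setdefault(param, set()).add(value)
    # Unset parameters count as different from every value.
    if any(equal.get(param) in bad for param, bad in forbidden.items()):
        return None
    return equal


def toy_program(inputs, trace):
    """Hypothetical page script; appends each branch condition to trace."""
    def branch(param, value):
        taken = inputs.get(param) == value
        trace.append((param, "==" if taken else "!=", value))
        return taken

    html = "<html><body>"
    if branch("page2", 1337):
        html += "<h1>Secret page</h1>"
    elif branch("login", 1):
        return html + "<h2>Welcome"  # closing tags are never emitted
    return html + "</body></html>"


def oracle(output):
    """Toy oracle: only checks that the document is closed."""
    return [] if output.endswith("</html>") else ["malformed HTML"]


def find_failures(program, oracle, max_runs=100):
    bug_reports = {}           # failure -> (path constraints, inputs)
    pc_queue = deque([()])     # start with the empty path constraint
    seen = {()}                # sketch-only addition to avoid repeats
    runs = 0
    while pc_queue and runs < max_runs:
        path_constraint = pc_queue.popleft()
        inputs = solve(path_constraint)
        if inputs is None:
            continue
        runs += 1
        trace = []
        output = program(inputs, trace)
        for failure in oracle(output):
            constraints, exposing = bug_reports.setdefault(failure, ([], []))
            constraints.append(path_constraint)
            exposing.append(inputs)
        # Negate each conjunct in turn, keeping the prefix before it.
        for i, (param, op, value) in enumerate(trace):
            new_pc = tuple(trace[:i]) + ((param, NEGATE[op], value),)
            if new_pc not in seen:
                seen.add(new_pc)
                pc_queue.append(new_pc)
    return bug_reports


reports = find_failures(toy_program, oracle)
print(reports)
```

In this toy run the empty input produces legal output, and the failure only appears once the solver is asked to flip the `login` condition, which mirrors the two example executions in the slides.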
Example: Execution 1 (Expose Third Fault)
• Execution on the empty input
• Branch annotations in the code figure: true – sets page = 0; false; GoTo(20)
• HTML validation tool determines output is legal
• Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1
• New path constraints added to the queue:
  – NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – NotSet(page) ∧ page2 = 1337
  – Set(page)
Example: Execution 2 (The Opposite Path)
• NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – Constraint solver may produce page2 ← 0; login ← 1
• Branch annotations in the code figure: true; true
• HTML validation tool discovers the failure and generates a bug report, which is added to the output set of bug reports
Minimization on Path Constraints
• Find shorter path constraint for a given bug report
• Eliminates irrelevant constraints – better assist programmer to detect location of the fault
• Solution for a shorter path constraint is often a smaller input
• Does not guarantee returned path constraint is shortest that exposes failure
Minimization Example
• HTML malformation from previous example could have been reached from different execution paths
  – Path constraint 1: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – Path constraint 2: Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1
  – Intersection: page2 ≠ 1337 ∧ login = 1
  – Conjuncts considered for removal: page2 ≠ 1337 and login = 1 (input: login ← 1)
Path Constraint Minimization Algorithm
parameters: Program P, oracle O, bug report b
result: Short path constraint that exposes b.failure
c1 ∧ … ∧ cn ≔ intersect(b.pathConstraints);
pc ≔ true;
foreach i = 1, …, n do
    pci ≔ c1 ∧ … ∧ ci−1 ∧ ci+1 ∧ … ∧ cn;
    input ≔ solve(pci);
    if input ≠ ⊥ then
        output ≔ executeConcrete(P, input);
        failures ≔ getFailures(O, output);
        if b.failure ∉ failures then
            pc ≔ pc ∧ ci;
inputpc ≔ solve(pc);
if inputpc ≠ ⊥ then
    outputpc ≔ executeConcrete(P, inputpc);
    failurespc ≔ getFailures(O, outputpc);
    if b.failure ∈ failurespc then
        return pc;
return shortest(b.pathConstraints);
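The minimization steps can be replayed on the two path constraints from the minimization example. This Python sketch makes the following assumptions, none of which come from the slides: conjuncts are `(param, op, value)` tuples with `op` in `"=="`, `"!="`, `"set"`, `"unset"`; `solve`, `toy_program` and `oracle` are hypothetical toy stand-ins for the constraint solver, the PHP program and the HTML validator.

```python
def solve(conjuncts):
    """Toy solver: return an input dict satisfying all conjuncts, or None."""
    equal, forbidden, must_set, must_unset = {}, {}, set(), set()
    for param, op, value in conjuncts:
        if op == "==":
            if equal.setdefault(param, value) != value:
                return None
        elif op == "!=":
            forbidden.setdefault(param, set()).add(value)
        elif op == "set":
            must_set.add(param)
        else:  # "unset"
            must_unset.add(param)
    # Give set-but-unconstrained parameters the first value not forbidden.
    for param in must_set - set(equal):
        candidate = 0
        while candidate in forbidden.get(param, ()):
            candidate += 1
        equal[param] = candidate
    if must_unset & set(equal):
        return None
    if any(equal[p] in bad for p, bad in forbidden.items() if p in equal):
        return None
    return equal


def toy_program(inputs):
    """Hypothetical page script with an unclosed-tag fault on the login path."""
    html = "<html><body>"
    if inputs.get("page2") == 1337:
        html += "<h1>Secret page</h1>"
    elif inputs.get("login") == 1:
        return html + "<h2>Welcome"  # closing tags are never emitted
    return html + "</body></html>"


def oracle(output):
    return [] if output.endswith("</html>") else ["malformed HTML"]


def minimize(program, oracle, failure, path_constraints):
    """Return a short path constraint that still exposes the failure."""
    first, *rest = path_constraints
    common = [c for c in first if all(c in other for other in rest)]
    needed = []
    for i, conjunct in enumerate(common):
        candidate = common[:i] + common[i + 1:]   # drop one conjunct
        inputs = solve(candidate)
        if inputs is not None and failure not in oracle(program(inputs)):
            needed.append(conjunct)               # failure vanished: keep it
    inputs = solve(needed)
    if inputs is not None and failure in oracle(program(inputs)):
        return needed
    return min(path_constraints, key=len)         # fall back to the shortest


pc1 = [("page", "unset", None), ("page2", "!=", 1337), ("login", "==", 1)]
pc2 = [("page", "set", None), ("page", "==", 0),
       ("page2", "!=", 1337), ("login", "==", 1)]
print(minimize(toy_program, oracle, "malformed HTML", [pc1, pc2]))
```

Dropping `page2 ≠ 1337` still triggers the failure, so only `login = 1` survives, matching the outcome sketched in the minimization example. As the slides note, the result is short but not guaranteed to be the shortest.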
Apollo
• User Input Simulator
• Executor
• Bug Finder
  – Oracle
  – Bug Report Repository
  – Input Minimizer
• Input Generator
  – Symbolic Driver
  – Constraint Solver
  – Value Generator
Executor: Shadow Interpreter
• Shadow Interpreter
  – Modified Zend PHP interpreter 5.2.2 to record path constraints and information associated with output
  – Performs symbolic execution along with concrete execution
  – Records conditions for PHP-specific comparison operations such as isset and empty
Executor: Database Manager
• Database Manager
  – (Re)initializes the DB used by a PHP application; restores the DB before each execution
  – Supplies additional information about username/password pairs
Bug Finder
• Bug Report = Failure + Path constraint + Input inducing failure
• Failure = Type of Failure + Corresponding Message + PHP statement generating bad HTML
• Oracle – HTML validation tool (WDG and W3C)
• Input Minimizer – uses the path constraint minimization algorithm
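The bug report structure listed above maps naturally onto a pair of record types. The Python sketch below is an assumption about one possible shape, not Apollo's actual data model: the class names, field names, and the sample failure values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Failure:
    kind: str        # type of failure, e.g. "HTML error"
    message: str     # corresponding message from the interpreter or validator
    statement: str   # PHP statement that generated the bad HTML


@dataclass
class BugReport:
    failure: Failure
    path_constraints: list = field(default_factory=list)
    inputs: list = field(default_factory=list)


def merge(reports, failure, path_constraint, inputs):
    """Add an exposing run to the report for this failure, creating it if new."""
    report = reports.setdefault(failure, BugReport(failure))
    report.path_constraints.append(path_constraint)
    report.inputs.append(inputs)
    return report


reports = {}
# Hypothetical failure values, for illustration only.
unclosed = Failure("HTML error", "unclosed tag", "echo '<h2>Welcome';")
merge(reports, unclosed, "NotSet(page) ∧ login = 1", {"login": 1})
merge(reports, unclosed, "Set(page) ∧ page = 0 ∧ login = 1", {"page": 0, "login": 1})
print(len(reports), len(reports[unclosed].path_constraints))
```

Keying reports by failure means the same fault reached along different paths accumulates several path constraints in one report, which is what the minimization step later intersects.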
Input Generator
• Symbolic Driver – generates new path constraints and selects the next path constraint
• Constraint Solver – computes an assignment of values to input parameters that satisfies a given path constraint
  – Choco constraint solver
• Value Generator – generates values for parameters
  – Combines random value generation and constant values mined from source code
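The Value Generator's mix of random values and mined constants can be sketched as follows. This Python sketch is a guess at the idea, not Apollo's code: the regular expressions, the 50/50 default split, the 0 to 1000 random range, and the names `mine_constants` and `generate_value` are all assumptions.

```python
import random
import re


def mine_constants(php_source):
    """Collect integer and quoted-string literals that appear in the source."""
    numbers = sorted({int(n) for n in re.findall(r"\b\d+\b", php_source)})
    strings = sorted({m[1] for m in re.findall(r"(['\"])(.*?)\1", php_source)})
    return numbers + strings


def generate_value(constants, rng, constant_probability=0.5):
    """Pick a mined constant with the given probability, else a random int."""
    if constants and rng.random() < constant_probability:
        return rng.choice(constants)
    return rng.randint(0, 1000)


# Hypothetical PHP fragment, for illustration only.
SAMPLE = "<?php if ($page2 == 1337) { echo 'admin'; } if ($login == 1) { } ?>"
constants = mine_constants(SAMPLE)
rng = random.Random(0)
print(constants, generate_value(constants, rng))
```

Mined constants matter because a comparison such as `$page2 == 1337` is very unlikely to be satisfied by a purely random value.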
Experimentation
Program      #files   LOC     PHP LOC   # downloads
faqforge     19       1712    734       14164
webchess     24       4718    2226      32352
schoolmate   63       8181    4263      4466
phpsysinfo   73       16634   7745      492217
total        179      31245   14968     543199
faqforge = Tool for creating and managing documents
webchess = Online chess game
schoolmate = PHP/MySQL solution for administering schools
phpsysinfo = Displays system info
Generation Strategies
• Compared to two other approaches
  – Halfond and Orso (Randomized)
    • Random values to the parameters
    • Proposed for JavaScript
  – Minamide’s static analysis
    • Approximates the string output of the program with a context-free grammar
    • Discovers malformed HTML faults
• Apollo’s test input generation, as previously discussed
Methodology
• 10-minute runs on each program– Generation of hundreds of inputs
• Ran both the Apollo and the Randomized test input generation strategies
• WDG offline HTML validation tool
Results Classification
• Execution crash: PHP interpreter terminates with exception
• Execution error: PHP interpreter emits warning visible in generated HTML
• Execution warning: PHP interpreter emits warning invisible to HTML output
• HTML error: program generates HTML for which validation tool produces error report
• HTML warning: program generates HTML for which validation produces a warning report
Results Analysis
• Apollo: average line coverage – 58.0%; faults found on subject apps – 214
• Randomized: average line coverage – 15.0%; faults found on subject apps – 59
• Fault annotations in the results figure: tries to load two missing files; database related; unset time-zone; resulted in malformed HTML
Line Coverage = Number of executed lines / Total lines with executable PHP code in application
Results Analysis
• Apollo vs. Randomized
  – 58% line coverage vs. 15.2% line coverage
  – 214 faults vs. 59 faults
• Apollo vs. Minamide’s tool
  – 2.7 times more HTML validation faults (120 vs. 45)
  – 83 additional execution faults
  – 104 faults (10 minutes) vs. 14 faults (126 minutes)
• Apollo is more effective and efficient than both
Results Analysis: Path Constraint Minimization
                              Path Constraints          Inputs
Program      Success rate %   Orig. Size   Reduction    Orig. Size   Reduction
faqforge     64               22.3         0.22         9.3          0.31
webchess     91               23.4         0.19         10.9         0.40
schoolmate   51               22.9         0.38         11.5         0.58
phpsysinfo   82               24.3         0.18         17.5         0.26
Reduces size of inputs by up to factor of 0.18 for more than 50% of faults
Success rate – Percentage of faults whose exposing input was minimized
Orig. size – Average size of original path constraints (# of conjuncts) and inputs (# of key-value pairs)
Reduction columns – Ratio of minimized to un-minimized size. The lower the ratio, the more successful the minimization
Limitations
• Simulating user inputs statically
• JavaScript code in the generated HTML not tracked
• Limited line coverage for native C methods
• Limited sources of input parameters
  – Only inputs from global arrays (_POST, _GET and _REQUEST)