Rapid Apperception
Machine-Assisted Semantic Understanding of Code
Francis V. Adkins Luke T. Jones
[email protected] [email protected]
Northeastern University
Project Report
prepared for
CSfC INSuRE
24 April, 2015
ABSTRACT
One of the greatest challenges facing source code auditors today is the sheer effort required to understand an unknown code base. However, despite significant academic research and multiple commercial products, relatively little work has been done to assess the usability of code understanding tools. In this research, we return to the basics and address the fundamental question of whether such tools truly can help an auditor. To assess this, we construct a simple grep-like utility to cluster keywords under a given semantic tag. We then use this tool to dissect several vulnerable specimens and compare auditor performance both with and without the tool. In this paper, we highlight our findings and discuss the differences in methodology both using and not using such tools. Ultimately, we conclude that, despite existing issues, code understanding tools can have merit, provided that usability takes precedence over technical novelty.
INTRODUCTION
Source code auditing has long been acknowledged as an invaluable part of the software
development lifecycle, particularly as it relates to the practice of defensive programming. The
key hurdle for auditors, however, has always been the necessity to understand large and
unknown code bases in a limited amount of time. Any programmer who has been newly assigned
to an existing project can attest to the seemingly insurmountable learning curve associated with
this process.
To address this issue, a large body of research and several commercial products have been directed toward helping the security auditor. However, despite the complexity of these tools,
practical surveys have found that auditors seldom use them in production. In fact, this
complexity often dissuades auditors from adopting them for everyday use and forces their
existence into obscurity.
Ultimately, this continuing lack of well-known or applicable tools has allowed the auditors' issues to persist. As a real-world ramification of this, the open-source movement has apparently failed to garner the support that it needs. One of the touted strengths of the open-source mentality is that anyone can read the code, and therefore anyone can audit it. However, recent experience has shown that, just because someone can audit the code does not mean that they will. The year 2014 was popularized as the year of open-source vulnerabilities, with Heartbleed, Shellshock, POODLE, and a number of others. These vulnerabilities lay in the open-source backbone of our modern infrastructure and affected millions of users, yet took years to discover. The subsequent media focus on these events brought the source code auditing process to the attention of the general public and has spurred both greater demand for audits and greater accountability for their success.
Therefore, it seems that there is high demand for source code auditing but a fundamental
disconnect between the users and any existing tools that could aid in their work. This raises the question of what purpose these tools are meant to serve and what can be done to improve them.
As a general trend, each new attempt in the academic domain to address program
comprehension has sought to surmount some technical challenge. Generally, such approaches
proclaim greater extraction of relevant features and improved higher-level design recovery.
However, this technical success does not necessarily beget an improved outcome for the user,
particularly if there are no users of the approach. Therefore, in this research, we return to the root question of the matter and seek to identify whether source code understanding tools truly can help an
auditor.
It is worth noting at this point that we confine our criticisms to those tools that are
directly intended to aid in source code understanding. That is to say that we place the simpler code isolation and navigation tools, such as grep and ctags, in an alternate category. These
tools may be said to increase overall understanding, but ultimately contribute only minute
portions of the overall picture. Instead, our critiques address the more robust understanding tools
that are intended to draw overarching correlations or produce some high-level functionality or
architecture summary. It is these tools that we question and whose validity we seek to test.
To answer this question, we have implemented our own simple automated source code
understanding tool. This tool draws very minor correlations across a code base and presents this
information to the user. We then set out to analyze unknown code bases both with and without
the tool and gathered comparative information on the relative effectiveness of both approaches.
LITERATURE REVIEW
Machine-assisted semantic understanding of code is a subset of program comprehension
which deals with automated processes designed to augment an auditor’s knowledge of a
codebase in useful ways. In this section we examine both the free and commercial tools that
already exist which attempt to assist generic analysts in source code understanding and
vulnerability analysis (the goal of many audits), the current research thrusts into the realm of
machine-assisted semantic understanding of code, and the academic standards of measuring code
understanding. However, we find that both tools and research papers lack the focus we are
looking for: a simple auditorcentric algorithm implemented and evaluated for effectiveness.
After our literature review, we discovered a startling gap between axiomatically practical and effective tools, which are quite often simple, and their much more complex and novel counterparts in the research domain. Our research aims to fill in the gap by providing a step up from a simple tool used every day by most developers, grep, and demonstrating empirical evidence of its
effectiveness.
First, we dive into the domain of the pro bono toolsmith, examining some free tools
which attempt to help analysts understand source code. The first of the free tools we examine is
Source Navigator NG, which enables editing code, jumping to declarations or implementations of
functions, variables and macros, and displaying both relationships between classes, functions and
members, and call trees [Source]. The second, GNU GLOBAL, is less like an integrated
development environment (IDE) than Source Navigator NG, and is instead a self-labeled “source
tagging system” that allows the user to quickly locate functions, macros, structs, classes, etc.,
independent of any specific editor [GNU]. This functionality differs from RA in that RA remains
agnostic concerning the semantics of the tagged language constructs, whereas GNU GLOBAL
provides strong hints about the semantics of functions. Lastly, CScout is a refactoring browser
that enables identifier changes, static call graph construction, and querying for files, identifiers
and functions based on properties, metrics and many other attributes [Spinellis]. These three free
tools, selected from among many, are comparable to IDEs with some extra bells and whistles. The applicability of IDEs and IDE-like programs to source code browsing is axiomatically effective, since IDEs are used to create software, and the history of their usage supports this assumption. But how much do extra features such as static call graph reconstruction or automatic inference of class relationships help in auditing? Free tools implement simple algorithms and solutions, assume that their effectiveness is apparent, and do not bother to gather empirical evidence. Indeed, in some cases, such as the IDE-like tools, evidence of efficacy would be extraneous; however, many simple techniques like static call graph reconstruction would benefit from academic rigor in testing and evaluation.
In the commercial world, a much sparser biome than the free and open-source world,
products such as Imagix 4D are more geared towards the software auditor or reverse engineer
with automated analysis of control flow and dependencies, and visualizations of source code
aimed at improving program comprehension [Imagix]. SciTools Understand is a proprietary IDE
that provides a wide variety of information about code including dependency analysis, call
graphing, code standards testing, and a variety of metrics [Scitools]. It is directly aimed at
helping an analyst understand code more quickly and is therefore very closely aligned with our
own goals. The commercial world of tools, then, is very similar to equivalent free tools, except
the average level of sophistication is higher. Companies present what their tools can do, but they
do not present rigorous academic testing on whether these tools really help or not. Just as with
free and open-source tools, commercial tools stand to gain much from the empirical verification
of their effectiveness.
Next, we dive into the domain of the researcher, examining a range of papers, from
feature location to automatic summarization. Dit et al. created a taxonomy and survey of 89
articles from 25 venues on feature location in source code [Dit], a technique that is applicable to
software reverse engineering, auditing and maintenance, but pursues increased code
understanding very differently from RA. Research in the niche field of feature extraction assumes
that its methods are useful because the latest algorithms are better and faster than their
predecessors. Feature extraction is not the only field in program comprehension that makes this
assumption. De Lucia et al. compared the results of automatic information retrieval (IR) methods for
software artifact labeling to manual methods for labeling and found that simpler IR methods
work better in some cases [De Lucia]. Their work serves as inspiration for RA’s utility testing
because it indicates that complex methodology does not necessarily mean better results. The
question we must ask is: can auditors more quickly understand code assisted by our tool or not?
Ning et al. created Cobol System Renovation Environment (Cobol/SRE), a tool for reusable
component recovery [Jim]. Cobol/SRE aims more at software developers understanding legacy
systems so that useful components can be extracted, although auditors or reverse engineers could
use it as well. Lastly, Moreno et al. created JSummarizer, an Eclipse plugin that automatically
generates natural language summaries of Java classes [Moreno]. We consider this to be one of
the most relevant research results for a software auditor trying to quickly understand code,
though it does not shed light on implementation specifics, so it would not help an
auditor find possible security vulnerabilities. RA fills this gap by providing semantic assistance,
not at the class level, but largely at the function call level. This provides enough granularity to
conduct security audits. Research papers tended to focus on incremental improvements in
algorithms instead of improvements in code auditors' abilities due to better algorithms. Obviously, this by no means invalidates or minimizes this research; however, just as in the world of tools, the world of research into machine-assisted semantic understanding of code could be vastly improved by a thoughtful approach to testing the practical efficacy of esoteric program
comprehension algorithms. In fact, one of the most important papers we reviewed was by Maalej
et al. called “On the Comprehension of Program Comprehension” in which they report
observations of 28 developers in industry and their processes in comprehending new software.
The researchers found that comprehension tools (in the traditional sense) are almost unknown in
industry and have had little-to-no impact in practice. The developers instead opted for more basic
strategies such as GUI tinkering, debug prints, and simply talking to the original developers.
They posit that existing tools can tend to be too esoteric and a simplified approach may have
greater impact [Maalej]. If existing tools are too esoteric, certainly many frontline research
techniques are downright obscure. RA intends to be one of the first research techniques that conceptually extends a familiar and intuitive base, the Unix grep utility, and backs the tool with quantitative and qualitative analysis of its effectiveness for the auditor.
Lastly, we examined previous research that evaluated the tools that test subjects used to
understand code. Most methods erect a framework and compare tools without consideration for
measuring the effect on the user’s understanding. One such method compares tools based on
“data structures”, “visualizations”, “information requesting” and “navigation features”
[Koskinen]. Another opts for evaluation based on “context”, “intent”, “users”, “input”,
“technique”, “output”, “implementation” and “tool” [Guéhéneuc]. These methods seem
reasonable for comparing tools, but not for finding the ground truth about their effectiveness.
The research on evaluating a programmer’s understanding of code is sparse because such a task
is necessarily ambiguous and hard to measure absolutely. In some sense, measuring human
understanding is more the realm of psychology than computer science. However, we take cues
from von Mayrhauser’s work in [Von Mayrhauser] and use anecdotal evidence to evaluate
human understanding. Taking it a step further, we decided to quantify our understanding by
timing our performance on test samples with and without using RA. In this way, we extend the
current standard of academic rigor for empirical testing of human understanding and apply it to
our tool.
PROBLEM STATEMENT
In this age of digital reliance, software security is more important than ever before. With
large portions of the internet's infrastructure based on open-source code, any optimization to the
code auditor’s workflow is eminently useful. However, modern tools are frequently discarded
due to their overwhelming complexity and poor usability. In this research, we have returned to
the root of the problem and questioned if code understanding tools have any merit whatsoever.
To do this, we have created a tool of our own that is designed to address criticisms against
existing approaches by being both simple and highly usable. We then evaluated our use of this
tool and gathered both quantitative and qualitative descriptions of the process. By comparing
these metrics, we can then provide support either for or against pursuing such tools to a greater
degree. Some assumptions are made during the evaluation process, and these are discussed in greater detail in the following section.
METHODS AND PROCEDURES
To evaluate the efficacy of source code understanding tools, we first addressed the
criticisms against existing approaches. Namely, existing tools are often considered too complex
and obscure for real-world use. However, simpler utilities such as grep are used with resounding
frequency and to great success. Therefore, we sought to bridge this gap by creating a source
understanding tool of our own based on the underlying principles of grep. This tool operates on
the concept of pairing a semantic tag to relevant language keywords and then locating all
instances of these keywords within a code base. This effectively allows us to locate and visualize
all portions of the code that deal with some specific functionality, such as user input/output, database interactions, or many others. For the purposes of this research, this tool has been
dubbed Rapid Apperception, or RA.
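To make the idea concrete, the following is a minimal sketch, in Python (the language of RA's tagging engine), of the tag-to-keyword pairing and grep-like lookup described above. It is not RA itself: the tag names, keyword lists, and the ".java" filter are illustrative assumptions, and the real tool stores its pairings in a database rather than a hard-coded dictionary.

```python
# Minimal sketch of the tag-to-keyword concept behind RA (illustrative only).
# Tag names, keywords, and the file-extension filter are assumptions.
import os

TAGS = {
    "user_input": ["HttpServletRequest", "getParameter", "readLine"],
    "file_io": ["FileInputStream", "FileOutputStream", "FileReader"],
}

def scan(root, tags=TAGS, ext=".java"):
    """Group every keyword hit in *root* under its semantic tag, grep-style."""
    hits = {tag: [] for tag in tags}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(ext):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    for tag, keywords in tags.items():
                        if any(kw in line for kw in keywords):
                            hits[tag].append((path, lineno, line.rstrip()))
    return hits

if __name__ == "__main__":
    for tag, locations in scan("path/to/unknown/codebase").items():
        print(f"== {tag}: {len(locations)} hits ==")
        for path, lineno, line in locations[:10]:  # first few hits per tag
            print(f"  {path}:{lineno}: {line}")
```

Presenting hits grouped by tag, rather than by search string as grep does, is what lets the auditor see every part of the code base that touches a given concern at once.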
The intent behind creating such a tool is to provide the minimum amount of functionality
necessary to meet the definition of “code understanding”. For our purposes, this definition
necessitates the ability to derive some higher-level correlation among various parts of the code
base. Therefore, by overcoming criticisms against existing approaches, we are able to more
adequately evaluate the conceptual basis for source understanding tools in general. To conduct
this evaluation, we have constructed the following general experiment:
Given a vulnerable application and a sufficiently vague description of the vulnerability, a
security auditor has two hours to develop a patch that mitigates it. Within this experiment, the
usefulness of RA can then be measured by comparing the quantitative results achieved via
timing as well as qualitative reports produced by the auditors as they record their methodology
and impressions. If a patch could not be developed in the two-hour time frame, then the tester
would compose a description of the patch that they would create if they had enough time.
Due to the limited experimentation timeline, the role of security auditors in this
experiment was performed by the authors of this paper. To derive any reasonably valid results,
the experimental process was repeated on several code bases and the use of RA was rotated
among participants. To mitigate any existing inherent speed differences, we first established a
baseline among the participants and used this as the guide for further comparison. A more
detailed description of this procedure follows.
As a first step, we pre-populated the database of tag-to-keyword pairings with relevant
tags for the Java language. These pairings were taken from the OWASP Code Review Guide
v1.1 [OWASP] where they were specifically identified as being relevant to the security auditing
process. We next identified a set of Java projects that are known to contain at least one
vulnerability. We verified the exploitability of these vulnerabilities by leveraging modules from
the Metasploit project.
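As an illustration of what this pre-population might look like, the snippet below loads a handful of OWASP-style categories for Java into MongoDB, which RA uses for storage. The collection name, document schema, and keyword lists shown here are assumptions made for the sake of example; the actual pairings were drawn from the OWASP Code Review Guide.

```python
# Hypothetical pre-population of the tag-to-keyword database. RA stores its
# pairings in MongoDB, but the "ra.tags" collection name, the document layout,
# and the keyword lists below are illustrative assumptions, not RA's schema.
from pymongo import MongoClient

OWASP_STYLE_TAGS = {
    "authentication": ["HttpSession", "Principal", "login"],
    "input_validation": ["getParameter", "getQueryString", "getHeader"],
    "cryptography": ["MessageDigest", "Cipher", "SecureRandom"],
}

tags = MongoClient("mongodb://localhost:27017")["ra"]["tags"]
for tag, keywords in OWASP_STYLE_TAGS.items():
    # one document per (tag, language) pair; upsert keeps reruns idempotent
    tags.replace_one(
        {"tag": tag, "language": "java"},
        {"tag": tag, "language": "java", "keywords": keywords},
        upsert=True,
    )
```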
To obtain a reasonably vague description of the vulnerabilities, we enlisted the assistance
of a security-knowledgeable colleague. Their assignment was to look at the respective Metasploit
modules and filter the existing vulnerability summary to remove any detailed information that
might indicate the exact location of the vulnerability or any associated functions. The intent
behind this was to narrow the scope of the security audit to a reasonable functionality subset, yet
not reveal so much as to make the test uninformative. After practicing on a preliminary test, not included in the results, we also decided to allow viewing of the exploit code used by Metasploit. We retained the redacted descriptions, but allowed static analysis of the exploit in order to mitigate both the time constraints on our tests and the enormity of the test candidates' code bases.
For testing, the auditors used identical virtual machines running Ubuntu 10.04. Then,
given an uninterrupted time period, they were directed to complete the experiment for each Java
test subject and record their experiences as well as any timing results. Timing began once the testers had verified that the exploit worked on the application and had begun the task of understanding how the application was exploited. Timing stopped when an acceptable patch was implemented and tested, or when a proposed patch was determined to have a high chance of success but would require far more than the allotted two hours to implement. In the case of our tests, every single patch found was determined to have a high likelihood of success but to require far more than two hours to implement. For purposes of quantitative analysis, one auditor was assigned the use of RA and the other was prohibited from using it. The rules for this competition were as follows:
- Static analysis only
  - Exception: an auditor's patch is deemed successful only by running the respective Metasploit module and verifying non-exploitation.
- Permitted applications:
  - Without tool: vim, grep
  - With tool: vim, grep, RA
- Maximum auditing time: 2 hours
As researchers, we acknowledge that, despite the academic rigor that this approach
provides over previous work, this experimental process is still highly subjective and presents
room for a large margin for error. Factors that may contribute to this error include the relative
immaturity of the participants as security auditors as well as the small sample size of the Java
projects selected for experimentation. However, as the measurement of “understanding” is itself
a highly subjective concept, it is our hope that our novel testing procedures and the results
gathered therein will still serve as a benefit to future program comprehension research. The
results from these experiments have been compiled and are presented in the following section.
RESULTS
Over the course of about a week, we analyzed four test subjects: one as practice and three as actual test candidates. First, we found Metasploit exploit modules on Exploit-DB, and then downloaded the source code for the projects from their respective repositories. Our findings are tabulated as follows:
Table 1: Patch Design Speed with and without RA (max 120 minutes)

                                  Tester 1            Tester 2            Ratio
ElasticSearch 1.1.1 (baseline)    w/o tool: 38 min    w/o tool: 40 min    0.95
Apache Struts 2.3.16              w/ tool: 120 min    w/o tool: 120 min   1
Apache Roller 5.0.1               w/o tool: 120 min   w/ tool: 120 min    1
The first test, our baseline, was designed to mitigate any difference in speed that we as
testers would have regardless of using our tool or not. However, it turned out that we had
approximately the same speed when it came to designing a patch for ElasticSearch. We found the problem that the Metasploit module took advantage of to be arbitrary execution of Java, almost certainly included as a feature. Tester 1 proposed sandboxing the Java execution or implementing a domain-specific language. Tester 2 proposed something quite different: an
authentication requirement to be able to use the Java execution feature of ElasticSearch. Both
testers' methodologies were very similar, involving heavy use of grep and starting by searching
for the "script_fields" parameter as seen in the exploit code. From that point, they both "hunted
and pecked" for various components, reading them in vim and finding other files to investigate
using grep. Both testers found a line by which they could disable the exploit, but proposed the above patches as the actually viable options for an in-production server.
In the next test on Apache Struts, we introduced the use of RA. Notably, both testers ran
into the time limit for this test. They found the issue to be the use of Object-Graph Navigation Language (OGNL) expressions, which managed to bypass security measures and execute
arbitrary Java. Both testers arrived at the same patch: filtering access to "staticMethodAccess"
field on the "OgnlUtil" object. Another, much simpler, though trivial, option would be to not run production servers in developer mode. However, we sought to fix the OGNL security bypass even for developer mode. The tester without the use of RA used a method very similar to that described above for ElasticSearch, while the tester with RA used grep to determine jumping-in points into the codebase, but then used RA as a code browser and manually added tags as needed.
This manual addition of tags was required because the prefabricated database of tags from the
OWASP document had no tags for OGNL. Because the test candidate's vulnerability dealt in code that had no pre-made tags, the use of RA was not as readily beneficial as it could have been. However, the tester using RA still found it to yield incremental increases in understanding much more steadily than just using grep had in the baseline test. Observing the audit logs, the tester using RA was able to find the exact location of script execution, while the tester not using RA proposed a more exact possible solution to the OGNL expression execution, though the increased precision was not verified to be tenable.
For the last test (Apache Roller), the testers switched who was using RA and again began
the process of trying to understand the source enough to patch the Metasploit vulnerability. Both
testers also ran into the time limit for this test. Both testers soon found that the same OGNL issue
was being exploited in Roller as in Struts; furthermore, Roller actually included a version of
Struts that was being exploited by the Metasploit module. However, in spite of this, the patch
was not any easier to construct; in fact, neither tester could find the exact location of the payload execution. However, the tester with RA demonstrated a much better understanding of the code, judging by their audit log.
DISCUSSION
Our first and primary goal was to assess whether a simple algorithm augmented auditor
understanding of code or not. To accomplish this goal, we made two contributions to the
program comprehension field: a novel understanding testing methodology, and a novel program
comprehension tool. We'll first discuss our proposed abstract outline of the testing framework,
and then our implementation and usage of it for our tests.
During the development of our testing methodology, we were cognizant that evaluating
human understanding of anything, even of a computer program, is more of a psychological or
social science question than a computer science one. Inherently, surmounting the qualitative nature
of this kind of analysis required creativity and a willingness to investigate analyses mostly
foreign to computer science. We suspect that the difficulty of quantitative analysis of human
understanding of programs is why there is such a lack of any comprehension analysis in modern
literature. This, however, is unacceptable, because as niche program comprehension algorithms gain undeniable, incremental improvements, there can be no intelligent vectoring of effort toward the niche research fields that are most beneficial to real-world auditors, reverse engineers, and software engineers. To begin addressing this issue in the program comprehension field, we propose an abstract outline of a testing framework that
should be applied to existing tools to evaluate their efficacy for front-line auditors, reverse engineers, and others who require source code understanding:
1. Choose qualitative and quantitative metrics
2. Choose test corpus and testers
3. Conduct tests
4. Evaluate effectiveness of metrics, corpus and testers
5. If metrics acceptably relevant, evaluate results
6. Present results, metrics and metric effectiveness
We’ll now discuss how we applied this framework for testing RA.
For our qualitative metric, we drew on von Mayrhauser’s work and adopted recorded
anecdotal evidence. For the quantitative metric, we found no precedents to draw upon, but
decided to time ourselves patching a vulnerability for which an exploit was available, and to compare the timing data to determine, as objectively as possible, whether our tool added value to the code auditing process. Since we lacked the time to solicit external testers and instead had to act
as testers ourselves, timing successful patch development to known exploits gave us a viable
method of testing and comparing our understanding to each other without having to know
anything about the code beforehand or collaborate during the testing to determine our respective
levels of understanding. Next, in evaluating our metrics, corpus and testers, we find that there is
much room for improvement.
First, our quantitative metric of timing was unhelpful in determining whether our tool
was beneficial or not. This is because we set a testing time limit of two hours, and both testers hit
the limit for both tests. Additionally, our stopping point for the timing amounted to “breaking”
the exploit, which can be accomplished by multiple levels of complexity and time commitment,
from disabling the vulnerable service, to implementing authentication. For timing results to
really be comparable, there must be fewer solutions to the problem, preferably one. Otherwise
the testers’ paths of understanding are different and therefore not truly comparable. However,
this raises the question of whether, if the application can only be understood in one way, human understanding is even needed. Our qualitative metric of recorded anecdotes was unsurprisingly more successful, though it, by nature, requires expertise and domain knowledge to interpret. The testers' method of recording their findings and insights would benefit from standardization, which would make the logs more comparable.
Second, our choice of using user-level (non-JVM/JRE) Java applications with Metasploit exploits as our corpus had two unforeseen consequences: one actually desirable and the other less so. Desirably, the exploits in our corpus were design-level vulnerabilities, which require much more understanding of the source code than alternatives such as buffer overflows in a language such as C or sanitization issues as in PHP. Undesirably, we found only about five test candidates in a Metasploit exploit database of 1,350 modules, therefore limiting our choices in terms of
testing.
Lastly, the choice of using ourselves as testers was probably the most detrimental factor
in our whole testing procedure. This prohibited us from constructing our own toy code for more
manageable understanding challenges. Having external testers would likely enable better quantitative timing results and a better, more impartial ability to compare the qualitative audit logs. In spite of all these ways to improve our testing procedure, we still consider our
qualitative metrics effective enough to draw conclusions about RA from the results of our tests.
Qualitatively, we found RA to be somewhat useful. It easily prevented grepping for the same
strings over and over, as often happens when using grep on source code bases. From the
qualitative audit logs, the tester with RA seemed to have more success in understanding the code
than the tester without. In Apache Struts, the tester with RA found the exact place of script
execution and in the Apache Roller example, the tester with RA could construct a much more
comprehensive picture of the application and suggest a much more precise fix for the
vulnerability. Since both testers, when they used RA, added custom tags, we believe that this
indicates that the novel functionality introduced by RA is what helps, not just having a code
browser. However, since the test candidates we examined had vulnerabilities in parts of the code
not addressed by our pre-constructed tag database, we cannot conclusively say whether pre-populated tags truly help or not. It is only reasonable to believe that they do, and that they are in fact
probably much more helpful than needing to construct one’s own tags. However, our tests,
unfortunately, did not provide any empirical evidence for this.
Some definite challenges we encountered during testing included the lethargy of response
when custom tags were added to large projects, the judiciousness and expertise needed for
adding custom tags, and a lack of user interface sophistication. The lethargy was due largely to our Python implementation of the tagging engine. We could replace this engine with a bash one-liner that would be much quicker, though we have not tested this method on any test candidates; a rough sketch of delegating the search to grep follows this paragraph. Not only was adding custom tags painfully slow, but so was installation of the tool, as it required Apache, MongoDB, PHP, and Python. This complexity could be easily mitigated by creating an apt-get package.
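The sketch below illustrates the proposed speed-up by delegating keyword matching to grep from Python via a subprocess call. As noted above, it is untested against our candidates; the flags used are ordinary GNU grep options, the project path is a placeholder, and the "ognl" keywords are merely reminiscent of the custom tag added during the Struts test.

```python
# Untested sketch of the proposed speed-up: let grep do the recursive keyword
# search instead of the pure-Python tagging engine. Assumes GNU grep is on the
# PATH and that keywords are plain identifiers (no regex metacharacters).
import subprocess

def grep_tag(root, keywords):
    """Return grep's file:line:text matches for all of a tag's keywords."""
    pattern = "|".join(keywords)  # keywords combined into one extended regex
    cmd = ["grep", "-rnE", "--include=*.java", pattern, root]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.splitlines()

# e.g. hits for a hypothetical "ognl" tag like the one added during the Struts test
for hit in grep_tag("apache-struts-2.3.16", ["OgnlUtil", "OgnlContext"])[:20]:
    print(hit)
```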
Concerning the expertise needed to add custom tags, we discovered that adding tags to
increase understanding of an unfamiliar codebase was not as trivial in the general case as it might
seem and required understanding itself. In some cases, trivial custom tags could be eminently useful, for example in identifying wrapper functions or custom implementations of library functions (a small illustration follows this paragraph). But our test candidates included no such low-hanging fruit. This, again, was due to our choice of Java projects as our test corpus. We believe that if we had used C or PHP projects as tests, the usual buffer overflow and input validation culprits would have been easier for RA to highlight, though they would require less understanding of the code.
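To make the wrapper-function case concrete, and reusing the scan() sketch shown earlier, registering such a custom tag amounts to a one-line addition. The tag name, keyword list, and project path below are hypothetical and are not taken from any of our test candidates.

```python
# Hypothetical custom tag for a project-specific wrapper around memory copies,
# added mid-audit; reuses the scan() sketch shown earlier with a C file filter.
TAGS["memory_copy_wrappers"] = ["safe_copy", "copy_wrapper", "memcpy"]
for path, lineno, line in scan("path/to/c_project", ext=".c")["memory_copy_wrappers"]:
    print(f"{path}:{lineno}: {line}")
```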
Lastly, even though we focused on testing the usefulness of semantic tagging as an
approach, sophistication of the user interface is an instrumental force multiplier when
considering human understanding. A poor user interface can be a bottleneck between the user
and an algorithm, even when the user has good comprehension ability and the algorithm is reasonably useful. Having the ability to display multiple tags at the same time and to define sub-tags would greatly increase the effectiveness of RA. Even simple things like keybindings and autocompleting tag names would greatly help the auditor using RA. These changes might yield large increases in usefulness for what amounts to an improvement in presentation.
CONCLUSION
In the domain of machine-assisted semantic understanding of code, a sub-domain of program understanding, there are many tools whose effectiveness is considered apparent and in no need of verification. There are also many bleeding-edge research techniques that objectively improve upon past techniques but do not provide evidence of actually augmenting humans'
understanding. We sought to break this mold with Rapid Apperception, a tool incrementally
more complex than the commonly accepted grep utility but empirically verified with rigorous
testing procedures.
Our hypothesis was that, by pre-populating a semantic tag database with related keywords and making a simple user interface to highlight these tags, a code auditor's job could
be made noticeably easier. To test this hypothesis, we constructed a basic tool and established a
novel testing methodology to evaluate both the tool and the methodology itself. The crux of our
testing ended up depending on analyzing recorded anecdotes of understanding, an approach we
validated via von Mayrhauser. Though we found that our testing procedures could use significant
improvement, we derived enough qualitative evidence from our tests to determine that semantic
tagging is a useful technique to employ in code auditing. However, its usefulness is directly connected to the amount of expertise and applicability embodied by the semantic tag database. If custom tagging is required and no pre-populated tags apply, then understanding starts at square one, but can be gained more quickly than without our tool. Also, with a team of auditors, once
custom tags are added, the understanding process is bootstrapped and no longer needs to be
repeated.
Lastly, though the underlying algorithm of a machine-assisted semantic understanding of
code technique needs to be viable for any utility to be derived from its use, the user interface
which presents the results of the algorithm can be a bottleneck or force multiplier depending on
its design and features. We consider our work to be only the first step in evaluating a field of
research that has long needed more robust empirical evaluation, and more effort towards
bringing the best of program comprehension research to the fingertips of code auditors.
REFERENCES
Dit, B., Revelle, M., Gethers, M. and Poshyvanyk, D. (2013), Feature location in source code: a taxonomy and survey. J. Softw. Evol. and Proc., 25: 53–95.
De Lucia, A.; Di Penta, M.; Oliveto, R.; Panichella, A.; Panichella, S., "Using IR methods for labeling source code artifacts: Is it worthwhile?," Program Comprehension (ICPC), 2012 IEEE 20th International Conference on, pp. 193-202, 11-13 June 2012.
"GNU GLOBAL Source Code Tagging System." GNU Project . N.p., n.d. Web. 07 Feb. 2015. <http://www.gnu.org/software/global/>.
Guéhéneuc, Yann-Gaël. "A Comparative Framework for Design Recovery Tools." Proceedings of the Conference on Software Maintenance and Reengineering (n.d.): n. pag. Web. 24 Apr. 2015.
Imagix Corp. "Analyze Your Source Code." Reverse Engineering and Source Code Analysis Tools . N.p., n.d. Web. 07 Feb. 2015. <http://imagix.com/>.
Jim Q. Ning, Andre Engberts, and W. Voytek Kozaczynski. 1994. Automated support for legacy code understanding. Commun. ACM 37, 5 (May 1994), 50-57.
Koskinen, Jussi, and Tero Lehmonen. "Analysis of Ten Reverse Engineering Tools." Advanced Techniques in Computing Sciences and Software Engineering (n.d.): n. pag. Web. 24 Apr. 2015.
Moreno, L.; Marcus, A.; Pollock, L.; Vijay-Shanker, K., "JSummarizer: An automatic generator of natural language summaries for Java classes," Program Comprehension (ICPC), 2013 IEEE 21st International Conference on, pp. 230-232, 20-21 May 2013.
Maalej, Walid, et al. "On the comprehension of program comprehension." ACM Transactions on Software Engineering and Methodology (TOSEM) 23.4 (2014): 31.
OWASP, “OWASP Code Review Guide V1.1”. Web. 2008. <https://www.owasp.org/images/2/2e/OWASP_Code_Review_GuideV1_1.pdf>
"Source Navigator NG." N.p., n.d. Web. 07 Feb. 2015. <http://sourcenav.sourceforge.net/>.
Spinellis, Diomidis. "The CScout Refactoring Browser." Department of Management Science and Technology, Athens University of Economics and Business, n.d. Web. <http://www.spinellis.gr/cscout/doc/index.html>.
"Understand Your Code | SciTools.com." SciTools.com . N.p., n.d. Web. 10 Feb. 2015. <https://scitools.com/>.
Von Mayrhauser, A. "From Code Understanding Needs to Reverse Engineering Tool Capabilities." Computer-Aided Software Engineering (n.d.): n. pag. Web. 24 Apr. 2015.