
  • Contents lists available at SciVerse ScienceDirect

    Digital Investigation 10 (2013) 89–98

    Digital Investigation

    journal homepage: www.elsevier.com/locate/diin

    Hunting in the enterprise: Forensic triage and incident response

    Andreas Moser, Michael I. Cohen*

    Google Inc., Brandschenkestrasse 110, Zurich 8002, Switzerland

    Article info

    Article history: Received 22 November 2012; Received in revised form 11 March 2013; Accepted 14 March 2013

    Keywords: Triage; Digital forensics; Incident response; Information security; Malware detection; Scalable investigation; Enterprise fleet management

    * Corresponding author. Tel.: +41 788399834. E-mail address: [email protected] (M.I. Cohen).

    1742-2876/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.diin.2013.03.003

    Abstract

    In enterprise environments, digital forensic analysis generates data volumes that traditional forensic methods are no longer prepared to handle. Triaging has been proposed as a solution to systematically prioritize the acquisition and analysis of digital evidence. We explore the application of automated triaging processes in such settings, where reliability and customizability are crucial for a successful deployment. We specifically examine the use of GRR Rapid Response (GRR) – an advanced open source distributed enterprise forensics system – in the triaging stage of common incident response investigations. We show how this system can be leveraged for automated prioritization of evidence across the whole enterprise fleet and describe the implementation details required to obtain sufficient robustness for large scale enterprise deployment. We analyze the performance of the system by simulating several realistic incidents and discuss some of the limitations of distributed agent based systems for enterprise triaging.

    © 2013 Elsevier Ltd. All rights reserved.

    Triage is a process commonly applied in the medical field in order to ration limited resources and to maximize their overall effectiveness (Iserson and Moskop, 2007). In the medical arena, responders follow a systemic approach to assess the severity of injuries and the likelihood of successful treatments for medical casualties. Triage is essentially a prioritization process optimizing the usage of limited resources toward achieving the best overall outcome.

    Digital forensics is increasingly used in more diverse contexts, such as criminal and civil cases as well as in incident response. The increase in the utilization of digital forensic capabilities has led to a larger case load and longer backlogs of evidence which must be analyzed by a limited number of highly trained investigators (Richard and Roussev, 2006). The increased data volume places challenges on traditional forensic methods and has led to several proposals for updating best practice techniques in order to cope with the work load (Jones et al., 2012).


    1. Triage in digital forensics

    Triaging has been proposed for managing long case backlogs (Rogers et al., 2006). Drawing inspiration from medical triage techniques (Hogan and Burstein, 2007), the goal of digital forensic triaging is to prioritize evidence for acquisition and analysis in order to maximize case throughput.

    In particular, many digital forensic procedures are designed to discover whether any evidence potentially relevant to the case exists on a computer at all. Drawing an analogy from medical triaging (Iserson and Moskop, 2007), one can define triaging as a process which classifies digital evidence into three groups:

    • The system is likely to contain crucial evidence but this evidence is unlikely to be destroyed in the near future.

    • The system is likely to contain crucial evidence which may be imminently destroyed.

    • The system is not likely to contain relevant evidence and therefore should not be acquired or processed.
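    The three-way classification above can be reduced to two triage judgements: does the system likely hold crucial evidence, and is that evidence volatile? A minimal sketch follows; the names `TriagePriority` and `classify` are illustrative, not part of any tool described in this paper:

```python
from enum import Enum

class TriagePriority(Enum):
    """Hypothetical priority levels mirroring the three triage groups."""
    ACQUIRE_LATER = 1  # crucial evidence, unlikely to be destroyed soon
    ACQUIRE_NOW = 2    # crucial evidence, may be imminently destroyed
    SKIP = 3           # unlikely to contain relevant evidence

def classify(likely_has_evidence: bool, evidence_volatile: bool) -> TriagePriority:
    """Map the two triage judgements onto a priority category."""
    if not likely_has_evidence:
        return TriagePriority.SKIP
    if evidence_volatile:
        return TriagePriority.ACQUIRE_NOW
    return TriagePriority.ACQUIRE_LATER
```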



    Classifying potential systems into these categories allows evidence acquisition and analysis to be prioritized. Digital evidence collection must follow the Order Of Volatility (OOV) (Farmer and Venema, 2005; Brezinski and Killalea, 2002), in that some sources of evidence are more volatile and likely to change, and hence should be collected sooner. For example, memory images are more volatile than disk images, since system state is likely to change quickly (Schuster, 2008), or be completely lost if the system is powered down. Yet the contents of memory contain extremely valuable evidence in many investigations (Walters and Petroni, 2007). In this context, it is critical to obtain the memory image as fast as possible.
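    Ordering collection by the OOV amounts to sorting candidate evidence sources by how quickly they decay. A minimal sketch, where the volatility ranks are illustrative assumptions rather than values from this paper:

```python
# Hypothetical volatility ranks: lower rank = more volatile = collect first.
VOLATILITY_RANK = {
    "memory": 0,        # lost on power-down, changes constantly
    "swap": 1,
    "disk": 2,
    "backup_media": 3,
}

def collection_order(sources):
    """Sort evidence sources so the most volatile are acquired first.
    Unknown sources sort last, after everything with a known rank."""
    return sorted(sources, key=lambda s: VOLATILITY_RANK.get(s, len(VOLATILITY_RANK)))
```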

    It is important to contrast the aims of triaging analysis with traditional digital forensic analysis. While traditional digital forensic analysis aims to establish irrefutable findings upon which a solid case may be built, triage analysis has a much lower evidentiary burden of proof. The triaging step is merely trying to establish whether the system is likely to be involved with the case. This lower burden of proof opens the possibility for wider automation in the triaging process, with a higher acceptable false positive rate. However, the danger with automated triaging is that subtle evidence might be missed.

    For example, consider a case where the investigation requires analysis of the URLs the suspect accessed using a browser. A full forensic analysis might require the cache material to be examined, browsing history reconstructed and time lines created. On the other hand, the triaging step merely discovers whether the browser cache contains any references to the user name, or web site in question. Thus, the triaging step can be implemented as a simple keyword search, where a significant number of hits result in classifying the system as potentially containing evidence, leading to acquisition and further analysis of the system.
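    The keyword-search triage step just described can be sketched as a scan over the cache directory that flags the system once enough hits accumulate. The function name and `hit_threshold` parameter are hypothetical illustrations, not part of any described tool:

```python
import os
import re

def triage_keyword_search(cache_dir, keywords, hit_threshold=5):
    """Scan files under cache_dir for any of the keywords. Return True
    (flag the system for acquisition) once total hits reach hit_threshold."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords).encode())
    hits = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb") as f:
                    hits += len(pattern.findall(f.read()))
            except OSError:
                continue  # unreadable files are simply skipped during triage
            if hits >= hit_threshold:
                return True
    return False
```

    A specific term (an email address, a unique domain) keeps the false positive rate low, which matters for the privacy considerations discussed next.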

    While triaging analysis is less rigorous than a full digital forensic examination, it must necessarily be applied to many more systems. This can be achieved by recruiting less trained investigators to perform the analysis using standard tools (e.g. recruiting police officers, provided with minimal digital forensic training and using commercial tools). Alternatively, specialized tools may be developed to ensure that triaging analysis is as automated as possible, for example the FBI's image scan tool – a law enforcement only tool used to triage for contraband images (Cantrell et al., 2012). Ideally, the triaging strategy is tailored to the specifics of each case.

    1.1. Privacy considerations

    Another complication with applying the triaging process is the effect it has on the privacy of the system's owner. In traditional digital forensic analysis, the systems acquired and inspected have a very high probability of being relevant to the case. In criminal matters, these systems must fall under the terms of the relevant warrant before they can be examined at all. Usually, the warrant lists all systems that are to be examined in advance, in order to minimize the chances of examining unrelated systems.

    In contrast, triaging affects a wider selection of systems, some of which may turn out to be irrelevant to the case, or unrelated to the suspect. A triage step using too general search criteria may therefore reveal private information irrelevant to the case, as well as select unrelated systems for further inspection with traditional forensic analysis procedures, thus inadvertently violating the owners' privacy. Carefully tailored search criteria lower the false positive rate, resulting in a more focused and effective triage, while simultaneously protecting the privacy of system owners. For example, consider again a search for the presence of specific keywords in the user's web history. A specific and unique term such as an email address or domain name is likely to produce a lower false positive rate than a general term which is likely to occur in unrelated web history.

    1.2. Exculpatory evidence

    In legal proceedings, exculpatory evidence is any evidence which is favorable to the defendant and might prove the defendant's innocence. Legal due process requires that exculpatory evidence be collected and disclosed to the defendant before trial (Supreme Court of the United States, 1963).

    By its nature, triaging might not collect all evidence, and might systematically fail to collect exculpatory evidence. The digital investigator must keep this requirement in mind, and design triaging processes which include exculpatory evidence collection.

    2. Triaging and incident response

    Aside from traditional forensic investigations accompanying criminal cases, digital forensics is increasingly employed as part of enterprise incident response procedures. Forensic readiness is defined as the procedures that an organization can take in advance of an intrusion in order to expedite the incident response process (Endicott-Popovsky et al., 2007; Mitropoulos et al., 2006; Tan, 2001).

    Enterprise incident response is typically time constrained, requiring rapid collection and analysis of digital evidence (Casey, 2006). For example, when responding to a potential security compromise, the need for acquisition of forensically sound evidence must be balanced against rapidly disrupting and neutralizing the attacker threat and minimizing the resulting economic loss (Endicott-Popovsky et al., 2007).

    In this enterprise context, applying a systemic triaging process is crucial (Lim et al., 2009). Not only does triaging reduce the number of systems which must be manually examined to a manageable level, but triaging also ensures that investigators have the opportunity to acquire forensically sound evidence of systems of value, while maintaining the Order Of Volatility – thus ensuring the possibility for post-incident legal proceedings.

    For example, consider a typical network forensic investigation (Casey, 2006). Often, there are several compromised systems under the control of the attacker, all using the same kind of Remote Access Tool (RAT). A triaging procedure might require searching for unique evidence of the RAT in memory, on disk or in the Windows Registry. This might include specific registry keys used by the tool or

  • Fig. 1. GRR component overview. The client agent communicates over the internet with the front end servers. These manage message queues mediating communication between the agent and the workers and console. The workers execute flows – which are predetermined modules of analysis.


    binary strings unique to the RAT's executable which may be found in memory.

    If evidence of compromise is found in memory, the machine is then chosen for immediate response and a copy of memory is acquired immediately. However, if the RAT has been uninstalled such that it exists only on disk but is not currently running, the machine's hard disk can be acquired at a later time. This determination is performed according to the Order Of Volatility – it is crucial to obtain memory images as soon as possible in order to freeze the state of RAM for further analysis.

    In the initial parts of an incident response investigation, it is not known how many systems are likely compromised using the same RAT. Therefore triaging must be applied to all systems rapidly in order to narrow down the list of possibly compromised ones. In an enterprise, this requires rapid triaging of many hundreds if not thousands of machines.

    Once systems have been triaged, evidence must be acquired. Forensic acquisition systems fall into two broad categories. The first requires the investigator to have physical access to the system and boot the system into a mode which allows low level access to the physical disks (e.g. a boot disk such as Encase Portable (Guidance Software Inc., Oct 2012b)). This approach is more resilient against potential malware on the target system. However, it is more time consuming since it requires the system to be taken offline, and the investigator to be physically present to acquire the image. This mode of operation is generally not usable for obtaining a memory image without installing specialized hardware in advance (Vömel and Freiling, 2011).

    Another approach is to deploy an agent on the target system and perform analysis remotely (e.g. Encase Enterprise (Guidance Software Inc., Oct 2012a), GRR (Cohen et al., 2011)). This approach minimizes the time required to respond (since physical access is not required). However, it may be susceptible to interference from malware, since the agent relies on the integrity of the operating system to operate.

    3. The GRR Rapid Response system

    GRR Rapid Response (GRR) is an open source agent based scalable enterprise forensic platform (Cohen et al., 2011; Bilby et al., 2012). The goal of GRR is to lower the total cost of triage analysis by distributing that analysis to the system agents. This allows the system to triage a large number of systems quickly, and allows investigators to focus their efforts on those systems which best progress the investigation.

    GRR is one of several agent based remote forensic platforms (Guidance Software Inc., Oct 2012a; Khurana et al., 2009). The following discussion focuses on some implementation challenges addressed within the GRR system, but these challenges are also faced by other platforms. It is the discussion and analysis of these issues that the authors hope will advance the current state of the art in enterprise forensic triaging.

    Although we have not formally evaluated and compared other products to GRR, there are a number of products which appear to offer similar functionality. For example, Encase Enterprise (Guidance Software Inc., Oct 2012a) offers advanced forensic capabilities such as remote image acquisition, NSRL hash comparison and the ability to parse many types of file formats. Similarly, F-Response (F-Response, Feb 2013) is a tool which provides raw read only access to the physical disk and memory of the system under investigation.

    Fig. 1 shows an overview of the GRR system. The GRR Agent is deployed on enterprise assets and reports back to the central server. Message queues on the front end are able to queue messages to the agent – termed requests – and process messages from the agent – termed responses.

    A sequence of requests and responses to the agent which achieve a single analysis goal are termed Flows. Flows can be suspended indefinitely by being serialized to the data store. This technique allows the GRR Framework to achieve the scalability required to simultaneously interact with many thousands of agents. The details of how flows are implemented in the GRR framework are described elsewhere (Cohen et al., 2011); however, for the purpose of this discussion, a flow can be thought of as a single analysis operation performed on the agent, for example fetching a file, performing a keyword search on a directory or running a memory analysis module.
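    The flow lifecycle described above – process a response, then suspend by serializing until the next response arrives – can be sketched as a toy state machine. This is a minimal illustration, far simpler than real GRR flows; the state names and response shape are assumptions:

```python
import pickle

class Flow:
    """Toy suspendable analysis flow: processes agent responses and can be
    serialized to (and restored from) a data store between responses."""

    def __init__(self):
        self.state = "Start"
        self.responses = []

    def process(self, response):
        # Record the agent's response and advance the flow's state machine.
        self.responses.append(response)
        self.state = "Done" if response.get("final") else "AwaitingResponses"

    def suspend(self):
        """Serialize the flow so it occupies no server memory while waiting."""
        return pickle.dumps(self)

    @staticmethod
    def resume(blob):
        """Restore a previously suspended flow from its serialized form."""
        return pickle.loads(blob)
```

    Because a suspended flow is just bytes in the data store, the server can hold many thousands of in-progress flows without keeping them in memory.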

    The following sections describe some of the innovative features of GRR and how these features work to implement the Triaging process described in Section 1.

    3.1. AFF4 and data modeling

    The Advanced Forensic Format v4 (AFF4) was conceived as a means of modeling and interchanging forensic data in a consistent way between different systems (Cohen et al., 2009). In order to facilitate an extensible and scalable interchange mechanism, an object oriented data model was proposed based on the RDF relational model.


    GRR uses the AFF4 data modeling system to model the information returned from the agents in a persistent data store. The system maintains a persistent view of the entire enterprise fleet within the AFF4 space and all agent interactions can simply be viewed as updates to the AFF4 data model to reflect samples of system states on the agents.

    While the detailed AFF4 data model description can be found elsewhere (Cohen et al., 2009; Chow and Shenoi, 2010), the following description highlights some of the important concepts as they are implemented in the GRR system.

    At its heart, the AFF4 data model defines abstract AFF4 Objects, each having a specific Type. These are merely collections of Attributes and type specific methods. An AFF4 object is known by its Uniform Resource Name (URN), allowing the object to uniquely exist in the AFF4 namespace. For example, an AFF4 Object which has a type of AFF4Stream provides the read(), seek() and tell() methods. GRR implements a number of specific AFF4 Object types, as well as a number of Attribute types to represent remote files on the GRR agent system.
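    The object model just described – URN-addressed objects holding attributes, with stream types adding read()/seek()/tell() – can be sketched as follows. This is a toy illustration of the concepts, not the actual AFF4 implementation:

```python
import io

class AFF4Object:
    """Toy AFF4 object: a URN-addressed collection of attributes."""

    def __init__(self, urn):
        self.urn = urn          # unique name within the AFF4 namespace
        self.attributes = {}    # attribute name -> value

class AFF4Stream(AFF4Object):
    """A toy AFF4 object type that additionally behaves like a stream,
    exposing the read(), seek() and tell() methods mentioned in the text."""

    def __init__(self, urn, data=b""):
        super().__init__(urn)
        self._fd = io.BytesIO(data)

    def read(self, n=-1):
        return self._fd.read(n)

    def seek(self, pos):
        return self._fd.seek(pos)

    def tell(self):
        return self._fd.tell()
```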

    AFF4 objects are abstract objects, responsible for their own serialization to a data store. The data store simply provides persistent storage for AFF4 objects, and the specific implementation of the data store is not important. For example, as a persistent interchange format, a data store may be implemented using ZIP files (Cohen et al., 2009). The GRR framework utilizes an interchangeable no-sql persistent data store (Various, 2012a; Chang et al., 2008).

    The original AFF4 scheme did not allow for versioned attributes. This is however required in a live remote forensic system such as GRR to represent constantly changing forensic artifacts. For example, if we retrieve a file from the agent, we know the file contents at that instant. However, at a later time, the file may have changed, so repeating the file download will require a new version to be stored in the AFF4 model.

    This creates a time smear of data collected from the client. Every piece of data has a timestamp associated with it, which is referred to as the Age of the data.

    Fig. 2 illustrates a file examined using the GRR GUI. The file is known by its URN (e.g. aff4:/C.cf4bcabf002d38e8/fs/os/bin/bash) as a unique reference within the global AFF4 namespace. The URN itself can be broken down into a client component, followed by a Virtual Filesystem component, followed by the path of the file on the agent's system. Note that the GUI refers to the version of the file at a specific time and the attributes are also listed with their own timestamps. By clicking the selectors in the "Age" column, the user can easily switch between viewing different versions of this file.

    From a triaging point of view, the system must be capable of handling multiple versions for each piece of data since the same machine may be examined multiple times for different investigations at different times. The triage process must therefore be able to account for older versions of the same file – for example, a specific cache history artifact was found at one time instance but not at a later time instance.
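    Attribute versioning with an Age timestamp can be sketched as a store that keeps every value ever written and answers "what was the value as of time t" queries. The class and method names are illustrative, not GRR's actual API:

```python
class VersionedAttribute:
    """Toy versioned attribute: every write is kept with its age (timestamp),
    so earlier states of the fleet can still be queried."""

    def __init__(self):
        self._versions = []  # list of (age, value)

    def set(self, value, age):
        self._versions.append((age, value))
        self._versions.sort(key=lambda av: av[0])  # keep ordered by age

    def get(self, age=None):
        """Return the newest value at or before `age` (latest if age is None),
        or None if no version existed at that time."""
        if age is None:
            return self._versions[-1][1] if self._versions else None
        candidates = [v for a, v in self._versions if a <= age]
        return candidates[-1] if candidates else None
```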

    3.2. Agent reliability and stability

    The GRR system depends on the agent being available and trusted on all enterprise assets in order to schedule large scale triage hunts. Like all enterprise software, there are non-security considerations for wide deployment, such as system stability, resource consumption and licensing costs.

    It is therefore extremely important to ensure that agents use as few resources as possible and are as reliable as possible when running on enterprise systems. While a significant level of testing has gone into the agent, there are always situations which could cause unexpected extreme resource consumption. For example, a keyword term containing a regular expression may have exponential running time on the agent (Cox, 2007), leading to uncontrolled resource usage. It is therefore imperative to provide guaranteed bounds on the agent's resource consumption before it can be safely installed on all corporate computing assets.

    GRR uses a Nanny process to control agent memory footprint, and provides assurances about maximum agent resource utilization. The Nanny is a very small process (memory footprint around 200 kilobytes) which is registered as a Windows Service. The Nanny is responsible for launching and monitoring the main GRR Agent. Once the agent is spawned, the Nanny constantly monitors for a heartbeat signal. If a heartbeat is not received during a configured time span, the Nanny kills the agent and restarts it after a small resurrection time.
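    The heartbeat check at the core of this watchdog arrangement can be sketched in a few lines. This is an illustrative reduction of the mechanism, not the Nanny's actual code, and the timeout value is an assumption:

```python
import time

class Nanny:
    """Toy heartbeat watchdog: decides whether the monitored agent is hung
    based on how long ago the last heartbeat was received."""

    def __init__(self, timeout_seconds=60):
        self.timeout = timeout_seconds
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        """Called whenever the agent signals that it is still alive."""
        self.last_heartbeat = time.monotonic()

    def agent_is_hung(self):
        """True if no heartbeat arrived within the configured time span,
        in which case the real Nanny would kill and restart the agent."""
        return time.monotonic() - self.last_heartbeat > self.timeout
```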

    The GRR Agent is specifically designed to expect every action to cause a crash. Therefore, the agent maintains a transaction log of actions it is currently performing. If a crash occurs during the execution of an agent action, the agent is restarted by the Nanny. On start-up, the agent examines the transaction log, and notices that the last action resulted in a crash. The agent then sends a failure message to the server to inform it about the crash.

    This arrangement drastically improves the stability of the GRR system and its resilience to a number of error conditions, as described in the following sections.

    3.2.1. Handling GRR agent crashes

    Even though the GRR Agent is very thoroughly tested, crashes may still sometimes occur in the agent or perhaps in a third party library. A vivid example of this condition is the case of loading a memory driver for memory analysis or acquisition. Although the memory driver is well tested, there are cases where memory imaging can lead to a blue screen of death (BSOD) crash. If the machine is rebooted, the agent will have to detect that a crash occurred, fail the memory analysis request, and inform the server (and ultimately the investigator) about the crash.

    As a remote forensics tool, GRR interacts heavily with low level third party libraries. For example, consider a GRR server request for parsing the registry. If the third party registry parsing library contains a bug and crashes when attempting to parse the registry file, the recovery process is as follows:

    1. The server issues a request for the Agent to parse the registry file.

    2. The agent writes a transaction log with the currently running request.

    3. The agent then calls the registry parsing library attempting to parse the file, but the agent crashes as a result.

  • Fig. 2. Navigating the GRR VFS using the GUI.


    4. The Nanny notices the child agent exited and restarts it after a short time.

    5. On startup, the agent examines the transaction log and discovers that it exited during a request. It then sends a failure message to the server for that request.

    6. The server simply records the error message in the relevant flow and informs the user that parsing this file failed. However, the agent remains available for further analysis.
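    The transaction-log mechanism underlying steps 2 and 5 can be sketched as a write-ahead record that survives a process crash. This is an illustrative sketch; the log path, request shape and function names are assumptions, not GRR's actual implementation:

```python
import json
import os
import tempfile

# Hypothetical location of the agent's transaction log.
TRANSACTION_LOG = os.path.join(tempfile.gettempdir(), "grr_transaction.log")

def run_action(request, handler):
    """Record the request in the transaction log before running it, so that
    a crash mid-action leaves evidence behind for the next start-up."""
    with open(TRANSACTION_LOG, "w") as f:
        json.dump(request, f)
    result = handler(request)   # may crash the whole process
    os.remove(TRANSACTION_LOG)  # reached only if the action succeeded
    return result

def check_for_previous_crash():
    """On start-up: a surviving log entry means the last action crashed.
    Return the failed request (to report to the server), else None."""
    if os.path.exists(TRANSACTION_LOG):
        with open(TRANSACTION_LOG) as f:
            failed_request = json.load(f)
        os.remove(TRANSACTION_LOG)
        return failed_request
    return None
```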

    3.2.2. Agent memory footprint

    The GRR Agent is a long running process which uses many third party libraries. It is possible that some of these libraries leak memory. However, as mentioned previously, it is crucial to ensure that the agent memory footprint on the system is small.

    The Agent maintains queues of incoming requests and outgoing responses. As requests are queued up, the agent's memory usage increases; however, when its memory usage rises above the soft limit, the agent refuses to accept new requests. While processing the remainder of the incoming queue might increase client memory for a short while, eventually client memory will begin to drop as its queues empty.

    Once the agent's memory falls below the soft memory limit, it can begin accepting new requests. This causes the client to essentially track the soft memory limit level – going above it for a time, and falling below it for a while. If there is a persistent memory leak, however, the agent will consume more and more memory, and the memory usage will eventually remain above the soft limit even when all queues are empty. In this case, the agent will simply voluntarily exit. The Nanny will then restart the agent. Note that in this case, no requests are lost and the agent simply continues where it left off. Since the agent maintains no local state, the agent restart does not affect any current analysis.
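    The soft-limit backpressure can be sketched as a simple admission check on the request queue. The memory-usage accounting below is a stand-in (it counts queued bytes rather than querying the real process footprint), and the names are illustrative:

```python
class Agent:
    """Toy sketch of soft-limit backpressure on the agent's request queue."""

    def __init__(self, soft_limit_bytes):
        self.soft_limit = soft_limit_bytes
        self.queue = []

    def memory_usage(self):
        # Stand-in for querying the process's real memory footprint.
        return sum(len(r) for r in self.queue)

    def accepting_requests(self):
        return self.memory_usage() < self.soft_limit

    def enqueue(self, request):
        """Accept the request only while below the soft limit; a refused
        request is not lost, the server simply retries later."""
        if not self.accepting_requests():
            return False
        self.queue.append(request)
        return True
```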

    To demonstrate this in practice, we have modified one of the GRR agents to leak memory at a constant rate and have monitored its resource consumption during a large file transfer. Fig. 3 illustrates how this modified agent manages its own memory footprint.

    On the left side of the graph, the agent uses memory as it processes requests. The memory consumption spikes up as new requests are processed but stops shortly below the 60 MB soft limit we have set for this experiment. However, as more and more memory is leaked, the soft limit is exceeded more frequently (around label (a)). As the agent sends more data to the server, some of this memory is returned to the operating system and the agent falls below the soft limit.

    At some point, the memory usage does not drop below the soft limit, even when there is nothing to send in the output queue. At this point, the agent simply exits. After a resurrection period (b), the Nanny process restarts the agent, which immediately starts transferring the file again.

    In addition to the soft memory limit, there is also a hard memory limit configured. If an agent uses more than this amount of memory, the Nanny process can forcibly terminate the agent. Those two limits together effectively guarantee bounds on the total memory consumption of the agent.

  • Fig. 3. A footprint of an agent modified to leak memory.

    Fig. 4. A Find flow has terminated due to CPU quota exhaustion. The specified CPU quota was 30 s, and the flow has run out of available CPU quota. It is killed with the error message "CPU Quota Exceeded".

    Fig. 5. Scheduling hunts.

    3.2.3. Excessive CPU utilization and hangs

    Sometimes, unavoidably, the agent becomes unresponsive while processing some actions. For example, attempting to access an NFS share which is unavailable might block the agent in an IO call indefinitely. The GRR system is also capable of recovering from this condition as the hung agent stops sending the Nanny the heartbeat signals. After a while, the Nanny process will kill and restart the agent, which will then send a failure response to the GRR server's request upon checking its transaction log.

    Another problem is the total amount of system resources utilized during the execution of an analysis flow. For example, a keyword search might result in an unexpected level of CPU usage if the files we intend to examine end up being large. So despite the search proceeding as normal (i.e. the agent is continuing to send heartbeat signals to the Nanny), the total resource utilization is still unexpectedly large.

    To control agent system resource usage, GRR implements per flow resource quotas. As a flow progresses by sending requests to the agent, the agent reports the amount of resources utilized for each request (e.g. CPU utilization, bytes transmitted). The flow then tallies these resources and shows the total amount of system resources used. When the total resources utilized exceed the allotted quota, the flow is forcibly terminated.
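    The per-flow quota tally can be sketched as a running sum that raises once the allotted budget is exceeded. The class and exception names are illustrative (the error string echoes the message shown in Fig. 4), not GRR's actual API:

```python
class QuotaExceededError(Exception):
    """Raised when a flow's tallied resource usage exceeds its quota."""

class FlowQuota:
    """Toy per-flow quota: tally per-request CPU reports from the agent
    and terminate the flow once the budget is exhausted."""

    def __init__(self, cpu_quota_seconds):
        self.cpu_quota = cpu_quota_seconds
        self.cpu_used = 0.0

    def record(self, cpu_seconds):
        """Add one request's reported CPU usage to the running tally."""
        self.cpu_used += cpu_seconds
        if self.cpu_used > self.cpu_quota:
            raise QuotaExceededError("CPU Quota Exceeded")
```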

    Fig. 4 shows an example of a flow that has been terminated because it has exhausted its CPU quota while running on an agent.

    3.3. Hunts

    A GRR hunt is a mechanism for controlling the launching, monitoring and aggregating of results from running a specific flow on a large subset of agents registered in the GRR system. A hunt is the primary method for performing enterprise wide triaging.

The launching of a hunt is illustrated in Fig. 5. Deployed agents periodically check in with the GRR system, where they are examined by the Foreman component. The Foreman has a set of rules which are compared against the agent's known profile. For example, a rule can specify a required operating system, usernames or hostname for the agent.

If a rule is found to have matched, the agent ID is passed to a Hunt flow for processing. The hunt then launches the required analysis flow and tracks its progress. When the child flow completes, its results are collected and recorded in a central location.
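The Foreman's rule matching amounts to comparing required attribute values against each checking-in agent's profile. A minimal sketch, with illustrative attribute names:

```python
class ForemanRule:
    """Sketch of Foreman profile matching: a rule lists required
    attribute values (attribute names here are illustrative)."""

    def __init__(self, **required):
        self.required = required

    def matches(self, profile):
        # Every required attribute must be present and equal.
        return all(profile.get(k) == v for k, v in self.required.items())


def agents_for_hunt(rule, checked_in_profiles):
    # Agents whose known profile satisfies the rule are handed to the hunt.
    return [p["agent_id"] for p in checked_in_profiles if rule.matches(p)]
```

Because matching happens as agents check in, a hunt naturally picks up machines that come online after it was launched.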

Investigators follow a three-phase process when issuing new hunts:

1. The planning phase: In this phase, the user designs the type of analysis required to triage the system according to the triage process described in Section 1. An example for this phase would be finding suitable keywords or searching for specific files. The analysis is then designed to single out those systems which might contain important evidence or be relevant to the investigation.

An important consideration here is to decide if immediate action is required in the event the system is found to contain relevant evidence. Immediate actions might include the collection and preservation of volatile evidence such as certain files or memory images, or the automatic


notification to responders for urgent attention (e.g. to perform manual forensic disk acquisition).

The user must also decide if the hunt should run on all systems or only on systems matching a particular profile. For example, the investigation may only be relevant to certain operating systems or even operating system versions. The user can then design a suitable Foreman rule to only trigger hunt collection on the systems of interest.

2. The collection phase: In this phase, the hunt and its relevant Foreman rules are active. As clients check into the server, the Foreman compares their profile against the rule set, and those agents which match the criteria are directed to the hunt. The hunt then launches flows on these agents and maintains statistics on each client, as well as calculating the total resource usage of the entire hunt.

3. The analysis phase: In this phase, the user gathers data from the hunt flow and examines the results from the collection phase. The user is then able to escalate further analysis of triaged systems.

This general triaging procedure may be applied to a wide range of typical investigation scenarios. The following sections present some typical examples of applying this general methodology in practice.

    4. Experiments

The following experiments were designed to show the capabilities of GRR as well as to understand the limitations of enterprise-wide triaging. Although we use the current implementation of GRR for these experiments, we expect similar general observations to be applicable to other enterprise triaging tools. Indeed, as the GRR framework matures and improves, we expect the absolute performance numbers to improve somewhat, but we believe the following discussion will remain relevant.

In order to emulate a typical GRR installation in a corporate environment, we installed the GRR agent on a large number of corporate workstations and laptops used by enterprise employees (the experimental group). Specifically, we show the usefulness of GRR for targeted triaging, anomaly detection through aggregate statistics, and post-incident artifact detection, as well as the easy extensibility of triaging flows. We also measure server and agent resource consumption and show the effect of agent availability on triaging hunts.

Fig. 6. Fraction of agents in the experimental group running the triage flow with time. Initially the rate at which active agents process the triage hunt depends on system capacity, but after a while, the number of new agents participating in the hunt depends on the rate at which these become available and online.

    4.1. Drive by download

Consider a hunt designed to find potential downloaders of a malicious exploit from the Internet (this type of attack is often known as a drive-by download). For this example, we assume that analysis of the initial exploit indicates the exploit is only effective against Internet Explorer. We wish to find which corporate assets have been compromised, disrupt the attack and assess the overall damage.

Following the triaging procedure in Section 1, we can design a hunt that:

• Targets only Windows systems running Internet Explorer – this is specified in the Foreman rule set.

• Searches for the malware URL in the browser history caches. We decide to collect the browser cache and history files for triaged systems immediately, as there is a risk that the URL is flushed from the browser cache by the time a responder is able to forensically image the system.

• Is expected to take no longer than 1 CPU minute on the agent and to transfer less than 100 MB from each agent.
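The hunt design above can be written down declaratively. The sketch below is purely illustrative: the flow name, cache path pattern, URL and dictionary schema are all assumptions, not GRR's actual hunt specification format:

```python
# Hypothetical declarative form of the drive-by download hunt described
# above. Every name in here (flow, path, URL, keys) is illustrative.
drive_by_hunt = {
    # Target only Windows systems (Foreman rule).
    "foreman_rule": {"os": "Windows"},
    # Search browser cache/history files for the malware URL.
    "flow": "SearchFileContent",  # hypothetical flow name
    "args": {
        "path_glob": "C:/Users/*/AppData/Local/Microsoft/"
                     "Windows/Temporary Internet Files/*",
        "regex": rb"evil\.example\.com",  # placeholder malware URL
        # Preserve volatile evidence immediately on a match.
        "collect_on_match": True,
    },
    # Per-agent resource limits for the hunt.
    "limits": {"cpu_seconds": 60, "network_bytes": 100 * 1024 * 1024},
}
```

Writing the hunt down this way makes the three design decisions explicit: who is targeted, what is searched for, and how much each agent is allowed to spend.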

For this experiment, we launched the hunt on all Windows systems in the experimental group and recorded the number of systems completing the triage process. The results are shown in Fig. 6.

As can be seen in the figure, the majority of the agents pick up the hunt in the first few minutes after it has been started, and after a run time of around 150 min the progression of the hunt slows down considerably. The reason for this is that all the agents that were online at the time the hunt was started have completed the hunt by that time. After this point, the hunt is picked up only by clients that were previously offline (they might just be in a different time zone and therefore just starting up as users begin the workday) and have just come back online.

Note that we stopped the hunt after 24 h. In a real-world scenario it might take much longer to reach 100% coverage since some machines might not be online for an extended period of time (e.g. laptops with users on vacation).

Fig. 7 shows the GRR GUI view of the total CPU used and bandwidth transferred for each agent during this hunt. The variation in resource requirements can be attributed to variation in Internet Explorer cache sizes between agent systems. During this experiment, the average amount of CPU seconds used by the agents was 0.70. However, 1.8% of the clients used more than 5 CPU seconds, and a few even used more than 20 CPU seconds.

Fig. 7. Variation of CPU load in the drive-by download hunt.


    4.2. Autoruns

Wide-scale triaging brings the possibility of cross-sectional analysis on enterprise systems. This type of analysis aims to detect anomalies in some systems by comparing them to the entire population of similar systems, and quantifying their variance from the norm (this anomaly may be attributed to a potential compromise).

As an example of this use case, we collected the Run/RunOnce registry keys from the entire Windows experimental group. Although not a complete list of malware startup locations, this is a common mechanism for malware to persist across reboots. Typically, when inspecting a single system it is difficult to ascertain whether a specific Run key starts a malicious binary without further analysis. However, by comparing the total number of occurrences across the entire fleet, we can isolate those keys which are less commonly used. We postulate that anomalous Run keys often result from malware infection.
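The fleet-wide comparison amounts to normalizing each Run key value, counting occurrences across machines, and flagging rare values. A minimal sketch (the normalization patterns are illustrative; real normalization would also strip IDs and other system-specific values):

```python
import re
from collections import Counter


def normalize(run_key_value):
    # Collapse system-specific parts (here: Windows user profile paths)
    # so identical autoruns collate across machines. Illustrative only.
    value = run_key_value.lower()
    return re.sub(r"c:\\users\\[^\\]+", "%userprofile%", value)


def rare_run_keys(per_machine_keys, threshold=1):
    # Count on how many machines each normalized value appears; values
    # seen on at most `threshold` machines are flagged as anomalous.
    counts = Counter()
    for keys in per_machine_keys:
        for k in set(normalize(v) for v in keys):
            counts[k] += 1
    return [k for k, n in counts.items() if n <= threshold]
```

The long tail this produces still needs manual review, but the cross-sectional count removes the bulk of obviously common, legitimate entries.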

Fig. 8 shows a histogram of all the different Run keys we encountered during the experiment. Note that to generate this histogram we ignored system-specific values such as usernames and CIDs. As the figure illustrates, there are many Run keys that launch legitimate binaries and are present on a high percentage of the investigated systems. One example of these common entries is the key "%programfiles%\windows sidebar\sidebar.exe /autorun", which was found on nearly every system we investigated. Other examples are common system drivers for printers or popular software packages.

    However, the figure also shows that there is a very longtail of keys that are only present on very few systems or

Fig. 8. A histogram of run keys present on the client systems (number of entries on a log scale).

entirely unique. Specifically, 60% of the collected Run keys were only present on a single machine. In our experiment, a quick manual investigation of just those keys identified most of them as legitimate. Nevertheless, we also encountered interesting keys like "[.]/local/temp/goodluck (2).exe" that seem anomalous and warrant further investigation.

    4.3. Artifact detection

Commonly after an intrusion, a similar backdoor is installed on a large number of systems to provide the attacker with the means to re-compromise the system in the event it is cleaned up. Typically, this backdoor is achieved by changing the system's configuration or installing a Remote Access Tool (RAT).

In this experiment, we simulated a security incident that affected a small number of our experimental testbed machines. Specifically, we changed a configuration file for the cron service on a random sample of around 1% of the systems in a way malicious programs would. We then used the GRR system to check all the machines for the contents of this specific file, using a "Find" flow with a regular expression to search for the modification.
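The agent-side check reduces to scanning a configuration file for a signature regex. A minimal sketch (the function and the example pattern are illustrative; we do not show GRR's actual "Find" flow internals):

```python
import re


def file_matches(path, pattern):
    """Sketch of the agent-side check: scan one config file for a
    signature regex. Returns False for unreadable/missing files so a
    fleet-wide sweep is not derailed by individual machines."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return False
    return re.search(pattern, data) is not None
```

Because configuration files are small, a check like this costs very little per agent, which is what makes running it across an entire fleet feasible.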

    The goal of this experiment was to assess how quicklysimulated malicious modifications can be detected in theexperimental setup and the total time it takes to find allsuch instances.

Fig. 9 shows the results of this experiment. As in previous experiments, a large fraction of the artifacts are discovered in the first hour after scheduling the hunt. After that, the detection rate slows and gradually approaches 100%. In a real-world incident, some machines might not be online for prolonged periods, so a 100% detection rate might not be achieved for a long time. The GRR hunt will, however, just continue running (given a high enough expiration time), so if those missing machines come back online at a much later time, the artifacts will still be detected and an alert can be raised.

    The resource usage statistics give a clear view of theagent impact for this hunt. On average, each agent used less

Fig. 9. Percentage of artifacts detected over time.

than 0.01 CPU seconds to complete the action for this experiment (with a standard deviation of only 0.008 CPU seconds) and sent less than 100 bytes over the network. This more uniform level of resource usage can be attributed to the smaller variance in the sizes of system configuration files.

    4.4. A simple GRR triaging flow

Memory analysis represents one of the most volatile sources of evidence. In this example, we simulate a hunt to search agent memory for a known unique string. If the string is found, we take the memory image using Volatility's crash dump writing support (Various, 2012b), and subsequently copy the image to the server for evidentiary preservation.

To achieve this, we wrote a customized flow reusing existing components. Fig. 10 shows a code snippet of this flow. The flow consists of three distinct states. While the agent is performing the work, the flow can be serialized on the server, not consuming server resources.

The Start state (Line 2) is the initial state of the flow. It issues a Grep subflow to detect the pattern in the physical memory of the system. The results of this flow are then examined in the ExamineResults state (Line 11). If the pattern is found, the Volatility raw2dmp plugin is invoked to quickly save a copy of the live memory to the local disk, and then a GetFile flow is issued to download this file to the server and preserve it, forming a chain of custody. Alternatively, the GetFile flow can be used to copy memory over the network without locally storing it, but this might take longer and thus result in a more "smeared" image.
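The three-state structure described above can be mimicked with a small state machine. This is a toy illustration of the control flow only; the real GRR flow API (shown in Fig. 10) differs, and all names here are assumptions:

```python
class MemorySignatureFlow:
    """Toy state machine mirroring the three-state structure of the
    flow described above (the real GRR flow API differs)."""

    def __init__(self, pattern_found):
        self.pattern_found = pattern_found
        self.actions = []  # record of sub-actions issued, in order

    def Start(self):
        # State 1: grep physical memory for the signature.
        self.actions.append("GrepMemory")
        self.ExamineResults()

    def ExamineResults(self):
        # State 2: on a hit, write a crash dump to local disk via
        # Volatility's raw2dmp, then fetch it with a GetFile flow.
        if self.pattern_found:
            self.actions.append("VolatilityRaw2dmp")
            self.actions.append("GetFile")
        self.Done()

    def Done(self):
        # State 3: terminal state; results are recorded server-side.
        self.actions.append("Done")
```

Between states, a real flow is serialized on the server, so waiting on the agent costs no server resources.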

    5. Discussion

    We have chosen the experimental conditions to berepresentative of a typical enterprise hunt for a pro-activeincident response team. As illustrated in Fig. 6, the

Fig. 10. A GRR flow to search for memory signatures and take memory images.

response time of the system depends heavily on the usage pattern of the systems. For agents running on laptops and other roaming systems, usage patterns show clear times where the system is not available. Beyond the scalability of the server itself, this non-availability is a factor delaying the speed of response. A significant number of systems are not available at any particular time (e.g., users in a different time zone) and some systems remain unavailable for an extended period (e.g., users on vacation).

The response curve in Fig. 6 can be divided into three phases. Within approximately an hour from the start of the hunt, the response curve is very steep, as the active agents are added to the flow. However, after that time, the response curve becomes much flatter, as the only new systems added to the hunt are those which were not initially active. Finally, the curve flattens off, as most systems in the fleet have participated in the hunt and new systems become rarer.

Since systems are added to each hunt as they become available, increasing system capacity only has the effect of compressing the first phase (i.e. handling the initial load presented by currently available agents). Once the initial load is handled, we must wait for systems to become available, and an increase in GRR system capacity is not expected to speed up the hunt progression.

When considering the resources the drive-by download hunt used in Fig. 7, we note that most agents did not spend much time searching the Internet Explorer cache. However, a few agents spent a significant amount of time performing this action, perhaps due to unexpectedly large cache configurations. This variation highlights the need for guaranteed bounds on resource utilization, as even simple actions that usually do not result in excessive load can cause unexpected load on some systems.

Section 4.2 explores some new possibilities which become available when employing a wide-reaching enterprise triaging system such as GRR. The ability to compare



the running configurations of similar systems concurrently promises to reveal anomalous configurations. Indeed, cross-sectional correlation anomaly detection is effective in isolating configuration changes which are not common, and therefore potentially suspicious. However, the experiment also highlights inherent difficulties with this approach, namely that the variations form a very long-tailed distribution. The manual examination of every system with a potentially unique autorun key is still time consuming. Potential variations of this approach might involve retrieving the executable file with GRR and post-processing it using an up-to-date virus scanner or other binary analysis engine.

    6. Conclusions

Triaging is essentially an optimization, using scarce resources more efficiently for the analysis of systems which are more likely to be interesting for an investigation. The goal of a triaging procedure is simply to ascertain whether the system is likely to be of interest, and how urgently traditional forensic acquisition techniques must be applied in order to maximize the evidentiary value of the system.

Since the burden of proof is far lower for a triage analysis, it is possible to automate this analysis heavily. Automation allows us to apply triaging to many more systems, resulting in a wider net. Automation also assists in protecting the privacy of individuals who are found not to be relevant to the investigation.

GRR is an enterprise remote forensic platform focused on forensic analysis and acquisition. By lowering the cost of forensic analysis on each system, we can use GRR to perform enterprise-wide triaging procedures.

GRR is suited to running widely on enterprise systems due to safety mechanisms built into the agent, allowing analysis to be run confidently on many systems without the risk of adversely affecting operational systems.

    References

Bilby D, Caronni G, Cohen MI, Metz J, Moser A, Sanchez J. GRR Rapid Response. http://code.google.com/p/grr/; 2012.
Brezinski D, Killalea T. RFC 3227: guidelines for evidence collection and archiving. Internet Engineering Task Force; 2002.
Cantrell G, Dampier D, Dandass Y, Niu N, Bogen C. Research toward a partially-automated, and crime specific digital triage process model. Computer and Information Science 2012;5(2):29.
Casey E. Investigating sophisticated security breaches. Communications of the ACM 2006;49(2):48–55.
Chang F, Dean J, Ghemawat S, Hsieh W, Wallach D, Burrows M, et al. Bigtable: a distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS) 2008;26(2):4.
Chow K, Shenoi S. In: Advances in digital forensics VI: sixth IFIP WG 11.9 international conference on digital forensics, Hong Kong, China, January 4–6, 2010. Revised selected papers, vol. 337. Springer; 2010.
Cohen M, Garfinkel S, Schatz B. Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow. Journal of Digital Investigation 2009;6:S57–68.
Cohen M, Bilby D, Caronni G. Distributed forensics and incident response in the enterprise. Journal of Digital Investigation 2011;8:S101–10.
Cox R. Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...). http://swtch.com/~rsc/regexp/regexp1.html; 2007.
Endicott-Popovsky B, Frincke D, Taylor C. A theoretical framework for organizational network forensic readiness. Journal of Computers 2007;2(3):1–11.
F-Response. F-Response enterprise edition. http://www.f-response.com/; Feb 2013.
Farmer D, Venema W. Forensic discovery, vol. 18. Addison-Wesley; 2005.
Guidance Software Inc. EnCase enterprise. http://www.guidancesoftware.com/encase-enterprise.htm; 2012a.
Guidance Software Inc. EnCase portable kit. http://www.guidancesoftware.com/encase-portable.htm; 2012b.
Hogan D, Burstein J. Disaster medicine. Lippincott Williams & Wilkins; 2007.
Iserson K, Moskop J. Triage in medicine, part I: concept, history, and types. Annals of Emergency Medicine 2007;49(3):275–81.
Jones B, Pleno S, Wilkinson M. The use of random sampling in investigations involving child abuse material. Digital Investigation 2012;9:S99–107.
Khurana H, Basney J, Bakht M, Freemon M, Welch V, Butler R. Palantir: a framework for collaborative incident response and investigation. In: Proceedings of the 8th symposium on identity and trust on the internet. ACM; 2009. p. 38–51.
Lim K, Lee S, Lee S. Applying a stepwise forensic approach to incident response and computer usage analysis. In: Computer science and its applications, 2009. CSA'09. 2nd international conference on. IEEE; 2009. p. 1–6.
Mitropoulos S, Patsos D, Douligeris C. On incident handling and response: a state-of-the-art approach. Computers & Security 2006;25(5):351–70.
Richard III G, Roussev V. Next-generation digital forensics. Communications of the ACM 2006;49(2):76–80.
Rogers M, Goldman J, Mislan R, Wedge T, Debrota S. Computer forensics field triage process model. In: Proceedings of the conference on digital forensics security and law; 2006. p. 27–40.
Schuster A. The impact of Microsoft Windows pool allocation strategies on memory forensics. Digital Investigation 2008;5:S58–64.
Supreme Court of the United States. Brady v. Maryland, 373 US 83; 1963.
Tan J. Forensic readiness. Cambridge, MA: Stake; 2001.
Various. MongoDB. http://www.mongodb.org/; 2012a.
Various. The Volatility framework. https://code.google.com/p/volatility/; 2012b.
Vömel S, Freiling F. A survey of main memory acquisition and analysis techniques for the Windows operating system. Digital Investigation 2011;8(1):3–22.
Walters A, Petroni N. Volatools: integrating volatile memory into the digital investigation process. Black Hat DC 2007; 2007. p. 1–18.

