An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

An Automated Timeline Reconstruction Approach for Digital Forensic Investigations. Written by Christopher Hargreaves and Jonathan Patterson. Presented by Jason McKenzie, November 8th, 2013.

Page 1: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Written by Christopher Hargreaves and Jonathan Patterson

Presented by Jason McKenzie

November 8th, 2013

Page 2: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Introduction

Reconstruction: a process in which an event or series of events is carefully examined in order to find out or show exactly what happened (Merriam Webster)

Provenance: the origin or source of something

Low-level PC event: File modification, registry key update

High-level PC event: Connection of a USB device, like a USB stick

Goal: Construct a software prototype using Python to automatically reconstruct a timeline of events using low-level events to infer high-level events and their provenance

Page 3: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Background

Reconstruction is an essential aspect of digital forensics

Key challenge in digital forensics is the large volume of information that needs to be analyzed

Population owns an increasing number of digital devices

There are tools present that automate the extraction process of a digital investigation, and are useful for examining events that have occurred

There is a demand for explaining the sequence of digital events, and a tool to automatically reconstruct the events and produce a timeline is needed

Page 4: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Related Work

Related work consists of solutions that incorporate some form of (non-automatic) timeline generation

Timelines based on file system times

Uses metadata from file systems to create a timeline

Modified, Accessed, and Created (MAC) times

The Sleuth Kit generates a timeline from file activity

EnCase creates a graphical “Timeline” view

Limitation: the times at which the contents of files are examined are not captured in file system metadata
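
A file-system-times timeline of this kind can be sketched in a few lines. This is an illustrative sketch only, not the approach used by The Sleuth Kit or EnCase: the function name `file_system_timeline` and the tuple layout are assumptions, and it reads times through the operating system's `stat` call rather than parsing file system structures directly.

```python
import os
from datetime import datetime, timezone


def file_system_timeline(root):
    """Return (timestamp, kind, path) tuples for every file under root, oldest first."""
    events = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # unreadable entry: skip it rather than abort the walk
            # st_ctime is the metadata-change time on Unix and the creation time on Windows.
            for kind, ts in (
                ("modified", st.st_mtime),
                ("accessed", st.st_atime),
                ("metadata_changed", st.st_ctime),
            ):
                events.append((datetime.fromtimestamp(ts, tz=timezone.utc), kind, path))
    events.sort(key=lambda event: event[0])
    return events
```

Each file contributes one event per timestamp, which is why even a small disk quickly yields a very long timeline.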

Page 5: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Related Work (continued)

Timelines including time from inside files

Cyber Forensic Time Lab (CFTL)

Extracts system times from FAT and NTFS hard drives and some file types

Has incomplete source information of extracted events

Log2timeline

Has several enhancements and options that when combined could produce a timeline

Carbone and Bean addressed the need for a rich, event-filled timeline in their paper “Generating computer forensic super-timelines under Linux” in 2011

Key to creating an event-filled timeline is to capture more event times

Page 6: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Related Work (continued)

Visualizations

EnCase

Visual Timeline

Zeitline

Imports file system times from other programs through the use of Import Filters

Atomic events: events directly imported from the system

Complex events: comprised of atomic and other complex events

Allows for filtering, searching, and combination of atomic into complex events

Aftertime

Performs enhanced timeline generation

Visualizes results as a histogram

Page 7: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Related Work (continued)

Summary

Importance of recovering times from inside files and using file system metadata

Two key challenges:

Too many events to effectively analyze

Difficult to visualize what is going on in the timeline due to the number of events

Highlighting patterns of activity to indicate areas of interest and maintaining records of the source of extracted data are both important

Page 8: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Methodology

As noted previously, the large volume of events makes analysis difficult and the timeline hard to visualize

To counteract this, an approach that automates the combination of “low-level” events into “high-level” events is being researched

Automating the conversion of low-level to high-level events produces a summary of activity that helps direct the investigation

To facilitate this, a software prototype was constructed

Page 9: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Methodology (continued)

Should frameworks be expanded to accommodate a timeline reconstruction system?

Would take extensive work to build upon an existing framework, like log2timeline

Best to implement a new framework without having to adjust data structures or adjust for legacy languages

Python 3 is chosen for this project due to readability of code

Page 10: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design

Overall design

Python Digital Forensic Timeline (PyDFT)

Supports low-level event extraction and high-level event reconstruction

Also supports case management, conversion between different date and time formats, and basic GUIs
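
As an illustration of the date and time conversion mentioned above, the sketch below normalizes two common timestamp encodings to timezone-aware UTC values. The function names are assumptions and are not taken from PyDFT. The Windows FILETIME definition (100-nanosecond intervals since 1601-01-01 UTC) and the Unix epoch are standard.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a 64-bit Windows FILETIME integer to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def unix_to_datetime(seconds):
    """Convert seconds since 1970-01-01 UTC to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Converting every source to a single representation is what lets events from different artefacts be sorted into one timeline.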

Page 11: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Generation of low-level events

Overview

Low-level events are file system times and times extracted from within files

Analysis is performed on a mounted file system, NOT on a disk-based image

Recommended approach is to mount disk image in read-only mode using Linux or Mac OS X

Extraction of file system times

Master File Table ($MFT)

Accessed directly on Linux or Mac OS X using NTFS driver from Tuxera

Created, modified, accessed, and entry modified times from the Standard Information attribute are used to build four events for each file

Page 12: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Generation of low-level events (continued)

Times from inside files

Extraction Manager calls GetTimesFromInsideFiles(), which checks each file in the mounted file system against the available time extractors

If a matching extractor is found, it extracts information using the file pointer, file name, and file path

Any time information extracted is added to the low-level timeline

Time extractors cover browsing history from Chrome, Firefox, and Internet Explorer, as well as Skype, Windows Live Mail, etc.
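
The dispatch described on this slide might look roughly like the sketch below. Only the idea of handing an extractor the file pointer, file name, and file path comes from the slide. The registry `EXTRACTORS`, the function `get_times_from_inside_files`, and the toy `plain_log_extractor` (which expects one `YYYY-MM-DD HH:MM:SS message` entry per line) are hypothetical stand-ins for real extractors such as browser history parsers.

```python
import os
from datetime import datetime


def plain_log_extractor(fp, file_name, file_path):
    """Hypothetical extractor: one 'YYYY-MM-DD HH:MM:SS message' entry per line."""
    events = []
    for line in fp:
        text = line.decode("utf-8", errors="replace").rstrip()
        try:
            when = datetime.strptime(text[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a timestamp
        events.append({"date_time": when, "description": text[20:], "source": file_path})
    return events


# Each entry pairs a predicate on the file name with the extractor to run.
EXTRACTORS = [(lambda file_name: file_name.endswith(".log"), plain_log_extractor)]


def get_times_from_inside_files(mount_point):
    """Walk a mounted file system and collect events from every matching extractor."""
    timeline = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for file_name in filenames:
            file_path = os.path.join(dirpath, file_name)
            for matches, extract in EXTRACTORS:
                if matches(file_name):
                    with open(file_path, "rb") as fp:
                        timeline.extend(extract(fp, file_name, file_path))
    return timeline
```

Supporting a new artefact type then only requires appending another predicate and extractor pair to the registry.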

Page 13: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Generation of low-level events (continued)

Parsers and bridges

Parsers: process raw data structures and recover data in a usable form

Bridges: take information from parsers and map it to a low-level event object

This design makes it easier to accommodate new parsers and makes parser code easier to reuse
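
The parser and bridge split can be sketched as two small functions. The 12-byte record layout (a little-endian 64-bit FILETIME followed by a 32-bit record id) is invented purely for illustration, as are the names `parse_record` and `bridge_record`. The point is that the parser knows nothing about timelines and the bridge knows nothing about byte layouts.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_record(raw):
    """Parser: decode a hypothetical 12-byte record into plain Python values."""
    filetime, record_id = struct.unpack("<QI", raw)
    return {"filetime": filetime, "record_id": record_id}


def bridge_record(parsed, source_path):
    """Bridge: map parser output onto a low-level event (a dict in this sketch)."""
    when = FILETIME_EPOCH + timedelta(microseconds=parsed["filetime"] // 10)
    return {
        "date_time_min": when,
        "date_time_max": when,
        "evidence": "record {}".format(parsed["record_id"]),
        "provenance": {"source": source_path, "record_id": parsed["record_id"]},
    }
```

Because `parse_record` returns plain values, it could be reused by another tool without pulling in any timeline code.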

Page 14: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Generation of low-level events (continued)

Traceability

If an extractor returns a low-level event, the event also points to the raw data that produced it

Different types of provenance based upon event

Low-level event format

Different events have different provenance and have different fields

Id, date_time_min, date_time_max, evidence, provenance, etc.
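
A low-level event with the fields listed above could be modelled as follows. The field names `id`, `date_time_min`, `date_time_max`, `evidence`, and `provenance` come from the slide. The types, the free-form provenance dictionary, and the `overlaps` helper are assumptions, as is the reading of the min and max fields as bounds on when the event occurred.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LowLevelEvent:
    id: int
    date_time_min: datetime  # earliest time the event could have occurred (assumed meaning)
    date_time_max: datetime  # latest time the event could have occurred (assumed meaning)
    evidence: str            # human-readable description of what was found
    # Provenance differs per event type, so it is kept as a free-form mapping here.
    provenance: dict = field(default_factory=dict)

    def overlaps(self, start, end):
        """True if the event's time range intersects the period [start, end]."""
        return self.date_time_min <= end and self.date_time_max >= start
```

For example, a file system event might carry an MFT record reference in `provenance`, while a browser history event might carry a database file path and row id.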

Page 15: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Generation of low-level events (continued)

Backing store for the low-level timeline

Back-end storage is required because low-level events are represented as Python class instances

SQLite was chosen as the backing store; it allows for advanced queries

Summary

Extraction manager extracts low-level events that are converted to a standard format and added to timeline

Timeline stored in SQLite

Stored fields include date/time, provenance, and information about the raw data
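
A minimal SQLite backing store along these lines can be built with Python's standard `sqlite3` module. The table name, column set, and helper functions are assumptions, not the PyDFT schema. Timestamps are stored as ISO 8601 text so that string comparison matches chronological order.

```python
import sqlite3


def open_timeline(db_path=":memory:"):
    """Open (or create) a low-level timeline database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS low_level_events (
               id INTEGER PRIMARY KEY,
               date_time_min TEXT NOT NULL,
               date_time_max TEXT NOT NULL,
               type TEXT,
               evidence TEXT,
               provenance TEXT
           )"""
    )
    return conn


def add_event(conn, date_time_min, date_time_max, type_, evidence, provenance):
    """Insert one event and return its row id."""
    cur = conn.execute(
        "INSERT INTO low_level_events "
        "(date_time_min, date_time_max, type, evidence, provenance) "
        "VALUES (?, ?, ?, ?, ?)",
        (date_time_min, date_time_max, type_, evidence, provenance),
    )
    conn.commit()
    return cur.lastrowid


def events_between(conn, start, end):
    """Return (id, date_time_min, type, evidence) rows falling inside [start, end]."""
    return conn.execute(
        "SELECT id, date_time_min, type, evidence FROM low_level_events "
        "WHERE date_time_min >= ? AND date_time_max <= ? ORDER BY date_time_min",
        (start, end),
    ).fetchall()
```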

Page 16: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Reconstruction of high-level events

Overview

Predetermined rules, implemented as plug-in scripts, automatically convert low-level events to high-level events

Basic event matching using test events

Querying SQLite directly requires knowledge of SQL

By creating a test event with all the conditions of the desired low-level event, it is possible to add events to the high-level timeline without extensive knowledge of SQL queries

Comparison match (not exact match) with test events and low-level events

The field values set in a test event are used to generate SQL searches on those fields, and the results are used to create high-level events
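
The idea of turning a test event into a query can be sketched as below. The table and column names are assumptions, and using SQL `LIKE` patterns for the comparison (non-exact) match is one possible interpretation rather than the prototype's confirmed mechanism. Field names are checked against a whitelist because they are interpolated into the SQL text, while values are passed as bound parameters.

```python
import sqlite3

# Columns a test event is allowed to constrain (assumed schema).
ALLOWED_FIELDS = {"type", "evidence", "path"}


def build_query(test_event):
    """Translate the fields set on a test event into a parameterized SQL query."""
    clauses, params = [], []
    for field_name, pattern in test_event.items():
        if field_name not in ALLOWED_FIELDS:
            raise ValueError("unknown field: {}".format(field_name))
        clauses.append("{} LIKE ?".format(field_name))
        params.append(pattern)
    where = " AND ".join(clauses) if clauses else "1"
    sql = "SELECT id, type, evidence, path FROM low_level_events WHERE " + where
    return sql, params


def find_matches(conn: sqlite3.Connection, test_event: dict):
    """Return all low-level events that match every field of the test event."""
    sql, params = build_query(test_event)
    return conn.execute(sql, params).fetchall()
```

An analyzer author would then write something like `find_matches(conn, {"type": "URL visited", "evidence": "%google.%/search%"})` instead of composing SQL by hand.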

Page 17: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Reconstruction of high-level events (continued)

Matching multiple artefacts

“Test events” serve as triggers and any matches are used to construct a hypothesis of a high-level event

Low-level timeline created in memory for a specific period determined by the analyzer

Analyzer searches for all low-level events occurring in this period

Expected events that are found are considered supporting artefacts

Expected events that are not found are considered contradictory artefacts

One or more high-level events are created based upon these artefacts
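
The trigger, supporting, and contradictory logic above can be sketched with an in-memory list of events. Everything here is illustrative: the function name `analyse`, the dictionary-based events, the 60-second default window, and the event type names used in the usage note are assumptions. The output keys mirror the artefact field names used for high-level events.

```python
from datetime import timedelta


def analyse(events, trigger_type, expected_types, window=timedelta(seconds=60)):
    """Build one high-level event hypothesis per trigger found in events.

    events: list of dicts with 'id', 'type', and 'date_time' keys.
    """
    high_level = []
    for trigger in events:
        if trigger["type"] != trigger_type:
            continue
        # Low-level events that fall within the period around the trigger.
        nearby = [e for e in events if abs(e["date_time"] - trigger["date_time"]) <= window]
        found_types = {e["type"] for e in nearby}
        high_level.append({
            "date_time": trigger["date_time"],
            "trigger_evidence_artefact": trigger["id"],
            # Expected events that were found support the hypothesis.
            "supporting_evidence_artefact": [e["id"] for e in nearby if e["type"] in expected_types],
            # Expected event types that were not found count against it.
            "contradictory_evidence_artefact": [t for t in expected_types if t not in found_types],
        })
    return high_level
```

For a USB connection analyzer, the trigger might be a Setup API log entry and the expected types might be related registry key updates. An expected update that is missing from the window would then be recorded as a contradictory artefact rather than silently ignored.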

Page 18: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Reconstruction of high-level events (continued)

High-level event format

Similar to low-level event format

Includes files, trigger_evidence_artefact, supporting_evidence_artefact, contradictory_evidence_artefact

High-level timeline output

Not stored in SQLite

Exports to XML and individual high-level event HTML reporting
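
An XML export of high-level events could be produced with the standard library's `xml.etree.ElementTree`, as sketched below. The element names and the shape of the input dictionaries are assumptions. The actual PyDFT XML schema is not given in the slides.

```python
import xml.etree.ElementTree as ET

ARTEFACT_FIELDS = (
    "trigger_evidence_artefact",
    "supporting_evidence_artefact",
    "contradictory_evidence_artefact",
)


def export_high_level_timeline(events):
    """Serialize high-level events (dicts) to an XML string."""
    root = ET.Element("high_level_timeline")
    for event in events:
        node = ET.SubElement(root, "event", id=str(event["id"]))
        ET.SubElement(node, "date_time").text = event["date_time"]
        ET.SubElement(node, "description").text = event["description"]
        # One child element per referenced low-level artefact, so provenance survives export.
        for field_name in ARTEFACT_FIELDS:
            for artefact_id in event.get(field_name, []):
                ET.SubElement(node, field_name).text = str(artefact_id)
    return ET.tostring(root, encoding="unicode")
```

Keeping the low-level artefact ids in the export lets a reader of the report trace each high-level event back to the raw data.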

Page 19: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Design (continued)

Reconstruction of high-level events (continued)

Summary

Searching timeline through the use of “test events” that have similarities to desired low-level events

One or more matches lead to one or more high-level events

Since low-level event information is preserved, it can still point to the raw data that generated the low-level event

Produces two timelines

Low-level event timeline (not very readable)

High-level event timeline (human readable)

Page 20: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Results

Examples of high-level events constructed

Google searches

11:28:30 Google search for ‘how to hack wifi’

USB device connection

“Setup API entry for USB found (VID:07AB PID:FCF6 Serial:07A80207B128BE08)”

Page 21: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Results (continued)

Visualization

Since there is usually not a large number of high-level events, it is possible to use a third-party program like Timeflow to display them graphically

In the high-level timeline below there are 2894 low-level events that have occurred (obviously not displayed)

Page 22: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Results (continued)

Performance

Calculations based on Intel Core 2 Duo 2.28-28 GHz and 4-8 GB of RAM

1 million events at ~2 min per analyzer, with 22 analyzers run one after another: 22 × 2 = ~44 minutes to process 1 million events

Equivalent to other indexing or searching forensics tools (“start search and walk away”)

No plans to optimize performance

Page 23: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Evaluation

The results reinforce that matching “test events” against low-level events, termed “temporal proximity pattern matching”, is effective at creating high-level events automatically

Need to develop more analyzers and time extractors to further reinforce feasibility of “temporal proximity pattern matching”

Need to implement low-level extractors that are currently not available for some aspects of the disk like Recycle Bin

Need to determine if keeping high-level provenance of information is required since the associated low-level provenance is preserved

Page 24: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Evaluation (continued)

Although performance is in line with other forensics tools, a bottleneck exists because each analyzer searches through the timeline linearly for patterns

More analyzers means a greater bottleneck

Needs optimization for multi-core processors

Optimization of SQLite secondary indexing could improve performance

Need to implement a way of verifying that the target PC’s clock is correct

Need more robust testing of the prototype
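
On the SQLite indexing point above, the sketch below adds a secondary index on the event time column and inspects the query plan to confirm that a time-range search uses it. The table, column, and index names are assumptions. Whether such an index helps in practice depends on the queries the analyzers actually issue, which the slides do not specify.

```python
import sqlite3


def add_time_index(conn):
    """Create a secondary index so time-range searches avoid a full table scan."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_low_level_time "
        "ON low_level_events (date_time_min)"
    )


def query_plan(conn, start, end):
    """Return SQLite's description of how it would run a time-range search."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, type FROM low_level_events "
        "WHERE date_time_min BETWEEN ? AND ?",
        (start, end),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)
```

Comparing the plan text before and after `add_time_index` shows whether the search changes from a table scan to an index search.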

Page 25: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Future work

Creation of more low-level event extractors

Creation of more analyzers

Formalizing low-level event information

Inputting data from other tools

Testing of framework against real world data

Adding complexity to analysis scripts, such as Bayesian networks

Development of more robust visual data tools for timelining

Page 26: An Automated Timeline Reconstruction Approach for Digital Forensic Investigations

Conclusions

Illustrates that pattern matching can automatically reconstruct high-level, human-understandable events, which in turn yield a readable visualization of the timeline

Preserves provenance of low-level events

Not to be used to replace a full forensic analysis by an experienced, trained analyst