
The Extension and Customisation of the Maltego Data-Mining Environment into an Anti-Phishing System

Submitted in partial fulfilment

of the requirements of the degree of

Bachelor of Science (Honours)

of Rhodes University

Matthew Marx

Grahamstown, South Africa

November 2, 2014

Contents

1 Introduction
1.1 Problem Statement and Research Goals
1.2 Scope
1.3 Document Structure
2 Background
2.1 History and background
2.1.1 Phishing and Pharming
2.1.2 The Anatomy of a phishing attack
2.2 The cost of a phishing attack
2.2.1 Phishing and Data Breaches
2.3 Online Identity
2.3.1 ICANN
2.3.2 WHOIS
2.3.3 Certificate Authorities
2.3.3.1 The role of Certificate Authorities in Phishing and Anti-Phishing
2.3.4 PhishTank
2.4 Types of phishing attacks
2.4.1 Clone Phishing
2.4.2 Tabnabbing
2.4.3 Spear Phishing
2.5 Anti-Phishing methodologies
2.5.1 Anti-Phishing Collectives
2.5.1.1 Anti-Phishing Work Group
2.5.1.2 US-CERT
2.5.1.3 PhishTank
2.5.2 Website take down
2.5.3 Browser Anti-Phishing mechanisms
2.5.3.1 Anti-Phishing Heuristics
2.5.3.2 Phishing Blacklists
2.5.4 Email Filtering and Content Filtering
2.6 Abuse Reporting Mechanisms
2.6.1 Blacklisting services
2.6.2 Placing an abuse report with domain registrar
2.7 Summary
3 Design
3.1 System Goals
3.2 Underlying Architecture
3.2.1 Maltego
3.2.2 Maltego Machines
3.2.3 Programming Languages
3.2.4 Transforms
3.3 Entities
3.3.1 Domain
3.3.2 Email Address
3.3.3 IPv4
3.3.4 Email Source
3.3.5 Abuse Report Email
3.3.6 EmailSourceDirectory
3.3.7 Potential Phishing URL
3.3.8 Suspicious Email Address
3.3.9 Phishing Target
3.3.10 Confirmed Phishing URL
3.3.11 Phishing Kit
3.4 Transforms
3.4.1 Verify Phishing Link
3.4.2 Generating Abuse Report Emails
3.4.3 Directory Monitoring
3.4.4 Link Extraction and analysis
3.4.5 WHOIS
3.5 Automating the process
4 Case Studies
4.1 An attack launched from a compromised server
4.1.1 Background
4.1.2 Exploration and Fingerprinting
4.1.3 Analysis
4.2 Correlating relationships between larger data-sets
4.2.1 Background
4.3 Automated monitoring
5 Conclusion
5.1 Analysis of Goals
5.2 Future Work
5.2.1 Introducing additional online services
5.2.2 Extension into analysis of attachments
5.2.3 Reporting Mechanism
5.2.4 Tool Integration
References
A Appendix

List of Figures

2.1 The mechanics of a Phishing attack
2.2 A Typical Phishing URL
2.3 An example of a certificate verifying the identity of an online service
2.4 A typical phishing email
2.5 Tabnabbing
2.6 Mechanics of a spear-phishing attack
3.1 Creating a new graph
3.2 Running a transform
3.3 The transform produces a new IPv4 entity
3.4 A more complex set of entities and relationships
3.5 Block Layout
3.6 Hierarchical Layout
3.7 Circular Layout
3.8 Regular expression used to extract links and URLs
3.9 Regular expression used to extract email addresses
4.1 Creating an email source entity
4.2 Analysis of the emailSource entity
4.3 Exploring the domain involved in the attack
4.4 http://www.venisetours.com
4.5 The redirected page
4.6 The redirected page
4.7 Multiple emails represented in Phishtego
4.8 Closed Systems
4.9 Related Attacks
4.10 Related Attacks with Malicious Links reported
4.11 Closed Systems
4.12 Closed Systems
4.13 Automated email retrieval and transforms

List of Tables

3.1 Maltego: Minimum Hardware Requirements
3.2 A Summary of Phishtego Entities
3.3 Verify Phishing URL
3.4 Generating abuse report emails
3.5 Monitoring a local directory for emails
3.6 Link extraction and analysis
3.7 WHOIS

Abstract

Phishing attacks remain one of the most serious threats to data assets. In particular, the ease and low cost of setting up and running a successful attack mean that there is no substantial barrier to entry into the phishing world. One of the most important means of understanding and combating a phishing attack is to fingerprint the attack by extracting the information contained in a phishing email. This includes a substantial amount of information held in the email’s headers, which is often ignored when an email is viewed. This project provides an extension to the Maltego framework that supports the exploration of, and reaction to, a phishing campaign. In doing so it provides abuse reporting mechanisms and integration with both Google’s Safe Browsing and the PhishTank API.

ACM Computing Classification System Classification

Thesis classification under the ACM Computing Classification (2012 version, valid through 2014)

I.4.3.2 [Security and privacy]: Phishing

M.2.6.1 [Social and professional topics]: Computer Crime

General terms: Phishing, Abuse Reporting, Attack Fingerprinting

Acknowledgements

This work was undertaken in the Distributed Multimedia CoE at Rhodes University, with financial support from Telkom SA, Tellabs, Genband, Easttel, Bright Ideas 39, THRIP and NRF SA (TP13070820716). The authors acknowledge that opinions, findings and conclusions or recommendations expressed here are those of the author(s) and that none of the above mentioned sponsors accept liability whatsoever in this regard.

I would like to thank everyone who has supported me on this long journey. In particular, I would like to thank Professor Barry Irwin for his outstanding guidance, resources and valuable input during the writing of this thesis; my parents, Paul and Cathy Marx, for their input and support, both financial and otherwise, and for allowing me the opportunity to attend such a prestigious university; and MWRInfoSecurity for their financial support, as well as for funding and supporting other independent research throughout the course of this year.

Chapter 1

Introduction

The ushering in of the Digital Age has presented the world with previously unimagined inter-connectivity, sharing of information and the ability to process enormous volumes of data very quickly. However, these developments bring with them new challenges. One of the challenges at the forefront is the need to store sensitive data safely. Data assets are as important as physical assets to most companies, and protecting them is complex and difficult. One of the most serious threats posed by malicious external actors to a company’s assets is an attack on individuals within the company, aimed at procuring information from the business or institution that will result in access to its data assets. Phishing is an example of this kind of attack.

Financial loss due to phishing campaigns is one of the more worrying concerns within information security. This is largely due to two characteristics of a phishing campaign. The first is that a successful phishing campaign requires very little technical know-how to conduct. The underground black market sells hundreds of variations of pre-made phishing kits that can be purchased and require very little effort to set up to a functional state. The second is how successful such attacks are. Given the relative ease and low cost of setting up a phishing campaign, the rate at which credentials are gathered makes an attack economically viable for an attacker.

1.1 Problem Statement and Research Goals

This situation places the people responsible for defending their data assets at a disadvantage. The ability to track, correlate and gather information about potential phishing campaigns in an automated fashion would put the person responsible in a better position to make informed decisions about an attack, including the counter measures and reporting mechanisms available.

As companies and organisations expand, it becomes increasingly likely that they will hold some form of data asset. The asset itself could be one of many kinds of data, ranging from intellectual property to credit card information. As the stockpile of data assets grows, the company becomes increasingly attractive to attackers. The research looks primarily at developing a piece of software that can be used to analyse and explore phishing campaigns, with the hope of providing useful information that will inform decisions about counter measures. The goal is not to produce a one-size-fits-all solution for monitoring and tracking a phishing campaign, but rather to provide what would in general prove useful in gathering intelligence around an attack. Visualisation plays an important role in making the gathered intelligence useful, manageable and understandable, given the sheer volume of data that can be derived from an attack. Bearing this in mind, the project sets out to achieve three primary goals:

• Create a system that models phishing attacks that can be deployed locally on a machine

• Produce meaningful information from the large volumes of raw data that can be gathered from a phishing campaign

• Provide a means of facilitating decision making around reacting to phishing campaigns and automating this response where possible

1.2 Scope

Outlining the scope is an important part of the project, as the scope has the potential to grow quickly and often needlessly given the volume of information that can be derived from an email. This is largely due to the sheer number of different technologies involved in performing even a relatively simple information gathering procedure on the contents of an email. As such, the scope is limited to the functionality that the project can realistically deliver. There is ultimately little intelligence built into the application for making macro decisions, taking the whole situation into account, about the best way to react to a given attack. Instead, the idea is to provide the user with as much useful information as possible, leaving decisions in the hands of a user who is then better informed to act intelligently on an attack.

1.3 Document Structure

This thesis explores the development of a system and as such is structured in the following manner:

• Chapter 2 serves as an introduction to the relevant areas of research and technologies that are addressed by the project.

• Chapter 3 explains the design and development of the platform. It contains an outline of all of the necessary dependencies and software required for the project.

• Chapter 4 contains a number of case studies that look to present the usefulness and validity of the system in tracking and describing a phishing attack.

• Chapter 5 summarises the results and the system, draws conclusions and presents ideas for future work.

It is important to note that this paper makes significant use of technical reports and whitepapers. This is largely because annual statistics and global reports are often published by companies in these forms. Additionally, many of the interactions between systems and existing APIs take place at a non-academic level, or at least are not widely documented academically, which means that technical reports are often among the latest and best documented approaches to anti-phishing techniques. Furthermore, some of the more interesting counter measures proposed in academic papers that are a year or more old are already being circumvented by current phishing techniques; for example, the filters outlined by fet (2007) can be bypassed by including content as attachments. Recently published technical reports are therefore in some senses more valuable.

Chapter 2

Background

“Phishing is a form of deception in which an attacker attempts to fraudulently acquire sensitive information from a victim by impersonating a trustworthy source” (Tom Jagatic, 2007). This definition, though broad, serves as a good starting point for the exploration of phishing attacks. Phishing is by no means a new attack vector, but it has certainly developed in new and increasingly complex ways. Of particular concern is that very little expertise, technology or manpower is required to carry out a successful phishing attack: little more than a list of email addresses and freely available software is needed.

2.1 History and background

Attackers have attempted to elicit information such as passwords and usernames from users for as long as these have been used as an authorisation mechanism. Initially, this was largely conducted through social engineering. In the 1990s, with the explosion of interconnected networking and the internet, there was a definite move by attackers away from effort-intensive social engineering and towards automated systems that attacked the mass consumer market (Watson et al., 2005). The combination of social engineering with these technological advances has created what we refer to today as phishing.

2.1.1 Phishing and Pharming

Phishing and Pharming are terms that are often found together in literature on the matter. For the sake of clarity, it is helpful to define the distinction between the two practices before continuing.

Phishing: The merger of social engineering attacks and technological advances. It manifests itself as an attempt to draw information out of an unsuspecting user, typically usernames, passwords or other private information that can be used to perform fraudulent actions under the guise of the user. Phishing attacks are typically carried out through bulk emailing.

Pharming: Pharming attacks ultimately look to achieve the same end result as phishing attacks but are typically more complex and technical in nature. There are several possible means through which attackers pharm a victim, but the majority rely on manipulating the mechanism a computer uses to resolve a domain name to an IP address (DNS), substituting a fraudulent IP address for the one actually associated with the domain name. Delving into the full complexities of pharming is beyond the scope of this discussion, but this level of understanding is sufficient for the rest of the paper (Karlof et al., 2007).
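The substitution at the heart of a pharming attack can be illustrated with a toy model of name resolution. The Python sketch below is purely illustrative and makes several assumptions: the resolver is a plain dictionary rather than real DNS, the function names are hypothetical, and the host name and addresses are placeholders taken from the reserved example and documentation ranges. It shows that the user requests exactly the same name in both cases, and only the address returned differs.

```python
# Toy model of name resolution. The name and addresses are hypothetical
# placeholders (example.com and the 192.0.2.0/24 and 203.0.113.0/24
# documentation ranges), not data from a real attack.
LEGITIMATE_RECORDS = {"bank.example.com": "192.0.2.10"}


def resolve(name, records):
    """Return the address a client would connect to for `name`, or None."""
    return records.get(name.lower())


def poison(records, name, fraudulent_ip):
    """Return a copy of `records` in which `name` points at the attacker's server."""
    tampered = dict(records)
    tampered[name.lower()] = fraudulent_ip
    return tampered


def is_pharmed(name, observed_ip, trusted_records):
    """Flag a resolution that disagrees with a trusted record for the same name."""
    expected = trusted_records.get(name.lower())
    return expected is not None and observed_ip != expected


poisoned = poison(LEGITIMATE_RECORDS, "bank.example.com", "203.0.113.66")
observed = resolve("bank.example.com", poisoned)
print(observed, is_pharmed("bank.example.com", observed, LEGITIMATE_RECORDS))
```

Because the name typed by the user is unchanged, inspecting the URL alone does not reveal this kind of attack; in this sketch it is detected only by comparing the answer against an independently trusted record.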

2.1.2 The Anatomy of a phishing attack

The attack typically begins with a large number of spam messages sent to targets over various mediums. The messages include content that aims to entice a user into following a link to a website that the attacker controls. The mediums attackers use include SMS and telephone calls; these variants are referred to as SMShing and vishing respectively. The message that is distributed is usually crafted to resemble closely, if not exactly, an authentic message that the impersonated authority is likely to send. Typically, the message will contain something that requires the user’s immediate attention, such as an “impending account suspension, a payment for a marketing survey or a report of a transaction that the user will know to be fake and therefore want to be cancelled” (Moore & Clayton, 2007b).

In a successful attack, the user connects to the URL supplied in the message and is directed to the website under the attacker’s control, as indicated by stage 1 in figure 2.1. Notably, at this stage in the process, browsers will typically check the requested URL against large public blacklists, which are often crowd-sourced, and apply heuristic checks to it. Next, the user will typically be met with a website that looks identical to the site the attacker is impersonating. This leads the user to suspect nothing and to proceed with entering personal information into the website, as demonstrated in stages 2 and 3 of the attack. The credentials are often stored locally on the website itself to be collected at a later stage, or are sometimes mailed to an email address owned by the attacker. The website used in an attack is on occasion hosted on free webspace where anybody can register an account and upload data. At other times, the attacker uses a hijacked machine acquired previously through a security vulnerability. Stages 4 and 5 involve the attacker using the newly acquired credentials to access the user’s account and steal money.

Figure 2.1: The mechanics of a Phishing attack

The URL of a phishing page is typically constructed to look similar to the domain name of the body being impersonated in the attack. Such a URL often closely resembles http://www.impersonatedname.freehostprovider.com/passwordreset, which at first glance appears to the average user to be close enough to the URL they might expect from a legitimate source, and so the user is successfully ‘phished’ (Moore & Clayton, 2007b). In this example, the name of the impersonated entity would replace the impersonatedname portion of the URL and the name of the free webhosting provider would occupy the freehostprovider portion. There are other interesting mechanisms that attackers currently use to bypass traditional filters and deceive the user; these are beyond the scope of this discussion but can be found in research conducted by Garera et al. (2007). Often the user finds the URL convincing purely because of a lack of understanding of how URLs are actually constructed. An example taken from an actual phishing attack is shown in figure 2.2, in which, far from looking suspicious, the URL appears at first glance to the unsuspecting user to be completely legitimate.

Figure 2.2: A Typical Phishing URL
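The point about URL construction can be made concrete with a short sketch. The Python below is illustrative only and is not part of Phishtego: the function names are hypothetical, and treating the last two labels of the host name as the registered domain is a simplifying assumption (a robust version would consult the Public Suffix List). It shows that in the example URL the impersonated name is merely a subdomain label, while the domain that actually controls the page belongs to the hosting provider.

```python
from urllib.parse import urlsplit


def hostname_labels(url):
    """Split the host part of a URL into lower-case DNS labels."""
    host = urlsplit(url).hostname or ""
    return host.lower().split(".") if host else []


def naive_registered_domain(url):
    """Assumption: the registered domain is the last two labels of the host name."""
    return ".".join(hostname_labels(url)[-2:])


def brand_only_in_subdomain(url, brand):
    """True when `brand` appears in a subdomain label but not in the registered domain."""
    labels = hostname_labels(url)
    brand = brand.lower()
    subdomains, controlling = labels[:-2], labels[-2:]
    in_subdomain = any(brand in label for label in subdomains)
    in_controlling = any(brand in label for label in controlling)
    return in_subdomain and not in_controlling


suspect = "http://www.impersonatedname.freehostprovider.com/passwordreset"
print(naive_registered_domain(suspect))
print(brand_only_in_subdomain(suspect, "impersonatedname"))
```

For the example URL the sketch reports freehostprovider.com as the controlling domain and flags the impersonated name as appearing only in a subdomain. A check of this kind is a heuristic at best: legitimate organisations also use subdomains, so a positive result is a reason to look more closely rather than proof of phishing.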

2.2 The cost of a phishing attack

In 2013 alone, there were nearly 450 000 phishing attacks worldwide, resulting in an estimated USD 5.9 billion in damages. Phishing attacks have been, and remain, a seemingly ever-present and ominous threat. A separate study conducted in 2013, which observed 72 758 unique recorded phishing attacks, found that the average uptime of an attack was 44 hours and 39 minutes (Rasmussen et al., 2013). One of the reasons for the continued and intensified barrage of phishing attacks lies in the ‘commoditized marketplace’, which drives vendors to maintain and provide competitive and accurate prices and services (RSA, 2014).

The Ponemon Institute releases an annual report which studies data breaches across 9 different countries: the United Kingdom, United States, Germany, France, Australia, India, Italy, Japan and Brazil. The study “examines the costs incurred by 277 companies in 16 industry sectors after those companies experienced the loss or theft of protected personal data” (Ponemon-Institute, 2013). The total cost of a breach was made up of several components, including “outlays for detection, escalation, notification, and after-the-fact (ex-post) response.”

The study drew a number of interesting distinctions between countries. On average, Australian and US companies had the largest number of exposed records, averaging 34 249 and 28 765 records per breach respectively, while Japanese and Italian companies had the smallest, averaging 18 285 and 18 237 records respectively. There is also a decline in the number of customers that use a service following a data breach, which was especially severe in Australia and France. A number of factors decrease the cost of a data breach, including having a strong security scheme and policy in place, having an incident response scheme and appointing a Chief Information Security Officer (Ponemon-Institute, 2013).

2.2.1 Phishing and Data Breaches

Data breaches encompass a broader sphere than just phishing attacks, but it is important to note that phishing attacks play a large role in data breaches (FACTS, 2006). The Anti-Phishing Working Group published a preliminary report in 2008 that looked specifically at the cost of a phishing attack to an organisation. The study found that the “duration of the phishing attack is a key factor” in determining the cost of the attack but that “most costs are incurred during the first 24 hours of the attack” (Cyveillance, 2008). The study splits the cost of a phishing attack into two kinds.

Hard Costs

These are financial costs that can be directly measured in terms of time, money, manpower and effort. The study outlines the following as the central hard costs involved with a phishing attack:

1. Fraudulent charges associated with the compromised payment mechanism (e.g. credit card).

2. Cash withdrawals from compromised accounts.

3. Time spent by employees dealing with the fraudulent transaction.

4. Customer service and support calls.

Soft Costs

Soft costs are the intangible costs that a compromise imposes on an institution, which are typically much harder to quantify and measure. These include:

1. The loss of customer trust in online applications.


2. A decline in customer satisfaction.

3. Reputation damage.

Phishing attacks form a significant part of the criminal attacks that lead to data breaches and, as such, are of particular concern to organisations that hold any form of financial or sensitive personal data (Wu et al., 2006).

2.3 Online Identity

Understanding phishing requires a broad understanding of many of the mechanisms and protocols that make up the internet, since a phishing attack often comprises several stages, each of which exists in a different but related sphere of the internet. It is important to understand how domain registration is handled so as to better understand the challenges faced in counteracting phishing attacks.

2.3.1 ICANN

The Internet Corporation for Assigned Names and Numbers (ICANN) is a private, non-profit organisation that performs a variety of important jobs involved in ‘maintaining’ the internet. Broadly speaking, ICANN performs three main functions¹:

1. The coordination of the assignment of technical protocol parameters.

2. The administration of certain responsibilities associated with internet DNS root zone management.

3. The allocation of internet numbering resources.

It coordinates the allocation of Internet Protocol address space, both IPv4 and IPv6. Originally ICANN was solely responsible for the distribution of address space and domain name registration; however, ICANN now allows ‘resellers’ to sell domain names on the condition that the reseller has signed the 2009 Registrar Accreditation Agreement², which looks to provide additional levels of protection for registrants and requires a greater level of accountability from registrars. For the purpose of understanding phishing attacks, it is most useful to know the requirements placed upon the registrar. As of May 2009, ICANN requires that registrars provide the following information³:

1. The name of the Registered Name being registered;

2. The IP addresses of the primary nameserver and secondary nameserver(s) for the Registered Name;

3. The corresponding names of those nameservers;

4. Unless automatically generated by the registry system, the identity of the Registrar;

5. Unless automatically generated by the registry system, the expiration date of the registration;

6. Any other data the Registry Operator requires be submitted to it.

Understanding the role that ICANN plays in the broader scheme of the internet is an important part of piecing together some of the mitigations that are employed to counter phishing attacks.

¹ https://www.icann.org/en/about/welcome

² http://www.icann.org/registrar-reports/accredited-list.html

2.3.2 WHOIS

“WHOIS is a TCP-based transaction-oriented query/response protocol that is widely used to provide information services to Internet users” (iet, 2004). ICANN is committed to enforcing its current WHOIS policy, which looks to “maintain timely, unrestricted and public access to accurate WHOIS information including registrant, technical, billing and administrative contact information”⁴. WHOIS information plays an important role in facilitating abuse reporting with regard to phishing, allowing fast and efficient website take-downs. Listing 1 is an example of the WHOIS information obtained for the Phishtank.org domain.

A number of important fields are provided in this WHOIS listing. If this site were compromised and used to launch a phishing attack, the listing provides the contact information needed to reach the registrant and resolve the matter in a timely fashion. In particular, the ‘Admin Phone’ and ‘Admin Email’ fields both provide a means of contacting the owner of the domain.

3 http://www.icann.org/en/resources/registrars/raa/ra-agreement-21may09-en.htm
4 http://whois.icann.org/en/history-whois

Listing 1: The WHOIS information available for the Phishtank.org domain

Domain Name:PHISHTANK.ORG
Domain ID: D128067610-LROR
Creation Date: 2006-08-30T23:19:41Z
Updated Date: 2013-10-02T00:20:25Z
Registry Expiry Date: 2014-08-30T23:19:41Z
Sponsoring Registrar:PDR Ltd. d/b/a PublicDomainRegistry.com (R27-LROR)
Sponsoring Registrar IANA ID: 303
Domain Status: ok
Registrant ID:DI 2954579
Registrant Name:OpenDNS Hostmaster
Registrant Organization:OpenDNS
Registrant Street: 410 Townsend st.
Registrant City:San Francisco
Registrant State/Province:California
Registrant Postal Code:94105
Registrant Country:US
Registrant Phone:+001.4153443118
Registrant Email:[email protected]
Admin ID:DI 2954579
Admin Name:OpenDNS Hostmaster
Admin Organization:OpenDNS
Admin Street: 410 Townsend st.
Admin City:San Francisco
Admin State/Province:California
Admin Postal Code:94105
Admin Country:US
Admin Phone:+001.4153443118
Admin Email:[email protected]
Tech ID:DI 2954579
Tech Name:OpenDNS Hostmaster
Tech Organization:OpenDNS
Tech Street: 410 Townsend st.
Tech City:San Francisco
Tech State/Province:California
Tech Postal Code:94105
Tech Country:US
Tech Phone:+001.4153443118
Tech Email:[email protected]
Name Server:AUTH1.OPENDNS.COM
DNSSEC:Unsigned
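The way in which such contact fields might be extracted from a WHOIS response can be sketched in Python. This is a minimal sketch only: it assumes the response uses the `Key:Value` line layout seen in the Phishtank.org listing, and the function names `parse_whois` and `abuse_contacts`, as well as the sample values, are hypothetical and are not part of Phishtego.

```python
def parse_whois(text):
    """Parse 'Key:Value' lines of a WHOIS response into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Keep the first value seen for a key (e.g. the first name server).
            fields.setdefault(key.strip(), value.strip())
    return fields


def abuse_contacts(fields):
    """Return the administrative contact details used for abuse reports."""
    return {
        "phone": fields.get("Admin Phone"),
        "email": fields.get("Admin Email"),
    }


# Hypothetical sample in the same layout as the listing above.
SAMPLE = """Domain Name:EXAMPLE.ORG
Creation Date: 2006-08-30T23:19:41Z
Admin Name:Example Hostmaster
Admin Phone:+1.5555550100
Admin Email:hostmaster@example.org
Name Server:NS1.EXAMPLE.ORG
Name Server:NS2.EXAMPLE.ORG
"""
```

Calling `abuse_contacts(parse_whois(SAMPLE))` yields the phone number and email address that an abuse report would be sent to. Real WHOIS responses vary between registries, so a production parser would need to handle more layouts than this sketch does.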

2.3.3 Certificate Authorities

Certificate authorities play an important role in providing identity assurance between clients and servers on the internet by “binding a public key to a particular entity” (Kurose & Ross, 2013). A certificate authority (CA) serves two primary roles.

1. A CA verifies that an entity is who they say they are. There is no standardised

procedure as to how this is to be achieved and so a large degree of unchecked trust

must be placed in the authority. As such, the CA is only as good as the verification

techniques that it employs (Kurose & Ross, 2013).

2. Once the CA has undertaken its verification procedure, it generates a certificate that binds the public key of the entity to the identification information of the entity. The certificate is then signed by the CA (Kurose & Ross, 2013).

Certificate authorities play an important role in assuring users of the identity of the entity that they are interacting with. They are of particular importance to implementations of the Secure Sockets Layer (SSL) protocol and its successor, the Transport Layer Security (TLS) protocol (Irish et al., 2001), which operate at the Transport layer of the IP stack. Notably, with regard to phishing, CAs play an integral role in the Hypertext Transfer Protocol Secure (HTTPS) implementation, which is a combination of TLS/SSL and the Hypertext Transfer Protocol. HTTPS is an attempt to provide credibility to a website: it lends weight to the claim that the website makes about its identity, more specifically, that it is who it claims to be.

2.3.3.1 The role of Certificate Authorities in Phishing and Anti-Phishing

Recent attacks on Certificate Authorities have resulted in breaches that allow an attacker

to generate and obtain fraudulent certificates (Turner et al., 2012). There are four primary

methods of compromising the integrity of a Certificate based system.


Figure 2.3: An example of a certificate verifying the identity of an online service

1. Impersonation: This entails a person persuading the CA that he or she is someone

else and being issued a certificate with the impersonated person or system’s name

in it.

2. Registration Authority: The registration authority (RA) is an entity that exists

between the end user and CA and reviews and approves all certificate requests. An

attack on the RA would entail the attacker being able to authorise the issuing of

new fraudulent certificates.

3. CA System Compromise: If the attacker is able to gain access to the CA systems

then the attacker can issue fraudulent certificates.

4. CA Signing Key Compromise: In this scenario, the attacker gets access to a copy of the CA signing key and is able to sign fraudulent certificates.

In each of these attack scenarios, the attacker is able to gain the undeserved trust of the end user, which undermines one of the primary roles of a CA: to reliably assure an end user of the identity of another end point on the internet.


2.3.4 PhishTank

“PhishTank is a free community site where anyone can submit, verify, track and share phishing data”5. Additionally, it is “free to everyone, both the website and the data”6. It is run by OpenDNS. PhishTank provides the internet community with a means of sharing data pertaining to phishing attacks, both current and historical. This is a valuable resource for tracking and monitoring phishing activity between groups and targets. It maintains the URL of each reported phishing website and the status of the URL, indicating whether the URL is still available online. PhishTank provides a free API which allows developers to build tools and software that interface with its data. This potentially allows for worldwide, real-time collaboration in the tracking and monitoring of phishing attacks.

2.4 Types of phishing attacks

Security researchers have identified and classified several variations of phishing attacks.

The distinction between the variations of phishing attacks comes not from the overall

objective of the attack, but rather from the way in which the attack is conducted. The

following discussion looks to dissect three of the most commonly seen attacks. It is

important to bear in mind that the period of the attack considered is the most critical

phase of the phishing campaign - the deception of the user.

2.4.1 Clone Phishing

Clone phishing is the most commonly seen phishing attack and requires the least skill and technical know-how to execute (Kirda & Kruegel, 2006). In this type of phishing, the attacker creates a cloned email from a legitimate email that was historically, or is presently, used by the authority the attacker is trying to imitate. To the user, the cloned email looks to all intents and purposes identical to the original, most often bearing the images, layout and font used in the original email. Additionally, the attacker replaces the ‘sent from’ field in the email with that of the institution or body being impersonated (Shi & Saleem, 2012). An example of this can be seen in Figure 2.4.

5 http://www.Phishtank.com/faq.php#whatisphishtank
6 http://www.Phishtank.com/faq.php#doesphishtankcostany

Figure 2.4: A typical phishing email


Figure 2.5: Tabnabbing

2.4.2 Tabnabbing

Tabnabbing is a relatively new and creative twist on the tried and tested phishing attack and was disclosed by Aza Raskin, who serves as the Creative Lead of Firefox (Suri et al., 2012). The idea behind tabnabbing is to take advantage of two fundamental features of web browsing. The first is that the typical user today has multiple ‘tabs’ open at a time. A user might simultaneously have tabs open for online shopping, an online email client and social media websites. This allows the user to switch quickly and conveniently between the sites the user frequents. The second is that, with so many tabs open, the user is most often unaware of which account is open in which tab, and indeed of which accounts the user is currently signed into. An attacker therefore directs the user to a page that appears completely legitimate. This page need not imitate an institution; instead, it waits until it loses focus. Once the page has lost focus, it is replaced dynamically with a phishing page. This poses a serious threat to the user, who will often, without much conscious thought, log in to the page and so hand the credentials for the account over to the attacker.

2.4.3 Spear Phishing

Spear Phishing is “highly targeted phishing aimed at specific individuals or groups within an organisation” (Trend Micro, 2012). These attacks differ from traditional phishing attacks in that they are usually more personal: they address their targets by name, position, rank or job role rather than using a generic title such as ‘Sir’ or ‘Madam’. They are often used to get more significant targets, such as high-ranking management, to open phishing emails. Spear phishing “significantly raises the chances that targets will read a message that will allow attackers to compromise their networks” (Trend Micro, 2012). In most cases, a spear-phishing email contains an attachment, often of a file type that is likely to be used in the business or organisation being targeted, such as a PDF or Microsoft Office document. Spear phishing attacks usually include a period of reconnaissance in which the attacker gathers as much publicly available information on the target as possible before tailoring a phishing email to be as enticing as possible to the recipient (Trend Micro, 2012).

Figure 2.6: Mechanics of a spear-phishing attack

2.5 Anti-Phishing methodologies

There has been a concerted effort to fight back following the rapid growth and prominence

of phishing attacks. Techniques have been developed to counter phishing attacks at

different layers of the internet. Some have focused on preventing phishing emails from

ever reaching the user, while some have focused on correlating ranges of IP addresses to


phishing syndicates and domains and adding these to globally accessible and maintained

blacklists. Encouragingly, there are a number of collective bodies that have sprung up in

response to the surge in phishing attacks. Such an effort will surely play a key role in

stemming the damage caused by phishing attacks.

2.5.1 Anti-Phishing Collectives

There are a number of groups that have formed in recent times to combat phishing attacks, typically using some form of crowd-sourced initiative. The following discussion covers some of the most prominent collectives at the time of writing.

2.5.1.1 Anti-Phishing Working Group

The APWG is the worldwide coalition unifying the global response to cybercrime across

industry, government and law-enforcement sectors7. They work to provide the global

community with resources and a knowledge base from which to combat phishing attacks

and organise a number of public awareness initiatives that look to educate and inform the

public of their role in combating phishing. Additionally, APWG’s projects have created

new institutions such as the eCrime Researchers Summit8 which publishes peer reviewed

anti-phishing articles in IEEE.

2.5.1.2 US-CERT

The United States Computer Emergency Readiness Team leads efforts to improve cybersecurity posture, coordinate cyber information sharing and proactively manage cyber risks9. It provides a means of reporting phishing attacks by submitting either the phishing email or, at the very least, the malicious URL in the email that is associated with the attacker’s machine. The submitted websites and email messages are collected, analysed and distributed so as to assist in a global anti-phishing effort.

7 http://apwg.org/about-APWG/
8 http://ecrimeresearch.org/events/eCrime2013/
9 http://www.us-cert.gov/about-us


2.5.1.3 PhishTank

PhishTank is a globally available, crowd-sourced project that looks to track phishing attacks in real time. It provides the ability to report phishing domains by URL; reports are then verified either positively or negatively by peers using the PhishTank website. The up-to-date database of current phishing campaigns is freely available and can be integrated into tools and products free of charge.

2.5.2 Website take down

Website take-down is the process of removing a website from the publicly accessible internet. In South Africa, the Internet Service Providers Association (ISPA) is a not-for-gain Internet industry body10 that is formally recognised as an Industry Representative Body. There are also commercial entities that perform website take-downs.

A study conducted by Moore & Clayton in 2007 looked to assess the effectiveness of

website take-down in combating phishing attacks. After studying a collection of phishing

attacks that were in part conducted by a single group some important statistics were

gathered. The mean lifetime of a phishing website was found to be 61.69 hours. Inter-

estingly, only 28% of the websites involved lasted more than 2 days but the longest was

available for over 17 weeks (Moore & Clayton, 2007b).

2.5.3 Browser Anti-Phishing mechanisms

“Internet service providers, mail providers, browser vendors, registrars and law enforcement” all have a significant role to play in mitigating the damage incurred through phishing attacks; however, web browser vendors play a “key role” due to the “strategic position of the browser and the concentration of the browser market” (Sheng et al., 2009). Browsers have the potential to act as a final buffer between the user and the phishing site. There is a tangible and direct interaction between the user and the browser, and thus the browser has potentially the best chance not only of informing the user of the risks he or she is taking when navigating to a certain website, but also of reporting and preventing additional attacks (Sheng et al., 2009). Two primary methods have been employed in attempting to integrate anti-phishing mechanisms into browsers.

10 http://ispa.org.za/about-ispa/


2.5.3.1 Anti-Phishing Heuristics

There are a number of heuristics that modern browsers have built into them in order to identify phishing attempts. Machine learning algorithms play an important part in drawing up identification mechanisms and rule-sets (Sheng et al., 2009). One of the advantages of this method is that it allows phishing websites to be identified immediately, without having to wait for a public blacklist to be updated. The danger of relying solely on heuristics for phishing detection is that a phishing attack may be designed to bypass a given heuristic rule-set. Additionally, heuristics may produce false positives.
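To make the idea of a heuristic rule-set concrete, the following is a minimal sketch of a URL-based heuristic in Python. The four rules, the word list and the threshold are illustrative assumptions chosen for this example; they are not taken from any browser or from Sheng et al. (2009), and real rule-sets are considerably richer.

```python
import re
from urllib.parse import urlparse

# Illustrative rules only; real browsers use far richer, learned rule-sets.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")


def heuristic_score(url):
    """Count how many simple phishing indicators a URL triggers."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    score = 0
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        score += 1  # raw IP address instead of a host name
    if host.count(".") >= 4:
        score += 1  # unusually deep sub-domain nesting
    if "@" in parsed.netloc:
        score += 1  # user-info trick that hides the real host
    if any(word in parsed.path.lower() for word in SUSPICIOUS_WORDS):
        score += 1  # credential-related wording in the path
    return score


def looks_like_phish(url, threshold=2):
    """Flag a URL once it triggers at least `threshold` indicators."""
    return heuristic_score(url) >= threshold
```

Both weaknesses noted above are visible in the sketch. An attacker who knows the rules can simply avoid them, and lowering `threshold` to 1 would flag a legitimate page such as `https://www.example.org/account/login`, which is a false positive.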

2.5.3.2 Phishing Blacklists

Blacklisting has become one of “the predominant spam filtering techniques” (Sheng et al., 2009). There are a number of publicly available blacklists. Browsers use blacklists in part to block the domains and IP addresses of known phishing attempts. However, this is not always a viable means of blocking attacks: phishing attacks are often launched through compromised servers, and blocking an entire domain on the grounds of a single phish hosted on it is not always possible. Another issue with this technique is that, even if a phishing campaign is identified early, there is a lag between the time at which the phish is reported and the time at which the blacklist prevents users from accessing the content.
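The trade-off between blocking a whole domain and blocking a single page can be sketched as follows. This is a toy illustration under stated assumptions: the `Blacklist` class and its method names are hypothetical and do not correspond to the interface of any real blacklist service.

```python
from urllib.parse import urlparse


class Blacklist:
    """A toy blacklist that can block whole hosts or individual URLs."""

    def __init__(self):
        self.blocked_hosts = set()
        self.blocked_urls = set()

    def block_host(self, host):
        """Block every page served from the given host."""
        self.blocked_hosts.add(host.lower())

    def block_url(self, url):
        """Block one exact URL, leaving the rest of the host reachable."""
        self.blocked_urls.add(url)

    def is_blocked(self, url):
        host = (urlparse(url).hostname or "").lower()
        return host in self.blocked_hosts or url in self.blocked_urls
```

For a phish hosted on a compromised but otherwise legitimate server, `block_url` keeps the rest of the site available, whereas `block_host` would also deny access to its legitimate pages. In both cases a URL is only blocked after it has been added, which is the lag described above.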

2.5.4 Email Filtering and Content Filtering

Another means of combating phishing sits between the user and the attack. The idea is that by blocking the malicious email directed at the user, the user never has the chance to be phished. The suggestion is that, for some users, by the time a phishing email has been received it is already too late (Sheng et al., 2009). Many email providers, such as Google, have integrated phishing “detection, prevention and notification” into their email services (Goodman et al., 2009). These use a combination of machine learning and heuristics to pinpoint phishing or spam emails and remove them from the user’s mailbox.

As such mechanisms have developed and grown more efficient, so the complexity and creativity of the spammers have grown. Their response has been to change the content of messages so as not to fit too neatly a ‘traditional’ phishing email template, to increase message volume, to adopt new delivery mechanisms and to attack the anti-spam groups themselves (Wittel & Wu, 2002). Essentially, a content-based spam filter “distills” a document “into a set of features such as words, phrases, meta-data et cetera” which is then represented as a vector. From this point, “the classification algorithm uses the feature vector as a basis upon which the document is judged” (Wittel & Wu, 2002). The algorithm uses a rule set which can either be hand-crafted or automatically generated. Machine learning algorithms are primarily driven by statistics derived from the feature vectors. Bayesian classification is one of the most widely used methods; it “attempts to calculate the probability that a message is spam based upon previous feature frequencies in spam and legitimate email” (Wittel & Wu, 2002).
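The Bayesian approach described above can be illustrated with a small naive Bayes classifier over word features. This is a sketch under stated assumptions rather than the method of any particular mail provider: the class name `NaiveBayesFilter`, the whitespace tokenisation and the add-one smoothing are choices made for this example, and the filter must be trained on at least one spam and one legitimate (‘ham’) message before it is queried.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """A tiny Bayesian spam filter over word features."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the word frequencies of one labelled message."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Estimate P(spam | message) from the trained frequencies."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing avoids zero probabilities for unseen words.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Normalise the two log scores back into a probability.
        highest = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - highest)
        ham = math.exp(log_scores["ham"] - highest)
        return spam / (spam + ham)
```

After training on a handful of labelled messages, a message that shares most of its words with the spam examples receives a probability above 0.5, and one that resembles the legitimate examples receives a probability below 0.5.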

2.6 Abuse Reporting Mechanisms

Once a phishing attack is identified and traced back to a domain there are a number of

steps that a body looking to stop the attack can take.

2.6.1 Blacklisting services

There are a number of blacklisting services available, including those provided by Symantec11, Google’s Safebrowsing12 and PhishTank13.

There is typically some form of verification that a submitted phish must undergo in order

to be confirmed and blacklisted. This is because the ramifications for a domain that is

incorrectly blacklisted as a phishing site are severe and could take a substantial period of

time to be completely reversed.

PhishTank for example, uses a form of crowd sourced verification that involves members of

the PhishTank community either agreeing that the website submitted has been correctly

identified as a phishing website or disagreeing with it.

2.6.2 Placing an abuse report with domain registrar

WHOIS information kept by the registrar includes an abuse contact email, telephone number and address. This provides a means of contacting the owner of the domain when a webpage that is currently involved in a phishing attack needs to be taken down. There are, however, a number of cases in which this is not a useful route to take. Take, for example, the scenario in which an attacker has registered a domain with a registrar. It is unlikely that the attacker has registered with a company that enforces stringent policies for verifying the information provided by a registrant. It is therefore quite likely that the attacker registered the domain with false or fraudulent information, in which case attempting to file a complaint against them is completely ineffectual. This mechanism is useful, however, when an attack is launched through a compromised, legitimate website. Examining the mechanics of a phishing attack shows that phishing campaigns are often launched from compromised systems whose legitimate owners are completely unaware of the compromise. In this case, the owners of the domain can take swift steps to locate and halt the hosting of the phishing website on their infrastructure.

11 https://submit.symantec.com/antifraud/phish.cgi
12 http://www.Google.com/safebrowsing/report_phish/?rd=1
13 http://www.Phishtank.com/

One of the problems with this approach is that it takes time for each of the notifying and compromised parties involved to respond and act on this information. During this period, a large number of users may still be compromised (Moore & Clayton, 2007b).

2.7 Summary

Phishing attacks remain a significant threat to internet users. They prey upon the susceptibility of the end user to divulge sensitive information to a seemingly trustworthy source. This makes phishing exceptionally difficult to combat at a user level, because creating an educated user base across the entirety of the internet seems a difficult, if not impossible, task. A large number of the anti-phishing techniques currently employed experience a lag between correctly identifying a phishing campaign and blocking it. However, there have been promising strides in recent developments in email and content filters, as well as in heuristics for identifying phishing attacks. The underlying technologies play an important role in understanding the overall structure of the system, and the discussion in the following chapters relies heavily upon the information discussed in this chapter.

Chapter 3

Design

This chapter looks to outline some of the fundamental design decisions made in the

creation of the Phishtego framework. It will look to again address the goals of the system

before looking at some of the underlying architecture of the system. In doing so, it will

address the mechanics of the Maltego system. The sections following this will consider

the transforms and entities associated with Phishtego before looking at the process of

automating the system.

3.1 System Goals

The overall goal of the project is to provide the user with a software solution that allows

for the modelling, correlation and exploration of a phishing attack which can be broken

down into more specific goals as were outlined in Chapter one.

Several design decisions needed to be made in order to meet the goals of the project as outlined in Chapter one. A large number of technologies and concepts are involved in something as simple as sending an email. With this in mind, attempting to track and model even a part of a phishing attack becomes a complex task, both to implement and to engage with usefully as a user. There is a host of data that can be gathered from an attack, but the challenge is to streamline this data and extract meaningful information from a user’s point of view. A second challenge lies in the fact that many of these technologies overlap. An email is only as good as the transport layer that transports it and the mail servers that forward and receive it. In turn, the mail servers are likely addressed by a host-name which must first be resolved to its IP address through the DNS system, which in turn opens the door to a whole new collection of technologies and concepts.

This is a good example of how complex pivoting on provided information can be both

in terms of the connecting of various underlying technologies as well as the sheer volume

of data that is generated when doing so. In this way, one of the underlying philosophies

of the development of Phishtego is to hide as much complexity as possible from the user

of the application while still performing complex and useful back-end transformations on

data.

3.2 Underlying Architecture

Phishtego consists of a combination of several technologies and programming languages.

The significant design decisions made around the system implementation are documented

in this chapter.

3.2.1 Maltego

Maltego is a powerful graphing software solution that places an emphasis on relationships

between nodes in the graph.

“Maltego uses a client/server architecture for the purposes of data collection to determine the relationships and real world links between pieces of data especially Internet infrastructures”1. Maltego therefore proved to be a natural choice of existing platform into which to integrate the phishing monitoring platform. Maltego generates a node graph in which nodes, called entities, are plotted and relationships between nodes are represented with directional arrows. In this way, both obvious and, more importantly, previously unrecognised relationships between entities on a graph can be realised.

The Maltego application comes in two forms:

1. Commercial edition : This can be used for commercial uses. It has no restrictions

on the volume of results that can be returned by applying transforms on entities

and includes frequent updates and support.

1 http://www.paterva.com/malv3/303/M3GuideGUI.PDF


2. Community edition : The community edition is, at its core, the same application with the same functionality, but with some limitations imposed. It receives less frequent updates, requires registration and an API key to use and, significantly, is limited to returning only 12 entities for any given transform.

Since the transforms of the Phishtego system exist completely independently of the Maltego edition, the decision was made that the Community edition would be sufficient, as the subsequently developed transforms run equally well on both editions. Additionally, there is no restriction on the transforms that are a part of Phishtego, and they may be integrated into either edition in the future.

Maltego is a Java based application which affords it portability across a large number

of operating systems. This was important in developing the system as it allows a wider

spread of use across the IT world in terms of not lending preference to one operating

system over another. The Maltego platform is also relatively undemanding with regard

to its hardware requirements.

Table 3.1: Maltego: Minimum Hardware Requirements

Hardware                     Minimum Requirement
RAM                          2 GB
Processor                    2 GHz
Internet connection speed    64 Kb
Hard-drive space             100 MB
Display                      1024x768

Two of the key components of the Maltego framework are the entities and transforms.

Entities represent objects which transforms are run on. They are used to represent a

number of things ranging from telephone numbers to IPv4 addresses. Transforms are

then pieces of code that take in an entity as input and perform some kind of manipulation

on this entity and often return as output a new entity. Figure 3.1 illustrates the process

involved with executing a transform on an entity.

The first step in using Maltego is creating a new graph. A graph can be saved locally and

loaded for editing at a later stage. A new graph is created by selecting the new graph

icon in Maltego.


Figure 3.1: Creating a new graph

The next step is then dragging an entity into the graph. In this example, a website

entity was dragged into the graph. Selecting the entity is done with a single left click.

Right clicking the entity brings up a menu from which transforms can be selected. In this

example, the transform resolve to IP via DNS was selected.

Figure 3.2: Running a transform


The application then passes the website entity to the transform code running in the back-end, which performs some manipulation on the entity and programmatically creates and returns a new entity. In this case, the website entity is passed to the back-end, which performs a look-up procedure and returns the result in the form of an IPv4 address entity.

Figure 3.3: The transform produces a new IPv4 entity
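The look-up step behind such a transform can be sketched in Python. This is a minimal sketch and not the code of the Maltego transform itself: the function name `resolve_to_ipv4` and the entity label `"IPv4Address"` are assumptions made for illustration, and the look-up function is passed in as a parameter so that it can be replaced by a canned answer when no network is available.

```python
import socket


def resolve_to_ipv4(website, resolver=socket.gethostbyname):
    """Return an (entity type, value) pair for the website's IPv4 address.

    `resolver` is injectable so the look-up can be replaced in tests;
    by default the operating system's resolver is used.
    """
    try:
        address = resolver(website)
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve.
        return None
    return ("IPv4Address", address)
```

The input entity and the returned pair together describe one edge in the graph, namely that the website resolves to the returned address.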

In this example, it is significant not only that we have generated previously unknown

information about a given entity, but also that we have mapped the relationship between

the two entities. This is fairly obvious in the case of a single transform performed on a

single entity however the Maltego framework was developed so as to handle thousands of

complex entities and relationships between entities. In a more complex example, we might

run a number of transforms on the same IPv4 entity that we derived from the website.

In this case, since the website is a part of a shared hosting scheme, we find that there are

a number of domains that resolve to this IP address. In the next example, a total of 16

transforms were run on the IP address which produced a total of 49 entities including:

• People

• DNS Names

• Domains

• IPv4 Addresses


• Websites

• Email Addresses

• NS Records

• MX Records

• Geographical Locations

More than the useful information it generates, this example shows how mapping the relationships between the various underlying technologies helps in considering the web of relationships that binds each layer of technology together.

Figure 3.4: A more complex set of entities and relationships

In addition to providing an easy to navigate and relational graphing system, it provides

the user the ability to change the way in which the graph can be represented. There are

four modes in which a Maltego graph can be represented. These are:

1. Organic (Figure 3.4)

2. Block (Figure 3.5)

3. Hierarchical (Figure 3.6)

4. Circular (Figure 3.7)


Figure 3.5: Block Layout

Figure 3.6: Hierarchical Layout

Figure 3.7: Circular Layout

3.2.2 Maltego Machines

“Maltego machines allow you to string together transforms to work with entities on a graph”2. Machines can be set to run automatically at intervals, extracting new information dynamically as it is generated. This is of particular interest as a means of automating the process of modelling and monitoring ongoing phishing attacks. Transforms can be called programmatically in this way, which means not only that they run without interaction from the user, but also that they run more quickly than if a user were to call each transform manually.

2 http://www.paterva.com/web6/documentation/developer.php
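The idea of stringing transforms together can be sketched in plain Python. This sketch does not use Maltego's own machine scripting facilities: `run_machine`, the two stand-in transforms and their canned look-up tables are all hypothetical and serve only to show how the output of one transform becomes the input of the next without user interaction.

```python
def run_machine(seed, transforms):
    """Apply each transform in turn to every entity found so far.

    Entities are (type, value) tuples; each transform maps one entity to a
    list of new entities. Returns the entities and the links between them.
    """
    entities = {seed}
    links = set()
    for transform in transforms:
        # Iterate over a snapshot, since new entities are added as we go.
        for entity in list(entities):
            for result in transform(entity):
                entities.add(result)
                links.add((entity, result))
    return entities, links


# Two stand-in transforms with canned answers instead of network look-ups.
def website_to_ip(entity):
    kind, value = entity
    table = {"www.example.org": "192.0.2.1"}
    if kind == "Website" and value in table:
        return [("IPv4Address", table[value])]
    return []


def ip_to_location(entity):
    kind, value = entity
    table = {"192.0.2.1": "Example City"}
    if kind == "IPv4Address" and value in table:
        return [("Location", table[value])]
    return []
```

Calling `run_machine(("Website", "www.example.org"), [website_to_ip, ip_to_location])` returns three entities and the two links that connect them, which mirrors how a graph grows as a machine runs.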

3.2.3 Programming Languages

The transforms that were developed are written in Python. This choice was made because of the interpreted nature of the language and the speed of the development cycle that can be achieved with it. It also allows the transforms to work across various platforms, in keeping with the portability of the Maltego platform. With relatively little effort, the entire Phishtego system can be set up and run on Windows and on most Linux and Unix desktop platforms. The speed penalty that an interpreted language such as Python imposes does not affect the overall experience of the system: a large number of the transforms involve expensive I/O operations, including interactions with the underlying network, so any performance gained with a compiled language would be insignificant relative to the overall time required to execute the transform.


3.2.4 Transforms

In the interests of outlining the internal workings of a transform, the following describes how Maltego interacts with the transforms developed for Phishtego. As a working example, Listing A.1 is the Phishtank.py class developed to handle interactions with the Phishtank API, which was implemented for the Phishtego framework. This example is taken from the transform which will be outlined in Table 3.3 shortly.

This class contains a constructor which requires a number of parameters including an

API key, and optional updateInterval and web parameters as can be seen on line 13 in the

Listing A.1. There are several methods contained within the class that include updating

the Phishtank locally and validating a link against the database. For the sake of brevity,

this will be the only code listing just so as to illustrate the relationships between external

code, transforms and the Maltego framework. Now that we have a stand alone class that

can be instantiated and URLs validated against, we look to integrate this into a transform.

The following illustrates a simple transform that makes use of the class listed.

Listing 3.1: A Maltego Transform

1 #Phishing Verification
2 from MaltegoTransform import *
3 import sys
4 import phishtank
5 #-----
6
7 #The Maltego Framework passes the entity to perform
8 #the transform on as an argument to the application
9 obj = sys.argv[1]
10
11 #We instantiate a new phishtank object with an API key
12 #and 60 minute update interval
13 phishtank = phishtank.phishtank("xxxx", updateInterval=60)
14
15 #We instantiate a MaltegoTransform object
16 me = MaltegoTransform()
17
18 def check():
19     #We validate the URL against the phishtank API
20     if phishtank.checkURL(obj):
21         #If the URL is indeed malicious, we add an
22         #entity to the Transform object
23         me.addEntity("phishtego.MaliciousURL", obj)
24
25 check()
26
27 #Upon completion of the transform, we return the MaltegoTransform
28 #object to the Maltego Framework
29 me.returnOutput()

To implement a transform, the Maltego library is imported, as can be seen on line 2 of Listing 3.1. Maltego passes the entity on which the transform is to be performed as a command-line argument. Python reads this argument on line 9 and saves it as a variable. The transform then creates an instance of the class given in Listing A.1 (line 13). Next, a MaltegoTransform object is created and saved as a variable on line 16. The function defined on line 18 calls a method of the class from Listing A.1; if this returns True, an entity is added to the MaltegoTransform object declared on line 16. Finally, the transform returns the MaltegoTransform object at the end of the program, on line 29.
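Listing A.1 is not reproduced in this section. As a rough illustration of the shape such a class could take, the following sketch keeps a local set of known phishing URLs and refreshes it once the update interval has elapsed. The names PhishtankCache, fetch_urls, update and check_url are hypothetical and are not taken from Listing A.1. The download step is passed in as a function so that the sketch makes no assumptions about the Phishtank API itself.

```python
import time


class PhishtankCache:
    """Hypothetical stand-in for the class in Listing A.1 (not the original code)."""

    def __init__(self, api_key, fetch_urls, update_interval=60, clock=time.time):
        self.api_key = api_key
        # fetch_urls is an assumed callable: api_key -> iterable of known bad URLs
        self.fetch_urls = fetch_urls
        self.update_interval = update_interval * 60  # minutes to seconds
        self.clock = clock
        self.known = set()
        self.last_update = None

    def update(self):
        """Refresh the local copy of the database."""
        self.known = set(self.fetch_urls(self.api_key))
        self.last_update = self.clock()

    def check_url(self, url):
        """Return True if the URL appears in the local copy, refreshing it if stale."""
        stale = (self.last_update is None
                 or self.clock() - self.last_update >= self.update_interval)
        if stale:
            self.update()
        return url in self.known
```

Injecting the clock and the download function keeps the sketch testable without network access; a real implementation would replace both with calls of its own.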

For each of the transforms included, there are often several Python classes that interact in the back end of the application to produce meaningful information and represent it in the form of an entity. The remaining transforms are not described in as much detail in the interests of brevity, but they operate in fundamentally the same way: each is executed by the Maltego framework, performs some manipulation on the entity passed to it and responds appropriately.

3.3 Entities

A number of entities ship with Maltego out of the box. In some cases these have been integrated into the Phishtego project; however, most of the project required the development of completely new entities. This is because Maltego is not designed first and foremost to explore phishing attacks. It is a general intelligence-gathering tool that provides extensibility through a rich API. The Phishtego framework makes use of a number of entities that attempt to model the information that would be useful to a user wanting to learn more about an attack. Table 3.2 summarises the entities that are integrated or created as a part of the Phishtego framework. Each entity is then discussed in turn, along with a high-level motivation for representing it from the point of view of the user.

Entity Name               Description
Domain                    An entity that represents an internet domain
Email Address             An entity that represents an email address
IPv4 Address              An entity that represents an IPv4 address
Email Source              An entity that represents the source of an email, including headers
Abuse Report Email        An entity that represents an email address to contact with regard to abuse of the domain
EmailSourceDirectory      An entity that represents a local directory that stores a number of email source files
Potential Phishing URL    An entity that represents a link extracted from a phishing email that is potentially malicious
Suspicious Email Address  An entity that represents an email address that appears as a contact in a phishing email
Phishing Target           An entity that represents a target in a phishing campaign
Confirmed Phishing URL    An entity that represents a verified malicious URL
Phishing Kit              An entity that represents a phishing kit used by an attacker

Table 3.2: A Summary of Phishtego Entities

Each entity shares a relationship with one or more other entities in the framework. The following subsections expand upon each of the above entities and consider the relationships that can exist between them.


3.3.1 Domain

A domain plays a central role on the internet. A domain might serve as the host of a phishing campaign or as a command and control centre. Alternatively, the entity might represent a domain under the control of the user, allowing the user to evaluate and correlate information about the attack against that domain. A domain often has mail servers and a number of other potentially interesting services associated with it, which may lead to further useful information about the domain.

3.3.2 Email Address

An email address is probably one of the most important elements in making sense of a phishing attack. Not only are victims most often targeted through their email address, but email addresses are also an important part of the correspondence that occurs between an attacker and a victim. Email addresses may in turn yield interesting information about which domains are involved in the attack.

3.3.3 IPv4

The Internet Protocol version 4 addressing scheme provides possibly one of the most useful means of identifying the end points involved in a phishing attack. It is unlikely that an IP address will provide any real information about the physical identity or location of the attacker. It does, however, provide a useful indication of how large an attack may be, how many domains are involved or how many mail servers are involved. Information about IP addresses is often some of the most telling information about a phishing attack, both for identifying threats and for mitigating them.

3.3.4 Email Source

The source of an email is simply the body of text that makes up the email, including the headers, which are for the most part not displayed by email clients. This text contains a wealth of useful information about where the email originated, its attachments, the mail servers involved and the names used.
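As a minimal sketch of how this information might be pulled out of an email source, the example below uses Python's standard email package. The function name summarise_source and the choice of fields are illustrative assumptions and are not taken from the Phishtego code.

```python
from email import message_from_string
from email.utils import parseaddr


def summarise_source(raw):
    """Pull a few commonly useful fields out of a raw email source string."""
    msg = message_from_string(raw)
    display_name, address = parseaddr(msg.get("From", ""))
    return {
        "subject": msg.get("Subject"),
        "from_name": display_name,
        "from_address": address,
        # Each mail server that handled the message normally adds a Received header
        "received": msg.get_all("Received", []),
        # Attachments are the message parts that carry a filename
        "attachments": [part.get_filename() for part in msg.walk()
                        if part.get_filename()],
    }
```

Headers such as From are trivially forged, so values recovered this way are best treated as leads for further transforms rather than as facts.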


3.3.5 Abuse Report Email

Abuse report email addresses allow abuse of a given domain to be reported to the people responsible for running that domain. If the domain being used to orchestrate a phishing campaign was not registered by the attacker, but is one that the attacker has taken control of by compromising a host, this provides a vital means of alerting the owner of the domain to the illegal activity being launched from the host. It gives the owner of the host the opportunity to take steps to remove the threat and perhaps to begin a forensic investigation into the matter.

3.3.6 EmailSourceDirectory

Though simple in concept, this entity provides the opportunity to build some complex

external functionality into Phishtego. It is a simple entity in that it merely represents a

directory on the local machine running Phishtego in which the source of emails can be

stored.

3.3.7 Potential Phishing URL

This is a URL that has been included in a malicious email and that may require further investigation. It is interesting to note both the commonality between links and the information that is posted to a server along with the link. The link has not been marked as malicious by either Google Safe Browsing or Phishtank but is nevertheless to be treated with extreme caution.

3.3.8 Suspicious Email Address

If the attacker, rather than tricking the victim into following a malicious link, tries to elicit information from the victim via email correspondence, it is useful for the user to be able to identify the addresses that the attacker might use for that correspondence. Phishtego therefore extracts email addresses from links, from the body of the email and from the sender field in order to identify potential threats.


3.3.9 Phishing Target

This entity represents the institution or person that is the target of a phishing attack. This might be a bank, another financial institution, an insurance company or indeed an individual. It is significant because the Phishtank API reports who the target was in each attack, which allows connections to be drawn between targets that might previously have been thought to be unrelated.

3.3.10 Confirmed Phishing URL

A URL contained within an email is checked against two services: Phishtank and Google Safe Browsing. If either service confirms the URL as a phishing page or as serving malware, this entity visually alerts the user to the fact. The visual impact of the entity is important in providing an immediate and intuitive warning. It also confirms the suspicions of a user who was correct in treating the URL with caution.

3.3.11 Phishing Kit

In the event that Phishtank positively identifies a URL as a part of a phishing campaign,

it may be possible to determine the phishing kit that was used to create and carry out

the attack. This information could prove to be useful in correlating the groups behind

phishing attacks.

3.4 Transforms

In the hierarchical graphical structure within Maltego, “transforms should be thought of as pieces of code that change one type of information to another”3. The following subsections detail some of the transforms that Phishtego implements or makes use of. Each transform is identified by name, together with its input and output entities.

3 https://www.paterva.com/web6/documentation/developer-local.php


3.4.1 Verify Phishing Link

Table 3.3: Verify Phishing URL

Transform Name verifyURL

Maltego Input Entity suspiciousLink

Maltego Output Entity MaliciousURL

Average Run-time 8s

Verifying a suspicious link invokes two sub-processes. The first is a look-up against the Phishtank API. If the URL appears there as malicious or previously reported, Phishtego confirms this by creating a malicious URL entity. The second is a check against the Google Safe Browsing API. This is a vitally important part of the exploration process and leads the user to one of two outcomes. If the URL is confirmed to be malicious, this supports the user's suspicion and provides confirmation from a third party that the URL is part of a broader malicious campaign. The alternative is that the URL is not identified as malicious by either third party. This says nothing about the credibility of the email, since there are a number of reasons why a URL might not be flagged. The first possibility is that it is malicious and simply has not been reported yet, in which case the end user has the opportunity to report it. The second possibility is that the link is in fact not malicious, in which case the user can be more confident that it is not a real threat and proceed accordingly. An entity representing the absence of a URL from both services was intentionally not incorporated. This avoids presenting the URL as safe, even where it is, because there are a number of scenarios in which a URL will not be flagged by either service but is still unsafe. Two examples are:

1. The URL is malicious but is part of a new attack that has not yet been reported to either of the services by another end point on the internet.

2. The URL is malicious but is not flagged by either of the services because it is part of a narrower spear-phishing attack (see Chapter 2) that is unlikely to have been encountered by other institutions.


The design decision was therefore made to return no output at all when a URL is not flagged by either Phishtank or Google Safe Browsing, so as not to inspire false confidence in the safety of that URL.

In the case of the Phishtank API, Phishtego has a relatively thorough back end. To reduce the bandwidth demands of the application, both locally and on the Phishtank servers, the system periodically downloads a copy of the Phishtank database and maintains it locally. This local copy can be queried extremely quickly, and more thoroughly than would be reasonable with a large volume of API calls.
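The decision to emit a malicious URL entity when a service flags the link, and to emit nothing otherwise, can be sketched as follows. This is an illustration under stated assumptions rather than the Phishtego implementation: verify_url and the checkers mapping are hypothetical names, and each checker stands in for a look-up against one of the two services. The entity type string is the one used in Listing 3.1.

```python
def verify_url(url, checkers):
    """Return (entity type, value) pairs for a suspicious URL.

    `checkers` maps a service name to a function url -> bool (an assumed
    interface). An empty list is returned when no service flags the URL,
    so that an unflagged link is never presented as safe.
    """
    flagged_by = [name for name, is_flagged in checkers.items() if is_flagged(url)]
    if not flagged_by:
        return []
    return [("phishtego.MaliciousURL", url)]
```

Returning an empty list in the unflagged case mirrors the design choice described above: absence from the services is treated as absence of evidence, not as a result worth drawing on the graph.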

3.4.2 Generating Abuse Report Emails

Table 3.4: Generating abuse report emails

Transform Name Prepare Abuse Report Email

Maltego Input Entity IPv4 Address

Maltego Output Entity None

Average Run-time 0.006s

An abuse report email address is a contact address that is often required by the domain registrar when a domain is registered. It is accessible through the WHOIS protocol and allows an internet user to lodge complaints with the owner of the domain about abuse. This is significant in the case of phishing because, as previously discussed, phishing campaigns are often controlled or launched from compromised servers on the internet. These servers were originally, and probably still are, running completely legitimate services. In these cases it is often possible to disrupt or completely halt a phishing attack at the source once the owner of the domain has been made aware of it. This is of course not true when the attacker is also the registrant of the domain, as the attacker is unlikely to be sympathetic to requests to stop the campaign.
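A minimal sketch of preparing such a report with Python's standard library is shown below. The function name, its parameters and the wording of the message are assumptions made for illustration; the sketch only builds the message and leaves sending to the user.

```python
from email.message import EmailMessage


def prepare_abuse_report(abuse_address, reporter_address, url, ip_address):
    """Build (but do not send) an abuse report for a suspected phishing URL."""
    msg = EmailMessage()
    msg["To"] = abuse_address
    msg["From"] = reporter_address
    msg["Subject"] = "Suspected phishing content hosted at " + ip_address
    msg.set_content(
        "To whom it may concern,\n\n"
        "The following URL appears to be part of a phishing campaign\n"
        "and resolves to an address under your control:\n\n"
        "  " + url + "\n"
        "  " + ip_address + "\n\n"
        "Please investigate and remove the content if appropriate.\n"
    )
    return msg
```

Keeping the sending step separate lets the user review the complaint before it leaves the machine.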


3.4.3 Directory Monitoring

Table 3.5: Monitoring a local directory for emails

Transform Name Directory Monitor

Maltego Input Entity EmailSourceDirectory

Maltego Output Entity EmailSource


This transform is possibly one of the most interesting and powerful features built into Phishtego. Its function is simple: it periodically checks a local folder for new email source files. While this has some immediately obvious use cases, it also presents the opportunity to integrate more complex systems with Phishtego. A simple use case would be to place a number of suspicious emails in a directory and use the transform to load each email individually for analysis. A more interesting use case would be to write a script or application that automatically retrieves suspicious emails from spam filters, or from emails submitted by users in a large corporation, and saves them in the directory for automatic parsing and analysis. In this way a relatively simple transform gives Phishtego a simple but effective means of integrating with systems that may already be in place in an organisation.
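The core of such a monitor can be sketched in a few lines. The function below is a hypothetical illustration, not the Phishtego transform itself: it lists a directory and reports only the files it has not seen on an earlier call, with the caller keeping the set of seen paths between calls.

```python
import os


def new_email_files(directory, seen):
    """Return paths of files not reported on earlier calls, recording them in `seen`."""
    fresh = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            fresh.append(path)
    return fresh
```

Calling this function on a timer gives the periodic behaviour described above; each returned path would then become a new email source entity.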

3.4.4 Link Extraction and analysis

Table 3.6: Link extraction and analysis

Transform Name LinkExtractionAnalysis

Maltego Input Entity EmailSource

Maltego Output Entity Potential Phishing Link, Confirmed Phishing Link


This transform parses the body of an email using regular expressions to extract links. Isolating links in an email can be more challenging than it sounds, as strangely formed URLs occasionally slip through the regular expression matching. Nonetheless, after several iterations, the regular expressions that Phishtego uses in the back end when parsing the email body are those shown in Figure 3.8.


Figure 3.8: Regular expression used to extract links and URLs

www.(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Figure 3.9: Regular expression used to extract email addresses

[\w\.-]+@[\w\.-]+

The second part of the extraction involves extracting the email addresses included in the email. These may be addresses in the FROM field, addresses in the message body or addresses in the subject line. Again, a simple regular expression proved to be the most efficient means of extracting them. The regular expression used is listed in Figure 3.9.
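As a short sketch, the patterns of Figures 3.8 and 3.9 can be applied with Python's re module as shown below. The character class [$-_@.&+] is a reconstruction of the printed figure and should be treated as an assumption, and the function names are illustrative rather than taken from the Phishtego source.

```python
import re

# Patterns transcribed from Figures 3.8 and 3.9 (the [$-_@.&+] class is an assumption)
WWW_RE = re.compile(
    r"www.(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)
HTTP_RE = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")


def extract_links(text):
    """Return http(s) matches followed by bare www matches."""
    return HTTP_RE.findall(text) + WWW_RE.findall(text)


def extract_addresses(text):
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)
```

Because the range $-_ includes the characters < and >, a link followed directly by an HTML tag is matched together with that tag. This is one way in which the strangely formed URLs mentioned above can arise, and an implementation may want to trim such matches.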

3.4.5 WHOIS

Table 3.7: WHOIS

Transform Name phishtegoWHOIS

Maltego Input Entity IPv4 Address, Domain

Maltego Output Entity Abuse Report emails


The WHOIS data referred to in Chapter 2 contains information about the owner of a given website. This is particularly useful in the event that the owner of a domain is unaware of, and not involved in, the malicious activity. WHOIS records may include an abuse report email field which, if present, provides a means of contacting the relevant owner. This abuse report email is of particular interest because it is used by the transform described in Section 3.4.2. The WHOIS information in Phishtego is gathered from a number of WHOIS services, including those of regional internet registries, available at the following addresses:

• whois://whois.ripe.net

• whois://whois.apnic.net

• whois://whois.lacnic.net

• whois://whois.afrinic.net

• whois://whois.cymru.com
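A WHOIS look-up is a plain TCP exchange on port 43: the client sends the query followed by a line break and reads until the server closes the connection. The sketch below shows this together with a simple way of picking abuse contacts out of the reply. It is an illustration under stated assumptions rather than the Phishtego code: the function names are hypothetical, and because the name of the abuse field differs between registries, the parser simply looks for email addresses on any line that mentions the word abuse.

```python
import re
import socket

EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")


def whois_query(server, query, port=43, timeout=10):
    """Send a single WHOIS query and return the raw text of the reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_abuse_emails(whois_text):
    """Return the unique addresses found on lines that mention 'abuse'."""
    found = []
    for line in whois_text.splitlines():
        if "abuse" in line.lower():
            for address in EMAIL_RE.findall(line):
                address = address.strip(".")
                if address not in found:
                    found.append(address)
    return found
```

For example, extract_abuse_emails(whois_query("whois.ripe.net", "192.0.2.1")) would query one of the servers listed above; the address used here is a documentation placeholder.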

3.5 Automating the process

The transforms in Phishtego work together to produce a useful collection of information and relationships between entities. The transform in Table 3.5, as previously mentioned, provides a simple but powerful means of integrating Phishtego into any number of existing systems, simply by storing suspected phishing or malicious emails to disk and representing this location in Phishtego using an EmailSourceDirectory entity. If both the saving of such emails and the running of Phishtego can be automated, the end user has a self-sufficient system that can be consulted at any time to give, at a glance, a fair idea of the threats currently posed to their company or organisation by ongoing phishing attacks.

Integrated into Phishtego is a machine which performs a number of automated transforms on an EmailSourceDirectory as well as on the subsequent entities that are derived from each transform. The machine performs the following tasks:

• On EmailSource entities, extract all information from them using multiple transforms, including extracting the links, email addresses and domains present in the email.

• On all email entities, verify that they do in fact exist and are not simply dummy addresses.

• Validate all SuspiciousLink entities against the online APIs previously discussed.

• On all domains, check if a website exists on port 80 for the domain. If it does, create a website entity.

• On all website entities, resolve these to IPv4 addresses and represent these as entities.

• On all IPv4 entities, resolve these to WHOIS information and extract the abuse report email from the data if it is present.

• On all AbuseReportEmail entities, prepare automated emails to be sent to report the phishing attack.

This machine, as well as the ability to integrate arbitrary systems into Phishtego, are perhaps the most powerful and useful features of the framework.
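The control flow of the steps listed above can be sketched as a work queue in which every new entity is handed to the transforms registered for its type. This is a Python illustration of the idea only and is not Maltego machine syntax; run_machine and the shape of the transforms mapping are assumptions.

```python
from collections import deque


def run_machine(start_entities, transforms):
    """Apply transforms to entities until no new entities appear.

    `start_entities` is a list of (entity type, value) pairs. `transforms`
    maps an entity type to a list of functions value -> list of
    (entity type, value) pairs. The result is the list of
    (parent, child) relationships that were discovered.
    """
    queue = deque(start_entities)
    seen = set(start_entities)
    edges = []
    while queue:
        entity = queue.popleft()
        entity_type, value = entity
        for transform in transforms.get(entity_type, []):
            for child in transform(value):
                edges.append((entity, child))
                if child not in seen:  # process each entity only once
                    seen.add(child)
                    queue.append(child)
    return edges
```

Tracking the entities already seen stops the same domain or address from being processed repeatedly when several emails lead to it.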

Chapter 4

Case Studies

In order to provide some examples of how best to utilise the system, this chapter explores some case studies relating to phishing attacks. Section 4.1 looks at the use of Phishtego in fingerprinting and identifying an attack. To illustrate the system, Phishtego needs some data. Due to the sensitivity of phishing attacks, obtaining real data from actual phishing campaigns is difficult and probably provides little benefit over generic phishing data.

4.1 An attack launched from a compromised server

4.1.1 Background

In this light, the test data that Phishtego uses in the following case studies is derived from generic phishing campaigns, with emails randomly selected from the online ‘throwaway’ email service Mailinator1. Such services are typically not used as part of a genuine online identity but to sign up for services that users suspect may send unnecessarily large volumes of correspondence in the form of spam. Additionally, some of the less savoury websites and services that people sign up for are run by entities that trade email addresses to third-party spam houses. This makes the service an ideal source of dummy data, with each address typically receiving hundreds of phishing-related emails each hour.

1http://www.mailinator.com


The first case study looks at performing the process on a single phishing email. The structure of the email looks as follows:

Listing 2: A Phishing email

Subject: uBuyaPills Todayi documenter...
From: "Sterling Green" <ella.dickson@chron.com>
Date: Thu, 11 Sep 2014 00:07:57 +0700
To: redacted

<html><head><title>uBuyuTablets Herek unconscionable</title></head><body>
iGetExclusive dMedicaments Todayw<br>
<a href="http://venisetours.com/calceolaria.php">http://venisetours.com/calceolaria.php</a>
</body></html>

For all intents and purposes, the email appears to be a common phishing email. It promises to sell some sort of medicine, probably illegally, in exchange for a credit card number and some other personal information. It is a relatively standard phishing email and provides a good starting point for demonstrating the usefulness of the system.

4.1.2 Exploration and Fingerprinting

The first stage in making sense of the attack is to pull out and identify information surrounding the email. We begin by adding the email source to Phishtego in the form of an emailSource entity, as demonstrated in Figure 4.1.

Figure 4.1: Creating an email source entity

We then perform a transform on the email source, which produces a number of new entities. In this case, the transform run is the Link extraction and analysis transform described in Table 3.6. The transform in this example creates four new entities. These entities are illustrated in Figure 4.2 and include:

• Two suspicious email addresses that were included in the email (A and C in Figure 4.2)

• A suspicious link entity that was also included in the email (B in Figure 4.2)

• The domain associated with the suspicious link in question (D in Figure 4.2)

The next step was to look up the URL against the online services that are integrated into the framework. In this instance, it was not recognised as a known malicious domain or link. This poses an interesting situation, as the domain associated with the link appears at first glance to be legitimate. This is a good example of how useful visually representing information can be. We then further explored the domain in question, which was confirmed to be hosting a website. Using a set of transforms, we resolved the website to its IPv4 address. From this address, further exploration retrieved the abuse report email from the WHOIS information of the registrant of the domain. This is only useful if the website serving the phishing content was not intentionally involved in the attack but had been compromised and was being used as a front for it, since a malicious actor is unlikely to show much concern for an abuse report email.


Figure 4.2: Analysis of the emailSource entity

Figure 4.3: Exploring the domain involved in the attack

4.1.3 Analysis

Before visiting the full URL in question locally, it was worth visiting the landing page of the website. This landing page appears to be a legitimate website, even if the site itself is dormant and under construction.


Figure 4.4: http://www.venisetours.com

However, visiting the URL that was originally included in the email gives a quite different result. The page redirects to the actual phishing page. The domain that the user is redirected to is far more interesting and immediately looks suspicious: the URL is http://kztefobn.com. This is a common tactic used by phishers to bypass spam filters, complicate blacklisting procedures and present the user with something that looks like a legitimate link dha (2006).

Figure 4.5: The redirected page

Based on this, we might conclude that the site has indeed been compromised and set up as a front for a phishing campaign, tricking users into following a legitimate-looking link rather than the quite obviously suspicious http://kztefobn.com. Thus, in order to speed up the process of contacting the abuse report contact, we run the abuseReportEmail transform, which generates an automated complaint ready to send to the abuse report email address.

Figure 4.6: The redirected page

This first use case is a good example of how using Phishtego on a single suspicious email informed and facilitated a measured and sensible response to a phishing campaign, first by identifying the architecture of the attack and then by automating a response mechanism. The hope is that the owner of the domain reacts appropriately and acts to stop the malicious content being served from their server.

4.2 Correlating relationships between larger data-sets

The purpose of the second case study is to illustrate the value of graphing and examining multiple phishing-related emails on the same graph. This is not particularly easy to illustrate without several targeted phishing attacks. However, as the following case study shows, even with emails selected completely at random from Mailinator, rather than a number of phishing emails aimed at a single organisation, it is possible to draw links and relationships between what appear to be completely independent phishing attacks.


4.2.1 Background

For the purposes of this study, 20 phishing emails targeted at Mailinator users were randomly selected. These emails were loaded into Maltego and multiple transforms were run on them in order to look for relationships that might exist between them. The hope is that even between emails sent to completely independent addresses there might exist some commonality, and that by using the framework we are able to detect this commonality and represent it visually. Figure 4.7 shows a number of these emails represented alongside each other. It is also interesting to note that at least one of the emails has already been confirmed as malicious by one or both of our external crowd-sourced anti-phishing services. This is labelled in Figure 4.7 as ‘Malicious Link’.

Figure 4.7: Multiple emails represented in Phishtego

As might be expected, the sample data consists mainly of ‘closed systems’ which have no relationship with any of the other phishing attacks. A closed system here is a set of links, domains, emails and targets that is separate from all others, as is the case in Figure 4.8.

Figure 4.8: Closed Systems

Figure 4.8 shows two closed systems. Each email is part of a distinctly different attack. Without any additional exploration it was apparent that the closed systems were less interesting for this example than emails that appeared to share commonalities. Further exploration provided some useful insight into common threads shared across a couple of the attacks. The emails detailed in Figure 4.9 are an example of a correlation that was found after analysing a number of seemingly unrelated attacks aimed at different recipients. It is evident from the transforms that the domain in question, doctorttdf.ru, is common to all four of the emails analysed.

Figure 4.9: Related Attacks

By exploring a number of seemingly random and unrelated emails in the Phishtego framework, we have been able to derive relationships between them. These emails, which were addressed to different recipients, share commonalities that suggest a common origin.
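The same correlation can be expressed outside the graph as a simple grouping of emails by the domains their links point to. The sketch below is a hypothetical illustration, not part of Phishtego; shared_domains and its input format are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def shared_domains(links_by_email):
    """Map each domain that appears in more than one email to those emails.

    `links_by_email` maps an email identifier to the list of links
    extracted from that email.
    """
    emails_by_domain = defaultdict(set)
    for email_id, links in links_by_email.items():
        for link in links:
            host = urlparse(link).hostname
            if host:
                emails_by_domain[host].add(email_id)
    return {domain: sorted(emails)
            for domain, emails in emails_by_domain.items()
            if len(emails) > 1}
```

Emails that do not appear in the result correspond to the closed systems of Figure 4.8, while each entry in the result corresponds to a cluster of the kind shown in Figure 4.9.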

In this case, the domain appears to have been set up and registered by a malicious actor. The domain name does not appear to be one that someone would register for legitimate purposes. It is unlikely that contacting the abuse report email would produce anything useful. After visiting the links included in the emails and finding that they were in fact malicious and sought to elicit personal data from users, the links were reported to both Phishtank and Google Safe Browsing. At present these actions have to be done manually, but they could possibly be automated to some extent in the future, which would further streamline the process. Several minutes after the links were reported and then checked again against the online validation services, the graph reflected this: each link now has a relationship with a malicious link entity, confirmed by the online services. Figure 4.10 illustrates the same graph, now with MaliciousLink entities that share a relationship with the original links.

Figure 4.10: Related Attacks with Malicious Links reported

This use case is a good example of how Phishtego can be used to find and derive relationships between phishing campaigns that might not originally be obvious. In this way, the system would be a valuable asset to anyone interested in monitoring and tracking phishing campaigns in general, and more specifically in monitoring and understanding campaigns surrounding a specific organisation or target.


4.3 Automated monitoring

As previously mentioned, one possible use of Phishtego is as a completely automated monitoring system for a local directory. After creating a Maltego machine for Phishtego, the following chain of events occurs programmatically every 30 seconds:

• The email source directory is checked for new emails. If there are any, new entities representing them are created in the framework.

• On EmailSource entities, extract all information from them using multiple transforms, including extracting the links, email addresses and domains present in the email.

• On all email entities, verify that they do in fact exist and are not simply dummy addresses.

• Validate all SuspiciousLink entities against the online APIs previously discussed.

• On all domains, check if a website exists on port 80 for the domain. If it does, create a website entity.

• On all website entities, resolve these to IPv4 addresses and represent these as entities.

• On all IPv4 entities, resolve these to WHOIS information and extract the abuse report email from the data if it is present.

• On all AbuseReportEmail entities, prepare automated emails to be sent to report the phishing attack.

In this example, the email source directory initially contained 4 phishing emails. Figure

4.11 shows the graph after running the machine over a single iteration.



Figure 4.11 shows the first iteration of the Phishtego monitoring machine, which has automatically run transforms on the EmailSourceDirectory and generated entities and relationships from this starting point. To simulate the arrival of new suspicious emails, which in a real deployment might be taken from a spam filter or reported manually by an employee, we manually add several new emails to the directory. Keep in mind that this need not have been done by hand: the emails could have been inserted into the directory automatically after being pulled down from a spam filter or flagged as suspicious by a user on the network. The following iteration of the machine demonstrates the effectiveness of having integrated a means of automating the procedure.



On detecting the new emails, the framework creates new EmailSource entities on the graph. The machine then automatically runs the relevant transforms on each entity.

Figure 4.13: Automated email retrieval and transforms

This example illustrates the power of automating the exploration process: the user can leave the system unattended and check intermittently on the status of phishing-related attacks against an institution.
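The directory check that starts each iteration can be sketched in plain Python as follows. In Phishtego this role is played by a Maltego machine, so the code below is only an approximation under stated assumptions: the names `find_new_emails` and `monitor`, the `handle_email` callback and the `iterations` parameter are hypothetical, and the 30-second default simply mirrors the interval described above.

```python
import os
import time


def find_new_emails(directory, seen):
    """Return paths of files in `directory` not seen before, updating `seen`."""
    new_paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


def monitor(directory, handle_email, interval_seconds=30, iterations=None):
    """Poll `directory`, passing each newly found file to `handle_email`.

    Runs forever when `iterations` is None, otherwise for that many passes.
    """
    seen = set()
    completed = 0
    while iterations is None or completed < iterations:
        for path in find_new_emails(directory, seen):
            handle_email(path)
        completed += 1
        # Sleep only if another pass is still to come.
        if iterations is None or completed < iterations:
            time.sleep(interval_seconds)
```

Keeping the set of already-seen paths is what allows emails dropped into the directory between iterations, whether by hand or by a spam filter, to be picked up exactly once.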

Chapter 5

Conclusion

5.1 Analysis of Goals

As was stated in the first chapter, there were three primary project goals:

1. Create a system that models phishing attacks and can be deployed locally on a machine

2. Produce meaningful information from the large volumes of raw data that can be gathered from a phishing campaign

3. Provide a means of facilitating decision making around reacting to phishing campaigns and automating this response where possible

The first goal was achieved through the creation of the Phishtego framework. Not only can it be deployed as part of the free community edition of Maltego, but all of its transforms can be deployed and run locally on a machine without depending on an external server.

The second goal was achieved by carefully choosing which data to return. The system decides which information in an email is relevant and returns only that. Consider, for example, the WHOIS transform, which displays only the abuse report email address needed for remediation instead of plotting volumes of irrelevant data. This and other design decisions keep the framework light and fast and shield the user from being overwhelmed by information. The user is left with a much clearer idea of the structure of the phishing attack without having to work through the mass of raw data beneath it.
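As an illustration of returning only the relevant field, the sketch below pulls a single abuse contact out of raw WHOIS text and discards the rest. It is a simplified stand-in and not the Phishtego transform itself: the function name `extract_abuse_email` and the matching rule (the first email address on any line that mentions "abuse") are assumptions, and real WHOIS output varies considerably between registries.

```python
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_abuse_email(whois_text):
    """Return the first email address on a line mentioning 'abuse', or None."""
    for line in whois_text.splitlines():
        if "abuse" in line.lower():
            match = EMAIL_PATTERN.search(line)
            if match:
                return match.group(0)
    return None
```

Returning `None` when no abuse contact is present means that no entity needs to be created in that case, which keeps the graph free of empty or placeholder nodes.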

The final goal was to facilitate decision making around possible reactions to an ongoing phishing attack. Key to achieving this was giving the user an understanding of the kind of attack in question. Some of the use cases in Chapter 4 highlight this process of first exploring and analysing an attack and then shifting proactively to deciding how best to deal with it. Phishtego also provides a means of automatically generating and inserting content into an email to be sent to an abuse report address.
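A minimal sketch of how such a report might be prepared is given below, using Python 3's standard `email.message.EmailMessage`. The wording of the template, the function name `build_abuse_report` and its parameters are all hypothetical and are not taken from Phishtego. The message is only constructed, not sent.

```python
from email.message import EmailMessage

# Illustrative wording only; a real report would follow the provider's policy.
REPORT_TEMPLATE = (
    "Hello,\n\n"
    "The following URL appears to host a phishing page:\n\n"
    "    {url}\n\n"
    "It resolves to {ip}, for which WHOIS lists this address as the abuse\n"
    "contact. Please investigate and take the page down if appropriate.\n"
)


def build_abuse_report(abuse_address, sender, phishing_url, ip_address):
    """Prepare (but do not send) a report email for an abuse contact."""
    message = EmailMessage()
    message["To"] = abuse_address
    message["From"] = sender
    message["Subject"] = "Phishing report for " + ip_address
    message.set_content(REPORT_TEMPLATE.format(url=phishing_url, ip=ip_address))
    return message
```

Separating the preparation of the message from the act of sending it leaves the final decision to the user, in keeping with the goal of facilitating rather than replacing decision making.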

The Phishtego framework provides a means of exploring, analysing, correlating and reacting to phishing campaigns by illustrating relationships between actors in a phishing campaign, deriving useful information from existing data and facilitating response mechanisms. In addition to meeting these goals, the mechanisms behind the framework were written so that they can be easily automated, which includes the addition of machines to the framework.

One of the most versatile and potentially powerful features of the framework is its ability to integrate existing solutions through directory monitoring. This means that the project can either play the central role within an anti-phishing system or merely complement an existing solution, perhaps by providing a visual representation of phishing-related attacks.

5.2 Future Work

During the course of this research, a number of possible directions for expanding the project were identified that were beyond the scope of this work. Some of these are suggested below as possible future extensions to the project.

5.2.1 Introducing additional online services

One major improvement would be to integrate additional phishing services. There are a number of new and growing services that provide online APIs which would increase the accuracy and effectiveness of the identification of phishing attacks. Some suggestions for this include:

• Webroot Real-Time Anti-Phishing API

• ISIT Phishing

The more services the system can correlate results from, the more accurate its validation of a given link can be.

A related area to build on is the automated reporting of links to the relevant online services. At present, it is not possible to report a link completely autonomously, largely because such functionality could be abused. This area warrants further exploration.

5.2.2 Extension into analysis of attachments

Increasingly, phishers attempt to bypass phishing filters and phishing protection mechanisms by presenting their content in attachments, including images, PDF documents and office documents. There is a large volume of work that can be done with regard to analysing, processing, identifying and classifying the attachments included in an email. Malware analysis is a large field in and of itself, and extending the environment into this area of information security would certainly be challenging. However, it would further enhance the framework's ability to identify more complex phishing attacks.

5.2.3 Reporting Mechanism

It would be particularly useful to be able to generate reports from an existing graph. These could be generated at set intervals and would present possible suggestions and the general ‘health’ of the organisation with regard to phishing attacks. They could also highlight problem areas and areas for concern. This would involve a fair amount of artificial intelligence to run efficiently, but it would no doubt be a useful addition to the framework.

5.2.4 Tool Integration

Another possibility would be to shift from dynamically modelling individual attacks aimed at organisations towards drawing up relationships between much larger sets of data. The PhishTank service provides a means of caching data offline. Providing a means of analysing, correlating and expanding on a specified file format would be worth investigating. For example, a CSV file containing phishing-related data could be specified and integrated into the system. This would further enhance the system's ability to integrate with data provided by other existing systems (once formatted according to our specification) and would be a valuable addition to the framework.
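To make the CSV idea concrete, the following sketch reads phishing-related records from a CSV file and groups the hosts of the reported URLs by the organisation being impersonated. The file specification is left open in this work, so the column names `url` and `target` are assumptions, as is the function name `group_hosts_by_target`.

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse


def group_hosts_by_target(csv_file):
    """Group URL hosts by impersonated target.

    `csv_file` is any iterable of CSV lines (for example an open file) whose
    header row contains the assumed columns 'url' and 'target'.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        host = urlparse(row["url"]).hostname
        if host:
            grouped[row["target"]].append(host)
    return dict(grouped)
```

Each key of the result could become a single entity on the graph with its hosts attached as children, which is one way that relationships across a much larger data set might be drawn up.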


Appendix A

Appendix

Listing A.1: An External Python Class

# PhishTank integration (Python 2: uses urllib2 and print statements)
import urllib2
import requests
import time
import os
import pandas


class phishtank:

    # Constructor. Initialize and set program parameters including API key
    def __init__(self, api_key, updateInterval = 45, web = True):
        self.runstate = True
        self.api_key = api_key
        self.updateInterval = updateInterval  # minutes between refreshes
        self.data = ""
        self.web = web  # True: query the online service; False: use local data
        currentAttacks = {}
        try:
            f = open('lastupdate', 'r')
            self.lastUpdate = float(f.read().replace("\n", ""))
            f.close()
        except (IOError, ValueError):
            # No usable timestamp yet: download the data, which also writes one
            self.updatePhishTankData()
        self.update()

    # Initiate the run sequence
    def run(self):
        while self.runstate:
            self.update()
            print "sleeping 60 seconds"
            time.sleep(60)

    # Read the time of the last update from disk
    def checkLastUpdate(self):
        f = open('lastupdate', 'r')
        self.lastUpdate = float(f.read().replace("\n", ""))
        f.close()

    # Update the Phishtank repository locally
    def updatePhishTankData(self):
        print "updating phishtank"
        url = ("http://data.phishtank.com/data/{0}/online-valid.csv"
               .format(self.api_key))
        response = urllib2.urlopen(url)
        self.data = response.read()
        f = open("phishtank.data", 'a')
        f.write(self.data)
        f.close()
        # Record when the data was last refreshed
        f = open("lastupdate", 'w')
        f.write(str(time.time()))
        f.close()
        print "finished updating phishtank"

    # Determine whether or not the phishtank data needs to be updated
    def update(self):
        self.checkLastUpdate()
        print "Checking update"
        now = time.time()
        print ("last update was " + str(int(now - self.lastUpdate)) +
               " seconds ago. Interval is " + str(self.updateInterval * 60))
        if int(now - self.lastUpdate) > self.updateInterval * 60:
            print "updating"
            self.updatePhishTankData()
            print "Done"
            return
        print "Not updating"

    # Verify against the database whether a URL is present
    def checkURL(self, url):
        if self.web:
            # The remainder of this URL is elided ("...") in the source listing
            x = urllib2.urlopen("http://checkurl.phishtank.com/checkurl/..."
                                .format(url))
            return "true" in x.read().lower()
        # Offline check. The source listing tests an undefined variable r here;
        # it is assumed to be the number of occurrences of the URL in the
        # locally cached data.
        self.checkLastUpdate()
        f = open("phishtank.data", 'r')
        r = f.read().count(url)
        f.close()
        return r > 0
